Sunstone - unable to show (physical) host information

Hi,

We’re experiencing something weird (again…) with OpenNebula Sunstone.

First, let me show you the command-line equivalent of what I’m doing (just to make sure it’s all clear):

As a first step we list all hosts:

$ onehost list
  ID NAME            CLUSTER   RVM      ALLOCATED_CPU      ALLOCATED_MEM STAT
   9 sf01.**** Cluster     0       0 / 800 (0%)    0K / 31.3G (0%) on
  11 sf02.**** Cluster     0       0 / 800 (0%)    0K / 31.3G (0%) on
  12 sf03.**** Cluster     0       0 / 800 (0%)    0K / 31.3G (0%) on
  13 sf04.**** Cluster     0       0 / 800 (0%)    0K / 31.3G (0%) on

Then we show the information regarding one specific host:

$ onehost show 9
HOST 9 INFORMATION
ID                    : 9
NAME                  : sf01.****
CLUSTER               : Cluster
STATE                 : MONITORED
IM_MAD                : kvm
VM_MAD                : kvm
VN_MAD                : dummy
LAST MONITORING TIME  : 03/17 12:44:18

(and so on)

Up to this point it all works nicely.

However, when we use the Sunstone interface the following happens:

  1. The host listing is shown instantly, including status updates and all.
  2. When we then click a host to show its details, we get the loading page and nothing else happens… It’s just stuck.

Looking at the logs we can see that the GET request reaches Sunstone. However, we don’t get any errors. Monitoring oned.log also doesn’t give us anything to work with.

The funny part, however, is that when we first add a host (using Sunstone) and immediately disable it, we can view the host info (if we’re quick). Hosts that have been monitored can’t be viewed (even after disabling).

At first we thought it was our firewall. But all OpenNebula components are on the same machine, and if we’re quick we can get some information (eventually). Once a host is properly added (that is, its status is on), it simply fails to display in Sunstone.

Does anyone know where I should start looking? Debug logging is enabled for Sunstone and oned, but we don’t get any information about errors (it seems like the requests disappear into /dev/null or something…).

Any insight on this matter would be greatly appreciated!!!

Thanks in advance,

Bart

Maybe the full host information is important; below you’ll find the host along with all its attributes.

$ onehost show 9
HOST 9 INFORMATION
ID                    : 9
NAME                  : sf01.****
CLUSTER               : Cluster
STATE                 : MONITORED
IM_MAD                : kvm
VM_MAD                : kvm
VN_MAD                : dummy
LAST MONITORING TIME  : 03/17 12:44:18

HOST SHARES
TOTAL MEM             : 31.3G
USED MEM (REAL)       : 923.4M
USED MEM (ALLOCATED)  : 0K
TOTAL CPU             : 800
USED CPU (REAL)       : 12
USED CPU (ALLOCATED)  : 0
RUNNING VMS           : 0

MONITORING INFORMATION
ARCH="x86_64"
ARCH="x86_64"
ARCH="x86_64"
ARCH="x86_64"
CPUSPEED="2003"
CPUSPEED="2003"
CPUSPEED="2003"
CPUSPEED="2003"
HOSTNAME="sf01.****"
HOSTNAME="sf01.****"
HOSTNAME="sf01.****"
HOSTNAME="sf01.****"
HYPERVISOR="kvm"
HYPERVISOR="kvm"
HYPERVISOR="kvm"
HYPERVISOR="kvm"
MODELNAME="Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz"
MODELNAME="Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz"
MODELNAME="Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz"
MODELNAME="Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz"
NETRX="5079012522"
NETRX="5079041539"
NETRX="5079148700"
NETRX="5079166204"
NETTX="5367373336"
NETTX="5367414144"
NETTX="5367561384"
NETTX="5367583010"
RESERVED_CPU=""
RESERVED_MEM=""
VERSION="4.12.0"
VERSION="4.12.0"
VERSION="4.12.0"
VERSION="4.12.0"

VIRTUAL MACHINES

    ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME

Could you check if there is any error in the browser developer console?

Good tip! This does indeed give me some errors.

The following error is displayed:

TypeError: host_info.TEMPLATE.HYPERVISOR.toLowerCase is not a function hosts-tab.js:781:41

Which corresponds with the following code (the error console brings me to the bold line):

// Get rid of the unwanted (for show) HOST keys
var stripped_host_template = {};
var unshown_values         = {};

**if (host_info.TEMPLATE.HYPERVISOR && host_info.TEMPLATE.HYPERVISOR.toLowerCase() != "vcenter")**
{
  stripped_host_template = host_info.TEMPLATE;
}
else
{
  for (key in host_info.TEMPLATE)
      if(!key.match(/HOST/))
          stripped_host_template[key]=host_info.TEMPLATE[key];
      else
          unshown_values[key]=host_info.TEMPLATE[key];
}

I’m lost at this point, does this mean I’m missing something in my installation/configuration?

I think the problem is with the repeated keys in the monitoring information. Could you check whether there are duplicated probes in /var/lib/one/remotes/im/kvm-probes.d (front end) or /var/tmp/one/im/kvm-probes.d (nodes)? There should be only one key=value pair per attribute in the host monitoring information.
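To illustrate why repeated keys would trigger that TypeError, here is a minimal sketch. It assumes that repeated XML elements reach the browser as an array instead of a string; that is an assumption consistent with the error message, not something confirmed in this thread. The objects healthyHost and brokenHost and the helper firstValue are hypothetical and not part of Sunstone.

```javascript
// Hypothetical host objects. Assumption: a key that appears once becomes a
// string, while a key that appears several times becomes an array.
const healthyHost = { TEMPLATE: { HYPERVISOR: "kvm" } };
const brokenHost = { TEMPLATE: { HYPERVISOR: ["kvm", "kvm", "kvm", "kvm"] } };

// Mirrors the condition quoted from hosts-tab.js above.
function isNonVcenter(hostInfo) {
  const hv = hostInfo.TEMPLATE.HYPERVISOR;
  return Boolean(hv && hv.toLowerCase() != "vcenter");
}

// Illustrative helper (not Sunstone code): use the first entry of an array.
function firstValue(value) {
  return Array.isArray(value) ? value[0] : value;
}

// Returns the check result as a string, or the error name if the check throws.
function describe(hostInfo) {
  try {
    return String(isNonVcenter(hostInfo));
  } catch (err) {
    return err.name;
  }
}

console.log(describe(healthyHost)); // "true"
console.log(describe(brokenHost));  // "TypeError": arrays have no toLowerCase
console.log(firstValue(brokenHost.TEMPLATE.HYPERVISOR).toLowerCase()); // "kvm"
```

A defensive read like firstValue would only hide the symptom; the duplicated monitoring values themselves still need to be cleaned up at the probe level.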

I’ve checked that directory and it turned out there were a bunch of .rpmsave files.

1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.2K Mar 10 00:43 architecture.sh
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.2K Jan 15 17:26 architecture.sh.rpmsave
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.4K Mar 10 00:43 collectd-client-shepherd.sh
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.4K Jan 15 17:26 collectd-client-shepherd.sh.rpmsave
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.4K Mar 10 00:43 cpu.sh
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.4K Jan 15 17:26 cpu.sh.rpmsave
3.5K -rwxr-xr-x. 1 oneadmin oneadmin 3.2K Mar 10 00:43 kvm.rb
3.5K -rwxr-xr-x. 1 oneadmin oneadmin 3.2K Jan 15 17:26 kvm.rb.rpmsave
2.5K -rwxr-xr-x. 1 oneadmin oneadmin 2.2K Mar 10 00:43 monitor_ds.sh
2.5K -rwxr-xr-x. 1 oneadmin oneadmin 2.2K Jan 15 17:26 monitor_ds.sh.rpmsave
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.2K Mar 10 00:43 name.sh
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.2K Jan 15 17:26 name.sh.rpmsave
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.2K Mar 10 00:43 poll.sh
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.2K Jan 15 17:26 poll.sh.rpmsave
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.3K Mar 10 00:43 version.sh
1.5K -rwxr-xr-x. 1 oneadmin oneadmin 1.3K Jan 15 17:26 version.sh.rpmsave

So I’ve searched for those and removed all of them from /var/lib/one:

cd /var/lib/one
find . -name '*.rpmsave' | xargs rm

The frontend and KVM clients should be the same since they have the same /var/lib/one (stored on a glusterfs volume).

After that I restarted OpenNebula and did the test again.

But sadly still no luck.

I also tested removing a host and then adding it again to see if it would register nicely, but that didn’t seem to make any difference. The error console still gives the same error.

Could you send us the output of one of the hosts that is giving you this error:

onehost show <HOST_ID> -x

Here’s the output:

<HOST>
  <ID>16</ID>
  <NAME>sf02</NAME>
  <STATE>2</STATE>
  <IM_MAD><![CDATA[kvm]]></IM_MAD>
  <VM_MAD><![CDATA[kvm]]></VM_MAD>
  <VN_MAD><![CDATA[dummy]]></VN_MAD>
  <LAST_MON_TIME>1426602293</LAST_MON_TIME>
  <CLUSTER_ID>100</CLUSTER_ID>
  <CLUSTER>Cluster</CLUSTER>
  <HOST_SHARE>
    <DISK_USAGE>0</DISK_USAGE>
    <MEM_USAGE>0</MEM_USAGE>
    <CPU_USAGE>0</CPU_USAGE>
    <MAX_DISK>13726726</MAX_DISK>
    <MAX_MEM>32778788</MAX_MEM>
    <MAX_CPU>800</MAX_CPU>
    <FREE_DISK>13722731</FREE_DISK>
    <FREE_MEM>32184836</FREE_MEM>
    <FREE_CPU>796</FREE_CPU>
    <USED_DISK>3996</USED_DISK>
    <USED_MEM>593952</USED_MEM>
    <USED_CPU>3</USED_CPU>
    <RUNNING_VMS>0</RUNNING_VMS>
    <DATASTORES/>
  </HOST_SHARE>
  <VMS/>
  <TEMPLATE>
    <ARCH><![CDATA[x86_64]]></ARCH>
    <ARCH><![CDATA[x86_64]]></ARCH>
    <ARCH><![CDATA[x86_64]]></ARCH>
    <ARCH><![CDATA[x86_64]]></ARCH>
    <CPUSPEED><![CDATA[2003]]></CPUSPEED>
    <CPUSPEED><![CDATA[2003]]></CPUSPEED>
    <CPUSPEED><![CDATA[2003]]></CPUSPEED>
    <CPUSPEED><![CDATA[2003]]></CPUSPEED>
    <HOSTNAME><![CDATA[sf02]]></HOSTNAME>
    <HOSTNAME><![CDATA[sf02]]></HOSTNAME>
    <HOSTNAME><![CDATA[sf02]]></HOSTNAME>
    <HOSTNAME><![CDATA[sf02]]></HOSTNAME>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz]]></MODELNAME>
    <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz]]></MODELNAME>
    <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz]]></MODELNAME>
    <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz]]></MODELNAME>
    <NETRX><![CDATA[1942119537]]></NETRX>
    <NETRX><![CDATA[1942137042]]></NETRX>
    <NETRX><![CDATA[1942201784]]></NETRX>
    <NETRX><![CDATA[1942220127]]></NETRX>
    <NETTX><![CDATA[1594807268]]></NETTX>
    <NETTX><![CDATA[1594829104]]></NETTX>
    <NETTX><![CDATA[1594916794]]></NETTX>
    <NETTX><![CDATA[1594938804]]></NETTX>
    <RESERVED_CPU><![CDATA[]]></RESERVED_CPU>
    <RESERVED_MEM><![CDATA[]]></RESERVED_MEM>
    <VERSION><![CDATA[4.12.0]]></VERSION>
    <VERSION><![CDATA[4.12.0]]></VERSION>
    <VERSION><![CDATA[4.12.0]]></VERSION>
    <VERSION><![CDATA[4.12.0]]></VERSION>
  </TEMPLATE>
</HOST>
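A quick way to confirm which attributes are repeated in output like the above is to count the opening tags inside the TEMPLATE block. The following is a rough sketch that assumes the XML from onehost show -x is available as a string; the function name repeatedTemplateKeys and the sample input are made up for illustration, and the regex-based scan is a shortcut, not a real XML parser.

```javascript
// Count how often each element name opens inside <TEMPLATE>...</TEMPLATE>
// and return only the names that occur more than once.
function repeatedTemplateKeys(xml) {
  const template = xml.match(/<TEMPLATE>([\s\S]*?)<\/TEMPLATE>/);
  if (!template) return {};

  const counts = {};
  // Matches opening tags such as <ARCH>; closing tags and CDATA markers
  // start with "</" or "<!" and are therefore skipped.
  for (const m of template[1].matchAll(/<([A-Z_][A-Z0-9_]*)>/g)) {
    counts[m[1]] = (counts[m[1]] || 0) + 1;
  }

  const repeated = {};
  for (const [key, n] of Object.entries(counts)) {
    if (n > 1) repeated[key] = n;
  }
  return repeated;
}

// Shortened, hypothetical sample in the same shape as the output above.
const sample = `<HOST><TEMPLATE>
<ARCH><![CDATA[x86_64]]></ARCH>
<ARCH><![CDATA[x86_64]]></ARCH>
<RESERVED_CPU><![CDATA[]]></RESERVED_CPU>
</TEMPLATE></HOST>`;

console.log(repeatedTemplateKeys(sample)); // { ARCH: 2 }
```

An empty result would mean every attribute appears only once, which is what the monitoring information is expected to look like.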

Could you try to:

  • Remove the *.rpmsave files from /var/lib/one/remotes/im/kvm-probes.d
  • Remove the contents of the /var/tmp/one directory on the nodes
  • Execute onehost sync --force from the frontend
  • Recreate the host

I had already removed the rpmsave files.

I then cleared all /var/tmp/one directories.
Then I forced a sync, but this gave an error on a few hosts. Those that didn’t give an error seemed to work afterwards.
So as a last step I removed all hosts and added them again; after this step everything was working.

So I guess the main problem was the /var/tmp/one directory?

Anyway, thanks a lot for the quick help!!!