What does "Error monitoring Host XXX (13):" mean?

I upgraded from 5.10 to 5.12. Afterwards I opened port 4124 (TCP and UDP) on the management server. Now, as the final step, I want to re-enable the hosts, but I get the error message from the subject line. I found many answers here about other error codes, but none of them explain what error code 13 means. Can someone tell me more about it?

Can you show the full error message? For monitoring problems, check /var/log/one/monitor.log.

I already checked monitor.log. Sadly, there is nothing useful in it:

Thu Nov 16 18:13:05 2023 [Z0][HMM][D]: Monitoring XXX(13)
Thu Nov 16 18:13:06 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:06 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:06 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:06 2023 [Z0][MDP][W]: Start monitor failed for host 13: 
Thu Nov 16 18:13:06 2023 [Z0][HMM][E]: Unable to monitor host id: 13
Thu Nov 16 18:13:50 2023 [Z0][HMM][D]: Monitoring host XXX(1)
Thu Nov 16 18:13:51 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:51 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:51 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:51 2023 [Z0][MDP][W]: Start monitor failed for host 1: 
Thu Nov 16 18:13:51 2023 [Z0][HMM][E]: Unable to monitor host id: 1

It always looks like this.

I see now that the "13" is not an error code. It is the internal ID of the host.
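Since the number in parentheses is a host ID rather than an error code, it can help to list which host IDs are failing and how often. Below is a minimal sketch, assuming the log line format shown in the excerpt above (`[HMM][E]: Unable to monitor host id: N`) and the default path /var/log/one/monitor.log; the function name `failed_host_ids` is hypothetical, not part of OpenNebula.

```python
import os
import re
import sys
from collections import Counter

# Matches lines like "... [Z0][HMM][E]: Unable to monitor host id: 13"
# (format assumed from the monitor.log excerpt in this thread).
FAILED_RE = re.compile(r"\[HMM\]\[E\]: Unable to monitor host id: (\d+)")


def failed_host_ids(lines):
    """Count monitoring failures per host ID in monitor.log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Default location mentioned in this thread; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/one/monitor.log"
    if os.path.isfile(path):
        with open(path, encoding="utf-8", errors="replace") as log:
            for host_id, n in sorted(failed_host_ids(log).items()):
                print(f"host {host_id}: {n} failed monitoring attempt(s)")
    else:
        print(f"{path} not found")
```

The resulting IDs can then be matched against the host list in OpenNebula to see which hosts are affected.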

Increase the debugging level and restart the monitoring service (see the "Monitoring Configuration" page of the OpenNebula 6.8.0 documentation). By default, the log doesn't go into that much detail.

I already have the log level set to 3 for both services. The oned log doesn't contain any more information either:

Sat Nov 18 16:14:49 2023 [Z0][ONE][E]: Error monitoring Host XXX (0):

It looks to me like the empty MDP entries should contain some text. Maybe it is some kind of bug.

Another question: how can I start and stop the probe agents on the host manually? And do they write logs as well?

I have now tried a few other things. One was to start OpenNebula with the default config file shipped with the RPM package. With the default config file it doesn't even start, and the error message is something like "could not find /usr/lib/one/mads/collectd". Could it be that this file should have been installed by the RPM?

If I change oned.conf as follows, it starts, but then I get other error messages:

IM_MAD = [
      NAME       = "collectd",
      EXECUTABLE = "/usr/sbin/collectd",
      ARGUMENTS  = "-p 4124 -f 5 -t 50 -i 60" ]

Maybe this information helps somehow.

Raise the monitor log level to 4 or 5 (3 is the default) and restart OpenNebula. You should get more information about the problem.

Sadly, I get the same amount of information, nothing more that would help.

I have found the problem. The outdated collectd configuration was still in the config file (oned.conf). It had to be replaced with this configuration:

IM_MAD = [
      NAME       = "monitord",
      EXECUTABLE = "onemonitord",
      ARGUMENTS  = "-c monitord.conf",
      THREADS    = 8 ]
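To catch the same leftover after other upgrades, a script can scan oned.conf for active IM_MAD sections that still refer to collectd. This is a rough sketch, not an OpenNebula tool: it assumes IM_MAD sections use the `IM_MAD = [ ... ]` form shown above, that `#` starts a comment and never appears inside a quoted value, and that the file lives at /etc/one/oned.conf (an assumed default path). The helper names `im_mads` and `outdated_collectd` are hypothetical, and a regex is used instead of a real parser for brevity.

```python
import os
import re
import sys

# One "IM_MAD = [ ... ]" section; non-greedy up to the first closing bracket.
IM_MAD_RE = re.compile(r"IM_MAD\s*=\s*\[(.*?)\]", re.DOTALL)
NAME_RE = re.compile(r'NAME\s*=\s*"([^"]*)"')
EXEC_RE = re.compile(r'EXECUTABLE\s*=\s*"([^"]*)"')


def strip_comments(text):
    """Drop '#' comments (assumes '#' never appears inside quoted values)."""
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())


def im_mads(text):
    """Return (NAME, EXECUTABLE) pairs for every active IM_MAD section."""
    result = []
    for body in IM_MAD_RE.findall(strip_comments(text)):
        name = NAME_RE.search(body)
        exe = EXEC_RE.search(body)
        result.append((name.group(1) if name else None,
                       exe.group(1) if exe else None))
    return result


def outdated_collectd(text):
    """True if any active IM_MAD still names or executes collectd."""
    return any("collectd" in (name or "") or "collectd" in (exe or "")
               for name, exe in im_mads(text))


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/one/oned.conf"
    if os.path.isfile(path):
        with open(path, encoding="utf-8", errors="replace") as f:
            conf = f.read()
        print(im_mads(conf))
        if outdated_collectd(conf):
            print("warning: an active IM_MAD still refers to collectd")
    else:
        print(f"{path} not found")
```

Commented-out sections are ignored, so an old collectd block that has already been disabled with `#` will not trigger the warning.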