What does “Error monitoring Host XXX (13):” mean?

modir · November 15, 2023, 3:30pm

I upgraded from 5.10 to 5.12. Afterwards I opened port 4124 (TCP and UDP) on the management server. Now, as the final step, I want to enable the hosts again, but I get the error message from the subject line. I found many answers here about other error codes, but none of them explains what error code 13 means. Can someone tell me more about it?

dclavijo · November 16, 2023, 5:01pm

Can you show the output of the full error message? For monitoring problems check /var/log/one/monitor.log

modir · November 16, 2023, 5:16pm

I already checked the monitor.log. Sadly there is nothing useful in it:

Thu Nov 16 18:13:05 2023 [Z0][HMM][D]: Monitoring XXX(13)
Thu Nov 16 18:13:06 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:06 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:06 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:06 2023 [Z0][MDP][W]: Start monitor failed for host 13: 
Thu Nov 16 18:13:06 2023 [Z0][HMM][E]: Unable to monitor host id: 13
Thu Nov 16 18:13:50 2023 [Z0][HMM][D]: Monitoring host XXX(1)
Thu Nov 16 18:13:51 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:51 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:51 2023 [Z0][MDP][I]: 
Thu Nov 16 18:13:51 2023 [Z0][MDP][W]: Start monitor failed for host 1: 
Thu Nov 16 18:13:51 2023 [Z0][HMM][E]: Unable to monitor host id: 1

That’s what it always looks like.

I see now that the “13” is not an error code; it is the internal ID of the host.
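Since the number in parentheses is a host ID, a quick way to see which hosts are failing is to pull those IDs out of the log. A minimal sketch, assuming only the line format quoted above (`[HMM][E]: Unable to monitor host id: N`); `failed_host_ids` is a hypothetical helper, not part of OpenNebula:

```python
import re
from typing import Iterable, List

# Matches lines such as:
#   Thu Nov 16 18:13:06 2023 [Z0][HMM][E]: Unable to monitor host id: 13
# The pattern is based only on the log lines quoted in this thread.
FAILED_RE = re.compile(r"\[HMM\]\[E\]: Unable to monitor host id: (\d+)")


def failed_host_ids(lines: Iterable[str]) -> List[int]:
    """Return the host IDs that failed monitoring, in first-seen order, without duplicates."""
    seen: List[int] = []
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            host_id = int(match.group(1))
            if host_id not in seen:
                seen.append(host_id)
    return seen


if __name__ == "__main__":
    sample = [
        "Thu Nov 16 18:13:05 2023 [Z0][HMM][D]: Monitoring XXX(13)",
        "Thu Nov 16 18:13:06 2023 [Z0][MDP][W]: Start monitor failed for host 13: ",
        "Thu Nov 16 18:13:06 2023 [Z0][HMM][E]: Unable to monitor host id: 13",
        "Thu Nov 16 18:13:51 2023 [Z0][HMM][E]: Unable to monitor host id: 1",
    ]
    print(failed_host_ids(sample))  # [13, 1]
```

Against the real file, the same function can be fed the open file object, e.g. `with open("/var/log/one/monitor.log") as f: print(failed_host_ids(f))`.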

dclavijo · November 17, 2023, 4:29pm

Increase the debug level and restart the monitoring service (see “Monitoring Configuration” in the OpenNebula 6.8.0 documentation). By default it doesn’t log in that much detail.

modir · November 18, 2023, 3:16pm

I already have the log level set to 3 for both services. The oned log doesn’t have any more information either:

Sat Nov 18 16:14:49 2023 [Z0][ONE][E]: Error monitoring Host XXX (0):

It looks to me like the empty MDP entries should contain some text. Maybe it is some kind of bug.

Another question: how can I start and stop the probe agents on the host manually? And do they write logs as well?

modir · November 19, 2023, 5:04pm

I have now tried a few other things. One was to start OpenNebula with the default config shipped with the RPM package. With the default config file it doesn’t even start, and the error message is something like “could not find /usr/lib/one/mads/collectd”. Could it be that this is something that should have been installed with the RPM?

If I change oned.conf to the following, it starts, but then I get other error messages:

IM_MAD = [
      NAME       = "collectd",
      EXECUTABLE = "/usr/sbin/collectd",
      ARGUMENTS  = "-p 4124 -f 5 -t 50 -i 60" ]

Maybe this information helps somehow.

dclavijo · November 22, 2023, 3:46pm

Raise the monitor log level to 4 or 5 (3 is the default) and restart OpenNebula. You should get more information about the problem.

modir · November 28, 2023, 5:23pm

Sadly, I get the same amount of information, nothing more that would help.

modir · November 28, 2023, 6:15pm

I have found the problem now: the outdated collectd configuration was still in oned.conf. It had to be replaced with this config:

IM_MAD = [
      NAME       = "monitord",
      EXECUTABLE = "onemonitord",
      ARGUMENTS  = "-c monitord.conf",
      THREADS    = 8 ]
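To catch the same leftover after other 5.x upgrades, one option is a plain text check of oned.conf for an active `IM_MAD` section still named `collectd`. A minimal sketch, assuming the `IM_MAD = [ ... ]` layout shown in this thread and that commented-out lines start with `#`; `im_mad_names` and `has_outdated_collectd` are hypothetical helpers, not an OpenNebula tool:

```python
import re
from typing import List

# Hypothetical helper: finds IM_MAD = [ ... ] sections in oned.conf-style text
# and returns their NAME values. Full-line comments (starting with '#') are
# dropped first so commented-out examples are not reported.
IM_MAD_RE = re.compile(r"\bIM_MAD\s*=\s*\[(.*?)\]", re.DOTALL)
NAME_RE = re.compile(r'\bNAME\s*=\s*"([^"]*)"')


def im_mad_names(conf_text: str) -> List[str]:
    """Return the NAME of every active IM_MAD section, in file order."""
    active = "\n".join(
        line for line in conf_text.splitlines()
        if not line.lstrip().startswith("#")
    )
    names: List[str] = []
    for block in IM_MAD_RE.findall(active):
        match = NAME_RE.search(block)
        if match:
            names.append(match.group(1))
    return names


def has_outdated_collectd(conf_text: str) -> bool:
    """True if an active IM_MAD section is still named "collectd"."""
    return "collectd" in im_mad_names(conf_text)


if __name__ == "__main__":
    old = '''IM_MAD = [
      NAME       = "collectd",
      EXECUTABLE = "/usr/sbin/collectd",
      ARGUMENTS  = "-p 4124 -f 5 -t 50 -i 60" ]'''
    new = '''IM_MAD = [
      NAME       = "monitord",
      EXECUTABLE = "onemonitord",
      ARGUMENTS  = "-c monitord.conf",
      THREADS    = 8 ]'''
    print(has_outdated_collectd(old))  # True
    print(has_outdated_collectd(new))  # False
```

On a real front-end it could be run against the file directly (oned.conf usually lives at /etc/one/oned.conf). Being a regex check, it is only a quick hint, not a substitute for reading the file.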
