Hello,
I recently upgraded ONE from 5.10 to 6.0 and moved the whole ONE installation to a separate virtual machine.
Everything had been working smoothly until today’s restart, when OpenNebula refused to start without giving any useful information. The last two lines below are all that the logs contain about the problem:
Sun Feb 13 22:42:02 2022 [Z0][IPM][I]: Starting IPAM Manager...
Sun Feb 13 22:42:02 2022 [Z0][Lis][I]: IPAM Manager started.
Sun Feb 13 22:42:05 2022 [Z0][InM][I]: Starting Information Manager...
Sun Feb 13 22:42:05 2022 [Z0][DrM][E]: Unable to start driver 'monitord': Driver initialization failed
Sun Feb 13 22:42:05 2022 [Z0][InM][E]: Error starting Information Manager: Driver initialization failed
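In case it helps anyone scanning a longer oned.log for the same thing, here is a minimal sketch that picks out only the error-level entries. It assumes the line layout seen in the excerpt above (timestamp, then zone, module and severity in square brackets, then the message); the names `LOG_LINE` and `error_lines` are my own and not part of OpenNebula.

```python
import re

# Pattern inferred from the log excerpt above (an assumption, not an official format):
#   "Sun Feb 13 22:42:05 2022 [Z0][DrM][E]: message"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[(?P<zone>[^\]]+)\]\[(?P<module>[^\]]+)\]\[(?P<level>[A-Z])\]: "
    r"(?P<message>.*)$"
)

# Sample lines copied from the post.
SAMPLE = """\
Sun Feb 13 22:42:02 2022 [Z0][IPM][I]: Starting IPAM Manager...
Sun Feb 13 22:42:02 2022 [Z0][Lis][I]: IPAM Manager started.
Sun Feb 13 22:42:05 2022 [Z0][InM][I]: Starting Information Manager...
Sun Feb 13 22:42:05 2022 [Z0][DrM][E]: Unable to start driver 'monitord': Driver initialization failed
Sun Feb 13 22:42:05 2022 [Z0][InM][E]: Error starting Information Manager: Driver initialization failed
"""


def error_lines(lines):
    """Return one dict per line whose severity field is 'E'; skip unparsable lines."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E":
            errors.append(match.groupdict())
    return errors


if __name__ == "__main__":
    for entry in error_lines(SAMPLE.splitlines()):
        print(entry["module"], "-", entry["message"])
```

To run it against a real log, pass an open file object instead of `SAMPLE.splitlines()`; the path of the log file depends on the installation.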
When I tried to run ONE with oned -f under the oneadmin user, I got a segfault shortly after those lines appeared in the log.
After a while and many attempts to dig deeper, I found out that the problematic component is onemonitord, which segfaults on start.
Here are the last lines from strace /usr/lib/one/mads/onemonitord --config /etc/one/monitord.conf --oned-config /etc/one/oned.conf:
getrandom("\x3e", 1, GRND_NONBLOCK) = 1
stat("/etc/gnutls/default-priorities", 0x7ffc957b0240) = -1 ENOENT (No such file or directory)
futex(0x7f621d87007c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f621d870088, FUTEX_WAKE_PRIVATE, 2147483647) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
[1] 15215 segmentation fault (core dumped) strace /usr/lib/one/mads/onemonitord --config /etc/one/monitord.conf
And here is the output from coredumpctl, probably not very useful without the debug libraries:
❯ coredumpctl info :(
PID: 15450 (onemonitord)
UID: 9869 (oneadmin)
GID: 9869 (oneadmin)
Signal: 11 (SEGV)
Timestamp: Sun 2022-02-13 23:38:32 CET (4s ago)
Command Line: /usr/lib/one/mads/onemonitord --config /etc/one/monitord.conf --oned-config /etc/one/oned.conf
Executable: /usr/lib/one/mads/onemonitord
Control Group: /system.slice/ssh.service
Unit: ssh.service
Slice: system.slice
Boot ID: cade14f3afe640dfa31ffca7d45344b1
Machine ID: 70c7672c5ce241f483e52d720c3f6158
Hostname: urc-a
Storage: /var/lib/systemd/coredump/core.onemonitord.9869.cade14f3afe640dfa31ffca7d45344b1.15450.1644791912000000.lz4
Message: Process 15450 (onemonitord) of user 9869 dumped core.
Stack trace of thread 15450:
#0 0x00007f544f6e0206 n/a (libc.so.6)
#1 0x00007f544f941e44 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignEPKc (libstdc++.so.6)
#2 0x0000557ead4dba68 n/a (onemonitord)
#3 0x00007f544f66c09b __libc_start_main (libc.so.6)
#4 0x0000557ead4de80a n/a (onemonitord)
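The only named frame in that trace, #1, is a mangled C++ symbol. A small sketch for decoding it follows, assuming the `c++filt` tool (shipped with binutils) is available on the machine; the helper name `demangle` is my own, and it returns the symbol unchanged if the tool is missing.

```python
import shutil
import subprocess

# Mangled name copied from frame #1 of the stack trace above.
SYMBOL = "_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignEPKc"


def demangle(symbol: str) -> str:
    """Demangle a C++ symbol with c++filt; return it unchanged if c++filt is absent."""
    tool = shutil.which("c++filt")
    if tool is None:
        return symbol
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(demangle(SYMBOL))
```

The symbol decodes to the `assign(char const*)` member of `std::string`, so the crash happens while a C string is being assigned to a `std::string`. Together with si_addr=NULL in the strace output, a null pointer being passed there is one possible explanation, but that is only a guess without debug symbols.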
After this discovery, I commented out the IM_MAD lines for monitord in oned.conf, and ONE now at least starts, but without monitord of course.
Any ideas are welcome; I’m kind of stuck and don’t know what else to try. I even tried reinstalling all the OpenNebula packages from the distro, but it made no difference.
Thanks