I’ve set up OpenNebula on a VMware virtual machine and am trying to connect it to two physical hosts (ph01 and ph02), each of which has a 2 TB LUN managed via Gluster replication.
So I’m trying to get this opennebula01 VMware server (where I have opennebula and opennebula-sunstone running) to connect to and deploy VMs to the two physical hosts. The setup seemed to go fine, and the FEDERATION mode selected by the installation is STANDALONE. I managed to add a network, define templates, and add the Gluster volume, which OpenNebula recognized as 2 TB, and then started defining a CDROM ISO image to attach to the new OpenNebula guest VM I had defined.
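For reference, the Gluster datastore was registered along these lines (a sketch following the GlusterFS datastore setup in the OpenNebula 5.4 docs; the host and volume names are placeholders, not necessarily my exact values):

[oneadmin@opennebula01 ~]$ cat gluster-ds.conf
# fs datastore shared over Gluster; DISK_TYPE=GLUSTER makes the hosts
# access images via libgfapi (gluster://) instead of a plain mount
NAME = gluster_ds
DS_MAD = fs
TM_MAD = shared
DISK_TYPE = GLUSTER
GLUSTER_HOST = ph01:24007
GLUSTER_VOLUME = gv0
[oneadmin@opennebula01 ~]$ onedatastore create gluster-ds.conf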
However, the OpenNebula guest VM (one-centos7-vm01) keeps ending up in states like POWEROFF or HOTPLUG_PROLOG_POWEROFF. Up to this point the OpenNebula GUI still works and the XML-RPC oned service is still visible in netstat -pnlt. But as soon as I try to recover the one-centos7-vm01 guest VM from one of those states by selecting it and retrying the previous operation, the UI stops working, displays ‘Connection refused - connect(2)’, and the opennebula service crashes with the message below from systemd on this opennebula01 CentOS 7 VM:
[root@opennebula01 one]# systemctl status opennebula
● opennebula.service - OpenNebula Cloud Controller Daemon
Loaded: loaded (/usr/lib/systemd/system/opennebula.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2018-03-25 15:55:40 EDT; 11min ago
Process: 11197 ExecStopPost=/usr/share/one/follower_cleanup (code=exited, status=0/SUCCESS)
Process: 11195 ExecStopPost=/bin/rm -f /var/lock/one/one (code=exited, status=0/SUCCESS)
Process: 11189 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=1/FAILURE)
Process: 10479 ExecStart=/usr/bin/oned -f (code=killed, signal=SEGV)
Process: 10473 ExecStartPre=/usr/sbin/logrotate -s /tmp/logrotate.state -f /etc/logrotate.d/opennebula (code=exited, status=0/SUCCESS)
Process: 10470 ExecStartPre=/bin/chown oneadmin:oneadmin /var/log/one (code=exited, status=0/SUCCESS)
Process: 10468 ExecStartPre=/bin/mkdir -p /var/log/one (code=exited, status=0/SUCCESS)
Main PID: 10479 (code=killed, signal=SEGV)
Mar 25 15:53:26 opennebula01.nix.my.dom systemd[1]: Starting OpenNebula Cloud Controller Daemon...
Mar 25 15:53:26 opennebula01.nix.my.dom systemd[1]: Started OpenNebula Cloud Controller Daemon.
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: opennebula.service: main process exited, code=killed, status=11/SEGV
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: opennebula.service: control process exited, code=exited status=1
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: Stopped OpenNebula Cloud Controller Daemon.
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: Unit opennebula.service entered failed state.
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: opennebula.service failed.
[root@opennebula01 one]#
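For what it’s worth, the same recovery can also be triggered from the CLI, which at least takes Sunstone out of the picture when reproducing the crash (a sketch, assuming VM ID 1 as in the logs below):

[oneadmin@opennebula01 ~]$ onevm show 1 | grep -E 'STATE'   # confirm the VM really is in POWEROFF
[oneadmin@opennebula01 ~]$ onevm recover 1 --retry          # retry the last failed operation, same as the Sunstone button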
Here are the corresponding logs from the same time window:
==> oned.log <==
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:3120 UID:0 one.vm.info invoked , 1
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:3120 UID:0 one.vm.info result SUCCESS, "<VM><ID>1</ID><UID>0..."
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:2352 UID:0 one.vm.action invoked , "resume", 1
Sun Mar 25 15:55:40 2018 [Z0][DiM][D]: Resuming VM 1
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:2352 UID:0 one.vm.action result SUCCESS, 1
==> sunstone.log <==
Sun Mar 25 15:55:40 2018 [I]: 192.168.0.101 - - [25/Mar/2018:15:55:40 -0400] "POST /vm/1/action HTTP/1.1" 204 - 0.0425
==> 1.log <==
Sun Mar 25 15:55:40 2018 [Z0][VM][I]: New state is ACTIVE
Sun Mar 25 15:55:40 2018 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
==> oned.log <==
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:7776 UID:0 one.vm.info invoked , 1
==> 1.log <==
Sun Mar 25 15:55:40 2018 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1/deployment.4
==> sunstone.log <==
Sun Mar 25 15:55:40 2018 [I]: 192.168.0.101 - - [25/Mar/2018:15:55:40 -0400] "GET /vm/1?id=1&csrftoken=70a7513358df9e8b1ea4541452441151 HTTP/1.1" 404 - 0.0169
==> sched.log <==
Sun Mar 25 15:55:40 2018 [Z0][SCHED][I]: Stopping the scheduler...
Sun Mar 25 15:55:40 2018 [Z0][SCHED][I]: Scheduler loop stopped.
Is there any way to find out more about why the oned daemon dies when trying to recover an OpenNebula guest VM? My OpenNebula version is below:
[root@opennebula01 one]# rpm -aq|grep -Ei opennebula
opennebula-ruby-5.4.6-1.x86_64
opennebula-sunstone-5.4.6-1.x86_64
opennebula-common-5.4.6-1.x86_64
opennebula-server-5.4.6-1.x86_64
opennebula-flow-5.4.6-1.x86_64
opennebula-5.4.6-1.x86_64
opennebula-gate-5.4.6-1.x86_64
[root@opennebula01 one]#
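One way I can think of to dig deeper (a sketch, assuming the stock unit file; the core path and PID below are illustrative): let oned dump core and pull a backtrace out of it with gdb:

[root@opennebula01 one]# mkdir -p /etc/systemd/system/opennebula.service.d
[root@opennebula01 one]# printf '[Service]\nLimitCORE=infinity\n' > /etc/systemd/system/opennebula.service.d/core.conf
[root@opennebula01 one]# echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
[root@opennebula01 one]# systemctl daemon-reload && systemctl restart opennebula
(reproduce the crash, then:)
[root@opennebula01 one]# gdb /usr/bin/oned /var/tmp/core.oned.10479
(gdb) bt

With debug symbols installed (debuginfo-install opennebula-server, if the repos carry them), the backtrace should show which call in libstdc++ is faulting.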
It’s worth noting that I did see the guest VM deploy successfully to the ph02 machine without any disks, and I could ping the IPs it had at the time. That lasted until I tried to attach a CDROM with an ISO image to it so we could actually install an OS.
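For reference, the CLI equivalent of what I did in Sunstone would be roughly this (a sketch; the image name, ISO path, and datastore are placeholders):

[oneadmin@opennebula01 ~]$ oneimage create --name centos7-install --type CDROM --path /var/tmp/CentOS-7-x86_64-Minimal.iso --datastore default
[oneadmin@opennebula01 ~]$ onevm disk-attach 1 --image centos7-install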
EDIT:
I also see this segmentation fault in the /var/log/messages log file (note that it is oned crashing inside libstdc++; the kill usage text further down looks like just ExecStop firing with an empty $MAINPID after the main process had already died):
Mar 25 15:53:26 opennebula01 systemd: Started OpenNebula Cloud Scheduler Daemon.
Mar 25 15:53:26 opennebula01 systemd: Starting OpenNebula Cloud Scheduler Daemon...
Mar 25 15:54:59 opennebula01 ruby: Thin web server (v1.7.0 codename Dunder Mifflin)
Mar 25 15:54:59 opennebula01 ruby: Maximum connections set to 1024
Mar 25 15:54:59 opennebula01 ruby: Listening on 0.0.0.0:9869, CTRL+C to stop
Mar 25 15:55:40 opennebula01 kernel: oned[10515]: segfault at fffffffffffffff8 ip 00007fbf2213d620 sp 00007fbf16ffafd8 error 5 in libstdc++.so.6.0.19[7fbf220a8000+e9000]
Mar 25 15:55:40 opennebula01 systemd: opennebula.service: main process exited, code=killed, status=11/SEGV
Mar 25 15:55:40 opennebula01 systemd: Stopping OpenNebula Cloud Scheduler Daemon...
Mar 25 15:55:40 opennebula01 systemd: opennebula.service: control process exited, code=exited status=1
Mar 25 15:55:40 opennebula01 kill: Usage:
Mar 25 15:55:40 opennebula01 kill: kill [options] <pid|name> [...]
Mar 25 15:55:40 opennebula01 kill: Options:
Mar 25 15:55:40 opennebula01 kill: -a, --all do not restrict the name-to-pid conversion to processes
Mar 25 15:55:40 opennebula01 kill: with the same uid as the present process
Mar 25 15:55:40 opennebula01 kill: -s, --signal <sig> send specified signal
Mar 25 15:55:40 opennebula01 kill: -q, --queue <sig> use sigqueue(2) rather than kill(2)
Mar 25 15:55:40 opennebula01 kill: -p, --pid print pids without signaling them
Mar 25 15:55:40 opennebula01 kill: -l, --list [=<signal>] list signal names, or convert one to a name
Mar 25 15:55:40 opennebula01 kill: -L, --table list signal names and numbers
Mar 25 15:55:40 opennebula01 kill: -h, --help display this help and exit
Mar 25 15:55:40 opennebula01 kill: -V, --version output version information and exit
Mar 25 15:55:40 opennebula01 kill: For more details see kill(1).
Mar 25 15:55:40 opennebula01 systemd: Stopped OpenNebula Cloud Scheduler Daemon.
Mar 25 15:55:40 opennebula01 systemd: Stopped OpenNebula Cloud Controller Daemon.
Mar 25 15:55:40 opennebula01 systemd: Unit opennebula.service entered failed state.
Mar 25 15:55:40 opennebula01 systemd: opennebula.service failed.
Mar 25 15:58:28 opennebula01 su: (to oneadmin) tom@my.dom on pts/3
Mar 25 16:01:01 opennebula01 systemd: Started Session 30 of user root.
Mar 25 16:01:01 opennebula01 systemd: Starting Session 30 of user root.
Cheers,
Tom K.