OpenNebula Setup: One VM + 2 Physicals: Program terminated with signal 11, Segmentation fault

I’ve set up OpenNebula on a VMware virtual machine and am trying to connect it to two physical hosts (ph01 and ph02), each of which has a 2TB LUN managed via Gluster replication.

So I’m trying to get this opennebula01 VMware server (where I have opennebula and opennebula-sunstone running) to connect to and deploy VMs to the two physicals. The setup seemed to go fine, and the FEDERATION mode selected by the installation is STANDALONE. I managed to add a network, define templates, and add the Gluster volume, which it recognized as 2TB. I then started defining a CDROM ISO image to attach to the new OpenNebula guest VM that I had defined.
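For context, a Gluster-backed images datastore in this release is typically defined with a template roughly like the following (host and volume names here are placeholders, not my exact values; DISK_TYPE = GLUSTER is what makes the drivers generate gluster network disks in the libvirt XML):

```ini
NAME           = gluster_ds
DS_MAD         = fs
TM_MAD         = shared
DISK_TYPE      = GLUSTER
GLUSTER_HOST   = ph01:24007
GLUSTER_VOLUME = one_vol
```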

However, the OpenNebula guest VM (one-centos7-vm01) goes into various states like POWEROFF or HOTPLUG_PROLOG_POWEROFF. Up to this point the OpenNebula GUI still works, and the XML-RPC oned service is still visible via netstat -pnlt. But as soon as I try to recover the one-centos7-vm01 guest VM from the above states by selecting it to retry the previous operation, the UI stops working and displays ‘Connection refused - connect(2)’, and the opennebula service crashes with the message below from the systemd daemon on this opennebula01 CentOS 7 VM:

[root@opennebula01 one]# systemctl status opennebula
● opennebula.service - OpenNebula Cloud Controller Daemon
   Loaded: loaded (/usr/lib/systemd/system/opennebula.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2018-03-25 15:55:40 EDT; 11min ago
  Process: 11197 ExecStopPost=/usr/share/one/follower_cleanup (code=exited, status=0/SUCCESS)
  Process: 11195 ExecStopPost=/bin/rm -f /var/lock/one/one (code=exited, status=0/SUCCESS)
  Process: 11189 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=1/FAILURE)
  Process: 10479 ExecStart=/usr/bin/oned -f (code=killed, signal=SEGV)
  Process: 10473 ExecStartPre=/usr/sbin/logrotate -s /tmp/logrotate.state -f /etc/logrotate.d/opennebula (code=exited, status=0/SUCCESS)
  Process: 10470 ExecStartPre=/bin/chown oneadmin:oneadmin /var/log/one (code=exited, status=0/SUCCESS)
  Process: 10468 ExecStartPre=/bin/mkdir -p /var/log/one (code=exited, status=0/SUCCESS)
 Main PID: 10479 (code=killed, signal=SEGV)

Mar 25 15:53:26 opennebula01.nix.my.dom systemd[1]: Starting OpenNebula Cloud Controller Daemon...
Mar 25 15:53:26 opennebula01.nix.my.dom systemd[1]: Started OpenNebula Cloud Controller Daemon.
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: opennebula.service: main process exited, code=killed, status=11/SEGV
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: opennebula.service: control process exited, code=exited status=1
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: Stopped OpenNebula Cloud Controller Daemon.
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: Unit opennebula.service entered failed state.
Mar 25 15:55:40 opennebula01.nix.my.dom systemd[1]: opennebula.service failed.
[root@opennebula01 one]#

Here are the corresponding logs for the same time as above:

==> oned.log <==
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:3120 UID:0 one.vm.info invoked , 1
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:3120 UID:0 one.vm.info result SUCCESS, "<VM><ID>1</ID><UID>0..."
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:2352 UID:0 one.vm.action invoked , "resume", 1
Sun Mar 25 15:55:40 2018 [Z0][DiM][D]: Resuming VM 1
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:2352 UID:0 one.vm.action result SUCCESS, 1

==> sunstone.log <==
Sun Mar 25 15:55:40 2018 [I]: 192.168.0.101 - - [25/Mar/2018:15:55:40 -0400] "POST /vm/1/action HTTP/1.1" 204 - 0.0425

==> 1.log <==
Sun Mar 25 15:55:40 2018 [Z0][VM][I]: New state is ACTIVE
Sun Mar 25 15:55:40 2018 [Z0][VM][I]: New LCM state is BOOT_POWEROFF

==> oned.log <==
Sun Mar 25 15:55:40 2018 [Z0][ReM][D]: Req:7776 UID:0 one.vm.info invoked , 1

==> 1.log <==
Sun Mar 25 15:55:40 2018 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1/deployment.4

==> sunstone.log <==
Sun Mar 25 15:55:40 2018 [I]: 192.168.0.101 - - [25/Mar/2018:15:55:40 -0400] "GET /vm/1?id=1&csrftoken=70a7513358df9e8b1ea4541452441151 HTTP/1.1" 404 - 0.0169

==> sched.log <==
Sun Mar 25 15:55:40 2018 [Z0][SCHED][I]: Stopping the scheduler...
Sun Mar 25 15:55:40 2018 [Z0][SCHED][I]: Scheduler loop stopped.

Is there any way to find out more about why the oned daemon stops when trying to recover an OpenNebula guest VM? My version of OpenNebula is below:

[root@opennebula01 one]# rpm -aq|grep -Ei opennebula
opennebula-ruby-5.4.6-1.x86_64
opennebula-sunstone-5.4.6-1.x86_64
opennebula-common-5.4.6-1.x86_64
opennebula-server-5.4.6-1.x86_64
opennebula-flow-5.4.6-1.x86_64
opennebula-5.4.6-1.x86_64
opennebula-gate-5.4.6-1.x86_64
[root@opennebula01 one]#
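One way to get more detail is to let oned leave a core file behind on the next crash. A systemd drop-in along these lines (a sketch; unit name and paths assume the stock CentOS 7 OpenNebula packages) raises the core limit for the service:

```ini
# /etc/systemd/system/opennebula.service.d/coredump.conf
# Apply with: systemctl daemon-reload && systemctl restart opennebula
[Service]
LimitCORE=infinity
```

With kernel.core_pattern pointing somewhere writable, the resulting core can then be opened in gdb against /usr/bin/oned, ideally with the debuginfo packages installed.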

It’s worth noting that I did see the guest VM deploy successfully to the ph02 machine without any disks, and I could ping the IPs it had at the time. That held until I tried to attach a CDROM with an ISO image so we could actually install an OS on it.

EDIT:

I see this segmentation fault in the /var/log/messages log file as well (logged against oned, amid the Ruby/Thin startup messages):

Mar 25 15:53:26 opennebula01 systemd: Started OpenNebula Cloud Scheduler Daemon.
Mar 25 15:53:26 opennebula01 systemd: Starting OpenNebula Cloud Scheduler Daemon...
Mar 25 15:54:59 opennebula01 ruby: Thin web server (v1.7.0 codename Dunder Mifflin)
Mar 25 15:54:59 opennebula01 ruby: Maximum connections set to 1024
Mar 25 15:54:59 opennebula01 ruby: Listening on 0.0.0.0:9869, CTRL+C to stop
Mar 25 15:55:40 opennebula01 kernel: oned[10515]: segfault at fffffffffffffff8 ip 00007fbf2213d620 sp 00007fbf16ffafd8 error 5 in libstdc++.so.6.0.19[7fbf220a8000+e9000]
Mar 25 15:55:40 opennebula01 systemd: opennebula.service: main process exited, code=killed, status=11/SEGV
Mar 25 15:55:40 opennebula01 systemd: Stopping OpenNebula Cloud Scheduler Daemon...
Mar 25 15:55:40 opennebula01 systemd: opennebula.service: control process exited, code=exited status=1
Mar 25 15:55:40 opennebula01 kill: Usage:
Mar 25 15:55:40 opennebula01 kill: kill [options] <pid|name> [...]
Mar 25 15:55:40 opennebula01 kill: Options:
Mar 25 15:55:40 opennebula01 kill: -a, --all              do not restrict the name-to-pid conversion to processes
Mar 25 15:55:40 opennebula01 kill: with the same uid as the present process
Mar 25 15:55:40 opennebula01 kill: -s, --signal <sig>     send specified signal
Mar 25 15:55:40 opennebula01 kill: -q, --queue <sig>      use sigqueue(2) rather than kill(2)
Mar 25 15:55:40 opennebula01 kill: -p, --pid              print pids without signaling them
Mar 25 15:55:40 opennebula01 kill: -l, --list [=<signal>] list signal names, or convert one to a name
Mar 25 15:55:40 opennebula01 kill: -L, --table            list signal names and numbers
Mar 25 15:55:40 opennebula01 kill: -h, --help     display this help and exit
Mar 25 15:55:40 opennebula01 kill: -V, --version  output version information and exit
Mar 25 15:55:40 opennebula01 kill: For more details see kill(1).
Mar 25 15:55:40 opennebula01 systemd: Stopped OpenNebula Cloud Scheduler Daemon.
Mar 25 15:55:40 opennebula01 systemd: Stopped OpenNebula Cloud Controller Daemon.
Mar 25 15:55:40 opennebula01 systemd: Unit opennebula.service entered failed state.
Mar 25 15:55:40 opennebula01 systemd: opennebula.service failed.
Mar 25 15:58:28 opennebula01 su: (to oneadmin) tom@my.dom on pts/3
Mar 25 16:01:01 opennebula01 systemd: Started Session 30 of user root.
Mar 25 16:01:01 opennebula01 systemd: Starting Session 30 of user root.

Cheers,
Tom K.

The ruby packages are as follows:

[root@opennebula01 one]# /usr/share/one/install_gems
lsb_release command not found. If you are using a RedHat based
distribution install redhat-lsb

Select your distribution or press enter to continue without
installing dependencies.

0. Ubuntu/Debian
1. CentOS/RedHat/Scientific

1
Distribution "redhat" detected.
About to install these dependencies:
* gcc
* rubygem-rake
* libxml2-devel
* libxslt-devel
* patch
* gcc-c++
* sqlite-devel
* curl-devel
* mysql-devel
* openssl-devel
* ruby-devel
* make

Press enter to continue...

yum install gcc rubygem-rake libxml2-devel libxslt-devel patch gcc-c++ sqlite-devel curl-devel mysql-devel openssl-devel ruby-devel make
Loaded plugins: fastestmirror
base                                                                                                                                         | 3.6 kB  00:00:00
centos-gluster313                                                                                                                            | 2.9 kB  00:00:00
epel/x86_64/metalink                                                                                                                         |  17 kB  00:00:00
extras                                                                                                                                       | 3.4 kB  00:00:00
opennebula                                                                                                                                   | 2.9 kB  00:00:00
updates                                                                                                                                      | 3.4 kB  00:00:00
vmware-tools                                                                                                                                 |  951 B  00:00:00
Loading mirror speeds from cached hostfile
 * base: mirror.gpmidi.net
 * epel: ftp.cse.buffalo.edu
 * extras: mirror.gpmidi.net
 * updates: mirror.gpmidi.net
Package gcc-4.8.5-16.el7_4.2.x86_64 already installed and latest version
Package rubygem-rake-0.9.6-33.el7_4.noarch already installed and latest version
Package libxml2-devel-2.9.1-6.el7_2.3.x86_64 already installed and latest version
Package libxslt-devel-1.1.28-5.el7.x86_64 already installed and latest version
Package patch-2.7.1-8.el7.x86_64 already installed and latest version
Package gcc-c++-4.8.5-16.el7_4.2.x86_64 already installed and latest version
Package sqlite-devel-3.7.17-8.el7.x86_64 already installed and latest version
Package libcurl-devel-7.29.0-42.el7_4.1.x86_64 already installed and latest version
Package 1:mariadb-devel-5.5.56-2.el7.x86_64 already installed and latest version
Package 1:openssl-devel-1.0.2k-8.el7.x86_64 already installed and latest version
Package ruby-devel-2.0.0.648-33.el7_4.x86_64 already installed and latest version
Package 1:make-3.82-23.el7.x86_64 already installed and latest version
Nothing to do
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and installing your bundle as root will break this application for all non-root users on this
machine.
Using addressable 2.4.0
Using xml-simple 1.1.5
Using amazon-ec2 0.9.17
Using jmespath 1.3.1
Using aws-sdk-core 2.5.10
Using aws-sdk-resources 2.5.10
Using aws-sdk 2.5.10
Using multipart-post 2.0.0
Using faraday 0.9.2
Using faraday_middleware 0.10.0
Using nokogiri 1.6.1
Using azure-core 0.1.4
Using json 1.8.3
Using mime-types 2.99.2
Using systemu 2.6.5
Using thor 0.19.1
Using azure 0.7.6
Using builder 3.2.2
Using bundler 1.16.1
Using configparser 0.1.4
Using curb 0.9.3
Using daemons 1.2.4
Using eventmachine 1.2.0.1
Using hashie 3.4.4
Using inflection 1.0.0
Using memcache-client 1.8.5
Using mysql2 0.5.0
Using net-ldap 0.12.1
Using ox 2.4.4
Using parse-cron 0.1.4
Using polyglot 0.3.5
Using rack 1.6.4
Using rack-protection 1.5.3
Using scrub_rb 1.0.1
Using sequel 4.38.0
Using tilt 2.0.5
Using sinatra 1.4.7
Using sqlite3 1.3.11
Using thin 1.7.0
Using treetop 1.6.8
Using trollop 2.1.2
Using uuidtools 2.1.5
Using zendesk_api 1.13.4
Bundle complete! 21 Gemfile dependencies, 43 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
[root@opennebula01 one]#

OS:
[oneadmin@opennebula01 ~]$ cat /etc/*release*
CentOS Linux release 7.4.1708 (Core)
Derived from Red Hat Enterprise Linux 7.4 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.4.1708 (Core)
CentOS Linux release 7.4.1708 (Core)
cpe:/o:centos:centos:7
[oneadmin@opennebula01 ~]$

I see that another user has the same issue, and I can keep reproducing it:

The second thread, by someone else, is:

Cheers,
Tom

Attaching two core dumps (opennebula-core-files.zip, 1.9 MB) and before/after systemctl status printouts:

[root@opennebula01 ~]# systemctl status opennebula

● opennebula.service - OpenNebula Cloud Controller Daemon
   Loaded: loaded (/usr/lib/systemd/system/opennebula.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-03-25 18:00:18 EDT; 7s ago
  Process: 12090 ExecStopPost=/usr/share/one/follower_cleanup (code=exited, status=0/SUCCESS)
  Process: 12088 ExecStopPost=/bin/rm -f /var/lock/one/one (code=exited, status=0/SUCCESS)
  Process: 12082 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=1/FAILURE)
  Process: 12108 ExecStartPre=/usr/sbin/logrotate -s /tmp/logrotate.state -f /etc/logrotate.d/opennebula (code=exited, status=0/SUCCESS)
  Process: 12105 ExecStartPre=/bin/chown oneadmin:oneadmin /var/log/one (code=exited, status=0/SUCCESS)
  Process: 12103 ExecStartPre=/bin/mkdir -p /var/log/one (code=exited, status=0/SUCCESS)
 Main PID: 12113 (oned)
   CGroup: /system.slice/opennebula.service
           ├─12113 /usr/bin/oned -f
           ├─12130 ruby /usr/lib/one/mads/one_hm.rb
           ├─12180 ruby /usr/lib/one/mads/one_vmm_exec.rb -t 15 -r 0 kvm
           ├─12197 ruby /usr/lib/one/mads/one_vmm_exec.rb -l deploy,shutdown,reboot,cancel,save,restore,migrate,poll,pre,post,cl...
           ├─12214 /usr/lib/one/mads/collectd -p 4124 -f 5 -t 50 -i 20
           ├─12266 ruby /usr/lib/one/mads/one_im_exec.rb -r 3 -t 15 kvm
           ├─12279 ruby /usr/lib/one/mads/one_im_exec.rb -l -c -t 15 -r 0 vcenter
           ├─12292 ruby /usr/lib/one/mads/one_tm.rb -t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,ceph,dev,vcenter,iscsi_libvirt
           ├─12312 ruby /usr/lib/one/mads/one_datastore.rb -t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter -s shared,ssh,ce...
           ├─12328 ruby /usr/lib/one/mads/one_market.rb -t 15 -m http,s3,one
           ├─12344 ruby /usr/lib/one/mads/one_ipam.rb -t 1 -i dummy
           └─12356 ruby /usr/lib/one/mads/one_auth_mad.rb --authn ssh,x509,ldap,server_cipher,server_x509

Mar 25 18:00:18 opennebula01.nix.my.dom systemd[1]: Starting OpenNebula Cloud Controller Daemon...
Mar 25 18:00:18 opennebula01.nix.my.dom systemd[1]: Started OpenNebula Cloud Controller Daemon.
[root@opennebula01 ~]#
[root@opennebula01 ~]#
[root@opennebula01 ~]#
[root@opennebula01 ~]# systemctl status opennebula
● opennebula.service - OpenNebula Cloud Controller Daemon
   Loaded: loaded (/usr/lib/systemd/system/opennebula.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2018-03-25 18:00:46 EDT; 9s ago
  Process: 12566 ExecStopPost=/usr/share/one/follower_cleanup (code=exited, status=0/SUCCESS)
  Process: 12564 ExecStopPost=/bin/rm -f /var/lock/one/one (code=exited, status=0/SUCCESS)
  Process: 12560 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=1/FAILURE)
  Process: 12113 ExecStart=/usr/bin/oned -f (code=killed, signal=SEGV)
  Process: 12108 ExecStartPre=/usr/sbin/logrotate -s /tmp/logrotate.state -f /etc/logrotate.d/opennebula (code=exited, status=0/SUCCESS)
  Process: 12105 ExecStartPre=/bin/chown oneadmin:oneadmin /var/log/one (code=exited, status=0/SUCCESS)
  Process: 12103 ExecStartPre=/bin/mkdir -p /var/log/one (code=exited, status=0/SUCCESS)
 Main PID: 12113 (code=killed, signal=SEGV)

Mar 25 18:00:18 opennebula01.nix.my.dom systemd[1]: Starting OpenNebula Cloud Controller Daemon...
Mar 25 18:00:18 opennebula01.nix.my.dom systemd[1]: Started OpenNebula Cloud Controller Daemon.
Mar 25 18:00:45 opennebula01.nix.my.dom systemd[1]: opennebula.service: main process exited, code=killed, status=11/SEGV
Mar 25 18:00:45 opennebula01.nix.my.dom systemd[1]: opennebula.service: control process exited, code=exited status=1
Mar 25 18:00:46 opennebula01.nix.my.dom systemd[1]: Stopped OpenNebula Cloud Controller Daemon.
Mar 25 18:00:46 opennebula01.nix.my.dom systemd[1]: Unit opennebula.service entered failed state.
Mar 25 18:00:46 opennebula01.nix.my.dom systemd[1]: opennebula.service failed.
[root@opennebula01 ~]#
[root@opennebula01 ~]#
[root@opennebula01 ~]#
[root@opennebula01 ~]#

Breakdown of one of the core dumps:

[root@opennebula01 one]# gdb oned-crashed core.11623
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7_4.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
oned-crashed: No such file or directory.
[New LWP 11660]
[New LWP 11664]
[New LWP 11663]
[New LWP 11661]
[New LWP 11674]
[New LWP 11669]
[New LWP 11658]
[New LWP 11657]
[New LWP 11673]
[New LWP 11639]
[New LWP 11671]
[New LWP 11887]
[New LWP 11670]
[New LWP 11886]
[New LWP 11638]
[New LWP 11672]
[New LWP 11623]
[New LWP 11667]
[New LWP 11659]
[New LWP 11666]
[New LWP 11668]
[New LWP 11662]
[New LWP 11665]
Reading symbols from /usr/bin/oned...Reading symbols from /usr/lib/debug/usr/bin/oned.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/oned -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fb46ad03620 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib64/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-21.el7.x86_64 glibc-2.17-196.el7_4.2.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libcurl-7.29.0-42.el7_4.1.x86_64 libgcc-4.8.5-16.el7_4.2.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-11.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-16.el7_4.2.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 mariadb-libs-5.5.56-2.el7.x86_64 nspr-4.13.1-1.0.el7_3.x86_64 nss-3.28.4-15.el7_4.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 nss-util-3.28.4-3.el7.x86_64 openldap-2.4.44-5.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) tr
Tracepoint 1 at 0x7fb46ad03620
(gdb) where
#0  0x00007fb46ad03620 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib64/libstdc++.so.6
#1  0x00000000004494cc in LibVirtDriver::deployment_description_kvm (this=0x1276cc0, vm=0x7fb44c08aa10, file_name="/var/lib/one/vms/1/deployment.4")
    at src/vmm/LibVirtDriverKVM.cc:696
#2  0x000000000043b9ca in LibVirtDriver::deployment_description (this=0x1276cc0, vm=0x7fb44c08aa10, fn="/var/lib/one/vms/1/deployment.4")
    at include/LibVirtDriver.h:52
#3  0x00000000004304f0 in VirtualMachineManager::deploy_action (this=0x127cc30, vid=1) at src/vmm/VirtualMachineManager.cc:389
#4  0x000000000042f82c in VirtualMachineManager::user_action (this=0x127cc30, ar=...) at src/vmm/VirtualMachineManager.cc:113
#5  0x00000000005a9a88 in ActionListener::_do_action (this=0x127ce48, ar=...) at include/ActionManager.h:113
#6  0x00000000005a9990 in ActionManager::loop (this=0x127ce88, _tout=..., trequest=...) at src/common/ActionManager.cc:112
#7  0x000000000043ab63 in ActionManager::loop (this=0x127ce88, timeout=15) at include/ActionManager.h:180
#8  0x000000000042f713 in vmm_action_loop (arg=0x127cc30) at src/vmm/VirtualMachineManager.cc:70
#9  0x00007fb46af7de25 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fb46a48b34d in clone () from /lib64/libc.so.6
(gdb)

file_name from above:

[root@opennebula01 1]# cat deployment.4
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        <name>one-1</name>
        <vcpu><![CDATA[2]]></vcpu>
        <cputune>
                <shares>2048</shares>
        </cputune>
        <memory>4194304</memory>
        <os>
                <type arch='x86_64'>hvm</type>
        </os>
        <devices>
                <emulator><![CDATA[/usr/libexec/qemu-kvm]]></emulator>
                <disk type='network' device='cdrom'>
[root@opennebula01 1]#

I’ve more core files, but the archive is 7MB and there’s a 3MB limit on these posts.

I could send them privately if needed.

Cheers,
Tom K

Hello @TomK,

Does the error persist in newer OpenNebula versions? Could you try it in ONE 5.8.5 (the latest version)?

I wasn’t able to get past this, so I reverted to another hypervisor. However, I do plan to reinstall the latest version of ONE on some new hardware that I plan to get. I’ll let you know then.

The closest I could get to the location of the error at the time was via the gdb output above:

#0  0x00007fb46ad03620 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib64/libstdc++.so.6
#1  0x00000000004494cc in LibVirtDriver::deployment_description_kvm (this=0x1276cc0, vm=0x7fb44c08aa10, file_name="/var/lib/one/vms/1/deployment.4")
    at src/vmm/LibVirtDriverKVM.cc:696

Cheers,
TK

Yes, please let me know if you have any troubles after reinstalling it.

The error has returned in 5.8.5:

[753534.000787] oned[3260]: segfault at 4 ip 00000000005d827e sp 00007fe5e1016630 error 4 in oned[400000+346000]

I’ll try to upgrade to the latest to see if this repeats over time.

[root@one01 one]# tail -f /var/log/messages -n 10000|grep one|grep seg
Nov 26 07:50:40 one01 kernel: oned[3260]: segfault at 4 ip 00000000005d827e sp 00007fe5e1016630 error 4 in oned[400000+346000]
Nov 26 10:37:49 one01 kernel: oned[23347]: segfault at 4 ip 00000000005d827e sp 00007f2ae7ffe630 error 4 in oned[400000+346000]

Synopsis:

This happened after one of the MySQL servers in my MySQL Galera cluster ran out of space because too many mysql_bin logs had been written to /var/lib/mysql. I cleared the space, but had to restart the scheduler alongside the basic restart command before I could log in to the UI again:

one restart
one restart-sched

When attempting to restart before clearing the space, I received this error:

[oneadmin@one01 ~]$ one restart
Could not open connect to database server: Too many connections
oned failed to start
/bin/one: line 117: 20673 Terminated              $ONE_SCHEDULER
[oneadmin@one01 ~]$
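To keep the binlogs from filling /var/lib/mysql again, capping their retention in the MariaDB config should help (a sketch; the 7-day window is an arbitrary choice, and the path is the CentOS 7 mariadb-server default):

```ini
# /etc/my.cnf.d/server.cnf, in the [mysqld] section
expire_logs_days = 7      # auto-purge binary logs older than a week
max_binlog_size  = 100M   # roll log files before they grow unbounded
```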

Cheers,
TK