[one.vm.info] Error getting virtual machine 725

Greetings.
About a month after upgrading from 5.8.5 to 5.12.0.4, a problem occurred. A user tried to expand a disk, then rebooted the VM, and it got stuck in BOOT status in Sunstone.
Here’s the VM body xml:

oned.log:

Fri Jun 25 21:33:21 2021 [Z0][InM][E]: Error parsing VM_STATE: syntax error, unexpected VARIABLE, expecting EQUAL or EQUAL_EMPTY at line 0, columns 10:12
Fri Jun 25 21:33:21 2021 [Z0][InM][E]: Received message was: database is locked
Fri Jun 25 21:33:26 2021 [Z0][InM][E]: Error parsing VM_STATE: syntax error, unexpected VARIABLE, expecting EQUAL or EQUAL_EMPTY at line 0, columns 10:12
Fri Jun 25 21:33:26 2021 [Z0][InM][E]: Received message was: database is locked

…and more like that; only the line/column numbers differ.
If I start the VM with virsh create, the permissions on the VM's disks change:

-rw-rw-r-- 1 oneadmin oneadmin 1,6K june 6 2019 deployment.7
-rw-rw-r-- 1 libvirt-qemu kvm 11G june 25 22:18 disk.0
-rw-r--r-- 1 libvirt-qemu kvm 364K june 25 18:27 disk.1
-rw-rw-r-- 1 libvirt-qemu kvm 100G june 25 22:18 disk.2

There are Russian characters in body.xml. I tried to remove them with onedb update-body, but it made no difference. I also tried changing the VM's STATE and LCM_STATE from 3 and 20 to 0, for example, but Sunstone still shows BOOT.

Hi @smirnovpv ,

It doesn’t seem to be a problem with the VM body. According to the translator, you are getting this message:

Fri Jun 25 18:27:40 2021 : Error deploying virtual machine: ovswitch: sudo: password required

Have you tried configuring the ovswitch commands to run via passwordless sudo for the OpenNebula admin user (usually oneadmin)?

Cheers.

Hi @rdiaz.
I checked: all the config files in /etc/sudoers.d exist and are the same as on the other hosts:

/etc/sudoers.d/opennebula-node:
oneadmin ALL=(ALL:ALL) NOPASSWD: ONE_CEPH, ONE_NET, ONE_OVS, ONE_LVM
/etc/sudoers.d/opennebula:
Defaults:oneadmin !requiretty
Defaults:oneadmin secure_path = /sbin:/bin:/usr/sbin:/usr/bin
Cmnd_Alias ONE_CEPH = /usr/bin/rbd
Cmnd_Alias ONE_FIRECRACKER = /usr/bin/jailer, /bin/mount, /usr/sbin/one-clean-firecracker-domain, /usr/sbin/one-prepare-firecracker-domain
Cmnd_Alias ONE_HA = /bin/systemctl start opennebula-flow, /bin/systemctl stop opennebula-flow, /bin/systemctl start opennebula-gate, /bin/systemctl stop opennebula-gate, /bin/systemctl start opennebula-hem, /bin/systemctl stop opennebula-hem, /bin/systemctl start opennebula-showback.timer, /bin/systemctl stop opennebula-showback.timer, /usr/sbin/service opennebula-flow start, /usr/sbin/service opennebula-flow stop, /usr/sbin/service opennebula-gate start, /usr/sbin/service opennebula-gate stop, /usr/sbin/service opennebula-hem start, /usr/sbin/service opennebula-hem stop, /usr/bin/arping, /sbin/ip address *
Cmnd_Alias ONE_LVM = /sbin/lvcreate, /sbin/lvremove, /sbin/lvs, /sbin/vgdisplay, /sbin/lvchange, /sbin/lvscan, /sbin/lvextend
Cmnd_Alias ONE_LXD = /snap/bin/lxc, /usr/bin/catfstab, /bin/mount, /bin/umount, /bin/mkdir, /bin/lsblk, /sbin/losetup, /sbin/kpartx, /usr/bin/qemu-nbd, /sbin/blkid, /sbin/e2fsck, /sbin/resize2fs, /usr/sbin/xfs_growfs, /usr/bin/rbd-nbd, /usr/sbin/xfs_admin, /sbin/tune2fs
Cmnd_Alias ONE_MARKET = /usr/lib/one/sh/create_container_image.sh, /usr/lib/one/sh/create_docker_image.sh
Cmnd_Alias ONE_NET = /sbin/ebtables, /sbin/iptables, /sbin/ip6tables, /sbin/ipset, /sbin/ip link *, /sbin/ip tuntap *
Cmnd_Alias ONE_OVS = /usr/bin/ovs-ofctl, /usr/bin/ovs-vsctl
## Command aliases are enabled individually in dedicated
## sudoers files by each OpenNebula component (server, node).
# oneadmin ALL=(ALL) NOPASSWD: ONE_CEPH, ONE_FIRECRACKER, ONE_HA, ONE_LVM, ONE_LXD, ONE_MARKET, ONE_NET, ONE_OVS
And /etc/sudoers (via sudo visudo):
Defaults        env_reset
Defaults        mail_badpass
Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
# Host alias specification
# User alias specification
# Cmnd alias specification
# User privilege specification
root    ALL=(ALL:ALL) ALL
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL
# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL
# See sudoers(5) for more information on "#include" directives:
#includedir /etc/sudoers.d

That’s all at defaults; what should I try next?

Please try running onehost sync -f and then onevm recover --retry <VM_ID>. If the VM fails again, please follow these instructions:

  • Set LOG/DEBUG_LEVEL to 5 in /etc/one/oned.conf and restart OpenNebula
  • Run recover --retry <VM_ID> again
  • Share the content of /var/log/one/<VM_ID>.log and the error printed in /var/log/one/oned.log

Cheers.

Hi @rdiaz

onehost sync -f:
All hosts updated successfully.

oneadmin@nola2:/home/netman$ onevm recover --retry 725
[one.vm.recover] Error getting virtual machine [725].

/var/log/one/725.log:
Fri Jun 25 18:27:21 2021 [Z0][LCM][I]: VM disk resize operation completed.
Fri Jun 25 18:27:21 2021 [Z0][VM][I]: New state is POWEROFF
Fri Jun 25 18:27:21 2021 [Z0][VM][I]: New LCM state is LCM_INIT
Fri Jun 25 18:27:38 2021 [Z0][VM][I]: New state is ACTIVE
Fri Jun 25 18:27:38 2021 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Fri Jun 25 18:27:38 2021 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/725/deployment.22
Fri Jun 25 18:27:40 2021 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Jun 25 18:27:40 2021 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vnm/ovswitch/pre
Fri Jun 25 18:27:40 2021 [Z0][VMM][I]: sudo: требуется пароль   (Russian: "a password is required")
Fri Jun 25 18:27:40 2021 [Z0][VMM][E]: pre: Command "sudo -n ovs-vsctl --may-exist add-br ovsbr0" failed.
Fri Jun 25 18:27:40 2021 [Z0][VMM][E]: pre: sudo: требуется пароль
Fri Jun 25 18:27:40 2021 [Z0][VMM][E]: sudo: требуется пароль
Fri Jun 25 18:27:40 2021 [Z0][VMM][E]:
Fri Jun 25 18:27:40 2021 [Z0][VMM][I]: ExitCode: 1
Fri Jun 25 18:27:40 2021 [Z0][VMM][I]: Failed to execute network driver operation: pre.
Fri Jun 25 18:27:40 2021 [Z0][VMM][E]: Error deploying virtual machine: ovswitch: sudo: требуется пароль
Fri Jun 25 18:27:40 2021 [Z0][VM][I]: New state is POWEROFF
Fri Jun 25 18:27:40 2021 [Z0][VM][I]: New LCM state is LCM_INIT
Fri Jun 25 18:28:48 2021 [Z0][VM][I]: New state is ACTIVE
Fri Jun 25 18:28:48 2021 [Z0][VM][I]: New LCM state is BOOT_POWEROFF

It hasn’t changed since June 25, as far as I can see.

/var/log/one/oned.log:
Tue Jun 29 15:59:17 2021 [Z0][ReM][D]: Req:1392 UID:0 IP:127.0.0.1 one.vm.recover invoked , 725, 2
Tue Jun 29 15:59:17 2021 [Z0][ReM][E]: Req:1392 UID:0 one.vm.recover result FAILURE [one.vm.recover] Error getting virtual machine [725].
Tue Jun 29 16:00:29 2021 [Z0][InM][D]: VM_STATE update from host: 7. VM id: 725, state: RUNNING
Tue Jun 29 16:00:29 2021 [Z0][InM][W]: Unable to find VM, id: 725

That RUNNING state appeared on the second attempt of onehost sync -f.

I would suggest changing your LANG to en_US.UTF-8.
Also, it seems you have some problems with passwordless sudo.
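On a Debian/Ubuntu node that could look roughly like this (a sketch; the update-locale tool and the service names are assumptions about your distribution and setup):

```shell
# Sketch: set an English system locale so driver error messages stay ASCII.
# update-locale and the unit names below are Debian/Ubuntu assumptions.
sudo update-locale LANG=en_US.UTF-8
sudo systemctl restart opennebula opennebula-sunstone
```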

@knawnd, it seems so. All I want right now is to restore access to the VMs from Sunstone so I can migrate them to another host and reconfigure the failed one.
I can also start the VMs manually with virsh create.

@rdiaz, I’ve checked one thing: I deployed a new VM to the failed host, and everything was OK until I added Russian characters to the VM’s description - it failed at that exact moment.
So could you please tell me what I should fix in the DB tables to get it working?
There were no such problems in v5.8.5.

What I’ve got now:

+--------------------+----------------------------+
| character_set_name | table_name                 |
+--------------------+----------------------------+
| latin1 | marketplaceapp_pool |
| latin1 | db_versioning |
| latin1 | secgroup_pool |
| latin1 | pool_control |
| latin1 | history |
| latin1 | vrouter_pool |
| latin1 | cluster_vnc_bitmap |
| latin1 | group_pool |
| latin1 | marketplace_pool |
| latin1 | vmgroup_pool |
| latin1 | zone_pool |
| latin1 | user_pool |
| latin1 | network_vlan_bitmap |
| latin1 | old_document_pool |
| latin1 | local_db_versioning |
| latin1 | template_pool |
| latin1 | vm_showback |
| latin1 | vm_import |
| latin1 | vdc_pool |
| utf8 | vn_template_pool |
| utf8 | network_pool |
| utf8 | image_pool |
| utf8mb4 | system_attributes |
| utf8mb4 | hook_log |
| utf8mb4 | group_quotas |
| utf8mb4 | cluster_pool |
| utf8mb4 | cluster_datastore_relation |
| utf8mb4 | hook_pool |
| utf8mb4 | cluster_network_relation |
| utf8mb4 | vm_pool |
| utf8mb4 | vm_monitoring |
| utf8mb4 | user_quotas |
| utf8mb4 | logdb |
| utf8mb4 | acl |
| utf8mb4 | host_pool |
| utf8mb4 | document_pool |
| utf8mb4 | datastore_pool |
| utf8mb4 | host_monitoring |
+--------------------+----------------------------+

So is it possible to change it now?

@rdiaz, I’ve converted all the tables to utf8mb4. The VM still gets an error. It is definitely a bug with Russian letters, but I’ve already cleaned them up with onedb update-body vm --id; what else should I do?
Thanks!
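For reference, the per-table conversion was roughly along these lines (a sketch; the database name, credentials, and collation choice here are assumptions - back up first):

```shell
# Sketch: convert one latin1 table to utf8mb4; repeat for each latin1 table
# in the listing above. DB name, credentials and collation are assumptions.
mysqldump -u oneadmin -p opennebula > opennebula.backup.sql   # back up first
mysql -u oneadmin -p opennebula -e \
  "ALTER TABLE vm_pool CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
```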

That command is failing. Try running sudo -n ovs-vsctl --may-exist add-br ovsbr0 on host 7.

On the other hand, it seems that VM 725 no longer exists in the OpenNebula database (you probably deleted it).

@rdiaz,
I’ve corrected all the problems with sudo on host, new VM can be deployed without problems.

On the other hand, it seems that VM 725 no longer exists in the OpenNebula database (you probably deleted it).

If I run SELECT body FROM vm_pool WHERE oid=725; the body exists in the DB, and onedb show-body vm works too. All the VM’s files in /var/lib/one/datastores/0/725 are OK; I can use virsh create and the VM will start, respond to ping, etc.
I’ve also diffed two VM bodies - the failed one (left) against an alive one (right): deadVMleft-aliveVMright - Diff Checker
Please don’t mind the VM ID - I have several VMs stuck in BOOT state, and ID 450 is one of them.
Here is the VM’s last history seq: oneadmin@nola2:/home/netman/backups$ onedb show-history --id 450 --seq 14<HIST - Pastebin.com - I checked it for Russian letters or errors as well, and it seems to be OK. I also validated the XML and ran it through online services that check for Russian characters: all good.
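The check for stray Russian characters boils down to something like this (a sketch; the body dump file and helper name are hypothetical):

```shell
# Sketch: flag any non-ASCII bytes (e.g. Cyrillic) in a dumped VM body,
# e.g. one saved from "onedb show-body vm". check_non_ascii is a
# hypothetical helper, not an OpenNebula tool.
check_non_ascii() {
    # LC_ALL=C forces byte-wise matching regardless of the host locale;
    # the PCRE class matches any byte outside the 7-bit ASCII range.
    LC_ALL=C grep -nP '[^\x00-\x7F]' "$1" && echo "non-ASCII found in $1" \
        || echo "$1 is clean"
}
```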

Now I want to try onedb purge-history, but here’s the output:

Empty SQL query result
["/usr/lib/one/ruby/onedb/onedb_live.rb:96:in `select'",
 "/usr/lib/one/ruby/onedb/onedb_live.rb:191:in `block in purge_history'",
 "/usr/lib/one/ruby/opennebula/xml_pool.rb:35:in `block in each_element'",
 "/usr/share/one/gems-dist/gems/nokogiri-1.10.9/lib/nokogiri/xml/node_set.rb:238:in `block in each'",
 "/usr/share/one/gems-dist/gems/nokogiri-1.10.9/lib/nokogiri/xml/node_set.rb:237:in `upto'",
 "/usr/share/one/gems-dist/gems/nokogiri-1.10.9/lib/nokogiri/xml/node_set.rb:237:in `each'",
 "/usr/lib/one/ruby/opennebula/xml_pool.rb:34:in `each_element'",
 "/usr/lib/one/ruby/opennebula/pool.rb:159:in `each'",
 "/usr/lib/one/ruby/onedb/onedb_live.rb:133:in `purge_history'",
 "/usr/bin/onedb:532:in `block (2 levels) in <main>'",
 "/usr/lib/one/ruby/cli/command_parser.rb:482:in `run'",
 "/usr/lib/one/ruby/cli/command_parser.rb:84:in `initialize'",
 "/usr/bin/onedb:336:in `new'",
 "/usr/bin/onedb:336:in `<main>'"]

My conclusion: the failure occurred not because of the sudo issue itself, but because of the Russian characters in the sudo error message.

@rdiaz
Hi. I’ve recreated the failed VMs with new IDs.
The last thing I need is to remove the failed VMs from Sunstone (and from the DB too, I believe). I’ve tried changing the state in the body XML like this - <STATE>6</STATE><LCM_STATE>0</LCM_STATE> - but with no effect; the VMs are still stuck in BOOT state.
What is the right way to delete them now?
Thank you.
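One possible reason editing only the body XML had no effect: vm_pool also stores the state in its own table columns, which Sunstone filters on, so both places have to agree. A hedged sketch (the column names, DB credentials, and the onedb change-body subcommand are assumptions - verify against your schema and back up the DB first):

```shell
# Hedged sketch: mark a stuck VM as DONE (STATE 6, LCM_STATE 0) so it
# drops out of the normal Sunstone list. vm_pool keeps state/lcm_state
# both inside the body XML and as separate table columns; names and
# credentials below are assumptions -- back up the DB before touching it.
onedb change-body vm --id 725 '/VM/STATE' 6
onedb change-body vm --id 725 '/VM/LCM_STATE' 0
mysql -u oneadmin -p opennebula -e \
  "UPDATE vm_pool SET state = 6, lcm_state = 0 WHERE oid = 725;"
```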