Migration failure on 6.10.0 - missing to_s. How to recover?

Hi all,

I have just upgraded from 6.8 to 6.10.0 (CE) and wanted to reboot all nodes. During onehost flush I saw migrating VMs fail with the following message in the VM log:

Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: Command execution fail (exit code: 1): cat << 'EOT' | /var/lib/one/tmp/vmm/kvm/migrate '0b62ee41-3530-459d-9f92-ab0de19d826a' 'node5' 'node4' 3853 node4
Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: virsh --connect qemu:///system migrate --live 0b62ee41-3530-459d-9f92-ab0de19d826a qemu+ssh://node5/system (23.462960391s)
Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: Error mirgating VM 0b62ee41-3530-459d-9f92-ab0de19d826a to host node5: undefined method `upcase' for nil:NilClass
Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: ["/var/lib/one/tmp/vmm/kvm/migrate:255:in `<main>'"]
Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: ExitCode: 1
Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_failmigrate.
Mon Feb 3 14:18:16 2025 [Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Mon Feb 3 14:18:16 2025 [Z0][VMM][E]: MIGRATE: virsh --connect qemu:///system migrate --live 0b62ee41-3530-459d-9f92-ab0de19d826a qemu+ssh://node5/system (23.462960391s) Error mirgating VM 0b62ee41-3530-459d-9f92-ab0de19d826a to host node5: undefined method `upcase' for nil:NilClass ["/var/lib/one/tmp/vmm/kvm/migrate:255:in `<main>'"] ExitCode: 1
Mon Feb 3 14:18:16 2025 [Z0][VM][I]: New LCM state is RUNNING
Mon Feb 3 14:18:16 2025 [Z0][LCM][I]: Fail to live migrate VM. Assuming that the VM is still RUNNING.
Mon Feb 3 14:18:47 2025 [Z0][LCM][I]: VM running but monitor state is POWEROFF

Now the VM seems to be running on node5 (i.e. it migrated successfully), but OpenNebula reports that it is in POWEROFF state.
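For anyone wanting to double-check the same thing, this is roughly how the real state can be verified (a sketch only; the UUID, host names and VM ID below are simply the ones from the log above):

    # the source host should no longer have the domain; the destination should report it as running
    ssh node4 'virsh --connect qemu:///system domstate 0b62ee41-3530-459d-9f92-ab0de19d826a'
    ssh node5 'virsh --connect qemu:///system domstate 0b62ee41-3530-459d-9f92-ab0de19d826a'
    # what ONe itself believes about the VM (3853 is the VM ID from the migrate command line)
    onevm show 3853 | grep -E 'STATE|HOST'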

The fix seems to be simple:

--- /var/lib/one/remotes-6.10.0-1.el9-dist/vmm/kvm/migrate	2024-08-27 18:27:44.000000000 +0200
+++ /var/lib/one/remotes/vmm/kvm/migrate	2025-02-03 14:58:18.190160184 +0100
@@ -252,7 +252,7 @@
 
     # Compact memory
     # rubocop:disable Layout/LineLength
-    if ENV['CLEANUP_MEMORY_ON_STOP'].upcase == 'YES'
+    if ENV['CLEANUP_MEMORY_ON_STOP'].to_s.upcase == 'YES'
         `(sudo -l | grep -q sysctl) && sudo -n sysctl vm.drop_caches=3 vm.compact_memory=1 &>/dev/null &`
     end
     # rubocop:enable Layout/LineLength
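The root cause is that ENV['CLEANUP_MEMORY_ON_STOP'] returns nil when the variable is not set, and nil has no upcase method; the added to_s turns nil into an empty string first. A minimal illustration (assuming a stock Ruby, as used by the driver scripts):

    # an unset variable yields nil, so calling .upcase raises NoMethodError
    env -u CLEANUP_MEMORY_ON_STOP ruby -e 'ENV["CLEANUP_MEMORY_ON_STOP"].upcase'
    # with the .to_s guard, nil becomes "" and the comparison is simply false
    env -u CLEANUP_MEMORY_ON_STOP ruby -e 'puts ENV["CLEANUP_MEMORY_ON_STOP"].to_s.upcase == "YES"'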

But how can I recover the VMs without disruption? As I said, they are running on the new hosts, so I just need to tell ONe about it. How can I do this? Thanks!

-Yenya

Update: trying to recover the running QEMU processes while ONe thinks the VMs are POWEROFF, I did the following (a consolidated sketch follows the list):

  • figure out the host where QEMU is really running

  • figure out the sequence number of the last placement (history record), something like onevm show --json $VM_ID | jq .VM.HISTORY_RECORDS.HISTORY[-1].SEQ

  • onedb update-history --id $VM_ID --seq $LAST_SEQ (there really should be a --last-seq switch instead of just --seq N), then edit the entry to reflect the hostname and ID of the host where the QEMU process is actually running.

  • onedb update-body vm --id $VM_ID – set all of STATE, LCM_STATE, PREV_STATE and PREV_LCM_STATE to 3.

  • onevm resched $VM_ID to set up a new QEMU process based on what ONe expects
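Consolidated, the procedure was roughly this (a rough sketch only; the VM ID is the one from the log above, and both onedb commands drop you into an editor where the actual changes are made by hand):

    VM_ID=3853                                    # the affected VM
    # sequence number of the last history record (assumes HISTORY is returned as an array)
    LAST_SEQ=$(onevm show --json $VM_ID | jq -r '.VM.HISTORY_RECORDS.HISTORY[-1].SEQ')
    # edit HOSTNAME/HID in that history record to point at the host where QEMU really runs
    onedb update-history --id $VM_ID --seq $LAST_SEQ
    # edit the VM body: set STATE, LCM_STATE, PREV_STATE and PREV_LCM_STATE to 3
    onedb update-body vm --id $VM_ID
    # finally, let ONe act on the corrected placement
    onevm resched $VM_ID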

This more or less works, but these VMs now do not have a VNC console. Clicking on the console icon in Sunstone opens a new browser tab, but instead of the VNC session it shows the connection error "Something went wrong, connection is closed". Some of the VMs now have "None" set as the console type under VM tab → Conf → Update Configuration → Input/Output. But even when I enable VNC manually, the console is still inaccessible.

So, what is the correct way to tell ONe about a running QEMU process that ONe seems to have forgotten?
Thanks,

-Yenya

To recover the VNC console for those VMs (in the new Sunstone, on port 2616), I also needed to restart opennebula-guacd.
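For completeness, the restart itself (unit names as shipped in the 6.10 packages; restarting the FireEdge service is just a guess for cases where guacd alone is not enough):

    # restart the Guacamole proxy that the new Sunstone uses for VNC sessions
    systemctl restart opennebula-guacd
    # if the console is still unreachable, the FireEdge service (serving port 2616) may need a restart too
    systemctl restart opennebula-fireedge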

@jorel: ok, thanks! We still use the original Ruby Sunstone by default.

Anyway, does the fix in the first post look reasonable? Can you apply it?

Thanks,

-Yenya

It was actually already fixed on Sep 2, but it was only released in the EE hotfixes.

@jorel: thanks.

It is pretty sad to leave such a critical bug in CE and fix it only in EE. But never mind, it is not my project.

-Yenya