My question is not directly related to OpenNebula (probably), but for the last few months I have occasionally seen my guest VMs locking up with the following error printed to their console:
Oct 4 09:55:40 guest123 kernel: [2148740.198048] watchdog: BUG: soft lockup - CPU#5 stuck for 2000426s! [swapper/5:0]
Sep 9 13:16:53 guest123 kernel: [2148750.428411] watchdog: BUG: soft lockup - CPU#6 stuck for 2000430s! [lua5.2:784]
As far as I know, having a CPU stuck for several seconds means that the QEMU vCPU thread in question simply did not receive CPU time on an over-provisioned host. But this is something different: note that the reported lock-up time is HUGE (~23 days), and even the timestamp of the first message is roughly 23 days in the future.
So this means timekeeping inside QEMU went horribly wrong, maybe because the QEMU threads were scheduled on different host CPUs.
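For what it's worth, the clocksource the guest kernel uses can be inspected via the standard sysfs paths; kvm-clock is the usual choice for KVM guests, while a raw tsc clocksource could conceivably jump after migration between hosts with different TSC rates:

```shell
# Show which clocksource the guest kernel is currently using:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# List the alternatives the kernel detected at boot:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```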
It usually happens for me after the VM is live-migrated to a different host, but I think it sometimes happens even without a migration. On the other hand, I wrote a script that live-migrates one of my VMs sequentially to all of my physical hosts, and it completed 5 loops without the VM crashing or reporting a stuck CPU.
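The script is roughly of this shape (the VM ID and host IDs below are placeholders, not my real ones):

```shell
#!/bin/sh
# Rough sketch of the migration-loop script: repeatedly live-migrate
# one VM through every physical host. IDs are placeholder values.
VM_ID=123
HOST_IDS="0 1 2 3"   # OpenNebula host IDs (hypothetical)

for loop in 1 2 3 4 5; do
  for h in $HOST_IDS; do
    echo "loop $loop: live-migrating VM $VM_ID to host $h"
    # onevm migrate --live "$VM_ID" "$h"
    # (commented out in this sketch; the real script also waits for
    #  the VM to reach RUNNING again before the next hop)
  done
done
```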
My physical hosts have their time synchronized with a local NTP server, and I have verified that the time on them is indeed in sync.
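(Checked with something like the following on each host; I am assuming chrony here, with classic ntpd it would be `ntpq -p` instead:)

```shell
# Report NTP sync state on a host (assumes chronyd is the time daemon).
# "Leap status : Normal" and a small "System time" offset indicate sync.
chronyc tracking 2>/dev/null | grep -E 'Leap status|System time' \
  || echo "chronyc not available here"
```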
Does anybody else see this? Thanks,