Oned keeps crashing (5.0.2)

Hello,

Today my oned crashed, and it has kept crashing frequently since then. I have found that the crash occurs when I try to run one particular VM, which gets stuck in the ACTIVE/BOOT state. The log of that VM is here:

Wed Sep 21 22:48:44 2016 [Z0][VM][I]: New LCM state is CLEANUP_RESUBMIT
Wed Sep 21 22:48:44 2016 [Z0][VMM][I]: Driver command for 535 cancelled
Wed Sep 21 22:48:46 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_delete.
Wed Sep 21 22:48:48 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_delete.
Wed Sep 21 22:48:49 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_delete.
Wed Sep 21 22:48:49 2016 [Z0][VMM][I]: Host successfully cleaned.
Wed Sep 21 22:48:49 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Wed Sep 21 22:48:49 2016 [Z0][VM][I]: New state is PENDING
Wed Sep 21 22:49:12 2016 [Z0][VM][I]: New state is ACTIVE
Wed Sep 21 22:49:12 2016 [Z0][VM][I]: New LCM state is PROLOG
Wed Sep 21 22:49:14 2016 [Z0][VM][I]: New LCM state is BOOT
Wed Sep 21 22:49:14 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/535/deployment.4
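The state transitions in a VM log like the one above follow a fixed "New state is ..." / "New LCM state is ..." pattern, so the state a VM ended up in can be read off mechanically. Here is a minimal Python sketch, assuming exactly the log line format shown above. The function name last_states is hypothetical and is not part of OpenNebula.

```python
import re

# Matches state-transition lines in an oned VM log, assuming the
# "[VM][I]: New state is X" / "[VM][I]: New LCM state is X" format shown above.
# "[VMM]" lines do not match because "\[VM\]" requires "]" right after "VM".
STATE_RE = re.compile(r"\[VM\]\[I\]: New (LCM )?state is (\w+)")

def last_states(log_text):
    """Return (state, lcm_state) as last recorded in the log; None if never seen."""
    state = lcm_state = None
    for line in log_text.splitlines():
        m = STATE_RE.search(line)
        if not m:
            continue
        if m.group(1):  # "New LCM state is ..."
            lcm_state = m.group(2)
        else:           # "New state is ..."
            state = m.group(2)
    return state, lcm_state

sample = """\
Wed Sep 21 22:48:49 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Wed Sep 21 22:48:49 2016 [Z0][VM][I]: New state is PENDING
Wed Sep 21 22:49:12 2016 [Z0][VM][I]: New state is ACTIVE
Wed Sep 21 22:49:12 2016 [Z0][VM][I]: New LCM state is PROLOG
Wed Sep 21 22:49:14 2016 [Z0][VM][I]: New LCM state is BOOT
Wed Sep 21 22:49:14 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/535/deployment.4
"""

print(last_states(sample))  # ('ACTIVE', 'BOOT')
```

Applied to the excerpt above, it reports ACTIVE/BOOT, which matches the state the VM is stuck in.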

When I try to recover the VM with the delete_recreate action, oned crashes again. The dmesg output shows that it is segfaulting:

[11954702.553854] oned[24194]: segfault at 0 ip 0000000000419701 sp 00007f8a6f8880e0 error 4 in oned[400000+2a7000]
[12050439.863920] oned[3388]: segfault at 0 ip 0000000000419701 sp 00007f47f6fc80e0 error 4 in oned[400000+2a7000]
[12050645.599567] oned[9812]: segfault at 0 ip 0000000000419701 sp 00007fd1361ce0e0 error 4 in oned[400000+2a7000]
[12050776.849441] oned[11661]: segfault at 0 ip 0000000000419701 sp 00007ff67b76e0e0 error 4 in oned[400000+2a7000]
[12050816.727402] oned[12690]: segfault at 0 ip 0000000000419701 sp 00007f43e134c0e0 error 4 in oned[400000+2a7000]
[12051580.822236] oned[22275]: segfault at 0 ip 0000000000419701 sp 00007f4b0a22a0e0 error 4 in oned[400000+2a7000]
[12051929.002355] oned[27786]: segfault at 0 ip 0000000000419701 sp 00007f54c4fdb0e0 error 4 in oned[400000+2a7000]
[12053932.149952] oned[29424]: segfault at 0 ip 0000000000419701 sp 00007f1f2f2b50e0 error 4 in oned[400000+2a7000]
[12054268.239181] oned[25421]: segfault at 0 ip 0000000000419701 sp 00007f049bffd0e0 error 4 in oned[400000+2a7000]
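The dmesg lines can be decoded field by field. Every crash reports the same instruction pointer (0x419701) and a faulting address of 0. Error code 4 means a user-mode read of a page that is not present, which is what a NULL pointer dereference looks like. That points to a deterministic bug at one spot in oned rather than a race. Below is a small Python sketch that parses these lines and decodes the x86 page-fault error bits. parse_segfault and decode_error are hypothetical helper names, and the regular expression assumes exactly the line format shown above.

```python
import re

# dmesg "segfault" line, as in the excerpt above:
#   oned[PID]: segfault at ADDR ip IP sp SP error CODE in BINARY[BASE+SIZE]
# The kernel prints ADDR, IP, SP, CODE, BASE and SIZE in hexadecimal.
SEGV_RE = re.compile(
    r"(\w+)\[(\d+)\]: segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) "
    r"error ([0-9a-f]+) in ([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)

def decode_error(code):
    """Decode the x86 page-fault error code: bit 0 = protection, bit 1 = write, bit 2 = user."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }

def parse_segfault(line):
    """Return a dict describing one segfault line, or None if the line does not match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    prog, pid, addr, ip, _sp, err, binary, base, _size = m.groups()
    ip_val = int(ip, 16)
    return {
        "program": prog,
        "pid": int(pid),
        "fault_addr": int(addr, 16),
        "ip": ip_val,
        "binary": binary,
        "offset_in_binary": ip_val - int(base, 16),
        **decode_error(int(err, 16)),
    }

lines = [
    "[11954702.553854] oned[24194]: segfault at 0 ip 0000000000419701 sp 00007f8a6f8880e0 error 4 in oned[400000+2a7000]",
    "[12050439.863920] oned[3388]: segfault at 0 ip 0000000000419701 sp 00007f47f6fc80e0 error 4 in oned[400000+2a7000]",
    "[12054268.239181] oned[25421]: segfault at 0 ip 0000000000419701 sp 00007f049bffd0e0 error 4 in oned[400000+2a7000]",
]
reports = [parse_segfault(line) for line in lines]

# Same crashing instruction every time?
print({hex(rep["ip"]) for rep in reports})  # {'0x419701'}
first = reports[0]
print(first["fault_addr"], first["cause"], first["access"], first["mode"])  # 0 page not present read user
print(hex(first["offset_in_binary"]))  # 0x19701
```

Because the instruction pointer falls inside oned's own mapping (base 0x400000, size 0x2a7000), it could probably be resolved to a function with something like addr2line -f -e /usr/bin/oned 0x419701. This assumes a non-PIE binary at that path with debug symbols available. Alternatively, run oned under gdb and take a backtrace when it crashes.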

What can I do to debug the problem? I have verified that at least some other VMs can be booted without problems.

Thanks,

-Yenya

OK, apparently the owner of that VM tried to use a volatile disk as the boot disk. We deleted the VM and instantiated it from the template again, and oned crashed. Unfortunately, the owner deleted the template before I could back it up for further examination.

Can I extract the template from the MySQL database? I think I have a nightly backup.

I will try to reproduce it myself, but so far it does not look like a race condition; it seems to be a fully reproducible bug with some unusual prerequisites.

-Yenya

That would be helpful, thank you.

You can get the complete XML from the ‘body’ column of the template_pool table.
Alternatively, look at the failed VMs: a deleted VM is kept in the vm_pool table in the ‘DONE’ state and can be retrieved, for example, with the onevm show -x command.
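Once the XML has been retrieved (from the body column or from onevm show -x), it can be checked for the suspected trigger: a volatile disk used as the boot device. Here is a minimal Python sketch under stated assumptions. The layout (a TEMPLATE element with DISK children and OS/BOOT), the 5.x-style BOOT value such as disk0,nic0, and the rule that a DISK without IMAGE or IMAGE_ID is volatile are all assumptions. The function name volatile_boot_disks and the sample document are hypothetical and made up for illustration.

```python
import xml.etree.ElementTree as ET

def volatile_boot_disks(body_xml):
    """Return the BOOT entries (e.g. 'disk1') that point at a volatile disk.

    Assumptions (not verified against the OpenNebula schema):
      * the XML contains a TEMPLATE element with DISK children and OS/BOOT;
      * BOOT is a comma-separated list like 'disk0,nic0', where diskN is the
        zero-based position of the DISK element in the template;
      * a DISK without IMAGE or IMAGE_ID is a volatile disk.
    """
    root = ET.fromstring(body_xml)
    template = root if root.tag == "TEMPLATE" else root.find(".//TEMPLATE")
    if template is None:
        return []
    disks = template.findall("DISK")
    boot = template.findtext("OS/BOOT", default="")
    suspicious = []
    for entry in (e.strip() for e in boot.split(",")):
        # Only look at entries of the form diskN; skip nicN and anything else.
        if not entry.startswith("disk") or not entry[4:].isdigit():
            continue
        idx = int(entry[4:])
        if idx < len(disks):
            disk = disks[idx]
            if disk.find("IMAGE") is None and disk.find("IMAGE_ID") is None:
                suspicious.append(entry)
    return suspicious

# Hypothetical template body: disk0 is an image, disk1 is a volatile fs disk.
sample = """<VMTEMPLATE><ID>42</ID><NAME>example</NAME><TEMPLATE>
  <DISK><IMAGE_ID>7</IMAGE_ID></DISK>
  <DISK><TYPE>fs</TYPE><SIZE>1024</SIZE></DISK>
  <OS><BOOT>disk1</BOOT></OS>
</TEMPLATE></VMTEMPLATE>"""

print(volatile_boot_disks(sample))  # ['disk1']
```

The search for TEMPLATE anywhere below the root is meant to let the same check run on a VM document from onevm show -x as well as on a template body, assuming both nest the disks the same way.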