New VMs stuck in BOOT state

Hello,

I’ve got a small, 5-node OpenNebula cluster. Creating new VMs works fine on 4 of the 5 nodes. On the 5th node they get stuck in the BOOT state.

Log of a newly created VM:
Thu Jan 14 10:52:30 2016 [Z0][DiM][I]: New VM state is ACTIVE.
Thu Jan 14 10:52:30 2016 [Z0][LCM][I]: New VM state is PROLOG.
Thu Jan 14 10:52:31 2016 [Z0][LCM][I]: New VM state is BOOT
Thu Jan 14 10:52:31 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/303/deployment.0

The deployment file on the master node looks fine, but I cannot find it on the problematic node. The VM is not visible in ‘virsh list --all’, and /var/log/libvirt/qemu/one-XXX.log is not created.
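For reference, the checks I ran on the problematic node looked roughly like this (XXX is the VM id; the datastore path assumes the default system datastore 0, so it may differ in your setup):

virsh list --all                        # the new VM never shows up
ls /var/lib/one/datastores/0/XXX/       # deployment.0 is not there
ls /var/log/libvirt/qemu/one-XXX.log    # not created either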

I cannot find anything in OpenNebula’s logs. My OpenNebula version is 4.12.0.

Any help would be appreciated.

Regards,
Konrad

Konrad forum@opennebula.org writes:

Hello,


The deployment file on the master node looks fine, but I cannot find it on the problematic node. The VM is not visible in ‘virsh list --all’, and /var/log/libvirt/qemu/one-XXX.log is not created.

Nothing in /var/log/libvirt/libvirtd.log either?
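If the file does not exist at all, file logging is probably not enabled on that node. If I remember correctly, something like this in /etc/libvirt/libvirtd.conf turns it on (values from memory, so double-check them):

log_level = 2                                          # 2 = info, 1 = debug
log_outputs = "2:file:/var/log/libvirt/libvirtd.log"   # write to the usual file

Then restart libvirtd on that node; as far as I know that does not affect the guests that are already running.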

Regards.

Daniel Dehennin
Fetch my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

No, nothing. I don’t even have such a file.

Can I enable some verbose logging on the fly (without restarting running VMs)?

Oh, I DO have running VMs on this node. It suddenly stopped accepting new VMs.

Regards,
Konrad

What is very strange to me: live migration TO this problematic node works fine, but live migration FROM the broken node to another one does not.

Hi,

I am having a very similar problem after mistakenly doing an rm -Rf * in /var/lib/one. I restored /var/lib/one/remotes from the RPM and everything works, except that when I launch a VM it gets stuck in BOOT state as described by @kobe, right after the “Generating deployment file” message. What is really strange is that if I connect to the selected hypervisor, I can successfully run a virsh create /var/lib/one/xxx/deployment.0 and log into the VM (the only problem is that the network is not configured, as ONE has not yet done the network configuration phase).
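For the record, the manual test on the hypervisor was roughly this (xxx being the VM id):

virsh create /var/lib/one/xxx/deployment.0   # the domain starts without any complaint
virsh list                                   # one-xxx is listed as running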

I spent the whole day trying to find the problem or what could be missing, without any success. The only thing I have not done is restart the oned service, because I didn’t want any trouble with the VMs still running and end up in a situation where I could not restart them…

Any hint would be very much appreciated. Cheers,

Michel

PS: I am still running a very old version of ONE, 3.2. In fact I use ONE through StratusLab, but this should be irrelevant here…

Answering myself… Restarting the oned daemon fixed the problem.
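In case it helps anyone: the restart itself was nothing fancy, and the VMs that were already running were not affected in my case. Depending on how ONE was installed it is something like:

service opennebula restart    # packaged install; the service name may differ
one stop && one start         # or the control script, run as the oneadmin user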

Michel

Thanks! I did the same.

Worked here too, but it can hardly be called a fix.
A fix would be if it stopped happening.

Does anyone know what data we should gather to open a bug on this?
The database state? A core dump of oned?
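Off the top of my head, I would grab something like this on the front-end before restarting oned (paths assume a default install, and onedb needs the options matching your DB backend):

onedb backup -v --sqlite /var/lib/one/one.db /tmp/one.db.bck   # or the MySQL options instead of --sqlite
cp /var/log/one/oned.log /tmp/                                 # oned log covering the time a VM got stuck
gcore -o /tmp/oned.core $(pgrep -x oned)                       # core of the running oned, needs gdb installed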

One of the physical nodes was broken, so I restarted it. Then I started the VMs deployed on it, but all of them got stuck in BOOT.
I thought something was wrong with this node. Today I restarted opennebula.service, and the node works properly now.

Hello:
I was also having the same problem and resolved the issue after restarting the OpenNebula services on the master node.

And again I am here with the same issue. I hope going to 5.10 finally changes something :frowning: