Best way to protect against duplicate running instances with QEMU/KVM and libvirt [RESOLVED]

Hi everyone,

I'm looking for the best solution to protect against duplicate running instances across multiple hosts.
I'm using virtlockd with libvirt to lock each disk while a VM is in the running state; when an error appears on a host, an automatic migration is triggered by the hooks configured from the OpenNebula manager.
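
For context, the auto-migration comes from the standard host error hook in oned.conf; mine looks something like this (a sketch, the exact arguments depend on your OpenNebula version and on the options of ft/host_error.rb):

HOST_HOOK = [
    name      = "error",
    on        = "ERROR",
    command   = "ft/host_error.rb",
    arguments = "$ID -m",
    remote    = "no" ]

Here "-m" tells the hook to migrate the VMs away from the failed host.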

I'm using a SAN connected over iSCSI, with an OCFS2 filesystem mounted on every node and on the manager, and that shared filesystem is used as the datastores in OpenNebula. So all the disks (datastore 1) and all the running instances (datastore 0) are shared between the hosts and the manager.
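
For reference, virtlockd is enabled on each host roughly like this (a sketch; the lockspace path is an example, it just has to sit on the shared OCFS2 datastore):

# /etc/libvirt/qemu.conf
lock_manager = "lockd"

# /etc/libvirt/qemu-lockd.conf
# lockspace on the shared filesystem, leases mandatory for every disk
auto_disk_leases = 1
file_lockspace_dir = "/var/lib/one/datastores/lockd"
require_lease_for_disks = 1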

That seems to be a good solution, but the VM still starts on another host when I stop the libvirtd process on the first node.

Is there something wrong with the architecture I have implemented, or is this simply not possible?

I'm looking forward to your replies,

Thanks,

Hi,

Does nobody have any idea about this point?
I'm still looking for a solution, but I haven't resolved the problem yet.
I can't understand why virtlockd doesn't lock the disk and prevent the VM from starting on the other node.

Any advice would be very useful :smile:

Sincerely,

When an Image is persistent, OpenNebula will not allow another VM to run from it. So I guess that is the best way to protect against a duplicate VM running from the same disk image.
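
For example, an image can be flagged as persistent from the CLI (the image ID here is just a placeholder):

oneimage persistent 42
oneimage show 42 | grep PERSISTENT

From then on, only one VM at a time can use image 42.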

Hi,

Thanks Ruben for your reply,

The point is that the problem happens when libvirtd crashes or is stopped. The VM is automatically migrated to the other host by the hooks, but when the first node comes back, the VM is running on both nodes, so there are two concurrent accesses to the disks, which could corrupt the filesystem.

How can I resolve this problem?

We usually implement this by interfacing with a fencing mechanism that shuts down the host.

I'm surprised; I was thinking that I needed a locking mechanism rather than a “fencing mechanism that shuts down the host”?

libvirt describes a locking mechanism to prevent this kind of problem: https://libvirt.org/locking.html

Which kind of fencing mechanism should I use to prevent this problem?

Fencing is the standard approach to prevent split-brain conditions in a distributed system. AFAIK the locking mechanism of libvirt is not safe… A proper fence requires a dedicated network and device.
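
For example, a fence agent typically ends up issuing an out-of-band power command to the node's BMC over that dedicated management network, something like this (the address and credentials are placeholders):

ipmitool -I lanplus -H bmc-node1.example.com -U fenceuser -P secret chassis power off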

Hi,

So I will install watchdog to act as the fencing system; it will reboot the host if the libvirtd process is detected as stopped or crashed. Do you think that is a good solution, or can you think of another fencing mechanism?
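
Concretely, I'm thinking of something like this in /etc/watchdog.conf (a sketch; the libvirtd pidfile path is an assumption, check where your distribution puts it):

# /etc/watchdog.conf
# reboot the host via the watchdog device if libvirtd disappears
watchdog-device = /dev/watchdog
pidfile = /var/run/libvirtd.pid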

Many thanks Ruben for your support.

Finally,

I succeeded in setting up the protection against duplicate running instances.
I used sanlock together with watchdog to create a lock file and monitor the read/write access.
I spoke with a libvirt engineer who told me that virtlockd is not yet compatible with the OCFS2 filesystem. That was the reason why the lock files were created but the VM could still start on the other node.
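
For anyone with the same setup, the relevant pieces on each host look roughly like this (a sketch; the lease path is an example, host_id must be different on every node, and the lease directory has to live on the shared OCFS2 datastore):

# /etc/libvirt/qemu.conf
lock_manager = "sanlock"

# /etc/libvirt/qemu-sanlock.conf
# leases on the shared filesystem; host_id unique per node
auto_disk_leases = 1
disk_lease_dir = "/var/lib/one/datastores/sanlock"
host_id = 1
require_lease_for_disks = 1

sanlock's wdmd daemon ties the leases to the watchdog, so a host whose leases expire gets reset instead of keeping the disks open.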

So right now, I get this error when a VM tries to boot on the other node while the lock is still held by the first node:
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/129/deployment.4
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: error: internal error: Failed to acquire lock: error -243
Fri Apr 24 14:37:39 2015 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/129/deployment.4
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: ExitCode: 255
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Apr 24 14:37:39 2015 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/129/deployment.4
Fri Apr 24 14:37:39 2015 [Z0][DiM][I]: New VM state is FAILED

Everything is working perfectly.


Thanks for sharing :slight_smile: