Best Way to protect againt duplicate running Instance with QEMU/KVM and libvirt [RESOLVED]

slefeuvre · April 21, 2015, 10:56am

Hi everyone,

I’m looking to the best solution to protect againt the duplicate running instance between multiple hosts.
I’m using virtlockd with libvirt in order to lock each disks when VM is in running state and an error appeared on a host creating an auto migration with the hooks configured from opennebula manager.

I’m using a SAN connected by ISCSI with OCFS2 File System on each nodes and manager and use the shared filesystem as datastore in OpenNebula. So all the disks (datastore 1) and all the running instance (datastore 0) are shared between hosts and manager.

That seems to be a good solution but the VM is steel starting on another host when I stop the libvirtd process on the first node.

Does the architecture that I have implemented myself has something wrong or that’s simply not possible ?

I’m waiting to your reply,

Thanks,

slefeuvre · April 23, 2015, 2:01pm

Hi,

Nobody as any idea about that point ?
I’m steel looking for a solution, but I didn’t resolved the problem.
I can’t understand why the virtlockd wouldn’t lock the disk and prevent from starting on the other Node.

Any advises would be very useful

Sincerly,

ruben · April 23, 2015, 2:11pm

When an Image is persistent OpenNebula will not allow another VM run from
it. So, I guess that is the best way o protect against a duplicate running
VM from the same disk image

slefeuvre · April 23, 2015, 2:19pm

Hi,

Thanks Ruben to your reply,

The point is that the problem happening when a crash or stop of libvirtd occured. So the VM is automatically migrated to the other host using the hooks, but when the other Node come back, the VM is running from 2 node so there is 2 different access to the disks that could create filesystem’s problems.

How can I resolved this problem ?

ruben · April 23, 2015, 2:29pm

we usually implement this by interfacing with a fencing mechanism that
shutdowns the host

slefeuvre · April 23, 2015, 2:52pm

I’m surprised, I was thinking that I need to use a lock mechanism process more than a “fencing mechanism that
shutdowns the host” ?

“libvirt” describe the lock mechanism to prevent this kind of problem : https://libvirt.org/locking.html

Which kind of fencing mechanisms should I use to prevent that problem ?

ruben · April 23, 2015, 3:04pm

Fencing is the standard approach to prevent split brain conditions in a
distributed system. AFAIK the locking mechanism of libvirt is not safe… A
proper fence requires a dedicated network and device.

slefeuvre · April 24, 2015, 7:24am

Hi,

So I will install watchdog to act as fencing system which one will reboot host if the libvirt process is detected as stopped or crashed. Do you think that is a good solution, do you think about another fencing mechanism ?

Many thanks Ruben to your support.

slefeuvre · April 24, 2015, 2:19pm

Finally,

I succeed to make the protection against duplicates running instances.
I used sanlock with watchdog to create a lock file and monitor the read/write access.
I spoke with a libvirt engineer which one said me that virtlockd is not yet compliant with OCFS2 Filesystem. So that was the reasons why I had the lock files created but the VM could still start on the other Node.

So right now, I get this error when a VM try to boot on the other node and a lock file persist:
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/129/deployment.4
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: error: internal error: Failed to acquire lock: error -243
Fri Apr 24 14:37:39 2015 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/129/deployment.4
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: ExitCode: 255
Fri Apr 24 14:37:39 2015 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Apr 24 14:37:39 2015 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/129/deployment.4
Fri Apr 24 14:37:39 2015 [Z0][DiM][I]: New VM state is FAILED

Everything is working perfectly,

ruben · April 24, 2015, 4:26pm

Thanks for sharing

Topic		Replies	Views
VMs are on different Hosts than ONE is thinking Storage	1	230	December 28, 2020
VM crashes one by one at a time Community Support	9	1356	July 6, 2017
VMs fail to start after hypervisor failure General	2	266	February 15, 2021
Problem creating KVM hosts on former libvirt/virsh working station General	0	313	November 20, 2020
Shrödinger's virtual machine: simultaneously in RUNNING and POWEROFF states Community Support	0	556	May 17, 2019

Best Way to protect againt duplicate running Instance with QEMU/KVM and libvirt [RESOLVED]

Related Topics