Hi guys,
While working on OpenNebula 5.0.2 I noticed strange behavior when attaching/detaching disks on my virtual machines.
Here are the tests I ran on two VMs:
TEST 1
I attached a non-persistent image to a VM.
This is how my VM directory /var/lib/one/datastores/0/893 looked:
-rw-rw-r--. 1 oneadmin oneadmin 1079 Oct 10 11:16 deployment.0
lrwxrwxrwx. 1 oneadmin oneadmin 13 Oct 10 11:16 disk.0 -> disk.0.snap/0
drwxrwxr-x. 2 oneadmin oneadmin 4 Oct 10 11:16 disk.0.snap
-rw-r--r--. 1 oneadmin oneadmin 372736 Oct 10 11:16 disk.1
lrwxrwxrwx. 1 oneadmin oneadmin 13 Oct 10 12:16 disk.2 -> disk.2.snap/0
drwxrwxr-x. 2 oneadmin oneadmin 4 Oct 10 12:16 disk.2.snap
When I detached the image, the symbolic link was destroyed but the snapshot directory still existed:
-rw-rw-r--. 1 oneadmin oneadmin 1079 Oct 10 11:16 deployment.0
lrwxrwxrwx. 1 oneadmin oneadmin 13 Oct 10 11:16 disk.0 -> disk.0.snap/0
drwxrwxr-x. 2 oneadmin oneadmin 4 Oct 10 11:16 disk.0.snap
-rw-r--r--. 1 oneadmin oneadmin 372736 Oct 10 11:16 disk.1
drwxrwxr-x. 2 oneadmin oneadmin 4 Oct 10 12:16 disk.2.snap
I then got the following errors trying to attach another disk to the same VM:
- attaching a non-persistent image: Mon Oct 10 12:17:37 2016 : Error attaching new VM Disk: Error copying myserver:/var/lib/one//datastores/101/85f115b6bff8f8c97600d68817743c20 to myhost:/var/lib/one//datastores/0/893/disk.2
- attaching a persistent image: Mon Oct 10 12:18:38 2016 : Error attaching new VM Disk: Error linking myserver:/var/lib/one//datastores/101/85f115b6bff8f8c97600d68817743c20 to myhost:/var/lib/one//datastores/0/893/disk.2
TEST 2
I attached a persistent image to a VM.
This is how my VM directory /var/lib/one/datastores/0/894 looked:
-rw-rw-r--. 1 oneadmin oneadmin 1079 Oct 10 11:16 deployment.0
lrwxrwxrwx. 1 oneadmin oneadmin 13 Oct 10 11:16 disk.0 -> disk.0.snap/0
drwxrwxr-x. 2 oneadmin oneadmin 4 Oct 10 11:16 disk.0.snap
-rw-r--r--. 1 oneadmin oneadmin 372736 Oct 10 11:16 disk.1
lrwxrwxrwx. 1 oneadmin oneadmin 67 Oct 10 12:24 disk.2 -> /var/lib/one/datastores/101/85f115b6bff8f8c97600d68817743c20.snap/0
lrwxrwxrwx. 1 oneadmin oneadmin 65 Oct 10 12:24 disk.2.snap -> /var/lib/one/datastores/101/85f115b6bff8f8c97600d68817743c20.snap
When I detached the image, both links created (disk.2 and disk.2.snap) were still present in the VM directory.
If I attached a new persistent disk, the links were overwritten and the disk was attached correctly.
If I attached a non-persistent image, I got the following error:
Mon Oct 10 12:28:26 2016 : Error attaching new VM Disk: Error copying myserver:/var/lib/one//datastores/1/85f115b6bff8f8c97600d68817743c20 to myhost:/var/lib/one//datastores/0/894/disk.2
and this is how the VM directory /var/lib/one/datastores/0/894 looked:
-rw-rw-r--. 1 oneadmin oneadmin 1079 Oct 10 11:16 deployment.0
lrwxrwxrwx. 1 oneadmin oneadmin 13 Oct 10 11:16 disk.0 -> disk.0.snap/0
drwxrwxr-x. 2 oneadmin oneadmin 4 Oct 10 11:16 disk.0.snap
-rw-r--r--. 1 oneadmin oneadmin 372736 Oct 10 11:16 disk.1
lrwxrwxrwx. 1 oneadmin oneadmin 13 Oct 10 12:26 disk.2 -> disk.2.snap/0
lrwxrwxrwx. 1 oneadmin oneadmin 65 Oct 10 12:24 disk.2.snap -> /var/lib/one/datastores/101/85f115b6bff8f8c97600d68817743c20.snap
My datastores' TM_MAD attribute is qcow2.
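For reference, the TM driver of each datastore involved can be double-checked with the standard CLI, using the datastore IDs from the paths above:

onedatastore show 0 | grep TM_MAD      # system datastore holding the VM dirs
onedatastore show 101 | grep TM_MAD    # image datastore holding the sources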
It seems that something goes wrong during disk detach: the files are not cleaned up correctly.
There should be some more lines in the logs before or after the quoted error messages describing the exact reason for the failure.
I am only reading the scripts, but I think the following changes should fix the reported behavior. Can you try them?
Edit /var/lib/one/remotes/tm/qcow2/clone and replace
mkdir -p ${DST_PATH}.snap
rm -f $DST_PATH
with
rm -rf $DST_PATH{,.snap}
mkdir -p ${DST_PATH}.snap
I think this will fix non-persistent disk attachment after a persistent disk detach. Reading the code, I think it is also possible to get data corruption on the persistent image, as it is probably overwritten with the non-persistent image source by qemu-img. I am not sure without knowing which command of the script failed.
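To illustrate why the order matters, here is a minimal sketch (the DST_PATH value is an example only; the real script receives it as an argument). If disk.2.snap is a leftover symlink pointing into a persistent image's .snap directory, mkdir -p silently follows it, and whatever the clone then writes into the "new" snap directory lands in the old persistent image instead:

DST_PATH=/var/lib/one/datastores/0/893/disk.2   # example value only
rm -rf $DST_PATH{,.snap}    # remove stale disk.2 and disk.2.snap first, so...
mkdir -p ${DST_PATH}.snap   # ...this creates a fresh real directory instead of
                            # following a leftover symlink from a detached disk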
To properly clean up the artifacts after detaching a non-persistent disk, please try editing /var/lib/one/remotes/tm/qcow2/delete, replacing:
Hi Anton,
I applied the changes you suggested. I no longer get the copy error quoted above, but instead one regarding the ln script.
When I retry the attach, I get the following and the VM goes into POWEROFF state:
Tue Oct 11 08:21:09 2016 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/attach_disk 'one-913' '/var/lib/one//datastores/0/913/disk.2' 'vdb' '1' 'DATA' 913 myhost
Tue Oct 11 08:21:09 2016 [Z0][VMM][E]: attach_disk: Command "virsh --connect qemu:///system attach-device one-913 /var/lib/one//datastores/0/913/disk.2.attach" failed: error: Failed to attach device from /var/lib/one//datastores/0/913/disk.2.attach
Tue Oct 11 08:21:09 2016 [Z0][VMM][I]: error: Unable to read from monitor: Connection reset by peer
Tue Oct 11 08:21:09 2016 [Z0][VMM][E]: Could not attach /var/lib/one//datastores/0/913/disk.2 (vdb) to one-913
Tue Oct 11 08:21:09 2016 [Z0][VMM][I]: ExitCode: 1
Tue Oct 11 08:21:09 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_detach.
Tue Oct 11 08:21:09 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: attach_disk.
Tue Oct 11 08:21:09 2016 [Z0][VMM][E]: Error attaching new VM Disk: Could not attach /var/lib/one//datastores/0/913/disk.2 (vdb) to one-913
Tue Oct 11 08:21:09 2016 [Z0][VM][I]: New LCM state is RUNNING
Tue Oct 11 08:21:51 2016 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Tue Oct 11 08:21:51 2016 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Tue Oct 11 08:21:51 2016 [Z0][VM][I]: New state is POWEROFF
Tue Oct 11 08:21:51 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Hi Anton,
I was doing the same tests with the "shared" TM driver and I noticed a very bad thing.
The clone script in /var/lib/one/remotes/tm/shared/clone does not have
rm -f $DST_PATH
like the qcow2 TM one does.
This means that if you attach a persistent disk to a VM, detach it, and then attach a non-persistent one, you destroy the previously detached persistent disk, because its symbolic link still exists in the VM directory /var/lib/one/datastores/0/VMID.
I don't get any error in the logs because the disk is mounted "correctly".
I added the remove command to the clone script and it resolved the bug.
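For reference, the change I made was along these lines (the exact position inside shared/clone may vary between versions; the point is to drop the stale link just before the copy that creates $DST_PATH):

# in /var/lib/one/remotes/tm/shared/clone, before the copy that creates
# $DST_PATH: remove a possible leftover symlink from a previously detached
# persistent disk, so the copy cannot follow it and overwrite the original
rm -f $DST_PATH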
Can you take a look please?
Best Regards,
Valentina
I would like to note that this patch will work, but the root cause is that there is no cleanup after a persistent disk detach. The proper fix is to develop the mvds script, which needs more time and testing.
Just an update: we are going to keep the original patches from Anton (in fact, some rm operations were already in place). We prefer not to perform additional ssh calls in mvds, and to protect the disks from potential failures while executing mvds. So the idea is to remove any target link before performing any destructive operation on it (see the sketch after the list below).
The affected scripts are:
qcow2/clone
qcow2/ln
shared/clone
shared/ln
(Note that some of them were already working as described)
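The common idea in all of them can be sketched as follows (illustrative only: the variable values are examples taken from the listings above, and the create step differs per script):

DST_PATH=/var/lib/one/datastores/0/894/disk.2   # example target only
SRC_SNAP=/var/lib/one/datastores/101/85f115b6bff8f8c97600d68817743c20.snap
# 1. clear any stale target first, so no later step can follow an old link
rm -rf $DST_PATH ${DST_PATH}.snap
# 2. only then (re)create the target; e.g. the ln drivers produce the
#    persistent-image layout seen in the TEST 2 listing:
ln -s $SRC_SNAP ${DST_PATH}.snap
ln -s ${SRC_SNAP}/0 $DST_PATH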