LizardFS datastore and scheduling error about unsupported qcow2 transfer mode

Hello.

In preparation for the migration from 5.8 to 5.12, we are finishing our new infrastructure based on LizardFS.

We set up our new hypervisors with LizardFS storage, but we are seeing messages like:

Fri Jul 10 12:06:18 2020 [Z0][VM][E]: Error deploying virtual machine 380825 to HID: 15. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2
Fri Jul 10 12:06:18 2020 [Z0][VM][E]: Error deploying virtual machine 380825 to HID: 14. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2
Fri Jul 10 12:06:18 2020 [Z0][VM][E]: Error deploying virtual machine 380825 to HID: 13. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2
Fri Jul 10 12:06:18 2020 [Z0][VM][E]: Error deploying virtual machine 380825 to HID: 16. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2
Fri Jul 10 12:06:18 2020 [Z0][VM][E]: Error deploying virtual machine 380825 to HID: 17. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

We set up the transfer driver and the datastore driver.

The transition from the previous SAN-based hypervisors to the new ones was done in several steps:

Images stored on the SAN but usable on the new hypervisors

  • NFS-mount the SAN-backed image datastores (with TM_MAD=qcow2) on the new hypervisors
  • create a SHARED system datastore on the new hypervisors (backed by LizardFS); a sketch of this step follows the list
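
For reference, a minimal sketch of that second step, assuming the system datastore is created from a template file and then attached to cluster 102 (the file name is just an example; the resulting datastore is shown below):

    cat > test-cluster-system.ds <<'EOF'
    NAME   = "test-cluster-system"
    TYPE   = "SYSTEM_DS"
    TM_MAD = "shared"
    EOF
    onedatastore create test-cluster-system.ds
    onecluster adddatastore 102 test-cluster-system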

Here is its information:

DATASTORE 107 INFORMATION
ID             : 107
NAME           : test-cluster-system
USER           : nebula
GROUP          : oneadmin
CLUSTERS       : 102
TYPE           : SYSTEM
DS_MAD         : -
TM_MAD         : shared
BASE PATH      : /var/lib/one//datastores/107
DISK_TYPE      : FILE
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : 36.4T
FREE:          : 26.1T
USED:          : 10.3T
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="NO"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
SHARED="YES"
TM_MAD="shared"
TYPE="SYSTEM_DS"

This way, the new hypervisors can run VMs, but the images are copied from the NFS mount.

New LizardFS datastores

As it was not used before, we repurposed the default datastore as the new LizardFS image datastore:

DATASTORE 1 INFORMATION
ID             : 1
NAME           : default
USER           : nebula
GROUP          : oneadmin
CLUSTERS       : 102
TYPE           : IMAGE
DS_MAD         : lizardfs
TM_MAD         : lizardfs
BASE PATH      : /var/lib/one//datastores/1
DISK_TYPE      : FILE
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : 36.4T
FREE:          : 26.1T
USED:          : 10.3T
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
BRIDGE_LIST="nebula80 nebula81 nebula82 nebula83 nebula84"
CLONE_TARGET="SYSTEM"
CLONE_TARGET_SHARED="SYSTEM"
DISK_TYPE="FILE"
DISK_TYPE_SHARED="FILE"
DRIVER="qcow2"
DS_MAD="lizardfs"
LN_TARGET="NONE"
LN_TARGET_SHARED="NONE"
TM_MAD="lizardfs"
TM_MAD_SYSTEM="shared"
TYPE="IMAGE_DS"

And since it was not used either, we repurposed the system datastore as the new LizardFS system datastore:

DATASTORE 0 INFORMATION
ID             : 0
NAME           : system
USER           : nebula
GROUP          : oneadmin
CLUSTERS       : 102
TYPE           : SYSTEM
DS_MAD         : -
TM_MAD         : lizardfs
BASE PATH      : /var/lib/one//datastores/0
DISK_TYPE      : FILE
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : 36.4T
FREE:          : 26.1T
USED:          : 10.3T
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
DS_MIGRATE="YES"
SHARED="YES"
TM_MAD="lizardfs"
TYPE="SYSTEM_DS"

Unable to disable the test-cluster-system system datastore

Now that we are ready to clean up the old setup, I tried to disable test-cluster-system before removing it once all the VMs have been migrated, but this results in the error message: Error deploying virtual machine X to HID: Y. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2
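
For reference, disabling it is just the standard CLI call (107 being the test-cluster-system datastore shown above):

    onedatastore disable 107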

Do you have any suggestion about what I could have missed?

Regards.

oned.conf information

I modified the qcow2 TM_MAD_CONF as described in another post:

TM_MAD_CONF = [
    NAME = "qcow2", LN_TARGET = "NONE", CLONE_TARGET = "SYSTEM", SHARED = "YES",
    DRIVER = "qcow2", TM_MAD_SYSTEM = "ssh,shared",
    LN_TARGET_SSH = "SYSTEM", CLONE_TARGET_SSH = "SYSTEM", DISK_TYPE_SSH = "FILE",
    LN_TARGET_SHARED = "SYSTEM", CLONE_TARGET_SHARED = "SYSTEM", DISK_TYPE_SHARED = "FILE"
]

Here is the configuration for LizardFS:

TM_MAD_CONF = [
    NAME = "lizardfs",
    LN_TARGET = "NONE",
    CLONE_TARGET = "SYSTEM",
    SHARED = "YES",
    DS_MIGRATE = "YES",
    ALLOW_ORPHANS = "YES",

    TM_MAD_SYSTEM = "shared",
    LN_TARGET_SHARED = "NONE",
    CLONE_TARGET_SHARED = "SYSTEM",
    DISK_TYPE_SHARED = "FILE"
]

and

DS_MAD_CONF = [
    NAME = "lizardfs",
    REQUIRED_ATTRS = "",
    PERSISTENT_ONLY = "NO",
    MARKETPLACE_ACTIONS = "export"
]
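
Note that oned has to be restarted for the TM_MAD_CONF/DS_MAD_CONF changes to be taken into account; on our front-end that is (assuming a systemd-based installation):

    systemctl restart opennebula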

I don’t understand, because when I select the system datastore manually during VM creation, it works fine :-/
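
(For illustration only, one way to force a given system datastore from the template side is a scheduler requirement like the one below, using datastore 0 as an example; this is just a sketch of what selecting it manually amounts to:)

    SCHED_DS_REQUIREMENTS = "ID=\"0\""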

Digging into the code, the offending part is in src/vm/VirtualMachineDisk.cc:

    if ( ds_img->get_tm_mad_targets(tm_mad, ln_target, clone_target,
                disk_type) != 0 )
    {
        error = "Image Datastore does not support transfer mode: " + tm_mad;

        ds_img->unlock();
        return -1;
    }

I don’t understand where the tm_mad=qcow2 comes from since the default image datastore has TM_MAD=lizardfs.

I haven’t managed to trace it in the source code.

To me, when a VM is scheduled, there are two possible system datastores where it can be deployed:

  • the system datastore, which has TM_MAD=lizardfs
  • the test-cluster-system datastore, which has TM_MAD=shared, compatible with the default image datastore’s TM_MAD_SYSTEM=shared

I would have supposed that the default image datastore’s TM_MAD attribute would take priority over TM_MAD_SYSTEM.

Any idea what I’m missing?

Regards.

Can you check where the VM is being deployed? In oned.log, on the Error deploying virtual machine X to HID: Y. line, does it output any datastore? You can also look at sched.log to see where the scheduler is trying to deploy the VM, and look for the datastore there. We want to check the TM_MAD of the system DS where the VM is being deployed.

Also could you check if the VM has a TM_MAD_SYSTEM attribute set?

Hello Ruben.

Thanks a lot for your valuable advice, you gave me great hints.

I was so confused that I completely forgot to check oned.log, which shows the attempt on the 103 SYSTEM_DS that I thought was out of the game.

For the record, here are my replies to your questions.

It takes 3 minutes for a VM to be deployed, which is too long for our Jenkins, which terminates the jobs with a timeout:

grep 382185 /var/log/one/sched.log

Mon Jul 13 21:07:42 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 13. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

Mon Jul 13 21:08:12 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 13. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

Mon Jul 13 21:08:43 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 15. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

Mon Jul 13 21:09:14 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 15. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

Mon Jul 13 21:09:45 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 15. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

Mon Jul 13 21:10:15 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 15. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

Mon Jul 13 21:10:46 2020 [Z0][VM][E]: Error deploying virtual machine 382185 to HID: 15. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: qcow2

[4 times previous message]

	VMID	Priority	Host	System DS
	382185	0		15	107

grep 382185 /var/log/one/oned.log

Mon Jul 13 21:07:42 2020 [Z0][ReM][D]: Req:9184 UID:0 IP:127.0.0.1 one.vm.deploy invoked , 382185, 13, false, 103, ""
Mon Jul 13 21:07:42 2020 [Z0][ReM][D]: Req:1328 UID:0 IP:127.0.0.1 one.vm.deploy invoked , 382185, 15, false, 103, ""
Mon Jul 13 21:07:42 2020 [Z0][ReM][D]: Req:8848 UID:0 IP:127.0.0.1 one.vm.deploy invoked , 382185, 17, false, 103, ""
Mon Jul 13 21:07:42 2020 [Z0][ReM][D]: Req:9744 UID:0 IP:127.0.0.1 one.vm.deploy invoked , 382185, 16, false, 103, ""
Mon Jul 13 21:07:42 2020 [Z0][ReM][D]: Req:3904 UID:0 IP:127.0.0.1 one.vm.deploy invoked , 382185, 14, false, 103, ""

Ok, I found that another SYSTEM_DS is meddling with our setup.

We had 3 clusters, and they are now replaced by our new, much more powerful hyperconverged LizardFS one.

What I did not understand was that all the SYSTEM_DS of the new cluster are tried, even datastore 103, which is backed by LizardFS but with TM_MAD=qcow2 (it was set up in a huge rush, before the lizardfs TM_MAD scripts were in place, because the corresponding hypervisor died).

I will define COMPATIBLE_SYS_DS to restrict which SYSTEM_DS should be tried depending on the source IMAGE_DS.
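
Something like this on the default image datastore, so the scheduler only considers the two compatible system datastores (a sketch; the file name is illustrative and I'm assuming --append to merge the attribute):

    cat > compat-sys-ds.txt <<'EOF'
    COMPATIBLE_SYS_DS = "0,107"
    EOF
    onedatastore update 1 compat-sys-ds.txt --append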

This option will finally allow us to empty the test-cluster-system datastore.

Yes, the TM_MAD_SYSTEM attribute was set:

onevm show -x 382185
[...]
    <DISK>
      <ALLOW_ORPHANS><![CDATA[YES]]></ALLOW_ORPHANS>
      <CLONE><![CDATA[YES]]></CLONE>
      <CLONE_TARGET><![CDATA[SYSTEM]]></CLONE_TARGET>
      <CLUSTER_ID><![CDATA[102]]></CLUSTER_ID>
      <DATASTORE><![CDATA[default]]></DATASTORE>
      <DATASTORE_ID><![CDATA[1]]></DATASTORE_ID>
      <DEV_PREFIX><![CDATA[sd]]></DEV_PREFIX>
      <DISK_ID><![CDATA[0]]></DISK_ID>
      <DISK_SNAPSHOT_TOTAL_SIZE><![CDATA[0]]></DISK_SNAPSHOT_TOTAL_SIZE>
      <DISK_TYPE><![CDATA[FILE]]></DISK_TYPE>
      <DRIVER><![CDATA[qcow2]]></DRIVER>
      <IMAGE><![CDATA[aca.zephir-2.7.2-instance-default-amd64.vm]]></IMAGE>
      <IMAGE_ID><![CDATA[68604]]></IMAGE_ID>
      <IMAGE_STATE><![CDATA[2]]></IMAGE_STATE>
      <IMAGE_UNAME><![CDATA[jenkins]]></IMAGE_UNAME>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <ORDER><![CDATA[1]]></ORDER>
      <ORIGINAL_SIZE><![CDATA[51200]]></ORIGINAL_SIZE>
      <READONLY><![CDATA[NO]]></READONLY>
      <SAVE><![CDATA[NO]]></SAVE>
      <SIZE><![CDATA[51200]]></SIZE>
      <SOURCE><![CDATA[/var/lib/one//datastores/1/0902efe48404ef12018c35a7feb6be19]]></SOURCE>
      <TARGET><![CDATA[sda]]></TARGET>
      <TM_MAD><![CDATA[lizardfs]]></TM_MAD>
      <TM_MAD_SYSTEM><![CDATA[shared]]></TM_MAD_SYSTEM>
      <TYPE><![CDATA[FILE]]></TYPE>
    </DISK>

Maybe the scheduler could just filter out incompatible SYSTEM_DS and only try our datastores 0 and 107, which are compatible (TM_MAD=lizardfs and TM_MAD=shared, respectively)?

This is finally solved, thanks.

Glad to hear you solved the issue :slight_smile: