Scheduler does not dispatch VMs with RDM datastore images

Hi all

We have an issue with the scheduler and RDM images. We added a new datastore and image (/dev/sdf) following the OpenNebula documentation (we are using OpenNebula 5.8.1):

https://docs.opennebula.org/5.8/deployment/open_cloud_storage_setup/dev_ds.html

The new datastore and image are created correctly, but the problem is with the scheduler. The VM stays in the pending state, and in the scheduler log we can see:

Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 8. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared
Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 9. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared
Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 7. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared
Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 6. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared
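When the log is long, it can help to collapse lines like these into one entry per VM and reason. The following is a minimal sketch, assuming the log lines keep exactly the shape shown above; the function name and the regular expression are illustrative and not part of OpenNebula.

```python
import re
from collections import defaultdict

# Pattern written against the log lines quoted in this thread (an assumption:
# other OpenNebula versions may format these messages differently).
LOG_LINE = re.compile(
    r"Error deploying virtual machine (?P<vm>\d+) to HID: (?P<hid>\d+)\. "
    r"Reason: \[(?P<call>[\w.]+)\] (?P<reason>.+)$"
)


def summarize_deploy_errors(lines):
    """Group failed deployments by (VM ID, reason) and collect the host IDs tried."""
    summary = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            key = (int(match["vm"]), match["reason"].strip())
            summary[key].append(int(match["hid"]))
    return dict(summary)


sample = [
    "Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 8. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared",
    "Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 9. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared",
    "Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 7. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared",
    "Fri Feb 14 12:54:35 2020 [Z0][VM][E]: Error deploying virtual machine 314 to HID: 6. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared",
]

for (vm_id, reason), hosts in summarize_deploy_errors(sample).items():
    print(f"VM {vm_id}: {reason} (hosts tried: {sorted(hosts)})")
```

Here every host fails for the same reason, which points at the datastore configuration rather than at any single host.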

From oned.conf we are using the default values for “dev” datastores:

...
DS_MAD_CONF = [
    name = "dev",
    persistent_only = "yes",
    required_attrs = "DISK_TYPE"
]
...
TM_MAD_CONF = [
    clone_target = "NONE",
    ln_target = "NONE",
    name = "dev",
    SHARED = "YES"
]

Any idea why we are getting this scheduler error message? I have also changed the default SHARED = "YES" to "NO", but we get the same error message.

Thanks in advance!
Álvaro

Hi

We tested this a bit more. To rule out some extra checks, we removed the VM's main image (from a Ceph datastore) so that only the RDM image is left. The scheduler still complains about the shared transfer mode. I think we have had this issue since the 5.8 upgrade (I believe it was working in the 5.6 release). I will try to upgrade to 5.10 and debug the issue a bit more.

Cheers
Álvaro

Hello, some changes are required in oned.conf. Those changes have been pushed to the master branch, take a look at this. You need to add the missing options added to oned.conf for your existing RDM datastore, or recreate the datastore (it inherits from oned.conf).

Hi Daniel

Thanks a lot for the reply. We did a quick check in 5.8.1 using the same TM_MAD_CONF that is used in 5.10.3:

TM_MAD_CONF = [
    NAME = "dev", LN_TARGET = "NONE", CLONE_TARGET = "NONE", SHARED = "YES",
    TM_MAD_SYSTEM = "ssh,shared", LN_TARGET_SSH = "SYSTEM", CLONE_TARGET_SSH = "SYSTEM",
    DISK_TYPE_SSH = "BLOCK", LN_TARGET_SHARED = "NONE",
    CLONE_TARGET_SHARED = "SELF", DISK_TYPE_SHARED = "BLOCK"
]

instead of the upstream conf setup for dev:

TM_MAD_CONF = [
    NAME = "dev", LN_TARGET = "NONE", CLONE_TARGET = "NONE", SHARED = "YES"
]

but we still have the same issue with the scheduler. Maybe this is an issue with 5.8.1 and RDM datastores? We can upgrade to 5.10.x and check this again.

Cheers and thanks!
Álvaro

Hi

We have upgraded our testing machine to 5.10.1-1 and applied the updated oned.conf option:

TM_MAD_CONF = [
    NAME = "dev", LN_TARGET = "NONE", CLONE_TARGET = "NONE", SHARED = "YES",
    TM_MAD_SYSTEM = "ssh,shared", LN_TARGET_SSH = "SYSTEM", CLONE_TARGET_SSH = "SYSTEM",
    DISK_TYPE_SSH = "BLOCK", LN_TARGET_SHARED = "NONE",
    CLONE_TARGET_SHARED = "SELF", DISK_TYPE_SHARED = "BLOCK"
]

but the scheduler issue persists:

Mon Mar 16 11:29:03 2020 [Z0][VM][E]: Error deploying virtual machine 8 to HID: 3. Reason: [one.vm.deploy] Image Datastore does not support transfer mode: shared

It looks like it is complaining about the system datastore, in our case it is shared between the nodes:

# onedatastore list
 ID NAME          SIZE   AVA  CLUSTERS IMAGES TYPE DS   TM     STAT
101 rdm.altaria   1M     100  0        1      img  dev  dev    on
100 ceph.altaria  28T    100  0        1      img  ceph ceph   on
  2 files         30.1G  92%  0        0      fil  fs   ssh    on
  1 default       30.1G  92%  0        0      img  fs   ssh    on
  0 system        28.1T  99%  0        0      sys  -    shared on

but it also fails if we set the system datastore to the ssh TM instead of shared. In that case it complains with "Image Datastore does not support transfer mode: ssh".
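One way to read these two errors is that the image datastore is being asked whether it supports the transfer mode of the system datastore, and an RDM datastore that does not list that mode is rejected. The sketch below models that reading only. It is an assumption inferred from the error messages and from the TM_MAD_SYSTEM attribute in the configuration above, not a copy of OpenNebula's actual check, and the function name is hypothetical.

```python
def supports_system_tm(image_ds_attrs, system_tm):
    """Return True if the image datastore would accept the system datastore's TM.

    Assumed rule (not taken from OpenNebula source): the image datastore is
    accepted when it uses the same TM as the system datastore, or when its
    TM_MAD_SYSTEM attribute lists that TM.
    """
    if image_ds_attrs.get("TM_MAD") == system_tm:
        return True
    declared = image_ds_attrs.get("TM_MAD_SYSTEM", "")
    return system_tm in [tm.strip() for tm in declared.split(",") if tm.strip()]


# Datastore attributes without TM_MAD_SYSTEM: rejected for both modes.
old_rdm = {"TM_MAD": "dev"}
print(supports_system_tm(old_rdm, "shared"), supports_system_tm(old_rdm, "ssh"))

# Same datastore with the attribute from the updated TM_MAD_CONF.
new_rdm = {"TM_MAD": "dev", "TM_MAD_SYSTEM": "ssh,shared"}
print(supports_system_tm(new_rdm, "shared"), supports_system_tm(new_rdm, "ssh"))
```

Under this assumption, the attributes stored on the datastore itself are what matter, regardless of which TM the system datastore uses.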

Has anyone tested this with 5.10.3 as well?

Cheers
Álvaro

Hi all

OK, I found the issue. Maybe this should be included in the documentation as well.

The problem was that I did the requested change:

TM_MAD_CONF = [
    NAME = "dev", LN_TARGET = "NONE", CLONE_TARGET = "NONE", SHARED = "YES",
    TM_MAD_SYSTEM = "ssh,shared", LN_TARGET_SSH = "SYSTEM", CLONE_TARGET_SSH = "SYSTEM",
    DISK_TYPE_SSH = "BLOCK", LN_TARGET_SHARED = "NONE",
    CLONE_TARGET_SHARED = "SELF", DISK_TYPE_SHARED = "BLOCK"
]

but after that you should also update the existing RDM datastore, if you are already using one, so that it picks up the changes: either run onedatastore update <datastore> or create a new datastore (after restarting oned).
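As a rough illustration of that update step, the sketch below renders the extra attributes from the TM_MAD_CONF block above as template lines that could be added to an existing RDM datastore. Which attributes the datastore really needs is an assumption here (they are copied from the configuration posted in this thread), and the helper name is hypothetical.

```python
# Attributes copied from the TM_MAD_CONF block in this thread. Whether every
# one of them has to be present on the datastore is an assumption.
RDM_TM_ATTRS = {
    "TM_MAD_SYSTEM": "ssh,shared",
    "LN_TARGET_SSH": "SYSTEM",
    "CLONE_TARGET_SSH": "SYSTEM",
    "DISK_TYPE_SSH": "BLOCK",
    "LN_TARGET_SHARED": "NONE",
    "CLONE_TARGET_SHARED": "SELF",
    "DISK_TYPE_SHARED": "BLOCK",
}


def render_template(attrs):
    """Render a dict as KEY="value" lines, one attribute per line."""
    return "\n".join(f'{key}="{value}"' for key, value in attrs.items()) + "\n"


print(render_template(RDM_TM_ATTRS), end="")
```

The printed lines can be pasted into the editor that onedatastore update <datastore> opens, or saved to a file and passed as the optional file argument (the CLI also has an --append option for merging into the existing template; check onedatastore update --help on your version before relying on it).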

Now it works fine in 5.10.1, and it will probably work in 5.8.0 as well after these changes.

Cheers and thanks!
Álvaro

Hi @alvaro_simongarcia, glad you worked it out. Since updating oned.conf doesn’t update existing datastores, you also need to add the config to them. I suggested it in my previous answer:

You need to add the missing options added to oned.conf for your existing RDM datastore, or recreate the datastore (it inherits from oned.conf)

This behavior is documented in the oned.conf comments, take a look at this, but if you feel it’s worth adding to the documentation, feel free to open a PR.

In any case it’s good to have more threads about the RDM datastore. Thanks for sharing your experience !!!

Hi @dann1

Thanks a lot for the info, sorry I forgot to try that during the test… I also did the test with the 5.8.x release and it works with the same workaround.

Cheers
Álvaro