CEPH Jewel volatile images problem

Hello,

I have upgraded my CEPH cluster from Hammer to Jewel, and now I have a problem with OpenNebula volatile disks: apparently the kernel-based RBD driver cannot map CEPH RBD volumes (images) created with the exclusive-locking-based features that CEPH Jewel enables by default. Here is an excerpt from the log file when creating the volatile image:

Thu Nov  3 17:15:15 2016 [Z0][TM][I]: if [ "swap" = "swap" ]; then
Thu Nov  3 17:15:15 2016 [Z0][TM][I]: sudo rbd --id libvirt map one/one-sys-688-1 || exit $?
Thu Nov  3 17:15:15 2016 [Z0][TM][I]: sudo mkswap -L swap /dev/rbd/one/one-sys-688-1
Thu Nov  3 17:15:15 2016 [Z0][TM][I]: sudo rbd --id libvirt unmap /dev/rbd/one/one-sys-688-1
Thu Nov  3 17:15:15 2016 [Z0][TM][I]: fi" failed: rbd: sysfs write failed
Thu Nov  3 17:15:15 2016 [Z0][TM][I]: rbd: map failed: (6) No such device or address
Thu Nov  3 17:15:15 2016 [Z0][TM][E]: Error creating volatile disk.1 (one/one-sys-688-1) in myhost26 into pool one.
Thu Nov  3 17:15:15 2016 [Z0][TM][I]: ExitCode: 6
Thu Nov  3 17:15:15 2016 [Z0][TM][E]: Error executing image transfer script: Error creating volatile disk.1 (one/one-sys-688-1) in myhost26 into pool one.

QEMU and the user-space librbd/librados libraries handle these images without problems, so the problem appears only when creating volatile swap images, because these have to be mapped by the kernel-space RBD driver before mkswap(8) can run.
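To check in advance whether an image will be mappable by the kernel driver, one can compare its feature list against what the kernel client supports. Below is a minimal bash sketch of that check. The function name unsupported_krbd_features and the KRBD_FEATURES variable are hypothetical, and the default supported set of just "layering" is an assumption that mirrors the workaround in this thread; the real set depends on the kernel version. It also assumes the plain-text "features: a, b, c" line printed by rbd info.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the features of an RBD image that the kernel
# client is ASSUMED not to support. The supported set defaults to "layering"
# only; override it with a space-separated KRBD_FEATURES list.
#
# Example use (assumes the plain-text output of `rbd info`):
#   features=$(rbd info one/one-sys-688-1 | sed -n 's/^[[:space:]]*features: //p')
#   unsupported_krbd_features "$features"

unsupported_krbd_features() {
  # $1: comma-separated feature list, e.g. "layering, exclusive-lock"
  local supported=" ${KRBD_FEATURES:-layering} "
  local feature
  local IFS=','                    # split the list on commas only
  for feature in $1; do
    feature=${feature// /}         # drop the blank that follows each comma
    [ -z "$feature" ] && continue
    case "$supported" in
      *" $feature "*) ;;           # in the supported set: print nothing
      *) echo "$feature" ;;        # not supported: report it
    esac
  done
}
```

Any output means the kernel driver will probably refuse to map the image, which is what "rbd: map failed: (6) No such device or address" in the log above looks like.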

I don’t know of a clean solution for this problem. FWIW, I managed to work around it by adding the --image-feature layering switch to the “rbd create” command line in /var/lib/one/remotes/tm/ceph/mkimage, around line 100:

$RBD create $FORMAT_OPT $RBD_SOURCE --size ${SIZE} --image-feature layering || exit \$?
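To try the workaround by hand outside OpenNebula, the steps from the log can be replayed in a standalone script. This is a hypothetical sketch, not the actual mkimage code: the function make_volatile_swap and the DRY_RUN switch are made up for illustration, and only the rbd and mkswap invocations are taken from the log and the patched line above.

```shell
#!/usr/bin/env bash
# Hypothetical standalone sketch of the volatile swap-disk steps from the
# TM log, with the layering-only workaround applied to "rbd create".
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"          # dry run: only show the command line
  else
    "$@"               # really execute the command
  fi
}

make_volatile_swap() {
  local rbd_id=$1 rbd_source=$2 size=$3   # e.g. libvirt one/one-sys-688-1 1024

  # Create the image with only the layering feature so that krbd can map it.
  run rbd --id "$rbd_id" create "$rbd_source" --size "$size" \
      --image-feature layering || return $?

  # Map it through the kernel driver, format it as swap, then unmap it.
  # Note: if mkswap fails, the image is left mapped (as in the original script).
  run sudo rbd --id "$rbd_id" map "$rbd_source" || return $?
  run sudo mkswap -L swap "/dev/rbd/$rbd_source" || return $?
  run sudo rbd --id "$rbd_id" unmap "/dev/rbd/$rbd_source"
}

# Example (prints the four commands without touching the cluster):
#   DRY_RUN=1 make_volatile_swap libvirt one/one-sys-688-1 1024
```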

I don’t know whether it would be worth patching the official ONe distribution (at least until mainstream kernels support the feature set that rbd create uses by default), but I am posting it here so that other users of CEPH Jewel with ONe can find it.

-Yenya


Hello,

this problem is still present in remotes/tm/ceph/mkimage in ONe 5.2.1. Could somebody look at it and, if it is confirmed on an ONe installation other than mine :-), apply the fix to the current tree?

Thanks,

-Yenya