[SOLVED] Non-persistent Ceph image results in clone failure

Hi,

We have upgraded our production environment from OpenNebula 4.14.2 to 5.2.1 and are running into the following issue when we start a VM with a non-persistent Ceph block device attached.

The Ceph cluster is running Ceph 0.94.5 (Hammer), and this used to work fine in 4.14.2. I hope someone has run into this as well and can provide a fix.

On another note, is it possible to make Ceph images persistent by default?
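For individual images this can of course be done by hand with the oneimage CLI; the sketch below is just an example (the image ID and names are made up), but a default for the whole datastore would be nicer.

oneimage persistent 559                                   # make an already registered image persistent
oneimage create --name ubuntu-base --datastore ceph \
                --path /var/tmp/ubuntu.qcow2 --persistent # or flag it at registration time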

Tue Jan 24 18:00:28 2017 [Z0][VM][I]: New state is ACTIVE
Tue Jan 24 18:00:28 2017 [Z0][VM][I]: New LCM state is PROLOG
Tue Jan 24 18:01:31 2017 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ceph/clone xxx.xxx.xxx.nl:one/one-559 d-node01:/var/lib/one//datastores/103/24071/disk.3 24071 106
Tue Jan 24 18:01:31 2017 [Z0][TM][E]: clone: Command " RBD="rbd --id libvirt"
Tue Jan 24 18:01:31 2017 [Z0][TM][I]:
Tue Jan 24 18:01:31 2017 [Z0][TM][I]: rbd --id libvirt info one/one-559-24071-3 >/dev/null 2>&1 && exit 0
Tue Jan 24 18:01:31 2017 [Z0][TM][I]:
Tue Jan 24 18:01:31 2017 [Z0][TM][I]: rbd_make_snap one/one-559
Tue Jan 24 18:01:31 2017 [Z0][TM][I]:
Tue Jan 24 18:01:31 2017 [Z0][TM][I]: set -e -o pipefail
Tue Jan 24 18:01:31 2017 [Z0][TM][I]:
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: if [ "$(rbd_format one/one-559)" = "2" ]; then
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: rbd --id libvirt clone "one/one-559@snap" one/one-559-24071-3
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: else
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: rbd --id libvirt copy one/one-559 one/one-559-24071-3
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: fi
Tue Jan 24 18:01:32 2017 [Z0][TM][I]:
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: if [ -n "" -a "2000000" -gt "" ]; then
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: rbd --id libvirt resize one/one-559-24071-3 --size 2000000
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: fi" failed: 2017-01-24 18:01:31.972344 7fba8456a1c0 -1 librbd: error writing header: (38) Function not implemented
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: 2017-01-24 18:01:31.979964 7fba8456a1c0 -1 librbd: error creating child: (38) Function not implemented
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: rbd: clone error: (38) Function not implemented
Tue Jan 24 18:01:32 2017 [Z0][TM][E]: Error cloning one/one-559 to one/one-559-24071-3 in d-node01
Tue Jan 24 18:01:32 2017 [Z0][TM][I]: ExitCode: 38
Tue Jan 24 18:01:32 2017 [Z0][TM][E]: Error executing image transfer script: Error cloning one/one-559 to one/one-559-24071-3 in d-node01
Tue Jan 24 18:01:32 2017 [Z0][VM][I]: New LCM state is PROLOG_FAILURE
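
For reference, the failing step can be reproduced by hand on the hypervisor with the same rbd commands the ceph TM driver runs (image names are taken from the log above; the clone target name is just an example):

rbd --id libvirt info one/one-559                       # shows format and features of the parent image
rbd --id libvirt snap ls one/one-559                    # the "snap" snapshot the clone is based on
rbd --id libvirt clone one/one-559@snap one/clone-test  # should hit the same (38) Function not implemented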

Datastore configuration

ID             : 106                 
NAME           : ceph                
USER           : oneadmin            
GROUP          : oneadmin            
CLUSTERS       : 0,102,103,105,107,109,110,112,113
TYPE           : IMAGE               
DS_MAD         : ceph                
TM_MAD         : ceph                
BASE PATH      : /var/lib/one//datastores/106
DISK_TYPE      : RBD                 
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 566.6T              
FREE:          : 433.3T              
USED:          : 133.2T              
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
BRIDGE_LIST="cephbridge01 cephbridge02 cephbridge03"
CEPH_HOST="cephmon01 cephmon02 cephmon03 cephmon04 cephmon05"
CEPH_SECRET="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
CEPH_USER="libvirt"
CLONE_TARGET="SELF"
DATASTORE_CAPACITY_CHECK="YES"
DISK_TYPE="RBD"
DS_MAD="ceph"
LN_TARGET="NONE"
POOL_NAME="one"
RBD_FORMAT="2"
TM_MAD="ceph"
TYPE="IMAGE_DS"

It turned out that this issue has nothing to do with OpenNebula: it was a mismatch between our Ceph cluster version and the Ceph client version on the hosts (Hammer vs. Jewel). After downgrading the client libraries to Hammer, everything started working again.
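
In case anyone else hits this, the mismatch is easy to spot by comparing client and cluster versions. The commands below are only a rough sketch; the package names and the <hammer-version> placeholder assume a Debian/Ubuntu style install and will differ per distribution.

# on the hypervisor / bridge hosts: version of the installed client tools and libraries
rbd --version
dpkg -l | grep -E 'ceph-common|librbd1|librados2'

# on a monitor host (the locally installed binaries normally match the cluster)
ceph -v

# downgrade the client packages to the Hammer (0.94.x) line the cluster runs and hold them,
# so the next upgrade does not pull Jewel back in
apt-get install ceph-common=<hammer-version> librbd1=<hammer-version> librados2=<hammer-version>
apt-mark hold ceph-common librbd1 librados2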