We upgraded from 4.14 to 5.0.2.
Thankfully it was successful: I can create VMs, live-migrate, etc. However, after stopping a test VM and then terminating it, I get an error:
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ceph/delete
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: delete: Deleting /var/lib/one/datastores/0/285/disk.0
Fri Aug 19 10:55:41 2016 [Z0][TM][E]: delete: Command " RBD="rbd --id libvirt"
Fri Aug 19 10:55:41 2016 [Z0][TM][I]:
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: if [ "$(rbd_format one/one-92-285-0)" = "2" ]; then
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: rbd_rm_r $(rbd_top_parent one/one-92-285-0)
Fri Aug 19 10:55:41 2016 [Z0][TM][I]:
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: if [ -n "285-0" ]; then
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: rbd_rm_snap one/one-92 285-0
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: fi
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: else
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: rbd --id libvirt rm one/one-92-285-0
…
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: bash: line 109: rbd: command not found
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: bash: line 221: rbd: command not found
Fri Aug 19 10:55:41 2016 [Z0][TM][E]: Error deleting one/one-92-285-0 in xxxxxxx
Fri Aug 19 10:55:41 2016 [Z0][TM][I]: ExitCode: 127
Fri Aug 19 10:55:41 2016 [Z0][TM][E]: Error executing image transfer script: Error deleting one/one-92-285-0
Fri Aug 19 10:55:41 2016 [Z0][VM][I]: New LCM state is EPILOG_FAILURE
I can then go into the Recover menu and delete the VM there. Any thoughts?
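For the record, I assume the CLI equivalent of that Recover -> Delete is something like the following (VM ID taken from the log above; this is a sketch, not a tested command):

  onevm recover 285 --delete    # force-delete a VM stuck in EPILOG_FAILURE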
Somehow, since the upgrade from 4.14 to 5.0.2, it is no longer SSH-ing to the Ceph nodes to run the “rbd --id libvirt rm one/one-92-285-0” command. We don’t have ceph-common installed on the host running OpenNebula, so rbd is not available locally. The BRIDGE_LIST variable looks correct, though, based on the output of the “onedatastore show XXX” command; it at least lists the Ceph nodes.
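In case it helps, this is roughly how I checked (the datastore ID and node name below are placeholders for our redacted values):

  onedatastore show 104 | grep BRIDGE_LIST    # should list the Ceph nodes
  ssh cephnode1 'command -v rbd'              # the rbd client must exist on every bridge node
  ssh cephnode1 'rbd --id libvirt ls one'     # and the libvirt user must be able to reach the pool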
Here is part of the “onedatastore show” output for the Ceph datastore:
I added the variable “CEPH_HOST” to the Ceph datastore template (cf. http://docs.opennebula.org/5.0/deployment/open_cloud_storage_setup/ceph_ds.html; a sketch of the update command is at the end of this post), but it had no effect. Creating a test VM, I noticed this time that “Terminate” was greyed out but “Terminate hard” was available in the Sunstone GUI. That command failed again, however, with the same errors as above. Deleting the VM with the “Recover -> Delete” button produced the following messages:
In those messages, the test VM is 287 and the Ceph host is host 12. Also, I checked on the Ceph cluster using “rbd ls -p one --id libvirt” and the VM snapshot was indeed deleted.
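For completeness, this is roughly how I appended CEPH_HOST to the datastore template (placeholder datastore ID and monitor names; assumes the 5.x CLI’s --append flag):

  echo 'CEPH_HOST="cephnode1 cephnode2"' > ceph_host.txt   # Ceph monitor hosts, names are placeholders
  onedatastore update 104 ceph_host.txt --append           # append the attribute without replacing the template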