So I’ve picked up this project supporting a client, and I’ve never used OpenNebula before, yet I’m trying to troubleshoot a VM that won’t start (it’s stuck in “pending” status). Originally it was reporting that there was not enough capacity even though the capacity was there.
Running onehost sync --force seems to have corrected the values it was reading, but now I’m noticing that datastore 0 (system) is not assigned to the cluster the rest of the datastores belong to, and sched is reporting this as an issue. I’m not 100% sure how to resolve it, or whether it should even care that the system datastore is not in the cluster.
Any help is greatly appreciated here.
Fri Jan 6 03:37:01 2017 [VM][D]: Pending/rescheduling VM and capacity requirements:
VM CPU Memory System DS Image DS
Fri Jan 6 03:37:01 2017 [VM][I]: Dispatching VM 9 to host 2 and datastore 100
Fri Jan 6 03:37:01 2017 [VM][E]: Error deploying virtual machine 9 to HID: 2. Reason: [VirtualMachineDeploy] datastore [0] and host [2] are not in the same cluster.
Fri Jan 6 03:37:01 2017 [VM][I]: Dispatching VM 9 to host 0 and datastore 100
Fri Jan 6 03:37:01 2017 [VM][E]: Error deploying virtual machine 9 to HID: 0. Reason: [VirtualMachineDeploy] datastore [0] and host [0] are not in the same cluster.
[oneadmin@superadmin one]$ onedatastore list
  ID NAME       SIZE   AVAIL CLUSTER IMAGES TYPE DS TM
   0 system     142.5G   41% -            0 sys  -  shared
   1 default    142.5G   41% LNProd      13 img  fs shared
   2 files      142.5G   41% LNProd       0 fil  fs ssh
 100 LNsystem   142.5G   41% LNProd       0 sys  -  shared
[oneadmin@superadmin one]$ onecluster list
  ID NAME     HOSTS VNETS DATASTORES
 100 LNProd       2     1          3
[oneadmin@superadmin one]$ onecluster show 100
CLUSTER 100 INFORMATION
ID : 100
NAME : LNProd
The problem seems to be that your cluster with ID 100 does not have access to the datastore with ID 0.
Assuming this is the reason, the following command should solve your problem (it adds datastore 0 to cluster 100):
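onecluster adddatastore 100 0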
Is there any way to modify the existing VM template (live) to specify that it should use the alternative system DS?
I’m trying to support this as-is; the VMs on the other hosts seem to be deployed on system DS 100, as are additional VMs, but I noticed the template for new VMs sets different options.
Though I suppose adding the 0 system DS should resolve it as well; would that have any impact on systems which are already located on the 0 system DS?
A newly created VM specifies CLUSTER_ID = 100 and uses system datastore 100. I don’t know if it’s relevant at all, but if possible I’d like to somehow adjust the “adminhost” VM to do exactly what this new VM I deployed is doing (i.e. deploy to the system DS that is already attached to the cluster).
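For example, would something along these lines work for the pending VM? This is just a guess from the scheduler docs on my part, not something I’ve tested, and host 2 is only taken from the log above:

    onevm deploy 9 2 100      # manually deploy pending VM 9 to host 2 using system DS 100

or, to let the scheduler pick the host but pin the system datastore:

    onevm update 9            # then add a line like: SCHED_DS_REQUIREMENTS = "ID=100"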
Unfortunately I can’t answer your question for sure.
The datastore with ID 0 is a datastore of type SYSTEM in my case (in your case too?). So maybe at least one of your VMs has a volatile disk attached to it.
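You could check the disks with onevm show 9; the DISK entries should show whether one of them is a volatile disk (no image, just a SIZE and TYPE). I’m only guessing here.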
If you add datastore 0 to your cluster, I assume it will not have any negative impact other than that other VMs in cluster 100 could also use datastore 0. To prevent this you could disable datastore 0 (but still add it to cluster 100), as described here: http://docs.opennebula.org/5.2/operation/host_cluster_management/datastore_guide.html#disable-a-system-datastore
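On 5.2 the command from that doc page is just the following (I don’t know whether your version supports it):

    onedatastore disable 0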
Another idea that comes to mind is migration: you could do an (offline) datastore migration to move the files into another datastore; unfortunately this operation is not supported by every TM driver.
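If the driver does support it, that migration is triggered with onevm migrate plus an extra datastore argument, roughly like this (from memory, so please check the onevm man page for your version; the IDs are placeholders):

    onevm migrate <vmid> <hostid> <new_system_ds_id>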
I actually had tried disabling it previously; it appears the version of OpenNebula here doesn’t support it, as it just throws the man page when I try it.
I tried adding 0 to the cluster:
[oneadmin@superadmin datastores]$ onecluster adddatastore 100 0
[ClusterAddDatastore] Cannot add object to cluster. Datastore 0 cannot be added to any cluster.
Any new VM seems to work fine, though.
I did try to overwrite the disks on the new VM with the ones from VM 9 (copying /var/lib/datastore/0/9/disk.0 and disk.1 to /var/lib/datastore/100/47/), but when it booted I guess it overwrote the disks, so I didn’t end up with the actual VM 9 data.
Is there any other method to bring the new VM up with the old data? I know you mentioned offline datastore migration; again, I’m not familiar with OpenNebula at all, and I’d like to avoid taking the whole thing down if possible, as the other VMs are in use.
VM 9 itself is stuck in the pending state due to the error, so it won’t let me perform any of the migrate actions on it.
I am new to OpenNebula too, so I don’t know versions older than 5.2 and don’t have a deep knowledge of OpenNebula’s innards.
But I would try the following (I don’t know if this works; I have never done it on OpenNebula before, but on other systems this procedure works):
Create a new VM with exactly the same disk sizes as the one you are trying to copy (ensure that you are using the same format, either raw or qcow2, on both sides).
Start that VM once.
Power off the VM again.
Now copy over the disks manually (roughly sketched below).
Start the newly “patched” VM again.
It should now contain the same data as the old VM. Please be aware that the newly created VM has different MAC and IP addresses, which could cause trouble for some systems.
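The copy step would look something like this on a shared system datastore (the paths assume the default /var/lib/one/datastores layout and the VM/datastore IDs mentioned in this thread, so adjust them to your setup, and make sure the new VM is powered off first):

    # old VM 9 sits under system datastore 0, the new VM (47 in the paths above) under datastore 100
    cp /var/lib/one/datastores/0/9/disk.0 /var/lib/one/datastores/100/47/disk.0
    cp /var/lib/one/datastores/0/9/disk.1 /var/lib/one/datastores/100/47/disk.1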
That’s actually what I was trying previously; however, it seems to be cloning an image from /var/lib/one/datastores/1/ (the file name appears to be a hash) when it restarts the VM without the checkpoint being there.
So now I’m attempting to temporarily replace that image so that it clones the old VM’s disk over to the new one.
So, I replaced the image I found it cloning during the “PROLOG” state with the disk from the VM I was trying to bring up, let it clone that, and it seemed to work fine. As for the IP/MAC being different, I was able to attach a secondary NIC requesting the IP of the old VM and then detach the original network card from the new VM, and it looks like I’ve got it back up and running now.
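For anyone hitting the same thing, the NIC swap was done with the normal attach/detach operations, something like this (the network name/ID and NIC index are placeholders, so take the real ones from onevm show, and I haven’t double-checked the option names on older versions):

    onevm nic-attach <vmid> --network <vnet_name_or_id>   # or pass --file nic.tpl containing NIC = [ NETWORK=..., IP=... ] to request the old IP
    onevm nic-detach <vmid> <nic_id>                       # nic_id as listed in onevm show <vmid>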