No Bootable disk when using Ceph as storage

Hello everyone, we are trying to set up a VM orchestrator in our lab with the following stack: OpenNebula + Ceph + KVM. Ceph is configured as the storage backend (ceph mode) for both the image and system datastores. I can confirm that the machines boot up and reach a running state, and that they are stored in Ceph properly (the machines can reach both the KVM hosts and the storage). However, installing any OS (CentOS or Windows) fails with "no bootable disk". Any help would be appreciated, since I am unsure which log I should start checking.

Please let me know which logs I should provide, as I am not seeing any useful information regarding the image transfer and registration in oned.log, sched.log, or ceph.log.

a.gavric,

Can you please confirm the following:

  • Is the size of the pool being populated in your datastore entries?
  • Do you have rbd version 2 configured?
  • Do you have ceph keys + secret configured for KVM’s libvirtd?
  • Do you have ceph keys on your frontend?
  • Do you have the ceph hosts set as your monitor nodes?
  • Do you have the ceph bridge (I can't remember exactly what it's called) set for your one frontends? (It's what I use; your setup may vary.)
  • Can you run ceph --id <user> status?
  • Can you run rbd --id <user> ls -p <pool>?
  • Can you upload to the ceph pool from the frontend?
  • Is your vm template using an image from the ceph pool?
  • Is your vm template deploying onto the correct datastore?
  • Are the images you are installing the OS onto set to non-persistent, so that when you deploy again the install didn't get saved?

If you can do all of the above, creating a VM should just work.
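On the keys/secret point, this is roughly how the Ceph key is usually wired into libvirt on the KVM nodes. Just a sketch: the client name "libvirt", the file name, and the UUID are illustrative, so adapt them to your setup.

<!-- secret.xml; generate your own UUID with uuidgen -->
<secret ephemeral='no' private='no'>
  <uuid>REPLACE-WITH-YOUR-UUID</uuid>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>

virsh secret-define --file secret.xml
virsh secret-set-value --secret REPLACE-WITH-YOUR-UUID --base64 $(ceph auth get-key client.libvirt)

If I remember right, the same UUID is also what goes into the CEPH_SECRET attribute of the Ceph datastore on the OpenNebula side, so libvirt can find the key when the VM is deployed.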

Hi @IowaOrganics, thanks for helping. Answers are below:

  • Is the size of the pool being populated in your datastore entries?
    I can see the size of the pool in the datastore, but it is not being populated (it always stays at about 4 MB)
  • Do you have rbd version 2 configured?
    Yes
  • Do you have ceph keys + secret configured for KVM’s libvirtd?
    Yes
  • Do you have ceph keys on your frontend?
    Yes
  • Do you have the ceph hosts set as your monitor nodes?
    Yes (we have only one monitor node and 4 OSDs at the moment)
  • Do you have the ceph bridge (I can't remember exactly what it's called) set for your one frontends? (It's what I use; your setup may vary.)
    Yes
  • Can you run ceph --id <user> status?
    Yes, ceph --id libvirt status (cluster health is OK)
  • Can you run rbd --id <user> ls -p <pool>?
    Yes, rbd --id libvirt ls -p one (comes back with the RBD images)
  • Can you upload to the ceph pool from the frontend?
    No (the frontend only creates the RBD image but does not populate it)
  • Is your vm template using an image from the ceph pool?
    Yes (or at least that is how we intended it)
  • Is your vm template deploying onto the correct datastore?
    Yes
  • Are the images you are installing the OS onto set to non-persistent, so that when you deploy again the install didn't get saved?
    Tried both persistent and non-persistent.
    As an additional piece of information: the firewall is turned off and CephX authentication is set to none at the moment. While troubleshooting I also found that what might be causing the issue is the cp script located in /var/lib/one/remote/datastores/ceph/cp.
    My senior colleague is helping me write an addition that logs stderr to a log file, which I will post as well (a rough sketch of that logging is right below).
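In case it is useful to anyone else, a minimal way to capture stderr from a bash driver script like that cp one is a couple of lines near the top; the log path is only an example, anything writable by the user running the driver works:

LOG=/var/tmp/ceph_cp_debug.log   # example location
exec 2>>"$LOG"                   # everything the script writes to stderr goes into the log
set -x                           # trace each command; the trace output also goes to stderr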

That is a weird problem!
Can you run the following to confirm whether the images uploaded via Sunstone have created empty RBD volumes?
rbd --id libvirt diff one/one-##

this is the output of rbd --id libvirt diff one/one-90

Offset Length Type
515899392 8192 data
2340421632 12288 data
3976200192 8192 data
5821693952 12288 data
10326376448 4096 data
10334765056 8192 data
10347347968 12288 data
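(If it helps, the Length column can be summed to see how much data is actually allocated in the volume; a quick way, assuming the same pool/image names:

rbd --id libvirt diff one/one-90 | awk '{ sum += $2 } END { print sum }'

which for the output above comes to roughly 64 KB.)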

OK, that is a good first sign. Let's map and mount the RBD and see if there is a filesystem on it. You will be changing some feature flags on the RBD, so make sure it is okay to do this testing on it.

* rbd --id libvirt map one-90 -p one   # it will throw an error telling you to run a command like the following
* rbd --id libvirt feature disable one/one-90 object-map fast-diff deep-flatten
* rbd --id libvirt map one-90 -p one
* mount /dev/rbd0 /mnt
* ls /mnt
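If the mount complains, it is also worth checking what is actually on the mapped device before assuming the data is bad, for example (assuming it mapped to /dev/rbd0):

* blkid /dev/rbd0    # reports a filesystem or partition-table signature, if there is one
* lsblk /dev/rbd0    # shows whether the image contains partitions (rbd0p1, ...)
* file -s /dev/rbd0  # prints whatever signature is in the first bytes of the device

If the image was partitioned rather than formatted directly, the filesystem would be on /dev/rbd0p1 and that is what you would mount instead of /dev/rbd0.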

OK, rbd --id libvirt map one-90 -p one results in it being mapped to /dev/rbd0.

mount /dev/rbd0 /mnt returned:
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/rbd0, missing codepage or helper program, or other error.

Weird. It certainly seems that something is getting interrupted while writing to the RBD volume. I have never had that issue, and I have broken my setup numerous times.
What versions of Ceph and OpenNebula are you using, and on which operating systems?

OpenNebula 5.10 with Ceph Octopus (latest) running on CentOS 8.

Hello everyone, I managed to get the problem resolved. It ended up being that the disks were formatted to GPT using parted, which was causing the entire storage cluster to go bananas. I fixed it by disassembling the cluster and, before creating the new OSDs, running ceph-volume lvm zap /dev/vdb, after which I was able to properly upload data into my OSDs. Thanks for the help in solving this; this one can be marked as solved.
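For anyone who runs into the same thing, the recovery was roughly the following; the device name is from my setup, and how you recreate the OSDs depends on your deployment tooling:

ceph-volume lvm zap /dev/vdb             # wipe the old GPT/LVM metadata from each OSD disk
ceph-volume lvm create --data /dev/vdb   # one way to recreate the OSD on the now-clean disk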