My general issues with implementing OpenNebula

Good day everyone,

We’ve been evaluating OpenNebula for our small business and have run into several issues. I’d like to describe them and ask whether anyone can offer advice, tips or fixes for any of them.

First and foremost, we want to run our server infrastructure on our own hardware. Right now we have three machines:

A colocated bare-metal server and a VPS in Frankfurt, plus a KVM server in USE, with the following hardware:

AMD EPYC 7352, 128GB DDR4-3200 RDIMMs (Tegu),
2 Core 10GB RAM 40GB SSD (SDNS),
3 Core 8GB RAM 80GB SSD (USE-GOFB).

Tegu is meant to be our front-end and to host our cloud storage, Matrix server, web and DNS servers, processing and internal servers, as well as our prototypes. It will also host the authoritative DNS server.
SDNS is meant to be the secondary DNS server; I was thinking of hosting Sunstone on it as well.
USE-GOFB is a geo-offloading server that will also act as the tertiary DNS server.

Issue #1 - LXD and Datastores

The first hurdle was deploying OpenNebula itself. We had several issues there, but we solved them. However, we want to use LXD containers, and every time we try to use the LXC hypervisor we get a driver error:

DEPLOY: INFO: deploy:
No block device on /var/lib/one/datastores/0/3/mapper/disk.0
No block device on /var/lib/one/datastores/0/3/mapper/disk.1
lxc-create: one-3: conf.c: chown_mapped_root: 3095 Error chowning /var/lib/lxc-one/3/disk.0
lxc-create: one-3: lxccontainer.c: do_storage_create: 1295 Error chowning "/var/lib/lxc-one/3/disk.0" to container root
lxc-create: one-3: conf.c: suggest_default_idmap: 4699 You must either run as root, or define uid mappings
lxc-create: one-3: conf.c: suggest_default_idmap: 4700 To pass uid mappings to lxc-create, you could create
lxc-create: one-3: conf.c: suggest_default_idmap: 4701 ~/.config/lxc/default.conf:
lxc-create: one-3: conf.c: suggest_default_idmap: 4702 lxc.include = /etc/lxc/default.conf
lxc-create: one-3: conf.c: suggest_default_idmap: 4703 lxc.idmap = u 0 600100001 65537
lxc-create: one-3: conf.c: suggest_default_idmap: 4704 lxc.idmap = g 0 600100001 65537
lxc-create: one-3: lxccontainer.c: do_lxcapi_create: 1877 Failed to create (none) storage for one-3
lxc-create: one-3: tools/lxc_create.c: main: 331 Failed to create container one-3
There was an error creating the containter.
ExitCode: 255
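The last lines of the log suggest an lxc.idmap entry. As a rough illustration of what such an entry means, here is a minimal bash sketch that translates a container UID/GID into the host ID it corresponds to, assuming the standard "lxc.idmap = <u|g> <container_start> <host_start> <count>" form. map_to_host is a hypothetical helper, not part of LXC or OpenNebula, and it only explains the mapping; it does not fix the driver error.

```shell
#!/usr/bin/env bash
# Sketch: translate a container UID/GID to the host ID it is stored as,
# using an lxc.idmap line like the one suggested in the log.
# Assumed line format: lxc.idmap = <u|g> <container_start> <host_start> <count>
# map_to_host is a hypothetical helper for illustration only.

# map_to_host "<idmap line>" <container_id>
# Prints the host ID, or returns 1 if the ID is outside the mapped range.
map_to_host() {
  local line=$1 id=$2
  local _kind ns_start host_start count
  # Drop everything up to and including "=", then split on whitespace.
  read -r _kind ns_start host_start count <<< "${line#*=}"
  if (( id < ns_start || id >= ns_start + count )); then
    return 1
  fi
  echo $(( host_start + id - ns_start ))
}
```

With the suggested mapping, map_to_host 'lxc.idmap = u 0 600100001 65537' 0 prints 600100001, i.e. container root corresponds to host UID 600100001. Presumably that is the owner lxc-create was trying to give disk.0 when the chown failed, which would explain why it asks either to run as root or to define such a mapping.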

Now, here’s a little background on our datastores:
We are hosting our front-end on a 256GB NVMe. We also have 2x512GB NVMes, a 1TB SSD, 2x4TB hybrid drives, 2x2TB hybrid drives and 4x3TB HDDs in hardware RAID 5. Because of that, we had to set up the datastores with symlinks, and I suspect this might be causing the error.

The OpenNebula documentation doesn’t explain anywhere how to use additional drives in a server (probably because it assumes you’ll have an LVM volume that pools all the drives on each node), so we had to do it with symlinks. We don’t want to put all our data in a single LVM volume, because some of our applications are I/O-intensive and others aren’t.
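Since a symlinked layout is hard to reconstruct after the fact, it may help to see which physical drive each datastore actually lands on. A minimal bash sketch, assuming the default /var/lib/one/datastores base path; resolve_datastore and report_datastores are hypothetical helper names, not OpenNebula tools:

```shell
#!/usr/bin/env bash
# Sketch: report where each OpenNebula datastore directory really lives when
# some of them are symlinks onto other drives. Assumes the default base path
# /var/lib/one/datastores; pass a different base as the first argument.
# Both functions are hypothetical helpers for illustration only.

# resolve_datastore <base_dir> <datastore_id>
# Prints the fully resolved (symlink-free) path of <base_dir>/<datastore_id>.
resolve_datastore() {
  readlink -f -- "$1/$2"
}

# report_datastores [base_dir]
# Prints one tab-separated line per entry: id, resolved path, backing device.
report_datastores() {
  local base=${1:-/var/lib/one/datastores}
  local entry id target dev
  for entry in "$base"/*; do
    # Skip the literal glob pattern left behind when the directory is empty.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    id=${entry##*/}
    if [ ! -e "$entry" ]; then
      # A symlink whose target is missing, e.g. a drive that is not mounted.
      printf '%s\t%s\t%s\n' "$id" "(dangling symlink)" "-"
      continue
    fi
    target=$(resolve_datastore "$base" "$id")
    # df -P prints a header, then one line whose first field is the device.
    dev=$(df -P -- "$target" | awk 'NR == 2 { print $1 }')
    printf '%s\t%s\t%s\n' "$id" "$target" "$dev"
  done
}
```

Running report_datastores on the front-end prints one line per datastore ID; an entry marked "(dangling symlink)" would point at a drive that is not currently mounted.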

I don’t know exactly how it was done, because I handed that task to another sysadmin who figured it out. I’m not very experienced with servers, but I own the business, so I built the server, deployed it and helped set it up as best I could, since it was around the holidays.

I verified the required packages and tried reinstalling them several times. We are using Rocky Linux 8.5 (essentially RHEL).

Issue #2 - Unofficial Docker Images

Following the advice in this section of the documentation:
https://docs.opennebula.io/6.0/management_and_operations/storage_management/marketplaces.html#downloading-non-official-images

I tried to create images for matrixdotorg/synapse, seafileltd/seafile-mc and hestiacp (a fork of VestaCP that just looks nicer).

Synapse and Seafile-MC failed; HestiaCP worked. Here’s the error I get when trying to import Synapse into any of our datastores:
Wed Dec 29 12:23:43 2021:
Error copying image in the datastore: INFO: cp: Copying local image docker://matrixdotorg/synapse?size=2048&filesystem=xfs&format=raw&tag=latest to the image repository
ERROR: cp: Command "set -e -o pipefail; /var/lib/one/remotes/datastore/fs/…/downloader.sh 'docker://matrixdotorg/synapse?size=2048&filesystem=xfs&format=raw&tag=latest' '/var/lib/one//datastores/104/ed2809cf23a37186ef160458c7108dbf'" failed:
jq: error (at :118): Cannot iterate over null (null)
Error copying Error copying docker://matrixdotorg/synapse?size=2048&filesystem=xfs&format=raw&tag=latest to /var/lib/one//datastores/104/ed2809cf23a37186ef160458c7108dbf
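For what it’s worth, jq prints "Cannot iterate over null" when a filter such as .[] is applied to a null value, so some JSON processed during the download apparently lacked a list the script expected; the log alone doesn’t show which field. When comparing attempts with different options, it can help to pull apart the docker:// URL the downloader receives. A minimal bash sketch; docker_url_image and docker_url_param are hypothetical helpers, and this is not how downloader.sh itself parses the URL:

```shell
#!/usr/bin/env bash
# Sketch: split a docker:// image URL (as passed to downloader.sh in the log)
# into the image name and its query parameters (size, filesystem, format, tag).
# Illustrative parsing only; these are hypothetical helpers.

# docker_url_image "<url>" -> image name without scheme and query string
docker_url_image() {
  local rest=${1#docker://}
  echo "${rest%%\?*}"
}

# docker_url_param "<url>" <key> -> value of <key>, or nothing if absent
docker_url_param() {
  local url=$1 key=$2 query pair
  local -a pairs
  [[ $url == *\?* ]] || return 0        # no query string at all
  query=${url#*\?}
  # Split the query string on "&" into key=value pairs.
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    if [[ ${pair%%=*} == "$key" ]]; then
      echo "${pair#*=}"
      return 0
    fi
  done
}
```

For the failing Synapse URL, docker_url_image prints matrixdotorg/synapse, and docker_url_param with the key tag prints latest.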

I get this error both from the CLI and from Sunstone. I tried many times with different image formats and filesystem types; nothing helped. Note that while attempting to install the Docker images, we were using the opennebula-node-kvm package. Kubernetes also failed to install, but I no longer have that error message.

Issue #2.5 - IDs
Because of so many failed attempts, our image and VM IDs keep climbing. Is there any way to free an ID after its object has been deleted? It’s a bit off-putting to have 0, 1, 17, 37, 42, 44, 56 with nothing in between.

These are our major problems right now, so I’ll leave it at that.

Thank you to anyone who has read this, and thanks in advance for any advice on solving these issues!

All the best,
Alyx

Hello

Could you clarify which hypervisor you intend to use? OpenNebula has drivers for both LXD and LXC, although LXD is deprecated.

I’ve seen this error when the LXC driver mounts a rootfs partition read-only on the host. Could you share the value of the LXC_UNPRIVILEGED attribute from the VM template? You can get it with onevm show <vm_id> | grep LXC_UNPRIVILEGED. This attribute is only available in 6.2, so could you also confirm which version you are using? Also, try manually mounting the rootfs image on the host and listing its top-level contents with ls -lh. The image should be located at /var/lib/one/datastores/<system_datastore_id>/<vm_id>/disk.<disk_id>; in the case of LVM, simply mount the exported logical volume instead. The details for this should be readily available in the disk section of the VM template.
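The manual checks described above could look roughly like this bash sketch. It assumes a file-based (non-LVM) system datastore at the default path and a raw disk image that holds a filesystem directly, with no partition table; check_unprivileged, vm_disk_path and inspect_rootfs are hypothetical helper names, and mounting requires root on the host.

```shell
#!/usr/bin/env bash
# Sketch of the suggested checks. Hypothetical helpers; assumes a file-based
# system datastore under /var/lib/one/datastores and a raw filesystem image.

# check_unprivileged <vm_id>: show LXC_UNPRIVILEGED (OpenNebula 6.2+).
check_unprivileged() {
  onevm show "$1" | grep LXC_UNPRIVILEGED
}

# vm_disk_path <system_datastore_id> <vm_id> <disk_id>
vm_disk_path() {
  echo "/var/lib/one/datastores/$1/$2/disk.$3"
}

# inspect_rootfs <system_datastore_id> <vm_id> <disk_id>
# Loop-mounts the image read-only, lists its top level, then cleans up.
inspect_rootfs() {
  local image mnt status=0
  image=$(vm_disk_path "$1" "$2" "$3")
  mnt=$(mktemp -d) || return 1
  # loop,ro: attach the image through a loop device, read-only, so the
  # check cannot modify the VM's disk.
  if mount -o loop,ro "$image" "$mnt"; then
    ls -lh -- "$mnt"
    umount "$mnt"
  else
    status=1
  fi
  rmdir -- "$mnt"
  return "$status"
}
```

For the VM in the original log (system datastore 0, VM 3), that would be inspect_rootfs 0 3 0. For LVM, mount the exported logical volume instead of the file path.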

For setting up the datastores with LVM, please take a look at this section.

My bad, that was a typo: we’re trying to use LXC.

As for the latter, I can’t give you the VM template attribute right now because we’re doing a fresh install. I’ll get back to you later.

Unfortunately, reusing IDs is not supported, since all the cross-references would need to be updated as well.

Hi @Alyx, regarding this, it seems the images you’re trying to use are affected by a couple of bugs. We really appreciate your feedback; I’ve opened a couple of GitHub issues for them: