Hi

I’m encountering an issue while deploying VMs with VLAN ID 600 in OpenNebula. During the VM boot process or when attaching the network interface, I’m getting the following error:

Error:
RTNETLINK answers: Device or resource busy (when creating br1.600)

This happens when OpenNebula tries to set up the VLAN sub-interface br1.600 and it conflicts with an already existing interface or an improperly linked resource.
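
For anyone hitting the same thing, the device state can be inspected with standard iproute2 commands (a sketch, using my bridge br1 and VLAN 600; adjust to your setup):

# Show details of the VLAN sub-interface, if it already exists
ip -d link show br1.600
# List current bridge port memberships to spot a conflicting assignment
bridge link show
# List all VLAN sub-interfaces currently defined
ip link show type vlan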

Steps Taken to Investigate:
Checked network configuration: Verified that the br1.600 interface was not already created manually outside OpenNebula.

Checked bridge link membership: Ensured there are no conflicts between br1.600 and other bridges.

VLAN configuration: Ensured the proper configuration of VLAN ID 600 in the virtual network template for OpenNebula.

Solutions Tried:
Cleaned up existing interfaces: Removed any leftover VLAN interfaces manually using ip link del.

Checked OpenNebula network templates: Verified that the VLAN network is configured correctly with VN_MAD=802.1Q and that the PHYDEV and VLAN_ID parameters are properly set (see the commands after this list).

Ensured no manual creation of br1.600: Made sure that br1.600 isn’t pre-created in the system’s network configuration files (netplan, etc.).
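
For the cleanup and the template check above, the concrete commands look like this (a sketch; <VNET_ID> is the ID of the virtual network, as used by onevnet show):

# Remove a stale sub-interface left over from a failed attempt, then redeploy
sudo ip link del br1.600
# Confirm the three attributes the 802.1Q driver needs
onevnet show <VNET_ID> | grep -E 'VN_MAD|PHYDEV|VLAN_ID'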

Hello @Naseem,

Could you please paste your logs instead of the results of the search engine you used?
Please, check for:

  • VM Instance template (onevm show <VM_ID> --xml)
  • Vnet Template (onevnet show <VNET_ID> --xml)
  • VM log (onevm log <VM_ID>)

Cheers,

Hello @FrancJP

The issue is fixed now. Thanks for the response.

I am currently working on setting up High Availability (HA) for virtual machines in OpenNebula using two KVM hypervisors with different Linux distributions:

  • KVM Host 1: Ubuntu 22.04
  • KVM Host 2: Rocky Linux 8

Here is what I have done so far:

:white_check_mark: Both KVM hosts are added successfully to OpenNebula
:white_check_mark: Shared NFS storage is mounted on both hosts at /mnt/shared
:white_check_mark: A shared datastore is created and assigned
:white_check_mark: Both hosts can create and run VMs correctly
:white_check_mark: VM disks are stored on shared storage
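
One detail on the storage layout: OpenNebula expects each datastore under /var/lib/one/datastores/<DATASTORE_ID> on every host, so with the NFS export at /mnt/shared, the usual pattern is a symlink or bind mount on each KVM host (a sketch, assuming the datastore directories live inside /mnt/shared):

# On each KVM host, point the datastore path at the shared mount
ln -s /mnt/shared/<DATASTORE_ID> /var/lib/one/datastores/<DATASTORE_ID>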

Now I want to enable VM High Availability, so that if one KVM host fails, the running VM should automatically:

  1. Migrate to the second host (if possible), OR
  2. Recreate the VM on the second host (if migration is not possible)

I want to ask the community:

  1. What is the correct and recommended way to set up VM HA in OpenNebula?
  2. How does the host_error hook work in practice, and which option is better in this scenario: -m (migrate) or -r (recreate)?
  3. Is it safe and supported to use mixed operating systems (Ubuntu and Rocky Linux) in HA setups? Any gotchas to watch out for?
  4. How can I set up fencing correctly to avoid split-brain? Is there a recommended fencing method (e.g., IPMI, SSH, etc.) for KVM nodes in OpenNebula?
  5. What logs or metrics should I monitor to verify the HA is working as expected?

Would really appreciate any examples, official links, or your experience with such setups. :folded_hands:

Thanks in advance!

Hello @Naseem,

You can check our documentation on VM High Availability.

We also have a couple of blog posts on the topic, but they are quite old already.
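
Regarding the host_error hook: since OpenNebula 5.8 it is registered through the hook subsystem. A minimal sketch, adapted from the examples shipped under /usr/share/one/examples/host_hooks (double-check the flags against the header of ft/host_error.rb on your front-end, as they can differ between versions):

NAME      = "host_error"
TYPE      = state
RESOURCE  = HOST
STATE     = "ERROR"
COMMAND   = "ft/host_error.rb"
# -m migrates the VMs away; -r deletes and recreates them instead
ARGUMENTS = "$TEMPLATE -m"
REMOTE    = "no"

Register it with onehook create host_error_hook. Note that -m only makes sense with a shared system datastore like yours; -r also works without one, but the VM is rebuilt from scratch. For fencing, the hook is usually paired with an out-of-band power-off of the failed node, e.g. via IPMI (placeholders, adjust to your BMC):

ipmitool -I lanplus -H <BMC_IP> -U <USER> -P <PASSWORD> chassis power off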

Cheers,

Hi @FrancJP

I am currently setting up OpenNebula 6.10 with High Availability (HA) using two KVM nodes and NFS-backed shared datastores.


:white_check_mark: Architecture:

  • Frontend: Dedicated OpenNebula controller
  • Hypervisors (KVM):
    • Node 1:
    • Node 2:
  • Datastores:
    • Image Datastore (ID: 103): shared, TM_MAD=shared, mounted via NFS
    • System Datastore (ID: 104): shared, TM_MAD=shared, SHARED=YES
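
To confirm the two datastores really carry those attributes, a quick check from the front-end (sketch):

onedatastore show 103 | grep -E 'TM_MAD|SHARED'
onedatastore show 104 | grep -E 'TM_MAD|SHARED'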

:red_exclamation_mark: Issue:

When I try to deploy a VM using these shared datastores, deployment fails with:

Could not open '/var/lib/one/datastores/103/xxxx...': Permission denied
internal error: process exited while connecting to monitor

However, if I use the default non-shared System datastore (ID: 0 with TM_MAD=ssh), the VM deploys successfully.


:test_tube: What I Checked:

  • Both 103 and 104 datastores use TM_MAD=shared and are correctly attached to the cluster.
  • Permissions on /var/lib/one/datastores/103/ are not always correct, sometimes not owned by oneadmin:oneadmin.
  • Image file exists but VM can’t read it.
  • NFS is used for image datastore — mount may not exist on all KVM nodes.
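
A related check worth running on each host: with NFS, the numeric UID/GID of oneadmin must be identical on the front-end and on both KVM nodes, otherwise ownership looks wrong on one side even after a chown on the other (sketch):

id oneadmin                               # UID/GID must match on all hosts
mount | grep /var/lib/one/datastores      # mount must exist on every node
ls -ld /var/lib/one/datastores/103        # should be oneadmin:oneadmin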

:white_check_mark: Temporary Fix:

  • If I change system datastore back to the default (non-shared), it works — but this defeats the HA purpose.
  • Re-creating datastore 104 with TM_MAD=shared allows VM creation only if image file is accessible.

:puzzle_piece: Likely Root Causes:

  1. NFS mount not available on all hypervisors:
mount | grep /var/lib/one/datastores/103
  2. File ownership/permissions wrong (not oneadmin:oneadmin):
sudo chown -R oneadmin:oneadmin /var/lib/one/datastores/103
  3. NFS export missing no_root_squash:
    Example /etc/exports on NFS server:
/var/lib/one/datastores/103 25.25.25.0/24(rw,sync,no_root_squash,no_subtree_check)
  4. SELinux/AppArmor blocking access:
setenforce 0
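
Two follow-ups to the list above. After editing /etc/exports, the export table has to be reloaded on the NFS server. And since the two nodes run different distributions, the security layer differs: on Rocky it is SELinux, where the persistent alternative to setenforce 0 is the libvirt NFS boolean; on Ubuntu it is AppArmor. A cross-host sketch (HOST1/HOST2 stand for the two node names, which I have omitted above):

# Apply export changes on the NFS server
sudo exportfs -ra

# On the Rocky node: allow libvirt/qemu to use NFS-backed disks persistently
sudo setsebool -P virt_use_nfs 1

# From the front-end, verify mounts and security status on both nodes
for h in HOST1 HOST2; do
  ssh "$h" 'mount | grep datastores; getenforce 2>/dev/null; sudo aa-status 2>/dev/null | head -1'
done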

:man_raising_hand: My Ask to the Community:

Has anyone faced this exact issue while setting up shared system/image datastores for HA?

  • Is there a best practice for ensuring consistent NFS mounts and permissions across nodes?
  • Can OpenNebula check and alert if the datastore is not mounted before VM boot?
  • Any way to automatically repair permissions on shared image files?

Any feedback or suggestions will be highly appreciated. :folded_hands:

Thanks,
Naseem

I am getting the following error when deploying the VM:

Thu Jun 19 12:11:00 2025 [Z0][VMM][D]: Message received: DEPLOY FAILURE 131 error: Failed to create domain from /var/lib/one//datastores/104/131/deployment.0 error: internal error: process exited while connecting to monitor: 2025-06-19T12:10:59.864766Z qemu-kvm-one: -blockdev {"driver":"file","filename":"/var/lib/one/datastores/103/9f29104471af45d9c360e2f22f0ec2b9","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/one/datastores/103/9f29104471af45d9c360e2f22f0ec2b9': Permission denied Could not create domain from /var/lib/one//datastores/104/131/deployment.0 ExitCode: 255
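
Since qemu itself reports the Permission denied, the failure can be reproduced outside of libvirt by reading the image as the qemu user on the failing node (OpenNebula's node packages set user = "oneadmin" in /etc/libvirt/qemu.conf; if yours differs, test as that user instead). A sketch:

# On the KVM node where the deploy fails
sudo -u oneadmin qemu-img info /var/lib/one/datastores/103/9f29104471af45d9c360e2f22f0ec2b9

If this also fails with Permission denied, it is a plain filesystem/NFS issue; if it succeeds, look at SELinux/AppArmor denials instead.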