Hello everyone,
I have built an OpenNebula cluster in an HA configuration with 3 KVM hosts, and it was working well. However, I encountered some errors. When I checked, all VMs on the third kvm host were in an ‘Unknown’ state. I checked the logs and found the following lines:
On leader frontend /var/log/one/monitor.log
Fri Oct 20 15:34:17 2023 [Z0][HMM][I]: Successfully monitored VM: 20
Fri Oct 20 15:34:17 2023 [Z0][HMM][I]: Successfully monitored VM: 22
Fri Oct 20 15:34:18 2023 [Z0][HMM][D]: Monitoring host kvmhost03(2)
Fri Oct 20 15:34:18 2023 [Z0][MDP][I]: Command execution failed (exit code: 24): exec 2>/dev/null; rsync -Laz --delete '/var/lib/one/remotes'/ 'kvmhost03':'/var/tmp/one'/
Fri Oct 20 15:34:18 2023 [Z0][MDP][W]: Start monitor failed for host 2: Could not update remotes
Fri Oct 20 15:34:18 2023 [Z0][HMM][E]: Unable to monitor host id: 2
Fri Oct 20 15:34:18 2023 [Z0][HMM][D]: Updated Host 2, state ERROR
Fri Oct 20 15:34:28 2023 [Z0][HMM][I]: Successfully monitored VM: 24
Fri Oct 20 15:34:28 2023 [Z0][HMM][I]: Successfully monitored VM: 25
Fri Oct 20 15:34:28 2023 [Z0][HMM][I]: Successfully monitored VM: 32
Fri Oct 20 15:34:28 2023 [Z0][HMM][I]: Successfully monitored VM: 33
On leader frontend /var/log/one/oned.log
Fri Oct 20 15:34:15 2023 [Z0][ReM][D]: Req:3312 UID:0 IP:127.0.0.1 one.grouppool.info invoked
Fri Oct 20 15:34:15 2023 [Z0][ReM][D]: Req:3312 UID:0 one.grouppool.info result SUCCESS, "<GROUP_POOL><GROUP><..."
Fri Oct 20 15:34:16 2023 [Z0][ReM][D]: Req:8992 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked
Fri Oct 20 15:34:16 2023 [Z0][ReM][D]: Req:8992 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>3</..."
Fri Oct 20 15:34:18 2023 [Z0][ReM][D]: Req:8752 UID:0 IP:127.0.0.1 one.hostpool.info invoked
Fri Oct 20 15:34:18 2023 [Z0][ReM][D]: Req:8752 UID:0 one.hostpool.info result SUCCESS, "<HOST_POOL><HOST><ID..."
Fri Oct 20 15:34:18 2023 [Z0][ONE][E]: Error monitoring Host kvmhost03 (2):
Fri Oct 20 15:34:19 2023 [Z0][ReM][D]: Req:8384 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked
Fri Oct 20 15:34:19 2023 [Z0][ReM][D]: Req:8384 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>3</..."
Fri Oct 20 15:34:19 2023 [Z0][ReM][D]: Req:3456 UID:0 IP:127.0.0.1 one.vmpool.infoextended invoked , -2, -1, -1, -1
Fri Oct 20 15:34:19 2023 [Z0][ReM][D]: Req:3456 UID:0 one.vmpool.infoextended result SUCCESS, "<VM_POOL><VM><ID>261..."
Fri Oct 20 15:34:19 2023 [Z0][ReM][D]: Req:3760 UID:0 IP:127.0.0.1 one.vmpool.infoextended invoked , -2, -1, -1, -1
Fri Oct 20 15:34:19 2023 [Z0][ReM][D]: Req:3760 UID:0 one.vmpool.infoextended result SUCCESS, "<VM_POOL><VM><ID>261..."
On the follower server-2 /var/log/one/monitor.log:
Fri Oct 20 14:43:51 2023 [Z0][MDP][W]: Failed to monitor VM state for host 2: Error executing state.rb: database is locked
Fri Oct 20 14:44:56 2023 [Z0][MDP][W]: Failed to monitor VM state for host 2: Error executing state.rb: database is locked
Fri Oct 20 14:53:40 2023 [Z0][HMM][I]: --Mark--
Fri Oct 20 15:03:40 2023 [Z0][HMM][I]: --Mark--
Fri Oct 20 15:13:40 2023 [Z0][HMM][I]: --Mark--
Fri Oct 20 15:21:14 2023 [Z0][MDP][W]: Failed to monitor VM state for host 2: Error executing state.rb: attempt to write a readonly database
Fri Oct 20 15:21:14 2023 [Z0][MDP][W]: Failed to monitor VM state for host 0: Error executing state.rb: attempt to write a readonly database
Fri Oct 20 15:23:40 2023 [Z0][HMM][I]: --Mark--
Fri Oct 20 15:33:40 2023 [Z0][HMM][I]: --Mark--
Fri Oct 20 15:34:21 2023 [Z0][HMM][D]: Updated Host 2, state ERROR
Fri Oct 20 15:37:34 2023 [Z0][HMM][D]: Updated Host 2, state MONITORED
Fri Oct 20 15:43:40 2023 [Z0][HMM][I]: --Mark--
Fri Oct 20 15:53:40 2023 [Z0][HMM][I]: --Mark--
On the follower server-2: /var/log/one/oned.log
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 74 not in ACTIVE state
Fri Oct 20 15:34:09 2023 [Z0][InM][D]: VM_STATE update from host: 0. VM id: 89, state: POWEROFF
Fri Oct 20 15:34:09 2023 [Z0][InM][D]: VM_STATE update from host: 0. VM id: 8, state: POWEROFF
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][InM][D]: VM_STATE update from host: 0. VM id: 4, state: POWEROFF
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 80 not in ACTIVE state
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 83 not in ACTIVE state
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 85 not in ACTIVE state
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 89 not in ACTIVE state
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 8 not in ACTIVE state
Fri Oct 20 15:34:09 2023 [Z0][DiM][E]: poweroff_success action received but VM 14 not in ACTIVE state
Fri Oct 20 15:34:18 2023 [Z0][ReM][D]: Req:4608 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked
Fri Oct 20 15:34:18 2023 [Z0][ReM][D]: Req:4608 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Fri Oct 20 15:34:18 2023 [Z0][ReM][D]: Req:9664 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked
Fri Oct 20 15:34:18 2023 [Z0][ReM][D]: Req:9664 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Fri Oct 20 15:34:21 2023 [Z0][InM][D]: VM_STATE update from host: 2. VM id: 36, state: RUNNING
Fri Oct 20 15:34:21 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:21 2023 [Z0][InM][D]: VM_STATE update from host: 2. VM id: 46, state: RUNNING
Fri Oct 20 15:34:21 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:21 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Fri Oct 20 15:34:21 2023 [Z0][InM][D]: VM_STATE update from host: 2. VM id: 239, state: RUNNING
Fri Oct 20 15:34:21 2023 [Z0][DBM][E]: Tried to modify DB being a follower
Zone info (onezone show):
ZONE 0 INFORMATION
ID : 0
NAME : opennebula.zone
STATE : ENABLED
ZONE SERVERS
ID NAME ENDPOINT
2 server-2 http://192.168.0.1:2633/RPC2
3 server-0 http://192.168.0.2:2633/RPC2
4 server-1 http://192.168.0.3:2633/RPC2
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
2 server-2 follower 1575 660243 660243 3 -1
3 server-0 leader 1575 660243 660243 3 -1
4 server-1 follower 1575 660243 660243 -1 -1
ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
Host list:
ID NAME CLUSTER TVM ALLOCATED_CPU ALLOCATED_MEM STAT
2 kvmhost03 default 8 120 / 480 (25%) 10G / 32G (31%) on
1 kvmhost02 default 10 480 / 720 (66%) 16G / 32G (50 on
0 kvmhost01 default 10 520 / 720 (72%) 15G / 32G (46 on
"Note: The frontend server-2 is running on the same server as kvmhost03.
I have a few questions:
- Why is kvmhost03 marked as ‘ERROR’ (I need more information, such as network timeouts)?
- Why can’t VMs automatically recover their state when the host returns to the ‘ON’ state?
- Why are there log entries like ‘poweroff_success action received but VM 83 is not in the ACTIVE state’ even though VM 83 is running on KVM host 01 and there was no ‘poweroff’ action?
- Occasionally, I see the log entry ‘Failed to monitor VM state for host 2: Error executing state.rb: attempt to write a readonly database,’ but everything seems to be functioning normally, so I haven’t paid much attention to it. Is it really normal?
I’ve read this link: Host ERROR state and VM's UNKNOWN
But I’m using Ubuntu and don’t want to restart libvirtd. Is there a safer alternative? Currently, it’s still okay if I use the ‘recover’ option in Sunstone.
Thanks for your help.
Versions of the related components and OS (frontend, hypervisors, VMs): Server OS: Ubuntu 22.04,frontend: Opennebula CE 6.6.3, hypervisors: KVM.