Hi
We are testing the new OpenNebula 5.12 community upgrade in our testbed, following the docs:
https://docs.opennebula.io/5.12/intro_release_notes/upgrades/start_here.html
And also:
https://docs.opennebula.io/5.12/intro_release_notes/upgrades/upgrading_single.html#upgrade-single
but during the onehost sync step, we get this error:
$ onehost sync
* Adding hyp107.altaria.os to upgrade
* Adding hyp106.altaria.os to upgrade
* Adding hyp105.altaria.os to upgrade
* Adding hyp104.altaria.os to upgrade
[========================================] 4/4 hyp104.altaria.os
Failed to update the following hosts:
* hyp107.altaria.os
* hyp105.altaria.os
* hyp106.altaria.os
* hyp104.altaria.os
The hypervisors are also in error status after the upgrade (and the VMs are in unknown status). We didn't get any errors during the RPM/DB upgrade. Is this a known issue? We upgraded from 5.8.1 to 5.12.0 using the community migrator package.
From the oned logs we can also see these error messages:
Mon Jul 6 17:15:45 2020 [Z0][AuM][D]: Message received: LOG I 4 Command execution failed (exit code: 255): /var/lib/one/remotes/auth/server_cipher/authenticate
Mon Jul 6 17:15:45 2020 [Z0][AuM][I]: Command execution failed (exit code: 255): /var/lib/one/remotes/auth/server_cipher/authenticate
Mon Jul 6 17:15:45 2020 [Z0][AuM][D]: Message received: LOG E 4 login token expired
Mon Jul 6 17:15:45 2020 [Z0][AuM][I]: login token expired
Mon Jul 6 17:15:45 2020 [Z0][AuM][D]: Message received: AUTHENTICATE FAILURE 4 login token expired
and from monitord.log
Mon Jul 6 17:24:07 2020 [Z0][MDP][W]: Start monitor failed for host 0:
Mon Jul 6 17:24:07 2020 [Z0][HMM][E]: Unable to monitor host id: 0
Mon Jul 6 17:24:07 2020 [Z0][MDP][I]:
Mon Jul 6 17:24:07 2020 [Z0][MDP][I]:
Mon Jul 6 17:24:07 2020 [Z0][MDP][I]:
Mon Jul 6 17:24:07 2020 [Z0][MDP][I]:
Mon Jul 6 17:24:07 2020 [Z0][MDP][D]: [1:0:0] Recieved START_MONITOR message from host 3:
Cheers
Álvaro
cgonzalez (Christian González), July 7, 2020, 8:28am
Hello @alvaro_simongarcia ,
Usually the sync fails when some file does not have the right ownership or permissions, or when there is a broken symbolic link. Let's check the first one: could you run find /var/lib/one/remotes ! -user oneadmin -exec ls -l {} \;
on your frontend and share the output?
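In case it helps, both causes mentioned above can be checked in one pass. This is only a minimal sketch: the function name check_remotes and the REMOTES_DIR / OWNER variables are made up for illustration, and the defaults simply reuse the path and user from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): list entries under the
# remotes directory that could make `onehost sync` fail.
# REMOTES_DIR and OWNER are assumptions; override them for other setups.
REMOTES_DIR="${REMOTES_DIR:-/var/lib/one/remotes}"
OWNER="${OWNER:-oneadmin}"

check_remotes() {
    dir="$1"
    owner="$2"

    echo "== not owned by $owner =="
    # Same test as the find command above, printing only the paths.
    find "$dir" ! -user "$owner" -print

    echo "== broken symlinks =="
    # `test -e` follows the link, so it fails when the target is missing.
    find "$dir" -type l ! -exec test -e {} \; -print
}

# Only run against the real directory when it exists on this machine.
if [ -d "$REMOTES_DIR" ]; then
    check_remotes "$REMOTES_DIR" "$OWNER"
fi
```

If both sections come back empty, the cause is probably somewhere else.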
Hi @cgonzalez
Ah, indeed, we had a few files there owned by root, and this was interfering with the sync. We made a backup of /var/lib/one/remotes/etc as root,
so we ended up with several root-owned files:
# find /var/lib/one/remotes ! -user oneadmin -exec ls -l {} \;
total 24
drwxr-x--- 4 root root 4096 Jul 6 14:28 datastore
drwxr-x--- 4 root root 4096 Jul 6 14:28 im
drwxr-x--- 3 root root 4096 Jul 6 14:28 market
drwxr-x--- 3 root root 4096 Jul 6 14:28 tm
drwxr-x--- 5 root root 4096 Jul 6 14:28 vmm
drwxr-x--- 2 root root 4096 Jul 6 14:28 vnm
total 12
drwxr-x--- 2 root root 4096 Jul 6 14:28 kvm
drwxr-x--- 2 root root 4096 Jul 6 14:28 lxd
drwxr-x--- 2 root root 4096 Jul 6 14:28 vcenter
total 4
-rw-r----- 1 root root 1668 Jul 6 14:28 vcenterrc
-rw-r----- 1 root root 1668 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/vmm/vcenter/vcenterrc
total 4
-rw-r----- 1 root root 3652 Jul 6 14:28 kvmrc
-rw-r----- 1 root root 3652 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/vmm/kvm/kvmrc
total 4
-rw-r----- 1 root root 2053 Jul 6 14:28 lxdrc
-rw-r----- 1 root root 2053 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/vmm/lxd/lxdrc
total 8
-rw-r----- 1 root root 4770 Jul 6 14:28 OpenNebulaNetwork.conf
-rw-r----- 1 root root 4770 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/vnm/OpenNebulaNetwork.conf
total 8
drwxr-x--- 2 root root 4096 Jul 6 14:28 kvm-probes.d
drwxr-x--- 2 root root 4096 Jul 6 14:28 lxd-probes.d
total 4
-rw-r----- 1 root root 2650 Jul 6 14:28 pci.conf
-rw-r----- 1 root root 2650 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/im/lxd-probes.d/pci.conf
total 4
-rw-r----- 1 root root 2650 Jul 6 14:28 pci.conf
-rw-r----- 1 root root 2650 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/im/kvm-probes.d/pci.conf
total 8
drwxr-x--- 2 root root 4096 Jul 6 14:28 ceph
drwxr-x--- 2 root root 4096 Jul 6 14:28 fs
total 4
-rw-r----- 1 root root 1238 Jul 6 14:28 fs.conf
-rw-r----- 1 root root 1238 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/datastore/fs/fs.conf
total 4
-rw-r----- 1 root root 1856 Jul 6 14:28 ceph.conf
-rw-r----- 1 root root 1856 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/datastore/ceph/ceph.conf
total 4
drwxr-x--- 2 root root 4096 Jul 6 14:28 fs_lvm
total 4
-rw-r----- 1 root root 1630 Jul 6 14:28 fs_lvm.conf
-rw-r----- 1 root root 1630 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/tm/fs_lvm/fs_lvm.conf
total 4
drwxr-x--- 2 root root 4096 Jul 6 14:28 http
total 4
-rw-r----- 1 root root 1238 Jul 6 14:28 http.conf
-rw-r----- 1 root root 1238 Jul 6 14:28 /var/lib/one/remotes/etc.2020-07-06/market/http/http.conf
total 20
drwxr-x--- 3 root root 4096 Mar 10 16:52 datastore
drwxr-x--- 4 root root 4096 Mar 10 16:52 im
drwxr-x--- 3 root root 4096 Mar 10 16:52 tm
drwxr-x--- 5 root root 4096 Mar 10 16:52 vmm
drwxr-x--- 2 root root 4096 Mar 10 16:52 vnm
total 12
drwxr-x--- 2 root root 4096 Mar 10 16:52 kvm
drwxr-x--- 2 root root 4096 Mar 10 16:52 lxd
drwxr-x--- 2 root root 4096 Mar 10 16:52 vcenter
total 4
-rw-r--r-- 1 root root 1513 Mar 10 16:52 vcenterrc
-rw-r--r-- 1 root root 1513 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/vmm/vcenter/vcenterrc
total 4
-rw-r--r-- 1 root root 3436 Mar 10 16:52 kvmrc
-rw-r--r-- 1 root root 3436 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/vmm/kvm/kvmrc
total 4
-rw-r--r-- 1 root root 2053 Mar 10 16:52 lxdrc
-rw-r--r-- 1 root root 2053 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/vmm/lxd/lxdrc
total 8
-rw-r--r-- 1 root root 4572 Mar 10 16:52 OpenNebulaNetwork.conf
-rw-r--r-- 1 root root 4572 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/vnm/OpenNebulaNetwork.conf
total 8
drwxr-x--- 2 root root 4096 Mar 10 16:52 kvm-probes.d
drwxr-x--- 2 root root 4096 Mar 10 16:52 lxd-probes.d
total 4
-rw-r--r-- 1 root root 2650 Mar 10 16:52 pci.conf
-rw-r--r-- 1 root root 2650 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/im/lxd-probes.d/pci.conf
total 4
-rw-r--r-- 1 root root 2650 Mar 10 16:52 pci.conf
-rw-r--r-- 1 root root 2650 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/im/kvm-probes.d/pci.conf
total 4
drwxr-x--- 2 root root 4096 Mar 10 16:52 ceph
total 4
-rw-r--r-- 1 root root 1744 Mar 10 16:52 ceph.conf
-rw-r--r-- 1 root root 1744 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/datastore/ceph/ceph.conf
total 4
drwxr-x--- 2 root root 4096 Mar 10 16:52 fs_lvm
total 4
-rw-r--r-- 1 root root 1577 Mar 10 16:52 fs_lvm.conf
-rw-r--r-- 1 root root 1577 Mar 10 16:52 /var/lib/one/remotes/etc.2020-03-10/tm/fs_lvm/fs_lvm.conf
Next time we should make those backups as the oneadmin user. I have moved the spurious /var/lib/one/remotes/etc.xxxxxxx
directories away, and now the sync works correctly as oneadmin:
$ onehost sync
* Adding hyp107.altaria.os to upgrade
* Adding hyp106.altaria.os to upgrade
* Adding hyp105.altaria.os to upgrade
[========================================] 3/3 hyp105.altaria.os
So the sync issue is fixed, thanks a lot!
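For next time, a rough sketch of how we could take the same backup without leaving root-owned files inside the remotes tree. The function name backup_etc and the destination /var/lib/one/backups are our own assumptions, not something from the OpenNebula docs; the etc.YYYY-MM-DD naming just mirrors the directories shown above.

```shell
#!/bin/sh
# Hypothetical backup helper (an assumption, not an OpenNebula tool).
# Meant to be run as oneadmin so the copy keeps oneadmin ownership, and
# writes outside the remotes tree so `onehost sync` never sees the copy.
backup_etc() {
    src="$1"        # e.g. /var/lib/one/remotes/etc
    dest_root="$2"  # assumed location, e.g. /var/lib/one/backups
    dest="$dest_root/etc.$(date +%Y-%m-%d)"

    # Refuse to overwrite a backup already taken today.
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi

    mkdir -p "$dest_root" || return 1
    # -a keeps modes and timestamps; ownership stays with the calling user.
    cp -a "$src" "$dest" || return 1
    echo "$dest"
}

# Example call (commented out; paths are assumptions):
# backup_etc /var/lib/one/remotes/etc /var/lib/one/backups
```

Running it from a oneadmin shell (for example after su - oneadmin) should avoid the ownership problem we hit here.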
Cheers
Álvaro
Hi @cgonzalez
More good news! With this fix the hosts are available again, so it seems this also fixed another issue (Hosts in error after upgrading to 5.12.0).
Cheers
Álvaro