Failed to add a KVM host to OpenNebula

My setup is:

  • 1 node (xffhyp4d) is installed as frontend server, as described here:
  • 2 nodes (xffhyp5d and xffhyp6d) are installed as KVM nodes, as described here
  • Everything is installed on top of RHEL 7.6
  • I can successfully SSH without a password to the other 2 nodes (from all machines) based on their short hostname, with the user oneadmin.

When I reach Step 7, Adding a Host to OpenNebula, the operation fails and the logs show the following:

Frontend (xffhyp4d) oned.log:

Tue Apr 9 10:51:10 2019 [Z0][ReM][D]: Req:6096 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked
Tue Apr 9 10:51:10 2019 [Z0][ReM][D]: Req:6096 UID:0 one.zone.raftstatus result SUCCESS, “<SERVER_ID>-1<…”
Tue Apr 9 10:51:10 2019 [Z0][ReM][D]: Req:7648 UID:0 IP:127.0.0.1 one.vmpool.info invoked , -2, -1, -1, -1
Tue Apr 9 10:51:10 2019 [Z0][ReM][D]: Req:7648 UID:0 one.vmpool.info result SUCCESS, “<VM_POOL></VM_POOL>”
Tue Apr 9 10:51:10 2019 [Z0][ReM][D]: Req:8608 UID:0 IP:127.0.0.1 one.vmpool.info invoked , -2, -1, -1, -1
Tue Apr 9 10:51:10 2019 [Z0][ReM][D]: Req:8608 UID:0 one.vmpool.info result SUCCESS, “<VM_POOL></VM_POOL>”
Tue Apr 9 10:51:12 2019 [Z0][AuM][D]: Message received: AUTHENTICATE SUCCESS 21 -

Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:3840 UID:0 IP:127.0.0.1 one.host.allocate invoked , “xffhyp6d”, “kvm”, “kvm”, 0
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:3840 UID:0 one.host.allocate result SUCCESS, 16
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:8400 UID:0 IP:127.0.0.1 one.host.update invoked , 16, “NAME=“xffhyp6d”
VM_M…”, 1
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:8400 UID:0 one.host.update result SUCCESS, 16
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:1200 UID:0 IP:127.0.0.1 one.host.info invoked , 16
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:1200 UID:0 one.host.info result SUCCESS, “16<NA…”
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:4400 UID:0 IP:127.0.0.1 one.hostpool.info invoked
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:4400 UID:0 one.hostpool.info result SUCCESS, “<HOST_POOL><ID…”
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:9024 UID:0 IP:127.0.0.1 one.user.info invoked , 0
Tue Apr 9 10:51:12 2019 [Z0][ReM][D]: Req:9024 UID:0 one.user.info result SUCCESS, “0<GID…”
Tue Apr 9 10:51:22 2019 [Z0][InM][D]: Monitoring host xffhyp6d (16)
Tue Apr 9 10:51:24 2019 [Z0][InM][I]: Command execution failed (exit code: 42): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d; else exit 42; fi'
Tue Apr 9 10:51:24 2019 [Z0][InM][I]: Remote worker node files not found
Tue Apr 9 10:51:24 2019 [Z0][InM][I]: Updating remotes
Tue Apr 9 10:51:26 2019 [Z0][InM][I]: Command execution failed (exit code: 42): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d; else exit 42; fi'
Tue Apr 9 10:51:26 2019 [Z0][InM][I]: Remote worker node files not found
Tue Apr 9 10:51:26 2019 [Z0][InM][I]: Updating remotes
Tue Apr 9 10:51:29 2019 [Z0][InM][I]: Command execution failed (exit code: 42): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d; else exit 42; fi'
Tue Apr 9 10:51:29 2019 [Z0][InM][I]: Remote worker node files not found
Tue Apr 9 10:51:29 2019 [Z0][InM][I]: Updating remotes
Tue Apr 9 10:51:32 2019 [Z0][InM][I]: Command execution failed (exit code: 42): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d; else exit 42; fi'
Tue Apr 9 10:51:32 2019 [Z0][ONE][E]: Error monitoring Host xffhyp6d (16): -
Tue Apr 9 10:51:34 2019 [Z0][DBM][I]: Purging obsolete LogDB records: 0 records purged. Log state: 0,0 - 0,0
Tue Apr 9 10:51:34 2019 [Z0][DBM][I]: Purging obsolete federated LogDB records: 0 records purged. Federated log size: 0
Tue Apr 9 10:51:40 2019 [Z0][ReM][D]: Req:5232 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked
Tue Apr 9 10:51:40 2019 [Z0][ReM][D]: Req:5232 UID:0 one.zone.raftstatus result SUCCESS, “<SERVER_ID>-1<…”
Tue Apr 9 10:51:40 2019 [Z0][ReM][D]: Req:4800 UID:0 IP:127.0.0.1 one.vmpool.info invoked , -2, -1, -1, -1
Tue Apr 9 10:51:40 2019 [Z0][ReM][D]: Req:4800 UID:0 one.vmpool.info result SUCCESS, “<VM_POOL></VM_POOL>”
Tue Apr 9 10:51:40 2019 [Z0][ReM][D]: Req:4480 UID:0 IP:127.0.0.1 one.vmpool.info invoked , -2, -1, -1, -1
Tue Apr 9 10:51:40 2019 [Z0][ReM][D]: Req:4480 UID:0 one.vmpool.info result SUCCESS, “<VM_POOL></VM_POOL>”

KVM host (xffhyp6d) logs:

==> /var/log/messages <==
Apr 9 10:51:31 xffhyp6d systemd-logind: Removed session 207.
Apr 9 10:51:31 xffhyp6d systemd: Removed slice User Slice of oneadmin.

==> /var/log/secure <==
Apr 9 10:51:32 xffhyp6d sshd[21660]: Accepted publickey for oneadmin from 172.25.252.65 port 44750 ssh2: RSA SHA256:COkhwHK2KghRVtG9P43lwTHdwUV3O33Enu2OwrCDQdI

==> /var/log/messages <==
Apr 9 10:51:32 xffhyp6d systemd: Created slice User Slice of oneadmin.
Apr 9 10:51:32 xffhyp6d systemd-logind: New session 208 of user oneadmin.
Apr 9 10:51:32 xffhyp6d systemd: Started Session 208 of user oneadmin.

==> /var/log/secure <==
Apr 9 10:51:32 xffhyp6d sshd[21660]: pam_unix(sshd:session): session opened for user oneadmin by (uid=0)
Apr 9 10:51:32 xffhyp6d sshd[21663]: Received disconnect from 172.25.252.65 port 44750:11: disconnected by user
Apr 9 10:51:32 xffhyp6d sshd[21663]: Disconnected from 172.25.252.65 port 44750
Apr 9 10:51:32 xffhyp6d sshd[21660]: pam_unix(sshd:session): session closed for user oneadmin

How can I resolve this error?

Hello,

Could you check the following?

  • Can you scp an empty file as oneadmin from the frontend to the host you are adding (without a password prompt or host key verification)?
  • Are you running the OpenNebula frontend as the oneadmin user?

Thanks, Jan

Frontend server:

[oneadmin@xffhyp4d ~]$ touch testfile
[oneadmin@xffhyp4d ~]$ scp testfile xffhyp6d:~/
testfile 100% 0 0.0KB/s 00:00
[root@xffhyp4d ~]# ps aux | grep oned
oneadmin 8539 0.1 0.0 2106408 15712 ? Ssl Apr08 1:43 /usr/bin/oned -f

KVM node:

==> /var/log/secure <==
Apr 9 11:28:09 xffhyp6d sshd[24149]: Accepted publickey for oneadmin from 172.25.252.65 port 45412 ssh2: RSA SHA256:COkhwHK2KghRVtG9P43lwTHdwUV3O33Enu2OwrCDQdI

  • Is SELinux turned off?
  • Could you run "/var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d" on xffhyp6d (as oneadmin) and paste the output?

Also, which OpenNebula version did you install?

SELinux is indeed off.

There is no such directory “/var/tmp/one”:

[oneadmin@xffhyp4d ~]$ /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d
-bash: /var/tmp/one/im/run_probes: No such file or directory
[oneadmin@xffhyp4d ~]$ ls /var/tmp/
systemd-private-77f4d229d5c34589a7eb767a6280c2c9-arpwatch.service-Tbo0hr
systemd-private-77f4d229d5c34589a7eb767a6280c2c9-mariadb.service-zZnzLN
systemd-private-77f4d229d5c34589a7eb767a6280c2c9-ntpd.service-HcMLOP

The OpenNebula version is 5.8.0.

This still looks to me like an SSH key issue. Anyway, could you run that command on xffhyp6d (not on the frontend) as oneadmin?
/var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d

Can you also try these on the frontend:
onehost sync --force
and
onehost sync --force --rsync

[oneadmin@xffhyp6d ~]$ /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d
-bash: /var/tmp/one/im/run_probes: Permission denied
[oneadmin@xffhyp6d ~]$ ls -lah /var/tmp/one/im/run_probes
-rwxr-xr-x. 1 oneadmin oneadmin 1.9K Feb 25 17:45 /var/tmp/one/im/run_probes
[oneadmin@xffhyp6d ~]$ ls -lah /var/lib/one//datastores
ls: cannot access /var/lib/one//datastores: No such file or directory
[oneadmin@xffhyp6d ~]$ ls -lah /var/lib/one/
total 24K
drwxr-xr-x. 3 oneadmin oneadmin 127 Apr 9 11:25 .
drwxr-xr-x. 39 root root 4.0K Apr 5 11:28 ..
-rw-------. 1 oneadmin oneadmin 626 Apr 9 12:01 .bash_history
-rw-r--r--. 1 oneadmin oneadmin 18 Apr 5 11:28 .bash_logout
-rw-r--r--. 1 oneadmin oneadmin 193 Apr 5 11:28 .bash_profile
-rw-r--r--. 1 oneadmin oneadmin 231 Apr 5 11:28 .bashrc
drwx------. 2 oneadmin oneadmin 94 Apr 5 16:26 .ssh
-rw-r-----. 1 oneadmin oneadmin 0 Apr 9 11:28 testfile
-rw-------. 1 oneadmin oneadmin 2.0K Apr 5 16:26 .viminfo

[oneadmin@xffhyp4d ~]$ onehost sync --force
* Adding xffhyp6d to upgrade
[========================================] 1/1 xffhyp6d
All hosts updated successfully.
[oneadmin@xffhyp4d ~]$ onehost sync --force --rsync
* Adding xffhyp6d to upgrade
[========================================] 1/1 xffhyp6d
All hosts updated successfully.
[oneadmin@xffhyp4d ~]$ onehost list
ID NAME CLUSTER TVM ALLOCATED_CPU ALLOCATED_MEM STAT
16 xffhyp6d default 0 - - err

Between which machines exactly should I be able to SSH?
from oneadmin@xffhyp4d to oneadmin@xffhyp5d/xffhyp6d
from oneadmin@xffhyp5d/xffhyp6d to oneadmin@xffhyp4d
Both of these work without a password.

Well, SSH looks OK, but it seems that /var/tmp is mounted with noexec. Could that be the case?

Can you remount with:
mount -o remount,exec /var/tmp
on xffhyp5d/xffhyp6d and try again?
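
Before remounting, it may help to confirm the mount options first. The following is only a minimal sketch, assuming findmnt from util-linux is available on the node; has_noexec and check_dir are hypothetical helper names for illustration, not part of OpenNebula:

```shell
#!/bin/sh
# has_noexec OPTIONS: succeed if the comma-separated mount option string
# contains the "noexec" option. (Hypothetical helper, not an OpenNebula tool.)
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *)          return 1 ;;
    esac
}

# check_dir DIR: print whether the filesystem holding DIR is mounted noexec.
# Assumes findmnt (util-linux) is installed; returns 2 if the lookup fails.
check_dir() {
    opts=$(findmnt -n -o OPTIONS --target "$1") || return 2
    if has_noexec "$opts"; then
        echo "$1: noexec is set ($opts)"
    else
        echo "$1: exec is allowed ($opts)"
    fi
}
```

Running check_dir /var/tmp as oneadmin on xffhyp5d and xffhyp6d would show whether noexec is set. Keep in mind that a remount done with mount -o remount does not survive a reboot unless /etc/fstab is changed as well.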

You got it!
Both /var/tmp and /tmp are mounted with nodev,noexec,nosuid, in accordance with the CIS hardening standards.

Thanks for your help. We will continue our Proof of Concept.
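
For anyone who finds this thread later: the guard in oned.log exits with 42 whenever the [ -x ... ] test fails. On Linux that test is normally also false for a regular file on a noexec mount, even when the execute bits are set, which would explain why the frontend kept reporting "Remote worker node files not found" although run_probes was present on the node. A minimal sketch of the same guard, where run_guard is a hypothetical name used only for illustration:

```shell
#!/bin/sh
# run_guard PROBE [ARGS...]: same shape as the guard shown in oned.log.
# Runs PROBE when the shell considers it executable; otherwise returns 42
# (the logged command uses "exit 42", "return" is used here so the
# function can be called from an interactive shell).
run_guard() {
    probe=$1
    shift
    if [ -x "$probe" ]; then
        "$probe" "$@"
    else
        return 42
    fi
}
```

Calling run_guard /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 16 xffhyp6d on the node and then checking $? should distinguish the two cases: 42 means the guard itself refused the file, anything else comes from the probe script.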