Problems deploying VM on new host: problems with /var/lib/one/remotes/tm/ssh/clone

Running 6.4.0
I recently added a new host to my default cluster. It’s in the GUI and all looks good. SSH works passwordless using a key: logged in as oneadmin on the frontend, I can ssh to the new host with no issues and no password. I also have several other hosts working in a different cluster.

My problem is that any time I try to deploy a VM template (from the cluster 0 image datastore), I get errors that prevent the process from working. Looking at the error message and the VM log, I can see the command it is failing on:

/var/lib/one/remotes/tm/ssh/clone onenebula02.lab.com:/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef 10.200.18.27:/var/lib/one//datastores/0/538/disk.0 538 1

When I run this command from the frontend as oneadmin, I see that it is waiting for a password. If I enter the password, the cloning works. I watched the process in top and saw it running under the oneadmin user, so I am confused about why it is asking for a password, since it uses ssh in the backend.

Any idea what’s happening here?

Fri Jul 7 17:34:40 2023 [Z0][VM][I]: New state is ACTIVE
Fri Jul 7 17:34:40 2023 [Z0][VM][I]: New LCM state is PROLOG
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: Command execution failed (exit code: 255): /var/lib/one/remotes/tm/ssh/clone onenebula02.lab.com:/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef 10.200.18.27:/var/lib/one//datastores/0/538/disk.0 538 1
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: clone: Cloning /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef in /var/lib/one//datastores/0/538/disk.0
Fri Jul 7 17:34:47 2023 [Z0][TrM][E]: clone: Command " set -e -o pipefail
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]:
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: if [ -d "/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef.snap" ]; then
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: SRC_SNAP="61c44667eacd6e337f37c262dfd6feef.snap"
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: fi
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]:
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: tar -C /var/lib/one//datastores/1 --transform="flags=r;s|61c44667eacd6e337f37c262dfd6feef|disk.0|" -cSf - 61c44667eacd6e337f37c262dfd6feef $SRC_SNAP | ssh 10.200.18.27 "tar -xSf - -C /var/lib/one//datastores/0/538"" failed: Permission denied, please try again.
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: Permission denied, please try again.
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: oneadmin@onenebula02.lab.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Jul 7 17:34:47 2023 [Z0][TrM][I]: Error copying /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef to 10.200.18.27:/var/lib/one//datastores/0/538/disk.0
Fri Jul 7 17:34:47 2023 [Z0][TrM][E]: Error executing image transfer script: INFO: clone: Cloning /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef in /var/lib/one//datastores/0/538/disk.0 ERROR: clone: Command " set -e -o pipefail if [ -d "/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef.snap" ]; then SRC_SNAP="61c44667eacd6e337f37c262dfd6feef.snap" fi tar -C /var/lib/one//datastores/1 --transform="flags=r;s|61c44667eacd6e337f37c262dfd6feef|disk.0|" -cSf - 61c44667eacd6e337f37c262dfd6feef $SRC_SNAP | ss Error copying /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef to 10.200.18.27:/var/lib/one//datastores/0/538/disk.0
Fri Jul 7 17:34:47 2023 [Z0][VM][I]: New LCM state is PROLOG_FAILURE
Fri Jul 7 17:38:25 2023 [Z0][VM][I]: New LCM state is PROLOG
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: Command execution failed (exit code: 255): /var/lib/one/remotes/tm/ssh/clone onenebula02.lab.com:/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef 10.200.18.27:/var/lib/one//datastores/0/538/disk.0 538 1
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: clone: Cloning /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef in /var/lib/one//datastores/0/538/disk.0
Fri Jul 7 17:38:28 2023 [Z0][TrM][E]: clone: Command " set -e -o pipefail
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]:
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: if [ -d "/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef.snap" ]; then
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: SRC_SNAP="61c44667eacd6e337f37c262dfd6feef.snap"
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: fi
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]:
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: tar -C /var/lib/one//datastores/1 --transform="flags=r;s|61c44667eacd6e337f37c262dfd6feef|disk.0|" -cSf - 61c44667eacd6e337f37c262dfd6feef $SRC_SNAP | ssh 10.200.18.27 "tar -xSf - -C /var/lib/one//datastores/0/538"" failed: Permission denied, please try again.
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: Permission denied, please try again.
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: oneadmin@onenebula02.lab.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Jul 7 17:38:28 2023 [Z0][TrM][I]: Error copying /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef to 10.200.18.27:/var/lib/one//datastores/0/538/disk.0
Fri Jul 7 17:38:28 2023 [Z0][TrM][E]: Error executing image transfer script: INFO: clone: Cloning /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef in /var/lib/one//datastores/0/538/disk.0 ERROR: clone: Command " set -e -o pipefail if [ -d "/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef.snap" ]; then SRC_SNAP="61c44667eacd6e337f37c262dfd6feef.snap" fi tar -C /var/lib/one//datastores/1 --transform="flags=r;s|61c44667eacd6e337f37c262dfd6feef|disk.0|" -cSf - 61c44667eacd6e337f37c262dfd6feef $SRC_SNAP | ss Error copying /var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef to 10.200.18.27:/var/lib/one//datastores/0/538/disk.0
Fri Jul 7 17:38:28 2023 [Z0][VM][I]: New LCM state is PROLOG_FAILURE

Ok, got it.

The script was trying to log in to the frontend from the frontend.
Somehow I did not have my frontend’s key in the frontend’s authorized_keys file… doh!
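
For anyone landing here with the same error, here is a minimal sketch of that fix, meant to be run as oneadmin on the frontend. The helper name `ensure_key_authorized` is made up for illustration, and the key path in the usage comment is an assumption (your key may be `id_ed25519.pub` or something else). The helper appends the key only when that exact line is not already in the file.

```shell
# Hypothetical helper: add a public key to an authorized_keys file,
# but only if that exact line is not already there.
ensure_key_authorized() {
    pub_file=$1    # public key to authorize
    auth_file=$2   # authorized_keys file to update

    # sshd is strict about permissions on the directory and the file.
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -x whole line, -F fixed string, -f read the pattern from the key file.
    if ! grep -qxF -f "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
}

# Usage (paths are assumptions; adjust to your setup):
#   ensure_key_authorized /var/lib/one/.ssh/id_rsa.pub /var/lib/one/.ssh/authorized_keys
```

Running it twice is harmless, since the second call finds the key and appends nothing.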

I think the better question to ask here is: why does the frontend have to ssh to itself? I’ve stumbled over so many “weird” behaviors like this that I’ve stopped counting.

Hello,

First of all, please keep in mind that two different datastores are involved here:

  • The system datastore (id 0), which holds the disks of each deployed VM. Every host needs it, and in your case every host has its own.
  • The default image datastore (id 1), which holds the OS images and, in your case (ssh transfer manager), only needs to exist on the frontend.

When a new VM is deployed, the transfer manager is called to copy the VM image from the frontend to the host. As you noticed, the parameters of the ssh transfer manager’s clone script are:

# clone frontend:SOURCE host:remote_system_ds/disk.i vmid dsid

As you saw, the actual call was:

**/var/lib/one/remotes/tm/ssh/clone onenebula02.lab.com:/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef 10.200.18.27:/var/lib/one//datastores/0/538/disk.0 538 1**

Because passwordless SSH was not working for every hop between onenebula02.lab.com and 10.200.18.27, the copy failed.
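
To make the two endpoints explicit: each of the first two arguments has the form host:path, and that can be split with plain shell parameter expansion. The function name `split_tm_arg` below is hypothetical and only for illustration; it is not taken from the OpenNebula scripts.

```shell
# Hypothetical helper: split a "host:path" transfer manager argument.
split_tm_arg() {
    arg=$1
    host=${arg%%:*}   # everything before the first ':'
    path=${arg#*:}    # everything after the first ':'
    echo "$host $path"
}

# The two arguments from the failing call in this thread:
split_tm_arg "onenebula02.lab.com:/var/lib/one//datastores/1/61c44667eacd6e337f37c262dfd6feef"
split_tm_arg "10.200.18.27:/var/lib/one//datastores/0/538/disk.0"
```

The first host is where the image is read from (the frontend here) and the second is where disk.0 is written. Judging by the log above, oneadmin has to be able to log in to both without a password.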

If you configured the frontend as a host and the VM was being deployed on it, you need to add the frontend’s public key to its own authorized_keys file.
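
A quick way to check this, as a sketch: run ssh with `BatchMode=yes`, which makes it fail instead of prompting for a password, against every host involved, including the frontend’s own name. Run it as oneadmin on the frontend. The function name `check_passwordless` is just illustrative.

```shell
# Hypothetical helper: report which hosts accept a non-interactive login.
check_passwordless() {
    rc=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key
        # shows up as a failure instead of a hanging prompt.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "OK $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return $rc
}

# Usage, with the hosts from this thread:
#   check_passwordless onenebula02.lab.com 10.200.18.27
```

Any host reported as FAIL is one where oneadmin’s public key is missing from authorized_keys, or where something else (host key checking, permissions) blocks a non-interactive login.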

Thank you