Onehost sync temporary failure

Hello, I have the following problem with “onehost sync”: when I bump the version of the scripts in remotes/VERSION and then run onehost sync, I get the following output:

$ onehost sync
* Adding host1 to upgrade
* Adding host2 to upgrade
...
* Adding host30 to upgrade
[========================================] 30/30 host30                      
Failed to update the following hosts:
* host3
* host1
...
* host7
$ 

When I run “onehost sync” again a few seconds later, the result is “No hosts are going to be updated.”, and the contents of the remotes directory are synchronized to all hosts as expected.

Now the weirdest thing is that the set of hosts that fail to synchronize is always the same (although they are listed in random order): host1 to host12, which is the original set of hosts in my OpenNebula cluster. The rest were added later, in several phases. So I guess that these hosts have a slightly different configuration somewhere, and the push-style synchronization triggered by “onehost sync” fails on them, but they still get synchronized later, pull-style.

Can anybody point me to where the difference between push- and pull-style synchronization of the remotes/ scripts could be? A missing known_hosts entry? Something like that. In all other respects, all my hosts, including the first twelve, work as expected (VM deployment, migration, etc.).
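One way to test the known_hosts theory is a quick non-interactive SSH loop, run as oneadmin on the front-end. This is only a sketch: check_ssh is a hypothetical helper name, and it assumes the hosts are reachable over SSH under the same names that “onehost list” shows. It prints every host where ssh would need to ask something (an unknown host key, a password) or cannot connect at all.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): print every host that the
# current user (run it as oneadmin) cannot reach over SSH without a prompt.
check_ssh() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking a question, e.g. the
        # "Are you sure you want to continue connecting?" prompt for a host
        # key that is missing from ~/.ssh/known_hosts.
        # -n keeps ssh from consuming the loop's stdin.
        if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}
```

For example, check_ssh host1 host2 host3 prints nothing if all three hosts accept a non-interactive login.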

Thanks in advance,

-Yenya

OK, a year later, the problem still persists, or maybe it is even worse. I have edited some of the remotes scripts, bumped the VERSION up, and ran “onehost sync”. Now the result is similar, but only two of my 31 hosts are reported as failed. Moreover, according to “onehost list”, all of my hosts are stuck in either the “update” or the “init” state, even several minutes later.

In oned.log, there are “Monitoring host hostN (idN)” messages, but no errors. I can move a host from the “update” to the “init” state using “onehost disable hostN; onehost enable hostN”, but that’s as far as it gets: it never leaves the “init” state.

Where could the problem be? Where should I look next? Thanks!

-Yenya

Many years later, I still see this problem, usually after an ONe upgrade: I run onehost sync and get the “Failed to update the following hosts:” error message (nowadays with all of my nodes listed). After waiting for several minutes, another onehost sync invocation reports “No hosts are going to be updated.”, and the /var/lib/one/tmp/VERSION files on all nodes reflect the new release.

Everything works as expected afterwards, but this is still a bit annoying. Where should I look for the cause of such a failure?

Thanks!

-Yenya

OK, this time I looked deeper into the logs: the problem apparently was a forgotten vim recovery file in the remotes directory, which was readable by root only. But I still wonder why it caused a problem only when running onehost sync directly from the oneadmin account on the master node, while the files still got synced some time later.
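To catch such leftovers before running onehost sync, a find-based scan of the remotes directory can help. This is a sketch, not an OpenNebula tool: check_remotes is a hypothetical helper name, it assumes GNU find (for -readable), and the default path /var/lib/one/remotes and owner oneadmin may differ on your installation. Run it as oneadmin so that the readability check reflects what the sync actually sees.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): list files under a remotes
# directory that may break "onehost sync" when it runs as oneadmin.
# Assumes GNU find (for -readable); the defaults below are only typical values.
check_remotes() {
    dir=${1:-/var/lib/one/remotes}
    owner=${2:-oneadmin}
    {
        # vim swap/recovery files (.name.swp, .name.swo, ...) and "~" backups
        find "$dir" \( -name '.*.sw?' -o -name '*~' \) -print
        # anything not owned by the expected user, e.g. created while root
        find "$dir" ! -user "$owner" -print
        # anything the current user cannot read (meaningful when run as oneadmin)
        find "$dir" ! -readable -print
    } | sort -u
}
```

For example, check_remotes /var/lib/one/remotes oneadmin should print nothing on a clean tree; a root-owned .vmm.sh.swp left behind by an editor would show up in the output.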