5.6 Failed to load federation log record

Hi!
I have a federation with two zones (ONE 5.6), and I set up an HTTP marketplace. When I add a new app, it is added successfully and I can see it on the master zone. When I switch to the slave zone, the app is not there.
Further, I can see the following error in oned.log on the master zone’s leader:
Failed to load federation log record 166263 for zone 101
So, I think OpenNebula can’t sync the DB record to the slave zone, but I can’t understand why.

Please, advise.

Thanks a lot!

Hi @roman.saprykin,

Do you see any SQL error in the oned.log of the affected zone?

Hi Christian,

No, I don’t see any SQL errors.
And it looks like no other information is replicated to the second zone either: I created a new group on the master zone and I don’t see the group on the second zone.

Now, on the master zone, I can see in oned.log the following error:
Failed to load federation log record 172044 for zone 101
And in mysql.log I see the query:
SELECT c.log_index, c.term, c.sqlcmd, c.timestamp, c.fed_index, p.log_index, p.term FROM logdb c, logdb p WHERE c.log_index = 172044 AND p.log_index = 172043
The query returns nothing and that seems to be the problem, but I still don’t understand why it’s happening.
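
To illustrate why a query of this shape comes back empty, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for logdb. The simplified schema, the sample rows, and the helper name load_record are assumptions made for illustration; only the SELECT itself is taken from the mysql.log entry above. The query joins a record with its predecessor, so if either of the two rows is missing, no row is returned at all.

```python
import sqlite3

# Simplified stand-in for the logdb table. Only the columns named in the
# query are modeled; the real table definition may differ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logdb ("
    " log_index INTEGER PRIMARY KEY, term INTEGER, sqlcmd TEXT,"
    " timestamp INTEGER, fed_index INTEGER)"
)

# Made-up sample rows: record 172044 exists, its predecessor 172043 does not.
conn.executemany(
    "INSERT INTO logdb VALUES (?, ?, ?, ?, ?)",
    [
        (172042, 5, "-- placeholder", 0, 172040),
        (172044, 5, "-- placeholder", 0, 172042),
    ],
)

# Same SELECT as in mysql.log, with the two indexes as parameters.
QUERY = (
    "SELECT c.log_index, c.term, c.sqlcmd, c.timestamp, c.fed_index,"
    " p.log_index, p.term"
    " FROM logdb c, logdb p"
    " WHERE c.log_index = ? AND p.log_index = ?"
)


def load_record(index):
    """Run the self-join for a record and the record just before it."""
    return conn.execute(QUERY, (index, index - 1)).fetchall()


# Predecessor 172043 is absent, so the join yields no rows.
missing = load_record(172044)

# Once the predecessor exists, the same query returns one joined row.
conn.execute("INSERT INTO logdb VALUES (172043, 5, '-- placeholder', 0, 172042)")
present = load_record(172044)

print(len(missing), len(present))
```

In this sketch the first call returns no rows and the second returns one, which matches the symptom: the requested record is there, but the join also needs the row with the previous index.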

It seems that you’ve lost the previous record, so this one cannot be replicated.

Do you know if the host where the replication is failing has been offline at some point?

The master node probably purged the record while that host was offline. The number of records kept for replication can be tuned by modifying the LOG_RETENTION variable in oned.conf. You can find more info here: http://docs.opennebula.org/5.8/advanced_components/ha/frontend_ha_setup.html#raft-configuration-attributes

Yes, that’s possible.
And I have already tuned LOG_RETENTION.
In the end, I just truncated the logdb table on the master zone. Then I started oned and everything looked fine, with no errors in the logs. Next I created a new app and a new group on the master zone, and they were not replicated to the slave.
It looks like nothing is replicated to the slave at all.

To fix the issue you need to generate a backup of the federated tables on the working node:

  • onedb backup --federated -u <user> -p <password> -d <db_name>

and restore it on the failing node:

  • onedb restore --federated -u <user> -p <password> -d <db_name> <path_to_backup>

After doing that, both zones should be synchronized and replication should work again.
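
If you want to script the two steps, a minimal sketch could look like the following. It only assembles and prints the two command lines quoted above and does not run anything. The helper name onedb_federated_cmd, the credentials, the database name, and the backup path are all placeholders chosen for illustration.

```python
import shlex


def onedb_federated_cmd(action, user, password, db_name, backup_path=None):
    """Build one of the two onedb command lines from this thread (nothing is run)."""
    if action not in ("backup", "restore"):
        raise ValueError("action must be 'backup' or 'restore'")
    cmd = ["onedb", action, "--federated",
           "-u", user, "-p", password, "-d", db_name]
    if action == "restore":
        # The restore command takes the path to the backup as its last argument.
        if not backup_path:
            raise ValueError("restore needs the path to the backup file")
        cmd.append(backup_path)
    return cmd


# Placeholder values, for illustration only.
backup = onedb_federated_cmd("backup", "oneadmin", "secret", "opennebula")
restore = onedb_federated_cmd(
    "restore", "oneadmin", "secret", "opennebula", "/tmp/federated.sql"
)

# Print the commands so they can be reviewed before being run by hand:
# the backup on the working node, the restore on the failing node.
print(shlex.join(backup))
print(shlex.join(restore))
```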

I tried restoring the federated tables and at first it looked good, but when I tried to change an app name and add a new app, nothing was replicated to the slave and I started receiving this error message in oned.log:
Failed to load federation log record 5162129 for zone 101

I restored federated tables on slave zone once again and it magically works now.