Lost all configuration after "onedb fsck"

Hello,

I was running some tests on my OpenNebula test cluster. After running “onedb fsck” I saw several errors about “server_cipher/authenticate bad cipher”, so I ran “oneuser passwd --sha256 serveradmin my_password” and “echo "serveradmin:password" > .one/sunstone_auth”, then restarted the opennebula and opennebula-sunstone daemons. I don’t know why, but although all my OpenNebula daemons start OK, I have lost ALL my configuration (users, hosts, virtual networks, templates, downloaded images, etc.).

How could I recover that?

Thanks.

onedb fsck is supposed to make only changes that fix consistency problems, such as wrong quotas or metrics. If your database got wiped and you can somehow reproduce it, please post the log and run onedb fsck with --verbose. To recover, you need to restore a previous database backup.

Hi,

When I saw that I had “lost” all my configuration, the first thing I did was stop all OpenNebula services and run “onedb restore” with the automatic backup taken when I ran “onedb fsck” earlier. However, the restore did not bring the configuration back. If you run “grep” on the one.db file, you can see lines containing “host_pool” entries, one for each host I had configured. So the database seems “correct”, but some “extra” problem is preventing it from being loaded correctly.
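A more structured check than “grep” is to open the SQLite file directly and count the rows in each table. The following is only a minimal sketch: the function name table_row_counts is made up for illustration, and it assumes the file is a plain SQLite database that can be opened read-only.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so the inspection itself cannot modify the database.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # sqlite_master lists every object stored in the database file.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()
```

Calling table_row_counts on a copy of the one.db file (with the services stopped) and looking at the count for “host_pool” would show whether the rows are really present as table data, or whether “grep” is only matching leftover bytes in the file.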

Thanks.

Hello,

Just confirming I had exactly the same issue. Even restoring the backup already taken by the fsck didn’t bring back the lost VMs/users etc.

It’s almost as if it jumps the database back in time.

Marc

Hi,

Exactly! It seems all the data saved in the database has been “removed”… Please, @dclavijo, is there any way to restore a database that contains all the data? As I said before, if you back up the database and then run “grep” on that backup file, you will see that all the information is there… but if you restore the database again, that information doesn’t appear. In the Dashboard and the CLI, I have lost users, templates, downloaded (and modified) images, vnets, hosts…

Thanks.

Can you post:

  • the OS version
  • OpenNebula version
  • mysql/mariadb/sqlite/database software version
  • oned.conf section about DB config.

And if you can reproduce it reliably, @Daniel_Ruiz_Molina @gigatux, please open a bug report.

Yes, of course:

  • the OS version → CentOS-7 7.9.2009
  • OpenNebula version → 6.4.0-1
  • mysql/mariadb/sqlite/database software version → sqlite 3.7.17-8
  • oned.conf section about DB config.
    DB = [ BACKEND = "sqlite",
           TIMEOUT = 2500 ]

Now the opennebula daemon doesn’t start and fails with this error: “Password file /var/lib/one//.one/sunstone_auth already exists but OpenNebula is boostraping the database. Check your database configuration in oned.conf.”

This is very strange, indeed. The last error message means the DB is empty or not accessible.
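To tell “empty” apart from “not accessible”, a quick check of the SQLite file can help. This is a rough sketch only: check_sqlite_db is a hypothetical helper, not part of OpenNebula, and it relies on SQLite’s built-in PRAGMA integrity_check.

```python
import os
import sqlite3


def check_sqlite_db(db_path):
    """Return a list of problems found; an empty list means none detected."""
    if not os.path.isfile(db_path):
        return ["file does not exist"]
    if not os.access(db_path, os.R_OK):
        return ["file is not readable by the current user"]
    if os.path.getsize(db_path) == 0:
        return ["file is empty (0 bytes)"]

    problems = []
    try:
        # Read-only connection: the check must not alter the database.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            # integrity_check returns a single row "ok" for a healthy file.
            result = [row[0] for row in conn.execute("PRAGMA integrity_check")]
            if result != ["ok"]:
                problems.extend(result)
            n_tables = conn.execute(
                "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
            ).fetchone()[0]
            if n_tables == 0:
                problems.append("database contains no tables")
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        problems.append(f"sqlite error: {exc}")
    return problems
```

The readability test applies to whoever runs the script, so it should be run as the same user the opennebula daemon runs as. The path to pass in is the SQLite file used by the sqlite backend (one.db in this thread).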

It seems onedb fsck corrupted the database and onedb restore is not able to recover it. By the way, did you see any errors when running these two commands?

To recover the DB you may try:

  • remove /var/lib/one/.one folder
  • start opennebula to bootstrap the DB
  • stop opennebula
  • run onedb restore

It would be helpful to have a DB that causes such complications. Can you please share the DB with pczerny@opennebula.io?

Hi,

I don’t remember whether I saw (more) errors when I ran onedb fsck and onedb restore. I had to reconfigure my whole environment because I needed it to run some important tests (all these problems happened in my OpenNebula “test” cluster).
So I can’t try to recover my DB with the steps you suggest. However, it is very important to me that you find out what happened to the database, because I have a production OpenNebula cluster (now with 800 VMs) and, if its database crashes in the same way… I will have to fly to the moon and stay there for the next 500 years.

I’m going to send @pczerny a database backup that, I think, was taken after stopping the daemons and running onedb fsck.

Thanks.