VM scheduler no longer works after database recovery

Hello everyone,

Recently our OpenNebula database was damaged during a power outage. Once the systems were back online, we restored the database using onedb restore. Afterwards the system started up again, users could log in, and administrators could manually deploy VMs by selecting a particular node. Regular users can still access their VMs and even create new ones via the Sunstone GUI, but newly created VMs are no longer deployed automatically: they remain in pending state indefinitely until an admin manually deploys them to a chosen node.

Inspection of /var/log/one/oned.log shows that there is some kind of authentication problem:

Mon Jun 13 11:48:09 2022 [Z0][ReM][E]: Req:9504 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Mon Jun 13 11:48:11 2022 [Z0][ReM][D]: Req:5936 UID:-1 IP:127.0.0.1 one.system.config invoked 
Mon Jun 13 11:48:11 2022 [Z0][ReM][E]: Req:5936 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Mon Jun 13 11:48:13 2022 [Z0][ReM][D]: Req:2304 UID:-1 IP:127.0.0.1 one.system.config invoked 
Mon Jun 13 11:48:13 2022 [Z0][ReM][E]: Req:2304 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Mon Jun 13 11:48:15 2022 [Z0][ReM][D]: Req:8416 UID:-1 IP:127.0.0.1 one.system.config invoked 
Mon Jun 13 11:48:15 2022 [Z0][ReM][E]: Req:8416 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Mon Jun 13 11:48:17 2022 [Z0][ReM][D]: Req:1424 UID:-1 IP:127.0.0.1 one.system.config invoked 
Mon Jun 13 11:48:17 2022 [Z0][ReM][E]: Req:1424 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
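
To get an overview of which API calls are failing and how often, the log excerpt above can be summarized with a small script. This is only a sketch: the regular expression is derived from the lines shown in this post, and the function name summarize_failures and the SAMPLE list are made up for illustration. Other oned.log formats may need a different pattern.

```python
import re
from collections import Counter

# Pattern based on the [ReM][E] lines quoted in this thread (an assumption,
# not an official description of the oned.log format).
FAILURE = re.compile(
    r"\[ReM\]\[E\]: Req:\d+ UID:\S+ (?P<method>\S+) result FAILURE "
    r"\[[^\]]*\] (?P<message>.*)"
)


def summarize_failures(lines):
    """Count failed calls, grouped by (API method, error message)."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[(match["method"], match["message"].strip())] += 1
    return counts


# Three lines copied from the excerpt above; only the [E] lines are counted.
SAMPLE = [
    "Mon Jun 13 11:48:09 2022 [Z0][ReM][E]: Req:9504 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.",
    "Mon Jun 13 11:48:11 2022 [Z0][ReM][D]: Req:5936 UID:-1 IP:127.0.0.1 one.system.config invoked ",
    "Mon Jun 13 11:48:11 2022 [Z0][ReM][E]: Req:5936 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.",
]

if __name__ == "__main__":
    for (method, message), n in summarize_failures(SAMPLE).most_common():
        print(f"{n:4d}  {method}  {message}")
```

To run it against the real log, pass an open file instead of SAMPLE, for example summarize_failures(open("/var/log/one/oned.log")).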

I found this older discussion where the problem seemed to be related to a password inconsistency. To rule out inconsistent passwords, I reset the oneadmin password using oneuser passwd oneadmin --driver server_cipher <PASSWORD>. Unfortunately this did not fix the problem. Instead, the Sunstone login now shows the error message "OpenNebula is not running or there was a server exception. Please check the server logs." However, both opennebula and opennebula-sunstone are in state active (running). The most recent entries in /var/log/one/oned.log still show the same error as before the oneadmin password reset (one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.).

Can anyone give advice on how to further diagnose and fix the issue? The OpenNebula version is 6.0.0.3.

To change the passwords consistently, please check here. Sunstone doesn't use the oneadmin credentials; more information about admin users is here.

Hi Daniel, thanks for the quick reply. When I reset the password as described in my original post using the command oneuser passwd oneadmin --driver server_cipher <PASSWORD>, I substituted the password stored in /var/lib/one/.one/one_auth for <PASSWORD>. When I follow the instructions you linked and try to execute oneuser passwd 0 <PASSWORD>, oneuser itself now quits with the following error message: [one.user.passwd] User couldn't be authenticated, aborting call. Is it possible that the first password change for oneadmin used the wrong hash algorithm, and that the hash mismatch now prevents oneuser passwd 0 <PASSWORD> from executing properly? I do have another fallback admin user. Is it possible to run oneuser in the context of that other admin user in order to restore the oneadmin password?

I successfully restored the oneadmin password. The oneuser command and the Sunstone login both work normally again. However, my original problem persists: newly created VMs are still not deployed automatically and remain in PENDING (LCM_INIT) state indefinitely. As mentioned in my original post, I suspect the problem is related to the following messages in /var/log/one/oned.log:

Tue Jun 14 14:34:12 2022 [Z0][ReM][E]: Req:5440 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Tue Jun 14 14:34:14 2022 [Z0][ReM][D]: Req:6576 UID:-1 IP:127.0.0.1 one.system.config invoked 
Tue Jun 14 14:34:14 2022 [Z0][ReM][E]: Req:6576 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Tue Jun 14 14:34:16 2022 [Z0][ReM][D]: Req:2480 UID:-1 IP:127.0.0.1 one.system.config invoked 
Tue Jun 14 14:34:16 2022 [Z0][ReM][E]: Req:2480 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Tue Jun 14 14:34:18 2022 [Z0][ReM][D]: Req:8224 UID:-1 IP:127.0.0.1 one.system.config invoked 
Tue Jun 14 14:34:18 2022 [Z0][ReM][E]: Req:8224 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.
Tue Jun 14 14:34:20 2022 [Z0][ReM][D]: Req:6720 UID:-1 IP:127.0.0.1 one.system.config invoked 
Tue Jun 14 14:34:20 2022 [Z0][ReM][E]: Req:6720 UID:- one.system.config result FAILURE [one.system.config] User couldn't be authenticated, aborting call.

What could cause [one.system.config] to fail authentication, and how can this be fixed?

Can you run CLI commands, for example onevm list? The one.system.config call is issued by a oneadmin group user and for some reason is not being correctly authenticated.
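
If it helps, the check can be scripted so the result is easy to paste back here. This is only a rough sketch: the helper names classify and check_cli are made up, the error text is the one quoted earlier in this thread, and it assumes the script runs as the Unix user whose credentials you want to test.

```python
import subprocess

# Error text as quoted earlier in this thread.
AUTH_ERROR = "User couldn't be authenticated"


def classify(returncode, output):
    """Map an exit code and combined output to a coarse status label."""
    if returncode == 0:
        return "ok"
    if AUTH_ERROR in output:
        return "auth-failure"
    return "other-failure"


def check_cli(command=("onevm", "list")):
    """Run a CLI command and report whether it failed on authentication."""
    try:
        proc = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
    except FileNotFoundError:
        # The command is not on PATH for this user.
        return "not-installed"
    return classify(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    print(check_cli())
```

An "auth-failure" result would mean the CLI user hits the same authentication error as the one.system.config calls in the log; "ok" would point at the credentials used by whichever component issues those calls.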