XML-RPC requests timeout

Vitaly_Varyvdin · May 14, 2025, 9:45am

Running OpenNebula 5.4 on 5 combined hosts: all act as front-end and hypervisor nodes. Managed by FreeIPA. RAFT is configured and functioning properly
DB backend is PostgreSQL

Running any XML-RPC requests either via CLI ‘one*’ commands or in the UI results in Net::ReadTimeout

Resuming VM results in timeout, but ‘one.vm.info’ method is successfully completed some time after timeout, according to /var/log/one/oned.log

Sometimes VM is resumed after all, but mostly requests just timeout
Sometimes rebooting leader node results in VM automatically starting right away after new leader selection

upd: restarting opennebula service results in node functioning properly. What could be causing oned to execute requests this slow?

FrancJP · May 14, 2025, 10:53am

Hello @Vitaly_Varyvdin,

I’m sorry but we are not supporting 5.4 anymore (at least from the community side). My suggestion is to update your OpenNebula version (we are currently in 6.10), and check if the problem persists.

Then, there’s a comprehensive article about XML-RPC on our documentation.

Cheers,

mkutouski · May 14, 2025, 11:33am

Do you have any messages about slow DB queries in the /var/log/one/monitor.log like the ones described in another thread on that forum?

Have you tried to execute the same commands but with increased value for a ONE_XMLRPC_TIMEOUT env variable?
For example

ONE_XMLRPC_TIMEOUT=60 onevm list

mkutouski · May 14, 2025, 11:33am

But as @FrancJP suggested the first thing to do is to upgrade your OpenNebula.

Vitaly_Varyvdin · May 14, 2025, 1:42pm

I’d be glad to update, but can’t do that unfortunately
I have no /var/log/one/monitor.log on any of the hosts

XMLRPC timeout is set to 0 in /etc/one/oned.conf, shouldn’t that actually mean no timeout?
Increasing timeout to about 3 minutes would help, according to logs, but Sunstone would still fail to execute the commands

Could that be due to corrupted db across or on some of the hosts? Can’t use onedb fsck or anything else since cluster is running on PostgreSQL and onedb doesn’t seem to support it

upd:
Observing weird behavior:
5 hosts on, any request results in timeout
Stopping opennebula and unicorn-opennebula (latter is probably excess) on leader node, switching to new leader, requests are completed right away. Some time after - all requests timeout again. Starting previously stopped services doesn’t change anything, repeating same process on new leader solves the issue for another minute

mkutouski · May 16, 2025, 12:37pm

no, ONE_XMLRPC_TIMEOUT=0 means immediate timeout, i.e. after 0 seconds. A default value for that parameter is 30 seconds.
So could you, please, try to set it e.g. to 30 on your leader FE, restart OpenNebula service (systemctl restart opennebula.service) and check if it helps to solve the issue?
If it helps then apply the same changes on all your other HA FE nodes.

mkutouski · May 16, 2025, 12:40pm

If you meant some technical reasons then what are they?

Topic		Replies	Views
Image creation with url path causes XML-RPC timeout and failure Community Support	3	1153	February 18, 2020
Unable to find XML-RPC response in what server sent back Community Support	0	387	March 14, 2018
XML RPC Calls are taking lot of time to return Community Support	1	539	September 29, 2017
RAFT issue since upgrade Community Support	12	1443	August 1, 2018
OpenNebula RAFT HA questions Community Support	1	690	March 5, 2018

XML-RPC requests timeout

Related topics