HA upgrade path

sulaweyo · September 21, 2017, 7:33am

I’m currently preparing our upgrade from 5.4 to 5.4.1 and I have to say that I’m not entirely happy with the procedure.

To have a stable, reproducible setup we prebuild our OpenNebula master hosts with Packer, which works awesome in normal production use. For the upgrade, though, this gets me into a pretty unpleasant position, as the upgrade description requires me to stop all nodes and upgrade those. Sure, I can do that manually, but it breaks the HA idea and is at least cumbersome, which means longer downtime.

Our setup with prebuilt nodes might not be widely used, but upgrading production systems manually is not really something I want to do, and maybe I’m not alone in that. I see that a DB upgrade needs downtime and I’m fine with that. It would still be great to be able to upgrade node by node in the future, at least on minor releases.

For our approach with prebuilt images it would be great to have a chance to add new nodes with the new version to the old cluster. That would at least allow me to cleanly migrate to new hosts on the new version without having to either upgrade the old ones manually or rebuild the whole cluster.

Most awesome, obviously, would be a zero-downtime update for the HA setup. That would require the DB sync to only push data that fits the new DB schema. This would allow adding new hosts with the upgraded version, then switching the leader to one of those nodes and removing the old ones. I don’t have enough insight into the data sync to say how much effort that would need. It would just be awesome to be able to do zero-downtime upgrades (or close to zero, for the time needed to switch to a new leader).

ruben · September 21, 2017, 9:18am

I think that the procedure can be relaxed for minor version upgrades, as the DB version and API are not going to change. When this requirement is met, nodes can be upgraded one by one.

However, for major releases with changes in the DB schema we would need to stop the cluster. In that case we could speed up the process by breaking the cluster, i.e.:

1.- Stop and remove follower HA nodes from the cluster (onezone delete…)
2.- With only one front-end enabled, stop OpenNebula, upgrade and restart the service.
3.- Upgrade each follower and add it back to the cluster

Probably this procedure could be further improved by implementing the ability to “disable a follower”, i.e. disable replication on it. This way you do not have to remove/add the followers (with the associated change in ID).

I’ve filed an issue to take a look at it: https://dev.opennebula.org/issues/5382

sulaweyo · September 21, 2017, 10:12am

Hi ruben,

That sounds pretty good.
Downtime is expected if the DB schema has to be upgraded, and that’s all fine, but the addition of a follower disable sounds very neat, as it would save the time needed to “rebuild” the cluster.

My upgrade with node replacements will most likely run like this:

1.- Remove the follower nodes and scale down to a solo setup
2.- Upgrade OpenNebula on the remaining node manually
3.- Re-add new nodes and rebuild the cluster
4.- Remove the manually upgraded node from the zone

I need to test it, but that should be easy to script and run with a downtime of a few minutes (which would save me from going to work on the weekend).
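
As a rough idea of what such a script could look like, here is a minimal Python sketch that only builds the ordered list of commands for the four steps above as a dry run; it does not execute anything. The `onezone server-del`, `onezone server-add` and `onedb upgrade` invocations and the `opennebula` systemd unit name are assumptions about the CLI and should be checked against the documentation of the version in use. The zone ID, server IDs, node names and RPC URLs are hypothetical. It also leaves out anything the new nodes might need before joining (for example a copy of the database) and the leader switch that has to happen before the old node is removed.

```python
def replacement_plan(zone_id, old_node_id, follower_ids, new_nodes):
    """Return the ordered commands for a node-replacement upgrade (dry run).

    zone_id      -- ID of the HA zone (hypothetical value in the example)
    old_node_id  -- server ID of the node that is upgraded manually
    follower_ids -- server IDs of the followers removed before the upgrade
    new_nodes    -- (name, rpc_url) pairs for the prebuilt replacement nodes
    """
    plan = []

    # Step 1: remove the followers and scale down to a solo setup.
    for server_id in follower_ids:
        plan.append(f"onezone server-del {zone_id} {server_id}")

    # Step 2: upgrade the remaining node; the downtime window is here.
    plan.append("systemctl stop opennebula")
    plan.append("# upgrade the OpenNebula packages with the distro package manager")
    plan.append("onedb upgrade -v")
    plan.append("systemctl start opennebula")

    # Step 3: re-add the prebuilt nodes and rebuild the cluster.
    for name, rpc_url in new_nodes:
        plan.append(f"onezone server-add {zone_id} --name {name} --rpc {rpc_url}")

    # Step 4: remove the manually upgraded node from the zone.
    plan.append(f"onezone server-del {zone_id} {old_node_id}")
    return plan


if __name__ == "__main__":
    # Hypothetical three-node cluster replaced by two prebuilt nodes.
    commands = replacement_plan(
        zone_id=0,
        old_node_id=0,
        follower_ids=[1, 2],
        new_nodes=[
            ("one-new-1", "http://10.0.0.11:2633/RPC2"),
            ("one-new-2", "http://10.0.0.12:2633/RPC2"),
        ],
    )
    for command in commands:
        print(command)
```

Printing the plan first makes it easy to review the order of operations in staging before wiring the commands into whatever actually runs them.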

Edit:
Tested it in staging and it ran through in ~70 sec … that’s a downtime I can live with. It will take longer in production as the DB is a lot bigger, but it should still be finished in ~2 min.
