CPU overcommit problem

I’ve one more problem before proceeding with the rollout of our new 5.12 setup.
In previous versions, I overcommitted memory and CPU to the max; in all honesty, I don’t really care about the exact ratio. We never ran out of resources, as the CPU was mostly idle and I could deploy as many VMs as needed without impacting the system. In the new version, I want to do the same thing. The ESX cluster has 830GB of real RAM, which I’ve successfully overprovisioned to 1.6TB, but CPU overcommitment is a mess.
Right now, reserved CPU and RAM are both 0% (I’ve tried dropping CPU to -100%). Allocated CPU is showing 1800/6000 with about 6 VMs running, but real CPU is showing 2510%, and nothing I do with the CPU overcommitment slider (moving it or entering a value) makes any difference. The physical cluster has 120 cores, and I just need to get the CPU overcommit as high as possible. In a previous version (5.6) the real CPU overcommit was running at 86400. The other thing I noticed is that none of the ESX hosts show any Real Memory for some reason.

[Screenshots: cpuovercommit1, cpuovercommit2, cpuovercommit3]

You can try the code in the master branch; we have already fixed this error there.

Thanks for the info. Sounds like it’s a known problem then. Any idea when this update will hit release so I can update via yum? I’m pretty stuck at the moment and can’t release this new install to production until this problem is fixed. I’m currently running 5.12.0.1ce.

Thanks a lot

You can avoid Sunstone and update the host template directly from the command line with onehost update $ID, adjusting the RESERVED_CPU and RESERVED_MEM attributes. See: http://docs.opennebula.io/5.12/operation/host_cluster_management/numa.html#cpu-pinning-and-overcommitment
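
To make the arithmetic behind that suggestion concrete, here is a minimal Python sketch that turns a target overcommit ratio into RESERVED_CPU and RESERVED_MEM values. It assumes the usable capacity is computed as total * (1 - reserved/100), so that a negative percentage enlarges what the scheduler sees, and that percentage strings such as "-100%" are accepted in the host template. Both are assumptions to check against the documentation linked above. The function names and the CPU ratio of 4 are hypothetical; the memory figures (830 GB real, 1.6 TB target, taken here as 1600 GB) come from the first post.

```python
def reserved_percent(overcommit_ratio: float) -> str:
    """Return a RESERVED_* percentage string for a target overcommit ratio.

    Assumed model (not confirmed in this thread):
        usable = total * (1 - reserved / 100)
    so a ratio of 2 (twice the physical capacity) maps to "-100%".
    """
    if overcommit_ratio <= 0:
        raise ValueError("overcommit ratio must be positive")
    percent = round((1 - overcommit_ratio) * 100)
    return f"{percent}%"


def host_template_fragment(cpu_ratio: float, mem_ratio: float) -> str:
    """Build the two attribute lines to place in the host template."""
    return (
        f'RESERVED_CPU="{reserved_percent(cpu_ratio)}"\n'
        f'RESERVED_MEM="{reserved_percent(mem_ratio)}"\n'
    )


if __name__ == "__main__":
    # Memory figures from the thread: 830 GB physical, 1.6 TB (1600 GB) target.
    mem_ratio = 1600 / 830
    # A CPU ratio of 4 is purely illustrative, not a value from the thread.
    print(host_template_fragment(cpu_ratio=4, mem_ratio=mem_ratio))
```

The printed lines could then be pasted into the editor that onehost update $ID opens. Whether this works around the Sunstone slider problem described above was not confirmed in the thread.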

Any news on when this fix will be pushed to the release channel? It’s really holding us up now. I guess it’s pencilled in for the next 5.12 (or 5.13) release, but that can’t come soon enough, assuming this bug is fixed.

Thanks

Hi Tony, can you confirm this is fixed in the master branch? Would be good to know that the fix we have solves your problem.

Hi Tino
Thanks for the reply. I normally pull using yum from the release branch. This is the URL I’m using:

https://downloads.opennebula.org/repo/5.12/CentOS/8/$basearch

Sorry for the questions, but what would be the best way to pull from the master branch?

-Tony

Hi Tony,

To try out a development version you need to compile OpenNebula from the HEAD of the master branch:

https://github.com/OpenNebula/one

You can find the steps to achieve this here:

http://docs.opennebula.io/5.12/integration/references/compile.html

Just noticed that 5.12.3 was available via yum. Not sure if that’s the version the overcommitment fix is rolled up into, but I installed it anyway. After a reboot though, I can log on as oneadmin, but then just get a permanent spinning wheel and the UI never loads. Tried a full reboot, but no change!

Hi Tony,

If you are referring to the CE version (5.12.0.3) please check the release notes: http://docs.opennebula.io/5.12/intro_release_notes/release_notes_community/index.html

If you are referring to the EE version, the latest release, 5.12.5, includes all the patches: http://docs.opennebula.io/5.12/intro_release_notes/release_notes_enterprise/resolved_issues_5125.html

Having recently updated to 5.12.3CE it seems that this overcommit problem is still not fixed in that version. This is now a massive hold up on deploying this project. I know this is the CE version and support is best effort, but for us, the reputation of OpenNebula as a product is on the line here. Is there any other fix I can apply to get this up and running? It’s just the overcommit problem now - but it’s a showstopper for us.

Just one final thing. The CE version seems very much to be treated as a second-class citizen, trailing the Enterprise version considerably in fixes and features. I understand you want paying subscribers, but most potential customers’ first experiences will be based on CE, and if that’s not a good experience, they will look elsewhere. Releasing both versions with feature/fix parity would help considerably. Support will still be limited for CE, which is understandable, but it would be much easier for customers to get the full experience before committing.

Hi Tony!

I think you are referring to version 5.12.0.3 CE, right? 🙂

Well, we actually do that for each minor/major release of OpenNebula 😉

Hi, thanks for the reply. Yes, I’m referring to the ‘latest’ 5.12.0.3 CE release, but as far as I can see from the download pages, the Enterprise version is tracking ahead of that in bug fix releases. It’s this particular overcommitment bug in CE that’s causing me so many issues, and has done for many weeks/months now. I cannot go live with this version until that’s fixed. I was told the ‘fix’ was released to the master branch weeks ago, but the release notes for CE (up to 5.12.3) make no mention of it, and now that I’m running the latest CE release, I can see that the problem is still there.
I do not want to build from source - not for a production system anyway. We need a stable, packaged release that we can easily upgrade as necessary. It’s these small, hanging issues that prevent the recommendation of moving to a fully supported tier.

Hi @TonyBarrett,
Sorry to hear this issue is preventing you from using our latest CE in production. We’ll synchronize both editions in less than 2 months, once we release the new OpenNebula 6.0. In the meantime, I’d suggest you keep using the CE version that works for you. If you are in a hurry, remember that there are a number of professional services we provide to corporate users, including deployment and upgrade: https://opennebula.io/enterprise/#services
Best,
Alberto.