Hello,
I’m looking for any OpenNebula Scheduler setting that will stop trying to schedule a VM that is part of a “Service Template” instantiated with “oneflow-template instantiate”. I did search for answers to this question, but only found years-old posts, and those were specific to single-VM deployments.
Here we have a Service Template with many VM roles defined. At times there are not enough resources, and the scheduler reports: “Cannot dispatch VM: No host with enough capacity to deploy the VM”.
All VMs are required for the Service to work properly.
We see that the template state will stay in “DEPLOYING” forever. This is tough for our automated infrastructure that triggers these template launches. Is there any way to get a different state from the oneflow API, or to automatically fail the deployment after a given time period?
We can work around this by walking through all the VMs and checking their status, but we would prefer a status update from oneflow if that’s possible.
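In case it helps anyone doing the same thing, here is a minimal sketch of that per-VM polling workaround using the pyone Python bindings. The endpoint, credentials and VM IDs are placeholders, and the PENDING state code should be double-checked against your OpenNebula version:

```python
# Minimal sketch: check whether any VM of the service is still waiting for
# the scheduler. Assumptions: pyone installed, default XML-RPC endpoint,
# user:password auth, and VM state code 1 == PENDING (verify for your version).
import pyone

one = pyone.OneServer("http://frontend:2633/RPC2", session="oneadmin:password")

def stuck_vms(vm_ids):
    """Return the VM IDs that are still PENDING (no host selected yet)."""
    return [vm_id for vm_id in vm_ids if one.vm.info(vm_id).STATE == 1]

# The VM IDs would normally be collected from `oneflow show -j <service_id>`.
print(stuck_vms([100, 101, 102]))
```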
There are states for the oneflow service instances and states for each role of an instance. If there are VMs managed by the service that have still not reported READY, then it will remain in that state until the VMs report back.
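For reference, a small sketch of reading both levels of state through the OneFlow HTTP API; the port, credentials and JSON paths below are assumptions based on a default oneflow-server setup, so verify them against your deployment:

```python
# Sketch: fetch the service state and each role's state from oneflow-server.
# Assumptions: oneflow-server on its default port 2474, basic auth with the
# OpenNebula user:password, and the usual DOCUMENT/TEMPLATE/BODY JSON layout.
import requests

ONEFLOW_URL = "http://frontend:2474"
AUTH = ("oneadmin", "password")

def service_states(service_id):
    resp = requests.get(f"{ONEFLOW_URL}/service/{service_id}", auth=AUTH)
    resp.raise_for_status()
    body = resp.json()["DOCUMENT"]["TEMPLATE"]["BODY"]
    # "state" is the numeric service state; each role carries its own "state".
    return body["state"], {role["name"]: role["state"] for role in body["roles"]}

print(service_states(42))
```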
Could you elaborate a bit on what you mean by getting a different state? There is a state for failures during deployment, but that only happens if a VM actually fails to deploy.
What I meant by “getting a different state” was: is there any way for this to fail when there are not enough resources, instead of staying in “DEPLOYING” forever?
For example, if there aren’t enough IPs in the network, it doesn’t wait; it immediately returns “FAILED_DEPLOYING”. This is nice from an automated API perspective, and it also returns a clear log explaining why it failed. (Screenshot #1 below.)
However, when it is short on host memory/CPU (I assume the same applies to datastores, but I haven’t tested that), the service will stay in “DEPLOYING” forever, waiting for host resources. I realize the job has been sent to the scheduler, which continuously attempts to find resources, and there is certainly a use case where that is desirable. In my case, I’d rather it fail and let our automation decide what to do next. (See screenshots #2 and #3 below.)
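Until something like that exists, one possible workaround is to enforce the deadline on the client side: poll the service state and give up if it has not left DEPLOYING in time. A rough sketch (the endpoint, credentials and state codes below are assumptions to verify against your release):

```python
# Sketch: client-side timeout, since the scheduler itself keeps retrying.
# Assumptions: default oneflow-server port, basic auth, and state codes
# 1 = DEPLOYING, 2 = RUNNING, 7 = FAILED_DEPLOYING (verify for your version).
import time
import requests

ONEFLOW_URL = "http://frontend:2474"
AUTH = ("oneadmin", "password")
DEPLOYING, RUNNING, FAILED_DEPLOYING = 1, 2, 7

def wait_for_service(service_id, timeout=600, interval=15):
    """Return True if the service reaches RUNNING before the deadline."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{ONEFLOW_URL}/service/{service_id}", auth=AUTH)
        resp.raise_for_status()
        state = resp.json()["DOCUMENT"]["TEMPLATE"]["BODY"]["state"]
        if state == RUNNING:
            return True
        if state == FAILED_DEPLOYING:
            return False
        time.sleep(interval)
    # Still DEPLOYING after the timeout: let the automation decide what to do
    # next, e.g. tear the service down and retry on another cluster.
    return False
```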
Hi Ken, could you open an issue requesting this as a feature? In this case you’d need something like a TIMEOUT for a PENDING VM. This is a change to the scheduler that could have a lot of implications, so it needs to be reviewed to check whether it makes sense.