Batch service using OneFlow - best practices

Yenya · September 5, 2022, 3:54pm

Hello,

I am looking for tips on how to configure a OneFlow service for CPU-intensive batch processing. I have many jobs, each of which takes several minutes on four CPU cores. The jobs are CPU-local: no network communication is needed apart from downloading the input beforehand and uploading the result afterwards. I came up with the following architecture:

A service with two roles (master and computing), consisting of:
a master VM with outside network access, which holds the job queue, with each job in its own directory
a private VNet for the service
computing VMs connected to the private VNet, which mount the job directory over NFS, read job inputs from there, and write the results back there
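
The computing-VM side of this layout could be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions: the `pending/` and `running/` subdirectories and the `claim_next_job` helper are hypothetical names, not anything provided by OneFlow. It relies on a directory rename within a single exported filesystem so that only one worker wins a given job; whether that holds on a particular NFS setup would need to be verified.

```python
import os
from pathlib import Path
from typing import Optional


def claim_next_job(queue_root: Path, worker_id: str) -> Optional[Path]:
    """Try to claim one pending job directory for this worker.

    Assumed (hypothetical) layout under the NFS-mounted queue root:
      pending/<job>              jobs waiting to be processed
      running/<job>.<worker_id>  jobs claimed by a worker
    Returns the claimed directory, or None if nothing is pending.
    """
    pending = queue_root / "pending"
    running = queue_root / "running"
    running.mkdir(parents=True, exist_ok=True)
    if not pending.is_dir():
        return None
    for job_dir in sorted(pending.iterdir()):
        target = running / f"{job_dir.name}.{worker_id}"
        try:
            # The rename either succeeds for this worker or fails because
            # another worker has already moved the directory away.
            os.rename(job_dir, target)
        except OSError:
            continue
        return target
    return None
```

A worker would call `claim_next_job` in a loop, process the returned directory, and write its result there before asking for the next job.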

Now I have some questions:

When using autoscaling (when the queue has had more than X waiting jobs for longer than time Y, spawn a new computing server, and when the queue has been near-empty for longer than time Z, delete a computing server), how can I properly drain a server? That is, how can OneFlow inform a computing VM that it is the one about to be decommissioned, and once the VM has finished its current job, how can it inform the OneFlow server that it can be safely destroyed?
Is it possible to lower the priority of the QEMU processes for that role, so that the computing VMs can use all the available CPU time on the OpenNebula nodes without hindering the performance of the non-batch VMs?
Is it possible to somehow connect the OneGate endpoint to that private VNet, so that OneGate communication does not need to pass through the master node? I would prefer not to have to route/NAT that private VNet to the outside world; I want the computing VMs to remain as isolated as possible.
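
To make the scaling rule in the first question concrete, here is a minimal Python sketch of the decision logic alone. It is independent of OneFlow and does not use any OneFlow API; `ScalerState`, `decide`, and the parameter names (`x_jobs`, `y_seconds`, `z_seconds`, `idle_threshold`) are hypothetical and stand for the X, Y, and Z thresholds above. It does not address draining, which is the open question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalerState:
    busy_since: Optional[float] = None  # when the queue first exceeded X
    idle_since: Optional[float] = None  # when the queue first became near-empty


def decide(state: ScalerState, waiting_jobs: int, now: float,
           x_jobs: int, y_seconds: float, z_seconds: float,
           idle_threshold: int = 0) -> str:
    """Return "scale_up", "scale_down", or "hold" for one queue observation."""
    if waiting_jobs > x_jobs:
        state.idle_since = None
        if state.busy_since is None:
            state.busy_since = now
        if now - state.busy_since > y_seconds:
            state.busy_since = None  # restart the timer after acting
            return "scale_up"
    elif waiting_jobs <= idle_threshold:
        state.busy_since = None
        if state.idle_since is None:
            state.idle_since = now
        if now - state.idle_since > z_seconds:
            state.idle_since = None  # restart the timer after acting
            return "scale_down"
    else:
        # Queue is neither overloaded nor near-empty: reset both timers.
        state.busy_since = None
        state.idle_since = None
    return "hold"
```

Resetting the timer after each action means at most one server is added or removed per Y or Z interval, which gives the new server time to start taking jobs before the rule fires again.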

Thanks,

-Yenya
