Scheduled actions (Sysadmin/netadmin course in ONe)

Hi all,

I want to provide support for a sysadmin/netadmin course at our university with ONe, and I am looking for best practices or other hints. The course has about 200 students divided into 5 to 10 on-line study groups. Students in a particular group are supposed to work together with the lecturer at a dedicated time (scheduled weekly). Each student will use two pre-installed VMs on a dedicated L3 prefix, with in-band access over SSH (or SSH port-forwarding), provided that they do not manage to cut their network off :-), and, if necessary, out-of-band VNC access in Sunstone.

I am thinking about the following setup: create a pair of VMs for each student and put them on a shared L2 network with a non-overlapping per-student L3 prefix (e.g. student 1 will have 10.0.1.10/24 and 10.0.1.11/24, student 2 will have 10.0.2.10/24 and 10.0.2.11/24, and so on). Then I will create a gateway VM for this network with a public IP address, SSH+LDAP authentication, and outgoing SNAT, connected to that internal L2 network with many per-student IP addresses (10.0.1.1/24, 10.0.2.1/24, and so on). The students’ VMs will have a default route to the 10.x.y.1 address from their /24 prefix.
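To make the addressing plan concrete, here is a minimal sketch that prints the gateway and VM addresses for a given student number. The helper name student_plan and the output format are only illustrative; the 10.0.N.x layout is the one described above, which works as long as the student number fits into one octet.

```shell
# Hypothetical helper: print the addressing plan for student number N.
# Layout as described above: gateway 10.0.N.1, VMs 10.0.N.10 and 10.0.N.11.
student_plan() {
    n=$1
    # N is used as the third octet, so keep it within 1..254.
    if [ "$n" -lt 1 ] || [ "$n" -gt 254 ]; then
        echo "student number out of range: $n" >&2
        return 1
    fi
    printf 'student=%d gw=10.0.%d.1/24 vm1=10.0.%d.10/24 vm2=10.0.%d.11/24\n' \
        "$n" "$n" "$n" "$n"
}

# Print the plan for a few sample students.
for n in 1 2 200; do
    student_plan "$n"
done
```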

Then, to conserve resources, I will set up a periodic undeploy hard of these VMs every evening, and a rollback to the “gold standard” snapshot a few minutes later. Before the lesson, students will start their VMs themselves, so the CPU and memory will not be used all the time for all 200 students.

Does the above sound feasible? Do you use ONe for academic courses in a similar way? What are your experiences and best practices?

I have a few problems with the above:

  • when I want to create the image for these students’ VMs with a “gold standard” snapshot, I cannot make it non-persistent, because an image with snapshots apparently cannot be made non-persistent.

  • so instead I wanted to prepare a template using a non-persistent image without snapshots, and create the snapshot only after the instantiation. In the “Actions” tab of the template I have set up the following:

    • one-time relative action +5 minutes: poweroff

    • one-time relative action +10 minutes: create a snapshot of the disk 0

    • periodic hourly action at Xh 30 min: undeploy hard (for production use, I will make it daily instead of hourly, of course)

    • periodic hourly action at Xh 35 min: disk-snapshot-revert of the disk 0 to the snapshot 0.

But when I instantiated such a template, the course of actions did not go as I expected, according to the VM log (I removed some entries from the log to make it shorter).

Fri Feb 26 13:16:25 2021 [Z0][VM][I]: New LCM state is PROLOG
Fri Feb 26 13:16:35 2021 [Z0][VM][I]: New LCM state is RUNNING
Fri Feb 26 13:16:41 2021 [Z0][VM][I]: New LCM state is SHUTDOWN_UNDEPLOY
(the above only after 6 seconds instead of 5 minutes, and undeploy instead of poweroff?)
Fri Feb 26 13:16:48 2021 [Z0][VM][I]: New state is UNDEPLOYED
(the snapshot creation of the disk 0 scheduled at +10 minutes did not get done at all, so I manually
started the VM an hour later)
Fri Feb 26 14:29:42 2021 [Z0][VM][I]: New state is ACTIVE
Fri Feb 26 14:29:51 2021 [Z0][VM][I]: New LCM state is RUNNING
Fri Feb 26 14:29:58 2021 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
(why the poweroff here? it was supposed to be a one-time action?)
Fri Feb 26 14:32:39 2021 [Z0][VM][I]: New state is POWEROFF
(now finally the initial snapshot is being created, why now?)
Fri Feb 26 14:32:40 2021 [Z0][VM][I]: New state is ACTIVE
Fri Feb 26 14:32:40 2021 [Z0][VM][I]: New LCM state is DISK_SNAPSHOT_POWEROFF
Fri Feb 26 14:32:42 2021 [Z0][LCM][I]: VM disk snapshot operation completed.
Fri Feb 26 14:32:42 2021 [Z0][VM][I]: New state is POWEROFF
(at :35 there is a scheduled revert to the gold standard, isn't it too early at :32?)
Fri Feb 26 14:32:56 2021 [Z0][VM][I]: New LCM state is DISK_SNAPSHOT_REVERT_POWEROFF
Fri Feb 26 14:33:00 2021 [Z0][VM][I]: New LCM state is LCM_INIT
(and now it's getting undeployed - why now?)
Fri Feb 26 14:33:13 2021 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
Fri Feb 26 14:33:18 2021 [Z0][VM][I]: New LCM state is LCM_INIT

Did I set up anything incorrectly?

Thanks!

-Yenya

Exploring the actions tab further: I want to create a template which, after instantiation, results in a VM that undeploys daily at, say, 23:30 and reverts to the disk snapshot 10 minutes later.

  • firstly, setting up daily action in Sunstone is pretty complicated - Weekly has to be selected first, and then all seven checkboxes should be checked. Could a separate “daily” option be added?

  • when I set up a daily undeploy in the template, it has a starting date (today by default). So if someone instantiates the template several days later, the VM gets undeployed almost immediately, probably because the scheduler thinks it has already missed the previous undeploy period. How can I make a periodic action in the template without a starting date, or with the starting date set to the VM instantiation time?

  • how do relative times work? What are they relative to? Can I, for example, set up “undeploy”, then after 10 minutes “undeploy hard”, and after another 10 minutes “create the initial snapshot, if not present”?

  • are relative times available in Sunstone only? I did not figure out how to set the relative action using onevm ... --schedule

Thanks,

-Yenya

TIL snapshots cannot be reverted in Undeployed state. Why? It should be the same as in Poweroff at the Ceph level.

So now I have to figure out how to edit actions using onevm --schedule or something similar in batch mode; there is no way I would do this by hand for the 400+ VMs I created yesterday.

OK, scheduled actions can be deleted with onevm delete-chart, and re-created with onevm poweroff --hard --schedule ... --weekly 0,1,2,3,4,5,6. There are still several problems with that:

  • What I want to do is to shut down the VM in the evening, and the student will power it up again the next day, the next week, or whenever he needs it. However, if a VM with a daily poweroff or undeploy is already powered off or undeployed when the next scheduled time comes, the ONe scheduler attempts to re-run the missed scheduled task[s] as soon as possible. So when the students power on their VMs several days later, most probably right at the beginning of the next lesson, the scheduler will immediately shut them down under their hands.

  • for testing purposes, I wanted to schedule hourly poweroff - it is easy to do in Sunstone - select Hourly, and every 1 hour. But from the command line I cannot set it up:

$ onevm undeploy $VMID --schedule "`perl -e 'print scalar localtime(time + 900)'`" --hourly `perl -e 'print join(",", 0..23)'`
invalid argument: --hourly 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
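The error suggests that --hourly does not accept a list of hours. As far as I can tell from the documentation examples, it takes a single integer N meaning “every N hours”, but that is an assumption I have not verified against the helper source. A dry-run sketch under that assumption (the helper name hourly_undeploy_cmd and the sample start time are only illustrative; it prints the command instead of running it):

```shell
# Dry-run helper: prints the onevm command line instead of executing it.
# Assumption: --hourly expects one integer N ("every N hours"), not a list.
hourly_undeploy_cmd() {
    vmid=$1
    start=$2
    every=$3
    # Reject anything that is not a plain non-negative integer, e.g. "0,1,2".
    case "$every" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of hours: $every" >&2
            return 1
            ;;
    esac
    printf 'onevm undeploy %s --schedule "%s" --hourly %s\n' \
        "$vmid" "$start" "$every"
}

# Sample values only: VM 4470, first run at 14:30, then every hour.
hourly_undeploy_cmd 4470 "02/26 14:30" 1
```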

So, are periodic actions usable for my use case? Thanks!

OK, something is pretty broken here: I tried to delete all the undeploy actions I created yesterday or the day before yesterday. I ran the onevm delete-chart command in a for-loop for charts 0 to 2 and for several hundred VMs. I still have the following snippet in my terminal from bash -x output of these for-loops:

+ onevm delete-chart 4470 0
+ for id in '`seq 0 2`'
+ onevm delete-chart 4470 1
+ for id in '`seq 0 2`'
+ onevm delete-chart 4470 2
Sched action 2 not found

Yet oned still complains that it cannot undeploy VM 4470 in UNDEPLOYED state, and in Sunstone there are two actions visible for VM 4470. Looking at oned.log further, I found that the actions are still being run for about 100 out of the 400 VMs I created.

So I took another VM from this list and tried to remove the scheduled actions once again:

# onevm delete-chart 4446 1
# onevm delete-chart 4446 1
Sched action 1 not found
# onevm delete-chart 4446 0
# onevm delete-chart 4446 0
Sched action 0 not found

So the actions were not deleted in that for-loop. Another VM had only action #1, and I got this while trying to delete it:

# onevm delete-chart 4447 1
# onevm delete-chart 4447 1
Sched action 1 not found
# onevm delete-chart 4447 1
# onevm delete-chart 4447 1
Sched action 1 not found
# onevm delete-chart 4447 1
Sched action 1 not found
# onevm delete-chart 4447 1
Sched action 1 not found

So the first onevm delete-chart deleted the action, the second one reported that the action is not there anymore, but the third one still “successfully” deleted the action.

EDIT: Could it be a missing commit somewhere, or stale data cached somewhere? FWIW, I use MySQL as the DB backend.
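Until the cause is clear, a bulk cleanup probably should not trust a single delete-chart call, but re-check the VM afterwards and retry. A minimal sketch of that idea follows. It assumes that the XML dump from onevm show --xml contains one <SCHED_ACTION> element per scheduled action; ONEVM, MAX_ID and MAX_PASSES are hypothetical knobs introduced here (ONEVM can point to a stub for dry runs).

```shell
# Hypothetical knobs: command to call, highest action ID to try, retry limit.
ONEVM=${ONEVM:-onevm}
MAX_ID=${MAX_ID:-9}
MAX_PASSES=${MAX_PASSES:-5}

# Count <SCHED_ACTION> elements in the XML dump of a VM
# (assumes each scheduled action appears as one such element).
count_sched_actions() {
    "$ONEVM" show "$1" --xml | grep -o '<SCHED_ACTION>' | wc -l | tr -d ' '
}

# Delete all scheduled actions of one VM, re-checking after every pass.
purge_sched_actions() {
    vmid=$1
    pass=0
    while [ "$(count_sched_actions "$vmid")" -gt 0 ]; do
        if [ "$pass" -ge "$MAX_PASSES" ]; then
            echo "VM $vmid: actions still present after $pass passes" >&2
            return 1
        fi
        for id in $(seq 0 "$MAX_ID"); do
            # "not found" errors are expected for IDs that do not exist.
            "$ONEVM" delete-chart "$vmid" "$id" >/dev/null 2>&1 || true
        done
        pass=$((pass + 1))
    done
    return 0
}
```

It would be called per VM, e.g. purge_sched_actions 4470, and a non-zero exit status marks the VMs where the actions keep coming back.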