VM Disk size monitoring

Hello, I am developing a custom storage driver for HPE 3PAR and I have problems monitoring VM DISK_SIZE. I have already implemented VM disk monitoring in the TM_MAD monitor script. Oned successfully monitors the VMs’ disks from the system DS with the 3PAR TM_MAD, but only after a restart. In the following monitoring cycles it does not take the VM disk sizes from the response of the TM monitor script.
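
Roughly, my tm/3par/monitor ends by printing the datastore totals plus one VM=[…] line per VM, in the same format the stock TM monitor scripts use (a sketch; the variables and numbers below are only an illustration):

    # tail of tm/3par/monitor (sketch; $USED_MB etc. are computed earlier)
    echo "USED_MB=$USED_MB"
    echo "TOTAL_MB=$TOTAL_MB"
    echo "FREE_MB=$FREE_MB"
    # one line per VM running on this system DS, sizes in MB
    echo 'VM=[ID=57,POLL="DISK_SIZE=[ID=0,SIZE=1366] DISK_SIZE=[ID=1,SIZE=506]"]'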

Here is a log where we can see that the first time it monitors the disk sizes using the system DS TM monitor script, and the next time it uses the data from the host probe monitoring script.

Thu Dec 20 20:25:00 2018 [Z0][ImM][D]: Datastore 3par_system (0) successfully monitored.
Thu Dec 20 20:25:00 2018 [Z0][VMM][D]: VM 57 successfully monitored: DISK_SIZE=[ID=0,SIZE=1366] DISK_SIZE=[ID=1,SIZE=506]
Thu Dec 20 20:25:00 2018 [Z0][VMM][D]: VM 58 successfully monitored: DISK_SIZE=[ID=0,SIZE=1366] DISK_SIZE=[ID=1,SIZE=506]
Thu Dec 20 20:25:00 2018 [Z0][VMM][D]: VM 67 successfully monitored: DISK_SIZE=[ID=0,SIZE=1283]
Thu Dec 20 20:25:07 2018 [Z0][InM][D]: Monitoring host tst.lin.fedora.host (0)
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:7552 UID:0 one.zone.raftstatus invoked
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:7552 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>-1<..."
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:2624 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:2624 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>57<..."
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:6128 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:6128 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>57<..."
Thu Dec 20 20:25:11 2018 [Z0][InM][D]: Host tst.lin.fedora.host (0) successfully monitored.
Thu Dec 20 20:25:11 2018 [Z0][VMM][D]: VM 57 successfully monitored: DISK_SIZE=[ID=0,SIZE=0] DISK_SIZE=[ID=1,SIZE=0] DISK_SIZE=[ID=2,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=92940 NETTX=67780 DISKRDBYTES=141090628 DISKWRBYTES=25890816 DISKRDIOPS=6961 DISKWRIOPS=2106
Thu Dec 20 20:25:11 2018 [Z0][VMM][D]: VM 58 successfully monitored: DISK_SIZE=[ID=0,SIZE=0] DISK_SIZE=[ID=1,SIZE=0] DISK_SIZE=[ID=2,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=94955 NETTX=71326 DISKRDBYTES=143212356 DISKWRBYTES=26133504 DISKRDIOPS=7041 DISKWRIOPS=2147
Thu Dec 20 20:25:11 2018 [Z0][VMM][D]: VM 67 successfully monitored: DISK_SIZE=[ID=0,SIZE=0] DISK_SIZE=[ID=1,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=106169 NETTX=139249 DISKRDBYTES=1408790340 DISKWRBYTES=1732886528 DISKRDIOPS=312972 DISKWRIOPS=420742

So I looked at the system datastore filesystem on the host and removed the .monitor file to disable local monitoring.

After an oned restart it collects the right data from the system DS TM monitor script, but only the first time. The VM disk sizes are not monitored in the following monitoring cycles…

Please help. Thanks

Hi Kristian,

I am patching the monitoring probes :slight_smile:
In brief - there is an additional file for each VM disk, ${DS_ID}/${VM_ID}/disk.${ID}.monitor, holding the name of the TM_MAD the disk belongs to in the $DRIVER variable.
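
For example (the IDs and driver name are illustrative), 0/57/disk.1.monitor for your driver would contain just:

    # ${DS_ID}/${VM_ID}/disk.${ID}.monitor, e.g. 0/57/disk.1.monitor
    DRIVER=3par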

Then I am patching the ssh and shared TM_MADs (https://github.com/OpenNebula/addon-storpool/tree/master/patches/tm/5.6.0) to call the disk’s TM_MAD script (named monitor_disk). As you can see, the disk loop is hijacked too, to skip the useless default monitoring.
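
The patched disk loop then roughly becomes the following (a simplified sketch, not the exact patch; the monitor_disk argument list here is only my shorthand):

    # simplified sketch of the patched loop over the VM's disks
    for disk in "${vm_dir}"/disk.[0-9]*; do
        case "$disk" in *.monitor|*.snap) continue ;; esac

        if [ -f "${disk}.monitor" ]; then
            # disk belongs to another TM_MAD - delegate to its monitor_disk probe
            . "${disk}.monitor"                                    # sets $DRIVER
            "${REMOTES_DIR:-/var/tmp/one}/tm/${DRIVER}/monitor_disk" "$disk"   # args are a guess
        else
            # default behaviour for plain file-based disks
            size=$(du -mL "$disk" | cut -f1)
            echo "DISK_SIZE=[ID=${disk##*.},SIZE=${size}]"
        fi
    done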

As there are more third-party addons for storage backends, I believe that this or a similar solution should be upstreamed. I’ve developed the solution to be as universal as possible.

I am patching the IM_MAD’s monitor_ds.sh too, but that is for the case when the TM_MAD is used for the SYSTEM_DS. If you plan to support volatile disks (and the context ISO :wink:) on 3PAR, you should take a look at what I am changing there too (a total hijack).

Hope this helps. Please let me know if you have any questions :wink:

Best Regards,
Anton Todorov

Hi Anton, thank you for the reply! I also looked into your storage driver and copied a few things :).
Regarding monitoring - the ${DS_ID}/${VM_ID}/disk.${ID}.monitor file: are you adding it to the DS, or is it there by default? I cannot see that file in my system DS.

I also use 3PAR for the system DS and found a problem with volatile disks. By design, the system DS disk type must be file, so it tries to attach the disk to the VM as a file and not as a block device. I think there is no way around this without patching the VMM drivers, or is there? So I have left volatile disk support alone for now.

It looks like the easiest way is to create a custom VMM, e.g. kvm-3par, to support volatile disks and size monitoring…

Regarding the context ISO, I leave it as a file and use the functionality from the ssh driver. Is there some benefit to having the context ISOs as block storage on StorPool/3PAR? We still need to copy/move the deployment files, the links to the disks…

Hi Kristian,

This file is created by the addon for the disks managed by the driver. It is written in the function that creates the symlink for disk.N in the VM’s home, so if there are disks from different datastore MADs, each one’s monitoring probe can be called.
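
In other words, right where the addon links the block device into the VM directory, it also drops the marker file; schematically (paths and variable names are illustrative):

    # schematic: done by the addon's tm (LN/CLONE) scripts on the host
    ln -s "$BLOCK_DEVICE" "${VM_DIR}/disk.${DISK_ID}"
    echo "DRIVER=storpool" > "${VM_DIR}/disk.${DISK_ID}.monitor"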

Well, the DS type is hard-coded as “file” in the core code that generates the domain XML. I’ve done some tests and found no issues leaving them as they are (of type file). It looks like qemu-kvm is smart enough to handle them (at least on CentOS).
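
So on the host you end up with a file-type disk in the domain XML whose path is simply a symlink to the block device, e.g. (path and device name are illustrative and shortened):

    $ readlink -f /var/lib/one/datastores/0/57/disk.2
    /dev/mapper/360002ac0...    # illustrative device path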

Usually the ISOs are relatively small files, but in more complex setups they could be used to install additional software, like Office or other third-party custom software. So keeping them on the block device will save space on the host’s filesystem. This way the host OS can run from a small 64~128 GB SSD or even a SATADOM. Also, fewer files to scp when doing a hot migrate :wink:

For even more space saving I’ve implemented an option to store the VM’s checkpoint on a block device (it requires qemu-kvm-ev, though). This way the only file on the SYSTEM DS is the domain XML, plus the symlinks to the disk block devices - much faster cold migrate and suspend/resume :wink:

Best Regards,
Anton

Hmm, I still don’t understand why OpenNebula ignores the VM=[…] data returned by the tm/3par/monitor script. I was looking into the qcow2 and shared TMs and there is no monitor_ds script, just monitor, and it returns the VM=[…] data in the same form as my script does. That data is parsed only the first time after an OpenNebula restart, until the first host monitoring. After that it is ignored… strange behaviour.

^ These results are from two separate monitor paths…

I am almost confident that there are two bugs/flaws that look related to the issue (I still haven’t tracked down with git blame <file> since when they have been lurking). Resolving the first is trivial, but for the second one I need to get some sleep before rethinking how to resolve it with minimal changes.

Can you confirm that you are not using an NFS/shared system DS (i.e. that you are using ssh/scp for file transfers, etc.)?

I’ll update you tomorrow.

Best,
Anton

Hello, I opened a bug report.

^ These results are from two separate monitor paths…

After removing the .monitor file from the system DS path, host monitoring returns just disk I/O, net RX/TX and so on.

When I looked at the VM disks the next day, the sizes seemed to be updated, so it looks like there is some cycle on which the sizes are taken from the TM_MAD monitor script, even though the monitor script calculates and returns the sizes every time it is called…

Hi all, I would like to push the discussion on this issue forward, because there is no reply on GitHub.

I found in the code that the disk usage stats from the monitor script are only parsed on every 10th run, so practically every 50 minutes.

A few proposals:

  • This interval (currently hard-coded to 10 runs) should be configurable.
  • Most importantly, an additional argument should be passed when calling the DS monitor script to tell it NOT to collect the disk/snapshot usage data, because collecting it costs resources and the result is only used on every 10th run (a rough sketch follows below).
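
A rough sketch of what I mean, inside the ds/tm monitor script (the argument name, position and helper name are only a suggestion):

    # hypothetical extra argument appended by oned only on the cycle
    # where the disk usage data is actually parsed
    COLLECT_VM_DISK_USAGE="${3:-NO}"

    if [ "$COLLECT_VM_DISK_USAGE" = "YES" ]; then
        # the expensive part: per-disk/per-snapshot usage, printed as VM=[...] lines
        monitor_vm_disks    # placeholder for the code that today runs unconditionally
    fi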

Indeed, this is interesting. We’ll add a configuration option in 5.10