VM Disk size monitoring

Hello, I am developing a custom storage driver for HPE 3PAR and I have problems monitoring VM DISK_SIZE. I have already implemented VM disk monitoring in the TM_MAD monitor script. Oned successfully monitors the VMs’ disks from the system DS with the 3PAR TM_MAD, but only after a restart. In the following monitoring cycles it does not take the VM disk sizes from the response of the TM monitor script.
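
Roughly, my tm/3par/monitor ends by printing the datastore totals plus one VM=[…] line per VM, in the same format the stock TM monitor scripts use (a sketch; the variables and numbers below are only an illustration):

    # tail of tm/3par/monitor (sketch; $USED_MB etc. are computed earlier)
    echo "USED_MB=$USED_MB"
    echo "TOTAL_MB=$TOTAL_MB"
    echo "FREE_MB=$FREE_MB"
    # one line per VM running on this system DS, sizes in MB
    echo 'VM=[ID=57,POLL="DISK_SIZE=[ID=0,SIZE=1366] DISK_SIZE=[ID=1,SIZE=506]"]'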

Here is a log where we can see that the first time it monitors the disk sizes using the system DS TM monitor script, and the next time it uses the data from the host probe monitoring script.

Thu Dec 20 20:25:00 2018 [Z0][ImM][D]: Datastore 3par_system (0) successfully monitored.
Thu Dec 20 20:25:00 2018 [Z0][VMM][D]: VM 57 successfully monitored: DISK_SIZE=[ID=0,SIZE=1366] DISK_SIZE=[ID=1,SIZE=506]
Thu Dec 20 20:25:00 2018 [Z0][VMM][D]: VM 58 successfully monitored: DISK_SIZE=[ID=0,SIZE=1366] DISK_SIZE=[ID=1,SIZE=506]
Thu Dec 20 20:25:00 2018 [Z0][VMM][D]: VM 67 successfully monitored: DISK_SIZE=[ID=0,SIZE=1283]
Thu Dec 20 20:25:07 2018 [Z0][InM][D]: Monitoring host tst.lin.fedora.host (0)
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:7552 UID:0 one.zone.raftstatus invoked
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:7552 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>-1<..."
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:2624 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:2624 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>57<..."
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:6128 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Thu Dec 20 20:25:11 2018 [Z0][ReM][D]: Req:6128 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>57<..."
Thu Dec 20 20:25:11 2018 [Z0][InM][D]: Host tst.lin.fedora.host (0) successfully monitored.
Thu Dec 20 20:25:11 2018 [Z0][VMM][D]: VM 57 successfully monitored: DISK_SIZE=[ID=0,SIZE=0] DISK_SIZE=[ID=1,SIZE=0] DISK_SIZE=[ID=2,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=92940 NETTX=67780 DISKRDBYTES=141090628 DISKWRBYTES=25890816 DISKRDIOPS=6961 DISKWRIOPS=2106
Thu Dec 20 20:25:11 2018 [Z0][VMM][D]: VM 58 successfully monitored: DISK_SIZE=[ID=0,SIZE=0] DISK_SIZE=[ID=1,SIZE=0] DISK_SIZE=[ID=2,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=94955 NETTX=71326 DISKRDBYTES=143212356 DISKWRBYTES=26133504 DISKRDIOPS=7041 DISKWRIOPS=2147
Thu Dec 20 20:25:11 2018 [Z0][VMM][D]: VM 67 successfully monitored: DISK_SIZE=[ID=0,SIZE=0] DISK_SIZE=[ID=1,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=106169 NETTX=139249 DISKRDBYTES=1408790340 DISKWRBYTES=1732886528 DISKRDIOPS=312972 DISKWRIOPS=420742

So I looked at the system datastore filesystem on the host and removed the .monitor file to disable local monitoring.

After an oned restart it collects the right data from the system DS TM monitor script, but only the first time. The VM disk sizes are not monitored in the following monitoring cycles…

Please help. Thanks

Hi Kristian,

I am patching the monitoring probes :slight_smile:
In brief - there is an additional file for each VM disk, ${DS_ID}/${VM_ID}/disk.${ID}.monitor, holding the name of the TM_MAD the disk belongs to in the $DRIVER variable.
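
For example (the IDs and driver name are illustrative), 0/57/disk.1.monitor for your driver would contain just:

    # ${DS_ID}/${VM_ID}/disk.${ID}.monitor, e.g. 0/57/disk.1.monitor
    DRIVER=3par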

Then I am patching the ssh and shared TM_MADs (https://github.com/OpenNebula/addon-storpool/tree/master/patches/tm/5.6.0) to call the disk’s TM_MAD script (named monitor_disk). As you can see, the disk loop is hijacked too, to skip the useless default monitoring.
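
The patched disk loop then roughly becomes the following (a simplified sketch, not the exact patch; the monitor_disk argument list here is only my shorthand):

    # simplified sketch of the patched loop over the VM's disks
    for disk in "${vm_dir}"/disk.[0-9]*; do
        case "$disk" in *.monitor|*.snap) continue ;; esac

        if [ -f "${disk}.monitor" ]; then
            # disk belongs to another TM_MAD - delegate to its monitor_disk probe
            . "${disk}.monitor"                                    # sets $DRIVER
            "${REMOTES_DIR:-/var/tmp/one}/tm/${DRIVER}/monitor_disk" "$disk"   # args are a guess
        else
            # default behaviour for plain file-based disks
            size=$(du -mL "$disk" | cut -f1)
            echo "DISK_SIZE=[ID=${disk##*.},SIZE=${size}]"
        fi
    done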

As there are more third-party addons for storage backends, I believe that this or a similar solution should be upstreamed. I’ve developed the solution to be as universal as possible.

I am patching the IM_MAD’s monitor_ds.sh too, but that is for the case when the TM_MAD is used for the SYSTEM_DS. If you plan to support volatile disks (and the context ISO :wink:) on 3PAR, you should take a look at what I am changing there too (a total hijack).

Hope this helps. Please let me know if you have any questions :wink:

Best Regards,
Anton Todorov

Hi Anton, thank you for the reply! I also looked into your storage driver and copied a few things :).
Regarding monitoring - the ${DS_ID}/${VM_ID}/disk.${ID}.monitor file: are you adding it to the DS, or is it there by default? I cannot see that file in my system DS.

I also use 3PAR for the system DS and found a problem with volatile disks. By design, the system DS disk type must be file, so it tries to attach the disk to the VM as a file and not as a block device. I think there is no way around this without patching the VMM drivers, or is there? So I have left volatile disk support alone for now.

It looks like the easiest way is to create a custom VMM, e.g. kvm-3par, to support volatile disks and size monitoring…

Regarding the context ISO, I leave it as a file and use the functionality from the ssh driver. Is there some benefit to having the context ISOs as block storage on StorPool/3PAR? We still need to copy/move the deployment files, the links to the disks…

Hi Kristian,

This file is created by the addon for the disks managed by the driver. It is written in the function that creates the symlink for disk.N in the VM’s home, so if there are disks from different datastore MADs, each one’s monitoring probe can be called.
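
In other words, right where the addon links the block device into the VM directory, it also drops the marker file; schematically (paths and variable names are illustrative):

    # schematic: done by the addon's tm (LN/CLONE) scripts on the host
    ln -s "$BLOCK_DEVICE" "${VM_DIR}/disk.${DISK_ID}"
    echo "DRIVER=storpool" > "${VM_DIR}/disk.${DISK_ID}.monitor"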

Well, the DS type is hard-coded as “file” in the core code that generates the domain XML. I’ve done some tests and found no issues leaving them as they are (of type file). It looks like qemu-kvm is smart enough to handle them (at least on CentOS).
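
So on the host you end up with a file-type disk in the domain XML whose path is simply a symlink to the block device, e.g. (path and device name are illustrative and shortened):

    $ readlink -f /var/lib/one/datastores/0/57/disk.2
    /dev/mapper/360002ac0...    # illustrative device path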

Usually the ISOs are relatively small files, but in more complex setups they could be used to install additional software, like Office or other third-party custom software. So keeping them on the block device will save space on the host’s filesystem. This way the host OS can run from a small 64~128 GB SSD or even a SATADOM. Also, fewer files to scp when doing a hot migrate :wink:

For even more space saving I’ve implemented an option to store the VM’s checkpoint on a block device (it requires qemu-kvm-ev, though). This way the only file on the SYSTEM DS is the domain XML, plus the symlinks to the disk block devices - much faster cold migrate and suspend/resume :wink:

Best Regards,
Anton

Hmm, I still don’t understand why OpenNebula ignores the VM=[…] data returned by the tm/3par/monitor script. I was looking into the qcow2 and shared TMs and there is no monitor_ds script, just monitor, and it returns the VM=[…] data in the same form as my script does. That data is parsed only the first time after an OpenNebula restart, until the first host monitoring. After that it is ignored… strange behaviour.

^ These results are from two separate monitor paths…

I am almost confident that there are two bugs/flaws that look related to the issue (I still haven’t tracked down with git blame <file> since when they have been lurking). Resolving the first is trivial, but for the second one I need to get some sleep before rethinking how to resolve it with minimal changes.

Can you confirm that you are not using an NFS/shared system DS (i.e. that you are using ssh/scp for file transfers, etc.)?

I’ll update you tomorrow.

Best,
Anton

Hello, I opened a bug report.

^ These results are from two separate monitor paths…

After removing the .monitor file from the system DS path, host monitoring returns just disk I/O, net RX/TX and so on.

When I looked at the VM disks the next day, the sizes seemed to be updated, so it looks like there is some cycle on which the sizes are taken from the TM_MAD monitor script, even though the monitor script calculates and returns the sizes every time it is called…

Hi all, I would like to push the discussion on this issue forward, because there is no reply on GitHub.

I found in the code that the disk usage stats from the monitor script are only parsed on every 10th run, so practically every 50 minutes.

A few proposals:

  • This interval (currently hard-coded to 10 runs) should be configurable.
  • Most importantly, an additional argument should be passed when calling the DS monitor script to tell it NOT to collect the disk/snapshot usage data, because collecting it costs resources and the result is only used on every 10th run (a rough sketch follows below).
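
A rough sketch of what I mean, inside the ds/tm monitor script (the argument name, position and helper name are only a suggestion):

    # hypothetical extra argument appended by oned only on the cycle
    # where the disk usage data is actually parsed
    COLLECT_VM_DISK_USAGE="${3:-NO}"

    if [ "$COLLECT_VM_DISK_USAGE" = "YES" ]; then
        # the expensive part: per-disk/per-snapshot usage, printed as VM=[...] lines
        monitor_vm_disks    # placeholder for the code that today runs unconditionally
    fi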

Indeed, this is interesting. We’ll add a configuration option in 5.10