ZFS addon / Issues with configuration of datastores

First post here, so please bear with me :)

I would like to install ONE on a single node and configure a ZFS backend for VMs. The goal is to have separate ZFS volumes for each VM and to be able to make snapshots and clones of them (on top of the obvious performance and security benefits offered by ZFS).

I was able to install OpenNebula using the minione script, after which I added the OpenNebula ZFS Storage Driver (https://github.com/OpenNebula/addon-zfs) and followed the configuration steps. So far, so good.

These are some details about the system (I hope I have included all of the relevant information):

Components

  • CentOS 7.6.1810
  • OpenNebula 5.8.1

ZFS

[root@minione01 ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
zfs4one         34.9G  2.48T  1.94G  /var/lib/one/datastores
zfs4one/one-12  8.25G  2.49T   902M  -
[...some other unused volumes...]

Datastores

[root@minione01 ~]# onedatastore list
 ID NAME     SIZE AVAIL CLUSTERS  IMAGES TYPE DS   TM      STAT
100 images   2.5T 99%   0              1 img  zfs  shared  on  
  0 system   2.5T 100%  0              0 sys  -    shared  on  

Details on datastore 100 (images)

[root@minione01 ~]# onedatastore show 100
DATASTORE 100 INFORMATION                                                       
ID             : 100                 
NAME           : images              
USER           : oneadmin            
GROUP          : oneadmin            
CLUSTERS       : 0                   
TYPE           : IMAGE               
DS_MAD         : zfs                 
TM_MAD         : shared              
BASE PATH      : /var/lib/one//datastores/100
DISK_TYPE      : BLOCK               
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 2.5T                
FREE:          : 2.5T                
USED:          : 34.9G               
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
BRIDGE_LIST="localhost"
CLONE_TARGET="SYSTEM"
CLONE_TARGET_SSH="SYSTEM"
CLUSTER="0"
DATASET_NAME="zfs4one"
DISK_TYPE="BLOCK"
DISK_TYPE_SSH="FILE"
DS_MAD="zfs"
LN_TARGET="NONE"
LN_TARGET_SSH="SYSTEM"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
TM_MAD="shared"
TM_MAD_SYSTEM="ssh"
TYPE="IMAGE_DS"

IMAGES         
12             

Details about image 12

[root@minione01 ~]# oneimage show 12
IMAGE 12 INFORMATION                                                            
ID             : 12                  
NAME           : CentOS 7 - KVM
USER           : oneadmin            
GROUP          : oneadmin            
LOCK           : None                
DATASTORE      : images              
TYPE           : OS                  
REGISTER TIME  : 05/16 09:11:14      
PERSISTENT     : No                  
SOURCE         : localhost:zfs4one/one-12
PATH           : https://marketplace.opennebula.systems//appliance/4e3b2788-d174-4151-b026-94bb0b987cbb/download/0
SIZE           : 8G                  
STATE          : used                
RUNNING_VMS    : 1                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : ---                 
OTHER          : ---                 

IMAGE TEMPLATE                                                                  
DEV_PREFIX="vd"
DRIVER="qcow2"
FORMAT="qcow2"
FROM_APP="30"
FROM_APP_MD5="dbc81ae029a17e12e51c0aac3cc5ac4d"
FROM_APP_NAME="CentOS 7 - KVM"

VIRTUAL MACHINES

    ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME
     8 oneadmin oneadmin example.com     pend    0      0K              0d 23h47

The problem

I could download a VM image from the marketplace just fine (see above), but when I try to instantiate it as non-persistent (VMs > Create Virtual Machine), I get the following error:

[one.vm.deploy] Image Datastore does not support transfer mode: shared

The expected behaviour would be as follows:

  1. The original image gets cloned (via ZFS)
  2. A new VM is instantiated
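For illustration, step 1 done by hand with plain ZFS would roughly be a snapshot of the image zvol followed by a `zfs clone` of that snapshot. The bash sketch below only prints the commands (a dry run) and changes nothing on the pool. Assumptions: the clone name `one-<image>-<vm>-<disk>` is inferred from the `disk.0 -> /dev/zvol/zfs4one/one-12-10-0` symlink that appears later in this thread, while the snapshot name and the function name `plan_clone` are hypothetical and not taken from the addon's code.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print the zfs commands a clone-on-instantiate step could run.
# plan_clone and the snapshot naming scheme are hypothetical.
plan_clone() {
    local dataset=$1 image_id=$2 vm_id=$3 disk_id=$4
    local src="${dataset}/one-${image_id}"
    # Hypothetical snapshot name; the addon may use a different one.
    local snap="${src}@vm-${vm_id}-disk-${disk_id}"
    # Clone name as observed in this thread (one-<image>-<vm>-<disk>).
    local dst="${dataset}/one-${image_id}-${vm_id}-${disk_id}"
    echo "zfs snapshot ${snap}"
    echo "zfs clone ${snap} ${dst}"
}
```

For example, `plan_clone zfs4one 12 10 0` prints a `zfs snapshot` of `zfs4one/one-12` followed by a `zfs clone` into `zfs4one/one-12-10-0`. A clone shares blocks with its origin snapshot, so it is cheap to create, but that snapshot cannot be destroyed while the clone still depends on it.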

I am a bit stuck: oned.log does not contain any extra information.

Can anyone please suggest how to debug this error?

Thanks a lot,
Corrado

I think this datastore should be configured as a local filesystem, and not as a shared filesystem, since you only have one host. But I’ve never used the ZFS addon, so I could be totally wrong…

Here is some more information about the TM_MAD setting:
http://docs.opennebula.org/5.8/deployment/open_cloud_storage_setup/fs_ds.html?highlight=tm_mad

The documentation of the addon also mentions that the TM_MAD value should be set to “zfs”.

See the docs on the GitHub page, which state for the ZFS addon:

| Attribute | Requirement |
|---|---|
| DS_MAD | Must be zfs |
| TM_MAD | Must be zfs |
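A quick way to check a datastore against these two requirements is to parse the template part of `onedatastore show`. Minimal bash sketch; the function name `check_zfs_ds` is hypothetical, and it assumes the `KEY="value"` layout of the DATASTORE TEMPLATE section shown in the first post (it deliberately ignores the `DS_MAD : zfs` header lines above the template).

```bash
#!/usr/bin/env bash
# Reads `onedatastore show <id>` output on stdin and checks that the
# template attributes DS_MAD and TM_MAD are both "zfs".
# check_zfs_ds is a hypothetical helper name.
check_zfs_ds() {
    local text ds tm
    text=$(cat)
    # Match only KEY="value" template lines; TM_MAD_SYSTEM is not matched
    # because the quote must follow TM_MAD directly.
    ds=$(printf '%s\n' "$text" | sed -n 's/^DS_MAD="\([^"]*\)"[[:space:]]*$/\1/p')
    tm=$(printf '%s\n' "$text" | sed -n 's/^TM_MAD="\([^"]*\)"[[:space:]]*$/\1/p')
    if [ "$ds" = "zfs" ] && [ "$tm" = "zfs" ]; then
        echo "ok"
        return 0
    fi
    echo "mismatch: DS_MAD=${ds:-unset} TM_MAD=${tm:-unset}"
    return 1
}
```

For the datastore in the first post, `onedatastore show 100 | check_zfs_ds` would report `mismatch: DS_MAD=zfs TM_MAD=shared`. The attributes can then be changed with `onedatastore update 100`, which opens the template in an editor.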

Thanks a lot for the pointers. I set the two attributes DS_MAD and TM_MAD as suggested (for both the images and the system datastore) and was able to progress a little bit.

Now, when I try to instantiate a VM, I get a different error:

Mon May 20 03:51:31 2019 : Error deploying virtual machine: Internal error
No such file or directory - /var/lib/one/remotes/tm/zfs/context

These are the contents of the /var/lib/one/remotes/tm/zfs directory:

[root@minione01 ~]# ls -alh /var/lib/one/remotes/tm/zfs/
total 40K
drwxr-xr-x 2 oneadmin oneadmin 4.0K May 14 06:50 .
drwxr-x--- 12 oneadmin oneadmin 4.0K May 14 06:50 ..
-rwxr-xr-x 1 oneadmin oneadmin 3.9K May 14 06:50 clone
-rwxr-xr-x 1 oneadmin oneadmin 3.4K May 14 06:50 cpds
-rwxr-xr-x 1 oneadmin oneadmin 3.3K May 14 06:50 delete
-rwxr-xr-x 1 oneadmin oneadmin 2.2K May 14 06:50 ln
-rwxr-xr-x 1 oneadmin oneadmin 1.2K May 14 06:50 mv
-rwxr-xr-x 1 oneadmin oneadmin 3.2K May 14 06:50 mvds
-rwxr-xr-x 1 oneadmin oneadmin 1.7K May 14 06:50 postmigrate
-rwxr-xr-x 1 oneadmin oneadmin 1.7K May 14 06:50 premigrate

Searching did not help me understand what the missing file is supposed to do or contain. Any ideas?

Thank you,
Corrado

A context is created when you deploy a VM; it contains all the settings you configured in the XML file or via the Sunstone web interface. In this case it could not be created during deployment, which looks like a permissions issue.
Can user oneadmin create a file in that directory?

$ touch /var/lib/one/remotes/tm/zfs/testfile

Besides the short log you pasted, is there anything else around it that might be relevant?
Please post the entire flow for the VM you are trying to deploy if you need more help (about 30 to 50 lines showing the entire lifecycle of the VM).
You can also trace the VM’s lifecycle in more detail in its own log under /var/log/one: a VM with ID 123 has a log file at /var/log/one/123.log.
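To pull the interesting part out of such a per-VM log, something like the following bash sketch can help. The function name `vm_log_summary` is hypothetical; it assumes the `[Z0][<module>][<severity>]:` line format visible in the logs posted in this thread and prints only error lines plus LCM state changes.

```bash
#!/usr/bin/env bash
# Print error lines ("[E]: ") and LCM state changes from a per-VM log,
# e.g. /var/log/one/<VM_ID>.log. vm_log_summary is a hypothetical helper.
vm_log_summary() {
    local vm_id=$1 log_dir=${2:-/var/log/one}
    local log="${log_dir}/${vm_id}.log"
    if [ ! -r "$log" ]; then
        echo "cannot read ${log}" >&2
        return 1
    fi
    # Fixed-string match, so the brackets need no escaping.
    grep -F -e '[E]: ' -e 'New LCM state is ' "$log"
}
```

For example, `vm_log_summary 9` reads /var/log/one/9.log; the optional second argument points it at another log directory.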

Hope this helps!

Hey Roland, hi All,

thanks for the help provided so far.

Yes, that directory is writeable:

[root@minione01 ~]# su - oneadmin
Last login: Thu May 16 07:09:01 CEST 2019 from localhost.localdomain on pts/1
[oneadmin@minione01 ~]$ touch /var/lib/one/remotes/tm/zfs/testfile
[oneadmin@minione01 ~]$ ls -alh /var/lib/one/remotes/tm/zfs/testfile
-rw-rw-r-- 1 oneadmin oneadmin 0 May 23 12:47 /var/lib/one/remotes/tm/zfs/testfile

And you’re 100% right that I should have pasted the relevant lines of the log for this VM - thanks for pointing that out:

[root@minione01 ~]# cat /var/log/one/9.log 
Mon May 20 03:51:29 2019 [Z0][VM][I]: New state is ACTIVE
Mon May 20 03:51:29 2019 [Z0][VM][I]: New LCM state is PROLOG
Mon May 20 03:51:31 2019 [Z0][VM][I]: New LCM state is BOOT
Mon May 20 03:51:31 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/9/deployment.0
Mon May 20 03:51:31 2019 [Z0][VMM][I]: Internal error No such file or directory - /var/lib/one/remotes/tm/zfs/context
Mon May 20 03:51:31 2019 [Z0][VMM][I]: Failed to execute transfer manager driver operation: tm_context.
Mon May 20 03:51:31 2019 [Z0][VMM][E]: Error deploying virtual machine: Internal error No such file or directory - /var/lib/one/remotes/tm/zfs/context
Mon May 20 03:51:31 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

I tried the entire flow again and this is the relevant excerpt from oned.log:

Thu May 23 12:57:39 2019 [Z0][DiM][D]: Deploying VM 10
Thu May 23 12:57:39 2019 [Z0][ReM][D]: Req:2064 UID:0 one.vm.deploy result SUCCESS, 10
Thu May 23 12:57:40 2019 [Z0][ReM][D]: Req:1904 UID:0 IP:127.0.0.1 one.vm.info invoked , 10
Thu May 23 12:57:40 2019 [Z0][ReM][D]: Req:1904 UID:0 one.vm.info result SUCCESS, "<VM><ID>10</ID><UID>..."
Thu May 23 12:57:40 2019 [Z0][AuM][D]: Message received: AUTHENTICATE SUCCESS 982 -
Thu May 23 12:57:40 2019 [Z0][ReM][D]: Req:2432 UID:0 IP:127.0.0.1 one.vm.info invoked , 10
Thu May 23 12:57:40 2019 [Z0][ReM][D]: Req:2432 UID:0 one.vm.info result SUCCESS, "<VM><ID>10</ID><UID>..."
Thu May 23 12:57:41 2019 [Z0][TM][D]: Message received: TRANSFER SUCCESS 10 -
Thu May 23 12:57:41 2019 [Z0][VMM][D]: Message received: LOG I 10 Internal error No such file or directory - /var/lib/one/remotes/tm/zfs/context
Thu May 23 12:57:41 2019 [Z0][VMM][D]: Message received: LOG I 10 Failed to execute transfer manager driver operation: tm_context.
Thu May 23 12:57:41 2019 [Z0][VMM][D]: Message received: DEPLOY FAILURE 10 Internal error No such file or directory - /var/lib/one/remotes/tm/zfs/context
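The driver messages for a single VM can be filtered out of oned.log with a small bash sketch like this one. The function name `oned_driver_msgs` is hypothetical, and the pattern assumes the `Message received: <ACTION> <RESULT> <VM_ID>` layout visible in the excerpt above; request and authentication lines are skipped.

```bash
#!/usr/bin/env bash
# Print the driver messages oned.log recorded for one VM, such as
# "TRANSFER SUCCESS 10 -" or "DEPLOY FAILURE 10 <reason>".
# oned_driver_msgs is a hypothetical helper name.
oned_driver_msgs() {
    local vm_id=$1 log=${2:-/var/log/one/oned.log}
    # The VM ID must be followed by a space or end of line,
    # so VM 1 does not match messages for VM 10.
    grep -E "Message received: [A-Z_]+ [A-Z]+ ${vm_id}( |\$)" "$log"
}
```

For example, `oned_driver_msgs 10` would show the TRANSFER SUCCESS, LOG I and DEPLOY FAILURE lines for VM 10, but not the `AUTHENTICATE SUCCESS 982` line.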

So, in a nutshell, the transfer step succeeds but contextualisation fails, because the ZFS addon does not provide a context script (https://github.com/OpenNebula/addon-zfs/tree/master/tm/zfs).

  1. Is the lack of contextualisation a known limitation of the ZFS addon for OpenNebula?
  2. Can anyone suggest a workaround?

Thanks a lot,
Corrado Fiore

I can’t test that, but indeed there is no context script for it on GitHub.

In the docs there I see an “onedatastore list” output showing that the system and default datastores are plain filesystem datastores, not ZFS volumes, so I assume that when you deploy a VM, the contextualisation files are created there and not on ZFS.

The addon author’s output on GitHub shows:

> onedatastore list
  ID NAME            CLUSTER  IMAGES TYPE   TM    
   0 system          none     0      fs     shared
   1 default         none     3      fs     shared
 100 zfs             none     0      zfs    shared

Is yours similar, or do you have system / default also on ZFS?

EDIT: never mind, I saw your datastore list output in an earlier post:

[root@minione01 ~]# onedatastore list
 ID NAME     SIZE AVAIL CLUSTERS  IMAGES TYPE DS   TM      STAT
100 images   2.5T 99%   0              1 img  zfs  shared  on  
  0 system   2.5T 100%  0              0 sys  -    shared  on  

I discovered something interesting: the context script in each driver’s tm folder is the same across most storage types (only the dummy and vcenter ones differ in size), so I just copied the LVM (fs_lvm) one into the ZFS folder:

[root@minione01 ~]# ls -al /var/lib/one/remotes/tm/*/context
-rwxr-xr-x 1 oneadmin oneadmin 3677 Apr 7 14:22 /var/lib/one/remotes/tm/ceph/context
-rwxr-xr-x 1 oneadmin oneadmin 1204 Apr 7 14:22 /var/lib/one/remotes/tm/dummy/context
-rwxr-xr-x 1 oneadmin oneadmin 3677 Apr 7 14:22 /var/lib/one/remotes/tm/fs_lvm/context
-rwxr-xr-x 1 oneadmin oneadmin 3677 Apr 7 14:22 /var/lib/one/remotes/tm/qcow2/context
-rwxr-xr-x 1 oneadmin oneadmin 3677 Apr 7 14:22 /var/lib/one/remotes/tm/shared/context
-rwxr-xr-x 1 oneadmin oneadmin 3677 Apr 7 14:22 /var/lib/one/remotes/tm/ssh/context
-rwxr-xr-x 1 oneadmin oneadmin 1204 Apr 7 14:22 /var/lib/one/remotes/tm/vcenter/context
-rwxr-xr-x 1 oneadmin oneadmin 3677 May 23 15:41 /var/lib/one/remotes/tm/zfs/context
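Identical file sizes strongly suggest, but do not prove, identical scripts. A slightly safer way to do the same copy is to compare the candidate donors byte for byte with `cmp` first. Minimal bash sketch; `copy_tm_context` is a hypothetical helper, and the remotes path is passed in as an argument so it can be tried on a scratch directory before touching /var/lib/one/remotes.

```bash
#!/usr/bin/env bash
# Copy the tm "context" script from existing drivers into a driver that
# lacks one, but only if all donor scripts are byte-for-byte identical.
# Usage: copy_tm_context <remotes_dir> <target_driver> <donor>...
# copy_tm_context is a hypothetical helper name.
copy_tm_context() {
    local remotes=$1 target=$2
    shift 2
    if [ $# -lt 1 ]; then
        echo "usage: copy_tm_context <remotes_dir> <target_driver> <donor>..." >&2
        return 1
    fi
    local first="${remotes}/tm/$1/context" donor
    # Refuse to copy if any donor differs from the first one (or is missing).
    for donor in "$@"; do
        if ! cmp -s "$first" "${remotes}/tm/${donor}/context"; then
            echo "differs from $1: ${donor}" >&2
            return 1
        fi
    done
    local dest="${remotes}/tm/${target}/context"
    # Never overwrite an existing script.
    if [ -e "$dest" ]; then
        echo "already present: ${dest}" >&2
        return 1
    fi
    # -p keeps the mode (executable bit) and timestamps of the donor.
    cp -p -- "$first" "$dest"
}
```

For example, `copy_tm_context /var/lib/one/remotes zfs fs_lvm shared ssh qcow2 ceph`, run as oneadmin so the new file ends up owned by oneadmin like the rest of the directory.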

It seems to be working just fine:

Thu May 23 15:43:54 2019 [Z0][VM][I]: New LCM state is BOOT
Thu May 23 15:43:54 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/10/deployment.0
Thu May 23 15:43:55 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.

(Note to myself: prepare a pull request for the missing file)


However, the challenge is not over yet. In fact, I am getting another error:

Thu May 23 15:43:54 2019 [Z0][VM][I]: New LCM state is BOOT
Thu May 23 15:43:54 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/10/deployment.0
Thu May 23 15:43:55 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Thu May 23 15:43:55 2019 [Z0][VMM][I]: pre: Executed "sudo brctl addif vmbr0 enp4s0".
Thu May 23 15:43:55 2019 [Z0][VMM][I]: ExitCode: 0
Thu May 23 15:43:55 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Thu May 23 15:43:56 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/10/deployment.0' 'localhost' 10 localhost
Thu May 23 15:43:56 2019 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/10/deployment.0
Thu May 23 15:43:56 2019 [Z0][VMM][I]: error: internal error: qemu unexpectedly closed the monitor: 2019-05-23T13:43:56.593381Z qemu-kvm: -drive file=/var/lib/one//datastores/0/10/disk.0,format=qcow2,if=none,id=drive-virtio-disk0,cache=none: Image is not in qcow2 format
Thu May 23 15:43:56 2019 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/10/deployment.0
Thu May 23 15:43:56 2019 [Z0][VMM][I]: ExitCode: 255
Thu May 23 15:43:56 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Thu May 23 15:43:56 2019 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/10/deployment.0
Thu May 23 15:43:56 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Notes:

These are the contents of the datastore directory:

[root@minione01 ~]# ls -al /var/lib/one//datastores/0/10/
total 19
drwxrwxr-x 2 oneadmin oneadmin      5 May 23 15:43 .
drwxr-xr-x 4 oneadmin oneadmin      4 May 23 12:57 ..
-rw-rw-r-- 1 oneadmin oneadmin   1482 May 23 15:43 deployment.0
lrwxrwxrwx 1 oneadmin oneadmin     29 May 23 12:57 disk.0 -> /dev/zvol/zfs4one/one-12-10-0
-rw-r--r-- 1 oneadmin oneadmin 372736 May 23 15:43 disk.1

The deploy script exists:

[root@minione01 ~]# ls -al /var/tmp/one/vmm/kvm/deploy
-rwxr-xr-x 1 oneadmin oneadmin 2223 May 14 06:42 /var/tmp/one/vmm/kvm/deploy

Questions

Basically, I do not understand the error message: why does QEMU report that the image is not in qcow2 format?


Thanks in advance for any help.

Corrado Fiore

qemu-kvm: -drive file=/var/lib/one//datastores/0/10/disk.0,format=qcow2,if=none,id=drive-virtio-disk0,cache=none: Image is not in qcow2 format

Is the disk actually in qcow2 format? You can check with:

file -s /dev/zvol/zfs4one/one-12-10-0

Most probably it is raw…
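`file -s` recognises qcow2 by its 4-byte header magic `QFI\xfb`, and `qemu-img info` on the zvol gives the same answer with more detail. For a scripted check, here is a minimal bash sketch of the magic-byte test; `disk_format` is a hypothetical helper name, and it only separates qcow2 from everything else rather than positively identifying raw. Reading /dev/zvol devices normally requires root.

```bash
#!/usr/bin/env bash
# Report "qcow2" if a file or block device starts with the qcow2 magic
# bytes "QFI\xfb" (hex 514649fb), otherwise "not-qcow2".
# disk_format is a hypothetical helper name.
disk_format() {
    local magic
    # -r follows symlinks such as disk.0 -> /dev/zvol/...
    [ -r "$1" ] || return 1
    magic=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "514649fb" ]; then
        echo "qcow2"
    else
        echo "not-qcow2"
    fi
}
```

If `disk_format /var/lib/one//datastores/0/10/disk.0` reports `not-qcow2`, the `format=qcow2` in the QEMU command line most likely comes from the image’s `DRIVER="qcow2"` attribute shown by `oneimage show 12`, which would then not match what is actually stored on the zvol.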

Best Regards,
Anton Todorov