DRBD datastore live migration fails

one · January 30, 2018, 2:44pm

I am using Debian 9, OpenNebula 5.4.6, and drbdadm 9.2.0 with the DRBD driver from GitHub. Everything else works correctly: I can download images and start VMs on the DRBD datastore. But when I try to migrate a VM (live or non-live), it fails. Here is the output:

onedatastore show 107:

DATASTORE 107 INFORMATION                                                       
ID             : 107                 
NAME           : drbdmanage_redundant
USER           : oneadmin            
GROUP          : oneadmin            
CLUSTERS       : 0,100               
TYPE           : IMAGE               
DS_MAD         : drbdmanage          
TM_MAD         : drbdmanage          
BASE PATH      : /var/lib/one//datastores/107
DISK_TYPE      : FILE                
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 3.8T                
FREE:          : 3.7T                
USED:          : 0M                  
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="NO"
BRIDGE_LIST="virt1 virt2"
CLONE_TARGET="SELF"
DISK_TYPE="FILE"
DRBD_REDUNDANCY="2"
DRBD_SUPPORT_LIVE_MIGRATION="yes"
DS_MAD="drbdmanage"
LN_TARGET="NONE"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
TM_MAD="drbdmanage"

IMAGES         
32

Log for the live migration:

Tue Jan 30 15:14:39 2018 [Z0][VM][I]: New LCM state is RUNNING
Tue Jan 30 15:19:25 2018 [Z0][VM][I]: New LCM state is MIGRATE
Tue Jan 30 15:19:25 2018 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Tue Jan 30 15:19:25 2018 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jan 30 15:19:26 2018 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/migrate 'one-38' 'virt1' 'virt2' 38 virt2
Tue Jan 30 15:19:26 2018 [Z0][VMM][E]: migrate: Command "virsh --connect qemu:///system migrate --live one-38 qemu+ssh://virt1/system" failed: error: Cannot access storage file '/var/lib/one//datastores/0/38/disk.1' (as uid:9869, gid:9869): No such file or directory
Tue Jan 30 15:19:26 2018 [Z0][VMM][E]: Could not migrate one-38 to virt1
Tue Jan 30 15:19:26 2018 [Z0][VMM][I]: ExitCode: 1
Tue Jan 30 15:19:26 2018 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_failmigrate.
Tue Jan 30 15:19:26 2018 [Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Tue Jan 30 15:19:26 2018 [Z0][VMM][E]: Error live migrating VM: Could not migrate one-38 to virt1
Tue Jan 30 15:19:26 2018 [Z0][VM][I]: New LCM state is RUNNING
Tue Jan 30 15:19:26 2018 [Z0][LCM][I]: Fail to live migrate VM. Assuming that the VM is still RUNNING (will poll VM).

Log for the non-live migration:

Tue Jan 30 15:21:10 2018 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Tue Jan 30 15:21:12 2018 [Z0][VMM][I]: /var/tmp/one/vmm/kvm/save: line 58: warning: command substitution: ignored null byte in input
Tue Jan 30 15:21:12 2018 [Z0][VMM][I]: ExitCode: 0
Tue Jan 30 15:21:12 2018 [Z0][VMM][I]: Successfully execute virtualization driver operation: save.
Tue Jan 30 15:21:12 2018 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue Jan 30 15:21:12 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE
Tue Jan 30 15:21:23 2018 [Z0][VM][I]: New LCM state is BOOT_MIGRATE
Tue Jan 30 15:21:23 2018 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Tue Jan 30 15:21:23 2018 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jan 30 15:21:24 2018 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/restore '/var/lib/one//datastores/0/38/checkpoint' 'virt1' 'one-38' 38 virt1
Tue Jan 30 15:21:24 2018 [Z0][VMM][I]: /var/tmp/one/vmm/kvm/restore: line 43: warning: command substitution: ignored null byte in input
Tue Jan 30 15:21:24 2018 [Z0][VMM][E]: restore: Command "virsh --connect qemu:///system restore /var/lib/one//datastores/0/38/checkpoint --xml /var/lib/one//datastores/0/38/checkpoint.xml" failed: error: Failed to restore domain from /var/lib/one//datastores/0/38/checkpoint
Tue Jan 30 15:21:24 2018 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/38/disk.0' (as uid:9869, gid:9869): No such file or directory
Tue Jan 30 15:21:24 2018 [Z0][VMM][E]: Could not restore from /var/lib/one//datastores/0/38/checkpoint
Tue Jan 30 15:21:24 2018 [Z0][VMM][I]: ExitCode: 1
Tue Jan 30 15:21:24 2018 [Z0][VMM][I]: Failed to execute virtualization driver operation: restore.
Tue Jan 30 15:21:24 2018 [Z0][VMM][E]: Error restoring VM: Could not restore from /var/lib/one//datastores/0/38/checkpoint
Tue Jan 30 15:21:24 2018 [Z0][VM][I]: New LCM state is BOOT_MIGRATE_FAILURE

After the failed live migration, I inspected the destination server, and the directory “/var/lib/one/datastores/0/38” did not exist.

After the non-live migration, I inspected the destination server again. This time the directory “/var/lib/one/datastores/0/38” was created, and it contains several files, including disk.1:

-rw-r--r-- 1 oneadmin oneadmin 185040260 janv. 30 15:21 checkpoint
-rw-r--r-- 1 oneadmin oneadmin      2119 janv. 30 15:21 checkpoint.xml
-rw-r--r-- 1 oneadmin oneadmin       862 janv. 30 15:14 deployment.0
-rw-r--r-- 1 oneadmin oneadmin    372736 janv. 30 15:21 disk.1

However, the “disk.0” symlink to the DRBD device is not created.
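A small script can repeat the inspection above on any host. This is a hypothetical Python sketch, not an OpenNebula tool: check_vm_dir is an invented helper, and the default path /var/lib/one/datastores/0/38 is taken from the logs (system datastore 0, VM 38). It lists the disk.N entries of a VM directory and tells regular files apart from symlinks whose target (for example a DRBD device) exists or is missing.

```python
import os
import sys


def check_vm_dir(vm_dir, expected=()):
    """Classify each disk.N entry of a VM directory in the system datastore.

    `expected` lists disk names that should exist (e.g. "disk.0") so that
    entries that were never created are reported too. Possible states:
      "missing"        - no entry at all (not even a dangling link)
      "symlink ok"     - symlink whose target exists
      "symlink broken" - symlink whose target does not exist
      "present"        - regular file (or anything that is not a symlink)
    """
    names = set(expected)
    if os.path.isdir(vm_dir):
        names.update(n for n in os.listdir(vm_dir) if n.startswith("disk."))

    report = {}
    for name in sorted(names):
        path = os.path.join(vm_dir, name)
        if not os.path.lexists(path):
            report[name] = "missing"
        elif os.path.islink(path):
            # os.path.exists() follows the link, so False means it is dangling
            report[name] = "symlink ok" if os.path.exists(path) else "symlink broken"
        else:
            report[name] = "present"
    return report


if __name__ == "__main__":
    # Default path assumed from the logs above: system datastore 0, VM 38
    vm_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/one/datastores/0/38"
    for disk, state in check_vm_dir(vm_dir, expected=["disk.0"]).items():
        print(f"{disk}: {state}")
```

Run it as oneadmin on the destination host, optionally passing another VM directory as the first argument. In the non-live case described above, it would presumably report disk.0 as missing and disk.1 as present.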

Can you provide any insight?

Thank you in advance.

one · February 5, 2018, 1:04pm

I have solved this issue myself. The problem was that the system datastore was not shared. To make it shared, I followed these steps:

1. Export /var/lib/one as an NFS share on the controller.
2. Mount the /var/lib/one share from the controller on all nodes.
3. Delete and recreate the system datastore with the “Filesystem shared” backend mode.

Migration and live migration then work, because all system datastore disk links are present on all nodes.
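Since the fix depends on every node seeing the same /var/lib/one over NFS, it may help to confirm the mount on each node before migrating. The Python sketch below is an illustration, not part of OpenNebula: it assumes a Linux host exposing /proc/mounts, treats only the nfs and nfs4 filesystem types as NFS, and the helper names mount_for and is_on_nfs are hypothetical. The checked path reuses datastore ID 0 from the logs; the recreated system datastore may have a different ID.

```python
import os


def mount_for(path, mounts_text):
    """Return (device, mountpoint, fstype) of the mount containing `path`.

    `mounts_text` uses the /proc/mounts layout: whitespace-separated
    device, mountpoint, fstype, options, dump, pass. Octal escapes such
    as \\040 (space) in mount points are not decoded here.
    """
    path = os.path.normpath(path)  # collapses "one//datastores" seen in the logs
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[:3]
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # The longest matching mount point wins (most specific mount)
            if best is None or len(mountpoint) > len(best[1]):
                best = (device, mountpoint, fstype)
    return best


def is_on_nfs(path, mounts_text):
    """True if the mount containing `path` has fstype nfs or nfs4."""
    entry = mount_for(path, mounts_text)
    return entry is not None and entry[2] in ("nfs", "nfs4")


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    # Assumed path: adjust the datastore ID if the recreated one differs
    ds_path = "/var/lib/one/datastores/0"
    print(ds_path, "on NFS:", is_on_nfs(ds_path, mounts), mount_for(ds_path, mounts))
```

Picking the longest matching mount point matters here: every path is also under "/", so without it a node where the NFS mount is missing and one where it is present could look the same.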

I hope this helps others.
