[solved] GlusterFS problem in OpenNebula 4.14.2-2 - unable to boot/deploy VMs

Hello,

I recently upgraded my system, which uses GlusterFS gfapi to deploy VMs, and now none of the hypervisors can boot, deploy, or migrate VMs.

Any suggestions as to where the problem might be?

Fri Nov 25 11:59:44 2016 [Z0][VM][I]: New state is ACTIVE
Fri Nov 25 11:59:44 2016 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Fri Nov 25 11:59:44 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/67/deployment.5
Fri Nov 25 11:59:44 2016 [Z0][VMM][I]: ExitCode: 0
Fri Nov 25 11:59:44 2016 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/112/67/deployment.5' 'node2[reddacted]' 67 node2[reddacted]
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/112/67/deployment.5
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: error: internal error: process exited while connecting to monitor: [2016-11-25 10:59:44.744034] I [MSGID: 104045] [glfs-master.c:96:notify] 0-gfapi: New graph 6e6f6465-322d-3139-3734-382d32303136 (0) coming up
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: [2016-11-25 10:59:44.744056] I [MSGID: 114020] [client.c:2113:notify] 0-vmvol-client-0: parent translators are ready, attempting connect on transport
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: [2016-11-25 10:59:44.744323] I [MSGID: 114020] [client.c:2113:notify] 0-vmvol-client-1: parent translators are ready, attempting connect on transport
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: [2016-11-25 10:59:44.744511] I [MSGID: 114020] [client.c:2113:notify] 0-vmvol-client-2: parent translators are ready, attempting connect on transport
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: [2016-11-25 10:59:44.744608] I [rpc-clnt.c:1960:rpc_clnt_reconfig] 0-vmvol-client-1: changing port to 49152 (from 0)
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: [2016-11-25 10:59:44.744709] I [rpc-clnt.c:1960:rpc_clnt_reconfig] 0-vmvol-client-0: changing port to 49152 (from 0)
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: [2016-11-25 10:59:44.744957] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vmvol-client-1: Using Program GlusterFS 3.3
Fri Nov 25 11:59:55 2016 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/112/67/deployment.5
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: ExitCode: 255
Fri Nov 25 11:59:55 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Nov 25 11:59:55 2016 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/112/67/deployment.5
Fri Nov 25 11:59:55 2016 [Z0][VM][I]: New state is POWEROFF
Fri Nov 25 11:59:55 2016 [Z0][VM][I]: New LCM state is LCM_INIT

I can also see this problem in the Gluster logs while trying to boot:

[2016-11-25 11:04:27.969949] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-vmvol-server: accepted client from node2-23889-2016/11/25-11:04:27:972079-vmvol-client-0-0-0 (version: 3.7.17)
[2016-11-25 11:04:37.979296] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-vmvol-server: disconnecting connection from node2-23889-2016/11/25-11:04:27:972079-vmvol-client-0-0-0
[2016-11-25 11:04:37.979328] W [inodelk.c:404:pl_inodelk_log_cleanup] 0-vmvol-server: releasing lock on 731a2b65-8f11-4012-a09e-74320aff3489 held by {client=0x7fb1e80ee7c0, pid=-6 lk-owner=c41d9978347f0000}
[2016-11-25 11:04:37.979337] W [inodelk.c:404:pl_inodelk_log_cleanup] 0-vmvol-server: releasing lock on 731a2b65-8f11-4012-a09e-74320aff3489 held by {client=0x7fb1e80ee7c0, pid=-6 lk-owner=c41d9978347f0000}
[2016-11-25 11:04:37.979345] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-vmvol-server: fd cleanup on /67/disk.1
[2016-11-25 11:04:38.746825] I [socket.c:3423:socket_submit_reply] 0-tcp.vmvol-server: not connected (priv->connected = -1)
[2016-11-25 11:04:38.754732] E [rpcsvc.c:1329:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2e6d, Program: GlusterFS 3.3, ProgVers: 330, Proc: 37) to rpc-transport (tcp.vmvol-server)
[2016-11-25 11:04:38.754866] E [server.c:205:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_rchecksum_cbk+0x127) [0x7fb1f55ed6b7] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.17/xlator/protocol/server.so(server_rchecksum_cbk+0xa9) [0x7fb1ed160f39] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.17/xlator/protocol/server.so(server_submit_reply+0x2ce) [0x7fb1ed1577be] ) 0-: Reply submission failed
[2016-11-25 11:04:38.754887] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-vmvol-server: Shutting down connection node2-23889-2016/11/25-11:04:27:972079-vmvol-client-0-0-0 

Moreover, I found this in the libvirt logs:

qemu-system-x86_64: -drive file=gluster://node1:24007/vmvol/67/disk.1,if=none,id=drive-ide0-0-0,format=qcow2,cache=none: could not open disk image gluster://node1:24007/vmvol/67/disk.1: Could not read L1 table: Bad file descriptor
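
One way to narrow this down might be to open the same image with qemu-img directly, outside OpenNebula and libvirt. The following is only a sketch: it assumes qemu-img on the node is built with Gluster support, and the helper names (gluster_url, qemu_img_info_cmd, probe_image) are made up for illustration. The host, volume, and path come from the error above.

```python
import subprocess


def gluster_url(host, volume, image_path, port=24007):
    """Build a gluster:// URL in the form QEMU printed in the error above."""
    return f"gluster://{host}:{port}/{volume}/{image_path.lstrip('/')}"


def qemu_img_info_cmd(host, volume, image_path, port=24007):
    """Return the argv for `qemu-img info` against a Gluster-hosted image."""
    return ["qemu-img", "info", gluster_url(host, volume, image_path, port)]


def probe_image(host, volume, image_path, port=24007):
    """Run qemu-img info and return (exit code, stdout, stderr).

    Assumption: qemu-img is installed on the node and links against libgfapi.
    """
    result = subprocess.run(
        qemu_img_info_cmd(host, volume, image_path, port),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

Calling probe_image("node1", "vmvol", "67/disk.1") on the hypervisor would show whether the failure is reproducible without libvirt; a non-zero exit code with a similar message would point at QEMU's Gluster access rather than at OpenNebula.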

Regards,
Martin

Hi,
I had some trouble too, but I resolved it by "downgrading" the glusterfs version to 3.8.4 (on the node), and everything runs fine.
Sorry for my English ;)

Gluster itself is running OK; mounted with FUSE, everything looks fine. The problem is probably in QEMU when it accesses Gluster through gfapi.

These are our current versions (Qemu is from https://launchpad.net/~monotek/+archive/ubuntu/qemu-glusterfs-3.7 ):
ii glusterfs-client 3.7.17-ubuntu1~trusty2 amd64 clustered file-system (client package)
ii glusterfs-common 3.7.17-ubuntu1~trusty2 amd64 GlusterFS common libraries and translator modules
ii glusterfs-server 3.7.17-ubuntu1~trusty2 amd64 clustered file-system (server package)
ii qemu-keymaps 2.0.0+dfsg-2ubuntu1.28glusterfs3.7.17trusty1 all QEMU keyboard maps
ii qemu-kvm 2.0.0+dfsg-2ubuntu1.28glusterfs3.7.17trusty1 amd64 QEMU Full virtualization
ii qemu-system-common 2.0.0+dfsg-2ubuntu1.28glusterfs3.7.17trusty1 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-x86 2.0.0+dfsg-2ubuntu1.28glusterfs3.7.17trusty1 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 2.0.0+dfsg-2ubuntu1.28glusterfs3.7.17trusty1 amd64 QEMU utilities

As far as I know, there is no Gluster 3.8 package available for Ubuntu 14.04.
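
Since the QEMU packages from that PPA carry the Gluster version they were built for in their version string (for example "glusterfs3.7.17"), a quick sanity check is to compare that against the installed glusterfs-common version. Below is a rough sketch that parses dpkg-style lines like the ones above; the function names are hypothetical and the version-string convention is an assumption based only on this listing. Here the versions already match, so this only rules out one possible cause.

```python
import re


def parse_dpkg_lines(text):
    """Return {package: version} for dpkg-style lines that start with 'ii'."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


def gluster_mismatches(text):
    """Return {package: embedded gluster version} for packages whose version
    string embeds a 'glusterfsX.Y.Z' tag that differs from glusterfs-common.

    Assumption: the tag format is the one seen in this listing.
    """
    versions = parse_dpkg_lines(text)
    installed = re.match(r"\d+\.\d+\.\d+", versions["glusterfs-common"]).group(0)
    mismatches = {}
    for package, version in versions.items():
        tag = re.search(r"glusterfs(\d+\.\d+\.\d+)", version)
        if tag and tag.group(1) != installed:
            mismatches[package] = tag.group(1)
    return mismatches
```

Feeding it the listing above returns an empty dict, which is consistent with the packages being built for the same Gluster release that is installed.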

Hi,
I have CentOS 7.
Anyway, I suggest downgrading the glusterfs version on the clients.
On my system this did the trick, and everything runs.

I’ve solved the problem by upgrading Gluster to these versions on Ubuntu 14:

glusterfs-client - 3.8.6-ubuntu1~trusty1
glusterfs-common - 3.8.6-ubuntu1~trusty1
glusterfs-server - 3.8.6-ubuntu1~trusty1
qemu-keymaps - 2.0.0+dfsg-2ubuntu1.30glusterfs3.8.6trusty1
qemu-kvm - 2.0.0+dfsg-2ubuntu1.30glusterfs3.8.6trusty1
qemu-system-common - 2.0.0+dfsg-2ubuntu1.30glusterfs3.8.6trusty1
qemu-system-x86 - 2.0.0+dfsg-2ubuntu1.30glusterfs3.8.6trusty1
qemu-utils - 2.0.0+dfsg-2ubuntu1.30glusterfs3.8.6trusty1

Gluster now works fine, even with QEMU using libgfapi.