I just want to sit down in a corner and cry. My afternoon went like this:
0 - I kept working on the same two instances, both started from the same template: 93 (persistent, working) and 94 (ephemeral, not starting).
1 - I started by running diff /var/lib/one/datastores/100/94/deployment.0 /var/lib/one/datastores/100/93/deployment.0 and confirmed that the only relevant differences were:
< <source file='/var/lib/one/datastores/100/94/disk.0'/>
---
> <source file='/var/lib/one/datastores/100/93/disk.0'/>
22c22
< <source file='/var/lib/one/datastores/100/94/disk.1'/>
---
> <source file='/var/lib/one/datastores/100/93/disk.1'/>
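To repeat this comparison without eyeballing every path, the VM directory can be masked before diffing. A minimal sketch, assuming the /var/lib/one/datastores/<datastore>/<vm id>/ layout seen above; diff_deployments is a hypothetical helper, not an OpenNebula tool:

```sh
# Hypothetical helper (not part of OpenNebula): diff two deployment files
# after replacing the VM ID in every /datastores/<ds>/<vm id>/ path with
# "VMID", so lines that differ only by VM ID drop out of the output.
# Exit status is diff's: 0 = no remaining differences, 1 = differences.
diff_deployments() {
  local mask tmp rc
  mask='s#/datastores/\([0-9][0-9]*\)/[0-9][0-9]*/#/datastores/\1/VMID/#g'
  tmp=$(mktemp) || return 2
  sed "$mask" "$2" > "$tmp"          # normalized copy of the second file
  sed "$mask" "$1" | diff - "$tmp"   # normalize the first file and diff it
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, diff_deployments /var/lib/one/datastores/100/94/deployment.0 /var/lib/one/datastores/100/93/deployment.0 prints only the lines that still differ once the VM ID in datastore paths is hidden; any other per-VM values in the files will still show up.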
2 - Next, I followed the full path from each disk.0 to the two actual qcow2 files. This is 94:
root@za:~# ls -lha /var/lib/one/datastores/100/94/
total 11K
drwxr-xr-x 3 oneadmin oneadmin 6 Mar 14 11:13 .
drwxr-xr-x 12 oneadmin oneadmin 12 Mar 14 11:13 ..
-rw-r--r-- 1 oneadmin oneadmin 1.4K Mar 14 11:13 deployment.0
lrwxrwxrwx 1 oneadmin oneadmin 13 Mar 14 11:13 disk.0 -> disk.0.snap/0
drwxr-xr-x 2 oneadmin oneadmin 4 Mar 14 11:13 disk.0.snap
-rw-r--r-- 1 oneadmin oneadmin 364K Mar 14 11:13 disk.1
root@za:~# ls -lha /var/lib/one/datastores/100/94/disk.0.snap/
total 7.0K
drwxr-xr-x 2 oneadmin oneadmin 4 Mar 14 11:13 .
drwxr-xr-x 3 oneadmin oneadmin 6 Mar 14 11:13 ..
-rw-r--r-- 1 oneadmin oneadmin 193K Mar 14 11:13 0
lrwxrwxrwx 1 oneadmin oneadmin 1 Mar 14 11:13 disk.0.snap -> .
And this is 93:
root@za:~# ls -lha /var/lib/one/datastores/100/93/
total 11K
drwxr-xr-x 2 oneadmin oneadmin 6 Mar 14 11:13 .
drwxr-xr-x 12 oneadmin oneadmin 12 Mar 14 11:13 ..
-rw-r--r-- 1 oneadmin oneadmin 1.4K Mar 14 11:13 deployment.0
lrwxrwxrwx 1 oneadmin oneadmin 67 Mar 14 11:13 disk.0 -> /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073.snap/0
lrwxrwxrwx 1 oneadmin oneadmin 65 Mar 14 11:13 disk.0.snap -> /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073.snap
-rw-r--r-- 1 oneadmin oneadmin 364K Mar 14 11:13 disk.1
root@za:~# ls -lha /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073.snap/0
lrwxrwxrwx 1 oneadmin oneadmin 60 Mar 14 11:13 /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073.snap/0 -> /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073
root@za:~# ls -lha /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073
-rw-r--r-- 1 oneadmin oneadmin 404M Mar 19 16:25 /var/lib/one/datastores/103/0bc61cd241e410f1adbad737d908f073
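Walking the chain by hand with ls works, but the same check can be scripted so that every hop shows its mode and owner. A minimal sketch; walk_links is a hypothetical helper, and it only follows the final path component, so the directories along the way are not listed (namei -l from util-linux prints those as well):

```sh
# Hypothetical helper: follow a symlink one hop at a time and print
# `ls -ld` for every link in the chain and for the file it ends on.
# Only the last path component is followed; parent directories are not listed.
walk_links() {
  local path="$1" target hops=0
  while [ -L "$path" ]; do
    ls -ld "$path"
    target=$(readlink "$path")
    case "$target" in
      /*) path="$target" ;;                    # absolute target: use as-is
      *)  path="$(dirname "$path")/$target" ;; # relative: resolve from the link's directory
    esac
    hops=$((hops + 1))
    if [ "$hops" -gt 40 ]; then
      echo "walk_links: too many levels of symbolic links" >&2
      return 1
    fi
  done
  ls -ld "$path"                               # the regular file at the end of the chain
}
```

For 93, walk_links /var/lib/one/datastores/100/93/disk.0 should list disk.0, then the .snap/0 link in datastore 103, then the 404M image it points to.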
3 - Permissions looked fine from top to bottom, so I double-checked the image status:
root@za:~# qemu-img info /var/lib/one/datastores/100/93/disk.0
image: /var/lib/one/datastores/100/93/disk.0
file format: qcow2
virtual size: 2.2G (2361393152 bytes)
disk size: 353M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
root@za:~# qemu-img info /var/lib/one/datastores/100/94/disk.0
image: /var/lib/one/datastores/100/94/disk.0
file format: qcow2
virtual size: 2.2G (2361393152 bytes)
disk size: 4.5K
cluster_size: 65536
backing file: /var/lib/one/datastores/103/2653eac24c15f35c043e7bcde31c7a51
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
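One thing the ls walk in step 2 does not reach: for 94, disk.0 is a small qcow2 overlay (disk size 4.5K), and most of the guest data lives in its backing file /var/lib/one/datastores/103/2653eac24c15f35c043e7bcde31c7a51, which the hypervisor also has to be able to read. qemu-img info --backing-chain reports every layer, so a small sketch can list the permissions of each one. backing_files and show_chain_perms are hypothetical helpers, and the parsing assumes qemu-img's human-readable format ("backing file: PATH", optionally followed by " (actual path: PATH)"), which may vary between versions:

```sh
# Hypothetical helper: read `qemu-img info` text on stdin and print every
# backing file it mentions, preferring the "(actual path: ...)" form when present.
backing_files() {
  sed -n '/^backing file: /{
s/^backing file: //
s/^.* (actual path: \(.*\))$/\1/
p
}'
}

# Hypothetical helper: show mode and owner of a disk (following symlinks)
# and of every image in its backing chain, as reported by qemu-img.
show_chain_perms() {
  ls -ldL "$1"
  qemu-img info --backing-chain "$1" | backing_files | while IFS= read -r f; do
    ls -ldL "$f"
  done
}
```

For example: show_chain_perms /var/lib/one/datastores/100/94/disk.0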
4 - At this point I spent hours fiddling with every kind of permission. Out of desperation, I changed the ownership of the entire chain of files to a temporary user, updated qemu.conf, and rebooted.
5 - At this point things were way more broken than before.
6 - I restored the original qemu.conf (effectively rolling my machine back to its initial state) and rebooted again.
Everything started working, and I have no explanation whatsoever for it. I checked all the permissions against my terminal log and restored them exactly as they were. The only .conf file I touched was restored from a local backup, so it is identical.
The only thing that actually happened is a reboot.
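Checking permissions against a terminal log works, but a snapshot taken before and after each change makes the comparison mechanical. A minimal sketch, assuming GNU find (for -printf); perm_snapshot is a hypothetical helper:

```sh
# Hypothetical helper: print octal mode, owner:group and path of every entry
# under the given paths, sorted so two snapshots can be compared with diff.
# Requires GNU find (-printf); symlinks are listed themselves, not followed.
perm_snapshot() {
  find "$@" -printf '%m %u:%g %p\n' | sort
}
```

For example, run perm_snapshot /var/lib/one/datastores/100/94 /var/lib/one/datastores/103 > before.txt before touching anything, the same command into after.txt once everything is restored, then diff before.txt after.txt; no output means every mode, owner and group is back where it was.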
Well, I guess all’s well that ends well. Thanks a lot for your support.