Problem deploying a VM on a Ceph datastore

Hello all, I am pretty new to OpenNebula, but I have inherited a misbehaving KVM-based OpenNebula cluster with a Ceph datastore (Ubuntu 14.04, OpenNebula 4.8.0). I have compared the setup against the how-to documents; the implementation does not adhere strictly to them, but it was working until recently and nothing has been changed manually.
I have been browsing the forum but cannot find an answer to my problem, which is: when trying to instantiate a VM on the KVM/Ceph cluster, I hit the following errors in the log:

Tue Mar  8 10:55:15 2016 [Z0][TM][D]: Message received: TRANSFER SUCCESS 1667 -
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 ExitCode: 0
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 Successfully execute network driver operation: pre.
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/108/1667/deployment.0' 'fras004' 1667 fras004
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 error: Failed to create domain from /var/lib/one//datastores/108/1667/deployment.0
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 error: Failed to open file '/var/lib/one//datastores/108/1667/disk.1': No such file or directory
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG E 1667 Could not create domain from /var/lib/one//datastores/108/1667/deployment.0
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 ExitCode: 255
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: LOG I 1667 Failed to execute virtualization driver operation: deploy.
Tue Mar  8 10:55:15 2016 [Z0][VMM][D]: Message received: DEPLOY FAILURE 1667 Could not create domain from /var/lib/one//datastores/108/1667/deployment.0

The VM log says pretty much the same:

Tue Mar  8 10:55:03 2016 [Z0][DiM][I]: New VM state is ACTIVE.
Tue Mar  8 10:55:03 2016 [Z0][LCM][I]: New VM state is PROLOG.
Tue Mar  8 10:55:15 2016 [Z0][LCM][I]: New VM state is BOOT
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1667/deployment.0
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: ExitCode: 0
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/108/1667/deployment.0' 'fras004' 1667 fras004
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/108/1667/deployment.0
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: error: Failed to open file '/var/lib/one//datastores/108/1667/disk.1': No such file or directory
Tue Mar  8 10:55:15 2016 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/108/1667/deployment.0
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: ExitCode: 255
Tue Mar  8 10:55:15 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Tue Mar  8 10:55:15 2016 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/108/1667/deployment.0
Tue Mar  8 10:55:15 2016 [Z0][DiM][I]: New VM state is FAILED

On the hypervisor a directory is created containing only deployment.0 and a disk.1.iso symlink pointing to a non-existent file.
I can list rbd images from both the controller and the hypervisor.
The KVM log of the VM, /var/log/libvirt/qemu/one-1667.log, contains only:
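As a sanity check of the Ceph side, a rough sketch of the commands involved (the pool name "one" is taken from the datastore template later in this thread; the image name is just illustrative):

```shell
# List RBD images in the "one" pool (the POOL_NAME of the image datastore).
rbd ls -p one

# Inspect an image a VM disk points at, e.g. the SOURCE one/one-12.
rbd info one/one-12
```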

2016-03-08 09:55:15.982+0000: shutting down

If I run virsh --connect qemu:///system create /var/lib/one/datastores/108/1667/deployment.0 on the hypervisor, I get the same errors as before:

error: Failed to create domain from /var/lib/one/datastores/108/1667/deployment.0
error: Failed to open file '/var/lib/one//datastores/108/1667/disk.1': No such file or directory
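To see which paths libvirt will try to open, one can inspect the generated deployment file on the hypervisor; a minimal sketch, with paths taken from the logs above:

```shell
# Show the disk <source> elements in the libvirt XML OpenNebula generated.
grep "source" /var/lib/one/datastores/108/1667/deployment.0

# Compare against what was actually transferred into the VM directory.
ls -l /var/lib/one/datastores/108/1667/
```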

As said, the disk.1 file is never transferred into the directory, but I cannot find the reason why.
oned.log, syslog, and dmesg provide no further insight into the problem.
Any idea where I can look for more detail on the error, or how I can proceed with solving it?

Thank you in advance,

Can you share the output of onevm show -x <vmid> as well as onedatastore list -x?

Of course, here is one of the failed VMs:

<VM>
  <ID>763</ID>
  <UID>3</UID>
  <GID>0</GID>
  <UNAME>andkempe</UNAME>
  <GNAME>oneadmin</GNAME>
  <NAME>FRALS02-LDAP-Sync</NAME>
  <PERMISSIONS>
    <OWNER_U>1</OWNER_U>
    <OWNER_M>1</OWNER_M>
    <OWNER_A>0</OWNER_A>
    <GROUP_U>0</GROUP_U>
    <GROUP_M>0</GROUP_M>
    <GROUP_A>0</GROUP_A>
    <OTHER_U>0</OTHER_U>
    <OTHER_M>0</OTHER_M>
    <OTHER_A>0</OTHER_A>
  </PERMISSIONS>
  <LAST_POLL>1457510181</LAST_POLL>
  <STATE>8</STATE>
  <LCM_STATE>0</LCM_STATE>
  <RESCHED>0</RESCHED>
  <STIME>1422600128</STIME>
  <ETIME>0</ETIME>
  <DEPLOY_ID>one-763</DEPLOY_ID>
  <MEMORY>0</MEMORY>
  <CPU>0</CPU>
  <NET_TX>140827265</NET_TX>
  <NET_RX>7795991192</NET_RX>
  <TEMPLATE>
    <AUTOMATIC_REQUIREMENTS><![CDATA[!(PUBLIC_CLOUD = YES)]]></AUTOMATIC_REQUIREMENTS>
    <CONTEXT>
      <CEPH_CONTEXT><![CDATA[true]]></CEPH_CONTEXT>
      <DISK_ID><![CDATA[1]]></DISK_ID>
      <ETH0_DNS><![CDATA[10.160.162.254]]></ETH0_DNS>
      <ETH0_GATEWAY><![CDATA[10.160.162.254]]></ETH0_GATEWAY>
      <ETH0_IP><![CDATA[10.160.162.60]]></ETH0_IP>
      <ETH0_MAC><![CDATA[02:00:0a:a0:a2:3c]]></ETH0_MAC>
      <ETH0_MASK><![CDATA[255.255.255.0]]></ETH0_MASK>
      <ETH0_NETWORK><![CDATA[10.160.162.0]]></ETH0_NETWORK>
      <ETH1_DNS><![CDATA[10.160.180.254]]></ETH1_DNS>
      <ETH1_GATEWAY><![CDATA[10.160.180.254]]></ETH1_GATEWAY>
      <ETH1_IP><![CDATA[10.160.180.65]]></ETH1_IP>
      <ETH1_MAC><![CDATA[02:00:0a:a0:b4:41]]></ETH1_MAC>
      <ETH1_MASK><![CDATA[255.255.255.0]]></ETH1_MASK>
      <ETH1_NETWORK><![CDATA[10.160.180.0]]></ETH1_NETWORK>
      <ETH2_DNS><![CDATA[10.160.110.254]]></ETH2_DNS>
      <ETH2_GATEWAY><![CDATA[10.160.110.254]]></ETH2_GATEWAY>
      <ETH2_IP><![CDATA[10.160.110.4]]></ETH2_IP>
      <ETH2_MAC><![CDATA[02:00:0a:a0:6e:04]]></ETH2_MAC>
      <ETH2_MASK><![CDATA[255.255.255.0]]></ETH2_MASK>
      <ETH2_NETWORK><![CDATA[10.160.110.0]]></ETH2_NETWORK>
      <NETWORK><![CDATA[YES]]></NETWORK>
      <SET_HOSTNAME><![CDATA[FRALS02]]></SET_HOSTNAME>
      <TARGET><![CDATA[hda]]></TARGET>
    </CONTEXT>
    <CPU><![CDATA[0.4]]></CPU>
    <DISK>
      <CACHE><![CDATA[writeback]]></CACHE>
      <CEPH_HOST><![CDATA[FRADS001]]></CEPH_HOST>
      <CLONE><![CDATA[YES]]></CLONE>
      <CLONE_TARGET><![CDATA[SELF]]></CLONE_TARGET>
      <DATASTORE><![CDATA[ceph]]></DATASTORE>
      <DATASTORE_ID><![CDATA[104]]></DATASTORE_ID>
      <DEV_PREFIX><![CDATA[vd]]></DEV_PREFIX>
      <DISK_ID><![CDATA[0]]></DISK_ID>
      <IMAGE><![CDATA[centos7.0_x86_64_30GB]]></IMAGE>
      <IMAGE_ID><![CDATA[12]]></IMAGE_ID>
      <IMAGE_UNAME><![CDATA[oneadmin]]></IMAGE_UNAME>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <READONLY><![CDATA[NO]]></READONLY>
      <SAVE><![CDATA[NO]]></SAVE>
      <SIZE><![CDATA[30720]]></SIZE>
      <SOURCE><![CDATA[one/one-12]]></SOURCE>
      <TARGET><![CDATA[vda]]></TARGET>
      <TM_MAD><![CDATA[ceph]]></TM_MAD>
      <TYPE><![CDATA[RBD]]></TYPE>
    </DISK>
    <DISK>
      <ATTACH_DISK_SEARCH><![CDATA[FRALS]]></ATTACH_DISK_SEARCH>
      <CEPH_HOST><![CDATA[FRADS001]]></CEPH_HOST>
      <CLONE><![CDATA[YES]]></CLONE>
      <CLONE_TARGET><![CDATA[SELF]]></CLONE_TARGET>
      <DATASTORE><![CDATA[ceph]]></DATASTORE>
      <DATASTORE_ID><![CDATA[104]]></DATASTORE_ID>
      <DEV_PREFIX><![CDATA[vd]]></DEV_PREFIX>
      <DISK_ID><![CDATA[2]]></DISK_ID>
      <DRIVER><![CDATA[raw]]></DRIVER>
      <IMAGE><![CDATA[FRALS01-Log-Disc]]></IMAGE>
      <IMAGE_ID><![CDATA[92]]></IMAGE_ID>
      <IMAGE_UNAME><![CDATA[andkempe]]></IMAGE_UNAME>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <READONLY><![CDATA[NO]]></READONLY>
      <SAVE><![CDATA[NO]]></SAVE>
      <SIZE><![CDATA[30000]]></SIZE>
      <SOURCE><![CDATA[one/one-92]]></SOURCE>
      <TARGET><![CDATA[vdb]]></TARGET>
      <TM_MAD><![CDATA[ceph]]></TM_MAD>
      <TYPE><![CDATA[RBD]]></TYPE>
    </DISK>
    <GRAPHICS>
      <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
      <PORT><![CDATA[6663]]></PORT>
      <TYPE><![CDATA[VNC]]></TYPE>
    </GRAPHICS>
    <MEMORY><![CDATA[8192]]></MEMORY>
    <NIC>
      <AR_ID><![CDATA[0]]></AR_ID>
      <BRIDGE><![CDATA[ovsbr-extern]]></BRIDGE>
      <IP><![CDATA[10.160.162.60]]></IP>
      <MAC><![CDATA[02:00:0a:a0:a2:3c]]></MAC>
      <MODEL><![CDATA[virtio]]></MODEL>
      <NETWORK><![CDATA[mgmt_162]]></NETWORK>
      <NETWORK_ID><![CDATA[1]]></NETWORK_ID>
      <NETWORK_UNAME><![CDATA[oneadmin]]></NETWORK_UNAME>
      <NIC_ID><![CDATA[0]]></NIC_ID>
      <VLAN><![CDATA[YES]]></VLAN>
      <VLAN_ID><![CDATA[162]]></VLAN_ID>
    </NIC>
    <NIC_DEFAULT>
      <MODEL><![CDATA[virtio]]></MODEL>
    </NIC_DEFAULT>
    <TEMPLATE_ID><![CDATA[13]]></TEMPLATE_ID>
    <VCPU><![CDATA[4]]></VCPU>
    <VMID><![CDATA[763]]></VMID>
  </TEMPLATE>
  <USER_TEMPLATE>
    <DESCRIPTION><![CDATA[Centos 7.0 App Server Template]]></DESCRIPTION>
    <ERROR><![CDATA[Wed Mar  9 11:09:58 2016 : Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/108/763/deployment.5]]></ERROR>
    <LOGO><![CDATA[images/logos/centos.png]]></LOGO>
    <SCHED_REQUIREMENTS><![CDATA[CLUSTER_ID="105"]]></SCHED_REQUIREMENTS>
  </USER_TEMPLATE>
  <HISTORY_RECORDS>
    <HISTORY>
      <OID>763</OID>
      <SEQ>0</SEQ>
      <HOSTNAME>fras001</HOSTNAME>
      <HID>11</HID>
      <CID>105</CID>
      <STIME>1422600157</STIME>
      <ETIME>1441197292</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>ovswitch</VNMMAD>
      <TMMAD>ceph</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>108</DS_ID>
      <PSTIME>1422600157</PSTIME>
      <PETIME>1422600161</PETIME>
      <RSTIME>1422600161</RSTIME>
      <RETIME>1441197292</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>2</REASON>
      <ACTION>2</ACTION>
    </HISTORY>
    <HISTORY>
      <OID>763</OID>
      <SEQ>1</SEQ>
      <HOSTNAME>fras003</HOSTNAME>
      <HID>13</HID>
      <CID>105</CID>
      <STIME>1441197243</STIME>
      <ETIME>1444744005</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>ovswitch</VNMMAD>
      <TMMAD>ceph</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>108</DS_ID>
      <PSTIME>0</PSTIME>
      <PETIME>0</PETIME>
      <RSTIME>1441197292</RSTIME>
      <RETIME>1444744005</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>2</REASON>
      <ACTION>2</ACTION>
    </HISTORY>
    <HISTORY>
      <OID>763</OID>
      <SEQ>2</SEQ>
      <HOSTNAME>fras006</HOSTNAME>
      <HID>16</HID>
      <CID>105</CID>
      <STIME>1444743083</STIME>
      <ETIME>1444762874</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>ovswitch</VNMMAD>
      <TMMAD>ceph</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>108</DS_ID>
      <PSTIME>0</PSTIME>
      <PETIME>0</PETIME>
      <RSTIME>1444744005</RSTIME>
      <RETIME>1444762874</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>2</REASON>
      <ACTION>0</ACTION>
    </HISTORY>
    <HISTORY>
      <OID>763</OID>
      <SEQ>3</SEQ>
      <HOSTNAME>fras002</HOSTNAME>
      <HID>12</HID>
      <CID>105</CID>
      <STIME>1444762813</STIME>
      <ETIME>1457510181</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>ovswitch</VNMMAD>
      <TMMAD>ceph</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>108</DS_ID>
      <PSTIME>0</PSTIME>
      <PETIME>0</PETIME>
      <RSTIME>1444762835</RSTIME>
      <RETIME>1457510181</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>2</REASON>
      <ACTION>0</ACTION>
    </HISTORY>
    <HISTORY>
      <OID>763</OID>
      <SEQ>4</SEQ>
      <HOSTNAME>fras002</HOSTNAME>
      <HID>12</HID>
      <CID>105</CID>
      <STIME>1457518161</STIME>
      <ETIME>0</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>ovswitch</VNMMAD>
      <TMMAD>ceph</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>108</DS_ID>
      <PSTIME>0</PSTIME>
      <PETIME>0</PETIME>
      <RSTIME>1457518161</RSTIME>
      <RETIME>0</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>0</REASON>
      <ACTION>0</ACTION>
    </HISTORY>
    <HISTORY>
      <OID>763</OID>
      <SEQ>5</SEQ>
      <HOSTNAME>fras002</HOSTNAME>
      <HID>12</HID>
      <CID>105</CID>
      <STIME>1457518197</STIME>
      <ETIME>0</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>ovswitch</VNMMAD>
      <TMMAD>ceph</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>108</DS_ID>
      <PSTIME>0</PSTIME>
      <PETIME>0</PETIME>
      <RSTIME>1457518197</RSTIME>
      <RETIME>0</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>0</REASON>
      <ACTION>0</ACTION>
    </HISTORY>
  </HISTORY_RECORDS>
</VM>

And the datastores:

<DATASTORE_POOL>
  <DATASTORE>
    <ID>104</ID>
    <UID>0</UID>
    <GID>0</GID>
    <UNAME>oneadmin</UNAME>
    <GNAME>oneadmin</GNAME>
    <NAME>ceph</NAME>
    <PERMISSIONS>
      <OWNER_U>1</OWNER_U>
      <OWNER_M>1</OWNER_M>
      <OWNER_A>0</OWNER_A>
      <GROUP_U>1</GROUP_U>
      <GROUP_M>0</GROUP_M>
      <GROUP_A>0</GROUP_A>
      <OTHER_U>0</OTHER_U>
      <OTHER_M>0</OTHER_M>
      <OTHER_A>0</OTHER_A>
    </PERMISSIONS>
    <DS_MAD><![CDATA[ceph]]></DS_MAD>
    <TM_MAD><![CDATA[ceph]]></TM_MAD>
    <BASE_PATH><![CDATA[/var/lib/one//datastores/104]]></BASE_PATH>
    <TYPE>0</TYPE>
    <DISK_TYPE>3</DISK_TYPE>
    <CLUSTER_ID>-1</CLUSTER_ID>
    <CLUSTER/>
    <TOTAL_MB>202091552</TOTAL_MB>
    <FREE_MB>56442260</FREE_MB>
    <USED_MB>145649296</USED_MB>
    <IMAGES>
      <ID>9</ID>
      <ID>12</ID>
      <ID>14</ID>
      <ID>15</ID>
      <ID>16</ID>
      <ID>17</ID>
      <ID>29</ID>
      <ID>30</ID>
      <ID>31</ID>
      <ID>32</ID>
      <ID>35</ID>
      <ID>36</ID>
      <ID>37</ID>
      <ID>39</ID>
      <ID>42</ID>
      <ID>43</ID>
      <ID>44</ID>
      <ID>45</ID>
      <ID>49</ID>
      <ID>50</ID>
      <ID>53</ID>
      <ID>54</ID>
      <ID>58</ID>
      <ID>61</ID>
      <ID>69</ID>
      <ID>72</ID>
      <ID>75</ID>
      <ID>79</ID>
      <ID>88</ID>
      <ID>89</ID>
      <ID>90</ID>
      <ID>91</ID>
      <ID>92</ID>
      <ID>93</ID>
      <ID>94</ID>
      <ID>95</ID>
      <ID>97</ID>
      <ID>98</ID>
      <ID>99</ID>
      <ID>100</ID>
      <ID>101</ID>
      <ID>102</ID>
      <ID>107</ID>
      <ID>108</ID>
      <ID>109</ID>
      <ID>110</ID>
      <ID>111</ID>
      <ID>113</ID>
      <ID>114</ID>
      <ID>115</ID>
      <ID>116</ID>
      <ID>117</ID>
      <ID>118</ID>
      <ID>120</ID>
      <ID>121</ID>
      <ID>122</ID>
      <ID>124</ID>
      <ID>128</ID>
      <ID>130</ID>
      <ID>134</ID>
      <ID>135</ID>
      <ID>137</ID>
      <ID>138</ID>
      <ID>139</ID>
      <ID>140</ID>
      <ID>142</ID>
      <ID>147</ID>
      <ID>151</ID>
      <ID>152</ID>
      <ID>159</ID>
      <ID>160</ID>
      <ID>161</ID>
      <ID>162</ID>
      <ID>163</ID>
      <ID>174</ID>
      <ID>175</ID>
      <ID>176</ID>
      <ID>177</ID>
      <ID>178</ID>
      <ID>179</ID>
      <ID>182</ID>
      <ID>183</ID>
      <ID>186</ID>
      <ID>187</ID>
      <ID>188</ID>
      <ID>191</ID>
      <ID>192</ID>
      <ID>198</ID>
      <ID>208</ID>
      <ID>209</ID>
      <ID>213</ID>
      <ID>214</ID>
      <ID>215</ID>
      <ID>218</ID>
      <ID>221</ID>
      <ID>222</ID>
      <ID>223</ID>
      <ID>224</ID>
      <ID>225</ID>
      <ID>226</ID>
      <ID>229</ID>
      <ID>235</ID>
      <ID>241</ID>
      <ID>243</ID>
      <ID>244</ID>
      <ID>245</ID>
      <ID>246</ID>
      <ID>247</ID>
      <ID>248</ID>
      <ID>249</ID>
      <ID>250</ID>
      <ID>251</ID>
      <ID>252</ID>
      <ID>253</ID>
      <ID>256</ID>
      <ID>257</ID>
      <ID>258</ID>
      <ID>260</ID>
      <ID>261</ID>
      <ID>262</ID>
      <ID>263</ID>
      <ID>264</ID>
      <ID>265</ID>
      <ID>266</ID>
      <ID>267</ID>
      <ID>268</ID>
      <ID>269</ID>
      <ID>270</ID>
      <ID>271</ID>
      <ID>272</ID>
      <ID>273</ID>
      <ID>274</ID>
      <ID>275</ID>
      <ID>276</ID>
      <ID>277</ID>
      <ID>282</ID>
      <ID>284</ID>
      <ID>285</ID>
      <ID>286</ID>
      <ID>287</ID>
      <ID>288</ID>
      <ID>289</ID>
      <ID>290</ID>
      <ID>291</ID>
      <ID>292</ID>
      <ID>293</ID>
      <ID>296</ID>
      <ID>297</ID>
      <ID>298</ID>
      <ID>299</ID>
      <ID>300</ID>
      <ID>302</ID>
      <ID>303</ID>
      <ID>307</ID>
      <ID>308</ID>
      <ID>309</ID>
      <ID>310</ID>
      <ID>311</ID>
      <ID>312</ID>
      <ID>314</ID>
      <ID>316</ID>
      <ID>320</ID>
      <ID>321</ID>
      <ID>326</ID>
      <ID>330</ID>
      <ID>331</ID>
      <ID>332</ID>
      <ID>333</ID>
      <ID>335</ID>
      <ID>336</ID>
      <ID>339</ID>
      <ID>341</ID>
      <ID>342</ID>
      <ID>343</ID>
      <ID>349</ID>
      <ID>350</ID>
      <ID>351</ID>
      <ID>352</ID>
      <ID>353</ID>
      <ID>359</ID>
      <ID>362</ID>
      <ID>365</ID>
      <ID>366</ID>
      <ID>367</ID>
      <ID>368</ID>
      <ID>369</ID>
      <ID>370</ID>
      <ID>371</ID>
      <ID>372</ID>
      <ID>373</ID>
      <ID>374</ID>
      <ID>375</ID>
      <ID>376</ID>
      <ID>377</ID>
      <ID>378</ID>
      <ID>379</ID>
      <ID>383</ID>
      <ID>384</ID>
      <ID>385</ID>
      <ID>388</ID>
    </IMAGES>
    <TEMPLATE>
      <BASE_PATH><![CDATA[/var/lib/one//datastores/]]></BASE_PATH>
      <BRIDGE_LIST><![CDATA[FRAS001]]></BRIDGE_LIST>
      <CEPH_HOST><![CDATA[FRADS001]]></CEPH_HOST>
      <CLONE_TARGET><![CDATA[SELF]]></CLONE_TARGET>
      <DISK_TYPE><![CDATA[RBD]]></DISK_TYPE>
      <DS_MAD><![CDATA[ceph]]></DS_MAD>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <POOL_NAME><![CDATA[one]]></POOL_NAME>
      <TM_MAD><![CDATA[ceph]]></TM_MAD>
      <TYPE><![CDATA[IMAGE_DS]]></TYPE>
    </TEMPLATE>
  </DATASTORE>
  <DATASTORE>
    <ID>108</ID>
    <UID>0</UID>
    <GID>0</GID>
    <UNAME>oneadmin</UNAME>
    <GNAME>oneadmin</GNAME>
    <NAME>system_ssh_ceph</NAME>
    <PERMISSIONS>
      <OWNER_U>1</OWNER_U>
      <OWNER_M>1</OWNER_M>
      <OWNER_A>0</OWNER_A>
      <GROUP_U>1</GROUP_U>
      <GROUP_M>0</GROUP_M>
      <GROUP_A>0</GROUP_A>
      <OTHER_U>0</OTHER_U>
      <OTHER_M>0</OTHER_M>
      <OTHER_A>0</OTHER_A>
    </PERMISSIONS>
    <DS_MAD><![CDATA[-]]></DS_MAD>
    <TM_MAD><![CDATA[ceph]]></TM_MAD>
    <BASE_PATH><![CDATA[/var/lib/one//datastores/108]]></BASE_PATH>
    <TYPE>1</TYPE>
    <DISK_TYPE>0</DISK_TYPE>
    <CLUSTER_ID>105</CLUSTER_ID>
    <CLUSTER>ceph</CLUSTER>
    <TOTAL_MB>145589</TOTAL_MB>
    <FREE_MB>125370</FREE_MB>
    <USED_MB>9781</USED_MB>
    <IMAGES/>
    <TEMPLATE>
      <BASE_PATH><![CDATA[/var/lib/one//datastores/]]></BASE_PATH>
      <SHARED><![CDATA[YES]]></SHARED>
      <TM_MAD><![CDATA[ceph]]></TM_MAD>
      <TYPE><![CDATA[SYSTEM_DS]]></TYPE>
    </TEMPLATE>
  </DATASTORE>
  <DATASTORE>
    <ID>113</ID>
    <UID>0</UID>
    <GID>0</GID>
    <UNAME>oneadmin</UNAME>
    <GNAME>oneadmin</GNAME>
    <NAME>local_storage</NAME>
    <PERMISSIONS>
      <OWNER_U>1</OWNER_U>
      <OWNER_M>1</OWNER_M>
      <OWNER_A>0</OWNER_A>
      <GROUP_U>1</GROUP_U>
      <GROUP_M>0</GROUP_M>
      <GROUP_A>0</GROUP_A>
      <OTHER_U>0</OTHER_U>
      <OTHER_M>0</OTHER_M>
      <OTHER_A>0</OTHER_A>
    </PERMISSIONS>
    <DS_MAD><![CDATA[-]]></DS_MAD>
    <TM_MAD><![CDATA[ssh]]></TM_MAD>
    <BASE_PATH><![CDATA[/var/lib/one//datastores/113]]></BASE_PATH>
    <TYPE>1</TYPE>
    <DISK_TYPE>0</DISK_TYPE>
    <CLUSTER_ID>104</CLUSTER_ID>
    <CLUSTER>local</CLUSTER>
    <TOTAL_MB>0</TOTAL_MB>
    <FREE_MB>0</FREE_MB>
    <USED_MB>0</USED_MB>
    <TEMPLATE>
      <BASE_PATH><![CDATA[/var/lib/one//datastores/]]></BASE_PATH>
      <SHARED><![CDATA[NO]]></SHARED>
      <TM_MAD><![CDATA[ssh]]></TM_MAD>
      <TYPE><![CDATA[SYSTEM_DS]]></TYPE>
    </TEMPLATE>
    <IMAGES/>
  </DATASTORE>
</DATASTORE_POOL>

Thanks for your involvement.

It looks like datastore 108, the system datastore, is using TM_MAD=ceph. Can you change that to TM_MAD=ssh? In fact, you have named it “system_ssh_ceph”; do you know if the TM driver was changed for some reason?
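For anyone hitting the same symptom, the driver can be inspected and switched back with the onedatastore CLI; a minimal sketch, assuming the datastore ID 108 from this thread:

```shell
# Show the current TM driver of the system datastore (ID 108 here).
onedatastore show 108 | grep TM_MAD

# "onedatastore update" opens the datastore template in $EDITOR;
# change TM_MAD="ceph" back to TM_MAD="ssh" there and save.
onedatastore update 108
```

After the change, newly deployed VMs should pick up the ssh transfer scripts; already-failed VMs would need to be resubmitted.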

Thank you Jaime. You are spot on.

That was the problem for one of the controller nodes (we have two, clustered with Pacemaker/Corosync); the other one does not create the RBD context file, but that is a separate issue to deal with.

I don’t know how the driver got changed. I did restart the OpenNebula services, but I didn’t touch the configuration; I saw no point in altering a configuration that had previously been working.