After updating from 4.10 to 4.12, Ceph datastore shows as 100% used

After upgrading from 4.10 to 4.12, my Ceph datastore is shown as
100% full, with 5.2/5.2 TB used. I don’t know where OpenNebula gets
this figure from, because Ceph itself reports this:

6485 GB data, 13704 GB used, 8022 GB / 21726 GB avail

Can anyone help me solve this? I can’t create new VMs, and that’s a big problem for me.

Please help.

Can you provide the output of these commands:

ceph df detail

ceph osd pool get <opennebula_pool> size

Replace <opennebula_pool> with the name of the pool used by OpenNebula.

Here you are. I have also attached the onedatastore list output.

[root@master-01 ceph]# ceph df detail
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
    21726G     8021G     13705G       63.08         1725k
POOLS:
    NAME         ID     CATEGORY     USED      %USED     OBJECTS     DIRTY     READ      WRITE
    data         0      -            0         0         0           0         0         0
    metadata     1      -            0         0         0           0         0         0
    rbd          2      -            0         0         0           0         0         0
    test         3      -            640G      2.95      173344      164k      39850     791k
    one          4      -            5309G     24.44     1387308     1354k     782M      6959M
    backup       5      -            535G      2.46      205838      201k      1187k     6213k
[root@master-01 ceph]# ceph osd pool get one size
size: 2

[oneadmin@master-01 ceph]$ onedatastore list
ID   NAME           SIZE    AVAIL  CLUSTER  IMAGES  TYPE  DS    TM      STAT
0    system         196.7G  63%    -        0       sys   -     shared  on
1    default        196.7G  63%    -        2       img   fs    shared  on
2    files          196.7G  63%    -        2       fil   fs    ssh     on
100  ceph           5.2T    0%     -        74      img   ceph  ceph    on
106  ssd-lvm-node-  196.7G  63%    -        0       img   fs    shared  on
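
As a rough sanity check on the figures above (a sketch, not part of OpenNebula): the 5.2T that onedatastore list shows as SIZE is about what pool one has USED (5309G), while the cluster as a whole still has 8021G available. The helper name gb_to_tb is made up for illustration, and rounding to one decimal is an assumption about how the SIZE column is displayed.

```shell
# Figures copied from the `ceph df detail` output above.
one_used_gb=5309       # USED of pool "one"
cluster_avail_gb=8021  # AVAIL of the whole cluster

# Convert a GB figure to TB with one decimal place.
# Hypothetical helper; the one-decimal rounding is an assumption.
gb_to_tb() {
    awk -v gb="$1" 'BEGIN { printf "%.1f", gb / 1024 }'
}

echo "pool one USED:  $(gb_to_tb "$one_used_gb")T"       # prints 5.2T
echo "cluster AVAIL:  $(gb_to_tb "$cluster_avail_gb")T"  # prints 7.8T
```

So the datastore total appears to track the pool's used space only, which would explain a 0% AVAIL even though Ceph has free capacity.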

Hi

It seems that there is a mismatch between the output of ceph df and what OpenNebula expects.

The monitor script is located in remotes/datastore/ceph/monitor. It basically takes the output of ceph df and reads the 5th field, which it expects to be MAX AVAIL but which is %USED in your case:

MAX_AVAIL=\$($CEPH df | grep "$POOL_NAME" | awk '{print \$5}')
USED=\$($CEPH df | grep "$POOL_NAME" | awk '{print \$3}')

In your case it may be better to restore the previous version of the script.
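
To illustrate why a fixed field number is fragile, here is a minimal sketch that looks a column up by its header name instead of by position. It is not the code shipped with OpenNebula; the function name pool_field and the MAX_AVAIL spelling are assumptions made for this example, and the sample text reuses rows from the output posted earlier in the thread.

```shell
# pool_field POOL COLUMN: read `ceph df`-style text on stdin and print the
# value of COLUMN for POOL. Exit status: 0 found, 1 pool not found,
# 2 column not present in the POOLS header.
# Hypothetical helper, for illustration only.
pool_field() {
    awk -v pool="$1" -v col="$2" '
        BEGIN { rc = 1 }
        /^POOLS:/ { in_pools = 1; next }
        in_pools && !have_header {
            gsub(/MAX AVAIL/, "MAX_AVAIL")   # make the two-word header one token
            for (i = 1; i <= NF; i++) idx[$i] = i
            have_header = 1
            next
        }
        have_header && $1 == pool {
            if (col in idx) { print $(idx[col]); rc = 0 } else { rc = 2 }
            exit
        }
        END { exit rc }
    '
}

# Sample input built from the output posted above.
sample='GLOBAL:
    SIZE AVAIL RAW USED %RAW USED OBJECTS
    21726G 8021G 13705G 63.08 1725k
POOLS:
    NAME ID CATEGORY USED %USED OBJECTS DIRTY READ WRITE
    test 3 - 640G 2.95 173344 164k 39850 791k
    one 4 - 5309G 24.44 1387308 1354k 782M 6959M'

printf '%s\n' "$sample" | pool_field one USED      # prints 5309G
printf '%s\n' "$sample" | pool_field one MAX_AVAIL \
    || echo "this ceph df output has no MAX AVAIL column"
```

Matching the pool with `$1 == pool` also avoids a side effect of grep "$POOL_NAME", which would match any line that merely contains the pool name.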

Cheers

I didn’t back up /var/lib/one/remotes. Can anyone paste the code from the 4.10 version here?

I guess that is what you are looking for (press “Download” link).

Thank you very much. Solved.

Hi,

I am having the same issue; however, replacing monitor with the one from 4.10 did not resolve it for me.

When I run the monitor command from the logs, I get the following result:

/usr/lib/ruby/1.8/rexml/parsers/treeparser.rb:92:in `parse': #<REXML::ParseException: #<NoMethodError: undefined method `[]' for nil:NilClass> (REXML::ParseException)
/usr/lib/ruby/1.8/rexml/parsers/baseparser.rb:330:in `pull'
/usr/lib/ruby/1.8/rexml/parsers/treeparser.rb:22:in `parse'
/usr/lib/ruby/1.8/rexml/document.rb:245:in `build'
/usr/lib/ruby/1.8/rexml/document.rb:43:in `initialize'
/var/lib/one/remotes/datastore/ceph/../xpath.rb:58:in `new'
/var/lib/one/remotes/datastore/ceph/../xpath.rb:58
...
Exception parsing
Line:
Position:
Last 80 unconsumed characters:
</LN_TARG>
/usr/lib/ruby/1.8/rexml/parsers/baseparser.rb:418:in `pull'
/usr/lib/ruby/1.8/rexml/parsers/treeparser.rb:22:in `parse'
/usr/lib/ruby/1.8/rexml/document.rb:245:in `build'
/usr/lib/ruby/1.8/rexml/document.rb:43:in `initialize'
/var/lib/one/remotes/datastore/ceph/../xpath.rb:58:in `new'
/var/lib/one/remotes/datastore/ceph/../xpath.rb:58
...
#<NoMethodError: undefined method `[]' for nil:NilClass>
/usr/lib/ruby/1.8/rexml/parsers/baseparser.rb:330:in `pull'
/usr/lib/ruby/1.8/rexml/parsers/treeparser.rb:22:in `parse'
/usr/lib/ruby/1.8/rexml/document.rb:245:in `build'
/usr/lib/ruby/1.8/rexml/document.rb:43:in `initialize'
/var/lib/one/remotes/datastore/ceph/../xpath.rb:58:in `new'
/var/lib/one/remotes/datastore/ceph/../xpath.rb:58
...
Exception parsing
Line:
Position:
Last 80 unconsumed characters:
</LN_TARG
Line:
Position:
Last 80 unconsumed characters:
</LN_TARG
        from /usr/lib/ruby/1.8/rexml/document.rb:245:in `build'
        from /usr/lib/ruby/1.8/rexml/document.rb:43:in `initialize'
        from /var/lib/one/remotes/datastore/ceph/../xpath.rb:58:in `new'
        from /var/lib/one/remotes/datastore/ceph/../xpath.rb:58
/var/lib/one/remotes/datastore/ceph/../libfs.sh: line 238: RANDOM % 0: division by 0 (error token is "0")
ERROR MESSAGE --8<------
Datastore template missing 'BRIDGE_LIST' attribute.
ERROR MESSAGE ------>8--

Is there anything else you can advise?

Thanks

Alex
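
One possible reading of the log above: the XML passed to the script could not be parsed, so BRIDGE_LIST came back empty, and choosing a random host from an empty list is what produces "RANDOM % 0: division by 0". The sketch below only illustrates that failure mode and a guard against it. It is not the code in libfs.sh; pick_bridge_host and the node names are hypothetical, and it uses awk for the random index so that it also runs in a plain POSIX shell.

```shell
# pick_bridge_host LIST: print one host chosen at random from a
# space-separated list, or fail if the list is empty.
# Hypothetical helper, for illustration only.
pick_bridge_host() {
    set -- $1    # unquoted on purpose: split the list into words
    if [ "$#" -eq 0 ]; then
        # With zero hosts, "random number modulo host count" divides by zero.
        echo "BRIDGE_LIST is empty" >&2
        return 1
    fi
    # Random index between 1 and the number of hosts.
    index=$(awk -v n="$#" 'BEGIN { srand(); print int(rand() * n) + 1 }')
    shift "$((index - 1))"
    echo "$1"
}

pick_bridge_host "node-01 node-02"    # prints node-01 or node-02
pick_bridge_host "" || echo "no bridge host available"
```

If this reading is right, the first thing to check would be whether the argument given to monitor on the command line is exactly what appears in the log, and whether the datastore template defines BRIDGE_LIST.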