Sql errors after sqlite migration to one 5.0.2

Hi I recently upgraded a one 4.14 installation to one 5.0.2. following the migration documentation. Everything seems to work fine. I see sql errors in the log an am wondering what they tell me:

Tue Oct 11 14:58:13 2016 [Z0][ImM][D]: Datastore files (2) successfully monitored.
Tue Oct 11 14:58:13 2016 [Z0][ImM][D]: Datastore xenImage (113) successfully monitored.
Tue Oct 11 14:58:13 2016 [Z0][ImM][D]: Datastore image-nfs (1) successfully monitored.
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 130, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 131, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 243, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 3, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 56, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 58, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 60, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 73, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ONE][E]: SQL command was: SELECT body FROM vm_pool WHERE oid = 86, error: callback requested query abort
Tue Oct 11 14:58:13 2016 [Z0][ImM][D]: Datastore system-nfs (0) successfully monitored.
Tue Oct 11 14:58:13 2016 [Z0][ImM][D]: Datastore image-drbd (102) successfully monitored.
Tue Oct 11 14:58:14 2016 [Z0][MKP][D]: Marketplace OpenNebula Public (0) successfully monitored.

I had a look in the database to only find out that all these vms with the corresponding ids do not exist anymore, they have been deleted already.

the records seem to be ok in the db, for example:

sqlite3 /var/lib/one/one.db "SELECT body FROM vm_pool WHERE oid = 73;" | xmllint --format -

<?xml version="1.0"?>
<VM>
  <ID>73</ID>
  <UID>0</UID>
  <GID>0</GID>
  <UNAME>oneadmin</UNAME>
  <GNAME>oneadmin</GNAME>
  <NAME>Test WinVM</NAME>
  <PERMISSIONS>
    <OWNER_U>1</OWNER_U>
    <OWNER_M>1</OWNER_M>
    <OWNER_A>0</OWNER_A>
    <GROUP_U>0</GROUP_U>
    <GROUP_M>0</GROUP_M>
    <GROUP_A>0</GROUP_A>
    <OTHER_U>0</OTHER_U>
    <OTHER_M>0</OTHER_M>
    <OTHER_A>0</OTHER_A>
  </PERMISSIONS>
  <LAST_POLL>1466749823</LAST_POLL>
  <STATE>6</STATE>
  <LCM_STATE>0</LCM_STATE>
  <PREV_STATE>6</PREV_STATE>
  <PREV_LCM_STATE>0</PREV_LCM_STATE>
  <RESCHED>0</RESCHED>
  <STIME>1466594749</STIME>
  <ETIME>1466953265</ETIME>
  <DEPLOY_ID>one-73</DEPLOY_ID>
  <MONITORING>
    <CPU><![CDATA[0.0]]></CPU>
    <DISK_SIZE>
      <ID><![CDATA[0]]></ID>
      <SIZE><![CDATA[0]]></SIZE>
    </DISK_SIZE>
    <DISK_SIZE>
      <ID><![CDATA[1]]></ID>
      <SIZE><![CDATA[0]]></SIZE>
    </DISK_SIZE>
    <MEMORY><![CDATA[0]]></MEMORY>
    <NETRX><![CDATA[7859955]]></NETRX>
    <NETTX><![CDATA[2905424]]></NETTX>
  </MONITORING>
  <TEMPLATE>
    <AUTOMATIC_REQUIREMENTS><![CDATA[!(PUBLIC_CLOUD = YES)]]></AUTOMATIC_REQUIREMENTS>
    <CONTEXT>
      <DISK_ID><![CDATA[1]]></DISK_ID>
      <TARGET><![CDATA[hdb]]></TARGET>
    </CONTEXT>
    <CPU><![CDATA[1]]></CPU>
    <DISK>
      <CLONE><![CDATA[YES]]></CLONE>
      <CLONE_TARGET><![CDATA[SELF]]></CLONE_TARGET>
      <DATASTORE><![CDATA[image-drbd]]></DATASTORE>
      <DATASTORE_ID><![CDATA[102]]></DATASTORE_ID>
      <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
      <DISK_ID><![CDATA[0]]></DISK_ID>
      <DISK_SNAPSHOT_TOTAL_SIZE><![CDATA[0]]></DISK_SNAPSHOT_TOTAL_SIZE>
      <DRIVER><![CDATA[raw]]></DRIVER>
      <IMAGE><![CDATA[Test-WinVM]]></IMAGE>
      <IMAGE_ID><![CDATA[18]]></IMAGE_ID>
      <IMAGE_UNAME><![CDATA[oneadmin]]></IMAGE_UNAME>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <READONLY><![CDATA[NO]]></READONLY>
      <SAVE><![CDATA[NO]]></SAVE>
      <SIZE><![CDATA[35840]]></SIZE>
      <SOURCE><![CDATA[OpenNebula-image-18]]></SOURCE>
      <TARGET><![CDATA[hda]]></TARGET>
      <TM_MAD><![CDATA[drbdmanage]]></TM_MAD>
      <TYPE><![CDATA[FILE]]></TYPE>
    </DISK>
    <GRAPHICS>
      <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
      <PORT><![CDATA[5998]]></PORT>
      <TYPE><![CDATA[vnc]]></TYPE>
    </GRAPHICS>
    <INPUT>
      <BUS><![CDATA[usb]]></BUS>
      <TYPE><![CDATA[mouse]]></TYPE>
    </INPUT>
    <MEMORY><![CDATA[4096]]></MEMORY>
    <NIC>
      <AR_ID><![CDATA[0]]></AR_ID>
      <BRIDGE><![CDATA[br0]]></BRIDGE>
      <IP><![CDATA[172.16.143.126]]></IP>
      <MAC><![CDATA[02:00:ac:10:8f:7e]]></MAC>
      <NETWORK><![CDATA[pocnet1]]></NETWORK>
      <NETWORK_ID><![CDATA[0]]></NETWORK_ID>
      <NETWORK_UNAME><![CDATA[oneadmin]]></NETWORK_UNAME>
      <NIC_ID><![CDATA[0]]></NIC_ID>
      <SECURITY_GROUPS><![CDATA[0]]></SECURITY_GROUPS>
      <VLAN><![CDATA[NO]]></VLAN>
    </NIC>
    <OS>
      <ARCH><![CDATA[x86_64]]></ARCH>
      <BOOT><![CDATA[hd]]></BOOT>
    </OS>
    <SECURITY_GROUP_RULE>
      <PROTOCOL><![CDATA[ALL]]></PROTOCOL>
      <RULE_TYPE><![CDATA[OUTBOUND]]></RULE_TYPE>
      <SECURITY_GROUP_ID><![CDATA[0]]></SECURITY_GROUP_ID>
      <SECURITY_GROUP_NAME><![CDATA[default]]></SECURITY_GROUP_NAME>
    </SECURITY_GROUP_RULE>
    <SECURITY_GROUP_RULE>
      <PROTOCOL><![CDATA[ALL]]></PROTOCOL>
      <RULE_TYPE><![CDATA[INBOUND]]></RULE_TYPE>
      <SECURITY_GROUP_ID><![CDATA[0]]></SECURITY_GROUP_ID>
      <SECURITY_GROUP_NAME><![CDATA[default]]></SECURITY_GROUP_NAME>
    </SECURITY_GROUP_RULE>
    <TEMPLATE_ID><![CDATA[3]]></TEMPLATE_ID>
    <VCPU><![CDATA[4]]></VCPU>
    <VMID><![CDATA[73]]></VMID>
  </TEMPLATE>
  <USER_TEMPLATE>
    <ACPI><![CDATA[yes]]></ACPI>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
  </USER_TEMPLATE>
  <HISTORY_RECORDS>
    <HISTORY>
      <OID>73</OID>
      <SEQ>1</SEQ>
      <HOSTNAME>poc1</HOSTNAME>
      <HID>5</HID>
      <CID>-1</CID>
      <STIME>1466594803</STIME>
      <ETIME>1466749823</ETIME>
      <VMMMAD>kvm</VMMMAD>
      <VNMMAD>dummy</VNMMAD>
      <TMMAD>shared</TMMAD>
      <DS_LOCATION>/var/lib/one//datastores</DS_LOCATION>
      <DS_ID>0</DS_ID>
      <PSTIME>1466594804</PSTIME>
      <PETIME>1466594808</PETIME>
      <RSTIME>1466594808</RSTIME>
      <RETIME>1466749823</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <REASON>2</REASON>
      <ACTION>20</ACTION>
    </HISTORY>
  </HISTORY_RECORDS>
</VM>
  • what do these errors want to tell me?
  • why are these vms still in the database when they are actually already deleted? is this expected behaviour?

it’s not urgent the system is working fine at the moment, I am just wondering…

thanks!
all the best
Jojo

Hi

The problem is that the history records of this VMs have not been migrated, probably because the VM is in done. See:

VNMMAD is now part of the host

TMMAD nad VMMAD have been renamed to TM_MAD and VM_MAD for coherence with the rest of the code.

The error you are seeing is thrown by oned when it tries to read the VM from the DB and it finds missing attributes. I am not really sure why OpenNebula is trying to access those VMs (e.g. a direct query onevm show 73, showback calculations etc…)

thanks for your answer Ruben!

ok I understand now that deleted vms are in done state (after reading http://docs.opennebula.org/5.0/operation/references/vm_states.html) and that the vm is kept in the db for history/accounting purposes. makes sense didn’t think of this.

so is it expected that these “historical” vms where not migrated or was it a failure when i migrated the db??

I just checked my logfile of the db-migrate: it did not state any UNREPAIRED errors that could relate to this.
i only have some unrepaired things in templates, pretty sure this is not related.

I see two possible solutions to this: I just delete all these records from the vm_pool table because I don’t care for them any more :wink: Would it break something?

Or I try to replace this

<VMMMAD>kvm</VMMMAD>
<VNMMAD>dummy</VNMMAD>
<TMMAD>shared</TMMAD>

with that

<VM_MAD>kvm</VM_MAD>
<TM_MAD>shared</TM_MAD>

but that seems very tedious. i would have to think about how I would do this…
do you think that would even fix the errors? or is it a bad idea?

And yes it is a good question why opennebula trys to ask for those old vms. Maybe because of some usage data summary things?? the stuff i see on sunstone dashboard? maybe it uses this old data? just wild guessing, actually I have no idea what I am talking about haha…

at least what i know is that it’s not because some admins try to onevm show these vms the whole day long haha :wink:

anyway, thanks a lot for your time, probably i will just delete those records, just tell me if i could break something.
I will take a backup of the db anyway before doing this.

ok just had a look on a recent vm.
if i really would do a replacement of the <VMMMAD… stuff I should probably replace it whith this:

<VM_MAD><![CDATA[kvm]]></VM_MAD>
<TM_MAD><![CDATA[shared]]></TM_MAD>

Yes, in the past we had problems with long running clouds, migration took too much time because of this VMs. Probably the best approach is to provide a simple way to purge the DB. We already have an issue for this.

You should be fine.

Maybe there are more things in your old VMs… there is a onedb patch utility to automate DB changes, some examples here:

If you are not touching the VMs, I am not really sure why they are showing up…