ONE 5.2.1 to 5.3.80 DB upgrade failed (MariaDB)

CentOS 7.3

[oneadmin@s04 ~]$ mysql --version
mysql  Ver 15.1 Distrib 5.5.52-MariaDB, for Linux (x86_64) using readline 5.1

Error:

MySQL dump stored in /var/lib/one/mysql_localhost_opennebula_2017-6-6_18:14:33.sql
Use 'onedb restore' or restore the DB using the mysql command:
mysql -u user -h server -P port db_name < backup_file


>>> Running migrators for shared tables
Database migrated from 5.2.0 to 5.3.80 (OpenNebula 5.3.80) by onedb command.

>>> Running migrators for local tables

Mysql2::Error: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'sql MEDIUMTEXT, timestamp INTEGER)' at line 1
/usr/local/share/gems/gems/mysql2-0.4.6/lib/mysql2/client.rb:120:in `_query'
/usr/local/share/gems/gems/mysql2-0.4.6/lib/mysql2/client.rb:120:in `block in query'
/usr/local/share/gems/gems/mysql2-0.4.6/lib/mysql2/client.rb:119:in `handle_interrupt'
/usr/local/share/gems/gems/mysql2-0.4.6/lib/mysql2/client.rb:119:in `query'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/adapters/mysql2.rb:137:in `block in _execute'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/database/logging.rb:44:in `log_connection_yield'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/adapters/mysql2.rb:132:in `_execute'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/adapters/utils/mysql_mysql2.rb:36:in `block in execute'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/database/connecting.rb:251:in `block in synchronize'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/connection_pool/threaded.rb:107:in `hold'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/database/connecting.rb:251:in `synchronize'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/adapters/utils/mysql_mysql2.rb:36:in `execute'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/adapters/mysql2.rb:71:in `execute_dui'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/database/query.rb:45:in `execute_ddl'
/usr/local/share/gems/gems/sequel-4.38.0/lib/sequel/database/query.rb:78:in `run'
/usr/lib/one/ruby/onedb/database_schema.rb:98:in `create_table'
/usr/lib/one/ruby/onedb/local/4.90.0_to_5.3.80.rb:185:in `feature_4809'
/usr/lib/one/ruby/onedb/local/4.90.0_to_5.3.80.rb:49:in `up'
/usr/lib/one/ruby/onedb/onedb.rb:232:in `apply_migrators'
/usr/lib/one/ruby/onedb/onedb.rb:179:in `upgrade'
/bin/onedb:323:in `block (2 levels) in <main>'
/usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
/usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
/usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
/bin/onedb:228:in `new'
/bin/onedb:228:in `<main>'


The database will be restored
MySQL DB opennebula at localhost restored.

BR,
Anton

Looks like the following patch should be extended to include a patch for database_schema.rb too.

https://github.com/OpenNebula/one/commit/50d0a6b42c3839153d83a561b55ec9cd516187ea
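
For anyone hitting the same error: it points at the column name `sql`, which MariaDB treats as a reserved word, so the column has to be backtick-quoted in the CREATE TABLE statement. Below is a minimal C++ sketch of that quoting idea, not the actual patch. The helper names `quote_ident` and `create_table_sql` and the table name `example_log` are made up for illustration; only the two column definitions come from the error message above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: quote an identifier the MySQL/MariaDB way by wrapping
// it in backticks and doubling any backtick embedded in the name.
std::string quote_ident(const std::string& name)
{
    assert(!name.empty());

    std::string out = "`";

    for (char c : name)
    {
        out += c;

        if (c == '`')
        {
            out += '`';
        }
    }

    out += '`';

    return out;
}

// Hypothetical helper: build a CREATE TABLE statement in which the table and
// every column name are quoted, so reserved words such as sql are accepted.
std::string create_table_sql(
        const std::string& table,
        const std::vector<std::pair<std::string, std::string> >& cols)
{
    std::string stmt = "CREATE TABLE IF NOT EXISTS " + quote_ident(table) + " (";

    for (std::size_t i = 0; i < cols.size(); ++i)
    {
        if (i != 0)
        {
            stmt += ", ";
        }

        stmt += quote_ident(cols[i].first) + " " + cols[i].second;
    }

    return stmt + ")";
}
```

Quoting every identifier, rather than only the known reserved ones, avoids having to track which words each server version reserves.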

BR,
Anton

Pull request

https://github.com/OpenNebula/one/pull/324

BR,
Anton

With the patch the database was migrated.
onedb fsck fixed some issues regarding datastore relations to cluster 0.

But oned does not start because it believes the DB should be bootstrapped:

DB=BACKEND=mysql,DB_NAME=opennebula,PASSWD=onepass,PORT=0,SERVER=localhost,USER=oneadmin
...
----------------------------------------
Tue Jun  6 18:47:15 2017 [Z0][ONE][I]: Log level:3 [0=ERROR,1=WARNING,2=INFO,3=DEBUG]
Tue Jun  6 18:47:15 2017 [Z0][ONE][I]: Support for xmlrpc-c > 1.31: yes
Tue Jun  6 18:47:15 2017 [Z0][ONE][I]: Checking database version.
Tue Jun  6 18:47:15 2017 [Z0][ONE][I]: oned is using version 5.3.80 for local_db_versioning
Tue Jun  6 18:47:15 2017 [Z0][ONE][I]: oned is using version 5.3.80 for db_versioning
Tue Jun  6 18:47:15 2017 [Z0][ACL][I]: Starting ACL Manager...
Tue Jun  6 18:47:15 2017 [Z0][ACL][I]: ACL Manager started.
Tue Jun  6 18:47:15 2017 [Z0][ONE][E]: Password file /var/lib/one//.one/sunstone_auth already exists but OpenNebula is boostraping the database. Check your database configuration in oned.conf.

Ideas?

BR,
Anton

I messed up the DB during the upgrade. After restoring from backup and upgrading the DB again, oned almost starts, but then segfaults:

Reading symbols from /usr/bin/oned...Reading symbols from /usr/bin/oned...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install opennebula-server-5.3.80-1.x86_64
(gdb) r -f
Starting program: /bin/oned -f
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff207d700 (LWP 18402)]
[Thread 0x7ffff207d700 (LWP 18402) exited]

Program received signal SIGSEGV, Segmentation fault.
0x00000000004bd028 in std::_Rb_tree<int, std::pair<int const, ExtendedAttribute*>, std::_Select1st<std::pair<int const, ExtendedAttribute*> >, std::less<int>, std::allocator<std::pair<int const, ExtendedAttribute*> > >::begin() ()
(gdb) bt
#0  0x00000000004bd028 in std::_Rb_tree<int, std::pair<int const, ExtendedAttribute*>, std::_Select1st<std::pair<int const, ExtendedAttribute*> >, std::less<int>, std::allocator<std::pair<int const, ExtendedAttribute*> > >::begin() ()
#1  0x00000000004bcf3e in std::map<int, ExtendedAttribute*, std::less<int>, std::allocator<std::pair<int const, ExtendedAttribute*> > >::begin() ()
#2  0x00000000004bc86a in ExtendedAttributeSet::begin() ()
#3  0x000000000059bbed in ZoneServers::begin() ()
#4  0x000000000059b677 in ZonePool::get_zone_servers(int, std::map<int, std::string, std::less<int>, std::allocator<std::pair<int const, std::string> > >&) ()
#5  0x00000000005d5fb9 in get_zone_servers(std::map<int, std::string, std::less<int>, std::allocator<std::pair<int const, std::string> > >&) ()
#6  0x00000000005d54c7 in RaftManager::RaftManager(int, VectorAttribute const*, VectorAttribute const*, long, long long, long long, long, std::string const&) ()
#7  0x0000000000412277 in Nebula::start(bool) ()
#8  0x000000000040c702 in oned_main() ()
#9  0x000000000040ca28 in main ()

I’ll try bootstrapping an empty database to check whether this is related to the upgrade.

BR,
Anton

I have not tried a fresh bootstrap yet. Same segfault, this time with debuginfo installed:

(gdb) bt
#0  0x00000000004bd028 in std::_Rb_tree<int, std::pair<int const, ExtendedAttribute*>, std::_Select1st<std::pair<int const, ExtendedAttribute*> >, std::less<int>, std::allocator<std::pair<int const, ExtendedAttribute*> > >::begin (this=0x8) at /usr/include/c++/4.8.2/bits/stl_tree.h:685
#1  0x00000000004bcf3e in std::map<int, ExtendedAttribute*, std::less<int>, std::allocator<std::pair<int const, ExtendedAttribute*> > >::begin (this=0x8) at /usr/include/c++/4.8.2/bits/stl_map.h:321
#2  0x00000000004bc86a in ExtendedAttributeSet::begin (this=0x0) at include/ExtendedAttribute.h:205
#3  0x000000000059bbed in ZoneServers::begin (this=0x0) at include/ZoneServer.h:137
#4  0x000000000059b677 in ZonePool::get_zone_servers (this=0x9904d0, zone_id=0, 
    _serv=std::map with 0 elements) at src/zone/ZonePool.cc:222
#5  0x00000000005d5fb9 in get_zone_servers (_serv=std::map with 0 elements)
    at src/raft/RaftManager.cc:249
#6  0x00000000005d54c7 in RaftManager::RaftManager (this=0x991660, id=-1, leader_hook_mad=0x0, 
    follower_hook_mad=0x0, log_purge=600, bcast=500, elect=2500, xmlrpc=2000, 
    remotes_location="/var/lib/one/remotes/") at src/raft/RaftManager.cc:99
#7  0x0000000000412277 in Nebula::start (this=0x9003c0 <Nebula::instance()::nebulad>, 
    bootstrap_only=false) at src/nebula/Nebula.cc:696
#8  0x000000000040c702 in oned_main () at src/nebula/oned.cc:85
#9  0x000000000040ca28 in main (argc=2, argv=0x7fffffffe588) at src/nebula/oned.cc:218
(gdb) l
217	        return 0;
218	    }
219	
220	    ZoneServers * followers = zone->get_servers();
221	
222	    for (zit = followers->begin(); zit != followers->end(); ++zit)
223	    {
224	        int id = (*zit)->get_id();
225	        std::string  edp = (*zit)->vector_value("ENDPOINT");
226	
(gdb) p followers
$4 = (ZoneServers *) 0x0

We are running a single instance, so no followers are expected:

(gdb) p zone.oid
$6 = 0
(gdb) p zone.xml
$7 = (xmlDocPtr) 0x991720
(gdb) p zone.servers
$8 = (ZoneServers *) 0x0


(gdb) p _serv
$1 = std::map with 0 elements

I am probably missing something in the process of initializing Raft.

Hi Anton,

THANK YOU VERY MUCH for such detailed feedback. Unfortunately the upgrade process is not supported in this beta release, as we are missing some migrators for the DB schema (and we didn’t want to delay the release any further).

So the problem here is that the Zone schema is missing the Server pool of the zone.

However, the method should return at the if just before getting the followers (line 215) instead of failing with a segfault. I’ll take a look at it.
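
To illustrate the kind of guard meant here, this is a minimal self-contained sketch, not the actual OpenNebula code. The `Zone` and `ServerSet` types are simplified stand-ins made up for the example, and the function signature is an assumption. The only point is that a null server set makes the function return early instead of being dereferenced.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the zone's server pool: server id -> endpoint.
typedef std::map<int, std::string> ServerSet;

// Hypothetical stand-in for a zone. The servers pointer stays null when the
// zone was loaded from a schema that has no server pool, as in the backtrace.
struct Zone
{
    Zone() : servers(nullptr) {}

    ServerSet* get_servers() const
    {
        return servers;
    }

    ServerSet* servers;
};

// Copies the endpoints of the zone servers into out and returns how many
// were found. A missing zone or a missing server set yields 0 rather than a
// null pointer dereference.
unsigned int get_zone_servers(const Zone* zone, std::map<int, std::string>& out)
{
    out.clear();

    if (zone == nullptr)
    {
        return 0;
    }

    ServerSet* followers = zone->get_servers();

    if (followers == nullptr) // the guard: nothing to iterate over
    {
        return 0;
    }

    for (ServerSet::const_iterator it = followers->begin();
         it != followers->end(); ++it)
    {
        out[it->first] = it->second;
    }

    return static_cast<unsigned int>(out.size());
}
```

With a guard like this, a single-instance setup with no servers defined would simply report zero followers.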