Econe-upload fails on larger (10G) images

Hi All!
I have been trying out some of the econe interfaces and am running into problems when uploading some of our images. The image is about 11G, and I upload it normally via the econe-upload tool.

At first I thought the command had just died, but that was not the case: in the background the file was still being uploaded…

So I debugged the issue a bit:

  1. Upload from the command line:
oneadmin@service01 ~ $ date && time econe-upload -U XXX -K YYY -S ZZZ /tmp/contextulize-image.raw
Mon Jan  4 17:38:44 CET 2016
/var/lib/gems/1.9.1/gems/curb-0.9.0/lib/curl/easy.rb:73:in `perform': Server returned nothing (no headers, no data) (Curl::Err::GotNothingError)
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `http_post'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

real	3m33.775s
user	0m2.659s
sys	0m19.796s

(note: the time is approx. 17:41)

  2. Check /var/tmp for files:

oneadmin@service01 ~ $ ls -lah /var/tmp/
total 16G
drwxrwxrwt  3 root     root     4.0K Jan  4 17:41 .
drwxr-xr-x 13 root     root     4.0K May 27  2015 ..
drwx------  2 root     root      16K Oct  7 14:39 lost+found
-rw-------  1 oneadmin oneadmin 4.5G Jan  4 17:42 RackMultipart20160104-8268-1unujwz
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:41 thin-body20160104-8268-11eefhk

  3. Log output
I added this line:

logger.error { "FU: #{erb_version} - #{response.result(binding)}" }

right after the line where the response is built, and the econe log then shows:

Mon Jan 04 17:46:41 2016 [I]: Updating user pool cache.
Mon Jan 04 17:46:43 2016 [E]: FU: 2008-12-01 - <RegisterImageResponse xmlns="http://ec2.amazonaws.com/doc/2008-12-01/">
   <requestId>3e522bcb-29c3-40fa-bab5-bfd91bf4ada3</requestId>
  <imageId>ami-00000588</imageId>
</RegisterImageResponse>

Mon Jan 04 17:46:43 2016 [E]: UploadImage
Mon Jan 04 17:46:43 2016 [I]: 10.1.3.100 - - [04/Jan/2016 17:46:43] "POST / HTTP/1.1" 200 186 150.9169

Note the time is a lot later than when the error occurred… namely when (as can be seen next) the RackMultipart* file(s) finished being uploaded… It's also the time that the image shows up in the image list in the LOCKED state.

  4. Still uploading, according to the files:

oneadmin@service01 ~ $ ls -lah /var/tmp/
total 23G
drwxrwxrwt  3 root     root     4.0K Jan  4 17:46 .
drwxr-xr-x 13 root     root     4.0K May 27  2015 ..
-rw-rw-r--  1 oneadmin oneadmin 1.2G Jan  4 17:47 5f2ca86da9534a0e231a4264eee13692
drwx------  2 root     root      16K Oct  7 14:39 lost+found
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:46 RackMultipart20160104-8268-1lvww3i
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:44 RackMultipart20160104-8268-1unujwz

The problem also persists with the multipart-post option (-M):

oneadmin@service01 ~ $ econe-upload -M -U XXX -K YYY -S ZZZ /tmp/contextulize-image.raw 
/usr/lib/ruby/vendor_ruby/xmlsimple.rb:990:in `find_xml_file': File does not exist: end of file reached. (ArgumentError)
	from /usr/lib/ruby/vendor_ruby/xmlsimple.rb:168:in `xml_in'
	from /usr/lib/ruby/vendor_ruby/xmlsimple.rb:203:in `xml_in'
	from /usr/lib/ruby/vendor_ruby/AWS/responses.rb:15:in `parse'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:286:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

Does anyone know why this is happening?

Also, why is the file uploaded twice:

-rw-------  1 oneadmin oneadmin  11G Jan  4 17:46 RackMultipart20160104-8268-1lvww3i
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:44 RackMultipart20160104-8268-1unujwz

and then not removed afterwards?

Thanks!
Jason

Running on the latest 4.14.2 (I just upgraded, just to make sure…)

Still trying stuff out…

I have also gathered that this does not come from a timeout on the CloudClient side: I tried tweaking the HTTP call there, but it didn't help any. :confused:

Also, I have noticed that a lot of artifacts remain under /var/tmp after one of these uploads:

e.g.: (from lsof)

ruby      24430 26808   oneadmin   20u      REG                8,1 11492392960         20 /var/tmp/RackMultipart20160105-24430-dpflei
ruby      24430 26808   oneadmin   21u      REG                8,1 11492393933         29 /var/tmp/thin-body20160105-24430-1iq1e2r (deleted)

Which leads to the disk filling up:

root@service01 ~ # df -h|grep /var/tmp
/dev/sda1                                             220G  140G   69G  67% /var/tmp
root@service01 ~ # du -sh /var/tmp/
65G	/var/tmp/

Restarting opennebula-econe cleans this up; it's probably related to Ruby's GC or to the file not being closed on the Ruby side (maybe…).
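
For what it's worth, the kind of cleanup I'd expect around those temp files is just the usual Tempfile pattern; this is only a sketch (the helper name and the tmpdir default are made up, only the Tempfile calls themselves are real):

# Hypothetical sketch: spool the upload to :tmpdir and make sure the copy
# is removed even if the request fails, so nothing is left behind in /var/tmp.
require 'tempfile'

def with_upload_copy(src_io, tmpdir = '/var/tmp')
  tmp = Tempfile.new('econe-upload', tmpdir)
  IO.copy_stream(src_io, tmp)   # spool the request body to disk
  tmp.flush
  yield tmp.path                # hand the on-disk copy to the next step
ensure
  if tmp
    tmp.close
    tmp.unlink                  # without this the file stays in /var/tmp
  end
end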

Hope this helps…

Hi Jason,

Are you using the default server (webrick) or did you install thin/passenger to run the econe server?

Hi Daniel!
I don't think we installed anything special; however, I see that the server starts with the following message in the logs:

Thin web server (v1.3.1 codename Triple Espresso)

I also see that the files are initially uploaded with a thin-body prefix (e.g. thin-body20160104-8268-11eefhk).
On the econe configuration side, I don't see anything about which server runs it:

root@service01 /etc/one # egrep ^: econe.conf
:tmpdir: /var/tmp
:one_xmlrpc: http://localhost:2633/RPC2
:host: rtpain-service01.runtastic.com
:port: 4567
:auth: ec2
:core_auth: cipher
:debug_level: 3
:datastore_id: 111
:describe_with_terminated_instances: true
:terminated_instances_expiration_time: 900
:use_file_templates: false
:instance_types:
:elasticips_vnet_id: 11
:associate_script: false
:disassociate_script: false
:ebs_fstype: ext4

Anything I can check?

I would try to deploy the econe server using Passenger + nginx/Apache for better performance. But first, let's try to find out if the problem is in the thin server configuration.

  1. Create a config.ru file inside the econe folder (/usr/lib/one/ruby/cloud/econe)
    https://gist.github.com/dmamolina/28101ddcc1b2bfb7d19f

  2. Comment the following line in /usr/lib/one/ruby/cloud/econe/econe-server.rb
    https://github.com/OpenNebula/one/blob/master/src/cloud/ec2/lib/econe-server.rb#L142

  3. Run the following command from the econe folder (/usr/lib/one/ruby/cloud/econe) and change the timeout option:
    thin --threaded --timeout 200 start

Alright, I did steps 1 and 2.

Step 3 started the server on a different port (i.e. 3000 instead of the configured 4567, but that's no problem…), and I did get the error again:

ginny ~:  econe-upload -M -U http://rtpain-service01.runtastic.com:3000 -K XXX -S YYY /home/jason/chef/cookbooks/packer-templates/build/contextulize-redis.qcow2                              
Digest::Digest is deprecated; use Digest
/home/jason/.rvm/gems/ruby-2.2.1/gems/xml-simple-1.1.5/lib/xmlsimple.rb:996:in `find_xml_file': File does not exist: end of file reached. (ArgumentError)
	from /home/jason/.rvm/gems/ruby-2.2.1/gems/xml-simple-1.1.5/lib/xmlsimple.rb:168:in `xml_in'
	from /home/jason/.rvm/gems/ruby-2.2.1/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'
	from /home/jason/.rvm/gems/ruby-2.2.1/gems/amazon-ec2-0.9.17/lib/AWS/responses.rb:15:in `parse'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:286:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

but this time with a longer stack trace on the econe server side (I included the startup):

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # thin --threaded --timeout 200 start
Using rack adapter
--------------------------------------
         Server configuration         
--------------------------------------
{:tmpdir=>"/var/tmp",
 :one_xmlrpc=>"http://localhost:2633/RPC2",
 :host=>"rtpain-service01.runtastic.com",
 :port=>4567,
 :auth=>"ec2",
 :core_auth=>"cipher",
 :debug_level=>3,
 :datastore_id=>111,
 :describe_with_terminated_instances=>true,
 :terminated_instances_expiration_time=>900,
 :use_file_templates=>false,
 :instance_types=>{:"m1.small"=>{:template=>"m1.small.erb"}},
 :elasticips_vnet_id=>11,
 :associate_script=>false,
 :disassociate_script=>false,
 :ebs_fstype=>"ext4",
 :template_location=>"/etc/one/ec2query_templates",
 :views=>"/usr/lib/one/ruby/cloud/econe/views"}
--------------------------------------

Thin web server (v1.6.4 codename Gob Bluth)
Maximum connections set to 1024
Listening on 0.0.0.0:3000, CTRL+C to stop
#log has been deprecated, please use the log_level function instead (e.g. - log_info).
!! Unexpected error while processing request: undefined method `create' for Rack::Multipart::Parser:Class
/var/lib/gems/1.9.1/gems/thin-1.6.4/lib/thin/logging.rb:156:in `log_error': wrong number of arguments (0 for 1) (ArgumentError)
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:125:in `handle_error'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:84:in `rescue in pre_process'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:59:in `pre_process'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `call'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `block in spawn_threadpool'

Now without the multipart option set:

ginny ~:  econe-upload -U http://rtpain-service01.runtastic.com:3000 -K XXX -S YYY /home/jason/chef/cookbooks/packer-templates/build/contextulize-redis.qcow2                                 
Digest::Digest is deprecated; use Digest
/home/jason/.rvm/gems/ruby-2.2.1/gems/curb-0.9.0/lib/curl/easy.rb:73:in `perform': Server returned nothing (no headers, no data) (Curl::Err::GotNothingError)
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `http_post'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

and the server log:

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # thin --threaded --timeout 200 start
Using rack adapter
--------------------------------------
         Server configuration         
--------------------------------------
{:tmpdir=>"/var/tmp",
 :one_xmlrpc=>"http://localhost:2633/RPC2",
 :host=>"rtpain-service01.runtastic.com",
 :port=>4567,
 :auth=>"ec2",
 :core_auth=>"cipher",
 :debug_level=>3,
 :datastore_id=>111,
 :describe_with_terminated_instances=>true,
 :terminated_instances_expiration_time=>900,
 :use_file_templates=>false,
 :instance_types=>{:"m1.small"=>{:template=>"m1.small.erb"}},
 :elasticips_vnet_id=>11,
 :associate_script=>false,
 :disassociate_script=>false,
 :ebs_fstype=>"ext4",
 :template_location=>"/etc/one/ec2query_templates",
 :views=>"/usr/lib/one/ruby/cloud/econe/views"}
--------------------------------------

Thin web server (v1.6.4 codename Gob Bluth)
Maximum connections set to 1024
Listening on 0.0.0.0:3000, CTRL+C to stop


#log has been deprecated, please use the log_level function instead (e.g. - log_info).
!! Unexpected error while processing request: undefined method `create' for Rack::Multipart::Parser:Class
/var/lib/gems/1.9.1/gems/thin-1.6.4/lib/thin/logging.rb:156:in `log_error': wrong number of arguments (0 for 1) (ArgumentError)
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:125:in `handle_error'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:84:in `rescue in pre_process'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:59:in `pre_process'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `call'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `block in spawn_threadpool'

Hm. While it looks like everything should be working, I am not seeing the tmp file being created under /var/tmp at the moment. Am I getting something wrong?

I also cannot call simple functions which used to work, such as describe-images:

ginny ~:  econe-describe-images -U http://rtpain-service01.runtastic.com:3000 -K jattXXXest -S YYY                                                                                                  
Digest::Digest is deprecated; use Digest
econe-describe-images: end of file reached

In the server log I see the same error as before. It looks like I am still missing something.

Just to verify:

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # cat /usr/lib/one/ruby/cloud/econe/config.ru 
# If you are using self contained installation set this variable to the
# install location
#ENV['ONE_LOCATION']='/path/to/OpenNebula/install'

WITH_RACKUP=true

$: << '.'
require 'econe-server'

run Sinatra::Application

and I commented out the last line in /usr/lib/one/ruby/cloud/econe/econe-server.rb.

I shut the server down via service opennebula-econe stop and started it in the foreground as you described, with:

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # thin --threaded --timeout 200 start
Using rack adapter
...

etc.
Did I miss something? It seems like the thin app is not starting with the correct defaults (IMHO, but what do I know… :blush:).

I can try to get the Passenger/nginx version running if that is the way to go: is there any documentation for it, e.g. which Passenger version, paths, etc.?
We have a Chef cookbook for Passenger + nginx, but to configure it I would need some version/path info.

PS: I find the Arrested Development reference in the thin server very nice…

Hi Daniel!
Any thoughts on this? I also tried it on our production cluster, but hit the same problem :confused:
Thanks!
Jason

Hi Jason,

I don’t know what could be the problem. I would try to deploy it using Passenger, but I’m not sure if this will fix the issue.

Anyway, econe-upload is not an EC2 API call; it's something we included to be able to upload images to the front-end. However, if I were you I would not use the public interfaces (EC2) to upload images. I would move the image to the front-end and register it using the oneimage command (the one.image.allocate XMLRPC API call). You can also provide a URL as the PATH of the image and oned will download and register it. Or you can use the AppMarket add-on to share appliances among different OpenNebula deployments.
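
For example, something along these lines (the URL and the datastore ID here are just placeholders):

oneimage create --name contextualize-redis \
    --path http://build-host.example.com/images/contextualize-redis.qcow2 \
    -d 111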

Hope this helps.

OK, I guess I will have to find a workaround for it…
We are building the images on a different server and automatically uploading them via econe-upload, and getting the ami-id back was the real upside :confused:

But were you able to at least reproduce the stack trace?
Thanks!
Jason

Hi Jason,

Yes, I was able to reproduce it, and I found out that the problem is a timeout on the client side. I was able to upload big images by modifying the timeout param (there is a bug: the param passed to the initialize method is not used).

This change applies when uploading an image using the multipart option. If you don't want to use that option, you have to modify the connection instance instead, since @timeout is not used in that case.

If you confirm that it works for you, I will make a change to support the timeout CLI option for both the curl and multipart methods.
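
Roughly, the idea is to force a generous timeout on the connection used for the upload. A sketch (the 600s value and the helper names are just illustrative; only the Net::HTTP#read_timeout= and Curl::Easy#timeout= setters are the actual knobs):

require 'net/http'
require 'curb'

UPLOAD_TIMEOUT = 600  # seconds; anything comfortably larger than the upload takes

# multipart path (econe-upload -M): the request goes through Net::HTTP,
# so raise read_timeout on the connection before the POST is issued
def multipart_connection(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.read_timeout = UPLOAD_TIMEOUT
  http
end

# curb path (default): Curl::Easy has its own timeout setter
def curl_handle(url)
  curl = Curl::Easy.new(url)
  curl.timeout = UPLOAD_TIMEOUT
  curl
end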

OK… I really don't know what to say now…

It's definitely not related to the client, regardless of the timeout setting (we even tried @timeout = nil).

After a grueling marathon of changes we have figured out that the timeout problem comes from the server running (as you mentioned) WEBrick…

If we start up a thin Sinatra app, however, not everything is golden either:

  • when starting it with the threaded option, the timeout was respected and the upload worked
  • when starting it without --threaded, the timeout was not respected, but the upload still worked
  • when starting it with puma as the application server it needed some convincing (additional gems, etc.), but it also ran without a hitch (see the sketch after this list)
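
For reference, the invocations looked roughly like this (ports and flags reconstructed from memory, so treat them as approximate):

# thin, threaded, with a timeout: upload worked and the timeout was honoured
thin --threaded --timeout 200 start

# thin without --threaded: the upload worked, but the timeout was not respected
thin --timeout 200 start

# puma against the same config.ru: worked once the missing gems were installed
gem install puma
puma --port 4567 config.ru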

So we started to experiment, but no matter what we did, we could not get the server to accept a timeout setting OR run in non-threaded mode (even as a test) when being started via the init scripts…

Essentially, the econe application times out once the initial upload is finished and OpenNebula internally starts converting the image (so the timeout starts to run after the file prefixed with "thin" has finished uploading…).

I guess that there would have to be some settings we need to configure for that to be acknowledged, but any idea where we can set them?
Thanks!
Jason

Hi Daniel!
I just got back from an extended vacation: did the information I found about the different setups help any?
Cheers!
Jason

Hi Jason,

Since the init scripts use this command to start the econe-server:

ruby $ECONE_SERVER >$ECONE_LOG 2>$ECONE_LOG_ERROR &

Sinatra will use the default configuration for the server.

If you want to provide extra configuration parameters, you will have to use the config.ru file I posted in a previous reply and modify the econe-server script to run the thin or rackup command with that file.
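
An untested sketch of what that start line could look like (the log/pid paths are just examples, the thin flags are the same ones from the earlier steps):

# instead of:  ruby $ECONE_SERVER >$ECONE_LOG 2>$ECONE_LOG_ERROR &
cd /usr/lib/one/ruby/cloud/econe && \
    thin --threaded --timeout 200 --port 4567 \
         --rackup config.ru \
         --daemonize --log /var/log/one/econe.log --pid /var/run/one/econe.pid \
         start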

BTW, if you have access to the OpenNebula XMLRPC API, you could expose your images via an HTTP server on the client side and provide the URL as the PATH in the one.image.allocate template. This call will return an ID, and you can use that ID with the econe-register command to expose the image as an AMI through the econe-server.
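
A minimal sketch of that flow with the Ruby OCA bindings (credentials, endpoint, URL and datastore ID below are placeholders):

#!/usr/bin/env ruby
# Sketch: register an image from a URL via one.image.allocate and print the
# ID that can then be passed to econe-register.
require 'opennebula'
include OpenNebula

client = Client.new('oneadmin:password', 'http://localhost:2633/RPC2')

template = <<-EOT
  NAME = "contextualize-redis"
  PATH = "http://build-host.example.com/images/contextualize-redis.qcow2"
EOT

image = Image.new(Image.build_xml, client)
rc    = image.allocate(template, 111)   # 111 = datastore_id, as in econe.conf

if OpenNebula.is_error?(rc)
  STDERR.puts rc.message
  exit 1
end

puts "Image ID: #{image.id}"   # then: econe-register <image id>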

You may want to take a look at the AppMarket add-on. This is a component to share images between different OpenNebula instances. We are also working on a tighter integration of the marketplace concept for OpenNebula 5.0; you can see the dev docs in this link.

If you have further questions, do not hesitate to ask me.

Thanks Daniel!
For now I will use the XMLRPC API and upload that way, and wait for the new marketplace to go live.
If you need any testers for v5, we would be happy to try it out in the office :slightly_smiling:

Jason