Econe-upload fails on larger (10G) images

Hi All!
I have been trying out some of the econe interfaces and am running into problems when uploading some of our images. The image is about 11G, and I upload it normally via the econe-upload tool.

At first I thought the command had just died, but that was not the case: in the background the file was still being uploaded…

So I debugged the issue a bit:

  1. Upload from the command line:
oneadmin@service01 ~ $ date && time econe-upload -U XXX -K YYY -S ZZZ /tmp/contextulize-image.raw
Mon Jan  4 17:38:44 CET 2016
/var/lib/gems/1.9.1/gems/curb-0.9.0/lib/curl/easy.rb:73:in `perform': Server returned nothing (no headers, no data) (Curl::Err::GotNothingError)
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `http_post'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

real	3m33.775s
user	0m2.659s
sys	0m19.796s

(note: the time is approx. 17:41)

  2. Check /var/tmp for files:

oneadmin@service01 ~ $ ls -lah /var/tmp/
total 16G
drwxrwxrwt  3 root     root     4.0K Jan  4 17:41 .
drwxr-xr-x 13 root     root     4.0K May 27  2015 ..
drwx------  2 root     root      16K Oct  7 14:39 lost+found
-rw-------  1 oneadmin oneadmin 4.5G Jan  4 17:42 RackMultipart20160104-8268-1unujwz
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:41 thin-body20160104-8268-11eefhk

  3. Log output
I added this line:

logger.error { "FU: #{erb_version} - #{response.result(binding)}" }

right after the line where the response is built, and the econe log then shows:

Mon Jan 04 17:46:41 2016 [I]: Updating user pool cache.
Mon Jan 04 17:46:43 2016 [E]: FU: 2008-12-01 - <RegisterImageResponse xmlns="http://ec2.amazonaws.com/doc/2008-12-01/">
   <requestId>3e522bcb-29c3-40fa-bab5-bfd91bf4ada3</requestId>
  <imageId>ami-00000588</imageId>
</RegisterImageResponse>

Mon Jan 04 17:46:43 2016 [E]: UploadImage
Mon Jan 04 17:46:43 2016 [I]: 10.1.3.100 - - [04/Jan/2016 17:46:43] "POST / HTTP/1.1" 200 186 150.9169

Note the time is a lot later than when the error occurred… namely when (as can be seen next) the RackMultipart* file(s) finished being uploaded… It's also the time that the image shows up in the image list in the LOCKED state.

  4. Still uploading, according to the files:

oneadmin@service01 ~ $ ls -lah /var/tmp/
total 23G
drwxrwxrwt  3 root     root     4.0K Jan  4 17:46 .
drwxr-xr-x 13 root     root     4.0K May 27  2015 ..
-rw-rw-r--  1 oneadmin oneadmin 1.2G Jan  4 17:47 5f2ca86da9534a0e231a4264eee13692
drwx------  2 root     root      16K Oct  7 14:39 lost+found
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:46 RackMultipart20160104-8268-1lvww3i
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:44 RackMultipart20160104-8268-1unujwz

The problem also persists with the multipart-post option (-M):

oneadmin@service01 ~ $ econe-upload -M -U XXX -K YYY -S ZZZ /tmp/contextulize-image.raw 
/usr/lib/ruby/vendor_ruby/xmlsimple.rb:990:in `find_xml_file': File does not exist: end of file reached. (ArgumentError)
	from /usr/lib/ruby/vendor_ruby/xmlsimple.rb:168:in `xml_in'
	from /usr/lib/ruby/vendor_ruby/xmlsimple.rb:203:in `xml_in'
	from /usr/lib/ruby/vendor_ruby/AWS/responses.rb:15:in `parse'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:286:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

Does anyone know why this is happening?

Also, why is the file uploaded twice:

-rw-------  1 oneadmin oneadmin  11G Jan  4 17:46 RackMultipart20160104-8268-1lvww3i
-rw-------  1 oneadmin oneadmin  11G Jan  4 17:44 RackMultipart20160104-8268-1unujwz

and then not removed afterwards?

Thanks!
Jason

Running on the latest 4.14.2 (I just upgraded, just to make sure…)

Still trying stuff out…

I have also gathered that this does not come from a timeout on the CloudClient side: I tried tweaking the HTTP call there, but it didn't help any. :confused:

Also, I have noticed that a lot of artifacts remain under /var/tmp after one of these uploads:

e.g.: (from lsof)

ruby      24430 26808   oneadmin   20u      REG                8,1 11492392960         20 /var/tmp/RackMultipart20160105-24430-dpflei
ruby      24430 26808   oneadmin   21u      REG                8,1 11492393933         29 /var/tmp/thin-body20160105-24430-1iq1e2r (deleted)

Which leads to the disk filling up:

root@service01 ~ # df -h|grep /var/tmp
/dev/sda1                                             220G  140G   69G  67% /var/tmp
root@service01 ~ # du -sh /var/tmp/
65G	/var/tmp/

Restarting opennebula-econe cleans this up; it's probably related to Ruby's GC or to the file not being closed on the Ruby side (maybe…).
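
For what it's worth, the kind of cleanup I'd expect around those temp files is just the usual Tempfile pattern; this is only a sketch (the helper name and the tmpdir default are made up, only the Tempfile calls themselves are real):

# Hypothetical sketch: spool the upload to :tmpdir and make sure the copy
# is removed even if the request fails, so nothing is left behind in /var/tmp.
require 'tempfile'

def with_upload_copy(src_io, tmpdir = '/var/tmp')
  tmp = Tempfile.new('econe-upload', tmpdir)
  IO.copy_stream(src_io, tmp)   # spool the request body to disk
  tmp.flush
  yield tmp.path                # hand the on-disk copy to the next step
ensure
  if tmp
    tmp.close
    tmp.unlink                  # without this the file stays in /var/tmp
  end
end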

Hope this helps…

Hi Jason,

Are you using the default server (webrick) or did you install thin/passenger to run the econe server?

Hi Daniel!
I don't think we installed anything special; however, I see that the server starts with the following message in the logs:

Thin web server (v1.3.1 codename Triple Espresso)

I also see that the files are initially uploaded with a thin-body prefix (e.g. thin-body20160104-8268-11eefhk).
On the econe configuration side, I don't see anything about which server runs it:

root@service01 /etc/one # egrep ^: econe.conf
:tmpdir: /var/tmp
:one_xmlrpc: http://localhost:2633/RPC2
:host: rtpain-service01.runtastic.com
:port: 4567
:auth: ec2
:core_auth: cipher
:debug_level: 3
:datastore_id: 111
:describe_with_terminated_instances: true
:terminated_instances_expiration_time: 900
:use_file_templates: false
:instance_types:
:elasticips_vnet_id: 11
:associate_script: false
:disassociate_script: false
:ebs_fstype: ext4

Anything I can check?

I would try to deploy the econe server using Passenger + nginx/Apache for better performance. But first, let's try to find out if the problem is in the thin server configuration.

  1. Create a config.ru file inside the econe folder (/usr/lib/one/ruby/cloud/econe)
    https://gist.github.com/dmamolina/28101ddcc1b2bfb7d19f

  2. Comment the following line in /usr/lib/one/ruby/cloud/econe/econe-server.rb
    https://github.com/OpenNebula/one/blob/master/src/cloud/ec2/lib/econe-server.rb#L142

  3. Run the following command from the econe folder (/usr/lib/one/ruby/cloud/econe) and change the timeout option:
    thin --threaded --timeout 200 start

Alright, I did steps 1 and 2.

Step 3 started the server on a different port (i.e. 3000 instead of the configured 4567, but that's no problem…), and I did get the error again:

ginny ~:  econe-upload -M -U http://rtpain-service01.runtastic.com:3000 -K XXX -S YYY /home/jason/chef/cookbooks/packer-templates/build/contextulize-redis.qcow2                              
Digest::Digest is deprecated; use Digest
/home/jason/.rvm/gems/ruby-2.2.1/gems/xml-simple-1.1.5/lib/xmlsimple.rb:996:in `find_xml_file': File does not exist: end of file reached. (ArgumentError)
	from /home/jason/.rvm/gems/ruby-2.2.1/gems/xml-simple-1.1.5/lib/xmlsimple.rb:168:in `xml_in'
	from /home/jason/.rvm/gems/ruby-2.2.1/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'
	from /home/jason/.rvm/gems/ruby-2.2.1/gems/amazon-ec2-0.9.17/lib/AWS/responses.rb:15:in `parse'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:286:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

but this time with a longer stack trace on the econe server side (I included the startup):

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # thin --threaded --timeout 200 start
Using rack adapter
--------------------------------------
         Server configuration         
--------------------------------------
{:tmpdir=>"/var/tmp",
 :one_xmlrpc=>"http://localhost:2633/RPC2",
 :host=>"rtpain-service01.runtastic.com",
 :port=>4567,
 :auth=>"ec2",
 :core_auth=>"cipher",
 :debug_level=>3,
 :datastore_id=>111,
 :describe_with_terminated_instances=>true,
 :terminated_instances_expiration_time=>900,
 :use_file_templates=>false,
 :instance_types=>{:"m1.small"=>{:template=>"m1.small.erb"}},
 :elasticips_vnet_id=>11,
 :associate_script=>false,
 :disassociate_script=>false,
 :ebs_fstype=>"ext4",
 :template_location=>"/etc/one/ec2query_templates",
 :views=>"/usr/lib/one/ruby/cloud/econe/views"}
--------------------------------------

Thin web server (v1.6.4 codename Gob Bluth)
Maximum connections set to 1024
Listening on 0.0.0.0:3000, CTRL+C to stop
#log has been deprecated, please use the log_level function instead (e.g. - log_info).
!! Unexpected error while processing request: undefined method `create' for Rack::Multipart::Parser:Class
/var/lib/gems/1.9.1/gems/thin-1.6.4/lib/thin/logging.rb:156:in `log_error': wrong number of arguments (0 for 1) (ArgumentError)
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:125:in `handle_error'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:84:in `rescue in pre_process'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:59:in `pre_process'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `call'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `block in spawn_threadpool'

Now without the multipart option set:

ginny ~:  econe-upload -U http://rtpain-service01.runtastic.com:3000 -K XXX -S YYY /home/jason/chef/cookbooks/packer-templates/build/contextulize-redis.qcow2                                 
Digest::Digest is deprecated; use Digest
/home/jason/.rvm/gems/ruby-2.2.1/gems/curb-0.9.0/lib/curl/easy.rb:73:in `perform': Server returned nothing (no headers, no data) (Curl::Err::GotNothingError)
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `http_post'
	from /usr/lib/one/ruby/cloud/econe/EC2QueryClient.rb:254:in `upload_image'
	from /usr/bin/econe-upload:74:in `block (2 levels) in <main>'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `call'
	from /usr/lib/one/ruby/cli/command_parser.rb:449:in `run'
	from /usr/lib/one/ruby/cli/command_parser.rb:76:in `initialize'
	from /usr/bin/econe-upload:44:in `new'
	from /usr/bin/econe-upload:44:in `<main>'

and the server log:

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # thin --threaded --timeout 200 start
Using rack adapter
--------------------------------------
         Server configuration         
--------------------------------------
{:tmpdir=>"/var/tmp",
 :one_xmlrpc=>"http://localhost:2633/RPC2",
 :host=>"rtpain-service01.runtastic.com",
 :port=>4567,
 :auth=>"ec2",
 :core_auth=>"cipher",
 :debug_level=>3,
 :datastore_id=>111,
 :describe_with_terminated_instances=>true,
 :terminated_instances_expiration_time=>900,
 :use_file_templates=>false,
 :instance_types=>{:"m1.small"=>{:template=>"m1.small.erb"}},
 :elasticips_vnet_id=>11,
 :associate_script=>false,
 :disassociate_script=>false,
 :ebs_fstype=>"ext4",
 :template_location=>"/etc/one/ec2query_templates",
 :views=>"/usr/lib/one/ruby/cloud/econe/views"}
--------------------------------------

Thin web server (v1.6.4 codename Gob Bluth)
Maximum connections set to 1024
Listening on 0.0.0.0:3000, CTRL+C to stop


#log has been deprecated, please use the log_level function instead (e.g. - log_info).
!! Unexpected error while processing request: undefined method `create' for Rack::Multipart::Parser:Class
/var/lib/gems/1.9.1/gems/thin-1.6.4/lib/thin/logging.rb:156:in `log_error': wrong number of arguments (0 for 1) (ArgumentError)
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:125:in `handle_error'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:84:in `rescue in pre_process'
	from /usr/lib/ruby/vendor_ruby/thin/connection.rb:59:in `pre_process'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `call'
	from /var/lib/gems/1.9.1/gems/eventmachine-1.0.8/lib/eventmachine.rb:1062:in `block in spawn_threadpool'

Hm. While it looks like everything should be working, I am not seeing the tmp file being created under /var/tmp at the moment. Am I getting something wrong?

I also cannot call simple functions which used to work, such as describe-images:

ginny ~:  econe-describe-images -U http://rtpain-service01.runtastic.com:3000 -K jattXXXest -S YYY                                                                                                  
Digest::Digest is deprecated; use Digest
econe-describe-images: end of file reached

In the server log I see the same error as before. It looks like I am still missing something.

Just to verify:

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # cat /usr/lib/one/ruby/cloud/econe/config.ru 
# If you are using self contained installation set this variable to the
# install location
#ENV['ONE_LOCATION']='/path/to/OpenNebula/install'

WITH_RACKUP=true

$: << '.'
require 'econe-server'

run Sinatra::Application

and I commented out the last line in /usr/lib/one/ruby/cloud/econe/econe-server.rb.

I shut the server down via service opennebula-econe stop and started it in the foreground as you described, with:

root@rtpain-service01 /usr/lib/one/ruby/cloud/econe # thin --threaded --timeout 200 start
Using rack adapter
...

etc.
Did I miss something? It seems like the thin app is not starting with the correct defaults (IMHO, but what do I know… :blush:).

I can try to get the Passenger/nginx version running if that is the way to go: is there any documentation for it, e.g. which Passenger version, paths, etc.?
We have a Chef cookbook for Passenger + nginx, but to configure it I would need some version/path info.

PS: I find the Arrested Development reference in the thin server very nice…

Hi Daniel!
Any thoughts on this? I also tried it on our production cluster, but hit the same problem :confused:
Thanks!
Jason

Hi Jason,

I don’t know what could be the problem. I would try to deploy it using Passenger, but I’m not sure if this will fix the issue.

Anyway, econe-upload is not an EC2 API call; it's something we included to be able to upload images to the front-end. However, if I were you I would not use the public interfaces (EC2) to upload images. I would move the image to the front-end and register it using the oneimage command (the one.image.allocate XMLRPC API call). You can also provide a URL as the PATH of the image and oned will download and register it. Or you can use the AppMarket add-on to share appliances among different OpenNebula deployments.
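
For example, something along these lines (the URL and the datastore ID here are just placeholders):

oneimage create --name contextualize-redis \
    --path http://build-host.example.com/images/contextualize-redis.qcow2 \
    -d 111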

Hope this helps.

OK, I guess I will have to find a workaround for it…
We are building the images on a different server and automatically uploading them via econe-upload, and getting the ami-id back was the real upside :confused:

But were you able to at least reproduce the stack trace?
Thanks!
Jason

Hi Jason,

Yes, I was able to reproduce it, and I found out that the problem is a timeout on the client side. I was able to upload big images by modifying the timeout param (there is a bug: the param passed to the initialize method is not used).

This change applies when uploading an image using the multipart option. If you don't want to use that option, you have to modify the connection instance instead, since @timeout is not used in that case.

If you confirm that it works for you, I will make a change to support the timeout CLI option for both the curl and multipart methods.
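
Roughly, the idea is to force a generous timeout on the connection used for the upload. A sketch (the 600s value and the helper names are just illustrative; only the Net::HTTP#read_timeout= and Curl::Easy#timeout= setters are the actual knobs):

require 'net/http'
require 'curb'

UPLOAD_TIMEOUT = 600  # seconds; anything comfortably larger than the upload takes

# multipart path (econe-upload -M): the request goes through Net::HTTP,
# so raise read_timeout on the connection before the POST is issued
def multipart_connection(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.read_timeout = UPLOAD_TIMEOUT
  http
end

# curb path (default): Curl::Easy has its own timeout setter
def curl_handle(url)
  curl = Curl::Easy.new(url)
  curl.timeout = UPLOAD_TIMEOUT
  curl
end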

OK… I really don't know what to say now…

It's definitely not related to the client, regardless of the timeout setting (we even tried @timeout = nil).

After a grueling marathon of changes we have figured out that the timeout problem comes from the server running (as you mentioned) WEBrick…

If we start up a thin Sinatra app, however, not everything is golden either:

  • when starting it with the threaded option, the timeout was respected and the upload worked
  • when starting it without --threaded, the timeout was not respected, but the upload still worked
  • when starting it with puma as the application server it needed some convincing (additional gems, etc.), but it also ran without a hitch (see the sketch after this list)
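
For reference, the invocations looked roughly like this (ports and flags reconstructed from memory, so treat them as approximate):

# thin, threaded, with a timeout: upload worked and the timeout was honoured
thin --threaded --timeout 200 start

# thin without --threaded: the upload worked, but the timeout was not respected
thin --timeout 200 start

# puma against the same config.ru: worked once the missing gems were installed
gem install puma
puma --port 4567 config.ru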

So we started to experiment, but no matter what we did, we could not get the server to accept a timeout setting OR run in non-threaded mode (even as a test) when being started via the init scripts…

Essentially, the econe application times out once the initial upload is finished and OpenNebula internally starts converting the image (so the timeout starts to run after the file prefixed with "thin" has finished uploading…).

I guess that there would have to be some settings we need to configure for that to be acknowledged, but any idea where we can set them?
Thanks!
Jason

Hi Daniel!
I just got back from an extended vacation: did the information I found about the different setups help any?
Cheers!
Jason

Hi Jason,

Since the init scripts use this command to start the econe-server:

ruby $ECONE_SERVER >$ECONE_LOG 2>$ECONE_LOG_ERROR &

Sinatra will use the default configuration for the server.

If you want to provide extra configuration parameters, you will have to use the config.ru file I posted in a previous reply and modify the econe-server script to run the thin or rackup command with that file.
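
An untested sketch of what that start line could look like (the log/pid paths are just examples, the thin flags are the same ones from the earlier steps):

# instead of:  ruby $ECONE_SERVER >$ECONE_LOG 2>$ECONE_LOG_ERROR &
cd /usr/lib/one/ruby/cloud/econe && \
    thin --threaded --timeout 200 --port 4567 \
         --rackup config.ru \
         --daemonize --log /var/log/one/econe.log --pid /var/run/one/econe.pid \
         start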

BTW, if you have access to the OpenNebula XMLRPC API, you could expose your images via an HTTP server on the client side and provide the URL as the PATH in the one.image.allocate template. This call will return an ID, and you can use that ID with the econe-register command to expose the image as an AMI through the econe-server.
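
A minimal sketch of that flow with the Ruby OCA bindings (credentials, endpoint, URL and datastore ID below are placeholders):

#!/usr/bin/env ruby
# Sketch: register an image from a URL via one.image.allocate and print the
# ID that can then be passed to econe-register.
require 'opennebula'
include OpenNebula

client = Client.new('oneadmin:password', 'http://localhost:2633/RPC2')

template = <<-EOT
  NAME = "contextualize-redis"
  PATH = "http://build-host.example.com/images/contextualize-redis.qcow2"
EOT

image = Image.new(Image.build_xml, client)
rc    = image.allocate(template, 111)   # 111 = datastore_id, as in econe.conf

if OpenNebula.is_error?(rc)
  STDERR.puts rc.message
  exit 1
end

puts "Image ID: #{image.id}"   # then: econe-register <image id>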

You may want to take a look at the AppMarket add-on. This is a component to share images between different OpenNebula instances. We are also working on a tighter integration of the marketplace concept for OpenNebula 5.0; you can see the dev docs in this link.

If you have further questions, do not hesitate to ask me.

Thanks Daniel!
For now I will use the XMLRPC API and upload that way, and wait for the new marketplace to go live.
If you need any testers for v5, we would be happy to try it out in the office :slightly_smiling:

Jason