AWS API Request limit exceeded

Any idea on how I can fix this? I have about 50 VMs in AWS managed by OpenNebula. It seems that OpenNebula has hit an API limit, not an instance limit.

Mon Jan 16 14:41:32 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 20 0 us-west-1
Mon Jan 16 14:41:32 2017 [Z0][InM][I]: Request limit exceeded.
Mon Jan 16 14:41:32 2017 [Z0][InM][E]: Error executing poll
Mon Jan 16 14:41:32 2017 [Z0][InM][I]: ExitCode: 255
Mon Jan 16 14:41:32 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll

You need to increase those limits in AWS:

http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html

It doesn't look like an instance limit, but more of an API limit. It looks like OpenNebula is querying AWS too frequently.

All the VM statuses show UNKNOWN. They will randomly recover to RUNNING, but most of the time they are UNKNOWN.

The AWS host us-west-1's status under Infrastructure is constantly stuck in RETRY.

I tried: /usr/share/one/install_gems

Still getting the error:
Wed Jan 18 11:45:25 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 20 0 us-west-1
Wed Jan 18 11:45:25 2017 [Z0][InM][I]: Request limit exceeded.
Wed Jan 18 11:45:25 2017 [Z0][InM][E]: Error executing poll
Wed Jan 18 11:45:25 2017 [Z0][InM][I]: ExitCode: 255
Wed Jan 18 11:45:25 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll

Any idea? The pressure is on to get this resolved, or they are going to force a switch to Spinnaker.

How is OpenNebula monitoring the AWS hosts? Is this done via CloudWatch? If so, what resource is it using?

My bad, I thought the limit was the number of VMs. I would space out the monitoring by increasing the value of MONITORING_INTERVAL. By default it is 1 minute:
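In /etc/one/oned.conf that should look like this (shown with the 60-second default; the comment is mine):

MONITORING_INTERVAL = 60   # seconds between host monitoring cycles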

CloudWatch is queried every 360 seconds:
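If I'm reading the driver right, that interval is set inside the EC2 driver itself (/var/lib/one/remotes/vmm/ec2/ec2_driver.rb), roughly as:

cw_mon_time = 360   # seconds between CloudWatch queries

The exact line and location may differ between versions, so take it as a pointer rather than a patch.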

Thank you for responding.

I have changed this to MONITORING_INTERVAL = 300 and restarted OpenNebula, but it still gives the same error.

I am using OpenNebula 5.0.2.

Wed Jan 18 14:05:19 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 20 0 us-west-1
Wed Jan 18 14:05:19 2017 [Z0][InM][I]: Request limit exceeded.
Wed Jan 18 14:05:19 2017 [Z0][InM][E]: Error executing poll
Wed Jan 18 14:05:19 2017 [Z0][InM][I]: ExitCode: 255
Wed Jan 18 14:05:19 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll

I also just changed -i to 120 seconds. So far, so good. Do these changes make sense to you?

IM_MAD = [
NAME = "collectd",
EXECUTABLE = "collectd",
ARGUMENTS = "-p 4124 -f 5 -t 50 -i 120" ]
#ARGUMENTS = "-p 4124 -f 5 -t 50 -i 20" ]
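For anyone following along, my reading of those collectd arguments (based on the comments in the stock oned.conf, so treat as approximate) is:

# -p  port the collectd server listens on
# -f  flush interval in seconds
# -t  number of threads
# -i  seconds between monitoring push cycles (the value I changed above)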

Well that failed:

Wed Jan 18 14:24:44 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 120 0 us-west-1
Wed Jan 18 14:24:44 2017 [Z0][InM][I]: Request limit exceeded.
Wed Jan 18 14:24:44 2017 [Z0][InM][E]: Error executing poll
Wed Jan 18 14:24:44 2017 [Z0][InM][I]: ExitCode: 255
Wed Jan 18 14:24:44 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll

Which CloudWatch metric is OpenNebula polling?

I just increased the cw_mon_time value to 760; so far so good for the last 15 minutes.

Spoke too soon:
Thu Jan 19 12:27:04 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 150 0 us-west-1
Thu Jan 19 12:27:04 2017 [Z0][InM][I]: Request limit exceeded.
Thu Jan 19 12:27:04 2017 [Z0][InM][E]: Error executing poll
Thu Jan 19 12:27:04 2017 [Z0][InM][I]: ExitCode: 255
Thu Jan 19 12:27:04 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll

What is the exact CloudWatch API call that OpenNebula is polling?
GetMetricStatistics, ListMetrics, or something else?

Any update?

Hi debian112!
I've checked the source code (ec2_driver.rb) to answer your question. CloudWatch is used to get the following metrics via get_metric_statistics (see the sketch after the list):

  • CPUUtilization
  • NetworkIn
  • NetworkOut
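
As a rough illustration, a single one of those calls with the Ruby aws-sdk gem (the one install_gems pulls in) looks more or less like this; the region, instance ID and period are placeholders, and the actual parameters in ec2_driver.rb may differ:

require 'aws-sdk'

cw = Aws::CloudWatch::Client.new(region: 'us-west-1')

resp = cw.get_metric_statistics(
  namespace:   'AWS/EC2',
  metric_name: 'CPUUtilization',    # likewise NetworkIn / NetworkOut
  dimensions:  [{ name: 'InstanceId', value: 'i-0123456789abcdef0' }],
  start_time:  Time.now - 360,      # roughly one cw_mon_time window
  end_time:    Time.now,
  period:      60,
  statistics:  ['Average']
)

puts resp.datapoints.map(&:average)

That works out to one CloudWatch request per metric per VM per cycle, so with ~50 VMs the request volume adds up quickly.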

Cheers!

Yes, I checked it. Still a no-go. I did notice that upgrading to 5.2.1 helped a little, but I still see the error twice in an hour in oned.log.

Ok so this is the AWS statement:
"If an API request exceeds the API request rate for its category, the request returns the RequestLimitExceeded error code. To prevent this error, ensure that your application doesn't retry API requests at a high rate. You can do this by using care when polling and by using exponential backoff retries."
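
For what it's worth, the exponential backoff AWS describes looks roughly like this in Ruby. It is only an illustration of the pattern, not something OpenNebula exposes as a setting, and the with_backoff helper name is made up:

def with_backoff(max_retries: 5)
  attempt = 0
  begin
    yield
  rescue Aws::Errors::ServiceError => e
    # only retry throttling-style errors, and give up after max_retries
    raise unless e.code == 'RequestLimitExceeded' && attempt < max_retries
    attempt += 1
    sleep((2**attempt) + rand)   # exponential delay plus random jitter
    retry
  end
end

# e.g. with_backoff { cw.get_metric_statistics(...) }

If I remember right, the aws-sdk gem can also do some of this on its own via the client's :retry_limit option.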

What can I set in OpenNebula to reduce the API request rate?