Any idea how I can fix this? I have about 50 VMs in AWS managed by OpenNebula. It seems that OpenNebula has hit an API limit, not an instance limit.
Mon Jan 16 14:41:32 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 20 0 us-west-1
Mon Jan 16 14:41:32 2017 [Z0][InM][I]: Request limit exceeded.
Mon Jan 16 14:41:32 2017 [Z0][InM][E]: Error executing poll
Mon Jan 16 14:41:32 2017 [Z0][InM][I]: ExitCode: 255
Mon Jan 16 14:41:32 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll
jfontan
(Javi Fontán)
January 17, 2017, 9:41am
2
It doesn’t look like an instance limit, but more like an API limit. It looks like OpenNebula is querying AWS too often.
All the VMs show status UNKNOWN. They randomly recover to RUNNING, but most of the time they are UNKNOWN.
The AWS host us-west-1 under Infrastructure is constantly stuck in “RETRY”.
I tried: /usr/share/one/install_gems
Still giving error:
Wed Jan 18 11:45:25 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 20 0 us-west-1
Wed Jan 18 11:45:25 2017 [Z0][InM][I]: Request limit exceeded.
Wed Jan 18 11:45:25 2017 [Z0][InM][E]: Error executing poll
Wed Jan 18 11:45:25 2017 [Z0][InM][I]: ExitCode: 255
Wed Jan 18 11:45:25 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll
Any idea? The pressure is on to get this resolved, or they are going to force a switch to spinnaker.io.
How is opennebula monitoring the AWS hosts? Is this done via CloudWatch? If so what resource is it using?
jfontan
(Javi Fontán)
January 18, 2017, 6:35pm
7
My bad, I thought the limit was the number of VMs. I would space out the monitoring by increasing the value of MONITORING_INTERVAL. By default it is 1 minute:
# Values: YES or NO.
#*******************************************************************************
LOG = [
SYSTEM = "file",
DEBUG_LEVEL = 3
]
#MANAGER_TIMER = 15
MONITORING_INTERVAL = 60
MONITORING_THREADS = 50
#HOST_PER_INTERVAL = 15
#HOST_MONITORING_EXPIRATION_TIME = 43200
#VM_INDIVIDUAL_MONITORING = "no"
#VM_PER_INTERVAL = 5
#VM_MONITORING_EXPIRATION_TIME = 14400
SCRIPTS_REMOTE_DIR=/var/tmp/one
CloudWatch is queried every 360 seconds:
Thank you for responding.
I changed this to “MONITORING_INTERVAL = 300” and restarted OpenNebula, but it still gives the same error.
I am using OpenNebula 5.0.2.
Wed Jan 18 14:05:19 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 20 0 us-west-1
Wed Jan 18 14:05:19 2017 [Z0][InM][I]: Request limit exceeded.
Wed Jan 18 14:05:19 2017 [Z0][InM][E]: Error executing poll
Wed Jan 18 14:05:19 2017 [Z0][InM][I]: ExitCode: 255
Wed Jan 18 14:05:19 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll
I also just changed “-i” to 120 seconds. So far so good. Do these changes make sense to you?
IM_MAD = [
NAME = "collectd",
EXECUTABLE = "collectd",
ARGUMENTS = "-p 4124 -f 5 -t 50 -i 120" ]
#ARGUMENTS = "-p 4124 -f 5 -t 50 -i 20" ]
Well that failed:
Wed Jan 18 14:24:44 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 120 0 us-west-1
Wed Jan 18 14:24:44 2017 [Z0][InM][I]: Request limit exceeded.
Wed Jan 18 14:24:44 2017 [Z0][InM][E]: Error executing poll
Wed Jan 18 14:24:44 2017 [Z0][InM][I]: ExitCode: 255
Wed Jan 18 14:24:44 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll
Which CloudWatch metric is OpenNebula polling?
I just increased cw_mon_time to 760; so far it has been good for the last 15 minutes.
Spoke too soon:
Thu Jan 19 12:27:04 2017 [Z0][InM][I]: Command execution fail: /var/lib/one/remotes/im/run_probes ec2 /var/lib/one//datastores 4124 150 0 us-west-1
Thu Jan 19 12:27:04 2017 [Z0][InM][I]: Request limit exceeded.
Thu Jan 19 12:27:04 2017 [Z0][InM][E]: Error executing poll
Thu Jan 19 12:27:04 2017 [Z0][InM][I]: ExitCode: 255
Thu Jan 19 12:27:04 2017 [Z0][ONE][E]: Error monitoring Host us-west-1 (0): Error executing poll
What exactly is OpenNebula polling in CloudWatch?
GetMetricStatistics, ListMetrics or something else?
mcabrerizo
(Miguel Ángel Alvarez Cabrerizo)
January 23, 2017, 8:18am
15
Hi debian112!
I’ve been checking the source code (ec2_driver.rb) to answer your question. CloudWatch is used to get the following metrics via get_metric_statistics:
CPUUtilization
NetworkIn
NetworkOut
Cheers!
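As a rough back-of-the-envelope check on those three metrics, the sketch below estimates how many CloudWatch calls one monitoring cycle would generate. It assumes one get_metric_statistics call per metric per VM, which is an assumption about the driver and not something confirmed in this thread; the 50 VMs and the 360 and 760 second intervals are the figures mentioned above, and the function names are made up for illustration.

```python
# Rough estimate of CloudWatch calls per monitoring cycle.
# Assumption (not confirmed in the thread): the driver issues one
# get_metric_statistics call per metric per VM in each cycle.
METRICS = ("CPUUtilization", "NetworkIn", "NetworkOut")


def calls_per_cycle(vm_count, metrics=METRICS):
    """Number of CloudWatch calls issued in one monitoring cycle."""
    return vm_count * len(metrics)


def average_calls_per_second(vm_count, cycle_seconds, metrics=METRICS):
    """Average call rate if the cycle repeats every cycle_seconds."""
    return calls_per_cycle(vm_count, metrics) / cycle_seconds


if __name__ == "__main__":
    vms = 50  # "about 50 vms" from the first post
    for cycle in (360, 760):  # cw_mon_time values mentioned in the thread
        rate = average_calls_per_second(vms, cycle)
        print(f"{calls_per_cycle(vms)} calls every {cycle}s "
              f"(average {rate:.2f} calls/s)")
```

With 50 VMs this comes to 150 calls per cycle. The average rate is low, so if throttling still occurs, one plausible explanation is that the calls arrive in a burst and not spread over the interval.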
Yes, I checked it. Still a no go. I did notice that upgrading to 5.2.1 helped a little, but I still see the error about twice an hour in oned.log.
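To put a number on "how often per hour", a small script can count the "Request limit exceeded" lines per hour. This is only a sketch: it assumes every log line starts with a 24-character timestamp in the format shown in the excerpts above, and the function name errors_per_hour is hypothetical. The sample lines are copied from this thread; for a real run, pass an open oned.log file handle instead.

```python
from collections import Counter
from datetime import datetime

# Length of a timestamp such as "Wed Jan 18 14:05:19 2017".
TIMESTAMP_LEN = 24
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def errors_per_hour(lines, needle="Request limit exceeded"):
    """Count lines containing `needle`, grouped by the hour they were logged."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LEN], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        counts[stamp.replace(minute=0, second=0)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Wed Jan 18 14:05:19 2017 [Z0][InM][I]: Request limit exceeded.",
        "Wed Jan 18 14:05:19 2017 [Z0][InM][E]: Error executing poll",
        "Wed Jan 18 14:24:44 2017 [Z0][InM][I]: Request limit exceeded.",
        "Thu Jan 19 12:27:04 2017 [Z0][InM][I]: Request limit exceeded.",
    ]
    for hour, count in sorted(errors_per_hour(sample).items()):
        print(hour.strftime("%Y-%m-%d %H:00"), count)
```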
Ok so this is the AWS statement:
“If an API request exceeds the API request rate for its category, the request returns the
RequestLimitExceeded error code. To prevent this error, ensure that your application doesn’t retry
API requests at a high rate. You can do this by using care when polling and by using exponential
backoff retries”
What can I set in OpenNebula to reduce the rate of API requests?
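For reference, the "exponential backoff" that AWS mentions can be sketched as below. This is not OpenNebula code and not an OpenNebula setting; it is a generic illustration of the retry pattern. RequestLimitExceeded here is a hypothetical stand-in exception, and call_with_backoff and its parameters are made-up names.

```python
import random
import time


class RequestLimitExceeded(Exception):
    """Hypothetical stand-in for the provider's throttling error."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Call request(); on throttling, wait and retry with a growing delay.

    The delay cap doubles on every failed attempt (1s, 2s, 4s, ...) up to
    max_delay, and the actual wait is a random fraction of that cap so that
    many clients do not retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RequestLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())


if __name__ == "__main__":
    failures_left = [2]  # fail twice, then succeed
    waits = []

    def flaky_request():
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise RequestLimitExceeded("Request limit exceeded.")
        return "ok"

    # Record the waits instead of really sleeping, for demonstration only.
    result = call_with_backoff(flaky_request, sleep=waits.append)
    print(result, waits)
```

The sleep and rng parameters exist only so the waits can be replaced in a demo or test; in real use the defaults would apply.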