Hello,
I have an OpenNebula setup with this configuration:
3 hosts (one of these nodes is running oned) with Ubuntu Server 14.04 LTS + KVM as the hypervisor + Open vSwitch for networking.
Storage is Ceph.
I am having issues with host failure detection.
My test case is: deploy a VM to one of the hosts and, once the VM has booted successfully, log in to that host while it is running and shut it down (by running the poweroff command).
After that, the host changes state from ok to updated, and only after a long time (30-40 minutes) does it go to retry.
How can I decrease the time it takes for a host to go from updated to the retry / error state after it has been shut down?
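To put exact numbers on the delay, the host state could be sampled during the test. This is only a rough sketch: it assumes `onehost show <id> -x` prints the host as XML with a `STATE` element directly under the root, and the host ID, sample count, and poll interval are placeholders, not values from my setup.

```python
import subprocess
import time
import xml.etree.ElementTree as ET


def read_host_state(host_id):
    """Return the raw STATE value from `onehost show <id> -x` (assumed XML layout)."""
    xml_text = subprocess.check_output(["onehost", "show", str(host_id), "-x"])
    return ET.fromstring(xml_text).findtext("STATE")


def sample_states(host_id, count=720, poll_seconds=5):
    """Poll the host `count` times; return a list of (elapsed_seconds, state)."""
    start = time.time()
    samples = []
    for _ in range(count):
        samples.append((round(time.time() - start), read_host_state(host_id)))
        time.sleep(poll_seconds)
    return samples


def transitions(samples):
    """Collapse (timestamp, state) samples into (timestamp, old, new) changes."""
    changes = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur != prev:
            changes.append((ts, prev, cur))
    return changes


# Usage (hypothetical host ID 0), started just before running poweroff:
#   for ts, old, new in transitions(sample_states(0)):
#       print("%5ds  %s -> %s" % (ts, old, new))
```

The state values are reported as returned by the CLI, without mapping them to names.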
UPD:
Timers configuration:
MANAGER_TIMER = 5
MONITORING_INTERVAL = 10
MONITORING_THREADS = 50
HOST_PER_INTERVAL = 15
HOST_MONITORING_EXPIRATION_TIME = 1800
#HOST_MONITORING_EXPIRATION_TIME = 43200
#VM_INDIVIDUAL_MONITORING = "no"
VM_PER_INTERVAL = 30
VM_MONITORING_EXPIRATION_TIME = 1800
#VM_MONITORING_EXPIRATION_TIME = 14400
IM Configuration:
IM_MAD = [
name = "collectd",
executable = "collectd",
arguments = "-p 4124 -f 2 -t 50 -i 5" ]
IM_MAD = [
name = "kvm",
executable = "one_im_ssh",
arguments = "-r 3 -t 15 kvm" ]
RPC configuration:
MAX_CONN = 50
MAX_CONN_BACKLOG = 50
KEEPALIVE_TIMEOUT = 15
KEEPALIVE_MAX_CONN = 50
TIMEOUT = 15
RPC_LOG = NO
#MESSAGE_SIZE = 1073741824
#LOG_CALL_FORMAT = "Req:%i UID:%u %m invoked %l"
How can I get OpenNebula to react to a host failure within 1-2 minutes?