TY - GEN
T1 - Dynamic Threshold Setting for VM Migration
AU - Hummaida, Abdul Rahman
AU - Paton, Norman W.
AU - Sakellariou, Rizos
PY - 2022/4/14
Y1 - 2022/4/14
N2 - Cloud data centres require efficient management of resources and robust methods that consider SLA violations and node utilisation and simplify the adaptation decision-making process. Thus, resource management should ideally be solved in an online manner. To address this, approaches have been presented in the literature that set thresholds to trigger VM migration. One challenge with these approaches is that they typically use node metrics (e.g., CPU and memory) as an indicator of VM performance and do not factor in VM performance metrics when setting the CPU migration threshold. A hypothesis is that migrating VMs without factoring in VM performance metrics, e.g., response time, can lead to either early or delayed migration of VMs. We present an approach to discover the CPU utilisation level for VM migration dynamically. This approach monitors VM response time, discovers the CPU threshold at which response time would increase beyond a defined SLA level, and uses this threshold for VM migration. We use reinforcement learning (RL) to learn when it is rewarding to migrate a VM. The RL reward function drives a policy towards high CPU utilisation and attaches a penalty to overachieving SLAs. We use simulation to evaluate the approach and compare it to four heuristics: Static, Interquartile Range, Median Absolute Deviation, and Local Regression. The results show a significant reduction in SLA violations and an increase in CPU utilisation of active nodes.
AB - Cloud data centres require efficient management of resources and robust methods that consider SLA violations and node utilisation and simplify the adaptation decision-making process. Thus, resource management should ideally be solved in an online manner. To address this, approaches have been presented in the literature that set thresholds to trigger VM migration. One challenge with these approaches is that they typically use node metrics (e.g., CPU and memory) as an indicator of VM performance and do not factor in VM performance metrics when setting the CPU migration threshold. A hypothesis is that migrating VMs without factoring in VM performance metrics, e.g., response time, can lead to either early or delayed migration of VMs. We present an approach to discover the CPU utilisation level for VM migration dynamically. This approach monitors VM response time, discovers the CPU threshold at which response time would increase beyond a defined SLA level, and uses this threshold for VM migration. We use reinforcement learning (RL) to learn when it is rewarding to migrate a VM. The RL reward function drives a policy towards high CPU utilisation and attaches a penalty to overachieving SLAs. We use simulation to evaluate the approach and compare it to four heuristics: Static, Interquartile Range, Median Absolute Deviation, and Local Regression. The results show a significant reduction in SLA violations and an increase in CPU utilisation of active nodes.
U2 - 10.1007/978-3-031-04718-3_2
DO - 10.1007/978-3-031-04718-3_2
M3 - Conference contribution
T3 - Service-Oriented and Cloud Computing
SP - 31
EP - 46
BT - Service-Oriented and Cloud Computing - 9th IFIP WG 6.12 European Conference
ER -