Scalable Virtual Machine Migration using Reinforcement Learning

Abdul Rahman Hummaida, Norman W. Paton, Rizos Sakellariou

Research output: Contribution to journal › Article › peer-review


Heuristic approaches require fixed knowledge of how resource allocation should be carried out, which can be limiting when managing variable cloud workloads. Solutions based on Reinforcement Learning (RL) have been proposed for managing cloud infrastructure; however, these tend to be centralized and struggle to maintain Quality of Service (QoS) in data centres with thousands of nodes. To address this, we propose a reinforcement learning management policy that can run in a decentralized manner and converges quickly towards efficient resource allocation, resulting in fewer Service Level Agreement (SLA) violations than centralized architectures. To address common challenges in applying RL to cloud resource management, such as slow learning and managing the state/action space, we use parallel learning and reduce the size of the state/action space. We apply a decision-making approach to optimize the migration of a VM, choosing a target node to host the VM so that response time is brought within the SLA level. We also demonstrate a unique, multi-level form of reinforcement learning cooperation that further reduces SLA violations. We use simulation to evaluate our proposal in practice and compare the results with an established heuristic, demonstrating a significant reduction in SLA violations and higher scalability.
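The abstract does not spell out the learning algorithm, so the following is only an illustrative sketch, not the authors' method. It assumes tabular Q-learning, a state reduced to a few utilisation buckets of the overloaded source host, an action that is the utilisation bucket of the target node (rather than a specific node ID, which keeps the action space small), a reward of +1 when the toy response time meets the SLA and -1 otherwise, and several learners updating one shared Q-table as a simple stand-in for parallel learning. All names (`util_level`, `MigrationAgent`, `response_time_ms`, `SLA_MS`, `train`) and the response-time model are hypothetical.

```python
import random
from collections import defaultdict

SLA_MS = 300.0  # hypothetical SLA threshold on response time, in ms


def util_level(util: float, levels: int = 4) -> int:
    """Hypothetical state reduction: map utilisation in [0, 1] to a bucket."""
    return min(int(util * levels), levels - 1)


def response_time_ms(target_level: int) -> float:
    """Toy model (assumption): response time grows with the target's load level."""
    return 100.0 + 150.0 * target_level


class MigrationAgent:
    """Tabular Q-learning agent that picks a target-node load level for a VM."""

    def __init__(self, q=None, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        # Passing the same dict to several agents lets them learn in parallel
        # into one shared table (an assumed stand-in for "parallel learning").
        self.q = q if q is not None else defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state, actions):
        # Epsilon-greedy selection over the reduced action set.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        # Q-learning target: r + gamma * max_a' Q(s', a'); 0 if s' is terminal.
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def train(episodes=500, n_agents=4, seed=0):
    shared_q = defaultdict(float)
    agents = [MigrationAgent(q=shared_q, seed=seed + i) for i in range(n_agents)]
    actions = list(range(4))  # target-node utilisation levels 0..3
    for _ in range(episodes):
        for agent in agents:
            state = util_level(0.95)  # an overloaded source host
            action = agent.choose(state, actions)
            reward = 1.0 if response_time_ms(action) <= SLA_MS else -1.0
            # One-step episodes: the post-migration state is treated as terminal.
            agent.update(state, action, reward, next_state=None, next_actions=[])
    return shared_q
```

After `q = train()`, the greedy action for the overloaded state, `max(range(4), key=lambda a: q[(util_level(0.95), a)])`, is a target load level whose toy response time stays within `SLA_MS`. Using load levels as actions means the table size does not grow with the number of nodes, which is one simple way to realise the state/action-space reduction the abstract mentions; a full system would still need to map the chosen level to a concrete node.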
Original language: English
Journal: Journal of Grid Computing
Issue number: 2
Publication status: Published - 28 Apr 2022


