Understanding the Performance of Managed Runtime Environments

  • Timothy Hartley

Student thesis: PhD

Abstract

Managed runtime environments (MREs) have become commonplace across the spectrum of computing devices and are now found on everything from mobile phones to personal computers and data-centre servers. Optimising MRE execution requires insight into the performance of the MRE components themselves and their interactions with the workloads they host. This thesis, in two parts, is concerned with aspects of performance engineering and understanding in the context of managed runtime environments. The first part covers an investigation arising from porting MaxineVM, a research virtual machine, to the ARMv8 architecture. During this work, aspects of the original design were found to conflict with constraints imposed by the ARMv8 architecture. These affect elements of the just-in-time compilation system, specifically the construction and subsequent treatment of call-sites that must be patched at runtime to redirect control. We investigate functionally equivalent implementations of call-sites, evaluating their performance and tradeoffs using a microbenchmark, three JVM benchmark suites and statistical profiles derived from microarchitecture performance counters, on three diverse ARMv8 platforms. The experiments show performance variation of up to 12% between the alternative strategies, as well as variation across the different implementations of the architecture. We identify a potential optimisation opportunity, relevant to all instruction set architectures with limited direct call ranges: using code cache management to encourage local direct branches. The second part of the thesis presents two fine-grained studies into managed runtime performance. Firstly, the Top-Down Microarchitecture Analysis methodology is extended to individual managed runtime threads, demonstrating the dynamic microarchitectural utilisation behaviours of individual threads at OS-scheduling quantum granularity. 
These behaviours reveal which threads are effectively utilising the processor's microarchitecture and which are not, identifying opportunities for optimisation and motivating further investigation. The second study explores and refines warm-up and steady-state behaviour analysis of an MRE. Benchmarking experiments are a common technique for gaining insight into the performance of MREs, and methodologies typically rely on timing iterations to ascertain when peak performance has been achieved. This thesis proposes a new approach that incorporates microarchitecture performance counters, specifically using counts of retired micro-operations as a measure of work done. It is argued that this approach offers a more reliable metric than elapsed time, one that is less susceptible to OS interference and microarchitectural effects.
Date of Award: 31 Dec 2022
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Mikel Luján & Christos-Efthymios Kotselidis
