Scaling managed applications on Non-Uniform Memory Access (NUMA) architectures has always been challenging. Research has focused on performance-critical components of managed runtimes, such as garbage collectors, for which NUMA scalability optimizations have been proposed. However, without first knowing the circumstances under which a NUMA architecture can be beneficial, the extensive research investment that such optimizations require rests on quicksand. Moreover, the lack of tooling support for managed runtimes in the context of NUMA makes it harder to analyze scalability bottlenecks and to conclude whether a managed application can benefit from NUMA. This thesis studies several memory and scalability aspects of managed applications on NUMA architectures, in order to enable the NUMA scalability of Managed Runtime Environments (MREs). More specifically, it leverages several Java applications and MaxineVM, a metacircular research VM written in Java. To tackle the lack of tooling support, this thesis proposes a tool-chain composed of NUMAProfiler, a new Java profiler enriched with NUMA awareness, and PerfUtil, a microarchitectural profiler with multiplexing support. The effectiveness of the tool-chain rests on the combined use of high-level and low-level profiling tools to correlate hardware metrics with Java application properties. The tool-chain is used to analyze the memory behavior of multiple Java applications drawn from two benchmark suites. Moreover, a scalability analysis methodology is presented and applied to those applications in order to characterize them by their scalability-critical properties. This characterization reveals multiple distinct application categories into which typical Java applications can potentially fit.
The research findings that emerge from the memory behavior and NUMA scalability studies are formalized into NUMA scalability guidelines for improving the performance of a managed application on a NUMA system. The guidelines are combined into a dynamic, application-agnostic, online optimization mechanism that is implemented in the runtime layer of MaxineVM. The experimental evaluation of the mechanism shows that performance ranges from 0.66x up to 3.29x, with a geometric mean of 1.11x, relative to the naive performance of the managed applications on a NUMA system.
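For readers unfamiliar with the summary statistic quoted above, the following is a minimal sketch of how a geometric mean of per-application speedups is commonly computed. The class name `SpeedupSummary`, the method `geometricMean`, and the input values are hypothetical and chosen only for illustration; they are not the thesis's measurements or code.

```java
public final class SpeedupSummary {

    /**
     * Geometric mean of strictly positive speedup ratios.
     * Summing logarithms avoids overflow or underflow that a
     * running product could hit on long input arrays.
     */
    static double geometricMean(double[] speedups) {
        if (speedups.length == 0) {
            throw new IllegalArgumentException("at least one speedup is required");
        }
        double logSum = 0.0;
        for (double s : speedups) {
            if (s <= 0.0) {
                throw new IllegalArgumentException("speedups must be positive: " + s);
            }
            logSum += Math.log(s);
        }
        return Math.exp(logSum / speedups.length);
    }

    public static void main(String[] args) {
        // Hypothetical ratios (optimized vs. naive), not data from the thesis:
        // one slowdown, one unchanged run, one speedup.
        double[] hypothetical = {0.5, 1.0, 2.0};
        System.out.println("geometric mean = " + geometricMean(hypothetical));
    }
}
```

In this illustrative input the 0.5x slowdown and the 2.0x speedup cancel, so the geometric mean is 1.0 up to floating-point rounding. The geometric mean is the usual choice for averaging ratios because, unlike the arithmetic mean, it treats a halving and a doubling symmetrically.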
Date of Award: 31 Dec 2022
Awarding Institution: The University of Manchester
Supervisors: Mikel Luján & Christos-Efthymios Kotselidis