Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management

Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. These promises, however, are undermined by non-uniform memory access (NUMA). We show that, using NUMA-aware task and data placement, it is possible to preserve the uniform hardware abstraction of contemporary task-parallel programming models for both computing and memory resources with high data locality. Our data placement scheme guarantees that all accesses to task output data target the local memory of the accessing core. The complementary task placement heuristic improves the locality of accesses to task input data on a best-effort basis. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over data placement. The algorithms are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences readily available in the run-time system, and placement information from the operating system. On a 192-core system with 24 NUMA nodes, our optimizations achieve above 94% locality (fraction of local memory accesses), up to 5× better performance than NUMA-aware hierarchical work-stealing, and even 5.6× compared to static interleaved allocation. Finally, we show that state-of-the-art dynamic page migration by the operating system cannot catch up with frequent affinity changes between cores and data and thus fails to accelerate task-parallel applications.
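
As an illustration only, the following minimal Python sketch shows one way a best-effort task placement heuristic of the kind the abstract describes could look, together with the locality metric it reports (fraction of local memory accesses). It is not the authors' implementation: the function names `pick_node` and `locality`, the `(node, size)` and `(core_node, data_node)` tuple layouts, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import defaultdict


def pick_node(input_buffers, default_node=0):
    """Hypothetical heuristic: run the task on the NUMA node that
    already holds the most bytes of its input data.

    input_buffers: iterable of (node, size_in_bytes) pairs (assumed layout).
    """
    bytes_per_node = defaultdict(int)
    for node, size in input_buffers:
        bytes_per_node[node] += size
    if not bytes_per_node:
        # No inputs, so no affinity: fall back to a default node.
        return default_node
    # Iterating nodes in sorted order makes max() return the lowest
    # node id when several nodes hold the same number of bytes.
    return max(sorted(bytes_per_node), key=lambda n: bytes_per_node[n])


def locality(accesses):
    """Fraction of accesses whose data lives on the accessing core's node.

    accesses: list of (core_node, data_node) pairs (assumed layout).
    Returns 0.0 for an empty trace.
    """
    if not accesses:
        return 0.0
    local = sum(1 for core_node, data_node in accesses if core_node == data_node)
    return local / len(accesses)


# A task whose inputs are spread over nodes 0 and 1.
inputs = [(0, 4096), (1, 8192), (0, 2048)]
node = pick_node(inputs)

# Model one read per input buffer plus one write to an output buffer
# that is allocated on the executing node, so the write is always local.
trace = [(node, data_node) for data_node, _ in inputs] + [(node, node)]
print(node, locality(trace))
```

In this toy trace the output access is local by construction because the buffer is placed on the node where the task runs, while input locality depends on where earlier tasks left their data, which is why the input side can only be optimized on a best-effort basis.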
Original language: English
Title of host publication: International Conference on Parallel Architecture and Compilation Techniques
Pages: 125-137
Publication status: Published - 11 Sept 2016
Event: International Conference on Parallel Architecture and Compilation Techniques - The Dan Carmel, Haifa, Israel
Duration: 11 Sept 2016 to 15 Sept 2016
Conference number: 25
http://pactconf.org/

Conference

Conference: International Conference on Parallel Architecture and Compilation Techniques
Abbreviated title: PACT
Country/Territory: Israel
City: Haifa
Period: 11/09/16 to 15/09/16

  • Best Paper Award

    Pop, A. (Recipient) & Drebes, A. (Recipient), 15 Sept 2016

    Prize: Prize (including medals and awards)
