ScootR: Scaling R Dataframes on Dataflow Systems

Andreas Kunft, Lukas Stadler, Daniele Bonetta, Cosmin Basca, Jens Meiners, Sebastian Bress, Tilmann Rabl, Juan Fumero Alfonso, Volker Markl

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

Abstract

To cope with today's large scale of data, parallel dataflow engines such as Hadoop, and more recently Spark and Flink, have been proposed. They offer scalability and performance, but require data scientists to develop analysis pipelines in unfamiliar programming languages and abstractions. To overcome this hurdle, dataflow engines have introduced some forms of multi-language integrations, e.g., for Python and R. However, this results in data exchange between the dataflow engine and the integrated language runtime, which requires inter-process communication and causes high runtime overheads. In this paper, we present ScootR, a novel approach to execute R in dataflow systems. ScootR tightly integrates the dataflow and R language runtime by using the Truffle framework and the Graal compiler. As a result, ScootR executes R scripts directly in the Flink data processing engine, without serialization and inter-process communication. Our experimental study reveals that ScootR outperforms state-of-the-art systems by up to an order of magnitude.
Original language: English
Title of host publication: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing
Place of Publication: New York, NY
Publisher: Association for Computing Machinery
Pages: 288-300
Number of pages: 13
ISBN (Electronic): 9781450360111
Publication status: Published - 11 Oct 2018
Event: ACM Symposium on Cloud Computing 2018 - Cape Rey Beach Resort, Carlsbad, California, United States
Duration: 11 Oct 2018 - 13 Oct 2018
https://acmsocc.github.io/2018/

Publication series

Name: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing

Conference

Conference: ACM Symposium on Cloud Computing 2018
Country/Territory: United States
City: Carlsbad
Period: 11/10/18 - 13/10/18

Keywords

  • Dataflow Engines
  • Language Integration
  • Data Exchange
