Parallel Continuous Outlier Mining in Streaming Data

Theodoros Toliopoulos, Anastasios Gounaris, Kostas Tsichlas, Apostolos Papadopoulos, Sandra Sampaio

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this work, we focus on distance-based outliers in a metric space, where whether an entity is an outlier depends on the number of other entities in its neighborhood. In recent years, several solutions have tackled the problem of distance-based outliers in data streams, where outliers must be mined continuously as new elements become available. An interesting research problem is to combine the streaming environment with massively parallel systems to provide scalable stream-based algorithms. However, none of the previously proposed techniques targets a massively parallel setting. Our proposal fills this gap and studies how to transfer state-of-the-art techniques to Apache Flink, a modern platform for intensive streaming analytics. We thoroughly present the technical challenges encountered and the alternatives that may be applied. We show speed-ups of up to 117 times over a naive parallel solution and up to 2076 times over a non-parallel solution in Flink, using just an ordinary 4-core machine and a real-world dataset. Our results demonstrate that outlier mining can be achieved in an efficient and scalable manner. The resulting techniques have been made publicly available as open source.
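
To make the neighborhood-based definition in the abstract concrete, the following is a minimal, non-parallel sketch in Python. It is not the paper's Flink implementation. The parameter names `radius` and `k`, the Euclidean distance, and the count-based sliding window are assumptions chosen for illustration; the abstract only states that outlier status depends on the number of neighbors in a metric space.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def outliers_in_window(window, radius, k):
    """Return the points that have fewer than k other points within radius.

    `radius` and `k` are hypothetical parameter names for this sketch.
    """
    result = []
    for i, p in enumerate(window):
        # Count every other point in the window that lies within `radius` of p.
        neighbors = sum(
            1 for j, q in enumerate(window) if i != j and dist(p, q) <= radius
        )
        if neighbors < k:
            result.append(p)
    return result


def stream_outliers(stream, window_size, radius, k):
    """Yield the current outliers after each arrival.

    Uses a count-based sliding window (an assumption): the window keeps
    the most recent `window_size` points and drops the oldest one.
    """
    window = deque(maxlen=window_size)
    for point in stream:
        window.append(point)
        yield outliers_in_window(list(window), radius, k)
```

Each arrival triggers a full pairwise recount here, so the cost per update is quadratic in the window size. That cost is the kind of overhead that continuous and parallel techniques aim to reduce.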
Original language: English
Title of host publication: Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018
Editors: Tina Eliassi-Rad, Wei Wang, Ciro Cattuto, Foster Provost, Rayid Ghani, Francesco Bonchi
Publisher: IEEE Computer Society
Pages: 227-236
Number of pages: 10
ISBN (Electronic): 9781538650905
ISBN (Print): 9781538650905
Publication status: Published - 2019

Publication series

Name: 2018 IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)
ISSN (Print): 2472-1573

Keywords

  • Anomaly detection
  • Flink
  • Streams
