Fast and explainable clustering based on sorting

Xinye Chen, Stefan Güttel

Research output: Contribution to journal › Article › peer-review

Abstract

We introduce a fast and explainable clustering method called CLASSIX. It consists of two phases: a greedy aggregation of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters: a distance parameter for the aggregation and a parameter controlling the minimal cluster size. We evaluate the clustering performance in extensive experiments on synthetic and real-world datasets with various cluster shapes and low to high feature dimensionality. These experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near-linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.
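
The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the data are sorted or how groups are merged, so the sketch assumes sorting by the projection onto the first principal direction, merging of groups whose starting points lie within `scale * radius` of each other, and reassignment of undersized clusters to the nearest sufficiently large one. The function names `aggregate_sorted` and `merge_groups` and the parameters `radius`, `min_size`, and `scale` are hypothetical.

```python
import numpy as np


def aggregate_sorted(X, radius):
    """Phase 1 (sketch): greedy aggregation of sorted points into groups."""
    X = np.asarray(X, dtype=float)
    # Assumption: sort by the projection onto the first principal direction.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    score = Xc @ vt[0]
    order = np.argsort(score)

    labels = np.full(len(X), -1)
    starts = []  # index of the starting point of each group
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed by an earlier group
        group = len(starts)
        starts.append(i)
        labels[i] = group
        for j in order[pos + 1:]:
            # The projection onto a unit vector never exceeds the Euclidean
            # distance, so the sweep can stop once the sort keys differ by
            # more than the radius.
            if score[j] - score[i] > radius:
                break
            if labels[j] == -1 and np.linalg.norm(X[j] - X[i]) <= radius:
                labels[j] = group
    return labels, np.array(starts)


def merge_groups(X, labels, starts, radius, min_size, scale=1.5):
    """Phase 2 (sketch): merge nearby groups, then absorb small clusters."""
    X = np.asarray(X, dtype=float)
    S = X[starts]
    n_groups = len(starts)
    parent = list(range(n_groups))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Assumption: two groups merge if their starting points are close.
    for a in range(n_groups):
        dist = np.linalg.norm(S[a + 1:] - S[a], axis=1)
        for b in np.nonzero(dist <= scale * radius)[0] + a + 1:
            parent[find(int(b))] = find(a)

    roots = np.array([find(g) for g in range(n_groups)])
    cluster_of_group = np.unique(roots, return_inverse=True)[1]
    sizes = np.bincount(cluster_of_group[labels])
    big = sizes >= min_size

    # Assumption: groups in undersized clusters join the cluster of the
    # nearest starting point that belongs to a large enough cluster.
    if big.any() and not big.all():
        big_groups = np.nonzero(big[cluster_of_group])[0]
        for g in np.nonzero(~big[cluster_of_group])[0]:
            dist = np.linalg.norm(S[big_groups] - S[g], axis=1)
            cluster_of_group[g] = cluster_of_group[big_groups[np.argmin(dist)]]

    clusters = cluster_of_group[labels]
    return np.unique(clusters, return_inverse=True)[1]  # relabel as 0..k-1
```

Calling `labels, starts = aggregate_sorted(X, radius)` followed by `merge_groups(X, labels, starts, radius, min_size)` returns one cluster label per data point. Because each point records which group absorbed it and each group records which cluster it joined, the assignment of any point can be traced back step by step, which is the kind of explanation the abstract alludes to.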
Original language: English
Journal: Pattern Recognition
Publication status: Accepted/In press - 31 Jan 2024
