Abstract
Synthetic datasets play an important role in evaluating clustering algorithms, as they can help shed light on consistent biases, strengths, and weaknesses of particular techniques, thereby supporting sound conclusions. Despite this, there is a surprisingly small set of established clustering benchmark data, and many of these datasets are handcrafted. Even then, their difficulty is typically not quantified or considered, limiting the ability to interpret algorithmic performance on these datasets. Here, we introduce HAWKS, a new data generator that uses an evolutionary algorithm to evolve the cluster structure of a synthetic dataset. We demonstrate how such an approach can be used to produce datasets of a pre-specified difficulty and to trade off different aspects of problem difficulty, and how these interventions directly translate into changes in the clustering performance of established algorithms.
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19) |
Publication status | Published - 13 Jul 2019 |
Event | The Genetic and Evolutionary Computation Conference - Prague, Czech Republic; Duration: 13 Jul 2019 → 17 Jul 2019 |
Conference
Conference | The Genetic and Evolutionary Computation Conference |
---|---|
Abbreviated title | GECCO 2019 |
Country/Territory | Czech Republic |
City | Prague |
Period | 13/07/19 → 17/07/19 |