Abstract
The multiobjective realisation of the data clustering
problem has shown great promise in recent years, yielding
clear conceptual advantages over the more conventional, single-objective
approach. Evolutionary algorithms have contributed substantially
to the development of this increasingly active research
area of multiobjective clustering. Nevertheless, the unprecedented
volumes of data seen widely today pose significant
challenges and highlight the need for more effective and scalable
tools for exploratory data analysis. This paper proposes an
improved version of the multiobjective clustering with automatic
k-determination algorithm. Our new algorithm improves on its predecessor
in several respects, but the key changes are related to the
use of an efficient, specialised initialisation routine and two alternative
reduced-length representations. These design components
exploit information from the minimum spanning tree and redefine
the problem in terms of the most relevant subset of its edges.
Our study reveals that both the new initialisation routine and the
new solution representations not only help to decrease the
computational overhead, but also entail a significant reduction of
the search space, thereby enhancing the convergence capabilities
and overall effectiveness of the method. These results suggest that
the new algorithm proposed here will offer significant advantages
in the realm of ‘big data’ analytics and applications.
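
As a rough illustration of what a reduced-length, MST-based representation can look like, the sketch below builds a minimum spanning tree, treats only a few candidate edges as decision variables, and decodes a bit string over those edges into cluster labels. This is not the paper's implementation: the abstract does not say how the "most relevant" edges are chosen, so ranking edges by length is an assumption made here for simplicity, and the names `minimum_spanning_tree`, `candidate_edges` and `decode` are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, u, v) tuples, one per MST edge.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known cost of attaching each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def candidate_edges(edges, m):
    """Indices of the m longest MST edges.

    ASSUMPTION: edge length stands in for 'relevance' in this sketch.
    """
    order = sorted(range(len(edges)), key=lambda i: edges[i][0], reverse=True)
    return order[:m]


def decode(n, edges, removed):
    """Map a set of removed MST edge indices to cluster labels 0, 1, ..."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path compression
            x = root[x]
        return x

    for idx, (_, u, v) in enumerate(edges):
        if idx not in removed:
            root[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Two well-separated groups of three points each (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = minimum_spanning_tree(points)
cands = candidate_edges(edges, 2)

# The genotype has one bit per candidate edge (length 2, not one gene per point).
genotype = [1, 0]  # 1 = remove this candidate edge, 0 = keep it
removed = {c for c, bit in zip(cands, genotype) if bit}
labels = decode(len(points), edges, removed)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy setting the genotype length equals the number of candidate edges rather than the number of data points, which is the sense in which the search space shrinks; all MST edges outside the candidate set are simply kept.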
Original language | English |
---|---|
Pages (from-to) | 515 - 535 |
Journal | IEEE Transactions on Evolutionary Computation |
Volume | 22 |
Issue number | 4 |
DOIs | |
Publication status | Published - 2018 |
Keywords
- Evolutionary computation
- Data analysis
- Clustering methods
- Data mining
- Pareto optimization