Fuzzy C-means++: Fuzzy C-means with effective seeding initialization

Adrian Stetco, Xiao-jun Zeng, John Keane

    Research output: Contribution to journal (Article, peer-reviewed)


    Fuzzy C-means has been used successfully in a wide range of applications, extending the clustering capability of K-means to datasets that are uncertain, vague or otherwise hard to cluster. This paper introduces the Fuzzy C-means++ algorithm, which improves the effectiveness and speed of Fuzzy C-means by adopting the seeding mechanism of the K-means++ algorithm. During the initialization phase, Fuzzy C-means++ samples the starting cluster representatives so that they are dispersed through the data space. Because these representatives are well spread in the input space, the method achieves both faster convergence and higher-quality solutions. R implementations of standard Fuzzy C-means and Fuzzy C-means++ are evaluated on various datasets. We investigate cluster quality and iteration count as the spreading factor varies on a series of synthetic datasets. We also run the algorithms on real-world datasets and, to account for their inherent non-determinism, record multiple runs for different values of the parameter k. The results show that the proposed method converges in significantly fewer iterations on synthetic datasets, up to 40 times fewer (2.1 times fewer on average) than the standard algorithm, and in general reaches a lower cost function value and a lower Xie–Beni value. A proof sketch of the logarithmically bounded expected cost function value is given.
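    The abstract's core idea, replacing random initialization of Fuzzy C-means with K-means++ style seeding, can be illustrated with a minimal sketch. This is not the authors' R code. It assumes the standard K-means++ rule (each new seed drawn with probability proportional to its squared distance to the nearest chosen seed) and does not model the paper's "spreading factor". It also assumes the baseline picks c random data points as initial centers; many FCM implementations instead randomize the membership matrix. All function names (`kmeanspp_seeds`, `fcm`, `xie_beni`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(data, c, rng):
    """K-means++ (D^2) seeding: first center uniform, each further center drawn
    with probability proportional to its squared distance to the nearest center."""
    centers = [list(rng.choice(data))]
    while len(centers) < c:
        d2 = [min(sq_dist(x, v) for v in centers) for x in data]
        total = sum(d2)
        if total == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(data)))
            continue
        r = rng.random() * total
        acc = 0.0
        for x, w in zip(data, d2):
            acc += w
            if acc > r:  # points with w == 0 can never be picked
                centers.append(list(x))
                break
        # If float rounding left no pick, the while loop simply draws again.
    return centers


def fcm(data, c, m=2.0, centers=None, max_iter=300, tol=1e-6, seed=0):
    """Alternating-optimisation Fuzzy C-means.
    Returns (centers, memberships U[j][i], iterations, objective J_m)."""
    rng = random.Random(seed)
    if centers is None:
        # Baseline init (an assumption): c data points chosen uniformly at random.
        centers = [list(x) for x in rng.sample(data, c)]
    n, dim = len(data), len(data[0])
    for it in range(1, max_iter + 1):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        U = []
        for x in data:
            d = [math.sqrt(sq_dist(x, v)) for v in centers]
            if min(d) == 0.0:  # point sits on a center: share membership among those
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                U.append([h / sum(hits) for h in hits])
            else:
                U.append([1.0 / sum((di / dk) ** (2 / (m - 1)) for dk in d) for di in d])
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        new_centers = []
        for i in range(c):
            w = [U[j][i] ** m for j in range(n)]
            sw = sum(w)
            if sw == 0.0:  # no membership mass: keep the old center
                new_centers.append(centers[i])
                continue
            new_centers.append([sum(w[j] * data[j][k] for j in range(n)) / sw
                                for k in range(dim)])
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once no center moves more than tol
            break
    # Objective J_m = sum_j sum_i u_ij^m ||x_j - v_i||^2 for the final (U, V)
    cost = sum(U[j][i] ** m * sq_dist(data[j], centers[i])
               for j in range(n) for i in range(c))
    return centers, U, it, cost


def xie_beni(data, centers, U, m=2.0):
    """Xie-Beni index: J_m over n times the minimum squared center separation.
    Undefined (division by zero) if two centers coincide."""
    c = len(centers)
    num = sum(U[j][i] ** m * sq_dist(data[j], centers[i])
              for j in range(len(data)) for i in range(c))
    sep = min(sq_dist(centers[a], centers[b])
              for a in range(c) for b in range(a + 1, c))
    return num / (len(data) * sep)


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative synthetic data: three 2-D Gaussian blobs.
    data = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
            for cx, cy in [(0, 0), (5, 5), (0, 8)] for _ in range(100)]
    for name, init in [("FCM", None), ("FCM++", kmeanspp_seeds(data, 3, rng))]:
        v, U, iters, J = fcm(data, 3, centers=init, seed=1)
        print(name, "iterations:", iters, "J_m:", round(J, 3),
              "XB:", round(xie_beni(data, v, U), 4))
```

    The only difference between the two runs is the `centers` argument passed to `fcm`, which mirrors the abstract's point that Fuzzy C-means++ changes only the initialization phase; the membership and center updates are the usual Fuzzy C-means iterations. Iteration counts from this toy example are not the paper's results.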
    Original language: English
    Pages (from-to): 7541-7548
    Number of pages: 7
    Journal: Expert Systems with Applications
    Issue number: 21
    Early online date: 22 May 2015
    Publication status: Published 30 Nov 2015


