Journal of Computations & Modelling

Development of a hybrid K-means-expectation maximization clustering algorithm

  • Pdf Icon [ Download ]
  • Times downloaded: 10711
  • Abstract

    Data clustering is a common technique for statistical data analysis in including machine learning, data mining, pattern recognition, image analysis and bioinformatics. The computational task of classifying the data set into k clusters is often referred to as k-clustering. K-Means and Expectation-Maximization algorithms are part of the commonly employed methods in clustering of data in relational databases. Results findings in some related work revealed that both algorithms have been found to be characterized with shortcomings. K-means was established not to guarantee convergence while Expectation-Maximization’s quick and premature convergence doesn’t assure optimality of results. As such, both algorithms are not efficient and effective enough; hence, there arises a need for a proposed algorithm that could both guarantee convergence and optimality of  results in discovering knowledge in very large database. A hybrid of K-means and Expectation-Maximization (KEM) was developed for this purpose. The proposed hybrid approach was evaluated with both K-means and Expectation-Maximization algorithms using time (rate of convergence), space complexities and the number of iterations. The computational results showed that the developed algorithm guarantees both optimality and convergence over K-Means and Expectation Maximization clustering algorithms.