Enhancing the Performance of the PSO Algorithm for Clustering High-Dimensional Data Using Autoencoders

Abstract

The emergence of big data has created new challenges in processing and analyzing large, complex datasets, largely because of their high dimensionality. Unsupervised learning techniques such as clustering have become powerful tools for identifying patterns and relationships in data without labeled examples. Two popular approaches to unsupervised clustering are K-means and Particle Swarm Optimization (PSO). Combining the two can produce better clustering results than either alone. K-means converges quickly to a local solution, while PSO's population-based global search is less prone to getting stuck in poor local optima. To automate clustering, the Elbow method is used to choose the number of clusters K for K-means and PSO. Clustering high-dimensional data is difficult because of the curse of dimensionality. As the number of dimensions grows (in the extreme, exceeding the number of data points), the data become sparse and distance-based similarity becomes less informative. A dimensionality reduction technique is therefore needed to improve clustering performance on high-dimensional data. We used an Autoencoder for dimensionality reduction together with K-means and PSO clustering, and compared clustering performance on the reduced and the original data.
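The abstract says the Elbow method supplies K, but the README does not say how the elbow is located. The sketch below is one plausible reading. It computes K-means inertia (within-cluster sum of squared distances) for K = 1..k_max. It then picks the K whose normalised point lies farthest from the straight line joining the first and last points of the curve. The function names, the k-means++ seeding, the restart count and k_max are all assumptions made for illustration.

```python
import numpy as np

def kmeans_inertia(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding (assumed); returns the lowest
    within-cluster sum of squared distances (inertia) over n_init restarts."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _run in range(n_init):
        # k-means++ seeding: each new centroid is sampled proportionally to D(x)^2.
        cents = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(cents)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            cents.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        cents = np.array(cents, dtype=float)
        for _step in range(n_iter):
            # Assign points to the nearest centroid, then move centroids to cluster means.
            labels = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else cents[j]
                            for j in range(k)])
            if np.allclose(new, cents):
                break
            cents = new
        inertia = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
        best = min(best, inertia)
    return best

def elbow_k(X, k_max=10, seed=0):
    """Hypothetical knee rule: the K farthest from the chord of the inertia curve."""
    ks = np.arange(1, k_max + 1)
    inertia = np.array([kmeans_inertia(X, k, seed=seed) for k in ks])
    # Normalise both axes to [0, 1] so the distance does not depend on scale.
    x = (ks - 1) / (k_max - 1)
    y = (inertia - inertia[-1]) / (inertia[0] - inertia[-1])
    # The chord runs from (0, 1) to (1, 0), i.e. x + y = 1.
    dist = np.abs(x + y - 1) / np.sqrt(2)
    return int(ks[dist.argmax()])
```

On three well-separated Gaussian blobs this should return 3. On real data whose inertia curve has no sharp bend, the knee rule can be unstable, so plotting the curve is still worthwhile.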

Objectives

Design and develop a PSO algorithm for automatic data clustering.
Design and develop an Autoencoder-based PSO algorithm for data clustering.
Compare the performance of PSO and Autoencoder-based PSO data clustering algorithms using different validity indices.
Apply these algorithms to stock market data and draw inferences.
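For the first objective, a common way to cluster with PSO is to let each particle encode a full set of K centroids. The swarm then minimises the sum of squared distances from points to their nearest centroid. The numpy sketch below assumes that encoding, a global-best topology, and the frequently used coefficients w = 0.72 and c1 = c2 = 1.49. `pso_cluster` and `sse` are hypothetical names. Passing K-means centroids as `init_centroids` seeds one particle with them. This is one way of hybridising K-means and PSO, not necessarily the author's.

```python
import numpy as np

def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid (fitness)."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def pso_cluster(X, k, n_particles=20, n_iter=100, w=0.72, c1=1.49, c2=1.49,
                init_centroids=None, seed=0):
    """Global-best PSO in which each particle is a (k, d) array of centroids.
    If init_centroids (e.g. from K-means) is given, it seeds particle 0."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Each particle starts from k distinct random data points.
    pos = np.array([X[rng.choice(n, size=k, replace=False)] for _ in range(n_particles)],
                   dtype=float)
    if init_centroids is not None:
        pos[0] = init_centroids
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(X, p) for p in pos])
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity: inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)  # keep centroids inside the data's bounding box
        f = np.array([sse(X, p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        if pbest_f.min() < gbest_f:
            g = pbest_f.argmin()
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    labels = ((X[:, None, :] - gbest[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return gbest, labels, gbest_f
```

A particle's personal best never gets worse, and the seeded particle starts at the K-means solution. The returned fitness is therefore never worse than that starting point: PSO can only refine it.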

Methodology

Results

| Dataset | K-Means PSO: DB Index | K-Means PSO: Silhouette Index | K-Means PSO with Autoencoders: DB Index | K-Means PSO with Autoencoders: Silhouette Index |
|---|---|---|---|---|
| High | 0.99316 | 0.044056 | 0.499879 | 0.598376 |
| Low | 0.98635 | 0.079333 | 0.492837 | 0.694484 |
| Close | 0.98474 | 0.046373 | 0.474543 | 0.634368 |
| Open | 0.93643 | 0.056383 | 0.547732 | 0.745483 |
| Volume | 0.99736 | 0.043367 | 0.498746 | 0.648464 |
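The table reports two internal validity indices, and the numpy sketch below spells out their usual definitions so the numbers are easier to interpret. The DB index averages, over clusters, the worst ratio of within-cluster scatter to between-centroid distance (lower is better). The Silhouette index compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster (range -1 to 1, higher is better). The README does not say which implementation produced the table. For real use, `sklearn.metrics.davies_bouldin_score` and `sklearn.metrics.silhouette_score` compute the same quantities. The function names here are illustrative.

```python
import numpy as np

def davies_bouldin(X, labels):
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # Scatter s_i: mean distance of cluster i's points to its own centroid.
    s = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean() for k, c in zip(ks, cents)])
    m = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(m, np.inf)  # excludes i == j from the max below
    r = (s[:, None] + s[None, :]) / m
    return r.max(axis=1).mean()

def silhouette(X, labels):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ks = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1
        if n_same == 0:
            continue  # singleton cluster: score 0 by convention
        a = d[i, same].sum() / n_same  # d[i, i] is 0, so the point itself adds nothing
        b = min(d[i, labels == k].mean() for k in ks if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

Consider two tight, well-separated pairs of points, [[0,0],[0,1],[10,0],[10,1]] with labels [0,0,1,1]. For these, the DB index is 0.1 and the Silhouette index is about 0.90.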

Conclusion

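Two clustering-quality metrics were used: the Davies-Bouldin (DB) index (lower is better) and the Silhouette index (higher is better, range -1 to 1). On both, K-means PSO with autoencoders outperformed K-means PSO without autoencoders on all five datasets. The DB index fell from 0.936 to 0.997 without autoencoders to 0.475 to 0.548 with them. The Silhouette index rose from 0.043 to 0.079 to 0.598 to 0.745.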
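The conclusion attributes the gain to clustering in an autoencoder's compressed representation instead of the original features. The README does not describe the network, so the sketch below is a deliberately small stand-in written in plain numpy. It uses one tanh hidden layer as the encoder, a linear decoder, mean-squared reconstruction loss and full-batch gradient descent. The layer size, activation, learning rate and the name `train_autoencoder` are assumptions. A practical implementation would more likely use a deep-learning framework.

```python
import numpy as np

def train_autoencoder(X, code_dim, epochs=2000, lr=0.05, seed=0):
    """Hypothetical one-hidden-layer autoencoder: X -> tanh(X W1 + b1) -> Z W2 + b2,
    trained on mean-squared reconstruction error. Returns (encode, losses)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xs = (X - mu) / sd  # standardise so one learning rate suits every feature
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0, 1 / np.sqrt(code_dim), (code_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(Xs @ W1 + b1)  # encoder: low-dimensional code
        Xr = Z @ W2 + b2           # decoder: reconstruction
        err = Xr - Xs
        losses.append((err ** 2).mean())
        # Backpropagate the mean-squared error through decoder and encoder.
        g_out = 2 * err / err.size
        gW2, gb2 = Z.T @ g_out, g_out.sum(axis=0)
        g_z = (g_out @ W2.T) * (1 - Z ** 2)  # tanh'(u) = 1 - tanh(u)^2
        gW1, gb1 = Xs.T @ g_z, g_z.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def encode(X_new):
        """Map raw rows to the learned code space using the training statistics."""
        return np.tanh(((X_new - mu) / sd) @ W1 + b1)

    return encode, losses
```

After training, `encode(X)` gives the low-dimensional codes. K-means or PSO clustering would then run on these codes in place of X.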

Repository: Priya-cse/PSO-and-KMeans-for-Clustering-High-dimensional-data-using-Autoencoders