
abhi40308/News-Documents-Clustering


News-Documents-Clustering

News documents clustering using latent semantic analysis (LSA). LSA and k-means are used to cluster news documents, and the results are visualized using UMAP (Uniform Manifold Approximation and Projection).
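The pipeline described above (tf-idf weighting, LSA, then k-means) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn (`TfidfVectorizer`, `TruncatedSVD`, `KMeans`), not the repository's actual code. The toy corpus, the two LSA components and the two clusters are hypothetical choices made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: two topics (finance, football) whose
# non-stop-word vocabularies do not overlap.
docs = [
    "stock markets fell as traders sold shares",
    "stock markets rallied as traders bought shares",
    "traders watched stock markets and shares closely",
    "the football team won the match",
    "the football team lost the match",
    "football fans cheered the team during the match",
]

# 1. tf-idf: one sparse row per document, one column per term.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# 2. LSA: truncated SVD of the tf-idf matrix maps each document
#    into a low-dimensional latent "topic" space (2 dimensions here).
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X)

# 3. k-means on the LSA vectors gives each document an integer cluster label.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_lsa)

print(labels)
```

On a real corpus, the LSA vectors would usually keep many more than two components. A 2D picture like the one in the screenshot could then be obtained by projecting them with umap-learn (`umap.UMAP(n_components=2).fit_transform(X_lsa)`) and drawing the points with `matplotlib.pyplot.scatter(x, y, c=labels)`, so that each k-means cluster gets its own color.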

The news documents are represented by the tf-idf weights of their important words and then clustered. Related documents are drawn in the same color, as shown in the screenshot at the end. The color comes from k-means: k-means is run on the data separately, and each document is given an integer label according to the cluster it falls into. The position of each document (one dot on the graph) comes from LSA. Because color and position are computed independently, documents of the same color landing close together serves as a visual check on the k-means results.
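To show how tf-idf rewards "important" words, here is a plain-Python sketch of the textbook weighting: tf is the raw count of a term in a document and idf is ln(N / df). This is an assumption for illustration only. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default, so its numbers differ. The tokenized toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy documents, already tokenized.
docs = [
    ["today", "stock", "markets", "fell", "stock"],
    ["today", "stock", "markets", "rallied"],
    ["today", "football", "team", "won"],
]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = ln(N / df), where N is the number of documents and
          df is the number of documents that contain the term
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


w = tf_idf(docs)
print(w[0])
```

In this example `"today"` appears in every document, so its weight is `0.0`, while `"fell"` appears in only one document and gets the largest single-occurrence weight (ln 3) in the first document. Words that are common everywhere therefore contribute little to the clustering.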

This code is part of a Medium blog post.
The post was also published on mc.ai.
Link to Google Colab

Results on 10,000 documents

(Screenshot: clustering result)
