This program implements the bag-of-words model (https://en.wikipedia.org/wiki/Bag-of-words_model) to study similarities between texts. In this particular example, the inputs are three texts containing the lyrics of three Mexican and Cuban songs, but the program can be modified to analyze any other documents.
In your terminal, run the command: python BagOfWordsM.py
Note that the source code must be located outside the folder "InputTexts", which holds the texts to analyze.
- clean_data -> cleans the input text data
- vectorization_frequencies -> represents each input text as a vector of word frequencies
- cos_similarities -> computes the cosine similarity between those vectors (one vector per input text)
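
The three steps above can be sketched as follows. This is a minimal illustration, not the code in BagOfWordsM.py: the function names come from the list above, but their signatures, the cleaning rules (lowercasing and dropping punctuation), and the sample sentences are assumptions made for the example.

```python
import math
import re
from collections import Counter


def clean_data(text):
    """Lowercase the text and split it into word tokens (assumed cleaning rules)."""
    # \w+ matches runs of Unicode letters, digits and underscores, so accented
    # characters such as "ñ" are kept while punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def vectorization_frequencies(token_lists):
    """Turn each token list into a frequency vector over a shared vocabulary."""
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def cos_similarities(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_u * norm_v)


# Hypothetical sample texts, standing in for the files in "InputTexts".
texts = ["the cat sat", "the cat ran", "dogs bark loudly"]
vocabulary, vectors = vectorization_frequencies([clean_data(t) for t in texts])
sim_01 = cos_similarities(vectors[0], vectors[1])  # two of three words shared
sim_02 = cos_similarities(vectors[0], vectors[2])  # no words shared
```

Texts that share more words get a similarity closer to 1, and texts with no words in common get 0. Word order is ignored, which is the defining simplification of the bag-of-words model.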
- "La Llorona" -> https://www.youtube.com/watch?v=5pqPFMVAIeM
- "La Ixhuateca" -> https://www.youtube.com/watch?v=VHRDLv5Y9Lg
- "La niña de Guatemala" -> https://www.youtube.com/watch?v=XAAP6bfNGK4