Bag-of-words-model.

Description

This program implements the bag-of-words model (https://en.wikipedia.org/wiki/Bag-of-words_model) to study similarities between texts. In this example, the inputs are three texts containing the lyrics of three Mexican and Cuban songs, but the program can be modified to analyze any set of documents.

Execution of the program

In your terminal, run the command: python BagOfWordsM.py

Note that the source file must be located outside the "InputTexts" folder, which contains the texts to analyze.
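The folder layout above can be illustrated with a small sketch of how a script might read the input texts. This is not taken from BagOfWordsM.py: the helper name load_texts, the .txt extension, and the UTF-8 encoding are all assumptions made for the example.

```python
from pathlib import Path


def load_texts(folder):
    """Return {file name: file contents} for every .txt file directly inside `folder`.

    Hypothetical helper, not part of the original program.
    """
    # Assumption: the inputs are UTF-8 encoded .txt files; adjust the pattern if not.
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

If the "InputTexts" folder sits in the directory the command is run from, a call such as load_texts("InputTexts") would return one entry per text file.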

Relevant functions:

clean_data -> cleans the input text data
vectorization_frequencies -> represents each input text as a vector of word frequencies
cos_similarities -> computes the cosine similarity between those vectors (one vector per input text)
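To make the roles of these three functions concrete, here is a minimal sketch of what each step could look like. Only the function names come from this README; the signatures, the tokenization rule, and the return values are assumptions, and the actual code in BagOfWordsM.py may differ.

```python
import math
import re
from collections import Counter


def clean_data(text):
    """Lowercase the text and split it into word tokens (assumed cleaning rule)."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words such as
    # "niña" stay intact while punctuation and digits are dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def vectorization_frequencies(token_lists):
    """Turn each token list into a vector of raw word counts over a shared vocabulary."""
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def cos_similarities(u, v):
    """Cosine similarity dot(u, v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In this sketch, each text would first pass through clean_data, the resulting token lists would go to vectorization_frequencies together so that all vectors share one vocabulary, and cos_similarities would then be called on each pair of vectors. Because the vectors hold non-negative counts, the similarity falls between 0 (no words in common) and 1 (identical word proportions).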

Songs:

"La Llorona" -> https://www.youtube.com/watch?v=5pqPFMVAIeM
"La Ixhuateca" -> https://www.youtube.com/watch?v=VHRDLv5Y9Lg
"La niña de Guatemala" -> https://www.youtube.com/watch?v=XAAP6bfNGK4

References:

https://en.wikipedia.org/wiki/Vector_space_model
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
https://en.wikipedia.org/wiki/Bag-of-words_model
https://en.wikipedia.org/wiki/Cosine_similarity
https://en.wikipedia.org/wiki/Information_retrieval
