A notebook that analyses an ebook, identifies the words required to understand 80% of it, scrapes the web for simple example sentences, and generates an Anki deck.

harabat/arabic_nlp

Arabic NLP

This is my approach to learning a language efficiently.

Method

My experience is that reading is the best way of learning a language.

After choosing a book to read (Artemis by Andy Weir) and finding both the English and Arabic versions of the ebook, I analysed the word frequency in each. This took considerable preparatory NLP work: stemming/lemmatising the words and removing stopwords, names, and the like. I then identified the smallest set of the most common words in the book that I needed to know in order to understand 80% of the text (beyond 80%, it's easy to infer a word's meaning from the context).
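The coverage step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `words_for_coverage` is hypothetical, and it assumes the text has already been reduced to a list of lemmatised tokens with stopwords and names removed.

```python
from collections import Counter


def words_for_coverage(tokens, target=0.8):
    """Return the most frequent words that together cover `target` of the tokens.

    Hypothetical helper: `tokens` is assumed to be a list of already
    lemmatised words with stopwords and names filtered out.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []

    selected = []
    covered = 0
    # Walk the words from most to least frequent, stopping once the
    # selected words account for the target share of all tokens.
    for word, freq in counts.most_common():
        if covered / total >= target:
            break
        selected.append(word)
        covered += freq
    return selected


# Toy example: "a" and "b" make up 9 of the 10 tokens.
sample = ["a"] * 6 + ["b"] * 3 + ["c"]
print(words_for_coverage(sample))
```

Because word frequencies in natural text are heavily skewed, the returned list is typically much shorter than the full vocabulary of the book.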

I then scraped Tatoeba and Reverso for example sentences (with translations) that contained the relevant words and generated an Anki deck. Instead of memorising words one by one, I learned them in context, in sentences, which is less boring and, according to some research, a more effective way of memorising.
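As a rough sketch of the deck-building step, the snippet below filters sentence pairs down to those containing a target word and writes them as a tab-separated file, a format Anki can import. It is an assumption-laden illustration, not the notebook's method: the helper names `cards_for_words` and `write_anki_tsv` are hypothetical, the sentence pairs are placeholder data rather than scraped results, and matching is done naively on whitespace-separated tokens (real Arabic text would need lemmatising first).

```python
import csv
import io


def cards_for_words(pairs, target_words):
    """Keep (sentence, translation) pairs whose sentence contains a target word.

    Naive assumption: words are matched on whitespace-separated tokens,
    with no stemming or lemmatisation.
    """
    targets = set(target_words)
    return [
        (sentence, translation)
        for sentence, translation in pairs
        if targets & set(sentence.split())
    ]


def write_anki_tsv(cards, handle):
    """Write one card per line as 'front<TAB>back' for import into Anki."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sentence, translation in cards:
        writer.writerow([sentence, translation])


# Placeholder data standing in for scraped sentence pairs.
pairs = [
    ("kitab jadid", "a new book"),
    ("bayt kabir", "a big house"),
]
buffer = io.StringIO()
write_anki_tsv(cards_for_words(pairs, ["kitab"]), buffer)
print(buffer.getvalue())
```

To produce a file on disk instead, the same function can be given a handle opened with `open("deck.tsv", "w", encoding="utf-8", newline="")`.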

This notebook is pretty specific to Arabic (it's difficult to make a generalised NLP workflow), but can be adapted to other languages relatively easily.