A Python Checker for Lexical Similarity in Identifier Names

nalmadi/Namesake

⚠️ Namesake

A Checker of Lexical Similarity in Identifier Names

Namesake is an open-source tool for assessing confusing naming combinations in Python programs. Namesake flags confusing identifier naming combinations that are similar in:

  • Orthography (word form)
  • Phonology (pronunciation)
  • or Semantics (meaning)

💡 What is Lexical Similarity in Code?

Lexical access describes the retrieval of word shape (orthography), pronunciation (phonology), and meaning (semantics) from memory during reading for comprehension.

Orthographic similarity concerns similarity in word form at the level of letters. It should not be confused with edit distance (such as Levenshtein distance), which counts letter insertions, deletions, and substitutions; orthographic similarity instead concerns how alike the shapes of the letters are. A good example is the confusion between 'O' and 'C', either as individual letters or within words and sentences. The same kind of confusion is common between identifiers in code.
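
As a rough illustration of the idea (not Namesake's actual algorithm), the sketch below folds a small, hand-picked set of look-alike characters into one representative and then compares the results. The `LOOKALIKES` table and the helpers `visual_skeleton` and `looks_alike` are hypothetical names introduced for this example, and the table covers only a few obvious character pairs.

```python
# Illustrative only: a tiny, hand-picked table of visually similar characters.
# Each character maps to one representative of its look-alike group.
LOOKALIKES = {
    "O": "0", "o": "0",   # letter O vs. digit zero
    "l": "1", "I": "1",   # lowercase L and uppercase i vs. digit one
}

def visual_skeleton(name: str) -> str:
    """Replace every look-alike character with its group representative."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name)

def looks_alike(a: str, b: str) -> bool:
    """True when two different names collapse to the same skeleton."""
    return a != b and visual_skeleton(a) == visual_skeleton(b)

print(looks_alike("total1", "totall"))  # True: the names differ only in '1' vs 'l'
print(looks_alike("count", "amount"))   # False: the names differ in more than shape
```

A real checker would need a graded similarity score rather than a yes/no answer, since a score is what a threshold can be applied to.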

Phonological similarity describes two words that share a similar or identical pronunciation; words with identical pronunciation are known as homophones.
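
A minimal sketch of how homophones can make two identifiers sound the same when read aloud. It assumes snake_case names and uses a tiny hand-written homophone table; `HOMOPHONES`, `sound_key`, and `sounds_alike` are hypothetical names for this example and do not reflect how Namesake models pronunciation.

```python
# Illustrative only: a hand-written homophone table, not a real phonetic model.
HOMOPHONES = {
    "write": "right",
    "rite": "right",
    "two": "to",
    "too": "to",
    "four": "for",
}

def sound_key(identifier: str) -> tuple:
    """Split a snake_case name into words and map each to one homophone spelling."""
    words = identifier.lower().split("_")
    return tuple(HOMOPHONES.get(word, word) for word in words)

def sounds_alike(a: str, b: str) -> bool:
    """True when two different names read aloud as the same word sequence."""
    return a != b and sound_key(a) == sound_key(b)

print(sounds_alike("write_index", "right_index"))  # True: 'write' and 'right' are homophones
print(sounds_alike("read_index", "right_index"))   # False
```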

Semantic similarity describes words that share a meaning (synonyms).
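
The sketch below illustrates the idea with a few hand-picked synonym groups. `SYNONYM_GROUPS` and `share_meaning` are hypothetical names for this example; how Namesake itself measures semantic similarity is not described here.

```python
# Illustrative only: a few hand-picked synonym groups.
SYNONYM_GROUPS = [
    {"start", "begin"},
    {"end", "finish", "stop"},
    {"remove", "delete"},
]

def share_meaning(word_a: str, word_b: str) -> bool:
    """True when two distinct words appear in the same synonym group."""
    if word_a == word_b:
        return False
    return any(word_a in group and word_b in group for group in SYNONYM_GROUPS)

print(share_meaning("remove", "delete"))  # True: same group
print(share_meaning("start", "stop"))     # False: different groups
```

Names such as `remove_item` and `delete_item` in the same scope are the kind of pair this check is meant to surface.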

⚙️ Installing Namesake:

First, install the requirements:

pip install -r requirements.txt

🚀 Running Namesake:

To run Namesake on the file test1.py (with optional similarity thresholds):

python namesake.py test1.py [orth_threshold] [phon_threshold] [sem_threshold]

Threshold values must be between 0 and 1.
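
For anyone wrapping this command in their own script, here is a minimal sketch of validating the three optional thresholds before invoking the tool. It assumes the bounds are inclusive and that an omitted threshold is simply left unset. `unit_interval` and `build_parser` are hypothetical helpers, not part of `namesake.py`.

```python
import argparse

def unit_interval(text: str) -> float:
    """Parse a threshold and reject values outside [0, 1] (assumed inclusive)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(
            f"threshold must be between 0 and 1, got {value}"
        )
    return value

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical wrapper CLI")
    parser.add_argument("source_file")
    # Optional positional thresholds; None means "not given on the command line".
    for name in ("orth_threshold", "phon_threshold", "sem_threshold"):
        parser.add_argument(name, nargs="?", type=unit_interval, default=None)
    return parser

# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["test1.py", "0.5", "0.4"])
print(args.orth_threshold, args.phon_threshold, args.sem_threshold)  # 0.5 0.4 None
```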

👀 Example Running Namesake:

📝 Citation:

Naser Al Madi. 2022. Namesake: A Checker of Lexical Similarity in Identifier Names. In Proceedings of The 37th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW 2022).

⚖️ License:

MIT (Free Software, Hell Yeah!)
