Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(actions): add pre-commit framework with codespell #191

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

jbampton
Copy link
Member

@jbampton jbampton commented Dec 1, 2023

Official -> "Git hook scripts are useful for identifying simple issues before submission to code review. We run our hooks on every commit to automatically point out issues in code such as missing semicolons, trailing whitespace, and debug statements. By pointing these issues out before code review, this allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks."

https://pre-commit.com/

Using a pre-commit framework speeds up development as a lot of tests can be run on the local machine giving instant feedback. So we don't have to wait for the CI / GitHub actions to run to get feedback. The pre-commit automatically fixes some of the issues when you do git commit and if there are any issues the tests are marked as red failed. Then you will need to commit again so that all the tests pass green.

When pre-commit runs with GitHub Actions on the GitHub website the hooks/tests either pass or fail.

There are many more pre-commit checks listed here -> https://pre-commit.com/hooks.html

Lets get this PR merged and then I will look at adding more pre-commit tests 馃憤

This PR adds codespell to our pre-commit hooks.

The words in codespell.txt are ignored and this file has basically been created by running:

codespell . | cut -f2 -d' ' | tr A-Z a-z | sort | uniq > codespell.txt

from the repo root.

https://github.com/codespell-project/codespell

codespell is one of the leading spell checkers on GitHub.

Going forwards we will need to fix a lot of the misspelled words that are in codespell.txt

Official -> "Git hook scripts are useful for identifying simple issues before submission to code review. We run our hooks on every commit to automatically point out issues in code such as missing semicolons, trailing whitespace, and debug statements. By pointing these issues out before code review, this allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks."

https://pre-commit.com/

Using a `pre-commit` framework speeds up development as a lot of tests can be run on the local machine giving instant feedback. So we don't have to wait for the CI / GitHub actions to run to get feedback. The pre-commit automatically fixes some of the issues when you do git commit and if there are any issues the tests are marked as red failed. Then you will need to commit again so that all the tests pass green.

When pre-commit runs with GitHub Actions on the GitHub website the hooks/tests either pass or fail.

There are many more pre-commit checks listed here -> https://pre-commit.com/hooks.html

Lets get this PR merged and then I will look at adding more pre-commit tests 馃憤

This PR adds `codespell` to our pre-commit hooks.

The words in `codespell.txt` are ignored and this file has basically been created by running:

`codespell . | cut -f2 -d' ' | tr A-Z a-z | sort | uniq > codespell.txt`

from the repo root.

https://github.com/codespell-project/codespell

`codespell` is one of the leading spell checkers on GitHub.

Going forwards we will need to fix a lot of the misspelled words that are in `codespell.txt`
Comment on lines +40 to +45
- name: Set PY
run: echo "PY=$(python -VV | sha256sum | cut -d' ' -f1)" >> $GITHUB_ENV
- uses: actions/cache@v3
with:
path: ~/.cache/pre-commit
key: pre-commit|${{ env.PY }}|${{ hashFiles('.pre-commit-config.yaml') }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@dave2wave dave2wave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a huge number of misspellings

About the linter is it only checking comments?

@jbampton
Copy link
Member Author

jbampton commented Dec 2, 2023

https://github.com/codespell-project/codespell
https://pypi.org/project/codespell/

From the official repo:

"Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code (backslash escapes are skipped), but it can be used with other files as well."

So it checks a lot more than just comments.
Some of the misspelled words maybe just code terms that we need to ignore.

@jbampton
Copy link
Member Author

jbampton commented Dec 2, 2023

That's a huge number of misspellings

About the linter is it only checking comments?

If needed you can also exclude files / folders from being spell checked.

Apache Airflow is using codespell with pre-commit as seen at the next link:

https://github.com/apache/airflow/blob/41f4766d5b4873ddbf8daa94d837398342aeaf98/.pre-commit-config.yaml#L274

And they have about 1,800 lines in their ignored words list seen here:

https://github.com/apache/airflow/blob/main/docs/spelling_wordlist.txt

@Pilot-Pirx
Copy link
Member

/extras would definitely need to be excluded.
It contains the translations for all languages. so I would expect a lot of "false positives".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants