Inflections and the Dale-Chall-Formula #150
Comments
Hi @LKirst, thank you for raising this! There is already an open issue (#73) touching on difficult-word usage. We currently have 4 methods/metrics that use
Maybe this area could do with a revisit, and it doesn't make sense to use the same
I believe this is a problem that stemming solves.
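A minimal sketch of the stemming idea follows. It assumes a plain Python set of easy words (`easy_words` is a hypothetical stand-in for the Dale-Chall list that textstat ships), and the helper names are hypothetical, not part of textstat's API. To stay dependency-free, it uses a small hand-written suffix stripper in place of an NLTK stemmer. The suffix list covers only a few regular inflections and is not the full Dale & Chall rule set, so it can produce false positives. For example, it may treat a non-inflected word that ends in -ing as inflected.

```python
from __future__ import annotations

# Hypothetical helpers; not part of textstat's API.
# Simplified subset of regular English inflections (possessive, plural,
# past tense, progressive, comparative/superlative). NOT the full
# Dale & Chall (1948) rule list.
SUFFIXES = ("'s", "s'", "es", "s", "ed", "ing", "er", "est")


def candidate_bases(word: str) -> set[str]:
    """Return the word itself plus plausible base forms after stripping one suffix."""
    w = word.lower().strip('.,;:!?"')
    candidates = {w}
    for suffix in SUFFIXES:
        # Require a stem of at least two letters so short words are left alone.
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            stem = w[: -len(suffix)]
            candidates.add(stem)
            candidates.add(stem + "e")           # moved -> move
            if stem.endswith("i"):
                candidates.add(stem[:-1] + "y")  # happier -> happy
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                candidates.add(stem[:-1])        # bigger -> big
    return candidates


def is_familiar(word: str, easy_words: set[str]) -> bool:
    """True if the word or one of its candidate base forms is an easy word."""
    return not candidate_bases(word).isdisjoint(easy_words)


def filter_difficult(words: list[str], easy_words: set[str]) -> list[str]:
    """Drop words that look like regular inflections of easy words."""
    return [w for w in words if not is_familiar(w, easy_words)]


if __name__ == "__main__":
    easy = {"return", "giant", "strong"}
    print(filter_difficult(["Scotland", "returned", "giants", "giant's", "strongest"], easy))
    # ['Scotland']
```

Applied as a post-filter on a list of words that were flagged as difficult, this keeps proper nouns such as Scotland. It drops only words that reduce to an easy base form.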
Great idea. Could we separate regular inflection from irregular word formation using an NLTK stemmer?
Totally, I'll start a PR. I'll look into what NLTK supports. I can imagine providing options for the kinds of inflections accepted.
I found a good discussion of a similar idea implemented in JavaScript:
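A hypothetical sketch of both ideas follows: separating regular inflections from irregular forms, and making the accepted inflections configurable. The rule-group names (`plural`, `past`, ...) and the tiny `IRREGULAR` table are illustrative assumptions, not an existing textstat option. Suffix stripping can only recover regular forms, so irregular ones need a lookup. In practice, a lemmatizer such as NLTK's `WordNetLemmatizer` could stand in for the hand-written table.

```python
from __future__ import annotations

# Hypothetical rule groups a user could switch on or off; names are illustrative.
RULES: dict[str, tuple[str, ...]] = {
    "plural": ("es", "s"),
    "possessive": ("'s", "s'"),
    "past": ("ed",),
    "progressive": ("ing",),
    "comparative": ("er", "est"),
}

# Hypothetical, deliberately tiny table of irregular forms -> base word.
IRREGULAR: dict[str, str] = {"children": "child", "went": "go", "better": "good", "men": "man"}


def classify(word: str, easy_words: set[str], accepted: frozenset[str] = frozenset(RULES)) -> str:
    """Return 'base', 'irregular', 'regular', or 'difficult' for a single word."""
    w = word.lower()
    if w in easy_words:
        return "base"
    # Irregular forms cannot be recovered by suffix stripping, so look them up.
    if IRREGULAR.get(w) in easy_words:
        return "irregular"
    # Regular inflections: strip one suffix from an accepted rule group.
    for group in accepted:
        for suffix in RULES[group]:
            if w.endswith(suffix) and len(w) > len(suffix) + 1:
                stem = w[: -len(suffix)]
                # Also try restoring a dropped final "e" (moved -> move).
                if stem in easy_words or stem + "e" in easy_words:
                    return "regular"
    return "difficult"
```

Here is an example of restricting the accepted inflections. With `accepted=frozenset({"plural", "possessive"})`, the function accepts giants and giant's as regular inflections of giant, but it still flags strongest as difficult.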
The textstat implementation of the Dale-Chall formula classifies several words as difficult that the original Dale-Chall formula would not. For example, Scotland, returned, giants, giant's and strongest are returned as part of `textstat.difficult_words_list(text)`, even though the base forms return, giant and strong are all on the easy words list.

Dale and Chall (1948, pp. 38-49) suggest that the following word forms should be considered familiar:
The complete list of rules can be found in Dale & Chall (1948).
I understand that most of these rules are not easy to implement in the textstat package. Still, could the README point out the deviation from the original Dale & Chall formula? That would avoid confusion and might prompt users to check the list returned by `textstat.difficult_words_list(text)`.

Source: Dale, E., & Chall, J. (1948). A Formula for Predicting Readability: Instructions. Educational Research Bulletin, 27(2), 37-54. Retrieved August 11, 2021, from http://www.jstor.org/stable/1473669
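The sketch below shows why these misclassifications can matter. It uses the commonly cited form of the Dale-Chall score: raw score = 0.1579 × (percentage of difficult words) + 0.0496 × (average sentence length), with 3.6365 added when more than 5% of words are difficult. The constants and the 5% rule are assumed from that commonly cited form and should be checked against Dale & Chall (1948). The word and sentence counts are made-up illustrative numbers, not measurements from any real text.

```python
def dale_chall_score(difficult_words: int, total_words: int, sentences: int) -> float:
    """Commonly cited Dale-Chall score; constants assumed, verify against the 1948 paper."""
    pdw = 100.0 * difficult_words / total_words  # percentage of difficult words
    asl = total_words / sentences                # average sentence length in words
    score = 0.1579 * pdw + 0.0496 * asl
    if pdw > 5.0:
        score += 3.6365                          # adjustment applied above 5% difficult words
    return score


if __name__ == "__main__":
    # Hypothetical 100-word, 5-sentence passage with 4 genuinely difficult words.
    print(round(dale_chall_score(4, 100, 5), 4))  # 1.6236
    # Same passage if 3 inflected easy words (e.g. returned, giants, strongest) are also counted.
    print(round(dale_chall_score(7, 100, 5), 4))  # 5.7338
```

In this made-up case, three misclassified inflections push the difficult-word percentage past 5%. That raises the score by about 4.1 points, and most of the increase comes from the threshold adjustment rather than the per-word weight.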