Inflections and the Dale-Chall-Formula #150

LKirst · 2021-08-11T11:28:05Z

The textstat implementation of the Dale-Chall formula classifies several words as difficult that the original Dale-Chall formula would treat as familiar. For example, Scotland, returned, giants, giant's, and strongest are all returned by textstat.difficult_words_list(text), even though Scotland is a place name and the base forms return, giant, and strong are all on the easy words list.

Dale and Chall (1948, pp. 38-49) suggest that the following word forms should be considered familiar:

names of persons and places
regular plurals and possessives of words on the list
the third-person, singular forms (s or ies from y), present-participle forms (ing), past-participle forms (n), and past-tense forms (ed or ied from y), when these are added to verbs appearing on the list
comparatives and superlatives of adjectives appearing on the list
adverbs formed by adding ly to a word on the list

The complete list of rules can be found in Dale & Chall (1948).
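A rough sketch of how the regular-inflection rules above might be checked against an easy word list. This is not how textstat works today; the names EASY_WORDS, SUFFIX_RULES, candidate_bases and is_familiar are made up for illustration, and the word set is a tiny stand-in for the real list. It ignores consonant doubling (running), dropped e (making), and names of persons and places, which would need a separate mechanism.

```python
# Tiny stand-in for the Dale-Chall easy word list (assumption, not the real list).
EASY_WORDS = {"return", "giant", "strong", "carry", "quick", "know"}

# (suffix, replacement) pairs covering the regular forms listed above.
SUFFIX_RULES = [
    ("'s", ""),    # possessive: giant's -> giant
    ("ies", "y"),  # plural / third person: carries -> carry
    ("ied", "y"),  # past tense: carried -> carry
    ("es", ""),    # plural / third person: boxes -> box
    ("s", ""),     # plural / third person: giants -> giant
    ("ing", ""),   # present participle: returning -> return
    ("ed", ""),    # past tense: returned -> return
    ("n", ""),     # past participle: known -> know
    ("est", ""),   # superlative: strongest -> strong
    ("er", ""),    # comparative: stronger -> strong
    ("ly", ""),    # adverb: quickly -> quick
]


def candidate_bases(word):
    """Return the word itself plus every base form one suffix rule can produce."""
    word = word.lower()
    candidates = {word}
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.add(word[: -len(suffix)] + replacement)
    return candidates


def is_familiar(word, easy_words=EASY_WORDS):
    """A word counts as familiar if any candidate base form is on the list."""
    return any(base in easy_words for base in candidate_bases(word))


for w in ["returned", "giants", "giant's", "strongest", "Scotland"]:
    print(w, is_familiar(w))
```

Because every rule is tried independently, this check is deliberately loose: it can accept a word that merely happens to end in one of the suffixes. Scotland still comes out as unfamiliar here, since proper names are not handled.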

I understand that most of these rules are not easy to implement in the textstat package. But to avoid confusion, and perhaps to prompt users to check the list returned by textstat.difficult_words_list(text), could the README point out the deviation from the original Dale & Chall formula?

Source: Dale, E., & Chall, J. (1948). A Formula for Predicting Readability: Instructions. Educational Research Bulletin, 27(2), 37-54. Retrieved August 11, 2021, from http://www.jstor.org/stable/1473669


alxwrd · 2021-08-11T15:51:14Z

Hi @LKirst, thank you for raising this!

We have an open issue (#73) touching on difficult word usage. Four methods/metrics currently use difficult_words:

dale_chall_readability_score
gunning_fog
spache_readability
dale_chall_readability_score_v2

Maybe this area could do with a revisit; it may not make sense to use the same difficult_words method for everything.

dogweather · 2022-07-19T04:48:36Z

I believe this is a problem that stemming solves. E.g.:

The Dale and Chall wordlist is converted to a set of the stems of the words.
An input text's words are each mapped to their stem.
Each word is then judged to be simple if its stem is in the Dale and Chall stem list (as opposed to the word itself being present in the Dale and Chall word list).
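The three steps above could be sketched roughly like this. The stemmer here, naive_stem, is a hypothetical toy that strips a single suffix so that the example runs without dependencies; it is not a Porter stemmer. The helper names build_stem_set and difficult_words_by_stem are also invented for illustration. In practice a real stemmer, for example the stem method of NLTK's PorterStemmer, could presumably be passed in as the stem argument instead.

```python
def naive_stem(word):
    """Toy stemmer (assumption): strip one common suffix, keep at least 3 letters."""
    word = word.lower()
    for suffix in ("'s", "ing", "est", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_set(word_list, stem=naive_stem):
    """Step 1: convert the easy word list to a set of stems."""
    return {stem(word) for word in word_list}


def difficult_words_by_stem(words, easy_stems, stem=naive_stem):
    """Steps 2 and 3: stem each input word and keep those whose stem is unknown."""
    return [word for word in words if stem(word) not in easy_stems]


# Stand-in for the Dale and Chall list (assumption: only three entries).
easy_stems = build_stem_set(["return", "giant", "strong"])

words = ["returned", "giants", "giant's", "strongest", "castle"]
print(difficult_words_by_stem(words, easy_stems))
```

Applying the same stemmer to both the list and the input keeps the comparison consistent, but a stemmer does not distinguish regular inflection from other word formation, so it may accept more forms than Dale and Chall's rules allow.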

LKirst · 2022-07-19T11:41:29Z

Great idea. Could we separate regular inflection from irregular word formation using an NLTK stemmer?
Could you implement your solution?

dogweather · 2022-07-19T23:15:29Z

> Great idea. Could we separate regular inflection from irregular word formation using an NLTK stemmer?
> Could you implement your solution?

Totally — I'll start a PR. I'll look into what NLTK supports. I can imagine providing options for the kinds of inflections accepted.

dogweather · 2022-07-19T23:34:12Z

I found a good discussion of a similar idea implemented in JavaScript:

alxwrd added bug priority: medium labels Aug 11, 2021

LKirst mentioned this issue Aug 19, 2021

Include regular inflections in the easy words set #168

Closed

nonprofittechy mentioned this issue Apr 4, 2024

Find a version of the Dale-Chall vocabulary list that is able to understand inflections SuffolkLITLab/RateMyPDF#31

Open
