Support for others languages #94

lmaczulajtys · 2019-06-24T08:41:56Z

The FOG index is also applicable to Polish. The main difference is that in Polish, difficult words are usually those with four or more syllables.

I suggest adding a lang parameter to the gunning_fog function. It would be passed to syllable_count and also used to select the value of syllable_threshold.

Source:
https://pl.wikipedia.org/wiki/Indeks_czytelno%C5%9Bci_FOG (pl)
https://translate.google.com/translate?sl=pl&tl=en&u=https%3A%2F%2Fpl.wikipedia.org%2Fwiki%2FIndeks_czytelno%25C5%259Bci_FOG (en)
Unfortunately, all sources on this are in Polish.
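A minimal sketch of the proposal, assuming the standard Gunning FOG formula 0.4 × (words per sentence + 100 × complex words / words), where a word is "complex" once it reaches the language's syllable threshold (3 for English, 4 for FOG-PL). The vowel-group syllable counter is a crude stand-in for textstat's real, Pyphen-based syllable_count, and the names `SYLLABLE_THRESHOLDS` and `count_syllables` are hypothetical, not textstat's API.

```python
import re

# Hypothetical per-language thresholds: a word is "complex" when it has at
# least this many syllables (3 for English, 4 for the Polish FOG-PL variant).
SYLLABLE_THRESHOLDS = {"en_US": 3, "pl_PL": 4}

# Crude stand-in for a hyphenation-based counter: each run of vowel letters
# counts as one syllable. Polish vowel letters are added for pl_PL.
VOWELS = {"en_US": "aeiouy", "pl_PL": "aeiouyąęó"}


def count_syllables(word: str, lang: str = "en_US") -> int:
    vowels = VOWELS.get(lang, VOWELS["en_US"])
    groups = re.findall(f"[{vowels}]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str, lang: str = "en_US") -> float:
    # lang selects the syllable threshold and is passed on to the counter.
    threshold = SYLLABLE_THRESHOLDS.get(lang, 3)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w, lang) >= threshold]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

With this sketch, the three-syllable word "Kolega" in "Kolega czyta." counts as complex under the English threshold but not under FOG-PL, so the two scores differ sharply for the same text.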


alxwrd · 2019-06-25T15:55:43Z

I've done a bit of research and it appears there are language variants to the formulas for a few languages.

Because of that, I think that

syllable_threshold = 4 if lang == 'pl_PL' else 3

might not be a long-term solution.

I will have a think about how textstat could handle other languages going forward. Something like:

import textstat
textstat.lang = "pl_PL"

All the current 'hardcoded' values for the formulas would need to be extracted and kept in a dict, to which new languages and their values could be added later.

langs = {
    "en_US": {
        "syllable_threshold": 3,
        # ... other formula constants
    },
    "pl_PL": {
        "syllable_threshold": 4,
        # ... other formula constants
    },
}
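A sketch of how formula code could read constants from such a dict, assuming English ("en_US") is the default and that a bare language root such as "pl" should match "pl_PL". `LANGS`, `DEFAULT_LANG` and `get_lang_cfg` are hypothetical names for illustration, not textstat's actual implementation.

```python
# Per-language formula constants; only the syllable threshold is shown.
LANGS = {
    "en_US": {"syllable_threshold": 3},
    "pl_PL": {"syllable_threshold": 4},
}
DEFAULT_LANG = "en_US"


def get_lang_cfg(lang: str, key: str):
    """Look up a constant: exact locale, then same language root, then default."""
    root = lang.split("_")[0]
    candidates = [lang]
    candidates += [code for code in LANGS if code.split("_")[0] == root]
    candidates.append(DEFAULT_LANG)
    for code in candidates:
        if key in LANGS.get(code, {}):
            return LANGS[code][key]
    raise KeyError(f"no value for {key!r} in {lang!r} or {DEFAULT_LANG!r}")
```

Falling back to the default means a newly added language only needs the constants that actually differ from English; everything else is inherited.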

alxwrd · 2019-06-25T16:11:46Z

Based on #93, the current language would also need to be passed to Pyphen.

lmaczulajtys · 2019-06-26T11:41:17Z

Because method results are cached by repoze.lru, I think we should do something like this:

import textstat
textstat.set_lang("en_US")

set_lang() should clear the caches.
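A minimal sketch of why set_lang() has to clear caches. It uses the standard library's functools.lru_cache in place of repoze.lru (swapped in so the example has no dependencies), and a crude vowel-group counter stands in for a per-language Pyphen dictionary. `_current_lang`, `syllable_count` and `set_lang` here are hypothetical illustrations.

```python
import re
from functools import lru_cache

_current_lang = "en_US"  # hypothetical module-level language setting


def _vowels_for(lang: str) -> str:
    # Crude stand-in for choosing a Pyphen dictionary by language.
    return "aeiouyąęó" if lang.startswith("pl") else "aeiouy"


@lru_cache(maxsize=512)
def syllable_count(word: str) -> int:
    # _current_lang is read here but is NOT part of the cache key, so results
    # cached under one language would be reused after switching languages.
    groups = re.findall(f"[{_vowels_for(_current_lang)}]+", word.lower())
    return max(1, len(groups))


def set_lang(lang: str) -> None:
    """Switch the active language and drop results cached for the old one."""
    global _current_lang
    _current_lang = lang
    syllable_count.cache_clear()
```

Without the cache_clear() call, looking up "ósemka" after switching to Polish would return the stale English count of 2 instead of 3. An alternative design is to pass the language as an argument to the cached function so that it becomes part of the cache key.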

lmaczulajtys · 2019-06-29T19:16:57Z

A useful reference for flesch_reading_ease variants in other languages: Yoast/YoastSEO.js#267
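A sketch of how several Flesch Reading Ease variants could share one formula, score = base − a × (words per sentence) − b × (syllables per word), with only the coefficients changing per language. The English values are Flesch's originals; the German (Amstad) and French (Kandel & Moles) values are the commonly cited ones and should be checked against the Yoast discussion before use. `FRE_COEFFS` and this function signature are hypothetical.

```python
# Hypothetical table: (base, weight on words per sentence, weight on syllables per word).
FRE_COEFFS = {
    "en": (206.835, 1.015, 84.6),  # Flesch's original English formula
    "de": (180.0, 1.0, 58.5),      # Amstad's German variant, as commonly cited
    "fr": (207.0, 1.015, 73.6),    # Kandel & Moles' French variant, as commonly cited
}


def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int,
                        lang: str = "en") -> float:
    # Unknown languages fall back to the English coefficients.
    base, w_sentence, w_syllable = FRE_COEFFS.get(lang, FRE_COEFFS["en"])
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return base - w_sentence * words_per_sentence - w_syllable * syllables_per_word
```

Not every variant fits this shape: some (for example Fernández Huerta's Spanish formula, as usually stated) use sentences per 100 words rather than words per sentence, so they would need their own term rather than just new coefficients.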

GuillemGSubies · 2019-07-16T06:25:28Z

Any updates on #97? I would really appreciate it if it got merged.

GuillemGSubies · 2019-08-22T09:02:05Z

I'm interested in adding this list of word frequencies (easy words) for Spanish (it comes from the Spanish Language Academy). However, I don't know how many of them I should add. From what I have seen, the English easy-word list used here has roughly 3k words.

Any thoughts?
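One way to keep the size question open is to make the cut-off a parameter: take the top N entries of a frequency-ranked list (such as the RAE list linked later in this thread) as the easy words. A minimal sketch, assuming a simplified rule where a word is "difficult" whenever it is not in the easy list; textstat's own rule may add further conditions, and both function names are hypothetical.

```python
import re


def build_easy_words(ranked_words, top_n):
    """Return the top_n distinct words from a list ordered by frequency."""
    easy = set()
    for word in ranked_words:
        if len(easy) >= top_n:
            break
        easy.add(word.lower())
    return easy


def difficult_words(text, easy_words):
    """Distinct words in text that are not in easy_words (simplified rule)."""
    words = re.findall(r"\w+", text.lower())
    return {w for w in words if w not in easy_words}
```

Making top_n a parameter would let the Spanish list be trimmed to roughly the size of the English one (about 3k words) or tuned later without changing any code.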

alxwrd · 2020-01-04T23:06:13Z

Hi @GuillemGSubies, sorry I forgot to respond to this!

I'm happy for Spanish words to be added for Spanish language support. I'm not sure how many should be included, though, as I don't know the original source of the English word list used in textstat. @shivam5992, do you remember?

I'm not sure whether any of the papers that introduce the formulas using "easy" or "difficult" words cite a source for the easy words.

GuillemGSubies · 2020-01-08T10:14:57Z

@alxwrd I created a PR to discuss my implementation #120. Should I add the source of the easy_words file? If so, how?

alxwrd · 2020-01-08T12:02:01Z

@GuillemGSubies I think if you just add the source here, for now, that would be good. I'm thinking over how to manage multiple languages going forward, including testing.

GuillemGSubies · 2020-01-08T12:09:09Z

http://corpus.rae.es/lfrecuencias.html (it is from the Spanish Language Academy)

alxwrd · 2021-08-20T22:16:57Z

Just to tie this in: with #167 (Announcement: Textstat organisation), support for other languages should get a bit better.

lmaczulajtys mentioned this issue Jun 24, 2019

FOG-PL variant for Gunning FOG #95

Merged

alxwrd changed the title ~~Gunning FOG support for Polish language~~ Support for others languages Jun 25, 2019

alxwrd added the pull request welcome label Jun 25, 2019

alxwrd mentioned this issue Jun 25, 2019

Allow to change language in syllable_count to languages supported by Pyphen #93

Closed

lmaczulajtys mentioned this issue Jun 27, 2019

Refactor for multilanguage support #96

Merged

lmaczulajtys mentioned this issue Jul 4, 2019

Flesh Reading Ease variants for other languages #97

Merged

alxwrd added the enhancement, priority: high and lang support labels and removed the pr welcome label Nov 17, 2020

alxwrd mentioned this issue Dec 8, 2020

Can we use this algorithm on other language. #136

Closed
