Update Sample Usage document: stop words must be lowercase #234

Open
0dB opened this issue Aug 6, 2023 · 3 comments

@0dB
Contributor

0dB commented Aug 6, 2023

In Sample Document (https://derwen.ai/docs/ptr/sample/) I propose to update:

For each entry, you'll need to add a key that is the lemma and a value that's a list of its part-of-speech tags.

to

For each entry, you'll need to add a key that is the lemma (all lower-case) and a value that's a list of its part-of-speech tags.

@0dB 0dB changed the title from "Update" to "Update Sample Usage document: stop words must be lowercase" Aug 6, 2023
@Ankush-Chander
Contributor

Ankush-Chander commented Aug 6, 2023

Hi @0dB,
The existing documentation is exactly as it's supposed to be.
The lemma of a token is not necessarily lower-case. For example, proper nouns like London have a lemma_ of London, not london. So the suggested change would not accurately represent what the stopwords field expects.

If a user also wants to omit London as a stopword, the code will look like:

nlp.add_pipe("textrank", config={ "stopwords": { "word": ["NOUN"], "London": ["PROPN"] }}) 

@0dB
Contributor Author

0dB commented Aug 6, 2023

OK, I understand. In my case it was the token "HGB" (the acronym for a set of German laws for the B2B sector) that I had to lowercase to scrub it, so I assumed this held for all tokens. I did trip over that, though 😊 Is it worth mentioning to others? You could point out what you wrote, no?
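
Since this thread reports both behaviours ("HGB" had to be lower-cased, while the lemma of London is said to keep its capital), one defensive option is to register each stop word under both spellings. The following is only a minimal sketch in plain Python: `build_stopwords` is a hypothetical helper, not part of pytextrank, and the `PROPN` tag for "HGB" is an assumption made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def build_stopwords(
    entries: Iterable[Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Hypothetical helper (not a pytextrank API): map each lemma, in both
    its original and lower-cased spelling, to its part-of-speech tags."""
    stopwords: Dict[str, List[str]] = {}
    for lemma, pos_tags in entries:
        # Register the lemma as given and its lower-cased form.
        for key in (lemma, lemma.lower()):
            merged = stopwords.setdefault(key, [])
            for tag in pos_tags:
                if tag not in merged:  # avoid duplicate tags
                    merged.append(tag)
    return stopwords


stopwords = build_stopwords([
    ("word", ["NOUN"]),
    ("London", ["PROPN"]),
    ("HGB", ["PROPN"]),  # POS tag assumed for illustration
])
print(stopwords)
```

The resulting dict could then be passed as the "stopwords" value in the config shown in the nlp.add_pipe("textrank", ...) call above; whether both spellings are actually needed depends on how the installed version matches lemmas.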

@ceteri
Collaborator

ceteri commented Aug 7, 2023

Definitely, it's great to mention these points for others.

That examples/sample.ipynb notebook is sort of the "backbone" for our MkDocs, and it could have more cases illustrated.

@ceteri ceteri self-assigned this Aug 7, 2023