Is it possible to remove punctuations but exclude cases like "drive-thru"? #19
Comments
Right now, this is not possible, but it seems to me like a feature this package should provide. I will look into it, but this may take a while.
You are mainly interested in keeping hyphens in compound words, right? So other punctuation such as "." or "," should get removed.
Yes, that's correct. Other punctuation such as "." or "," should get removed.
I had the same kind of scenario. I solved it like this:

    from cleantext import clean

    def clean_with_exceptions(text, *args, **kwargs):
        # Each string in "exceptions" is swapped for a letter-only placeholder
        # before cleaning and restored afterwards.
        exceptions = kwargs.pop("exceptions", [])
        for idx, exp in enumerate(exceptions):
            text = text.replace(exp, "exp{}exp".format("z" * (idx + 1)))
        text = clean(text, *args, **kwargs)
        for idx, exp in enumerate(exceptions):
            text = text.replace("exp{}exp".format("z" * (idx + 1)), exp)
        return text

    cleaned_text = clean_with_exceptions(
        text,  # the input string to clean
        exceptions=["-"],
        no_line_breaks=True,
        no_urls=True,  # replace all URLs with a special token
        no_emails=True,  # replace all email addresses with a special token
        no_currency_symbols=True,  # replace all currency symbols with a special token
        no_punct=True,
    )

It is a bit hackish, but it worked for my case.
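Note that the workaround above protects every "-", so "text---cleaning" would keep its hyphens too. A minimal sketch of a narrower variant follows, assuming only a single hyphen with an alphanumeric character on both sides should survive. The names `PLACEHOLDER`, `clean_keep_compound_hyphens`, and `strip_punct` are hypothetical and not part of the package; `strip_punct` is just a stand-in cleaner so the example runs without any dependency.

```python
import re
import string

# Hypothetical placeholder: lowercase letters only, so a punctuation-stripping
# cleaner leaves it alone. It could collide with real text; pick something rarer
# if that is a concern.
PLACEHOLDER = "zzhyphenzz"

# A single hyphen with an alphanumeric character directly on both sides.
INTRA_WORD_HYPHEN = re.compile(r"(?<=[^\W_])-(?=[^\W_])")


def clean_keep_compound_hyphens(text, cleaner):
    """Protect intra-word hyphens, run the cleaner, then restore the hyphens."""
    protected = INTRA_WORD_HYPHEN.sub(PLACEHOLDER, text)
    cleaned = cleaner(protected)
    return cleaned.replace(PLACEHOLDER, "-")


def strip_punct(text):
    """Stand-in cleaner: turn ASCII punctuation into spaces, collapse whitespace."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(text.translate(table).split())


print(clean_keep_compound_hyphens("A drive-thru, text---cleaning!", strip_punct))
# prints: A drive-thru text cleaning
```

In principle the same wrapper could be given a function that calls `clean(..., no_punct=True)` from the package instead of `strip_punct`, though how the placeholder interacts with the other cleaning options is untested here.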
I'd like to remove punctuation from the text but keep "-" inside compound words.
For example, "text---cleaning" should become "text cleaning", but "drive-thru" should still be "drive-thru" after the cleaning.