[v3.8.2] Segmentation Fault when running lemmatisation (Windows) #13692
Comments
I'm having the same issue with the Dutch models. I have tried 3 different machines to make sure it isn't a problem with one particular installation. The Event Viewer shows: Faulting application name: python.exe, version: 3.12.150.1013, time stamp: 0x651ac086. I'm not sure how to get a proper stack trace. The fault offset is the same every time.
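On getting a stack trace: one option that may help is Python's standard-library faulthandler module, which dumps the Python-level traceback of every thread when the interpreter receives a fatal signal (or a fatal Windows exception). The sketch below is only an illustration; the helper name enable_crash_traceback and the log path are made up for this example and are not part of spaCy or of the reporter's code.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Turn on faulthandler so a hard crash prints Python tracebacks.

    Hypothetical helper for illustration. If log_path is given, the dump
    goes to that file; otherwise it goes to stderr. The returned file
    object (if any) must stay open for as long as faulthandler is enabled.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this once at the very top of the script, before loading the model:
# crash_log = enable_crash_traceback("crash_traceback.log")
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler script.py`. The dump only shows Python frames, so it narrows the crash to a call site rather than to a line inside the native extension.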
Thanks for the reports. I'm looking at this but it's hard to track down. I'm guessing it's an issue with the new Blis version used.
Update
I've done some digging and this only seems to affect v3.8. Downgrading to v3.7 fixes the problem.
The only 3.8 version I've tried is 3.8.2 so I'm unsure if 3.8.0 / 3.8.1 are also affected.
Overview
When running spacy.language.Language in a script on Windows, it randomly produces a segmentation fault (in PowerShell the script simply stops executing; you need to run it in bash to see the "segmentation fault" error). This error does NOT appear on macOS, even in identical environments.
There appears to be no link between the text input and the crashes since:
I've tracked it down to spacy.language.Language by isolating it with logging statements on either side of the function call. The error is not caught by a try/except block.
Three examples of sentences that it has crashed on:
The model being used is it-core-news-lg==3.8.0.
Update: The crash occurs on the English large model too.
Any advice is appreciated!
How to reproduce the behaviour
Simplified lemmatizer class:
Simplified application code
Actual class being used is here
Actual application code is here, within the generate_frequency_analysis function.
The code crashes on ~20% of the runs, even with identical input data. Each run has subtitles from ~100 minutes worth of mixed Italian / English content.
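Because the crash is intermittent and cannot be caught with try/except, one way to measure how often it happens is to run the script repeatedly in child processes and inspect the exit codes. The following is a rough sketch under stated assumptions, not the code used in this report: the function names are hypothetical, and the rule "negative code on POSIX, code of 0xC0000000 or above on Windows" is a heuristic for spotting hard crashes (for example, an access violation is typically reported as 0xC0000005).

```python
import subprocess
import sys


def is_hard_crash(returncode):
    """Heuristic: True if the exit code looks like a native crash.

    POSIX: subprocess reports death by signal as a negative number.
    Windows: fatal NTSTATUS codes are 0xC0000000 and above.
    """
    return returncode < 0 or returncode >= 0xC0000000


def count_hard_crashes(script_args, runs=20):
    """Run `python -X faulthandler <script_args>` several times.

    Returns a list of (run_index, returncode, stderr_text) for every run
    that ended in what looks like a hard crash.
    """
    crashes = []
    for run_index in range(runs):
        proc = subprocess.run(
            [sys.executable, "-X", "faulthandler", *script_args],
            capture_output=True,
            text=True,
        )
        if is_hard_crash(proc.returncode):
            crashes.append((run_index, proc.returncode, proc.stderr))
    return crashes


# Example (the script name is a placeholder):
# crashes = count_hard_crashes(["my_lemmatizer_script.py"], runs=20)
# print(f"{len(crashes)} hard crashes out of 20 runs")
```

Running each attempt in its own process keeps a crash from taking down the harness, and the captured stderr holds whatever traceback faulthandler managed to write.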
Your Environment
it-core-news-lg==3.8.0