Update
After some digging, this only seems to affect v3.8; downgrading to v3.7 fixes the problem.
The only 3.8 release I've tried is 3.8.2, so I'm unsure whether 3.8.0 / 3.8.1 are also affected.
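For anyone comparing environments, a small sketch like the following can log which release is installed and whether it falls in the 3.8 series reported above. The helper names `installed_version` and `in_release_series` are made up for illustration; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def in_release_series(version: str, series: str) -> bool:
    """True when `version` belongs to `series`, e.g. '3.8.2' belongs to '3.8'."""
    series_parts = series.split(".")
    # Compare only as many leading components as the series specifies.
    return version.split(".")[: len(series_parts)] == series_parts


if __name__ == "__main__":
    version = installed_version("spacy")
    if version is None:
        print("spaCy is not installed in this environment")
    else:
        print(f"spaCy {version}; in 3.8 series: {in_release_series(version, '3.8')}")
```

Comparing split components rather than using a plain string prefix avoids treating a hypothetical 3.80.x as part of 3.8.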
Overview
Calling a spacy.language.Language object in a script on Windows intermittently causes a segmentation fault. In PowerShell the script simply stops executing with no message; you need to run it in bash to see the "segmentation fault" error. The error does NOT appear on macOS, even in an identical environment.
There appears to be no link between the text input and the crashes since:
It crashes on a different sentence each time
It processes the same sentence perfectly fine on subsequent runs
I've tracked it down to the spacy.language.Language call by placing logging statements on either side of it. The error is not caught by a try/except block.
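A segmentation fault terminates the interpreter from native code, so no Python exception is ever raised and try/except has nothing to catch. One way to get more information is the standard-library `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal. The sketch below is only a diagnostic aid, not a fix; the log file name and the `enable_crash_traceback` helper are hypothetical.

```python
import faulthandler
import tempfile
from pathlib import Path


def enable_crash_traceback(log_path: Path):
    """Dump Python tracebacks for all threads to `log_path` on a fatal error."""
    # faulthandler needs the file to stay open while it is enabled,
    # so the handle is returned to the caller instead of being closed here.
    log_file = open(log_path, "w", encoding="utf-8")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
crash_log = Path(tempfile.gettempdir()) / "spacy_crash_traceback.log"
log_file = enable_crash_traceback(crash_log)

# Load the model and call it after this point. If the process dies,
# the log should show which Python frame was active at the time.
```

The same effect is available without code changes by running the script with `python -X faulthandler`, which writes the traceback to stderr.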
Three examples of sentences that it has crashed on:
le do la camera del sole al primo piano.
marco is offering you a drink.
non c'è niente qua giù.
The model being used is it-core-news-lg==3.8.0.
Update: The crash occurs on the English large model too.
Any advice is appreciated!
How to reproduce the behaviour
Simplified lemmatizer class:
import spacy

class ClassName():
    _nlp: spacy.language.Language

    def __init__(self, model_name: str):
        self._nlp = spacy.load(name=model_name)

    def lemmatize(self, input_str: str):
        # Random crashes on this line
        # Try / except doesn't make any difference
        doc = self._nlp(text=input_str)
        # Do stuff
        return stuff
Actual class being used is here
Actual application code is here, within the generate_frequency_analysis function.
The code crashes on ~20% of runs, even with identical input data. Each run processes subtitles from ~100 minutes' worth of mixed Italian / English content.
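Because the crash is intermittent, it can help to measure the failure rate by running the script repeatedly in a child process and recording the exit codes. The following is a minimal sketch; the `crash_rate` helper and the placeholder command are assumptions for illustration and should be replaced with the real script invocation.

```python
import subprocess
import sys


def crash_rate(command: list[str], runs: int) -> tuple[float, list[int]]:
    """Run `command` `runs` times; return (fraction of non-zero exits, exit codes)."""
    exit_codes = []
    for _ in range(runs):
        # Each run gets a fresh interpreter, so a crash cannot take down this harness.
        completed = subprocess.run(command, capture_output=True)
        exit_codes.append(completed.returncode)
    failures = sum(1 for code in exit_codes if code != 0)
    return failures / runs, exit_codes


if __name__ == "__main__":
    # Placeholder command; substitute the script that calls the spaCy pipeline.
    command = [sys.executable, "-c", "print('replace me')"]
    rate, codes = crash_rate(command, runs=5)
    print(f"non-zero exits: {rate:.0%}, exit codes: {codes}")
```

A non-zero exit code also covers ordinary Python errors, so the recorded codes are worth inspecting. On POSIX systems, `subprocess` reports a process killed by a signal as a negative return code; on Windows the value is whatever exit code the operating system assigns to the crashed process.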
Your Environment
spaCy version: 3.8.2
Platform: Windows-11-10.0.26100-SP0
Python version: 3.12.7
Model: it-core-news-lg==3.8.0
System: Running on CPU: Ryzen 5600X, 32 GB RAM, Running on GPU: RTX 3070 Ti
jonathanfox5 changed the title from "Segmentation Fault Windows (Processing on CPU)" to "Segmentation Fault when running spacy.language.Language (Windows, Processing on CPU)" on Nov 20, 2024
jonathanfox5 changed the title from "Segmentation Fault when running spacy.language.Language (Windows, Processing on CPU)" to "Segmentation Fault when running lemmatisation (Windows, Processing on CPU)" on Nov 20, 2024
jonathanfox5 changed the title from "Segmentation Fault when running lemmatisation (Windows, Processing on CPU)" to "[v3.8.2] Segmentation Fault when running lemmatisation (Windows)" on Nov 20, 2024