Possible infinite loop #44

Closed
brunobg opened this issue Jan 19, 2021 · 3 comments

brunobg commented Jan 19, 2021

Running my tests with spaczz@master, they seem to get into an infinite loop at the nlp() call. Stack dumps:

  File "/usr/lib64/python3.8/site-packages/spacy/language.py", line 445, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "/usr/lib64/python3.8/site-packages/spaczz/pipeline/spaczzruler.py", line 150, in __call__
    for fuzzy_match in self.fuzzy_matcher(doc):
  File "/usr/lib64/python3.8/site-packages/spaczz/matcher/_phrasematcher.py", line 103, in __call__
    matches_wo_label = self._searcher.match(doc, pattern, **kwargs)
  File "/usr/lib64/python3.8/site-packages/spaczz/search/_phrasesearcher.py", line 133, in match
    matches_w_nones = [
  File "/usr/lib64/python3.8/site-packages/spaczz/search/_phrasesearcher.py", line 134, in <listcomp>
    self._optimize(
  File "/usr/lib64/python3.8/site-packages/spaczz/search/_phrasesearcher.py", line 217, in _optimize
    r = self.compare(query, doc[bp_l:bp_r], *args, **kwargs)
  File "doc.pyx", line 308, in spacy.tokens.doc.Doc.__getitem__
  File "/usr/lib64/python3.8/site-packages/spacy/util.py", line 491, in normalize_slice
    if not (step is None or step == 1):

Another Ctrl-C during another run:

   self._doc = nlp(text)
  File "/usr/lib64/python3.8/site-packages/spacy/language.py", line 445, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "/usr/lib64/python3.8/site-packages/spaczz/pipeline/spaczzruler.py", line 150, in __call__
    for fuzzy_match in self.fuzzy_matcher(doc):
  File "/usr/lib64/python3.8/site-packages/spaczz/matcher/_phrasematcher.py", line 103, in __call__
    matches_wo_label = self._searcher.match(doc, pattern, **kwargs)
  File "/usr/lib64/python3.8/site-packages/spaczz/search/_phrasesearcher.py", line 133, in match
    matches_w_nones = [
  File "/usr/lib64/python3.8/site-packages/spaczz/search/_phrasesearcher.py", line 134, in <listcomp>
    self._optimize(
  File "/usr/lib64/python3.8/site-packages/spaczz/search/_phrasesearcher.py", line 205, in _optimize
    rl = self.compare(query, doc[p_l : p_r - f], *args, **kwargs)
  File "/usr/lib64/python3.8/site-packages/spaczz/search/fuzzysearcher.py", line 109, in compare
    return round(self._fuzzy_funcs.get(fuzzy_func)(a_text, b_text))

gandersen101 (Owner) commented

Hi @brunobg, this is concerning but hard to diagnose with the information at hand. If there is any way you could pinpoint which pattern/doc combinations are causing this, that would be extremely helpful. Spaczz has good test coverage and I have used it on the job on medical texts, but new issues will always come up as people apply spaczz in new settings.

One thing to keep in mind is that spaczz can be extremely slow given a large enough pattern list and document(s). I explain why this is, and why it is beyond my capabilities to significantly speed up spaczz in the short term, in issue #20. I'm not saying that is what is happening here, but it is worth ruling out.
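
One way to narrow down which input is responsible is to time each candidate separately and sort by elapsed time. The following is only a minimal sketch: `time_each` and `fake_process` are hypothetical names, not part of spaczz or spaCy, and `fake_process` stands in for whatever builds the pipeline with a single pattern and calls `nlp(text)`.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_each(
    process: Callable[[str], object], items: Iterable[str]
) -> List[Tuple[str, float]]:
    """Run `process` on each item and return (item, seconds) pairs, slowest first."""
    timings = []
    for item in items:
        start = time.perf_counter()
        process(item)
        timings.append((item, time.perf_counter() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


def fake_process(text: str) -> int:
    # Hypothetical stand-in for the real work (e.g. one pattern + nlp(text)).
    # The "slow" input sleeps briefly so it stands out in the timings.
    time.sleep(0.05 if text == "slow" else 0.0)
    return len(text)


if __name__ == "__main__":
    for item, seconds in time_each(fake_process, ["a", "slow", "b"]):
        print(f"{seconds:8.4f}s  {item}")
```

Note that this only helps when every call eventually returns. A genuine infinite loop would never get past the offending item, which is itself a useful signal about where to look.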

brunobg commented Jan 19, 2021

This happens only in one specific test, so I can probably isolate the pattern like I did before. It has been "fast enough" on every other test, which is why I think it's an infinite loop: other tests take milliseconds, while this one is still going after 10 seconds. Speed is not an issue for me as long as run times are reasonable.

I read #20 and it makes sense to me (though running it through a profiler would help pinpoint exactly where it takes too long).
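
A profiler run along those lines could look roughly like the sketch below, which uses the standard library's `cProfile` and `pstats`. `profile_call` and `slow_compare` are hypothetical names introduced for illustration; in a real run the profiled callable would be the `nlp` object and its argument the test text.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Profile one call; return (result, report) listing the top entries by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_compare(n):
    # Hypothetical stand-in for the slow call being investigated.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_compare, 100_000)
    print(report)
```

Sorting by cumulative time puts the outermost slow calls first, so the report should make it visible whether the time goes into the comparison function itself or into slicing the doc around it.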

brunobg commented Jan 19, 2021

Closing this. You're right, it just takes a long time (~100 times longer than spaCy NER).

brunobg closed this as completed Jan 19, 2021