One-off error with end index when OP "*" ending the pattern #1148

Closed
jofatmofn opened this issue Jun 26, 2017 · 6 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@jofatmofn

I am not authorized to reopen #429, so I am creating a new issue.

    document = nlp(input_text)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 329, in __call__
    proc(doc)
  File "spacy/syntax/parser.pyx", line 214, in spacy.syntax.parser.Parser.__call__ (spacy/syntax/parser.cpp:7989)
    raise ParserStateError(tokens)
ParserStateError: Error analysing doc -- no valid actions available. This should never happen, so please report the error on the issue tracker. Here's the thread to do so --- reopen it if it's closed:
https://github.com/spacy-io/spaCy/issues/429
Please include the text that the parser failed on, which is:
u'Thanks, can you create a ticket again.'

This happens only if I add an acceptor. Pattern:

                        [
                                {TAG: "NN", 'OP': "*"},
                                {TAG: "RB", LOWER: "again"}
                        ]

Acceptor method:

        count = 0
        while (start + count < end - 1):
                if doc[start + count].tag_ != "NN":
                        break
                count = count + 1
        return (ent_id, label, start, end - count - 1)
@jofatmofn
Author

jofatmofn commented Jun 26, 2017

This appears to be a bug in my code rather than in spaCy. The additional -1s are not required. The corrected acceptor method is:

        count = 0
        while (start + count < end):
                if doc[start + count].tag_ != "NN":
                        break
                count = count + 1
        if count == 0:
                return False
        else:
                return (ent_id, label, start, start + count)
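
For reference, the difference between the two versions of the acceptor can be checked without spaCy. The following is only a minimal sketch: `Token` is a hypothetical stand-in that models nothing but `tag_`, the tags are assigned by hand rather than produced by a tagger, and the function names `acceptor_original` and `acceptor_corrected` are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token; only tag_ is modelled.
Token = namedtuple("Token", ["text", "tag_"])


def acceptor_original(doc, ent_id, label, start, end):
    # Version from the first comment, including the extra -1s.
    count = 0
    while start + count < end - 1:
        if doc[start + count].tag_ != "NN":
            break
        count += 1
    return (ent_id, label, start, end - count - 1)


def acceptor_corrected(doc, ent_id, label, start, end):
    # Corrected version: keep only the leading run of NN tokens.
    count = 0
    while start + count < end:
        if doc[start + count].tag_ != "NN":
            break
        count += 1
    if count == 0:
        return False
    return (ent_id, label, start, start + count)


# Hand-assigned tags, for illustration only.
doc = [
    Token("create", "VB"),
    Token("a", "DT"),
    Token("ticket", "NN"),
    Token("again", "RB"),
]

# Suppose the matcher reports the span "ticket again" as start=2, end=4.
original = acceptor_original(doc, 0, 0, 2, 4)
corrected = acceptor_corrected(doc, 0, 0, 2, 4)
print(original)
print(corrected)
```

With these inputs the original version returns a span whose start and end are equal (an empty span), while the corrected version returns the span that covers only `ticket`.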

I originally included that -1 because of my experience with another pattern:

                        [
                                {TAG: "JJ", LOWER: "new"},
                                {TAG: "NN", 'OP': "*"}
                        ]

For the sentence "Could you create a new ticket for me?", spaCy passes an additional token to the acceptor:

    start = 4
    end = 7 <== Should be 6
    doc[start:end].text = "new ticket for"

Is this a bug then?
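
To make the indices in this report concrete, here is a small sketch that splits the sentence by hand and applies the same half-open slicing that `doc[start:end]` uses. It assumes the tokenizer yields one token per word plus a separate final `?`; the variable names are illustrative.

```python
# Hand-written token list; assumes one token per word and a separate "?".
tokens = ["Could", "you", "create", "a", "new", "ticket", "for", "me", "?"]

start = 4
reported_end = 7  # value passed to the acceptor in the report
expected_end = 6  # value that would cover only "new ticket"

# Slices are half-open: the token at index `end` is excluded.
reported_span = " ".join(tokens[start:reported_end])
expected_span = " ".join(tokens[start:expected_end])
print(reported_span)
print(expected_span)
```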

@jofatmofn
Author

Please let me know if I need to create a separate issue for this.

@honnibal
Member

I think this was a bug in the v1 Matcher, which should be fixed in v2. Sorry I didn't get to this sooner, and thanks for the report!

@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Oct 20, 2017
@jofatmofn
Author

  • Python version: 2.7.12
  • Platform: Linux-4.4.0-91-generic-x86_64-with-Ubuntu-16.04-xenial
  • spaCy version: 2.0.0a17
  • Models: en_core_web_sm-2.0.0a7

I tried the following code with the v2 alpha:

import spacy
from spacy.matcher import Matcher
from spacy.attrs import *

def on_match_1(matcher, doc, id, matches):
        label, start, end = matches[id]
        print(label, start, end, doc[start:end])

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)
matcher.add(
        "TSTEND",
        on_match_1,
        [
                {TAG: "JJ", LOWER: "new"},
                {TAG: "NN", 'OP': "*"}
        ]
)
document = nlp(u'Could you create a new ticket for me?')
matches = matcher(document)

and got the following output, in which the end index is still 7 rather than 6:
(8454456681159448349L, 4, 7, new ticket for)
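
While the reported end index is off by one, a possible workaround is to recompute the end of the span inside the callback. The sketch below is not part of spaCy: `trim_end` is a hypothetical helper, `Token` is a stand-in that models only `tag_`, and the tags are assigned by hand. It assumes a pattern whose first token is fixed and whose remaining tokens must all carry a single tag.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token; only tag_ is modelled.
Token = namedtuple("Token", ["text", "tag_"])


def trim_end(doc, start, end, tag="NN"):
    """Return the end index of the longest prefix of doc[start:end] that
    consists of the first token followed only by tokens tagged `tag`."""
    new_end = start + 1
    while new_end < end and doc[new_end].tag_ == tag:
        new_end += 1
    return new_end


# Hand-assigned tags, for illustration only.
doc = [
    Token("Could", "MD"), Token("you", "PRP"), Token("create", "VB"),
    Token("a", "DT"), Token("new", "JJ"), Token("ticket", "NN"),
    Token("for", "IN"), Token("me", "PRP"), Token("?", "."),
]

start, end = 4, 7  # values reported by the matcher in this thread
fixed_end = trim_end(doc, start, end)
trimmed_text = " ".join(t.text for t in doc[start:fixed_end])
print(fixed_end, trimmed_text)
```

In a real callback the same loop could run over `doc[start:end]` from the matched `Doc`, since only the `tag_` attribute is used.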

@jofatmofn jofatmofn changed the title ParserStateError on adding an acceptor One-off error with end index when OP "*" ending the pattern Oct 23, 2017
@jofatmofn
Author

jofatmofn commented Oct 23, 2017

I am unable to re-open the issue (isaacs/github#583), so I am leaving the current issue closed and opening a fresh one: #1450.

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018