Dependency Matcher ignore OP for negation #12926

Closed
shaked571 opened this issue Aug 21, 2023 · 4 comments
Labels: enhancement (Feature requests and improvements), feat / matcher (Feature: Token, phrase and dependency matcher)

Comments

shaked571 commented Aug 21, 2023

I wrote a DependencyMatcher pattern in which I require that the root verb has no negation arc.

For some reason, the matcher ignores this requirement and matches anyway.
If OP is not supported, I would expect an exception to be raised rather than the key being silently ignored and the pattern matching.

## How to reproduce the behavior
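import spacy
from spacy.matcher import DependencyMatcher

# Assumption: the issue does not say which pipeline was loaded;
# any of those listed under "Your Environment" can be substituted here.
nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

negative_effect_pattern_dep = [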
    {
        'RIGHT_ID': 'cause_verb',
        'RIGHT_ATTRS': {'LEMMA': {'IN': ['cause', 'make', 'give']}}
    },
    {
        'LEFT_ID': 'cause_verb',
        'REL_OP': '>',
        'RIGHT_ID': 'negation',
        'RIGHT_ATTRS': {'LEMMA': {'IN': ["nt", "n't", "not"]}, 'OP': '!'},
        'OP': '!'
    },
    {
        'LEFT_ID': 'cause_verb',
        'REL_OP': '>>',
        'RIGHT_ID': 'effect',
        'RIGHT_ATTRS': {'LOWER': {'IN': ['eczema', 'psoriasis', 'allergy', 'hive', 'itch', 'blister', 'inflammation', 'acne', 'dermatitis']}}
    },
    {
        'LEFT_ID': 'cause_verb',
        'REL_OP': '>',
        'RIGHT_ID': 'subject',
        'RIGHT_ATTRS': {'POS':{'IN': ['NOUN', 'PRON', 'PROPN']}}
    }
]

matcher.add('NEG_AFFECT1', [negative_effect_pattern_dep])

docs = [
    "it gave a horrible and annoying weird headache and an allergy ",
    "it didn't give a horrible and annoying weird headache and an allergy ",
]
print("Found Matches:")
for doc in docs:
    parsed_doc = nlp(doc)
    matches = matcher(parsed_doc)
    for match_id, span in matches:
        string_id = nlp.vocab.strings[match_id]  # Get string representation
        span_t = parsed_doc[min(span):max(span)+1]
        print(f"{string_id:<20}{span} {span_t.text}")

NEG_AFFECT1 [1, 10, 0] it gave a horrible and annoying weird headache and an allergy
NEG_AFFECT1 [1, 10, 7] gave a horrible and annoying weird headache and an allergy
NEG_AFFECT1 [3, 12, 0] it didn't give a horrible and annoying weird headache and an allergy
NEG_AFFECT1 [3, 12, 9] gave a horrible and annoying weird headache and an allergy

## Your Environment

  • spaCy version: 3.5.1
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.9.16
  • Pipelines: en_core_web_lg (3.5.0), en_core_web_trf (3.5.0), en_core_web_sm (3.5.0), en_core_web_md (3.5.0)

svlandeg added the "feat / matcher" (Feature: Token, phrase and dependency matcher) label on Aug 21, 2023
@svlandeg
Member

Hi!

To better understand what is happening, it's helpful to change the printing of the matches so that they show which exact string was matched for which subpart of the pattern:

        if matches:
            match_id, token_ids = matches[0]
            for i in range(len(token_ids)):
                print(negative_effect_pattern_dep[i]["RIGHT_ID"] + ":", parsed_doc[token_ids[i]].text)
            print()

Now, let's first define the negation pattern as an actual, simple negation:

        {
            'LEFT_ID': 'cause_verb',
            'REL_OP': '>',
            'RIGHT_ID': 'negation',
            'RIGHT_ATTRS': {'LEMMA': {'IN': ["nt", "n't", "not"]}}
        },

As expected, this does not produce any matches on your first (positive) example text, but it will on your second:

   cause_verb: give
   negation: n't
   effect: allergy
   subject: it

Now, you've tried adding 'OP': '!' to this subpattern:

        {
            'LEFT_ID': 'cause_verb',
            'REL_OP': '>',
            'RIGHT_ID': 'negation',
            'RIGHT_ATTRS': {'LEMMA': {'IN': ["nt", "n't", "not"]}},
            'OP': '!'
        },

but this is unfortunately not supported and won't change the output. As stated in the docs, only 4 keys are supported: 'LEFT_ID', 'REL_OP', 'RIGHT_ID' and 'RIGHT_ATTRS'. Any other key is silently ignored: you can add 'FOO': 'BAR' and it won't have any effect. I agree with your suggestion of emitting a warning when an unrecognized key occurs.

Note that for an unrecognized value (for an existing key) an error will in fact be raised. E.g. when you put 'REL_OP': 'BAR', you'll get a ValueError: [E1007] Unsupported DependencyMatcher operator 'BAR'.
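
Until such a warning exists, the check can be done on the caller's side before the pattern is added. The following is a minimal sketch and not part of spaCy: `SUPPORTED_KEYS` and `find_unsupported_keys` are hypothetical names, and the key set is simply the four keys listed above.

```python
# The four keys the DependencyMatcher reads from each node dictionary.
SUPPORTED_KEYS = {"LEFT_ID", "REL_OP", "RIGHT_ID", "RIGHT_ATTRS"}


def find_unsupported_keys(pattern):
    """Return (node_index, key) pairs for top-level keys that would be ignored."""
    return [
        (index, key)
        for index, node in enumerate(pattern)
        for key in node
        if key not in SUPPORTED_KEYS
    ]


# Shortened version of the pattern from this issue.
pattern = [
    {"RIGHT_ID": "cause_verb", "RIGHT_ATTRS": {"LEMMA": {"IN": ["cause", "make", "give"]}}},
    {
        "LEFT_ID": "cause_verb",
        "REL_OP": ">",
        "RIGHT_ID": "negation",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["nt", "n't", "not"]}},
        "OP": "!",
    },
]

print(find_unsupported_keys(pattern))  # [(1, 'OP')]
```

Keys nested inside 'RIGHT_ATTRS' are token attributes and are deliberately not inspected by this helper.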

Looking at your example code, it appears that you've also tried putting the 'OP': '!' part within the RIGHT_ATTRS definition, so you get this:

        {
            'LEFT_ID': 'cause_verb',
            'REL_OP': '>',
            'RIGHT_ID': 'negation',
            'RIGHT_ATTRS': {'LEMMA': {'IN': ["nt", "n't", "not"]}, 'OP': '!'},
        },

In this case, both sentences will match and that's actually not a bug. Look at the output:

(text 1)
   cause_verb: gave
   negation: it
   effect: allergy
   subject: it

(text 2)
   cause_verb: give
   negation: it
   effect: allergy
   subject: it

What happened is that in both cases, the matcher found a token dependent on the cause_verb whose lemma is NOT in the given list ["nt", "n't", "not"] - this token was "it".
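
The difference between the two readings can be made concrete without spaCy. This is a minimal sketch in plain Python; the lemma lists are simplified by hand (an assumption, not parser output) and the function names are only illustrative.

```python
NEGATION_LEMMAS = {"nt", "n't", "not"}

# Lemmas of the verb's direct dependents, written out by hand for illustration.
children_positive = ["it", "headache"]              # "it gave ... headache"
children_negated = ["it", "do", "not", "headache"]  # "it didn't give ... headache"


def some_child_is_not_negation(lemmas):
    """What '!' inside RIGHT_ATTRS effectively asks: is there a non-negation dependent?"""
    return any(lemma not in NEGATION_LEMMAS for lemma in lemmas)


def no_child_is_negation(lemmas):
    """What the pattern was meant to ask: is every dependent a non-negation token?"""
    return not any(lemma in NEGATION_LEMMAS for lemma in lemmas)


for name, lemmas in [("positive", children_positive), ("negated", children_negated)]:
    print(name, some_child_is_not_negation(lemmas), no_child_is_negation(lemmas))
```

The first check is true for both sentences because "it" satisfies it each time; only the second check separates the negated sentence from the positive one.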

In summary (TLDR):

  • Adding the '!' operator within the 'RIGHT_ATTRS' won't give you the desired behaviour, as there may be other dependent tokens that satisfy the negated specification
  • What you want to do instead is tell the dependency matcher that none of the dependent tokens should match the given specification. This functionality is currently not supported (but we'd accept a PR!)
  • We'll have a look at potentially warning when an unrecognized key is used in the pattern dictionary.
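
Until this is supported, one possible workaround is to drop the 'negation' node from the pattern and discard unwanted matches afterwards. The following is a minimal sketch, assuming the negation token is a direct dependent of the verb (as in the parse shown above) and that the verb is the first node of the pattern. `has_negation_child` and `drop_negated_matches` are hypothetical helpers, not spaCy API; they rely only on `Token.children` and `Token.lemma_`.

```python
NEGATION_LEMMAS = {"nt", "n't", "not"}


def has_negation_child(token, negation_lemmas=NEGATION_LEMMAS):
    """Return True if any direct dependent of `token` has a negation lemma."""
    return any(child.lemma_ in negation_lemmas for child in token.children)


def drop_negated_matches(doc, matches, verb_index=0):
    """Keep only matches whose verb has no negation dependent.

    `matches` is the list returned by a DependencyMatcher: (match_id, token_ids)
    pairs, with token_ids in the order of the pattern's nodes. `verb_index` is
    the position of the verb node in the pattern (0 for 'cause_verb' here).
    """
    return [
        (match_id, token_ids)
        for match_id, token_ids in matches
        if not has_negation_child(doc[token_ids[verb_index]])
    ]
```

With the pattern from this issue minus the 'negation' node, `drop_negated_matches(parsed_doc, matcher(parsed_doc))` would be expected to keep the match for the first sentence and drop it for the second, provided the parser attaches "n't" to "give".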

svlandeg added the "enhancement" (Feature requests and improvements) label on Aug 21, 2023
Author

shaked571 commented Aug 21, 2023

OK.

I now understand what happened.

Thank you for your quick and professional response.

I managed to achieve what I need using the 'REL_OP': ';':

        {
            'LEFT_ID': 'cause_verb',
            # ';' means "A immediately follows B", i.e. A.i == B.i + 1,
            # with both tokens in the same dependency tree.
            # Here A is the root verb ([give]) and B is the token directly
            # before it ([n't] in "didn't give").
            'REL_OP': ';',
            'RIGHT_ID': 'negation',
            'RIGHT_ATTRS': {'LEMMA': {'IN': ["nt", "n't", "not"]}, 'OP': '!'},
        },

This way I can enforce that the token immediately before "give" is not a negation.

@svlandeg
Member

Great, thanks for posting this solution to your specific usage example! That'll be useful for others finding this thread :-)

@github-actions
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked this thread as resolved and limited conversation to collaborators on Sep 21, 2023