Hi there, I've been using textacy for at least a few months and it has helped me make significant progress on a few projects I'm working on! The subject_verb_object_triples() method is what I'm most interested in for knowledge extraction.
My current use case is looking at subjects and verbs, along with coreference resolution provided by coreferee, to accumulate knowledge of what people are doing prior to certain events taking place. I'm encountering the following issue in the latest version, 0.11.0:
steps to reproduce
import spacy
import textacy

nlp = spacy.load('en_core_web_trf')  # or en_core_web_sm, en_core_web_lg
doc = nlp("A woman walked to the store.")
svos = textacy.extract.triples.subject_verb_object_triples(doc)
for svo in svos:
    print(svo)
expected vs. actual behavior
Expected: Output contains an SVOTriple with a (woman, walked, store) triple.
Actual: Output is empty; no SVO triples are detected: []
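Lines 79 - 82 of https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py:
# prepositional object acting as agent of passive verb
elif tok.dep == pobj:
    if head.dep == agent and head.head.pos == VERB:
        verb_sos[head.head]["objects"].update(expand_noun(tok))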
When the token in the loop reaches tok = store, head.head.pos == VERB is TRUE, but head.dep == agent is FALSE, hence the object "store" is not added to the verb_sos data. head.dep is prep in this case, not agent.
On my end I can circumvent this by disabling the head.dep == agent check or expanding it to allow [agent, prep]. However, I'm wondering if the workaround should in fact be incorporated into textacy. Was the prep case missed, or perhaps you were encountering false positives when it was included?
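To make the workaround concrete, here is a minimal sketch that does not depend on spaCy or textacy. Everything in it is an assumption for illustration: `Tok`, `objects_by_verb`, and the `head_deps` parameter are hypothetical names, not part of textacy's API, and the parse is written by hand to match what is described above (store is a pobj whose head "to" has dep prep).

```python
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    dep: str   # dependency label
    pos: str   # coarse part-of-speech tag
    head: int  # index of the head token (the root points at itself)


# Hand-written parse of "A woman walked to the store." (assumed, not model output)
SENT = [
    Tok("A", "det", "DET", 1),
    Tok("woman", "nsubj", "NOUN", 2),
    Tok("walked", "ROOT", "VERB", 2),
    Tok("to", "prep", "ADP", 2),
    Tok("the", "det", "DET", 5),
    Tok("store", "pobj", "NOUN", 3),
    Tok(".", "punct", "PUNCT", 2),
]


def objects_by_verb(sent, head_deps=("agent",)):
    """Map each verb's text to the pobj tokens reachable through an allowed head dep."""
    found = {}
    for tok in sent:
        if tok.dep != "pobj":
            continue
        head = sent[tok.head]   # the preposition governing the pobj
        verb = sent[head.head]  # the token the preposition attaches to
        if head.dep in head_deps and verb.pos == "VERB":
            found.setdefault(verb.text, []).append(tok.text)
    return found


strict = objects_by_verb(SENT)  # mirrors the agent-only check
permissive = objects_by_verb(SENT, head_deps=("agent", "prep"))
print(strict)
print(permissive)
```

With the agent-only default nothing is collected for "walked"; allowing prep as well picks up "store". Keeping the strict behavior as the default is one way to address backwards compatibility.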
I may be misremembering, but I thought in previous versions of textacy SVOTriples were allowed even if the object was missing. Would you consider adding an optional parameter to the function, def subject_verb_object_triples(doclike, allow_empty_objects=False) so pairs with subjects and verbs can still be extracted?
For example: He laughed at me. -> SVOTriple (he, laughed, me)
But with no object, an SVOTriple with an empty object may still be useful: He laughed. -> SVOTriple (he, laughed, None)
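A rough sketch of how such a flag might behave, written as standalone Python rather than a patch to textacy. `SVO`, `build_triples`, and the "subjects" key are hypothetical names chosen for this example; only the "objects" key and the `allow_empty_objects` parameter name come from the text above.

```python
from typing import NamedTuple, Optional


class SVO(NamedTuple):
    subject: str
    verb: str
    object: Optional[str]


def build_triples(verb_sos, allow_empty_objects=False):
    """Turn {verb: {"subjects": [...], "objects": [...]}} into a list of SVO tuples."""
    triples = []
    for verb, so in verb_sos.items():
        for subj in so["subjects"]:
            if so["objects"]:
                triples.extend(SVO(subj, verb, obj) for obj in so["objects"])
            elif allow_empty_objects:
                # Opt-in only: emit the subject-verb pair with a None object.
                triples.append(SVO(subj, verb, None))
    return triples


laughed = {"laughed": {"subjects": ["he"], "objects": []}}
default_result = build_triples(laughed)
opt_in_result = build_triples(laughed, allow_empty_objects=True)
print(default_result)
print(opt_in_result)
```

Because the flag defaults to False, existing callers would keep getting only complete triples.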
context
Extracting this simple SVO triple provides important information about what the subject was doing prior to a certain incident taking place. In this case, prior to arriving at the store, the woman was walking. It also helps identify the means of transportation to the store, e.g. walked, took a bus, or drove to the store.
I've also noticed that objects from within prep phrases don't get returned. According to the comments in lines 79 - 82, though, that's only looking for agents of passive verbs (like 'the ball was thrown by the boy'). If I disable the agent check or allow prepositions, it lets a little too much through: in the test 'she sells sea shells by the sea shore' it returns [sea, shells, sea, shore]. I'm not sure what the desired behavior here is, otherwise I'd just open a PR. Would we want to ignore prepositional phrases if there is another object already identified, or return all direct and indirect objects? I think my ideal would be including the preposition with the verb, but in multiple triples. For example, 'she sells sea shells by the sea shore' would give us [[she], [sells], [sea, shells]] and [[she], [sells, by], [sea, shore]].
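For discussion, the "preposition goes with the verb, one triple per object" idea could look roughly like the sketch below. It is plain Python over a hand-written parse, not textacy code: `Tok`, `noun_chunk`, and `split_triples` are hypothetical names, and the dependency labels (including treating "sea" as a compound modifier) are assumed rather than taken from a model.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    dep: str
    pos: str
    head: int  # index of the head token (the root points at itself)


# Hand-written parse of "she sells sea shells by the sea shore" (assumed)
SENT = [
    Tok("she", "nsubj", "PRON", 1),
    Tok("sells", "ROOT", "VERB", 1),
    Tok("sea", "compound", "NOUN", 3),
    Tok("shells", "dobj", "NOUN", 1),
    Tok("by", "prep", "ADP", 1),
    Tok("the", "det", "DET", 7),
    Tok("sea", "compound", "NOUN", 7),
    Tok("shore", "pobj", "NOUN", 4),
]


def noun_chunk(sent, i):
    """Return the noun at index i preceded by its compound modifiers."""
    mods = [t.text for t in sent if t.dep == "compound" and t.head == i]
    return mods + [sent[i].text]


def split_triples(sent):
    """Emit one triple per direct object and one per prepositional object."""
    triples = []
    for v, verb in enumerate(sent):
        if verb.pos != "VERB":
            continue
        subjects = [t.text for t in sent if t.dep == "nsubj" and t.head == v]
        for i, tok in enumerate(sent):
            if tok.dep == "dobj" and tok.head == v:
                triples.append((subjects, [verb.text], noun_chunk(sent, i)))
            elif tok.dep == "pobj":
                prep = sent[tok.head]
                if prep.dep == "prep" and prep.head == v:
                    # Keep the preposition alongside the verb.
                    triples.append((subjects, [verb.text, prep.text], noun_chunk(sent, i)))
    return triples


triples = split_triples(SENT)
for triple in triples:
    print(triple)
```

This keeps the direct object and the prepositional object in separate triples instead of merging them into one object list.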
I have since resolved my issue by slightly changing the code to fit my project's purpose, but figured I would open this ticket to ask around and document what I found.
Based on your examples (she sells sea shells by the sea shore), it would be cool to have this function take an optional parameter to expand or reduce the scope of what objects we want to allow through in the SVO triples. I imagine there would be backwards compatibility concerns updating the function outright, but the function could provide the option of being more permissive to allow SVOs based on the examples you and I have provided.
possible solution?
I debugged https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py on my end to determine what information was being captured and not. Here's what I found.
Here's the document and its dependencies:
It's a simple sentence with a nominal subject, verb, and prepositional object.
The verb and nsubj are found, but the pobj branch quoted above (https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py#L79-L82) prevents "store" from being added as the object: head.dep is prep rather than agent, so the check fails.
Since the object is not detected, https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py#L108 prevents the SVOTriple from being returned.
environment
spacy version: 3.1.2
spacy models: en_core_web_trf, en_core_web_sm, en_core_web_lg
textacy version: 0.11.0