Add negations to all text predicates #2559

Merged
merged 1 commit into from
Apr 24, 2021

Conversation

WheresMyStapler
Contributor

  • Add negations to all Text predicates
  • Implements negation predicates for Elastic index backend
  • Adds new predicate Text.CONTAINS_PHRASE, which exposes Elastic's match_phrase query
  • Update Solr and Lucene to use the same max edit distances as JanusGraph's Text.CONTAINS_FUZZY.evaluateRaw(...) and Elastic's fuzziness:"AUTO"
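
As a rough illustration of the last bullet, the sketch below applies the length tiers documented for Elasticsearch's fuzziness "AUTO" (0 edits for terms of 0 to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms) to a plain Levenshtein distance. The class and method names are hypothetical, and using plain Levenshtein (without transpositions) is an assumption of this sketch; it is not the code from this PR.

```java
import java.util.Locale;

/** Illustrative sketch only; names are hypothetical and this is not the JanusGraph implementation. */
final class FuzzyDistanceSketch {

    /** Maximum number of edits allowed for a term, following the "AUTO" length tiers. */
    static int maxEditDistance(String term) {
        int length = term.trim().length();
        if (length < 3) {
            return 0; // 0-2 characters: must match exactly
        }
        if (length < 6) {
            return 1; // 3-5 characters: one edit allowed
        }
        return 2;     // 6 or more characters: two edits allowed
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the token is within the allowed edit distance of the term. */
    static boolean fuzzyMatches(String token, String term) {
        String normalizedTerm = term.trim().toLowerCase(Locale.ROOT);
        return levenshtein(token.toLowerCase(Locale.ROOT), normalizedTerm) <= maxEditDistance(term);
    }

    /** The negated counterpart: true when the token is NOT within the allowed distance. */
    static boolean notFuzzyMatches(String token, String term) {
        return !fuzzyMatches(token, term);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("hobbot", "hobbit"));
        System.out.println(notFuzzyMatches("ab", "ac"));
    }
}
```

Keeping the threshold in one function is what makes it possible for every backend to agree on the same answer for the same term length.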

Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@linux-foundation-easycla

linux-foundation-easycla bot commented Apr 5, 2021

CLA Signed

The committers are authorized under a signed CLA.

@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Apr 5, 2021
@WheresMyStapler WheresMyStapler force-pushed the text-predicate-negations branch 2 times, most recently from 92a8cd4 to e1d748a Compare April 5, 2021 23:21
Member

@li-boxuan li-boxuan left a comment

Thank you for the huge contribution! Can you please sign the CLA?
I only did a quick pass on the PR, but in general, looks good to me. Left a few comments/questions:

@WheresMyStapler WheresMyStapler force-pushed the text-predicate-negations branch 2 times, most recently from 90186c6 to 2a4cc05 Compare April 6, 2021 18:53
@WheresMyStapler WheresMyStapler force-pushed the text-predicate-negations branch from 2a4cc05 to 841f7f6 Compare April 10, 2021 14:31
Member

@li-boxuan li-boxuan left a comment

Thank you, LGTM 👍

Contributor

@farodin91 farodin91 left a comment

Just one question out of curiosity: Wouldn't it be better to just wrap the predicate in a not(...)?

@@ -0,0 +1,474 @@
// Copyright 2017 JanusGraph Authors
Contributor

2011

Contributor Author

Just saw this, updated the Copyright on the new file to 2021.

@li-boxuan
Member

Wouldn't it be better to just wrap the predicate in a not(...)?

@farodin91 See discussions in #1868. Briefly speaking, not(has(key, predicate)) is not semantically equivalent to has(key, ~predicate).

@farodin91
Contributor

https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates
I thought of putting a not around the predicate.

@li-boxuan
Member

li-boxuan commented Apr 12, 2021

https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates

I see it mentions that not(predicate) is also a predicate. It doesn't say not(predicate) == negation of predicate, if I am understanding it correctly.

UPDATE:

After seeing the latest reply from @WheresMyStapler, I realized I might have misunderstood @farodin91. I was thinking about the case not(has(key, textContains('foo'))).

@WheresMyStapler
Contributor Author

@farodin91, wrapping the predicate in not() raises an UnsupportedOperationException, because the TinkerPop code eventually calls .negate() on the predicate (in v0.5.3):

gremlin> g.V().has('fullTextIndexedField', not(textContains('foo')))
java.lang.UnsupportedOperationException
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.UnsupportedOperationException
	at org.janusgraph.core.attribute.Text.negate(Text.java:312)
	at org.janusgraph.core.attribute.Text$1.negate(Text.java:45)
	at org.apache.tinkerpop.gremlin.process.traversal.P.negate(P.java:98)
	at org.apache.tinkerpop.gremlin.process.traversal.P.not(P.java:257)
	at org.apache.tinkerpop.gremlin.process.traversal.P$not.callStatic(Unknown Source)

Since this PR implements .negate for all the text predicates, this now functions:

gremlin> g.V().has('fullTextIndexedField', not(textContains('foo')))
==>v[835784]
gremlin> g.V().has('fullTextIndexedField', not(textContains('foo'))).explain()
==>Traversal Explanation
============================================================================================================
Original Traversal                          [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]

ConnectiveStrategy                    [D]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
MatchPredicateStrategy                [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
EarlyLimitStrategy                    [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
IncidentToAdjacentStrategy            [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
RepeatUnrollStrategy                  [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
PathRetractionStrategy                [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
CountStrategy                         [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
FilterRankingStrategy                 [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
InlineFilterStrategy                  [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
AdjacentToIncidentStrategy            [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
LazyBarrierStrategy                   [O]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
JanusGraphIoRegistrationStrategy      [P]   [GraphStep(vertex,[]), HasStep([fullTextIndexedField.textNotContains(foo)])]
JanusGraphStepStrategy                [P]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]
AdjacentVertexFilterOptimizerStrategy [P]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]
AdjacentVertexHasIdOptimizerStrategy  [P]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]
AdjacentVertexIsOptimizerStrategy     [P]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]
JanusGraphLocalQueryOptimizerStrategy [P]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]
ProfileStrategy                       [F]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]
StandardVerificationStrategy          [V]   [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]

Final Traversal                             [JanusGraphStep([],[fullTextIndexedField.textNotContains(foo)])]

But, only when using Elasticsearch, or in-memory if full scans are enabled.
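
A minimal sketch of the mechanism described above: per the stack trace, the not(...) wrapper ends up calling negate() on the predicate, so each predicate only needs to return its negated counterpart instead of throwing. The enum below is a hypothetical stand-in written against java.util.function.BiPredicate; the names TextPredicateSketch, CONTAINS, NOT_CONTAINS, not and containsToken are assumptions for illustration and are not the JanusGraph or TinkerPop classes.

```java
import java.util.Locale;
import java.util.function.BiPredicate;

/** Hypothetical stand-in for a text predicate enum; not the JanusGraph Text class. */
enum TextPredicateSketch implements BiPredicate<String, String> {
    CONTAINS {
        @Override
        public boolean test(String value, String term) {
            return value != null && containsToken(value, term);
        }

        @Override
        public TextPredicateSketch negate() {
            return NOT_CONTAINS; // return the counterpart instead of throwing
        }
    },
    NOT_CONTAINS {
        @Override
        public boolean test(String value, String term) {
            // A missing value does not satisfy the negated predicate either.
            return value != null && !containsToken(value, term);
        }

        @Override
        public TextPredicateSketch negate() {
            return CONTAINS;
        }
    };

    /** Narrows BiPredicate.negate() so callers get a named predicate back. */
    @Override
    public abstract TextPredicateSketch negate();

    /** Mimics a not(...) wrapper that simply delegates to negate(). */
    static TextPredicateSketch not(TextPredicateSketch predicate) {
        return predicate.negate();
    }

    /** Case-insensitive whole-token containment; a simplification chosen for this sketch. */
    static boolean containsToken(String value, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(not(CONTAINS));
        System.out.println(not(CONTAINS).test("hello world", "foo"));
    }
}
```

Returning a named counterpart rather than an anonymous wrapper is what lets a query planner print, and presumably push down, the negated form, as in the textNotContains(foo) steps of the explain() output.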

@porunov
Member

porunov commented Apr 14, 2021

Thank you @WheresMyStapler !
Will you be able to sign EasyCLA so that we could merge your contributions?
https://api.easycla.lfx.linuxfoundation.org/v2/repository-provider/github/sign/1543822/77385607/2559/#/?version=1

@WheresMyStapler
Contributor Author

WheresMyStapler commented Apr 14, 2021

Yes! Working on it now, hopefully not much longer. Resolved

@WheresMyStapler WheresMyStapler force-pushed the text-predicate-negations branch from 841f7f6 to bc65403 Compare April 16, 2021 18:37
Member

@porunov porunov left a comment

LGTM. Thank you @WheresMyStapler !

@porunov porunov added this to the Release v0.6.0 milestone Apr 24, 2021
@porunov porunov requested a review from farodin91 April 24, 2021 10:54
@porunov
Member

porunov commented Apr 24, 2021

@farodin91 Could you review the above answers from @WheresMyStapler?

@farodin91
Contributor

That's great. Thank you for implementing this feature.

@porunov porunov merged commit 4228d4b into JanusGraph:master Apr 24, 2021
@WheresMyStapler WheresMyStapler deleted the text-predicate-negations branch April 28, 2021 19:50
@Override
public boolean test(Object value, Object condition) {
    this.preevaluate(value, condition);
    return value != null && evaluateRaw(value.toString(), (String) condition);
Contributor Author

@WheresMyStapler WheresMyStapler Apr 28, 2021

@porunov / @li-boxuan / @farodin91

Sorry to bring you all back to this PR, but after using this some more locally, I ran into a case that was not covered here regarding null values / missing properties:

gremlin> g.V().has('missingField', textNotRegex('.*'))
==>v[86040]
gremlin> g.V(86040L).propertyMap()
==>[]
gremlin> g.V(86040L).has('missingField', textNotRegex('.*'))
gremlin>

Should the negation tests remove the != null tests, or just simply short circuit if null?

return value == null || evaluateRaw(value.toString(), (String) condition);

The question I really have is how to interpret the tinkerpop docs:

has(key,predicate): Remove the traverser if its element does not have a key value that satisfies the bi-predicate. For more information on predicates

Since the key doesn't have a value at all, which predicate evaluation is correct: ES or JG?
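
To make the two options concrete, here is a small sketch of a negated regex test under each null policy. The class and method names are hypothetical, and folding the negation into the method with `!` is an assumption made so the example is self-contained; the code under review delegates to evaluateRaw instead.

```java
import java.util.regex.Pattern;

/** Illustrative comparison of two null policies for a negated text predicate. */
final class NullHandlingSketch {

    /** Keeps the null guard: a missing value never satisfies the predicate. */
    static boolean notRegexStrict(Object value, String regex) {
        return value != null && !Pattern.matches(regex, value.toString());
    }

    /** Short-circuits on null: a missing value always satisfies the negation. */
    static boolean notRegexLenient(Object value, String regex) {
        return value == null || !Pattern.matches(regex, value.toString());
    }

    public static void main(String[] args) {
        System.out.println(notRegexStrict(null, ".*"));
        System.out.println(notRegexLenient(null, ".*"));
    }
}
```

The two variants agree on every non-null value and differ only for a missing property, which is exactly the case shown in the console output above.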

Contributor Author

@WheresMyStapler WheresMyStapler Apr 28, 2021

I went ahead and created a new PR for this change (#2593). If you think .has(X, Y) assumes .has(X).has(X, Y), feel free to close it.

Member

gremlin> g.V().has('missingField', textNotRegex('.*'))
==>v[86040]

I believe the above is wrong. I actually suspect it would fail the TinkerPop test suite, since I don't believe this is TinkerPop-aligned. A related discussion is in #1868.

My understanding is:
g.V().has('missingField', textNotRegex('.*')) == g.V().has('missingField', not(textRegex('.*'))) == g.V().has('missingField').has('missingField', not(textRegex('.*'))) != g.V().not(has('missingField', textRegex('.*')))
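
The chain above can be sketched with plain Java collections, modelling each vertex as a property map. This is only an illustration of the stated semantics, not TinkerPop code; the class HasVersusNotSketch and its has/notHas helpers are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Models has(key, predicate) and not(has(key, predicate)) over property maps. */
final class HasVersusNotSketch {

    /** has(key, predicate): keep elements that have the key AND whose value passes. */
    static List<Map<String, String>> has(List<Map<String, String>> vertices,
                                         String key,
                                         Predicate<String> predicate) {
        return vertices.stream()
                .filter(v -> v.containsKey(key) && predicate.test(v.get(key)))
                .collect(Collectors.toList());
    }

    /** not(has(key, predicate)): keep everything the has(...) filter would drop. */
    static List<Map<String, String>> notHas(List<Map<String, String>> vertices,
                                            String key,
                                            Predicate<String> predicate) {
        return vertices.stream()
                .filter(v -> !(v.containsKey(key) && predicate.test(v.get(key))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> vertices = List.of(
                Map.of("name", "foo"),
                Map.of("name", "bar"),
                Map.<String, String>of()); // a vertex without the property
        Predicate<String> regex = s -> s.matches("fo+");

        // Negated predicate: only the vertex that has the key and fails the regex.
        System.out.println(has(vertices, "name", regex.negate()));
        // Negated step: additionally keeps the vertex with no such property.
        System.out.println(notHas(vertices, "name", regex));
    }
}
```

The vertex with no property at all is the one element on which the two forms disagree.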

Contributor Author

Perfect. I updated my new PR to change the negated query in ES from mustNot(predicate) to must(exists(field), mustNot(predicate)), which aligns the predicate evaluations:

gremlin> g.V().has('missingField', textNotRegex('.*'))
gremlin>
gremlin> g.V(86040L).propertyMap()
==>[]
gremlin> g.V(86040L).has('missingField', textNotRegex('.*'))
gremlin>
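
For readers less familiar with the Elasticsearch query DSL, the sketch below shows roughly what the two query shapes could look like as JSON bodies, using the standard bool, exists and regexp queries. The JSON is hand-built without escaping, the helper names are hypothetical, and placing must_not beside must in a single bool query is an assumption; the actual change in #2593 may structure the query differently.

```java
/** Hand-built JSON bodies for illustration; not the query builder code from the PR. */
final class NegatedQuerySketch {

    /** Negation only: also matches documents where the field is absent. */
    static String mustNotOnly(String field, String regex) {
        return "{\"bool\":{\"must_not\":[" + regexp(field, regex) + "]}}";
    }

    /** Requires the field to exist, then excludes documents matching the regex. */
    static String existsAndMustNot(String field, String regex) {
        return "{\"bool\":{"
                + "\"must\":[{\"exists\":{\"field\":\"" + field + "\"}}],"
                + "\"must_not\":[" + regexp(field, regex) + "]}}";
    }

    /** A regexp clause; inputs are assumed to need no JSON escaping. */
    private static String regexp(String field, String regex) {
        return "{\"regexp\":{\"" + field + "\":\"" + regex + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(mustNotOnly("missingField", ".*"));
        System.out.println(existsAndMustNot("missingField", ".*"));
    }
}
```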

FlorianHockmann added a commit to FlorianHockmann/janusgraph that referenced this pull request Feb 28, 2024
The predicates added in JanusGraph#2559 were not supported by
`JanusGraphPSerializer` and could thus not be used when connecting via
remote to JanusGraph Server.

Fixes JanusGraph#4275

Signed-off-by: Florian Hockmann <[email protected]>
FlorianHockmann added a commit that referenced this pull request Mar 4, 2024
The predicates added in #2559 were not supported by
`JanusGraphPSerializer` and could thus not be used when connecting via
remote to JanusGraph Server.

Fixes #4275

Signed-off-by: Florian Hockmann <[email protected]>
janusgraph-automations pushed a commit that referenced this pull request Mar 4, 2024
The predicates added in #2559 were not supported by
`JanusGraphPSerializer` and could thus not be used when connecting via
remote to JanusGraph Server.

Fixes #4275

Signed-off-by: Florian Hockmann <[email protected]>
(cherry picked from commit 5b906fe)
FlorianHockmann added a commit that referenced this pull request Mar 5, 2024
The predicates added in #2559 were not supported by
`JanusGraphPSerializer` and could thus not be used when connecting via
remote to JanusGraph Server.

Fixes #4275

Signed-off-by: Florian Hockmann <[email protected]>
(cherry picked from commit 5b906fe)