Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not step doesn't look in Index #1868

Closed
jakubclark opened this issue Oct 24, 2019 · 14 comments
Closed

Not step doesn't look in Index #1868

jakubclark opened this issue Oct 24, 2019 · 14 comments

Comments

@jakubclark
Copy link

jakubclark commented Oct 24, 2019

In JanusGraph .not(__.has("pname", "value")) after an or step, is not processed correctly by the query planner.
This issue was first described here: #163 (comment).

For example:

The query bellow does not use an index, since"query.force-index" is set to "true" and thus, results in an exception:

g.V().or(__.has("pname1","value1"),__.has("pname2","value2")).not(__.has("props","value3"))

The query bellow works just fine:

g.V().or(__.has("pname1","value1"),__.has("pnam2","value2")).has("props",P.neq("value3"))

Those queries are semantically the same. But the execution is not the same.
The first version tries to do a full-graph scan, which is disabled in our setup.
The second query using P.neq is able to use the index.

The Not step does not use the index, if it occurs just after an Or step.

We have a mixed index with all the properties used in the query.

  • Version: 0.3.2
  • Storage Backend: Cassandra/ScyllaDB
  • Mixed Index Backend: Elasticsearch
  • Steps to Reproduce: Execute the following query in the Gremlin console:
g.V().or(__.has("pname1","value1"),__.has("pname2","value2")).not(__.has("props","value3"))

Stack Trace

gremlin> g.V().or(__.has("pname1","value1"),__.has("pname2","value2")).not(__.has("props","value3"))
Could not find a suitable index to answer graph query and graph scans are disabled: [()]:VERTEX
Type ':help' or ':h' for help.
Display stack trace? [yN]y
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Could not find a suitable index to answer graph query and graph scans are disabled: [()]:VERTEX
	at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:178)
	at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:99)
	at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
	at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:130)
	at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:89)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:146)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:453)
@jakubclark jakubclark changed the title Not step doesn't look in Index Not step doesn't look in Index, after an Or step Oct 24, 2019
@albertlockett2
Copy link

Jakub, why did you change the title from "Not step doesn't look in Index" to "Not step doesn't look in Index, after an Or step". Is there a way to get it to hit the index using a not() step without the or() step before it?

@jakubclark
Copy link
Author

@albertlockett2 you're right, I've updated the title. I don't think I tried the following when I originally opened the issue:

gremlin> g.V().has("pname", "value")
<(No error)>
gremlin> g.V().not(__.has("pname", "value"))
Could not find a suitable index to answer graph query and graph scans are disabled: [()]:VERTEX
Type ':help' or ':h' for help.
Display stack trace? [yN]y
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Could not find a suitable index to answer graph query and graph scans are disabled: [()]:VERTEX
	at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:178)
	at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:99)
	at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
	at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:130)
	at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:89)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:146)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:453)

@jakubclark jakubclark changed the title Not step doesn't look in Index, after an Or step Not step doesn't look in Index Nov 12, 2019
@albertlockett2
Copy link

Thanks Jakub. I was digging through the code on the weekend a bit. It looks like there's support for it in the ElasticSearchIndex class, in the getFilter method:

    public Map<String,Object> getFilter(Condition<?> condition, KeyInformation.StoreRetriever information) {
...
        } else if (condition instanceof Not) {
            return compat.boolMustNot(getFilter(((Not) condition).getChild(),information));

But that "not" condition doesn't get injected into the IndexQuery object.

I didn't know which class should be responsible for that. Maybe JanusGraphStepStrategy should be responsible for setting it on JanusGraphStep, and the GraphCentricQueryBuilder should handle it?

Can anyone who's "in the know" confirm that?

@li-boxuan
Copy link
Member

not(__.has("props","value3")) and has("props",P.neq("value3")) don't sound semantically equivalent to me.

@rafaelcaricio
Copy link

@li-boxuan Do you care to explain how they are different from your perspective?

@li-boxuan
Copy link
Member

li-boxuan commented Sep 18, 2020

not(__.has("props","value3")) means it does not have props, OR its value is not value3
has("props",P.neq("value3")) means it has props, AND the value is not value3

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property("prop", "value")
==>v[0]
gremlin> g.addV().property("prop2", "value2")
==>v[2]
gremlin> g.V().not(__.has("prop", "dummy"))
==>v[0]
==>v[2]
gremlin> g.V().has("prop", P.neq("dummy"))
==>v[0]

I admit using the above example, JanusGraph shows deviated behavior:

gremlin> graph = JanusGraphFactory.open("inmemory")
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> g.addV().property("prop", "value")
==>v[4304]
gremlin> g.addV().property("prop2", "value2")
==>v[4256]
gremlin> graph.tx().commit()
==>null
gremlin>
gremlin> g.V().not(__.has("prop", "dummy"))
23:21:24 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>v[4256]
==>v[4304]
gremlin>
gremlin> g.V().has("prop", P.neq("dummy"))
23:21:28 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(prop <> dummy)]. For better performance, use indexes
==>v[4256]
==>v[4304]

which is a bug as reported in #2205

However, when using index, JanusGraph follows the correct semantic:

gremlin> graph = JanusGraphFactory.open("conf/janusgraph-cql-es.properties")
23:24:30 WARN  com.datastax.driver.core.NettyUtil  - Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin>
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@5a7b6b75
gremlin> propKey = mgmt.makePropertyKey('prop').dataType(String.class).make()
==>prop
gremlin> mgmt.buildIndex("prop", Vertex.class).addKey(propKey, Mapping.STRING.asParameter()).buildMixedIndex("search")
==>prop
gremlin> mgmt.commit()
==>null
gremlin>
gremlin>
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g.addV().property("prop", "value")
==>v[4256]
gremlin> g.addV().property("prop2", "value2")
==>v[4312]
gremlin> graph.tx().commit()
==>null
gremlin>
gremlin> g.V().not(__.has("prop", "dummy"))
23:24:40 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>v[4256]
==>v[4312]
gremlin>
gremlin> g.V().has("prop", P.neq("dummy"))
==>v[4256]

@jakubclark
Copy link
Author

Thanks for the explanation @li-boxuan, it clears up some of the confusion, between these 2 similar filters.

@albertlockett2
Copy link

What about steps that don't have an equivalent not predicate like Text.textRegex?

If I'm using Elasticsearch for indexing solution, I would be able to add a not clause to a bool query

gremlin> g.V().has('MyProperty', textRegex('.*MyValue.*')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],MyProperty.textRegex(.*MyValue.*)])                      32          32         409.487   100.00
    \_condition=(MyProperty textRegex .*MyValue.*)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=[(MyProperty textRegex .*MyValue.*)]:myElasticIndex
    \_index=myElasticIndex
    \_index_impl=MyGraphName
  optimization                                                                                 0.032
  optimization                                                                                51.233
  backend-query                                                       32                     357.152
    \_query=myElasticIndex:[(MyProperty MyValue.*Data.*)]:myElasticIndex
                                            >TOTAL                     -           -         409.487        -

gremlin> g.V().not(has('MyProperty', textRegex('.*MyValue.*'))).profile()
[main] WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes

@li-boxuan
Copy link
Member

li-boxuan commented Sep 28, 2020

@albertlockett2

Have you tried the complement operator? E.g. @&~(.*MyValue.*)
ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html#_valid_values

If the above does not work well for your use case, you could open up a new feature request issue. Probably we could support syntax like
g.V().has('MyProperty', notTextRegex('.*MyValue.*'))

@albertlockett2
Copy link

@li-boxuan yeah that can work too. My thinking was that from an application developer's perspective not(has("MyProperty", textRegex(".*MyValue.*")) does a better job at abstracting the underlying index implementation than has("MyProperty", textRegex("@&~(.*MyValue.*)").

I guess so does g.V().has('MyProperty', notTextRegex('.*MyValue.*')) but if I were going to open a feature request it would be to have the not() step look in the index instead of another operator. Is the idea that it's more trouble than it's worth seeing as the new operator is probably easier to implement?

@li-boxuan
Copy link
Member

li-boxuan commented Sep 28, 2020

@albertlockett2 It’s not about which way is easier to implement. Similar to what I described at #1868 (comment), not(has(“prop”, Text.textRegex(...))) means either a vertex does not have prop OR its prop value does not satisfy given regex. In other words, it’s the complement of the has clause, not just complement of the regex part.

@albertlockett2
Copy link

@li-boxuan Ok fair enough but for the record not(has("prop", Text.textRegex(...)) could also be satisfied by the ES index using a bool query and with an exists query inside it could it not?

@li-boxuan
Copy link
Member

@albertlockett2 If a vertex/edge does not have “prop” property, then it will not be indexed. If we use Regex query in ES, it returns vertices with “prop” and whose values match regex. If we use exists query in ES, it returns vertices with “prop”. If your goal is to find vertices with “prop” but not matches regex, I think you can write query in this way:

.has(“prop”).not(has(“prop”, Text.textRegex(...))

However, the above query still does not use index, but it can be a nice feature for JanusGraph to support.

For the not(has(“prop”, Text.textRegex(...)) query, I don’t see any chance to avoid a full scan. This query is not what you want anyways, since it has a different semantics meaning.

@li-boxuan
Copy link
Member

What about steps that don't have an equivalent not predicate like Text.textRegex

@albertlockett2 FYI there is a PR to add this support: #2559

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants