Search Predicates: match keywords with multiple properties using Or-step #940

myitroad · 2018-02-28T05:54:08Z

Hi there,

Some situations confuse me when I use Text predicate search to match multiple properties in JanusGraph.

Specifically, I want to match keywords against multiple properties. For example, to match a keyword against both the moviename and rdfs:label properties, I wrote statements with the Gremlin or-step, as shown below:

g.V().has('moviename',Text.textContains('英雄'))
g.V().has('rdfs:label',Text.textContains('英雄'))
g.V().where(has('moviename',Text.textContains('英雄')).or().has('rdfs:label',Text.textContains('英雄')))

I expected the 3rd statement to return the union of the results of the 1st and 2nd statements. In practice, however, the 3rd statement returns nothing.
The execution results are as follows:

Simple text search

gremlin> g.V().has('rdfs:label',Text.textContains('英雄'))
==>v[2240616]
==>v[2289712]
==>v[2424936]
==>v[2416688]

gremlin> g.V().has('moviename',Text.textContains('英雄'))
==>v[2240616]
==>v[2416688]
==>v[2289712]
==>v[2424936]

Text search with or-step

gremlin> g.V().where(has('moviename',Text.textContains('英雄')).or().has('rdfs:label',Text.textContains('英雄')))
12:36:28 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

I wonder whether the or-step is incompatible with text-predicate search.
Also, is there any alternative way to achieve my goal?

Thanks for your attention!

Supplementary: components and versions. The JanusGraph version is 0.2.0, released on 12 Oct 2017. The storage backend is HBase 1.1.2. The index backend is Elasticsearch 5.5.1.

JanusGraph schema
Mixed indexes built before importing data:

Index name       | Type  | Element        | Unique | Backing       | Key        | Status
rdfs:labele4c2   | Mixed | JanusGraphEdge | false  | ontology-demo | rdfs:label | ENABLED
movienamee6b7    | Mixed | JanusGraphEdge | false  | ontology-demo | moviename  | ENABLED

Reference
The or-step example from the TinkerPop3 documentation, copied below:

gremlin> g.V().or(
            __.outE('created'),
            __.inE('created').count().is(gt(1))).
              values('name')
==>marko
==>lop
==>josh
==>peter
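The expectation above rests on or() being a filter: a traverser passes when at least one child traversal produces a result, so the outcome should be the union of the individual has() results. Below is a toy Python model of that semantics over hypothetical in-memory vertices. Plain substring matching stands in for textContains purely for illustration; it is not how JanusGraph evaluates the predicate.

```python
# Toy model of Gremlin's or() filter over hypothetical in-memory vertices.
vertices = {
    1: {"moviename": "英雄"},
    2: {"rdfs:label": "英雄"},
    3: {"moviename": "other"},
    4: {"moviename": "英雄", "rdfs:label": "英雄"},
}

def has_contains(key, term):
    """Stand-in for has(key, textContains(term)).
    Substring matching is a simplification, NOT JanusGraph's tokenized match."""
    def cond(props):
        value = props.get(key)
        return value is not None and term in value
    return cond

def v_filter(cond):
    """Model g.V().has(...): ids of vertices accepted by one condition."""
    return {vid for vid, props in vertices.items() if cond(props)}

def v_or(*conds):
    """Model g.V().or(...): keep a vertex if ANY child condition accepts it."""
    return {vid for vid, props in vertices.items() if any(c(props) for c in conds)}

by_name = v_filter(has_contains("moviename", "英雄"))
by_label = v_filter(has_contains("rdfs:label", "英雄"))
either = v_or(has_contains("moviename", "英雄"), has_contains("rdfs:label", "英雄"))
print(sorted(by_name), sorted(by_label), sorted(either))
```

Because or() filters each vertex once, a vertex matching both branches (id 4 here) appears only once in the combined result.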

Statement profile

Profile of simple text search

gremlin> g.V().has('moviename',Text.textContains('英雄')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[moviename.textContains(英雄)])                        4           4           0.645   100.00
    \_condition=(moviename textContains 英雄)
    \_isFitted=true
    \_query=[(moviename textContains 英雄)]:movienamev1de1
    \_index=movienamev1de1
    \_orders=[]
    \_isOrdered=true
    \_index_impl=ontology-demo1
  optimization                                                                                 0.297
                                            >TOTAL                     -           -           0.645        -

Profile of text search with or-step

gremlin> g.V().where(has('moviename',Text.textContains('英雄')).or().has('rdfs:label',Text.textContains('英雄'))).profile()
12:41:27 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep(vertex,[])                                           3188        3188          29.421    43.36
    \_condition=()
    \_isFitted=false
    \_query=[]
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.020
  scan                                                                                         0.000
    \_condition=VERTEX
    \_query=[]
    \_fullscan=true
OrStep([[HasStep([moviename.textContains(英雄)]),...                                            38.426    56.64
  HasStep([moviename.textContains(英雄)])                                                       17.141
  HasStep([rdfs:label.textContains(英雄)])                                                      18.127
                                            >TOTAL                     -           -          67.848        -

Statement explain

Explain of simple text search

gremlin> g.V().has('moviename',Text.textContains('英雄')).explain()
==>Traversal Explanation
=========================================================================================================
Original Traversal                          [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]

ConnectiveStrategy                    [D]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
MatchPredicateStrategy                [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
FilterRankingStrategy                 [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
InlineFilterStrategy                  [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
IncidentToAdjacentStrategy            [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
AdjacentToIncidentStrategy            [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
RepeatUnrollStrategy                  [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
RangeByIsCountStrategy                [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
PathRetractionStrategy                [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
LazyBarrierStrategy                   [O]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
AdjacentVertexFilterOptimizerStrategy [P]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
JanusGraphLocalQueryOptimizerStrategy [P]   [GraphStep(vertex,[]), HasStep([moviename.textContains(英雄)])]
JanusGraphStepStrategy                [P]   [JanusGraphStep([],[moviename.textContains(英雄)])]
ProfileStrategy                       [F]   [JanusGraphStep([],[moviename.textContains(英雄)])]
StandardVerificationStrategy          [V]   [JanusGraphStep([],[moviename.textContains(英雄)])]

Final Traversal                             [JanusGraphStep([],[moviename.textContains(英雄)])]

Explain of text search with or-step

gremlin> g.V().where(has('moviename',Text.textContains('英雄')).or().has('rdfs:label',Text.textContains('英雄'))).explain()
==>Traversal Explanation
=========================================================================================================================================================
Original Traversal                          [GraphStep(vertex,[]), TraversalFilterStep([HasStep([moviename.textContains(英雄)]), OrStep, HasStep([rdfs:label.textContains(英雄)])])]

ConnectiveStrategy                    [D]   [GraphStep(vertex,[]), TraversalFilterStep([OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])])]
MatchPredicateStrategy                [O]   [GraphStep(vertex,[]), TraversalFilterStep([OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])])]
FilterRankingStrategy                 [O]   [GraphStep(vertex,[]), TraversalFilterStep([OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])])]
InlineFilterStrategy                  [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
IncidentToAdjacentStrategy            [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
AdjacentToIncidentStrategy            [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
RepeatUnrollStrategy                  [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
RangeByIsCountStrategy                [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
PathRetractionStrategy                [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
LazyBarrierStrategy                   [O]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
AdjacentVertexFilterOptimizerStrategy [P]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
JanusGraphLocalQueryOptimizerStrategy [P]   [GraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
JanusGraphStepStrategy                [P]   [JanusGraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
ProfileStrategy                       [F]   [JanusGraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]
StandardVerificationStrategy          [V]   [JanusGraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]

Final Traversal                             [JanusGraphStep(vertex,[]), OrStep([[HasStep([moviename.textContains(英雄)])], [HasStep([rdfs:label.textContains(英雄)])]])]


myitroad · 2018-02-28T06:02:40Z

Supplement:
“英雄” is a Chinese word (meaning "hero"); just treat it as the keyword.

myitroad · 2018-02-28T08:27:02Z

I have found one alternative method, which uses a regex match in place of the contains match.

The statements are as follows:

g.V().has('moviename',Text.textContains('英雄'))
g.V().has('rdfs:label',Text.textContains('英雄'))
g.V().or(__.has('moviename',Text.textContainsRegex('.*英雄.*')),__.has('rdfs:label',Text.textContainsRegex('.*英雄.*'))).dedup()
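A minimal Python sketch of why the regex variant can match where textContains does not. It assumes the in-memory textContainsRegex check tokenizes the value with the default rule quoted later in this thread (split on non-alphanumeric characters, drop tokens shorter than 2 characters), lowercases it (an assumption), and then requires the regex to match one whole token. The helper names are hypothetical, not JanusGraph API.

```python
import re

MIN_TOKEN_LEN = 2  # tokens shorter than this are dropped (rule quoted in this thread)

def tokenize(text):
    """Split on non-alphanumeric chars and drop short tokens (lowercasing assumed)."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isalnum():  # CJK ideographs count as alphanumeric here
            current.append(ch)
        else:
            if len(current) >= MIN_TOKEN_LEN:
                tokens.append("".join(current))
            current = []
    if len(current) >= MIN_TOKEN_LEN:
        tokens.append("".join(current))
    return tokens

def text_contains_regex(value, pattern):
    """True if the regex fully matches at least one token of the value."""
    return any(re.fullmatch(pattern, tok) for tok in tokenize(value))

print(tokenize("O英雄"))                         # ['o英雄']
print(text_contains_regex("O英雄", ".*英雄.*"))   # True
print(text_contains_regex("N英雄", "英雄"))       # False: must match a whole token
```

Under these assumptions the leading and trailing `.*` are what let the pattern match inside the single token "o英雄".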

I'm looking forward to more efficient methods.

Reference
Gremlin query search by search key across multiple vertex properties.

pluradj · 2018-02-28T16:51:56Z

In your schema description, it looks like the mixed indexes were created with Edge.class instead of Vertex.class. This seems incorrect based on the vertex-based queries you are using, so it could explain why your or() query isn't returning a result. If that doesn't solve the problem, some example data that reproduces your result would be helpful. Your queries worked fine in a simple test.

The WARN message you are seeing is being tracked in issue #163.

pluradj · 2018-02-28T17:15:32Z

This question also sounds remarkably similar to #922, and it seems like there might be something related involved.

myitroad · 2018-03-01T02:25:06Z

Thank you for your time!
I carefully reviewed all the steps you wrote and identified two things:

First, following JanusGraph docs 9.1.2 (Mixed Index), the index for each property should be created with Vertex.class; with that, a simple query with only one has-step, such as g.V().has('moviename',Text.textContains('英雄')), works fine.
Second, the steps you wrote work fine in my environment. But with some test text, the or-step with Text.textContains still returns nothing. Simply removing the space from the text reproduces the problem.
Details are as follows:

Add vertices

gremlin> g.addV().property(moviename, 'O英雄').next()
==>v[8360]
gremlin> g.tx().commit()
==>null
gremlin> g.addV().property(rdfs_label, 'N英雄').next()
==>v[8272]
gremlin> g.tx().commit()
==>null

Simple textContains search

gremlin> g.V().has(moviename,Text.textContains('英雄')).valueMap(true)
==>[label:vertex,id:8360,moviename:[O英雄]]
gremlin> g.V().has(rdfs_label,Text.textContains('英雄')).valueMap(true)
==>[label:vertex,rdfs:label:[N英雄],id:8272]

gremlin> g.V().has(moviename,Text.textContains('O')).valueMap(true)
==>[label:vertex,id:8360,moviename:[O英雄]]
gremlin> g.V().has(rdfs_label,Text.textContains('N')).valueMap(true)
==>[label:vertex,rdfs:label:[N英雄],id:8272]

Or-step with textContains

gremlin> g.V().where(has(moviename,Text.textContains('英雄')).or().has(rdfs_label,Text.textContains('英雄'))).valueMap(true)
09:25:55 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
gremlin> g.V().or(has(moviename,Text.textContains('英雄')), has(rdfs_label,Text.textContains('英雄'))).valueMap(true)
09:26:09 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

gremlin> g.V().where(has(moviename,Text.textContains('O')).or().has(rdfs_label,Text.textContains('N'))).valueMap(true)
09:17:29 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
gremlin> g.V().or(has(moviename,Text.textContains('O')), has(rdfs_label,Text.textContains('N'))).valueMap(true)
09:17:35 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

Or-step with textContains (Exact match)

gremlin> g.V().where(has(moviename,Text.textContains('O英雄')).or().has(rdfs_label,Text.textContains('N英雄'))).valueMap(true)
09:17:55 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[label:vertex,rdfs:label:[N英雄],id:8272]
==>[label:vertex,id:8360,moviename:[O英雄]]
gremlin> g.V().or(has(moviename,Text.textContains('O英雄')), has(rdfs_label,Text.textContains('N英雄'))).valueMap(true)
09:18:03 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[label:vertex,rdfs:label:[N英雄],id:8272]
==>[label:vertex,id:8360,moviename:[O英雄]]

pluradj · 2018-03-01T16:45:31Z

gremlin> g.V().has(moviename,Text.textContains('英雄')).valueMap(true)
==>[label:vertex,id:8360,moviename:[O英雄]]
gremlin> g.V().has(moviename,Text.textContains('O')).valueMap(true)
==>[label:vertex,id:8360,moviename:[O英雄]]

Actually, this seems like it is not working as intended. textContains is supposed to match on exact words in the tokenized string. It is not supposed to match on a partial contains as shown above. Also note this particular behavior:

JanusGraph’s default tokenization splits the string on non-alphanumeric characters and removes any tokens with less than 2 characters.

Based on that, Text.textContains('O') actually should not return any results.
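The rule quoted above can be sketched in Python and checked against the or() results reported earlier. This assumes (it is not confirmed in the thread) that the full-scan or() path evaluates textContains in memory by tokenizing both the stored value and the search terms with this rule, lowercasing them, and requiring every search token to appear among the value's tokens. CJK ideographs count as alphanumeric, so "O英雄" stays one token. Function names are hypothetical.

```python
MIN_TOKEN_LEN = 2  # "removes any tokens with less than 2 characters"

def tokenize(text):
    """Split on non-alphanumeric characters, drop tokens shorter than 2 chars."""
    tokens, current = [], []
    for ch in text.lower():  # lowercasing is an assumption
        if ch.isalnum():  # str.isalnum() is True for CJK ideographs
            current.append(ch)
        else:
            if len(current) >= MIN_TOKEN_LEN:
                tokens.append("".join(current))
            current = []
    if len(current) >= MIN_TOKEN_LEN:
        tokens.append("".join(current))
    return tokens

def text_contains(value, terms):
    """Every token of the search terms must be a whole token of the value."""
    value_tokens = set(tokenize(value))
    term_tokens = tokenize(terms)
    if terms.strip() and not term_tokens:
        return False  # e.g. 'O' tokenizes to nothing, so it matches nothing
    return all(t in value_tokens for t in term_tokens)

cases = [("O英雄", "英雄"), ("O英雄", "O"), ("O英雄", "O英雄"), ("O 英雄", "英雄")]
for value, terms in cases:
    print(f"{value!r} textContains {terms!r}: {text_contains(value, terms)}")
```

If this model holds, it lines up with the or() transcripts: '英雄' and 'O' match nothing against "O英雄", the exact string 'O英雄' matches, and with a space in the value '英雄' becomes a token of its own and matches.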

myitroad · 2018-03-02T03:25:12Z

I have learned about JanusGraph's default tokenization method, in which tokens shorter than 2 characters are ignored.
Perhaps this problem is related to the string tokenization strategy. I hope to find some alternative ways to resolve my problem.
Thank you for your kind reply.

pluradj · 2018-03-02T19:47:57Z

Similar to what I described on 922, you could use textContainsRegex for partial matches on the tokens, if that's what you are trying to accomplish. If you think #922 duplicates what you are reporting here, please go ahead and close this issue.

pluradj · 2018-03-03T00:22:22Z

You might need to investigate using Elasticsearch Analysis Plugins to properly tokenize your target language. I don't think the default configuration can handle your character set correctly.
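For contrast, a sketch of an index-side tokenizer that emits each CJK ideograph as its own token. This is an assumption about how the Elasticsearch standard analyzer treats CJK text (it is not stated in this thread), and it ignores everything else the analyzer does; the "all query tokens must be present" rule is also assumed. Under it, the indexed single-has() queries would match on '英雄' and 'O', which may explain why they disagree with the full-scan or() path. A language-specific analysis plugin would typically segment text into words instead.

```python
import unicodedata

def is_cjk_ideograph(ch):
    """Rough check: Unicode names of CJK ideographs start with 'CJK UNIFIED IDEOGRAPH'."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")

def ideographic_tokenize(text):
    """Emit each CJK ideograph as its own token; group other alphanumerics into words."""
    tokens, current = [], []
    for ch in text.lower():
        if is_cjk_ideograph(ch) or not ch.isalnum():
            if current:  # flush the pending non-CJK word
                tokens.append("".join(current))
                current = []
            if is_cjk_ideograph(ch):
                tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

def index_contains(value, terms):
    """Assumed index-side semantics: every query token must appear in the value."""
    value_tokens = set(ideographic_tokenize(value))
    term_tokens = ideographic_tokenize(terms)
    return bool(term_tokens) and all(t in value_tokens for t in term_tokens)

print(ideographic_tokenize("O英雄"))       # ['o', '英', '雄']
print(index_contains("O英雄", "英雄"))     # True
print(index_contains("O英雄", "O"))        # True
```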

chupman · 2019-02-07T17:32:18Z

To prevent confusion, we have recently added a default template for new issues containing the guidelines as to what belongs in issues. Usage, configuration, and general questions should be asked in gitter, stackoverflow, or the janusgraph-users google group. Github issues are for reporting bugs, requesting new features, and tracking the development of JanusGraph. If your issue is still outstanding, please consult one of the communities mentioned. If you still feel like your issue belongs here and was closed in error, please feel free to reopen it.

chupman closed this as completed Feb 7, 2019

FlorianHockmann mentioned this issue on Feb 18, 2019 in: Document configuration for non-European languages for Index Backends #1423 (open)
