-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support lookup indexScan using IN expression as filter #2906
Conversation
33dc9be
to
8a70911
Compare
How do you evaluate the performance problems that this pr may bring, such as index degradation. |
When profiling query: | ||
""" | ||
LOOKUP ON edge_1 WHERE edge_1.col1_str CONTAINS toLower("L") YIELD edge_1.col1_str | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete these cases in another pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not deletion but modification, and this is to adapt the change made in LOOKUP validator. It makes more sense to me to include this in this PR.
""" | ||
LOOKUP ON edge_1 WHERE edge_1.col1_str NOT CONTAINS toLower("L") YIELD edge_1.col1_str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
8a70911
to
b296f9a
Compare
This PR is based on an assumption, which is multiple prefix index scan is superior to a single full-index scan. |
Codecov Report
@@ Coverage Diff @@
## master #2906 +/- ##
==========================================
+ Coverage 85.47% 85.56% +0.08%
==========================================
Files 1229 1229
Lines 110375 110651 +276
==========================================
+ Hits 94341 94674 +333
+ Misses 16034 15977 -57
Continue to review full report at Codecov.
|
4d17185
to
d05e508
Compare
…OOKUP except STARTS/NOT STARTS WITH
f23b916
to
0d04710
Compare
25ac033
to
a8ce12a
Compare
When executing query: | ||
""" | ||
LOOKUP ON edge_1 WHERE edge_1.col1_str ENDS WITH "ABC" YIELD edge_1.col1_str | ||
""" | ||
Then the result should be, in any order: | ||
| SrcVID | DstVID | Ranking | edge_1.col1_str | | ||
Then a SemanticError should be raised at runtime: Expression (edge_1.col1_str ENDS WITH "ABC") is not supported, please use full-text index as an optimal solution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this, maybe we could schedule index scan and full-text search automatically. Of course not in current PR.
// (a OR b) AND (c OR d) -> (a AND c) OR (a AND d) OR (b AND c) OR (b AND d) | ||
{ | ||
auto andExpr = LogicalExpression::makeAnd(pool, orExpr1, orExpr2); | ||
auto transformedExpr = ExpressionUtils::rewriteLogicalAndToLogicalOr(andExpr); | ||
|
||
std::vector<Expression *> orOperands = { | ||
LogicalExpression::makeAnd( | ||
pool, ConstantExpression::make(pool, 10), ConstantExpression::make(pool, "a")), | ||
LogicalExpression::makeAnd( | ||
pool, ConstantExpression::make(pool, 10), ConstantExpression::make(pool, "b")), | ||
LogicalExpression::makeAnd( | ||
pool, ConstantExpression::make(pool, 20), ConstantExpression::make(pool, "a")), | ||
LogicalExpression::makeAnd( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support expression as A and (B or C ) and D
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpressionUtils::rewriteLogicalAndToLogicalOr
supports the rewrite but this function is only called when the filter contains IN expr for now. We could integrate this into AND expr with OR expr operand later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PROFILE lookup on player where player.name < "zzzzz" and (player.age in[30, 42]) and player.name >= "Tim Duncan"
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
| id | name | dependencies | profiling data | operator info |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
| 3 | Project | 4 | ver: 0, rows: 2, execTime: 50us, totalTime: 56us | outputVar: [ |
| | | | | { |
| | | | | "colNames": [ |
| | | | | "VertexID" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Project_3" |
| | | | | } |
| | | | | ] |
| | | | | inputVar: __Filter_2 |
| | | | | columns: [ |
| | | | | "COLUMN[0] AS VertexID" |
| | | | | ] |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
| 4 | IndexScan | 0 | { | outputVar: [ |
| | | | ver: 0, rows: 2, execTime: 0us, totalTime: 13897us | { |
| | | | storage_detail: {DedupNode:8465(us),IndexOpuputNode:8258(us),IndexScanNode:7456(us),IndexVertexNode:7895(us),RelNode:16610(us),} | "colNames": [ |
| | | | "127.0.0.1":39535 exec/total: 8819(us)/13045(us) | "_vid", |
| | | | } | "player.age", |
| | | | | "player.name" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Filter_2" |
| | | | | } |
| | | | | ] |
| | | | | inputVar: |
| | | | | space: 1 |
| | | | | dedup: false |
| | | | | limit: 9223372036854775807 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | schemaId: 2 |
| | | | | isEdge: false |
| | | | | returnCols: [ |
| | | | | "_vid", |
| | | | | "age", |
| | | | | "name" |
| | | | | ] |
| | | | | indexCtx: [ |
| | | | | { |
| | | | | "columnHints": [ |
| | | | | { |
| | | | | "endValue": "__EMPTY__", |
| | | | | "beginValue": "30", |
| | | | | "scanType": "PREFIX", |
| | | | | "column": "age" |
| | | | | } |
| | | | | ], |
| | | | | "index_id": 25, |
| | | | | "filter": "((player.age==30) AND (player.name<\"zzzzz\") AND (player.name>=\"Tim Duncan\"))" |
| | | | | }, |
| | | | | { |
| | | | | "columnHints": [ |
| | | | | { |
| | | | | "beginValue": "42", |
| | | | | "endValue": "__EMPTY__", |
| | | | | "scanType": "PREFIX", |
| | | | | "column": "age" |
| | | | | } |
| | | | | ], |
| | | | | "index_id": 25, |
| | | | | "filter": "((player.age==42) AND (player.name<\"zzzzz\") AND (player.name>=\"Tim Duncan\"))" |
| | | | | } |
| | | | | ] |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
| 0 | Start | | ver: 0, rows: 0, execTime: 2us, totalTime: 18us | outputVar: [ |
| | | | | { |
| | | | | "colNames": [], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Start_0" |
| | | | | } |
| | | | | ] |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
return orExpr; | ||
} | ||
|
||
Expression *ExpressionUtils::rewriteLogicalAndToLogicalOr(const Expression *expr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we use this method?
ExpressionUtils::expandExpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I missed it. Seems ExpressionUtils::expandExpr
has the functionality but is implemented in a recursive way. My consideration is that this may cause stack overflow when the number of elements in the list of IN expr becomes large.
Also, I checked the code base and did see this function was called anywhere. Could we consider removing it just for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I missed it. Seems
ExpressionUtils::expandExpr
has the functionality but is implemented in a recursive way. My consideration is that this may cause stack overflow when the number of elements in the list of IN expr becomes large. Also, I checked the code base and did see this function was called anywhere. Could we consider removing it just for clarity?
I agree to use your logic, which can be optimized in the next PR, but don't delete the existing methods.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you list some usages of ExpressionUtils::expandExpr
and explain a bit why this function is necessary?
if (condition->kind() == ExprKind::kLogicalAnd) { | ||
for (auto& operand : static_cast<const LogicalExpression*>(condition)->operands()) { | ||
if (operand->kind() == ExprKind::kRelIn) { | ||
auto inExpr = static_cast<RelationalExpression*>(operand); | ||
// Do not apply this rule if the IN expr has a valid index or it has only 1 element in the | ||
// list | ||
if (static_cast<ListExpression*>(inExpr->right())->size() > 1) { | ||
return TransformResult::noTransform(); | ||
} else { | ||
transformedExpr = graph::ExpressionUtils::rewriteInExpr(condition); | ||
} | ||
if (OptimizerUtils::relExprHasIndex(inExpr, indexItems)) { | ||
return TransformResult::noTransform(); | ||
} | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IN this case, we can't determine that this expression is introduced by IN, for example:
where in (....) ---> correct
where a AND (b OR c) AND ... ---> incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true.
This rule is to match AND expr containing IN expr as operand. i.g. a AND (b IN [1,2,3])
Actually, I don't really want to expand all AND expr to OR expr, since we don't know whether the property in the OR expression has a valid index or not unless we iterate all its operands. However, for IN expr a IN [b,c,d,....]
, we only need to check the index of the left operand a
.
bool OptimizerUtils::relExprHasIndex( | ||
const Expression* expr, | ||
const std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>>& indexItems) { | ||
DCHECK(expr->isRelExpr()); | ||
|
||
for (auto& index : indexItems) { | ||
const auto& fields = index->get_fields(); | ||
if (fields.empty()) { | ||
return false; | ||
} | ||
|
||
auto left = static_cast<const RelationalExpression*>(expr)->left(); | ||
DCHECK(left->kind() == Expression::Kind::kEdgeProperty || | ||
left->kind() == Expression::Kind::kTagProperty); | ||
|
||
auto propExpr = static_cast<const PropertyExpression*>(left); | ||
if (propExpr->prop() == fields[0].get_name()) { | ||
return true; | ||
} | ||
} | ||
|
||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test case as below:
tag t1 (c1, c2)
tag t2(c1, c2)
index i1 on t1 (c1, c2)
index i2 on t2 (c3, c4)
lookup on t2 where c1 in (.....) ----> this logic will return true if the guess is correct, Because index column name can be the same from different indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we check the schema type in LookupValidator
(root@nebula) [nba]> lookup on player where team.name == "Tim Duncan"
[ERROR (-1009)]: SemanticError: Schema name error: team
status = metaClient->getEdgeIndexesFromCache(scan->space()); | ||
} | ||
auto status = node->kind() == graph::PlanNode::Kind::kTagIndexFullScan | ||
? metaClient->getTagIndexesFromCache(scan->space()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's about getTagIndexesByTagNameFromCache(spaceId, tagName) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean getTagIndexByNameFromCache()
? Sounds like something we could optimize.
The current approach is:
- get all indexItems in the space
- check if indexItems is empty
OptimizerUtils::findOptimalIndex(operand, indexItems, &isPrefixScan, &ictx)
which can be turned into:
- analyze the expression to get tag info
- get indexItems using
getTagIndexByNameFromCache(spaceID, tagName)
- check if indexItems is empty
OptimizerUtils::findOptimalIndex(operand, indexItems, &isPrefixScan, &ictx)
I'll mark this as a TODO. WDYT?
STARTS WITH
IN
expression toOR
expression inLOOKUP
if the property in theIN
expression has a valid index:IN
expr:A in [B, C, D] => (A == B) or (A == C) or (A == D)
AND
expr :A in [a, b] AND B => (A==a AND B) OR (A==b AND B)
OR
expr:A in [a, b] OR B => A==a OR A==b OR B
IN
expression will be transformed to arelEQ
expression. e.g.A in [B] => A==B
Close #2881