You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Current executor test framework only supports in testing three kinds of join: ASTTableJoin::Kind::left, ASTTableJoin::Kind::right and ASTTableJoin::Kind::inner.
We need to support more types of join.
Design Doc
This design doc aims to solve the following drawbacks within current executor test framework
Incomplete join kind supports, therefore incomplete join test cases
No on expression supports
Detailed work can be separated into several steps.
Current join executor translation is quite cumbersome. It includes
First, test framework phase: test API -> ASTTableJoin -> DB::mock::Join -> DAGRequest
Then, tiflash execution engine phase: DAGRequest -> part of ASTTableJoin -> TiflashJoin -> DB::Join
As can be seen from the flow, DAGRequest is the bridge between test framework and execution engine being tested. However, ASTTableJoin appears twice and is actually redundant in the first phase because
DAGRequest and ASTTableJoin has different join type classification approach, unfortunately a many-to-many mapping
From the ut writer's perspective, the only thing he/she should know is that using framework's API the product is a DAGRequest in line with expectations. Specifically, it means two things
he/she should be aware of the mapping from DAGRequest to ASTTableJoin, because it is how the execution engine will do towards the input
he/she should not be aware of the mapping from ASTTableJoin to DAGRequest, it has nothing to do with the code being tested and requires additional mental load
In a word, we should avoid using ASTTableJoin in the first phase and rather make the first phase be like testAPI -> DB::mock::Join -> DAGRequest using tiup::JoinType.
compatibility issue
Some legacy tests rely on clickhouse's native parser to translate sql query string to AST trees and then use toTiPBExecutor to get DAGRequest, since these old tests follow the code path that only supports left/right/inner join:
one workaround might be move the translation for ASTTableJoin::Kind->TiPB::JoinType from toTiPBExecutor forward to compileJoin. This is another view that can show the restriction of old API. We might be able to deprecate these tests in the future.
deliverables
After this step, deliverables include
updated DAGRequestBuilder::join test api with tipb::JoinType and avoid using ASTTableJoin in the framework phase
passed all the join regression tests
Step II: Enable all join kinds and enhance test cases
rationale
Based on the first step, now it is possible to support all DAGRequest::JoinType given a valid using expression. All join kinds passed from TiDB to TiFlash in DAGRequest are as follows
InnerJoin
LeftOuterJoin
RightOuterJoin
SemiJoin
AntiSemiJoin
LeftOuterSemiJoin
AntiLeftOuterSemiJoin
deliverables
All DAGRequest::JoinType support and corresponding test cases
Increased join executor test coverage
Step III: Support more complicated DAGRequest construction [WIP]
rationale
Though it is possible to build DAGRequest with all kinds of DAGRequest::JoinType, it does not mean that the tests are exhausted because, again, JoinType -> ASTTableJoin::Kind is not a 1-to-1 mapping. Those ASTTableJoin::Kind secretly used within tiflash with prefix cross_ requires DAGRequest to not have join keys.
For instance, if you write a query like select * from t1 left join t2 on t1.a = 1, tidb will construct a DAGRequest with type tipb::JoinType::TypeLeftOuterJoin but no join keys since expression t1.a = 1 only references left table. Then within tiflash, it will interpret the join type as ASTTableJoin::Kind::Cross_Left.
Now it is clear that we can only test those Cross_* types until we first support on expression in the framework. However, to directly support on expression in the framework might not be a good idea because
complexity: the expression used with on can be any conditional expression of the form that can be used in a WHERE clause
compatibility: in order to parse the expression correctly, the parsing logic should have the exactly same behavior as how tidb handles on
table column reference: our current framework does not support qualified column name
Since the main goal of our framework is to construct the DAGRequest, we can instead choosing a DAGRequest-oriented approach. Namely, ask test writers to be specific about how to set each field of join executor in DAGRequest rather than providing an on expression parser to do this job for them.
draft
tipb::Join has following fields related to the on expression parsing logic
left_conditions: left join && conditional expressions only reference left table
right_conditions: right join && conditional expressions only reference right table
Note: since the new join builder API is kind of harder to use (user must know how TiDB translates sql's join-on clause to DAGRequest's conditional expression fields), instead of using default parameters for conditional expressions, we explicitly require them to supplement these fields and recommend them to use the old one unless they have to test conditional expressions for join in the comment.
Enhancement
Current executor test framework only supports in testing three kinds of join:
ASTTableJoin::Kind::left
,ASTTableJoin::Kind::right
andASTTableJoin::Kind::inner
.We need to support more types of join.
Design Doc
This design doc aims to solve the following drawbacks within current executor test framework
on
expression supportsDetailed work can be separated into several steps.
Step I: Simplify join executor's translation workflow
rationale
Current join executor translation is quite cumbersome. It includes
First, test framework phase:
test API -> ASTTableJoin -> DB::mock::Join -> DAGRequest
Then, tiflash execution engine phase:
DAGRequest -> part of ASTTableJoin -> TiflashJoin -> DB::Join
As can be seen from the flow,
DAGRequest
is the bridge between test framework and execution engine being tested. However,ASTTableJoin
appears twice and is actually redundant in the first phase becauseIn a word, we should avoid using
ASTTableJoin
in the first phase and rather make the first phase be liketestAPI -> DB::mock::Join -> DAGRequest
usingtiup::JoinType
.compatibility issue
Some legacy tests rely on clickhouse's native parser to translate sql query string to AST trees and then use
toTiPBExecutor
to get DAGRequest, since these old tests follow the code path that only supports left/right/inner join:tiflash/dbms/src/Debug/astToExecutor.cpp
Lines 1281 to 1294 in 3ad743b
one workaround might be move the translation for
ASTTableJoin::Kind->TiPB::JoinType
fromtoTiPBExecutor
forward tocompileJoin
. This is another view that can show the restriction of old API. We might be able to deprecate these tests in the future.deliverables
After this step, deliverables include
DAGRequestBuilder::join
test api withtipb::JoinType
and avoid using ASTTableJoin in the framework phaseStep II: Enable all join kinds and enhance test cases
rationale
Based on the first step, now it is possible to support all
DAGRequest::JoinType
given a validusing
expression. All join kinds passed from TiDB to TiFlash in DAGRequest are as followsdeliverables
DAGRequest::JoinType
support and corresponding test casesStep III: Support more complicated DAGRequest construction [WIP]
rationale
Though it is possible to build DAGRequest with all kinds of
DAGRequest::JoinType
, it does not mean that the tests are exhausted because, again,JoinType
->ASTTableJoin::Kind
is not a 1-to-1 mapping. Those ASTTableJoin::Kind secretly used within tiflash with prefixcross_
requires DAGRequest to not have join keys.For instance, if you write a query like
select * from t1 left join t2 on t1.a = 1
, tidb will construct a DAGRequest with typetipb::JoinType::TypeLeftOuterJoin
but no join keys since expressiont1.a = 1
only references left table. Then within tiflash, it will interpret the join type asASTTableJoin::Kind::Cross_Left
.Now it is clear that we can only test those
Cross_*
types until we first supporton
expression in the framework. However, to directly supporton
expression in the framework might not be a good idea becauseon
can be any conditional expression of the form that can be used in a WHERE clauseon
Since the main goal of our framework is to construct the DAGRequest, we can instead choosing a DAGRequest-oriented approach. Namely, ask test writers to be specific about how to set each field of join executor in DAGRequest rather than providing an
on
expression parser to do this job for them.draft
tipb::Join has following fields related to the
on
expression parsing logicin
subquery whose join type should be AntiSemiJoin, AntiLeftOuterSemiJoin or LeftOuterSemiJoin (ref: https://github.com/pingcap/tidb/blob/97f66c3fecebc89fa7e1472c4a53d7a3462432cf/planner/core/plan_to_pb.go#L415-L429)deliverables
Note: since the new join builder API is kind of harder to use (user must know how TiDB translates sql's
join-on
clause to DAGRequest's conditional expression fields), instead of using default parameters for conditional expressions, we explicitly require them to supplement these fields and recommend them to use the old one unless they have to test conditional expressions for join in the comment.Progress Tracking
08:43:35 UTC+8 Sunday, August 7, 2022
The text was updated successfully, but these errors were encountered: