Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: Selection can control the conditions of the below scan plan. #2834

Merged
merged 30 commits into from
Mar 28, 2017

Conversation

winoros
Copy link
Member

@winoros winoros commented Mar 15, 2017

If one of the conditions of this selection contains both key and correlated column and the child of it is a table scan, this condition could be pushed into the access conditions or filter conditions because the correlated column is constant when executed.
After this, we could change some join to index nested loop join.
PTAL @hanfei1991 @shenli @coocood @lamxTyler

@hanfei1991
Copy link
Member

Good Job!

// SelectionExec represents a filter executor.
type SelectionExec struct {
Src Executor
Condition expression.Expression
ctx context.Context
schema *expression.Schema

scanController bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments for these two fields.

}
var conds []expression.Expression
x.where, _, conds = plan.ExpressionsToPB(sc, tblFilters, client)
e.Condition = expression.ComposeCNFCondition(e.ctx, conds...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.Condition don't need to be changed.

@@ -461,21 +461,104 @@ func (e *TableDualExec) Close() error {
return nil
}

func convertCorCol2Constant(expr expression.Expression) (expression.Expression, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert-> substitute

}
newArgs = append(newArgs, newArg)
}
newSf, _ := expression.NewFunction(x.GetCtx(), x.FuncName.L, x.GetType(), newArgs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may consider the case of "Cast"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will look into this tonight.

newSf, _ := expression.NewFunction(x.GetCtx(), x.FuncName.L, x.GetType(), newArgs...)
return newSf, nil
case *expression.CorrelatedColumn:
val, err := x.Eval(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use the x.Data directly, without considering err

@@ -25,7 +25,8 @@ import (
"github.com/pingcap/tipb/go-tipb"
)

func expressionsToPB(sc *variable.StatementContext, exprs []expression.Expression, client kv.Client) (pbExpr *tipb.Expr, pushed []expression.Expression, remained []expression.Expression) {
// ExpressionsToPB change expression to tipb.Expr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add .

@@ -25,7 +25,8 @@ import (
"github.com/pingcap/tipb/go-tipb"
)

func expressionsToPB(sc *variable.StatementContext, exprs []expression.Expression, client kv.Client) (pbExpr *tipb.Expr, pushed []expression.Expression, remained []expression.Expression) {
// ExpressionsToPB change expression to tipb.Expr.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change -> converts

}

// Schema implements the Executor Schema interface.
func (e *SelectionExec) Schema() *expression.Schema {
return e.schema
}

// initController will init the conditions of the below scan executor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add more comments for this function.

}
}

func (p *Selection) getUsableIndicesAndPk(ds *DataSource) ([]*model.IndexInfo, model.CIStr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment for this function.

allConstant = allConstant && ok
}
if allConstant {
val, _ := x.Eval(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error can't be ignored ...

if err != nil {
return errors.Trace(err)
}
x.indexPlan.IndexConditionPBExpr, _, _ = plan.ExpressionsToPB(sc, e.idxFilterConditions, client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we update the e.idxFilterConditions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.idxFilterCondtion use the value of *Select.IdxConditions directly, i think this don't need to update in executor phase.

@@ -484,6 +567,9 @@ func (e *SelectionExec) Next() (*Row, error) {
if srcRow == nil {
return nil, nil
}
if e.Condition == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this.

@@ -146,6 +146,13 @@ type Selection struct {

// onTable means if this selection's child is a table scan or index scan.
onTable bool

// If ScanController is true, then the child of this selection is a scan,
// which use pk or index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add .

@@ -792,6 +1027,11 @@ func (p *Selection) convert2PhysicalPlan(prop *requiredProperty) (*physicalPlanI
if info != nil {
return info, nil
}
info = p.tryToBuildScanByKeyAndCorCol()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is very heavy. It may be called multiple times. I suggested that If 'flagDecorrelated' is true, check it in an other logic but in physical plan builder.

switch x := expr.(type) {
case *expression.ScalarFunction:
allConstant := true
newArgs := make([]expression.Expression, 0, len(x.GetArgs()))
for _, arg := range x.GetArgs() {
newArg, ok := substituteCorCol2Constant(arg)
newArg, ok, err := substituteCorCol2Constant(arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ok is redundant. You can just check if newArg is constant

TableAsName: p.TableAsName,
OutOfOrder: true,
DBName: p.DBName,
physicalTableSource: physicalTableSource{client: p.ctx.GetClient()},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client: client

@@ -783,6 +785,261 @@ func (p *Union) convert2PhysicalPlan(prop *requiredProperty) (*physicalPlanInfo,
return info, nil
}

func buildTableScanByKeyAndCorCol(p *DataSource, pkName model.CIStr, fakeConds []expression.Expression, origConds []expression.Expression) (*physicalPlanInfo, []expression.Expression, []expression.Expression) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seems we don't need to do like that. Building filter condition during runtime is more reasonable.

schema: v.Schema(),
ctx: b.ctx,
scanController: v.ScanController,
Conditions: v.Conditions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you store the conditions, you don't need to store condition any more. Use a for loop to check condition.

// getUsableIndicesAndPk will simply check whether the pk or one index could used in this situation by
// checking whether this index or pk is contained in one condition that has correlated column,
// and whether this condition can be used as an access condition.
func (p *Selection) getUsableIndicesAndPk(ds *DataSource) ([]*model.IndexInfo, model.CIStr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't peek all indices out, just ensure one index during plan building.

newArgs := make([]expression.Expression, 0, len(x.GetArgs()))
for _, arg := range x.GetArgs() {
newArg, err := substituteCorCol2Constant(arg)
_, ok := newArg.(*expression.Constant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this after checking error.

// The first return value is the new expression, the second is a bool value tell whether the below expression is all constant.
// The Second is used for simplify the scalar function.
// If the args of one scalar function are all constant, we will substitute it to constant.
func substituteCorCol2Constant(cond expression.Expression) expression.Expression {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You implement this function for two times. Why not put this function to expression package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is change correlated column to expression.One, another is eval the correlated column to Datum. I thought eval a correlated column when its Data is nothing doesn't make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract all cor columns and set them to one, then you can reuse this function.

scanController bool
controllerInit bool
Conditions []expression.Expression
usableIndices []*model.IndexInfo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this from selection executor.

}
x.where = x.indexPlan.TableConditionPBExpr
default:
return errors.New("Error type of PhysicalPlan")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print %T to tout put the plan type

var newConds []expression.Expression
for _, expr := range p.Conditions {
cond := pushDownNot(expr.Clone(), false, nil)
if !cond.IsCorrelated() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check this before push down not

// checking whether this index or pk is contained in one condition that has correlated column,
// and whether this condition can be used as an access condition.
func (p *Selection) getUsableIndicesAndPk(ds *DataSource) ([]*model.IndexInfo, model.CIStr) {
indices, _ := availableIndices(ds.indexHints, ds.tableInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ignore the includeTableScan

} else {
is.readOnly = true
}
is.AccessCondition, condsBackUp = DetachIndexScanConditions(condsBackUp, is)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to calculate access condtion, index conds and tbl conds here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calculating access condition is to choose which index is better by accessEqualCount and accessInAndEqCount.
Idx conds and tbl conds is used to make explain result clear, can be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove idx and tbl conds. and their pb

// Since one selection may call convert2PhysicalScan many times. We extract the PkName and indices
// used for scanController only once and store them to judge whether this selection can convert to
// scanController mode.
usefulPkName model.CIStr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to store the usefulPkName and usefulIndices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't store the usefulPkName and usefulIndices. We will need to extract them one more time when build controller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, it seems not cost too much.

// scanController mode.
usefulPkName model.CIStr
usefulIndices []*model.IndexInfo
extractedUsefulThing bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set a flag that we set controller when the flag is true. The flag is true only the selection has correlated column and its child is DataSource. You can check this during decorrelation.

if chosenPlan == nil || chosenPlan.accessEqualCount < is.accessEqualCount || chosenPlan.accessInAndEqCount < is.accessInAndEqCount {
chosenPlan = is
}
is.DoubleRead = isCoveringIndex(is.Columns, is.Index.Columns, is.Table.PKIsHandle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing !?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test for it!

@@ -158,6 +160,11 @@ func (s *decorrelateSolver) optimize(p LogicalPlan, _ context.Context, _ *idAllo
}
}
}
if sel, ok := p.(*Selection); ok {
if ds, ok := p.(*DataSource); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.GetChildren()[0]...
output controller to sel explain result. and add explain test for it.

@winoros winoros changed the title *: Selection can control the accessConditions of the below scan plan. *: Selection can control the conditions of the below scan plan. Mar 22, 2017
@hanfei1991
Copy link
Member

LGTM
resolve the conflict.
@shenli @coocood PTAL

p: &newSel,
count: uint64(ds.statisticTable.Count),
}
info.cost = float64(info.count) * selectionFactor
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cost calculation okay there?

@winoros winoros added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 23, 2017
return &Constant{Value: *x.Data, RetType: x.GetType()}, nil
case *Constant:
return x.Clone(), nil
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case we will fall to default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Column will fall to default. i will move *Constant to default too.

@coocood
Copy link
Member

coocood commented Mar 23, 2017

@winoros
The double read fix is not covered by test.
I removed !, the test still pass.

@coocood
Copy link
Member

coocood commented Mar 23, 2017

@winoros
I checked the test coverage, the new makeScanController function is not covered at all.

// If the args of one scalar function are all constant, we will substitute it to constant.
// If it's called in plan phase, correlated column will change to expression.One.
// If in executor phase, correlated column will change to value it contains.
func SubstituteCorCol2Constant(expr Expression) (Expression, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not too hard to write a test for this function.

// hasUsableIndicesAndPk will simply check whether the pk or one index could used in this situation by
// checking whether this index or pk is contained in one condition that has correlated column,
// and whether this condition can be used as an access condition.
func (p *Selection) hasUsableIndicesAndPk(ds *DataSource) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is very complex, need to write a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

// If one cond is ok, then this index is useful.
if checker.check(cond) {
usable = true
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

}
child = ts
} else {
var chosenPlan *PhysicalIndexScan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is not covered.
availableIndices returns all indices for you to chose if there is no index hint, but if the table has a PKHandle column, other indices will never be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

@@ -167,3 +173,64 @@ func (s *decorrelateSolver) optimize(p LogicalPlan, _ context.Context, _ *idAllo
p.SetChildren(newChildren...)
return p, nil
}

// hasUsableIndicesAndPk will simply check whether the pk or one index could used in this situation by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be used

@@ -167,3 +173,64 @@ func (s *decorrelateSolver) optimize(p LogicalPlan, _ context.Context, _ *idAllo
p.SetChildren(newChildren...)
return p, nil
}

// hasUsableIndicesAndPk will simply check whether the pk or one index could used in this situation by
// checking whether this index or pk is contained in one condition that has correlated column,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in any condition

if ds.tableInfo.PKIsHandle && includeTableScan {
var pkCol *expression.Column
for i, col := range ds.Columns {
if mysql.HasPriKeyFlag(col.Flag) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider composed index?

if pkCol != nil {
checker := conditionChecker{
pkName: pkCol.ColName,
length: types.UnspecifiedLength,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use UnspecifiedLength?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same with DetachTableScanConditions() in refiner.go.

@@ -981,6 +982,98 @@ func (p *Union) convert2PhysicalPlan(prop *requiredProperty) (*physicalPlanInfo,
return info, nil
}

func (p *Selection) makeScanController() *physicalPlanInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could be merged with hasUsableIndicesAndPk.

@winoros
Copy link
Member Author

winoros commented Mar 27, 2017

PTAL @shenli

@shenli
Copy link
Member

shenli commented Mar 28, 2017

LGTM
Well done!

@shenli shenli merged commit 0e172ac into master Mar 28, 2017
@hanfei1991 hanfei1991 deleted the yiding/inlj_1 branch March 28, 2017 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants