plan/statistics: concurrent build columns #2713
Are there any benchmark results to prove the improvement?
Benchmarked on my laptop on a table with 9 columns, 4 single-column indexes, 3 double-column indexes, and 100000 randomly generated rows.
Execution time is not important for ANALYZE TABLE; we need it to have minimal impact on the system.
Execution time is very important to users, @coocood.
plan/statistics/statistics.go
Outdated
@@ -395,12 +396,13 @@ func (t *Table) build4SortedColumn(sc *variable.StatementContext, offset int, re
}
var valuesPerBucket, lastNumber, bucketIdx int64 = 1, 0, 0
knowCount := true
We don't need knowCount in a concurrent environment.
@hanfei1991
Can we use a session variable to control the concurrency?
PTAL @lamxTyler
@@ -190,21 +190,21 @@ func (s *testStatisticsSuite) TestTable(c *C) {
c.Check(count, Equals, int64(1))
count, err = col.LessRowCount(sc, types.NewIntDatum(20000))
c.Check(err, IsNil)
c.Check(count, Equals, int64(19980))
c.Check(count, Equals, int64(19984))
Why did the test result change?
Because there was a bug in calculating bucketIdx after merging buckets.
I agree with @coocood: concurrency introduces too much complexity, which is our dangerous enemy. It brings less benefit than it costs.
plan/statistics/statistics.go
Outdated
go b.buildMultiColumns(t, offsets, i*groupSize, isSorted, doneCh)
}
for range splittedOffsets {
err := <-doneCh
If an error happens, this function returns and leaves doneCh alone. The worker goroutines are still waiting to write to doneCh and would block forever, which leaks the goroutines.
I was following the logic at https://github.com/pingcap/tidb/blob/master/domain/domain.go#L103.
@tiancaiamao The capacity of the channel is len(splittedOffsets), so a worker's send will never block. PTAL
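The point about the channel size can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: buildGroup and buildAll are hypothetical stand-ins for the worker and the collecting loop, and the only assumption carried over from the discussion is that doneCh is buffered with one slot per worker.

```go
package main

import (
	"errors"
	"fmt"
)

// buildGroup is a hypothetical stand-in for a worker such as buildMultiColumns.
// It sends exactly one result on doneCh.
func buildGroup(id int, doneCh chan<- error) {
	if id == 1 {
		doneCh <- errors.New("build failed for group 1")
		return
	}
	doneCh <- nil
}

// buildAll starts one worker per group and returns on the first error.
// doneCh has room for every worker's single send, so workers that finish
// after buildAll has returned can still send without blocking.
func buildAll(groups int) error {
	doneCh := make(chan error, groups)
	for i := 0; i < groups; i++ {
		go buildGroup(i, doneCh)
	}
	for i := 0; i < groups; i++ {
		if err := <-doneCh; err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(buildAll(4))
}
```

With an unbuffered channel (make(chan error)), the early return in buildAll would leave the remaining workers blocked on their send, which is the leak described in the review comment.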
sessionctx/variable/sysvar.go
Outdated
@@ -598,6 +599,7 @@ var defaultSysVars = []*SysVar{
{ScopeSession, TiDBSkipConstraintCheck, "0"},
{ScopeSession, TiDBSkipDDLWait, "0"},
{ScopeSession, TiDBOptAggPushDown, "ON"},
{ScopeSession, BuildStatsConcurrencyVar, "4"},
I think the default should be 1.
Why?
To minimize the performance impact and reduce the risk.
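To make the effect of the default concrete, here is a rough sketch of how a concurrency setting could map onto groups of column offsets. The diff does not show how the PR computes groupSize or splittedOffsets, so splitOffsets below is a hypothetical helper and the ceiling-division grouping is an assumption made for illustration only.

```go
package main

import "fmt"

// splitOffsets is a hypothetical helper, not taken from the PR. It divides
// the column offsets into at most `concurrency` contiguous groups. A value
// of 1 (or anything smaller) yields a single group, i.e. a sequential build.
func splitOffsets(offsets []int, concurrency int) [][]int {
	if concurrency < 1 {
		concurrency = 1
	}
	if len(offsets) == 0 {
		return nil
	}
	// Ceiling division keeps the number of groups at or below concurrency.
	groupSize := (len(offsets) + concurrency - 1) / concurrency
	var groups [][]int
	for start := 0; start < len(offsets); start += groupSize {
		end := start + groupSize
		if end > len(offsets) {
			end = len(offsets)
		}
		groups = append(groups, offsets[start:end])
	}
	return groups
}

func main() {
	offsets := []int{0, 1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(splitOffsets(offsets, 1)) // a single group: sequential build
	fmt.Println(splitOffsets(offsets, 4)) // at most four groups
}
```

Under this assumption, a default of 1 keeps the build on a single goroutine unless the session explicitly raises the value, which matches the "minimum impact" argument made above.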
LGTM
@coocood @tiancaiamao PTAL
@lamxTyler Please resolve the conflicts.
Rest LGTM
LGTM
PTAL @coocood @hanfei19910905 @zimulala @tiancaiamao @winoros