Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: refactor package. #2913

Merged
merged 17 commits into from
Mar 29, 2017
Merged

statistics: refactor package. #2913

merged 17 commits into from
Mar 29, 2017

Conversation

hanfei1991
Copy link
Member

change the way of store stats info.

@shenli shenli changed the title statistics: refactor package. [DNM]statistics: refactor package. Mar 22, 2017
@shenli
Copy link
Member

shenli commented Mar 22, 2017

We should find a safe way to do DDL in bootstrap stage.

@hanfei1991
Copy link
Member Author

@lamxTyler PTAL

tbl = statistics.PseudoTable(tableInfo)
}
SetStatisticsTableCache(tableID, tbl, version)
log.Errorf("Error occured when convert pb table for table id %d", tableID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log should be updated.

return errors.Trace(err)
}
for _, col := range t.Columns {
col.saveToStorage(ctx, t.Info.ID, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ignore the error?

bootstrap.go Outdated
index tbl(table_id)
);`

// CreateStatsBucketsTable stores the histogram info for every table columns.
Copy link
Contributor

@alivxxx alivxxx Mar 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to point out that the count is not the prefix sum?

@@ -68,6 +68,43 @@ type Column struct {
Repeats []int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in line 63-65 can be removed? Because we do not serialize it now.

if err != nil {
return nil, errors.Trace(err)
}
func colStatsFromStorage(ctx context.Context, tableID int64, colID int64, tp *types.FieldType, distinct int64, isIndex bool) (*Column, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sperate the function (c *Column) saveToStorage and this function? It is better to put them in the same place as the Table do.

return nil, errors.Trace(err)
bucketSize := len(rows)
if bucketSize == 0 {
return nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return nil? Even if it is an empty table, currently it will have at least 1 bucket. bucketSize is 0 may mean that this row is added after analyze.

if err != nil {
return nil, errors.Trace(err)
}
table.Indices = append(table.Indices, col)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also check if this index is deleted.

}
}
}
return table, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are newly added columns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that load table from storage in this way needs too much check. May be it is better to iterate through the table info to load stats.

@hanfei1991
Copy link
Member Author

@shenli It's safe to create table when bootstrap.

var col *Column
for _, idxInfo := range info.Indices {
if idxID == idxInfo.ID {
col, err = colStatsFromStorage(ctx, info.ID, idxID, nil, distinct, true)
Copy link
Contributor

@alivxxx alivxxx Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If line 120 return nil, this means that this index do not have histograms as line 727 shows, as those who have histograms must have at least 1 bucket even if it is empty columns. So I suggest that line 120 return the pseudo cols, and update the comment in 727.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will handle all problems like this in next PR.

@shenli shenli changed the title [DNM]statistics: refactor package. statistics: refactor package. Mar 27, 2017
}
}
if indexCount != len(info.Indices) {
return nil, errors.New("The number of indices doesn't match with the schema")
Copy link
Contributor

@alivxxx alivxxx Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need . at the end?
(update: golint said no.

} else {
count = c.Numbers[i] - c.Numbers[i-1]
}
// set value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add . at the end.

@alivxxx
Copy link
Contributor

alivxxx commented Mar 27, 2017

Rest LGTM.

@alivxxx
Copy link
Contributor

alivxxx commented Mar 27, 2017

LGTM

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 27, 2017
Info: info,
Count: count,
}
selSQL := fmt.Sprintf("select * from mysql.stats_columns where table_id = %d", info.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to read and have better compatibility if we list the columns in the SQL.

bootstrap.go Outdated
count bigint(64) NOT NULL,
repeats bigint(64) NOT NULL,
value blob NOT NULL,
index tbl(table_id, col_id, index_id, bucket_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be unique.

bootstrap.go Outdated
distinct_ratio double(64) NOT NULL DEFAULT 0,
use_count_to_estimate tinyint(2) NOT NULL DEFAULT 0,
version bigint(64) unsigned NOT NULL DEFAULT 0,
index tbl(table_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unique tbl(table_id, col_id, index_id)?

Indices []*Column
Count int64 // Total row count in a table.
Pseudo bool
// TableStatsFromStorage load table stats info from storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/load/loads

if err != nil {
return nil, errors.Trace(err)
}
// indexCount and columnCount records the number of indices and columns in table stats. If the number don't match with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/records/record

return nil
}

func colStatsFromStorage(ctx context.Context, tableID int64, colID int64, tp *types.FieldType, distinct int64, isIndex bool) (*Column, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use different structs to represent column stats and index stats.
index stats may contain multiple columns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be changed in next pr

testKit := testkit.NewTestKit(c, store)
testKit.MustExec("use test")
testKit.MustExec("create table t (c1 int, c2 int)")
for i := 0; i < 100000; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test may take too much time.

c.Assert(err, IsNil)
statsTbl2, err := statistics.TableStatsFromStorage(se, tableInfo, 100000)
c.Assert(err, IsNil)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check count of stats.

bootstrap.go Outdated
@@ -116,14 +116,38 @@ const (
UNIQUE KEY name (name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='help topics';`

// CreateStatsMetaTable store's the meta of table statistics.
// CreateStatsMetaTable stores the meta of table statistics.
CreateStatsMetaTable = `CREATE TABLE if not exists mysql.stats_meta (
version bigint(64) unsigned NOT NULL,
table_id bigint(64) NOT NULL,
modify_count bigint(64) NOT NULL DEFAULT 0,
count bigint(64) unsigned NOT NULL DEFAULT 0,
index idx_ver(version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an index for table_id.

@zimulala zimulala assigned zimulala and unassigned zimulala Mar 27, 2017

// SaveToStorage saves stats table to storage.
func (t *Table) SaveToStorage(ctx context.Context) error {
_, err := ctx.(sqlexec.SQLExecutor).Execute("begin")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this error.

table := &Table{
Info: info,
Count: count,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use table as an argument directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this way is clearer.

}
}
}
if indexCount != len(info.Indices) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition can move to line366 that can make code clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this. The check must be after loop.

@hanfei1991
Copy link
Member Author

@zimulala @coocood PTAL

return nil
}
go func(do *Domain) {
ticker := time.NewTicker(lease)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the function of Stop for the ticker.

@hanfei1991
Copy link
Member Author

@zimulala PTAL

bootstrap.go Outdated
);`

// CreateStatsColsTable stores the statistics of table columns.
CreateStatsColsTable = `CREATE TABLE if not exists mysql.stats_columns (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about name the table stats_histgram?
Because it contains both index and column.

type Table struct {
Info *model.TableInfo
Columns []*Column
Indices []*Column
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about use a different type for indices?
An index may contain multiple columns, it has a different algorithm to calculate partial column condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it in next pr.

Copy link
Contributor

@alivxxx alivxxx Mar 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a different opinion. What you said is true, but they have more in common than difference, especially when you consider how they are used, for example, get row count in a range, once you encode the indices, the rest is the same. If we use a different type, will the same logic be written twice? I do not know if this is worth the complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lamxTyler
We can extract the same logic into a type like histogram, then wrap it with Column and Index.

Copy link
Member

@coocood coocood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zimulala
Copy link
Contributor

LGTM

@hanfei1991 hanfei1991 merged commit ee7df05 into master Mar 29, 2017
@hanfei1991 hanfei1991 deleted the hanfei/stats branch March 29, 2017 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants