Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner, statistics: use the correct column ID when recording stats loading status #52208

Merged
merged 13 commits into from
Apr 2, 2024

Conversation

time-and-fate
Copy link
Member

What problem does this PR solve?

Issue Number: close #52207

What changed and how does it work?

Previously, this place incorrectly used the UniqueID, which is only valid during this query.

We should use the column ID from the metadata so that we can correctly fetch the column name from TableInfo.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 28, 2024
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 28, 2024
pkg/statistics/table.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Mar 28, 2024

Codecov Report

Merging #52208 (688b40d) into master (c9a250a) will increase coverage by 2.1758%.
Report is 12 commits behind head on master.
The diff coverage is 88.8888%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #52208        +/-   ##
================================================
+ Coverage   71.1432%   73.3190%   +2.1758%     
================================================
  Files          1468       1468                
  Lines        432193     433639      +1446     
================================================
+ Hits         307476     317940     +10464     
+ Misses       105490      95852      -9638     
- Partials      19227      19847       +620     
Flag Coverage Δ
integration 48.8438% <85.1851%> (?)
unit 70.9403% <87.0370%> (-0.0186%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 53.9957% <ø> (ø)
parser ∅ <ø> (∅)
br 41.4176% <ø> (+7.3304%) ⬆️

@@ -152,7 +152,7 @@ func crossEstimateRowCount(sctx context.PlanContext,
return 0, err == nil, corr
}
idxID := int64(-1)
idxIDs, idxExists := dsStatsInfo.HistColl.ColID2IdxIDs[colID]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use the clearly name 'colUniqueID' instead of 'colID' to avoid the wrong used.

@@ -45,7 +45,11 @@ func GetRowCountByColumnRanges(sctx context.PlanContext, coll *statistics.HistCo
}
sc := sctx.GetSessionVars().StmtCtx
c, ok := coll.Columns[colID]
recordUsedItemStatsStatus(sctx, c, coll.PhysicalID, colID)
colInfoID := colID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

if len(coll.UniqueID2colInfoID) > 0 {
colInfoID = coll.UniqueID2colInfoID[colID]
}
recordUsedItemStatsStatus(sctx, c, coll.PhysicalID, colInfoID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should distinguish colId and colUniqueId in explain such as "colID: xxx" , "colUniqueID: XXX"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the column name in the EXPLAIN result, which should be fetched using the column ID from the metadata instead of the UniqueID.

@@ -170,15 +170,15 @@ func getIndexRowCountForStatsV1(sctx context.PlanContext, coll *statistics.HistC
}
var count float64
var err error
colIDs := coll.Idx2ColumnIDs[idxID]
colIDs := coll.Idx2ColUniqueIDs[idxID]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index ID is not relevant here, only Column ID has this difference.

Copy link
Member

@Rustin170506 Rustin170506 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks! 👍

/hold

Feel free to update the testing code or ignore it.

@@ -175,6 +175,7 @@
"fieldalignment": {
"exclude_files": {
"pkg/parser/parser.go": "parser/parser.go code",
"pkg/statistics/table.go": "disable this limitation that prevents us from splitting struct fields for clarity",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always want to do this, but I can never convince myself. 😃

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 I think performance benefits and memory efficiency from this rule are small. So I think it's totally OK to disable it when it causes problems or inconvenience.

@@ -215,17 +215,11 @@ const (

// HistColl is a collection of histogram. It collects enough information for plan to calculate the selectivity.
type HistColl struct {
Columns map[int64]*Column
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big thanks for refactoring this structure!

Comment on lines -221 to -227
Idx2ColumnIDs map[int64][]int64
// ColID2IdxIDs maps the column id to a list index ids whose first column is it. It's used to calculate the selectivity in planner.
ColID2IdxIDs map[int64][]int64
// MVIdx2Columns maps the index id to its columns by expression.Column.
// For normal index, the column id is enough, as we already have in Idx2ColumnIDs. But currently, mv index needs more
// information to match the filter against the mv index columns, and we need this map to provide this information.
MVIdx2Columns map[int64][]*expression.Column
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It seems we still using some of these names in our testing code. Feel free to update it or ignore it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I found, that's used inside a function, I think that's OK.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2024
Copy link

ti-chi-bot bot commented Apr 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hi-rustin, winoros

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 2, 2024
Copy link

ti-chi-bot bot commented Apr 2, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-03-28 23:08:01.616151721 +0000 UTC m=+2195708.638398110: ☑️ agreed by winoros.
  • 2024-04-02 06:23:43.792525531 +0000 UTC m=+338685.320066077: ☑️ agreed by hi-rustin.

@time-and-fate
Copy link
Member Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2024
@ti-chi-bot ti-chi-bot bot merged commit 21e9d3c into pingcap:master Apr 2, 2024
23 checks passed
@time-and-fate
Copy link
Member Author

/cherry-pick release-7.5
/cherry-pick release-7.1

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Apr 2, 2024
@ti-chi-bot
Copy link
Member

@time-and-fate: new pull request created to branch release-7.1: #52308.

In response to this:

/cherry-pick release-7.5
/cherry-pick release-7.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

@time-and-fate: new pull request created to branch release-7.5: #52309.

In response to this:

/cherry-pick release-7.5
/cherry-pick release-7.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

columns with not full loaded stats may display wrong column ID in EXPLAIN
5 participants