util/ranger: add missing Selection for range scan from like on PAD SPACE column | tidb-test=pr/2251 #48845

Merged
merged 4 commits into from
Nov 24, 2023

Conversation

time-and-fate
Member

@time-and-fate time-and-fate commented Nov 23, 2023

UPDATE

Note that this fix introduced a very subtle regression that is not covered by any existing test case. We'll try to fix it later.
In brief: if you access a multi-column index with a multi-column range, some columns of that range come from a like function whose pattern doesn't contain %, and the column uses a binary collation, then after this PR TiDB may use fewer index columns for the range scan.
Example:

create table t(a varchar(20) collate utf8mb4_bin, b varchar(20) collate utf8mb4_bin, index iab(a,b));
explain select * from t where a like 'xxx' and b = 'yyy';

Before (v7.5.0):

> explain select * from t where a like 'xxx' and b = 'yyy';
+------------------------+---------+-----------+--------------------------+-----------------------------------------------------------------+
| id                     | estRows | task      | access object            | operator info                                                   |
+------------------------+---------+-----------+--------------------------+-----------------------------------------------------------------+
| IndexReader_6          | 0.10    | root      |                          | index:IndexRangeScan_5                                          |
| └─IndexRangeScan_5     | 0.10    | cop[tikv] | table:t, index:iab(a, b) | range:["xxx" "yyy","xxx" "yyy"], keep order:false, stats:pseudo |
+------------------------+---------+-----------+--------------------------+-----------------------------------------------------------------+

Now:

> explain select * from t where a like 'xxx' and b = 'yyy';
+--------------------------+---------+-----------+--------------------------+-----------------------------------------------------+
| id                       | estRows | task      | access object            | operator info                                       |
+--------------------------+---------+-----------+--------------------------+-----------------------------------------------------+
| IndexReader_7            | 0.01    | root      |                          | index:Selection_6                                   |
| └─Selection_6            | 0.01    | cop[tikv] |                          | eq(test.t.b, "yyy"), like(test.t.a, "xxx", 92)      |
|   └─IndexRangeScan_5     | 10.00   | cop[tikv] | table:t, index:iab(a, b) | range:["xxx","xxx"], keep order:false, stats:pseudo |
+--------------------------+---------+-----------+--------------------------+-----------------------------------------------------+
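The cost of the narrower range can be illustrated with a toy model. The sketch below is not TiDB code: `indexRow`, `scanBothColumns`, and `scanFirstColumnThenFilter` are hypothetical names, and the index is just an in-memory slice. It only shows that both plans return the same rows, while the plan with the one-column range reads more index entries before the Selection discards them.

```go
package main

import "fmt"

// indexRow is one entry of a toy two-column index on (a, b).
type indexRow struct{ a, b string }

// scanBothColumns models the "Before" plan: a point range on (a, b),
// so only entries matching both columns are read.
func scanBothColumns(index []indexRow, a, b string) (scanned int, out []indexRow) {
	for _, r := range index {
		if r.a == a && r.b == b {
			scanned++
			out = append(out, r)
		}
	}
	return scanned, out
}

// scanFirstColumnThenFilter models the "Now" plan: a point range on a only,
// followed by a Selection that re-checks the remaining conditions.
func scanFirstColumnThenFilter(index []indexRow, a, b string) (scanned int, out []indexRow) {
	for _, r := range index {
		if r.a != a {
			continue // outside the range on column a
		}
		scanned++ // entry is read by the range scan
		if r.b == b {
			out = append(out, r) // entry survives the Selection
		}
	}
	return scanned, out
}

func main() {
	index := []indexRow{{"xxx", "aaa"}, {"xxx", "yyy"}, {"xxx", "zzz"}, {"xxz", "yyy"}}

	s1, r1 := scanBothColumns(index, "xxx", "yyy")
	s2, r2 := scanFirstColumnThenFilter(index, "xxx", "yyy")

	fmt.Println("two-column range: scanned", s1, "returned", len(r1))
	fmt.Println("one-column range: scanned", s2, "returned", len(r2))
}
```

In this toy data set the two-column range reads one entry and the one-column range reads three, which mirrors the estRows difference on IndexRangeScan_5 in the two plans (0.10 versus 10.00).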

What problem does this PR solve?

Issue Number: close #48821 ref #48181

What changed and how does it work?

As the title says.
Please also read the comments in the code.
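Why a Selection is needed can be sketched with a small model. The code below is not the TiDB implementation: `padSpaceEqual`, `likeNoWildcard`, `rangeScan`, and `selection` are hypothetical helpers. It assumes the MySQL-compatible semantics that equality under a PAD SPACE collation ignores trailing spaces, while like treats them as significant.

```go
package main

import (
	"fmt"
	"strings"
)

// padSpaceEqual models equality under a PAD SPACE collation:
// trailing spaces are ignored when comparing.
func padSpaceEqual(a, b string) bool {
	return strings.TrimRight(a, " ") == strings.TrimRight(b, " ")
}

// likeNoWildcard models `col LIKE pattern` for a pattern without wildcards:
// trailing spaces are significant, so it behaves like exact equality.
func likeNoWildcard(value, pattern string) bool {
	return value == pattern
}

// rangeScan models a point range [key, key] on a PAD SPACE column:
// every value that compares equal under the collation falls in the range.
func rangeScan(values []string, key string) []string {
	var out []string
	for _, v := range values {
		if padSpaceEqual(v, key) {
			out = append(out, v)
		}
	}
	return out
}

// selection re-applies the like condition on top of the scanned rows.
func selection(rows []string, pattern string) []string {
	var out []string
	for _, v := range rows {
		if likeNoWildcard(v, pattern) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	column := []string{"xxx", "xxx ", "xxx  ", "xxy"}

	scanned := rangeScan(column, "xxx")
	fmt.Printf("range scan only:   %q\n", scanned)

	filtered := selection(scanned, "xxx")
	fmt.Printf("scan + Selection:  %q\n", filtered)
}
```

In this model the range scan alone returns "xxx", "xxx " and "xxx  ", but only "xxx" satisfies like 'xxx', so the extra filter is what keeps the result correct.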

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added labels Nov 23, 2023:

  • release-note-none: Denotes a PR that doesn't merit a release note.
  • needs-cherry-pick-release-6.1: Should cherry pick this PR to release-6.1 branch.
  • needs-cherry-pick-release-6.5: Should cherry pick this PR to release-6.5 branch.
  • needs-cherry-pick-release-7.1: Should cherry pick this PR to release-7.1 branch.
  • needs-cherry-pick-release-7.5: Should cherry pick this PR to release-7.5 branch.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

codecov bot commented Nov 23, 2023

Codecov Report

Merging #48845 (163d690) into master (26db590) will increase coverage by 1.7251%.
Report is 3 commits behind head on master.
The diff coverage is 90.4977%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #48845        +/-   ##
================================================
+ Coverage   71.0004%   72.7255%   +1.7251%     
================================================
  Files          1367       1392        +25     
  Lines        404899     411851      +6952     
================================================
+ Hits         287480     299521     +12041     
+ Misses        97382      93459      -3923     
+ Partials      20037      18871      -1166     
Flag Coverage Δ
integration 43.7979% <57.4660%> (?)
unit 71.0901% <85.5203%> (+0.0897%) ⬆️


Components Coverage Δ
dumpling 53.9663% <ø> (ø)
parser ∅ <ø> (∅)
br 48.8226% <59.2592%> (-4.2581%) ⬇️

@time-and-fate
Member Author

/test check-dev2


tiprow bot commented Nov 23, 2023

@time-and-fate: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test tiprow_fast_test

Use /test all to run all jobs.

In response to this:

/test check-dev2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 24, 2023

ti-chi-bot bot commented Nov 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tangenta, winoros

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 24, 2023

ti-chi-bot bot commented Nov 24, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-24 03:30:20.37193711 +0000 UTC m=+547849.037163307: ☑️ agreed by tangenta.
  • 2023-11-24 06:09:21.382089535 +0000 UTC m=+557390.047315730: ☑️ agreed by winoros.

@ti-chi-bot ti-chi-bot bot merged commit 27d2ba5 into pingcap:master Nov 24, 2023
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #48881.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #48882.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.1: #48883.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Nov 24, 2023
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #48884.

// the `like` function. Therefore, a Selection is needed to filter the data.
// Since all collations, except for binary, implemented in tidb are PAD SPACE collations for now, we use a simple
// collation != binary check here.
if collation != charset.CollationBin {
Contributor

What about making PAD SPACE an attribute of the collation, so that we don't introduce bugs when a new NO PAD collation is added?

Member Author

Yes, but I think it's better for the SQL team to add this to the collation-related packages, especially once they start implementing NO PAD collations. I added an isPadSpaceCollation function in #48984, since we need the same check there.
The current check won't cause bugs. If we keep the current logic after a NO PAD collation is added, there will be an unnecessary Selection, which is not ideal but still returns correct results.
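A minimal sketch of the check discussed in this thread, under stated assumptions: only the name isPadSpaceCollation comes from the comment above, and its body here is a guess that mirrors the collation != binary check rather than the code from #48984. `collationBin` is a local stand-in for charset.CollationBin with an assumed value, and "future_nopad_ci" is a made-up collation name.

```go
package main

import "fmt"

// collationBin is a local stand-in for charset.CollationBin.
// The value "binary" is an assumption made for this sketch.
const collationBin = "binary"

// isPadSpaceCollation mirrors the simple check from the diff: every
// collation other than binary is treated as PAD SPACE.
func isPadSpaceCollation(collation string) bool {
	return collation != collationBin
}

// needsSelectionForLike reports whether a range built from a like
// condition must be re-checked by a Selection (hypothetical helper).
func needsSelectionForLike(collation string) bool {
	return isPadSpaceCollation(collation)
}

func main() {
	// "future_nopad_ci" is a hypothetical NO PAD collation. The simple
	// check still asks for a Selection here: redundant, but the query
	// result stays correct.
	for _, c := range []string{"binary", "utf8mb4_general_ci", "future_nopad_ci"} {
		fmt.Printf("%-20s needs Selection: %v\n", c, needsSelectionForLike(c))
	}
}
```

The design choice is conservative: a collation misclassified as PAD SPACE costs an extra filter, while a collation misclassified as NO PAD could return wrong rows, so the check errs toward adding the Selection.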

time-and-fate added a commit to time-and-fate/tidb that referenced this pull request Dec 5, 2023
@ti-chi-bot ti-chi-bot removed the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Feb 20, 2024
Labels
approved lgtm needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query result may be wrong when use like to do index range scan on PAD SPACE column
5 participants