Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: incorrect cardinality estimation on composite indexes #49801

Closed
qw4990 opened this issue Dec 26, 2023 · 1 comment · Fixed by #49808
Closed

planner: incorrect cardinality estimation on composite indexes #49801

qw4990 opened this issue Dec 26, 2023 · 1 comment · Fixed by #49808
Assignees
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. component/statistics severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@qw4990
Copy link
Contributor

qw4990 commented Dec 26, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Prepare data by executing those SQLs:

create table t (category varchar(256) NOT NULL, `status` varchar(256) NOT NULL, key idx(category, `status`));


insert into t values ('superv', 'failed'); -- 52
insert into t select * from t where category='superv' and `status`='failed';
insert into t select * from t where category='superv' and `status`='failed';
insert into t select * from t where category='superv' and `status`='failed';
insert into t select * from t where category='superv' and `status`='failed';
insert into t select * from t where category='superv' and `status`='failed';
insert into t select * from t where category='superv' and `status`='failed';


insert into t values ('crea', 'processing'); -- 56
insert into t select * from t where category='crea' and `status`='processing';
insert into t select * from t where category='crea' and `status`='processing';
insert into t select * from t where category='crea' and `status`='processing';
insert into t select * from t where category='crea' and `status`='processing';
insert into t select * from t where category='crea' and `status`='processing';
insert into t select * from t where category='crea' and `status`='processing';



insert into t values ('i18n', 'ignored'); -- 112
insert into t select * from t where category='i18n' and `status`='ignored';
insert into t select * from t where category='i18n' and `status`='ignored';
insert into t select * from t where category='i18n' and `status`='ignored';
insert into t select * from t where category='i18n' and `status`='ignored';
insert into t select * from t where category='i18n' and `status`='ignored';
insert into t select * from t where category='i18n' and `status`='ignored';
insert into t select * from t where category='i18n' and `status`='ignored';


insert into t values ('icon', 'ignored');  -- 120
insert into t select * from t where category='icon' and `status`='ignored';
insert into t select * from t where category='icon' and `status`='ignored';
insert into t select * from t where category='icon' and `status`='ignored';
insert into t select * from t where category='icon' and `status`='ignored';
insert into t select * from t where category='icon' and `status`='ignored';
insert into t select * from t where category='icon' and `status`='ignored';
insert into t select * from t where category='icon' and `status`='ignored';


insert into t values ('crea', 'failed');  --240
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';
insert into t select * from t where category='crea' and `status`='failed';


insert into t values ('gift', 'success'); -- 740
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';
insert into t select * from t where category='gift' and `status`='success';


insert into t values ('superv-ref', 'success'); -- 1916
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';
insert into t select * from t where category='superv-ref' and `status`='success';


insert into t values ('art', 'success'); -- 1916
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';
insert into t select * from t where category='art' and `status`='success';


insert into t values ('art-ref', 'success'); -- 9908
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success';
insert into t select * from t where category='art-ref' and `status`='success' limit 1500;


insert into t values ('ams', 'success'); -- 12772
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success';
insert into t select * from t where category='ams' and `status`='success' limit 4000;


insert into t values ('i18n', 'success'); -- 43800
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success';
insert into t select * from t where category='i18n' and `status`='success' limit 10000;


insert into t values ('superv', 'success'); -- 62960
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';
insert into t select * from t where category='superv' and `status`='success';


insert into t values ('icon', 'success'); -- 137856
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';
insert into t select * from t where category='icon' and `status`='success';


insert into t values ('crea', 'success'); -- 167000
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success';
insert into t select * from t where category='crea' and `status`='success' limit 30000;



insert into t values ('superv-ref', 'failed'); -- 12600
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed';
insert into t select * from t where category='superv-ref' and `status`='failed' limit 4000;


insert into t values ('superv', 'processing');  -- 8400
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';
insert into t select * from t where category='superv' and `status`='processing';


insert into t values ('superv', 'ignored'); -- 12600
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored';
insert into t select * from t where category='superv' and `status`='ignored' limit 4000;


insert into t values ('i18n', 'failed');  -- 21000
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed';
insert into t select * from t where category='i18n' and `status`='failed' limit 4000;


insert into t values ('doraemon', 'success');  -- 12600
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success';
insert into t select * from t where category='doraemon' and `status`='success' limit 4000;


insert into t values ('crea', 'ignored');  -- 4200
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';
insert into t select * from t where category='crea' and `status`='ignored';


insert into t values ('common', 'success');  -- 29000
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success';
insert into t select * from t where category='common' and `status`='success' limit 3000;


insert into t values ('i18n', 'idle');
insert into t values ('crea', 'idle');
insert into t values ('superv', 'idle');

then analyze the table and execute :

explain analyze select * from t where category='i18n' and `status`='idle';

If you can't reproduce this issue, just analyze the table again, it has a certain degree of randomness.

2. What did you expect to see? (Required)

The cardinality estimation result should be accurate.

3. What did you see instead (Required)

It is incredibly over-estimated:

mysql> explain analyze select * from t where category='i18n' and `status`='idle';
+------------------------+----------+---------+-----------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+------+
| id                     | estRows  | actRows | task      | access object                        | execution info                                                                                                                                                                                             | operator info                                         | memory    | disk |
+------------------------+----------+---------+-----------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+------+
| IndexReader_6          | 31266.33 | 1       | root      |                                      | time:511.3µs, loops:2, RU:0.476099, cop_task: {num: 1, max: 426.1µs, proc_keys: 0, rpc_num: 1, rpc_time: 398.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 11.5µs, max_distsql_concurrency: 1}     | index:IndexRangeScan_5                                | 253 Bytes | N/A  |
| └─IndexRangeScan_5     | 31266.33 | 1       | cop[tikv] | table:t, index:idx(category, status) | tikv_task:{time:366.3µs, loops:0}                                                                                                                                                                          | range:["i18n" "idle","i18n" "idle"], keep order:false | N/A       | N/A  |
+------------------------+----------+---------+-----------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+------+

It seems like the stats is not collected correctly. The stats_bucket seems incorrect:

mysql> show stats_buckets;
+---------+------------+----------------+-------------+----------+-----------+-------+---------+--------------+--------------+------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Bucket_id | Count | Repeats | Lower_Bound  | Upper_Bound  | Ndv  |
+---------+------------+----------------+-------------+----------+-----------+-------+---------+--------------+--------------+------+
| test    | t          |                | idx         |        1 |         0 | 93799 |   31266 | (crea, idle) | (crea, idle) |    0 |
+---------+------------+----------------+-------------+----------+-----------+-------+---------+--------------+--------------+------+

4. What is your TiDB version? (Required)

@qw4990
Copy link
Contributor Author

qw4990 commented Dec 26, 2023

The reason is that when we scale the TopN.Count, we round down the scale factor first, and then multiply it with the count.
But in this case, the scale factor is 4.8, so after this round-down, there is a round error of 15%, which is then magnified by some following steps.
image

We should use topNList[i].Count = uint64(float64(topNList[i].Count) * sampleFactor) instead of topNList[i].Count *= uint64(sampleFactor).

@qw4990 qw4990 added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. labels Dec 26, 2023
@qw4990 qw4990 removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. component/statistics severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant