statistics: clean up dropped predicate columns stats usage #53680
Signed-off-by: hi-rustin <[email protected]>
🔢 Self-check (PR reviewed by myself and ready for feedback.)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #53680 +/- ##
================================================
+ Coverage 74.4884% 74.4987% +0.0102%
================================================
Files 1506 1506
Lines 357618 431433 +73815
================================================
+ Hits 266384 321412 +55028
- Misses 71857 90098 +18241
- Partials 19377 19923 +546
Tested locally:
mysql> create table t (a int, b int);
Query OK, 0 rows affected (0.04 sec)
mysql> insert into t values (1, 1), (2, 2), (3, 3);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> set global tidb_enable_column_tracking = 1;
Query OK, 0 rows affected (0.02 sec)
mysql> select @@tidb_enable_column_tracking;
+-------------------------------+
| @@tidb_enable_column_tracking |
+-------------------------------+
| 1 |
+-------------------------------+
1 row in set (0.00 sec)
mysql> select * from t where b > 1;
+------+------+
| a | b |
+------+------+
| 2 | 2 |
| 3 | 3 |
+------+------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM MYSQL.COLUMN_STATS_USAGE;
+----------+-----------+---------------------+------------------+
| table_id | column_id | last_used_at | last_analyzed_at |
+----------+-----------+---------------------+------------------+
| 104 | 2 | 2024-05-30 16:47:18 | NULL |
+----------+-----------+---------------------+------------------+
1 row in set (0.01 sec)
mysql> ANALYZE TABLE test.t PREDICATE COLUMNS;
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> select * from mysql.analyze_options;
+----------+------------+-------------+---------+------+---------------+------------+
| table_id | sample_num | sample_rate | buckets | topn | column_choice | column_ids |
+----------+------------+-------------+---------+------+---------------+------------+
| 104 | 0 | 0 | 0 | -1 | PREDICATE | |
+----------+------------+-------------+---------+------+---------------+------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM mysql.analyze_jobs;
+----+---------------------+--------------+------------+----------------+------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+
| id | update_time | table_schema | table_name | partition_name | job_info | processed_rows | start_time | end_time | state | fail_reason | instance | process_id |
+----+---------------------+--------------+------------+----------------+------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+
| 1 | 2024-05-30 16:52:28 | test | t | | analyze table columns b with 256 buckets, 500 topn, 1 samplerate | 3 | 2024-05-30 16:52:28 | 2024-05-30 16:52:28 | finished | NULL | 127.0.0.1:4000 | NULL |
+----+---------------------+--------------+------------+----------------+------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+
1 row in set (0.00 sec)
mysql> alter table t drop column b;
Query OK, 0 rows affected (0.10 sec)
mysql> SELECT * FROM MYSQL.COLUMN_STATS_USAGE;
+----------+-----------+---------------------+---------------------+
| table_id | column_id | last_used_at | last_analyzed_at |
+----------+-----------+---------------------+---------------------+
| 104 | 2 | 2024-05-30 16:47:18 | 2024-05-30 16:52:28 |
+----------+-----------+---------------------+---------------------+
1 row in set (0.00 sec)
mysql> ANALYZE TABLE test.t PREDICATE COLUMNS;
Query OK, 0 rows affected, 1 warning (0.02 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1105 | No predicate column has been collected yet for table test.t so all columns are analyzed |
| Note | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t, reason to use this rate is "use min(1, 110000/3) as the sample-rate=1" |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM mysql.analyze_jobs;
+----+---------------------+--------------+------------+----------------+--------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+
| id | update_time | table_schema | table_name | partition_name | job_info | processed_rows | start_time | end_time | state | fail_reason | instance | process_id |
+----+---------------------+--------------+------------+----------------+--------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+
| 1 | 2024-05-30 16:52:28 | test | t | | analyze table columns b with 256 buckets, 500 topn, 1 samplerate | 3 | 2024-05-30 16:52:28 | 2024-05-30 16:52:28 | finished | NULL | 127.0.0.1:4000 | NULL |
| 2 | 2024-05-30 16:53:21 | test | t | | analyze table all columns with 256 buckets, 500 topn, 1 samplerate | 3 | 2024-05-30 16:53:21 | 2024-05-30 16:53:21 | finished | NULL | 127.0.0.1:4000 | NULL |
+----+---------------------+--------------+------------+----------------+--------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM MYSQL.COLUMN_STATS_USAGE;
+----------+-----------+--------------+---------------------+
| table_id | column_id | last_used_at | last_analyzed_at |
+----------+-----------+--------------+---------------------+
| 104 | 1 | NULL | 2024-05-30 16:53:21 |
+----------+-----------+--------------+---------------------+
1 row in set (0.00 sec)
I am unsure why column 'a' gets inserted into the MYSQL.COLUMN_STATS_USAGE table. I don't think it makes sense, but it is off-topic for this PR. We will fix it later.
I don't know whether this is a bug or a feature, but the minimal reproduction steps are:
mysql> use test;
Database changed
mysql> create table t (a int, b int);
Query OK, 0 rows affected (0.04 sec)
mysql> insert into t values (1, 1), (2, 2), (3, 3);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> set global tidb_enable_column_tracking = 1;
Query OK, 0 rows affected (0.01 sec)
mysql> analyze table t;
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> SELECT * FROM MYSQL.COLUMN_STATS_USAGE;
+----------+-----------+--------------+---------------------+
| table_id | column_id | last_used_at | last_analyzed_at |
+----------+-----------+--------------+---------------------+
| 104 | 1 | NULL | 2024-05-30 17:00:36 |
| 104 | 2 | NULL | 2024-05-30 17:00:36 |
+----------+-----------+--------------+---------------------+
2 rows in set (0.00 sec)
I will figure out why this happens.
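One way to read the two result sets above: rows written by a plain ANALYZE carry a last_analyzed_at value but a NULL last_used_at. The sketch below assumes that a column counts as a predicate column only when last_used_at is set; that rule is an assumption for illustration, not something this PR confirms. The `usageRow` type and `predicateColumnIDs` function are hypothetical and are not TiDB identifiers.

```go
package main

import "fmt"

// usageRow is a hypothetical, simplified row of mysql.column_stats_usage.
// A nil LastUsedAt models a NULL last_used_at value.
type usageRow struct {
	ColumnID   int64
	LastUsedAt *string
}

// predicateColumnIDs returns the IDs of rows whose last_used_at is not NULL.
// Treating exactly these rows as predicate columns is an assumption.
func predicateColumnIDs(rows []usageRow) []int64 {
	ids := make([]int64, 0, len(rows))
	for _, r := range rows {
		if r.LastUsedAt != nil {
			ids = append(ids, r.ColumnID)
		}
	}
	return ids
}

func main() {
	// Mirrors the reproduction above: both columns were analyzed, neither was used.
	rows := []usageRow{{ColumnID: 1}, {ColumnID: 2}}
	fmt.Println(len(predicateColumnIDs(rows))) // prints 0
}
```

Under that assumption, the rows inserted by `analyze table t` would be harmless for predicate-column selection, though they still occupy space in the table.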
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hawkingrei, qw4990
What problem does this PR solve?
Issue Number: ref #53567
Problem Summary:
We need to clean up the outdated predicate columns that have been dropped from the schema.
Otherwise, we will attempt to analyze non-existent columns.
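The cleanup idea can be sketched as a filter over the recorded usage rows: keep the rows whose column still exists in the table schema and mark the rest for deletion. This is a minimal illustration, not the code from this PR; the `columnUsage` type, the `splitDroppedColumns` function, and the way the schema's column IDs are passed in are all hypothetical.

```go
package main

import "fmt"

// columnUsage is a hypothetical, simplified stand-in for one row of
// mysql.column_stats_usage; it is not TiDB's actual type.
type columnUsage struct {
	TableID  int64
	ColumnID int64
}

// splitDroppedColumns partitions usage rows into those whose column still
// exists in the schema and those whose column has been dropped.
// existingColumnIDs is assumed to hold the column IDs of the current schema.
func splitDroppedColumns(rows []columnUsage, existingColumnIDs []int64) (kept, dropped []columnUsage) {
	exists := make(map[int64]struct{}, len(existingColumnIDs))
	for _, id := range existingColumnIDs {
		exists[id] = struct{}{}
	}
	for _, r := range rows {
		if _, ok := exists[r.ColumnID]; ok {
			kept = append(kept, r)
		} else {
			dropped = append(dropped, r)
		}
	}
	return kept, dropped
}

func main() {
	// Mirrors the local test: column 2 (b) was recorded, then dropped,
	// so only column 1 (a) remains in the schema.
	rows := []columnUsage{{TableID: 104, ColumnID: 2}}
	kept, dropped := splitDroppedColumns(rows, []int64{1})
	fmt.Println(len(kept), len(dropped)) // prints "0 1"
}
```

With the dropped rows removed, a later `ANALYZE TABLE ... PREDICATE COLUMNS` has no stale column IDs to resolve, which matches the warning seen in the local test where no predicate column remained for test.t.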