Optimize getPartitionEndRow for window function #6668

Closed
gengliqi opened this issue Jan 19, 2023 · 0 comments · Fixed by #6693
Labels
type/enhancement The issue or PR belongs to an enhancement.

Enhancement

  1. isDifferentFromPrevPartition makes some unnecessary copies:

    const auto reference_columns = inputAt(prev_frame_start);
    const auto compared_columns = inputAt(partition_end);
    for (size_t i = 0; i < partition_column_indices.size(); ++i)
    {
        const auto reference_column = reference_columns[partition_column_indices[i]];

  2. getPartitionEndRow should first compare the next few rows directly, which speeds up the case where the partition end is very close.
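Item 2 can be illustrated with a minimal sketch. It assumes the rows of a block are already sorted by the partition key and that a wider search (binary search here) runs after the first few direct comparisons. The names `KeyColumn`, `isDifferentFromPrev`, `findPartitionEnd`, and the `probes` parameter are hypothetical and are not taken from the TiFlash code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the partition-key column of one sorted block.
using KeyColumn = std::vector<std::string>;

// True when `row` is not in the same partition as `prev_row`.
// Everything is passed by const reference, so no column data is copied.
bool isDifferentFromPrev(const KeyColumn & keys, size_t prev_row, size_t row)
{
    return keys[prev_row] != keys[row];
}

// Returns the first row in [start, keys.size()) whose key differs from
// keys[prev_row], or keys.size() if the partition runs to the end of the block.
// Assumes prev_row < start <= keys.size().
size_t findPartitionEnd(const KeyColumn & keys, size_t prev_row, size_t start, size_t probes = 2)
{
    const size_t n = keys.size();
    size_t left = start;

    // Step 1: compare a few rows directly. With high-cardinality keys the
    // partition usually ends here, after one or two comparisons.
    const size_t probe_end = std::min(n, start + probes);
    for (; left < probe_end; ++left)
        if (isDifferentFromPrev(keys, prev_row, left))
            return left;

    // Step 2: binary search over the remaining rows. This is valid because the
    // rows are sorted, so "differs from prev_row" flips from false to true once.
    size_t right = n;
    while (left < right)
    {
        const size_t mid = left + (right - left) / 2;
        if (isDifferentFromPrev(keys, prev_row, mid))
            right = mid;
        else
            left = mid + 1;
    }
    return left;
}
```

When every key is distinct, each call returns after the first direct comparison. Long partitions still take a logarithmic number of comparisons after the probes.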

If the partition key has high cardinality, the number of partitions is huge, and the performance lost to the two issues above becomes very significant.
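The cost of the copies in item 1 can be made visible with a small sketch. It assumes an accessor that returns a reference to a vector of reference-counted column handles. `ColumnPtr`, `Columns`, `Block`, and this `inputAt` are simplified stand-ins chosen for illustration, not the actual TiFlash types or signatures.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins: a column handle is a shared pointer, and a block
// holds a vector of such handles.
using ColumnPtr = std::shared_ptr<const std::vector<int>>;
using Columns = std::vector<ColumnPtr>;

struct Block
{
    Columns columns;
};

// Assumed accessor shape: returns a reference to the stored columns.
const Columns & inputAt(const Block & block)
{
    return block.columns;
}

// `const auto` deduces a non-reference type, so this copies the whole vector
// and increments the reference count of every column handle in it.
long useCountWithCopy(const Block & block)
{
    const auto columns = inputAt(block);
    return columns[0].use_count();
}

// `const auto &` binds to the existing vector; nothing is copied.
long useCountWithReference(const Block & block)
{
    const auto & columns = inputAt(block);
    return columns[0].use_count();
}
```

With a single column handle stored in the block, the copying version observes a use count of 2 and the reference version observes 1. When there is roughly one partition per row, the comparison runs at least once per row, so the per-call copy is paid tens of millions of times in the test below.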

Test steps

create table win(s varchar(20));
Insert 50,000,000 rows into `win`, with all values of `s` distinct.
explain analyze select row_number() over (partition by s) rn from win;

v6.5.0

mysql> explain analyze select row_number() over (partition by s) rn from win;
+------------------------------------+-------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------+------+
| id                                 | estRows     | actRows  | task         | access object | execution info                                                                                                                                                                                                                                                         | operator info                                                                                                  | memory | disk |
+------------------------------------+-------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------+------+
| TableReader_25                     | 50000000.00 | 50000000 | root         |               | time:18.7s, loops:48833, cop_task: {num: 769, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}                                                                                                                                                          | data:ExchangeSender_24                                                                                         | N/A    | N/A  |
| └─ExchangeSender_24                | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:18.7s, loops:768, threads:8}                                                                                                                                                                                                                        | ExchangeType: PassThrough                                                                                      | N/A    | N/A  |
|   └─Projection_8                   | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:18.5s, loops:768, threads:8}                                                                                                                                                                                                                        | Column#4, stream_count: 8                                                                                      | N/A    | N/A  |
|     └─Window_23                    | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:18.5s, loops:768, threads:8}                                                                                                                                                                                                                        | row_number()->Column#4 over(partition by test.win.s rows between current row and current row), stream_count: 8 | N/A    | N/A  |
|       └─Sort_14                    | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:7.85s, loops:768, threads:8}                                                                                                                                                                                                                        | test.win.s, stream_count: 8                                                                                    | N/A    | N/A  |
|         └─ExchangeReceiver_13      | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:203.9ms, loops:3992, threads:8}                                                                                                                                                                                                                     | stream_count: 8                                                                                                | N/A    | N/A  |
|           └─ExchangeSender_12      | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:1.62s, loops:789, threads:20}                                                                                                                                                                                                                       | ExchangeType: HashPartition, Hash Cols: [name: test.win.s, collate: utf8mb4_bin], stream_count: 8              | N/A    | N/A  |
|             └─TableFullScan_11     | 50000000.00 | 50000000 | mpp[tiflash] | table:win     | tiflash_task:{time:28.2ms, loops:789, threads:20}, tiflash_scan:{dtfile:{total_scanned_packs:6145, total_skipped_packs:0, total_scanned_rows:50000000, total_skipped_rows:0, total_rs_index_load_time: 1ms, total_read_time: 2843ms}, total_create_snapshot_time: 0ms} | keep order:false                                                                                               | N/A    | N/A  |
+------------------------------------+-------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------+------+
8 rows in set (18.66 sec)

After applying #6625:

mysql> explain analyze select row_number() over (partition by s) rn from win;
+------------------------------------+-------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------+------+
| id                                 | estRows     | actRows  | task         | access object | execution info                                                                                                                                                                                                                                                         | operator info                                                                                                  | memory | disk |
+------------------------------------+-------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------+------+
| TableReader_25                     | 50000000.00 | 50000000 | root         |               | time:8.71s, loops:48833, cop_task: {num: 769, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}                                                                                                                                                          | data:ExchangeSender_24                                                                                         | N/A    | N/A  |
| └─ExchangeSender_24                | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:8.71s, loops:768, threads:8}                                                                                                                                                                                                                        | ExchangeType: PassThrough                                                                                      | N/A    | N/A  |
|   └─Projection_8                   | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:8.55s, loops:768, threads:8}                                                                                                                                                                                                                        | Column#4, stream_count: 8                                                                                      | N/A    | N/A  |
|     └─Window_23                    | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:8.55s, loops:768, threads:8}                                                                                                                                                                                                                        | row_number()->Column#4 over(partition by test.win.s rows between current row and current row), stream_count: 8 | N/A    | N/A  |
|       └─Sort_14                    | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:7.88s, loops:768, threads:8}                                                                                                                                                                                                                        | test.win.s, stream_count: 8                                                                                    | N/A    | N/A  |
|         └─ExchangeReceiver_13      | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:263ms, loops:4016, threads:8}                                                                                                                                                                                                                       | stream_count: 8                                                                                                | N/A    | N/A  |
|           └─ExchangeSender_12      | 50000000.00 | 50000000 | mpp[tiflash] |               | tiflash_task:{time:1.73s, loops:789, threads:20}                                                                                                                                                                                                                       | ExchangeType: HashPartition, Hash Cols: [name: test.win.s, collate: utf8mb4_bin], stream_count: 8              | N/A    | N/A  |
|             └─TableFullScan_11     | 50000000.00 | 50000000 | mpp[tiflash] | table:win     | tiflash_task:{time:16.5ms, loops:789, threads:20}, tiflash_scan:{dtfile:{total_scanned_packs:6145, total_skipped_packs:0, total_scanned_rows:50000000, total_skipped_rows:0, total_rs_index_load_time: 0ms, total_read_time: 2331ms}, total_create_snapshot_time: 0ms} | keep order:false                                                                                               | N/A    | N/A  |
+------------------------------------+-------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------+------+
8 rows in set (8.71 sec)
gengliqi added the type/enhancement label on Jan 19, 2023
gengliqi self-assigned this on Jan 19, 2023
ywqzzy pushed a commit to ywqzzy/tiflash_1 that referenced this issue Feb 13, 2023