[Improvement](aggregate) optimization for AggregationMethodKeysFixed::insert_keys_into_columns #22216

BiteTheDDDDt · 2023-07-25T13:47:17Z

Proposed changes

select count(1) from (
    select ss_customer_sk customer_sk,
           ss_item_sk item_sk
    from store_sales, date_dim
    where ss_sold_date_sk = d_date_sk
      and d_month_seq between 1199 and 1199 + 11
      and ss_sold_date_sk is not null
    group by ss_customer_sk,
             ss_item_sk
) a;
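For readers unfamiliar with the fixed-key method named in the title, here is a minimal sketch of the general idea, not the Doris implementation and not the change made in this PR: several fixed-width group-by values are packed into one wide integer key for the hash table, and after aggregation the keys are unpacked back into result columns. The names `PackedKey`, `pack_key` and `unpack_keys`, the 64-bit column width, and the byte layout are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 256-bit key: four 64-bit words standing in for a UInt256.
using PackedKey = std::array<std::uint64_t, 4>;

// Pack one row of two 64-bit group-by values into the low 16 bytes of a key.
// The layout (customer first, item second) is an assumption of this sketch.
inline PackedKey pack_key(std::int64_t customer_sk, std::int64_t item_sk) {
    PackedKey key{};  // zero-initialized, so the unused bytes compare equal
    auto* bytes = reinterpret_cast<unsigned char*>(key.data());
    std::memcpy(bytes, &customer_sk, sizeof(customer_sk));
    std::memcpy(bytes + sizeof(customer_sk), &item_sk, sizeof(item_sk));
    return key;
}

// Unpack the keys column by column: each pass copies one fixed-width field,
// located at a known byte offset, out of every key into its output column.
inline void unpack_keys(const std::vector<PackedKey>& keys,
                        std::vector<std::int64_t>& customer_col,
                        std::vector<std::int64_t>& item_col) {
    const std::size_t n = keys.size();
    customer_col.resize(n);
    item_col.resize(n);
    std::vector<std::int64_t>* columns[] = {&customer_col, &item_col};
    std::size_t offset = 0;
    for (auto* column : columns) {
        for (std::size_t row = 0; row < n; ++row) {
            const auto* bytes =
                    reinterpret_cast<const unsigned char*>(keys[row].data());
            std::memcpy(&(*column)[row], bytes + offset, sizeof(std::int64_t));
        }
        offset += sizeof(std::int64_t);
    }
}
```

In this sketch, packing the rows (1, 2) and (-3, 4) and then calling `unpack_keys` yields the customer column {1, -3} and the item column {2, 4}.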

uint256 (original, before this change)
12.27 sec
                              -  HashTableInputCount:  55.286154M  (55286154)
                              -  HashTableIterateTime:  347.592ms
                              -  HashTableSize:  54.116764M  (54116764)
                              -  InsertKeysToColumnTime:  1s79ms
                              -  MaxRowSizeInBytes:  0
                              -  MemoryUsage:  
                                  -  HashTable:  2.50  GB
                                  -  SerializeKeyArena:  1.63  GB
                              -  MergeTime:  0ns
                              -  PeakMemoryUsage:  4.13  GB

stringref
17.81 sec

                              -  HashTableInputCount:  55.285529M  (55285529)
                              -  HashTableIterateTime:  222.769ms
                              -  HashTableSize:  54.116764M  (54116764)
                              -  InsertKeysToColumnTime:  340.796ms
                              -  MaxRowSizeInBytes:  17
                              -  MemoryUsage:  
                                  -  HashTable:  1.50  GB
                                  -  SerializeKeyArena:  1.76  GB
                              -  MergeTime:  0ns
                              -  PeakMemoryUsage:  3.26  GB

uint256 (optimized, with this change)
11.20 sec

                              -  HashTableInputCount:  55.285554M  (55285554)
                              -  HashTableIterateTime:  278.187ms
                              -  HashTableSize:  54.116764M  (54116764)
                              -  InsertKeysToColumnTime:  467.894ms
                              -  MaxRowSizeInBytes:  0
                              -  MemoryUsage:  
                                  -  HashTable:  2.50  GB
                                  -  SerializeKeyArena:  1.63  GB
                              -  MergeTime:  0ns
                              -  PeakMemoryUsage:  4.13  GB
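
To put the two uint256 profiles side by side, the arithmetic can be sketched as below. The helper names `speedup` and `reduction` are hypothetical; the inputs are the figures reported above, reading `1s79ms` as 1079 ms.

```cpp
// Ratio of two timings (before / after); values above 1 mean "after" is faster.
constexpr double speedup(double before, double after) {
    return before / after;
}

// Fraction of the original time that was saved.
constexpr double reduction(double before, double after) {
    return (before - after) / before;
}

// Figures taken from the profiles in this description.
constexpr double kInsertKeysBeforeMs = 1079.0;   // uint256 original
constexpr double kInsertKeysAfterMs = 467.894;   // uint256 opt
constexpr double kQueryBeforeSec = 12.27;        // uint256 original
constexpr double kQueryAfterSec = 11.20;         // uint256 opt
```

With these inputs, `speedup(kInsertKeysBeforeMs, kInsertKeysAfterMs)` is roughly 2.3, and `reduction(kQueryBeforeSec, kQueryAfterSec)` is roughly 0.087, i.e. about 8.7% less total query time on this single run.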

BiteTheDDDDt · 2023-07-25T13:48:44Z

run buildall

github-actions · 2023-07-25T13:55:28Z

clang-tidy review says "All clean, LGTM! 👍"

hello-stephen · 2023-07-25T14:54:40Z

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.4 seconds
stream load tsv: 507 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162140226 Bytes

BiteTheDDDDt · 2023-07-26T05:09:51Z

run buildall

github-actions · 2023-07-26T05:15:49Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-07-26T05:17:46Z

clang-tidy review says "All clean, LGTM! 👍"

hello-stephen · 2023-07-26T06:20:05Z

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.55 seconds
stream load tsv: 508 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.9 seconds inserted 10000000 Rows, about 334K ops/s
storage size: 17156773957 Bytes

HappenLee

LGTM

github-actions · 2023-07-26T08:16:17Z

PR approved by at least one committer and no changes requested.

github-actions · 2023-07-26T08:16:19Z

PR approved by anyone and no changes requested.

BiteTheDDDDt force-pushed the opt_0725 branch from d6f0e9c to 31e2ca5 on July 26, 2023 05:09

BiteTheDDDDt added 2 commits July 26, 2023 13:09

optimization for AggregationMethodKeysFixed::insert_keys_into_columns (7b259eb)

update (b6d20e6)

BiteTheDDDDt force-pushed the opt_0725 branch from 31e2ca5 to b6d20e6 on July 26, 2023 05:09

HappenLee approved these changes Jul 26, 2023

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Jul 26, 2023

github-actions bot added the reviewed label Jul 26, 2023

zhangstar333 approved these changes Jul 26, 2023

BiteTheDDDDt merged commit 9451382 into apache:master Jul 26, 2023

xiaokang added the dev/2.0.0 (2.0.0 release) and dev/2.0.1 labels and removed the dev/2.0.0 (2.0.0 release) label Jul 26, 2023

xiaokang added dev/2.0.1-merged and removed dev/2.0.1 labels Aug 8, 2023

xiaokang pushed a commit to xiaokang/doris that referenced this pull request Aug 9, 2023

[Improvement](aggregate) optimization for AggregationMethodKeysFixed:…

7f72449

…:insert_keys_into_columns (apache#22216) optimization for AggregationMethodKeysFixed::insert_keys_into_columns

xiaokang pushed a commit that referenced this pull request Aug 11, 2023

[Improvement](aggregate) optimization for AggregationMethodKeysFixed:…

155e382

…:insert_keys_into_columns (#22216) optimization for AggregationMethodKeysFixed::insert_keys_into_columns

xiaokang mentioned this pull request Aug 30, 2023

Release Note 2.0.1 #23640

Open
