[Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close #19389

Merged
merged 3 commits on May 23, 2023


gitccl
Contributor

@gitccl gitccl commented May 8, 2023

Proposed changes

Issue Number: close #19283

Problem summary

First, to reduce memory usage, blocks are no longer pre-allocated. Instead, a block is allocated lazily when the upper layer calls get_free_block. When the upper layer hands a block back through return_free_block, the block is pushed onto a queue for reuse, and the queued blocks are freed when the scanner_context is closed rather than when it is destructed.
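The allocation policy above can be sketched as a small pool. This is a minimal, self-contained illustration, not the code in this PR: `Block`, `BlockPool`, `allocated_count` and `queued_count` are hypothetical names, and only `get_free_block`, `return_free_block` and the "free on close" behavior come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine's block type; only its buffer matters here.
struct Block {
    std::vector<char> data;
};

// Sketch of a per-context pool: allocate on demand, reuse returned blocks,
// and release queued blocks in close() instead of in the destructor.
class BlockPool {
public:
    // Hands out a queued block if one is available; otherwise allocates lazily.
    std::unique_ptr<Block> get_free_block() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (!_free_blocks.empty()) {
            std::unique_ptr<Block> block = std::move(_free_blocks.front());
            _free_blocks.pop_front();
            return block;
        }
        ++_allocated;
        return std::make_unique<Block>();
    }

    // Clears the block's contents (keeping its capacity) and queues it for reuse.
    void return_free_block(std::unique_ptr<Block> block) {
        assert(block != nullptr);
        block->data.clear();
        std::lock_guard<std::mutex> lock(_mutex);
        _free_blocks.push_back(std::move(block));
    }

    // Frees every queued block as soon as the context is closed.
    void close() {
        std::lock_guard<std::mutex> lock(_mutex);
        _free_blocks.clear();
    }

    std::size_t allocated_count() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _allocated;
    }

    std::size_t queued_count() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _free_blocks.size();
    }

private:
    mutable std::mutex _mutex;
    std::deque<std::unique_ptr<Block>> _free_blocks;
    std::size_t _allocated = 0;  // total blocks ever allocated by this pool
};
```

A returned block is handed out again before any new allocation happens, so the pool only grows to the peak number of blocks in flight at once.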
Second, to limit the scanners' memory usage, we introduce a variable, _free_blocks_capacity, that tracks how many free blocks are currently available to the scanners. The number of scanners that can be scheduled is calculated from this value.
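The PR does not spell out how the scanner count is derived from _free_blocks_capacity. The sketch below assumes, purely for illustration, that each scanner may hold a fixed number of blocks (`blocks_per_scanner`) and that the budget shrinks when a block is taken and grows when it is returned; `FreeBlockBudget` and its methods are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical budget tracker built around a counter named after _free_blocks_capacity.
class FreeBlockBudget {
public:
    explicit FreeBlockBudget(int32_t capacity) : _free_blocks_capacity(capacity) {
        assert(capacity >= 0);
    }

    // A scanner took a block out of the budget.
    void on_get_block() { _free_blocks_capacity.fetch_sub(1); }

    // A block was handed back, so it counts as available again.
    void on_return_block() { _free_blocks_capacity.fetch_add(1); }

    // Assumption: each scanner needs `blocks_per_scanner` blocks, so the number of
    // scanners to schedule is bounded by both the budget and the pending scanners.
    int32_t schedulable_scanners(int32_t blocks_per_scanner, int32_t pending_scanners) const {
        const int32_t capacity = _free_blocks_capacity.load();
        if (capacity <= 0 || blocks_per_scanner <= 0 || pending_scanners <= 0) {
            return 0;
        }
        return std::min(capacity / blocks_per_scanner, pending_scanners);
    }

    int32_t capacity() const { return _free_blocks_capacity.load(); }

private:
    std::atomic<int32_t> _free_blocks_capacity;
};
```

For example, with a budget of 8 blocks and 2 blocks per scanner, at most 4 scanners would be scheduled; after 3 blocks are taken, the remaining budget of 5 allows 2.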

SSB flat test results

Before

  • lineorder 1.2G:
    • load time: 3s, query time: 0.355s
  • lineorder 5.8G:
    • load time: 330s, query time: 0.970s
    • load time: 349s, query time: 0.949s
    • load time: 349s, query time: 0.955s
    • load time: 360s, query time: 0.889s (pipeline enabled)

After

  • lineorder 1.2G:
    • load time: 3s, query time: 0.349s
  • lineorder 5.8G:
    • load time: 342s, query time: 0.929s
    • load time: 337s, query time: 0.913s
    • load time: 345s, query time: 0.946s
    • load time: 346s, query time: 0.865s (pipeline enabled)
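As a rough check on these numbers, the three non-pipeline 5.8G runs average about 0.958 s before and about 0.929 s after, roughly a 3% reduction in query time. With only three runs per side this is a point estimate, not a significance test. The helper names below are illustrative.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of run times (seconds).
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// Relative change in percent; a negative value means the "after" runs were faster.
double relative_change_pct(double before, double after) {
    return (after - before) / before * 100.0;
}
```

Applied to the single pipeline-enabled runs (0.889 s before, 0.865 s after), the same formula gives about -2.7%.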

Checklist (Required)

  • Does it affect the original behavior
  • Have unit tests been added
  • Has documentation been added or modified
  • Does it need to update dependencies
  • Does this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@gitccl gitccl marked this pull request as ready for review May 8, 2023 08:39
@gitccl
Contributor Author

gitccl commented May 8, 2023

run buildall

@github-actions
Contributor

github-actions bot commented May 8, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

hello-stephen commented May 8, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.57 seconds
stream load tsv: 422 seconds loaded 74807831229 Bytes, about 169 MB/s
stream load json: 22 seconds loaded 2358488459 Bytes, about 102 MB/s
stream load orc: 60 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230511074731_clickbench_pr_142206.html
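The "about N MB/s" figures are consistent with bytes divided by seconds and by 2^20 (that is, MiB/s), truncated to a whole number. For example, 74807831229 bytes in 422 seconds is about 169.06 MiB/s but about 177.27 decimal MB/s. This is inferred from the numbers alone; how the pipeline actually computes them is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Converts a byte count and a duration into MiB/s (1 MiB = 2^20 bytes).
double mib_per_second(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / (1024.0 * 1024.0);
}

// Same conversion with decimal megabytes (1 MB = 10^6 bytes), for comparison.
double mb_per_second(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e6;
}
```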

@gitccl gitccl force-pushed the enhance_scanner branch from 077b4c7 to ba170d4 May 9, 2023 08:00
@gitccl
Contributor Author

gitccl commented May 9, 2023

run buildall

@github-actions
Contributor

github-actions bot commented May 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@gitccl
Contributor Author

gitccl commented May 9, 2023

run p0

@gitccl gitccl force-pushed the enhance_scanner branch from ba170d4 to e061256 May 10, 2023 01:23
@gitccl
Contributor Author

gitccl commented May 10, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl
Contributor Author

gitccl commented May 10, 2023

run p0

@yiguolei
Contributor

This may cause a large performance regression. With this change, a block is allocated by a scanner thread but used, and possibly released, by a fragment thread. jemalloc tracks which arena each allocation came from, and the memory must be returned to that same arena when it is released. Because every thread is bound to an arena, this can lead to lock contention.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl
Contributor Author

gitccl commented May 11, 2023

run buildall

@gitccl
Contributor Author

gitccl commented May 12, 2023

> This may cause a large performance regression. With this change, a block is allocated by a scanner thread but used, and possibly released, by a fragment thread. jemalloc tracks which arena each allocation came from, and the memory must be returned to that same arena when it is released. Because every thread is bound to an arena, this can lead to lock contention.

There seems to be no performance loss in the SSB flat test; I have posted the results in the description above.

@gitccl gitccl force-pushed the enhance_scanner branch from 6958505 to 48fa685 May 23, 2023 07:58
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl
Contributor Author

gitccl commented May 23, 2023

run buildall

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.11 seconds
stream load tsv: 442 seconds loaded 74807831229 Bytes, about 161 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 58 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 80.0 seconds inserted 10000000 Rows, about 125K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230523084143_clickbench_pr_148829.html

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit 6efe6ef into apache:master May 23, 2023
@gitccl gitccl deleted the enhance_scanner branch May 24, 2023 02:15
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request May 31, 2023
yiguolei pushed a commit that referenced this pull request Aug 19, 2023
yiguolei pushed a commit that referenced this pull request Aug 22, 2023

Successfully merging this pull request may close these issues.

[Enhancement] allocate blocks in scanner_context on demand and free them timely
3 participants