feat(storage): distributed execution of recluster #13048

Merged
merged 16 commits into databendlabs:main from recluster_dis
Oct 25, 2023

@zhyass zhyass commented Sep 27, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

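Adds the setting enable_distributed_recluster, which enables distributed execution of table recluster. Default: 0 (disabled). The session below runs recluster final on test_order with the setting on, then off.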

> set global enable_distributed_recluster=1;

SET
  GLOBAL enable_distributed_recluster = 1

0 row read in 0.601 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

> alter table test_order recluster final;

ALTER TABLE
  test_order recluster final

202563631 rows written in 431.972 sec. Processed 202.56 million rows, 115.82 GiB (468.93 thousand rows/s, 274.56 MiB/s)

> select * from clustering_information('zzq','test_order');

SELECT
  *
FROM
  clustering_information('zzq', 'test_order')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│    cluster_key    │ total_block_count │ constant_block_count │ unclustered_block_count │ average_overlaps │ average_depth │   block_depth_histogram  │
│       String      │       UInt64      │        UInt64        │          UInt64         │      Float64     │    Float64    │          Variant         │
├───────────────────┼───────────────────┼──────────────────────┼─────────────────────────┼──────────────────┼───────────────┼──────────────────────────┤
│ (id, insert_time) │               529 │                   91 │                       0 │          15.8261 │       16.8261 │ {"00001":437,"00128":92} │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row read in 1.334 sec. Processed 1 row, 448 B (0.75 row/s, 335 B/s)
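
As a reading aid for the clustering_information row above, here is a minimal sketch that checks two relationships visible in the output. The values are copied from the row. The interpretation (histogram bucket counts adding up to total_block_count, and average_depth being average_overlaps plus one) is an observation from this output, not a statement about how Databend computes these columns.

```python
import json

# Values copied from the clustering_information row above.
total_block_count = 529
average_overlaps = 15.8261
average_depth = 16.8261
block_depth_histogram = json.loads('{"00001":437,"00128":92}')

# Sum the block counts across all depth buckets in the histogram.
blocks_in_histogram = sum(block_depth_histogram.values())

# Difference between the two averages, rounded to the printed precision.
depth_minus_overlaps = round(average_depth - average_overlaps, 4)

print(blocks_in_histogram == total_block_count)
print(depth_minus_overlaps)
```

In this row the histogram accounts for every block, and the two averages differ by exactly one.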

> set global enable_distributed_recluster=0;

SET
  GLOBAL enable_distributed_recluster = 0

0 row read in 0.601 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

> alter table test_order recluster final;

ALTER TABLE
  test_order recluster final

222679181 rows written in 939.860 sec. Processed 222.68 million rows, 129.19 GiB (236.93 thousand rows/s, 140.76 MiB/s)

> select * from clustering_information('zzq','test_order');

SELECT
  *
FROM
  clustering_information('zzq', 'test_order')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│    cluster_key    │ total_block_count │ constant_block_count │ unclustered_block_count │ average_overlaps │ average_depth │   block_depth_histogram  │
│       String      │       UInt64      │        UInt64        │          UInt64         │      Float64     │    Float64    │          Variant         │
├───────────────────┼───────────────────┼──────────────────────┼─────────────────────────┼──────────────────┼───────────────┼──────────────────────────┤
│ (id, insert_time) │               543 │                   80 │                       0 │          11.9337 │       12.9337 │ {"00001":462,"00128":81} │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row read in 1.886 sec. Processed 1 row, 448 B (0.03 row/s, 12 B/s)
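
To put the two recluster runs side by side, the sketch below recomputes throughput from the "rows written" lines above. The figures are copied from those lines; the dictionary and function names are illustrative only. The two runs wrote different row counts, so the ratio is a rough indication and not a controlled benchmark.

```python
# Figures copied from the two "rows written" lines in the session above.
distributed = {"rows": 202_563_631, "seconds": 431.972}      # setting = 1
non_distributed = {"rows": 222_679_181, "seconds": 939.860}  # setting = 0


def rows_per_second(run):
    """Return the write throughput of one recluster run in rows per second."""
    return run["rows"] / run["seconds"]


# Ratio of throughputs: values above 1 mean the distributed run was faster.
throughput_ratio = rows_per_second(distributed) / rows_per_second(non_distributed)

print(f"distributed:     {rows_per_second(distributed):,.0f} rows/s")
print(f"non-distributed: {rows_per_second(non_distributed):,.0f} rows/s")
print(f"ratio:           {throughput_ratio:.2f}x")
```

On these numbers the distributed run sustains roughly twice the throughput. Note that the clustering_information output after the non-distributed run shows a lower average_depth (12.9337 versus 16.8261), so speed is not the only difference between the two results.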

  • Closes #issue

@zhyass zhyass marked this pull request as draft September 27, 2023 05:26
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Sep 27, 2023
@BohuTANG BohuTANG mentioned this pull request Sep 27, 2023
8 tasks
@zhyass zhyass force-pushed the recluster_dis branch 6 times, most recently from c00121f to dcf6e66 Compare October 12, 2023 07:26
@zhyass zhyass added the ci-cloud Build docker image for cloud test label Oct 12, 2023
@zhyass zhyass force-pushed the recluster_dis branch 2 times, most recently from 3053396 to e42175c Compare October 15, 2023 09:49
@github-actions

Docker Image for PR

  • tag: pr-13048-01d54a0

note: this image tag is only available for internal use,
please check the internal doc for more details.

@dantengsky dantengsky added this pull request to the merge queue Oct 25, 2023
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Oct 25, 2023
@BohuTANG BohuTANG merged commit 2a70285 into databendlabs:main Oct 25, 2023
60 checks passed
@zhyass zhyass deleted the recluster_dis branch November 10, 2023 17:14
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* distributed execution of recluster

* enable distribute recluster

* resolve conflict

* remove checkout prefect block

* add metrics

* fix test

* add comment

* add setting enable_distribute_recluster

* enable in optimize

* add status

* add test

* resolve conflict

---------

Co-authored-by: sundyli