Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add stats about clustering status #7387

Closed
1 task done
ZhiHanZ opened this issue Aug 30, 2022 · 1 comment
Closed
1 task done

Feature: Add stats about clustering status #7387

ZhiHanZ opened this issue Aug 30, 2022 · 1 comment
Assignees
Labels
C-feature Category: feature

Comments

@ZhiHanZ
Copy link
Collaborator

ZhiHanZ commented Aug 30, 2022

Summary
Currently we support data compaction and cluster by some keys, it could be helpful for us to validate the clustering by providing some stats.
For example:

  1. Disk usage per column
  2. Clustering percentages
  3. Partition density
  4. Cluster depth by column

Tasks:

@ZhiHanZ ZhiHanZ added the C-feature Category: feature label Aug 30, 2022
@zhyass
Copy link
Member

zhyass commented Sep 2, 2022

@ZhiHanZ ,There is clustering_information. I don't know if it can meet the demand. Reference: https://docs.snowflake.com/en/sql-reference/functions/automatic_clustering_history.html

-- create table
create table t(a int, b int) cluster by(a+1);

-- insert some data to t
insert into t values(1,1),(3,3);
insert into t values(2,2),(5,5);
insert into t values(4,4);

select * from clustering_information('default','t');
+-----------------+-------------------+----------------------------+------------------+---------------+-----------------------+
| cluster_by_keys | total_block_count | total_constant_block_count | average_overlaps | average_depth | block_depth_histogram |
+-----------------+-------------------+----------------------------+------------------+---------------+-----------------------+
| ((a + 1))       |                 3 |                          1 |           1.3333 |           2.0 | {"00002":3}           |
+-----------------+-------------------+----------------------------+------------------+---------------+-----------------------+

Snowflake also has an automatic_clustering_history. maybe we can achieve this function?

AUTOMATIC_CLUSTERING_HISTORY(
      [ DATE_RANGE_START => <constant_expr> ]
      [ , DATE_RANGE_END => <constant_expr> ]
      [ , TABLE_NAME => '<string>' ] )

@zhyass zhyass closed this as completed Sep 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature Category: feature
Projects
None yet
Development

No branches or pull requests

2 participants