cdc: compress ndjson files for cloud sink #43103
Comments
I think that the steps to implement this would require adding:
Probably the solution should store an
This straightforward task adds a lot of value to the cloudstorage sink. One consideration should be how well consumers of these files tolerate compression. For example, will Snowflake accept compressed files? How painful will compression make these files to use? @piyush-singh
Snowflake supports creating custom file formats for ingestion which includes specifying compression types. The compression types they support are:
Having run through this process, this is a fairly trivial change. It would just take a one-time setup of a few minutes.
Awesome! I marked this as
@ajwerner Can I work on this issue?
Yes, you may! Let me know if you run into any issues.
Is your feature request related to a problem? Please describe.
After setting up CDC on the registration cluster, we noticed that the files sent to the S3 bucket were uncompressed and therefore consumed much more space than the logical size in the cluster. For reference, the stmtstats table was 250 GiB in the admin UI, but it was around 8 TB in S3. We should compress the CDC output files. cc @ajwerner
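To illustrate what compressing the output could look like, here is a minimal sketch in Go that writes newline-delimited JSON through a gzip writer and reads it back. It is not the cloudstorage sink's code: the `row` type, the `compressNDJSON` and `decompress` helpers, and the choice of gzip are all assumptions made for illustration, since the issue does not name a codec.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// row is a hypothetical record; real changefeed output has its own schema.
type row struct {
	ID    int    `json:"id"`
	Query string `json:"query"`
}

// compressNDJSON encodes each row as one JSON line and gzips the stream.
// gzip is an assumed codec, chosen only for this sketch.
func compressNDJSON(rows []row) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	enc := json.NewEncoder(zw) // Encode writes a trailing newline per value
	for _, r := range rows {
		if err := enc.Encode(r); err != nil {
			return nil, err
		}
	}
	// Close flushes buffered data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress returns the original NDJSON bytes, as a consumer would see them.
func decompress(b []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Repetitive rows stand in for the kind of data that compresses well.
	rows := make([]row, 1000)
	for i := range rows {
		rows[i] = row{ID: i, Query: "SELECT * FROM stmtstats WHERE id = $1"}
	}
	gz, err := compressNDJSON(rows)
	if err != nil {
		panic(err)
	}
	raw, err := decompress(gz)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw: %d bytes, gzipped: %d bytes\n", len(raw), len(gz))
}
```

Because the gzip writer wraps the same `io.Writer` that the JSON encoder targets, the encoding loop itself does not change; only the writer setup and the final `Close` differ from the uncompressed path. How much space this saves depends on the data, so the sizes printed here say nothing about the 250 GiB versus 8 TB case above.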