Add CompressionLevel and make v2 Kafka sink default #19169
c9da936
to
de9c4ec
Compare
The `kafka_sink_config` option allows configuration of a changefeed's message delivery, Kafka server version, and batching parameters.
You can configure flushing, acknowledgments, compression, and concurrency behavior of changefeeds running to a Kafka sink with the following:

- Set the [`changefeed.sink_io_workers` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-changefeed-sink-io-workers) to configure the number of concurrent workers used by changefeeds in the cluster when sending requests to a Kafka sink. When you set `changefeed.sink_io_workers`, it will not affect running changefeeds; [pause the changefeed]({% link {{ page.version.version }}/pause-job.md %}), set `changefeed.sink_io_workers`, and then [resume the changefeed]({% link {{ page.version.version }}/resume-job.md %}). Note that this cluster setting will also affect changefeeds running to [Google Cloud Pub/Sub](#google-cloud-pub-sub) sinks and [webhook sinks](#webhook-sink).
Does it make sense to add this now that the v2 Kafka sink is the default? (This paragraph is included for Pub/Sub + Webhook too.)
please add the caveat that this only applies if running with the v2 kafka sink
We're not using the terminology `v2` Kafka sink (as requested by Rachael), so I'll reference the cluster setting etc. here.
sounds good thanks. should i update the linked issue to use that terminology too?
@@ -154,6 +146,7 @@ Field | Type | Description | Default
`"Version"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets the appropriate Kafka cluster version, which can be used to connect to [Kafka versions < v1.0](https://docs.confluent.io/platform/current/installation/versions-interoperability.html) (`kafka_sink_config='{"Version": "0.8.2.0"}'`). | `"1.0.0.0"` | |||
<a name="kafka-required-acks"></a>`"RequiredAcks"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Specifies what a successful write to Kafka is. CockroachDB [guarantees at least once delivery of messages]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-and-delivery-guarantees) — this value defines the **delivery**. The possible values are: <br><br>`"ONE"`: a write to Kafka is successful once the leader node has committed and acknowledged the write. Note that this has the potential risk of dropped messages; if the leader node acknowledges before replicating to a quorum of other Kafka nodes, but then fails.<br><br>`"NONE"`: no Kafka brokers are required to acknowledge that they have committed the message. This will decrease latency and increase throughput, but comes at the cost of lower consistency.<br><br>`"ALL"`: a quorum must be reached (that is, most Kafka brokers have committed the message) before the leader can acknowledge. This is the highest consistency level. {% include {{ page.version.version }}/cdc/kafka-acks.md %} | `"ONE"` | |||
<a name="kafka-compression"></a>`"Compression"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets a compression protocol that the changefeed should use when emitting events. The possible values are: `"NONE"`, `"GZIP"`, `"SNAPPY"`, `"LZ4"`, `"ZSTD"`. | `"NONE"` | |||
<span class="version-tag">New in v24.3:</span>`"CompressionLevel"` | [`INT`]({% link {{ page.version.version }}/int.md %}) | Sets the level of compression. This determines the level of compression ratio versus compression speed, i.e., how much the data size is reduced (better compression) and how quickly the compression process is completed. The compression protocols have the following ranges:<br>`GZIP`:<ul><li>`0` no compression</li><li>`1` to `9` best speed to best compression</li><li>`-1` default</li><li>`-2` [Huffman-only compression](https://en.wikipedia.org/wiki/Huffman_coding)</li></ul>`ZSTD`:<ul><li>`1` fastest</li><li>`2` default</li><li>`3` better compression</li><li>`4` best compression</li></ul>`LZ4`<ul><li>0 fast default</li><li>`512 * N` Level N, where N is between `1` and `9`. The higher the number, the better compression</li></ul>**Note:** If you have the `changefeed.new_kafka_sink.enabled` cluster setting disabled, `CompressionLevel` will not affect `LZ4` compression. `SNAPPY` does not support `CompressionLevel`. | `GZIP`: `-1`<br><br>`ZSTD`: `2`<br><br>`LZ4`: `0` |
would you mind cleaning this up a bit? capitalization, more words, english, etc. eg - 0: No Compression
also there's a gzip compression level -3: stateless compression
also, i just found that in kafkav2, it won't let you set compression level < 0. this should be a known issue i guess. just filed an issue for it: cockroachdb/cockroach#136492
@asg0451 I changed this up a bit, the table for all the information wasn't working so I added some subsections with the table acting more as a list. You'll have to excuse the diff noise of the other field subsections. The `LZ4` values, I haven't added those; I've instead stuck to the same format as `GZIP` using the `LZ4` levels, let me know what you think.
For the default `GZIP` value, I've left it as `-1`, but noted you can't manually set this as such. This is what I interpreted from your Slack message, so I may be wrong here.
I added the known limitation and tried to be clear about how this does/doesn't apply.
PTAL! Here's the preview: https://deploy-preview-19169--cockroachdb-docs.netlify.app/docs/v24.3/changefeed-sinks.html#kafka-sink-configuration
sorry for the confusion -- in the screenshot above the values you specify are the ones in grey, not the constant names. so for "Level1" you specify 512, "Level2" 1024, etc.
Ahhhh, I'm sorry, I didn't grasp that the values were what the user had to specify. I will update.
@@ -0,0 +1 @@
Changefeeds created in v24.3 of CockroachDB that emit to [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka), or changefeeds created in earlier versions with the `changefeed.new_kafka_sink.enabled` cluster setting enabled, do not support negative compression level values in the [`kafka_sink_config = {... "CompressionLevel" = ...}`]({% link {{ page.version.version }}/changefeed-sinks.md %}#compressionlevel) option field. [#136492](https://github.com/cockroachdb/cockroach/issues/136492) |
can you call out that this is specifically for GZIP pls
-------------------+---------------------+------------------+------------------- | ||
`"ClientID"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Applies a Kafka client ID per changefeed. Configure [quotas](https://kafka.apache.org/documentation/#quotas) within your Kafka configuration that apply to a unique client ID. The `ClientID` field can only contain the characters `A-Za-z0-9._-`. For more details, refer to [`ClientID`](#clientid). | "" | ||
<a name="kafka-compression"></a>`"Compression"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets a compression protocol that the changefeed should use when emitting events. The possible values are: `"NONE"`, `"GZIP"`, `"SNAPPY"`, `"LZ4"`, `"ZSTD"`. | `"NONE"` | ||
<span class="version-tag">New in v24.3:</span>`"CompressionLevel"` | [`INT`]({% link {{ page.version.version }}/int.md %}) | Sets the level of compression. This determines the level of compression ratio versus compression speed, i.e., how much the data size is reduced (better compression) and how quickly the compression process is completed. For the compression protocol ranges, refer to [`CompressionLevel`](#compressionlevel).<br><br>**Note:** If you have the `changefeed.new_kafka_sink.enabled` cluster setting disabled, `CompressionLevel` will not affect `LZ4` compression. `SNAPPY` does not support `CompressionLevel`. | `GZIP`: `-1`<br><br>`ZSTD`: `2`<br><br>`LZ4`: `0` |
while accurate, i'm not sure how i feel about listing GZIP's default as -1 given that the user can't set that with the v2 sink. maybe there's a better way to explain the defaults than to list the magic values.
I think this is going to be pretty awkward given there are "default" compression levels. Throughout the docs, we provide the default values for options/settings etc.
yeah i know. kinda awkward either way i guess. up to you
@@ -0,0 +1 @@
Changefeeds created in v24.3 of CockroachDB that emit to [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka), or changefeeds created in earlier versions with the `changefeed.new_kafka_sink.enabled` cluster setting enabled, do not support negative compression level values for `GZIP` compression in the [`kafka_sink_config = {... "CompressionLevel" = ...}`]({% link {{ page.version.version }}/changefeed-sinks.md %}#compressionlevel) option field. [#136492](https://github.com/cockroachdb/cockroach/issues/136492) |
Added `GZIP` in here.
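The known limitation above could also be caught client-side before a changefeed is created. The following is a minimal sketch, not CockroachDB code: `build_kafka_sink_config` and `ALLOWED_LEVELS` are hypothetical names, and the ranges are assumptions taken from the values discussed in this thread (including the restriction that negative `GZIP` levels are not accepted by the newer Kafka sink).

```python
import json

# Assumed ranges, collected from the values discussed in this thread.
# They are illustrative and not an authoritative list.
ALLOWED_LEVELS = {
    # Negative GZIP levels (-1, -2, -3) are excluded per the known limitation #136492.
    "GZIP": set(range(0, 10)),
    "ZSTD": {1, 2, 3, 4},
    # 0 (fast) plus the nine doubling values 512, 1024, ..., 131072.
    "LZ4": {0} | {512 << n for n in range(9)},
}


def build_kafka_sink_config(compression, level=None):
    """Return a JSON string suitable for the kafka_sink_config option.

    Hypothetical helper: raises ValueError for a level that the thread
    describes as unsupported for the chosen compression protocol.
    """
    config = {"Compression": compression}
    if level is not None:
        if compression not in ALLOWED_LEVELS:
            # For example, SNAPPY does not support CompressionLevel.
            raise ValueError(f"{compression} does not support CompressionLevel")
        if level not in ALLOWED_LEVELS[compression]:
            raise ValueError(
                f"unsupported CompressionLevel {level} for {compression}"
            )
        config["CompressionLevel"] = level
    return json.dumps(config)
```

The returned string is what would be passed as the value of `kafka_sink_config`, in the same way as the `kafka_sink_config='{"Version": "0.8.2.0"}'` example in the field table.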
{{site.data.alerts.callout_info}}
`changefeed.sink_io_workers` only applies to Kafka sinks created in v24.2.1+, or if the `changefeed.new_kafka_sink.enabled` cluster setting has been enabled in CockroachDB clusters running v23.2.10+ and v24.1.4+.
{{site.data.alerts.end}}
Added this note above re how the concurrent worker cluster setting interacts with the newer Kafka sink. I think I have the versioning and such correct here.
-------------------+---------------------+------------------+------------------- | ||
`"ClientID"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Applies a Kafka client ID per changefeed. Configure [quotas](https://kafka.apache.org/documentation/#quotas) within your Kafka configuration that apply to a unique client ID. The `ClientID` field can only contain the characters `A-Za-z0-9._-`. For more details, refer to [`ClientID`](#clientid). | "" | ||
<a name="kafka-compression"></a>`"Compression"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets a compression protocol that the changefeed should use when emitting events. The possible values are: `"NONE"`, `"GZIP"`, `"SNAPPY"`, `"LZ4"`, `"ZSTD"`. | `"NONE"` | ||
<span class="version-tag">New in v24.3:</span>`"CompressionLevel"` | [`INT`]({% link {{ page.version.version }}/int.md %}) | Sets the level of compression. This determines the level of compression ratio versus compression speed, i.e., how much the data size is reduced (better compression) and how quickly the compression process is completed. For the compression protocol ranges, refer to [`CompressionLevel`](#compressionlevel).<br><br>**Note:** If you have the `changefeed.new_kafka_sink.enabled` cluster setting disabled, `CompressionLevel` will not affect `LZ4` compression. `SNAPPY` does not support `CompressionLevel`. | Refer to [`CompressionLevel`](#compressionlevel) |
For the Default column here, I took out the values and referred readers to the section with the fuller explanations (including the description re the known limitation for GZIP). I hope that's a good compromise, i.e., removing the default values without context and listing the defaults only in the section with the context.
{% comment %}
These values are not available yet per KL #136492
- `-1`: Default compression
- `-2`: [Huffman-only compression](https://en.wikipedia.org/wiki/Huffman_coding)
- `-3`: Stateless compression
{% endcomment %}
I've commented this out, but can fully remove.
- `LZ4`: The following list represents the supported values from fastest compression to best compression:
    - `0`: Fastest compression (Default)
    - `512`
    - `1024`
    - `2048`
    - `4096`
    - `8192`
    - `16384`
    - `32768`
    - `65536`
    - `131072`: Best compression
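
The `LZ4` values in this list double at each step, which matches the earlier clarification in this thread that "Level1" is `512` and "Level2" is `1024`. A minimal sketch of that mapping, assuming the doubling pattern holds for all nine levels; `lz4_level_value` is a hypothetical helper name used only for illustration:

```python
def lz4_level_value(n: int) -> int:
    """Map an LZ4 level number (0 to 9) to the CompressionLevel value to specify.

    Level 0 is the fast default; levels 1 to 9 are assumed to double
    from 512 up to 131072, as in the list above.
    """
    if n == 0:
        return 0
    if not 1 <= n <= 9:
        raise ValueError("LZ4 level must be between 0 and 9")
    return 512 << (n - 1)


# Print every level next to the value a user would set in "CompressionLevel".
for level in range(10):
    print(level, lz4_level_value(level))
```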
Hope I have now understood this!
LGTM!
4fdb65c
to
3a37922
Compare
Fixes DOC-11339, DOC-10867, DOC-10830, DOC-10700
This PR:
- `kafka_sink_config` option with the `CompressionLevel` field in v24.3.
- `changefeed.sink_io_workers` under Kafka for the default v2 sink.

Rendered Preview
https://deploy-preview-19169--cockroachdb-docs.netlify.app/docs/v24.3/changefeed-sinks.html#kafka-sink-configuration