fix: allow streams topic prefixed configs #3691
# Note: the value 3 requires at least 3 brokers in your Kafka cluster.
# Configure underlying Kafka Streams internal topics in order to achieve better fault tolerance and
# durability, even in the face of Kafka broker failures. Highly recommended for mission critical applications.
# Note that value 3 requires at least 3 brokers in your kafka cluster
-# Note that value 3 requires at least 3 brokers in your kafka cluster
+# Note that value 3 requires at least 3 brokers in your Kafka cluster.
done
@@ -566,5 +568,8 @@ When deploying KSQL to production, the following settings are recommended in you
 # Bump the number of replicas for state storage for stateful operations
 # like aggregations and joins. By having two replicas (one main and one
 # standby) recovery from node failures is quicker since the state doesn't
-# have to be rebuilt from scratch.
+# have to be rebuilt from scratch. This configuration is also essential for
+# pull queries to be highly available during node failures
-# pull queries to be highly available during node failures
+# pull queries to be highly available during node failures.
done
ksql.streams.num.standby.replicas=1
For your convenience, a sample file is provided at ``<path-to-ksql-repo>/config/ksql-production-server.properties``
People may not have cloned the KSQL repo; should we just put the URL here? https://github.com/confluentinc/ksql/tree/master/config
I considered that. But what if this config changes between versions? And the installation directory seems to contain only binaries.
LGTM, with a couple of suggestions.
Force-pushed from b4a97be to e8c053d
some inline comments and suggestions, nothing major
# Set the batch expiry to Integer.MAX_VALUE to ensure that queries will not
# terminate if the underlying Kafka cluster is unavailable for a period of
# time.
ksql.streams.producer.delivery.timeout.ms=2147483647
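To put the reviewer's "will it just hang?" question in numbers, a quick sketch of the arithmetic: 2147483647 is exactly `Integer.MAX_VALUE`, and read as milliseconds it is roughly 24.8 days. Since `delivery.timeout.ms` bounds how long the producer takes to report success or failure for a send, a send under this setting could keep retrying for about that long before an error surfaces. The class and method names below are illustrative only.

```java
import java.time.Duration;

public class DeliveryTimeoutMath {
    // Value of ksql.streams.producer.delivery.timeout.ms in the recommended settings.
    static final long DELIVERY_TIMEOUT_MS = 2147483647L;

    // Whole days covered by the timeout.
    static long wholeDays() {
        return Duration.ofMillis(DELIVERY_TIMEOUT_MS).toDays();
    }

    // Hours left over after the whole days (Duration.toHoursPart needs Java 9+).
    static int remainingHours() {
        return Duration.ofMillis(DELIVERY_TIMEOUT_MS).toHoursPart();
    }

    public static void main(String[] args) {
        System.out.println(DELIVERY_TIMEOUT_MS == Integer.MAX_VALUE); // true
        System.out.println(wholeDays() + " days, " + remainingHours() + " hours"); // 24 days, 20 hours
    }
}
```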
do we really want the default to be set to something that's so high? will the behavior just hang? sometimes I'd prefer to have a hard error to notify me that something is wrong rather than silently suppress it. Same question goes for the setting below.
I don't know why this value was chosen myself. I just kept what's already documented here: https://docs.confluent.io/current/ksql/docs/installation/server-config/config-reference.html#recommended-ksql-production-settings
I share some of your views, but since this is subjective I will leave it as-is. There are probably other ways of getting alerted or notified (e.g., processing lag).
I think we should revisit some of these default settings. A lot of them filtered in here from specific applications where it was better to block forever than to time out (and then have the Streams app die). But in the general case, these may not be the best values.
Shall we file an issue to track follow up work to update these with a better general set of values?
sg. created #3706
# Note that value 3 requires at least 3 brokers in your kafka cluster
# See https://docs.confluent.io/current/streams/developer-guide/config-streams.html#recommended-configuration-parameters-for-resiliency
ksql.streams.replication.factor=3
ksql.streams.producer.acks=all
do we really need this for all of the producers in ksql streams? I feel like we should have a setting just for our internal topics instead of prescribing this setting to all streams apps (and maybe we shouldn't even make it configurable but rather require that setting)
Any Streams application that wants to avoid losing data has to have this. This way of configuring Kafka for no data loss is well documented, even alongside, say, transactions, so it's a fairly standard thing to do. Otherwise, what you are computing is just an approximation. The downside of this configuration compared with rf=1, acks!=all is slower performance. I still prefer to let users start from a correct state and then tune for performance.
I think there is still merit in making it configurable, e.g., for a replication factor greater than 3.
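To make the `acks=all` trade-off concrete, here is a simplified sketch of the availability arithmetic for a topic with `replication.factor` replicas and a topic-level `min.insync.replicas` (which this PR makes settable through the `ksql.streams.topic.` prefix). With `acks=all`, a partition keeps accepting writes only while its in-sync replica set has at least `min.insync.replicas` members, so it can lose `replication.factor - min.insync.replicas` replicas and stay writable. The value 2 for `min.insync.replicas` is an assumption, and the model ignores leader-election details.

```java
public class IsrMath {
    // Simplified model: with acks=all, a partition accepts writes only while its
    // in-sync replica set has at least minInsyncReplicas members.
    static int tolerableReplicaFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "expected 1 <= min.insync.replicas <= replication.factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        // replication.factor=3 as recommended above; min.insync.replicas=2 is assumed.
        System.out.println(tolerableReplicaFailuresForWrites(3, 2)); // 1
        // rf=1 leaves no headroom at all.
        System.out.println(tolerableReplicaFailuresForWrites(1, 1)); // 0
    }
}
```

This is also why a replication factor above 3 can be worth configuring: each extra replica adds one more tolerable failure for the same `min.insync.replicas`.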
# Set the storage directory for stateful operations like aggregations and
# joins to be at a durable location. By default, they are stored in /tmp.
ksql.streams.state.dir=/some/non-temporary-storage-path/
this is something that the user needs to actively change; is there any way we can bring attention to it?
The server will refuse to start and will error out when opening that path. Hopefully that will draw attention to it. I will add a note similar to the one we have for replication factor above.
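Beyond failing at startup on a bad path, one way to call attention to `ksql.streams.state.dir` would be an explicit pre-flight check. The sketch below is hypothetical (neither `StateDirCheck` nor `problems` exists in KSQL): it flags a state directory that sits under `/tmp` or the JVM's `java.io.tmpdir`, or that is missing or not writable, and returns human-readable messages instead of throwing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class StateDirCheck {
    // Hypothetical check, not part of KSQL: collects reasons why a configured
    // state directory looks unsuitable for production.
    static List<String> problems(String stateDir) {
        List<String> out = new ArrayList<>();
        Path dir = Paths.get(stateDir).toAbsolutePath().normalize();
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir")).toAbsolutePath().normalize();
        // Path.startsWith compares whole path elements, so "/tmpfoo" is not under "/tmp".
        if (dir.startsWith(tmp) || dir.startsWith(Paths.get("/tmp"))) {
            out.add("state dir is under a temporary directory: " + dir);
        }
        if (!Files.isDirectory(dir)) {
            out.add("state dir does not exist or is not a directory: " + dir);
        } else if (!Files.isWritable(dir)) {
            out.add("state dir is not writable: " + dir);
        }
        return out;
    }

    public static void main(String[] args) {
        // A /tmp location is always flagged; the placeholder path from the
        // sample config is flagged only because it does not exist here.
        System.out.println(problems("/tmp/kafka-streams"));
        System.out.println(problems("/some/non-temporary-storage-path/"));
    }
}
```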
@@ -77,7 +77,8 @@
     if (propertyName.startsWith(KSQL_STREAMS_PREFIX)
         && !propertyName.startsWith(KSQL_STREAMS_PREFIX + StreamsConfig.PRODUCER_PREFIX)
-        && !propertyName.startsWith(KSQL_STREAMS_PREFIX + StreamsConfig.CONSUMER_PREFIX)) {
+        && !propertyName.startsWith(KSQL_STREAMS_PREFIX + StreamsConfig.CONSUMER_PREFIX)
+        && !propertyName.startsWith(KSQL_STREAMS_PREFIX + StreamsConfig.TOPIC_PREFIX)) {
       return Optional.empty(); // Unknown streams config
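A standalone sketch of the condition after this change, assuming the usual constant values (`ksql.streams.` for `KSQL_STREAMS_PREFIX`, and `producer.`, `consumer.`, `topic.` for the `StreamsConfig` prefixes). It models only this branch: per the `// Unknown streams config` comment, the real method returns `Optional.empty()` here, and whatever lookup of non-prefixed Streams config names the method does elsewhere is not reproduced. `StreamsPrefixFilter` and `isUnknownStreamsConfig` are illustrative names.

```java
import java.util.List;

public class StreamsPrefixFilter {
    // Assumed to match KsqlConfig.KSQL_STREAMS_PREFIX.
    static final String KSQL_STREAMS_PREFIX = "ksql.streams.";
    // Assumed to match StreamsConfig.PRODUCER_PREFIX, CONSUMER_PREFIX and TOPIC_PREFIX.
    static final List<String> PASS_THROUGH_PREFIXES = List.of("producer.", "consumer.", "topic.");

    // True when a ksql.streams.* name falls outside all three sub-prefixes,
    // i.e. the case where the diff returns Optional.empty().
    static boolean isUnknownStreamsConfig(String propertyName) {
        if (!propertyName.startsWith(KSQL_STREAMS_PREFIX)) {
            return false; // not a streams property, so this branch does not apply
        }
        for (String prefix : PASS_THROUGH_PREFIXES) {
            if (propertyName.startsWith(KSQL_STREAMS_PREFIX + prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Treated as unknown before this PR, passed through after it.
        System.out.println(isUnknownStreamsConfig("ksql.streams.topic.min.insync.replicas")); // false
        System.out.println(isUnknownStreamsConfig("ksql.streams.producer.acks")); // false
        System.out.println(isUnknownStreamsConfig("ksql.streams.no.such.prefix")); // true
    }
}
```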
out of scope for this PR, but I'm wondering why we don't let people pass in any arbitrary streams config and let streams do the verification here
we seem to have invested a lot into resolving configs in specific ways. I don't know the full context. But yeah, we could keep it simpler and pass everything with the prefix to Streams. As of now, though, these three prefixed config types are the only ones Streams documents, so this may be OK for now.
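For comparison, the simpler alternative raised in this thread (forward every `ksql.streams.`-prefixed property to Kafka Streams with the prefix stripped, and leave validation to Streams) might look roughly like the sketch below. It is hypothetical: `StreamsPassThrough` and `toStreamsProps` are illustrative names, and this is not how KSQL currently resolves configs.

```java
import java.util.Map;
import java.util.TreeMap;

public class StreamsPassThrough {
    static final String KSQL_STREAMS_PREFIX = "ksql.streams.";

    // Hypothetical: copy every ksql.streams.* entry with the prefix removed and
    // drop everything else; no per-name validation happens here.
    static Map<String, String> toStreamsProps(Map<String, String> ksqlProps) {
        Map<String, String> out = new TreeMap<>(); // sorted for stable printing
        for (Map.Entry<String, String> e : ksqlProps.entrySet()) {
            if (e.getKey().startsWith(KSQL_STREAMS_PREFIX)) {
                out.put(e.getKey().substring(KSQL_STREAMS_PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ksqlProps = Map.of(
            "ksql.streams.producer.acks", "all",
            "ksql.streams.topic.min.insync.replicas", "2",
            "ksql.service.id", "default_");
        System.out.println(toStreamsProps(ksqlProps)); // {producer.acks=all, topic.min.insync.replicas=2}
    }
}
```

The trade-off is that typos are no longer caught at the KSQL layer; they would only be flagged if Streams itself reports them.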
Fixes confluentinc#817
- Adds new property file for production settings
- Changes to allow topic-prefixed streams configs such as min.insync.replicas
- Unit tests added
- Verified locally that the property file creates topics with the correct configuration
Force-pushed from e8c053d to 81cb246
LGTM! I'm convinced
Fixes #817
Description
Testing done
Reviewer checklist