-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Be more rigorous about ensuring the right statements get executed. #2435
Comments
The server side pattern can be:
i.e. the server side would know when it got gazumped when writing to the command topic because the offset it produced a message at would be higher than expected. So the server side can automatically handle resyncing state, re-validating and posting to the command topic again. The executor side, as you say, will reject any statement that isn't marked as having been validated against the right version of state. |
This issue may be more prevalent than @rodesai's original description. If we have multiple servers, we validate a command with respect to our local meta-store. Imagine the following commands: 1: CREATE STREAM foo WITH (kafka_topic='bar');
2: CREATE STREAM foo WITH (kafka_topic='baz'); At time t=0, server |
In that case we'll actually do the right thing today - whichever statement gets queued second will fail every time, and we'll end up with the right internal state. But if (hypothetically) tomorrow we changed the semantics to implicitly drop the first stream and create the second (which would be a terrible idea that we should never do. just using it as an example here), having the validation described in this issue would guarantee we still get the same execution of the command topic. |
Things get a little bit more complicated with multi-line statements when we use the offset-validation approach. We send each statement as a separate command, so it is possible that even though |
Has this been fixed with #3660 @stevenpyzhang? |
This is fixed now. The scenario you mentioned here can still happen though since each command goes through the transaction protocol individually so users submitting multi-line statements could get partially successful results |
I think that's okay - we may want to create a new issue to track that, but I'm closing this one out for now. Thanks! |
There is ongoing work (#2329) to decouple statement validation when executing the command topic from external validation. However, even our own internal validation can have different outcomes across different runs of the KSQL server. Consider the following scenario:
Recently there was a validation bug in KafkaStreams () where streams would accept windows larger than the default retention time. Such a window should not be accepted because it is not recoverable across node failures. Once the bug is fixed, KafkaStreams will reject such windows.
KSQL may have a command topic containing a statement with such a window that was accepted by the cluster. After an upgrade, the statement will be rejected, at which point KSQL's internal state may be corrupted. Though it would be reasonable to mark the query as failed, the final set of streams and tables should be the same.
We should be even more rigorous about this to ensure that the same statements are executed every time the command topic is replayed. I propose we do the following:
- When writing commands to the command topic, include a validation signature. This can be the command topic offset for which validation was done (we'd need some way to synchronize getting the sandbox with the executor thread in this case), or a checksum of all our state. I think the former would be more extensible.
- When executing statements, if the current validation signature is not the same as the one in the command, reject the statement. If the signature is the same, then apply KSQL's internal state changes even if the statement itself fails. If it fails we should have some way of communicating to the user that the query could not be started.
The text was updated successfully, but these errors were encountered: