Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be more rigorous about ensuring the right statements get executed. #2435

Closed
rodesai opened this issue Feb 12, 2019 · 7 comments
Closed

Be more rigorous about ensuring the right statements get executed. #2435

rodesai opened this issue Feb 12, 2019 · 7 comments

Comments

@rodesai
Copy link
Contributor

rodesai commented Feb 12, 2019

There is ongoing work (#2329) to decouple statement validation when executing the command topic from external validation. However, even our own internal validation can have different outcomes across different runs of the KSQL server. Consider the following scenario:

Recently there was a validation bug in KafkaStreams () where streams would accept windows larger than the default retention time. Such a window should not be accepted because it is not recoverable across node failures. Once the bug is fixed, KafkaStreams will reject such windows.

KSQL may have a command topic containing a statement with such a window that was accepted by the cluster. After an upgrade, the statement will be rejected, at which point KSQL's internal state may be corrupted. Though it would be reasonable to mark the query as failed, the final set of streams and tables should be the same.

We should be even more rigorous about this to ensure that the same statements are executed every time the command topic is replayed. I propose we do the following:
- When writing commands to the command topic, include a validation signature. This can be the command topic offset for which validation was done (we'd need some way to synchronize getting the sandbox with the executor thread in this case), or a checksum of all our state. I think the former would be more extensible.
- When executing statements, if the current validation signature is not the same as the one in the command, reject the statement. If the signature is the same, then apply KSQL's internal state changes even if the statement itself fails. If it fails we should have some way of communicating to the user that the query could not be started.

@big-andy-coates
Copy link
Contributor

big-andy-coates commented Feb 18, 2019

The server side pattern can be:

  • request received
  • create sandbox (which knows which offset its built from)
  • For each statement
    • validate
    • post to command topic. (tracking offset).
    • if (offset statement was produced at != expected)
      • wait for executor to reach max tracked offset.
      • create a new sandbox, remove any distributed statements and then start looping around statements again.

i.e. the server side would know when it got gazumped when writing to the command topic because the offset it produced a message at would be higher than expected. So the server side can automatically handle resyncing state, re-validating and posting to the command topic again.

The executor side, as you say, will reject any statement that isn't marked as having been validated against the right version of state.

@agavra
Copy link
Contributor

agavra commented Feb 20, 2019

This issue may be more prevalent than @rodesai's original description. If we have multiple servers, we validate a command with respect to our local meta-store. Imagine the following commands:

1: CREATE STREAM foo WITH (kafka_topic='bar');
2: CREATE STREAM foo WITH (kafka_topic='baz');

At time t=0, server A gets command 1 and server B gets command 0. Both are able to validate the command and enqueue the CS statement. The command topic will now have conflicting statements. This proposal fixes that as well.

@rodesai
Copy link
Contributor Author

rodesai commented Feb 22, 2019

In that case we'll actually do the right thing today - whichever statement gets queued second will fail every time, and we'll end up with the right internal state. But if (hypothetically) tomorrow we changed the semantics to implicitly drop the first stream and create the second (which would be a terrible idea that we should never do. just using it as an example here), having the validation described in this issue would guarantee we still get the same execution of the command topic.

agavra added a commit to agavra/ksql that referenced this issue Mar 19, 2019
@agavra
Copy link
Contributor

agavra commented Mar 19, 2019

Things get a little bit more complicated with multi-line statements when we use the offset-validation approach. We send each statement as a separate command, so it is possible that even though c1 and c2 are validated together, a command c3 is enqueued between them. This would make c2 fail. Not sure if this is desired behavior or not, but I will stick to this for the first pass.

agavra added a commit to agavra/ksql that referenced this issue Mar 19, 2019
agavra added a commit to agavra/ksql that referenced this issue Mar 19, 2019
agavra added a commit to agavra/ksql that referenced this issue Mar 19, 2019
agavra added a commit to agavra/ksql that referenced this issue Mar 19, 2019
@agavra
Copy link
Contributor

agavra commented Dec 3, 2019

Has this been fixed with #3660 @stevenpyzhang?

@stevenpyzhang
Copy link
Member

This is fixed now. The scenario you mentioned here can still happen though since each command goes through the transaction protocol individually so users submitting multi-line statements could get partially successful results
#2435 (comment)

@agavra
Copy link
Contributor

agavra commented Dec 4, 2019

I think that's okay - we may want to create a new issue to track that, but I'm closing this one out for now. Thanks!

@agavra agavra closed this as completed Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants