feat: add command topic offset to commands and perform validation based on offset when executing commands #3330

stevenpyzhang · 2019-09-11T19:06:52Z

Description

Based off of discussion in #3278 and #2435.

This PR adds an offset value to both Command and QueuedCommand objects. The field in Command represents the point at which a Ksql statement was validated and put on the CommandTopic. The field in QueuedCommand represents the offset that the command was read from; it's assigned from the corresponding Kafka ConsumerRecord.offset().

Request Validator was also rewritten to take a KsqlExecutionContext instead of a ServiceContext.

This change also means that, when a statement that creates a query is executed, query_id generation can use the current statement offset instead of relying on an incrementing value (future PR).
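To illustrate the query_id idea, here is a minimal sketch, assuming the ID is simply a statement prefix joined to the command topic offset. The class and method names are hypothetical and are not taken from this PR.

```java
final class OffsetBasedQueryId {

  // Hypothetical helper, not the project's actual generator: builds an ID
  // from a statement prefix such as "CSAS_SS" and the command topic offset.
  static String queryId(final String prefix, final long commandTopicOffset) {
    return prefix + "_" + commandTopicOffset;
  }

  public static void main(final String[] args) {
    // Any server that replays the record at offset 7 derives the same ID,
    // with no shared counter to keep in sync.
    System.out.println(queryId("CSAS_SS", 7L));
  }
}
```

The appeal of this shape is that the ID depends only on data stored in the command topic, not on in-memory state.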

Testing done

Local tests
Update existing tests

Reviewer checklist

Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

stevenpyzhang · 2019-09-11T19:13:06Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandRunner.java

+      final int commandTopicOffset = queuedCommand.getCommand().getCommandTopicOffset();
+      if (commandTopicOffset == -1
+          || commandTopicOffset == commandStore.getSnapshotWithOffset().getSnapshotOffset()) {
+        commandStore.setOffsetValue(queuedCommand.getOffset() + 1);


I ended up splitting updating the offset and the snapshot into two separate calls because there appears to be a race condition that can occur when a KsqlRequest contains multiple statements.

The integration test KsqlResourceFunctionalTest.shouldHandleInterDependantCsasTerminateAndDrop was failing:

    final List<KsqlEntity> results = makeKsqlRequest(
        "CREATE STREAM SS AS SELECT * FROM " + PAGE_VIEW_STREAM + ";"
            + "TERMINATE CSAS_SS_" + NEXT_QUERY_ID.get() + ";"
            + "DROP STREAM SS;"
    );

These commands are being pushed to the command topic in this order:

    Command{statement='CREATE STREAM SS WITH (KAFKA_TOPIC='SS', PARTITIONS=1, REPLICAS=1) AS SELECT * FROM PAGEVIEWS_ORIGINAL PAGEVIEWS_ORIGINAL EMIT CHANGES;',commandTopicOffset=1, overwriteProperties={}}
    Command{statement='TERMINATE CSAS_SS_0;',commandTopicOffset=1, overwriteProperties={}}
    Command{statement='DROP STREAM SS;',commandTopicOffset=2, overwriteProperties={}}

The terminate command isn't being executed since the offset signature doesn't match the current one, which then causes the drop statement to fail.
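The failure can be replayed with a small simulation of the runner's check. This is only a sketch: the record offsets (1, 2, 3) and the starting snapshot offset (1) are assumptions chosen to match the Command output above, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OffsetCheckSimulation {

  static final class Cmd {
    final String statement;
    final long validatedAt; // commandTopicOffset stored in the Command
    final long readAt;      // offset the record was read from (assumed)

    Cmd(final String statement, final long validatedAt, final long readAt) {
      this.statement = statement;
      this.validatedAt = validatedAt;
      this.readAt = readAt;
    }
  }

  // Mirrors the check in the CommandRunner diff: a command passes only if
  // its offset is -1 or equals the current snapshot offset.
  static List<String> passedCheck(final long initialSnapshotOffset, final List<Cmd> cmds) {
    long snapshotOffset = initialSnapshotOffset;
    final List<String> passed = new ArrayList<>();
    for (final Cmd c : cmds) {
      if (c.validatedAt == -1 || c.validatedAt == snapshotOffset) {
        passed.add(c.statement);
        snapshotOffset = c.readAt + 1;
      }
    }
    return passed;
  }

  static List<String> demo() {
    return passedCheck(1L, Arrays.asList(
        new Cmd("CREATE STREAM SS AS SELECT ...;", 1L, 1L),
        new Cmd("TERMINATE CSAS_SS_0;", 1L, 2L),
        new Cmd("DROP STREAM SS;", 2L, 3L)));
  }

  public static void main(final String[] args) {
    System.out.println(demo());
  }
}
```

Under these assumptions the TERMINATE is skipped because it was stamped with offset 1 while the snapshot has already moved to 2. The DROP passes the check but then finds the query still running.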

The problem with doing this is that we now have a race. The runner thread does:
1. Set the new offset/signature
2. Run the command
3. Set the new snapshot

A conflicting request can come in and be validated between 1 and 2, but not actually be a valid request.

Instead, we can add a recompute() method to SnapshotWithOffset that returns a new SnapshotWithOffset with an updated offset/signature (implemented by incrementing the offset). We'd also need to know when to call recompute. We can do that by passing in a predicate from KsqlResource that returns true if the command doesn't have a custom executor.

Another option, which I think I like better, would be to add the notion of a "batch" to CommandStore. So you could have something like:

    class CommandStore {
      BatchContext newBatch(final long offset) {
        return new BatchContext(offset);
      }

      // then add BatchContext as an argument
      enqueueCommand(final BatchContext ctx, final String statement, ...) {
        final Command command = new Command(ctx.offset(), ...);
        ctx.incrementOffset();
      }
    }
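A self-contained version of the batch idea might look like the following. It is a sketch under assumptions: BatchContext here only tracks an offset, and enqueueCommand is a stand-in that returns the offset it would stamp on the command rather than producing to Kafka.

```java
final class BatchSketch {

  // Carries the offset that the next statement in the request is stamped with.
  static final class BatchContext {
    private long offset;

    BatchContext(final long offset) {
      this.offset = offset;
    }

    long offset() {
      return offset;
    }

    void incrementOffset() {
      offset++;
    }
  }

  // Stand-in for CommandStore.enqueueCommand: returns the stamped offset.
  static long enqueueCommand(final BatchContext ctx, final String statement) {
    final long stamped = ctx.offset();
    ctx.incrementOffset();
    return stamped;
  }

  static long[] stampAll(final long start, final String... statements) {
    final BatchContext ctx = new BatchContext(start);
    final long[] stamped = new long[statements.length];
    for (int i = 0; i < statements.length; i++) {
      stamped[i] = enqueueCommand(ctx, statements[i]);
    }
    return stamped;
  }

  public static void main(final String[] args) {
    final long[] offsets = stampAll(1L, "CREATE STREAM SS ...", "TERMINATE CSAS_SS_0;", "DROP STREAM SS;");
    System.out.println(java.util.Arrays.toString(offsets));
  }
}
```

With a batch, each statement in a multi-statement request gets its own consecutive offset, so a later statement no longer carries the same offset as the one before it.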

rodesai

Thanks, @stevenpyzhang! I think we are on the right track, but need to tighten up some raciness. I've left that feedback in-line. I also think that we should use the name "signature" rather than "offset" outside of the CommandStore - the fact that we're using the offset is an implementation detail.

rodesai · 2019-09-11T22:11:40Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandQueue.java

@@ -91,4 +91,10 @@ void ensureConsumedPast(long seqNum, Duration timeout)
   */
  @Override
  void close();
+
+  void setOffsetValue(int offsetValue);


Combine these into a common call setSnapshotWithOffset - we need to update them together atomically.

rodesai · 2019-09-11T22:13:46Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandRunner.java

+            );
+            commandStore.setSnapshot();
+          }
+        }


We need to complete the command with an error so that the request thread is not blocked.
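As a rough illustration of failing the command rather than silently skipping it, the sketch below uses a plain CompletableFuture as a stand-in for the command's status future. The method name and the exception type are assumptions.

```java
import java.util.concurrent.CompletableFuture;

final class StaleCommandSketch {

  // Returns true if the command may run. Otherwise it fails the future so
  // that a request thread waiting on it is released with an error.
  static boolean checkOrFail(
      final long commandOffset,
      final long snapshotOffset,
      final CompletableFuture<String> status) {
    if (commandOffset == -1 || commandOffset == snapshotOffset) {
      return true;
    }
    status.completeExceptionally(new IllegalStateException(
        "Command validated at offset " + commandOffset
            + " but snapshot is at offset " + snapshotOffset));
    return false;
  }

  public static void main(final String[] args) {
    final CompletableFuture<String> status = new CompletableFuture<>();
    final boolean ran = checkOrFail(1L, 2L, status);
    System.out.println(ran + " " + status.isCompletedExceptionally());
  }
}
```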

rodesai · 2019-09-11T22:21:24Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandStore.java

@@ -102,8 +127,11 @@ public QueuedCommandStatus enqueueCommand(final ConfiguredStatement<?> statement
    final CommandId commandId = commandIdAssigner.getCommandId(statement.getStatement());
    final Command command = new Command(
        statement.getStatementText(),
+        snapshotWithOffset.getSnapshotOffset(),


The offset we put in the command needs to be the exact same offset used when validating - by this point the offset may have changed. We should instead pass the offset in (which we validated against), and write that offset into the command.
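The difference between re-reading the offset and passing in the validated one can be shown with a small sketch. The Store and Command classes below are simplified stand-ins, and the AtomicLong is an assumption used only to model the runner thread advancing the offset.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ValidatedOffsetSketch {

  static final class Command {
    final String statement;
    final long commandTopicOffset;

    Command(final String statement, final long commandTopicOffset) {
      this.statement = statement;
      this.commandTopicOffset = commandTopicOffset;
    }
  }

  static final class Store {
    final AtomicLong snapshotOffset = new AtomicLong(1L);

    // Racy: reads the offset again at enqueue time, which may be later
    // than the offset the statement was validated against.
    Command enqueueRereading(final String statement) {
      return new Command(statement, snapshotOffset.get());
    }

    // Suggested shape: the caller passes the offset it validated against.
    Command enqueue(final String statement, final long validatedOffset) {
      return new Command(statement, validatedOffset);
    }
  }

  public static void main(final String[] args) {
    final Store store = new Store();
    final long validated = store.snapshotOffset.get(); // request thread validates here
    store.snapshotOffset.incrementAndGet();            // runner thread advances meanwhile
    System.out.println(store.enqueueRereading("DROP STREAM SS;").commandTopicOffset);
    System.out.println(store.enqueue("DROP STREAM SS;", validated).commandTopicOffset);
  }
}
```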

rodesai · 2019-09-11T23:35:50Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/Command.java

                 @JsonProperty("streamsProperties") final Map<String, Object> overwriteProperties,
                 @JsonProperty("originalProperties") final Map<String, String> originalProperties) {
    this.statement = statement;
+    this.commandTopicOffset =


I would refer to this as a validationSignature. The fact that we're using the offset is an implementation detail.

I personally like offset because it also provides some guarantees about its characteristics (i.e. monotonically increasing), so it gives more, perhaps useful, information beyond "signature".

rodesai · 2019-09-11T23:39:53Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandRunner.java

+      final int commandTopicOffset = queuedCommand.getCommand().getCommandTopicOffset();
+      if (commandTopicOffset == -1
+          || commandTopicOffset == commandStore.getSnapshotWithOffset().getSnapshotOffset()) {
+        commandStore.setOffsetValue(queuedCommand.getOffset() + 1);


The problem with doing this is that we now have a race. The runner thread does:
1. Set the new offset/signature
2. Run the command
3. Set the new snapshot

A conflicting request can come in and be validated between 1 and 2, but not actually be a valid request.

Instead, we can add a recompute() method to SnapshotWithOffset that returns a new SnapshotWithOffset with an updated offset/signature (implemented by incrementing the offset). We'd also need to know when to call recompute. We can do that by passing in a predicate from KsqlResource that returns true if the command doesn't have a custom executor.
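One possible reading of the recompute() suggestion is sketched below. Everything here is an assumption for illustration: the snapshot is modelled as a String rather than a KsqlExecutionContext, and the example predicate is made up and does not reflect which statements actually have custom executors.

```java
import java.util.function.Predicate;

final class RecomputeSketch {

  // Immutable pair: recompute() returns a new instance instead of mutating.
  static final class SnapshotWithOffset {
    final String snapshot;
    final long offset;

    SnapshotWithOffset(final String snapshot, final long offset) {
      this.snapshot = snapshot;
      this.offset = offset;
    }

    SnapshotWithOffset recompute() {
      return new SnapshotWithOffset(snapshot, offset + 1);
    }
  }

  // Advances the offset/signature only when the predicate says so.
  static SnapshotWithOffset advanceIfNeeded(
      final SnapshotWithOffset current,
      final String statement,
      final Predicate<String> hasNoCustomExecutor) {
    return hasNoCustomExecutor.test(statement) ? current.recompute() : current;
  }

  public static void main(final String[] args) {
    // Illustrative predicate only.
    final Predicate<String> hasNoCustomExecutor = s -> !s.startsWith("LIST");
    final SnapshotWithOffset current = new SnapshotWithOffset("ctx", 5L);
    System.out.println(advanceIfNeeded(current, "CREATE STREAM S ...", hasNoCustomExecutor).offset);
    System.out.println(advanceIfNeeded(current, "LIST STREAMS;", hasNoCustomExecutor).offset);
  }
}
```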

rodesai · 2019-09-11T23:42:38Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandStore.java

@@ -85,6 +94,22 @@ public String getCommandTopicName() {
    return commandTopic.getCommandTopicName();
  }

+  @Override
+  public void setOffsetValue(final int offsetValue) {


Combine setOffsetValue and setSnapshot into 1 method. In that method, we should create a new SnapshotWithOffset with the new snapshot/offset. Then, having made snapshotWithOffset an atomic reference as suggested above, we should do snapshotWithOffset.set(/* the new object we created */)
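A minimal sketch of the combined setter, assuming the snapshot can be modelled as a String for illustration (the real field would hold a KsqlExecutionContext). The outer class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class AtomicSnapshotSketch {

  // Immutable pair, so a reader never sees a new offset with an old snapshot.
  static final class SnapshotWithOffset {
    final String snapshot;
    final long offset;

    SnapshotWithOffset(final String snapshot, final long offset) {
      this.snapshot = snapshot;
      this.offset = offset;
    }
  }

  private final AtomicReference<SnapshotWithOffset> snapshotWithOffset =
      new AtomicReference<>(new SnapshotWithOffset("initial", 0L));

  // Called by the command runner thread: one write publishes both values.
  void setSnapshotWithOffset(final String snapshot, final long offset) {
    snapshotWithOffset.set(new SnapshotWithOffset(snapshot, offset));
  }

  // Called by request threads: one read returns a consistent pair.
  SnapshotWithOffset getSnapshotWithOffset() {
    return snapshotWithOffset.get();
  }

  public static void main(final String[] args) {
    final AtomicSnapshotSketch store = new AtomicSnapshotSketch();
    store.setSnapshotWithOffset("after-csas", 2L);
    final SnapshotWithOffset seen = store.getSnapshotWithOffset();
    System.out.println(seen.snapshot + " @ " + seen.offset);
  }
}
```

A request thread should read the reference once and use that one object for both validation and the offset it passes on, rather than calling the getter twice.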

rodesai · 2019-09-11T23:43:33Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandStore.java

@@ -45,6 +46,8 @@
  private final CommandIdAssigner commandIdAssigner;
  private final Map<CommandId, CommandStatusFuture> commandStatusMap;
  private final SequenceNumberFutureStore sequenceNumberFutureStore;
+  private SnapshotWithOffset snapshotWithOffset;


Change this to AtomicReference<SnapshotWithOffset>. Then we can safely read it from the request threads even as it's written by the command runner thread.

rodesai · 2019-09-11T23:44:20Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/CommandStore.java

+
+  @Override
+  public SnapshotWithOffset getSnapshotWithOffset() {
+    return snapshotWithOffset;


Having changed snapshotWithOffset to an atomic reference, this would be snapshotWithOffset.get();

rodesai · 2019-09-11T23:47:00Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/computation/SnapshotWithOffset.java

+
+import io.confluent.ksql.KsqlExecutionContext;
+
+public class SnapshotWithOffset {


Rename to SnapshotWithSignature.

agavra · 2019-09-12T01:32:12Z

Just leaving a comment saying that I'd like to review this before committing! I'll get to this tomorrow :)

agavra

I think we're trying to batch too many nuanced changes together with this one PR, and some that (in my opinion) need a more thorough design, perhaps even a KLIP so that the problem can be clearly understood. Writing distributed validation is a really hard problem (you can see my failed attempt in #2582), and if I understand correctly the more urgent problem we're trying to solve isn't #2435 but rather #3269, which doesn't need that.

For this change, I'd be more comfortable sticking to just adding the offset to QueuedCommand and using that in queryID generation (and just using -1 for queryID generation on the REST side validation) - we can handle race conditions in a future change.

cc @rodesai - thoughts?

big-andy-coates

Thanks @stevenpyzhang

I don't have time to review this tonight, but I'd like to review it before it's merged, hence requesting changes.

big-andy-coates · 2019-09-12T16:03:06Z

ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/CommandTopic.java

@@ -127,7 +127,8 @@ public RecordMetadata send(final CommandId commandId, final Command command) {
            new QueuedCommand(
                record.key(),
                record.value(),
-                Optional.empty()));
+                Optional.empty(),
+                (int) record.offset()));


Offset is a long - we shouldn't be casting it to an int.
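For reference, a short sketch of what the narrowing cast does once an offset exceeds Integer.MAX_VALUE. The class and method names are hypothetical; the offset value is just an example.

```java
final class OffsetNarrowing {

  // What the cast in the diff does: keeps only the low 32 bits.
  static int narrow(final long offset) {
    return (int) offset;
  }

  // Math.toIntExact throws ArithmeticException instead of wrapping.
  static boolean fitsInInt(final long offset) {
    try {
      Math.toIntExact(offset);
      return true;
    } catch (final ArithmeticException e) {
      return false;
    }
  }

  public static void main(final String[] args) {
    final long offset = 3_000_000_000L;
    System.out.println(narrow(offset));    // wraps to -1294967296
    System.out.println(fitsInInt(offset)); // false
  }
}
```

Keeping the field as a long end to end avoids the problem entirely.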

rodesai · 2019-09-12T18:09:48Z

For this change, I'd be more comfortable sticking to just adding the offset to QueuedCommand and using that in queryID generation (and just using -1 for queryID generation on the REST side validation) - we can handle race conditions in a future change.

cc @rodesai - thoughts?

The problem is that this is likely to corrupt data if there are racing commands, and I'd rather not change the query id generation to something we know is broken for even the simple case without restarts. Ideally we'd build the command validation mechanism first, and then build query id generation on top of that.

stevenpyzhang · 2019-09-12T21:40:51Z

After some offline discussion, this PR will be put on pause for now. There's a smaller PR open now, #3343, that just focuses on adding fields to the QueuedCommand and Command objects in order to update the query id generation code.

stevenpyzhang · 2019-10-11T20:01:17Z

The new approach is to use Kafka's transactional producer APIs, so I'm closing this PR as it's no longer relevant.

stevenpyzhang requested a review from a team as a code owner September 11, 2019 19:06

stevenpyzhang force-pushed the add-command-topic-offset-to-commands branch 2 times, most recently from 36b6033 to ace0cfb Compare September 11, 2019 20:44

feat: add command topic offset to commands and perform validation based on offset when executing commands (ff00fea)

stevenpyzhang force-pushed the add-command-topic-offset-to-commands branch from ace0cfb to ff00fea Compare September 11, 2019 22:32

stevenpyzhang requested a review from rodesai September 11, 2019 22:34

rodesai reviewed Sep 12, 2019

agavra suggested changes Sep 12, 2019

agavra requested a review from a team September 12, 2019 15:53

big-andy-coates suggested changes Sep 12, 2019

stevenpyzhang mentioned this pull request Sep 12, 2019

feat: add offset to QueuedCommand and flag to Command #3343

Merged

2 tasks

stevenpyzhang closed this Oct 11, 2019
