Elasticache: Abort incomplete pipelines and transactions upon reconnect #1084

alavers · 2020-03-27T19:12:14Z

Elasticache severs the connection immediately after it returns a
READONLY error. This can sometimes leave queued-up pipelined commands
in an inconsistent state when the connection is reestablished. For
example, if a pipeline has 6 commands and the second one generates a
READONLY error, Elasticache will only return results for the first two
before severing the connection. Upon reconnect, the pipeline still
thinks it has 6 commands to send but the commandQueue has only 4. This
fix will detect any pipeline command sets that only had a partial
response before connection loss, and abort them.
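A minimal TypeScript model of the six-command example above may make the mismatch concrete. `simulateReadonlyDisconnect` and the `SentCommand` shape are hypothetical illustrations, not ioredis code; the sketch assumes only what the description states: the server answers every command up to and including the one that fails with READONLY, then closes the socket.

```typescript
// Hypothetical shape of a command sent as part of a pipeline.
interface SentCommand {
  name: string;
  pipelineIndex: number; // position of the command inside its pipeline
}

interface ReplyOutcome {
  replied: SentCommand[]; // got a reply (the last one is the READONLY error)
  unanswered: SentCommand[]; // still sitting in the command queue
}

// Models the Elasticache behavior: replies stop right after the READONLY error.
function simulateReadonlyDisconnect(
  pipeline: SentCommand[],
  readonlyAt: number // index of the command that triggers READONLY
): ReplyOutcome {
  const cut = Math.min(readonlyAt + 1, pipeline.length);
  return { replied: pipeline.slice(0, cut), unanswered: pipeline.slice(cut) };
}

// Six pipelined commands; the second one (index 1) hits READONLY.
const sixCommands: SentCommand[] = ["set", "set", "get", "set", "get", "get"].map(
  (name, i) => ({ name, pipelineIndex: i })
);
const outcome = simulateReadonlyDisconnect(sixCommands, 1);
console.log(outcome.replied.length); // 2
console.log(outcome.unanswered.map((c) => c.pipelineIndex)); // [ 2, 3, 4, 5 ]
```

The pipeline object still expects six results, but only the entries with `pipelineIndex` 2 through 5 are left in the queue.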

This Elasticache behavior also affects transactions. If reconnectOnError
returns 2, some transaction fragments may end up in the offlineQueue.
This fix will check the offlineQueue for any such transaction fragments
and abort them, so that we don't send a mismatched MULTI/EXEC to Redis
upon reconnection.
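One way to picture the offlineQueue cleanup is the sketch below. It assumes, hypothetically, that each queued entry carries a command `name` and an `inTransaction` flag set on MULTI, EXEC and every command between them, and treats as a fragment any transaction entry whose MULTI is not in the queue. `abortTransactionFragmentsSketch` and `OfflineItem` are invented for illustration, a plain array stands in for the `Deque<ICommandItem>` used in event_handler.ts, and the merged routine may work differently. It does not handle a transaction that kept its MULTI but lost its EXEC.

```typescript
// Hypothetical queue entry: only the fields this sketch needs.
interface OfflineItem {
  name: string;
  inTransaction: boolean; // true for MULTI, EXEC and everything between them
  reject: (err: Error) => void;
}

// Drops transaction entries whose MULTI was already sent before the disconnect,
// so an orphaned EXEC is never replayed on its own. Order is preserved.
function abortTransactionFragmentsSketch(queue: OfflineItem[]): OfflineItem[] {
  const kept: OfflineItem[] = [];
  let multiSeen = false; // inside a transaction that started in this queue
  for (const item of queue) {
    if (!item.inTransaction) {
      kept.push(item); // plain command: unaffected
      continue;
    }
    if (item.name === "multi") {
      multiSeen = true;
    }
    if (multiSeen) {
      kept.push(item);
    } else {
      item.reject(new Error("Transaction fragment aborted after reconnect"));
    }
    if (item.name === "exec") {
      multiSeen = false; // transaction (complete or fragment) ends here
    }
  }
  return kept;
}

const rejected: string[] = [];
const entry = (name: string, inTransaction: boolean): OfflineItem => ({
  name,
  inTransaction,
  reject: () => {
    rejected.push(name);
  },
});

// Tail of an interrupted transaction, a plain command, then a full transaction.
const offline: OfflineItem[] = [
  entry("set", true),
  entry("exec", true),
  entry("get", false),
  entry("multi", true),
  entry("incr", true),
  entry("exec", true),
];
const survivors = abortTransactionFragmentsSketch(offline);
console.log(survivors.map((c) => c.name)); // [ 'get', 'multi', 'incr', 'exec' ]
console.log(rejected); // [ 'set', 'exec' ]
```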

- Introduced a pipelineIndex property on pipelined commands to allow for later cleanup
- Added a routine to event_handler that aborts any pipelined commands inside commandQueue and offlineQueue that were interrupted in the middle of the pipeline
- Added a routine to event_handler that removes any transaction fragments from the offline queue
- Introduced an inTransaction property on commands to simplify pipeline logic
- Added a flags param to mock_server to allow the Elasticache disconnect behavior to be simulated
- Added a reconnect_on_error test case for transactions
- Added test cases checking that this Elasticache-specific behavior is handled correctly
- Added unit tests to validate inTransaction and pipelineIndex setting
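As a rough illustration of the two new properties, the hypothetical `tagPipeline` helper below numbers each command's position in its pipeline (`pipelineIndex`) and flags transaction members (`inTransaction`). Whether ioredis counts and flags MULTI and EXEC themselves this way is an assumption of this sketch, not something the PR text states.

```typescript
type CommandSpec = [string, ...string[]]; // command name followed by its args

interface TaggedCommand {
  name: string;
  args: string[];
  pipelineIndex: number; // position inside this pipeline, starting at 0
  inTransaction: boolean; // hypothetical: true for every entry of a MULTI/EXEC block
}

// Builds queue entries for one pipeline; a transaction is wrapped in MULTI ... EXEC.
function tagPipeline(commands: CommandSpec[], transaction: boolean): TaggedCommand[] {
  const prefix: CommandSpec = ["multi"];
  const suffix: CommandSpec = ["exec"];
  const body: CommandSpec[] = transaction ? [prefix, ...commands, suffix] : commands;
  return body.map(([name, ...args], index) => ({
    name,
    args,
    pipelineIndex: index,
    inTransaction: transaction,
  }));
}

const tagged = tagPipeline([["set", "k", "v"], ["incr", "n"]], true);
console.log(tagged.map((c) => `${c.pipelineIndex}:${c.name}`).join(" "));
// 0:multi 1:set 2:incr 3:exec
```

With indices attached at build time, later cleanup code only needs to look at each queued entry to tell where its pipeline started.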

Fixes #965

You can simulate this Elasticache disconnect behavior on a local Redis instance running as a read-only replica (the only case in which it returns READONLY) by modifying the source like so:

```diff
diff --git a/src/server.c b/src/server.c
index f6faa61a..31721551 100644
--- a/src/server.c
+++ b/src/server.c
@@ -2702,7 +2702,9 @@ int processCommand(client *c) {
         !(c->flags & CLIENT_MASTER) &&
         c->cmd->flags & CMD_WRITE)
     {
+        flagTransaction(c);
         addReply(c, shared.roslaveerr);
+        c->flags |= CLIENT_CLOSE_AFTER_REPLY;
         return C_OK;
     }
```

(Thanks for the idea @luin)
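On the client side, the path that leaves transaction fragments in the offlineQueue is `reconnectOnError` returning 2. A callback of roughly this shape could be used against the patched server; per the ioredis README, returning 1 or `true` reconnects and returning 2 reconnects and resends the command that failed. The name `reconnectOnReadonly` is made up for this example.

```typescript
// Same shape as the ioredis `reconnectOnError` option: return 1 or true to
// reconnect, 2 to reconnect and resend the failed command, false to do nothing.
function reconnectOnReadonly(err: Error): boolean | 1 | 2 {
  if (err.message.includes("READONLY")) {
    return 2;
  }
  return false;
}

console.log(reconnectOnReadonly(new Error("READONLY You can't write against a read only replica."))); // 2
console.log(reconnectOnReadonly(new Error("ERR unknown command"))); // false
```

It would be passed as `new Redis({ reconnectOnError: reconnectOnReadonly })`; the ioredis import is left out so the snippet runs standalone.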

alavers · 2020-03-27T19:23:58Z

lib/redis/event_handler.ts

```diff
+// the connection close and those pipelined commands must be aborted. For
+// example, if the queue looks like this: [2, 3, 4, 0, 1, 2] then after
+// aborting and purging we'll have a queue that looks like this: [0, 1, 2]
+function abortIncompletePipelines(commandQueue: Deque<ICommandItem>) {
```


I do feel like this approach is treating the downstream symptoms rather than the root cause of the problem. The tradeoff is that it isolates connection handling cleanup stuff inside event_handler so that transaction.ts and pipeline.ts can be more decoupled from reconnection behaviors.

I'm open to other ways of approaching this.
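The purge described in the event_handler.ts comment can be sketched as a single pass over the queue. This is a hypothetical re-creation under stated assumptions, not the merged code: entries are plain objects with an optional `pipelineIndex` (undefined for commands sent outside a pipeline) and a `reject` callback, a plain array stands in for `Deque<ICommandItem>`, and a pipelined entry is kept only if its index continues a run that started at 0 within this queue.

```typescript
// Hypothetical queue entry; pipelineIndex mirrors the property added in this PR.
interface PendingCommand {
  name: string;
  pipelineIndex?: number; // undefined for commands sent outside a pipeline
  reject: (err: Error) => void;
}

// Rejects and removes pipelined commands whose pipeline did not start at
// index 0 in this queue, i.e. leftovers of a pipeline cut off by the disconnect.
function abortIncompletePipelinesSketch(queue: PendingCommand[]): PendingCommand[] {
  const kept: PendingCommand[] = [];
  let expected = 0; // index the next pipelined command needs in order to be kept
  for (const item of queue) {
    const index = item.pipelineIndex;
    if (index === undefined) {
      kept.push(item); // plain command: unaffected
      expected = 0;
      continue;
    }
    if (index === 0) {
      expected = 0; // a new, complete pipeline starts here
    }
    if (index === expected) {
      kept.push(item);
      expected++;
    } else {
      item.reject(new Error("Pipelined command aborted: pipeline was interrupted"));
    }
  }
  return kept;
}

// The example from the code comment: [2, 3, 4, 0, 1, 2] becomes [0, 1, 2].
const aborted: number[] = [];
const queue: PendingCommand[] = [2, 3, 4, 0, 1, 2].map((i) => ({
  name: "set",
  pipelineIndex: i,
  reject: () => {
    aborted.push(i);
  },
}));
const remaining = abortIncompletePipelinesSketch(queue);
console.log(remaining.map((c) => c.pipelineIndex)); // [ 0, 1, 2 ]
console.log(aborted); // [ 2, 3, 4 ]
```

Scanning the whole queue rather than only its head lets the same routine run over both commandQueue and offlineQueue, as the PR description says the fix does.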

luin · 2020-03-28T10:13:28Z

Awesome! Merged 🍻

## [4.16.1](v4.16.0...v4.16.1) (2020-03-28)

### Bug Fixes

* abort incomplete pipelines upon reconnect ([#1084](#1084)) ([0013991](0013991)), closes [#965](#965)

ioredis-robot · 2020-03-28T10:16:05Z

🎉 This PR is included in version 4.16.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

alavers force-pushed the abort-interrupted-pipelines branch from 3505d83 to aa5fe68 on March 27, 2020 19:16

luin merged commit 0013991 into redis:master Mar 28, 2020

ioredis-robot pushed a commit that referenced this pull request Mar 28, 2020

chore(release): 4.16.1 [skip ci]

0b4826f

ioredis-robot added the released label Mar 28, 2020

This was referenced Mar 31, 2020:

- [Snyk] Upgrade ioredis from 4.16.0 to 4.16.1 bangbang93/freyja#148 (Merged)
- [Snyk] Upgrade ioredis from 4.16.0 to 4.16.1 bangbang93/haruhi#136 (Merged)

This was referenced Apr 24, 2020:

- [Snyk] Upgrade ioredis from 4.16.0 to 4.16.3 tuan231195/sportywide-news#12 (Closed)
- [Snyk] Upgrade ioredis from 4.11.2 to 4.16.1 AtomicLoans/liquidator#30 (Open)
