
Allow throttling SQS #53

Open · wants to merge 1 commit into master
Conversation

AlexITC commented Aug 22, 2022

Adds new config options to throttle SQS.

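For illustration, here is a minimal sketch of what throttling the SQS source with Akka Streams' built-in `throttle` operator can look like. The config values and the message/sink stand-ins are hypothetical, not the ones added in this PR:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object ThrottledSqsSketch extends App {
  implicit val system: ActorSystem = ActorSystem("throttle-sketch")
  import system.dispatcher

  // Hypothetical throttle settings that new config options could expose.
  val maxMessages = 100      // emit at most this many messages...
  val interval    = 1.second // ...per this interval

  // Stand-in for the real SQS source: an endless stream of fake messages.
  val sqsMessages: Source[String, _] = Source.repeat("sqs-message")

  sqsMessages
    .throttle(maxMessages, interval) // cap the source at the configured rate
    .take(500)                       // bounded here only so the sketch terminates
    .runWith(Sink.foreach(_ => ()))  // stand-in for the Kinesis write sink
    .onComplete(_ => system.terminate())
}
```

The idea is simply that the source never emits more than the configured number of messages per interval, regardless of how fast SQS can deliver them.
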
istreeter (Contributor) commented

Hi @AlexITC sorry I've been slow to get around to looking at this PR.

Please can you tell me more about why this change is needed? What behaviour were you finding without this change? For example, was it a problem with the app running out of memory? Or exceeding kinesis rate limits?

Are you using it as part of a Snowplow pipeline setup, or is it for some other purpose of copying from sqs to kinesis?

AlexITC (Author) commented Sep 22, 2022

We have a project that copies data from SQS to Kinesis. Still, from time to time, messages are consumed faster than Kinesis can process them, leaving us with many errors.

We introduced a throttling strategy so that we don't pull more data from SQS than our Kinesis instance can actually handle, and so far it has worked reasonably well for us.

So we decided to contribute this minor patch in case others find it useful.

AlexITC (Author) commented Oct 31, 2022

@istreeter any luck checking this?

istreeter (Contributor) commented

Hi @AlexITC sorry I've been very slow to respond to this. I will explain why I am hesitating about this change...

Akka-streams apps typically work with the concept of sinks applying backpressure towards the source, not the other way around. In sqs2kinesis, the "sink" is writing to kinesis. Sqs2kinesis works on the principle that it should write to kinesis as quickly as it is allowed to. When it is throttled by kinesis... then that's OK! It retries sending to kinesis until it is successful. Then it fetches more records from SQS.

This design of the app seems to fit the akka-streams philosophy, that slow sinks set the overall rate, and quick sources adapt to that rate. By throttling the source, it seems we are flipping that paradigm the wrong way around.
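
As an illustrative aside (not sqs2kinesis code), here is that principle in a minimal Akka Streams sketch: a deliberately slow sink backpressures a fast source, so the source is only asked for more elements once the sink has spare capacity, without any explicit throttle:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

object BackpressureSketch extends App {
  implicit val system: ActorSystem = ActorSystem("backpressure-sketch")
  import system.dispatcher

  // A "fast" source standing in for SQS polling.
  val fastSource = Source(1 to 1000)

  // A "slow" sink standing in for the Kinesis writer: at most 10 writes in
  // flight, each taking ~50ms. Demand only flows upstream as writes complete.
  val slowKinesisWrite = Sink.foreachAsync[Int](10) { _ =>
    Future(Thread.sleep(50)) // pretend PutRecords latency (or a throttled retry)
  }

  fastSource.runWith(slowKinesisWrite).onComplete(_ => system.terminate())
}
```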

When you say...

> messages are consumed faster than Kinesis can process them, leaving us with many errors.

Are you just referring to messages in the logs saying that it got throttled? Or is there some other error that I'm not aware of?

If you just mean the messages in the logs, then I wouldn't really call them errors. I admit that we are currently logging them at ERROR logging level, and that is probably a mistake. Maybe we just need to re-think how we log this transient problem?

Sorry again to be slow with this PR. If there is a good reason why the fundamental design of the app is problematic then I'll try to help you out. I just want to make sure we put in the right fix in the right place.

AlexITC (Author) commented Dec 2, 2022

Thanks for taking the time to review and explain.

I should mention that I was hired for a few hours to look into a problem faced by a customer, so I don't have as much context as I'd like.

> Akka-streams apps typically work with the concept of sinks applying backpressure towards the source, not the other way around. In sqs2kinesis, the "sink" is writing to kinesis. Sqs2kinesis works on the principle that it should write to kinesis as quickly as it is allowed to. When it is throttled by kinesis... then that's OK! It retries sending to kinesis until it is successful. Then it fetches more records from SQS.

Thanks for explaining this; I'm aware of this back-pressure mechanism. What my customer shared is that their Kinesis instance ended up crashing when processing a large number of records (I don't know the exact number), which caused considerable delays before it recovered.

> This design of the app seems to fit the akka-streams philosophy, that slow sinks set the overall rate, and quick sources adapt to that rate. By throttling the source, it seems we are flipping that paradigm the wrong way around.

Fair point. Still, this has been the simplest solution for my customer, who is currently using my fork.

> Are you just referring to messages in the logs saying that it got throttled? Or is there some other error that I'm not aware of?

My customer told me that their Kinesis instance was constantly crashing until the throttling strategy was introduced. As far as I saw, failed messages ended up being retried.

> Sorry again to be slow with this PR. If there is a good reason why the fundamental design of the app is problematic then I'll try to help you out. I just want to make sure we put in the right fix in the right place.

I have raised this with my customer, who is interested in getting rid of my fork. Given the lack of details, do you have any better suggestion?

istreeter (Contributor) commented

Hi @AlexITC thank you that is very helpful extra context!

> their Kinesis instance ended up crashing when processing a large number of records

I find this very interesting! If we're talking about the original Kinesis™ service managed by AWS... then in my experience it does not crash like this, because it's a highly resilient cloud service.

I wonder, is the customer using a "Kinesis-compatible" non-AWS instance? Something like Localstack? If I am guessing right, then that's an interesting angle I had never thought about for sqs2kinesis. But I can see it would be nice to support these less resilient tools.

To avoid crashing the Kinesis sink, it might be possible to use some of the existing settings for backoff and retry. However, our backoff mechanism assumes that the Kinesis instance issues WriteThroughputExceeded messages. If we're talking about a non-AWS version of Kinesis, then I wonder whether it supports WriteThroughputExceeded at all.
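
For example, a retry-with-backoff wrapper around the Kinesis write flow could look roughly like the sketch below, using Akka Streams' RetryFlow. The request/response types here are hypothetical stand-ins, not the actual sqs2kinesis code or AWS SDK types:

```scala
import akka.NotUsed
import akka.stream.scaladsl.{Flow, RetryFlow}
import scala.concurrent.duration._

object KinesisRetrySketch {
  // Hypothetical request/response models standing in for the AWS SDK types.
  final case class PutBatch(records: List[String])
  final case class PutResult(throttled: Boolean)

  // Stand-in for the flow that actually calls PutRecords against Kinesis.
  val putToKinesis: Flow[PutBatch, PutResult, NotUsed] =
    Flow[PutBatch].map(_ => PutResult(throttled = false))

  // Resend a batch with exponential backoff for as long as the response
  // indicates it was throttled.
  // Arguments: minBackoff, maxBackoff, randomFactor, maxRetries, flow.
  val putWithBackoff: Flow[PutBatch, PutResult, NotUsed] =
    RetryFlow.withBackoff(100.millis, 10.seconds, 0.2, 10, putToKinesis) {
      (request, response) =>
        if (response.throttled) Some(request) // back off, then retry the same batch
        else None                             // success: emit the result downstream
    }
}
```

Whether something like this helps in your case depends on the non-AWS Kinesis implementation actually reporting throttling in a way the app can detect.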

By the way, I am leaning towards merging this for you. I never wanted to be a blocker for you or for your customer, I just wanted to understand the change.

AlexITC (Author) commented Dec 9, 2022

@istreeter apologies for the delay. I can't share much context about this just yet; what I understand so far is that the problem fixed by this throttling strategy occurs rarely and requires some specific settings/environment (I still don't know those details).
