-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kinesis source improvements for larger deployments #99
Conversation
52c5345
to
b779ec3
Compare
Two improvements getting ready for using common-streams on larger pods or with more shards. 1. Configurable max leases to steal at one time. The KCL default is 1. In order to avoid latency during scale up/down and pod-ration, we want the app to be quick to acquire shard-leases to process. With bigger instances we tend to have more shard-leases per instance, so we increase how aggressivley it acquires leases. 2. Set KCL `maxPendingProcessRecordsInput` to the minimum allowed (1). This is a precaution to avoid potentials OOMs in case a single pod subscribes to a very large number of shards. It shouldn't have a negative performance impact, because common-streams apps tend to manage their own pre-fetching. In fact, this makes the Kinesis source more similar to the other Sources which don't have in-built pre-fetching. 3. Fixes a bug in which latency was set by the latest record in a batch, not the earliest.
b779ec3
to
f07cc4b
Compare
@@ -10,6 +10,7 @@ snowplow.defaults: { | |||
maxRecords: 1000 | |||
} | |||
leaseDuration: "10 seconds" | |||
maxLeasesToStealAtOneTimeFactor: 2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious that this is a decimal, you can't have a fraction of a lease surely? Doesn't seem like a likely oversight - so I'm guessing there's a reason I don't see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This number is first multiplied by the number of processors. It is helpful to make it a decimal, e.g. maybe we want a 8-core instance to steal 2 leases at one time, so we set this value to 0.25
.
I've been using this pattern in lots of places recently. It's a good way to make the app auto-guess a good configuration, instead of putting the burden on the user to configure it appropriately for its vertical size. And I consistently put Factor
as a suffix of params that get multiplied by num processors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see - interesting. But nuanced, but I get it. It's cores*this = n instances. I assume it rounds up to the nearest CEIL, since it's max... Cool, thanks for explaining!
@@ -16,6 +16,26 @@ import java.net.URI | |||
import java.time.Instant | |||
import scala.concurrent.duration.FiniteDuration | |||
|
|||
/** | |||
* Config to be supplied from the app's hocon | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the addition of in-code documentation for the other parameters too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
See snowplow-incubator/common-streams#99 for the relevant change This library upgrade brings improvements to the Kinesis source, which should help on vertically larger instances.
See snowplow-incubator/common-streams#99 for the relevant change This library upgrade brings improvements to the Kinesis source, which should help on vertically larger instances.
* Bump common-streams to 0.9.0 See snowplow-incubator/common-streams#99 for the relevant change This library upgrade brings improvements to the Kinesis source, which should help on vertically larger instances. * Bump snowflake-ingest-sdk to 3.0.0
Three improvements getting ready for using common-streams on larger pods or with more shards.
maxPendingProcessRecordsInput
to the minimum allowed (1). This is a precaution to avoid potentials OOMs in case a single pod subscribes to a very large number of shards. It shouldn't have a negative performance impact, because common-streams apps tend to manage their own pre-fetching. In fact, this makes the Kinesis source more similar to the other Sources which don't have in-built pre-fetching.