ISSUE-159 VegasLimit for batch processing #161
Background
At Uber, we use the Kafka producer to send messages to brokers, and while running the system we saw two problems that could overload the Kafka producer.
When the Kafka producer is overloaded, messages accumulate in the buffer, which causes high message-delivery latency, garbage-collector saturation, and a drop in throughput.
We think load shedding is the right way to deal with overload; however, the existing concurrency-limit algorithms are not a perfect fit for a batching system such as the Kafka producer.
Abstraction
As a batching system, the Kafka producer comes with a send buffer. To send a message, a client puts it into the buffer, and one or more sender threads keep fetching messages from the buffer and sending them to the broker. Latency is measured from send until the client receives the ack from the broker, as the sketch below illustrates.
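A minimal, hypothetical sketch of this abstraction (class and method names are ours, not the producer's API), showing where the two latency components arise:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a send buffer plus a sender thread.
final class BatchingProducerSketch {
    private record Pending(byte[] payload, long enqueueNanos) {}

    private final BlockingQueue<Pending> buffer = new LinkedBlockingQueue<>();

    // Client side: the message waits in the buffer (buffer-time starts here).
    void send(byte[] payload) {
        buffer.offer(new Pending(payload, System.nanoTime()));
    }

    // Sender thread: fetch from the buffer, send to the broker, await the ack.
    void senderLoop() throws InterruptedException {
        while (true) {
            Pending p = buffer.take();
            long bufferTime = System.nanoTime() - p.enqueueNanos();
            long sendStart = System.nanoTime();
            sendToBrokerAndAwaitAck(p.payload()); // stubbed broker round trip
            long roundtripLatency = System.nanoTime() - sendStart;
            long observedLatency = bufferTime + roundtripLatency; // equation (1)
            // observedLatency is what the concurrency limiter would sample.
        }
    }

    private void sendToBrokerAndAwaitAck(byte[] payload) { /* stub */ }
}
```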
So in this system, latency has two components; concretely:
(1) observed-latency = buffer-time + roundtrip-latency
In practice, when the system is not overloaded, the following relationship holds:
(2) 0 < buffer-time < BF * roundtrip-latency (BF stands for bufferFactor = 1 / num-inflight-requests)
Combining equation (1) and relationship (2) gives:
(3) roundtrip-latency < observed-latency < (1 + BF) * roundtrip-latency
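As an illustrative example (numbers assumed): with roundtrip-latency = 10ms and 2 in-flight requests, BF = 1/2, so relationship (3) bounds observed-latency to the range (10ms, 15ms) even though the system is not overloaded.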
The problem with VegasLimit is that it uses rtt_noload as the boundary for detecting saturation. For a batching system, whose observed latency legitimately exceeds rtt_noload by the buffer-time, this can easily cause over-shedding, so we propose a small enhancement to make VegasLimit fit batching systems.
Solution
Here is the high-level solution.
With bufferFactor representing the portion of buffer-time in observed-latency, we propose a new equation to estimate queue size; it reports no queue when 0 < rtt < rtt_noload * (bufferFactor + 1).
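A minimal sketch of the proposed estimate, assuming a form consistent with the no-queue condition above (the exact expression is inferred rather than quoted; estimatedLimit, rtt, and rtt_noload follow VegasLimit naming):

```java
// Sketch of the proposed queue-size estimate (inferred form; see lead-in above).
final class BatchQueueEstimate {
    static double estimatedQueueSize(double estimatedLimit, long rtt,
                                     long rttNoLoad, double bufferFactor) {
        // Classic Vegas: queueSize = estimatedLimit * (1 - rtt_noload / rtt).
        // Batching variant: widen the "no queue" band to (bufferFactor + 1) * rtt_noload,
        // so latency attributable to buffer-time alone does not trigger shedding.
        return Math.max(0.0,
                estimatedLimit * (1.0 - (bufferFactor + 1) * (double) rttNoLoad / rtt));
    }
}
```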
In practice, we found the change prevented over-shedding but caused another problem: limit inflation, meaning the concurrency limit gradually increases as time elapses.
The reason is that a probe resets rtt_noload from the currently observed rtt. When the system is under load, observed rtt actually falls in the range [rtt_noload, (bufferFactor + 1) * rtt_noload], so if rtt is at the higher end of that range when the probe fires, both rtt_noload and estimatedLimit inflate.
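To make the inflation concrete (illustrative numbers): with bufferFactor = 1 and rtt_noload = 10ms, an under-load rtt can legitimately sit anywhere in [10ms, 20ms]. A probe that samples rtt = 20ms resets rtt_noload to 20ms and widens the no-queue band to (0, 40ms); a later probe can repeat this, ratcheting rtt_noload and estimatedLimit upward.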
We solved the problem in two steps.
First, on probe, update estimatedLimit to strip out the buffer headroom (a sketch of one plausible form follows). This reduces estimatedLimit to its bare size (no buffer), so if the system is under load, it starts rejecting requests and concurrency starts dropping. Eventually, when concurrency has dropped to estimatedLimit, the observed rtt should equal rtt_noload, and the limit inflation is resolved.
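A minimal sketch of step 1, assuming the reduction divides out the buffer headroom; this form is an assumption consistent with "reduce estimatedLimit to bare size", not a quote of the actual equation:

```java
// Sketch of the probe-time limit reduction (assumed form; see lead-in above).
final class ProbeAdjustmentSketch {
    private volatile double estimatedLimit = 100.0; // illustrative starting limit
    private final double bufferFactor = 1.0;        // illustrative value

    void onProbe() {
        // Assumed form: strip the buffer headroom so the limit reflects bare
        // in-flight capacity with no buffered portion.
        estimatedLimit = estimatedLimit / (bufferFactor + 1);
    }
}
```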
Second, pause updates to estimatedLimit for an extra amount of time, to allow rtt to drop back to rtt_noload.
It takes time for concurrency to drop to the updated estimatedLimit, and during that time, updates to estimatedLimit should be paused; otherwise estimatedLimit could increase in the meantime and miss the opportunity to observe rtt_noload. The time to wait can be estimated from the relationships above.
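A minimal sketch of step 2, with pauseNanos standing in for the estimated wait time (the estimate itself is not reproduced here):

```java
// Sketch of pausing estimatedLimit updates after a probe.
final class PausedLimitUpdateSketch {
    private volatile long resumeUpdatesAtNanos = 0L;

    void onProbe(long pauseNanos) {
        // Freeze updates long enough for concurrency to drain to the reduced
        // limit and for observed rtt to fall back toward rtt_noload.
        resumeUpdatesAtNanos = System.nanoTime() + pauseNanos;
    }

    void onSample(long rttNanos, int inflight) {
        if (System.nanoTime() < resumeUpdatesAtNanos) {
            return; // paused: skip estimatedLimit updates while draining
        }
        // ... normal Vegas-style update of estimatedLimit goes here ...
    }
}
```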