Performance fix for the logger in the executor #3734
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
The logger in the executor spins a goroutine for each prediction and waits until the payload can be logged. With a slow downstream request logger and a huge spike of requests, the executor can OOM.
This PR makes a couple of changes to allow for more graceful handling of such spikes of traffic:
There is also a general refactor of the consumer/producer pattern we use to make it easier to understand and use without changing the behaviour.
This PR also adds a benchmark for profiling the behaviour of the executor. It was used to confirm that before we can see a linear increase in the number of goroutines with the number of requests. These goroutines also never finished if the request logging was taking ages to process. The new implementation shows a steady number of goroutines allocated for logging no matter how many requests we have.
We will need to expose the following flags in the operator so that they can be configured from there:
logger_workers
log_work_buffer_size
log_write_timeout_ms
Which issue(s) this PR fixes:
Fixes #3726
Special notes for your reviewer:
Does this PR introduce a user-facing change?: