
Performance fix for the logger in the executor #3734

Merged 4 commits into SeldonIO:master on Nov 13, 2021

Conversation

ivan-valkov (Contributor) commented:

What this PR does / why we need it:

The logger in the executor spins up a goroutine for each prediction, and that goroutine waits until the payload can be logged. With a slow downstream request logger and a large spike of requests, the executor can OOM.

This PR makes a few changes so the executor handles such traffic spikes more gracefully:

  • Increase the default work queue size and make it configurable. The channel itself now buffers the work; we no longer spawn goroutines that block on the channel write to act as a buffer.
  • Add a configurable timeout for when the buffer is full. Logs can be dropped if there is more work than the executor and downstream log processing can handle.
  • Increase the default number of workers. They spend most of their time waiting on I/O, so more of them can run than before.

There is also a general refactor of the producer/consumer pattern we use, to make it easier to understand and use without changing the behaviour.
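
A minimal sketch of the pattern described above, with illustrative names rather than the executor's actual types: predictions are enqueued straight into a sized channel, a fixed pool of workers drains it, and a full buffer only blocks the caller up to a timeout before the log is dropped.

```go
package logger

import (
	"errors"
	"sync"
	"time"
)

// LogRequest stands in for the payload the executor wants to log.
type LogRequest struct {
	Payload []byte
}

// Dispatcher buffers log work in a channel and drains it with a fixed
// worker pool, instead of spawning a goroutine per prediction.
type Dispatcher struct {
	workQueue    chan LogRequest
	writeTimeout time.Duration // <= 0 means drop immediately when the buffer is full
	wg           sync.WaitGroup
}

func NewDispatcher(workers, bufferSize int, writeTimeout time.Duration, handle func(LogRequest)) *Dispatcher {
	d := &Dispatcher{
		workQueue:    make(chan LogRequest, bufferSize),
		writeTimeout: writeTimeout,
	}
	for i := 0; i < workers; i++ {
		d.wg.Add(1)
		go func() {
			defer d.wg.Done()
			for req := range d.workQueue {
				handle(req) // I/O-bound call to the downstream request logger
			}
		}()
	}
	return d
}

// Enqueue tries to buffer a log request. If the buffer is full it waits up to
// writeTimeout and then drops the log rather than blocking the request path.
func (d *Dispatcher) Enqueue(req LogRequest) error {
	if d.writeTimeout <= 0 {
		select {
		case d.workQueue <- req:
			return nil
		default:
			return errors.New("log buffer full, dropping log")
		}
	}
	select {
	case d.workQueue <- req:
		return nil
	case <-time.After(d.writeTimeout):
		return errors.New("timed out waiting for log buffer, dropping log")
	}
}

// Close stops accepting new work and waits for the workers to drain the queue.
func (d *Dispatcher) Close() {
	close(d.workQueue)
	d.wg.Wait()
}
```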

This PR also adds a benchmark for profiling the executor's behaviour. It was used to confirm that, before this change, the number of goroutines grew linearly with the number of requests, and that those goroutines never finished while request logging was slow to process. With the new implementation, the number of goroutines allocated for logging stays steady regardless of request volume.
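
The benchmark itself lives in the PR; purely as an illustration, building on the Dispatcher sketch above and using runtime.NumGoroutine() to observe goroutine growth, it could look something like this:

```go
package logger

import (
	"runtime"
	"testing"
	"time"
)

// BenchmarkDispatcher simulates a spike of predictions against a slow
// downstream logger and reports the goroutine count, which should stay
// roughly constant with the channel-buffered implementation.
func BenchmarkDispatcher(b *testing.B) {
	// Simulate a slow downstream request logger.
	slowHandler := func(LogRequest) { time.Sleep(10 * time.Millisecond) }
	// Timeout of 0 means logs are dropped immediately on a full buffer.
	d := NewDispatcher(16, 10000, 0, slowHandler)
	defer d.Close()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Dropped logs are acceptable here; the point is that the goroutine
		// count stays flat even when the buffer fills up.
		_ = d.Enqueue(LogRequest{Payload: []byte(`{"data": "x"}`)})
	}
	// With the channel-buffered pool this stays roughly constant;
	// a goroutine-per-request design grows linearly with b.N.
	b.ReportMetric(float64(runtime.NumGoroutine()), "goroutines")
}
```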

We will need to expose the following flags in the operator so that they can be configured from there:

  • logger_workers
  • log_work_buffer_size
  • log_write_timeout_ms

Which issue(s) this PR fixes:

Fixes #3726

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

michaelcheah (Contributor) left a comment:

lgtm

Comment on lines +74 to +75
logWorkBufferSize = flag.Int("log_work_buffer_size", 10000, "Limit of buffered logs in memory while waiting for downstream request ingestion")
logWriteTimeoutMs = flag.Int("log_write_timeout_ms", 2000, "Timeout before giving up writing log if buffer is full. If <= 0 will immediately drop log on full log buffer.")

nit: Can this not use the default values in executor/logger/collector.go?
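
For illustration, a sketch of what that suggestion might look like, assuming exported defaults are added to the logger package (the constant names below are hypothetical, not necessarily what executor/logger/collector.go defines):

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical exported defaults that could live in executor/logger/collector.go
// so the flag defaults and the logger package never drift apart.
const (
	DefaultLogWorkBufferSize = 10000
	DefaultLogWriteTimeoutMs = 2000
)

var (
	logWorkBufferSize = flag.Int("log_work_buffer_size", DefaultLogWorkBufferSize,
		"Limit of buffered logs in memory while waiting for downstream request ingestion")
	logWriteTimeoutMs = flag.Int("log_write_timeout_ms", DefaultLogWriteTimeoutMs,
		"Timeout before giving up writing log if buffer is full. If <= 0 will immediately drop log on full log buffer.")
)

func main() {
	flag.Parse()
	fmt.Println("buffer:", *logWorkBufferSize, "timeout(ms):", *logWriteTimeoutMs)
}
```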

seldondev (Collaborator):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: michaelcheah
To complete the pull request process, please assign ivan-valkov
You can assign the PR to them by writing /assign @ivan-valkov in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ukclivecox (Contributor):

@ivan-valkov Great. Will open another PR to add Helm chart parameters.

ukclivecox merged commit eaf7a22 into SeldonIO:master on Nov 13, 2021
stephen37 pushed a commit to stephen37/seldon-core that referenced this pull request Dec 21, 2021
* WIP benchmark

* better benchmark with pprof

* fix wip

* cleanup
Successfully merging this pull request may close these issues.

OOM when stress test the Seldon model, which may be caused by the logging of request and response payloads
4 participants