proxy RSS grows with traffic #3998

Closed
olix0r opened this issue Jan 29, 2020 · 1 comment · Fixed by linkerd/linkerd2-proxy#418 or linkerd/linkerd2-proxy#423
Assignees: olix0r
Labels: bug, priority/P0 Release Blocker
olix0r (Member) commented Jan 29, 2020

On edge-20.1.4, proxy RSS grows with both HTTP and TCP load

[graphs: proxy memory (RSS) over time under HTTP and TCP load]

Similar behavior is observed on more recent branches that change buffering/backpressure behavior, so we should probably continue debugging this on master.

I've put together a small repro.

@olix0r olix0r added the bug label Jan 29, 2020
@olix0r olix0r self-assigned this Jan 29, 2020
@olix0r olix0r added the priority/P0 Release Blocker label Jan 29, 2020
@olix0r olix0r changed the title proxy RSS grows until being OOMKilled proxy RSS grows with traffic Jan 29, 2020
@olix0r olix0r reopened this Jan 31, 2020
olix0r added a commit to linkerd/linkerd2-proxy that referenced this issue Jan 31, 2020
linkerd/linkerd2#3998 describes an issue where the proxy's memory grows
with traffic. After testing, we've identified that this is caused by
logging emitted by `tracing`. 07667b8 upgraded the `tracing-subscriber`
crate, which reduced memory pressure; however, we continue to observe
heap usage grow in large leaps when running with the default log level
`linkerd=info,warn`. This appears to be due to span creation, which
occurs on every new connection.

By changing uses of `info_span!` to `debug_span!`, we can avoid this
allocation path at the expense of losing contextual logging. This seems
like a suitable tradeoff until we can address the underlying issues in
`tracing`.

I've tested this overnight and memory usage remains effectively flat.
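The tradeoff described in this commit can be modeled with a short, self-contained sketch. To be clear, `Level`, `Subscriber`, and `span` below are simplified stand-ins, not the `tracing` crate's actual types; the sketch only illustrates why a span whose level is below the active filter skips the allocation path entirely.

```rust
// Simplified model (NOT tracing's real machinery) of why downgrading
// per-connection spans from `info` to `debug` avoids allocation when the
// proxy runs with the default `linkerd=info,warn` log level.

#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum Level {
    Debug, // more verbose; disabled under an `info` filter
    Info,
}

struct Subscriber {
    max_level: Level, // spans below this level are disabled
}

impl Subscriber {
    /// Creates a span only when its level is enabled. A disabled span
    /// allocates nothing, which is the point of the `info_span!` ->
    /// `debug_span!` change in this commit.
    fn span(&self, level: Level, name: &str) -> Option<String> {
        if level >= self.max_level {
            Some(name.to_string()) // enabled: allocation happens here
        } else {
            None // disabled: the allocation path is skipped
        }
    }
}

fn main() {
    let sub = Subscriber { max_level: Level::Info };
    // `info_span!`-style: enabled at the default level, so every new
    // connection pays for a span allocation.
    assert!(sub.span(Level::Info, "connection").is_some());
    // `debug_span!`-style: disabled at the default level, so no
    // allocation occurs on the connection path.
    assert!(sub.span(Level::Debug, "connection").is_none());
}
```

The cost is contextual logging: with the spans disabled at the default level, log lines on the connection path lose the span's fields.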
hawkw (Contributor) commented Jan 31, 2020

One source of RSS growth with TCP connections only is probably #4006.

hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Feb 4, 2020
Version 0.0.7 of `sharded-slab` contains a bug where, when the `remove`
method is called with the index of a slot that is not being accessed
concurrently, the slot is emptied but **not** placed on the free list.
This issue meant that, under `tracing-subscriber`'s usage pattern, where
slab entries are almost always uncontended when reused, allocated slab
pages are almost never reused, resulting in unbounded slab growth over
time (i.e. a memory leak).

This commit updates `tracing-subscriber` to version 0.2.0-alpha.6,
which in turn bumps the `sharded-slab` dependency to v0.0.8, which
includes commit hawkw/sharded-slab@dfdd7ae. That commit fixes this bug.

I've empirically verified that, after running `linkerd2-proxy` under
load with a global `trace` filter that enables a *lot* of spans, heap
usage remains stable, and the characteristic stair-step heap growth
pattern of doubling slab allocations doesn't occur. This indicates that
freed slots are actually being reused, and (once fully warmed up), the
slab will only grow when the number of active spans in the system
increases.

![mem_plot](https://user-images.githubusercontent.com/2796466/73581369-cd859900-443d-11ea-8522-abeace03d745.png)

Closes linkerd/linkerd2#3998 

Signed-off-by: Eliza Weisman <[email protected]>
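The leak described in this commit can be reduced to a small, self-contained model. This is an assumption-laden simplification, not `sharded-slab`'s actual lock-free implementation: a toy slab where `remove` empties the slot but (in the buggy configuration) never pushes the index onto the free list, so every insert grows the backing storage instead of reusing a freed slot.

```rust
// Toy model of the sharded-slab v0.0.7 bug: `remove` empties a slot but
// forgets to return its index to the free list, so slots are never reused
// and the slab grows without bound under a create/close churn workload.

struct Slab {
    slots: Vec<Option<&'static str>>,
    free: Vec<usize>, // indices of freed slots available for reuse
    reuse_freed: bool, // false models the v0.0.7 bug; true models the fix
}

impl Slab {
    fn insert(&mut self, val: &'static str) -> usize {
        match self.free.pop() {
            Some(idx) => {
                self.slots[idx] = Some(val); // reuse a freed slot
                idx
            }
            None => {
                self.slots.push(Some(val)); // no free slot: grow the slab
                self.slots.len() - 1
            }
        }
    }

    fn remove(&mut self, idx: usize) {
        self.slots[idx] = None; // the slot is emptied...
        if self.reuse_freed {
            self.free.push(idx); // ...but v0.0.7 skipped this step
        }
    }
}

fn main() {
    for &fixed in &[false, true] {
        let mut slab = Slab { slots: Vec::new(), free: Vec::new(), reuse_freed: fixed };
        // tracing-subscriber's usage pattern: spans are constantly created
        // and closed, with slots almost always uncontended when reused.
        for _ in 0..1000 {
            let idx = slab.insert("span");
            slab.remove(idx);
        }
        println!("fixed={} -> slab size {}", fixed, slab.slots.len());
    }
}
```

With `reuse_freed = false` the slab ends up with 1000 slots after 1000 insert/remove cycles; with the fix it stays at 1, matching the flat heap profile reported above.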
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2021