-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subscriber: fix issues when a span triggers its parents to close #514
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Eliza Weisman <[email protected]>
The issue here is that the `CloseGuard` removes the span from the registry (dropping the `DataInner`) *before* it decrements CLOSE_COUNT. Dropping the `DataInner` will decrement the parent span's ref count, potentially allowing it to close. When the new close for the parent span starts, CLOSE_COUNT will be 1, not 0, and it will not be decremented until after the try_close call completes. This means that although the close is processed and all the layers see `on_close` calls for that span, the span's data is never removed from the registry, effectively leaking it. This commit fixes the problem by decrementing CLOSE_COUNT *before* we trigger the parent to close. Fixes #511 Signed-off-by: Eliza Weisman <[email protected]>
This commit adds a test that when a span is kept open by a child which is *itself* kept open by a child, closing the grandchild will close both the parent *and* the grandparent. This is currently broken.
Currently, in situations where multiple parent spans are kept open by a child, only the first parent will be successfully closed. This is because calling `dispatcher::get_default` unsets the default dispatcher temporarily, so that nested `get_default` calls will receive a no-op subscriber. This was intended to prevent infinite dispatch loops when a subscriber calls into code that emits its own traces. Unfortunately, this means that if dropping the parent span would then trigger an additional span to drop, the dispatcher is now unset. This commit fixes this by cloning the dispatcher out of the `get_default` closure _before_ trying to close the parent span. This means that the `get_default` calls are no longer nested. This does have the disadvantage of introducing an additional `Arc` increment/decrement to the span-closing path. The impact of that shouldn't be too big, but if necessary we can try to optimize this in the future.
hawkw
added
kind/bug
Something isn't working
crate/subscriber
Related to the `tracing-subscriber` crate
labels
Jan 7, 2020
davidbarsky
approved these changes
Jan 7, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but I think I'd appreciate the bug descriptions above the tests as comments. It provides really rich context.
hawkw
added a commit
that referenced
this pull request
Jan 10, 2020
Added: - **fmt**: Public `FormattedFields::new` constructor (#478) - **fmt**: Added examples to `fmt::Layer` documentation (#510) - Documentation now shows what feature flags are required by each API item (#525) Fixed: - **fmt**: Missing space between timestamp and level (#480) - **fmt**: Incorrect formatting with `with_target(false)` (#481) - **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` compatibility (#489) - **registry**: Spans exited out of order not being closed properly on exit (#509) - **registry**: Memory leak when spans are closed by a child span closing (#514) - **registry**: Spans closed by a child span closing not also closing _their_ parents (#514) - Compilation errors with `no-default-features` (#499, #500) Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Jan 11, 2020
* subscriber: prepare to release 0.2.0-alpha.3 ### Added - **fmt**: Public `FormattedFields::new` constructor (#478) - **fmt**: Added examples to `fmt::Layer` documentation (#510) - Documentation now shows what feature flags are required by each API item (#525) ### Fixed - **fmt**: Missing space between timestamp and level (#480) - **fmt**: Incorrect formatting with `with_target(false)` (#481) - **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` compatibility (#489) - **registry**: Spans exited out of order not being closed properly on exit (#509) - **registry**: Memory leak when spans are closed by a child span closing (#514) - **registry**: Spans closed by a child span closing not also closing _their_ parents (#514) - Compilation errors with `no-default-features` (#499, #500) Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Jan 11, 2020
These were added while debugging issue #511. They were never intended to be committed, but apparently I missed a few while removing them, and they slipped into PR #514 by mistake. Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Jan 11, 2020
These were added while debugging issue #511. They were never intended to be committed, but apparently I missed a few while removing them, and they slipped into PR #514 by mistake. Once this merges, I'll publish `tracing-subscriber` 0.2.0-alpha.4. We may want to consider yanking alpha.3. Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
to linkerd/linkerd2-proxy
that referenced
this pull request
Jan 30, 2020
This commit updates the proxy's `tracing-subscriber` dependency from 0.1.4 to 0.2.0-alpha.4. This alpha includes tokio-rs/tracing#514, which fixes a pair of issues which prevent spans from being considered closed. One of these issues exists in 0.1.x as well. When spans are not considered closed, storage allocated for those spans' per-span data is not reused. Therefore, this issue results in a memory leak. Note that some span storage will still not appear to be _deallocated_ after this change, but that storage will be reused when spans are allowed to close correctly. In addition, I've updated the proxy's `trace` module to remove the custom formatter. This was used only to add Linkerd's custom logging contexts to `tracing-subscriber` logs, and as we no longer use the logging contexts, we can use `tracing-subscriber`'s default formatter. Now, we only override the timestamp format. Signed-off-by: Eliza Weisman <[email protected]>
olix0r
pushed a commit
to linkerd/linkerd2-proxy
that referenced
this pull request
Jan 31, 2020
This commit updates the proxy's `tracing-subscriber` dependency from 0.1.4 to 0.2.0-alpha.4. This alpha includes tokio-rs/tracing#514, which fixes a pair of issues which prevent spans from being considered closed. One of these issues exists in 0.1.x as well. When spans are not considered closed, storage allocated for those spans' per-span data is not reused. Therefore, this issue results in a memory leak. Note that some span storage will still not appear to be _deallocated_ after this change, but that storage will be reused when spans are allowed to close correctly. In addition, I've updated the proxy's `trace` module to remove the custom formatter. This was used only to add Linkerd's custom logging contexts to `tracing-subscriber` logs, and as we no longer use the logging contexts, we can use `tracing-subscriber`'s default formatter. Now, we only override the timestamp format. Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Feb 4, 2020
0.2.0 (February 4, 2020) Breaking Changes: - **fmt**: Renamed `Context` to `FmtContext` (#420, #425) - **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) - **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) Added: - **registry**: `Registry`, a reusable span store that `Layer`s can use a high-performance, in-memory store. (#420, #425, #432, #433, #435) - **registry**: Added `LookupSpan` trait, implemented by `Subscriber`s to expose stored span data to `Layer`s (#420) - **fmt**: Added `fmt::Layer`, to allow composing log formatting with other `Layer`s (#420) - **fmt**: Added support for JSON field and event formatting (#377, #415) - **filter**: Documentation for filtering directives (#554) Changed: - **fmt**: Renamed `Context` to `FmtContext` (#420, #425) (BREAKING) - **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) (BREAKING) - **fmt**: Reimplemented `fmt::Subscriber` in terms of the `Registry` and `Layer`s (#420) Removed: - **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) (BREAKING) Fixed: - **fmt**: Fixed memory leaks in the slab used to store per-span data (3c35048) - **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` compatibility (#489) - **fmt**: Spans closed by a child span closing not also closing _their_ parents (#514) - **Layer**: Fixed `Layered` subscribers failing to downcast to their own type (#549) - **Layer**: Fixed `Layer::downcast_ref` returning invalid references (#454) Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Feb 4, 2020
# 0.2.0 (February 4, 2020) ### Breaking Changes - **fmt**: Renamed `Context` to `FmtContext` (#420, #425) - **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) - **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) ### Added - **registry**: `Registry`, a `Subscriber` implementation that `Layer`s can use as a high-performance, in-memory span store. (#420, #425, #432, #433, #435) - **registry**: Added `LookupSpan` trait, implemented by `Subscriber`s to expose stored span data to `Layer`s (#420) - **fmt**: Added `fmt::Layer`, to allow composing log formatting with other `Layer`s - **fmt**: Added support for JSON field and event formatting (#377, #415) - **filter**: Documentation for filtering directives (#554) ### Changed - **fmt**: Renamed `Context` to `FmtContext` (#420, #425) (BREAKING) - **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) (BREAKING) - **fmt**: Reimplemented `fmt::Subscriber` in terms of the `Registry` and `Layer`s (#420) ### Removed - **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) (BREAKING) ### Fixed - **fmt**: Fixed memory leaks in the slab used to store per-span data (3c35048) - **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` compatibility (#489) - **fmt**: Spans closed by a child span closing not also closing _their_ parents (#514) - **Layer**: Fixed `Layered` subscribers failing to downcast to their own type (#549) - **Layer**: Fixed `Layer::downcast_ref` returning invalid references (#454) Fixes #515 Fixes #458 Signed-off-by: Eliza Weisman <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This branch fixes a couple of issues which might result in child spans
not correctly triggering their parents to close:
The issue here is that the
CloseGuard
removes the span from theregistry (dropping the
DataInner
) before it decrements CLOSE_COUNT.Dropping the
DataInner
will decrement the parent span's ref count,potentially allowing it to close. When the new close for the parent span
starts, CLOSE_COUNT will be 1, not 0, and it will not be decremented
until after the try_close call completes. This means that although the
close is processed and all the layers see
on_close
calls for thatspan, the span's data is never removed from the registry, effectively
leaking it.
This commit fixes the problem by decrementing CLOSE_COUNT before we
trigger the parent to close.
Fixes #511
Currently, in situations where multiple parent spans are kept open by a
child, only the first parent will be successfully closed. This is
because calling
dispatcher::get_default
unsets the default dispatchertemporarily, so that nested
get_default
calls will receive a no-opsubscriber. This was intended to prevent infinite dispatch loops when a
subscriber calls into code that emits its own traces. Unfortunately,
this means that if dropping the parent span would then trigger an
additional span to drop, the dispatcher is now unset.
This commit fixes this by cloning the dispatcher out of the
get_default
closure before trying to close the parent span. Thismeans that the
get_default
calls are no longer nested. This does havethe disadvantage of introducing an additional
Arc
increment/decrementto the span-closing path. The impact of that shouldn't be too big, but
if necessary we can try to optimize this in the future.
I've also added tests for these issues.
Signed-off-by: Eliza Weisman [email protected]