Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subscriber: fix issues when a span triggers its parents to close #514

Merged
merged 6 commits into from
Jan 8, 2020

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Jan 7, 2020

This branch fixes a couple of issues which might result in child spans
not correctly triggering their parents to close:

  • subscriber: fix spans closed by their children not being removed

The issue here is that the CloseGuard removes the span from the
registry (dropping the DataInner) before it decrements CLOSE_COUNT.
Dropping the DataInner will decrement the parent span's ref count,
potentially allowing it to close. When the new close for the parent span
starts, CLOSE_COUNT will be 1, not 0, and it will not be decremented
until after the try_close call completes. This means that although the
close is processed and all the layers see on_close calls for that
span, the span's data is never removed from the registry, effectively
leaking it.

This commit fixes the problem by decrementing CLOSE_COUNT before we
trigger the parent to close.

Fixes #511

  • subscriber: fix child spans not closing grandparents

Currently, in situations where multiple parent spans are kept open by a
child, only the first parent will be successfully closed. This is
because calling dispatcher::get_default unsets the default dispatcher
temporarily, so that nested get_default calls will receive a no-op
subscriber. This was intended to prevent infinite dispatch loops when a
subscriber calls into code that emits its own traces. Unfortunately,
this means that if dropping the parent span would then trigger an
additional span to drop, the dispatcher is now unset.

This commit fixes this by cloning the dispatcher out of the
get_default closure before trying to close the parent span. This
means that the get_default calls are no longer nested. This does have
the disadvantage of introducing an additional Arc increment/decrement
to the span-closing path. The impact of that shouldn't be too big, but
if necessary we can try to optimize this in the future.

I've also added tests for these issues.

Signed-off-by: Eliza Weisman [email protected]

The issue here is that the `CloseGuard` removes the span from the
registry (dropping the `DataInner`) *before* it decrements CLOSE_COUNT.
Dropping the `DataInner` will decrement the parent span's ref count,
potentially allowing it to close. When the new close for the parent span
starts, CLOSE_COUNT will be 1, not 0, and it will not be decremented
until after the try_close call completes. This means that although the
close is processed and all the layers see `on_close` calls for that
span, the span's data is never removed from the registry, effectively
leaking it.

This commit fixes the problem by decrementing CLOSE_COUNT *before* we
trigger the parent to close.

Fixes #511

Signed-off-by: Eliza Weisman <[email protected]>
This commit adds a test that when a span is kept open by a child which
is *itself* kept open by a child, closing the grandchild will close both
the parent *and* the grandparent.

This is currently broken.
Currently, in situations where multiple parent spans are kept open by a
child, only the first parent will be successfully closed. This is
because calling `dispatcher::get_default` unsets the default dispatcher
temporarily, so that nested `get_default` calls will receive a no-op
subscriber. This was intended to prevent infinite dispatch loops when a
subscriber calls into code that emits its own traces. Unfortunately,
this means that if dropping the parent span would then trigger an
additional span to drop, the dispatcher is now unset.

This commit fixes this by cloning the dispatcher out of the
`get_default` closure _before_ trying to close the parent span. This
means that the `get_default` calls are no longer nested. This does have
the disadvantage of introducing an additional `Arc` increment/decrement
to the span-closing path. The impact of that shouldn't be too big, but
if necessary we can try to optimize this in the future.
@hawkw hawkw added kind/bug Something isn't working crate/subscriber Related to the `tracing-subscriber` crate labels Jan 7, 2020
@hawkw hawkw requested review from davidbarsky and a team January 7, 2020 22:55
@hawkw hawkw self-assigned this Jan 7, 2020
Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but I think I'd appreciate the bug descriptions above the tests as comments. It provides really rich context.

@hawkw hawkw merged commit d0b22bb into master Jan 8, 2020
hawkw added a commit that referenced this pull request Jan 10, 2020
Added:

- **fmt**: Public `FormattedFields::new` constructor (#478)
- **fmt**: Added examples to `fmt::Layer` documentation (#510)
- Documentation now shows what feature flags are required by each API item (#525)

Fixed:

- **fmt**: Missing space between timestamp and level (#480)
- **fmt**: Incorrect formatting with `with_target(false)` (#481)
- **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` compatibility
  (#489)
- **registry**: Spans exited out of order not being closed properly on exit
  (#509)
- **registry**: Memory leak when spans are closed by a child span closing (#514)
- **registry**: Spans closed by a child span closing not also closing _their_
  parents (#514)
- Compilation errors with `no-default-features` (#499, #500)

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Jan 11, 2020
* subscriber: prepare to release 0.2.0-alpha.3

### Added

- **fmt**: Public `FormattedFields::new` constructor (#478)
- **fmt**: Added examples to `fmt::Layer` documentation (#510)
- Documentation now shows what feature flags are required by each API item (#525)

### Fixed

- **fmt**: Missing space between timestamp and level (#480)
- **fmt**: Incorrect formatting with `with_target(false)` (#481)
- **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` compatibility
  (#489)
- **registry**: Spans exited out of order not being closed properly on exit
  (#509)
- **registry**: Memory leak when spans are closed by a child span closing (#514)
- **registry**: Spans closed by a child span closing not also closing _their_
  parents (#514)
- Compilation errors with `no-default-features` (#499, #500)

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Jan 11, 2020
These were added while debugging issue #511. They were never intended to
be committed, but apparently I missed a few while removing them, and
they slipped into PR #514 by mistake.

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Jan 11, 2020
These were added while debugging issue #511. They were never intended to
be committed, but apparently I missed a few while removing them, and
they slipped into PR #514 by mistake.

Once this merges, I'll publish `tracing-subscriber` 0.2.0-alpha.4. We may
want to consider yanking alpha.3.

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit to linkerd/linkerd2-proxy that referenced this pull request Jan 30, 2020
This commit updates the proxy's `tracing-subscriber` dependency from
0.1.4 to 0.2.0-alpha.4. This alpha includes tokio-rs/tracing#514, which
fixes a pair of issues which prevent spans from being considered closed.
One of these issues exists in 0.1.x as well.

When spans are not considered closed, storage allocated for those spans'
per-span data is not reused. Therefore, this issue results in a memory
leak.

Note that some span storage will still not appear to be
_deallocated_ after this change, but that storage will be reused when
spans are allowed to close correctly.

In addition, I've updated the proxy's `trace` module to remove the
custom formatter. This was used only to add Linkerd's custom logging
contexts to `tracing-subscriber` logs, and as we no longer use the
logging contexts, we can use `tracing-subscriber`'s default formatter.
Now, we only override the timestamp format.

Signed-off-by: Eliza Weisman <[email protected]>
olix0r pushed a commit to linkerd/linkerd2-proxy that referenced this pull request Jan 31, 2020
This commit updates the proxy's `tracing-subscriber` dependency from
0.1.4 to 0.2.0-alpha.4. This alpha includes tokio-rs/tracing#514, which
fixes a pair of issues which prevent spans from being considered closed.
One of these issues exists in 0.1.x as well.

When spans are not considered closed, storage allocated for those spans'
per-span data is not reused. Therefore, this issue results in a memory
leak.

Note that some span storage will still not appear to be
_deallocated_ after this change, but that storage will be reused when
spans are allowed to close correctly.

In addition, I've updated the proxy's `trace` module to remove the
custom formatter. This was used only to add Linkerd's custom logging
contexts to `tracing-subscriber` logs, and as we no longer use the
logging contexts, we can use `tracing-subscriber`'s default formatter.
Now, we only override the timestamp format.

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Feb 4, 2020
0.2.0 (February 4, 2020)

Breaking Changes:

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420)
- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434)

Added:

- **registry**: `Registry`, a reusable span store that `Layer`s can use
  a high-performance, in-memory store. (#420, #425, #432, #433, #435)
- **registry**: Added `LookupSpan` trait, implemented by `Subscriber`s
  to expose stored span data to `Layer`s (#420)
- **fmt**: Added `fmt::Layer`, to allow composing log formatting with
  other `Layer`s (#420)
- **fmt**: Added support for JSON field and event formatting (#377,
  #415)
- **filter**: Documentation for filtering directives (#554)

Changed:

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425) (BREAKING)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) (BREAKING)
- **fmt**: Reimplemented `fmt::Subscriber` in terms of the `Registry`
  and `Layer`s (#420)

Removed:

- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) (BREAKING)

Fixed:

- **fmt**: Fixed memory leaks in the slab used to store per-span data
  (3c35048)
- **fmt**: `fmt::SubscriberBuilder::init` not setting up `log`
  compatibility (#489)
- **fmt**: Spans closed by a child span closing not also closing _their_
  parents (#514)
- **Layer**: Fixed `Layered` subscribers failing to downcast to their
  own type (#549)
- **Layer**: Fixed `Layer::downcast_ref` returning invalid references
  (#454)

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Feb 4, 2020
# 0.2.0 (February 4, 2020)

### Breaking Changes

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420)
- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434)

### Added

- **registry**: `Registry`, a `Subscriber` implementation that `Layer`s
  can use as a high-performance, in-memory span store. (#420, #425, 
  #432, #433, #435)
- **registry**: Added `LookupSpan` trait, implemented by `Subscriber`s
  to expose stored span data to `Layer`s (#420)
- **fmt**: Added `fmt::Layer`, to allow composing log formatting with
  other `Layer`s
- **fmt**: Added support for JSON field and event formatting (#377, 
  #415)
- **filter**: Documentation for filtering directives (#554)

### Changed

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425) (BREAKING)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) (BREAKING)
- **fmt**: Reimplemented `fmt::Subscriber` in terms of the `Registry`
  and `Layer`s (#420)

### Removed

- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) 
  (BREAKING)

### Fixed

- **fmt**: Fixed memory leaks in the slab used to store per-span data
  (3c35048)
- **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` 
  compatibility (#489)
- **fmt**: Spans closed by a child span closing not also closing 
  _their_ parents (#514)
- **Layer**: Fixed `Layered` subscribers failing to downcast to their
  own type (#549)
- **Layer**: Fixed `Layer::downcast_ref` returning invalid references
  (#454)

Fixes #515
Fixes #458 

Signed-off-by: Eliza Weisman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

registry::DataInner may not be dropped if child spans are dropped out of order
2 participants