Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subscriber: fix spans never being removed from the registry #435

Merged
merged 8 commits into from
Nov 18, 2019

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Nov 18, 2019

Motivation

This branch fixes several bugs in the Registry span close logic, and
adds new tests for these issues.

Solutions

  • Spans not being removed at the end of on_close

There is currently a bug in the Registry that prevents spans from
being removed, even when all on_close callbacks are complete. The test
for this behavior ( span_is_removed_from_registry) fails to catch the
bug, since there is a bug in the test as well.

The test asserts that a span extension was dropped after the span
closes. Current, the test uses subscriber::with_default, and then
made the assertion outside of the with_default block. However, this
was incorrect, as even if the Registry fails to drop the span, the
span extension will be dropped when the whole Registry itself is
dropped, at the end of the with_default block.

This branch fixes the test, and the bug in the close logic. I've changed
the test to use an explicit Dispatch to keep the registry alive until
the end of the test. This reveals that the span is not being removed
as it should be.

The reason the span fails to be removed is that the Drop
implementation for CloseGuard drops the span if the value of the
CLOSE_COUNT cell is 0. However, the if statement that tests for this
compares the value returned by Cell::get with 0. The value returned by
Cell::get is the previous value, not the current one after dropping
this guard; if the guard being dropped is the final guard, then the
value returned by Cell::get will be 1, rather than 0.

I've fixed this logic, and refactored it slightly to hopefully make it
easier to understand in the future.

Thanks to @jtescher for catching this bug!

  • Only the first span being removed at the end of on_close

In addition, I've fixed a bug where the remove after close logic would
only work for the first span to close on each thread. This is because
dropping a CloseGuard does not currently subtract from CLOSE_COUNT
when it is the final guard. This means that when the next span is
closed, the count will start at 1, rather than 0. I've fixed this by
ensuring that the close count is always decremented, and changed the
tests to close multiple spans.

  • Spans removed on the first try_close

Finally, there is also an issue where the removal logic is run on
every call to try_close, regardless of whether or not the subscriber
actually indicates that the span closes. This means that a span will be
removed from the registry even when there are one or more span handles
that reference it.

This is due to the start_close method being called before
Subscriber::try_close is called. When a close guard is dropped, the
span is currently always removed. However, since we call start_close
at the beginning of the Layered::try_close method, we may be
decrementing a ref count without closing the span, but the close guard
is unaware of this.

I've fixed this bug by updating the CloseGuard struct to track whether
the span is closed. It now has a bool that is set only when the
Subscriber::try_close call returns true.

Only creating the CloseGuard if try_close returns true may seem like
a simpler solution, but that won't work, since the call to try_close
may call into another Layered subscriber. In order to handle
situations where many layers are nested, we need to construct the
CloseGuard for each stack frame before calling into the next one, so
it's necessary to set whether the span is closing only after the call to
try_close returns.

@hawkw hawkw added kind/bug Something isn't working crate/subscriber Related to the `tracing-subscriber` crate labels Nov 18, 2019
@hawkw hawkw requested a review from davidbarsky November 18, 2019 19:50
@hawkw hawkw self-assigned this Nov 18, 2019
Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

This fixes an issue where the test `span_is_removed_from_registry` did
not actually ensure the correct behavior.

This test asserts that a span extension was dropped after the span
closes. Previously, the test used `subscriber::with_default`, and then
made the assertion _outside_ of the `with_default` block. However, this
was incorrect, as even if the `Registry` fails to drop the span, the
span extension will be dropped when the whole `Registry` itself is
dropped, at the end of the `with_default` block.

This commit changes the test to use an explicit `Dispatch` to keep the
registry alive until the end of the test. This reveals that the span is
_not_ being removed as it should be.

Thanks to @jtescher for catching this bug!

Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw force-pushed the eliza/fix-on-close branch from 8000951 to e248703 Compare November 18, 2019 22:26
@hawkw
Copy link
Member Author

hawkw commented Nov 18, 2019

@davidbarsky this branch now fixes several bugs that I found after fixing the first one (it was preventing them from ever occurring). I've updated the PR description to reflect this.

@hawkw hawkw merged commit d93b467 into master Nov 18, 2019
@hawkw hawkw deleted the eliza/fix-on-close branch December 9, 2019 20:53
hawkw added a commit that referenced this pull request Feb 4, 2020
0.2.0 (February 4, 2020)

Breaking Changes:

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420)
- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434)

Added:

- **registry**: `Registry`, a reusable span store that `Layer`s can use
  a high-performance, in-memory store. (#420, #425, #432, #433, #435)
- **registry**: Added `LookupSpan` trait, implemented by `Subscriber`s
  to expose stored span data to `Layer`s (#420)
- **fmt**: Added `fmt::Layer`, to allow composing log formatting with
  other `Layer`s (#420)
- **fmt**: Added support for JSON field and event formatting (#377,
  #415)
- **filter**: Documentation for filtering directives (#554)

Changed:

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425) (BREAKING)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) (BREAKING)
- **fmt**: Reimplemented `fmt::Subscriber` in terms of the `Registry`
  and `Layer`s (#420)

Removed:

- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) (BREAKING)

Fixed:

- **fmt**: Fixed memory leaks in the slab used to store per-span data
  (3c35048)
- **fmt**: `fmt::SubscriberBuilder::init` not setting up `log`
  compatibility (#489)
- **fmt**: Spans closed by a child span closing not also closing _their_
  parents (#514)
- **Layer**: Fixed `Layered` subscribers failing to downcast to their
  own type (#549)
- **Layer**: Fixed `Layer::downcast_ref` returning invalid references
  (#454)

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Feb 4, 2020
# 0.2.0 (February 4, 2020)

### Breaking Changes

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420)
- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434)

### Added

- **registry**: `Registry`, a `Subscriber` implementation that `Layer`s
  can use as a high-performance, in-memory span store. (#420, #425, 
  #432, #433, #435)
- **registry**: Added `LookupSpan` trait, implemented by `Subscriber`s
  to expose stored span data to `Layer`s (#420)
- **fmt**: Added `fmt::Layer`, to allow composing log formatting with
  other `Layer`s
- **fmt**: Added support for JSON field and event formatting (#377, 
  #415)
- **filter**: Documentation for filtering directives (#554)

### Changed

- **fmt**: Renamed `Context` to `FmtContext` (#420, #425) (BREAKING)
- **fmt**: Renamed `Builder` to `SubscriberBuilder` (#420) (BREAKING)
- **fmt**: Reimplemented `fmt::Subscriber` in terms of the `Registry`
  and `Layer`s (#420)

### Removed

- **filter**: Removed `Filter`. Use `EnvFilter` instead (#434) 
  (BREAKING)

### Fixed

- **fmt**: Fixed memory leaks in the slab used to store per-span data
  (3c35048)
- **fmt**: `fmt::SubscriberBuilder::init` not setting up `log` 
  compatibility (#489)
- **fmt**: Spans closed by a child span closing not also closing 
  _their_ parents (#514)
- **Layer**: Fixed `Layered` subscribers failing to downcast to their
  own type (#549)
- **Layer**: Fixed `Layer::downcast_ref` returning invalid references
  (#454)

Fixes #515
Fixes #458 

Signed-off-by: Eliza Weisman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants