subscriber: fix spans never being removed from the registry (#435)
## Motivation

This branch fixes several bugs in the `Registry` span close logic, and adds new tests for these issues.

## Solutions

* Spans not being removed at the end of `on_close`

  There is currently a bug in the `Registry` that prevents spans from being removed, even when all `on_close` callbacks are complete. The test for this behavior (`span_is_removed_from_registry`) fails to catch the bug, since there is a bug in the test as well.

  The test asserts that a span extension was dropped after the span closes. Currently, the test uses `subscriber::with_default`, and then makes the assertion _outside_ of the `with_default` block. However, this is incorrect: even if the `Registry` fails to drop the span, the span extension will be dropped when the whole `Registry` itself is dropped, at the end of the `with_default` block.

  This branch fixes the test, and the bug in the close logic. I've changed the test to use an explicit `Dispatch` to keep the registry alive until the end of the test. This reveals that the span is _not_ being removed as it should be.

  The reason the span fails to be removed is that the `Drop` implementation for `CloseGuard` drops the span if the value of the `CLOSE_COUNT` cell is 0. However, the `if` statement that tests for this compares the value returned by `Cell::get` with 0. The value returned by `Cell::get` is the _previous_ value, not the current one after dropping this guard; if the guard being dropped is the final guard, then the value returned by `Cell::get` will be 1, rather than 0.

  I've fixed this logic, and refactored it slightly to hopefully make it easier to understand in the future.

  Thanks to @jtescher for catching this bug!

* Only the first span being removed at the end of `on_close`

  In addition, I've fixed a bug where the remove-after-close logic would only work for the _first_ span to close on each thread. This is because dropping a `CloseGuard` does not currently subtract from `CLOSE_COUNT` when it is the final guard. This means that when the next span is closed, the count will start at 1, rather than 0.

  I've fixed this by ensuring that the close count is always decremented, and changed the tests to close multiple spans.

* Spans removed on the first `try_close`

  Finally, there is also an issue where the removal logic is run on _every_ call to `try_close`, regardless of whether or not the subscriber actually indicates that the span closes. This means that a span will be removed from the registry even when there are one or more span handles that reference it.

  This is due to the `start_close` method being called _before_ `Subscriber::try_close` is called. When a close guard is dropped, the span is currently _always_ removed. However, since we call `start_close` at the beginning of the `Layered::try_close` method, we may be decrementing a ref count _without_ closing the span, but the close guard is unaware of this.

  I've fixed this bug by updating the `CloseGuard` struct to track whether the span is closed. It now has a bool that is set only when the `Subscriber::try_close` call returns true.

  Only creating the `CloseGuard` if `try_close` returns true may seem like a simpler solution, but that won't work, since the call to `try_close` may call into another `Layered` subscriber. In order to handle situations where many layers are nested, we need to construct the `CloseGuard` for each stack frame before calling into the next one, so it's necessary to set whether the span is closing only after the call to `try_close` returns.

Signed-off-by: Eliza Weisman <[email protected]>
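
The three fixes described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual `tracing-subscriber` code: `ToyRegistry`, the `depth` parameter that stands in for nested `Layered` subscribers, and `demo` are hypothetical names introduced for illustration. Only `CLOSE_COUNT`, `CloseGuard`, `start_close`, and `try_close` echo names from the commit message, and their bodies here are simplified guesses at the described behavior.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashSet;

thread_local! {
    // Number of `try_close` stack frames currently in progress on this thread.
    static CLOSE_COUNT: Cell<usize> = Cell::new(0);
}

/// Hypothetical stand-in for the registry's span storage.
#[derive(Default)]
struct ToyRegistry {
    spans: RefCell<HashSet<u64>>,
}

struct CloseGuard<'a> {
    id: u64,
    registry: &'a ToyRegistry,
    // Set only once the inner `try_close` call has returned `true`.
    is_closing: bool,
}

impl ToyRegistry {
    fn start_close(&self, id: u64) -> CloseGuard<'_> {
        CLOSE_COUNT.with(|count| count.set(count.get() + 1));
        CloseGuard { id, registry: self, is_closing: false }
    }
}

impl CloseGuard<'_> {
    fn is_closing(&mut self) {
        self.is_closing = true;
    }
}

impl Drop for CloseGuard<'_> {
    fn drop(&mut self) {
        // `try_with` avoids a second panic if the thread-local is already gone.
        let _ = CLOSE_COUNT.try_with(|count| {
            // Value *before* this guard's decrement.
            let previous = count.get();
            // Always decrement, so the next span to close starts from 0.
            count.set(previous - 1);
            // `previous == 1` means this is the outermost (final) guard.
            if previous == 1 && self.is_closing {
                self.registry.spans.borrow_mut().remove(&self.id);
            }
        });
    }
}

/// Models `depth` nested layers; the innermost one reports `inner_closes`.
fn try_close(registry: &ToyRegistry, id: u64, depth: usize, inner_closes: bool) -> bool {
    // The guard must exist before calling into the next layer.
    let mut guard = registry.start_close(id);
    let closed = if depth == 0 {
        inner_closes
    } else {
        try_close(registry, id, depth - 1, inner_closes)
    };
    if closed {
        guard.is_closing();
    }
    closed
}

/// Returns whether each span is still stored after each step.
fn demo() -> Vec<bool> {
    let registry = ToyRegistry::default();
    registry.spans.borrow_mut().extend([1, 2]);
    let mut still_present = Vec::new();

    // Another handle still references span 1: it must not be removed.
    try_close(&registry, 1, 2, false);
    still_present.push(registry.spans.borrow().contains(&1));

    // The last handle to span 1 is dropped: the span is removed.
    try_close(&registry, 1, 2, true);
    still_present.push(registry.spans.borrow().contains(&1));

    // A second span closing on the same thread is removed as well.
    try_close(&registry, 2, 2, true);
    still_present.push(registry.spans.borrow().contains(&2));

    still_present
}
```

In this model, comparing `previous` with 0 instead of 1 would reproduce the first bug, skipping the decrement for the final guard would reproduce the second, and dropping the `is_closing` check would reproduce the third.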