Access Logging #601

tustvold · 2020-07-22T10:12:30Z

Access logging is very important functionality for my team as we wish to maintain feature-parity with our existing AWS ALB-based approach. This functionality was requested here and was marked as help wanted, so thought I'd take a stab at implementing it.

Creating as a draft as still needs more testing and benchmarking, and I'm new to tower and so might have made some rookie errors. However, I wanted to create a draft as an opportunity to get some early feedback.

The basic design consists of an AccessLogLayer that instruments both requests and responses that flow through it, in a similar manner to how handle_time is already computed. I'm not a massive fan of this, but it was the only way I could easily see to get accurate processing time metrics. I've tried to avoid any memory allocation on the hot path, although there are possibly more atomic increments than I would like. The performance impact with the feature disabled, i.e. LINKERD2_PROXY_ACCESS_LOG_FILE, not set should be minimal.

The results of this instrumentation are then sent over a mpsc channel to an AccessLogCollector that writes them in a space-delimited format to a file specified as an environment variable. It buffers in memory and flushes on termination and on write if more than FLUSH_TIMEOUT_SECS since the last flush. This makes the access logging best effort much like AWS ALBs.

An example deployment scenario using this functionality might deploy a fluent-bit sidecar to ship the logs, or write to /dev/stderr and use a log shipper deployed as a DaemonSet.

olix0r · 2020-07-23T03:30:20Z

Thanks @tustvold!

@hawkw would you mind looking at this? Can we use tracing for any of this? Particularly, the tracker code looks pretty complex to me; and I'm curious if we already have code that does this sort of thing.

hawkw · 2020-07-23T04:28:37Z

I've not looked too closely yet, but yeah, I imagine it should be possible to come up with a solution using existing tracing information. If we used the existing request spans to time request duration, I think we could potentially avoid a lot of complexity.

Eventually, it might even be worth implementing a tracing layer for access logging as a general purpose library --- but this might require other projects, like hyper, to opt into conventional naming for HTTP-specific fields (which would be nice eventually regardless...).

tustvold · 2020-07-25T14:49:38Z

As far as I can tell it isn't possible to directly consume the tracing metrics as not only are they sampled, but they don't record the necessary request metadata for access logging. That being said I've updated this PR to use similar timing logic, doing away with the complex tracker logic.

Unfortunately this does result in a loss of granularity, in particular the previous tracker-based implementation yielded 3 separate time-related metrics:

request_processing_time: the time the request spends in the proxy
upstream_processing_time: time between when the request was sent and a response was received
response_processing_time: the time between when the response was received and dispatched to the downstream

Whereas this approach only yields a single metric that is the sum of the previous implementation's request_processing_time and upstream_processing_time. However, in my (limited) testing response_processing_time is consistently on the order of a couple of microseconds, and the request_processing_time is already being tracked and exported as the handle_time metric, so this is no great loss.

Additionally the previous approach, whilst based on the way the handle_time metric is computed, is potentially misleading as it only really measures the time until the request is handed off to the L4 stack, so I'm not all that attached to it 😄

I'll take a look over the weekend at getting some test coverage in, but let me know if there is a better way to implement this functionality.

tustvold · 2020-07-25T23:17:17Z

Appears to be shutdown race in test, will look into tomorrow

tustvold · 2020-07-26T15:13:46Z

Working on separate PR to fix non-determinism in integration test shutdown

hawkw · 2020-07-26T19:43:51Z

As far as I can tell it isn't possible to directly consume the tracing metrics as not only are they sampled, but they don't record the necessary request metadata for access logging. That being said I've updated this PR to use similar timing logic, doing away with the complex tracker logic.

tracing spans (tracing in the sense of the Rust crate, not "tracing" in the sense of the Linkerd OpenCensus integration) are not sampled but filtered; I imagine that if we added a tracing Layer for access logging, it would always enable the spans necessary to generate the access log, regardless of other filter configurations. And, if we decided to implement this using existing tracing instrumentation, I think we would want to add a single per-request span that contains all the necessary request metadata as fields on that span (which it looks like we don't currently have, as all that data is split across multiple spans).

I brought this up primarily because I think a reusable library for access logging with tracing would be beneficial to the whole ecosystem. If the current approach here is a quicker and easier way of adding access logging to the Linkerd proxy in particular, I'm not going to block merging this PR on implementing a general-purpose library...it just might be nice to do at some point.

hawkw

I gave this a quick look and left a few notes. Hope they're helpful!

linkerd/access-log/src/layer.rs

linkerd/app/src/access_log_collector.rs

tustvold · 2020-07-26T20:15:13Z

Thanks for the review, I'll look to get an update done, factoring in your feedback, over the next week time-permitting. Currently trying to fix/workaround the fact that the integration test drops the proxy's thread leading to non-deterministic shutdown behavior.

Regarding tracing I'll see what I can come up with, it would definitely be a cleaner way to provide this functionality.

tustvold · 2020-09-15T16:46:01Z

Well it was a bit more than a week, but I eventually manged to get time to work on this.

I've re-implemented the functionality to use tokio-tracing instead of a custom mpsc queue as requested. In particular the AccessLogLayer now produces tokio::tracing spans. This PR then adds an additional tracing-subscriber layer AccessLogWriter which asynchronously writes access_log spans as they are closed. This is currently to stdout but this could easily be extended in future.

As the AccessLogWriter is part of the single tracing layer stack, the instrumentation level is controlled by EnvFilter and by extension the LINKER2_PROXY_LOG environment variable. For example, LINKERD2_PROXY_LOG=warn,access_log, will enable access logs. This feels a little bit off, but I wasn't really sure how else to handle this?

Another thing to note is if people have the JSON log format enabled, this will probably not perform very well due to the way on_record is currently implemented and the number of fields.

Finally the way request_bytes and response_bytes is determined won't work for chunked requests, but I wasn't sure how else to easily obtain this information?

EDIT: Another thing that I've noticed whilst testing but not been able to find the reason behind, is that the very first span takes a good two minutes to be logged, often appearing a long time after later requests have already been logged.

tustvold · 2020-09-21T08:03:18Z

Rebased onto current main

tustvold · 2020-11-16T16:09:18Z

Various iterations of this have been running in a test cluster for the last two months without issue. However, I've noticed the recent PRs to reduce the tracing due to performance issues, which makes me somewhat concerned that this may not be the best approach?

I'll shortly be performing some load-testing on this test cluster and so would appreciate your insight into this, in case the approach in this PR is fundamentally flawed and there would be a better way to provide this functionality.

olix0r · 2020-11-16T17:28:50Z

@tustvold Thanks for your patience and persistence on this. We were pretty focused on getting stable-2.9.0 ready; but now is probably a good time for us to come back to this.

My gut says that tracing is the right approach to this -- access logging is going to be expensive in high-throughput systems and I don't think there's a lot we can do about that. That said, there may be approaches that minimize the impact by more-narrowly targeting the spans/events to be captured for access logging.

I'll try to make some time to look at this more closely in the next ~week. I think we'd be pretty eager to add support to access logging as long as its cost is only incurred when a user opts to enable the feature.

codecov-commenter · 2021-06-17T02:46:52Z

Codecov Report

Merging #601 (65d372a) into main (8fa04f5) will decrease coverage by 0.30%.
The diff coverage is 36.13%.

@@            Coverage Diff             @@
##             main     #601      +/-   ##
==========================================
- Coverage   74.45%   74.15%   -0.31%     
==========================================
  Files         233      235       +2     
  Lines       14671    14783     +112     
==========================================
+ Hits        10924    10962      +38     
- Misses       3747     3821      +74

Impacted Files	Coverage Δ
linkerd/app/gateway/src/lib.rs	`42.47% <0.00%> (-1.56%)`	⬇️
linkerd/tracing/src/test.rs	`100.00% <ø> (ø)`
linkerd/tracing/src/access_log.rs	`12.50% <12.50%> (ø)`
linkerd/http-access-log/src/lib.rs	`29.41% <29.41%> (ø)`
linkerd/tracing/src/lib.rs	`52.32% <85.00%> (+4.38%)`	⬆️
linkerd/app/inbound/src/http/mod.rs	`74.31% <100.00%> (+0.23%)`	⬆️
linkerd/app/inbound/src/target.rs	`62.06% <100.00%> (+0.66%)`	⬆️
linkerd/service-profiles/src/proto.rs	`70.94% <0.00%> (+1.70%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8fa04f5...65d372a. Read the comment docs.

jon-depop · 2021-08-11T13:46:55Z

Hi 👋! Would it be possible to get a status update on this PR @olix0r @hawkw?

0x91 · 2021-08-12T13:57:05Z

+1 we're running the fork from @tustvold but it would be great to have this in a stable production release.

## Motivation Currently, filtering with `Layer`s is always _global_. If a `Layer` performs filtering, it will disable a span or event for _all_ layers that compose the current subscriber. In some cases, however, it is often desirable for individual layers to see different spans or events than the rest of the `Layer` stack. Issues in other projects, such as tokio-rs/console#64 and tokio-rs/console#76, linkerd/linkerd2-proxy#601, influxdata/influxdb_iox#1012 and influxdata/influxdb_iox#1681, jackwaudby/spaghetti#86, etc; as well as `tracing` feature requests like #302, #597, and #1348, all indicate that there is significant demand for the ability to add filters to individual `Layer`s. Unfortunately, doing this nicely is somewhat complicated. Although a naive implementation that simply skips events/spans in `Layer::on_event` and `Layer::new_span` based on some filter is relatively simple, this wouldn't really be an ideal solution, for a number of reasons. A proper per-layer filtering implementation would satisfy the following _desiderata_: * If a per-layer filter disables a span, it shouldn't be present _for the layer that filter is attached to_ when iterating over span contexts (such as `Context::event_scope`, `SpanRef::scope`, etc), or when looking up a span's parents. * When _all_ per-layer filters disable a span or event, it should be completely disabled, rather than simply skipped by those particular layers. This means that per-layer filters should participate in `enabled`, as well as being able to skip spans and events in `new_span` and `on_event`. * Per-layer filters shouldn't interfere with non-filtered `Layer`s. If a subscriber contains layers without any filters, as well as layers with per-layer filters, the non-filtered `Layer`s should behave exactly as they would without any per-layer filters present. * Similarly, per-layer filters shouldn't interfere with global filtering. If a `Layer` in a stack is performing global filtering (e.g. the current filtering behavior), per-layer filters should also be effected by the global filter. * Per-layer filters _should_ be able to participate in `Interest` caching, _but only when doing so doesn't interfere with non-per-layer-filtered layers_. * Per-layer filters should be _tree-shaped_. If a `Subscriber` consists of multiple layers that have been `Layered` together to form new `Layer`s, and some of the `Layered` layers have per-layer filters, those per-layer filters should effect all layers in that subtree. Similarly, if `Layer`s in a per-layer filtered subtree have their _own_ per-layer filters, those layers should be effected by the union of their own filters and any per-layer filters that wrap them at higher levels in the tree. Meeting all these requirements means that implementing per-layer filtering correctly is somewhat more complex than simply skipping events and spans in a `Layer`'s `on_event` and `new_span` callbacks. ## Solution This branch has a basic working implementation of per-layer filtering for `tracing-subscriber` v0.2. It should be possible to add this in a point release without a breaking change --- in particular, the new per-layer filtering feature _doesn't_ interfere with the global filtering behavior that layers currently have when not using the new per-layer filtering APIs, so existing configurations should all behave exactly as they do now. ### Implementation The new per-layer filtering API consists of a `Filter` trait that defines a filtering strategy and a `Filtered` type that combines a `Filter` with a `Layer`. When `enabled` is called on a `Filtered` layer, it checks the metadata against the `Filter` and stores whether that filter enabled the span/event in a thread-local cell. If every per-layer filter disables a span/event, and there are no non-per-layer-filtered layers in use, the `Registry`'s `enabled` will return `false`. Otherwise, when the span/event is recorded, each `Filtered` layer's `on_event` and `new_span` method checks with the `Registry` to see whether _it_ enabled or disabled that span/event, and skips passing it to the wrapped layer if it was disabled. When a span is recorded, the `Registry` which filters disabled it, so that they can skip it when looking up spans in other callbacks. A new `on_layer` method was added to the `Layer` trait, which allows running a callback when adding a `Layer` to a `Subscriber`. This allows mutating both the `Layer` and the `Subscriber`, since they are both passed by value when layering a `Layer` onto a `Subscriber`, and a new `register_filter` method was added to `LookupSpan`, which returns a `FilterId` type. When a `Filtered` layer is added to a subscriber that implements `LookupSpan`, it calls `register_filter` in its `on_layer` hook to generate a unique ID for its filter. `FilterId` can be used to look up whether a span or event was enabled by the `Filtered` layer. Internally, the filter results are stored in a simple bitset to reduce the performance overhead of adding per-layer filtering as much as possible. The `FilterId` type is actually a bitmask that's used to set the bit corresponding to a `Filtered` layer in that bitmap when it disables a span or event, and check whether the bit is set when querying if a span or event was disabled. Setting a bit in the bitset and testing whether its set is extremely efficient --- it's just a couple of bitwise ops. Additionally, the "filter map" that tracks which filters disabled a span or event is just a single `u64` --- we don't need to allocate a costly `HashSet` or similar every time we want to filter an event. This *does* mean that we are limited to 64 unique filters per `Registry`...however, I would be _very_ surprised if anyone _ever_ builds a subscriber with 64 layers, let alone 64 separate per-layer filters. Additionally, the `Context` and `SpanRef` types now also potentially carry `FilterId`s, so that they can filter span lookups and span traversals for `Filtered` layers. When `Filtered` layers are nested using the `Layered`, the `Context` _combines_ the filter ID of the inner and outer layers so that spans can be skipped if they were disabled by either. Finally, an implementation of the `Filter` trait was added for `LevelFilter`, and new `FilterFn` and `DynFilterFn` structs were added to produce a `Filter` implementation from a function pointer or closure. ### Notes Some other miscellaneous factors to consider when reviewing this change include: * I did a bit of unrelated refactoring in the `layer` module. Since we were adding more code to the `Layer` module (the `Filter` trait), the `layer.rs` file got significantly longer and was becoming somewhat hard to navigate. Therefore, I split out the `Context` and `Layered` types into their own files. Most of the code in those files is the same as it was before, although some of it was changed in order to support per-layer filtering. Apologies in advance to reviewers for this... * In order to test this change, I added some new test-support code in `tracing-subscriber`. The `tracing` test support code provides a `Subscriber` implementation that can be configured to expect a particular set of spans and events, and assert that they occurred as expected. In order to test per-layer filtering, I added a new `Layer` implementation that does the same thing, but implements `Layer` rather than `Subscriber`. This allows testing per-layer filter behavior in the same way we test global filter behavior. Reviewers should feel free to just skim the test the test support changes, or skip reviewing them entirely. ## Future Work There is a bunch of additional stuff we can do once this change lands. In order to limit the size of this PR, I didn't make these changes yet, but once this merges, we will want to consider some important follow up changes: * [ ] Currently, _only_ `tracing-subscriber`'s `Registry` is capable of serving as a root subscriber for per-layer filters. This is in order to avoid stabilizing a lot of the per-layer filtering design as public API without taking the time to ensure there are no serious issues. In the future, we will want to add new APIs to allow users to implement their own root `Subscriber`s that can host layers with per-layer filtering. * [ ] The `EnvFilter` type doesn't implement the `Filter` trait and thus cannot currently be used as a per-layer filter. We should add an implementation of `Filter` for `EnvFilter`. * [ ] The new `FilterFn` and `DynFilterFn` types could conceivably _also_ be used as global filters. We could consider adding `Layer` implementations to allow them to be used for global filtering. * [ ] We could consider adding new `Filter` implementations for performing common filtering predicates. For example, "only enable spans/events with a particular target or a set of targets", "only enable spans", "only enable events", etc. Then, we could add a combinator API to combine these predicates to build new filters. * [ ] It would be nice if the `Interest` system could be used to cache the result of multiple individual per-layer filter evaluations, so that we don't need to always use `enabled` when there are several per-layer filters that disagree on whether or not they want a given span/event. There are a lot of unused bits in `Interest` that could potentially be used for this purpose. However, this would require new API changes in `tracing-core`, and is therefore out of scope for this PR. Closes #302 Closes #597 Closes #1348 Signed-off-by: Eliza Weisman <[email protected]>

jon-depop · 2021-10-08T11:26:46Z

@hawkw what's the latest update on this PR?

hawkw · 2021-10-08T15:21:03Z

@hawkw what's the latest update on this PR?

Funny you should ask, because I'm actually currently working on updating this branch with the latest main and getting it ready to merge!

Recent upstream changes (tokio-rs/tracing#1523) have made it possible to filter which tracing spans and events are recorded on a granular per-layer basis, allowing us to separate the events that are used for generating the access log from those that are logged to stdout. So, that unlocks moving forward with this change.

Currently, the way the `tracing` subscriber is constructed is somewhat convoluted, due to potentially constructing formatting layers with different types (either plaintext logging or JSON). Because of this, we can't simply have methods that return the `fmt` layer. Instead, we have to pass the `Registry` into the methods that construct the `fmt` layers and return a type-erased `Dispatch` after adding the `fmt` layers. This is kind of not great. In particular, it makes adding additional layers difficult, such as access logging (#601) and (eventually) `tokio-console` support. This branch refactors the subscriber construction to use the (recently-added) `impl Layer<S> for Box<dyn Layer<S>>` in `tracing-subscriber`. Now, we can erase the types of the JSON and plaintext `fmt` layers and return them from the methods that construct them, and layer them onto the `Registry` in `tracing::Settings::build()`. This makes the `tracing` setup significantly easier, and will enable changes I want to make in #601. Boxing these layers does add slight overhead of dynamic dispatch + a pointer dereference. However, I doubt this has a huge performance impact in practice...

olix0r · 2021-10-20T19:57:54Z

I'm going to close this out in favor of #1319, which takes advantage of new tracing features. I'm hopeful we can merge that PR fairly soon!

## Motivation Currently, filtering with `Layer`s is always _global_. If a `Layer` performs filtering, it will disable a span or event for _all_ layers that compose the current subscriber. In some cases, however, it is often desirable for individual layers to see different spans or events than the rest of the `Layer` stack. Issues in other projects, such as tokio-rs/console#64 and tokio-rs/console#76, linkerd/linkerd2-proxy#601, influxdata/influxdb_iox#1012 and influxdata/influxdb_iox#1681, jackwaudby/spaghetti#86, etc; as well as `tracing` feature requests like #302, #597, and #1348, all indicate that there is significant demand for the ability to add filters to individual `Layer`s. Unfortunately, doing this nicely is somewhat complicated. Although a naive implementation that simply skips events/spans in `Layer::on_event` and `Layer::new_span` based on some filter is relatively simple, this wouldn't really be an ideal solution, for a number of reasons. A proper per-layer filtering implementation would satisfy the following _desiderata_: * If a per-layer filter disables a span, it shouldn't be present _for the layer that filter is attached to_ when iterating over span contexts (such as `Context::event_scope`, `SpanRef::scope`, etc), or when looking up a span's parents. * When _all_ per-layer filters disable a span or event, it should be completely disabled, rather than simply skipped by those particular layers. This means that per-layer filters should participate in `enabled`, as well as being able to skip spans and events in `new_span` and `on_event`. * Per-layer filters shouldn't interfere with non-filtered `Layer`s. If a subscriber contains layers without any filters, as well as layers with per-layer filters, the non-filtered `Layer`s should behave exactly as they would without any per-layer filters present. * Similarly, per-layer filters shouldn't interfere with global filtering. If a `Layer` in a stack is performing global filtering (e.g. the current filtering behavior), per-layer filters should also be effected by the global filter. * Per-layer filters _should_ be able to participate in `Interest` caching, _but only when doing so doesn't interfere with non-per-layer-filtered layers_. * Per-layer filters should be _tree-shaped_. If a `Subscriber` consists of multiple layers that have been `Layered` together to form new `Layer`s, and some of the `Layered` layers have per-layer filters, those per-layer filters should effect all layers in that subtree. Similarly, if `Layer`s in a per-layer filtered subtree have their _own_ per-layer filters, those layers should be effected by the union of their own filters and any per-layer filters that wrap them at higher levels in the tree. Meeting all these requirements means that implementing per-layer filtering correctly is somewhat more complex than simply skipping events and spans in a `Layer`'s `on_event` and `new_span` callbacks. ## Solution This branch has a basic working implementation of per-layer filtering for `tracing-subscriber` v0.2. It should be possible to add this in a point release without a breaking change --- in particular, the new per-layer filtering feature _doesn't_ interfere with the global filtering behavior that layers currently have when not using the new per-layer filtering APIs, so existing configurations should all behave exactly as they do now. ### Implementation The new per-layer filtering API consists of a `Filter` trait that defines a filtering strategy and a `Filtered` type that combines a `Filter` with a `Layer`. When `enabled` is called on a `Filtered` layer, it checks the metadata against the `Filter` and stores whether that filter enabled the span/event in a thread-local cell. If every per-layer filter disables a span/event, and there are no non-per-layer-filtered layers in use, the `Registry`'s `enabled` will return `false`. Otherwise, when the span/event is recorded, each `Filtered` layer's `on_event` and `new_span` method checks with the `Registry` to see whether _it_ enabled or disabled that span/event, and skips passing it to the wrapped layer if it was disabled. When a span is recorded, the `Registry` which filters disabled it, so that they can skip it when looking up spans in other callbacks. A new `on_layer` method was added to the `Layer` trait, which allows running a callback when adding a `Layer` to a `Subscriber`. This allows mutating both the `Layer` and the `Subscriber`, since they are both passed by value when layering a `Layer` onto a `Subscriber`, and a new `register_filter` method was added to `LookupSpan`, which returns a `FilterId` type. When a `Filtered` layer is added to a subscriber that implements `LookupSpan`, it calls `register_filter` in its `on_layer` hook to generate a unique ID for its filter. `FilterId` can be used to look up whether a span or event was enabled by the `Filtered` layer. Internally, the filter results are stored in a simple bitset to reduce the performance overhead of adding per-layer filtering as much as possible. The `FilterId` type is actually a bitmask that's used to set the bit corresponding to a `Filtered` layer in that bitmap when it disables a span or event, and check whether the bit is set when querying if a span or event was disabled. Setting a bit in the bitset and testing whether its set is extremely efficient --- it's just a couple of bitwise ops. Additionally, the "filter map" that tracks which filters disabled a span or event is just a single `u64` --- we don't need to allocate a costly `HashSet` or similar every time we want to filter an event. This *does* mean that we are limited to 64 unique filters per `Registry`...however, I would be _very_ surprised if anyone _ever_ builds a subscriber with 64 layers, let alone 64 separate per-layer filters. Additionally, the `Context` and `SpanRef` types now also potentially carry `FilterId`s, so that they can filter span lookups and span traversals for `Filtered` layers. When `Filtered` layers are nested using the `Layered`, the `Context` _combines_ the filter ID of the inner and outer layers so that spans can be skipped if they were disabled by either. Finally, an implementation of the `Filter` trait was added for `LevelFilter`, and new `FilterFn` and `DynFilterFn` structs were added to produce a `Filter` implementation from a function pointer or closure. ### Notes Some other miscellaneous factors to consider when reviewing this change include: * I did a bit of unrelated refactoring in the `layer` module. Since we were adding more code to the `Layer` module (the `Filter` trait), the `layer.rs` file got significantly longer and was becoming somewhat hard to navigate. Therefore, I split out the `Context` and `Layered` types into their own files. Most of the code in those files is the same as it was before, although some of it was changed in order to support per-layer filtering. Apologies in advance to reviewers for this... * In order to test this change, I added some new test-support code in `tracing-subscriber`. The `tracing` test support code provides a `Subscriber` implementation that can be configured to expect a particular set of spans and events, and assert that they occurred as expected. In order to test per-layer filtering, I added a new `Layer` implementation that does the same thing, but implements `Layer` rather than `Subscriber`. This allows testing per-layer filter behavior in the same way we test global filter behavior. Reviewers should feel free to just skim the test the test support changes, or skip reviewing them entirely. ## Future Work There is a bunch of additional stuff we can do once this change lands. In order to limit the size of this PR, I didn't make these changes yet, but once this merges, we will want to consider some important follow up changes: * [ ] Currently, _only_ `tracing-subscriber`'s `Registry` is capable of serving as a root subscriber for per-layer filters. This is in order to avoid stabilizing a lot of the per-layer filtering design as public API without taking the time to ensure there are no serious issues. In the future, we will want to add new APIs to allow users to implement their own root `Subscriber`s that can host layers with per-layer filtering. * [ ] The `EnvFilter` type doesn't implement the `Filter` trait and thus cannot currently be used as a per-layer filter. We should add an implementation of `Filter` for `EnvFilter`. * [ ] The new `FilterFn` and `DynFilterFn` types could conceivably _also_ be used as global filters. We could consider adding `Layer` implementations to allow them to be used for global filtering. * [ ] We could consider adding new `Filter` implementations for performing common filtering predicates. For example, "only enable spans/events with a particular target or a set of targets", "only enable spans", "only enable events", etc. Then, we could add a combinator API to combine these predicates to build new filters. * [ ] It would be nice if the `Interest` system could be used to cache the result of multiple individual per-layer filter evaluations, so that we don't need to always use `enabled` when there are several per-layer filters that disagree on whether or not they want a given span/event. There are a lot of unused bits in `Interest` that could potentially be used for this purpose. However, this would require new API changes in `tracing-core`, and is therefore out of scope for this PR. Closes #302 Closes #597 Closes #1348 Signed-off-by: Eliza Weisman <[email protected]>

@tustvold

This branch builds on @tustvold's work in #601. The original PR description from that branch: > Access logging is very important functionality for my team as we wish > to maintain feature-parity with our existing AWS ALB-based approach. > This functionality was requested > [here](linkerd/linkerd2#1913) and was marked > as help wanted, so thought I'd take a stab at implementing it. > > Creating as a draft as still needs more testing and benchmarking, and > I'm new to tower and so might have made some rookie errors. However, I > wanted to create a draft as an opportunity to get some early feedback. > > The basic design consists of an AccessLogLayer that instruments both > requests and responses that flow through it, in a similar manner to > how handle_time is already computed. I'm not a massive fan of this, > but it was the only way I could easily see to get accurate processing > time metrics. I've tried to avoid any memory allocation on the hot > path, although there are possibly more atomic increments than I would > like. The performance impact with the feature disabled, i.e. > LINKERD2_PROXY_ACCESS_LOG_FILE, not set should be minimal. > > The results of this instrumentation are then sent over a mpsc channel > to an AccessLogCollector that writes them in a space-delimited format > to a file specified as an environment variable. It buffers in memory > and flushes on termination and on write if more than > FLUSH_TIMEOUT_SECS since the last flush. This makes the access logging > best effort much like AWS ALBs. > > An example deployment scenario using this functionality might deploy a > fluent-bit sidecar to ship the logs, or write to /dev/stderr and use a > log shipper deployed as a DaemonSet. The additional changes in this branch are: - Updated against the latest state of the `main` branch. - Changed the `tracing` configuration to use per-layer filtering, so that the access log layer _only_ sees access log spans, while the stdout logging layer does not see the access log spans (although, it _could_ if we wanted it to...) - Changed the format for outputting the access log to the Apache Common Log Format. Note that this format does *not* include all the data that the access log spans currently collect; I excluded that data so that the output is compatible with tools that ingest the Apache log format. In a follow-up PR, we can add the ability to control what format the access log is written in, and add an alternative format that includes all the access log data that Linkerd's spans can collect (I suggest newline-delimited JSON for this). Of course, a huge thank you to @tustvold for all their work on this; I only updated the branch with the latest changes and made some minor improvements. :) Co-authored-by: Raphael Taylor-Davies <[email protected]>

; This is the 1st commit message: subscriber: implement per-subscriber filtering (#1523) ## Motivation Currently, filtering with `Subscribe`s is always _global_. If a `Subscribe` performs filtering, it will disable a span or event for _all_ subscribers that compose the current subscriber. In some cases, however, it is often desirable for individual subscribers to see different spans or events than the rest of the `Subscribe` stack. Issues in other projects, such as tokio-rs/console#64 and tokio-rs/console#76, linkerd/linkerd2-proxy#601, influxdata/influxdb_iox#1012 and influxdata/influxdb_iox#1681, jackwaudby/spaghetti#86, etc; as well as `tracing` feature requests like #302, #597, and #1348, all indicate that there is significant demand for the ability to add filters to individual `Subscribe`s. Unfortunately, doing this nicely is somewhat complicated. Although a naive implementation that simply skips events/spans in `Subscribe::on_event` and `Subscribe::new_span` based on some filter is relatively simple, this wouldn't really be an ideal solution, for a number of reasons. A proper per-subscriber filtering implementation would satisfy the following _desiderata_: * If a per-subscriber filter disables a span, it shouldn't be present _for the subscriber that filter is attached to_ when iterating over span contexts (such as `Context::event_scope`, `SpanRef::scope`, etc), or when looking up a span's parents. * When _all_ per-subscriber filters disable a span or event, it should be completely disabled, rather than simply skipped by those particular subscribers. This means that per-subscriber filters should participate in `enabled`, as well as being able to skip spans and events in `new_span` and `on_event`. * Per-subscriber filters shouldn't interfere with non-filtered `Subscribe`s. If a subscriber contains subscribers without any filters, as well as subscribers with per-subscriber filters, the non-filtered `Subscribe`s should behave exactly as they would without any per-subscriber filters present. * Similarly, per-subscriber filters shouldn't interfere with global filtering. If a `Subscribe` in a stack is performing global filtering (e.g. the current filtering behavior), per-subscriber filters should also be effected by the global filter. * Per-subscriber filters _should_ be able to participate in `Interest` caching, _but only when doing so doesn't interfere with non-per-subscriber-filtered subscribers_. * Per-subscriber filters should be _tree-shaped_. If a `Subscriber` consists of multiple subscribers that have been `Subscribeed` together to form new `Subscribe`s, and some of the `Subscribeed` subscribers have per-subscriber filters, those per-subscriber filters should effect all subscribers in that subtree. Similarly, if `Subscribe`s in a per-subscriber filtered subtree have their _own_ per-subscriber filters, those subscribers should be effected by the union of their own filters and any per-subscriber filters that wrap them at higher levels in the tree. Meeting all these requirements means that implementing per-subscriber filtering correctly is somewhat more complex than simply skipping events and spans in a `Subscribe`'s `on_event` and `new_span` callbacks. ## Solution This branch has a basic working implementation of per-subscriber filtering for `tracing-subscriber` v0.2. It should be possible to add this in a point release without a breaking change --- in particular, the new per-subscriber filtering feature _doesn't_ interfere with the global filtering behavior that subscribers currently have when not using the new per-subscriber filtering APIs, so existing configurations should all behave exactly as they do now. ### Implementation The new per-subscriber filtering API consists of a `Filter` trait that defines a filtering strategy and a `Filtered` type that combines a `Filter` with a `Subscribe`. When `enabled` is called on a `Filtered` subscriber, it checks the metadata against the `Filter` and stores whether that filter enabled the span/event in a thread-local cell. If every per-subscriber filter disables a span/event, and there are no non-per-subscriber-filtered subscribers in use, the `Registry`'s `enabled` will return `false`. Otherwise, when the span/event is recorded, each `Filtered` subscriber's `on_event` and `new_span` method checks with the `Registry` to see whether _it_ enabled or disabled that span/event, and skips passing it to the wrapped subscriber if it was disabled. When a span is recorded, the `Registry` which filters disabled it, so that they can skip it when looking up spans in other callbacks. A new `on_subscriber` method was added to the `Subscribe` trait, which allows running a callback when adding a `Subscribe` to a `Subscriber`. This allows mutating both the `Subscribe` and the `Subscriber`, since they are both passed by value when subscribering a `Subscribe` onto a `Subscriber`, and a new `register_filter` method was added to `LookupSpan`, which returns a `FilterId` type. When a `Filtered` subscriber is added to a subscriber that implements `LookupSpan`, it calls `register_filter` in its `on_subscriber` hook to generate a unique ID for its filter. `FilterId` can be used to look up whether a span or event was enabled by the `Filtered` subscriber. Internally, the filter results are stored in a simple bitset to reduce the performance overhead of adding per-subscriber filtering as much as possible. The `FilterId` type is actually a bitmask that's used to set the bit corresponding to a `Filtered` subscriber in that bitmap when it disables a span or event, and check whether the bit is set when querying if a span or event was disabled. Setting a bit in the bitset and testing whether its set is extremely efficient --- it's just a couple of bitwise ops. Additionally, the "filter map" that tracks which filters disabled a span or event is just a single `u64` --- we don't need to allocate a costly `HashSet` or similar every time we want to filter an event. This *does* mean that we are limited to 64 unique filters per `Registry`...however, I would be _very_ surprised if anyone _ever_ builds a subscriber with 64 subscribers, let alone 64 separate per-subscriber filters. Additionally, the `Context` and `SpanRef` types now also potentially carry `FilterId`s, so that they can filter span lookups and span traversals for `Filtered` subscribers. When `Filtered` subscribers are nested using the `Subscribeed`, the `Context` _combines_ the filter ID of the inner and outer subscribers so that spans can be skipped if they were disabled by either. Finally, an implementation of the `Filter` trait was added for `LevelFilter`, and new `FilterFn` and `DynFilterFn` structs were added to produce a `Filter` implementation from a function pointer or closure. ### Notes Some other miscellaneous factors to consider when reviewing this change include: * I did a bit of unrelated refactoring in the `subscriber` module. Since we were adding more code to the `Subscribe` module (the `Filter` trait), the `subscriber.rs` file got significantly longer and was becoming somewhat hard to navigate. Therefore, I split out the `Context` and `Subscribeed` types into their own files. Most of the code in those files is the same as it was before, although some of it was changed in order to support per-subscriber filtering. Apologies in advance to reviewers for this... * In order to test this change, I added some new test-support code in `tracing-subscriber`. The `tracing` test support code provides a `Subscriber` implementation that can be configured to expect a particular set of spans and events, and assert that they occurred as expected. In order to test per-subscriber filtering, I added a new `Subscribe` implementation that does the same thing, but implements `Subscribe` rather than `Subscriber`. This allows testing per-subscriber filter behavior in the same way we test global filter behavior. Reviewers should feel free to just skim the test the test support changes, or skip reviewing them entirely. ## Future Work There is a bunch of additional stuff we can do once this change lands. In order to limit the size of this PR, I didn't make these changes yet, but once this merges, we will want to consider some important follow up changes: * [ ] Currently, _only_ `tracing-subscriber`'s `Registry` is capable of serving as a root subscriber for per-subscriber filters. This is in order to avoid stabilizing a lot of the per-subscriber filtering design as public API without taking the time to ensure there are no serious issues. In the future, we will want to add new APIs to allow users to implement their own root `Subscriber`s that can host subscribers with per-subscriber filtering. * [ ] The `EnvFilter` type doesn't implement the `Filter` trait and thus cannot currently be used as a per-subscriber filter. We should add an implementation of `Filter` for `EnvFilter`. * [ ] The new `FilterFn` and `DynFilterFn` types could conceivably _also_ be used as global filters. We could consider adding `Subscribe` implementations to allow them to be used for global filtering. * [ ] We could consider adding new `Filter` implementations for performing common filtering predicates. For example, "only enable spans/events with a particular target or a set of targets", "only enable spans", "only enable events", etc. Then, we could add a combinator API to combine these predicates to build new filters. * [ ] It would be nice if the `Interest` system could be used to cache the result of multiple individual per-subscriber filter evaluations, so that we don't need to always use `enabled` when there are several per-subscriber filters that disagree on whether or not they want a given span/event. There are a lot of unused bits in `Interest` that could potentially be used for this purpose. However, this would require new API changes in `tracing-core`, and is therefore out of scope for this PR. Closes #302 Closes #597 Closes #1348 Signed-off-by: Eliza Weisman <[email protected]> ; The commit message #2 will be skipped: ; fixup: start making changes for filter ; The commit message #3 will be skipped: ; fixup: move stuff to `subscribe` folder from `layer ; The commit message #4 will be skipped: ; fixup: additional compilation errors

## Motivation Currently, filtering with `Subscribe`s is always _global_. If a `Subscribe` performs filtering, it will disable a span or event for _all_ subscribers that compose the current subscriber. In some cases, however, it is often desirable for individual subscribers to see different spans or events than the rest of the `Subscribe` stack. Issues in other projects, such as tokio-rs/console#64 and tokio-rs/console#76, linkerd/linkerd2-proxy#601, influxdata/influxdb_iox#1012 and influxdata/influxdb_iox#1681, jackwaudby/spaghetti#86, etc; as well as `tracing` feature requests like #302, #597, and #1348, all indicate that there is significant demand for the ability to add filters to individual `Subscribe`s. Unfortunately, doing this nicely is somewhat complicated. Although a naive implementation that simply skips events/spans in `Subscribe::on_event` and `Subscribe::new_span` based on some filter is relatively simple, this wouldn't really be an ideal solution, for a number of reasons. A proper per-subscriber filtering implementation would satisfy the following _desiderata_: * If a per-subscriber filter disables a span, it shouldn't be present _for the subscriber that filter is attached to_ when iterating over span contexts (such as `Context::event_scope`, `SpanRef::scope`, etc), or when looking up a span's parents. * When _all_ per-subscriber filters disable a span or event, it should be completely disabled, rather than simply skipped by those particular subscribers. This means that per-subscriber filters should participate in `enabled`, as well as being able to skip spans and events in `new_span` and `on_event`. * Per-subscriber filters shouldn't interfere with non-filtered `Subscribe`s. If a subscriber contains subscribers without any filters, as well as subscribers with per-subscriber filters, the non-filtered `Subscribe`s should behave exactly as they would without any per-subscriber filters present. * Similarly, per-subscriber filters shouldn't interfere with global filtering. If a `Subscribe` in a stack is performing global filtering (e.g. the current filtering behavior), per-subscriber filters should also be effected by the global filter. * Per-subscriber filters _should_ be able to participate in `Interest` caching, _but only when doing so doesn't interfere with non-per-subscriber-filtered subscribers_. * Per-subscriber filters should be _tree-shaped_. If a `Subscriber` consists of multiple subscribers that have been `Subscribeed` together to form new `Subscribe`s, and some of the `Subscribeed` subscribers have per-subscriber filters, those per-subscriber filters should effect all subscribers in that subtree. Similarly, if `Subscribe`s in a per-subscriber filtered subtree have their _own_ per-subscriber filters, those subscribers should be effected by the union of their own filters and any per-subscriber filters that wrap them at higher levels in the tree. Meeting all these requirements means that implementing per-subscriber filtering correctly is somewhat more complex than simply skipping events and spans in a `Subscribe`'s `on_event` and `new_span` callbacks. ## Solution This branch has a basic working implementation of per-subscriber filtering for `tracing-subscriber` v0.2. It should be possible to add this in a point release without a breaking change --- in particular, the new per-subscriber filtering feature _doesn't_ interfere with the global filtering behavior that subscribers currently have when not using the new per-subscriber filtering APIs, so existing configurations should all behave exactly as they do now. ### Implementation The new per-subscriber filtering API consists of a `Filter` trait that defines a filtering strategy and a `Filtered` type that combines a `Filter` with a `Subscribe`. When `enabled` is called on a `Filtered` subscriber, it checks the metadata against the `Filter` and stores whether that filter enabled the span/event in a thread-local cell. If every per-subscriber filter disables a span/event, and there are no non-per-subscriber-filtered subscribers in use, the `Registry`'s `enabled` will return `false`. Otherwise, when the span/event is recorded, each `Filtered` subscriber's `on_event` and `new_span` method checks with the `Registry` to see whether _it_ enabled or disabled that span/event, and skips passing it to the wrapped subscriber if it was disabled. When a span is recorded, the `Registry` which filters disabled it, so that they can skip it when looking up spans in other callbacks. A new `on_subscriber` method was added to the `Subscribe` trait, which allows running a callback when adding a `Subscribe` to a `Subscriber`. This allows mutating both the `Subscribe` and the `Subscriber`, since they are both passed by value when subscribering a `Subscribe` onto a `Subscriber`, and a new `register_filter` method was added to `LookupSpan`, which returns a `FilterId` type. When a `Filtered` subscriber is added to a subscriber that implements `LookupSpan`, it calls `register_filter` in its `on_subscriber` hook to generate a unique ID for its filter. `FilterId` can be used to look up whether a span or event was enabled by the `Filtered` subscriber. Internally, the filter results are stored in a simple bitset to reduce the performance overhead of adding per-subscriber filtering as much as possible. The `FilterId` type is actually a bitmask that's used to set the bit corresponding to a `Filtered` subscriber in that bitmap when it disables a span or event, and check whether the bit is set when querying if a span or event was disabled. Setting a bit in the bitset and testing whether its set is extremely efficient --- it's just a couple of bitwise ops. Additionally, the "filter map" that tracks which filters disabled a span or event is just a single `u64` --- we don't need to allocate a costly `HashSet` or similar every time we want to filter an event. This *does* mean that we are limited to 64 unique filters per `Registry`...however, I would be _very_ surprised if anyone _ever_ builds a subscriber with 64 subscribers, let alone 64 separate per-subscriber filters. Additionally, the `Context` and `SpanRef` types now also potentially carry `FilterId`s, so that they can filter span lookups and span traversals for `Filtered` subscribers. When `Filtered` subscribers are nested using the `Subscribeed`, the `Context` _combines_ the filter ID of the inner and outer subscribers so that spans can be skipped if they were disabled by either. Finally, an implementation of the `Filter` trait was added for `LevelFilter`, and new `FilterFn` and `DynFilterFn` structs were added to produce a `Filter` implementation from a function pointer or closure. ### Notes Some other miscellaneous factors to consider when reviewing this change include: * I did a bit of unrelated refactoring in the `subscriber` module. Since we were adding more code to the `Subscribe` module (the `Filter` trait), the `subscriber.rs` file got significantly longer and was becoming somewhat hard to navigate. Therefore, I split out the `Context` and `Subscribeed` types into their own files. Most of the code in those files is the same as it was before, although some of it was changed in order to support per-subscriber filtering. Apologies in advance to reviewers for this... * In order to test this change, I added some new test-support code in `tracing-subscriber`. The `tracing` test support code provides a `Subscriber` implementation that can be configured to expect a particular set of spans and events, and assert that they occurred as expected. In order to test per-subscriber filtering, I added a new `Subscribe` implementation that does the same thing, but implements `Subscribe` rather than `Subscriber`. This allows testing per-subscriber filter behavior in the same way we test global filter behavior. Reviewers should feel free to just skim the test the test support changes, or skip reviewing them entirely. ## Future Work There is a bunch of additional stuff we can do once this change lands. In order to limit the size of this PR, I didn't make these changes yet, but once this merges, we will want to consider some important follow up changes: * [ ] Currently, _only_ `tracing-subscriber`'s `Registry` is capable of serving as a root subscriber for per-subscriber filters. This is in order to avoid stabilizing a lot of the per-subscriber filtering design as public API without taking the time to ensure there are no serious issues. In the future, we will want to add new APIs to allow users to implement their own root `Subscriber`s that can host subscribers with per-subscriber filtering. * [ ] The `EnvFilter` type doesn't implement the `Filter` trait and thus cannot currently be used as a per-subscriber filter. We should add an implementation of `Filter` for `EnvFilter`. * [ ] The new `FilterFn` and `DynFilterFn` types could conceivably _also_ be used as global filters. We could consider adding `Subscribe` implementations to allow them to be used for global filtering. * [ ] We could consider adding new `Filter` implementations for performing common filtering predicates. For example, "only enable spans/events with a particular target or a set of targets", "only enable spans", "only enable events", etc. Then, we could add a combinator API to combine these predicates to build new filters. * [ ] It would be nice if the `Interest` system could be used to cache the result of multiple individual per-subscriber filter evaluations, so that we don't need to always use `enabled` when there are several per-subscriber filters that disagree on whether or not they want a given span/event. There are a lot of unused bits in `Interest` that could potentially be used for this purpose. However, this would require new API changes in `tracing-core`, and is therefore out of scope for this PR. Closes #302 Closes #597 Closes #1348 Signed-off-by: Eliza Weisman <[email protected]>

Currently, the way the `tracing` subscriber is constructed is somewhat convoluted, due to potentially constructing formatting layers with different types (either plaintext logging or JSON). Because of this, we can't simply have methods that return the `fmt` layer. Instead, we have to pass the `Registry` into the methods that construct the `fmt` layers and return a type-erased `Dispatch` after adding the `fmt` layers. This is kind of not great. In particular, it makes adding additional layers difficult, such as access logging (#601) and (eventually) `tokio-console` support. This branch refactors the subscriber construction to use the (recently-added) `impl Layer<S> for Box<dyn Layer<S>>` in `tracing-subscriber`. Now, we can erase the types of the JSON and plaintext `fmt` layers and return them from the methods that construct them, and layer them onto the `Registry` in `tracing::Settings::build()`. This makes the `tracing` setup significantly easier, and will enable changes I want to make in #601. Boxing these layers does add slight overhead of dynamic dispatch + a pointer dereference. However, I doubt this has a huge performance impact in practice... (cherry picked from commit 3bb7ec4) Signed-off-by: Oliver Gould <[email protected]>

## Motivation Currently, filtering with `Subscribe`s is always _global_. If a `Subscribe` performs filtering, it will disable a span or event for _all_ subscribers that compose the current subscriber. In some cases, however, it is often desirable for individual subscribers to see different spans or events than the rest of the `Subscribe` stack. Issues in other projects, such as tokio-rs/console#64 and tokio-rs/console#76, linkerd/linkerd2-proxy#601, influxdata/influxdb_iox#1012 and influxdata/influxdb_iox#1681, jackwaudby/spaghetti#86, etc; as well as `tracing` feature requests like #302, #597, and #1348, all indicate that there is significant demand for the ability to add filters to individual `Subscribe`s. Unfortunately, doing this nicely is somewhat complicated. Although a naive implementation that simply skips events/spans in `Subscribe::on_event` and `Subscribe::new_span` based on some filter is relatively simple, this wouldn't really be an ideal solution, for a number of reasons. A proper per-subscriber filtering implementation would satisfy the following _desiderata_: * If a per-subscriber filter disables a span, it shouldn't be present _for the subscriber that filter is attached to_ when iterating over span contexts (such as `Context::event_scope`, `SpanRef::scope`, etc), or when looking up a span's parents. * When _all_ per-subscriber filters disable a span or event, it should be completely disabled, rather than simply skipped by those particular subscribers. This means that per-subscriber filters should participate in `enabled`, as well as being able to skip spans and events in `new_span` and `on_event`. * Per-subscriber filters shouldn't interfere with non-filtered `Subscribe`s. If a subscriber contains subscribers without any filters, as well as subscribers with per-subscriber filters, the non-filtered `Subscribe`s should behave exactly as they would without any per-subscriber filters present. * Similarly, per-subscriber filters shouldn't interfere with global filtering. If a `Subscribe` in a stack is performing global filtering (e.g. the current filtering behavior), per-subscriber filters should also be effected by the global filter. * Per-subscriber filters _should_ be able to participate in `Interest` caching, _but only when doing so doesn't interfere with non-per-subscriber-filtered subscribers_. * Per-subscriber filters should be _tree-shaped_. If a `Subscriber` consists of multiple subscribers that have been `Subscribeed` together to form new `Subscribe`s, and some of the `Subscribeed` subscribers have per-subscriber filters, those per-subscriber filters should effect all subscribers in that subtree. Similarly, if `Subscribe`s in a per-subscriber filtered subtree have their _own_ per-subscriber filters, those subscribers should be effected by the union of their own filters and any per-subscriber filters that wrap them at higher levels in the tree. Meeting all these requirements means that implementing per-subscriber filtering correctly is somewhat more complex than simply skipping events and spans in a `Subscribe`'s `on_event` and `new_span` callbacks. ## Solution This branch has a basic working implementation of per-subscriber filtering for `tracing-subscriber` v0.2. It should be possible to add this in a point release without a breaking change --- in particular, the new per-subscriber filtering feature _doesn't_ interfere with the global filtering behavior that subscribers currently have when not using the new per-subscriber filtering APIs, so existing configurations should all behave exactly as they do now. ### Implementation The new per-subscriber filtering API consists of a `Filter` trait that defines a filtering strategy and a `Filtered` type that combines a `Filter` with a `Subscribe`. When `enabled` is called on a `Filtered` subscriber, it checks the metadata against the `Filter` and stores whether that filter enabled the span/event in a thread-local cell. If every per-subscriber filter disables a span/event, and there are no non-per-subscriber-filtered subscribers in use, the `Registry`'s `enabled` will return `false`. Otherwise, when the span/event is recorded, each `Filtered` subscriber's `on_event` and `new_span` method checks with the `Registry` to see whether _it_ enabled or disabled that span/event, and skips passing it to the wrapped subscriber if it was disabled. When a span is recorded, the `Registry` which filters disabled it, so that they can skip it when looking up spans in other callbacks. A new `on_subscriber` method was added to the `Subscribe` trait, which allows running a callback when adding a `Subscribe` to a `Subscriber`. This allows mutating both the `Subscribe` and the `Subscriber`, since they are both passed by value when subscribering a `Subscribe` onto a `Subscriber`, and a new `register_filter` method was added to `LookupSpan`, which returns a `FilterId` type. When a `Filtered` subscriber is added to a subscriber that implements `LookupSpan`, it calls `register_filter` in its `on_subscriber` hook to generate a unique ID for its filter. `FilterId` can be used to look up whether a span or event was enabled by the `Filtered` subscriber. Internally, the filter results are stored in a simple bitset to reduce the performance overhead of adding per-subscriber filtering as much as possible. The `FilterId` type is actually a bitmask that's used to set the bit corresponding to a `Filtered` subscriber in that bitmap when it disables a span or event, and check whether the bit is set when querying if a span or event was disabled. Setting a bit in the bitset and testing whether its set is extremely efficient --- it's just a couple of bitwise ops. Additionally, the "filter map" that tracks which filters disabled a span or event is just a single `u64` --- we don't need to allocate a costly `HashSet` or similar every time we want to filter an event. This *does* mean that we are limited to 64 unique filters per `Registry`...however, I would be _very_ surprised if anyone _ever_ builds a subscriber with 64 subscribers, let alone 64 separate per-subscriber filters. Additionally, the `Context` and `SpanRef` types now also potentially carry `FilterId`s, so that they can filter span lookups and span traversals for `Filtered` subscribers. When `Filtered` subscribers are nested using the `Subscribeed`, the `Context` _combines_ the filter ID of the inner and outer subscribers so that spans can be skipped if they were disabled by either. Finally, an implementation of the `Filter` trait was added for `LevelFilter`, and new `FilterFn` and `DynFilterFn` structs were added to produce a `Filter` implementation from a function pointer or closure. ### Notes Some other miscellaneous factors to consider when reviewing this change include: * I did a bit of unrelated refactoring in the `subscriber` module. Since we were adding more code to the `Subscribe` module (the `Filter` trait), the `subscriber.rs` file got significantly longer and was becoming somewhat hard to navigate. Therefore, I split out the `Context` and `Subscribeed` types into their own files. Most of the code in those files is the same as it was before, although some of it was changed in order to support per-subscriber filtering. Apologies in advance to reviewers for this... * In order to test this change, I added some new test-support code in `tracing-subscriber`. The `tracing` test support code provides a `Subscriber` implementation that can be configured to expect a particular set of spans and events, and assert that they occurred as expected. In order to test per-subscriber filtering, I added a new `Subscribe` implementation that does the same thing, but implements `Subscribe` rather than `Subscriber`. This allows testing per-subscriber filter behavior in the same way we test global filter behavior. Reviewers should feel free to just skim the test the test support changes, or skip reviewing them entirely. ## Future Work There is a bunch of additional stuff we can do once this change lands. In order to limit the size of this PR, I didn't make these changes yet, but once this merges, we will want to consider some important follow up changes: * [ ] Currently, _only_ `tracing-subscriber`'s `Registry` is capable of serving as a root subscriber for per-subscriber filters. This is in order to avoid stabilizing a lot of the per-subscriber filtering design as public API without taking the time to ensure there are no serious issues. In the future, we will want to add new APIs to allow users to implement their own root `Subscriber`s that can host subscribers with per-subscriber filtering. * [ ] The `EnvFilter` type doesn't implement the `Filter` trait and thus cannot currently be used as a per-subscriber filter. We should add an implementation of `Filter` for `EnvFilter`. * [ ] The new `FilterFn` and `DynFilterFn` types could conceivably _also_ be used as global filters. We could consider adding `Subscribe` implementations to allow them to be used for global filtering. * [ ] We could consider adding new `Filter` implementations for performing common filtering predicates. For example, "only enable spans/events with a particular target or a set of targets", "only enable spans", "only enable events", etc. Then, we could add a combinator API to combine these predicates to build new filters. * [ ] It would be nice if the `Interest` system could be used to cache the result of multiple individual per-subscriber filter evaluations, so that we don't need to always use `enabled` when there are several per-subscriber filters that disagree on whether or not they want a given span/event. There are a lot of unused bits in `Interest` that could potentially be used for this purpose. However, this would require new API changes in `tracing-core`, and is therefore out of scope for this PR. Closes #302 Closes #597 Closes #1348 Signed-off-by: Eliza Weisman <[email protected]>

Currently, filtering with `Layer`s is always _global_. If a `Layer` performs filtering, it will disable a span or event for _all_ layers that compose the current subscriber. In some cases, however, it is often desirable for individual layers to see different spans or events than the rest of the `Layer` stack. Issues in other projects, such as tokio-rs/console#64 and tokio-rs/console#76, linkerd/linkerd2-proxy#601, influxdata/influxdb_iox#1012 and influxdata/influxdb_iox#1681, jackwaudby/spaghetti#86, etc; as well as `tracing` feature requests like the ability to add filters to individual `Layer`s. Unfortunately, doing this nicely is somewhat complicated. Although a naive implementation that simply skips events/spans in `Layer::on_event` and `Layer::new_span` based on some filter is relatively simple, this wouldn't really be an ideal solution, for a number of reasons. A proper per-layer filtering implementation would satisfy the following _desiderata_: * If a per-layer filter disables a span, it shouldn't be present _for the layer that filter is attached to_ when iterating over span contexts (such as `Context::event_scope`, `SpanRef::scope`, etc), or when looking up a span's parents. * When _all_ per-layer filters disable a span or event, it should be completely disabled, rather than simply skipped by those particular layers. This means that per-layer filters should participate in `enabled`, as well as being able to skip spans and events in `new_span` and `on_event`. * Per-layer filters shouldn't interfere with non-filtered `Layer`s. If a subscriber contains layers without any filters, as well as layers with per-layer filters, the non-filtered `Layer`s should behave exactly as they would without any per-layer filters present. * Similarly, per-layer filters shouldn't interfere with global filtering. If a `Layer` in a stack is performing global filtering (e.g. the current filtering behavior), per-layer filters should also be effected by the global filter. * Per-layer filters _should_ be able to participate in `Interest` caching, _but only when doing so doesn't interfere with non-per-layer-filtered layers_. * Per-layer filters should be _tree-shaped_. If a `Subscriber` consists of multiple layers that have been `Layered` together to form new `Layer`s, and some of the `Layered` layers have per-layer filters, those per-layer filters should effect all layers in that subtree. Similarly, if `Layer`s in a per-layer filtered subtree have their _own_ per-layer filters, those layers should be effected by the union of their own filters and any per-layer filters that wrap them at higher levels in the tree. Meeting all these requirements means that implementing per-layer filtering correctly is somewhat more complex than simply skipping events and spans in a `Layer`'s `on_event` and `new_span` callbacks. This branch has a basic working implementation of per-layer filtering for `tracing-subscriber` v0.2. It should be possible to add this in a point release without a breaking change --- in particular, the new per-layer filtering feature _doesn't_ interfere with the global filtering behavior that layers currently have when not using the new per-layer filtering APIs, so existing configurations should all behave exactly as they do now. The new per-layer filtering API consists of a `Filter` trait that defines a filtering strategy and a `Filtered` type that combines a `Filter` with a `Layer`. When `enabled` is called on a `Filtered` layer, it checks the metadata against the `Filter` and stores whether that filter enabled the span/event in a thread-local cell. If every per-layer filter disables a span/event, and there are no non-per-layer-filtered layers in use, the `Registry`'s `enabled` will return `false`. Otherwise, when the span/event is recorded, each `Filtered` layer's `on_event` and `new_span` method checks with the `Registry` to see whether _it_ enabled or disabled that span/event, and skips passing it to the wrapped layer if it was disabled. When a span is recorded, the `Registry` which filters disabled it, so that they can skip it when looking up spans in other callbacks. A new `on_layer` method was added to the `Layer` trait, which allows running a callback when adding a `Layer` to a `Subscriber`. This allows mutating both the `Layer` and the `Subscriber`, since they are both passed by value when layering a `Layer` onto a `Subscriber`, and a new `register_filter` method was added to `LookupSpan`, which returns a `FilterId` type. When a `Filtered` layer is added to a subscriber that implements `LookupSpan`, it calls `register_filter` in its `on_layer` hook to generate a unique ID for its filter. `FilterId` can be used to look up whether a span or event was enabled by the `Filtered` layer. Internally, the filter results are stored in a simple bitset to reduce the performance overhead of adding per-layer filtering as much as possible. The `FilterId` type is actually a bitmask that's used to set the bit corresponding to a `Filtered` layer in that bitmap when it disables a span or event, and check whether the bit is set when querying if a span or event was disabled. Setting a bit in the bitset and testing whether its set is extremely efficient --- it's just a couple of bitwise ops. Additionally, the "filter map" that tracks which filters disabled a span or event is just a single `u64` --- we don't need to allocate a costly `HashSet` or similar every time we want to filter an event. This *does* mean that we are limited to 64 unique filters per `Registry`...however, I would be _very_ surprised if anyone _ever_ builds a subscriber with 64 layers, let alone 64 separate per-layer filters. Additionally, the `Context` and `SpanRef` types now also potentially carry `FilterId`s, so that they can filter span lookups and span traversals for `Filtered` layers. When `Filtered` layers are nested using the `Layered`, the `Context` _combines_ the filter ID of the inner and outer layers so that spans can be skipped if they were disabled by either. Finally, an implementation of the `Filter` trait was added for `LevelFilter`, and new `FilterFn` and `DynFilterFn` structs were added to produce a `Filter` implementation from a function pointer or closure. Some other miscellaneous factors to consider when reviewing this change include: * I did a bit of unrelated refactoring in the `layer` module. Since we were adding more code to the `Layer` module (the `Filter` trait), the `layer.rs` file got significantly longer and was becoming somewhat hard to navigate. Therefore, I split out the `Context` and `Layered` types into their own files. Most of the code in those files is the same as it was before, although some of it was changed in order to support per-layer filtering. Apologies in advance to reviewers for this... * In order to test this change, I added some new test-support code in `tracing-subscriber`. The `tracing` test support code provides a `Subscriber` implementation that can be configured to expect a particular set of spans and events, and assert that they occurred as expected. In order to test per-layer filtering, I added a new `Layer` implementation that does the same thing, but implements `Layer` rather than `Subscriber`. This allows testing per-layer filter behavior in the same way we test global filter behavior. Reviewers should feel free to just skim the test the test support changes, or skip reviewing them entirely. There is a bunch of additional stuff we can do once this change lands. In order to limit the size of this PR, I didn't make these changes yet, but once this merges, we will want to consider some important follow up changes: * [ ] Currently, _only_ `tracing-subscriber`'s `Registry` is capable of serving as a root subscriber for per-layer filters. This is in order to avoid stabilizing a lot of the per-layer filtering design as public API without taking the time to ensure there are no serious issues. In the future, we will want to add new APIs to allow users to implement their own root `Subscriber`s that can host layers with per-layer filtering. * [ ] The `EnvFilter` type doesn't implement the `Filter` trait and thus cannot currently be used as a per-layer filter. We should add an implementation of `Filter` for `EnvFilter`. * [ ] The new `FilterFn` and `DynFilterFn` types could conceivably _also_ be used as global filters. We could consider adding `Layer` implementations to allow them to be used for global filtering. * [ ] We could consider adding new `Filter` implementations for performing common filtering predicates. For example, "only enable spans/events with a particular target or a set of targets", "only enable spans", "only enable events", etc. Then, we could add a combinator API to combine these predicates to build new filters. * [ ] It would be nice if the `Interest` system could be used to cache the result of multiple individual per-layer filter evaluations, so that we don't need to always use `enabled` when there are several per-layer filters that disagree on whether or not they want a given span/event. There are a lot of unused bits in `Interest` that could potentially be used for this purpose. However, this would require new API changes in `tracing-core`, and is therefore out of scope for this PR. Closes tokio-rs#302 Closes tokio-rs#597 Closes tokio-rs#1348 Signed-off-by: Eliza Weisman <[email protected]>

olix0r requested a review from hawkw July 23, 2020 03:22

tustvold force-pushed the access-logging branch from 175f709 to f8dd015 Compare July 25, 2020 14:30

tustvold force-pushed the access-logging branch 2 times, most recently from 141cfe2 to 3ce298d Compare July 25, 2020 22:43

tustvold marked this pull request as ready for review July 25, 2020 22:45

tustvold requested a review from a team July 25, 2020 22:45

tustvold force-pushed the access-logging branch 2 times, most recently from b810118 to c44d873 Compare July 25, 2020 22:59

tustvold force-pushed the access-logging branch 2 times, most recently from d8c3e1c to 3f1df4d Compare July 26, 2020 14:16

hawkw reviewed Jul 26, 2020

View reviewed changes

linkerd/access-log/src/layer.rs Outdated Show resolved Hide resolved

linkerd/access-log/src/layer.rs Outdated Show resolved Hide resolved

linkerd/app/src/access_log_collector.rs Outdated Show resolved Hide resolved

tustvold mentioned this pull request Aug 15, 2020

Wait for proxy thread to terminate in integration tests #625

Merged

tustvold force-pushed the access-logging branch 2 times, most recently from e24bacf to 3aa7aca Compare September 15, 2020 16:45

tustvold force-pushed the access-logging branch from 3aa7aca to 6b1120d Compare September 21, 2020 08:03

tustvold requested a review from hawkw September 24, 2020 17:45

tustvold force-pushed the access-logging branch from 6b1120d to 95687e1 Compare October 16, 2020 09:46

tustvold force-pushed the access-logging branch 2 times, most recently from 0153782 to e5b3901 Compare November 16, 2020 16:01

hawkw mentioned this pull request Sep 3, 2021

subscriber: implement per-layer filtering tokio-rs/tracing#1523

Merged

5 tasks

hawkw mentioned this pull request Oct 8, 2021

tracing: simplify subscriber construction with Boxed layers #1304

Merged

hawkw mentioned this pull request Oct 20, 2021

trace: Apache Common Log Format access logging #1319

Merged

olix0r closed this Oct 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Access Logging #601

Access Logging #601

tustvold commented Jul 22, 2020 •

edited

Loading

olix0r commented Jul 23, 2020

hawkw commented Jul 23, 2020

tustvold commented Jul 25, 2020 •

edited

Loading

tustvold commented Jul 25, 2020

tustvold commented Jul 26, 2020

hawkw commented Jul 26, 2020

hawkw left a comment

tustvold commented Jul 26, 2020 •

edited

Loading

tustvold commented Sep 15, 2020 •

edited

Loading

tustvold commented Sep 21, 2020 •

edited

Loading

tustvold commented Nov 16, 2020

olix0r commented Nov 16, 2020

codecov-commenter commented Jun 17, 2021

jon-depop commented Aug 11, 2021

0x91 commented Aug 12, 2021

jon-depop commented Oct 8, 2021

hawkw commented Oct 8, 2021 •

edited

Loading

olix0r commented Oct 20, 2021

Access Logging #601

Access Logging #601

Conversation

tustvold commented Jul 22, 2020 • edited Loading

olix0r commented Jul 23, 2020

hawkw commented Jul 23, 2020

tustvold commented Jul 25, 2020 • edited Loading

tustvold commented Jul 25, 2020

tustvold commented Jul 26, 2020

hawkw commented Jul 26, 2020

hawkw left a comment

Choose a reason for hiding this comment

tustvold commented Jul 26, 2020 • edited Loading

tustvold commented Sep 15, 2020 • edited Loading

tustvold commented Sep 21, 2020 • edited Loading

tustvold commented Nov 16, 2020

olix0r commented Nov 16, 2020

codecov-commenter commented Jun 17, 2021

Codecov Report

jon-depop commented Aug 11, 2021

0x91 commented Aug 12, 2021

jon-depop commented Oct 8, 2021

hawkw commented Oct 8, 2021 • edited Loading

olix0r commented Oct 20, 2021

tustvold commented Jul 22, 2020 •

edited

Loading

tustvold commented Jul 25, 2020 •

edited

Loading

tustvold commented Jul 26, 2020 •

edited

Loading

tustvold commented Sep 15, 2020 •

edited

Loading

tustvold commented Sep 21, 2020 •

edited

Loading

hawkw commented Oct 8, 2021 •

edited

Loading