Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamically resolved metadata (normalized_metadata) in core #2048

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

CAD97
Copy link
Contributor

@CAD97 CAD97 commented Apr 4, 2022

(With humble apologies to the CODEOWNERS getting pinged for mechanical changes...)

Best reviewed commit-by-commit. I've gone to lengths to ensure that each commit is a semantically meaninful chunk on its own and builds/passes all tests.

This is a "what if" mainly to scratch my own itch. Feel free to say "we don't want this", as it is not insignificant chunk of additional complexity, and broadly breaking changes to core APIs. However, I do think this is generally beneficial, thus the implementation and PR.

PR #2043 offers a lesser solution which works for [email protected]. This is generally the "principled" version of that hack.

Motivation

Consider the case where I'm binding to an external library that provides C FFI. (For convenience, we'll call it lib.) Because they're trying to provide a nicely usable library, lib performs some diagnostic logging.

By default, lib logs directly to stderr. However, because lib wants to integrate well into your application, they provide a callback API for you to handle lib's logs. The interface looks something like this:

/// Configure delivery of debug messages.
extern "C" Lib_Configure_Debug(
    flags: Lib_Debug_Flags,
    callback: Option<extern "C" fn(
        /// The flags (e.g. severity) for this debug message.
        flags: Lib_Debug_Flags,
        /// The file name that this debug message is from.
        file: *const c_char,
        /// The line number that this debug message is from.
        line: c_int,
        /// The semantic location that this debug message is from.
        func: *const c_char,
        /// The actual debug message to be e.g. logged.
        message: *const c_char,
    ) -> Lib_Result>;
) -> Lib_Result;

We can then hook up lib to tracing like this:

let enable_flags = determine_enabled();
ffi::Lib_Configure_Debug(enable_flags, lib_debug_callback)?;

extern "C" fn lib_debug_callback(
    flags: ffi::Lib_Debug_Flags,
    file: *const ffi::c_char,
    line: ffi::c_int,
    func: *const ffi::c_char,
    message: *const ffi::c_char,
) -> ffi::Lib_Result {
    let level = debug_flags_to_level(flags)?;
    let file: Option<&str> = cstr(file)?;
    let line: u32 = line.try_into()?;
    let func: Option<&str> = cstr(func)?;
    let message: Option<&str> = cstr(message)?;
    
    tracing::event!(
        level: level,
        // wait... how do we specify file/line?
        message
    );
    
    ffi::Lib_Result_Ok
}

In current tracing, it's impossible to hook this debug callback into tracing properly, because Metadata is 'static, and the metadata is where file/line/module_path live.

I want it to be possible to do so. And hey, why doesn't tracing-log have this same problem?..

Solution

tracing-log solves the issue by conspriring with approximately the entire tracing ecosystem. Specifically, tracing-log contains the NormalizeEvent trait, whose purpose is to take whatever static metadata the event thinks it has and provide the real, normalized metadata instead.

Event normalization is done in tracing-log by identifying that an event came from a tracing-log callsite, and then using the fields of that event to smuggle in the dynamic metadata that log provides.

That's where the conspriring comes into play. Either code knows about tracing-log, uses NormalizeEvent and filters out event fields log.target, log.module_path, log.file, and log.line for is_log() events1, or they treat events emitted from log's macros incorrectly; potentially making incorrect choices on the metadata, and logging the metadata as semantic log fields.

This PR takes much the same approach, but in doing so directly in tracing-core, can do so mostly transparently to most consumers. However, we additionally support dynamic metadata holding any number of custom fields, so checking for a known callsite is no longer sufficient.

To identify a piece of dynamic metadata, we use a magic name. The name is constructed of two randomly chosen noncharacters, so should be sufficiently unique to not occur accidentally. Each metadata field has a magic name prefixed with the same magic, and are used to normalize the metadata as in tracing-log.

To hide the magic fields when not requested, we require them to be a strict (and ordered) prefix2 of the metadata's FieldSet. As such, we can slice off the prefix and only leave the nonmagic fields in the set. For events' ValueSet, we require by construction3 that events with dynamic metadata use the normalized FieldSet. For spans' attributes... a solution has yet to be designed3, but will likely take the same approach.

Each commit is a semantic step towards this goal, and compiles and passes tests individually. For more information, see each commit, which is has additional context attached as well. To accomplish dynamic metadata, we:

  • Add a lifetime to field::Visit, which communicates the lifetime of any visited string. Existing visitors can do impl Visit<'_> for the existing semantics.
  • Use the lifetime on field::Visit to remove the unsafety in tracing-log. The lifetime will allow doing so generically in the uplifted solution for spans as well as events.
  • Uplift the RecordFields trait into tracing-core.
  • Remove 'static from Metadata::name, so it can be injected as well, to replace the magic name for spans. (The metadata name is typically unused for events, but used for spans. Spans using dynamic metadata are expected to be less common, but will certainly want to inject a name for post-normalization.)
  • Make places that don't need &'static Metadata<'static> use a shorter lifetime, so they can work with dynamic metadata.
  • The big meat: implement the actual support for dynamic metadata. The mere act of providing the patch data is simple enough, but presenting a normalized view to the outside world is where things get interesting. Event and tracing-subscriber's SpanRef should present consistent post-normalization views unless prenormal data is requested.
  • Payoff: use the system we've implemented to implement tracing-log.

Remaining work

  • Tests, tests, tests. tracing-log acts as a simple proof that the system works, but we need more tests ensuring that magic fields don't leak into the public API except where explicitly requested.
  • APIs to actually emit spans/events with dynamic metadata.
  • Did I mention more tests?

Footnotes

  1. And here's an interesting note: currently, they do so by just always filtering out fields starting with "log." This means potential data loss for events which don't come from the log crate; a tracing::error!(log.data, "oh no"); will not record the log.data field for most subscribers. Even more annoying, consider putting log.file; it's not enough to filter out just the magic fields, they have to be filtered only when NormalizeEvent::is_log().

  2. When writing this, I realized that having them as a postfix may be simpler, as FieldSet::contains(Field) can continue to be a simple bounds check rather than having to account for an offset slice start.

  3. While there is a macro for creating the dynamic metadata itself, there is no public API for constructing/dispatching the event or span with said metadata. tracing-log continues to use its bespoke dispatch code. 2

@CAD97 CAD97 marked this pull request as ready for review April 4, 2022 21:11
@CAD97
Copy link
Contributor Author

CAD97 commented Apr 28, 2022

Technically closes #372, though supplying a dynamic level still isn't advised per the added documentation.

@CAD97
Copy link
Contributor Author

CAD97 commented Jun 14, 2022

(Sorry for flooding the actions queue with the rebase, but for this large PR I feel showing/proving each commit independently passes CI is important)

@gferon
Copy link

gferon commented Feb 2, 2023

@CAD97 thanks for this change, we were going on this exact same road before knowing that this PR existed. Is it something that you're willing to drive to the end?

@CAD97 CAD97 force-pushed the metadata-injection branch 2 times, most recently from df959a2 to 9763bdc Compare February 5, 2023 00:05
@CAD97
Copy link
Contributor Author

CAD97 commented Feb 5, 2023

@gferon I'm still excited to see something along these lines happen, and I've rebased the PR. Unfortunately, I don't have much time I can spend on refining this at this exact moment. I did my best to make sure each commit is a single unit and fairly easily understood in isolation; if y'all want to take this and go from here, or take it as a draft and redo it, I'm fine either way.

I might end up with some time to spend, in which case I'd love to spend that driving this change, but I can't promise anything at the moment; it's my final semester at $gradschool currently and that takes priority 🙂

Partially an exercise in "how bad would it be?" This allows the removal
of the unsafety in tracing-log's LogVisitor (see next commit), as well
as generalization of NormalizeEvent to more RecordField containers.
Changes like this are the purpose of the previous commit. Now that Visit
has a lifetime parameter, it can be used to extract &str data in a safe
and general manner. While tracing-log is already sound (as indicated by
this safe impl being valid), the lifetime unlocks more generic uses.
For the time being, it's also been unsealed. There doesn't seem to be a
specific reason to seal it, other than it just existing to generalize
over the provided types, and not being an input generalization.

It can be resealed without issue; however, it being in tracing-core is
necessary for what's coming up.
Yes, I'm as surprised as you are that this causes 0 issues.
Now that places are using non-'static metadata, we can overlay a patched
view of the metadata. Using the strategy seen in tracing-log, we
recognize specifically named metadata/fields and use the values they
provide to create metadata with the resolved runtime values.

Importantly, we also strive to only expose dynamic injection fields when
explicitly asked for. This means that most code can continue on without
knowing that these fields even exist.

Applying these patches does have a cost; notably, things which used to
deal in references like &Metadata or &FieldSet now often use an owned,
patched version instead. These types are primarily bags of references
themselves, and no copying of actual data is involved, but the fact that
the bags of references (which aren't exactly tiny) are copied does have
an impact.

More places which don't have to deal with metadata patching (e.g. the
macro support code) should be audited for what can use the _prenormal
versions of methods.

Additionally, there are likely places where the magic fields leak out
without being explicitly asked for. Ideally, these are all bugs which
can be fixed. There aren't really any tests that effectively check for
this, so new tests will need to be written.

Finaly, there's the question of actually dispatching such a dynamic
event. tracing-log is an example of doing so (but uses touchy, hidden
API to construct the ValueSet) (implemented in the next commit), but a
generally applicable is likely desirable. This will probably take the
shape of dynamic_span! and dynamic_event! macros in tracing proper.
And now we reach the point of this commit series. tracing-log is now
capable of implementing its dynamic metadata resolution on solely
publicly available APIs*, and thus so can anyone else do the same.

*Dispatch still uses #[doc(hidden)] APIs to create the value set.
Figuring out how to best expose is still an open question.
@emersonford
Copy link

just a note, this would make #2457 way cleaner to implement and would allow it to be an independent crate :)

@gferon
Copy link

gferon commented Feb 7, 2023

just a note, this would make #2457 way cleaner to implement and would allow it to be an independent crate :)

This is exactly the same I'm trying to pursue with this, but with C++ code using spdlog and this PR would help a lot indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants