Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spdlog): impl JsonFormatter #69

Merged
merged 20 commits into from
Aug 5, 2024

Conversation

NotEvenANeko
Copy link
Contributor

@NotEvenANeko NotEvenANeko commented Jul 22, 2024

Implement JsonFormatter with serde and serde_json

Add serde_json as a feature, add serde and serde_json as deps.

Copy link
Owner

@SpriteOvO SpriteOvO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NotEvenANeko Appreciate your work on this. Serialized logging is a pretty meaningful feature that makes it easier for third-party programs to parse the logs output by spdlog-rs.

Most code looks great, just a few minor issues and nitpicks. Nice work! 👍

Comment on lines 26 to 32
record.serialize_field::<str>(
"time",
local_time_cacher
.get(self.0.time())
.full_second_str()
.as_ref(),
)?;
Copy link
Owner

@SpriteOvO SpriteOvO Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the machine format JSON, a timestamp number is more friendly. How about we rename the field to timestamp and then get the unix timestamp from the cacher?

UPDATE: Seems that the cacher is not needed, just get the timestamp from field time in Record should be fine.

spdlog/src/formatter/json_formatter.rs Outdated Show resolved Hide resolved
spdlog/src/error.rs Outdated Show resolved Hide resolved
spdlog/src/formatter/json_formatter.rs Outdated Show resolved Hide resolved
spdlog/src/formatter/json_formatter.rs Show resolved Hide resolved
}
}

/// JSON logs formatter
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// JSON logs formatter
#[rustfmt::skip]
/// JSON logs formatter

The doc formatting seems to be broken by Rustfmt, try adding a #[rustfmt::skip] on it.

@Lancern
Copy link
Collaborator

Lancern commented Jul 23, 2024

Not a blocker for this PR, but in a future PR maybe we could generalize this idea to something like SerdeFormatter and not stick to just JSON. Then JsonFormatter would become a specialization of SerdeFormatter.

@@ -37,6 +37,7 @@ native = []
libsystemd = ["libsystemd-sys"]
multi-thread = ["crossbeam"]
runtime-pattern = ["spdlog-internal"]
json-formatter = ["serde", "serde_json"]
Copy link
Owner

@SpriteOvO SpriteOvO Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this gate these days. Perhaps we can enable JsonFormatter when feature serde is enabled and feature serde implies dependencies serde and serde_json, then feature json-formatter is no longer needed.

The benefit of it is that we have ability to add other serialization crates in the future, so users can choose the "backend" of JsonFormatter by enabling just one feature.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might add lots of dependencies for a single feature as new formatters are introduced.

Or we can just make a SerdeFormatter as @Lancern said to decouple the serializer from the formatter. So user can choose any serializer instead of the serde_json.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might add lots of dependencies for a single feature as new formatters are introduced.

Hmm, makes sense. What about enable JsonFormatter when feature serde_json is enabled, so serde_json implies serde?

Or we can just make a SerdeFormatter as @Lancern said to decouple the serializer from the formatter. So user can choose any serializer instead of the serde_json.

SerdeFormatter is fine, but we are in no hurry to implement it in this PR.

What I meant by my comment was that JsonFormatter should not be limited to only using serde as a backend. Semantically, JSON is a format and serde is an implementation. It seems more correct that the implementation implies the format, rather than the format implying the implementation. So that if now or in the future, there are any json crates that aren't serde-based, we could easily add it without breaking anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serde_json implies serde sounds more reasonable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that if now or in the future, there are any json crates that aren't serde-based, we could easily add it without breaking anything.

Is this really implementable? serde is the de-facto standard and we already rely on the critical abstractions it provides. The "critical abstractions" I'm talking about here is the Serialize trait which is provided by serde. I cannot work out a way to bypass it and still implement a usable JsonFormatter.

Copy link
Owner

@SpriteOvO SpriteOvO Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really implementable? serde is the de-facto standard [...] I cannot work out a way to bypass it and still implement a usable JsonFormatter.

It is. For now, there are crates rkyv, miniserde, nanoserde, etc., they each have their own advantages and are not serde-based. If we can harmlessly preserve the maximum possibilities, why not do so?

[...] and we already rely on the critical abstractions it provides. The "critical abstractions" I'm talking about here is the Serialize trait which is provided by serde.

If you are talking about deriving serde::Serialize macro for types of spdlog-rs, this can be unforced. Considering the following psuode-code:

#[cfg(and(feature = "serde", feature = "rkyv"))
compile_error!("you can select only one serialization backend");

#[cfg_attr(feature = "serde", derive(serde::Serialize))]
#[cfg_attr(feature = "rkyv", derive(rkyv::Serialize))]
pub enum Level {
    // ...
}

Then implement JsonFormatter:

// In fact, we'll apply this condition on `mod json_formatter` and `pub use json_formatter::JsonFormatter`.
#[cfg(any(feature = "serde_json", feature = "rkyv"))
pub struct JsonFormatter(PhantomData);

impl Formatter for JsonFormatter {
    fn format(&self, record: &Record, dest: &mut StringBuf) -> crate::Result<FmtExtraInfo> {
        #[cfg(feature = "serde_json")]
        {
            // using `serde_json`
        }
        #[cfg(feature = "rkyv")]
        {
            // using `rkyv`
        }
    }
}

Also, we are not in a hurry to implement these other crate supports in this PR right now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SpriteOvO OK this makes sense. I thought you mean we completely abandon any dependencies on serialization libraries and let the users choose what suits them. So we're still depending on a bunch of pre-selected serialization libraries (even if they're just optional) and let the user choose one among them.

Comment on lines 28 to 37
record.serialize_field(
"timestamp",
&self
.0
.time()
.duration_since(SystemTime::UNIX_EPOCH)
.expect("invalid timestamp")
.as_secs()
.to_string(),
)?;
Copy link
Owner

@SpriteOvO SpriteOvO Jul 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
record.serialize_field(
"timestamp",
&self
.0
.time()
.duration_since(SystemTime::UNIX_EPOCH)
.expect("invalid timestamp")
.as_secs()
.to_string(),
)?;
record.serialize_field(
"timestamp",
self
.0
.time()
.duration_since(SystemTime::UNIX_EPOCH)
.ok()
.and_then(|dur| u64::try_from(dur.as_millis()).ok())
.expect("invalid timestamp"),
)?;

I've discussed with @NotEvenANeko about this in private messages.

Initially I simply thought that UNIX timestamp would be OK, but then realized that it is in seconds, which is obviously not precise enough for logging. After some investigation, I think a millisecond timestamp might be more appropriate, based on the following clues:

  • I prefer not to format it to a human-readable string (e.g. ISO 8601 / RFC 3339) because JSON will eventually be parsed by a machine. And the performance of date-time formatting is quite bad for both serialization and deserialization.

  • in JavaScript, Date::now and the constructor of Date returns and accepts a number of milliseconds since January 1, 1970 00:00:00 UTC, with leap seconds ignored.

  • most programming languages have time libraries that support parsing millisecond timestamps directly or indirectly.

  • nanoseconds are more precise, but it uses more significant integer digits, and perhaps this precision is unnecessary.

Since this integer will eventually be expressed in JSON, and Duration::as_millis returns a u128, we need to convert it to u64 to express it in JSON. Although the maximum safe type for JSON integers in JavaScript is i53, I tested it with Rust chrono crate, and the maximum millisecond timestamp it supports parsing doesn't even exceed i53::MAX, and that maximum timestamp has reached the year 262142. So in my opinion, representing it as a string is not necessary for the foreseeable future.

extern crate chrono; // 0.4.38

use chrono::prelude::*;

fn main() {
    const JS_MAX_SAFE_INTEGER: i64 = 9007199254740991;
    let max_utc = DateTime::<Utc>::MAX_UTC;
    let max_timestamp = max_utc.timestamp_millis();
    println!("Max UTC supported by chrono   : {}", max_utc);
    println!("Max timestamp in millisecond  : {}", max_timestamp);
    println!("Max safe integer in JavaScript: {}", JS_MAX_SAFE_INTEGER);
    println!("Max u64                       : {}", u64::MAX);
    assert!(max_timestamp <= JS_MAX_SAFE_INTEGER);
}
Max UTC supported by chrono   : +262142-12-31 23:59:59.999999999 UTC
Max timestamp in millisecond  : 8210266876799999
Max safe integer in JavaScript: 9007199254740991
Max u64                       : 18446744073709551615

Using u64::try_from rather than as u64 here so we can catch the overflow error instead of implicit truncation.

@SpriteOvO SpriteOvO force-pushed the feat/json-formatter branch 2 times, most recently from 2b6358c to 2fce767 Compare August 5, 2024 01:33
//
// The performance can be significantly optimized here if the issue can be
// solved.
dest.write_str(&serde_json::to_string(&json_record)?)?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this comment as a note.

test bench_1_full_formatter       ... bench:          90.39 ns/iter (+/- 9.58)
test bench_1_json_formatter       ... bench:         121.27 ns/iter (+/- 9.10)

I benchmarked JsonFormatter and I was expecting it to be faster than that (close to FullFormatter). After a quick check, the String returned by serde::to_string will do unnecessary allocation and should be optimized out.

Unfortunately, due to the serde_json API limitation (serde-rs/json#863), this is currently not possible (at least not without using unsafe).

@SpriteOvO SpriteOvO force-pushed the feat/json-formatter branch from 2fce767 to e698a45 Compare August 5, 2024 01:46
Copy link
Owner

@SpriteOvO SpriteOvO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, much appreciated your work on this @NotEvenANeko and @Lancern's reviews.

@SpriteOvO SpriteOvO merged commit 4bc900b into SpriteOvO:main Aug 5, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants