Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the purpose of tendermint::Time? #865

Open
hdevalence opened this issue Apr 19, 2021 · 10 comments
Open

What's the purpose of tendermint::Time? #865

hdevalence opened this issue Apr 19, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@hdevalence
Copy link
Collaborator

The tendermint crate has a Time(DateTime<Utc>) structure that wraps the chrono::DateTime<Utc> type and implements a bunch of custom methods for it.

Currently in #862, I translated the Protobuf Timestamp type directly to a DateTime<Utc>, which I think is a more ergonomic type for downstream users and would prefer to use, unless there is a reason not to use it.

I wasn't sure from reading the documentation what the purpose of the custom time type was, but I didn't see an explanation of how it differs from a DateTime<Utc>. Because Time has impl From<DateTime<Utc>> for Time, the tendermint crate's public API is still locked to chrono 0.4, and because From impls are infallible, there's no way for the Time type to enforce any additional invariants on a DateTime<Utc>. So I'm not sure that there's really a meaningful encapsulation benefit.

Another possible motivation could be for use with JSON serialization -- from looking at the commit history, it seemed like there was a bug involved in the custom serialization (#775). It's easy to add custom serialization to a newtype wrapper compared to a third party type. But if this was the motivation, an alternative that doesn't involve creating and maintaining a new type could be to define JSON serialization on the tendermint_proto structs, rather than on all of the domain types, and use the From/TryFrom impls to apply that serialization to the domain types.

@hdevalence hdevalence added the enhancement New feature or request label Apr 19, 2021
@greg-szabo greg-szabo self-assigned this Apr 22, 2021
@greg-szabo
Copy link
Member

All right, so if I understand correctly, you're advocating for getting rid of tendermint::Time in lieu of tendermint_proto::Timestamp since there's no real added value.

So far, looking at the code I found that

Apart from checking out the JSON serialization I didn't find any other reason why this wouldn't be feasible to do. I'm going to play around with serialization and then attempt to implement this and see what happens. We might just end up with simpler code.

@greg-szabo
Copy link
Member

Just confirmed, tendermint::Time is falling back to tendermint_proto::Timestamp for serialization. No issues there.

@greg-szabo
Copy link
Member

greg-szabo commented Apr 22, 2021

tendermint::Time has a lot of little convenience functions, like now(), duration_since(), Display, String traits, etc. These are used in all kinds of tests and also possibly in downstream projects. All of them need to be updated or alternatives made available or somehow communicated that these are not available anymore.

We could add these to tendermint_proto so it can be easily reused, but it still requires a new crate to be imported to access these functions.

Or we could keep tendermint::Time around with a deprecation warning for a release cycle to get feedback from the downstream projects. Even if we do this, I'd rewrite these convenience functions to call into tendermint_proto so downstream projects have an example on how to do it after these functions go away.

I'd be happy to hear additional ideas and opinions before I sink my teeth too much into it.

@hdevalence
Copy link
Collaborator Author

All right, so if I understand correctly, you're advocating for getting rid of tendermint::Time in lieu of tendermint_proto::Timestamp since there's no real added value.

No, sorry for being unclear, I'd rather use the DateTime<Utc> that tendermint::Time holds internally:

Currently in #862, I translated the Protobuf Timestamp type directly to a DateTime<Utc>, which I think is a more ergonomic type for downstream users and would prefer to use, unless there is a reason not to use it.

I'd prefer not to use the raw proto types except for cases that really need to care about proto serialization specifically.

@greg-szabo
Copy link
Member

Huh, ok, that's interesting. If you want to use it directly, there's no way to add additional functionality to it, like convenient protobuf-encoding/decoding or JSON serialization/deserialization. As Tony mentioned in the Signal discussion, the newtype tendermint::Time was created exactly to add these additional functions.

I would like to entertain the idea but I can't yet imagine how it would work. We rip out tendermint::Time and anywhere it's used we replace it with DateTime<Utc>, right? We'd lose all the little functions (but possibly gain ones defined in DateTime) and when a type that is protobuf-encoded uses time, it has to have a "RawType" that converts it from/to DateTime so it can be encoded. A first-grade example is CanonicalProposal which works exactly like that. This looks doable. (Other candidates Proposal, Vote.)

Now, let's look at JSON-serialization. (I'm not going to repeat the blog entry I sent on Signal :) ). I went through the structs with a struct-analyzer and saw that most structs either use a RawType already (like CanonicalVote or block::Header) or they are not serialized/deserialized at all. This is important, because serde does implicit serialization if no RawType is defined: you add a field that is a struct with a field of Time and serde will follow the definitions and serialize case-by-case.

There is one place where this happens: the Genesis struct has a Time field and it does not have a RawType. Exchanging this to DateTime<Utc> makes us trust serde that it can deserialize an Rfc3339-compatible string into a DateTime<Utc>. Based on the custom deserializers we've already encountered, I believe we will have to do deserialization manually here. Interestingly, there is already a deserialize_time function right under the Genesis definition, which is currently unused. (I guess when Time got its serialization, this deserializer was removed from the struct attributes.)

Genesis JSON serialization is used in (at least) two places: In the RPC /genesis endpoint and when reading the genesis.json file during the first startup. Adding a custom time deserializer here sounds fine, as long as we have some kind of validation that downstream projects won't have to do the same.

So, yeah, I think I can entertain this idea. :) I'd be happy to hear others' opinions.

@hdevalence
Copy link
Collaborator Author

If you want to use it directly, there's no way to add additional functionality to it

I don't think this is quite true, it's possible to monkey-patch additional functionality onto third-party types using extension traits. The downside of this is that the user has to import the extension trait, but the upside is that they maintain access to all of the original API. In contrast, wrapping a struct to add functionality requires either removing access to the original API, making the user jump through a hoop to get it (conversions), or reimplementing it.

That said, in terms of the concrete functionality you mentioned:

like convenient protobuf-encoding/decoding or JSON serialization/deserialization. As Tony mentioned in the Signal discussion, the newtype tendermint::Time was created exactly to add these additional functions.

I would like to entertain the idea but I can't yet imagine how it would work. We rip out tendermint::Time and anywhere it's used we replace it with DateTime<Utc>, right? We'd lose all the little functions (but possibly gain ones defined in DateTime) and when a type that is protobuf-encoded uses time, it has to have a "RawType" that converts it from/to DateTime so it can be encoded. A first-grade example is CanonicalProposal which works exactly like that. This looks doable. (Other candidates Proposal, Vote.)

All the RawType proto types are generated from the proto definitions already though, right? So it's not that you'd need to be defining anything extra, but just defining conversions to the types that already exist.

Now, let's look at JSON-serialization. (I'm not going to repeat the blog entry I sent on Signal :) ). I went through the structs with a struct-analyzer and saw that most structs either use a RawType already (like CanonicalVote or block::Header) or they are not serialized/deserialized at all. This is important, because serde does implicit serialization if no RawType is defined: you add a field that is a struct with a field of Time and serde will follow the definitions and serialize case-by-case.

There is one place where this happens: the Genesis struct has a Time field and it does not have a RawType. Exchanging this to DateTime<Utc> makes us trust serde that it can deserialize an Rfc3339-compatible string into a DateTime<Utc>. Based on the custom deserializers we've already encountered, I believe we will have to do deserialization manually here. Interestingly, there is already a deserialize_time function right under the Genesis definition, which is currently unused. (I guess when Time got its serialization, this deserializer was removed from the struct attributes.)

Genesis JSON serialization is used in (at least) two places: In the RPC /genesis endpoint and when reading the genesis.json file during the first startup. Adding a custom time deserializer here sounds fine, as long as we have some kind of validation that downstream projects won't have to do the same.

So, yeah, I think I can entertain this idea. :) I'd be happy to hear others' opinions.

I guess my feeling here is that, for all of the types that have an associated proto, there's already a generated proto type, so the Serde annotations that define the exact shape of the JSON output would be best defined on the proto type, so that there's only one place in the code that's defining the serialization.

For structures that don't have a corresponding proto, like apparently the Genesis struct (it's never sent over ABCI?), I think it's better to define a serialization for that structure explicitly, rather than implicitly defining that structure's serialization by defining that timestamps in general should be serialized in the way that specific structure requires.

@greg-szabo greg-szabo removed their assignment Nov 15, 2021
@mzabaluev
Copy link
Contributor

mzabaluev commented Nov 26, 2021

In #1030, I'm steering Time to be more explicit in the value range it accepts (since we probably want RFC 3339 fmt to work with any Time value, which means the year must be in 0..=9999), hide the struct's internal representation, and generally provide a more disciplined API than the prost-generated Timestamp struct with public members.

A case can be made for Time becoming more of an operational data type, optimized for arithmetic and comparisons which are performed on this type, for example, in the light client. These operations are slow on parsed date-time, whether it's implemented by chrono or time. With changes made in #1030, it should be easy to later change the internal representation to an i128 Unix timestamp in nanoseconds (or even u128 if we don't need timestamps before 1970 and don't want one extra fallible conversion for arithmetic with core::time::Duration). This will lose some memory efficiency on 32-bit platforms, but in regard to CPU time, the extra overhead on conversion to/from human time data types will likely be negligible in parse/fmt use cases.

For serde, Time should delegate to the proto Timestamp type. The serde impls for that could use a more efficient binary struct serialization on non-human-readable serializers, in case someone decides to use one of those on our library types.

I should write these and a few other ideas up in an ADR, probably some time next week.

@hdevalence
Copy link
Collaborator Author

hdevalence commented Nov 27, 2021

A case can be made for Time becoming more of an operational data type, optimized for arithmetic and comparisons which are performed on this type, for example, in the light client. These operations are slow on parsed date-time, whether it's implemented by chrono or time.

Is there actually a performance issue using chrono::DateTime<Utc>?

EDIT: looking at the chrono source, it stores the time as a pair of u32s, together with an extra i32 encoding the date. It's hard to understand how a different representation would be significantly more efficient.

@mzabaluev
Copy link
Contributor

mzabaluev commented Nov 27, 2021

@hdevalence For adding or subtracting a Duration on a human time struct, the arithmetic is either performed member-wise, which is not pretty, or the struct is converted to an integer timestamp, then after adding/subtracting, the result is broken down back into human time. Comparisons are also performed member-to-member. With a linear timestamp value, it's simple integer checked arithmetic.

Memory efficiency is another matter. The parsed time struct can fit into 12 bytes with the alignment of 4. A timestamp can be broken down to an (i64, u32) pair, but on 64-bit platforms the structure size will be 16 bytes due to the alignment.

So the choice of internal representation depends on the optimization priorities for users of the Time struct. If it's going to be used in bulk workloads doing e.g. validation logic, I'd suggest going with the arithmetic-friendly timestamp representation.

@mzabaluev
Copy link
Contributor

Filed an ADR: #1035

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants