Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FromHex is not a generic hex parsing trait #74

Closed
tcharding opened this issue Feb 18, 2024 · 18 comments
Closed

FromHex is not a generic hex parsing trait #74

tcharding opened this issue Feb 18, 2024 · 18 comments

Comments

@tcharding
Copy link
Member

tcharding commented Feb 18, 2024

The string "a" as a valid hex string but using FromHex it cannot be parsed.

Also, if the FromHex crate is a generic hex parsing trait I would expect to be able to implement it for Wrapper(u32) but the trait is not really suited to that because there is no u32::from_hex impl and u32::from_str_radix does not play nicely with FromHex::from_byte_iter.

This is starting to feel like a parse-hex-as-required-by-rust-bitcoin trait (and by extension crate) not a generic hex parsing trait.

To add salt to the wound, rust-bitcoin includes FromHexStr to add parsing of types that can include a prefix, which seems alot like something a generic hex parsing library should handle.

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 18, 2024

Oh, yeah, now I realized my comment about having special error type was bogus.

IMO FromHex should be only for [u8; N], Vec<u8>, Box<[u8]>, Box<[u8; N]>, Rc<[u8]>, Rc<[u8; N], Arc<[u8]>, Arc<[u8; N>].

Newtypes can just use FromStr or an explicit from_hex method - that's even better because one doesn't have to import anything.

@apoelstra
Copy link
Member

What should we do for locktimes, scriptnums, sequence numbers and targets?

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 18, 2024

Inherent methods. But maybe we could consider FromHex to be general-purpose if we merge #75. However I still don't think that being generic over hex parse-able type is useful. I think it makes much more sense for FromHex to be an extension trait for std types and just have inherent methods so people don't need to import the trait.

@apoelstra
Copy link
Member

However I still don't think that being generic over hex parse-able type is useful.

If all we need to do in order to be generic is to allow 0x prefixes then we should do it.

Or is your point that if we support hex-parsing u64 etc with a 0x prefix, then anybody who wants this for their own types can just internally parse a u64 and convert that? I agree.

I think it makes much more sense for FromHex to be an extension trait for std types and just have inherent methods so people don't need to import the trait.

Agreed that we should have inherent methods so people don't have to import the trait. As for whether the trait should be implemented at all on our newtypes etc., I think it should, even if the trait methods' errors are crappy and non-generic, because it lets people implement serde-like "parse an object in hex from special-purpose stream X or ASCII-computing algorithm Y" functions.

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 18, 2024

There are two categories of hex-parseable types: sequences of bytes and integers. The former never have 0x prefix, the latter may have which suggests they should be separate traits.
There are std types and our own types. std integers already have from_str_radix, no need to add anything (well, maybe if people really, really like our ParseIntError I would consider). For byte-like we have FromHex.
We need the trait because we can't have inherent methods on external types. However our own types don't need any traits and not having to import things is better.

it lets people implement serde-like "parse an object in hex from special-purpose stream X or ASCII-computing algorithm Y" functions.

This is the exact thing I doubt. I don't remember seeing this anywhere and I'm not convinced why we should be the ones responsible for providing this general-purpose functionality. We have enough work with bitcoin as-is. I'd rather cut the complexity by reducing features that can be added later without breakage. If later someone demonstrates that this is actually useful I'll accept it.

@apoelstra
Copy link
Member

There are two categories of hex-parseable types: sequences of bytes and integers. The former never have 0x prefix, the latter may have which suggests they should be separate traits.

In terms of conceptual purity, yes, I see what you're getting at, but from a user POV when I search "hex" in order to parse "hex" I expect to find one thing, not to have to think through more conceptual divisions in the exact task that I'm trying to achieve. We could similarly argue for different methods that parse from bytestreams or that parse from UTF-8 strings, or for different methods that parse bytewise (even lengths) or nibble-wise (odd lengths allowed) etc etc.

But we only have the one word "hex" so we ought to use it to cover a reasonably large amount of ground.

There are std types and our own types. std integers already have from_str_radix, no need to add anything (well, maybe if people really, really like our ParseIntError I would consider). For byte-like we have FromHex.

from_str_radix is the worst. It is inefficient because it needs to support arbitrary radices, has a long name and you need to type , 16 every time even though 16 is literally the only base anybody would ever call this function with.

I don't remember seeing this anywhere

serde_json's from_reader is an example of such a function (with the Read trait rather than FromHex, but same concept).

and I'm not convinced why we should be the ones responsible for providing this general-purpose functionality.

Yes we are. Because std did not provide it (with the same "nobody needs hex" reasoning that I'm seeing here) and the hex crate has had MSRV issues in the past.

If we neuter hex-conservative then I will need to go make my own hex crate and be very annoyed at having spent so long helping build this one. Or I will just go back to using bitcoin_hashes 0.11 and bitcoin 0.29, as will most of our users.

We have enough work with bitcoin as-is.

Yes, but adding support for a 0x prefix is really not a lot of extra work, assuming we support it sufficiently well to handle locktimes and sequence numbers (which in turn will be sufficient for everybody else who simply has a newtyped u64 that they want to parse as hex), and don't go crazy trying to be super generic.

@apoelstra
Copy link
Member

I'd rather cut the complexity by reducing features that can be added later without breakage.

This part I agree with -- if there's a straightforward path to adding 0x-prefix support later then let's forget about it for now and circle back.

@apoelstra
Copy link
Member

which suggests they should be separate traits.

After thinking about this I'm warming up to it. We could have FromHex and FromHexNum, where the latter allows odd-length strings and 0x prefixes, and maybe even rangechecks beyond those of the underlying primitive, etc.

And we can delay implementing this even until after 1.0, because for our types that we care about (e.g. Sequence) it's easy to manually implement the logic in inherent methods using from_str_radix under the hood.

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 19, 2024

but from a user POV when I search "hex" in order to parse "hex" I expect to find one thing

Well, this is Rust, we have String, str, OsString, OsStr... and for a very good reason. So we also might have from_hexfor byte sequences andfrom_hex_no_prefix` for integers. It might be possible to put them into the same trait with a bit more magic but I suspect it'd be more confusing.

from_str_radix is the worst. It is inefficient because it needs to support arbitrary radices, has a long name and you need to type , 16 every time even though 16 is literally the only base anybody would ever call this function with.

And you want to maintain a fast alternative that everyone wanting it will have to import? I personally would rather not deal with it at least until bitcoin is fleshed out.

with the Read trait rather than FromHex, but same concept

Well, that's the thing. I think it makes more sense to be either generic over FromStr or FromIterator<Item=u8>.

"nobody needs hex" reasoning that I'm seeing here

That's not what I'm saying. If std were to add it I'm pretty sure it'd be just inherent methods. std does inherent methods even for things where it'd make sense to have traits (wrapping arithmetic comes to mind) but I think it'd be fine in this case.

don't go crazy trying to be super generic

Exactly, and I think this is required to not get ourselves into some huge mess.

@apoelstra
Copy link
Member

And you want to maintain a fast alternative that everyone wanting it will have to import? I personally would rather not deal with it at least until bitcoin is fleshed out.

I feel you, but we need to at least recover the functionality that bitcoin_hashes::hex had in 0.11. (Though TBH we may have already passed it, as soon as we do a hex-conservative 0.2 release. We didn't used to have a serde module or proper fmt::Formatter handling, and I don't know whether we could parse 0x prefixes before.)

For right now I think that means

  • Declaring an intention to support 0x prefixes in the indefinite future
  • Making sure we don't box out our API to make that harder or impossible

with the Read trait rather than FromHex, but same concept

Well, that's the thing. I think it makes more sense to be either generic over FromStr or FromIterator<Item=u8>.

Ah, yeah, that makes sense. Ok, agreed.

@apoelstra
Copy link
Member

For right now I think that means

...and specifically (puttiing this in a separate comment that's easier to reply to/link):

  • Declaring that our existing FromHex trait is for hex-encoded bytestrings
  • ...that always have even length
  • ...and that never have a 0x prefix
  • ...and that we may, in future, add a new extension trait for numbers to the crate, but in the meantime people can manually parse 0x and then use from_str_radix.

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 19, 2024

we need to at least recover the functionality that bitcoin_hashes::hex had in 0.11

That didn't have integer support. So the rest of that comment is moot.

@apoelstra
Copy link
Member

Ok, cool. Then all I want for now is for us to leave the door open to supporting integers.

@tcharding
Copy link
Member Author

Just throwing this comment in, to complete the move of types to units we need the functionality provided by the current bitcoin::string module, specifically FromHexStr for parsing integers from hex. As far as abstractions go, it is really odd to have a module for doing something that is totally a hex "utility" in the units crate when that crate depends on a crate called hex that is created by the same devs specifically for handling hex stuff. If I was a user and I opened that up I would think that it was half baked and that these guys don't really focus on clean abstractions - or they are too busy to have gotten to it and what else have they not gotten to. None of these things is really true, I think we should solve this problem as part of units. Crate smashing is the main task right now, and it feels to me like this is part of it. I may be being melodramatic again.

@apoelstra
Copy link
Member

@tcharding TBH I don't think that trait should have been public; it looks like it exists mainly to facilitate code reuse for various integer-like types. And the methods in question ought to be intrinsics. And possibly they should be collapsed into just from_str.

But ok, yes, understood that we need to resolve this one way or another.

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 20, 2024

Yes, inherent methods please and if we ever decide to make the trait public we should do it in hex and probably name it NumFromHex or something.

@tcharding
Copy link
Member Author

@tcharding
Copy link
Member Author

I believe this issue is solved with the PRs above as well as the intention to add integer parsing at some later date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants