Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UUID as a well-known string type #217

Merged
merged 1 commit into from
Jun 20, 2019
Merged

Conversation

nichegriffy
Copy link
Contributor

@nichegriffy nichegriffy commented Jun 6, 2019

This should resolve #61

As the title states, this PR adds uuid as a well-known string type that verifies it looks like a UUID per RFC 4122.

I decided to add one uuid validator rather than version-specific validators because I figured the most common use case would be verifying that the string looks like a UUID rather than a particular version, as well as to be able to support the special nil UUID.

@nichegriffy nichegriffy force-pushed the master branch 4 times, most recently from 0b725ba to f8040f8 Compare June 7, 2019 02:15
@nichegriffy nichegriffy changed the title WIP: Add UUID as a well-known string type Add UUID as a well-known string type Jun 7, 2019
rmichela
rmichela previously approved these changes Jun 13, 2019
Copy link
Contributor

@rmichela rmichela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me. There are a few conflicts that need resolving.

@nichegriffy
Copy link
Contributor Author

@rmichela Thanks! I resolved the conflicts, so this should be ready for another review now 🔍

@rmichela rmichela added Enhancement Extend or improve functionality Go Go language support GoGo [DEPRECATED] Gogo plugin support [DEPRECATED] Java Java/JVM language support labels Jun 18, 2019
@rodaine
Copy link
Member

rodaine commented Jun 20, 2019

Awesome! Thanks for the work here :D

@rodaine rodaine merged commit 30b4f8c into bufbuild:master Jun 20, 2019
@Fleshgrinder
Copy link
Contributor

@rmichela and @rodaine

What about the simple format that basically saves 4 bytes?

Also, what is the general policy regarding performance? Regular expressions — especially in Java — are extremely slow compared to a custom tailored and hard coded state machine.

I didn’t want to create a dedicated issue for either of these questions because how to proceed is basically depending too much on the answers and it seems fitting to ask here.

@rmichela
Copy link
Contributor

rmichela commented Jun 20, 2019

PGV uses the Google RE2J regex library instead of java.util.regex.

RE2 is a regular expression engine that runs in time linear in the size of the input.

@akonradi
Copy link
Contributor

That's true for Go (because that's the implementation in the standard library), but the C++ code doesn't implement regexes and needs #22 to get resolved before implementation. I'm personally on board with using RE2 everywhere but want more buy-in.

As for regexes for parsing by hand, my experience has been that switching from RE2 to hand-written parsing makes for a 4-10x performance improvement, at least in C++. For fixed patterns like for UUIDs, it would make sense to implement the same logic by hand.

@rmichela
Copy link
Contributor

I created a new issue so we don't lose track of the work.

@nichegriffy
Copy link
Contributor Author

@Fleshgrinder

I did see the mention of the simple format without hyphens in the original issue and was going to support it, but the grammar for how to represent a UUID in string form in RFC 4122 does seem to require the hyphens: https://tools.ietf.org/html/rfc4122#section-3

With that said, the decision to strictly adhere to RFC 4122 was my own and this project may prefer to support more representations, but I wanted to give context for why I left it out in this PR.

@Fleshgrinder
Copy link
Contributor

I’m with @akonradi here and happy to provide a benchmark for Java. There are objects being created in the background and they get persisted throughout the lifetime of the whole program.

@nichegriffy yeah it does and I guess that most people will also prefer to use it with hyphens because it’s the best way to distinguish it from some random hex string but I thought I’d ask to gauge whether there’s interest or not. 😉

@rmichela
Copy link
Contributor

See #222

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Extend or improve functionality Go Go language support GoGo [DEPRECATED] Gogo plugin support [DEPRECATED] Java Java/JVM language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add UUIDv4 to well-known formats
5 participants