Add UUID v7 support #15
UUIDv7, currently an RFC draft, is a new UUID version that allows for time ordering thanks to a Unix timestamp component. See https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/04/
# See RFC 4122 for details of UUID.
#
def uuid_v7
  ts = [Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)].pack('Q>').unpack('nNn').drop(1)
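The excerpt above only shows the start of the method. For readers who want something runnable, here is a hypothetical, self-contained completion. It keeps the excerpt's `pack('Q>').unpack('nNn').drop(1)` trick for the 48-bit timestamp and fills the rest with random bits, assuming the bit layout from the UUIDv7 draft (finalized as RFC 9562). It is a sketch, not the code proposed in this PR.

```ruby
require "securerandom"

# Hypothetical completion of the uuid_v7 excerpt (not the PR's actual code).
# Assumed layout, per the UUIDv7 draft (finalized as RFC 9562):
#   unix_ts_ms (48 bits) | ver (4 bits = 7) | rand_a (12) | var (2 bits = 0b10) | rand_b (62)
def uuid_v7
  # Milliseconds since the Unix epoch as a 64-bit big-endian integer, re-read as
  # 16/32/16-bit words. The first 16-bit word is zero for any timestamp below
  # 2**48 ms, so it is dropped, leaving [high 32 bits, low 16 bits].
  ts = [Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)].pack("Q>").unpack("nNn").drop(1)

  # 80 random bits as five 16-bit words; version and variant overwrite 6 of them.
  words = SecureRandom.random_bytes(10).unpack("n5")
  words[0] = (words[0] & 0x0fff) | 0x7000 # version 7 in the top nibble
  words[1] = (words[1] & 0x3fff) | 0x8000 # RFC variant: top two bits 10

  format("%08x-%04x-%04x-%04x-%04x%04x%04x", *ts, *words)
end

puts uuid_v7
```

Because the timestamp occupies the leading hex digits and the output is fixed-width lowercase hex, IDs generated in later milliseconds also compare greater as plain strings.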
The RFC is fundamentally flawed and will not work at scale if monotonic total ordering is required. CLOCK_REALTIME skips forwards and backwards on many events, to name just a few: hibernation, NTP adjustments, daylight saving time, and leap seconds. And CLOCK_MONOTONIC_RAW is not suitable for use between systems.

If only a single system will ever generate UUIDs, then CLOCK_MONOTONIC_RAW with a fallback to CLOCK_MONOTONIC is appropriate. If multiple systems expect canonical total monotonic ordering, then deploy PTP and use TAI (CLOCK_TAI on Linux). CLOCK_REALTIME in UTC can never be monotonic because of leap seconds: UTC(t) = TAI(t) - leap_seconds_for_year_and_month(t). TAI is the primary reliable, global monotonic time standard and is essential for providing lock-free, unique, total ordering across multiple systems. The fallback for global ordering is a single UUID master issuer (a possible single point of failure).

TL;DR: in any case, this type of UUID won't be useful for anything important.
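The clock choices recommended above can be sketched as a small helper. This is a hypothetical illustration, not code from the PR: `preferred_clock` and `clock_ms` are made-up names, and it assumes Ruby only exposes `Process::CLOCK_TAI` and `Process::CLOCK_MONOTONIC_RAW` on platforms that support them, which is why it checks with `Process.const_defined?` before using them. On Linux, CLOCK_TAI is typically only correct once the kernel's TAI offset has been set (for example by a PTP or NTP daemon).

```ruby
# Hypothetical helper, not part of this PR: choose a clock as the comment suggests.
#   multi_node: true  -> CLOCK_TAI (Linux; only meaningful with PTP-synced hosts)
#   multi_node: false -> CLOCK_MONOTONIC_RAW, falling back to CLOCK_MONOTONIC
# const_defined?(name, false) only looks inside Process, not at top-level constants.
def preferred_clock(multi_node:)
  candidates = multi_node ? %i[CLOCK_TAI] : %i[CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC]
  name = candidates.find { |c| Process.const_defined?(c, false) }
  raise NotImplementedError, "none of #{candidates.join(', ')} is available" unless name

  Process.const_get(name)
end

# Current reading of the chosen clock in whole milliseconds.
# Caveat: monotonic clocks start at an arbitrary point (often boot), not at the
# Unix epoch, so these readings are only comparable on the same host.
def clock_ms(multi_node: false)
  Process.clock_gettime(preferred_clock(multi_node: multi_node), :millisecond)
end

puts clock_ms                       # single node: monotonic milliseconds
# puts clock_ms(multi_node: true)   # needs CLOCK_TAI (Linux)
```

Note the trade-off this makes visible: a monotonic reading cannot be dropped straight into UUIDv7's Unix-timestamp field without an epoch offset captured at startup, which is part of why the draft uses the wall clock.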
Thank you for your comment! Is monotonic total ordering actually required, though? From my perspective, there are a lot of use cases where some instability is acceptable and an approximately monotonic ordering still helps.
Consider, for example, a batching mechanism for backfills in a typical RoR application:
Model.in_batches do |batch| # loads records 1000 at a time, keeping track of the last id
  batch.update_all(something: :something)
  # a batch operation that would normally lock the table now locks only the selected rows
end
In the example above, having a UUIDv4 as the primary key means that the records don't have a stable order related to when they were created. The occasional inconsistency of UUIDv7 is usually absorbed by the batch size.
However, I'd be open to rewriting this to use TAI (perhaps as an option) if necessary.
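One way to see why insertion-ordered keys matter for such a backfill is a small in-memory simulation. It assumes, consistent with the comment in the example, that batching walks rows in primary-key order and fetches each next batch as "ids greater than the last one"; `each_batch`, `missed_rows`, and the simplified time-ordered id (a counter standing in for the millisecond timestamp, version and variant bits omitted) are all hypothetical, and no ActiveRecord or database is involved.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for in_batches: keyset pagination over rows.
def each_batch(rows, of: 1000)
  last_id = nil
  loop do
    batch = rows.keys.sort.select { |id| last_id.nil? || id > last_id }.first(of)
    break if batch.empty?

    yield batch
    last_id = batch.last # the next batch starts after this id
  end
end

# Backfills 5,000 existing rows while a concurrent writer inserts a few new
# rows; returns how many rows the backfill never visited.
def missed_rows(new_id, inserts: 3)
  rows = {}
  5_000.times { rows[new_id.call] = :old }
  visited = []
  each_batch(rows) do |batch|
    visited.concat(batch)
    next unless inserts.positive?

    rows[new_id.call] = :new # inserted while the backfill is running
    inserts -= 1
  end
  (rows.keys - visited).size
end

# Simplified time-ordered id: 12 hex digits of a counter, then random hex.
tick = 0
time_ordered_id = -> { tick += 1; format("%012x", tick) + SecureRandom.hex(10) }
random_id = -> { SecureRandom.uuid } # UUIDv4

puts "time-ordered ids missed: #{missed_rows(time_ordered_id)}" # always 0 here
puts "UUIDv4 ids missed: #{missed_rows(random_id)}"
```

With time-ordered keys, every new row sorts after the cursor and is picked up in a later batch; with UUIDv4 keys, a new row lands before the cursor with a probability roughly equal to the fraction already processed, so anywhere from 0 to 3 rows are missed in this setup.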
Can we merge this?
I'm a fan of using UUIDs as identifiers, but yeah, sometimes losing monotonically increasing IDs is a pain. UUIDv7 would be helpful in many cases. I know that technically you can have clock issues, but those tend to cause problems in the millisecond range, while most user-generated data in the apps I build is created seconds or minutes apart, so it's not a problem. Knowing which record was created first when they were created two days apart, just from the ID, can be useful. I think this can be a middle ground before moving to central, monotonically increasing ID generation à la Twitter Snowflake.
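Telling which record came first just from the ID can be sketched with a helper that reads the leading 48-bit millisecond timestamp of a UUIDv7. `uuid_v7_time` is a hypothetical name, not part of this PR; it assumes the standard UUIDv7 layout (timestamp in the first 12 hex digits, version digit `7` right after) and Ruby 2.5+ for the unit argument of `Time.at`.

```ruby
# Hypothetical helper (not part of this PR): read the creation time encoded in a
# UUIDv7, i.e. the 48-bit Unix millisecond timestamp in its first 12 hex digits.
def uuid_v7_time(uuid)
  hex = uuid.delete("-")
  # 12 timestamp digits, the version digit 7, then 19 more digits (32 in total).
  raise ArgumentError, "not a UUIDv7: #{uuid}" unless hex.match?(/\A\h{12}7\h{19}\z/)

  ms = hex[0, 12].to_i(16)
  Time.at(ms / 1000, ms % 1000, :millisecond).utc
end

created = uuid_v7_time("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
puts created # 2022-02-22 19:22:22 UTC
```

Comparing the resulting `Time` values of two records (or, for lowercase IDs, simply the strings) shows which was created first, subject to the clock-accuracy caveats discussed in this thread.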
Sorry, I didn't realize this PR already existed. My implementation was originally almost identical to this one. But after someone commented about monotonicity and I thought about it a little, I added an optional part of the draft RFC: a kwarg for 0..12 extra timestamp bits. This changes the timestamp precision from 1 ms to as fine as ~244 ns (1 ms / 2^12), at the cost of up to 12 bits of randomness and slightly more complex code.

I agree with @khasinski and @pupeno that perfect monotonicity isn't necessary for most use cases, and where it is necessary, you probably need to handle it in a centralized DB server anyway (probably a special-purpose database). Considering that my current DBs use v4 UUIDs with zero monotonicity, 1 ms precision is certainly good enough for nearly anything I'd use it for. And for simple single-node monotonicity, you can always simply sort.

IMO, the other techniques the RFC provides for improving monotonicity are all far too complicated and come with far too many trade-offs. If a Ruby application truly needs monotonicity guarantees finer than ~244 ns (on a single node, plus whatever clock skew your NTP-managed servers have), then that application knows which trade-offs make the most sense and can implement whatever global system state (counters, etc.) it needs.
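The extra-timestamp-bits option described in this comment can be sketched roughly as follows. This is not the code from #19: the `extra_timestamp_bits:` and `now_ns:` keywords are hypothetical names (the latter only makes the sketch testable), and the approach assumed is the draft's idea of replacing the leftmost random bits of `rand_a` with a finer clock reading, which RFC 9562 lists among its monotonicity methods.

```ruby
require "securerandom"

# Hypothetical sketch (not the code from #19): UUIDv7 whose leftmost 0..12 bits
# of rand_a hold the sub-millisecond fraction of the clock.
# 12 extra bits give a step of 1 ms / 4096 (about 244 ns).
def uuid_v7(extra_timestamp_bits: 0,
            now_ns: Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond))
  unless (0..12).cover?(extra_timestamp_bits)
    raise ArgumentError, "extra_timestamp_bits must be in 0..12"
  end

  ms, frac_ns = now_ns.divmod(1_000_000)
  # Scale the fraction of the current millisecond to 12 bits, keep the top bits.
  sub_ms = (frac_ns * 4096 / 1_000_000) >> (12 - extra_timestamp_bits)
  random_bits = 12 - extra_timestamp_bits
  rand_a = (sub_ms << random_bits) | SecureRandom.random_number(1 << random_bits)
  rand_b = SecureRandom.random_number(1 << 62)

  # unix_ts_ms (48) | ver 7 (4) | rand_a (12) | var 0b10 (2) | rand_b (62)
  value = (ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
  hex = format("%032x", value)
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

puts uuid_v7(extra_timestamp_bits: 12)
```

With `extra_timestamp_bits: 12`, two IDs generated 300 ns apart within the same millisecond already sort correctly, because their sub-millisecond fractions (0 and 1, in units of 1/4096 ms) come before any random bits. With the default of 0, the sketch degrades to plain 1 ms precision.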
FWIW, I added my slightly different PR here: #19.
UUIDv7 (currently an RFC draft) is a new UUID version whose values can be ordered by time thanks to a Unix timestamp component. This can be helpful when iterating over a large set of data (think, for example, of backfilling migrations that use in_batches) while still keeping some of the randomness of UUIDv4.
See https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/04/
There is an updated version of this document: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/