Fix UUID comparisons to conform to IETF RFC 4122 #23847
Open
Description
Reverse UUID bytes internally so that comparison operators conform to IETF RFC 4122.
Motivation and Context
The Presto documentation states that we support UUIDs and conform to RFC 4122 [1].
Before this change, UUIDs were read in as two longs in big-endian format, and that byte order was used for comparisons. However, the Java bytewise comparison in our `io.airlift.slice` dependency assumes the backing values are in little-endian format, so the bytes are swapped during the comparison [2]. This made comparison operators between UUIDs incorrect according to RFC 4122 [3] §3, under "Rules for Lexical Equivalence" on p. 5.
Note that RFC 4122 [3] has an erratum in the paragraph describing the lexicographic comparison, in which the original text is inconsistent. The corrected text can be found in errata EID 1428 [4] and is reproduced above for easier reference.
Additionally, RFC 9562 has been published, which supersedes RFC 4122. It defines the sorting rules as a simple byte-wise comparison in §6.11 [5].
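As a rough illustration of the byte-wise ordering described above, here is a minimal sketch in plain Java. The class `Rfc4122UuidOrder` and its methods are hypothetical names used only for this example; they are not part of Presto or `io.airlift.slice`, and this is not the code changed by this PR. It assumes Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical helper for illustration only; not Presto or airlift code.
final class Rfc4122UuidOrder {
    private Rfc4122UuidOrder() {}

    // Serialize the UUID most significant byte first.
    // ByteBuffer uses big-endian (network) byte order by default.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Compare byte by byte, treating each byte as unsigned.
    // java.util.UUID.compareTo is avoided here because it compares
    // the two 64-bit halves as signed longs.
    static int compare(UUID left, UUID right) {
        return Arrays.compareUnsigned(toBytes(left), toBytes(right));
    }

    public static void main(String[] args) {
        UUID high = UUID.fromString("ff000000-0000-0000-0000-000000000000");
        UUID low = UUID.fromString("01000000-0000-0000-0000-000000000000");
        // The leading byte 0xff is greater than 0x01 when read as unsigned.
        System.out.println(compare(high, low) > 0); // prints "true"
    }
}
```

Serializing to a byte array first keeps the comparison independent of how the two 64-bit halves happen to be stored in memory.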
For example, before this change, the following comparison of UUIDs would result in a `TRUE` result:
This seems to be incorrect because the UUID `00000000-0000-0000-1000-000000000000` (say, UUID "A") has a `1` byte in a more significant position than the `1` byte in the UUID `00000000-0000-0000-0000-000000000001` (say, UUID "B"). Because A has a `1` byte in a more significant position than B, this comparison should evaluate to `FALSE`.
In addition, when testing this same comparison in Postgres (for which we also support the UUID type), Postgres returns results which are inconsistent with Presto.
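To make the A/B example concrete, the sketch below models the net effect described in this PR: when each 8-byte half of the UUID ends up byte-reversed relative to the canonical order, a byte-wise comparison gives the opposite answer for these two values. This is a simplified model in plain Java, not the actual Presto or airlift code path; `UuidLayoutDemo`, `layout`, and `lessThan` are hypothetical names, and using `ByteOrder.LITTLE_ENDIAN` to stand in for the old behavior is an assumption made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical demo class; a simplified model, not Presto or airlift code.
final class UuidLayoutDemo {
    private UuidLayoutDemo() {}

    // Write the two 64-bit halves of the UUID using the given byte order.
    // BIG_ENDIAN yields the canonical order; LITTLE_ENDIAN reverses the
    // bytes within each half (assumed here to model the old behavior).
    static byte[] layout(UUID uuid, ByteOrder order) {
        return ByteBuffer.allocate(16)
                .order(order)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Unsigned byte-wise "less than" over the chosen layout (Java 9+).
    static boolean lessThan(UUID left, UUID right, ByteOrder order) {
        return Arrays.compareUnsigned(layout(left, order), layout(right, order)) < 0;
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("00000000-0000-0000-1000-000000000000");
        UUID b = UUID.fromString("00000000-0000-0000-0000-000000000001");

        // Byte-reversed halves: A sorts before B.
        System.out.println(lessThan(a, b, ByteOrder.LITTLE_ENDIAN)); // prints "true"
        // Canonical byte order: A does not sort before B.
        System.out.println(lessThan(a, b, ByteOrder.BIG_ENDIAN));    // prints "false"
    }
}
```

In the reversed layout, byte 8 of A is `0x00` and byte 8 of B is `0x01`, so A compares as smaller; in the canonical layout, byte 8 of A is `0x10` and byte 8 of B is `0x00`, so A compares as larger.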
Additional verification on ordering from postgres
result
Impact
Test Plan
Contributor checklist
Release Notes
Footnotes
[1] https://prestodb.io/docs/0.289/language/types.html#uuid-type
[2] https://github.com/airlift/slice/blob/8f0494bdaad91f0c57f03e09aad2d77f955cfe42/src/main/java/io/airlift/slice/Slice.java#L1331-L1340
[3] https://datatracker.ietf.org/doc/html/rfc4122#section-3
[4] https://www.rfc-editor.org/errata/eid1428
[5] https://datatracker.ietf.org/doc/html/rfc9562#section-6.11