Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UUID comparisons to conform to IETF RFC 4122 #23847

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ZacBlanco
Copy link
Contributor

@ZacBlanco ZacBlanco commented Oct 16, 2024

Description

Reverse UUIDs bytes internally to make comparison operators conform to IETF RFC 4122

Motivation and Context

The presto documentation states that we support UUIDs and conform to RFC 41221:

Before this change, UUIDs were read in as two longs in big endian format and used the that byte order for comparisons. However, the java bytewise comparison in our io.airlift.slice dependency assumes the backing values are in little endian format, so the bytes are swapped during the comparison2. This made comparisons operators between UUIDs incorrect according the RFC 41223 §3 under "Rules for Lexical Equivalence" on p.5.

UUIDs, as defined in this document, can also be ordered lexicographically.
For a pair of UUIDs, the first one follows the second if the most significant
field in which the UUIDs differ is greater for the first UUID. The second
follows the first if the most significant field in which the UUIDs differ
is greater for the second UUID.

Note that RFC 41223 has an errata in the paragraph describing the lexicographic comparison in which the original text is inconsistent. The corrected text can be found in the errata EID 14284 and is reproduced above for easier reference.

Additionally, RFC 9562 has been published which, supersedes 4122. It defines the sorting rules as a simple byte-wise comparison in §6.115

UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g., database indexes) sort as opaque raw bytes without the need for parsing or introspection.

For example, before this change, the following comparison of UUIDs would result in a TRUE result:

presto:tpch> SELECT UUID '00000000-0000-0000-1000-000000000000' < UUID '00000000-0000-0000-0000-000000000001';
 _col0
-------
 true
(1 row)

This seems to be incorrect because the reading of the 00000000-0000-0000-1000-000000000000 UUID (say, UUID "A") appears to have a 1 byte in a more significant position than the 1 byte in the UUID 00000000-0000-0000-0000-000000000001 (say, UUID "B"). Because A has a 1 byte in a more significant position than B, this comparison should evaluate to FALSE.

In addition, when testing this same comparison in postgres (for which we also support the UUID type), postgres returns results which are inconsistent with Presto.

postgres=# SELECT uuid '00000000-0000-0000-1000-000000000000' < uuid '00000000-0000-0000-0000-000000000001';
 ?column?
----------
 f
(1 row)
Additional verification on ordering from postgres

SELECT * FROM (VALUES  
    (uuid '00000000-0000-0000-0000-000000000000'),
    (uuid '00000000-0000-0000-0000-000000000001'),
    (uuid '00000000-0000-0000-0000-000000000010'),
    (uuid '00000000-0000-0000-0000-000000000100'),
    (uuid '00000000-0000-0000-0000-000000001000'),
    (uuid '00000000-0000-0000-0000-000000010000'),
    (uuid '00000000-0000-0000-0000-000000100000'),
    (uuid '00000000-0000-0000-0000-000001000000'),
    (uuid '00000000-0000-0000-0000-000010000000'),
    (uuid '00000000-0000-0000-0000-000100000000'),
    (uuid '00000000-0000-0000-0000-001000000000'),
    (uuid '00000000-0000-0000-0000-010000000000'),
    (uuid '00000000-0000-0000-0000-100000000000'),
    (uuid '00000000-0000-0000-0001-000000000000'),
    (uuid '00000000-0000-0000-0010-000000000000'),
    (uuid '00000000-0000-0000-0100-000000000000'),
    (uuid '00000000-0000-0000-1000-000000000000'),
    (uuid '00000000-0000-0001-0000-000000000000'),
    (uuid '00000000-0000-0010-0000-000000000000'),
    (uuid '00000000-0000-0100-0000-000000000000'),
    (uuid '00000000-0000-1000-0000-000000000000'),
    (uuid '00000000-0001-0000-0000-000000000000'),
    (uuid '00000000-0010-0000-0000-000000000000'),
    (uuid '00000000-0100-0000-0000-000000000000'),
    (uuid '00000000-1000-0000-0000-000000000000'),
    (uuid '00000001-0000-0000-0000-000000000000'),
    (uuid '00000010-0000-0000-0000-000000000000'),
    (uuid '00000100-0000-0000-0000-000000000000'),
    (uuid '00001000-0000-0000-0000-000000000000'),
    (uuid '00010000-0000-0000-0000-000000000000'),
    (uuid '00100000-0000-0000-0000-000000000000'),
    (uuid '01000000-0000-0000-0000-000000000000'),
    (uuid '10000000-0000-0000-0000-000000000000')
) as t(u) order by u desc;

result

                 u
--------------------------------------
 10000000-0000-0000-0000-000000000000
 01000000-0000-0000-0000-000000000000
 00100000-0000-0000-0000-000000000000
 00010000-0000-0000-0000-000000000000
 00001000-0000-0000-0000-000000000000
 00000100-0000-0000-0000-000000000000
 00000010-0000-0000-0000-000000000000
 00000001-0000-0000-0000-000000000000
 00000000-1000-0000-0000-000000000000
 00000000-0100-0000-0000-000000000000
 00000000-0010-0000-0000-000000000000
 00000000-0001-0000-0000-000000000000
 00000000-0000-1000-0000-000000000000
 00000000-0000-0100-0000-000000000000
 00000000-0000-0010-0000-000000000000
 00000000-0000-0001-0000-000000000000
 00000000-0000-0000-1000-000000000000
 00000000-0000-0000-0100-000000000000
 00000000-0000-0000-0010-000000000000
 00000000-0000-0000-0001-000000000000
 00000000-0000-0000-0000-100000000000
 00000000-0000-0000-0000-010000000000
 00000000-0000-0000-0000-001000000000
 00000000-0000-0000-0000-000100000000
 00000000-0000-0000-0000-000010000000
 00000000-0000-0000-0000-000001000000
 00000000-0000-0000-0000-000000100000
 00000000-0000-0000-0000-000000010000
 00000000-0000-0000-0000-000000001000
 00000000-0000-0000-0000-000000000100
 00000000-0000-0000-0000-000000000010
 00000000-0000-0000-0000-000000000001
 00000000-0000-0000-0000-000000000000

Impact

  • Fixes comparisons to conform to the UUID RFC. Technically breaks compatibility between versions as comparisons now behave differently

Test Plan

  • Additional test cases for comparisons

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Improve UUID comparisons so they conform to `IETF RFC 4122 <https://datatracker.ietf.org/doc/html/rfc4122>`_. :pr:`23847`

Footnotes

  1. https://prestodb.io/docs/0.289/language/types.html#uuid-type

  2. https://github.com/airlift/slice/blob/8f0494bdaad91f0c57f03e09aad2d77f955cfe42/src/main/java/io/airlift/slice/Slice.java#L1331-L1340

  3. https://datatracker.ietf.org/doc/html/rfc4122#section-3 2

  4. https://www.rfc-editor.org/errata/eid1428

  5. https://datatracker.ietf.org/doc/html/rfc9562#section-6.11

@tdcmeehan tdcmeehan self-assigned this Oct 16, 2024
@steveburnett
Copy link
Contributor

Thanks for the release note entry! A couple of nit suggestions to follow the phrasing in the Order of changes in the Release Notes Guidelines, add the PR number, and consider adding a link to the RFC. I found this link I used in the suggestion, if you have a better link please use it instead.

== RELEASE NOTES ==

General Changes
* Improve UUID comparisons so they are now performed correctly according to `IETF RFC 4122 <https://datatracker.ietf.org/doc/html/rfc4122>`_. :pr:`23847`

@ZacBlanco
Copy link
Contributor Author

Thanks as usual for your suggestions @steveburnett 😄

@ZacBlanco ZacBlanco marked this pull request as ready for review October 17, 2024 05:27
@ZacBlanco ZacBlanco requested review from elharo and a team as code owners October 17, 2024 05:28
@ZacBlanco ZacBlanco changed the title Use big-endian representation for UUIDs Fix UUID comparisons to conform to IETF RFC 4122 Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants