Fix Presto Java UUID serialization #11197

Open · wants to merge 1 commit into base: main

BryanCutler

This fixes PrestoSerializer to write UUID values in the format that Presto Java expects, so that the values match those produced by a Java worker. First, when converting a UUID to or from a string, the value is no longer kept in the big-endian byte order taken from boost::uuid; it is instead stored as a little-endian int128_t. Second, Presto Java reads UUID values from an Int128ArrayBlock with the first 64-bit value holding the most significant bits. To match this, the upper and lower 64-bit halves of the int128_t are swapped during serialization and deserialization.
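A minimal sketch of the half swap, assuming a little-endian host and a Velox-style int128_t (here the GCC/Clang `__int128` extension, aliased as `Int128`). The helpers `writeUuidForPrestoJava` and `readUuidFromPrestoJava` are hypothetical and are not the actual PrestoSerializer code; they only show the resulting wire layout, where the most significant 64 bits are written first and the least significant 64 bits second.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for Velox's int128_t (GCC/Clang __int128 extension).
using Int128 = __int128;

// Hypothetical helper: writes a UUID held as a native int128 so that the
// most significant 64 bits come first, as an Int128ArrayBlock reader expects.
inline void writeUuidForPrestoJava(Int128 uuid, char* out) {
  const auto bits = static_cast<unsigned __int128>(uuid);
  const uint64_t high = static_cast<uint64_t>(bits >> 64);
  const uint64_t low = static_cast<uint64_t>(bits);
  // Each 64-bit word keeps native byte order; only the order of the two
  // words differs from the in-memory int128 (low word first on little-endian).
  std::memcpy(out, &high, sizeof(high));
  std::memcpy(out + 8, &low, sizeof(low));
}

// Hypothetical inverse: reads the high word first, then the low word.
inline Int128 readUuidFromPrestoJava(const char* in) {
  uint64_t high;
  uint64_t low;
  std::memcpy(&high, in, sizeof(high));
  std::memcpy(&low, in + 8, sizeof(low));
  return static_cast<Int128>(
      (static_cast<unsigned __int128>(high) << 64) | low);
}
```

Reading back a buffer produced by `writeUuidForPrestoJava` returns the original value, so the two helpers form a round trip.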

A unit test that checks UUID serialization round trips was added, and Presto was tested manually with a native worker to verify that the problem from the issue description is fixed.

From prestodb/presto#23311

facebook-github-bot added the CLA Signed label on Oct 8, 2024.

netlify bot commented Oct 8, 2024:

Deploy Preview for meta-velox canceled.

🔨 Latest commit: d608cfe
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/6719a288cad77f0008bb84a5

BryanCutler (author)

please review @aditi-pandit @Yuhta @mbasmanova

```diff
@@ -63,10 +63,13 @@ TEST_F(UuidFunctionsTest, castAsVarchar) {
   // Verify that CAST results as the same as boost::lexical_cast. We do not use
   // boost::lexical_cast to implement CAST because it is too slow.
   auto expected = makeFlatVector<std::string>(size, [&](auto row) {
-    const auto uuid = uuids->valueAt(row);
+    auto uuid = uuids->valueAt(row);
```
Contributor


Just do this, and the rest of the file can be left unchanged:

```cpp
auto uuid = folly::Endian::big(uuids->valueAt(row));
```

```diff
@@ -97,8 +97,8 @@ class UuidCastOperator : public exec::CastOperator {
 
     size_t offset = 0;
     for (auto i = 0; i < 16; ++i) {
-      result.data()[offset] = kHexTable[uuidBytes[i] * 2];
-      result.data()[offset + 1] = kHexTable[uuidBytes[i] * 2 + 1];
+      result.data()[offset] = kHexTable[uuidBytes[15 - i] * 2];
```
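The reversed index `15 - i` follows from the new in-memory layout: on a little-endian host the UUID's most significant byte (the first two hex digits of the canonical string) sits at byte offset 15 of the int128_t. The self-contained sketch below formats such a value in the 8-4-4-4-12 form; `uuidToString` and the `Int128` alias are illustrative names, not Velox APIs, and the code assumes a little-endian host.

```cpp
#include <cstring>
#include <string>

// Stand-in for Velox's int128_t (GCC/Clang __int128 extension).
using Int128 = __int128;

// Hypothetical formatter, assuming a little-endian host: the most
// significant byte is at offset 15, so bytes are emitted from 15 down to 0.
inline std::string uuidToString(Int128 uuid) {
  static const char kHexDigits[] = "0123456789abcdef";
  unsigned char bytes[16];
  std::memcpy(bytes, &uuid, sizeof(bytes));

  std::string result;
  result.reserve(36);
  for (int i = 0; i < 16; ++i) {
    // Dashes precede output bytes 4, 6, 8 and 10 (8-4-4-4-12 hex digits).
    if (i == 4 || i == 6 || i == 8 || i == 10) {
      result.push_back('-');
    }
    const unsigned char b = bytes[15 - i];
    result.push_back(kHexDigits[b >> 4]);
    result.push_back(kHexDigits[b & 0x0f]);
  }
  return result;
}
```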
Contributor


```cpp
const auto uuid = folly::Endian::big(uuids->valueAt(row));
```

```diff
@@ -125,7 +125,10 @@ class UuidCastOperator : public exec::CastOperator {
     auto uuid = boost::lexical_cast<boost::uuids::uuid>(uuidString);
 
     int128_t u;
     memcpy(&u, &uuid, 16);
+    auto charPtr = reinterpret_cast<char*>(&u);
```
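boost::uuids::uuid stores its 16 bytes in string order (big-endian), so after the `memcpy` the bytes must be reversed to obtain a little-endian int128_t. To keep this sketch free of Boost, the hypothetical `uuidFromString` below parses the canonical 36-character dashed form by hand instead of calling `boost::lexical_cast`; it assumes a little-endian host, and `hexValue` and `Int128` are illustrative names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>

// Stand-in for Velox's int128_t (GCC/Clang __int128 extension).
using Int128 = __int128;

// Returns the value of one hex digit, or -1 if c is not a hex digit.
inline int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical parser: collects bytes in big-endian (string) order, as
// boost::uuids::uuid does, then reverses them into a little-endian int128.
inline std::optional<Int128> uuidFromString(const std::string& s) {
  if (s.size() != 36) return std::nullopt;
  unsigned char bytes[16];
  std::size_t pos = 0;
  for (int i = 0; i < 16; ++i) {
    // Dashes are expected at positions 8, 13, 18 and 23.
    if (pos == 8 || pos == 13 || pos == 18 || pos == 23) {
      if (s[pos] != '-') return std::nullopt;
      ++pos;
    }
    const int hi = hexValue(s[pos]);
    const int lo = hexValue(s[pos + 1]);
    if (hi < 0 || lo < 0) return std::nullopt;
    bytes[i] = static_cast<unsigned char>((hi << 4) | lo);
    pos += 2;
  }
  // The first string byte must become the most significant byte (offset 15).
  std::reverse(bytes, bytes + 16);
  Int128 result;
  std::memcpy(&result, bytes, sizeof(result));
  return result;
}
```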
Contributor


```cpp
memcpy(&u, &uuid, 16);
u = folly::Endian::big(u);
```

BryanCutler (author)

Thanks @Yuhta, but folly::Endian::big is not defined for int128 types. It is based on compiler built-in byte-swapping functions, so it looks like support could be added for GCC variants, but MSVC has no equivalent built-in function. I could add a check to these changes to only swap if the host system is little-endian. What do you think?
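The byte reversal itself does not require a 128-bit compiler builtin. The sketch below illustrates the "only swap on little-endian hosts" idea with a runtime probe and `std::reverse`; `isLittleEndianHost` and `toBigEndian128` are hypothetical names. With C++20, `std::endian::native == std::endian::little` could replace the runtime probe. Note that the `Int128` alias still relies on the GCC/Clang `__int128` extension.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Stand-in for Velox's int128_t (GCC/Clang __int128 extension).
using Int128 = __int128;

// Detects host byte order at runtime without compiler-specific macros.
inline bool isLittleEndianHost() {
  const uint16_t probe = 1;
  unsigned char firstByte;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Hypothetical helper: converts a 128-bit value between native and
// big-endian order. Bytes are reversed only on little-endian hosts; on a
// big-endian host the native layout already is big-endian.
inline Int128 toBigEndian128(Int128 value) {
  if (!isLittleEndianHost()) {
    return value;
  }
  unsigned char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));
  std::reverse(bytes, bytes + sizeof(value));
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}
```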

Yuhta (contributor) commented Oct 16, 2024

@BryanCutler You can add a utility to DecimalUtil to do that. We can do it with two folly::Endian::big calls if needed.
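The two-call approach works because reversing all 16 bytes is equivalent to byte-swapping each 64-bit half and then exchanging the halves. A sketch follows: `byteSwap64` is a portable stand-in for the per-half swap that `folly::Endian::big` performs on a little-endian host, and `byteSwap128` is a hypothetical name, not the `DecimalUtil::big` added in this PR.

```cpp
#include <cstdint>

// Stand-in for Velox's int128_t (GCC/Clang __int128 extension).
using Int128 = __int128;

// Portable 64-bit byte swap: exchange 32-bit, then 16-bit, then 8-bit groups.
inline uint64_t byteSwap64(uint64_t x) {
  x = ((x & 0x00000000ffffffffULL) << 32) | ((x & 0xffffffff00000000ULL) >> 32);
  x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x & 0xffff0000ffff0000ULL) >> 16);
  x = ((x & 0x00ff00ff00ff00ffULL) << 8) | ((x & 0xff00ff00ff00ff00ULL) >> 8);
  return x;
}

// Hypothetical 128-bit swap built from two 64-bit swaps: reverse the bytes
// within each half, then put the swapped low half on top and vice versa.
inline Int128 byteSwap128(Int128 value) {
  const auto bits = static_cast<unsigned __int128>(value);
  const uint64_t high = static_cast<uint64_t>(bits >> 64);
  const uint64_t low = static_cast<uint64_t>(bits);
  const unsigned __int128 swapped =
      (static_cast<unsigned __int128>(byteSwap64(low)) << 64) |
      byteSwap64(high);
  return static_cast<Int128>(swapped);
}
```

Applying `byteSwap128` twice returns the original value.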

Added DecimalUtil::big for reversing int128_t byte order