Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly delimit identifiers in DataType expression created from Type #8845

Merged
merged 1 commit into from
Aug 16, 2021

Conversation

losipiuk
Copy link
Member

No description provided.

@cla-bot cla-bot bot added the cla-signed label Aug 10, 2021
@losipiuk losipiuk requested a review from martint August 11, 2021 10:42
@losipiuk losipiuk marked this pull request as ready for review August 11, 2021 13:54
@losipiuk losipiuk force-pushed the lo/delimit-type-identifiers branch 2 times, most recently from 05d1413 to f24a51d Compare August 11, 2021 14:36
@@ -300,6 +301,19 @@ static DataType toDataType(TypeSignature typeSignature)
}
}

private static boolean requiresDelimiting(String identifier)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ok for now, but technically, every name should be delimited. Once #17 is fixed, any names coming from a connector (including column types and field names in row types) will be considered as "already canonicalized", so they'll need to be surrounded in quotes when rendering as SQL to ensure proper matching semantics.

@@ -33,7 +32,7 @@ private void assertRoundTrip(String expression)
{
assertThat(type(expression))
.ignoringLocation()
.withComparatorForType(Comparator.comparing(identifier -> identifier.getValue().toLowerCase(Locale.ENGLISH)), Identifier.class)
.withComparatorForType(Comparator.comparing(Identifier::getCanonicalValue), Identifier.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes semantics. Is it intentional? The canonical value per SQL standard rules uses upper case. For historical reasons, and until #17 is fixed, we continue to use lower case in many places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter here? I assumed that for sake of testing assertion whatever normalization (upper/lower) is fine. I changed to getCanonicalValue mostly to have different matching logic for delimited (uppercasing) and non-delimited (no uppercasing) identifiers. Without the change, the test passes for field names that require delimiting even without changes to TypeSignatureTranslator. I agree this is kinda a hack though. I can take that back and assume that regression testing is solely done via testImplicitCastToRowWithFieldsRequiringDelimitation().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

Copy link
Member

@Praveen2112 Praveen2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After addressing @martint's comment

@losipiuk
Copy link
Member Author

AC @martint . Let me know if this is good to go.

@losipiuk
Copy link
Member Author

CI: #8691

@losipiuk losipiuk merged commit bd2d372 into trinodb:master Aug 16, 2021
@losipiuk losipiuk mentioned this pull request Aug 16, 2021
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants