-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20493][R] De-duplicate parse logics for DDL-like type strings in R #17785
Closed
Closed
Changes from 3 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't checked myself, what are the differences if any between
getSQLDataType
andCatalystSqlParser.parseDataType
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to my knowledge,
getSQLDataType
supports the types below:and these look required to be case-sensitive whereas
parseDataType
supports ...and these look case-insensitive.
I think the inital intention for
getSQLDataType
was to support R type string conversionstoo but they look unreachable codes now because we were checking the type strings before actually calling
getSQLDataType
incheckType
.If the types are not in
!is.null(PRIMITIVE_TYPES[[type]])
(case-sensitive), it looks throwing an error.In short, I think there should not be a behaviour change below types (intersection between
getSQLDataType
andparseDataType
) ...and these should be case-sensitive.
Additionally, we will support the types below (which are written in R's
PREMISITVE_TYPES
butgetSQLDataType
did not support before):Before
After
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wrote the details about this at my best. Yes, I think this should be targeted to master not 2.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for looking into it. if I take the diff,
these are actually R native type names though, for which if I have to guess, is intentional that we support R native type in structField as well as Scala/Spark types.
I'm not sure how much coverage we have for something like this but is that going to still work with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, however, for those types, we can't create that field because the check via checkType fails as it is not in the keys in
PREMISITVE_TYPES
as below:I double-checked this is the only place where we called
getSQLDataType
and therefore they look unreachable (I hope you could double-check this when you have some time for this one just in case I missed something about this).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - then I suppose this was "broken" when checkType was added / over the past 2 years or so.
I'm ok with this change then - could you please check that each of the case in
checkType
we have a test for in R?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, probably those 12 cases in test_sparkSQL.R#L143-L194 and 5 cases in #17785 (comment) of total 17 cases in checkType above will be covered.
Let me double check them and will leave a comment here today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me add positive cases for ...
and negative cases for ...