-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22981][SQL] Fix incorrect results of Casting Struct to String #20176
Conversation
buildCast[InternalRow](_, row => { | ||
val builder = new UTF8StringBuilder | ||
builder.append("[") | ||
if (row.numFields > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, it seems we have no chance to hit row.numFields == 0
here though, I just leave this for strict checks.
Test build #85767 has finished for PR 20176 at commit
|
retest this please |
Test build #85770 has finished for PR 20176 at commit
|
@cloud-fan ping |
val structToStringCode = st.zipWithIndex.map { case (ft, i) => | ||
val fieldToStringCode = castToStringCode(ft, ctx) | ||
val funcName = ctx.freshName("fieldToString") | ||
ctx.addNewFunction(funcName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to create a function, it's called only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we can create functions, for data types that appeared more than once among struct fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW here we may hit 64kb compile error if there are a lot of fields in this struct. We should use ctx.splitExpressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, l'll update soon
Test build #85834 has finished for PR 20176 at commit
|
Test build #85838 has finished for PR 20176 at commit
|
retest this please |
Test build #85844 has finished for PR 20176 at commit
|
## What changes were proposed in this pull request? This pr fixed the issue when casting structs into strings; ``` scala> val df = Seq(((1, "a"), 0), ((2, "b"), 0)).toDF("a", "b") scala> df.write.saveAsTable("t") scala> sql("SELECT CAST(a AS STRING) FROM t").show +-------------------+ | a| +-------------------+ |[0,1,1800000001,61]| |[0,2,1800000001,62]| +-------------------+ ``` This pr modified the result into; ``` +------+ | a| +------+ |[1, a]| |[2, b]| +------+ ``` ## How was this patch tested? Added tests in `CastSuite`. Author: Takeshi Yamamuro <[email protected]> Closes #20176 from maropu/SPARK-22981. (cherry picked from commit 2250cb7) Signed-off-by: Wenchen Fan <[email protected]>
LGTM, merging to master/2.3! |
Thanks! Next, I'll fix FYI: This casting behaviour is different between mysql and postgresql;
|
hive one is the same with mysql one;
|
show binary as string and cast binary to string seems different to me, let's stick with what it is. BTW it's pretty dangerous to change the behavior of cast to be different with Hive. |
ok, I'll make a follow-up for |
case StructType(fields) => | ||
buildCast[InternalRow](_, row => { | ||
val builder = new UTF8StringBuilder | ||
builder.append("[") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any reasons for using the same brackets []
for Struct, Map and Array?
How about []
for arrays, {}
for structs and <>
for maps or somehow distinguish them. At least, it is not consistent with toHiveString
which prints {}
for structs and maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have no strong reason. In this PR, I just followed the existing format. Looks like we can change it to {
for consistency (But, we might need to update the migration doc cuz it change the casting behaivour). cc: @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with the new format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the PR #29308 . Please, review it.
What changes were proposed in this pull request?
This pr fixed the issue when casting structs into strings;
This pr modified the result into;
How was this patch tested?
Added tests in
CastSuite
.