Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22973][SQL] Fix incorrect results of Casting Map to String #20166

Closed
wants to merge 2 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Jan 5, 2018

What changes were proposed in this pull request?

This pr fixed the issue when casting maps into strings;

scala> Seq(Map(1 -> "a", 2 -> "b")).toDF("a").write.saveAsTable("t")
scala> sql("SELECT cast(a as String) FROM t").show(false)
+----------------------------------------------------------------+
|a                                                               |
+----------------------------------------------------------------+
|org.apache.spark.sql.catalyst.expressions.UnsafeMapData@38bdd75d|
+----------------------------------------------------------------+

This pr modified the result into;

+----------------+
|a               |
+----------------+
|[1 -> a, 2 -> b]|
+----------------+

How was this patch tested?

Added tests in CastSuite.

@SparkQA
Copy link

SparkQA commented Jan 5, 2018

Test build #85726 has finished for PR 20166 at commit efc28e3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2018

Test build #85727 has finished for PR 20166 at commit be04e64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jan 6, 2018

@cloud-fan

if (map.numElements > 0) {
val keyToUTF8String = castToString(kt)
val valueToUTF8String = castToString(vt)
builder.append(keyToUTF8String(map.keyArray().get(0, kt)).asInstanceOf[UTF8String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map.keyArray() and map.valueArray appear many times, we can create 2 local variables for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@cloud-fan
Copy link
Contributor

LGTM except one minor comment

@maropu
Copy link
Member Author

maropu commented Jan 6, 2018

BTW, the current Datset.showString prints rows thru RowEncoder deserializers like;

scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
+------------------------------------------------------------+
|a                                                           |
+------------------------------------------------------------+
|[WrappedArray(1, 2), WrappedArray(3), WrappedArray(4, 5, 6)]|
+------------------------------------------------------------+

If we cast them before prints, we could get more simpler forms like;

scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
+------------------------+
|a                       |
+------------------------+
|[[1, 2], [3], [4, 5, 6]]|
+------------------------+

I'm not sure through, is this acceptable? (Probably, we might need to add a option to keep the old behaviour)

@SparkQA
Copy link

SparkQA commented Jan 6, 2018

Test build #85751 has finished for PR 20166 at commit fb11796.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

I think this is a bug, in Dataset.showString I see code like

case seq: Seq[_] => seq.mkString("[", ", ", "]")

Which means we do want to show strings like [[1, 2], [3], [4, 5, 6]]

Anyway let's fix in another PR, I'm merging this PR first

@cloud-fan
Copy link
Contributor

merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 7, 2018
## What changes were proposed in this pull request?
This pr fixed the issue when casting maps into strings;
```
scala> Seq(Map(1 -> "a", 2 -> "b")).toDF("a").write.saveAsTable("t")
scala> sql("SELECT cast(a as String) FROM t").show(false)
+----------------------------------------------------------------+
|a                                                               |
+----------------------------------------------------------------+
|org.apache.spark.sql.catalyst.expressions.UnsafeMapData38bdd75d|
+----------------------------------------------------------------+
```
This pr modified the result into;
```
+----------------+
|a               |
+----------------+
|[1 -> a, 2 -> b]|
+----------------+
```

## How was this patch tested?
Added tests in `CastSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes #20166 from maropu/SPARK-22973.

(cherry picked from commit 18e9414)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 18e9414 Jan 7, 2018
@maropu
Copy link
Member Author

maropu commented Jan 7, 2018

ok, I'll fix struct in a next following pr first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants