[SPARK-22267][SQL][TEST] Spark SQL incorrectly reads ORC files when column order is different #19928

dongjoon-hyun · 2017-12-08T11:24:36Z

What changes were proposed in this pull request?

Up to and including 2.2.1, with the default configuration, Apache Spark returns incorrect results when the column order of an ORC file's schema differs from the column order of the metastore schema. This is caused by the Hive 1.2.1 library and some issues with the convertMetastoreOrc option.

scala> Seq(1 -> 2).toDF("c1", "c2").write.format("orc").mode("overwrite").save("/tmp/o")
scala> sql("CREATE EXTERNAL TABLE o(c2 INT, c1 INT) STORED AS orc LOCATION '/tmp/o'")
scala> spark.table("o").show    // This is wrong (expected c2 = 2, c1 = 1).
+---+---+
| c2| c1|
+---+---+
|  1|  2|
+---+---+
scala> spark.read.orc("/tmp/o").show  // This is correct.
+---+---+
| c1| c2|
+---+---+
|  1|  2|
+---+---+
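
The wrong result matches what you would get if the file's columns were assigned to the table schema by position rather than by name. Below is a simplified plain-Scala model of the two strategies. It assumes a row is just a list of values. This is an illustration only, not Spark's or Hive's reader code; ColumnResolution, byPosition and byName are hypothetical names.

```scala
// Hypothetical model (not Spark or Hive code) of mapping an ORC file's
// columns onto a metastore table schema.
object ColumnResolution {
  // Physical layout written by Seq(1 -> 2).toDF("c1", "c2")
  val fileSchema: Seq[String] = Seq("c1", "c2")
  val fileValues: Seq[Int] = Seq(1, 2)

  // Schema declared by CREATE EXTERNAL TABLE o(c2 INT, c1 INT)
  val tableSchema: Seq[String] = Seq("c2", "c1")

  // Buggy strategy: the i-th table column takes the i-th file value.
  def byPosition(table: Seq[String], values: Seq[Int]): Seq[(String, Int)] =
    table.zip(values)

  // Correct strategy: each table column is looked up by name in the file.
  def byName(table: Seq[String], file: Seq[String], values: Seq[Int]): Seq[(String, Int)] = {
    val valueOf = file.zip(values).toMap
    table.map(name => name -> valueOf(name))
  }

  def main(args: Array[String]): Unit = {
    println(byPosition(tableSchema, fileValues))         // List((c2,1), (c1,2))
    println(byName(tableSchema, fileSchema, fileValues)) // List((c2,2), (c1,1))
  }
}
```

Positional matching reproduces the wrong spark.table("o") output. Name-based matching gives the values that spark.read.orc shows, reordered to the table's column order.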

Since SPARK-22279, the default configuration no longer has this bug. Although the Hive 1.2.1 library code path still has the problem, we should add test coverage for the current behavior to prevent future regressions.
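
A regression check could go further than the single (c2, c1) case and pin the expected answer for every column order a table might declare over the same file. The following is a minimal property-style sketch that models readers as plain functions. It is not the test added by this PR. Reader, positional, nameBased and passesForAllOrders are hypothetical names, and the expected values come from the stated assumption that column cK holds the value K.

```scala
// Hypothetical property-style regression check (not the test added by this PR):
// for every column order a table could declare over the same file, the reader
// must return each column's own value.
object OrcColumnOrderRegression {
  // Hypothetical reader: (tableSchema, fileSchema, fileValues) => values in table order.
  type Reader = (Seq[String], Seq[String], Seq[Int]) => Seq[Int]

  // Models the buggy path: ignores names and keeps the physical file order.
  val positional: Reader = (_, _, values) => values

  // Models the correct path: resolves each table column by name.
  val nameBased: Reader = (table, file, values) => {
    val valueOf = file.zip(values).toMap
    table.map(valueOf)
  }

  def passesForAllOrders(read: Reader): Boolean = {
    val fileSchema = Seq("c1", "c2", "c3")
    val fileValues = Seq(1, 2, 3) // assumption: column cK holds the value K
    fileSchema.permutations.forall { tableSchema =>
      // Expected values derived from the naming convention, independent of any reader.
      val expected = tableSchema.map(_.stripPrefix("c").toInt)
      read(tableSchema, fileSchema, fileValues) == expected
    }
  }

  def main(args: Array[String]): Unit = {
    println(passesForAllOrders(positional)) // false
    println(passesForAllOrders(nameBased))  // true
  }
}
```

Under positional matching only the identity order passes. The check therefore fails as soon as a table declares any other order, which is the situation in the reproduction.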

How was this patch tested?

Pass Jenkins with a newly added test.


SparkQA · 2017-12-08T13:31:13Z

Test build #84653 has finished for PR 19928 at commit ea75bce.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon

cc @gatorsmile and @cloud-fan

dongjoon-hyun · 2017-12-10T02:45:29Z

Thank you for the review and approval, @HyukjinKwon!

cloud-fan · 2017-12-11T13:53:11Z

thanks, merging to master!

dongjoon-hyun · 2017-12-11T17:14:54Z

Thank you, @HyukjinKwon and @cloud-fan !

[SPARK-22267][SQL][TEST] Spark SQL incorrectly reads ORC files when column order is different (commit ea75bce)

HyukjinKwon approved these changes Dec 10, 2017


asfgit closed this in 6cc7021 Dec 11, 2017

dongjoon-hyun deleted the SPARK-22267 branch December 11, 2017 17:14
