[SPARK-14557][SQL] Reading textfile (created through CTAS) doesn't work #12356
Conversation
…k when pathFilter is enabled. 1) A bug in HadoopFileReader. Resolved by passing the directory instead of a list of files when a pathFilter is set, since the filter is triggered in FileInputFormat anyway. This also saves multiple filterings in the code path. 2) Not using the applyFilterIfApplicable
ok to test
Test build #55741 has finished for PR 12356 at commit
I think we can eliminate the applyFilterIfNeeded method as well.
Test build #55796 has finished for PR 12356 at commit
Can any of the admins verify the above fix?
Resolved the merge conflicts for easy merging.
Test build #56660 has finished for PR 12356 at commit
cc @marmbrus
Resolved the merge conflicts for easy merging.
Test build #57504 has finished for PR 12356 at commit
Is it possible to write unit tests for this?
ok to test
Test build #59146 has finished for PR 12356 at commit
Sure. Let me add the CTAS query to the test suite.
@kasjain Could you add a test case? Does it still fail in the latest master?
We are closing it due to inactivity. Please reopen if you want to push it forward. Thanks!
What changes were proposed in this pull request?
These changes fix the broken functionality described below and make a small performance improvement.
Reading a CSV table (created through a CTAS query) doesn't work when a pathFilter is provided.
A bug in HadoopFileReader. Currently, when a pathFilter is provided, the reader passes the filtered files to "setInputPaths" as a single incorrectly escaped comma-separated string, wrapped in a sequence of size one. It should instead pass a sequence whose size equals the number of files remaining after filtering. This mismatch causes the exception mentioned in the bug report.
`FileInputFormat.setInputPaths(jobConf, Seq[Path](new Path(path)): _*)`
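For illustration, here is a minimal Scala sketch of the mis-call next to the corrected one. The `jobConf` and `filteredFiles` values are hypothetical stand-ins, not the actual HadoopTableReader code:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

val jobConf = new JobConf()
// Stand-ins for the files left over after applying the pathFilter.
val filteredFiles = Seq(
  new Path("/warehouse/t/part-00000"),
  new Path("/warehouse/t/part-00001"))

// Buggy pattern: joining the paths into one comma-separated string and
// wrapping it in a single Path yields a one-element sequence whose sole
// "path" contains unescaped commas, so the job later fails to resolve it.
FileInputFormat.setInputPaths(jobConf, Seq(new Path(filteredFiles.mkString(","))): _*)

// What the call should look like: one Path per filtered file.
FileInputFormat.setInputPaths(jobConf, filteredFiles: _*)
```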
Secondly, in this flow, filtering is triggered twice for each file: once in HadoopTableReader.applyFilterIfApplicable and again in FileInputFormat.singleThreadedListStatus. This is costly and redundant.
To solve both issues, we can pass the directory path itself even when a pathFilter is enabled, and let FileInputFormat apply the filter once while listing the directory, as sketched below.
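A minimal sketch of that flow, assuming a hypothetical `UnderscoreFilter` standing in for whatever PathFilter the user configured and an illustrative table directory `/warehouse/t`:

```scala
import org.apache.hadoop.fs.{Path, PathFilter}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

// Hypothetical filter standing in for the user-configured pathFilter.
class UnderscoreFilter extends PathFilter {
  override def accept(p: Path): Boolean = !p.getName.startsWith("_")
}

val jobConf = new JobConf()
// Register the filter once; FileInputFormat applies it while enumerating
// the directory, so no separate pre-filtering pass is needed.
FileInputFormat.setInputPathFilter(jobConf, classOf[UnderscoreFilter])

// Pass the table directory itself instead of a pre-filtered file list.
FileInputFormat.setInputPaths(jobConf, new Path("/warehouse/t"))
```

With this, each file is filtered exactly once, and the comma-escaping problem never arises because only the directory path is handed to setInputPaths.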
How was this patch tested?
Integration tests and manual tests.