[SPARK-26673][FollowUp][SQL] File source V2: remove duplicated broadcast object in FileWriterFactory #23800

Closed

@gengliangwang
Member

gengliangwang commented Feb 15, 2019

What changes were proposed in this pull request?

This is a followup PR to fix two issues in #23601:

  1. The class `FileWriterFactory` contains `conf: SerializableConfiguration` as a member, which duplicates `WriteJobDescription.serializableHadoopConf`. Removing it reduces the broadcast task binary size by around 70KB.
  2. The test suites OrcV1QuerySuite/OrcV1QuerySuite/OrcV1PartitionDiscoverySuite didn't set the configuration `SQLConf.USE_V1_SOURCE_WRITER_LIST` to `"orc"`. We should set the conf.
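
To illustrate the first point, here is a minimal, Spark-free sketch of why a second copy of the configuration inflates the serialized task binary. All names in it (`ConfWrapper`, `JobDescription`, `FactoryBefore`, `FactoryAfter`, `BroadcastSizeSketch`) are hypothetical stand-ins, not the real Spark classes, and plain Java serialization is assumed. The actual saving in Spark (around 70KB) depends on the real Hadoop configuration.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a serializable configuration wrapper.
final case class ConfWrapper(entries: Map[String, String])

// Hypothetical stand-in for a job description that already carries the conf.
final case class JobDescription(path: String, serializableHadoopConf: ConfWrapper)

// Before: the factory holds its own, separately built copy of the conf.
final case class FactoryBefore(description: JobDescription, conf: ConfWrapper)

// After: the factory reuses the conf that the description already carries.
final case class FactoryAfter(description: JobDescription) {
  def conf: ConfWrapper = description.serializableHadoopConf
}

object BroadcastSizeSketch {
  // Builds a fresh map on every call, so two wrappers never share one instance.
  def makeEntries(): Map[String, String] =
    (1 to 500).map(i => s"key.$i" -> s"value-$i").toMap

  // Size in bytes of the Java-serialized form of `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val description = JobDescription("/tmp/out", ConfWrapper(makeEntries()))
    val before = FactoryBefore(description, ConfWrapper(makeEntries()))
    val after = FactoryAfter(description)
    println(s"with duplicated conf:    ${serializedSize(before)} bytes")
    println(s"without duplicated conf: ${serializedSize(after)} bytes")
  }
}
```

The sketch builds the two wrappers from separate map instances on purpose: Java serialization writes an object only once per stream when the same instance is referenced twice, so the extra cost appears only when the second copy is a distinct object.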

How was this patch tested?

Unit test

@gengliangwang
Member Author

@cloud-fan

@cloud-fan
Contributor

LGTM

@gengliangwang gengliangwang changed the title from "[SPARK-26673][SQL] File source V2: remove duplicated broadcast object in FileWriterFactory" to "[SPARK-26673][FollowUp][SQL] File source V2: remove duplicated broadcast object in FileWriterFactory" on Feb 15, 2019
@SparkQA

SparkQA commented Feb 15, 2019

Test build #102394 has finished for PR 23800 at commit 07f7a34.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ast object in FileWriterFactory

Closes apache#23800 from gengliangwang/reduceWriteTaskSize.

Authored-by: Gengliang Wang
Signed-off-by: Hyukjin Kwon