Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24206][SQL] Improve DataSource read benchmark code #21266

Closed
wants to merge 5 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented May 8, 2018

What changes were proposed in this pull request?

This pr added benchmark code DataSourceReadBenchmark for orc, paruqet, csv, and json based on the existing ParquetReadBenchmark and OrcReadBenchmark.

How was this patch tested?

N/A

@maropu
Copy link
Member Author

maropu commented May 8, 2018

I'll add benchmark results just after #21070 merged.

@maropu
Copy link
Member Author

maropu commented May 8, 2018

Also, I'll make a follow-up pr for pushdown benchmarks;
master...maropu:UpdateParquetBenchmark

@maropu
Copy link
Member Author

maropu commented May 8, 2018

@gatorsmile @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented May 8, 2018

Test build #90359 has finished for PR 21266 at commit 09b3920.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1,339 +0,0 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu . Since you are merging ParquetReadBenchmark and OrcReadBenchmark benchmarks, let's remove OrcReadBenchmark.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we still need to OrcReadBenchmark to compare native orc with Hive built-in orc?

Copy link
Member

@dongjoon-hyun dongjoon-hyun May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Never mind. I deleted the previous comment. The scope is only testing native orc here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks!

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90392 has finished for PR 21266 at commit 2813706.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 10, 2018

Test build #90437 has finished for PR 21266 at commit 8aedbf0.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 10, 2018

Test build #90438 has finished for PR 21266 at commit 1d93d99.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 14, 2018

Test build #90572 has finished for PR 21266 at commit fc96adb.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 14, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 14, 2018

Test build #90579 has finished for PR 21266 at commit fc96adb.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* Benchmark to measure data source read performance.
* To run this:
* spark-submit --class <this class> <spark sql test jar>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this type of comments should not be included by scala file in this directory.
If you want to put this comment, this scala file should be put into sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ where files are not translated to doc.

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/DataSourceReadBenchmark.java:5: error: unknown tag: this
[error]  *  spark-submit --class <this class> <spark sql test jar>
[error]                          ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/DataSourceReadBenchmark.java:5: error: unknown tag: spark
[error]  *  spark-submit --class <this class> <spark sql test jar>
[error]     

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90875 has finished for PR 21266 at commit 3b6f541.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90877 has finished for PR 21266 at commit fad31b7.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90879 has finished for PR 21266 at commit d8c308f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 21, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90884 has finished for PR 21266 at commit d8c308f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

sqlBenchmark.addCase("SQL ORC Vectorized") { _ =>
spark.sql("SELECT sum(id) FROM orcTable").collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us explicitly set these confs? Here, we are expecting the perf number when ORC_COPY_BATCH_TO_SPARK is set to false. Please also double check the other benchmarks and add the related confs too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that ORC_COPY_BATCH_TO_SPARK=false in other tests (I didn't find performance differences after explicitly setting false in line 50.
https://github.com/apache/spark/pull/21266/files#diff-ae11b49db05c9e6829cad071b112a742R50

@gatorsmile
Copy link
Member

@maropu Great work! Thanks for helping this!

@maropu
Copy link
Member Author

maropu commented May 23, 2018

I'll update the benchmark results soon.

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91012 has finished for PR 21266 at commit 5eab1a5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 23, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91017 has finished for PR 21266 at commit 5eab1a5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 23, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91036 has finished for PR 21266 at commit 5eab1a5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 23, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91049 has finished for PR 21266 at commit 5eab1a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in 84557bc May 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants