Minor improvements to TPC-DS benchmarking code #565

Merged (1 commit) on Aug 18, 2020

andygrove
Contributor

This PR contains a few small improvements that I found helpful when benchmarking:

  1. The CSV to Parquet/ORC conversion can now be run with spark-submit instead of spark-shell, which required keeping a terminal session open to a remote server.
  2. The conversion can be run with or without partitioning by column.
  3. Minor improvements to the summary output of each benchmark run.
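
As a rough illustration of items 1 and 2, the following is a minimal PySpark sketch of a conversion job that can be launched with spark-submit and that partitions the output only when asked to. It is not the code from this PR, which is not shown in this thread. The script name, the option names (`--input`, `--output`, `--format`, `--partition-by`), the paths, and the assumption that the CSV files have a header row are all hypothetical.

```python
"""Sketch of a non-interactive CSV to Parquet/ORC conversion job.

Hypothetical invocation (script name, paths, and options are assumptions):
    spark-submit convert_csv.py --input /data/csv/store_sales \
        --output /data/parquet/store_sales --format parquet \
        --partition-by ss_sold_date_sk
"""
import argparse
import sys


def parse_args(argv=None):
    """Parse the (hypothetical) command-line options for the conversion."""
    parser = argparse.ArgumentParser(description="CSV to Parquet/ORC conversion sketch")
    parser.add_argument("--input", required=True, help="directory of CSV files")
    parser.add_argument("--output", required=True, help="destination directory")
    parser.add_argument("--format", choices=["parquet", "orc"], default="parquet")
    parser.add_argument("--partition-by", default=None,
                        help="optional column to partition the output by")
    return parser.parse_args(argv)


def write_table(df, output, fmt, partition_by=None):
    """Write a DataFrame in the requested format, partitioned only if a column is given."""
    writer = df.write.mode("overwrite")
    if partition_by:
        writer = writer.partitionBy(partition_by)
    writer.format(fmt).save(output)


def main(argv=None):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.appName("csv-conversion-sketch").getOrCreate()
    # Assumption: the CSV files carry a header row; real benchmark data may
    # need an explicit schema and delimiter instead.
    df = spark.read.option("header", "true").csv(args.input)
    write_table(df, args.output, args.format, args.partition_by)
    spark.stop()


# Only run the conversion when arguments are supplied, so the helpers can be
# imported or pasted into a session without side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Because the job runs to completion under spark-submit, it can be started on a remote server (for example in the background or from a scheduler) without an interactive shell staying open. Omitting `--partition-by` writes unpartitioned output.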

andygrove added the "test" (Only impacts tests) label on Aug 14, 2020
andygrove added this to the Aug 3 - Aug 14 milestone on Aug 14, 2020
@revans2
Collaborator

revans2 commented Aug 14, 2020

build

andygrove merged commit 419f05a into NVIDIA:branch-0.2 on Aug 18, 2020
andygrove deleted the tpcds-automation branch on August 18, 2020 16:02
pxLi added a commit to pxLi/spark-rapids that referenced this pull request Aug 24, 2020
* explicitly disable AQE in one test (NVIDIA#567)

Signed-off-by: Andy Grove <[email protected]>

* Enable using spark-submit to convert from csv to parquet (NVIDIA#565)

Signed-off-by: Andy Grove <[email protected]>

* xfail the Spark 3.1.0 integration tests that fail (NVIDIA#580)

* xfail GpuTimeSub, arithmetic ops, and full outer join failures on 3.1.0

Signed-off-by: Thomas Graves <[email protected]>

* xfail the rest of the 3.1.0 tests and enable 3.1.0 unit tests in the
jenkins builds

Signed-off-by: Thomas Graves <[email protected]>

Co-authored-by: Thomas Graves <[email protected]>

* Fix unit tests when AQE is enabled (NVIDIA#558)

* Fix scala tests when AQE is enabled

Signed-off-by: Niranjan Artal <[email protected]>

* fix broadcasthashjoin tests

Signed-off-by: Niranjan Artal <[email protected]>

* fix indentation

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* addressed review comments

Signed-off-by: Niranjan Artal <[email protected]>

* Update buffer store to return compressed batches directly, add compression NVTX ranges (NVIDIA#572)

* Update buffer store to return compressed batches directly, add compression NVTX ranges

Signed-off-by: Jason Lowe <[email protected]>

* Update parameter name for clarity

Signed-off-by: Jason Lowe <[email protected]>

* xfail the tpch spark 3.1.0 tests that fail (NVIDIA#588)

Signed-off-by: Thomas Graves <[email protected]>

Co-authored-by: Thomas Graves <[email protected]>

* Move GpuParquetScan/GpuOrcScan into Shim (NVIDIA#590)

* Move GpuParquetScan to shim

Signed-off-by: Thomas Graves <[email protected]>

* Move scan overrides into shim

Signed-off-by: Thomas Graves <[email protected]>

* Rename GpuParquetScan object to match

Signed-off-by: Thomas Graves <[email protected]>

* Add tests for v2 datasources

Signed-off-by: Thomas Graves <[email protected]>

* Move OrcScan into shims

Signed-off-by: Thomas Graves <[email protected]>

* Fixes

Signed-off-by: Thomas Graves <[email protected]>

* Fix imports

Signed-off-by: Thomas Graves <[email protected]>

Co-authored-by: Thomas Graves <[email protected]>

* Filter nulls from joins where possible to improve performance. (NVIDIA#594)

* Filter nulls from joins where possible to improve performance.

Signed-off-by: Robert (Bobby) Evans <[email protected]>

* Addressed review comments

Signed-off-by: Robert (Bobby) Evans <[email protected]>

* Updated patch for other shims

* changelog generator

Signed-off-by: Peixin Li <[email protected]>

* only filter out labels for issue

* add nightly workflow on github actions

Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Thomas Graves <[email protected]>
Co-authored-by: Thomas Graves <[email protected]>
Co-authored-by: Niranjan Artal <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
Co-authored-by: Robert (Bobby) Evans <[email protected]>
pxLi added a commit to pxLi/spark-rapids that referenced this pull request Aug 24, 2020
(Commit message identical to the one above, with one additional entry: "* fix format")
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…p ci] [bot] (NVIDIA#565)

* Update submodule cudf to afbff54bc1d9d29b4689bd48ef7f7a9e99ab12c5

Signed-off-by: spark-rapids automation <[email protected]>

* Update submodule cudf to e64c2da1207d5069bc627b5b08bbcc1f25636e76

Signed-off-by: spark-rapids automation <[email protected]>

Signed-off-by: spark-rapids automation <[email protected]>