Generate CSV data per Spark version for tools [databricks] #10440

Merged (2 commits) on Feb 21, 2024

Conversation

@jlowe (Member) commented Feb 16, 2024

Fixes #10424. Adds CSV generation per Apache Spark version under tools/generated_files. The existing CSV files for tools data have been preserved to avoid breaking downstream pipelines that use this data. Once those pipelines have been updated to consume the new per-Spark-version files, we can remove the old tools CSV files.
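
For illustration only (the version directories and CSV file names below are assumptions about the resulting layout, not taken verbatim from this change), the per-Spark-version output would look roughly like:

    tools/generated_files/supportedExprs.csv        # legacy location, preserved for existing pipelines
    tools/generated_files/330/supportedExprs.csv    # hypothetical: data generated against Spark 3.3.0
    tools/generated_files/340/supportedExprs.csv    # hypothetical: data generated against Spark 3.4.0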

@jlowe added the build label (Related to CI / CD or cleanly building) on Feb 16, 2024
@jlowe self-assigned this on Feb 16, 2024
@jlowe (Member Author) commented Feb 16, 2024

build

@jlowe (Member Author) commented Feb 16, 2024

build

@gerashegalov (Collaborator) left a comment

LGTM with optional nits

Review comment (Collaborator):

Should we name the dir tools-support to differentiate from the tools repo and to match the Maven module name?

Reply (Member Author):

Sounds good, but we probably want to address this as a follow-up. I don't want to break downstream pipelines that expect to find things under tools/generated_files/; those would need to be updated to check the new location if the old one is missing.

@@ -264,7 +264,7 @@ function build_single_shim() {
     -Drat.skip="$SKIP_CHECKS" \
     -Dmaven.scaladoc.skip \
     -Dmaven.scalastyle.skip="$SKIP_CHECKS" \
-    -pl aggregator -am > "$LOG_FILE" 2>&1 || {
+    -pl tools -am > "$LOG_FILE" 2>&1 || {
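
For context on the flags in this hunk: -pl selects the Maven modules to build and -am also builds the modules they depend on, so pointing the reactor at tools instead of aggregator still builds aggregator (as a dependency) while additionally producing the tools CSV data. A minimal standalone sketch of the same idea, assuming the usual spark-rapids buildver property (the version value here is hypothetical):

    # Build the tools module plus every module it depends on for one shim
    mvn install -pl tools -am -Dbuildver=330 -DskipTests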
Review comment (Collaborator):

Previously the idea was to build the minimum set of artifacts for dist? Should buildall have a separate option for including tools?

Reply (Member Author):

Should there be an argument to not build tools? I would expect buildall to be used by developers as an easy way to build all the Spark versions, and more often than not they want to build the tools CSV data to know whether they need to check in updated files as part of their PR. The build of these files is quick, so we're not saving much by making the default skip the tools data, and I think building the tools data is a better default. Thoughts?

Reply (Collaborator):

It adds less than 4 seconds to the parallel part of the build, so (totalShimsInProfile / numParallelShims) * 4 sec is acceptable. Agreed, there is no pressing need to add a skip option now.
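
To make that estimate concrete with hypothetical numbers (not from this discussion): a profile with 20 shims built 5 at a time would add roughly (20 / 5) * 4 sec = 16 seconds of wall-clock time.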

@gerashegalov (Collaborator) commented

build

Labels: build (Related to CI / CD or cleanly building)

Linked issue: Generated supported files per Spark version

3 participants