Generate CSV data per Spark version for tools [databricks] #10440
Conversation
Signed-off-by: Jason Lowe <[email protected]>
LGTM with optional nits
Should we name the dir tools-support to differentiate it from the tools repo and to match the Maven module name?
Sounds good, but we probably want to address this as a follow-up. I don't want to break downstream pipelines that expect to find things under tools/generated_files/; those would need to be updated to check the new location if the old one is missing.
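The fallback lookup described above could be sketched as follows. This is a hypothetical helper for a downstream pipeline, not part of this PR; the function name, argument names, and data.csv are illustration values.

```shell
# Hypothetical downstream helper: check the legacy flat location first,
# and fall back to the new per-Spark-version directory if the old file
# has been removed.
resolve_csv() {
  local base="$1" spark_ver="$2" name="$3"
  if [ -f "$base/tools/generated_files/$name" ]; then
    echo "$base/tools/generated_files/$name"
  else
    echo "$base/tools/generated_files/$spark_ver/$name"
  fi
}
```

A pipeline written this way keeps working before, during, and after the migration to per-version files.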
@@ -264,7 +264,7 @@ function build_single_shim() {
     -Drat.skip="$SKIP_CHECKS" \
     -Dmaven.scaladoc.skip \
     -Dmaven.scalastyle.skip="$SKIP_CHECKS" \
-    -pl aggregator -am > "$LOG_FILE" 2>&1 || {
+    -pl tools -am > "$LOG_FILE" 2>&1 || {
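For context on the two reactor flags in this hunk (this is background on standard Maven options, not part of the change): `-pl` (`--projects`) restricts the build to the listed modules, and `-am` (`--also-make`) additionally builds the modules they depend on. So switching the target from aggregator to tools still builds everything tools needs upstream.

```shell
# -pl / --projects : build only the listed module(s)
# -am / --also-make: also build the modules they depend on
mvn -pl tools -am install
```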
Previously the idea was to build the minimum set of artifacts for dist, right? Should buildall have a separate option for including tools?
Should there be an argument to not build tools? I would expect buildall to be used by developers as an easy way to build all the Spark versions, and more often than not they will want to build the tools CSV data to know whether or not they need to check in updated files as part of their PR. The build of these files is quick, so we're not saving much by making the default skip the tools data, and I think building the tools data is a better default. Thoughts?
It adds less than 4 seconds to the parallel part of the build, so (totalShimsInProfile / numParallelShims) * 4 sec is acceptable. Agreed, no pressing need to add a skip option now.
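The estimate above can be sketched numerically. The shim counts below are made-up illustration values, not figures from this PR; only the ~4 s per-shim cost comes from the comment.

```shell
# Rough cost model from the comment above: each shim adds ~4 s to the
# parallel part of the build, so the total added wall time is roughly
# ceil(totalShimsInProfile / numParallelShims) * 4 seconds.
estimate_overhead() {
  local total_shims="$1" parallel="$2" per_shim=4
  # integer ceiling division, then multiply by the per-shim cost
  echo $(( (total_shims + parallel - 1) / parallel * per_shim ))
}
```

For example, 14 shims built 4 at a time would be 4 batches, roughly 16 extra seconds of wall time.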
Fixes #10424. Adds CSV generation per Apache Spark version under tools/generated_files. The existing CSV files for tools data have been preserved to avoid breaking downstream pipelines that use this data. Once those pipelines have been updated to consume the new per-Spark-version files, we can remove the old tools CSV files.