Generated supported files per Spark version #10424
Comments
File a follow-up for Databricks.
There is a way to avoid dedicated handling for Databricks if we stop requiring the generated files to be checked in. Instead, they can be part of the generated resources inside the tools-spark3XY.jar artifacts. What do you think @jlowe ?
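If the generated files shipped as resources inside the tools jars, downstream consumers would not need any Java tooling to read them, since a jar is just a zip archive. A minimal sketch in Python; the jar name and resource path below are illustrative assumptions, not the project's actual layout:

```python
import csv
import io
import zipfile


def read_csv_from_jar(jar_path: str, resource: str) -> list:
    """Read a generated CSV resource (e.g. supportedExecs.csv) out of a jar.

    A jar file is a zip archive, so zipfile can open it directly.
    """
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(resource) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8")
            return list(csv.DictReader(text))


# Hypothetical usage -- jar name and resource layout are assumptions:
# rows = read_csv_from_jar("tools-spark320.jar",
#                          "generated_files/spark320/supportedExecs.csv")
```

The trade-off, as noted below, is that the files are then not directly browsable in the repo.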
Either we're modifying the Databricks build to generate these CSV files and updating the nightly builds to post PRs for any modified tool CSV files, or we're modifying the Databricks build to deploy a new tools jar artifact. We need to generate the files either way, so it's just a difference in how those generated files are published after the build. I see pros and cons to both approaches. @amahussein @cindyyuanjiang do you have any preference on how these files should be prepared to be consumed by the downstream tools pipelines?
Yeah, I can see the pros and cons of each approach. I think checked-in CSV files are easier to consume. For example, when troubleshooting a customer case, we can check manually by looking at a CSV file in the repo.
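The manual-inspection case above needs only a few lines of Python against a checked-in file. The `Exec` and `Supported` column names and the `S` marker in this sketch are assumptions for illustration; the real schema may differ:

```python
import csv


def is_exec_supported(csv_path: str, exec_name: str) -> bool:
    """Check whether an exec is marked supported in a supportedExecs.csv.

    Assumes an 'Exec' column and a 'Supported' column whose value is 'S'
    when the exec is supported (hypothetical schema).
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            if row.get("Exec") == exec_name:
                return row.get("Supported") == "S"
    return False
```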
Is your feature request related to a problem? Please describe.
Currently the files generated under tools/generated_files that detail what is supported are only generated against the lowest supported Spark version (currently Spark 3.1.1). This means any Spark operations added since that version are missing from these files.
Describe the solution you'd like
The tools/generated_files/supported* files should be generated per Spark version (e.g., tools/generated_files/spark311/supportedExecs.csv for Spark 3.1.1, tools/generated_files/spark320/supportedExecs.csv for Spark 3.2.0, and so on).
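With that layout, a downstream consumer would map its Spark version to the matching directory. A sketch of such a lookup; the fallback to the nearest earlier version directory is an assumption about how consumers might handle versions without their own generated files:

```python
import os


def resolve_supported_file(base_dir: str, spark_version: str, name: str) -> str:
    """Map a Spark version like '3.2.0' to <base_dir>/spark320/<name>.

    If the exact directory does not exist, fall back to the newest
    sparkXYZ directory that is not newer than the requested version
    (a hypothetical policy, not necessarily what the project will do).
    """
    wanted = "spark" + spark_version.replace(".", "")
    exact = os.path.join(base_dir, wanted, name)
    if os.path.exists(exact):
        return exact
    # Directory names like 'spark311' sort correctly as strings here.
    candidates = sorted(
        d for d in os.listdir(base_dir)
        if d.startswith("spark") and d <= wanted
    )
    if not candidates:
        raise FileNotFoundError(f"no generated files for Spark {spark_version}")
    return os.path.join(base_dir, candidates[-1], name)
```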
Describe alternatives you've considered
Adding the Spark version as a dimension within the existing files, but that makes the files more difficult to generate given the separate Spark builds per Spark version.
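For comparison, the single-file alternative would mean filtering on a version column at read time. The `SparkVersion` column name here is hypothetical; the generation difficulty mentioned above comes from each per-version Spark build needing to merge its rows into this one shared file rather than emitting its own:

```python
import csv


def rows_for_version(csv_path: str, spark_version: str) -> list:
    """Filter a combined supported-ops CSV down to one Spark version.

    Assumes a hypothetical 'SparkVersion' column holding values
    like '3.2.0'.
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return [row for row in csv.DictReader(fh)
                if row.get("SparkVersion") == spark_version]
```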