Generated supported files per Spark version #10424

Closed
jlowe opened this issue Feb 14, 2024 · 4 comments · Fixed by #10440
Assignees: jlowe
Labels: build (Related to CI / CD or cleanly building tools)
Comments

@jlowe (Member)

jlowe commented Feb 14, 2024

Is your feature request related to a problem? Please describe.
Currently the files under tools/generated_files that detail what is supported are only generated against the lowest supported Spark version (currently Spark 3.1.1). This means any Spark operations added after that version are missing from these files.

Describe the solution you'd like
The tools/generated_files/supported* files should be generated per Spark version (e.g., tools/generated_files/spark311/supportedExecs.csv for Spark 3.1.1, tools/generated_files/spark320/supportedExecs.csv for Spark 3.2.0, and so on).
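As a minimal sketch, assuming the layout above, a downstream consumer could map a Spark version string to its per-version file like this (the object and helper names are hypothetical, not from the plugin codebase):

```scala
// Hypothetical helper; only the directory layout comes from this proposal.
object SupportedFiles {
  // "3.2.0" -> "spark320", matching the proposed directory names
  private def versionDir(sparkVersion: String): String =
    "spark" + sparkVersion.split('.').take(3).mkString

  // e.g. supportedExecsPath("3.1.1") ==
  //   "tools/generated_files/spark311/supportedExecs.csv"
  def supportedExecsPath(sparkVersion: String): String =
    s"tools/generated_files/${versionDir(sparkVersion)}/supportedExecs.csv"
}
```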

Describe alternatives you've considered
Adding the Spark version as a dimension within the existing files, but that would make the files harder to generate given the separate Spark builds per Spark version.

@jlowe jlowe added ? - Needs Triage Need team to review and classify build Related to CI / CD or cleanly building tools labels Feb 14, 2024
@jlowe jlowe self-assigned this Feb 14, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Feb 20, 2024
@sameerz (Collaborator)

sameerz commented Feb 20, 2024

File a follow-up for Databricks

@gerashegalov (Collaborator)

gerashegalov commented Feb 20, 2024

> File a follow-up for Databricks

There is a way to avoid dedicated handling for Databricks if we stop requiring the generated files to be checked in. Instead they could be part of the generated resources in the tools-spark3XY.jar artifacts. What do you think @jlowe?
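For comparison, a minimal sketch of consuming that jar-based alternative, assuming the generated CSVs land at the classpath root of the tools jar (the resource path is an assumption, not the actual jar layout):

```scala
import scala.io.Source

// Sketch only: read a generated CSV shipped as a jar resource instead of
// a checked-in file.
object SupportedFilesFromJar {
  def readSupportedExecs(): List[String] = {
    val stream = getClass.getResourceAsStream("/supportedExecs.csv")
    require(stream != null, "supportedExecs.csv not found on the classpath")
    try Source.fromInputStream(stream).getLines().toList
    finally stream.close()
  }
}
```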

@jlowe (Member, Author)

jlowe commented Feb 20, 2024

Either we're modifying the Databricks build to generate these CSV files and updating the nightly builds to post PRs for any modified tool CSV files, or we're modifying the Databricks build to deploy a new tools jar artifact. We need to generate the files either way, so it's just a difference in how those generated files are published after the build. I see pros and cons to both approaches.

@amahussein @cindyyuanjiang do you have any preference on how these files should be prepared to be consumed by the downstream tools pipelines?

@amahussein (Collaborator)

Yeah, I can see the pros and cons in each approach. The artifact solution seems more complex for average users.

I think generating the CSV files is easier to consume. For example, if we are troubleshooting a customer case, we can check manually by looking at a CSV file in the repo.
