Generated supported files per Spark version #10424

Closed
jlowe opened this issue Feb 14, 2024 · 4 comments · Fixed by #10440
Assignees: jlowe
Labels: build (Related to CI / CD or cleanly building tools)
Comments

@jlowe (Member)

jlowe commented Feb 14, 2024

Is your feature request related to a problem? Please describe.
Currently the files under tools/generated_files that detail what is supported are only generated against the lowest supported Spark version (currently Spark 3.1.1). This means any Spark operations added after that version are missing from these files.

Describe the solution you'd like
The tools/generated_files/supported* files should be generated per Spark version (e.g., tools/generated_files/spark311/supportedExecs.csv for Spark 3.1.1, tools/generated_files/spark320/supportedExecs.csv for Spark 3.2.0, and so on).
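As a minimal sketch, assuming the layout above, a downstream consumer could map a Spark version string to its per-version file like this (the object and helper names are hypothetical, not from the plugin codebase):

```scala
// Hypothetical helper; only the directory layout comes from this proposal.
object SupportedFiles {
  // "3.2.0" -> "spark320", matching the proposed directory names
  private def versionDir(sparkVersion: String): String =
    "spark" + sparkVersion.split('.').take(3).mkString

  // e.g. supportedExecsPath("3.1.1") ==
  //   "tools/generated_files/spark311/supportedExecs.csv"
  def supportedExecsPath(sparkVersion: String): String =
    s"tools/generated_files/${versionDir(sparkVersion)}/supportedExecs.csv"
}
```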

Describe alternatives you've considered
Adding the Spark version as a dimension within the existing files, but that would make the files harder to generate given the separate Spark builds per Spark version.

@jlowe jlowe added ? - Needs Triage Need team to review and classify build Related to CI / CD or cleanly building tools labels Feb 14, 2024
@jlowe jlowe self-assigned this Feb 14, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Feb 20, 2024
@sameerz (Collaborator)

sameerz commented Feb 20, 2024

File a follow-up for Databricks

@gerashegalov (Collaborator)

gerashegalov commented Feb 20, 2024

> File a follow-up for Databricks

There is a way to avoid dedicated handling for Databricks if we stop requiring the generated files to be checked in. Instead they could be part of the generated resources in the tools-spark3XY.jar artifacts. What do you think @jlowe?
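For comparison, a minimal sketch of consuming that jar-based alternative, assuming the generated CSVs land at the classpath root of the tools jar (the resource path is an assumption, not the actual jar layout):

```scala
import scala.io.Source

// Sketch only: read a generated CSV shipped as a jar resource instead of
// a checked-in file.
object SupportedFilesFromJar {
  def readSupportedExecs(): List[String] = {
    val stream = getClass.getResourceAsStream("/supportedExecs.csv")
    require(stream != null, "supportedExecs.csv not found on the classpath")
    try Source.fromInputStream(stream).getLines().toList
    finally stream.close()
  }
}
```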

@jlowe (Member, Author)

jlowe commented Feb 20, 2024

Either we're modifying the Databricks build to generate these CSV files and updating the nightly builds to post PRs for any modified tool CSV files, or we're modifying the Databricks build to deploy a new tools jar artifact. We need to generate the files either way, so it's just a difference in how those generated files are published after the build. I see pros and cons to both approaches.

@amahussein @cindyyuanjiang do you have any preference on how these files should be prepared to be consumed by the downstream tools pipelines?

@amahussein (Collaborator)

Yeah, I can see the pros and cons in each approach. The artifact solution seems more complex for average users.

I think generating the CSV files is easier to consume. For example, if we are troubleshooting a customer case, we can check manually by looking at a CSV file in the repo.
