Adding Support for Yarn queue and other extras in SparkSubmit Operator and Hook #35911

Closed
2 tasks done
pateash opened this issue Nov 28, 2023 · 9 comments · Fixed by #36151

Comments

@pateash
Contributor

pateash commented Nov 28, 2023

Description

The spark-submit option --queue thequeue specifies the YARN queue to which the application should be submitted.

More: https://spark.apache.org/docs/3.2.0/running-on-yarn.html
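
As a rough illustration of where the option sits on the command line, here is a minimal sketch that assembles a spark-submit argument list. The helper name build_spark_submit_command is hypothetical and is not part of Airflow or Spark; only the --master and --queue flags come from spark-submit itself.

```python
from typing import List, Optional


def build_spark_submit_command(application: str, queue: Optional[str] = None) -> List[str]:
    """Assemble a spark-submit argument list for YARN (hypothetical helper, not Airflow code)."""
    command = ["spark-submit", "--master", "yarn"]
    if queue:
        # --queue selects the YARN queue the application is submitted to.
        command += ["--queue", queue]
    # The application file goes last, after all options.
    command.append(application)
    return command


if __name__ == "__main__":
    print(" ".join(build_spark_submit_command("app.py", queue="thequeue")))
```

When no queue is passed, the flag is simply left out and YARN falls back to its own default queue.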

Use case/motivation

The --queue option is particularly useful in a multi-tenant environment where different users or groups have allocated resources in specific YARN queues.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@pateash pateash added the kind:feature and needs-triage labels Nov 28, 2023
@pateash
Contributor Author

pateash commented Nov 28, 2023

Please assign this to me; it is a quick fix.
I will try to complete it ASAP.

@hussein-awala
Member

@pateash assigned you.

@pateash
Contributor Author

pateash commented Dec 2, 2023

Currently it's possible to set this option in the Spark connection, but we should be able to override it with operator arguments when needed.

@eladkal
Contributor

eladkal commented Dec 2, 2023

Currently it's possible to set this option in the Spark connection, but we should be able to override it with operator arguments when needed.

So do we have a task to complete here?

@pateash
Contributor Author

pateash commented Dec 3, 2023

@eladkal,
I am thinking of adding it as an argument to SparkSubmitOperator, so that a user can override the value set in the connection extras.
Let me know what you think.

@eladkal
Contributor

eladkal commented Dec 3, 2023

We normally allow overriding connection extras at the operator level. For example, with Snowflake you can set a different warehouse than the default one set in the connection.

@pateash
Contributor Author

pateash commented Dec 7, 2023

Yes, AFAIK it's done using arguments.
Let me add an argument to override this.
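
The override order discussed here could look roughly like the sketch below: an operator-level value wins, and the connection extra is only a fallback. The function name resolve_yarn_queue and the "queue" key in the extra JSON are assumptions made for illustration, not a statement of how the provider implements it.

```python
import json
from typing import Optional


def resolve_yarn_queue(operator_queue: Optional[str], connection_extra: str) -> Optional[str]:
    """Pick the YARN queue: operator argument first, connection extra second (hypothetical helper)."""
    # The connection extra is assumed to be a JSON object stored as a string.
    extra = json.loads(connection_extra) if connection_extra else {}
    # An explicit operator argument takes precedence over the connection default.
    return operator_queue if operator_queue else extra.get("queue")


if __name__ == "__main__":
    extra_json = '{"queue": "root.default"}'
    print(resolve_yarn_queue(None, extra_json))
    print(resolve_yarn_queue("root.team_a", extra_json))
```

This mirrors the Snowflake warehouse example: the connection carries the default, and a task overrides it only when it passes its own value.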

@pateash pateash changed the title from "Adding Support for Yarn queue in SparkSubmit Operator" to "Adding Support for Yarn queue and other extras in SparkSubmit Operator and Hook" Dec 12, 2023
@databius

databius commented Mar 18, 2024

I think this update caused a new issue, since the queue parameter of SparkSubmitOperator accidentally overrides the BaseOperator parameter of the same name.

https://github.com/apache/airflow/blob/main/airflow/providers/apache/spark/operators/spark_submit.py#L76

:param queue: The name of the YARN queue to which the application is submitted.
(will overwrite any yarn queue defined in the connection's extra JSON)

https://github.com/apache/airflow/blob/main/airflow/models/baseoperator.py#L587

:param queue: which queue to target when running this job. Not
 all executors implement queue management, the CeleryExecutor
 does support targeting specific queues.

So if you are using CeleryKubernetesExecutor, you can no longer make the operator run on KubernetesExecutor by setting the queue to kubernetes_queue.
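
The name collision can be reproduced without Airflow. The sketch below uses two stand-in classes (BaseOperatorSketch and SparkSubmitSketch are hypothetical names, and the real operators are considerably more involved); it only shows how a subclass that claims the queue keyword for itself keeps the value from ever reaching the base class.

```python
class BaseOperatorSketch:
    """Stand-in for a base operator whose queue means the executor queue."""

    def __init__(self, task_id: str, queue: str = "default") -> None:
        self.task_id = task_id
        self.queue = queue  # executor (Airflow) queue


class SparkSubmitSketch(BaseOperatorSketch):
    """Stand-in for an operator that reuses the name queue for the YARN queue."""

    def __init__(self, *, queue: str = None, **kwargs) -> None:
        # queue is captured here and is not forwarded to the base class,
        # so the executor queue silently stays at its default.
        super().__init__(**kwargs)
        self._yarn_queue = queue


if __name__ == "__main__":
    task = SparkSubmitSketch(task_id="spark_job", queue="kubernetes")
    print("executor queue:", task.queue)
    print("YARN queue:", task._yarn_queue)
```

Giving the YARN setting its own keyword name is one way to keep both meanings available in the same operator.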

@djuarezg

djuarezg commented Mar 25, 2024

Since 4.6.0, where this was introduced, using SparkSubmitOperator has been broken for us, because the queue can no longer be taken into account after this change.

Before, this queue parameter overrode the BaseOperator value, where a queue is an Airflow queue.
