That is unlikely to happen because of security. Callbacks are executed in the context of the Worker or DagFileProcessor; the Scheduler is not supposed to execute any code provided by the user in the DAG. It's the scheduler that determines which executor can be used, and it sends the prepared task to the executor (sometimes based on the "queue" parameter). And as you mentioned, the Celery workers pick up tasks from the queue they are configured with, so by the time a task starts, its queue has already determined where it should run. The only real place where you can change the queue for a task is at DAG parsing time, which effectively means that once the task has been placed in the DAG structure, its queue has to be determined. You cannot dynamically change it in the scheduler.

The scheduler just schedules, using whatever is declared in code that comes "pre-installed" with Airflow. For example, custom triggers or custom timetables have to be pre-installed, and DAGs cannot define their logic; they can at most declare and configure which timetable/trigger will be used. So the only way this could be implemented is by defining some "customizable" mechanism of queue selection, rather than allowing the DAG writer to define it the way callbacks are defined. I will convert this into a discussion. Maybe it will be picked up by someone who would like a similar mechanism, but at the very least it would require extensive discussion on the devlist and an AIP (Airflow Improvement Proposal).
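One existing mechanism that goes in this direction is the cluster policy `task_instance_mutation_hook`, defined in `airflow_local_settings.py`, which is allowed to mutate task instance attributes such as `queue`. A minimal sketch, assuming an Airflow version where `TaskInstance` carries a `run_id` attribute (2.2+); verify the hook's exact timing against the version you run:

```python
# airflow_local_settings.py -- cluster policies are code "pre-installed"
# next to Airflow, not shipped inside a DAG, which fits the security
# model described above.
def task_instance_mutation_hook(task_instance):
    # Sketch: route everything except the bootstrap task to a per-run
    # queue. The "master" queue name is taken from the use case below
    # and is not an Airflow built-in.
    if task_instance.queue != "master":
        task_instance.queue = f"{task_instance.dag_id}-{task_instance.run_id}"
```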
Description
Right now the queue of a DAG/task is determined by the `queue` parameter in the DAG/task definition or by the config file. I want the `queue` parameter to also accept a function. If a function is given, Airflow should pass it the context variable and use the return value as the queue for the DAG/task.
Airflow DAGs and tasks support callback functions such as `on_success_callback`, which is passed the context variable and executed on success. I want a similar capability for determining the queue.
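To make this concrete, here is a hypothetical sketch of what the requested API could look like (a callable-valued `queue` is not supported by Airflow today; the names are illustrative only):

```python
from airflow.operators.bash import BashOperator


def pick_queue(context):
    # Hypothetical: Airflow would call this with the task instance
    # context; the return value would become the queue for this run.
    return f"{context['dag'].dag_id}-{context['run_id']}"


# Proposed, not currently supported: queue accepts a callable,
# analogous to how on_success_callback receives the context.
task = BashOperator(
    task_id="run_on_per_run_queue",
    bash_command="echo hello",
    queue=pick_queue,
)
```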
Use case/motivation
I use an independent EC2 instance as a Celery worker for every DAG run. The queue for any DAG run is `dag_id-run_id`. In all my DAGs the first task is always an operator running on the "master" queue; it sets up an EC2 instance and starts a worker listening on the custom queue name.
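A minimal sketch of that setup (the `start_ec2_worker` callable and the "master" queue are specific to my deployment, and the actual EC2 provisioning is stubbed out):

```python
from __future__ import annotations

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator


def start_ec2_worker(dag_run=None, **kwargs):
    # Stub: provision an EC2 instance (e.g. via boto3) and have its
    # bootstrap script start a Celery worker listening on the per-run
    # queue, e.g. "airflow celery worker --queues <dag_id>-<run_id>".
    queue_name = f"{dag_run.dag_id}-{dag_run.run_id}"
    print(f"Would start a worker for queue {queue_name}")


with DAG(
    dag_id="per_run_worker_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
) as dag:
    # The only task pinned to the always-on "master" queue; everything
    # downstream is meant to run on the per-run worker it starts.
    setup_worker = PythonOperator(
        task_id="start_worker",
        python_callable=start_ec2_worker,
        queue="master",
    )
```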
I modified a line in the `_enqueue_task_instances_with_queued_state` function in `scheduler_job.py`:

```python
queue = ti.queue if ti.queue == "master" else f"{ti.dag_id}-{ti.run_id}"
```
With this change, the queue for every task other than the EC2 operator is `dag_id-run_id`. Because the EC2 operator starts a Celery worker with that specific queue name, all tasks not assigned to the "master" queue (which has a worker running locally) are executed on the per-run Celery worker.
So my setup required a small modification to the Airflow code base. It would be helpful if the scheduler could determine the queue name through a user-defined function that takes the context variable as input.
Presently Airflow allows defining a queue only per DAG/task, not per DAG run, as the `run_id` is determined at runtime. Also, changing the queue name at the enqueue step is not reflected in the UI, because the change is visible only to the scheduler.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct