fix schedule_downstream_tasks bug #42582
Can you add a test that prevents regression?
Thank you for the suggestion. I've added a test to prevent regression. Please check the latest commit.
Also, DB tests currently fail.
LGTM - I'm ok with merging it after resolving my last nitpick.
@potiuk / @uranusjr / @ephraimbuddy - any objections?
* fix schedule_downstream_tasks bug * remove partial_subset * Update comment --------- Co-authored-by: 维湘 <[email protected]> (cherry picked from commit 3fceaa6)
* fix schedule_downstream_tasks bug * remove partial_subset * Update comment --------- Co-authored-by: 维湘 <[email protected]> (cherry picked from commit 3fceaa6) Co-authored-by: luoyuliuyin <[email protected]>
* fix schedule_downstream_tasks bug * remove partial_subset * Update comment --------- Co-authored-by: 维湘 <[email protected]>
closes: #42581
Problem Description
The trigger_rule of task_one_success is one_success. When the upstream node of task_one_success has not yet run, task_one_success is skipped. According to the semantics of one_success, task_one_success should be able to run.
In this scenario, Airflow has the schedule_after_task_execution parameter turned on, which means that after an upstream node finishes running, it tries to schedule the downstream nodes in the current worker.
This problem may occur when task_1 runs faster than task_run. More specifically, it occurs when task_1 finishes running and successfully schedules downstream tasks in the current worker.
Related Code
Below is the code in question
When task_1 finishes, it tries to schedule its downstream tasks. First, a partial DAG is generated:
task => "task_1"
task.downstream_task_ids => "task_2"
include_downstream=True => ["task_2"]
include_upstream=False => ["task_2"]
include_direct_upstream=True => ["task_2", "task_skip", "task_one_success", "task_1"]
So the final partial_dag is ["task_2", "task_skip", "task_one_success", "task_1"].
This partial_dag is incomplete because task_one_success's other upstream node, task_run, is not in it. Specifically, the include_upstream parameter should not be False.
Solution
The correct subgraph division should be as follows, with include_upstream=True:
task => "task_1"
task.downstream_task_ids => "task_2"
include_downstream=True => ["task_2"]
include_upstream=True => ["task_2", "task_skip", "task_one_success", "task_1", "task_run", "branch"]
include_direct_upstream=True => ["task_2", "task_skip", "task_one_success", "task_1", "task_run", "branch"]
So the final partial_dag is ["task_2", "task_skip", "task_one_success", "task_1", "task_run", "branch"].
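The two traces in this description can be reproduced with a small stand-alone sketch. It is a toy model, not Airflow's partial_subset implementation: the UPSTREAM map is an assumed topology, chosen only because it is consistent with the task lists given here, and toy_partial_subset and its helpers are hypothetical names.

```python
from collections import deque

# Assumed topology (not shown in the PR text): task_id -> direct upstream task_ids.
UPSTREAM = {
    "branch": [],
    "task_run": ["branch"],
    "task_skip": ["branch"],
    "task_one_success": ["task_run", "task_skip"],
    "task_1": [],
    "task_2": ["task_skip", "task_one_success", "task_1"],
}


def downstream_of(task_id):
    """Direct downstream tasks, derived from the upstream map."""
    return [t for t, ups in UPSTREAM.items() if task_id in ups]


def closure(seeds, neighbours):
    """All tasks reachable from the seeds by repeatedly following neighbours()."""
    seen, queue = set(), deque(seeds)
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def toy_partial_subset(seeds, include_downstream, include_upstream, include_direct_upstream):
    """Toy version of the subset selection described in the traces."""
    selected = set(seeds)
    if include_downstream:
        selected |= closure(seeds, downstream_of)
    if include_upstream:
        selected |= closure(seeds, lambda t: UPSTREAM[t])
    if include_direct_upstream:
        # Add only the immediate upstream tasks of everything selected so far.
        for task_id in list(selected):
            selected.update(UPSTREAM[task_id])
    return selected


# task_1 finished; its only downstream task is task_2.
pruned = toy_partial_subset(["task_2"], True, False, True)
full = toy_partial_subset(["task_2"], True, True, True)
print(sorted(pruned))
print(sorted(full))
```

In this model, the subset built with include_upstream=False contains task_one_success but not task_run, while the subset built with include_upstream=True contains every task in the assumed graph.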
Subgraph pruning is only performed when the schedule_after_task_execution parameter is turned on. Normal scheduler scheduling does not have this problem.
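To illustrate why a pruned view can lead to a wrong decision while a full view does not, here is a minimal sketch of a one_success-style check evaluated only over the upstream tasks that are visible. It is a simplified model, not Airflow's trigger rule code. The function name, the state strings, and the assumption that task_skip is also an upstream node of task_one_success are illustrative.

```python
def one_success_decision(visible_upstream_states):
    """Toy one_success rule.

    visible_upstream_states maps upstream task_id -> final state string,
    or None when that upstream task has not finished yet.
    """
    states = list(visible_upstream_states.values())
    if not states:
        return "run"  # no upstream tasks to wait for
    if "success" in states:
        return "run"  # one successful upstream is enough
    if all(s is not None for s in states):
        # Every visible upstream is finished and none succeeded.
        if all(s == "skipped" for s in states):
            return "skip"
        return "upstream_failed"
    return "wait"  # some upstream may still succeed


# Full view: task_run has not finished yet, so the task keeps waiting.
full_view = {"task_run": None, "task_skip": "skipped"}
# Pruned view: task_run is missing, so only the skipped upstream is seen.
pruned_view = {"task_skip": "skipped"}

print(one_success_decision(full_view))
print(one_success_decision(pruned_view))
```

Under these assumptions, the pruned view sees only an already skipped upstream and gives up on the task, whereas the full view keeps waiting for task_run.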