Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use back_populates instead of sqlalchemy.orm.backref in bidirectional relationships #39430

Closed
wants to merge 6 commits into from

Conversation

Taragolis
Copy link
Contributor

backref is a legacy approach for construct relationships, instead of that preferable way is use back_populates keyword argument in relationship for built it explicitly


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:Scheduler including HA (high availability) scheduler area:serialization labels May 6, 2024
@Taragolis Taragolis changed the title Use back_populates instead of backref in bidirectional relationships Use back_populates instead of sqlalchemy.orm.backref in bidirectional relationships May 6, 2024
@Taragolis Taragolis force-pushed the use-explicit-back-populates branch 2 times, most recently from 1a1e18d to 168fc11 Compare May 6, 2024 14:58
airflow/__init__.py Outdated Show resolved Hide resolved
airflow/__init__.py Outdated Show resolved Hide resolved

Only makes sense for SchedulerJob and BackfillJob instances.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this? Let's move it to the task_instances_enqueued. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This do not required anymore, you could achieve it by explicit bidirectional relationships expressions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO

This construction looks more clear

primaryjoin="Job.id == foreign(DagRun.creating_job_id)"

Rather than this

primaryjoin=lambda: Job.id == foreign(_resolve_dagrun_model().creating_job_id)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original comment stored information about two different relationships

"TaskInstances which have been enqueued by this Job." -> task_instances_enqueued
"Only makes sense for SchedulerJob and BackfillJob" -> dag_runs

Copy link
Contributor Author

@Taragolis Taragolis May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some proofs

DAG

from __future__ import annotations

from datetime import datetime, timezone

from airflow.decorators import task
from airflow.models.dag import DAG


START_DATE = datetime(2024, 2, 1, tzinfo=timezone.utc)


with DAG("pr_39430", schedule="@once", tags=["pr", "39430", "bi-directional"], start_date=START_DATE):
    @task
    def do_nothing(): ...

    do_nothing()

Run one time by scheduler and one manual run, by trigger DAG from the UI, results in Tables

Table job

column
id 3 4 1 2
dag_id pr_39430 pr_39430 null null
state success success running running
job_type LocalTaskJob LocalTaskJob TriggererJob SchedulerJob
start_date 2024-05-22 11:45:23.867096 +00:00 2024-05-22 11:45:29.782799 +00:00 2024-05-22 11:44:49.709191 +00:00 2024-05-22 11:44:49.813874 +00:00
end_date 2024-05-22 11:45:24.143576 +00:00 2024-05-22 11:45:30.040894 +00:00 null null
latest_heartbeat 2024-05-22 11:45:23.853896 +00:00 2024-05-22 11:45:29.766046 +00:00 2024-05-22 11:45:35.885734 +00:00 2024-05-22 11:45:37.722701 +00:00
executor_class null null null null
hostname cd2591016c8c cd2591016c8c cd2591016c8c cd2591016c8c
unixname root root root root

Table task_instance

column
task_id do_nothing do_nothing
dag_id pr_39430 pr_39430
run_id scheduled__2024-02-01T00:00:00+00:00 manual__2024-05-22T11:45:28.453199+00:00
map_index -1 -1
start_date 2024-05-22 11:45:23.921193 +00:00 2024-05-22 11:45:29.821869 +00:00
end_date 2024-05-22 11:45:24.085409 +00:00 2024-05-22 11:45:29.979149 +00:00
duration 0.164216 0.15728
state success success
try_number 1 1
max_tries 0 0
hostname cd2591016c8c cd2591016c8c
unixname root root
job_id 3 4
pool default_pool default_pool
pool_slots 1 1
queue default default
priority_weight 1 1
operator _PythonDecoratedOperator _PythonDecoratedOperator
custom_operator_name @task @task
queued_dttm 2024-05-22 11:45:23.394469 +00:00 2024-05-22 11:45:29.546020 +00:00
queued_by_job_id 2 2
pid 2375 2377
executor null null
executor_config 0x80057D942E 0x80057D942E
updated_at 2024-05-22 11:45:23.930163 +00:00 2024-05-22 11:45:29.831787 +00:00
rendered_map_index null null
external_executor_id null null
trigger_id null null
trigger_timeout null null
next_method null null
next_kwargs null null
task_display_name do_nothing do_nothing

Table dag_run

column
id 1 2
dag_id pr_39430 pr_39430
queued_at 2024-05-22 11:45:23.351323 +00:00 2024-05-22 11:45:28.490357 +00:00
execution_date 2024-02-01 00:00:00.000000 +00:00 2024-05-22 11:45:28.453199 +00:00
start_date 2024-05-22 11:45:23.373351 +00:00 2024-05-22 11:45:29.516604 +00:00
end_date 2024-05-22 11:45:24.476332 +00:00 2024-05-22 11:45:30.609079 +00:00
state success success
run_id scheduled__2024-02-01T00:00:00+00:00 manual__2024-05-22T11:45:28.453199+00:00
creating_job_id 2 null
external_trigger false true
run_type scheduled manual
conf 0x80057D942E 0x80057D942E
data_interval_start 2024-02-01 00:00:00.000000 +00:00 2024-05-22 11:45:28.453199 +00:00
data_interval_end 2024-02-01 00:00:00.000000 +00:00 2024-05-22 11:45:28.453199 +00:00
last_scheduling_decision 2024-05-22 11:45:24.474830 +00:00 2024-05-22 11:45:30.606623 +00:00
dag_hash cd6f55b5c736bf50d2f341663c0ad83f cd6f55b5c736bf50d2f341663c0ad83f
log_template_id 2 2
updated_at 2024-05-22 11:45:24.477225 +00:00 2024-05-22 11:45:30.609766 +00:00
clear_number 0 0

Relationships

LocalJob (id: 3) create TI (run_id: scheduled__2024-02-01T00:00:00+00:00)
LocalJob (id: 4) create TI (run_id: manual__2024-05-22T11:45:28.453199+00:00)
SchedulerJob (id: 2) queued TI (run_id: scheduled__2024-02-01T00:00:00+00:00)
SchedulerJob (id: 2) queued TI (run_id: manual__2024-05-22T11:45:28.453199+00:00) - Even if it Manual DAG Run

DagRun (run_id: scheduled__2024-02-01T00:00:00+00:00) created by SchedulerJob (id: 2)
DagRun (run_id: manual__2024-05-22T11:45:28.453199+00:00) doesn't created by any Job

airflow/models/taskinstance.py Show resolved Hide resolved
airflow/providers_manager.py Outdated Show resolved Hide resolved
@Taragolis Taragolis force-pushed the use-explicit-back-populates branch from 9ed927a to 106026f Compare May 29, 2024 13:54
@Taragolis Taragolis added the full tests needed We need to run full set of tests for this PR to merge label May 29, 2024
@Taragolis Taragolis closed this May 29, 2024
@Taragolis Taragolis reopened this May 29, 2024
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s a merge conflict though

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 22, 2024
@github-actions github-actions bot closed this Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler area:serialization full tests needed We need to run full set of tests for this PR to merge stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants