Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename dagrun_id to dag_run_id #21806

Merged
merged 12 commits into from
Mar 2, 2022

Conversation

dstandish
Copy link
Contributor

I think we can call this attribute dag_run_id without suffering too much ambiguity, principally because dag does not, strictly speaking, have a run_id; it's the dag run that has a run_id.

For a long time, the codebase has used dag_run_id or "dagrun id" to refer to the dagrun.run_id column.

As of recently, we now have dagrun_id alongside dag_run_id -- the former is supposed to refer to DagRun.id and the latter to DagRun.run_id. But personally I think that retains too much ambiguity. When reading it, you're not sure if it maybe is just a typo, or what exactly it is / how it is different from dag_run_id. And it doesn't obey standard camel-to-snake conversion.

I think we can do a better job of reducing ambiguity by using, wherever possible, run_id to mean DagRun.run_id and dag_run_id to mean the integer PK. (i don't think we have to resort to the painfully verbose dag run run id).

So, this PR hopes to rename dagrun_id to dag_run_id, and tries to replace any old references to "dag run id" to simply "run id".

The one wrinkle is the API. There are references to "dag_run_id". What I've done here is leave that alone. We could think of the DagRun.id as internal, and not user-facing from the rest API perspective. We could consider deprecating / renaming this param, but I'm considering that out of scope since it's already been out there and is unaffected by the recent changes (which this PR hopes to address before they go out in 2.3.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. Any user/developer-facing backcompat concerns?

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Feb 24, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Comment on lines -44 to +45
Column("dagrun_id", Integer(), nullable=False),
Column("dag_run_id", Integer(), nullable=False),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO dagrun_id is more appropriate here because it refers to the id column on the dagrun table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the table is actually dag_run -- not dagrun

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does that change your feeling about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm with TP on this. Yes the table is "dag_run", but by keeping this as "dagrun_id" it is (marginally) clearer that this is the "id" column of the "dagrun" table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand, but i disagree.

in my experience, if you have an order_line table, you call the identity column either order_line_id or simply id, but not orderline_id. you don't violate your snake case convention for identity columns.

and there's good reason we use snake case. consider table ab_permission_view_role. would you really want to call that table's identity column abpermissionviewrole_id?

actually, looking at that table, its ident col is id -- maybe we should just go with that here.

Copy link
Contributor Author

@dstandish dstandish Feb 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah i see we're talking about the column in the xcom table, which is pointing to the dag run table.

The PK in dag_run is already id.

So yeah, same reasoning essentially applies. When we refer to id col of a table, it should be f"{table_name}_{col_name}". I don't think there's really any reason to diverge from this convention here. There's no "run_id" on the "dag" table so there's no conflict here.

Can find precedent for this with column view_menu_id in table ab_permission_view (which keys to ab_view_menu.id), and column permission_view_id in table ab_permission_view_role (which keys to ab_permission_view.id).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is essentially down to personally preference. I won’t insist if you feel dag_run_id is better.

(Personally I never use underscores in my table names, so this is kind of undefined in my mental style guide. And dag_run is also an outlier in Airflow since most other tables e.g. taskinstance do lack the undercores).

Copy link
Contributor Author

@dstandish dstandish Feb 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And dag_run is also an outlier in Airflow since most other tables e.g. taskinstance do lack the undercores).

it's not actually true, they are all snake case. the table for TaskInstance is task_instance. and there's not a single table in airflow db that's not snake case table name. (But the python module names are like that.)

I guess this is essentially down to personally preference

To some extent that's true but consistency and convention also count for something. Particularly in the context of database object naming conventions, in this area it's a lot easier to make conventions about than some other decisions we have to make. And referring to the PK as f"{table_name}_id is i think pretty standard[1]. E.g. you would not create a table like this:

create table dag_run (
dagrun_id int identity,
val varchar
)

That would not make any sense. Why not call the table dagrun in that case? I think it's essentially the same here. The table is called dag_run so it's ID should be referred to as dag_run_id. Since the table names have already made the decision to go snake case, why should the columns go backwards on that decision? And it's not that we can't ever deviate from the standard choice, but that side of the argument bears the burden; there should be a good reason to make an exception.

As always, I'll go with the majority view, but just trying to present what makes sense to me. And of course it's always better if we can reach consensus.


[1] Jetbrains can even suggest joins for this table naming pattern when no FK is defined.

@dstandish
Copy link
Contributor Author

@ashb re

I think this is fine. Any user/developer-facing backcompat concerns?

I think we're ok here.

i think the only changes that could possibly cause an issue are these:

airflow.api.common.mark_tasks.set_state (signature change, rename param dag_run_id to run_id)
IN_MEMORY_RUN_ID (rename from IN_MEMORY_DAG_RUN_ID)
DagRunNotBackfillDep (rename from DagrunIdDep)

I think all of these can reasonably be considered internal.

And as mentioned in the description above, there will remain API parameters dag_run_id which will refer to the dag run run_id. But I think that's OK and sortof orthogonal to the addition of this dag run ID column. If we ever need to expose the integer Id, we'd probably want to rename "dag_run_id" to "run_id" anyway, rather than having two params "dag_run_id" and "dagrun_id" coexisting together.

@dstandish
Copy link
Contributor Author

side note, question for @uranusjr :
this bit here: https://github.com/apache/airflow/pull/21806/files/f3aaa031be561d3e6671dbd734bb427fe283ddf8#diff-3f95f161bd9ef4bb455611e0c58583899769360afc53f755cd1577cf194553c5R70-R79

now that we are keying xcom to the PK of dag_run, we should be able to look up the dag run by the PK rather than doing this join:

        primaryjoin="""and_(
            BaseXCom.dag_id == foreign(DagRun.dag_id),
            BaseXCom.run_id == foreign(DagRun.run_id),
        )""",

@uranusjr
Copy link
Member

now that we are keying xcom to the PK of dag_run, we should be able to look up the dag run by the PK rather than doing this join:

Good point, I think I simply missed changing this when adding the dagrun_id fk.

@dstandish dstandish force-pushed the dagrun_id-to-dag_run_id branch from f3aaa03 to 46ebed1 Compare February 28, 2022 21:33
airflow/models/xcom.py Outdated Show resolved Hide resolved
airflow/models/xcom.py Outdated Show resolved Hide resolved
@ashb ashb force-pushed the dagrun_id-to-dag_run_id branch from a22d13e to 0389861 Compare March 1, 2022 10:27
@dstandish dstandish marked this pull request as draft March 1, 2022 21:53
@dstandish dstandish marked this pull request as ready for review March 2, 2022 06:17
@dstandish dstandish force-pushed the dagrun_id-to-dag_run_id branch from 8d6e02c to 2a0603d Compare March 2, 2022 06:22
@dstandish dstandish merged commit 42e2b80 into apache:main Mar 2, 2022
@dstandish dstandish deleted the dagrun_id-to-dag_run_id branch March 2, 2022 19:03
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:API Airflow's REST/HTTP API area:CLI area:providers area:webserver Webserver related Issues full tests needed We need to run full set of tests for this PR to merge type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants