-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename dagrun_id
to dag_run_id
#21806
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine. Any user/developer-facing backcompat concerns?
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
Column("dagrun_id", Integer(), nullable=False), | ||
Column("dag_run_id", Integer(), nullable=False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO dagrun_id
is more appropriate here because it refers to the id
column on the dagrun
table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the table is actually dag_run
-- not dagrun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that change your feeling about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm with TP on this. Yes the table is "dag_run", but by keeping this as "dagrun_id" it is (marginally) clearer that this is the "id" column of the "dagrun" table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i understand, but i disagree.
in my experience, if you have an order_line table, you call the identity column either order_line_id
or simply id
, but not orderline_id
. you don't violate your snake case convention for identity columns.
and there's good reason we use snake case. consider table ab_permission_view_role
. would you really want to call that table's identity column abpermissionviewrole_id
?
actually, looking at that table, its ident col is id
-- maybe we should just go with that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah i see we're talking about the column in the xcom table, which is pointing to the dag run table.
The PK in dag_run is already id
.
So yeah, same reasoning essentially applies. When we refer to id col of a table, it should be f"{table_name}_{col_name}"
. I don't think there's really any reason to diverge from this convention here. There's no "run_id" on the "dag" table so there's no conflict here.
Can find precedent for this with column view_menu_id
in table ab_permission_view
(which keys to ab_view_menu.id
), and column permission_view_id
in table ab_permission_view_role
(which keys to ab_permission_view.id
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is essentially down to personally preference. I won’t insist if you feel dag_run_id
is better.
(Personally I never use underscores in my table names, so this is kind of undefined in my mental style guide. And dag_run
is also an outlier in Airflow since most other tables e.g. taskinstance
do lack the undercores).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And dag_run is also an outlier in Airflow since most other tables e.g. taskinstance do lack the undercores).
it's not actually true, they are all snake case. the table for TaskInstance
is task_instance
. and there's not a single table in airflow db that's not snake case table name. (But the python module names are like that.)
I guess this is essentially down to personally preference
To some extent that's true but consistency and convention also count for something. Particularly in the context of database object naming conventions, in this area it's a lot easier to make conventions about than some other decisions we have to make. And referring to the PK as f"{table_name}_id
is i think pretty standard[1]. E.g. you would not create a table like this:
create table dag_run (
dagrun_id int identity,
val varchar
)
That would not make any sense. Why not call the table dagrun
in that case? I think it's essentially the same here. The table is called dag_run
so it's ID should be referred to as dag_run_id
. Since the table names have already made the decision to go snake case, why should the columns go backwards on that decision? And it's not that we can't ever deviate from the standard choice, but that side of the argument bears the burden; there should be a good reason to make an exception.
As always, I'll go with the majority view, but just trying to present what makes sense to me. And of course it's always better if we can reach consensus.
[1] Jetbrains can even suggest joins for this table naming pattern when no FK is defined.
@ashb re
I think we're ok here. i think the only changes that could possibly cause an issue are these: airflow.api.common.mark_tasks.set_state (signature change, rename param I think all of these can reasonably be considered internal. And as mentioned in the description above, there will remain API parameters |
side note, question for @uranusjr : now that we are keying xcom to the PK of dag_run, we should be able to look up the dag run by the PK rather than doing this join: primaryjoin="""and_(
BaseXCom.dag_id == foreign(DagRun.dag_id),
BaseXCom.run_id == foreign(DagRun.run_id),
)""", |
Good point, I think I simply missed changing this when adding the |
f3aaa03
to
46ebed1
Compare
a22d13e
to
0389861
Compare
8d6e02c
to
2a0603d
Compare
I think we can call this attribute
dag_run_id
without suffering too much ambiguity, principally because dag does not, strictly speaking, have arun_id
; it's the dag run that has arun_id
.For a long time, the codebase has used
dag_run_id
or "dagrun id" to refer to the dagrun.run_id column.As of recently, we now have
dagrun_id
alongsidedag_run_id
-- the former is supposed to refer toDagRun.id
and the latter toDagRun.run_id
. But personally I think that retains too much ambiguity. When reading it, you're not sure if it maybe is just a typo, or what exactly it is / how it is different from dag_run_id. And it doesn't obey standard camel-to-snake conversion.I think we can do a better job of reducing ambiguity by using, wherever possible,
run_id
to meanDagRun.run_id
anddag_run_id
to mean the integer PK. (i don't think we have to resort to the painfully verbosedag run run id
).So, this PR hopes to rename
dagrun_id
todag_run_id
, and tries to replace any old references to "dag run id" to simply "run id".The one wrinkle is the API. There are references to "dag_run_id". What I've done here is leave that alone. We could think of the DagRun.id as internal, and not user-facing from the rest API perspective. We could consider deprecating / renaming this param, but I'm considering that out of scope since it's already been out there and is unaffected by the recent changes (which this PR hopes to address before they go out in 2.3.