-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent templated field logic checks in __init__ of operators automatically #33786
Conversation
a4579c6
to
d72bb2b
Compare
I think it should be approached differently.. This PR should fail. It should not have exclusioins. It should only be merged when we (or others) fixed all the problems. This is what we do normally. Then this PR open should serve as reminder and incentivisation to fix all the places. So I think in general, the sequence should be reverse:
This is the aproach we've used for other cases that were similar. |
d72bb2b
to
8294516
Compare
Sounds good to me :) |
I am not sure if that's the right set of validations. I do not find loops or if/return particularly troubling. What I was really hoping for was a check that will check for "typical" errors made in operators that are not often caught during review, but they have some negative side effects. I am not sure if we want to limit operator's authors from raising warnings/running returns etc. This is far too much IMHO. We do not want to tell the authors "this is how you shoudl do your logic in constructor" - we want to make them aware when they made an obvious mistake. Generally I think we should only run validations that are very clearly violating some not even best but "reasonbly only possible" practice in Airflow. I think the rule of thumb should be "You should be able to tell the author EXACTLY how to correct their code and explain why it is wrong". And it should be a mistake that "newbie" contributors are likely to make. The issue #29069 was really about this one:
otherwise having some logic for non-templated fields is perfectly fine. I would not want us to be a "dictactor" to tell people how to write their constructors. And there are few other things that are OBVIOUSLY wrong:
I'd start with these three rules, rather than trying to present some constructs that are not obviously wrong. |
Thanks for the detailed feedback - now I have a better sense of the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is too big, IMHO it should be split in small PRs before adding the pre-commit hook.
Yes, I'll split it - the files from the providers were here temporarily only for demonstration. |
8294516
to
4be3f19
Compare
f8d6a95
to
fc2bad9
Compare
Eventually, I created this list manually in #36484 :) |
fc2bad9
to
525be01
Compare
I think we can also improve the process here as it takes longer to finish the list of operators to handle. @shahar1 you can add exclusion list of operators to the pre-commit (the ones that are not ready). This should give us green build here thus we can merge this PR and enjoy the protection it brings. In the mean time @romsharon98 can continue to work on removing items from the exclusion list (we won't need to maintain #36484 any longer as the exclusion list will be on the pre-commit itself) |
525be01
to
de6a4e0
Compare
|
Edited: Dec. 16, 2023
closes: #29069
Concept
We want to add a pre-commit that checks for the templated field "invalid logic" within the constructor of providers' operators to avoid issues like #27328. "Invalid logic" is defined according to the following rules (thanks to @potiuk for the clarification):
Scope
We'll check that defined template fields are directly assigned with a corresponding parameter with an identical name passed from the constructor.
Implementation
Creating a pre-commit Python script that will enforce the limitations below.
Added limitations
as the fields. The following example is invalid, as the parameter passed into the constructor is not the same as the
templated field:
either by a direct assignment or by calling the parent's constructor (in which these fields are
defined as
template_fields
) with a direct assignment of the parameter.The following example is invalid, as the instance member
self.field_a
is not assigned at all, despite being a templated field:The following example is also invalid, as the instance member
self.field_a
ofMyHelloOperator
is initializedimplicitly as part of the
kwargs
passed to its parent constructor:execute()
method. For example, the following example is invalid:When an operator inherits from a base operator and does not have a constructor defined on its own, the limitations above do not apply. However, the templated fields must be set properly in the parent according to those limitations (currently not enforced by the pre-commit, see caveats).
Thus, the following example is valid:
Caveats
BaseOperator
(simple string comparison).templated_fields
are defined properly.Future work
TODOs
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.