-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically register DAGs that are used in a context manager #23592
Conversation
4c2bdff
to
a62b484
Compare
This works for the current example sub-dags we have in the tests -- is there any other case I should test? |
@ashb there are at least two things to consider here which I stumbled upon playing around with this:
I ended up instead with this pattern: dag = DAG(...)
with dag:
... – as a recommended pattern, simply because then you can't forget to give the DAG instance a top-level variable. And then as I mention in https://lists.apache.org/thread/xndt3zklxbzo5nlmjz10dtfm2pp4t4wq, we could instead warn the user if they'd created a DAG which isn't exposed/registered. (The gist of that linked reply is that there are two different ways to do that.) |
@malthe Is it the param name that is confusing? I do not understand I follow what you are saying. The use-case for this PR is that we don't need to needlessly do a
Isn't it the same behaviour as the one with |
@kaxil I understand the use-case – what I am somewhat hesitant to vote for is the conflation of the two orthogonal concerns which are both suggested for
I think you could argue then that any DAG instance created during the loading of a module should be automatically registered – after all, why would a user create a DAG instance only to throw it away? The problem with that is that we do have a test case that prevents this: def test():
dag = DAG(...)
# Add tasks to dag and forget returning the DAG instance
test()
# or dag = test(), same result because `test` has no return value The point is, should we require the use of |
We could extend the auto-register -- but that makes it slightly more complex for SubDag (we'd have to "unregister" a dag when it gets turned in to a sub dag) but we could do it just as easily if we want to. Edit: actually I guess that |
c1a7796
to
ac66a51
Compare
@@ -3022,24 +3029,28 @@ class DagContext: | |||
|
|||
""" | |||
|
|||
_context_managed_dag: Optional[DAG] = None | |||
_previous_context_managed_dags: List[DAG] = [] | |||
_context_managed_dags: Deque[DAG] = deque() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason we need a deque here? From what I can tell a plain list is enough (with the ordering reversed, i.e. we append to / pop from the end and use -1 to access).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason no -- mostly thinking that we only ever need to access the head value, or push a new head value, and deque is more optimized for that case.
But it hardly matters as this is not on the critical path for performance.
ac66a51
to
301c20a
Compare
301c20a
to
bb5d2de
Compare
@@ -937,8 +953,7 @@ def test_get_dag_with_dag_serialization(self): | |||
def test_collect_dags_from_db(self): | |||
"""DAGs are collected from Database""" | |||
db.clear_db_dags() | |||
example_dags_folder = airflow.example_dags.__path__[0] | |||
dagbag = DagBag(example_dags_folder) | |||
dagbag = DagBag(str(example_dags_folder)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just allow DagBag
to take Path
, but this doesn’t need to be done now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was my thought -- it was orthogonal to this PR
bb5d2de
to
ed6a98f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I still need to update the docs/tutorial etc to reflect this. |
Reminder to self: finish this before 2.4 |
ed6a98f
to
6cdd00e
Compare
6cdd00e
to
92fa142
Compare
@@ -306,6 +314,8 @@ def _load_modules_from_file(self, filepath, safe_mode): | |||
if mod_name in sys.modules: | |||
del sys.modules[mod_name] | |||
|
|||
DagContext.current_autoregister_module_name = mod_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in-place operation feels a bit weird to me… Perhaps a context manager would be easier to understand and maintain? Something like
with DagContext.enable_autoregister(mod_name):
parse(...)
This probably needs some refactoring to work though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree, This is essentially used as a global variable.
It's not very easy to refactor given it's use across three or so different functions, so a context manager would be tricky. (I.e. I can't see an easy refactor to make it work right now)
291af76
to
2f793c5
Compare
This has caused a bit of confusion to committers (missing off the `as dag`) and is just more user friendly This is on by default but can be disabled by passing `auto_register=False` to a DAG
72bbc61
to
48f7281
Compare
This has caused a bit of confusion to committers (missing off the
as dag
) and is just more user friendlyThis is on by default but can be disabled by passing
auto_register=False
do a DAG