[AIRFLOW-103] Allow jinja templates to be used in task params #1488
@@ -1414,6 +1411,15 @@ def get_template_context(self, session=None):
             'test_mode': self.test_mode,
         }

         # Allow task level param definitions to be rendered via jinja
         # using the context available up until this point
         if task.params:
This should be done in `render_templates` and apply only to strings (use `isinstance(obj, basestring)` to check).
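A minimal sketch of the string-only check suggested above, assuming jinja2 is available. The helper name `render_params` and the sample context are hypothetical and not part of the PR; on Python 3 `str` stands in for `basestring`.

```python
from jinja2 import Template


def render_params(params, context):
    """Return a copy of params with only the string values rendered.

    Hypothetical helper: non-string values (ints, lists, None, ...) are
    passed through untouched, as the review comment asks.
    """
    rendered = {}
    for key, value in params.items():
        if isinstance(value, str):  # `basestring` on Python 2
            rendered[key] = Template(value).render(**context)
        else:
            rendered[key] = value
    return rendered


# Only the templated string changes; the integer is left alone.
example = render_params(
    {"partition": "dt={{ ds }}", "retries": 3},
    {"ds": "2016-05-10"},
)
```

Returning a new dict rather than mutating `params` in place keeps the original templates intact, which matters if the same params object is shared between tasks.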
This comment still holds. It hasn't been addressed or contested.
I think this can result in a change of behavior that could cause trouble or slowdowns. If many tasks reference the same params object (which is a common pattern), it will get "rendered" many times. This could produce either unexpected results or inefficiencies / regressions. It'd be wise to add a new bool param to If double rendering was an issue (I think with
Yes, I can see that effectively caching values would avoid the potential slowdown of multiple jinja renders. However, would it not be possible to just store the rendered params on the first call within a TI and serve this value on subsequent calls, rather than tracking individual ids while walking through the params dict? Also, I'm confused why you'd say it lives in render_template rather than get_template_context. I wouldn't like to treat it as just another field in template_fields, since you would always want the template_fields to be able to make use of the rendered params. Or are you suggesting a separate block in render_templates?
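The "render once, then reuse" idea from this exchange could look roughly like the sketch below. `CachedParams` and `fake_render` are hypothetical names; `fake_render` uses `str.format` purely as a stand-in for a jinja render so that the number of render calls is visible.

```python
class CachedParams:
    """Render a params dict at most once and reuse the result afterwards."""

    def __init__(self, params, render):
        self._params = params
        self._render = render  # hypothetical callable: (template, context) -> str
        self._rendered = None

    def get(self, context):
        # Render on the first call only; later calls return the stored dict.
        if self._rendered is None:
            self._rendered = {
                key: self._render(value, context) if isinstance(value, str) else value
                for key, value in self._params.items()
            }
        return self._rendered


calls = []


def fake_render(template, context):
    # Stand-in for a jinja render; records each call so reuse can be checked.
    calls.append(template)
    return template.format(**context)


cached = CachedParams({"year": "{year}", "retries": 3}, fake_render)
first = cached.get({"year": 2016})
second = cached.get({"year": 2016})  # served from the cache, no second render
```

One trade-off of this sketch: the cache ignores the context passed on later calls, so it only fits the case where one task instance always renders with the same context.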
Have updated the existing PR to include a For context sake, here is an example operator from one of my DAGs. The

    create_cluster = EmrSensor(
        config='resources/emr/cluster/config.yaml',
        steps='resources/emr/steps/steps.yaml',
        params={
            'db_host': "{{ task_instance.xcom_pull(task_ids='create_db') }}",
            'year': "{{ (execution_date - macros.timedelta(days=18)).year }}",
            'month': "{{ (execution_date - macros.timedelta(days=18)).month }}"
        },
        task_id='create_cluster',
        dag=dag,
    )

So if implemented in `render_templates()` are you thinking of something like this (not tested as yet)...?

    def render_templates(self):
        task = self.task
        jinja_context = self.get_template_context()
        if hasattr(self, 'task') and hasattr(self.task, 'dag'):
            if self.task.dag.user_defined_macros:
                jinja_context.update(
                    self.task.dag.user_defined_macros)
        rt = self.task.render_template  # shortcut to method
        if self.render_params:
            jinja_context['params'] = rt('params', jinja_context['params'], jinja_context)
        for attr in task.__class__.template_fields:
            content = getattr(task, attr)
            if content:
                rendered_content = rt(attr, content, jinja_context)
                setattr(task, attr, rendered_content)

I guess it comes down to whether you view that the rendered params are special cases of normal templated variables or whether they are actually now the context for rendering - this was my original view, but after writing out what I think the updated render_templates would look like, it's a little more six of one and half a dozen of the other now. I'm happy to resubmit the PR in whichever form seems best...

NB: It seems that

    if not mark_success:
        context = self.get_template_context()
        task_copy = copy.copy(task)
        self.task = task_copy

        def signal_handler(signum, frame):
            '''Setting kill signal handler'''
            logging.error("Killing subprocess")
            task_copy.on_kill()
            raise AirflowException("Task received SIGTERM signal")
        signal.signal(signal.SIGTERM, signal_handler)
        self.render_templates(context=context)
    ...
    def render_templates(self, context=None):
        task = self.task
        jinja_context = context or self.get_template_context()
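The last snippet appears to pass the context already built in `run()` into `render_templates`, so it is not computed twice. A stripped-down sketch of that pattern follows; `TaskInstanceSketch` and its counter are hypothetical and only mimic the method names quoted in this thread.

```python
class TaskInstanceSketch:
    """Toy stand-in for a task instance; it only counts context builds."""

    def __init__(self):
        self.context_builds = 0

    def get_template_context(self):
        # In the real class this is the comparatively expensive step.
        self.context_builds += 1
        return {"ds": "2016-05-10"}

    def render_templates(self, context=None):
        # Reuse the caller's context when given, otherwise build a fresh one.
        jinja_context = context or self.get_template_context()
        return jinja_context

    def run(self):
        context = self.get_template_context()
        self.render_templates(context=context)  # no second build here
        return context


ti = TaskInstanceSketch()
ti.run()
builds_after_run = ti.context_builds
ti.render_templates()  # called without a context, so it builds one itself
builds_after_direct_call = ti.context_builds
```

Note that `context or ...` also treats an empty dict as missing; `if context is None` would be the stricter check if an empty context were ever valid.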
@mistercrunch any chance of an update?
@@ -489,7 +489,8 @@ def to_csv(
             schema='default',
             delimiter=',',
             lineterminator='\r\n',
-            output_header=True):
+            output_header=True,
+            fetch_size=1000):
Does this have the potential to break existing implementations of HiveHook, if someone is expecting > 1000 rows? I think this should default to None.
Also, is this related to the PR?
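On the `fetch_size` question above: the diff does not show how `to_csv` uses the value, so the following is only a sketch of one safe reading, where `fetch_size` is a batch size rather than a row limit and `None` means "fetch everything at once". The helper `fetch_rows` is hypothetical, and sqlite3 stands in for a Hive cursor purely to keep the example runnable.

```python
import sqlite3


def fetch_rows(cursor, fetch_size=None):
    """Yield every row from a DB-API cursor.

    fetch_size only controls how many rows are pulled per round trip;
    None falls back to a single fetchall(). Either way no rows are dropped.
    """
    if fetch_size is None:
        yield from cursor.fetchall()
        return
    while True:
        batch = cursor.fetchmany(fetch_size)
        if not batch:
            break
        yield from batch


# Stand-in table with more rows than the proposed default of 1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])

batched = list(fetch_rows(conn.execute("SELECT n FROM t"), fetch_size=1000))
unbatched = list(fetch_rows(conn.execute("SELECT n FROM t")))
```

With batching of this kind, callers expecting more than 1000 rows see the same result either way, and a `None` default simply preserves the pre-change behavior.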
I think this looks good and the explicit @mistercrunch are you ok with it?
Ah. The hive-hook change shouldn't be there. I think I must have included it from someone else's commit when doing a rebase. Will tidy it up and do a force push later today.
Current coverage is 67.78%

@@             master   #1488   diff @@
==========================================
  Files         116     116
  Lines        8285    8289     +4
  Methods         0       0
  Messages        0       0
  Branches        0       0
==========================================
+ Hits         5611    5618     +7
+ Misses       2674    2671     -3
  Partials        0       0
A bit later than planned, but the PR has been rebased again onto current master.
@withnale Thanks for all the hard work! Sorry this one got overlooked a bit. As @mistercrunch felt a bit strongly about this, I would like him to chime in before merging. So I hope you still have a bit of patience with us ;).
Hi @withnale, sorry for the long delay in reviewing this PR. If you still want to pursue this PR, please resolve the conflict and address the concern raised by @mistercrunch. You would also need to change the commit message so that it doesn't exceed the 50-character limit. We've 1 week's time before this PR gets marked as
Dear Airflow Maintainers,
Please accept this PR that addresses the following issues:
Reminders for contributors: