Show custom instance names for a mapped task in UI #36797

RNHTTR · 2024-01-15T16:19:46Z

uranusjr · 2024-01-18T06:22:02Z

I’m thinking the following flow:

1. Define map_index_template on the operator (done).
2. Instead of just forwarding the template to TaskInstance, we only store the rendered result on it.
   - Say we call it map_rendered_name (bikeshedding welcomed).
   - We don’t really want to show the template anywhere anyway (it’s arguably worse than map_index without rendering since every mapped instance will have the exact same value).
   - When a task instance is created (not yet run), the field will be NULL in the database (trivial, all columns are nullable by default).
3. In _run_raw_task, we render the value (like you already did), and store the result in self.map_rendered_name.
   - This will be flushed to the database when the function ends along with other states.
4. Show the value in the web UI.

We’ll also need a database migration for the new field on TaskInstance.

Does this sound reasonable?
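
The flow proposed above can be sketched without Airflow at all. This is only an illustration under loud assumptions: a toy in-memory SQLite table stands in for the real task_instance table, `string.Template` stands in for Airflow's Jinja rendering, and the helper names `create_task_instance`, `run_task_instance` and `display_name` are hypothetical. The column is called rendered_map_index, the name the PR's later commits use.

```python
import sqlite3
from string import Template

# Toy stand-in for the task_instance table (assumption: only three columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, map_index INTEGER, rendered_map_index TEXT)"
)


def create_task_instance(task_id, map_index):
    # Created but not yet run: rendered_map_index is left NULL.
    conn.execute(
        "INSERT INTO task_instance (task_id, map_index) VALUES (?, ?)",
        (task_id, map_index),
    )


def run_task_instance(task_id, map_index, map_index_template, context):
    # Render at run time and store only the result, never the template.
    # string.Template is a stand-in for Jinja here.
    rendered = Template(map_index_template).substitute(context)
    conn.execute(
        "UPDATE task_instance SET rendered_map_index = ? "
        "WHERE task_id = ? AND map_index = ?",
        (rendered, task_id, map_index),
    )
    return rendered


def display_name(task_id, map_index):
    # What a UI could show: the rendered value, else the raw map index.
    row = conn.execute(
        "SELECT rendered_map_index FROM task_instance "
        "WHERE task_id = ? AND map_index = ?",
        (task_id, map_index),
    ).fetchone()
    return row[0] if row[0] is not None else str(map_index)


create_task_instance("process_file", 0)
create_task_instance("process_file", 1)
run_task_instance("process_file", 0, "file-$name", {"name": "a.csv"})
```

In this sketch the instance that has run is labelled with its rendered name, while the one that has not run yet still falls back to its raw map index.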

uranusjr · 2024-01-29T06:17:25Z

Also need to look into test failures and static check errors, and fix them.

uranusjr · 2024-02-21T11:22:56Z

Forgot to actually make rendered_map_index visible in the UI 🤦

uranusjr · 2024-02-23T06:00:11Z

Some todos after this PR is merged:

- There are still places in the UI where we show the raw map index. We should change them to show either the rendered value or both (e.g. in table views).
- In the REST API, should we add a way for the user to filter with the rendered value instead of map_index? A new parameter is probably needed if we do. But I’m not exactly sure we need to do it (since the API is for machines anyway?).

MrBounty · 2024-02-23T14:17:47Z

Hello,

I'm new to Airflow and open-source in general, and I'm currently implementing Airflow within my work. I'm very interested in the feature you're working on. Thank you for your contributions!

I'd like to ask about the possibility of getting access to this feature before its official release. I understand it may be part of the upcoming Airflow 2.9.0 version, which is currently 83% complete without a set due date.

Would it be possible to access the feature by creating a branch from my version of Airflow that includes your changes? I'm still learning the ropes of Git, so any guidance on the feasibility of this would be greatly appreciated.

If this approach is possible, I'm happy to test the feature and provide feedback.
Thank you,

potiuk · 2024-02-23T14:35:02Z

> Would it be possible to access the feature by creating a branch from my version of Airflow that includes your changes? I'm still learning the ropes of Git, so any guidance on the feasibility of this would be greatly appreciated.

Sure, but it does not come with any guarantees whatsoever: it might break at any time, and if it breaks it might stay broken for a long time. However, if you want to, you can definitely run Airflow locally from the latest main, and if you see any issues you can even fix them via a PR (this is what most people contributing here are doing).

You can follow the contribution flow (see https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst as an entry point). You will then have a local virtualenv or the Breeze containerized development environment working, where you can more or less easily run Airflow. But you have to be prepared for all the cases: having to wipe your database, reinstall things from scratch, having your machine blow up because of used memory and CPU, and all the other bad things that can happen during development. We cannot recommend it for running anything that resembles production, and you should not put too much faith in being able to continue using the same instance when 2.9.0 is released; most likely you will have to reinstall everything from scratch and wipe your database (and you will likely do it several times before the release date). But it also gives you the opportunity to contribute back and improve things.

Also, for that you need to learn Git, branching, etc. The contributing guide expects that you know what you are doing there.

There is no other "approach" that allows you to run things which are not officially released in production. Whatever we do here is purely for development and contribution purposes. If you want to be exclusively a user: releasing the software is a Legal Act of the Apache Software Foundation, and only when the software is formally released, with 3 binding +1 votes from PMC members, should it be used by users who are not contributors. This has legal and licensing implications, and even if we would like to, we cannot ever say that whatever we have in the repo is "usable" by users. It's usable to do contributions. Full stop.

bbovenzi

UI parts look good to me. We can go through and find more areas where we want to show rendered map index in a follow-up PR.

tirkarthi · 2024-02-26T13:15:11Z

Are migrations being worked on in another PR? This breaks main branch testing for me since rendered_map_index doesn't have a migration.

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: task_instance.rendered_map_index
[SQL: SELECT task_instance.try_number, task_instance.task_id, task_instance.dag_id, task_instance.run_id, task_instance.map_index, task_instance.start_date, task_instance.end_date, task_instance.duration, task_instance.state, task_instance.max_tries, task_instance.hostname, task_instance.unixname, task_instance.job_id, task_instance.pool, task_instance.pool_slots, task_instance.queue, task_instance.priority_weight, task_instance.operator, task_instance.custom_operator_name, task_instance.queued_dttm, task_instance.queued_by_job_id, task_instance.pid, task_instance.executor_config, task_instance.updated_at, task_instance.rendered_map_index, task_instance.external_executor_id, task_instance.trigger_id, task_instance.trigger_timeout, task_instance.next_method, task_instance.next_kwargs, dag_run_1.state AS state_1, dag_run_1.id, dag_run_1.dag_id AS dag_id_1, dag_run_1.queued_at, dag_run_1.execution_date, dag_run_1.start_date AS start_date_1, dag_run_1.end_date AS end_date_1, dag_run_1.run_id AS run_id_1, dag_run_1.creating_job_id, dag_run_1.external_trigger, dag_run_1.run_type, dag_run_1.conf, dag_run_1.data_interval_start, dag_run_1.data_interval_end, dag_run_1.last_scheduling_decision, dag_run_1.dag_hash, dag_run_1.log_template_id, dag_run_1.updated_at AS updated_at_1, dag_run_1.clear_number 
FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id 
WHERE task_instance.dag_id = ? AND dag_run.execution_date >= ? AND dag_run.execution_date <= ? AND ((task_instance.task_id, task_instance.map_index) NOT IN (SELECT 1, 1 FROM (SELECT 1, 1) WHERE 1!=1)) ORDER BY dag_run.execution_date]
[parameters: ('task_duration', '2024-01-07 00:00:00.000000', '2024-01-10 05:41:51.314018')]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

tirkarthi · 2024-02-26T13:38:05Z

Migration PR in #37708
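
The failure reported above can be reproduced in miniature with the standard-library sqlite3 module. This is a sketch under stated assumptions: the two-column table is a toy, not Airflow's schema, and a plain ALTER TABLE stands in for a real Alembic migration; it does not show what #37708 actually contains.

```python
import sqlite3

# Toy table created before the new column existed (assumption: two columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, map_index INTEGER)")
conn.execute("INSERT INTO task_instance VALUES ('task_duration_demo', 0)")

# The updated model now selects a column the table does not have yet.
QUERY = (
    "SELECT task_instance.task_id, task_instance.rendered_map_index "
    "FROM task_instance"
)


def try_query():
    # Return (rows, None) on success or (None, error message) on failure.
    try:
        return conn.execute(QUERY).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


# Before the schema change: fails with a "no such column" error.
rows_before, error_before = try_query()

# Stand-in for the migration: add the nullable column.
conn.execute("ALTER TABLE task_instance ADD COLUMN rendered_map_index TEXT")

# After the schema change: the query works and existing rows hold NULL.
rows_after, error_after = try_query()
```

Existing rows simply get NULL in the new column, which matches the earlier point in this thread that a task instance that has not run yet has no rendered value.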

MrBounty · 2024-02-27T10:50:32Z

Thank you! This is what I was missing, especially how to contribute. I will take a look.

boring-cyborg bot added the area:serialization label Jan 15, 2024

Bisk1 reviewed Jan 15, 2024

airflow/models/dagrun.py Outdated

uranusjr reviewed Jan 18, 2024

airflow/utils/context.py

RNHTTR added the area:dynamic-task-mapping AIP-42 label Jan 18, 2024

uranusjr reviewed Jan 19, 2024

airflow/models/taskinstance.py Outdated

RNHTTR requested a review from uranusjr January 20, 2024 17:44

uranusjr reviewed Jan 23, 2024

airflow/decorators/base.py Outdated

uranusjr reviewed Jan 23, 2024

airflow/models/baseoperator.py

uranusjr force-pushed the custom-mapped-task-indices branch from 5ab2fde to 167af3c January 23, 2024 06:59

uranusjr reviewed Jan 23, 2024

airflow/models/taskinstance.py Outdated

uranusjr reviewed Jan 23, 2024

airflow/serialization/serialized_objects.py

uranusjr reviewed Jan 23, 2024

airflow/models/taskinstance.py Outdated

uranusjr reviewed Jan 23, 2024

airflow/models/taskinstance.py Outdated

RNHTTR force-pushed the custom-mapped-task-indices branch 2 times, most recently from 52a4979 to 02ca753 January 23, 2024 20:47

uranusjr reviewed Jan 24, 2024

airflow/serialization/serialized_objects.py Outdated

uranusjr reviewed Jan 24, 2024

airflow/models/taskinstance.py Outdated

uranusjr reviewed Jan 24, 2024

airflow/models/taskinstance.py Outdated

uranusjr reviewed Jan 29, 2024

airflow/models/taskinstance.py Outdated

uranusjr reviewed Jan 29, 2024

tests/models/test_mappedoperator.py Outdated

RNHTTR mentioned this pull request Feb 7, 2024

Names for expanded tasks #23020

Closed

2 tasks

RNHTTR added 7 commits February 7, 2024 15:33

draft for map_index_template to overwrite map_index in the UI

5ec12b7

add rendered_map_index to TI table based on map_index_template

4d49f79

address comments

b026f33

rebase

f2d853e

rebase

4d7ac49

add tests

892d2a4

fix static checks

dc63d1a

uranusjr removed request for kaxil, XD-DENG and pierrejeambrun February 21, 2024 11:22

RNHTTR added 2 commits February 21, 2024 15:29

reflect mapped ti label in UI

1e8f3f5

remove console.log

025573f

bbovenzi reviewed Feb 21, 2024

airflow/www/static/js/dag/details/taskInstance/MappedInstances.tsx Outdated

RNHTTR and others added 3 commits February 21, 2024 19:04

Update airflow/www/static/js/dag/details/taskInstance/MappedInstances…

00e45a4

….tsx Co-authored-by: Brent Bovenzi

render custom map index in other locations

2779f68

Merge branch 'main' into custom-mapped-task-indices

12ad5d7

bbovenzi reviewed Feb 22, 2024

airflow/www/static/js/dag/details/taskInstance/MappedInstances.tsx Outdated

RNHTTR and others added 2 commits February 22, 2024 18:45

fix UI static checks

3984384

Merge branch 'main' into custom-mapped-task-indices

e44fb3f

RNHTTR marked this pull request as ready for review February 23, 2024 03:42

RNHTTR requested review from uranusjr and bbovenzi February 23, 2024 03:42

uranusjr approved these changes Feb 23, 2024

bbovenzi approved these changes Feb 23, 2024

uranusjr merged commit 6a00111 into apache:main Feb 24, 2024
59 checks passed

tirkarthi mentioned this pull request Feb 26, 2024

Add migration for rendered_map_index column in task_instance table. #37708

Merged

ephraimbuddy added the type:new-feature Changelog: New Features label Mar 6, 2024

anteverse mentioned this pull request Apr 17, 2024

Move render map index method and apply to dry run #39087

Closed
