Replace sequence concatenation by unpacking in Airflow core #33934

Merged
merged 2 commits into from
Sep 5, 2023

Conversation

hussein-awala

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Aug 30, 2023
@hussein-awala

More readable and faster:

$ python -m timeit '[1, 2, 3] + [4] + [5] + [6] + [7]'
2000000 loops, best of 5: 179 nsec per loop

$ python -m timeit '[*[1, 2, 3], 4, 5, 6, 7]'
5000000 loops, best of 5: 79 nsec per loop
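
The same comparison can be reproduced from Python with the standard `timeit` module. This is a minimal sketch, not part of the PR: the helper name `best_ns` and the loop counts are illustrative assumptions, and absolute numbers will differ by machine and Python version.

```python
import timeit


def best_ns(stmt, setup="pass", number=200_000, repeat=5):
    """Best per-loop time in nanoseconds (hypothetical helper, not from the PR)."""
    times = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(times) / number * 1e9


concat = "[1, 2, 3] + [4] + [5] + [6] + [7]"
unpack = "[*[1, 2, 3], 4, 5, 6, 7]"

# Both expressions build the same list; only the construction differs.
assert eval(concat) == eval(unpack) == [1, 2, 3, 4, 5, 6, 7]

results = {"concat": best_ns(concat), "unpack": best_ns(unpack)}
for name, ns in results.items():
    print(f"{name}: {ns:.1f} nsec per loop")
```

Taking the minimum over several repeats mirrors the "best of 5" figure that `python -m timeit` reports.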

@@ -1699,7 +1699,7 @@ def _get_task_instances(
         if include_subdags:
             # Crafting the right filter for dag_id and task_ids combo
             conditions = []
-            for dag in self.subdags + [self]:
+            for dag in [*self.subdags, self]:

I wonder if itertools.chain would be better here
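
For comparison, here is a rough sketch of what the two loop forms would look like side by side. `FakeDag` is a hypothetical stand-in with only a `dag_id` and a `subdags` list; it is not Airflow's real DAG class, and the method names are made up for illustration.

```python
import itertools


class FakeDag:
    """Hypothetical stand-in for a DAG object; not Airflow's real class."""

    def __init__(self, dag_id, subdags=()):
        self.dag_id = dag_id
        self.subdags = list(subdags)

    def ids_with_unpacking(self):
        # Builds one temporary list holding the subdags plus self.
        return [dag.dag_id for dag in [*self.subdags, self]]

    def ids_with_chain(self):
        # Iterates lazily, but still needs a one-item list to wrap self.
        return [dag.dag_id for dag in itertools.chain(self.subdags, [self])]


parent = FakeDag("parent", [FakeDag("parent.child_a"), FakeDag("parent.child_b")])
unpacked_ids = parent.ids_with_unpacking()
chained_ids = parent.ids_with_chain()
print(unpacked_ids)
print(chained_ids)
```

Both forms visit the same objects in the same order, so the choice comes down to readability and the cost of the temporary container.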

@Lee-W Lee-W Aug 31, 2023

I ran some tests on it. Performance-wise, [*[1, 2, 3], 4, 5, 6, 7] seems to be the best solution:

$ python -m timeit '[1, 2, 3] + [4] + [5] + [6] + [7]'
2000000 loops, best of 5: 179 nsec per loop

$ python -m timeit '[*[1, 2, 3], 4, 5, 6, 7]'
5000000 loops, best of 5: 69.2 nsec per loop

$ python -m timeit 'import itertools; itertools.chain([1, 2, 3], [4, 5, 6, 7])'
2000000 loops, best of 5: 177 nsec per loop

$ python -m timeit 'import itertools; [1, 2, 3] + [4] + [5] + [6] + [7]'
1000000 loops, best of 5: 242 nsec per loop

$ python -m timeit 'import itertools;  [*[1, 2, 3], 4, 5, 6, 7]'
2000000 loops, best of 5: 131 nsec per loop

$ python -m timeit 'import itertools; list(itertools.chain([1, 2, 3], [4, 5, 6, 7]))'
1000000 loops, best of 5: 305 nsec per loop

A big chunk of this would come from import itertools though, which would not be relevant in Airflow since the module is already imported in a lot of places. I would not be surprised if * is still best for small lists though, since the itertools version still needs to build an additional list (of a single item).
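
One way to check this is to move the import into the `setup` argument so it is not counted in the timed statement. The sketch below assumes the same illustrative `best_ns` helper and loop counts as a stand-in for `python -m timeit`; none of these names come from the PR, and the numbers will vary by environment.

```python
import itertools
import timeit


def best_ns(stmt, setup="pass", number=200_000, repeat=5):
    """Best per-loop time in nanoseconds (hypothetical helper, not from the PR)."""
    times = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(times) / number * 1e9


chain_stmt = "list(itertools.chain([1, 2, 3], [4, 5, 6, 7]))"

timings = {
    # The import statement runs inside every timed loop iteration.
    "chain, import timed": best_ns("import itertools; " + chain_stmt),
    # The import runs once per repeat in setup and is excluded from the timing.
    "chain, import in setup": best_ns(chain_stmt, setup="import itertools"),
    "unpacking": best_ns("[*[1, 2, 3], 4, 5, 6, 7]"),
}

# Sanity check: both approaches produce the same list.
assert list(itertools.chain([1, 2, 3], [4, 5, 6, 7])) == [*[1, 2, 3], 4, 5, 6, 7]

for name, ns in timings.items():
    print(f"{name}: {ns:.1f} nsec per loop")
```

The equivalent on the command line is `python -m timeit -s 'import itertools' '...'`, where `-s` supplies the setup statement.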

@Lee-W Lee-W left a comment

It looks good to me if @uranusjr's suggestion is applied

@potiuk

potiuk commented Sep 3, 2023

LGTM. I think using itertools for those is indeed over-the-top. I am fine with the current version :)

@pierrejeambrun pierrejeambrun left a comment

LGTM

@uranusjr uranusjr merged commit 33e5d03 into apache:main Sep 5, 2023
51 checks passed
@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.2 milestone Oct 3, 2023
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Oct 3, 2023
ephraimbuddy pushed a commit that referenced this pull request Oct 5, 2023