Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit/rewrite PNT30 flake8 lint failures to achieve more concurrency #18365

Open
4 of 17 tasks
huonw opened this issue Feb 24, 2023 · 1 comment
Open
4 of 17 tasks

Audit/rewrite PNT30 flake8 lint failures to achieve more concurrency #18365

huonw opened this issue Feb 24, 2023 · 1 comment

Comments

@huonw
Copy link
Contributor

huonw commented Feb 24, 2023

Is your feature request related to a problem? Please describe.

In #18303, a new flake8 rule is added that flags constructs like [await Get(...) for ... in ...]. If these are embarrassingly parallel (i.e. no dependencies between iterations), they should be rewritten to use MultiGet: for instance, await MultiGet(Get(...) for ... in ...).

That PR added noqa: PNT30: requires triage annotations to many instances of this, rather than fix them. This issue is about going through and either fixing them or making the annotation more specific.

Embarrassingly parallel:

  • src/python/paths/backend/go/goals/debug_goals.py
  • src/python/paths/backend/go/goals/generate.py
  • src/python/paths/backend/go/util_rules/cgo.py
  • src/python/paths/backend/go/util_rules/go_bootstrap.py
  • src/python/paths/backend/go/util_rules/goroot.py
  • src/python/paths/backend/helm/util_rules/post_renderer.py
  • src/python/paths/backend/python/framework/django/detect_apps.py
  • src/python/paths/bsp/util_rules/targets.py
  • src/python/paths/core/goals/deploy.py
  • src/python/paths/core/goals/update_build_files.py Triage some noqa: PNT30 await-in-loops #18831
  • src/python/paths/engine/internals/build_files.py Triage some noqa: PNT30 await-in-loops #18831
  • src/python/paths/engine/internals/dep_rules.py
  • src/python/paths/jvm/goals/lockfile.py
  • src/python/paths/source/source_root.py Triage some noqa: PNT30 await-in-loops #18831

Embarrassingly parallel, except there's an early return, so just moving all invocations to a MultiGet might result in more work total.

  • src/python/paths/backend/cc/util_rules/toolchain.py
  • src/python/paths/backend/python/goals/setup_py.py
  • src/python/paths/core/subsystems/python_bootstrap.py

Describe the solution you'd like
For each of the files above, rewrite the loops that call await Get(...) internally to use MultiGet.

Describe alternatives you've considered
N/A
Additional context
N/A

huonw added a commit that referenced this issue Feb 25, 2023
This fixes #7491 by implementing a flake8 rule that disallows using
`await Get(...)` or `await MultiGet(...) inside the body of a loop.
Generally, these should be `MultiGet` to allow them to run concurrently,
AIUI.

```python
# BAD: each Get runs sequentially
[await Get(X, Y(i)) for i in range(10)]
# GOOD: each Get runs in parallel
await MultiGet(Get(X, Y(i)) for i in range(10))
```

This rule sometimes flags correct code as an error: for instance,
running formatting tools is inherently sequential as one needs to use
the output of one as the input to the next. This can be allowed on a
case-by-case basis via the `# noqa: PNT30` annotation.

In order to get this rule landed faster (to reduce the chance of new
issues appearing), I haven't tried to fix the existing issues flagged by
the rule: this PR just applies that annotation to all current examples.
For some that are obviously sequential, I've added a note explaining,
while the rest have just `requires triage`. #18365 covers training
these.

I believe that sequential code will be the exception rather than the
rule, and so occasional `noqa: PNT30` annotations is fine, vs. making
the flake8 rule significantly more complicated (e.g. doing dataflow
analysis to determine if the result of one `await` call might influence
the next). For instance, one estimate:

- `rg -t py MultiGet | grep -v test | wc -l` reveals 647 `MultiGet`
calls
- this PR only has to add 28 `# noqa: PNT30` annotations, and only 8 of
these are definitely sequential (i.e. 20 of these might benefit from
being `MultiGet`s)

This doesn't flag loops containing `await some_helper()` where `async
def some_helper(): ...` might contain various `await Get(...)`s, because
there's currently no useful way to rewrite these to run concurrently
(#18364).
@kaos
Copy link
Member

kaos commented Feb 25, 2023

This one was never an issue, as there is a return at the end of the for-body it effectively works as an if-then block.

for request_type in request_types:
impl = await Get( # noqa: PNT30: this for loop will never process more than a single iteration.
BuildFileDependencyRulesImplementation,
BuildFileDependencyRulesImplementationRequest,
request_type(),
)
return MaybeBuildFileDependencyRulesImplementation(impl.build_file_dependency_rules_class)

Opted to fix the noqa text in #18112 to save on CI cycles.

huonw added a commit that referenced this issue Apr 25, 2023
This triages/resolves a few `# noqa: PNT30` exclusions about the
concurrency hazard of awaits in loops:

- two instances that almost-trivially can switch from (something like)
`(await Get(args) for ...)` to `await MultiGet(args for ...)`
- one instance that's actually sequential

This relates to #18365.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants