Rewrite the pass that adds mutexes for atomic nodes #8105

abadams · 2024-02-18T19:47:47Z

For O(n) nested allocate nodes, this pass was quadratic in n, even if
there was no use of atomics. This commit rewrites it to use a
linear-time algorithm, and skips it entirely after the first validation
pass if there aren't any atomic nodes.

It also needlessly used IRGraphMutators, which slowed things down,
didn't handle LargeBuffers (could overflow in the allocation),
incorrectly thought every producer/consumer node was associated with an
output buffer, and didn't print the realization name when printing the
atomic node (the body of an atomic node is only atomic w.r.t. a specific
realization).

I noticed all this because it stuck out in a profile. For resnet 50, the
rewrite that changed to a linear algorithm took this stage from 185ms
down to 6.7ms, and then skipping it entirely when it doesn't find any
atomic nodes took it to 1.5ms for the single IRVisitor check.

For local laplacian with 100 pyramid levels (which contains many nested
allocate nodes due to a large number of skip connections), the times are
5846 ms -> 16 ms -> 4.6 ms

This is built on top of #8103

This pattern has been bugging me for a long time: ``` if (scope.contains(key)) { Foo f = scope.get(key); } ``` This redundantly looks up the key in the scope twice. I've finally gotten around to fixing it. I've introduced a find method that either returns a const pointer to the value, if it exists, or null. It also searches any containing scopes, which are held by const pointer, so the method has to return a const pointer. ``` if (const Foo *f = scope.find(key)) { } ``` For cases where you want to get and then mutate, I added shallow_find, which doesn't search enclosing scopes, but returns a mutable pointer. We were also doing redundant scope lookups in ScopedBinding. We stored the key in the helper object, and then did a pop on that key in the ScopedBinding destructor. This commit changes Scope so that Scope::push returns an opaque token that you can pass to Scope::pop to have it remove that element without doing a fresh lookup. ScopedBinding now uses this. Under the hood it's just an iterator on the underlying map (map iterators are not invalidated on inserting or removing other stuff). The net effect is to speed up local laplacian lowering by about 5% I also considered making it look more like an stl class, and having find return an iterator, but it doesn't really work. The iterator it returns might point to an entry in an enclosing scope, in which case you can't compare it to the .end() method of the scope you have. Scopes are different enough from maps that the interface really needs to be distinct.

For O(n) nested allocate nodes, this pass was quadratic in n, even if there was no use of atomics. This commit rewrites it to use a linear-time algorithm, and skips it entirely after the first validation pass if there aren't any atomic nodes. It also needlessly used IRGraphMutators, which slowed things down, didn't handle LargeBuffers (could overflow in the allocation), incorrectly thought every producer/consumer node was associated with an output buffer, and didn't print the realization name when printing the atomic node (the body of an atomic node is only atomic w.r.t. a specific realization). I noticed all this because it stuck out in a profile. For resnet 50, the rewrite that changed to a linear algorithm took this stage from 185ms down to 6.7ms, and then skipping it entirely when it doesn't find any atomic nodes added 1.5 for the single IRVisitor check. For local laplacian with 100 pyramid levels (which contains many nested allocate nodes due to a large number of skip connections), the times are 5846 ms -> 16 ms -> 4.6 ms This is built on top of #8103

steven-johnson · 2024-02-18T20:39:44Z

I changed the base to abadams/scope_improvements to see just the changes here

… abadams/rewrite_atomic_pass

…c_pass

steven-johnson · 2024-02-27T18:25:48Z

Overnight test is good, ready to land (failure is irrelevant and will be fixed when llvm/llvm-project#83151 lands)

abadams added 3 commits February 17, 2024 17:28

Pacify clang-tidy

72bcf1d

steven-johnson changed the base branch from main to abadams/scope_improvements February 18, 2024 20:39

steven-johnson self-requested a review February 18, 2024 20:43

steven-johnson approved these changes Feb 18, 2024

View reviewed changes

abadams added 2 commits February 19, 2024 08:17

Fix unintentional mutation of interval in scope

1f8c8b5

Merge remote-tracking branch 'origin/abadams/scope_improvements' into…

eb94eb4

… abadams/rewrite_atomic_pass

abadams changed the base branch from abadams/scope_improvements to main February 22, 2024 18:53

Merge remote-tracking branch 'origin/main' into abadams/rewrite_atomi…

03fc785

…c_pass

Merge branch 'main' into abadams/rewrite_atomic_pass

ce16c73

abadams merged commit bf0d611 into main Mar 12, 2024
19 checks passed

BrewTestBot mentioned this pull request Jul 17, 2024

halide 18.0.0 Homebrew/homebrew-core#177657

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite the pass that adds mutexes for atomic nodes #8105

Rewrite the pass that adds mutexes for atomic nodes #8105

abadams commented Feb 18, 2024

steven-johnson commented Feb 18, 2024

steven-johnson commented Feb 27, 2024

Rewrite the pass that adds mutexes for atomic nodes #8105

Rewrite the pass that adds mutexes for atomic nodes #8105

Conversation

abadams commented Feb 18, 2024

steven-johnson commented Feb 18, 2024

steven-johnson commented Feb 27, 2024