
Improve checks on transformer cache #881

Merged: 5 commits into marian-nmt:master on Jan 24, 2022

Conversation

@graemenail (Member) commented Sep 15, 2021

Description

This PR strengthens the check on the cache in transformer attention, and improves cache access.

As background: @fiqas and I were encountering segfaults in some transformer models, on a custom branch, during decoding on both GPU and CPU. The crash emerged during a call to suppressWords, in which suppressedWordIndices had been corrupted and pointed to an index beyond the vocabulary size. We traced the issue back to a bdot(q,k,...) operation in Attention where the batch dimension of q is smaller than that of k. In MultiHead, the former input is not cached, while the latter is cached and is currently only recomputed when the total number of elements of the relevant input changes. We were unlucky enough to hit a situation in which a stale cached value was kept.
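To illustrate the problem, here is a minimal sketch of an element-count-based reuse condition; it is not the actual transformer.h code, and the names (cache_, prefix, the shape layout) are invented for the example:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the shapes and cache discussed above;
// not the marian-dev implementation.
struct Shape {
  std::vector<int> dims;
  int elements() const {
    int n = 1;
    for (int d : dims) n *= d;
    return n;
  }
};

struct CachedExpr { Shape shape; /* cached tensor omitted */ };

std::map<std::string, CachedExpr> cache_;

// Old-style reuse condition (as described above): the cached keys are kept
// as long as the total element count matches, even if the batch dimension
// of the input has changed in the meantime.
bool reuseByElementCount(const std::string& prefix, const Shape& keysShape) {
  auto it = cache_.find(prefix + "_keys");
  return it != cache_.end() &&
         it->second.shape.elements() == keysShape.elements();
}

int main() {
  cache_["l1_keys"] = CachedExpr{Shape{{8, 4, 512}}};  // batch 8, length 4
  Shape newKeys{{4, 8, 512}};                          // batch 4, length 8
  // Same element count, different batch dimension: the stale entry is reused,
  // which is exactly the situation that fed a mismatched k into bdot.
  assert(reuseByElementCount("l1", newKeys));
  return 0;
}
```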

The memory corruption seems to have occurred because the node operation for bdot(a,b,...) implicitly assumes that a has the larger (or equal) batch dimension and uses it when setting the resulting shape, while the tensor operation ProdBatched takes the maximum of the two batch sizes. Ideally I would add some logic here to handle mismatched batch dimensions, but I notice that these operations have been moved to _legacy in favour of a new implementation.
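The shape assumption can be sketched like this (illustrative only, not the marian-dev bdot/ProdBatched code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative batched-product skeleton, not the real implementation.
struct Batched { int batch, rows, cols; };

std::vector<float> prodBatchedSketch(const Batched& a, const Batched& b) {
  // The node operation sizes the result from a's batch dimension
  // (the implicit "a has the larger or equal batch" assumption)...
  std::vector<float> c(static_cast<std::size_t>(a.batch) * a.rows * b.cols, 0.f);
  // ...while the tensor operation iterates over the maximum of the two
  // batch sizes, as described above for ProdBatched.
  int loops = std::max(a.batch, b.batch);
  for (int i = 0; i < loops; ++i) {
    std::size_t offset = static_cast<std::size_t>(i) * a.rows * b.cols;
    // When b.batch > a.batch (smaller q, larger cached k), offsets for
    // i >= a.batch lie outside c: an out-of-bounds write that can corrupt
    // neighbouring buffers such as suppressedWordIndices.
    (void)offset;  // per-batch matrix product omitted
  }
  return c;
}
```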

While I have not seen this arise on master, replacing the check on the number of elements with a check on the input shape from which the cache entry was computed is a stronger requirement, and should eliminate such calls to bdot.
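One way to express the strengthened requirement is sketched below; this is not the literal patch (the later discussion in this thread suggests the merged change effectively folds the shape into the cache key), and the helper names are invented:

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative shape-keyed cache; not the actual patch.
std::string shapeKey(const std::string& prefix, const std::vector<int>& dims) {
  std::string key = prefix + "_keys";
  for (int d : dims) key += "_" + std::to_string(d);
  return key;
}

struct CachedExpr { /* cached projection omitted */ };
std::map<std::string, CachedExpr> cache_;

// Strengthened requirement: a cache entry is reused only if it was computed
// from an input with exactly this shape; any dimension change forces a
// recompute, so bdot never sees a k left over from a different batch layout.
CachedExpr& cachedKeys(const std::string& prefix, const std::vector<int>& dims) {
  auto key = shapeKey(prefix, dims);
  auto it = cache_.find(key);
  if (it == cache_.end())
    it = cache_.emplace(key, CachedExpr{}).first;  // recompute projection here
  return it->second;
}
```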

List of changes:

  • Transformer attention cache is checked against the shape of the input
  • Improved access to the cache
  • Remove trailing whitespaces

Added dependencies: none

How to test

I tested the fix on the impacted model on our branch. I also ran the regression tests on this PR, after updating the expected outputs to match those obtained from current master on the same machine.

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@kpu (Member) commented Oct 4, 2021

Brought this up with @emjotde today; he says he will take a look. Might want a hash specialization in hash.h.

@kpu kpu requested a review from emjotde November 1, 2021 17:24
@snukky (Member) commented Nov 1, 2021

The automatic check with Ubuntu 16.04 is just an invalid artifact of previous runs. Ubuntu 16.04 has already been removed and this PR passes all required checks.

@emjotde emjotde self-assigned this Nov 1, 2021
@snukky snukky merged commit 894a07a into marian-nmt:master Jan 24, 2022
@emjotde (Member) commented May 29, 2022

@graemenail Hi, I have to revert this PR internally. It's actually causing a ton of "memory leaks" in the memory allocator during decoding. I wonder how you guys never ran into that.

@emjotde (Member) commented May 29, 2022

I can take a look later at how to get this back, but for now it is causing many more bugs than it's solving.

@graemenail (Member, Author) commented:

Hi @emjotde; that's fine - I have no strong feelings about this code. It was only ever meant to be a stopgap until a memoized solution was implemented.

About the memory leaks, was it specifically the changes in this PR, or did the previous implementation also suffer? I think this PR will keep more objects in the cache as the tensor shape is now part of the cache key.
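For illustration only (a made-up sketch, not the merged code): if the shape is folded into the key and nothing is ever evicted, every distinct batch/length combination seen during decoding leaves its own entry behind.

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
  // Value is a stand-in for a cached tensor; shapes and ranges are invented.
  std::map<std::string, int> cache_;
  for (int batch = 1; batch <= 64; ++batch)
    for (int length = 1; length <= 200; ++length)
      cache_["l1_keys_" + std::to_string(batch) + "_" + std::to_string(length)] = 0;
  // 12800 live entries for a single layer, versus one slot before the change;
  // with real tensors behind them this is the kind of growth that ends in OOM.
  std::cout << cache_.size() << " cached entries\n";
  return 0;
}
```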

@emjotde (Member) commented May 30, 2022

I did a git bisect, and it was this commit. During decoding with a large ensemble there was a growing memory allocation, not really a leak, but it would eventually result in an OOM.

@graemenail (Member, Author) commented:

That sounds like it's the cache key. The cache is internal to the transformer and persists until it goes out of scope, which is seemingly too long.

Is the sync from internal coming soon? Otherwise I'll patch this today to be more like the old implementation, but retain the check on shape.
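A sketch of that idea (hypothetical, not the eventual fix): keep a single slot per cache name as before, but store the shape it was computed from and recompute on any mismatch, so memory stays bounded while the shape check is retained.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical single-slot cache: one entry per name, overwritten rather than
// accumulated when the source shape changes.
struct CacheEntry {
  std::vector<int> srcShape;   // shape the cached value was computed from
  std::vector<float> value;    // stand-in for the cached projection
};

std::map<std::string, CacheEntry> cache_;

const std::vector<float>& cachedOrRecompute(
    const std::string& name, const std::vector<int>& srcShape,
    const std::function<std::vector<float>()>& compute) {
  auto& entry = cache_[name];          // at most one slot per name
  if (entry.srcShape != srcShape) {    // shape check retained from this PR
    entry.srcShape = srcShape;
    entry.value = compute();           // recompute and overwrite in place
  }
  return entry.value;
}
```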

@emjotde (Member) commented May 30, 2022

Yes, about to sync now. It was this issue that made me delay the sync, since I thought I had introduced it with something internal. I have already reverted it; that was easy enough considering how local this PR is. We can then just re-open it and see if we get the memory growth under control. I have a good test case now (I cannot share it, unfortunately, but I can run it).

emjotde added a commit that referenced this pull request May 30, 2022
This PR reverts changes to transformer caching (public PR #881)

It seems to cause catastrophic memory leaks or incorrect de-allocation during decoding.
@emjotde (Member) commented May 30, 2022

Synced. I think I will do a release now too.

@graemenail (Member, Author) commented:

Thanks @emjotde - sorry for the headache! We can revisit the caching; I'll try to dig up the model we had the issue with.

@emjotde (Member) commented May 30, 2022

No biggie. I will actually hold off on the release until internal engineering confirms that all the production test cases run smoothly, i.e. in a day or two.

graemenail added a commit to graemenail/marian-dev that referenced this pull request Jun 8, 2022