Mac M1 mps error mps.scatter_nd when iterating across architectures #343

Open
mercicle opened this issue Jul 24, 2023 · 0 comments

mercicle commented Jul 24, 2023

Hi all,

I'm on an Apple M1 and used nanoGPT's train.py to make a train_gpt(...) function so I can sweep across architectures (varying the number of layers, heads, and embedding size).

When using only the cpu, I get no errors, but when using mps, I get this error:


loc("edb"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":49:0)): error: 'mps.scatter_nd' op invalid input tensor shape: updates tensor shape and data tensor shape must match along inner dimensions

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1710: failed assertion `Error: MLIR pass manager failed'

If I use the mps option and keep the architecture fixed (e.g. {'n_layer': 4, 'n_head': 4, 'n_embd': 128}), then mps works without error. However, when I loop across varying architectures, I get the error above ^^.
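
For reference, the sweep looks roughly like this (a minimal sketch; train_gpt is my own wrapper around nanoGPT's train.py, and the exact configs shown are illustrative):

```python
# Minimal sketch of the sweep; train_gpt wraps nanoGPT's train.py.
architectures = [
    {'n_layer': 4, 'n_head': 4, 'n_embd': 128},
    {'n_layer': 4, 'n_head': 8, 'n_embd': 256},
    {'n_layer': 6, 'n_head': 6, 'n_embd': 384},
]

for arch in architectures:
    # Each call builds a fresh model on the 'mps' device and runs a full training loop.
    # The first architecture trains without error; the mps.scatter_nd failure
    # shows up on a later iteration with a different architecture.
    train_gpt(n_layer=arch['n_layer'], n_head=arch['n_head'], n_embd=arch['n_embd'], device='mps')
```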

The issue is not one of incompatible n_layer, n_head, and n_embd dimensions, and it isn't a memory issue: I've verified that the architectures that trigger the error train fine when run alone (rather than in a loop after a different architecture). The general rules I use for generating architectures are: 1. n_head is a multiple of n_layer, and 2. n_embd is divisible by n_head (see the sketch below).
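
For concreteness, the candidate architectures are generated by something like this (hypothetical sketch; the value ranges are illustrative, not my actual sweep):

```python
# Hypothetical sketch of the architecture-generation rules (illustrative values only).
def make_architectures(layer_counts=(2, 4, 6), head_multipliers=(1, 2), embd_sizes=(128, 256, 384)):
    archs = []
    for n_layer in layer_counts:
        for m in head_multipliers:
            n_head = m * n_layer              # rule 1: n_head is a multiple of n_layer
            for n_embd in embd_sizes:
                if n_embd % n_head == 0:      # rule 2: n_embd is divisible by n_head
                    archs.append({'n_layer': n_layer, 'n_head': n_head, 'n_embd': n_embd})
    return archs
```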

So it smells like the GPU cache is not getting cleared after each train_gpt(...) iteration, and some operation is still expecting stale tensor dimensions.

Is there a way to clear the GPU cache in the mps framework? I've tried calling torch.mps.empty_cache() at the end of each iteration (by iteration, I mean one round of calling train_gpt(...) for a specific architecture, not an iteration within each training loop).
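
Roughly what that attempt looks like (a sketch; torch.mps.empty_cache() is PyTorch's call for releasing cached MPS memory, and train_gpt / architectures are my own code from the sweep sketch above):

```python
import torch

for arch in architectures:
    train_gpt(**arch, device='mps')   # one complete training run for this architecture
    torch.mps.empty_cache()           # attempt to release cached MPS memory between runs
```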

Thanks in advance!
