Hi all,

I'm on an Apple M1 and used the nanoGPT `train.py` to make a `train_gpt(...)` function so I can sweep across architectures (varying the number of layers, heads, and embedding dimensions).

When using only the CPU I get no errors, but when using `mps` I get this error:
loc("edb"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":49:0)): error: 'mps.scatter_nd' op invalid input tensor shape: updates tensor shape and data tensor shape must match along inner dimensions
/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1710: failed assertion `Error: MLIR pass manager failed'
If I use the `mps` option and keep the architecture fixed (e.g. `{'n_layer': 4, 'n_head': 4, 'n_embd': 128}`), then `mps` works without error. However, when I loop across varying architectures I get the error above.
The issue is not incompatible `n_layer`, `n_head`, and `n_embd` dimensions, and it isn't memory: I've verified that the architectures on which the error occurs work fine if run alone (rather than in a loop after a different architecture). The general rules I use for generating architectures are: 1. `n_head` is a multiple of `n_layer`, and 2. `n_embd` is divisible by `n_head`.
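For concreteness, the sweep configs are generated roughly like this (the exact ranges here are illustrative placeholders, not my real sweep; only the two divisibility rules are the ones I actually use):

```python
# Sketch of the config generation. The value ranges are illustrative;
# the two rules (n_head multiple of n_layer, n_embd divisible by n_head) are real.
configs = [
    {'n_layer': n_layer, 'n_head': n_head, 'n_embd': n_embd}
    for n_layer in (2, 4, 8)
    for n_head in (n_layer, 2 * n_layer)  # rule 1: n_head is a multiple of n_layer
    for n_embd in (64, 128, 256)
    if n_embd % n_head == 0               # rule 2: n_embd is divisible by n_head
]
```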
So it smells like the GPU cache is not getting cleared after each `train_gpt(...)` iteration, and some operation is being handed stale dimensions.
Is there a way to clear the GPU cache in the MPS framework? I've tried throwing a `torch.mps.empty_cache()` at the end of each iteration (by iteration, I mean a round of calling `train_gpt(...)` for a specific architecture, not an iteration within each training loop).
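Here's roughly what I'm trying between runs. `train_gpt(...)` is my own wrapper around nanoGPT's training loop (assume here that it returns the trained model); the `gc.collect()` and `torch.mps.synchronize()` calls are extra attempts on my part, in case dangling references or in-flight kernels are what's keeping stale state alive:

```python
import gc
import torch

for cfg in configs:
    model = train_gpt(**cfg)  # my wrapper around nanoGPT's train.py

    # Drop Python references so the model's MPS allocations become collectable.
    del model
    gc.collect()

    # Wait for any queued MPS kernels to finish, then release cached GPU memory.
    torch.mps.synchronize()
    torch.mps.empty_cache()
```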
Thanks in advance!