Numerical inaccuracies in multi-device sharded toy-sized Llama #18687

Closed
sogartar opened this issue Oct 3, 2024 · 5 comments
Labels
bug 🐞 (Something isn't working) · codegen/llvm (LLVM code generation compiler backend)

Comments

sogartar (Contributor) commented Oct 3, 2024

What happened?

I have a reproducer for numerical issues when executing a sharded Llama model on llvm-cpu. It is a toy-sized model with random weights and inputs. The numerics are off compared to the PyTorch equivalent: the prefill step result is almost correct (~1e-2 absolute error), but the paged cache state is way off and everything in the decode step is wrong.

The iree-run-module reproducer attached here does not check the cache state, since it is an in-out argument.
The source code that was used to produce this data is here; there the in-out arguments are checked as well.
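
For reference, the failing check is a plain elementwise comparison against the PyTorch reference. A minimal sketch of that comparison (file names and tolerances here are illustrative, not the reproducer's actual values):

```python
import numpy as np
import torch

# Hypothetical file names; the real reproducer ships its own expected values.
iree_logits = torch.from_numpy(np.load("iree_prefill_logits.npy"))
ref_logits = torch.from_numpy(np.load("torch_prefill_logits.npy"))
iree_cache = torch.from_numpy(np.load("iree_cache_state.npy"))
ref_cache = torch.from_numpy(np.load("torch_cache_state.npy"))

# Prefill output is close to the reference (~1e-2 absolute error)...
torch.testing.assert_close(iree_logits, ref_logits, atol=2e-2, rtol=0)
# ...but the same check on the paged cache state fails by a wide margin.
torch.testing.assert_close(iree_cache, ref_cache, atol=2e-2, rtol=0)
```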

Steps to reproduce your issue

You would need this change to compile.

Download and extract sharded-toy-llama-inaccuracy-reproducer-2.zip.
./compile.sh
./run.sh

To verify the cache state you would also need to run this test.
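
Because the cache is an in-out argument rather than a return value, iree-run-module's expected-output checking cannot see it; the IREE Python runtime can. A rough sketch of such a check, assuming the module exposes a `prefill` entry point (entry-point names, shapes, and file names are placeholders):

```python
import numpy as np
from iree import runtime as ireert

config = ireert.Config("local-task")
ctx = ireert.SystemContext(config=config)
with open("model.vmfb", "rb") as f:
    ctx.add_vm_module(ireert.VmModule.copy_buffer(ctx.instance, f.read()))

device = ctx.config.device
tokens = ireert.asdevicearray(device, np.zeros((4, 16), dtype=np.int64))
# The paged cache must live in a device buffer so that an in-place update,
# if it happens, is visible to the caller after the call returns.
cache = ireert.asdevicearray(device, np.zeros((256, 128), dtype=np.float32))

logits = ctx.modules.module.prefill(tokens, cache)

ref_cache = np.load("torch_cache_state.npy")  # hypothetical reference dump
print("max abs cache error:", np.abs(cache.to_host() - ref_cache).max())
```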

What component(s) does this issue relate to?

Compiler

Version information

To reproduce, #18663 is required.

Additional context

The fix for issue #18283 does not solve this issue.

sogartar added the bug 🐞 and codegen/llvm labels on Oct 3, 2024
sogartar (Contributor, Author) commented Oct 3, 2024

FYI @IanNod.

sogartar (Contributor, Author) commented Oct 9, 2024

Here is an updated reproducer: sharded-toy-llama-inaccuracy-reproducer-2.zip.
This fix did not solve the issue.
The original reproducer is sharded-toy-llama-inaccuracy-reproducer.zip.
I have updated the description with the new reproducer.

sogartar (Contributor, Author) commented

The paged cache is passed as in-out arguments that need to be updated in place, but I don't see any usage of these arguments. They are the last two arguments of the prefill and decode functions.
My initial hypothesis is that the cache update got erroneously optimized away by dead code elimination, because we are not generating the cache-update code to be truly in-place.
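
To illustrate the hypothesis with a minimal sketch (not the actual model; `ToyCacheUpdate` is hypothetical, and this assumes a recent PyTorch where torch.export supports user-input mutation): functionalization rewrites an in-place update of an input into an out-of-place op plus a copy back into the input buffer, and if the exporter loses that copy, the update has no users and DCE can legally delete it.

```python
import torch

class ToyCacheUpdate(torch.nn.Module):
    def forward(self, x: torch.Tensor, cache: torch.Tensor) -> torch.Tensor:
        # Intended in-place update of the in-out `cache` argument.
        cache[:, : x.shape[1]] = x
        return x.sum(dim=-1)  # note: `cache` itself is never returned

ep = torch.export.export(
    ToyCacheUpdate(), (torch.randn(2, 4), torch.zeros(2, 8))
)
print(ep.graph)
# Functionalization turns the slice assignment into an out-of-place
# `slice_scatter` whose result must be threaded back to the caller
# (recorded in ep.graph_signature.user_inputs_to_mutate). If that link
# is lost when lowering to MLIR, the scatter has no users, DCE removes
# it, and the caller's cache buffer is never written.
```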

sogartar (Contributor, Author) commented

I am closing this, as it is an issue with model exportation. I will reopen if there are other problems after the exportation has been fixed.

sogartar closed this as not planned on Oct 11, 2024
sogartar (Contributor, Author) commented

This is the issue for the model exportation problem.
