[CI] Perform snapshot based model result test #2844

Closed
wants to merge 22 commits

Conversation

@simon-mo (Collaborator) commented Feb 13, 2024

This PR implements snapshot-based testing in CI so that the model output in float32 exactly matches the Hugging Face version.
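
For context, a minimal sketch of what such a snapshot comparison could look like; the model name, prompts, token budget, and helper function below are illustrative assumptions, not the actual code in this PR:

```python
# Hedged sketch of a snapshot-style test: generate greedily in float32 with
# both Hugging Face Transformers and vLLM, then assert the texts match.
# MODEL, PROMPTS, and MAX_TOKENS are placeholder choices for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"  # small stand-in model for illustration
PROMPTS = ["vLLM is a high-throughput and memory-efficient inference engine."]
MAX_TOKENS = 64


def hf_reference(prompts):
    """Produce reference outputs with greedy decoding in float32."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
    outputs = []
    for prompt in prompts:
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        generated = model.generate(
            input_ids, max_new_tokens=MAX_TOKENS, do_sample=False
        )
        # Decode only the newly generated tokens, mirroring vLLM's output shape.
        outputs.append(
            tokenizer.decode(
                generated[0][input_ids.shape[1]:], skip_special_tokens=True
            )
        )
    return outputs


def test_vllm_matches_hf_snapshot():
    llm = LLM(model=MODEL, dtype="float32")
    params = SamplingParams(temperature=0.0, max_tokens=MAX_TOKENS)
    vllm_outputs = [out.outputs[0].text for out in llm.generate(PROMPTS, params)]
    assert vllm_outputs == hf_reference(PROMPTS)
```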

@simon-mo mentioned this pull request on Feb 13, 2024
@sahilsuneja1 (Contributor) commented

Hi @simon-mo, what's the status of this PR? I updated it a bit based on the current master (PR here), and I observe different outputs for vLLM vs. HF for Deci/DeciLM-7b. Example below:

prompt: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.

HF: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and it is compatible with all Transformers models.\nLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and it is compatible with all Transformers models.\nLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library

vLLM: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and is compatible with all of the same models and datasets.\nThe LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and is compatible with all of the same models and datasets.\nLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Trans

@simon-mo (Collaborator, Author) commented Mar 6, 2024

@sahilsuneja1, thanks! I don't have bandwidth to work on this PR at the moment, so if you can create another PR based on your fixes, that would be awesome. As you can see, the primary issue in CI is that the test runs out of memory even with --tensor-parallel-size 2 on our CI machine (L4 x 2), which is quite odd because I was able to run the test locally on the same hardware. If you can take over and iterate, that would be appreciated!
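
For reference, a minimal sketch of loading a model sharded across the two CI GPUs via vLLM's public LLM API; the exact model and values here are assumptions (gpu_memory_utilization is one standard knob to experiment with when chasing an OOM like this), not necessarily what the PR's test harness sets:

```python
# Hedged sketch: load the model under test sharded across both CI GPUs.
# All values below are illustrative; the PR's actual configuration may differ.
from vllm import LLM

llm = LLM(
    model="Deci/DeciLM-7b",
    dtype="float32",              # snapshot tests compare float32 outputs
    tensor_parallel_size=2,       # shard weights across the two L4 GPUs
    gpu_memory_utilization=0.9,   # fraction of each GPU vLLM may claim
    trust_remote_code=True,       # DeciLM ships custom modeling code
)
```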

@sahilsuneja1 (Contributor) commented Mar 6, 2024

Got it, thanks @simon-mo, will do.
Where do you see the memory error? Probably in the run prior to the latest merge? How can I see that OOM run log?

@sahilsuneja1 (Contributor) commented

Found the run history: https://buildkite.com/vllm/ci/builds?branch=simon-mo%3Amodels-snapshot-testing
Where's the OOM error?

@simon-mo (Collaborator, Author) commented

Actually, I'm going to close this PR because I think snapshot testing is still too brittle (it needs to be updated on a model-by-model basis). We should just get an A100 machine to run these tests.

@simon-mo closed this on Mar 16, 2024
@sahilsuneja1 (Contributor) commented

OK, I'll close the update PR as well.

@simon-mo (Collaborator, Author) commented

Sorry about the back and forth!

@sahilsuneja1 (Contributor) commented

No worries!
