[CI] Perform snapshot based model result test #2844

Closed
wants to merge 22 commits

Conversation

@simon-mo (Collaborator) commented Feb 13, 2024

This PR implements snapshot-based testing in CI so that the model output in float32 exactly matches the Hugging Face version.
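
For context, a minimal sketch of what such a snapshot comparison could look like; the model name, prompts, token budget, and helper function below are illustrative assumptions, not the actual code in this PR:

```python
# Hedged sketch of a snapshot-style test: generate greedily in float32 with
# both Hugging Face Transformers and vLLM, then assert the texts match.
# MODEL, PROMPTS, and MAX_TOKENS are placeholder choices for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"  # small stand-in model for illustration
PROMPTS = ["vLLM is a high-throughput and memory-efficient inference engine."]
MAX_TOKENS = 64


def hf_reference(prompts):
    """Produce reference outputs with greedy decoding in float32."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
    outputs = []
    for prompt in prompts:
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        generated = model.generate(
            input_ids, max_new_tokens=MAX_TOKENS, do_sample=False
        )
        # Decode only the newly generated tokens, mirroring vLLM's output shape.
        outputs.append(
            tokenizer.decode(
                generated[0][input_ids.shape[1]:], skip_special_tokens=True
            )
        )
    return outputs


def test_vllm_matches_hf_snapshot():
    llm = LLM(model=MODEL, dtype="float32")
    params = SamplingParams(temperature=0.0, max_tokens=MAX_TOKENS)
    vllm_outputs = [out.outputs[0].text for out in llm.generate(PROMPTS, params)]
    assert vllm_outputs == hf_reference(PROMPTS)
```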

@simon-mo mentioned this pull request on Feb 13, 2024
@sahilsuneja1 (Contributor) commented

Hi @simon-mo, what's the status of this PR? I updated it a bit based on the current master (PR here), and I observe different outputs for vLLM vs. HF for Deci/DeciLM-7b. Example below:

prompt: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.

HF: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and it is compatible with all Transformers models.\nLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and it is compatible with all Transformers models.\nLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library

vLLM: vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and is compatible with all of the same models and datasets.\nThe LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Transformers library, and is compatible with all of the same models and datasets.\nLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be a drop-in replacement for HuggingFace's Trans

@simon-mo (Collaborator, Author) commented Mar 6, 2024

@sahilsuneja1, thanks! I don't have bandwidth to work on this PR at the moment, so if you can create another PR based on your fixes, that would be awesome. As you can see, the primary issue in CI is that the test runs out of memory even with --tensor-parallel-size 2 on our CI machine (L4 x 2), which is quite odd because I was able to run the test locally on the same hardware. If you can take over and iterate, that would be appreciated!
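
For reference, a minimal sketch of loading a model sharded across the two CI GPUs via vLLM's public LLM API; the exact model and values here are assumptions (gpu_memory_utilization is one standard knob to experiment with when chasing an OOM like this), not necessarily what the PR's test harness sets:

```python
# Hedged sketch: load the model under test sharded across both CI GPUs.
# All values below are illustrative; the PR's actual configuration may differ.
from vllm import LLM

llm = LLM(
    model="Deci/DeciLM-7b",
    dtype="float32",              # snapshot tests compare float32 outputs
    tensor_parallel_size=2,       # shard weights across the two L4 GPUs
    gpu_memory_utilization=0.9,   # fraction of each GPU vLLM may claim
    trust_remote_code=True,       # DeciLM ships custom modeling code
)
```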

@sahilsuneja1 (Contributor) commented Mar 6, 2024

Got it, thanks @simon-mo, will do.
Where do you see the memory error? Probably in the run prior to the latest merge? How can I see that OOM run log?

@sahilsuneja1 (Contributor) commented

Found the run history: https://buildkite.com/vllm/ci/builds?branch=simon-mo%3Amodels-snapshot-testing
Where's the OOM error?

@simon-mo (Collaborator, Author) commented

Actually, I'm going to close this PR because I think snapshot testing is still too brittle (it needs to be updated on a model-by-model basis). We should just get an A100 machine to run these tests.

@simon-mo closed this on Mar 16, 2024
@sahilsuneja1 (Contributor) commented

OK, I'll close the update PR as well.

@simon-mo (Collaborator, Author) commented

Sorry about the back and forth!

@sahilsuneja1 (Contributor) commented

No worries!
