
[Feature]: CI - Split up "Models Test" and "Vision Language Models Test" #7439

Closed
jon-chuang opened this issue Aug 12, 2024 · 6 comments · Fixed by #9846

Comments

@jon-chuang
Contributor

jon-chuang commented Aug 12, 2024

🚀 The feature, motivation and pitch

The "Models Test" takes 1+ hour on CI, while most other test suites take under ~30 minutes, so it ends up being a bottleneck.

It should therefore be split up, similar to how the kernel tests are split.

CC: @khluu

@youkaichao
Member

root for this!

@DarkLight1337
Member

This sounds great! However, we should still avoid regressions. I suggest testing a representative subset of models per PR and performing full testing once in a while.

Any ideas on which models we should always test?

@DarkLight1337
Member

A more concrete plan:

Let's add a pytest marker core_model to explicitly tag model tests to be tested in each PR. By default, unmarked model tests will only be tested daily.
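
For concreteness, here is a minimal sketch of how the marker could be wired up (the file paths, test name, and marker description below are just placeholders for illustration, not the actual test layout):

```python
# conftest.py (sketch): register the custom marker so pytest doesn't warn about it.
import pytest


def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "core_model: representative model tests that should run on every PR",
    )


# tests/models/test_example.py (hypothetical test file)
@pytest.mark.core_model
def test_generate_matches_hf_output():
    # Tagged tests are selected by `pytest -m core_model` in the per-PR pipeline;
    # unmarked model tests would only run in the daily full CI.
    ...
```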

Separately, we can organize the model tests as follows:

models/embedding/language
models/encoder_decoder/language
models/decoder_only/language
models/decoder_only/vision
models/decoder_only/audio (once #7615 is merged)

The first three are currently included in the Models test, while the fourth is currently in the VLM test. They are split apart further so that model PRs can easily run only the tests that are relevant to their model.

(By the way, the models directory in main vLLM is pretty cluttered... perhaps we can also organize the model files as above?)

In the normal CI, we can add the pytest flag -m core_model so that only the tagged tests are run. The full CI (run daily) omits this flag to run all tests.
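
As a rough sketch (not necessarily how our CI will actually be configured), the same split could also be expressed in a conftest.py hook instead of on the command line; the RUN_FULL_MODELS_CI environment variable here is a hypothetical stand-in for whatever trigger the daily job uses:

```python
# conftest.py (sketch): deselect non-core model tests unless the daily full run is requested.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    if os.environ.get("RUN_FULL_MODELS_CI"):  # hypothetical switch set only by the daily job
        return
    skip_non_core = pytest.mark.skip(reason="only core_model tests run per-PR")
    for item in items:
        if "core_model" not in item.keywords:
            item.add_marker(skip_non_core)
```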

Regarding which model tests to mark as core_model:

  • We should select one model per family of architectures.
  • For multi-modal models, we should consider one model whose multi-modal encoder is defined by Transformers, and another whose encoder is defined by ourselves.
  • There is no need to consider LoRA, TP/PP and quantization behavior since their basic implementations are covered by other tests already. The full CI should be able to catch any regressions of those features for individual models.

@jon-chuang
Contributor Author

Hmm, it sounds unreliable to ask users to run specified model tests... In PyTorch, for instance, it's mandatory to run a fairly comprehensive test suite on every PR.

I think that changing this behaviour will potentially introduce uncaught bugs.

Separately, we can organize the models tests as follows:

models/embedding/language
models/encoder_decoder/language
models/decoder_only/language
models/decoder_only/vision
models/decoder_only/audio (once #7615 is merged)

I think that splitting things up like this is a good first step.

@DarkLight1337
Member

DarkLight1337 commented Sep 2, 2024

After some offline discussion, we have decided to cut down on the number of models to regularly test in CI. This is due to the following reasons:

  • Most of the model tests aim to test the correctness of the model implementation. However, there is no need to repeatedly test this unless the model file itself is updated. Instead, we can cover the code paths in model loader, layers, executor and runner (which are shared between models) using a relatively small number of tests.
  • Some of the model PRs are from the same organization that originally developed the model. In this case, we trust the authors to implement their own model correctly.
  • With the growing number of models supported by vLLM, it becomes increasingly expensive to test them all in the long run.

For encoder-decoder and embedding models, we will preserve all tests since there are only a few tests involved.

For multimodal models, @ywang96 and I have decided to only test the following models regularly in CI:

  • LLaVA w/ our implementation of CLIPEncoder
  • PaliGemma w/ our implementation of SiglipEncoder
  • Ultravox w/ HuggingFace's implementation of WhisperEncoder
  • (A model that supports video input, once it's implemented in vLLM. Qwen2-VL will add M-ROPE so we should test that)

Meanwhile, @simon-mo will help narrow down the list of decoder-only language models to test.

Note that this does not remove the need to test new models. New model PRs should still implement tests so we can verify their implementation, as well as re-run those tests whenever a future PR updates related code. Until @khluu figures out how to create a separate test item for each model without clogging up the web UI, these tests will remain excluded from CI; instead, we can run these tests locally as necessary.

@DarkLight1337
Member

Closing as completed by #9488. VLM tests will be further split up after #9372.
