[Model] Support Mamba2 (Codestral Mamba) #9292

tlrmchlsmth · 2024-10-11T17:22:28Z

Add support for Mamba2. Not thoroughly tested yet, but Codestral Mamba has legible outputs.

Todo:

Integration tests
Support Chunked Prefill
Incorporate mamba_chunk_scan_combined kernel to avoid the dependency on mamba_ssm
Fix tensor parallelism
Try to refactor the code for Mamba2's mixer layer to look more like Mamba's

github-actions · 2024-10-11T17:22:40Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

tlrmchlsmth · 2024-10-16T20:44:14Z

Notes on current state:

Now that [Kernel][Model] Improve continuous batching for Jamba and Mamba #9189 has landed, mamba_chunk_scan_combined needs to be updated to take cache indices so that this PR works with the updated MambaCacheManager. Until then, this PR is not compatible with current main.
TP does seem to work in the present state; however, I see bad output when using CUDA graphs + custom_all_reduce.
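For readers unfamiliar with the cache-indices idea mentioned above, here is a minimal toy sketch in plain Python. It is not the vLLM kernel or the MambaCacheManager API: the function name `scan_with_cache_indices`, the scalar per-sequence state, and the `decay` parameter are all assumptions made for illustration. It only shows the general pattern of each batch row reading and writing its recurrent state through a slot index into a persistent pool, so that the batch order can change between steps.

```python
from typing import List


def scan_with_cache_indices(
    x: List[List[float]],      # per-row input tokens (scalars in this toy)
    state_pool: List[float],   # persistent pool: one scalar state per cache slot
    cache_indices: List[int],  # cache_indices[i] = pool slot used by batch row i
    decay: float = 0.5,        # hypothetical fixed decay of the toy recurrence
) -> List[List[float]]:
    """Run h = decay * h + x_t per row, loading and storing h via the pool."""
    outputs = []
    for row, tokens in enumerate(x):
        slot = cache_indices[row]
        h = state_pool[slot]          # load this sequence's state from its slot
        ys = []
        for t in tokens:
            h = decay * h + t         # toy stand-in for the SSM state update
            ys.append(h)
        state_pool[slot] = h          # write the final state back in place
        outputs.append(ys)
    return outputs


# Step 1: sequence A lives in slot 2, sequence B in slot 0.
pool = [0.0, 0.0, 0.0, 0.0]
out1 = scan_with_cache_indices([[1.0, 2.0], [4.0]], pool, [2, 0])

# Step 2: the batch order flips (B first), but each row still finds its state.
out2 = scan_with_cache_indices([[2.0], [1.0]], pool, [0, 2])
```

Because the state is addressed by slot and not by batch position, continuous batching can reorder, add, or drop sequences between steps without copying states around.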

Initial mamba2 support

09a30d5

tlrmchlsmth added 2 commits October 11, 2024 13:38

Some fixes but TP is broken

ed3cc3a

format

58941dc

mgoin mentioned this pull request Oct 15, 2024

[New Model]: Support Zyphra/Zamba2-7B #9382

Open

1 task

tlrmchlsmth added 2 commits October 15, 2024 16:52

Move ssd_chunk_combined triton kernels into vLLM

0735328

fixups

d2bd1ac

tlrmchlsmth added 2 commits October 17, 2024 13:39

format and small tweaks to mamba2.py

5f7f67d

1. Format triton kernels. 2. Tweak mamba2.py so that models converted with the transformers utility src/transformers/models/mamba2/convert_mamba2_ssm_checkpoint_to_pytorch.py will run. However, they produce garbage output.

Fixes for mamba2-ssm. Need to rethink TP.

552d02a
