[Bugfix] Fix divide by zero when serving Mamba models #9617

Merged

Commits on Oct 23, 2024

  1. [Bugfix] Fix divide by zero when serving Mamba models

    `num_total_gpu` ends up being 0 for attention-free models, which results
    in a divide-by-zero in llm_engine.py when running:
    ```
    vllm serve tiiuae/falcon-mamba-7b-instruct
    ```
    
    We're already guarding against `None` here, so this guards against zero as
    well (a minimal sketch of the guard follows the commit details below).
    
    I also tried setting `num_gpu_blocks` to `None` in
    `determine_num_available_blocks`, but a couple of other code paths choked
    on that.
    
    Signed-off-by: Tyler Michael Smith <[email protected]>
    tlrmchlsmth committed Oct 23, 2024
    Commit: 7e3261d
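
For context, here is a minimal sketch of the guard this commit describes, assuming a stats path shaped roughly like the one in `llm_engine.py` that computes KV-cache usage from block counts. The function name and surrounding code are illustrative, not the actual vLLM source; only the idea (reject zero in addition to `None` before dividing) comes from the commit message.

```
from typing import Optional


def compute_gpu_cache_usage(num_total_gpu: Optional[int],
                            num_free_gpu: int) -> float:
    """Fraction of GPU KV-cache blocks currently in use.

    Attention-free models (e.g. Mamba) allocate no KV-cache blocks, so
    num_total_gpu can be 0; a bare `is not None` check would still let the
    division below raise ZeroDivisionError.
    """
    # Truthiness check rejects both None and 0, mirroring the fix;
    # the old code only guarded against None.
    if not num_total_gpu:
        return 0.0
    return 1.0 - (num_free_gpu / num_total_gpu)


# Example: a Mamba model reports zero total GPU blocks.
assert compute_gpu_cache_usage(0, 0) == 0.0
assert compute_gpu_cache_usage(None, 0) == 0.0
assert compute_gpu_cache_usage(100, 25) == 0.75
```

As the commit message notes, the alternative of setting `num_gpu_blocks` to `None` pushes the problem into every consumer of that value, which is why the truthiness guard at the point of division is the smaller fix.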