Fix assertion failure in Qwen 1.5 with prefix caching enabled #3373

chenxu2048 · 2024-03-13T07:45:10Z

Prefix caching is currently not supported with sliding window attention, so the model runner asserts that model_config.get_sliding_window() is None.

However, Qwen2/Qwen1.5 use a separate use_sliding_window flag to turn sliding window attention on or off. The flag disables sliding window by default, but model_config.get_sliding_window() still returns a non-None value, so the assertion fails.
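As a rough illustration of the behavior described above, the sketch below shows one way a get_sliding_window() helper could honor the use_sliding_window flag. It is not taken from the diff in this PR: the free-function form, the SimpleNamespace stand-ins for Hugging Face configs, and the window sizes are all assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Optional


def get_sliding_window(hf_config) -> Optional[int]:
    """Return the sliding window size, or None if sliding window is disabled.

    Hypothetical free-function version for illustration; in vLLM the
    equivalent logic would live on the model config object.
    """
    # Qwen2/Qwen1.5-style configs carry a `use_sliding_window` switch in
    # addition to the window size. If the switch is present and False,
    # report the window as disabled regardless of the configured size.
    if not getattr(hf_config, "use_sliding_window", True):
        return None
    # Otherwise fall back to the window size, which may itself be absent.
    return getattr(hf_config, "sliding_window", None)


# Stand-in configs (assumed shapes and values, not real model configs).
qwen_like = SimpleNamespace(use_sliding_window=False, sliding_window=32768)
qwen_window_on = SimpleNamespace(use_sliding_window=True, sliding_window=32768)
window_only = SimpleNamespace(sliding_window=4096)
no_window = SimpleNamespace()

assert get_sliding_window(qwen_like) is None        # flag off: prefix caching check passes
assert get_sliding_window(qwen_window_on) == 32768  # flag on: window size is reported
assert get_sliding_window(window_only) == 4096      # no flag: size is used as-is
assert get_sliding_window(no_window) is None        # no window configured at all
```

Defaulting the missing flag to True keeps the behavior unchanged for models that only define sliding_window, which is the case the existing assertion was written for.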

cadedaniel

Thanks for the PR!

vllm/config.py

Co-authored-by: Cade Daniel <[email protected]>

cadedaniel

Thanks!

fix: Qwen1.5 with prefix cache

27881c8

cadedaniel reviewed Mar 13, 2024

vllm/config.py (outdated, resolved)

cadedaniel self-assigned this Mar 13, 2024

cadedaniel mentioned this pull request Mar 13, 2024

AssertionError: Prefix caching is currently not supported with sliding window attention #3355

Closed

chenxu2048 added 4 commits March 14, 2024 11:29

Add unittest for get_sliding_window()

9789534

format code

1cdb0cf

Merge remote-tracking branch 'origin/main' into qwen_1_5_sliding_window

179a978

revert format line break

911f914

cadedaniel reviewed Mar 14, 2024

vllm/config.py (outdated, resolved)

chenxu2048 and others added 2 commits March 14, 2024 14:52

Update vllm/config.py

c9aca48

Co-authored-by: Cade Daniel <[email protected]>

chore: format code

31f7368

cadedaniel approved these changes Mar 14, 2024

simon-mo approved these changes Mar 14, 2024

simon-mo merged commit 54be8a0 into vllm-project:main Mar 14, 2024
22 of 24 checks passed

This was referenced Mar 15, 2024

Fix prefix caching is currently not supported with sliding window attention when using qwen1.5 #3377

Closed

[Testing] Add test_config.py to CI #3437

Merged

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

Fix assertion failure in Qwen 1.5 with prefix caching enabled (vllm-p…

288df34

…roject#3373) Co-authored-by: Cade Daniel <[email protected]>
