
Best server cmd for mistralai/Mistral-7B-v0.1 #3781

Open
sshleifer opened this issue Apr 1, 2024 · 3 comments
Labels: stale, usage (How to use vllm)

Comments


sshleifer commented Apr 1, 2024

export MODEL=mistralai/Mistral-7B-v0.1
# Serve via the OpenAI-compatible API server with prefix caching enabled
# and the context length capped at 4096 tokens.
python3 -m vllm.entrypoints.openai.api_server --model $MODEL \
    --tensor-parallel-size=1 \
    --enable-prefix-caching --max-model-len=4096 --trust-remote-code | tee server_mistral.log &

raises NotImplementedError: Sliding window is not allowed with prefix caching enabled!

Is there a way to turn off sliding window and keep prefix caching?

(More generally is there a list of commands to serve common models efficiently?)

sshleifer added the usage (How to use vllm) label on Apr 1, 2024
robertgshaw2-neuralmagic (Collaborator) commented

I do not believe there is currently a way to disable the sliding window, but I think this is something we should add.

sshleifer changed the title from "Best server cmd for" to "Best server cmd for mistralai/Mistral-7B-v0.1" on Apr 1, 2024

ssmi153 commented Jul 11, 2024

You can disable the sliding window by using --disable-sliding-window. For Mistral, as you've done, you'll need to restrict the model to a context window of 4096 tokens for this to work.
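
For reference, a minimal sketch of the adjusted command, assuming a vLLM build that includes the --disable-sliding-window flag (the other flags are unchanged from the original report):

export MODEL=mistralai/Mistral-7B-v0.1
# Turn off sliding-window attention so prefix caching is permitted, and cap
# the context at 4096 tokens (Mistral's pre-sliding-window limit) as noted above.
python3 -m vllm.entrypoints.openai.api_server --model $MODEL \
    --tensor-parallel-size=1 \
    --disable-sliding-window \
    --enable-prefix-caching --max-model-len=4096 --trust-remote-code | tee server_mistral.log &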

@robertgshaw2-neuralmagic considering that prefix caching by definition focuses on the early portion of the prompt, whereas the sliding window in Mistral only kicks in after 4096 tokens, do you think it might be possible to enable a prefix cache that only looks at the first 4096 tokens of a prompt so there isn't a clash? That would be the best of both worlds here.


github-actions bot commented Nov 9, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

The github-actions bot added the stale label on Nov 9, 2024