Best server cmd for mistralai/Mistral-7B-v0.1 #3781

sshleifer · 2024-04-01T19:57:03Z

export MODEL=mistralai/Mistral-7B-v0.1
python3 -m vllm.entrypoints.openai.api_server --model $MODEL \
    --tensor-parallel-size=1 \
    --enable-prefix-caching --max-model-len=4096 --trust-remote-code | tee server_mistral.log &

This raises NotImplementedError: Sliding window is not allowed with prefix caching enabled!

Is there a way to turn off sliding window and keep prefix caching?

(More generally, is there a list of commands to serve common models efficiently?)

robertgshaw2-neuralmagic · 2024-04-01T20:24:26Z

I do not believe there is currently a way to disable sliding window, but I think this is something we should add.

ssmi153 · 2024-07-11T10:24:05Z

You can disable the sliding window with --disable-sliding-window. For Mistral, as you've done, you'll need to restrict the model to a context window of 4096 tokens to do this.
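For reference, here is a minimal sketch of how the original command might look with that flag added. It is just the command from the first post plus --disable-sliding-window and has not been verified against any particular vLLM version. The VLLM_ARGS and RUN_SERVER variables are hypothetical helpers introduced for this sketch only, so the command can be printed and inspected without starting a server.

```shell
# Sketch only: the original command from this thread plus --disable-sliding-window.
export MODEL=mistralai/Mistral-7B-v0.1

# Keep the context at 4096 tokens so the sliding window can be disabled.
# VLLM_ARGS is a hypothetical helper variable, not something vLLM reads.
VLLM_ARGS="--model $MODEL --tensor-parallel-size=1 --enable-prefix-caching --disable-sliding-window --max-model-len=4096 --trust-remote-code"

# Print the full command so it can be checked before launching.
echo "python3 -m vllm.entrypoints.openai.api_server $VLLM_ARGS"

# RUN_SERVER is a hypothetical guard used only in this sketch;
# set RUN_SERVER=1 to actually start the server (assumes vLLM is installed).
if [ "${RUN_SERVER:-0}" = "1" ]; then
    python3 -m vllm.entrypoints.openai.api_server $VLLM_ARGS | tee server_mistral.log &
fi
```

Run as-is, this only prints the command. Prefixing the script with RUN_SERVER=1 launches it in the background and logs to server_mistral.log, as in the original post.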

@robertgshaw2-neuralmagic, since prefix caching by definition targets the early portion of the prompt, whereas the sliding window in Mistral only kicks in after 4096 tokens, do you think it would be possible to enable a prefix cache that only looks at the first 4096 tokens of a prompt, so that the two don't clash? That would be the best of both worlds here.

github-actions · 2024-11-09T01:56:52Z

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

sshleifer added the usage How to use vllm label Apr 1, 2024

jasonacox mentioned this issue Apr 28, 2024

[Hardware][Nvidia] Enable support for Pascal GPUs #4290

Closed
