You can disable the sliding window with --disable-sliding-window. For Mistral, as you've done, you'll also need to restrict the model's context window to 4096 tokens (the size of its sliding window) for this to work.
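Below is a small sketch of the flag combination described above. It assumes this thread is about vLLM's OpenAI-compatible server (`python -m vllm.entrypoints.openai.api_server`) and its `--enable-prefix-caching`, `--disable-sliding-window` and `--max-model-len` flags; `build_serve_args` is a hypothetical helper, and the model ID is only an example of a Mistral checkpoint with a 4096-token sliding window. Check the flags against the vLLM version you run.

```python
from typing import List, Optional


def build_serve_args(model: str, sliding_window: int,
                     max_model_len: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build CLI flags that keep prefix caching on by
    disabling the sliding window and capping the context at the window size."""
    if max_model_len is None:
        # Default to the largest context that still fits inside the window.
        max_model_len = sliding_window
    if max_model_len > sliding_window:
        # Without the sliding window, longer contexts would exceed what the
        # model was configured to attend over, so refuse them here.
        raise ValueError(
            f"max_model_len={max_model_len} exceeds the sliding window "
            f"({sliding_window}); lower it or keep the sliding window enabled"
        )
    return [
        "--model", model,
        "--enable-prefix-caching",
        "--disable-sliding-window",
        "--max-model-len", str(max_model_len),
    ]


if __name__ == "__main__":
    # Example model ID (assumption): a Mistral checkpoint with a 4096-token window.
    args = build_serve_args("mistralai/Mistral-7B-Instruct-v0.1", sliding_window=4096)
    print("python -m vllm.entrypoints.openai.api_server " + " ".join(args))
```

The trade-off is explicit: you keep prefix caching but give up any context longer than 4096 tokens.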
@robertgshaw2-neuralmagic, since prefix caching by definition focuses on the early portion of the prompt, whereas Mistral's sliding window only kicks in after 4096 tokens, do you think it would be possible to enable a prefix cache that only looks at the first 4096 tokens of a prompt, so there's no clash? That would give us the best of both worlds.
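To make the proposal concrete, here is a hypothetical sketch (not existing vLLM behavior) of a block-based prefix cache whose keys only cover full blocks lying inside the first `window` tokens. The block size of 16 and the chained-hash scheme are illustrative assumptions. Two prompts that share their first 4096 tokens map to identical cache keys, whatever follows.

```python
from typing import List, Optional, Sequence


def cacheable_block_hashes(token_ids: Sequence[int], block_size: int = 16,
                           window: int = 4096) -> List[int]:
    """Hypothetical: return chained hashes for the full KV blocks that lie
    entirely within the first `window` tokens of the prompt."""
    limit = min(len(token_ids), window)
    num_full_blocks = limit // block_size  # partial trailing blocks are skipped
    hashes: List[int] = []
    prev: Optional[int] = None
    for b in range(num_full_blocks):
        block = tuple(token_ids[b * block_size:(b + 1) * block_size])
        # Chain in the previous hash so a block's key depends on its whole prefix.
        prev = hash((prev, block))
        hashes.append(prev)
    return hashes


if __name__ == "__main__":
    long_a = list(range(5000))
    long_b = list(range(4096)) + [0] * 904  # same first 4096 tokens, different tail
    print(len(cacheable_block_hashes(long_a)))  # blocks eligible for caching
    print(cacheable_block_hashes(long_a) == cacheable_block_hashes(long_b))
```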
Enabling prefix caching raises:
NotImplementedError: Sliding window is not allowed with prefix caching enabled!
Is there a way to turn off sliding window and keep prefix caching?
(More generally, is there a list of commands for serving common models efficiently?)