server: process prompt fairly across slots #6607
Labels: enhancement, good first issue, help wanted, server/webui
Context
At the moment, prompt tokens are batched in FIFO order. As a result, while a large prompt is being processed, it blocks all other slots.
Proposal: share each prompt-processing batch fairly across all pending slots instead of filling it from one slot at a time.
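To illustrate the difference, here is a minimal sketch, not the server's actual scheduler. It assumes each slot has a count of pending prompt tokens and that every step has a fixed token budget (playing the role of the batch size, `n_batch`). The functions `fifo_batch`, `fair_batch` and `steps_until_done` are hypothetical names. The fair policy splits the budget evenly across slots that still have prompt tokens and redistributes any share that a short prompt cannot use.

```python
from typing import Callable, List


def fifo_batch(pending: List[int], n_batch: int) -> List[int]:
    """FIFO: fill the batch from the first slot onward (hypothetical helper)."""
    alloc = [0] * len(pending)
    budget = n_batch
    for i, n in enumerate(pending):
        take = min(n, budget)
        alloc[i] = take
        budget -= take
        if budget == 0:
            break
    return alloc


def fair_batch(pending: List[int], n_batch: int) -> List[int]:
    """Fair: split the budget evenly over slots with pending prompt tokens,
    giving any unused share back to the slots that still need tokens."""
    alloc = [0] * len(pending)
    budget = n_batch
    active = [i for i, n in enumerate(pending) if n > 0]
    while budget > 0 and active:
        # At least one token per slot so small leftover budgets still get used.
        share = max(1, budget // len(active))
        still_active = []
        for i in active:
            if budget == 0:
                break
            take = min(pending[i] - alloc[i], share, budget)
            alloc[i] += take
            budget -= take
            if alloc[i] < pending[i]:
                still_active.append(i)
        active = still_active
    return alloc


def steps_until_done(pending: List[int], n_batch: int,
                     policy: Callable[[List[int], int], List[int]]) -> List[int]:
    """Return the step at which each slot finishes its prompt (0 if it had none)."""
    remaining = list(pending)
    done_at = [0] * len(pending)
    step = 0
    while any(remaining):
        step += 1
        alloc = policy(remaining, n_batch)
        for i, t in enumerate(alloc):
            if remaining[i] > 0:
                remaining[i] -= t
                if remaining[i] == 0:
                    done_at[i] = step
    return done_at


# One large prompt (4000 tokens) and two short ones, 512 tokens per step.
print(steps_until_done([4000, 30, 50], 512, fifo_batch))  # → [8, 8, 8]
print(steps_until_done([4000, 30, 50], 512, fair_batch))  # → [8, 1, 1]
```

Both policies take the same total number of steps in this example, but under FIFO the two short prompts wait behind the large one. Under the fair split they finish in the first step, because the first fair batch is `[432, 30, 50]`. This sketch ignores slots that are already generating tokens, which would also consume batch entries in a real server.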
References: