290537b
A very dirty implementation of asynchronous scheduling for autoregressive models (chat models).
It trades roughly twice the latency for a 10% increase in peak throughput, a result that is a bit of a head-scratcher.
The code will be optimized further in the https://github.com/noooop/wde repository, hopefully with further performance gains.
Best wishes, light-vllm v0.3.0