

multi-lora documentation fix #3064

Merged: 5 commits merged into vllm-project:main on Feb 28, 2024
Conversation

@ElefHead (Contributor) commented Feb 27, 2024

Minor fixes to the documentation for the OpenAI-compatible server.

@ElefHead changed the title from "fix: multi-lora with sample api-server" to "multi-lora with sample api-server" on Feb 27, 2024
@simon-mo (Collaborator)

Hi @ElefHead, thank you for your PR. However, at the beginning of api_server.py we note:

"""
NOTE: This API server is used only for demonstrating usage of AsyncEngine and simple performance benchmarks.
It is not intended for production use. For production use, we recommend using our OpenAI compatible server.
We are also not going to accept PRs modifying this file, please change `vllm/entrypoints/openai/api_server.py` instead.
"""

Please revert the change in api_server; we are happy to accept the documentation change!

@ElefHead (Contributor, Author)

@simon-mo Thanks for letting me know; I missed it. I reverted the changes and made a small addition to the docs. Cheers!

@ElefHead changed the title from "multi-lora with sample api-server" to "multi-lora documentation fix" on Feb 27, 2024
@findalexli

Thanks! I was just confused about when to use vllm.entrypoints.openai.api_server versus vllm.entrypoints.api_server.
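For readers with the same confusion, the two entrypoints are launched differently. A minimal sketch (the model name and adapter path below are placeholders; `--enable-lora` and `--lora-modules` are the OpenAI-compatible server's LoRA flags):

```shell
# OpenAI-compatible server (recommended): serves /v1/completions etc.
# and can register LoRA adapters at launch.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules my-adapter=/path/to/adapter

# Demo server: for demonstration and simple benchmarks only,
# per the note at the top of vllm/entrypoints/api_server.py.
python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf
```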

@simon-mo simon-mo merged commit a868310 into vllm-project:main Feb 28, 2024
22 checks passed
@ElefHead ElefHead deleted the gj/multi-lora-server branch February 28, 2024 05:27
@sleepwalker2017 commented Mar 4, 2024

Hello, why is S-LoRA only supported in the OpenAI API?

Any plans to support it in vllm.entrypoints.api_server? Thank you. @ElefHead

I mentioned it here:
#3174

It seems the implementation is incomplete; there is no LoRA information in the generate server API.

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024
@simon-mo (Collaborator) commented Mar 4, 2024

The generate server API is not recommended for production usage; we kept it only so as not to break existing usage. The OpenAI server provides a superset of its functionality and uses the same engine under the hood.
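To make the superset point concrete: with the OpenAI-compatible server, a LoRA adapter is selected per request simply by passing its registered name (from `--lora-modules`) as the `model` field. A minimal sketch of the request body; the adapter name `my-adapter` is a hypothetical name assumed to have been registered at server launch:

```python
import json


def build_completion_request(adapter_name: str, prompt: str,
                             max_tokens: int = 64) -> str:
    """Build the JSON body for a POST to /v1/completions on the
    OpenAI-compatible server. The LoRA adapter is chosen by passing
    its registered name as the "model" field."""
    payload = {
        "model": adapter_name,   # e.g. "my-adapter", registered via --lora-modules
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


body = build_completion_request("my-adapter", "Hello, world")
print(body)
```

The same body sent with `model` set to the base model name (e.g. the value of `--model`) would run without any adapter, which is why the OpenAI server subsumes the demo server's plain generate endpoint.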

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024