[Doc] Fix getting started to use publicly available model #3963

Merged · merged 1 commit into vllm-project:main on Apr 10, 2024
Conversation

**fpaupier** (Contributor) commented:

The previous model, `meta-llama/Llama-2-7b-hf`, required special access to the model card on Hugging Face, leading to an exception when starting the reference example.

The access error with `meta-llama/Llama-2-7b-hf` was:

```
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Llama-2-7b-hf.
401 Client Error. (Request ID: Root=1-66163835-71dc8d373b3129360016da39;ee08d21c-507d-43e7-b66b-04e9b45563ca)

Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/config.json.
Repo model meta-llama/Llama-2-7b-hf is gated. You must be authenticated to access it.
```
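Readers who do want the gated model can still use it by requesting access on its Hugging Face model card and authenticating locally first. A minimal sketch, assuming the `huggingface_hub` package is installed; the token value is a placeholder:

```python
# Hypothetical authentication step for gated repos such as
# meta-llama/Llama-2-7b-hf; not needed for the public Mistral model.
from huggingface_hub import login

# Replace the placeholder with a real access token from
# https://huggingface.co/settings/tokens (or run `huggingface-cli login`).
login(token="hf_your_token_here")
```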

The new model proposed for the getting started guide, `mistralai/Mistral-7B-Instruct-v0.2`, is publicly available and runs on commodity hardware. The getting started flow in the vLLM Quickstart now works.

It also updates a dead link.

````diff
@@ -4,7 +4,7 @@ vLLM provides an HTTP server that implements OpenAI's [Completions](https://plat

 You can start the server using Python, or using [Docker](deploying_with_docker.rst):
 ```bash
-python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf --dtype float32 --api-key token-abc123
+python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2 --dtype auto --api-key token-abc123
````
**Collaborator** commented on the line above:

nit: Let's remove `--api-key`?

**fpaupier** (Author) replied:

I suggest keeping the API key arg. It's a low-overhead way to ensure basic access control, especially given the price of GPUs. Having the API key in the Quickstart promotes security best practices (so many quickstart configs end up being the production ones...). Let's make this one right :)
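To illustrate what the flag buys, here is a hedged sketch of probing the server started in the diff above, assuming it is running locally on the default port 8000 and that the `requests` package is installed:

```python
# Hypothetical probe: with --api-key set, vLLM's OpenAI-compatible routes
# should reject requests that lack the matching Bearer token.
import requests

resp = requests.get("http://localhost:8000/v1/models")
print(resp.status_code)  # expected: 401, no credentials supplied

resp = requests.get(
    "http://localhost:8000/v1/models",
    headers={"Authorization": "Bearer token-abc123"},  # matches --api-key
)
print(resp.status_code)  # expected: 200
```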

@simon-mo simon-mo enabled auto-merge (squash) April 10, 2024 18:01
```python
messages=[
    {"role": "system", "content": "You are a helpful assistant."},
```
**Collaborator** commented on the line above:
nice catch
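Putting the pieces together, a minimal sketch of the Quickstart client against the server started above, assuming the default local endpoint and the `openai` Python package:

```python
# Minimal end-to-end sketch; assumes the server is running locally on
# port 8000 and was started with --api-key token-abc123.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="token-abc123",               # must match the server's --api-key
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(completion.choices[0].message.content)
```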

@simon-mo simon-mo merged commit 92cd2e2 into vllm-project:main Apr 10, 2024
35 checks passed
SageMoore pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 11, 2024
andy-neuma pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 12, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024