
Could you provide a sample script to start the model with OpenAI access? #1

Closed
garyyang85 opened this issue May 7, 2024 · 4 comments

Comments

@garyyang85

No description provided.

@mayank31398
Member

mayank31398 commented May 7, 2024

Hi @garyyang85, I'm having trouble understanding the request.
Can you clarify?
There is an example in the README.md.

You will need to install transformers from source for this, though.

Cross-posting it here as well:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # or "cpu"
model_path = "ibm-granite/granite-3b-code-base" # pick any model from the list above

tokenizer = AutoTokenizer.from_pretrained(model_path)

# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
input_text = "def generate():"
# tokenize the text
input_tokens = tokenizer(input_text, return_tensors="pt")

# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)

# generate output tokens
output = model.generate(**input_tokens)
# decode output tokens into text
output = tokenizer.batch_decode(output)

# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)

@garyyang85
Author

garyyang85 commented May 9, 2024

Hi @mayank31398, thanks for your response.
Popular models may be included in FastChat. I mean something like the API server in FastChat: it loads the model only once and then accepts standard OpenAI requests, returning streaming answers. Like this: https://github.com/baichuan-inc/Baichuan2/blob/main/OpenAI_api.py
For testing purposes, maybe Flask is enough. Of course, I can follow the sample code to build one myself. :)
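For reference, a minimal sketch of such a server using Flask and transformers, assuming one of the Granite code models and a roughly OpenAI-shaped /v1/completions endpoint (this is not part of the repo, and it omits streaming):

from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "ibm-granite/granite-3b-code-base"  # assumed model, pick any from the list

# load the model and tokenizer once, at startup
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

app = Flask(__name__)

@app.route("/v1/completions", methods=["POST"])
def completions():
    body = request.get_json()
    prompt = body.get("prompt", "")
    max_tokens = int(body.get("max_tokens", 128))

    # tokenize the prompt and move it to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_tokens)
    # drop the prompt tokens so only the completion text is returned
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

    # response shaped roughly like OpenAI's completions API
    return jsonify({
        "object": "text_completion",
        "model": model_path,
        "choices": [{"index": 0, "text": completion, "finish_reason": "stop"}],
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)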

@mayank31398
Member

@garyyang85 I see.
Currently, there is vLLM integration underway: vllm-project/vllm#4636

@mayank31398
Member

@garyyang85 vllm-project/vllm#4636 is merged now.
Closing this.
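For anyone who finds this later, a rough sketch of how that could look once a vLLM release includes the Granite support: serve the model behind vLLM's OpenAI-compatible server and query it with the standard openai client (exact flags and versions may differ; check the vLLM docs):

# serve the model (run in a shell):
#   python -m vllm.entrypoints.openai.api_server --model ibm-granite/granite-3b-code-base
#
# then query it with the openai client (v1.x):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="ibm-granite/granite-3b-code-base",
    prompt="def generate():",
    max_tokens=128,
    stream=True,  # stream tokens back as they are generated
)

for chunk in response:
    print(chunk.choices[0].text, end="", flush=True)
print()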
