Issues: tenstorrent/tt-inference-server
#56: vLLM run script prefill + decode trace pre-capture to avoid unexpectedly high or stalled TTFT on first completions
Label: enhancement (New feature or request)
Opened Dec 12, 2024 by tstescoTT
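This issue asks the run script to warm the model at startup so the first user requests do not pay the trace-capture cost. A minimal sketch of one way to do that, assuming the server exposes vLLM's OpenAI-compatible /v1/completions endpoint on localhost:8000; the URL, model id, and token counts are illustrative assumptions, not values from the repo:

    import requests

    BASE_URL = "http://localhost:8000"  # assumed server address

    def warm_up(model: str = "meta-llama/Llama-3.1-8B-Instruct") -> None:
        # One short completion exercises both prefill and decode, so device
        # traces are captured before real traffic arrives.
        resp = requests.post(
            f"{BASE_URL}/v1/completions",
            json={"model": model, "prompt": "Hello", "max_tokens": 8},
            timeout=600,  # first request may be slow while traces are captured
        )
        resp.raise_for_status()

    if __name__ == "__main__":
        warm_up()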
#51: Missing --max_prompt_length argument when running example_requests_client_alpaca_eval.py
Opened Dec 2, 2024 by milank94
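For context, the missing flag could be declared with argparse along these lines; the flag name comes from the issue title, while the default value and help text are assumptions:

    import argparse

    parser = argparse.ArgumentParser(description="Alpaca-eval request client")
    parser.add_argument(
        "--max_prompt_length",
        type=int,
        default=2048,  # assumed default; the real script may differ
        help="Truncate prompts to at most this many tokens before sending.",
    )
    args = parser.parse_args()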
#37: Initial vLLM setup fails due to missing HuggingFace permissions
Label: bug (Something isn't working)
Opened Nov 15, 2024 by milank94
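One possible fix is to pre-flight the token's access before setup starts, turning the failure into an actionable error instead of a mid-download crash. A hedged sketch using huggingface_hub; the repo id is an illustrative assumption:

    import os
    from huggingface_hub import model_info
    from huggingface_hub.utils import GatedRepoError

    REPO_ID = "meta-llama/Llama-3.1-70B-Instruct"  # assumed gated model

    try:
        model_info(REPO_ID, token=os.environ.get("HF_TOKEN"))
    except GatedRepoError:
        raise SystemExit(
            f"Your Hugging Face account has no access to {REPO_ID}. "
            "Request access on the model page, then retry."
        )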
#36: Provide example chat template usage
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)
Opened Nov 15, 2024 by tstescoTT
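The documentation this issue asks for would presumably cover something like transformers' apply_chat_template; a minimal sketch, with the model id as an assumption:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Tenstorrent?"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the rendered string, not token ids
        add_generation_prompt=True,  # append the assistant header for generation
    )
    print(prompt)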
#23: docker run support for HF_TOKEN authentication via environment variable pass-in
Label: enhancement (New feature or request)
Opened Oct 24, 2024 by tstescoTT
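With docker's standard -e flag (docker run -e HF_TOKEN=<token> ...), the container-side code only needs to read the variable. A sketch of that side; the error wording is an assumption:

    import os

    def get_hf_token() -> str:
        # The token arrives via `docker run -e HF_TOKEN=...`.
        token = os.environ.get("HF_TOKEN")
        if not token:
            raise RuntimeError(
                "HF_TOKEN is not set; pass it through with "
                "docker run -e HF_TOKEN=<your token> ..."
            )
        return token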
#17: Add status messaging and an endpoint so client-side users can reason about model initialization and lifecycle
Label: enhancement (New feature or request)
Opened Sep 26, 2024 by tstescoTT
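A hedged sketch of what such an endpoint might look like; FastAPI, the /status route, and the state names are all assumptions, not the repo's actual design:

    from enum import Enum
    from fastapi import FastAPI

    class ModelState(str, Enum):
        INITIALIZING = "initializing"
        LOADING_WEIGHTS = "loading_weights"
        CAPTURING_TRACES = "capturing_traces"
        READY = "ready"

    app = FastAPI()
    current_state = ModelState.INITIALIZING  # updated by server startup code

    @app.get("/status")
    def status():
        # Clients poll this to decide when to start sending completions.
        return {"state": current_state.value}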
#14: Llama model install script support for the llama CLI and Hugging Face Hub
Label: enhancement (New feature or request)
Opened Sep 26, 2024 by tstescoTT
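On the Hugging Face Hub side, such a script would likely wrap snapshot_download; a minimal sketch, with the repo id and target directory as assumptions:

    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="meta-llama/Llama-3.1-8B-Instruct",  # assumed model
        local_dir="/models/llama-3.1-8b-instruct",   # assumed target path
    )
    print(f"Weights downloaded to {local_dir}")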
#13: Capture tt-metal and tt-NN loguru logs in the inference server's Python log files
Label: enhancement (New feature or request)
Opened Sep 25, 2024 by tstescoTT
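Because loguru exposes a single global logger, any sink the server adds also receives records emitted by libraries that log through loguru, as tt-metal and tt-NN do per this issue. A sketch following loguru's documented recipe for bridging into standard logging; the file path and rotation size are assumptions:

    import logging
    from loguru import logger

    # Write all loguru records (including tt-metal/tt-NN ones) to a file.
    logger.add("inference_server.log", rotation="100 MB")

    # Optionally forward them into the standard logging tree as well.
    class PropagateHandler(logging.Handler):
        def emit(self, record: logging.LogRecord) -> None:
            logging.getLogger(record.name).handle(record)

    logger.add(PropagateHandler(), format="{message}")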