Issues building and trying out the 2B-hf version of the TensorRT-LLM-based Yuan 2.0 inference service #127

Open
18842685792 opened this issue Mar 11, 2024 · 2 comments

@18842685792

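1. During build, changing the script's input and output parameters to 4096 makes the build fail. What is the upper limit for these values?
2. During build, setting the script's output parameter to 2048 builds successfully, but the actual inference results are basically the same as with the default output of 512; nothing changes. Do several parameters need to be adjusted together for this to take effect?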
image
3. When starting the tritonserver service, the setting to skip symbols did not take effect.
image
image

@IEI-mjx
Contributor

IEI-mjx commented Mar 12, 2024

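1. The upper limit depends on your GPU's memory: the more memory, the higher the limit (on my A800 the upper limit is 8192).
2. The number of output tokens during inference depends on the "--max_output_len" setting. Please refer to README_Yuan.md for how to set this parameter.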
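
As a rough illustration of why longer sequence limits need more GPU memory, the sketch below estimates the key/value cache size for a given total sequence length. This is only one of several contributors to memory use and is not taken from the thread; the layer count, head count, and head dimension are hypothetical placeholders, not confirmed Yuan 2.0 2B values.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model dimensions, chosen for illustration only.
dims = dict(num_layers=24, num_kv_heads=32, head_dim=64)

# Total sequence length = input length + output length.
for total_len in (1024, 4096, 8192):
    mib = kv_cache_bytes(seq_len=total_len, **dims) / 2**20
    print(f"seq_len={total_len}: about {mib:.0f} MiB per sequence")
```

The estimate grows linearly with sequence length and batch size, so doubling the combined input and output limits doubles this part of the memory footprint.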

@zhaoxudong01
Collaborator

zhaoxudong01 commented Mar 21, 2024

For the inference service, did you specify "end_id": 77185 when sending the request? In our tests, generation ends early as expected.
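
A minimal sketch of a request that carries `"end_id": 77185` might look like the following. Only the `end_id` value comes from this thread. The endpoint URL, the `ensemble` model name, and the `text_input` and `max_tokens` field names are assumptions based on a typical Triton TensorRT-LLM backend setup and may differ in your deployment. The code only builds the request; sending it is left as a commented-out line.

```python
import json
import urllib.request

# Assumed endpoint for a locally running tritonserver; adjust to your setup.
DEFAULT_URL = "http://localhost:8000/v2/models/ensemble/generate"


def build_generate_request(prompt, max_tokens=512, end_id=77185, url=DEFAULT_URL):
    """Build (but do not send) an HTTP POST request for text generation."""
    payload = {
        "text_input": prompt,      # assumed field name
        "max_tokens": max_tokens,  # assumed field name
        "end_id": end_id,          # value suggested in this thread
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("Hello")
print(req.full_url)
print(req.data.decode("utf-8"))

# To actually send the request against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

If generation still does not stop early, it may be worth checking that the `end_id` you send matches the end-of-sequence token ID of the tokenizer used to build the engine.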
