Why prepending output with prompt? #31

Closed

Conversation

mkhludnev
Contributor

I would rather have created an issue for discussion, but this repo doesn't have issues enabled. First of all, prepending the prompt seems redundant, especially for long RAG prompts. Second, it causes an actual problem: I've noticed that Triton's gRPC endpoint crops the long response. Curiously, the REST endpoint doesn't crop the payload, and the full concatenation of prompt and output arrives at the client.

I've put more details of the issue in langchain-ai/langchain#12474 (comment).

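For the redundancy half of the report, a client-side workaround is possible while the behavior is discussed: strip the echoed prompt prefix before using the generation. A minimal sketch, assuming the output really does begin with the exact prompt string (which is what the concatenation implies):

```python
# Client-side workaround sketch: remove the echoed prompt prefix before
# using the generation. Assumes text_output starts with the exact prompt
# string, as the backend's concatenation implies.
def strip_prompt(prompt: str, text_output: str) -> str:
    if text_output.startswith(prompt):
        return text_output[len(prompt):]
    return text_output  # already prompt-free, or truncated upstream
```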
@mkhludnev changed the title from "Avoid prepending output with prompt" to "Why prepending output with prompt?" on Feb 6, 2024
@mkhludnev
Contributor Author

@oandreeva-nv hello, may I ask you a question? 👆

@oandreeva-nv
Collaborator

Hi @mkhludnev, our README mentions that you can report all issues on the issues page of the server repo. Apologies for the confusion; we use the server repo as the main entry point for questions and issues.

Regarding your question, I believe the main goal is to return the full text, prompt + generated portion. Do you happen to have a reproducer for gRPC cropping part of the output? If it does crop, I am concerned that simply removing the prompt from the response may not fix it.

@mkhludnev
Contributor Author

Thanks for replying, @oandreeva-nv.

> report all issues on the issues page of the server repo.

Here we go: triton-inference-server/server#6866. Perhaps it's worth discussing there first.

> Regarding your question, I believe the main goal is to return the full text, prompt + generated portion. Do you happen to have a reproducer for gRPC cropping part of the output?

Well, I just have a Triton container, curl, and a Python app that logs the gRPC output. I put output excerpts into the linked langchain issue above.
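A rough sketch of that reproducer, with the usual caveats: the `text_input`/`text_output` tensor names follow the vLLM backend examples, `vllm_model` is a placeholder, and a non-decoupled model config is assumed (a decoupled one would need the streaming gRPC API instead of `infer()`):

```python
# Rough reproducer sketch (not a verified script). Tensor names follow
# the vLLM backend examples; "vllm_model" is a placeholder. Assumes a
# non-decoupled model config; a decoupled one needs stream_infer().
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# A long RAG-style prompt, so the echoed prefix dominates the payload.
prompt = "Context paragraph for retrieval-augmented generation. " * 200

text_input = grpcclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(np.array([prompt.encode("utf-8")], dtype=object))

result = client.infer(
    model_name="vllm_model",
    inputs=[text_input],
    outputs=[grpcclient.InferRequestedOutput("text_output")],
)

text = result.as_numpy("text_output")[0].decode("utf-8")
# Compare against the REST response for the same prompt: per the report
# above, the gRPC payload comes back cropped while REST returns it whole.
print(len(text))
print(text[-200:])
```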

> If it does crop, I am concerned that simply removing the prompt from the response may not fix it.

Right.

@mkhludnev
Contributor Author

Considering triton-inference-server/server#6867, it's worth implementing an echo=False option for the vLLM and TRT backends.
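A hypothetical sketch of what such an option could gate when the backend assembles its response; the `echo` name mirrors the OpenAI completions parameter and is not an existing flag in either backend:

```python
# Hypothetical sketch only: "echo" mirrors the OpenAI completions
# parameter and is not an existing backend option. It would simply
# gate the prompt concatenation at response-assembly time.
def build_text_output(prompt: str, generation: str, echo: bool = False) -> str:
    return prompt + generation if echo else generation
```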

@mkhludnev closed this on Feb 7, 2024