Why prepending output with prompt? #31

Closed

Conversation

mkhludnev
Contributor

I would rather have created an issue for discussion, but this repo doesn't have issues enabled. First of all, prepending the prompt seems redundant, especially for long RAG prompts. Second, it causes an actual problem: I've noticed that Triton's gRPC endpoint crops the long response. Curiously, the REST endpoint doesn't crop the payload, and the full concatenation of prompt and output arrives at the client.

I've put more details of the issue in langchain-ai/langchain#12474 (comment).

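For the redundancy half of the report, a client-side workaround is possible while the behavior is discussed: strip the echoed prompt prefix before using the generation. A minimal sketch, assuming the output really does begin with the exact prompt string (which is what the concatenation implies):

```python
# Client-side workaround sketch: remove the echoed prompt prefix before
# using the generation. Assumes text_output starts with the exact prompt
# string, as the backend's concatenation implies.
def strip_prompt(prompt: str, text_output: str) -> str:
    if text_output.startswith(prompt):
        return text_output[len(prompt):]
    return text_output  # already prompt-free, or truncated upstream
```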
@mkhludnev changed the title from "Avoid prepending output with prompt" to "Why prepending output with prompt?" on Feb 6, 2024
@mkhludnev
Contributor Author

@oandreeva-nv hello, may I ask you a question? 👆

@oandreeva-nv
Collaborator

Hi @mkhludnev, our README mentions that you can report all issues on the issues page of the server repo. Apologies for the confusion; we use the server repo as the main entry point for questions and issues.

Regarding your question, I believe the main goal is to return the full text, prompt + generated portion. Do you happen to have a reproducer for gRPC cropping part of the output? If it does crop, I am concerned that simply removing the prompt from the response may not fix it.

@mkhludnev
Contributor Author

Thanks for replying, @oandreeva-nv.

> report all issues on the issues page of the server repo.

Here we go: triton-inference-server/server#6866. Perhaps it's worth discussing there first.

> Regarding your question, I believe the main goal is to return the full text, prompt + generated portion. Do you happen to have a reproducer for gRPC cropping part of the output?

Well, I just have a Triton container, curl, and a Python app that logs the gRPC output. I put output excerpts into the linked langchain issue above.
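A rough sketch of that reproducer, with the usual caveats: the `text_input`/`text_output` tensor names follow the vLLM backend examples, `vllm_model` is a placeholder, and a non-decoupled model config is assumed (a decoupled one would need the streaming gRPC API instead of `infer()`):

```python
# Rough reproducer sketch (not a verified script). Tensor names follow
# the vLLM backend examples; "vllm_model" is a placeholder. Assumes a
# non-decoupled model config; a decoupled one needs stream_infer().
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# A long RAG-style prompt, so the echoed prefix dominates the payload.
prompt = "Context paragraph for retrieval-augmented generation. " * 200

text_input = grpcclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(np.array([prompt.encode("utf-8")], dtype=object))

result = client.infer(
    model_name="vllm_model",
    inputs=[text_input],
    outputs=[grpcclient.InferRequestedOutput("text_output")],
)

text = result.as_numpy("text_output")[0].decode("utf-8")
# Compare against the REST response for the same prompt: per the report
# above, the gRPC payload comes back cropped while REST returns it whole.
print(len(text))
print(text[-200:])
```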

> If it does crop, I am concerned that simply removing the prompt from the response may not fix it.

Right.

@mkhludnev
Contributor Author

Considering triton-inference-server/server#6867, it's worth implementing an echo=False option for the vLLM and TRT backends.
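A hypothetical sketch of what such an option could gate when the backend assembles its response; the `echo` name mirrors the OpenAI completions parameter and is not an existing flag in either backend:

```python
# Hypothetical sketch only: "echo" mirrors the OpenAI completions
# parameter and is not an existing backend option. It would simply
# gate the prompt concatenation at response-assembly time.
def build_text_output(prompt: str, generation: str, echo: bool = False) -> str:
    return prompt + generation if echo else generation
```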

@mkhludnev closed this on Feb 7, 2024