
[Bugfix] fix output parsing error for trtllm backend #4137

Merged
2 commits merged into vllm-project:main on Apr 17, 2024

Conversation

@elinx elinx (Contributor) commented Apr 17, 2024

Fix #4136

@ywang96 ywang96 (Member) left a comment

Thank you for testing and the fix! Do you mind attaching the output from running the fixed version?

@@ -149,7 +150,6 @@ async def async_request_trt_llm(
 most_recent_timestamp = timestamp

 output.latency = most_recent_timestamp - st
-output.generated_text = json.loads(data)["text_output"]
@ywang96 ywang96 (Member) commented Apr 17, 2024

I remember Triton + TRT had an issue where data['text_output'] contained the cumulative output text instead of the delta, hence in this version I initially only took the last chunk to retrieve generated_text.

Could you verify if this is still the case? (and either way, this should have been data['text_output'] not json.loads(data)["text_output"], so thanks for catching this!)

@elinx elinx (Contributor, Author) replied

Looks like it has been fixed; the output I got looks like this:

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":0.0,"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"a"}

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"city"}

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"in"}

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"China"}

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"."}

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"It"}

data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"is"}

@elinx elinx (Contributor, Author) commented Apr 17, 2024

> Thank you for testing and the fix! Do you mind attaching the output from running the fixed version?

The final output is something like:

============ Serving Benchmark Result ============
Successful requests:                     1000     
Benchmark duration (s):                  355.37   
Total input tokens:                      248339   
Total generated tokens:                  260518   
Request throughput (req/s):              2.81     
Input token throughput (tok/s):          698.81   
Output token throughput (tok/s):         733.08   
---------------Time to First Token----------------
Mean TTFT (ms):                          213579.15
Median TTFT (ms):                        213038.47
P99 TTFT (ms):                           322949.50
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          55.34    
Median TPOT (ms):                        55.59    
P99 TPOT (ms):                           77.75    
==================================================
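
As a side note, here is a rough sketch of how metrics of this kind are derived from per-request timestamps: TTFT is the first-token latency, and TPOT divides the remaining decode time by the remaining output tokens. This only illustrates the definitions behind the table above, not the benchmark script itself; the function and field names are assumptions.

import statistics

def summarize(requests, duration_s):
    # requests: list of dicts with start, first_token_time, end (seconds),
    # plus input_tokens and output_tokens counts.
    ttft_ms = [(r["first_token_time"] - r["start"]) * 1000 for r in requests]
    # TPOT excludes the first token: decode time spread over remaining tokens.
    tpot_ms = [
        (r["end"] - r["first_token_time"]) * 1000 / max(r["output_tokens"] - 1, 1)
        for r in requests
    ]
    return {
        "request_throughput_req_s": len(requests) / duration_s,
        "input_tok_s": sum(r["input_tokens"] for r in requests) / duration_s,
        "output_tok_s": sum(r["output_tokens"] for r in requests) / duration_s,
        "mean_ttft_ms": statistics.mean(ttft_ms),
        "median_ttft_ms": statistics.median(ttft_ms),
        "mean_tpot_ms": statistics.mean(tpot_ms),
        "median_tpot_ms": statistics.median(tpot_ms),
    }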

@ywang96 ywang96 enabled auto-merge (squash) April 17, 2024 10:19
@ywang96 ywang96 merged commit fe3b5bb into vllm-project:main Apr 17, 2024
46 checks passed
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 21, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 26, 2024
alexeykondrat pushed a commit to alexeykondrat/ci-vllm that referenced this pull request May 1, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Development

Successfully merging this pull request may close these issues.

[Bug]: benchmark trtllm failed