
Genai/optimum support streaming output #1290

Merged
merged 7 commits Dec 5, 2024

Conversation

zhaohb (Contributor) commented Dec 4, 2024

Support chunk streaming mode, mainly to reduce the number of decode calls and thereby improve performance.

@github-actions github-actions bot added the category: llm_bench Label for tool/llm_bench folder label Dec 4, 2024
@zhaohb zhaohb changed the title Genai/optimum supports streaming output Genai/optimum support streaming output Dec 4, 2024
zhaohb (Contributor, Author) commented Dec 5, 2024

@eaidova @as-suvorov
Could you please review this PR quickly? Thank you very much.

  • First, this PR adds support for streaming mode; second, it adds chunk streaming within streaming mode.
  • Chunk streaming reduces the number of decode calls.
  • We tested the glm4v-nano model on iGPU; in streaming mode, chunk streaming yields roughly a 30% improvement in token rate.
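The idea behind chunk streaming can be sketched as follows. This is a hypothetical illustration, not the PR's actual API: the class name `ChunkStreamer`, the `chunk_size` parameter, and the toy decode function are all invented for the example. The point is simply that buffering token ids and decoding every N tokens replaces N separate decode calls with one.

```python
# Hypothetical sketch of chunk streaming (illustrative names, not the PR's API):
# instead of calling the tokenizer's decode once per generated token, buffer
# token ids and decode a whole chunk at a time.

class ChunkStreamer:
    def __init__(self, decode_fn, chunk_size=4):
        self.decode_fn = decode_fn    # e.g. a tokenizer's decode function
        self.chunk_size = chunk_size  # how many token ids to buffer per decode call
        self.buffer = []              # pending token ids
        self.text = ""                # accumulated decoded output

    def put(self, token_id):
        """Receive one generated token id; decode only when a chunk is full."""
        self.buffer.append(token_id)
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        """Decode and drain any buffered token ids (call once at end of generation)."""
        if self.buffer:
            self.text += self.decode_fn(self.buffer)
            self.buffer = []

# Toy decode function standing in for a real tokenizer:
decode = lambda ids: "".join(chr(ord("a") + i) for i in ids)

s = ChunkStreamer(decode, chunk_size=3)
for t in [0, 1, 2, 3]:
    s.put(t)   # decode is called once after the third token, not per token
s.flush()      # decode the remaining buffered token
print(s.text)  # abcd
```

With `chunk_size=3`, four tokens trigger only two decode calls instead of four, which is the source of the reported token-rate improvement.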

eaidova (Collaborator) commented Dec 5, 2024

@zhaohb looks like it breaks some other supported use cases, could you please fix:

[ INFO ] Traceback (most recent call last):
  File "/home/runner/work/openvino.genai/openvino.genai/./tools/llm_bench/benchmark.py", line 226, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
TypeError: run_image_generation_benchmark() takes 6 positional arguments but 8 were given
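The failure mode here can be illustrated in isolation. This is a hedged sketch with hypothetical parameter names, not the actual `llm_bench` signatures: a `CASE_TO_BENCH`-style dispatch table starts passing extra streaming-related arguments to every handler, but one handler keeps its old 6-parameter signature.

```python
# Hypothetical illustration of the reported TypeError (parameter names are
# invented, not the real run_image_generation_benchmark signature).

def run_benchmark_old(model, device, args, iters, ts, mem):
    # Old handler: accepts exactly 6 positional arguments.
    return "ok"

try:
    # The dispatch site now passes 8 arguments (two new streaming-related ones):
    run_benchmark_old(1, 2, 3, 4, 5, 6, "streamer", "chunk")
except TypeError as err:
    print(type(err).__name__)  # TypeError

# One possible fix: give the handler defaults for the new arguments so every
# entry in the dispatch table can be called uniformly.
def run_benchmark_fixed(model, device, args, iters, ts, mem,
                        streamer=None, chunk_size=None):
    return "ok"

print(run_benchmark_fixed(1, 2, 3, 4, 5, 6, "streamer", "chunk"))  # ok
```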

zhaohb (Contributor, Author) commented Dec 5, 2024

@zhaohb looks like it breaks some other supported use cases, could you please fix:

[ INFO ] Traceback (most recent call last):
  File "/home/runner/work/openvino.genai/openvino.genai/./tools/llm_bench/benchmark.py", line 226, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
TypeError: run_image_generation_benchmark() takes 6 positional arguments but 8 were given

@eaidova Thank you very much, I have fixed it.

@eaidova eaidova enabled auto-merge December 5, 2024 06:41
sammysun0711 (Collaborator) commented

build_jenkins

@eaidova eaidova added this pull request to the merge queue Dec 5, 2024
@andrei-kochin andrei-kochin added the port to LTS PR needs to be ported to LTS label Dec 5, 2024
@andrei-kochin andrei-kochin added this to the 2025.0 milestone Dec 5, 2024
Merged via the queue into openvinotoolkit:master with commit d294db9 Dec 5, 2024
55 checks passed
@ilya-lavrenov ilya-lavrenov removed the port to LTS PR needs to be ported to LTS label Dec 12, 2024
sungeunk pushed a commit to sungeunk/openvino.genai that referenced this pull request Dec 16, 2024
Support chunk streaming mode, mainly to reduce the number of decode
calls, thereby improving performance