chatglm output results are repeating with basic prompts #527

Closed
avinashbhat09 opened this issue Jun 18, 2024 · 15 comments

@avinashbhat09

Context

We see that the words generated by chatglm for basic prompts are repetitive.

[screenshot: repetitive chatglm output]

What needs to be done?

We need to figure out whether this is due to weight compression or a model issue.

Example Pull Requests

No response

Resources

Contact points

@avinashbhat09

Ticket

No response

@avinashbhat09 avinashbhat09 added the good first issue label Jun 18, 2024
@github-project-automation github-project-automation bot moved this to Contributors Needed in Good first issues Jun 18, 2024
@avinashbhat09
Author

avinashbhat09 commented Jun 21, 2024

Tried the same OV-converted model using chatglm3.openvino (https://github.com/OpenVINO-dev-contest/chatglm3.openvino) and it works fine. We don't see any repetitive words.

[screenshot: non-repeating output from chatglm3.openvino]

From this we can conclude the following:

  1. No issue running inference on CPU or GPU

  2. Not a model issue

  3. No quantization issue

This looks more like an issue in how gen-ai interfaces with the model.

@avinashbhat09
Author

avinashbhat09 commented Jun 21, 2024

Can we add a chatbot-style implementation, similar to chatglm-openvino, to gen-ai to support chatglm?

@avinashbhat09
Author

avinashbhat09 commented Jun 24, 2024

Hi, any update on this? @Wovchena

@Wovchena
Collaborator

Hi. I don't have any update. @peterchen-intel is the correct person to discuss llm_bench-related questions with. As for the chatbot-style implementation, the sample is here: https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample
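
For reference, a minimal sketch of what the linked chat sample does with the openvino_genai Python API (a sketch only; the model directory below is a placeholder, and the exact API may differ between releases):

```python
import openvino_genai

# Placeholder: directory with an exported (stateful) OpenVINO model.
model_dir = "chatglm3-6b_stateful"

pipe = openvino_genai.LLMPipeline(model_dir, "CPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128

# Chat mode keeps the KV-cache between turns, so the conversation history
# is preserved across generate() calls.
pipe.start_chat()
while True:
    prompt = input("question:\n")
    if not prompt:
        break
    print(pipe.generate(prompt, config))
pipe.finish_chat()
```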

@eaidova eaidova removed the good first issue label Jun 24, 2024
@eaidova eaidova changed the title [Good First Issue]: Chatglm output results are repeating with basic prompts chatglm output results are repeating with basic prompts Jun 24, 2024
@avinashbhat09
Author

Thanks @Wovchena. Unfortunately, the chat sample does not work for me.
[screenshot: chat_sample error]

@avinashbhat09
Author

> Hi. I don't have any update. @peterchen-intel is the correct person to discuss llm_bench-related questions with. As for the chatbot-style implementation, the sample is here: https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample

@peterchen-intel : Any input from your side?

@Wovchena
Collaborator

Your model is stateless. You need a stateful one. To export such a model, ensure you don't have --disable-stateful while running optimum-cli export openvino. Alternatively, if you use python ./llm_bench/python/convert.py, you need to specify --stateful (and not --disable-stateful).
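
As a quick way to check which kind of model you have (a sketch, not part of llm_bench; the model path is a placeholder), a stateful export carries the KV-cache as internal state, so the IR contains ReadValue/Assign operations:

```python
import openvino as ov

core = ov.Core()
# Placeholder path to the exported IR.
model = core.read_model("chatglm3-6b_stateful/openvino_model.xml")

# Stateful exports keep the KV-cache as internal state (ReadValue/Assign ops);
# stateless exports expose past key/values as regular inputs instead.
is_stateful = any(op.get_type_name() in ("ReadValue", "Assign") for op in model.get_ops())
print("stateful" if is_stateful else "stateless")
```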

@avinashbhat09
Author

avinashbhat09 commented Jun 27, 2024

Thanks for your input, @Wovchena. I converted using this command and it worked:
optimum-cli export openvino --trust-remote-code --model THUDM/chatglm3-6b chatglm3-6b_stateful --task question-answering

[screenshot: expected output from the stateful model]

So this confirms that we don't have issues with the model or quantization. Coming back to the original bug: when we use benchmark.py, why do we see the answers repeating? Can anything be done to fix that? For our validation it is important to get the metrics printed for each response (like tokens/sec, first-token latency, etc.), which is currently not available in chat_sample.
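
(As a stopgap until per-response metrics are available in the chat sample, one can time the calls manually; this is only a rough sketch with placeholder names, and it reports word-level rather than token-level throughput:)

```python
import time
import openvino_genai

# Placeholder model directory and device.
pipe = openvino_genai.LLMPipeline("chatglm3-6b_stateful", "GPU")

prompt = "What is OpenVINO?"
start = time.perf_counter()
text = str(pipe.generate(prompt, max_new_tokens=128))
elapsed = time.perf_counter() - start

# Rough numbers only: words per second, with no separate first-token latency.
print(text)
print(f"~{len(text.split()) / elapsed:.1f} words/sec over {elapsed:.2f} s")
```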

@peterchen-intel
Collaborator

It may be due to benchmark.py forcing the output to "--infer_count" tokens for performance consistency (a fixed output size instead of stopping at the end token). Will add an option to stop at the end token.

@peterchen-intel peterchen-intel self-assigned this Jul 3, 2024
@peterchen-intel
Collaborator

CVS-146307

@peterchen-intel
Collaborator

#606

@peterchen-intel
Collaborator

@avinashbhat09 Can you try the HEAD of the openvino.genai master branch with the option --end_token_stopping?

@avinashbhat09
Author

avinashbhat09 commented Jul 24, 2024

@peterchen-intel: After rebasing to the latest head (commit id 42dd049) and adding --end_token_stopping, I see this:
[screenshot: benchmark.py output]

command: python benchmark.py -m C:\temp\chatglm3-6b\chatglm3-6b\pytorch\dldt\compressed_weights\OV_FP16-INT4_SYM -d GPU -r llama_report.csv -n 2 -ic 128 --end_token_stopping -pf 1k_pmpt.jsonl

@peterchen-intel
Collaborator

Link CVS-146307

@peterchen-intel
Collaborator

In some cases we need to fine-tune the prompt to get the expected number of output tokens for LLM benchmarking. To avoid that fine-tuning for each model, we set end_token_stopping=false by default to force generating the expected number of output tokens. The side effect is that the output may not look good; repetition is one such case. It is really a trade-off. Bad output does not mean an accuracy issue; accuracy should be tested with an accuracy tool, which a benchmarking tool cannot cover.
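
(For illustration, the same trade-off expressed in terms of the openvino_genai GenerationConfig; a sketch assuming the current API, which llm_bench wraps differently:)

```python
import openvino_genai

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128

# Benchmarking default (end_token_stopping=false): keep generating until
# max_new_tokens is reached, even past the end-of-sequence token, so every
# run produces the same output size. The text may repeat itself.
config.ignore_eos = True

# --end_token_stopping: respect the end-of-sequence token, so the text reads
# naturally but the output length varies between prompts and models.
# config.ignore_eos = False
```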
