Support finishing PP inference once `eos_token_id` is found #11336

plusbang · 2024-06-17T08:40:22Z

Description

Support finishing PP inference once `eos_token_id` is found
For example, with qwen1.5-32b-chat, int4, and 1k-128, the output is as follows:
Update qwen1.5-32b-chat in the verified model list
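
For readers unfamiliar with the change, the early-stop behaviour can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `generate_until_eos` and `next_token_fn` are hypothetical names, and `next_token_fn` stands in for one full pass through all pipeline stages that returns the next token id. In a real pipeline-parallel setup, every rank would presumably also need to see the same token so that all stages leave the loop at the same step.

```python
from typing import Callable, List, Optional, Sequence, Union


def generate_until_eos(
    next_token_fn: Callable[[List[int]], int],  # hypothetical: one full pipeline pass
    input_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_id: Optional[Union[int, Sequence[int]]] = None,
) -> List[int]:
    """Decode token by token and stop as soon as an EOS token is produced."""
    # Normalise eos_token_id to a set so a single id or several ids both work.
    if eos_token_id is None:
        eos_ids = set()
    elif isinstance(eos_token_id, int):
        eos_ids = {eos_token_id}
    else:
        eos_ids = set(eos_token_id)

    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        next_token = next_token_fn(tokens)
        tokens.append(next_token)
        # Stop early instead of always running for max_new_tokens steps.
        if next_token in eos_ids:
            break
    return tokens


if __name__ == "__main__":
    # Fake "model" that emits a fixed stream of token ids; 2 plays the role of EOS.
    stream = iter([5, 6, 2, 7, 8])
    print(generate_until_eos(lambda toks: next(stream), [1], 5, eos_token_id=2))
    # prints [1, 5, 6, 2]
```

Without the `eos_ids` check the loop would always run for `max_new_tokens` steps, which is the behaviour this PR's description says it moves away from.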

plusbang requested a review from sgwhat June 17, 2024 08:41

sgwhat approved these changes Jun 17, 2024

fix

cad798c

plusbang merged commit e50c890 into intel-analytics:main Jun 18, 2024
31 checks passed