Update IPEX_LLM_PERFORMANCE_MODE with input length threshold #11908
if inputs_embeds.shape[1] >= PERFORMANCE_MODE_LOOKUP_INPUT_THRESHOLD:
    lookahead = 2  # default to 2 now
else:
    lookahead = 0
In the else branch (input shorter than the threshold) we set lookahead=0 rather than calling the original generate function directly, for performance mode benchmarking purposes.
We may need further verification of whether this approach adds too much overhead.
else:
    lookahead = 0
    use_update_candidate_strategy = False
Just set lookahead to None for now
Updated :) Note that, for now, the all-in-one benchmark may not work in performance mode when input_len < 100.
LGTM
Merging for now :) Will try to update the all-in-one benchmark tool accordingly in another PR next Monday.
Description
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold.
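As a rough illustration of the behavior settled on in the review above, the threshold check could be sketched as follows. This is a minimal sketch, not the merged code: `choose_lookahead` is a hypothetical helper, the threshold value of 100 is assumed from the `input_len < 100` remark in the thread, and returning `None` below the threshold follows the "set lookahead to None" suggestion.

```python
from typing import Optional

# Assumed value: the review thread mentions input_len < 100, so 100 is used here.
PERFORMANCE_MODE_LOOKUP_INPUT_THRESHOLD = 100


def choose_lookahead(
    input_length: int,
    threshold: int = PERFORMANCE_MODE_LOOKUP_INPUT_THRESHOLD,
) -> Optional[int]:
    """Hypothetical helper: pick the lookahead value for a given prompt length.

    Returns 2 when the prompt is at least `threshold` tokens long,
    otherwise None to signal that lookahead should not be used.
    """
    if input_length >= threshold:
        return 2  # default to 2 now, as in the diff under review
    return None  # short input: leave lookahead unset
```

In the PR's setting the length would come from something like `inputs_embeds.shape[1]`, for example `choose_lookahead(inputs_embeds.shape[1])`.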