LLM: Add length check for IPEX-CPU speculative decoding #10529

Merged
merged 3 commits into from
Mar 26, 2024

xiangyuT
Contributor

Description

Add a length check for IPEX-CPU speculative decoding. Prompts with length < 256 will use the original_generate() method instead.
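
For readers skimming the PR, the rule described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: `choose_generate`, `MIN_SPECULATIVE_PROMPT_LENGTH`, and the two callables are hypothetical names, and measuring the length as the number of prompt token ids is an assumption.

```python
from typing import Callable, Sequence

# Assumption: 256 is the threshold named in the PR description;
# the constant name itself is hypothetical.
MIN_SPECULATIVE_PROMPT_LENGTH = 256


def choose_generate(
    prompt_ids: Sequence[int],
    has_draft_model: bool,
    ipex_enabled: bool,
    speculative_generate: Callable[[Sequence[int]], str],
    original_generate: Callable[[Sequence[int]], str],
    min_length: int = MIN_SPECULATIVE_PROMPT_LENGTH,
) -> str:
    """Route short prompts on the IPEX path to the original generate."""
    # Without a draft model there is nothing to speculate with.
    if not has_draft_model:
        return original_generate(prompt_ids)
    # The length check: on the IPEX path, prompts shorter than the
    # threshold fall back to the original method.
    if ipex_enabled and len(prompt_ids) < min_length:
        return original_generate(prompt_ids)
    return speculative_generate(prompt_ids)
```

In this sketch the check only applies when the IPEX flag is set, so a short prompt on a non-IPEX path still goes through the speculative branch; whether the PR behaves the same way outside IPEX is not shown in the visible diff.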

@@ -53,6 +53,28 @@ def generate(
        **kwargs,
):
    if hasattr(self, "draft_model"):
        from ipex_llm.llm.transformers.convert import get_enable_ipex
        _enable_ipex = get_enable_ipex()
Contributor

Add CPU device check.

Contributor

get_enable_ipex is CPU-only. Otherwise LGTM.

Contributor

@qiyuangong qiyuangong Mar 25, 2024

@jason-dai
Any comments on this PR?

Contributor

@qiyuangong qiyuangong left a comment

LGTM

@xiangyuT xiangyuT marked this pull request as ready for review March 26, 2024 09:46
@xiangyuT xiangyuT merged commit 11550d3 into intel-analytics:main Mar 26, 2024
15 of 17 checks passed
2 participants