[NPU] Reuse prefill of acc lib for pipeline #12279

Merged: rnwang04 merged 13 commits into intel-analytics:main from first_token_support on Oct 28, 2024

@rnwang04 rnwang04 commented Oct 28, 2024

Description

1. Why the change?

Accelerates prefill in the NPU pipeline.
Works together with https://github.com/intel-analytics/llm.cpp/pull/598

2. User API changes

See the updated examples.

3. Summary of the change

  • Reuse prefill of acc lib for pipeline
  • Modify generate function for corresponding adaptation
  • Update embedding weight from input to const
  • Update examples
  • Change max-output-len to max-context-len
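
As a rough illustration of the items above, here is a minimal sketch of a generate loop that runs prefill once over the whole prompt and then decodes token by token, bounded by a total context length. Everything in it is hypothetical: `ToyModel`, `prefill`, `decode`, and `generate` are stand-ins written for this note, not the ipex-llm or acc lib API, and it assumes that max-context-len caps prompt plus generated tokens rather than generated tokens alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyModel:
    """Hypothetical stand-in for a pipeline model; not the real NPU API."""
    kv_cache: List[int] = field(default_factory=list)
    prefill_calls: int = 0
    decode_calls: int = 0

    def prefill(self, prompt_ids: List[int]) -> int:
        # Process the whole prompt in one pass and fill the cache.
        self.prefill_calls += 1
        self.kv_cache = list(prompt_ids)
        return self._next_token()

    def decode(self, token_id: int) -> int:
        # Process a single token, reusing the cache built so far.
        self.decode_calls += 1
        self.kv_cache.append(token_id)
        return self._next_token()

    def _next_token(self) -> int:
        # Toy rule: the next token is the sum of cached ids modulo 100.
        return sum(self.kv_cache) % 100


def generate(model: ToyModel, prompt_ids: List[int], max_context_len: int) -> List[int]:
    """Return prompt plus generated ids, at most max_context_len long (assumed semantics)."""
    if len(prompt_ids) >= max_context_len:
        raise ValueError("prompt does not fit in max_context_len")
    output = list(prompt_ids)
    # First token comes from the single prefill pass.
    token = model.prefill(prompt_ids)
    output.append(token)
    # Remaining tokens come from one-token decode steps.
    while len(output) < max_context_len:
        token = model.decode(token)
        output.append(token)
    return output
```

With a three-token prompt and `max_context_len=6`, this sketch calls `prefill` once and `decode` twice, so the budget covers the prompt as well as the output. Under an output-length limit, the same call would instead generate a fixed number of new tokens regardless of prompt size.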

4. How to test?

  • Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
  • Application test

5. Demo output

[Image: demo output]

@rnwang04 rnwang04 requested a review from jason-dai October 28, 2024 01:20

@rnwang04 rnwang04 force-pushed the first_token_support branch from e40bb1a to 8f749b8 Compare October 28, 2024 01:57
@rnwang04 commented:

Performance validation results have been posted here: https://github.com/analytics-zoo/nano/issues/1687#issuecomment-2440336530

@rnwang04 rnwang04 requested a review from jason-dai October 28, 2024 07:41
@jason-dai jason-dai left a comment

LGTM

@rnwang04 rnwang04 merged commit 3fe2ea3 into intel-analytics:main Oct 28, 2024
1 check passed
@rnwang04 rnwang04 deleted the first_token_support branch October 28, 2024 08:05
2 participants