Add pp_serving example to serving image #11433

Merged 4 commits on Jun 28, 2024

@hzjane hzjane commented Jun 26, 2024

Description

  1. Fails on
    tp_size = get_tensor_model_parallel_world_size()
    which needs to be fixed first. Fixed by https://github.com/intel-analytics/ipex-llm/pull/11434/files
  2. Running pp in Docker blocks for a long time. This happens when the installed version is not intel-level-zero-gpu=1.3.26241.33-647~22.04. The latest version, 1.3.27191.42-775~22.04, does not run normally on a kernel 5.19 OS (it is mainly enabled on 6.2 and 6.5).

Waiting for this refactor PR to merge, then testing again [passed]
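
For anyone hitting the long block described in item 2, a quick check of the installed driver package and kernel can help narrow things down. The following is only a rough sketch: `level_zero_status` and `REQUIRED_LEVEL_ZERO` are hypothetical names introduced here, and it assumes a Debian/Ubuntu-based image where `dpkg-query` is available.

```shell
# Version reported in this PR as working inside the container.
REQUIRED_LEVEL_ZERO="1.3.26241.33-647~22.04"

# Hypothetical helper: classify an installed version string.
# $1 is the installed version (empty if the package is not installed).
level_zero_status() {
    if [ -z "$1" ]; then
        echo "missing"
    elif [ "$1" = "$REQUIRED_LEVEL_ZERO" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

# dpkg-query prints the installed version; it fails if the package is absent.
installed="$(dpkg-query -W -f='${Version}' intel-level-zero-gpu 2>/dev/null || true)"

echo "kernel: $(uname -r)"
echo "intel-level-zero-gpu: ${installed:-not installed} ($(level_zero_status "$installed"))"
```

If the status is `mismatch`, pinning the package with `apt-get install intel-level-zero-gpu=1.3.26241.33-647~22.04` may be an option, assuming that version is still available from the configured apt repository.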

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test
  • Application test
  • Document test
  • ...

5. New dependencies

  • New Python dependencies
    - Dependency1
    - Dependency2
    - ...
  • New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

export MODEL_PATH="/llm/models/Llama-2-7b-chat-hf"
export low_bit="fp8"
# max requests = max_num * rank_num
export max_num="4"

max_num_seqs
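
The comment in the snippet under review states that max requests = max_num * rank_num. As a small illustration of that arithmetic only: `max_requests` is a hypothetical helper, and `rank_num=2` is an assumed value that does not appear in the PR.

```shell
# max_num is taken from the snippet above; rank_num is an assumed example value.
max_num="4"
rank_num="2"

# Hypothetical helper: total requests = per-rank limit * number of ranks.
max_requests() {
    echo $(( $1 * $2 ))
}

echo "max requests: $(max_requests "$max_num" "$rank_num")"
```

With these example values the script prints `max requests: 8`.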

@hzjane hzjane marked this pull request as ready for review June 28, 2024 02:35
@hzjane hzjane requested a review from glorysdj June 28, 2024 02:35
@xiangyuT xiangyuT left a comment

LGTM

@hzjane hzjane merged commit e000ac9 into intel-analytics:main Jun 28, 2024
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024
* init pp

* update

* update

* no clone ipex-llm again