add save & load support for NPU optimized model #11999

Merged
2 commits merged into intel-analytics:main from npu_save_load_api on Sep 3, 2024

Conversation

rnwang04 (Contributor) commented Sep 3, 2024

Description

1. Why the change?

  • fix the existing NPU save & load functions
  • add save & load support for the NPU optimized model

2. User API changes

  • for non-optimized model
# save.py
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# model_path points to the original (Hugging Face format) model
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
                                             load_in_low_bit="sym_int4",
                                             attn_implementation="eager")
model.save_low_bit("llama_low_bit_npu")

# load.py
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.load_low_bit("llama_low_bit_npu",
                                          attn_implementation="eager")
  • for optimized model
# save.py
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# `args` holds the command-line options (e.g. parsed with argparse)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="eager",
    load_in_low_bit="sym_int4",
    optimize_model=True,
    max_output_len=args.max_output_len,
    max_prompt_len=args.max_prompt_len,
    intra_pp=args.intra_pp,
    inter_pp=args.inter_pp,
    transpose_value_cache=not args.disable_transpose_value_cache,
)
model.save_low_bit("llama_low_bit_npu")

# load.py
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.load_low_bit("llama_low_bit_npu",
                                          attn_implementation="eager",
                                          torch_dtype=torch.float16,
                                          optimize_model=True,
                                          max_output_len=args.max_output_len,
                                          max_prompt_len=args.max_prompt_len,
                                          intra_pp=args.intra_pp,
                                          inter_pp=args.inter_pp,
                                          transpose_value_cache=not args.disable_transpose_value_cache)
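
For reference, a minimal sketch of running inference after load_low_bit (not part of this PR): it assumes the transformers-style tokenizer and generate API used in the existing NPU examples, and model_path and the prompt below are illustrative placeholders.

# run.py (illustrative sketch only)
from transformers import AutoTokenizer

# `model` is the object returned by load_low_bit() above.
# The tokenizer is loaded from the original model path here, since this
# sketch does not assume save_low_bit also stores tokenizer files.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))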

3. Remaining issues

  • for the optimized model, save.py & load.py must be separate scripts; otherwise the program will hang at the loading stage.

4. How to test?

  • Unit test: please manually trigger the PR Validation by entering the PR number (e.g., 1234), and paste your action link here once it has finished successfully.

rnwang04 requested a review from plusbang on September 3, 2024 at 08:57
rnwang04 requested a review from jason-dai on September 3, 2024 at 08:57
plusbang (Contributor) left a comment
LGTM, maybe we could also add a related example later.

rnwang04 (Contributor, Author) commented Sep 3, 2024

LGTM, maybe we could also add a related example later.

Yeah, we can add the usage to our NPU examples in a later PR.

rnwang04 merged commit 9eaff5e into intel-analytics:main on Sep 3, 2024
1 check passed
rnwang04 deleted the npu_save_load_api branch on September 3, 2024 at 12:53
cyita pushed a commit to cyita/BigDL that referenced this pull request Sep 5, 2024
cranechu0131 pushed a commit to cranechu0131/ipex-llm that referenced this pull request Sep 9, 2024