[NPU] update save-load API usage #12473

Merged (6 commits) on Dec 3, 2024

@plusbang plusbang commented Dec 2, 2024

Description

Update save-load API usage.

  • save: whenever optimize_model=True, save_directory must be specified the first time the model is loaded; the converted checkpoint (ckpt) is saved to save_directory after conversion.
  • load: still uses the load_low_bit API, and adds Python cpp backend support.

2. User API changes

# first time load, and ckpt is saved to `save_directory` after converting
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="eager",
    optimize_model=True,
    max_context_len=1024,
    max_prompt_len=960,
    mixed_precision=True,
    quantization_group_size=0,
    save_directory=save_directory  # required and has related check
)

# load converted model
model = AutoModelForCausalLM.load_low_bit(
    save_directory,
    attn_implementation="eager",
    torch_dtype=torch.float16,
    optimize_model=True,
    max_context_len=1024,
    max_prompt_len=960,
    trust_remote_code=True,
)

Example and benchmark scripts have been updated.
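The two calls above can be combined into a convert-once, load-afterwards pattern. The sketch below is only an illustration, not code from this PR: `load_or_convert` is a hypothetical helper, the "non-empty directory means already converted" test is an assumption, and `model_cls` stands in for the NPU `AutoModelForCausalLM` class, which is passed in rather than imported here.

```python
import os


def load_or_convert(model_cls, model_path, save_directory,
                    convert_kwargs=None, **common_kwargs):
    """Load a converted checkpoint if present, otherwise convert and save it.

    Hypothetical helper. `model_cls` must expose `from_pretrained` and
    `load_low_bit` as used in this PR. `convert_kwargs` holds arguments that
    only apply to the first-time conversion (e.g. mixed_precision).
    """
    # Mirror the rule from the description: save_directory is required.
    if not save_directory:
        raise ValueError("save_directory is required when optimize_model=True")

    # Assumption: a non-empty directory holds an already converted checkpoint.
    already_converted = (
        os.path.isdir(save_directory) and bool(os.listdir(save_directory))
    )

    if already_converted:
        # Later runs: load the converted model directly.
        return model_cls.load_low_bit(
            save_directory, optimize_model=True, **common_kwargs
        )

    # First run: convert, and let the library write the ckpt to save_directory.
    return model_cls.from_pretrained(
        model_path,
        optimize_model=True,
        save_directory=save_directory,
        **(convert_kwargs or {}),
        **common_kwargs,
    )
```

A call might look like `load_or_convert(AutoModelForCausalLM, model_path, save_directory, convert_kwargs={"mixed_precision": True, "quantization_group_size": 0}, torch_dtype=torch.float16, trust_remote_code=True, attn_implementation="eager", max_context_len=1024, max_prompt_len=960)`, so that the shared arguments stay identical between the first run and later runs.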

4. How to test?

  • Application test

@plusbang plusbang requested a review from jason-dai December 2, 2024 09:08
@plusbang plusbang changed the title [NPU cpp] update save-load API usage [NPU] update save-load API usage Dec 2, 2024
@rnwang04 rnwang04 left a comment

LGTM

plusbang commented Dec 3, 2024

Merging this first; I will add tokenizer save-load processing to the examples later.

@plusbang plusbang merged commit ab01753 into intel-analytics:main Dec 3, 2024
1 check passed
3 participants