
add initial support for intel npu acceleration library #11347

Merged

Conversation


@MeouSker77 MeouSker77 commented Jun 18, 2024

Description

Add initial support for the Intel NPU acceleration library. This change keeps ipex-llm's existing API and calls intel_npu_acceleration_library's compile function directly.
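
A minimal sketch of this approach, assuming intel_npu_acceleration_library exposes a top-level compile(model, dtype=...) entry point as shown in that library's README; load_for_npu and the fixed dtype here are illustrative, not the actual ipex-llm code:

import intel_npu_acceleration_library
import torch
from transformers import AutoModelForCausalLM


def load_for_npu(model_path, **kwargs):
    # load the checkpoint with plain transformers first
    model = AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
    # then hand the model to the NPU library's compile function directly,
    # which rewrites the supported layers (mainly Linear) to run on the NPU
    return intel_npu_acceleration_library.compile(model, dtype=torch.int8)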

User API changes

from ipex_llm.transformers.npu_model import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_low_bit='sym_int4')

ipex_llm.transformers.npu_model.AutoModelForCausalLM accepts all of the arguments of ipex_llm.transformers.AutoModelForCausalLM, but most of them are not supported yet and are simply ignored.

load_in_low_bit currently only supports sym_int4, sym_int8 and fp32.

Note: importing ipex prevents intel_npu_acceleration_library from finding the NPU, so ipex auto-import must be disabled when using this API by setting the environment variable BIGDL_IMPORT_IPEX=0 (e.g. set BIGDL_IMPORT_IPEX=0 on Windows), as in the sketch below.
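
A minimal usage sketch combining the API above with the required environment variable; model_path is a placeholder, and setting the variable from Python is just one option (it can equally be set in the shell before launching):

import os

# ipex auto-import must be disabled before ipex_llm is imported,
# otherwise intel_npu_acceleration_library cannot find the NPU
os.environ['BIGDL_IMPORT_IPEX'] = '0'

from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = 'path/to/model'  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, load_in_low_bit='sym_int4'
)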

@MeouSker77 MeouSker77 requested a review from rnwang04 June 18, 2024 07:32
import torch
from intel_npu_acceleration_library.dtypes import int4, int8  # assumed import path
# map ipex-llm low-bit strings to NPU library dtypes
low_bit_to_dtype_map = {
    'sym_int4': int4,
    'sym_int8': int8,
    'fp32': torch.float,
}

Contributor:

how about fp16?

MeouSker77 (Contributor, Author):

It only moves the Linear computation to the NPU; other ops still run on the CPU, and the CPU doesn't support most fp16 operations.

@rnwang04 rnwang04 left a comment

others LGTM as an initial PR.

@jason-dai commented:

> Note: importing ipex prevents intel_npu_acceleration_library from finding the NPU, so ipex auto-import must be disabled by setting BIGDL_IMPORT_IPEX=0

Maybe add another ENV "BIGDL_USE_NPU" to disable auto import

@MeouSker77 commented:

> Maybe add another ENV "BIGDL_USE_NPU" to disable auto import

added it
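
A sketch of the resulting workflow; the exact value checked for BIGDL_USE_NPU is an assumption, not something spelled out in this thread:

import os

# assumption: setting BIGDL_USE_NPU tells ipex-llm to skip the ipex auto-import,
# so BIGDL_IMPORT_IPEX=0 no longer needs to be set by hand
os.environ['BIGDL_USE_NPU'] = '1'

from ipex_llm.transformers.npu_model import AutoModelForCausalLM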


@jason-dai jason-dai left a comment

LGTM

@MeouSker77 MeouSker77 merged commit 83082e5 into intel-analytics:main Jun 18, 2024
17 of 18 checks passed
@MeouSker77 MeouSker77 deleted the add-initial-support-for-npu-lib branch June 18, 2024 08:07