add initial support for intel npu acceleration library #11347
low_bit_to_dtype_map = {
    'sym_int4': int4,
    'sym_int8': int8,
    'fp32': torch.float,
how about fp16?
It only moves Linear computation to the NPU; other ops still run on the CPU, and the CPU doesn't support most fp16 operations.
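The exchange above comes down to a lookup table that deliberately leaves out fp16. Here is a minimal sketch of how such a table could reject unsupported values. The string placeholders stand in for the real dtype objects in the diff (int4, int8, torch.float), and resolve_low_bit is a hypothetical helper that is not part of the PR.

```python
# Placeholder strings stand in for the dtype objects from the diff
# (int4, int8, torch.float) so this sketch runs without torch or the NPU library.
LOW_BIT_TO_DTYPE = {
    "sym_int4": "int4",
    "sym_int8": "int8",
    "fp32": "float32",
}


def resolve_low_bit(load_in_low_bit: str) -> str:
    """Hypothetical helper: map a load_in_low_bit value to a dtype placeholder."""
    try:
        return LOW_BIT_TO_DTYPE[load_in_low_bit]
    except KeyError:
        supported = ", ".join(sorted(LOW_BIT_TO_DTYPE))
        # fp16 lands here: only Linear layers move to the NPU, and the
        # remaining CPU ops lack fp16 support, so it is not in the table.
        raise ValueError(
            f"unsupported load_in_low_bit {load_in_low_bit!r}; "
            f"expected one of: {supported}"
        ) from None
```

Failing early with the list of supported values is one possible design. The PR itself may handle unknown values differently.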
Others LGTM as an initial PR.
Maybe add another env var, "BIGDL_USE_NPU", to disable the auto-import.
Added it.
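How the new variable interacts with the existing one is not shown in this thread, so the following is only a sketch under stated assumptions: a truthy BIGDL_USE_NPU suppresses the ipex auto-import, and otherwise BIGDL_IMPORT_IPEX decides (defaulting to on). The helper names and the set of accepted truthy strings are hypothetical.

```python
import os
from typing import Mapping

# Assumption: these spellings count as "enabled"; the PR may accept others.
_TRUTHY = {"1", "true", "t", "yes"}


def _flag(environ: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean-like environment variable, falling back to a default."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in _TRUTHY


def should_auto_import_ipex(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical gate: skip the ipex auto-import when NPU mode is requested."""
    if _flag(environ, "BIGDL_USE_NPU", default=False):
        return False
    return _flag(environ, "BIGDL_IMPORT_IPEX", default=True)
```

Passing the environment as a mapping keeps the gate easy to test without touching the real process environment.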
LGTM
Description
Add initial support for the Intel NPU acceleration library. This keeps ipex-llm's API and calls intel_npu_acceleration_library's compile function directly.
2. User API changes
ipex_llm.transformers.npu_model.AutoModelForCausalLM receives all arguments of ipex_llm.transformers.AutoModelForCausalLM, but it doesn't support most of them, so it just ignores them.
load_in_low_bit only supports sym_int4, sym_int8 and fp32.
Note: importing ipex prevents intel_npu_acceleration_library from finding the NPU, so ipex auto-import must be disabled when using it, by running set BIGDL_IMPORT_IPEX=0.
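The same setup can be done from Python instead of the shell. The sketch below assumes that the auto-import is decided when ipex_llm is first imported, so the variable is set beforehand, and that the NPU class exposes the usual from_pretrained entry point. load_npu_model is a hypothetical wrapper and the model path is left to the caller.

```python
import os

# Assumption: the ipex auto-import is decided when ipex_llm is first imported,
# so the variable has to be in place before that import runs.
os.environ["BIGDL_IMPORT_IPEX"] = "0"


def load_npu_model(model_path: str, low_bit: str = "sym_int4"):
    """Hypothetical wrapper around the NPU AutoModelForCausalLM from this PR."""
    # Imported lazily so the environment variable above is already set.
    from ipex_llm.transformers.npu_model import AutoModelForCausalLM

    # load_in_low_bit accepts only sym_int4, sym_int8 and fp32 per the
    # description; other arguments are accepted but mostly ignored.
    return AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_low_bit=low_bit,
    )
```

Calling load_npu_model requires ipex-llm and intel_npu_acceleration_library to be installed on a machine with an NPU.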