Created new changes in the functions #7

Open: wants to merge 644 commits into base: ipex-vllm-mainline

Commits (644)
a7b6668
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339)
sgwhat Nov 6, 2024
79f2877
add minicpm-v models to `transformers_int4_npu_win` api (#12352)
JinheTang Nov 7, 2024
d880e53
[NPU] acclib llama3.2 support groupwise (#12355)
cyita Nov 7, 2024
ce0c6ae
Update Readme for FastChat docker demo (#12354)
ATMxsp01 Nov 7, 2024
71ea539
Add troubleshootings for ollama and llama.cpp (#12358)
JinheTang Nov 7, 2024
ad68c56
small improvement (#12359)
MeouSker77 Nov 7, 2024
520af4e
Update install_linux_gpu.md (#12353)
qiuxin2012 Nov 7, 2024
1a6cbc4
Add fused mlp optimizations to glm4 models (#12360)
Oscilloscope98 Nov 7, 2024
8fe294e
Small fix to all-in-one benchmark (#12362)
Oscilloscope98 Nov 7, 2024
7ef7696
update linux installation doc (#12365)
qiuxin2012 Nov 8, 2024
812d5cc
[NPU L0] Support llama3.2 in L0 pipeline (#12361)
plusbang Nov 8, 2024
b2e69a8
[NPU] Support Baichuan groupwise & gw code refactor (#12337)
cyita Nov 8, 2024
51f7f87
fix ipex 2.3 bug (#12366)
MeouSker77 Nov 8, 2024
fad15c8
Update fastchat demo script (#12367)
liu-shaojun Nov 8, 2024
2dfcc36
Fix trl version and padding in trl qlora example (#12368)
qiyuangong Nov 8, 2024
dc34e8c
optimize glm4v vision attention (#12369)
MeouSker77 Nov 8, 2024
e091893
Add fused_mlp to glm4v models (#12378)
Oscilloscope98 Nov 11, 2024
c92d76b
Update oneccl-binding.patch (#12377)
liu-shaojun Nov 11, 2024
85c9279
Update llama-cpp docker usage (#12387)
hzjane Nov 12, 2024
7a97fbb
Support vpm and resampler module of minicpm-v on NPU (#12375)
plusbang Nov 12, 2024
6bf5a8c
[NPU] Update qwen2 compile config (#12383)
rnwang04 Nov 12, 2024
4376fde
Decouple the openwebui and the ollama. in inference-cpp-xpu dockerfil…
ACupofAir Nov 12, 2024
0ee54fc
Upgrade to vllm 0.6.2 (#12338)
gc-fu Nov 12, 2024
dd8964b
changed inference-cpp/Dockerfile (#12386)
ATMxsp01 Nov 12, 2024
2715247
minor fix (#12389)
liu-shaojun Nov 12, 2024
1158f91
Fix llava with multi-image inputs (#12384)
Oscilloscope98 Nov 13, 2024
9220bab
qwen prefill attn_mask type fp16 (#12394)
cyita Nov 13, 2024
d6d63d6
[NPU] Qwen prefill attn_mask type hotfix (#12395)
cyita Nov 13, 2024
00fce5c
use new q4_0 batch kernel (#12396)
MeouSker77 Nov 13, 2024
59b01fa
small fix (#12397)
cyita Nov 14, 2024
6726b19
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
ATMxsp01 Nov 14, 2024
d2cbcb0
Add initial support for modeling_xlm encoder on NPU (#12393)
sgwhat Nov 14, 2024
7e50ff1
Add padding_token=eos_token for GPU trl QLora example (#12398)
qiyuangong Nov 14, 2024
d4d9494
[NPU] change attention_mask to fp16 (#12400)
plusbang Nov 14, 2024
548dec5
fix npu pipeline workflow (#12404)
rnwang04 Nov 15, 2024
d1cde7f
Tiny doc fix (#12405)
Oscilloscope98 Nov 15, 2024
fcc0fa7
fix workflow again (#12406)
rnwang04 Nov 15, 2024
6c5e8fc
fix again (#12407)
rnwang04 Nov 15, 2024
3d5fbf2
update batch kernel condition (#12408)
MeouSker77 Nov 15, 2024
d2c821d
Add missing arguments in pipeline parallel generate method (#12142)
notsyncing Nov 18, 2024
a69395f
Support performance mode of GLM4 model (#12401)
Oscilloscope98 Nov 18, 2024
d6057f6
Update benchmark_vllm_throughput.py (#12414)
gc-fu Nov 19, 2024
a9cb70a
Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md (#12409)
joan726 Nov 19, 2024
ff3f7cb
Fix speech_paraformer issue with unexpected changes (#12416)
sgwhat Nov 19, 2024
1bfcbc0
Add multimodal benchmark (#12415)
hzjane Nov 20, 2024
54c62fe
[NPU] dump prefill IR for further C++ solution (#12402)
rnwang04 Nov 20, 2024
d2a37b6
add Stable diffusion examples (#12418)
JinheTang Nov 20, 2024
7288c75
Initial NPU C++ Example (#12417)
rnwang04 Nov 21, 2024
145e8b4
update batch kernel condition (#12421)
MeouSker77 Nov 21, 2024
7e0a840
add optimization to openjourney (#12423)
JinheTang Nov 21, 2024
8fdc36c
Optimize with new batch kernel when `batch_size=1` on LNL (#12419)
Oscilloscope98 Nov 21, 2024
2935e97
small fix of cpp readme(#12425)
rnwang04 Nov 21, 2024
e61ae88
Upgrade dependency for xpu_lnl and xpu_arl option (#12424)
Oscilloscope98 Nov 21, 2024
c089b6c
Update english prompt to 34k (#12429)
liu-shaojun Nov 22, 2024
4ffa6c7
New convert support for C++ NPU (#12430)
rnwang04 Nov 22, 2024
0819fad
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
rnwang04 Nov 22, 2024
f414053
Support minicpm for NPU C++ (#12434)
rnwang04 Nov 25, 2024
be132c4
fix and optimize sd (#12436)
MeouSker77 Nov 25, 2024
8164aed
small change (#12439)
MeouSker77 Nov 25, 2024
b633fbf
add chinese prompt troubleshooting for npu cpp examples (#12437)
JinheTang Nov 25, 2024
b9abb8a
Support qwen2.5 3B for NPU & update related examples (#12438)
rnwang04 Nov 25, 2024
cdd41f5
optimize sdxl again (#12441)
MeouSker77 Nov 25, 2024
0e23bd7
Add support of llama3.2 for NPU C++ (#12442)
rnwang04 Nov 26, 2024
66bd7ab
add sdxl and lora-lcm optimization (#12444)
JinheTang Nov 26, 2024
52c17fe
Optimize first token of C++ NPU by adding npu_dpu_groups (#12443)
rnwang04 Nov 26, 2024
71e1f11
update serving image runtime (#12433)
pepijndevos Nov 26, 2024
303b104
Fix abnormal output for Qwen2-7B when sym_int8 (#12446)
Oscilloscope98 Nov 26, 2024
24b46b2
[NPU] further fix of qwen2 int8 pipeline & C++ (#12449)
rnwang04 Nov 26, 2024
c2efa26
Update LangChain examples to use upstream (#12388)
JinBridger Nov 26, 2024
7b40f9b
[NPU] Support GW for NPU C++ (#12450)
rnwang04 Nov 26, 2024
cb7b089
update vllm-docker-quick-start for vllm0.6.2 (#12392)
ACupofAir Nov 27, 2024
8331875
Fix (#12390)
gc-fu Nov 27, 2024
f8c2bb2
[NPU] optimize qwen2 prefill performance for C++ (#12451)
rnwang04 Nov 27, 2024
acd77d9
Remove env variable `BIGDL_LLM_XMX_DISABLED` in documentation (#12445)
cranechu0131 Nov 27, 2024
effb9bb
Small update to LangChain examples readme (#12452)
Oscilloscope98 Nov 27, 2024
ce6fcaa
update transformers version in example of glm4 (#12453)
cranechu0131 Nov 27, 2024
281c9b0
[NPU] Add L0 support for NPU C++ (#12454)
rnwang04 Nov 27, 2024
6f3441b
fix glm4-9b overflow (#12455)
MeouSker77 Nov 27, 2024
a2272b7
Small fix in llama.cpp troubleshooting guide (#12457)
Oscilloscope98 Nov 27, 2024
b29da30
[NPU] Update C++ L0 (#12458)
rnwang04 Nov 27, 2024
d272f6b
remove nf4 unsupport comment in cpu finetuning (#12460)
Uxito-Ada Nov 28, 2024
1b533a1
[NPU] Add env to enable scale search (#12462)
cyita Nov 28, 2024
490bb0c
[NPU] update fused layers for GW (#12459)
rnwang04 Nov 28, 2024
14d8d3d
Integrate NPU C++ imple into ipex-llm (#12461)
plusbang Nov 29, 2024
c911026
[NPU C++] Update model support & examples & benchmark (#12466)
plusbang Nov 29, 2024
f99f188
Hotfix of benchmark script (#12467)
plusbang Nov 29, 2024
4b6c316
Support imatrix-guided quantization for NPU CW (#12468)
rnwang04 Dec 2, 2024
59bd4a2
add vLLM glm4 fix (#12474)
gc-fu Dec 2, 2024
54d9a59
[NPU]Fix eos_token setting (#12475)
plusbang Dec 2, 2024
31c69a8
Fix MiniCPM-V models running on NPU (#12478)
JinBridger Dec 2, 2024
aee9acb
Add NPU QuickStart & update example links (#12470)
Oscilloscope98 Dec 2, 2024
b2e56a2
Add release support for option `xpu_arc` (#12422)
Oscilloscope98 Dec 2, 2024
26adb82
[NPU] Remove hard code (#12479)
Oscilloscope98 Dec 2, 2024
ab01753
[NPU] update save-load API usage (#12473)
plusbang Dec 3, 2024
598603b
small fix of imatrix (#12480)
rnwang04 Dec 3, 2024
5fe7667
Fix MiniCPM-V-2_6 running on NPU (#12486)
JinBridger Dec 3, 2024
7082844
Fix NPU LLM example save/load tokenizer (#12485)
JinBridger Dec 3, 2024
4ac66db
[NPU] Support streaming in Python (cpp backend) (#12488)
Oscilloscope98 Dec 3, 2024
80f15e4
Update README.md (#12489)
jason-dai Dec 3, 2024
c592844
Hotfix of BCE-Embedding model (#12490)
plusbang Dec 3, 2024
5629fdd
optimize qwen2_vl multiple image input or video input (#12487)
MeouSker77 Dec 4, 2024
ef4028a
[NPU] Support split `lm_head` for Qwen2 with CPP (#12491)
Oscilloscope98 Dec 4, 2024
7ff4533
Support hf generate (#12477)
hkvision Dec 4, 2024
e0bf005
small fix (#12493)
MeouSker77 Dec 4, 2024
ae9c215
Added cross-links (#12494)
joan726 Dec 4, 2024
a9e3f7f
optimize minicpm (#12496)
MeouSker77 Dec 4, 2024
ffa9a9e
Update streaming in npu examples (#12495)
cranechu0131 Dec 4, 2024
7d27f13
Fix hf generate for llama3.2 (#12497)
hkvision Dec 4, 2024
b89ea1b
Support save/load model for hf generate (#12499)
hkvision Dec 4, 2024
d8b14a6
Update save/load comments (#12500)
hkvision Dec 4, 2024
84f1c4a
Small fix for NPU Python cpp simple generate regarding eos tokens (#1…
Oscilloscope98 Dec 4, 2024
f56a111
[NPU] Fix load-low-bit benchmark script (#12502)
plusbang Dec 5, 2024
727f299
Add NPU demo gif to main readme (#12503)
Oscilloscope98 Dec 5, 2024
5e1416c
fix readme for npu cpp examples and llama.cpp (#12505)
JinheTang Dec 5, 2024
0a3eda0
Update README.md (#12507)
jason-dai Dec 5, 2024
60bafab
Small fixes to main readme (#12508)
Oscilloscope98 Dec 5, 2024
49ab897
[NPU] initial support of `asym_int4_rtn` (#12484)
rnwang04 Dec 5, 2024
0918d3b
[NPU] Fix hf generate with save/load generation config for Python (cp…
Oscilloscope98 Dec 5, 2024
12c7897
[NPU C++] Update example with conversation mode support (#12510)
plusbang Dec 6, 2024
ea55235
[NPU] Support glm-edge models (#12511)
plusbang Dec 9, 2024
922958c
vllm oneccl upgrade to b9 (#12520)
hzjane Dec 10, 2024
77404d2
support new model (#12523)
MeouSker77 Dec 11, 2024
68f2873
[NPU] Support repetition penalty for simple generate, Python (cpp bac…
Oscilloscope98 Dec 11, 2024
588bfa2
support hqq (#12518)
rnwang04 Dec 11, 2024
41ef497
[NPU] fix `transpose_value = False` for NPU `optimize_model=True` (#1…
rnwang04 Dec 11, 2024
fd9cf76
All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper …
ATMxsp01 Dec 11, 2024
509bdb4
[NPU] Fix minicpm-2B error (#12527)
plusbang Dec 11, 2024
6fc27da
[NPU] Update glm-edge support in docs (#12529)
plusbang Dec 12, 2024
2cce896
Enable `use_batch_forward` Optimization on Battlemage GPU (#12516)
liu-shaojun Dec 12, 2024
dbaf4ab
[NPU] Update C++ example with repetition_penalty & update Python code…
Oscilloscope98 Dec 12, 2024
3e0823d
add basic glm-edge support (#12531)
MeouSker77 Dec 12, 2024
ffce86d
add basic glm-edge-v support (#12533)
MeouSker77 Dec 12, 2024
f36c236
[NPU] Fix abnormal output with latest driver (#12530)
plusbang Dec 12, 2024
b747f3f
Small fix to GPU installation guide (#12536)
Oscilloscope98 Dec 13, 2024
fa261b8
torch 2.3 inference docker (#12517)
Uxito-Ada Dec 13, 2024
7cc01fd
[NPU] further fix of `new_value_states` (#12538)
rnwang04 Dec 13, 2024
6596c18
[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input …
plusbang Dec 13, 2024
1521994
optimize glm edge again (#12539)
MeouSker77 Dec 13, 2024
d20a968
[NPU] Fix generate example (#12541)
plusbang Dec 13, 2024
5402fc6
[Ollama] Update ipex-llm ollama readme to v0.4.6 (#12542)
sgwhat Dec 13, 2024
c090d16
remove old rope usage (#12544)
MeouSker77 Dec 13, 2024
caf15cc
[NPU] Add `IPEX_LLM_NPU_MTL` to enable support on mtl (#12543)
plusbang Dec 13, 2024
0b953e6
[REFINE] graphmode code (#12540)
ACupofAir Dec 16, 2024
a86487c
Add GLM-Edge GPU example (#12483)
cranechu0131 Dec 16, 2024
5ae0006
remove old rope usage (#12552)
MeouSker77 Dec 16, 2024
ccc18ee
Add Modelscope option for chatglm3 on GPU (#12545)
ATMxsp01 Dec 16, 2024
680ea7e
[NPU doc] Update configuration for different platforms (#12554)
plusbang Dec 17, 2024
a608f26
use new fused layer norm (#12553)
MeouSker77 Dec 17, 2024
d127a86
Small typo fixes (#12558)
Oscilloscope98 Dec 17, 2024
fcb4748
[NPU] support asym_int4 for llama (#12556)
lzivan Dec 17, 2024
429bf1f
Change: Use cn mirror for PyTorch extension installation to resolve n…
liu-shaojun Dec 17, 2024
694d14b
[NPU doc] Add ARL runtime configuration (#12562)
plusbang Dec 17, 2024
6278caf
Add `setuptools` as a basic dependency (#12563)
Oscilloscope98 Dec 17, 2024
6e801bc
Update readme (#12565)
jason-dai Dec 18, 2024
1a2ab12
[NPU] support asym_int4 for minicpm (#12567)
lzivan Dec 18, 2024
a4eb561
optimize siglip attention on arc (#12569)
MeouSker77 Dec 18, 2024
e2ae429
small fix (#12573)
MeouSker77 Dec 18, 2024
f7a2bd2
Update ollama and llama.cpp readme (#12574)
sgwhat Dec 18, 2024
28e81fd
Replace runner doc in ollama quickstart (#12575)
sgwhat Dec 18, 2024
47e90a3
Add `--modelscope` in GPU examples for glm4, codegeex2, qwen2 and qwe…
ATMxsp01 Dec 19, 2024
e0921f8
padding mask on torch side (#12577)
MeouSker77 Dec 19, 2024
4540424
optimize siglip attention again (#12578)
MeouSker77 Dec 19, 2024
80f2fdc
optimize new minicpm model (#12579)
MeouSker77 Dec 19, 2024
4e7e988
[NPU] Fix MTL and ARL support (#12580)
plusbang Dec 19, 2024
3eeb02f
support Megrez-3B-Omni (#12582)
MeouSker77 Dec 19, 2024
47da3c9
Add `--modelscope` in GPU examples for minicpm, minicpm3, baichuan2 (…
ATMxsp01 Dec 19, 2024
51ff9eb
Upgrade oneccl version to 0.0.6.3 (#12560)
liu-shaojun Dec 20, 2024
f3b5fad
refactor qwen2 and llama3 (#12587)
MeouSker77 Dec 20, 2024
b0338c5
Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internv…
ATMxsp01 Dec 20, 2024
6ea8033
refactor glm edge (#12588)
MeouSker77 Dec 20, 2024
b050368
refactor yuan2 and starcoder2 and fix (#12589)
MeouSker77 Dec 20, 2024
098eb33
refactor sd 1.5 and qwen2-vl and fix (#12590)
MeouSker77 Dec 20, 2024
c410d9c
[NPU] support asym_int4 for baichuan (#12576)
lzivan Dec 24, 2024
7aaf02f
refactor baichuan, glm4 and minicpm3 (#12600)
MeouSker77 Dec 24, 2024
ad2dc96
refactor mllama, gpt2 and internvl (#12602)
MeouSker77 Dec 24, 2024
45f8f72
[NPU] Fix minicpm on MTL (#12599)
plusbang Dec 24, 2024
073f936
refactor mistral and phi3 (#12605)
MeouSker77 Dec 24, 2024
4135b89
refactor chatglm2, internlm, stablelm and qwen (#12604)
MeouSker77 Dec 24, 2024
9c9800b
Update README.zh-CN.md (#12570)
joan726 Dec 24, 2024
4e6b9d8
add compresskv back for mistral (#12607)
MeouSker77 Dec 25, 2024
54b1d7d
Update README.zh-CN.md (#12610)
jason-dai Dec 25, 2024
5f5ac8a
fix llama related import (#12611)
MeouSker77 Dec 25, 2024
6249c1e
rewrite llama optimization (#12609)
MeouSker77 Dec 25, 2024
0477fe6
[docs] Update doc for latest open webui: 0.4.8 (#12591)
Mingqi2 Dec 26, 2024
9e895f0
[NPU] fix npu save (#12614)
rnwang04 Dec 26, 2024
a596f1a
remove bigdl-llm test to fix langchain UT (#12613)
MeouSker77 Dec 26, 2024
28737c2
Update Dockerfile (#12585)
liu-shaojun Dec 26, 2024
ef585d3
Polish Readme for ModelScope-related examples (#12603)
ATMxsp01 Dec 26, 2024
d841e1d
[NPU] update convert script based on latest usage (#12617)
rnwang04 Dec 26, 2024
1604b4e
small fix (#12616)
MeouSker77 Dec 26, 2024
ccc4055
[NPU] Update prompt format for baichuan2 (#12615)
lzivan Dec 26, 2024
40a7d2b
Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Envi…
liu-shaojun Dec 26, 2024
a9abde0
support passing attn_scale to sdpa (#12619)
MeouSker77 Dec 26, 2024
bbdbbb0
[NPU] Compatible with other third-party models like auto-round (#12620)
rnwang04 Dec 26, 2024
796ee57
[NPU doc] Update verified platforms (#12621)
plusbang Dec 26, 2024
55ce091
Add GLM4-Edge-V GPU example (#12596)
ATMxsp01 Dec 27, 2024
34dbdb8
small fix (#12623)
MeouSker77 Dec 27, 2024
5f04ed7
[NPU] Update prompt format for baichuan2-pipeline (#12625)
lzivan Dec 27, 2024
90f6709
remove pipeline examples (#12626)
rnwang04 Dec 27, 2024
46eeab4
[NPU] Fix regression caused by layer_norm change (#12627)
plusbang Dec 27, 2024
c72a5db
remove unused code again (#12624)
MeouSker77 Dec 27, 2024
f17ccfa
[NPU] Fix save-load usage of minicpm models (#12628)
plusbang Dec 27, 2024
2d08155
remove bmm, which is only required in ipex 2.0 (#12630)
MeouSker77 Dec 27, 2024
f289f68
small fix (#12634)
MeouSker77 Dec 30, 2024
534566e
[NPU] Support minicpm-v with python cpp backend (#12637)
plusbang Jan 2, 2025
81211fd
remove unused code (#12635)
MeouSker77 Jan 2, 2025
6231896
Update llama example information (#12640)
ATMxsp01 Jan 2, 2025
8e5328e
add disable opts for awq (#12641)
cyita Jan 2, 2025
550fa01
[Doc] Update ipex-llm ollama troubleshooting for v0.4.6 (#12642)
sgwhat Jan 2, 2025
8fd2dcb
Add benchmark_util for `transformers >= 4.47.0` (#12644)
lzivan Jan 3, 2025
6711a48
Enable internvl2-8b on vllm(#12645)
hzjane Jan 3, 2025
0b37710
Add guide for save-load usage (#12498)
plusbang Jan 3, 2025
9f8b134
add ipex-llm custom kernel registration (#12648)
MeouSker77 Jan 3, 2025
502461d
remove unnecessary ipex kernel usage (#12649)
MeouSker77 Jan 3, 2025
fae73ee
[NPU] Support save npu quantized model without npu dependency (#12647)
cyita Jan 6, 2025
ea65e4f
remove falcon support and related UT (#12656)
MeouSker77 Jan 7, 2025
ddc0ef3
refactor device check and remove cohere/mixtral support (#12659)
MeouSker77 Jan 7, 2025
381d448
[NPU] Example & Quickstart updates (#12650)
Oscilloscope98 Jan 7, 2025
ebdf19f
[NPU] Further fix saving of generation config (#12657)
Oscilloscope98 Jan 7, 2025
525b0ee
[NPU] Tiny fixes on examples (#12661)
Oscilloscope98 Jan 7, 2025
29ad5c4
refactor codegeex to remove ipex kernel usage (#12664)
MeouSker77 Jan 7, 2025
f9ee789
fix onednn dependency bug (#12665)
MeouSker77 Jan 7, 2025
5db6f9d
Add option with PyTorch 2.6 RC version for testing purposes (#12668)
Oscilloscope98 Jan 7, 2025
0534d72
Update docker_cpp_xpu_quickstart.md (#12667)
ca1ic0 Jan 8, 2025
ccf618f
Remove all ipex usage (#12666)
MeouSker77 Jan 8, 2025
7dd156d
small fix and add comment (#12670)
MeouSker77 Jan 8, 2025
2c23ce2
Create a BattleMage QuickStart (#12663)
liu-shaojun Jan 8, 2025
c11f5f0
also convert SdpaAttention in optimize_model (#12673)
MeouSker77 Jan 8, 2025
a22a8c2
small fix and remove unused code about ipex (#12671)
MeouSker77 Jan 8, 2025
5c24276
fix custom kernel registration (#12674)
MeouSker77 Jan 8, 2025
2321e8d
Update README.md (#12676)
jason-dai Jan 8, 2025
c6f57ad
Update README.md (#12677)
jason-dai Jan 8, 2025
aa9e70a
Update B580 Doc (#12678)
jason-dai Jan 8, 2025
1ec40cd
refactor to simplify following upgrade (#12680)
MeouSker77 Jan 9, 2025
5d8081a
Remove dummy model from performance tests (#12682)
Oscilloscope98 Jan 9, 2025
7234c9b
update quantize kv cache condition (#12681)
MeouSker77 Jan 9, 2025
c247415
Support PyTorch 2.6 RC perf test on Windows (#12683)
Oscilloscope98 Jan 9, 2025
66d4385
Update B580 CN Doc (#12686)
joan726 Jan 9, 2025
f9b29a4
Update B580 doc (#12691)
jason-dai Jan 10, 2025
2673792
Update Dockerfile (#12688)
liu-shaojun Jan 10, 2025
6885749
refactor to simplify following upgrade 2 (#12685)
MeouSker77 Jan 10, 2025
f8dc408
fix user issue (#12692)
MeouSker77 Jan 10, 2025
cbb8e2a
Update documents (#12693)
jason-dai Jan 10, 2025
584c1c5
Update B580 CN doc (#12695)
joan726 Jan 10, 2025
da8bcb7
[NPU ] fix load logic of glm-edge models (#12698)
plusbang Jan 10, 2025
4bf93c6
Support install from source for PyTorch 2.6 RC in UT (#12697)
Oscilloscope98 Jan 10, 2025
db9db51
fix lnl perf (#12700)
MeouSker77 Jan 10, 2025
e2d58f7
Update ollama v0.5.1 document (#12699)
sgwhat Jan 10, 2025
a1da790
Fix name device is not found bug (#12703)
Oscilloscope98 Jan 13, 2025
350fae2
Add Qwen2-VL HF GPU example with ModelScope Support (#12606)
ATMxsp01 Jan 13, 2025
25 changes: 0 additions & 25 deletions .github/actions/llm/cli-test-windows/action.yml

This file was deleted.

2 changes: 2 additions & 0 deletions .github/actions/llm/download-llm-binary/action.yml
@@ -27,6 +27,7 @@ runs:
       mv windows-avx2/* python/llm/llm-binary/
       mv windows-avx-vnni/* python/llm/llm-binary/
       mv windows-avx/* python/llm/llm-binary/
+      mv windows-npu-level0/* python/llm/llm-binary/
     fi
     rm -rf linux-avx2 || true
     rm -rf linux-avx512 || true
@@ -36,3 +37,4 @@ runs:
     rm -rf windows-avx2 || true
     rm -rf windows-avx-vnni || true
     rm -rf windows-avx || true
+    rm -rf windows-npu-level0 || true
6 changes: 6 additions & 0 deletions .github/actions/llm/setup-llm-env/action.yml
@@ -29,6 +29,9 @@ runs:
       sed -i 's/"bigdl-core-xe-addons-21==" + CORE_XE_VERSION/"bigdl-core-xe-addons-21"/g' python/llm/setup.py
       sed -i 's/"bigdl-core-xe-esimd-21==" + CORE_XE_VERSION/"bigdl-core-xe-esimd-21"/g' python/llm/setup.py

+      pip uninstall bigdl-core-xe-all -y || true
+      sed -i 's/"bigdl-core-xe-all==" + CORE_XE_VERSION/"bigdl-core-xe-all"/g' python/llm/setup.py
+
       pip install requests
       if [[ ${{ runner.os }} == 'Linux' ]]; then
         bash python/llm/dev/release_default_linux.sh default false
@@ -45,6 +48,9 @@ runs:
       elif [[ ${{ inputs.extra-dependency }} == 'xpu_2.1' ]]; then
         pip install --upgrade --pre -i https://pypi.python.org/simple --force-reinstall "python/llm/dist/${whl_name}[xpu_2.1]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
         pip install pytest expecttest
+      elif [[ ${{ inputs.extra-dependency }} == 'xpu_2.6' ]]; then
+        pip install --upgrade --pre -i https://pypi.python.org/simple --force-reinstall "python/llm/dist/${whl_name}[xpu_2.6]" --extra-index-url https://download.pytorch.org/whl/test/xpu
+        pip install pytest
       else
         if [[ ${{ runner.os }} == 'Linux' ]]; then
           pip install --upgrade --pre -i https://pypi.python.org/simple --force-reinstall "python/llm/dist/${whl_name}[all]" --extra-index-url https://download.pytorch.org/whl/cpu
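For reference, the new `xpu_2.6` branch installs the locally built wheel against the PyTorch 2.6 RC wheels from the test/xpu index. A minimal sketch of the same install run by hand; how `whl_name` is resolved here is an assumption, since the workflow computes it in an earlier step not shown in this hunk:

```bash
# Hedged sketch of the new xpu_2.6 install path from the diff above.
# Assumption: exactly one wheel was built into python/llm/dist.
whl_name=$(basename python/llm/dist/*.whl)
pip install --upgrade --pre -i https://pypi.python.org/simple \
    --force-reinstall "python/llm/dist/${whl_name}[xpu_2.6]" \
    --extra-index-url https://download.pytorch.org/whl/test/xpu
pip install pytest
```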
58 changes: 58 additions & 0 deletions .github/workflows/llm-binary-build.yml
@@ -443,6 +443,64 @@ jobs:
           path: |
             release

+  check-windows-npu-level0-artifact:
+    if: ${{contains(inputs.platform, 'Windows')}}
+    runs-on: [Shire]
+    outputs:
+      if-exists: ${{steps.check_artifact.outputs.exists}}
+    steps:
+      - name: Check if built
+        id: check_artifact
+        uses: xSAVIKx/artifact-exists-action@v0
+        with:
+          name: windows-npu-level0
+
+  windows-build-npu-level0:
+    runs-on: [self-hosted, Windows, npu-level0]
+    needs: check-windows-npu-level0-artifact
+    if: needs.check-windows-npu-level0-artifact.outputs.if-exists == 'false'
+    steps:
+      - name: Set access token
+        run: |
+          echo "github_access_token=$env:GITHUB_ACCESS_TOKEN" >> $env:GITHUB_ENV
+          echo "github_access_token=$env:GITHUB_ACCESS_TOKEN"
+      - uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744 # actions/checkout@v3
+        with:
+          repository: "intel-analytics/llm.cpp"
+          ref: ${{ inputs.llmcpp-ref }}
+          token: ${{ env.github_access_token }}
+          submodules: "recursive"
+      - name: Add msbuild to PATH
+        uses: microsoft/[email protected]
+        with:
+          msbuild-architecture: x64
+      - name: Add cmake to PATH
+        uses: ilammy/msvc-dev-cmd@v1
+      - name: Build binary
+        shell: cmd
+        run: |
+          call "C:\Program Files (x86)\Intel\openvino_2024.4.0\setupvars.bat"
+          cd bigdl-core-npu-level0
+          sed -i "/FetchContent_MakeAvailable(intel_npu_acceleration_library)/s/^/#/" CMakeLists.txt
+          mkdir build
+          cd build
+          cmake ..
+          cmake --build . --config Release -t pipeline
+      - name: Move release binary
+        shell: powershell
+        run: |
+          cd bigdl-core-npu-level0
+          if (Test-Path ./release) { rm -r -fo release }
+          mkdir release
+          mv build/Release/pipeline.dll release/pipeline.dll
+      - name: Archive build files
+        uses: actions/upload-artifact@v3
+        with:
+          name: windows-npu-level0
+          path: |
+            bigdl-core-npu-level0/release
+
+
 # to make llm-binary-build optionally skippable
 dummy-step:
   if: ${{ inputs.platform == 'Dummy' }}
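The two new jobs form a check-then-build gate: `check-windows-npu-level0-artifact` asks whether the `windows-npu-level0` artifact already exists, and `windows-build-npu-level0` runs only when it does not, so the NPU level-zero binary is not rebuilt needlessly. To inspect the resulting `pipeline.dll` locally, the artifact can be pulled from a finished run; a sketch assuming the GitHub CLI is installed and authenticated (`<run-id>` is a placeholder):

```bash
# Sketch: download the windows-npu-level0 artifact from a completed run.
# The artifact name comes from the workflow above; <run-id> is a placeholder.
gh run download <run-id> --name windows-npu-level0 --dir ./llm-binary
```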
16 changes: 8 additions & 8 deletions .github/workflows/llm-c-evaluation.yml
@@ -10,12 +10,12 @@ permissions:

 # Controls when the action will run.
 on:
-  schedule:
-    - cron: "00 15 * * *" # GMT time, 15:00 GMT == 23:00 Beijing Time
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-c-evaluation.yml"
+  # schedule:
+  #   - cron: "00 15 * * *" # GMT time, 15:00 GMT == 23:00 Beijing Time
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-c-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -204,7 +204,7 @@ jobs:
         pip install pandas==1.5.3

       - name: Download ceval results
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: ceval_results
           path: results
@@ -259,7 +259,7 @@ jobs:
       fi

       - name: Download ceval results
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: results_${{ needs.set-matrix.outputs.date }}
           path: ${{ env.ACC_FOLDER }}
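With the `schedule` and `pull_request` triggers commented out, this evaluation workflow (like the harness, ppl, and whisper workflows below) now runs only through `workflow_dispatch`. A sketch of dispatching it manually, assuming the GitHub CLI; the workflow's dispatch inputs are elided in this hunk, so none are passed:

```bash
# Sketch: manually trigger the now schedule-disabled C-Eval workflow.
# Add -f key=value pairs for whatever inputs its workflow_dispatch block defines.
gh workflow run llm-c-evaluation.yml
```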
16 changes: 8 additions & 8 deletions .github/workflows/llm-harness-evaluation.yml
@@ -10,12 +10,12 @@ permissions:

 # Controls when the action will run.
 on:
-  schedule:
-    - cron: "30 12 * * *" # GMT time, 12:30 GMT == 20:30 China
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-harness-evaluation.yml"
+  # schedule:
+  #   - cron: "30 12 * * *" # GMT time, 12:30 GMT == 20:30 China
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-harness-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -220,7 +220,7 @@ jobs:
         pip install --upgrade pip
         pip install jsonlines pytablewriter regex
       - name: Download all results
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: harness_results
           path: results
@@ -260,7 +260,7 @@ jobs:
       fi

       - name: Download harness results
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: harness_results
           path: ${{ env.ACC_FOLDER}}/${{ env.DATE }}
1 change: 0 additions & 1 deletion .github/workflows/llm-nightly-test.yml
@@ -86,7 +86,6 @@ jobs:
       shell: bash
       run: |
         python -m pip install --upgrade pip
-        python -m pip install --upgrade setuptools==58.0.4
         python -m pip install --upgrade wheel

     - name: Download llm binary
16 changes: 8 additions & 8 deletions .github/workflows/llm-ppl-evaluation.yml
@@ -10,12 +10,12 @@ permissions:

 # Controls when the action will run.
 on:
-  schedule:
-    - cron: "00 12 * * *" # GMT time, 12:00 GMT == 20:00 China
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-ppl-evaluation.yml"
+  # schedule:
+  #   - cron: "00 12 * * *" # GMT time, 12:00 GMT == 20:00 China
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-ppl-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -206,7 +206,7 @@ jobs:
         pip install --upgrade pip
         pip install jsonlines pytablewriter regex
       - name: Download all results
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: ppl_results
           path: results
@@ -245,7 +245,7 @@ jobs:
       fi

       - name: Download ppl results
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: ppl_results
           path: ${{ env.ACC_FOLDER}}/${{ env.DATE }}
16 changes: 8 additions & 8 deletions .github/workflows/llm-whisper-evaluation.yml
@@ -10,12 +10,12 @@ permissions:

 # Controls when the action will run.
 on:
-  schedule:
-    - cron: "00 13 * * *" # GMT time, 13:00 GMT == 21:00 China
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-whisper-evaluation.yml"
+  # schedule:
+  #   - cron: "00 13 * * *" # GMT time, 13:00 GMT == 21:00 China
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-whisper-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -176,14 +176,14 @@ jobs:

       - name: Download all results for nightly run
         if: github.event_name == 'schedule'
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: whisper_results
           path: ${{ env.NIGHTLY_FOLDER}}/${{ env.OUTPUT_PATH }}

       - name: Download all results for pr run
         if: github.event_name == 'pull_request'
-        uses: actions/download-artifact@v3
+        uses: actions/download-artifact@4.1.7
         with:
           name: whisper_results
           path: ${{ env.PR_FOLDER}}/${{ env.OUTPUT_PATH }}
1 change: 0 additions & 1 deletion .github/workflows/llm_example_tests.yml
@@ -61,7 +61,6 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        python -m pip install --upgrade setuptools==58.0.4
         python -m pip install --upgrade wheel

     - name: Download llm binary