New functions added #13

Open
wants to merge 220 commits into base: sankha_branch

Commits (220)
94ce447
Fix performance tests regarding `trl` version (#12319)
Oscilloscope98 Nov 4, 2024
c8679ad
Qwen layernorm as input (#12309)
hkvision Nov 4, 2024
8fe01c9
[NPU pipeline] update cmake usage of pipeline (#12320)
rnwang04 Nov 4, 2024
4644cb6
Perf test further fix regarding trl version (#12321)
Oscilloscope98 Nov 4, 2024
a01371f
Doc: update harness readme (#12324)
cranechu0131 Nov 4, 2024
5ee6f97
[NPU L0] Add layernorm weight as const / input setting (#12322)
plusbang Nov 4, 2024
e54af44
Add `transformers_int4_npu_pipeline_win` in all-in-one benchmark (#12…
ch1y0q Nov 4, 2024
94c4ce3
[NPU] Add env to disable compile opt (#12330)
cyita Nov 4, 2024
1b637e4
Add chatglm2&3 fuse mlp (#12328)
leonardozcm Nov 4, 2024
522cdf8
Add initial support for LNL nightly performance tests (#12326)
Oscilloscope98 Nov 4, 2024
e2adc97
Small fix to LNL performance tests (#12331)
Oscilloscope98 Nov 4, 2024
45b0d37
update benchmark readme (#12323)
lzivan Nov 5, 2024
923d696
Small fix to LNL performance tests (#12333)
Oscilloscope98 Nov 5, 2024
82a61b5
Limit trl version in example (#12332)
JinBridger Nov 5, 2024
d872639
[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327)
cyita Nov 5, 2024
8e9a3a1
fix chatglm2 cpu ut (#12336)
leonardozcm Nov 5, 2024
7240c28
Add dummy model in iGPU perf (#12341)
JinBridger Nov 5, 2024
899a303
Replace gradio_web_server.patch to adjust webui (#12329)
ATMxsp01 Nov 6, 2024
69e3a56
[NPU] Hot fix of load_low_bit (#12344)
plusbang Nov 6, 2024
c8b7265
Add basic glm4v support (#12345)
MeouSker77 Nov 6, 2024
e23ef7d
optimize glm4v's vision part (#12346)
MeouSker77 Nov 6, 2024
d984c06
Add MiniCPM-V-2_6 to arc perf test (#12349)
JinBridger Nov 6, 2024
f24352a
llama 3.1/3.2 support compresskv (#12347)
cyita Nov 6, 2024
c267355
fix three NPU benchmark issues (#12350)
rnwang04 Nov 6, 2024
872a744
Small optimization to glm4 models (#12351)
Oscilloscope98 Nov 6, 2024
a7b6668
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339)
sgwhat Nov 6, 2024
79f2877
add minicpm-v models to `transformers_int4_npu_win` api (#12352)
JinheTang Nov 7, 2024
d880e53
[NPU] acclib llama3.2 support groupwise (#12355)
cyita Nov 7, 2024
ce0c6ae
Update Readme for FastChat docker demo (#12354)
ATMxsp01 Nov 7, 2024
71ea539
Add troubleshootings for ollama and llama.cpp (#12358)
JinheTang Nov 7, 2024
ad68c56
small improvement (#12359)
MeouSker77 Nov 7, 2024
520af4e
Update install_linux_gpu.md (#12353)
qiuxin2012 Nov 7, 2024
1a6cbc4
Add fused mlp optimizations to glm4 models (#12360)
Oscilloscope98 Nov 7, 2024
8fe294e
Small fix to all-in-one benchmark (#12362)
Oscilloscope98 Nov 7, 2024
7ef7696
update linux installation doc (#12365)
qiuxin2012 Nov 8, 2024
812d5cc
[NPU L0] Support llama3.2 in L0 pipeline (#12361)
plusbang Nov 8, 2024
b2e69a8
[NPU] Support Baichuan groupwise & gw code refactor (#12337)
cyita Nov 8, 2024
51f7f87
fix ipex 2.3 bug (#12366)
MeouSker77 Nov 8, 2024
fad15c8
Update fastchat demo script (#12367)
liu-shaojun Nov 8, 2024
2dfcc36
Fix trl version and padding in trl qlora example (#12368)
qiyuangong Nov 8, 2024
dc34e8c
optimize glm4v vision attention (#12369)
MeouSker77 Nov 8, 2024
e091893
Add fused_mlp to glm4v models (#12378)
Oscilloscope98 Nov 11, 2024
c92d76b
Update oneccl-binding.patch (#12377)
liu-shaojun Nov 11, 2024
85c9279
Update llama-cpp docker usage (#12387)
hzjane Nov 12, 2024
7a97fbb
Support vpm and resampler module of minicpm-v on NPU (#12375)
plusbang Nov 12, 2024
6bf5a8c
[NPU] Update qwen2 compile config (#12383)
rnwang04 Nov 12, 2024
4376fde
Decouple the openwebui and the ollama in inference-cpp-xpu dockerfil…
ACupofAir Nov 12, 2024
0ee54fc
Upgrade to vllm 0.6.2 (#12338)
gc-fu Nov 12, 2024
dd8964b
changed inference-cpp/Dockerfile (#12386)
ATMxsp01 Nov 12, 2024
2715247
minor fix (#12389)
liu-shaojun Nov 12, 2024
1158f91
Fix llava with multi-image inputs (#12384)
Oscilloscope98 Nov 13, 2024
9220bab
qwen prefill attn_mask type fp16 (#12394)
cyita Nov 13, 2024
d6d63d6
[NPU] Qwen prefill attn_mask type hotfix (#12395)
cyita Nov 13, 2024
00fce5c
use new q4_0 batch kernel (#12396)
MeouSker77 Nov 13, 2024
59b01fa
small fix (#12397)
cyita Nov 14, 2024
6726b19
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
ATMxsp01 Nov 14, 2024
d2cbcb0
Add initial support for modeling_xlm encoder on NPU (#12393)
sgwhat Nov 14, 2024
7e50ff1
Add padding_token=eos_token for GPU trl QLora example (#12398)
qiyuangong Nov 14, 2024
d4d9494
[NPU] change attention_mask to fp16 (#12400)
plusbang Nov 14, 2024
548dec5
fix npu pipeline workflow (#12404)
rnwang04 Nov 15, 2024
d1cde7f
Tiny doc fix (#12405)
Oscilloscope98 Nov 15, 2024
fcc0fa7
fix workflow again (#12406)
rnwang04 Nov 15, 2024
6c5e8fc
fix again (#12407)
rnwang04 Nov 15, 2024
3d5fbf2
update batch kernel condition (#12408)
MeouSker77 Nov 15, 2024
d2c821d
Add missing arguments in pipeline parallel generate method (#12142)
notsyncing Nov 18, 2024
a69395f
Support performance mode of GLM4 model (#12401)
Oscilloscope98 Nov 18, 2024
d6057f6
Update benchmark_vllm_throughput.py (#12414)
gc-fu Nov 19, 2024
a9cb70a
Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md (#12409)
joan726 Nov 19, 2024
ff3f7cb
Fix speech_paraformer issue with unexpected changes (#12416)
sgwhat Nov 19, 2024
1bfcbc0
Add multimodal benchmark (#12415)
hzjane Nov 20, 2024
54c62fe
[NPU] dump prefill IR for further C++ solution (#12402)
rnwang04 Nov 20, 2024
d2a37b6
add Stable diffusion examples (#12418)
JinheTang Nov 20, 2024
7288c75
Initial NPU C++ Example (#12417)
rnwang04 Nov 21, 2024
145e8b4
update batch kernel condition (#12421)
MeouSker77 Nov 21, 2024
7e0a840
add optimization to openjourney (#12423)
JinheTang Nov 21, 2024
8fdc36c
Optimize with new batch kernel when `batch_size=1` on LNL (#12419)
Oscilloscope98 Nov 21, 2024
2935e97
small fix of cpp readme (#12425)
rnwang04 Nov 21, 2024
e61ae88
Upgrade dependency for xpu_lnl and xpu_arl option (#12424)
Oscilloscope98 Nov 21, 2024
c089b6c
Update english prompt to 34k (#12429)
liu-shaojun Nov 22, 2024
4ffa6c7
New convert support for C++ NPU (#12430)
rnwang04 Nov 22, 2024
0819fad
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
rnwang04 Nov 22, 2024
f414053
Support minicpm for NPU C++ (#12434)
rnwang04 Nov 25, 2024
be132c4
fix and optimize sd (#12436)
MeouSker77 Nov 25, 2024
8164aed
small change (#12439)
MeouSker77 Nov 25, 2024
b633fbf
add chinese prompt troubleshooting for npu cpp examples (#12437)
JinheTang Nov 25, 2024
b9abb8a
Support qwen2.5 3B for NPU & update related examples (#12438)
rnwang04 Nov 25, 2024
cdd41f5
optimize sdxl again (#12441)
MeouSker77 Nov 25, 2024
0e23bd7
Add support of llama3.2 for NPU C++ (#12442)
rnwang04 Nov 26, 2024
66bd7ab
add sdxl and lora-lcm optimization (#12444)
JinheTang Nov 26, 2024
52c17fe
Optimize first token of C++ NPU by adding npu_dpu_groups (#12443)
rnwang04 Nov 26, 2024
71e1f11
update serving image runtime (#12433)
pepijndevos Nov 26, 2024
303b104
Fix abnormal output for Qwen2-7B when sym_int8 (#12446)
Oscilloscope98 Nov 26, 2024
24b46b2
[NPU] further fix of qwen2 int8 pipeline & C++ (#12449)
rnwang04 Nov 26, 2024
c2efa26
Update LangChain examples to use upstream (#12388)
JinBridger Nov 26, 2024
7b40f9b
[NPU] Support GW for NPU C++ (#12450)
rnwang04 Nov 26, 2024
cb7b089
update vllm-docker-quick-start for vllm0.6.2 (#12392)
ACupofAir Nov 27, 2024
8331875
Fix (#12390)
gc-fu Nov 27, 2024
f8c2bb2
[NPU] optimize qwen2 prefill performance for C++ (#12451)
rnwang04 Nov 27, 2024
acd77d9
Remove env variable `BIGDL_LLM_XMX_DISABLED` in documentation (#12445)
cranechu0131 Nov 27, 2024
effb9bb
Small update to LangChain examples readme (#12452)
Oscilloscope98 Nov 27, 2024
ce6fcaa
update transformers version in example of glm4 (#12453)
cranechu0131 Nov 27, 2024
281c9b0
[NPU] Add L0 support for NPU C++ (#12454)
rnwang04 Nov 27, 2024
6f3441b
fix glm4-9b overflow (#12455)
MeouSker77 Nov 27, 2024
a2272b7
Small fix in llama.cpp troubleshooting guide (#12457)
Oscilloscope98 Nov 27, 2024
b29da30
[NPU] Update C++ L0 (#12458)
rnwang04 Nov 27, 2024
d272f6b
remove nf4 unsupported comment in cpu finetuning (#12460)
Uxito-Ada Nov 28, 2024
1b533a1
[NPU] Add env to enable scale search (#12462)
cyita Nov 28, 2024
490bb0c
[NPU] update fused layers for GW (#12459)
rnwang04 Nov 28, 2024
14d8d3d
Integrate NPU C++ imple into ipex-llm (#12461)
plusbang Nov 29, 2024
c911026
[NPU C++] Update model support & examples & benchmark (#12466)
plusbang Nov 29, 2024
f99f188
Hotfix of benchmark script (#12467)
plusbang Nov 29, 2024
4b6c316
Support imatrix-guided quantization for NPU CW (#12468)
rnwang04 Dec 2, 2024
59bd4a2
add vLLM glm4 fix (#12474)
gc-fu Dec 2, 2024
54d9a59
[NPU] Fix eos_token setting (#12475)
plusbang Dec 2, 2024
31c69a8
Fix MiniCPM-V models running on NPU (#12478)
JinBridger Dec 2, 2024
aee9acb
Add NPU QuickStart & update example links (#12470)
Oscilloscope98 Dec 2, 2024
b2e56a2
Add release support for option `xpu_arc` (#12422)
Oscilloscope98 Dec 2, 2024
26adb82
[NPU] Remove hard code (#12479)
Oscilloscope98 Dec 2, 2024
ab01753
[NPU] update save-load API usage (#12473)
plusbang Dec 3, 2024
598603b
small fix of imatrix (#12480)
rnwang04 Dec 3, 2024
5fe7667
Fix MiniCPM-V-2_6 running on NPU (#12486)
JinBridger Dec 3, 2024
7082844
Fix NPU LLM example save/load tokenizer (#12485)
JinBridger Dec 3, 2024
4ac66db
[NPU] Support streaming in Python (cpp backend) (#12488)
Oscilloscope98 Dec 3, 2024
80f15e4
Update README.md (#12489)
jason-dai Dec 3, 2024
c592844
Hotfix of BCE-Embedding model (#12490)
plusbang Dec 3, 2024
5629fdd
optimize qwen2_vl multiple image input or video input (#12487)
MeouSker77 Dec 4, 2024
ef4028a
[NPU] Support split `lm_head` for Qwen2 with CPP (#12491)
Oscilloscope98 Dec 4, 2024
7ff4533
Support hf generate (#12477)
hkvision Dec 4, 2024
e0bf005
small fix (#12493)
MeouSker77 Dec 4, 2024
ae9c215
Added cross-links (#12494)
joan726 Dec 4, 2024
a9e3f7f
optimize minicpm (#12496)
MeouSker77 Dec 4, 2024
ffa9a9e
Update streaming in npu examples (#12495)
cranechu0131 Dec 4, 2024
7d27f13
Fix hf generate for llama3.2 (#12497)
hkvision Dec 4, 2024
b89ea1b
Support save/load model for hf generate (#12499)
hkvision Dec 4, 2024
d8b14a6
Update save/load comments (#12500)
hkvision Dec 4, 2024
84f1c4a
Small fix for NPU Python cpp simple generate regarding eos tokens (#1…
Oscilloscope98 Dec 4, 2024
f56a111
[NPU] Fix load-low-bit benchmark script (#12502)
plusbang Dec 5, 2024
727f299
Add NPU demo gif to main readme (#12503)
Oscilloscope98 Dec 5, 2024
5e1416c
fix readme for npu cpp examples and llama.cpp (#12505)
JinheTang Dec 5, 2024
0a3eda0
Update README.md (#12507)
jason-dai Dec 5, 2024
60bafab
Small fixes to main readme (#12508)
Oscilloscope98 Dec 5, 2024
49ab897
[NPU] initial support of `asym_int4_rtn` (#12484)
rnwang04 Dec 5, 2024
0918d3b
[NPU] Fix hf generate with save/load generation config for Python (cp…
Oscilloscope98 Dec 5, 2024
12c7897
[NPU C++] Update example with conversation mode support (#12510)
plusbang Dec 6, 2024
ea55235
[NPU] Support glm-edge models (#12511)
plusbang Dec 9, 2024
922958c
vllm oneccl upgrade to b9 (#12520)
hzjane Dec 10, 2024
77404d2
support new model (#12523)
MeouSker77 Dec 11, 2024
68f2873
[NPU] Support repetition penalty for simple generate, Python (cpp bac…
Oscilloscope98 Dec 11, 2024
588bfa2
support hqq (#12518)
rnwang04 Dec 11, 2024
41ef497
[NPU] fix `transpose_value = False` for NPU `optimize_model=True` (#1…
rnwang04 Dec 11, 2024
fd9cf76
All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper …
ATMxsp01 Dec 11, 2024
509bdb4
[NPU] Fix minicpm-2B error (#12527)
plusbang Dec 11, 2024
6fc27da
[NPU] Update glm-edge support in docs (#12529)
plusbang Dec 12, 2024
2cce896
Enable `use_batch_forward` Optimization on Battlemage GPU (#12516)
liu-shaojun Dec 12, 2024
dbaf4ab
[NPU] Update C++ example with repetition_penalty & update Python code…
Oscilloscope98 Dec 12, 2024
3e0823d
add basic glm-edge support (#12531)
MeouSker77 Dec 12, 2024
ffce86d
add basic glm-edge-v support (#12533)
MeouSker77 Dec 12, 2024
f36c236
[NPU] Fix abnormal output with latest driver (#12530)
plusbang Dec 12, 2024
b747f3f
Small fix to GPU installation guide (#12536)
Oscilloscope98 Dec 13, 2024
fa261b8
torch 2.3 inference docker (#12517)
Uxito-Ada Dec 13, 2024
7cc01fd
[NPU] further fix of `new_value_states` (#12538)
rnwang04 Dec 13, 2024
6596c18
[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input …
plusbang Dec 13, 2024
1521994
optimize glm edge again (#12539)
MeouSker77 Dec 13, 2024
d20a968
[NPU] Fix generate example (#12541)
plusbang Dec 13, 2024
5402fc6
[Ollama] Update ipex-llm ollama readme to v0.4.6 (#12542)
sgwhat Dec 13, 2024
c090d16
remove old rope usage (#12544)
MeouSker77 Dec 13, 2024
caf15cc
[NPU] Add `IPEX_LLM_NPU_MTL` to enable support on mtl (#12543)
plusbang Dec 13, 2024
0b953e6
[REFINE] graphmode code (#12540)
ACupofAir Dec 16, 2024
a86487c
Add GLM-Edge GPU example (#12483)
cranechu0131 Dec 16, 2024
5ae0006
remove old rope usage (#12552)
MeouSker77 Dec 16, 2024
ccc18ee
Add Modelscope option for chatglm3 on GPU (#12545)
ATMxsp01 Dec 16, 2024
680ea7e
[NPU doc] Update configuration for different platforms (#12554)
plusbang Dec 17, 2024
a608f26
use new fused layer norm (#12553)
MeouSker77 Dec 17, 2024
d127a86
Small typo fixes (#12558)
Oscilloscope98 Dec 17, 2024
fcb4748
[NPU] support asym_int4 for llama (#12556)
lzivan Dec 17, 2024
429bf1f
Change: Use cn mirror for PyTorch extension installation to resolve n…
liu-shaojun Dec 17, 2024
694d14b
[NPU doc] Add ARL runtime configuration (#12562)
plusbang Dec 17, 2024
6278caf
Add `setuptools` as a basic dependency (#12563)
Oscilloscope98 Dec 17, 2024
6e801bc
Update readme (#12565)
jason-dai Dec 18, 2024
1a2ab12
[NPU] support asym_int4 for minicpm (#12567)
lzivan Dec 18, 2024
a4eb561
optimize siglip attention on arc (#12569)
MeouSker77 Dec 18, 2024
e2ae429
small fix (#12573)
MeouSker77 Dec 18, 2024
f7a2bd2
Update ollama and llama.cpp readme (#12574)
sgwhat Dec 18, 2024
28e81fd
Replace runner doc in ollama quickstart (#12575)
sgwhat Dec 18, 2024
47e90a3
Add `--modelscope` in GPU examples for glm4, codegeex2, qwen2 and qwe…
ATMxsp01 Dec 19, 2024
e0921f8
padding mask on torch side (#12577)
MeouSker77 Dec 19, 2024
4540424
optimize siglip attention again (#12578)
MeouSker77 Dec 19, 2024
80f2fdc
optimize new minicpm model (#12579)
MeouSker77 Dec 19, 2024
4e7e988
[NPU] Fix MTL and ARL support (#12580)
plusbang Dec 19, 2024
3eeb02f
support Megrez-3B-Omni (#12582)
MeouSker77 Dec 19, 2024
47da3c9
Add `--modelscope` in GPU examples for minicpm, minicpm3, baichuan2 (…
ATMxsp01 Dec 19, 2024
51ff9eb
Upgrade oneccl version to 0.0.6.3 (#12560)
liu-shaojun Dec 20, 2024
f3b5fad
refactor qwen2 and llama3 (#12587)
MeouSker77 Dec 20, 2024
b0338c5
Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internv…
ATMxsp01 Dec 20, 2024
6ea8033
refactor glm edge (#12588)
MeouSker77 Dec 20, 2024
b050368
refactor yuan2 and starcoder2 and fix (#12589)
MeouSker77 Dec 20, 2024
098eb33
refactor sd 1.5 and qwen2-vl and fix (#12590)
MeouSker77 Dec 20, 2024
c410d9c
[NPU] support asym_int4 for baichuan (#12576)
lzivan Dec 24, 2024
7aaf02f
refactor baichuan, glm4 and minicpm3 (#12600)
MeouSker77 Dec 24, 2024
ad2dc96
refactor mllama, gpt2 and internvl (#12602)
MeouSker77 Dec 24, 2024
45f8f72
[NPU] Fix minicpm on MTL (#12599)
plusbang Dec 24, 2024
073f936
refactor mistral and phi3 (#12605)
MeouSker77 Dec 24, 2024
4135b89
refactor chatglm2, internlm, stablelm and qwen (#12604)
MeouSker77 Dec 24, 2024
9c9800b
Update README.zh-CN.md (#12570)
joan726 Dec 24, 2024
4e6b9d8
add compresskv back for mistral (#12607)
MeouSker77 Dec 25, 2024
54b1d7d
Update README.zh-CN.md (#12610)
jason-dai Dec 25, 2024
5f5ac8a
fix llama related import (#12611)
MeouSker77 Dec 25, 2024
6249c1e
rewrite llama optimization (#12609)
MeouSker77 Dec 25, 2024
0477fe6
[docs] Update doc for latest open webui: 0.4.8 (#12591)
Mingqi2 Dec 26, 2024
9e895f0
[NPU] fix npu save (#12614)
rnwang04 Dec 26, 2024
a596f1a
remove bigdl-llm test to fix langchain UT (#12613)
MeouSker77 Dec 26, 2024
28737c2
Update Dockerfile (#12585)
liu-shaojun Dec 26, 2024
ef585d3
Polish Readme for ModelScope-related examples (#12603)
ATMxsp01 Dec 26, 2024
d841e1d
[NPU] update convert script based on latest usage (#12617)
rnwang04 Dec 26, 2024
1604b4e
small fix (#12616)
MeouSker77 Dec 26, 2024
ccc4055
[NPU] Update prompt format for baichuan2 (#12615)
lzivan Dec 26, 2024
40a7d2b
Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Envi…
liu-shaojun Dec 26, 2024
a9abde0
support passing attn_scale to sdpa (#12619)
MeouSker77 Dec 26, 2024
bbdbbb0
[NPU] Compatible with other third-party models like auto-round (#12620)
rnwang04 Dec 26, 2024
796ee57
[NPU doc] Update verified platforms (#12621)
plusbang Dec 26, 2024
6 changes: 4 additions & 2 deletions .github/workflows/llm-binary-build.yml
@@ -477,13 +477,15 @@ jobs:
- name: Add cmake to PATH
uses: ilammy/msvc-dev-cmd@v1
- name: Build binary
shell: powershell
shell: cmd
run: |
call "C:\Program Files (x86)\Intel\openvino_2024.4.0\setupvars.bat"
cd bigdl-core-npu-level0
sed -i "/FetchContent_MakeAvailable(intel_npu_acceleration_library)/s/^/#/" CMakeLists.txt
mkdir build
cd build
cmake ..
cmake --build . --config Release -j
cmake --build . --config Release -t pipeline
- name: Move release binary
shell: powershell
run: |
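Note: the hunk above switches the build step's shell from PowerShell to cmd, presumably so that `call`-ing OpenVINO's `setupvars.bat` leaves its environment variables set for the cmake commands that follow in the same shell (a .bat run from PowerShell executes in a child cmd process whose environment is discarded), while the `sed` line comments out the `FetchContent_MakeAvailable(intel_npu_acceleration_library)` call in CMakeLists.txt before configuring. A minimal standalone sketch of the same sequence, assuming OpenVINO 2024.4.0 at its default install path and a GNU sed on PATH (both taken from the diff; the script name is hypothetical):

    :: build_pipeline.bat -- illustrative standalone version of the CI build step above
    :: load the OpenVINO environment into the current cmd session
    call "C:\Program Files (x86)\Intel\openvino_2024.4.0\setupvars.bat"
    cd bigdl-core-npu-level0
    :: prefix the FetchContent_MakeAvailable(...) line with '#' to disable it
    sed -i "/FetchContent_MakeAvailable(intel_npu_acceleration_library)/s/^/#/" CMakeLists.txt
    mkdir build
    cd build
    cmake ..
    :: parallel Release build of all targets, then of the `pipeline` target
    cmake --build . --config Release -j
    cmake --build . --config Release -t pipeline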
1 change: 0 additions & 1 deletion .github/workflows/llm-nightly-test.yml
@@ -86,7 +86,6 @@ jobs:
shell: bash
run: |
python -m pip install --upgrade pip
python -m pip install --upgrade setuptools==58.0.4
python -m pip install --upgrade wheel

- name: Download llm binary
1 change: 0 additions & 1 deletion .github/workflows/llm_example_tests.yml
@@ -61,7 +61,6 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install --upgrade setuptools==58.0.4
python -m pip install --upgrade wheel

- name: Download llm binary
414 changes: 265 additions & 149 deletions .github/workflows/llm_performance_tests.yml

Large diffs are not rendered by default.

27 changes: 12 additions & 15 deletions .github/workflows/llm_unit_tests.yml
@@ -127,7 +127,6 @@ jobs:
shell: bash
run: |
python -m pip install --upgrade pip
python -m pip install --upgrade setuptools==58.0.4
python -m pip install --upgrade wheel

# May remove later
@@ -213,39 +212,39 @@
fi

- name: Run LLM cli test (Linux)
      if: runner.os == 'Linux'
uses: ./.github/actions/llm/cli-test-linux

- name: Setup Python Path
      if: runner.os == 'Windows'
shell: bash
run: |
# Get Python interpreter path
python_path=$(python -c 'import sys; print(sys.executable)')
python_dir=$(dirname "$python_path")
scripts_dir="$python_dir/Scripts"

# Set environment variables
echo "PYTHON_DIR=$python_dir" >> $GITHUB_ENV
echo "SCRIPTS_DIR=$scripts_dir" >> $GITHUB_ENV

- name: Run LLM cli test (Windows)
      if: runner.os == 'Windows'
shell: powershell
run: |
# Retrieve environment variables
$pythonDir = $env:PYTHON_DIR
$scriptsDir = $env:SCRIPTS_DIR

# Update PATH
$env:PATH = "$pythonDir;$scriptsDir;$env:PATH"

# Run tests
llm-cli.ps1 -t $env:THREAD_NUM -n 256 -x llama -m $env:LLAMA_INT4_CKPT_PATH -p 'Once upon a time,'
llm-cli.ps1 -t $env:THREAD_NUM -n 256 -x gptneox -m $env:GPTNEOX_INT4_CKPT_PATH -p 'Once upon a time,'
llm-cli.ps1 -t $env:THREAD_NUM -n 256 -x bloom -m $env:BLOOM_INT4_CKPT_PATH -p 'Once upon a time,'
# llm-cli.ps1 -t $env:THREAD_NUM -x starcoder -m $env:STARCODER_INT4_CKPT_PATH -p 'def check_odd('

- name: Run LLM inference test
shell: bash
run: |
@@ -317,7 +316,6 @@ jobs:
shell: bash
run: |
python -m pip install --upgrade pip
python -m pip install --upgrade "setuptools<70.0.0"
python -m pip install --upgrade wheel
python -m pip install --upgrade notebook

@@ -401,7 +399,7 @@ jobs:
echo "Directory $VICUNA_7B_1_3_ORIGIN_PATH not found. Downloading from FTP server..."
wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/vicuna-7b-v1.3 -P $ORIGIN_DIR
fi

- name: Run LLM inference test
shell: bash
run: |
@@ -414,7 +412,7 @@
fi
fi
python -m pip install datasets librosa soundfile einops tiktoken transformers_stream_generator

bash python/llm/test/run-llm-inference-tests-gpu.sh

- name: Run LLM example tests
@@ -432,7 +430,7 @@
fi
fi
bash python/llm/test/run-llm-example-tests-gpu.sh

- name: Get Langchain version
shell: bash
id: get_langchain_version
@@ -448,7 +446,7 @@
repository: "langchain-ai/langchain"
ref: ${{ join(steps.get_langchain_version.outputs.*, '\n') }}
path: langchain_upstream

- name: Run LLM langchain GPU test
shell: bash
run: |
@@ -464,10 +462,9 @@
fi
fi
bash python/llm/test/run-llm-langchain-tests-gpu.sh

pip install -U langchain
pip install -U langchain-community
pip install --pre --upgrade bigdl-llm[all]
bash python/llm/test/run-langchain-upstream-tests.sh

- name: Run LLM llamaindex GPU test