forked from intel-analytics/ipex-llm
Test transformers 41 #12
Open · SANKHA1 wants to merge 194 commits into SANKHA1:test_transformers_41 from intel-analytics:test_transformers_41
Conversation
* Add openai-whisper pytorch gpu
* Update README.md
* Update README.md
* fix typo
* fix names, update readme
* Update README.md
* updated qwen1.5B to all transformers==4.37 yaml
* updated qwen1.5B to all transformers==4.37 yaml
…11747) mistral-7B-instruct-v0.2 and mistral-7B-instruct-v0.1 use different rope_theta values (v0.2 uses 1e6, v0.1 uses 1e4). Pass self.config.rope_theta to apply_rotary_pos_emb_no_cache_xpu to avoid output differences.
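For context on this fix, a minimal sketch of why rope_theta must be read from the model config rather than hard-coded; the helper names below are illustrative, not ipex-llm's actual internals.

```python
import torch

def rotate_half(x):
    # standard "rotate half" used by Llama/Mistral-style RoPE
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rope_cos_sin(positions, head_dim, rope_theta):
    # rope_theta must come from self.config: Mistral-7B-Instruct-v0.1 ships
    # with 1e4 while v0.2 ships with 1e6, so hard-coding either value
    # silently corrupts the other model's outputs.
    inv_freq = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(positions.float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

def apply_rotary_pos_emb(q, k, cos, sin):
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```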
* phi3 support compresskv
* fix phi3 mtl error
* fix conflict with quant kv
* fix abnormal output on mtl
* fix style
* use sliding window size to compress kv
* support sliding window
* fix style
* fix style
* temp: partial support for quant kv
* support quant kv with compress kv; todo: model check
* temp
* fix style
* fix style
* remove prepare
* address comment
* default -> 1.8k
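A rough sketch of the sliding-window side of these commits, assuming a cache laid out as [batch, heads, seq, head_dim]; ipex-llm's real compresskv path also covers quantized caches and the ~1.8k default threshold mentioned above, which this toy version ignores.

```python
import torch

def compress_kv(key_cache, value_cache, window_size):
    # Keep only the most recent `window_size` positions of the KV cache;
    # caches are assumed to be [batch, num_heads, seq_len, head_dim].
    seq_len = key_cache.size(2)
    if seq_len <= window_size:
        return key_cache, value_cache
    return key_cache[:, :, -window_size:, :], value_cache[:, :, -window_size:, :]
```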
* fix gptq of llama
* small fix
* support compress kv with lookahead
* enough kv miss param
* add perf mode
* update
* fix style
* Revert to use out-of-tree GPU driver since the performance with the out-of-tree driver is better than upstream's
* add spaces
* add troubleshooting case
* update Troubleshooting
* set mistral fuse rope to false except fp6 & fp16
* lint
* lint
Co-authored-by: ATMxsp01 <[email protected]>
…#11760)
* All use 8192.txt for prompt preparation for now
* Small fix
* Fix text encoding mode to utf-8
* Small update
* fix compresskv + lookahead attn_mask qwen2
* support llama chatglm
* support mistral & chatglm
* address comments
* revert run.py
* Reduce Mistral softmax memory only in low memory mode
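To illustrate the kind of saving this commit targets, a hedged sketch of applying attention softmax slice by slice so the full float32 score buffer never materializes at once; the names and in-place strategy here are illustrative, not necessarily what the actual change does.

```python
import torch

def low_memory_softmax_(attn_weights, chunk_size=1024):
    # attn_weights: [batch, heads, q_len, kv_len]; softmax is computed over
    # slices of the query dimension and written back in place, so only a
    # chunk_size-row float32 buffer is alive at any time.
    for start in range(0, attn_weights.size(2), chunk_size):
        blk = attn_weights[:, :, start:start + chunk_size, :]
        blk.copy_(torch.softmax(blk.float(), dim=-1).to(blk.dtype))
    return attn_weights
```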
…est (#11778)
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758)
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1
* remove 4.43 for arc; fix
* remove 4096-512 for 4.43
* comment some models
* Small fix
* uncomment models (#11777)
Co-authored-by: Ch1y0q <[email protected]>
* deepspeed zero3 QLoRA finetuning
* Update convert.py
* Update low_bit_linear.py
* Update utils.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update utils.py
* Update convert.py
* Update alpaca_qlora_finetuning.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update deepspeed_zero3.json
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update utils.py
* fix style
* fix style
* Update alpaca_qlora_finetuning.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update convert.py
* Update low_bit_linear.py
* Update model.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update low_bit_linear.py
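For readers unfamiliar with the setup, an illustrative DeepSpeed ZeRO stage-3 configuration of the general shape a deepspeed_zero3.json carries; the exact fields in this PR's file may differ.

```python
# Illustrative only; the PR's actual deepspeed_zero3.json may set different
# fields. ZeRO stage 3 partitions parameters, gradients, and optimizer
# states across ranks, which is what makes QLoRA finetuning of a 13B model
# feasible on two cards.
zero3_config = {
    "zero_optimization": {
        "stage": 3,
        "contiguous_gradients": True,
        "overlap_comm": True,
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```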
* Fix mistral forward_qkv without self.rotary_emb.base in q4_0
* Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced
* Revert #11765
* fix check error
* fix other models
* remove print
* fix nan value
* update
* update on readme after ipex-llm update
* update on readme after ipex-llm update
* rebase & delete redundancy
* revise
* add numbers for troubleshooting
* feat: add gptq for ppl
* fix: add an empty line
* fix: add an empty line
* fix: remove an empty line
* Resolve comments
* Resolve comments
* Resolve comments
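As background, perplexity harnesses of this kind typically score one long tokenized text in fixed-size windows; a generic sketch follows (the model and tokenization are assumed, and this is not the PR's actual harness):

```python
import torch

@torch.no_grad()
def perplexity(model, input_ids, window=2048):
    # input_ids: [1, total_len]; sum each window's next-token losses and
    # average over all predicted tokens before exponentiating.
    total_tokens = input_ids.size(1) - 1
    nll_sum = 0.0
    for start in range(0, total_tokens, window):
        chunk = input_ids[:, start:start + window + 1]
        logits = model(chunk[:, :-1]).logits
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), chunk[:, 1:].reshape(-1)
        )
        nll_sum += loss.item() * (chunk.size(1) - 1)
    return torch.exp(torch.tensor(nll_sum / total_tokens))
```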
* add initial support for minicpm-llama-v2.5
* update impl
* add minicpm-llama3-v2.5 example
* initial pr
* update npu model
* fix
* fix kv cache type
* fix
* small fix
* fix style
* fix model id
* change inter_pp=4
* address comment
* fix
* fix style
* fix
* rebase
* fix
* fix
* fix
* fix style
* fix style
* fix style
* fix
* meet comment
* update npu readme of multimodal
* small fix
* meet comment
…to test_transformers_41
* Add MiniCPM-V cpu example
* fix
* fix
* fix
* fix
…s during lookup generation (#11989)
* Fix garbage output for input_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code
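A sketch of the failure mode behind this fix: prompt-lookup drafting matches the trailing n-gram of the ids generated so far against earlier ids. When the caller passes inputs_embeds instead of input_ids there are no prompt token ids to search, so the lookup must restrict itself to already-generated tokens. Function and argument names below are illustrative.

```python
import torch

def lookup_draft(generated_ids, ngram=3, n_draft=8):
    # Find an earlier occurrence of the trailing n-gram and propose the
    # tokens that followed it as draft candidates.
    ids = generated_ids[0]
    if ids.size(0) <= ngram:
        return None
    pattern = ids[-ngram:]
    for start in range(ids.size(0) - ngram - 1, -1, -1):
        if torch.equal(ids[start:start + ngram], pattern):
            end = start + ngram
            return ids[end:end + n_draft]
    return None
```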
* Update GraphRAG QuickStart
* Further updates
* Small fixes
* Small fix
* minicpm example updates
* --stream
* add save & load support
* fix style
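ipex-llm's GPU side already exposes a save/load pattern for converted low-bit weights; a sketch of that pattern for orientation (the support added here may differ in its details):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# First run: convert the checkpoint to low-bit on load, then persist it.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", load_in_4bit=True, trust_remote_code=True
)
model.save_low_bit("./llama2-7b-low-bit")

# Later runs: load the saved low-bit weights directly, skipping conversion.
model = AutoModelForCausalLM.load_low_bit("./llama2-7b-low-bit", trust_remote_code=True)
```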
* fix dependabot alerts
* update
…ex-llm into test_transformers_41
Description
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
   1234). And paste your action link here once it has been successfully finished.
5. New dependencies
- Dependency1
- Dependency2
- ...
- Dependency1 and license1
- Dependency2 and license2
- ...