
New functions added #13

Open · wants to merge 216 commits into base: sankha_branch
Conversation

@SANKHA1 (Owner) commented Nov 7, 2024

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation here by entering the PR number (e.g., 1234), and paste the action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

5. New dependencies

  • New Python dependencies
    - Dependency1
    - Dependency2
    - ...
  • New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

Oscilloscope98 and others added 30 commits November 4, 2024 09:42
* Fix performance tests regarding trl version

* Small fix
* qwen layernorm as input

* add group size
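
For context, the "add group size" commit refers to the group size used in group-wise low-bit quantization. A minimal sketch of the general idea (names and the default group size are assumptions, not this PR's code):

```python
# Illustrative sketch of group-wise int4 quantization; assumes w.size is
# divisible by group_size. Not this PR's actual implementation.
import numpy as np

def quantize_groupwise(w: np.ndarray, group_size: int = 64):
    """Symmetric int4 quantization with one scale per group of weights."""
    groups = w.reshape(-1, group_size)
    # One scale per group; symmetric int4 uses max magnitude 7.
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / 7.0, 1e-12)
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_groupwise(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(shape)
```

A smaller group size tracks local weight statistics more closely, at the cost of storing more scales.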

* add transformers_int4_npu_pipeline_win

* bugfix

* bugfix: wrong actual_output_len

* fix format

* bugfix & update `README.md`
* add env to disable compile opt

* fix style

* fix style
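
"add env to disable compile opt" suggests an environment-variable escape hatch. A hypothetical sketch of the pattern; the variable name below is an assumption, not from this PR:

```python
# Hypothetical env-var kill switch for a compile-time optimization;
# IPEX_LLM_DISABLE_COMPILE_OPT is an assumed name, not from this PR.
import os

def compile_opt_enabled() -> bool:
    """Return False when the user opts out via the environment."""
    flag = os.environ.get("IPEX_LLM_DISABLE_COMPILE_OPT", "0").lower()
    return flag not in ("1", "true", "yes")
```
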
* add chatglm fuse mlp
* Add initial support for LNL nightly performance tests

* Small fix
* update benchmark readme

update the new comment to include memory usage

* Update README.md
* Limit trl version in example

* Limit trl version in example
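
Limiting the trl version in an example usually means pinning or checking the installed release. A runtime guard in the same spirit, under the assumption (not stated in this PR) that newer trl releases break the example:

```python
# Hedged sketch: fail fast on an incompatible trl release.
# The "<0.12" bound is an assumption, not taken from this PR.
from packaging.version import Version
import trl

if Version(trl.__version__) >= Version("0.12.0"):
    raise RuntimeError(
        f"This example was tested with trl<0.12, found trl=={trl.__version__}; "
        "downgrade with: pip install 'trl<0.12'"
    )
```
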
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* support minicpm 2b

* fix style & error

* fix style & update

* remove print
* Add dummy model in iGPU perf

* Add dummy model in iGPU perf

* Fix
* replace gradio_web_server.patch to adjust webui

* fix patch problem

---------

Co-authored-by: ATMxsp01 <[email protected]>
* llama 3.1/3.2 support compresskv

* update

* fix transformers 4.45 error

* fix style

* fix typo

* disable llama3.2 1b compresskv
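
Here "compresskv" refers to ipex-llm's compressed KV cache, which reduces KV-cache memory on long inputs. A hedged sketch of how such a feature is typically toggled; the environment variable name and model id are assumptions:

```python
# Hedged sketch: enable compressed KV cache before loading the model.
# The env var name and model id below are assumptions, not from this PR.
import os
os.environ["IPEX_LLM_COMPRESS_KV_CACHE"] = "1"  # set before model load

from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    load_in_4bit=True,
    trust_remote_code=True,
)
```
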
* fix three issues

* limit mixed_precision for CW only
* Add initial support for llama3.2-1b/3b

* move llama3.2 support into current llama_mp impl
* change inter_pp

* add comment
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <[email protected]>
* add ollama troubleshoot en

* zh ollama troubleshoot

* llamacpp troubleshoot

* llamacpp troubleshoot

* fix

* save gpu memory
MeouSker77 and others added 30 commits December 19, 2024 13:40
…12564)

* Add --modelscope for more models

* minicpm

---------

Co-authored-by: ATMxsp01 <[email protected]>
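
The --modelscope flag lets the examples download checkpoints from ModelScope instead of Hugging Face. The wiring is roughly like the sketch below; argument names and the default model id are illustrative, not this PR's exact code:

```python
# Illustrative --modelscope wiring: fetch the checkpoint from ModelScope,
# then load from the local path as usual. Names are assumptions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--repo-id-or-model-path", default="Qwen/Qwen2-1.5B-Instruct")
parser.add_argument("--modelscope", action="store_true",
                    help="download the model from ModelScope instead of Hugging Face")
args = parser.parse_args()

if args.modelscope:
    from modelscope import snapshot_download
    model_path = snapshot_download(args.repo_id_or_model_path)
else:
    model_path = args.repo_id_or_model_path
```
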
* Update Dockerfile

* Update Dockerfile

* Update start-vllm-service.sh
…l2 (#12583)

* Add --modelscope option for glm-v4 and MiniCPM-V-2_6

* glm-edge

* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2

---------

Co-authored-by: ATMxsp01 <[email protected]>
* add npu support for baichuan

* Update baichuan_mp.py

* Update baichuan_mp.py
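
ipex-llm exposes NPU support through a dedicated AutoModel class. A hedged loading sketch; the keyword arguments and model id are assumptions based on similar NPU examples, not this PR's code:

```python
# Hedged sketch of loading Baichuan2 on an Intel NPU with ipex-llm;
# kwargs and model id are assumptions, not from this PR.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Chat",  # placeholder model id
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
```
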
* add compresskv back for mistral

* fix

* fix
* Update open webui doc

* Resolve comments
* fix npu save

* update
* Update baichuan2.py

* style fix