Created new functions #10

SANKHA1 · 2024-11-01T17:07:34Z

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

N/A
Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
Application test
Document test
...

5. New dependencies

New Python dependencies
- Dependency1
- Dependency2
- ...
New Java/Scala dependencies and their license
- Dependency1 and license1
- Dependency2 and license2
- ...

* first commit * update example * fix style * update example * embedding as const * fix generate * code refactor * meet code review * fix style * change max_output_len to max_context_len * fix all-in-one * fix example * add check for new tokens

* Add ollama_quickstart.zh-CN.md Add ollama_quickstart.zh-CN.md * Update ollama_quickstart.zh-CN.md Add Chinese and English switching * Update ollama_quickstart.md Add Chinese and English switching * Update README.zh-CN.md Modify the related link to ollama_quickstart.zh-CN.md * Update ollama_quickstart.zh-CN.md Modified based on comments. * Update ollama_quickstart.zh-CN.md Modified based on comments

rnwang04 and others added 30 commits October 28, 2024 16:05

Add benchmark_latency.py to docker serving image (#12283)

67014cb

Update README.md (#12286)

1cef0c4

[NPU] Support l0 Llama groupwise (#12276)

4467645

* except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3

[NPU L0] update layernorm & code refactor (#12287)

821b003

* update layernorm & code refactor * fix style * add common utils * change to Pool() * remove print

[fix] vllm-online-benchmark first token latency error (#12271)

3700e81

Patch sdpa check function in specific module attributes table (#12285)

546f455

Support baichuan2 for level0 pipeline (#12289)

3feb58d

Initial support for quantized forward on CPU when `quantization_group_size=0` (#12282)

5a15098

* Initial support for quantized forward on CPU when quantization_group_size=0 * Style fix * Style fix * Small fix * Small fix

[NPU pipeline] Support save & load and update examples (#12293)

2b2cb9c

* support save & load, update llama examples * update baichuan2 example * update readme

refactor attention_softmax (#12295)

540eaeb

Groupwise prefill optimization (#12291)

70037ad

* except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3 * slice -> split * remove debug * fix style * add dpu

bugfix for qlora finetuning on GPU (#12298)

46d8300

* bugfix for qlora 100 step error * indent fix * annotation fix

Support minicpm-1B in level0 pipeline (#12297)

41b8064

[NPU]Qwen2 groupwise performance opt (#12299)

0763268

* qwen2 gw performance opt * remove debug

Update AWQ and GPTQ GPU example (#12300)

6f22133

feat: change oneccl to internal (#12296)

29400e2

* feat: change oneccl * fix: restore llama-70b * fix: remove tab * fix: remove extra blank * small fix * add comments * fix: add a blank space

Update DPO README.md (#12162)

4cf1ccc

bitsandbytes multi-backend is now available and is required; otherwise it would error out saying that no CUDA is available

Add Qwen pipeline and example (#12292)

416c191

* support qwen pipeline * update error msg * style * meet review * minor

fix llama3.1/3.2 quantize kv check (#12302)

72605c7

Codegeex support (#12303)

97a0f7f

* new codegeex attn * use kv cache * add compress/quantize kv * remove compress/quantize kv * fix style check * fix style * fix codegeex

updated transformers & accelerate requirements (#12301)

30f668c

Add qwen2-1.5b in l0 pipeline example (#12306)

4892df6

Fix application quickstart (#12305)

3df6195

* fix graphrag quickstart * fix axolotl quickstart * fix ragflow quickstart * fix ragflow quickstart * fix graphrag toc * fix comments * fix comment * fix comments

fix qwen2 attention_mask slice (#12307)

b9853f9

Add minicpm-2b in L0 pipeline (#12308)

eda7649

[NPU] Llama2 prefill use ov sdp (#12310)

05c5d02

* prefill use sdp * add param * update * fix style * fix style * meet comments

Fix DPO finetuning example (#12313)

126f95b

[NPU L0] Update streaming mode of example (#12312)

d409d9d

MeouSker77 and others added 30 commits December 19, 2024 13:40

optimize siglip attention again (#12578)

4540424

optimize new minicpm model (#12579)

80f2fdc

[NPU] Fix MTL and ARL support (#12580)

4e7e988

support Megrez-3B-Omni (#12582)

3eeb02f

Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 (#12564)

47da3c9

* Add --modelscope for more models * minicpm --------- Co-authored-by: ATMxsp01 <[email protected]>

Upgrade oneccl version to 0.0.6.3 (#12560)

51ff9eb

* Update Dockerfile * Update Dockerfile * Update start-vllm-service.sh

refactor qwen2 and llama3 (#12587)

f3b5fad

Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583)

b0338c5

* Add --modelscope option for glm-v4 and MiniCPM-V-2_6 * glm-edge * minicpm-v-2_6:don't use model_hub=modelscope when use lowbit; internvl2 --------- Co-authored-by: ATMxsp01 <[email protected]>

refactor glm edge (#12588)

6ea8033

refactor yuan2 and starcoder2 and fix (#12589)

b050368

refactor sd 1.5 and qwen2-vl and fix (#12590)

098eb33

[NPU] support asym_int4 for baichuan (#12576)

c410d9c

* add npu support for baichuan * Update baichuan_mp.py * Update baichuan_mp.py

refactor baichuan, glm4 and minicpm3 (#12600)

7aaf02f

refactor mllama, gpt2 and internvl (#12602)

ad2dc96

[NPU] Fix minicpm on MTL (#12599)

45f8f72

refactor mistral and phi3 (#12605)

073f936

refactor chatglm2, internlm, stablelm and qwen (#12604)

4135b89

Update README.zh-CN.md (#12570)

9c9800b

add compresskv back for mistral (#12607)

4e6b9d8

* add compresskv back for mistral * fix * fix

Update README.zh-CN.md (#12610)

54b1d7d

fix llama related import (#12611)

5f5ac8a

rewrite llama optimization (#12609)

6249c1e

[docs] Update doc for latest open webui: 0.4.8 (#12591)

0477fe6

* Update open webui doc * Resolve comments

[NPU] fix npu save (#12614)

9e895f0

* fix npu save * update

remove bigdl-llm test to fix langchain UT (#12613)

a596f1a

Update Dockerfile (#12585)

28737c2

Polish Readme for ModelScope-related examples (#12603)

ef585d3

[NPU] update convert script based on latest usage (#12617)

d841e1d

small fix (#12616)

1604b4e

[NPU] Update prompt format for baichuan2 (#12615)

ccc4055

* Update baichuan2.py * style fix
