Created changes in the main function #14

Open

wants to merge 653 commits into base: release_240807_fix
Conversation

@SANKHA1 (Owner) commented Nov 8, 2024

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation here by entering the PR number (e.g., 1234), and paste the action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

5. New dependencies

  • New Python dependencies
    - Dependency1
    - Dependency2
    - ...
  • New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

sgwhat and others added 30 commits November 6, 2024 19:21
* Add initial support for llama3.2-1b/3b

* move llama3.2 support into current llama_mp impl
* change inter_pp

* add comment
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <[email protected]>
* add ollama troubleshooting (en)

* add ollama troubleshooting (zh)

* llama.cpp troubleshooting

* fix

* save gpu memory
* Add fused mlp to glm4 models

* Small fix
* update linux doc

* update
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* update

* update

* baichuan support

* code refactor

* remove code

* fix style

* address comments

* revert
* Update README.md

* Update vllm_docker_quickstart.md
* Change trl to 0.9.6
* Enable padding to avoid padding-related errors.
* Add files via upload

* upload oneccl-binding.patch

* Update Dockerfile
* Remove the openwebui in inference-cpp-xpu dockerfile (#12382)

* remove the openwebui in inference-cpp-xpu dockerfile

* update docker_cpp_xpu_quickstart.md

* add sample output in inference-cpp/readme

* remove the openwebui in main readme

* remove the openwebui in main readme
* Initial updates for vllm 0.6.2

* fix

* Change Dockerfile to support v062

* Fix

* fix examples

* Fix

* done

* fix

* Update engine.py

* Fix Dockerfile to original path

* fix

* add option

* fix

* fix

* fix

* fix

---------

Co-authored-by: xiangyuT <[email protected]>
Co-authored-by: ATMxsp01 <[email protected]>
Co-authored-by: Shaojun Liu <[email protected]>
* qwen prefill attn_mask type fp16

* update
MeouSker77 and others added 30 commits January 7, 2025 16:17
* Add option with PyTorch 2.6 RC version for testing purposes

* Small update
* Create bmg_quickstart.md

* Update bmg_quickstart.md

* Clarify IPEX-LLM package installation based on use case

* Update bmg_quickstart.md

* Update bmg_quickstart.md
* Support install from source for PyTorch 2.6 RC in UT

* Remove expecttest
* Update ollama document version and known issue
* Add qwen2-vl example

* complete generate.py & readme

* improve lint style

* update 1-6

* update main readme

* Format and other small fixes

---------

Co-authored-by: Yuwen Hu <[email protected]>