Created changes in the main function #14

Open

wants to merge 653 commits into base: release_240807_fix
Conversation

@SANKHA1 (Owner) commented Nov 8, 2024

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation here by entering the PR number (e.g., 1234), and paste the action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

5. New dependencies

  • New Python dependencies
    - Dependency1
    - Dependency2
    - ...
  • New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

sgwhat and others added 30 commits November 6, 2024 19:21
* Add initial support for llama3.2-1b/3b

* move llama3.2 support into current llama_mp impl
* change inter_pp

* add comment
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <[email protected]>
* add ollama troubleshooting (en)

* add ollama troubleshooting (zh)

* llama.cpp troubleshooting

* fix

* save gpu memory
* Add fused mlp to glm4 models

* Small fix
* update linux doc

* update
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* update

* update

* baichuan support

* code refactor

* remove code

* fix style

* address comments

* revert
* Update README.md

* Update vllm_docker_quickstart.md
* Change trl to 0.9.6
* Enable padding to avoid padding-related errors.
* Add files via upload

* upload oneccl-binding.patch

* Update Dockerfile
* Remove the openwebui in inference-cpp-xpu dockerfile (#12382)

* remove the openwebui in inference-cpp-xpu dockerfile

* update docker_cpp_xpu_quickstart.md

* add sample output in inference-cpp/readme

* remove the openwebui in main readme

* remove the openwebui in main readme
* Initial updates for vllm 0.6.2

* fix

* Change Dockerfile to support v062

* Fix

* fix examples

* Fix

* done

* fix

* Update engine.py

* Fix Dockerfile to original path

* fix

* add option

* fix

* fix

* fix

* fix

---------

Co-authored-by: xiangyuT <[email protected]>
Co-authored-by: ATMxsp01 <[email protected]>
Co-authored-by: Shaojun Liu <[email protected]>
* qwen prefill attn_mask type fp16

* update
MeouSker77 and others added 30 commits January 7, 2025 16:17
* Add option with PyTorch 2.6 RC version for testing purposes

* Small update
* Create bmg_quickstart.md

* Update bmg_quickstart.md

* Clarify IPEX-LLM package installation based on use case

* Update bmg_quickstart.md

* Update bmg_quickstart.md
* Support install from source for PyTorch 2.6 RC in UT

* Remove expecttest
* Update ollama document version and known issue
* Add qwen2-vl example

* complete generate.py & readme

* improve lint style

* update 1-6

* update main readme

* Format and other small fixes

---------

Co-authored-by: Yuwen Hu <[email protected]>