forked from intel-analytics/ipex-llm
Test transformers 41 #12
Open · SANKHA1 wants to merge 194 commits into SANKHA1:test_transformers_41 from intel-analytics:test_transformers_41
Conversation
* Add openai-whisper pytorch gpu
* Update README.md
* Update README.md
* fix typo
* fix names, update readme
* Update README.md
* updated qwen1.5B to all transformers==4.37 yaml
* updated qwen1.5B to all transformers==4.37 yaml
…11747) mistral-7B-instruct-v0.2 and mistral-7B-instruct-v0.1 use different rope_theta values (v0.2 uses 1e6, v0.1 uses 1e4). Pass self.config.rope_theta to apply_rotary_pos_emb_no_cache_xpu to avoid output differences.
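For context on this fix, a minimal sketch of why rope_theta must be read from the model config rather than hard-coded; the helper names below are illustrative, not ipex-llm's actual internals.

```python
import torch

def rotate_half(x):
    # standard "rotate half" used by Llama/Mistral-style RoPE
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rope_cos_sin(positions, head_dim, rope_theta):
    # rope_theta must come from self.config: Mistral-7B-Instruct-v0.1 ships
    # with 1e4 while v0.2 ships with 1e6, so hard-coding either value
    # silently corrupts the other model's outputs.
    inv_freq = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(positions.float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

def apply_rotary_pos_emb(q, k, cos, sin):
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```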
* phi3 support compresskv
* fix phi3 mtl error
* fix conflict with quant kv
* fix abnormal output on mtl
* fix style
* use sliding window size to compress kv
* support sliding window
* fix style
* fix style
* temp: partial support for quant kv
* support quant kv with compress kv; todo: model check
* temp
* fix style
* fix style
* remove prepare
* address comment
* default -> 1.8k
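A rough sketch of the sliding-window side of these commits, assuming a cache laid out as [batch, heads, seq, head_dim]; ipex-llm's real compresskv path also covers quantized caches and the ~1.8k default threshold mentioned above, which this toy version ignores.

```python
import torch

def compress_kv(key_cache, value_cache, window_size):
    # Keep only the most recent `window_size` positions of the KV cache;
    # caches are assumed to be [batch, num_heads, seq_len, head_dim].
    seq_len = key_cache.size(2)
    if seq_len <= window_size:
        return key_cache, value_cache
    return key_cache[:, :, -window_size:, :], value_cache[:, :, -window_size:, :]
```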
* fix gptq of llama
* small fix
* support compress kv with lookahead
* enough kv miss param
* add perf mode
* update
* fix style
* Revert to use out-of-tree GPU driver since the performance with the out-of-tree driver is better than upstream's
* add spaces
* add troubleshooting case
* update Troubleshooting
* set mistral fuse rope to false except fp6 & fp16
* lint
* lint
Co-authored-by: ATMxsp01 <[email protected]>
…#11760)
* All use 8192.txt for prompt preparation for now
* Small fix
* Fix text encoding mode to utf-8
* Small update
* fix compresskv + lookahead attn_mask qwen2
* support llama chatglm
* support mistral & chatglm
* address comments
* revert run.py
* Reduce Mistral softmax memory only in low memory mode
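To illustrate the kind of saving this commit targets, a hedged sketch of applying attention softmax slice by slice so the full float32 score buffer never materializes at once; the names and in-place strategy here are illustrative, not necessarily what the actual change does.

```python
import torch

def low_memory_softmax_(attn_weights, chunk_size=1024):
    # attn_weights: [batch, heads, q_len, kv_len]; softmax is computed over
    # slices of the query dimension and written back in place, so only a
    # chunk_size-row float32 buffer is alive at any time.
    for start in range(0, attn_weights.size(2), chunk_size):
        blk = attn_weights[:, :, start:start + chunk_size, :]
        blk.copy_(torch.softmax(blk.float(), dim=-1).to(blk.dtype))
    return attn_weights
```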
…est (#11778)
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758)
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1
* remove 4.43 for arc; fix
* remove 4096-512 for 4.43
* comment some models
* Small fix
* uncomment models (#11777)
Co-authored-by: Ch1y0q <[email protected]>
* deepspeed zero3 QLoRA finetuning
* Update convert.py
* Update low_bit_linear.py
* Update utils.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update utils.py
* Update convert.py
* Update alpaca_qlora_finetuning.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update deepspeed_zero3.json
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update utils.py
* fix style
* fix style
* Update alpaca_qlora_finetuning.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update convert.py
* Update low_bit_linear.py
* Update model.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update low_bit_linear.py
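For readers unfamiliar with the setup, an illustrative DeepSpeed ZeRO stage-3 configuration of the general shape a deepspeed_zero3.json carries; the exact fields in this PR's file may differ.

```python
# Illustrative only; the PR's actual deepspeed_zero3.json may set different
# fields. ZeRO stage 3 partitions parameters, gradients, and optimizer
# states across ranks, which is what makes QLoRA finetuning of a 13B model
# feasible on two cards.
zero3_config = {
    "zero_optimization": {
        "stage": 3,
        "contiguous_gradients": True,
        "overlap_comm": True,
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```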
* Fix mistral forward_qkv without self.rotary_emb.base in q4_0
* Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced
* Revert #11765
* fix check error
* fix other models
* remove print
* fix nan value
* update
* update on readme after ipex-llm update
* update on readme after ipex-llm update
* rebase & delete redundancy
* revise
* add numbers for troubleshooting
* feat: add gptq for ppl
* fix: add an empty line
* fix: add an empty line
* fix: remove an empty line
* Resolve comments
* Resolve comments
* Resolve comments
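As background, perplexity harnesses of this kind typically score one long tokenized text in fixed-size windows; a generic sketch follows (the model and tokenization are assumed, and this is not the PR's actual harness):

```python
import torch

@torch.no_grad()
def perplexity(model, input_ids, window=2048):
    # input_ids: [1, total_len]; sum each window's next-token losses and
    # average over all predicted tokens before exponentiating.
    total_tokens = input_ids.size(1) - 1
    nll_sum = 0.0
    for start in range(0, total_tokens, window):
        chunk = input_ids[:, start:start + window + 1]
        logits = model(chunk[:, :-1]).logits
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), chunk[:, 1:].reshape(-1)
        )
        nll_sum += loss.item() * (chunk.size(1) - 1)
    return torch.exp(torch.tensor(nll_sum / total_tokens))
```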
* add initial support for minicpm-llama-v2.5
* update impl
* add minicpm-llama3-v2.5 example
* initial pr
* update npu model
* fix
* fix kv cache type
* fix
* small fix
* fix style
* fix model id
* change inter_pp=4
* address comment
* fix
* fix style
* fix
* rebase
* fix
* fix
* fix
* fix style
* fix style
* fix style
* fix
* meet comment
* update npu readme of multimodal
* small fix
* meet comment
…to test_transformers_41
* Add MiniCPM-V cpu example
* fix
* fix
* fix
* fix
…s during lookup generation (#11989)
* Fix garbage output for input_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code
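A sketch of the failure mode behind this fix: prompt-lookup drafting matches the trailing n-gram of the ids generated so far against earlier ids. When the caller passes inputs_embeds instead of input_ids there are no prompt token ids to search, so the lookup must restrict itself to already-generated tokens. Function and argument names below are illustrative.

```python
import torch

def lookup_draft(generated_ids, ngram=3, n_draft=8):
    # Find an earlier occurrence of the trailing n-gram and propose the
    # tokens that followed it as draft candidates.
    ids = generated_ids[0]
    if ids.size(0) <= ngram:
        return None
    pattern = ids[-ngram:]
    for start in range(ids.size(0) - ngram - 1, -1, -1):
        if torch.equal(ids[start:start + ngram], pattern):
            end = start + ngram
            return ids[end:end + n_draft]
    return None
```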
* Update GraphRAG QuickStart
* Further updates
* Small fixes
* Small fix
* minicpm example updates
* --stream
* add save & load support
* fix style
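ipex-llm's GPU side already exposes a save/load pattern for converted low-bit weights; a sketch of that pattern for orientation (the support added here may differ in its details):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# First run: convert the checkpoint to low-bit on load, then persist it.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", load_in_4bit=True, trust_remote_code=True
)
model.save_low_bit("./llama2-7b-low-bit")

# Later runs: load the saved low-bit weights directly, skipping conversion.
model = AutoModelForCausalLM.load_low_bit("./llama2-7b-low-bit", trust_remote_code=True)
```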
* fix dependabot alerts
* update
…ex-llm into test_transformers_41
Description
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
   1234). And paste your action link here once it has been successfully finished.
5. New dependencies
- Dependency1
- Dependency2
- ...
- Dependency1 and license1
- Dependency2 and license2
- ...