
Commit

Merge commit 'bd420f2e9ef6d04a8b4c62c1ab27d3b8b7fb9b67' into feat/tracker

* commit 'bd420f2e9ef6d04a8b4c62c1ab27d3b8b7fb9b67':
  Support gemma2 (modelscope#1247)
  Add more datasets (modelscope#1246)
  refactor inference (modelscope#1245)
  fix bugs (modelscope#1242)
  Fix bugs (modelscope#1241)
  fix (modelscope#1232)
  Add new dataset (modelscope#1227)
  fix readme
  Fix qlora deploy (modelscope#1224)
  Add debug log support (modelscope#1226)
  Fix glm4v batch size (modelscope#1223)
  [TorchAcc] Add USE_TORCH_XLA=0 flag for native swift scripts (modelscope#1221)
  add in_browswer (modelscope#1220)
  support llava 1.5 (modelscope#1217)
  update multi-modal docs (modelscope#1212)
tastelikefeet committed Jun 28, 2024
2 parents 6688285 + bd420f2 commit ff34a85
Showing 54 changed files with 1,073 additions and 610 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -47,6 +47,7 @@ SWIFT has rich documentation for users; please check [here](https://github.com/
SWIFT web-ui is available on both [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary); please feel free to try it!

## 🎉 News
- 🔥2024.06.28: Support for Gemma2 series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct (a hedged sketch follows this list).
- 🔥2024.06.18: Supports **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to begin.
- 🔥2024.06.16: Supports **KTO** and **CPO** training! See the [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Human-Preference-Alignment-Training-Documentation.md) to start training!
- 2024.06.11: Support for tool-calling agent deployment that conforms to the OpenAI interface. See [Agent deployment best practice](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Agent-deployment-best-practice.md).
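
As a hedged illustration of the Gemma2 item above: a minimal single-GPU fine-tuning sketch, assuming the `swift sft` command pattern shown later in this README; the model_type comes from the news item, while the dataset and LoRA choice are illustrative assumptions.

# Hedged sketch: fine-tune a newly supported Gemma2 model on one GPU.
# model_type is from the news item above; sft_type and dataset are assumptions.
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type gemma2-9b-instruct \
    --sft_type lora \
    --dataset blossom-math-zh \
    --output_dir output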
@@ -388,7 +389,7 @@ NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=8 \
swift sft \
--model_id_or_path qwen1half-32b-chat \
--model_type qwen1half-32b-chat \
--sft_type full \
--dataset blossom-math-zh \
--output_dir output \
@@ -401,7 +402,7 @@ NODE_RANK=1 \
MASTER_ADDR=xxx.xxx.xxx.xxx \
NPROC_PER_NODE=8 \
swift sft \
--model_id_or_path qwen1half-32b-chat \
--model_type qwen1half-32b-chat \
--sft_type full \
--dataset blossom-math-zh \
--output_dir output \
@@ -415,7 +416,7 @@ In DLC product, WORLD_SIZE is the node number, RANK is the node index, this is d
NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
swift sft \
--model_id_or_path qwen1half-32b-chat \
--model_type qwen1half-32b-chat \
--sft_type full \
--dataset blossom-math-zh \
--output_dir output \
@@ -502,7 +503,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
|------------------------------------------------|------------------------------------------------------------------------|--------------------|----------------------------------------|------------------------------------------- |
| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| Baichuan/Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Baichuan<br>Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
@@ -512,7 +513,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
| InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
| Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
| Gemma<br>Gemma2 | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-27B | base model<br>instruct model |
| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
| OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-70B | base model<br>chat model |
| Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
@@ -548,7 +549,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
| DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
| MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2_5 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
| CogVLM<br>CogVLM2<br>CogAgent<br>GLM4V | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 9B-19B | chat model |
| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
| Llava1.5<br>Llava1.6 | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
| InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 2B-25.5B<br>including quantized version | chat model |
11 changes: 6 additions & 5 deletions README_CN.md
@@ -48,6 +48,7 @@ SWIFT has rich documentation for users; if you have usage questions, please check [here](https:
You can try the SWIFT web-ui on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

## 🎉 News
- 🔥2024.06.28: Support for the **Gemma2** series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: Support for the **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to start training and inference.
- 🔥2024.06.16: Support for **KTO** and **CPO** training. Use `swift rlhf --rlhf_type kto` and `swift rlhf --rlhf_type cpo` to start training; see the [documentation](./docs/source/LLM/人类偏好对齐训练文档.md). A hedged KTO sketch follows this list.
- 2024.06.11: Support for tool-calling agent deployment conforming to the OpenAI interface; see [Agent deployment best practice](docs/source/LLM/Agent部署最佳实践.md).
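
A minimal sketch of the KTO command named in the news item above, assuming `swift rlhf` accepts the same model_type/output arguments as `swift sft` elsewhere in this README; the model and dataset below are placeholders, not values confirmed by this commit.

# Hedged sketch: KTO preference training via swift rlhf.
# --rlhf_type kto is from the news item; model_type and dataset are assumptions.
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type kto \
    --model_type qwen1half-32b-chat \
    --dataset <your-kto-format-dataset> \
    --output_dir output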
@@ -385,7 +386,7 @@ NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=8 \
swift sft \
--model_id_or_path qwen1half-32b-chat \
--model_type qwen1half-32b-chat \
--sft_type full \
--dataset blossom-math-zh \
--output_dir output \
@@ -398,7 +399,7 @@ NODE_RANK=1 \
MASTER_ADDR=xxx.xxx.xxx.xxx \
NPROC_PER_NODE=8 \
swift sft \
--model_id_or_path qwen1half-32b-chat \
--model_type qwen1half-32b-chat \
--sft_type full \
--dataset blossom-math-zh \
--output_dir output \
@@ -411,7 +412,7 @@ Among the DLC environment variables, WORLD_SIZE refers to the node count and RANK to the node index; this
NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
swift sft \
--model_id_or_path qwen1half-32b-chat \
--model_type qwen1half-32b-chat \
--sft_type full \
--dataset blossom-math-zh \
--output_dir output \
@@ -508,7 +509,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
| Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
| Gemma<br>Gemma2 | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-27B | base model<br>instruct model |
| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
| OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-70B | base model<br>chat model |
| Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
@@ -545,7 +546,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
| MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2_5 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
| CogVLM<br>CogVLM2<br>CogAgent<br>GLM4V | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 9B-19B | chat model |
| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
| Llava1.5<br>Llava1.6 | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
| InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 2B-25.5B<br>including quantized version | chat model |
2 changes: 1 addition & 1 deletion docs/source/LLM/DPO训练文档.md
@@ -80,7 +80,7 @@ cd examples/pytorch/llm
- If you train a base model with data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not support multi-turn dialogue); for this case we set the `chatml` template by default, and you can also use --model_type to select the training model's template
- By default we set `--gradient_checkpointing true` during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as the **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, since it does not support bf16.
- If your machine has high-performance GPUs such as the A100 and you are using a qwen series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces GPU memory usage (A10, 3090, V100 and similar GPUs cannot train with flash-attn). Models that support flash-attn are listed in [Supported LLM models](支持的模型和数据集.md#模型)
- If your machine has high-performance GPUs such as the A100 and you are using a qwen series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces GPU memory usage (3090, V100 and similar GPUs cannot train with flash-attn). Models that support flash-attn are listed in [Supported LLM models](支持的模型和数据集.md#模型)
- If you need to train offline, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`; see [Command-line arguments](命令行参数.md) for the meaning of each argument. A hedged offline sketch follows this list.
- If you want to push weights to the ModelScope Hub during training, set `--push_to_hub true`.
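
A minimal sketch of the offline-training tip above, assuming the plain `swift sft` pattern used elsewhere in this repository; `<model_dir>` stays a placeholder for a locally downloaded checkpoint, and the dataset is illustrative:

# Hedged sketch: training without network access.
# <model_dir> is a placeholder local checkpoint path; the dataset is an assumption.
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_id_or_path <model_dir> \
    --check_model_is_latest false \
    --sft_type lora \
    --dataset blossom-math-zh \
    --output_dir output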
