- [2023/12] Support multi-modal VLM pretraining and fine-tuning with the LLaVA-v1.5 architecture! Click here for details!
- [2023/12] Support the Mixtral 8x7b model! Click here for details!
- [2023/11] Support ChatGLM3-6B model!
- [2023/10] Support MSAgent-Bench dataset, and the fine-tuned LLMs can be applied by Lagent!
- [2023/10] Optimize the data processing to accommodate system context. More information can be found on Docs!
- [2023/09] Support InternLM-20B models!
- [2023/09] Support Baichuan2 models!
- [2023/08] XTuner is released, with multiple fine-tuned adapters on HuggingFace.
XTuner is a toolkit for efficiently fine-tuning LLMs, developed by the MMRazor and MMDeploy teams.
- Efficiency: Support LLM fine-tuning on consumer-grade GPUs. Fine-tuning a 7B LLM requires as little as 8GB of GPU memory, so nearly any GPU (even free resources such as Colab) can be used to fine-tune custom LLMs.
- Versatile: Support various LLMs (InternLM, Llama2, ChatGLM, Qwen, Baichuan2, ...), datasets (MOSS_003_SFT, Alpaca, WizardLM, oasst1, Open-Platypus, Code Alpaca, Colorist, ...) and algorithms (QLoRA, LoRA), allowing users to choose the most suitable solution for their requirements.
- Compatibility: Compatible with the DeepSpeed and HuggingFace training pipelines, enabling effortless integration and utilization.
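The 8GB figure in the Efficiency item is easier to believe with some rough arithmetic. The sketch below counts only the frozen base weights of a 7B-parameter model at 16 bits and at 4 bits per parameter (the precision QLoRA uses for the base model). It is an illustration, not XTuner's own accounting: the helper name `weight_mb` is hypothetical, and adapters, activations, optimizer states and CUDA overhead are deliberately left out.

```shell
#!/bin/sh
# Back-of-the-envelope weight memory for a 7B-parameter model.
# Only the frozen base weights are counted; LoRA adapters, activations,
# optimizer states and CUDA overhead come on top (not estimated here).

PARAMS_MILLIONS=7000   # 7B parameters, expressed in millions

# weight_mb BITS_PER_PARAM -> weight size in MB (1 MB = 10^6 bytes)
weight_mb() {
    echo $(( PARAMS_MILLIONS * $1 / 8 ))
}

echo "fp16 weights:  $(weight_mb 16) MB"
echo "4-bit weights: $(weight_mb 4) MB"
```

At 16 bits the weights alone (14000 MB) exceed an 8GB card, while at 4 bits they take 3500 MB, which leaves room for the remaining training state.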
Models | SFT Datasets | Data Pipelines | Algorithms |
- It is recommended to build a Python-3.10 virtual environment using conda
conda create --name xtuner-env python=3.10 -y
conda activate xtuner-env
- Install XTuner via pip
pip install -U xtuner
or with DeepSpeed integration
pip install -U 'xtuner[deepspeed]'
- Install XTuner from source
git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
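After either installation route, it can help to confirm that the `xtuner` command used in the following sections is reachable from the current shell. The check below is a minimal sketch; the function name `check_xtuner` is hypothetical and it only tests that the command is on `PATH`, not that the installation is complete.

```shell
#!/bin/sh
# Report whether the xtuner command-line entry point is reachable.
check_xtuner() {
    if command -v xtuner >/dev/null 2>&1; then
        echo "xtuner found at: $(command -v xtuner)"
    else
        echo "xtuner not found; activate the environment or reinstall"
        return 1
    fi
}

# "|| true" keeps the script going even when the check fails.
check_xtuner || true
```

If the command is not found, the usual cause is that the conda environment created above has not been activated in this shell.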
XTuner supports efficient fine-tuning (e.g., QLoRA) of LLMs. Dataset preparation guides can be found in dataset_prepare.md.
- Step 0, prepare the config. XTuner provides many ready-to-use configs, and all of them can be listed with
xtuner list-cfg
Or, if none of the provided configs meets your requirements, copy one to a directory of your choice and modify it as needed:
xtuner copy-cfg ${CONFIG_NAME} ${SAVE_PATH}
- Step 1, start fine-tuning.
xtuner train ${CONFIG_NAME_OR_PATH}
For example, we can start QLoRA fine-tuning of InternLM-7B on the oasst1 dataset with
# On a single GPU
xtuner train internlm_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
# On multiple GPUs
(DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
(SLURM) srun ${SRUN_ARGS} xtuner train internlm_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
- --deepspeed means using DeepSpeed to optimize the training. XTuner comes with several integrated strategies, including ZeRO-1, ZeRO-2, and ZeRO-3. To disable this feature, simply remove the argument.
- For more examples, please see finetune.md.
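The three ZeRO stages mentioned above differ in how much training state is partitioned across the data-parallel GPUs instead of being replicated on each one. The lookup below summarizes that difference as described in DeepSpeed's ZeRO documentation; the function name `describe_zero` is hypothetical. Only the strategy name `deepspeed_zero2` appears in this document, so the names for the other stages are not guessed here.

```shell
#!/bin/sh
# describe_zero STAGE -> what that ZeRO stage partitions across GPUs.
describe_zero() {
    case "$1" in
        1) echo "optimizer states" ;;
        2) echo "optimizer states + gradients" ;;
        3) echo "optimizer states + gradients + parameters" ;;
        *) echo "unknown stage: $1" >&2; return 1 ;;
    esac
}

for stage in 1 2 3; do
    echo "ZeRO-$stage partitions: $(describe_zero "$stage")"
done
```

As a rule of thumb, a higher stage lowers per-GPU memory use at the cost of more communication between GPUs.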
- Step 2, convert the saved PTH model (if using DeepSpeed, it will be a directory) to a HuggingFace model with
xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}
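Steps 1 and 2 can be chained in a small wrapper script. The sketch below prints the commands by default and only executes them when `DRY_RUN=0` is set. The helper names (`run`, `train`, `convert`) and the default values of `PTH` and `SAVE_PATH` are hypothetical placeholders, not paths that XTuner is known to produce; substitute the checkpoint that your training run actually saved.

```shell
#!/bin/sh
# Dry-run sketch of Steps 1 and 2: prints the commands unless DRY_RUN=0.

CONFIG=internlm_7b_qlora_oasst1_e3        # config name from the example above
GPU_NUM=${GPU_NUM:-1}                     # number of GPUs on this node
PTH=${PTH:-./path/to/checkpoint.pth}      # hypothetical placeholder
SAVE_PATH=${SAVE_PATH:-./hf_adapter}      # hypothetical placeholder

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# train: choose the single-GPU or multi-GPU (DIST) form by GPU_NUM.
train() {
    if [ "$GPU_NUM" -gt 1 ]; then
        run env NPROC_PER_NODE="$GPU_NUM" xtuner train "$CONFIG" --deepspeed deepspeed_zero2
    else
        run xtuner train "$CONFIG" --deepspeed deepspeed_zero2
    fi
}

# convert: turn the saved PTH checkpoint into a HuggingFace model.
convert() {
    run xtuner convert pth_to_hf "$CONFIG" "$PTH" "$SAVE_PATH"
}

train
convert
```

Running the script with `GPU_NUM=4` shows the multi-GPU form; `env NPROC_PER_NODE=4 xtuner ...` is equivalent to the `NPROC_PER_NODE=${GPU_NUM} xtuner ...` prefix used above. The SLURM variant is omitted to keep the sketch short.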
XTuner provides tools to chat with pretrained / fine-tuned LLMs.
xtuner chat ${NAME_OR_PATH_TO_LLM} --adapter ${NAME_OR_PATH_TO_ADAPTER} [optional arguments]
For example, we can start the chat with
InternLM-7B with adapter trained from Alpaca-enzh:
xtuner chat internlm/internlm-7b --adapter xtuner/internlm-7b-qlora-alpaca-enzh --prompt-template internlm_chat --system-template alpaca
Llama2-7b with adapter trained from MOSS-003-SFT:
xtuner chat meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --system-template moss_sft --with-plugins calculate solve search --command-stop-word "<eoc>" --answer-stop-word "<eom>" --no-streamer
For more examples, please see chat.md.
- Step 0, merge the HuggingFace adapter into the pretrained LLM with
xtuner convert merge \
    ${NAME_OR_PATH_TO_LLM} \
    ${NAME_OR_PATH_TO_ADAPTER} \
    ${SAVE_PATH} \
    --max-shard-size 2GB
- Step 1, deploy the fine-tuned LLM with any other framework, such as LMDeploy.
pip install lmdeploy
python -m lmdeploy.pytorch.chat ${NAME_OR_PATH_TO_LLM} \
    --max_new_tokens 256 \
    --temperature 0.8 \
    --top_p 0.95 \
    --seed 0
Seeking efficient inference with less GPU memory? Try 4-bit quantization from LMDeploy! For more details, see here.
- We recommend using OpenCompass, a comprehensive and systematic LLM evaluation library, which currently supports 50+ datasets with about 300,000 questions.
We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contribution guidelines.
@misc{2023xtuner,
title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
author={XTuner Contributors},
howpublished = {\url{https://github.com/InternLM/xtuner}},
year={2023}
}
This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets being used.