This example demonstrates how to run LLM finetuning with axolotl v0.4.0 and IPEX-LLM 4-bit optimizations on Intel GPUs. By applying the IPEX-LLM patch, you can use axolotl on Intel GPUs with IPEX-LLM optimizations without writing any code.
Note: this example only illustrates the related usage and does not guarantee that training converges.
To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine; please see here for more information.
conda create -n llm python=3.11
conda activate llm
# the command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# install axolotl v0.4.0
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
git checkout v0.4.0
cp ../requirements-xpu.txt requirements.txt
pip install -e .
pip install transformers==4.36.0
# to avoid https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
pip install datasets==2.15.0
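Because pip install -e . resolves axolotl's own requirements before the pinned transformers and datasets versions are installed, it can be worth confirming which versions actually ended up in the environment. The function below is a minimal sketch: check_pins is a hypothetical helper, not part of axolotl or IPEX-LLM, and it assumes pip freeze prints regular (non-editable) packages as name==version lines.

```shell
#!/usr/bin/env bash
# check_pins: hypothetical helper (not part of axolotl or IPEX-LLM).
# Reads `pip freeze` output on stdin and checks that every "name==version"
# argument appears as an exact line. Prints each missing pin and returns 1
# if any pin is missing, 0 otherwise.
check_pins() {
  local frozen missing=0 pin
  frozen="$(cat)"
  for pin in "$@"; do
    # -x: match the whole line, -F: treat the pin as a fixed string
    if ! grep -qxF -- "$pin" <<< "$frozen"; then
      echo "missing or different version: $pin"
      missing=1
    fi
  done
  return "$missing"
}
```

For the v0.4.0 environment above, a usage example would be pip freeze | check_pins transformers==4.36.0 datasets==2.15.0 && echo "pins OK".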
source /opt/intel/oneapi/setvars.sh
You can download a default default_config.yaml with use_cpu: false.
mkdir -p ~/.cache/huggingface/accelerate/
wget -O ~/.cache/huggingface/accelerate/default_config.yaml https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/default_config.yaml
Alternatively, you can configure accelerate based on your requirements.
accelerate config
Please answer NO to the option "Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:".
After finishing accelerate config, check that use_cpu is disabled (i.e., use_cpu: false) in the accelerate config file (~/.cache/huggingface/accelerate/default_config.yaml).
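If you want to script this check, a small grep is enough. The sketch below assumes the setting appears as a top-level line of the form use_cpu: false, as described above; check_use_cpu is a hypothetical helper name, and it defaults to the config path mentioned in the text.

```shell
#!/usr/bin/env bash
# check_use_cpu: hypothetical helper. Returns 0 only if the accelerate config
# file contains a top-level "use_cpu: false" line; 1 if it does not; 2 if the
# file is missing.
check_use_cpu() {
  local cfg="${1:-$HOME/.cache/huggingface/accelerate/default_config.yaml}"
  if [ ! -f "$cfg" ]; then
    echo "config not found: $cfg" >&2
    return 2
  fi
  # Allow optional whitespace around the value.
  if grep -Eq '^use_cpu:[[:space:]]*false[[:space:]]*$' "$cfg"; then
    echo "OK: use_cpu is false in $cfg"
  else
    echo "use_cpu is not false in $cfg; rerun accelerate config and answer NO" >&2
    return 1
  fi
}
```

Running check_use_cpu with no argument checks the default location; pass a path to check a different config file.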
export HF_HUB_OFFLINE=1
For more details, please refer to the HF_HUB_OFFLINE (hfhuboffline) entry in the Hugging Face Hub environment variable documentation.
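With HF_HUB_OFFLINE=1 set, the Hub client makes no network calls, so the base model must already be in the local cache. The sketch below checks for a model's cache directory, assuming the standard huggingface_hub cache layout in which repository org/name is stored as models--org--name inside $HF_HUB_CACHE (by default $HF_HOME/hub, i.e. ~/.cache/huggingface/hub). hf_cached is a hypothetical helper, and the repository ID in the usage line is only an example.

```shell
#!/usr/bin/env bash
# hf_cached: hypothetical helper. Returns 0 if a model repository appears to be
# present in the local Hugging Face Hub cache, 1 otherwise.
# Assumes the standard cache layout: "<org>/<name>" -> "models--<org>--<name>".
hf_cached() {
  local repo_id="$1"
  local cache_dir="${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
  # Replace every "/" in the repo ID with "--".
  local entry="models--${repo_id//\//--}"
  # Downloaded files are stored under <entry>/snapshots/<revision>/
  if [ -d "$cache_dir/$entry/snapshots" ]; then
    echo "cached: $repo_id ($cache_dir/$entry)"
  else
    echo "not cached: $repo_id (looked in $cache_dir)" >&2
    return 1
  fi
}
```

For example, hf_cached meta-llama/Llama-2-7b-hf || echo "download the model before exporting HF_HUB_OFFLINE=1". The check only confirms that a snapshot directory exists, not that every file the config needs has been downloaded.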
This example shows how to run Alpaca LoRA training and Alpaca QLoRA finetuning directly on an Intel GPU. Note that only the Llama-2-7B LoRA and QLoRA examples have been verified, on an Intel Arc A770 with 16GB of memory.
This is based on the axolotl Llama-2 LoRA example.
accelerate launch finetune.py lora.yml
In v0.4.0, you can also use train.py instead of -m axolotl.cli.train or finetune.py.
accelerate launch train.py lora.yml
This is based on the axolotl Llama-2 QLoRA example.
Modify the parameters in qlora.yml based on your requirements. Then, launch finetuning with the following command.
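Changing qlora.yml is often a matter of adjusting a few top-level values. The sketch below rewrites a single top-level key in place with sed; set_yaml_key is a hypothetical helper that only handles simple scalar keys written as key: value on one line. micro_batch_size and num_epochs in the usage line are standard axolotl options, assumed here to be present in the file; nested or multi-line values are better edited by hand.

```shell
#!/usr/bin/env bash
# set_yaml_key: hypothetical helper. Replaces the value of a top-level scalar key
# ("key: value" at the start of a line) in a YAML file, in place.
# Fails if the key is not present, so typos in key names are not silently ignored.
set_yaml_key() {
  local file="$1" key="$2" value="$3"
  if ! grep -q "^${key}:" "$file"; then
    echo "key not found: $key in $file" >&2
    return 1
  fi
  # '|' is the sed delimiter so values may contain '/'; values must not contain
  # '|' or '&'. -i.bak edits in place and keeps a backup copy (GNU and BSD sed).
  sed -i.bak "s|^${key}:.*|${key}: ${value}|" "$file"
}
```

For example, set_yaml_key qlora.yml micro_batch_size 1 followed by set_yaml_key qlora.yml num_epochs 1 would shrink a run for a quick smoke test, leaving the previous version in qlora.yml.bak.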
accelerate launch finetune.py qlora.yml
In v0.4.0, you can also use train.py instead of -m axolotl.cli.train or finetune.py.
accelerate launch train.py qlora.yml
Output in console
{'eval_loss': 0.9382301568984985, 'eval_runtime': 6.2513, 'eval_samples_per_second': 3.199, 'eval_steps_per_second': 3.199, 'epoch': 0.36}
{'loss': 0.944, 'learning_rate': 0.00019752490425051743, 'epoch': 0.38}
{'loss': 1.0179, 'learning_rate': 0.00019705675197106016, 'epoch': 0.4}
{'loss': 0.9346, 'learning_rate': 0.00019654872959986937, 'epoch': 0.41}
{'loss': 0.9747, 'learning_rate': 0.0001960010458282326, 'epoch': 0.43}
{'loss': 0.8928, 'learning_rate': 0.00019541392564000488, 'epoch': 0.45}
{'loss': 0.9317, 'learning_rate': 0.00019478761021918728, 'epoch': 0.47}
{'loss': 1.0534, 'learning_rate': 0.00019412235685085035, 'epoch': 0.49}
{'loss': 0.8777, 'learning_rate': 0.00019341843881544372, 'epoch': 0.5}
{'loss': 0.9447, 'learning_rate': 0.00019267614527653488, 'epoch': 0.52}
{'loss': 0.9651, 'learning_rate': 0.00019189578116202307, 'epoch': 0.54}
{'loss': 0.9067, 'learning_rate': 0.00019107766703887764, 'epoch': 0.56}
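To follow these numbers over a longer run, you can pipe the console output through a small filter. The sketch below counts the training-loss records and averages them; summarize_loss is a hypothetical helper, and it assumes each record prints the loss as 'loss': <number>, as in the lines above. eval_loss records are skipped.

```shell
#!/usr/bin/env bash
# summarize_loss: hypothetical helper. Reads trainer console output on stdin,
# counts the training-loss records and prints their mean.
# Only "'loss': <number>" matches; "'eval_loss': ..." is skipped because the
# character before "loss" there is "_" rather than a quote.
summarize_loss() {
  grep -o "'loss': [0-9.eE+-]*" \
    | awk -F': ' '
        { sum += $2; n++ }
        END {
          if (n > 0) printf "steps=%d mean_loss=%.4f\n", n, sum / n
          else print "steps=0"
        }'
}
```

One way to get a log to feed it is accelerate launch finetune.py qlora.yml 2>&1 | tee train.log, followed by summarize_loss < train.log. For the excerpt above, it reports 11 training-loss records. A mean over a short window is noisy, so treat it as a rough sanity check only.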
Warning: this section installs axolotl main (commit 796a085) to get new features, e.g., Llama-3-8B support.
Axolotl main has many new dependencies. Please set up a new conda environment for this version.
conda create -n llm python=3.11
conda activate llm
# install axolotl main
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl && git checkout 796a085
pip install -e .
# the command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# install transformers etc
# to avoid https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
pip install datasets==2.15.0
pip install transformers==4.37.0
Configure the oneAPI environment variables and accelerate as described above: run source /opt/intel/oneapi/setvars.sh, then make sure the accelerate config file has use_cpu: false.
This is based on the axolotl Llama-3 QLoRA example.
Modify the parameters in llama3-qlora.yml based on your requirements. Then, launch finetuning with the following command.
accelerate launch finetune.py llama3-qlora.yml
You can also use train.py instead of -m axolotl.cli.train or finetune.py.
accelerate launch train.py llama3-qlora.yml
Expected output
{'loss': 0.237, 'learning_rate': 1.2254711850265387e-06, 'epoch': 3.77}
{'loss': 0.6068, 'learning_rate': 1.1692453482951115e-06, 'epoch': 3.77}
{'loss': 0.2926, 'learning_rate': 1.1143322458989303e-06, 'epoch': 3.78}
{'loss': 0.2475, 'learning_rate': 1.0607326072295087e-06, 'epoch': 3.78}
{'loss': 0.1531, 'learning_rate': 1.008447144232094e-06, 'epoch': 3.79}
{'loss': 0.1799, 'learning_rate': 9.57476551396197e-07, 'epoch': 3.79}
{'loss': 0.2724, 'learning_rate': 9.078215057463868e-07, 'epoch': 3.79}
{'loss': 0.2534, 'learning_rate': 8.594826668332445e-07, 'epoch': 3.8}
{'loss': 0.3388, 'learning_rate': 8.124606767246579e-07, 'epoch': 3.8}
{'loss': 0.3867, 'learning_rate': 7.667561599972505e-07, 'epoch': 3.81}
{'loss': 0.2108, 'learning_rate': 7.223697237281668e-07, 'epoch': 3.81}
{'loss': 0.0792, 'learning_rate': 6.793019574868775e-07, 'epoch': 3.82}
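In this excerpt the learning rate shrinks at every logged step, which is what a decaying schedule looks like late in training. The sketch below confirms that in your own log; lr_is_decaying is a hypothetical helper. It only makes sense for post-warmup lines, because the learning rate rises during warmup, and it assumes the 'learning_rate': <number> format shown above, including values in scientific notation.

```shell
#!/usr/bin/env bash
# lr_is_decaying: hypothetical helper. Reads trainer log lines on stdin and
# returns 0 if every logged learning rate is <= the previous one, 1 if any
# step increases it, and 2 if no learning_rate entries were found.
lr_is_decaying() {
  grep -o "'learning_rate': [0-9.eE+-]*" \
    | awk -F': ' '
        { lr = $2 + 0 }                  # force numeric, handles e-07 etc.
        NR > 1 && lr > prev { bad = 1 }  # compare with the previous record
        { prev = lr }
        END { if (NR == 0) exit 2; exit bad ? 1 : 0 }'
}
```

For example, tail -n 200 train.log | lr_is_decaying && echo "learning rate is decaying" checks only the most recent part of a captured log.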