How to use FP8 to train an LLM with multiple GPUs
0- Use Python 3.10
1- First install cuDNN 8.9.2, CUDA 12.1, and Torch 2.3.1. cuDNN archive: https://developer.nvidia.com/rdp/cudnn-archive
2- Then install Transformer Engine:
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
3- Wait for the build to complete (the pip install above compiles Transformer Engine from source); a quick install check is sketched after this list
4- Download the two Python files into the same folder
5- Use this command to launch training (a minimal sketch of such a script is shown below): torchrun --nproc_per_node=2 m_gpu.py
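
After step 3, a quick sanity check like the one below can confirm that Transformer Engine imports correctly and that each visible GPU has an FP8-capable architecture (Ada, compute capability 8.9, or Hopper, 9.0 and newer). This snippet is not one of the two files from step 4; it only uses the PyTorch API plus a bare import of Transformer Engine.

```python
# Sanity check after installing Transformer Engine: the import fails if the
# source build in step 2/3 did not complete, and the capability check flags
# GPUs that cannot run FP8 GEMMs.
import torch
import transformer_engine.pytorch as te  # import alone verifies the build

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    fp8_ok = (major, minor) >= (8, 9)  # Ada (sm_89) or Hopper (sm_90+)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} "
          f"sm_{major}{minor} FP8-capable={fp8_ok}")
```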
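
For step 5, the following is a minimal sketch of what a multi-GPU FP8 training script launched this way might look like, assuming Transformer Engine's standard delayed-scaling recipe and PyTorch DistributedDataParallel. The toy model, batch size, and placeholder loss are assumptions for illustration; they do not reproduce the actual m_gpu.py from step 4.

```python
# Minimal multi-GPU FP8 training loop sketch (illustrative, not the real m_gpu.py).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

def main():
    # torchrun --nproc_per_node=2 starts one process per GPU and sets LOCAL_RANK.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Toy model built from Transformer Engine layers so its GEMMs can run in FP8.
    model = torch.nn.Sequential(
        te.Linear(1024, 4096),
        te.Linear(4096, 1024),
    ).cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Delayed-scaling FP8 recipe (E4M3 forward / E5M2 backward); amax statistics
    # are reduced across ranks via fp8_group.
    fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID,
                                amax_history_len=16,
                                amax_compute_algo="max")

    for step in range(10):
        # Dummy batch; a real script would use a DataLoader with a DistributedSampler.
        x = torch.randn(16, 1024, device="cuda")

        with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe,
                             fp8_group=dist.group.WORLD):
            out = model(x)

        loss = out.float().pow(2).mean()  # placeholder loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if local_rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Saved as, for example, m_gpu.py, it would be launched with the torchrun command from step 5.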