Honk honk honk! This project is actively under development. Check out my learning progress here.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
from datasets import load_dataset
+ from pipegoose import DataParallel, TensorParallel, PipelineParallel, ParallelContext
+ from pipegoose.optim import DistributedOptimizer
model = AutoModel.from_pretrained("bloom")
tokenizer = AutoTokenizer.from_pretrained("bloom")
- device = "cuda"
- model = model.to(device)
+ parallel_context = ParallelContext(
+     tensor_parallel_size=2,
+     data_parallel_size=2,
+     pipeline_parallel_size=2
+ )
+ model = DataParallel(model, parallel_context).parallelize()
+ model = TensorParallel(model, parallel_context).parallelize()
+ model = PipelineParallel(model, parallel_context).parallelize()
optimizer = torch.optim.Adam(model.parameters())
+ optimizer = DistributedOptimizer(optimizer, parallel_context)
dataset = load_dataset('goose')
dataloader = torch.utils.data.DataLoader(dataset, batch_size=42)
for epoch in range(69):
    for inputs, targets in dataloader:
-       inputs = inputs.to(device)
-       targets = targets.to(device)
        output = model(inputs)
        loss = F.cross_entropy(output, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Implementation Details
- Supports training transformers models in Megatron 3D parallelism and ZeRO-1 (written from scratch).
- Implements parallel compute and data transfer using separate CUDA streams (see the stream-overlap sketch after this list).
- Gradient checkpointing will be implemented by enforcing a virtual dependency in the backpropagation graph, ensuring that the activations of a checkpointed segment are recomputed just in time for each (micro-batch, partition); a fork/join sketch of this dependency trick follows the list.
- Custom algorithms for model partitioning, with two default partitioning methods based on elapsed time and GPU memory consumption per layer (a greedy cost-based sketch appears below).
- Potential support includes:
  - Callbacks within the pipeline: Callback(function, microbatch_idx, partition_idx), invoked before and after the forward, backward, and recompute steps (for gradient checkpointing); a sketch of such a callback registry is shown below.
  - Mixed precision training.
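The compute/transfer overlap can be prototyped with plain PyTorch stream primitives. The sketch below is illustrative only and is not pipegoose's actual implementation; the `prefetch`/`forward_on_compute_stream` names and the two-stream setup are assumptions.

```python
import torch

# Minimal sketch of overlapping data transfer with compute via two CUDA streams.
# Requires a CUDA device; all names here are illustrative, not pipegoose's API.
compute_stream = torch.cuda.default_stream()
copy_stream = torch.cuda.Stream()

def prefetch(batch_cpu: torch.Tensor):
    """Start copying the next micro-batch on a dedicated copy stream."""
    batch_cpu = batch_cpu.pin_memory()   # pinned memory makes the copy truly asynchronous
    with torch.cuda.stream(copy_stream):
        batch_gpu = batch_cpu.to("cuda", non_blocking=True)
    done = torch.cuda.Event()
    done.record(copy_stream)             # marks the point where the copy finishes
    return batch_gpu, done

def forward_on_compute_stream(model, batch_gpu, done):
    """Run the model only after the copy for this micro-batch has completed."""
    compute_stream.wait_event(done)      # orders the two streams without blocking the host
    with torch.cuda.stream(compute_stream):
        return model(batch_gpu)
```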
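One known way to enforce such a virtual dependency is the fork/join "phony tensor" trick used by torchgpipe: a zero-sized tensor is threaded from one branch into another so that autograd must process the joined branch's backward before it can continue past the fork point. The sketch below only demonstrates that ordering mechanism; the `Fork`/`Join`/`depend` names follow torchgpipe, and pipegoose's actual scheduler and checkpointing code may differ.

```python
import torch

def _phony(device: torch.device) -> torch.Tensor:
    # A zero-sized tensor carrying no data; it exists only to add an edge
    # to the autograd graph.
    return torch.empty(0, device=device)

class Fork(torch.autograd.Function):
    """Pass `x` through unchanged and emit a phony tensor tied to it in autograd."""
    @staticmethod
    def forward(ctx, x):
        return x.detach(), _phony(x.device)

    @staticmethod
    def backward(ctx, grad_x, grad_phony):
        return grad_x

class Join(torch.autograd.Function):
    """Pass `x` through unchanged, but record a dependency on `phony`."""
    @staticmethod
    def forward(ctx, x, phony):
        return x.detach()

    @staticmethod
    def backward(ctx, grad_x):
        return grad_x, None

def depend(x: torch.Tensor, y: torch.Tensor):
    """Create a virtual dependency: autograd must run y's backward at the join
    point before it continues past x's fork point. This kind of ordering is what
    lets a checkpointed partition be recomputed just in time for the
    (micro-batch, partition) that needs its activations.
    Assumes x and y require grad."""
    x, phony = Fork.apply(x)
    y = Join.apply(y, phony)
    return x, y
```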
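A simple way to realize cost-based default partitioning is to profile each layer once (elapsed time or peak memory) and then split the layer list into contiguous blocks of roughly equal total cost. The function below is a generic greedy sketch under that assumption, not pipegoose's partitioning algorithm.

```python
from typing import List

def partition_by_cost(costs: List[float], num_partitions: int) -> List[List[int]]:
    """Greedily split layer indices into contiguous blocks of roughly equal cost.

    `costs[i]` is a profiled cost for layer i, e.g. elapsed forward time in ms
    or peak GPU memory in bytes, matching the two default criteria.
    """
    target = sum(costs) / num_partitions
    partitions: List[List[int]] = []
    current: List[int] = []
    current_cost = 0.0
    for i, cost in enumerate(costs):
        current.append(i)
        current_cost += cost
        remaining_layers = len(costs) - i - 1
        remaining_slots = num_partitions - len(partitions) - 1
        # Close the block once it reaches the target cost, or when every
        # remaining layer is needed to fill the remaining partitions.
        reached_target = current_cost >= target and remaining_layers >= remaining_slots
        must_close = remaining_layers == remaining_slots
        if remaining_slots > 0 and (reached_target or must_close):
            partitions.append(current)
            current, current_cost = [], 0.0
    if current:
        partitions.append(current)
    return partitions

# Example: hypothetical per-layer forward times (ms) split across 4 pipeline stages.
print(partition_by_cost([1, 1, 2, 2, 1, 1, 3, 3], num_partitions=4))
# -> [[0, 1, 2], [3, 4, 5], [6], [7]]
```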
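The pipeline callbacks could look something like the sketch below: a registry whose functions are fired before and after each forward, backward, and recompute step with the (micro-batch, partition) coordinates. The hook names and the `PipelineCallback` class are hypothetical, not pipegoose's API.

```python
from typing import Callable, Dict, List

# Hook points around each pipeline task; names are illustrative only.
HOOKS = (
    "before_forward", "after_forward",
    "before_backward", "after_backward",
    "before_recompute", "after_recompute",  # recompute = gradient checkpointing
)

class PipelineCallback:
    """Register functions that receive (microbatch_idx, partition_idx) at each hook."""

    def __init__(self):
        self._hooks: Dict[str, List[Callable[[int, int], None]]] = {h: [] for h in HOOKS}

    def register(self, hook: str, fn: Callable[[int, int], None]) -> None:
        self._hooks[hook].append(fn)

    def fire(self, hook: str, microbatch_idx: int, partition_idx: int) -> None:
        for fn in self._hooks[hook]:
            fn(microbatch_idx, partition_idx)

# Usage: log the schedule as each (micro-batch, partition) task executes.
callbacks = PipelineCallback()
callbacks.register("after_forward", lambda mb, pp: print(f"forward done: microbatch={mb} partition={pp}"))
# Inside the pipeline engine, each task would call, for example:
# callbacks.fire("before_forward", microbatch_idx, partition_idx)
```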
Appreciation
- Big thanks to 🤗 Hugging Face for sponsoring this project with 8x A100 GPUs for testing! And to Zach Schrier for monthly Twitch donations.
- The library's APIs are inspired by OSLO's and ColossalAI's APIs.