Megatron fused CUDA kernels to improve Hugging Face model classes' scalability #11368
Labels
Performance
WIP
🚀 Feature request
Support for custom fused CUDA kernels with HF model classes.
Motivation
It appears that Hugging Face model classes, as-is, do not scale as well as Megatron-LM, even when the latter is configured with a model-parallelization degree of 1 for a "fair" performance comparison.
One of the presumed reasons for this is that Megatron-LM leverages custom fused CUDA kernels written by NVIDIA, specifically these.
Could we get variants of existing HF classes (perhaps for `GPT2Model`, `GPT2LMHeadModel`, etc.) such that the variants leverage some or all of these fused CUDA kernels, while still ensuring that one can load the original pre-trained weights into the variant classes? Any guidance/low-level thoughts towards making this happen would also be greatly useful!
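To make the ask concrete, here is a minimal, untested sketch of one possible approach: fuse the bias add and GeLU in GPT2's MLP with `torch.jit.script` (similar in spirit to Megatron-LM's fused bias-GeLU kernel), while still loading the original pre-trained weights via `from_pretrained()`. `FusedGPT2MLP` is a hypothetical name, and the `GPT2MLP` / `Conv1D` attribute names (`c_fc`, `c_proj`, `dropout`) reflect the current `modeling_gpt2.py` and may differ across `transformers` versions.

```python
import torch
from transformers import GPT2LMHeadModel
from transformers.models.gpt2.modeling_gpt2 import GPT2MLP


@torch.jit.script
def fused_bias_gelu(bias: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # tanh-approximation GeLU with the preceding bias add fused into one scripted op
    x = x + bias
    return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1.0 + 0.044715 * x * x)))


class FusedGPT2MLP(GPT2MLP):  # hypothetical variant class, not part of transformers
    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # c_fc is a Conv1D (forward = x @ weight + bias); do the matmul without
        # the bias, then apply the fused bias + GeLU in its place
        h = torch.matmul(hidden_states, self.c_fc.weight)
        h = fused_bias_gelu(self.c_fc.bias, h)
        h = self.c_proj(h)
        return self.dropout(h)


# Load the original pre-trained weights, then swap each block's MLP for the fused variant
model = GPT2LMHeadModel.from_pretrained("gpt2")
inner_dim = model.config.n_inner or 4 * model.config.n_embd
for block in model.transformer.h:
    fused = FusedGPT2MLP(inner_dim, model.config)
    fused.load_state_dict(block.mlp.state_dict())  # reuse the existing weights unchanged
    block.mlp = fused
```

This only covers the bias-GeLU fusion; Megatron-LM's scaled masked softmax and fused layer norm kernels would presumably need analogous (and more involved) drop-in replacements.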
@thomwolf @patrickvonplaten @LysandreJik @stas00