Megatron fused CUDA kernels to improve Hugging Face model classes' scalability #11368

Open
g-karthik opened this issue Apr 22, 2021 · 1 comment
Labels: Performance, WIP


g-karthik commented Apr 22, 2021

🚀 Feature request

Support for custom fused CUDA kernels with HF model classes.

Motivation

It appears that Hugging Face model classes do not scale as well out of the box as Megatron-LM, even when the latter is configured with a model-parallelism degree of 1 for a "fair" performance comparison.

One of the presumed reasons for this is that Megatron-LM leverages custom fused CUDA kernels written by NVIDIA, specifically these.

Could we get variants of existing HF classes (perhaps GPT2Model, GPT2LMHeadModel, etc.) that leverage some or all of these fused CUDA kernels, while still ensuring that the original pre-trained weights can be loaded into the variant classes?
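For illustration only, here is a rough sketch of what such a variant might look like. `fused_bias_gelu` is a hypothetical placeholder standing in for one of the real fused kernels, and the GPT2MLP internals referenced (`c_fc`, `c_proj`, `dropout`) are assumed from the current transformers implementation; the point is only that parameter names stay unchanged, so `from_pretrained` still loads the original checkpoints:

```python
import math
import torch
from transformers import GPT2LMHeadModel


def fused_bias_gelu(x, bias):
    # Hypothetical stand-in for a fused CUDA kernel. This reference fallback
    # uses GPT-2's tanh-approximate GELU ("gelu_new") applied after the bias add.
    x = x + bias
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))


class FusedGPT2LMHeadModel(GPT2LMHeadModel):
    """GPT2LMHeadModel whose MLP activation is routed through a (placeholder) fused op.

    Parameter names and shapes are unchanged, so `from_pretrained("gpt2")`
    loads the original pre-trained weights without any conversion step.
    """

    def __init__(self, config):
        super().__init__(config)
        for block in self.transformer.h:
            block.mlp.forward = self._make_fused_mlp_forward(block.mlp)

    @staticmethod
    def _make_fused_mlp_forward(mlp):
        def forward(hidden_states):
            # c_fc is a Conv1D (y = x @ weight + bias); defer the bias add so
            # it can be fused with the activation in a single kernel.
            h = hidden_states @ mlp.c_fc.weight
            h = fused_bias_gelu(h, mlp.c_fc.bias)
            return mlp.dropout(mlp.c_proj(h))
        return forward


model = FusedGPT2LMHeadModel.from_pretrained("gpt2")
```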

Any guidance or low-level thoughts on making this happen would also be greatly appreciated!

@thomwolf @patrickvonplaten @LysandreJik @stas00

stas00 added the WIP and Performance labels May 22, 2021
stas00 (Contributor) commented May 23, 2021

I think the biggest barrier to using custom CUDA kernels is that it would require transformers to move from a Python-only package to a compilation-required package (even if the compilation is JIT), and in my experience that kind of package is far from trivial to use and often raises the barrier to entry.
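To make that concrete, this is roughly the kind of JIT build step that would be introduced (a sketch modeled on the Megatron-style approach; the directory and file names here are illustrative, not actual transformers sources):

```python
import pathlib
from torch.utils.cpp_extension import load

# JIT-compiling a custom CUDA extension at import time: this is what pulls in
# a hard dependency on nvcc, a matching CUDA toolkit and a C++ toolchain on
# every user's machine -- the barrier described above.
srcpath = pathlib.Path(__file__).parent / "fused_kernels"  # illustrative path

scaled_masked_softmax_cuda = load(
    name="scaled_masked_softmax_cuda",
    sources=[
        str(srcpath / "scaled_masked_softmax.cpp"),   # illustrative file names
        str(srcpath / "scaled_masked_softmax_cuda.cu"),
    ],
    extra_cuda_cflags=["-O3", "--use_fast_math"],
    verbose=True,
)
```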

If I'm not mistaken, some fused kernels have been pushed upstream into pytorch core, so if you know of any that we could receive precompiled via pytorch, then we can definitely use those.
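As a rough example of that path (a sketch, not a claim about which kernels are actually available upstream), elementwise patterns like bias + GELU can be fused at runtime by TorchScript's fuser using nothing beyond the stock PyTorch wheel, i.e. with no local compilation step:

```python
import torch


@torch.jit.script
def bias_gelu(bias, y):
    # TorchScript's fuser can fold this chain of elementwise ops into a single
    # kernel at runtime; everything needed ships precompiled with the PyTorch wheel.
    x = bias + y
    return x * 0.5 * (1.0 + torch.erf(x / 1.41421356237))


device = "cuda" if torch.cuda.is_available() else "cpu"
y = torch.randn(8, 1024, 3072, device=device)
bias = torch.randn(3072, device=device)
out = bias_gelu(bias, y)  # the first calls trigger fusion of the scripted graph
```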

And if the kernels you have in mind aren't upstream, and you have some resources to initiate the conversation, it would definitely help to request that such kernels be added to pytorch core. Definitely tag me if you do start such a thread in the pytorch issues.


I love your spirit of proposing various performance optimizations, @g-karthik, and I'd love to work on all of the ones you have been proposing here and in the DeepSpeed issues, but so far I have no free resources to do so and all my time is spent on making things work.
