
Add functional transformer demo #971

Open · wants to merge 18 commits into main
Conversation

@kiya00 (Collaborator) commented Aug 15, 2024

This notebook demonstrates the acceleration of a transformer model, implemented in a functional style, using Thunder. Key highlights:

  • Illustrates Thunder-compatible PyTorch code (this does not mean that more complex, object-oriented code cannot be handled)
  • Showcases successful execution of basic prompts using pre-trained weights
  • Provides a clear example of performance gains achieved through Thunder optimization (the only transformations involved here are the initial trace construction and `transform_for_execution`)

The primary objective is to explain the characteristics of Thunder-friendly code and verify functionality with loaded pre-trained weights.

The code used in the notebook is adapted from https://gist.github.com/nreHieW/a4ae05d216c5326c9fb9a70fcdda3274
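
For reference, here is a minimal sketch of what "functional style" means in this context: the weights are passed as explicit arguments rather than living on an nn.Module. The shapes and names are hypothetical, not the notebook's actual code.

```python
import torch
import thunder

# Minimal functional-style block: weights are explicit arguments rather
# than attributes on an nn.Module. (Hypothetical example, not the
# notebook's actual model code.)
def mlp(x, w1, b1, w2, b2):
    h = torch.nn.functional.linear(x, w1, b1)
    h = torch.nn.functional.gelu(h)
    return torch.nn.functional.linear(h, w2, b2)

w1, b1 = torch.randn(256, 64), torch.randn(256)
w2, b2 = torch.randn(64, 256), torch.randn(64)
x = torch.randn(8, 64)

jitted = thunder.jit(mlp)        # thunder.jit accepts plain functions
out = jitted(x, w1, b1, w2, b2)
```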


@t-vi (Collaborator) commented Aug 15, 2024

Hi @kiya00, thank you for writing thunder tutorials!

I would be very keen to not give the impression that users need to convert their code to functional in order to run it through thunder. Is there a particular reason for starting at a functional transformer here?
I would venture that the same code with the computation in forward and weights as modules would work as well?

@IvanYashchuk (Collaborator) commented:

> I would be very keen to not give the impression that users need to convert their code to functional in order to run it through thunder.

That is certainly not part of the plan. Do you have suggestions on what should be changed to avoid making this impression?

> Is there a particular reason for starting at a functional transformer here?

It's the simplest form of PyTorch code apart from the imperative style without any functions.

> I would venture that the same code with the computation in forward and weights as modules would work as well?

Of course, LitGPT is an example of that.
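
For illustration, a minimal sketch of that module-style path, using a hypothetical toy block rather than LitGPT itself:

```python
import torch
import thunder

# Hypothetical toy module: computation in forward(), weights on the module.
class TinyBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(64, 256)
        self.fc2 = torch.nn.Linear(256, 64)

    def forward(self, x):
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))

tm = thunder.jit(TinyBlock())    # no functional rewrite required
out = tm(torch.randn(8, 64))
```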

@t-vi (Collaborator) commented Aug 16, 2024

> That is certainly not part of the plan. Do you have suggestions on what should be changed to avoid making this impression?

I think it's mainly a wording thing. The initial wording looked a bit like the functional part was the key to getting it to run with thunder, e.g.

> This will give us some insight into how to convert a PyTorch module into a simple "functional" Python function, allowing for seamless integration with Thunder.

seems quite odd to me.

At the other end of the spectrum would be something like: "Usually, you can just apply thunder.jit to any PyTorch Module and this is the recommended way, but today we want to use thunder.jit with a transformer that is implemented as a function. Along the way we highlight a couple of things that are not supported by thunder (yet?) and change them...."

The other part is that jitting a module and grabbing the thunder.last_traces(tm) would also give you a fully functional transformer, and indeed there are cases where this is very useful.
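
A minimal sketch of that trace-grabbing path, with a toy module standing in for a transformer:

```python
import torch
import thunder

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Tanh())
tm = thunder.jit(model)
tm(torch.randn(2, 4))             # run once so traces are recorded

traces = thunder.last_traces(tm)  # traces from the most recent execution
print(traces[-1])                 # the final trace: a purely functional
                                  # version of the module's computation
```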

@kiya00 (Collaborator, Author) commented Aug 26, 2024

Hi @t-vi @IvanYashchuk, I rephrased it a bit. The main purpose of this notebook is to give an example of writing a simple functional Python function for a PyTorch module and to show that Thunder applies to this version as well; there is no implication that the function needs to be converted in any specific way to be compatible with Thunder. Sorry for the confusion in the initial draft. I hope this revision conveys my intention more accurately; please take a look and check whether it expresses what I meant.

@kiya00 kiya00 marked this pull request as ready for review August 27, 2024 08:08
@kiya00 (Collaborator, Author) commented Aug 27, 2024

If we run this notebook in CI using the Hugging Face weights, HF_TOKEN is needed, and the weights need to be downloaded to Meta-Llama-3-8B/consolidated.00.pth in the same folder as the notebook.
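
A minimal sketch of how the notebook's cells could be guarded (the flag name is hypothetical; the path follows the comment above):

```python
import os
from pathlib import Path

# Hypothetical guard: only run the pre-trained-weight cells when the token
# and the checkpoint are both available (so plain CI runs skip them).
ckpt = Path("Meta-Llama-3-8B/consolidated.00.pth")
run_pretrained_demo = bool(os.environ.get("HF_TOKEN")) and ckpt.exists()
if not run_pretrained_demo:
    print("HF_TOKEN or checkpoint missing; skipping the pre-trained demo.")
```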

@t-vi (Collaborator) commented Aug 27, 2024

I think it would be OK to skip it in the CI. (We have not been running full models in it.)

@IvanYashchuk (Collaborator) commented:

> If we run this notebook in CI using the Hugging Face weights, HF_TOKEN is needed, and the weights need to be downloaded to Meta-Llama-3-8B/consolidated.00.pth in the same folder as the notebook.

Is there any other popular model that is not behind a registration wall?

@IvanYashchuk (Collaborator) commented:

@lantiga, @t-vi could you please review this new tutorial?

@t-vi (Collaborator) left a review comment:

So first, it is a great tutorial.

I'm still not thrilled about the introduction. These two things:

> easily understood and optimized by both developers and compilers.

> By the end of this notebook, you'll have a clear understanding of how functional programming principles can be leveraged to create more efficient and compiler-friendly transformer models.

are relatively dubious to me. How would a developer more easily understand and optimize a functional model? If that is so, why does PyTorch still use the modular style?

Maybe we can put it more like:

As part of compiling models such as LitGPT, Thunder produces a functional version - the computation trace and function - of the model. Here we implement such a functional model directly to understand what is usually done behind the scenes.

The other question I'd have is whether our use of the code is OK here (did we ask the gist author, and do we think the notebook is affected by the gist's copyright)?

@kiya00 (Collaborator, Author) commented Sep 25, 2024

> The other question I'd have is whether our use of the code is OK here (did we ask the gist author, and do we think the notebook is affected by the gist's copyright)?

I've left a message for the author on the gist; hopefully we'll get some feedback soon.

@t-vi (Collaborator) commented Sep 26, 2024

[image attachment]

Supergood!

@t-vi (Collaborator) commented Sep 26, 2024

In a v2 (absolutely not required in this PR), it might be interesting to compare the functional version built here to the computation trace from jitting LitGPT.

Commit:
* Update the type signature of `rope`
* Update the docstring of `rotate`
Co-authored-by: beverlylytle <[email protected]>