support copies #32159
What kind of copying are we talking about here? Like cache.copy?
@amyeroberts On main, without the fix, we get
Cache copying is needed to reuse the cache computed for a prompt. For example, it lets us run new prompts on top of a system prompt without spending compute on the system prompt again.
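To illustrate the use case, here is a dependency-free toy sketch. The ToyKVCache class and the encode helper are hypothetical and are not part of Transformers. The cache for a shared system prompt is built once, and each new prompt extends a deep copy of it, so the shared prefix is neither recomputed nor mutated.

```python
import copy


class ToyKVCache:
    """Hypothetical stand-in for a KV cache: one list of key and value entries per layer."""

    def __init__(self, num_layers):
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def update(self, layer_idx, key, value):
        # Append this token's key/value for one layer (real caches concatenate tensors).
        self.keys[layer_idx].append(key)
        self.values[layer_idx].append(value)

    def seq_length(self):
        return len(self.keys[0])


def encode(cache, tokens):
    # Hypothetical "forward pass": every token writes one entry into every layer.
    for tok in tokens:
        for layer_idx in range(len(cache.keys)):
            cache.update(layer_idx, f"k{layer_idx}:{tok}", f"v{layer_idx}:{tok}")
    return cache


# Pay for the shared system prompt only once.
system_prompt = ["You", "are", "helpful", "."]
prompt_cache = encode(ToyKVCache(num_layers=2), system_prompt)

results = {}
for user_prompt in (["Hi"], ["What", "is", "2+2", "?"]):
    # Deep-copy so that extending the cache for this prompt leaves prompt_cache untouched.
    cache = encode(copy.deepcopy(prompt_cache), user_prompt)
    results[" ".join(user_prompt)] = cache.seq_length()

print(results)
print(prompt_cache.seq_length())
```

Without the copy, the second prompt would be appended after the first one in the same cache object. The copy step is the operation that has to work on real cache objects for this pattern to be usable.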
I'm sorry if this isn't the right place to ask, but in llama.cpp it's trivial to save and load the state to/from disk to keep the cache between sessions. Is this currently possible with Transformers? If so, could you please provide a minimal example or point me to the docs? Cheers,
@vladfaust Yes, it is possible, but it requires custom code (i.e. you would need to store and restore the cache's tensors yourself). We will add a user-friendly API for that in the future :)
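Until such an API exists, the custom code could look roughly like the toy sketch below. The ToyKVCache class and the save_cache/load_cache helpers are hypothetical; the idea is to persist only the per-layer key and value data and rebuild a cache object from it in the next session. With real PyTorch tensors one would presumably write them out with torch.save and read them back with torch.load instead of the stdlib pickle used here to keep the sketch dependency-free. A restored cache is only valid for the same model and the exact same prompt tokens that produced it.

```python
import os
import pickle
import tempfile


class ToyKVCache:
    """Hypothetical stand-in for a KV cache: per-layer lists of key/value entries."""

    def __init__(self, keys=None, values=None):
        self.keys = keys if keys is not None else []
        self.values = values if values is not None else []


def save_cache(cache, path):
    # Persist only the raw per-layer key/value data, not the cache object itself.
    with open(path, "wb") as f:
        pickle.dump({"keys": cache.keys, "values": cache.values}, f)


def load_cache(path):
    # Rebuild a fresh cache object from the stored key/value data.
    with open(path, "rb") as f:
        state = pickle.load(f)
    return ToyKVCache(keys=state["keys"], values=state["values"])


cache = ToyKVCache(
    keys=[["k0:a", "k0:b"], ["k1:a", "k1:b"]],
    values=[["v0:a", "v0:b"], ["v1:a", "v1:b"]],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prompt_cache.pkl")
    save_cache(cache, path)
    restored = load_cache(path)

print(restored.keys == cache.keys and restored.values == cache.values)
```

Only load cache files you created yourself: unpickling data from an untrusted source can execute arbitrary code.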
LGTM
PS: this was actually already merged in #32168, so I'll close this one!
What does this PR do?
We can't copy the cache 😢. Inheriting from torch.nn.Module fixes this easily.
This prevents us from reusing prompts (e.g. a system prompt) like this: