Fix GPU OOM for mistral.py::Mask4DTestHard #31212
Conversation
if self.__class__._model is None:
    self.__class__._model = MistralForCausalLM.from_pretrained(
        self.model_name, torch_dtype=self.model_dtype
    ).to(torch_device)
return self.__class__._model
We need to modify the class attribute instead of self._model. Without this, using just @cached_property still hits GPU OOM.
This is not ideal, but since these are slow tests meant to be run in a single process, we can accept this somewhat hacky solution to avoid GPU OOM.
Do we even need @cached_property here?
No :-), you have a very good 👁️! I will remove it. (Actually, after this PR, I started to doubt whether cached_property works well with tests!)
-        self.model_dtype = torch.float32
+        self.model_dtype = torch.float16
         self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
         self.model = MistralForCausalLM.from_pretrained(model_name, torch_dtype=self.model_dtype).to(torch_device)
This change is also necessary; otherwise we still hit GPU OOM.
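A back-of-the-envelope calculation shows why the dtype change matters: float16 uses 2 bytes per parameter versus 4 for float32, so the weights alone take half the memory. The ~7B parameter count below is an assumption for illustration (Mistral-7B), and the estimate ignores activations and the KV cache.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory taken by model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumed parameter count for a Mistral-7B-sized model.
n_params = 7_000_000_000

fp32 = weight_memory_gib(n_params, 4)  # torch.float32: 4 bytes/param
fp16 = weight_memory_gib(n_params, 2)  # torch.float16: 2 bytes/param

print(f"fp32: {fp32:.1f} GiB, fp16: {fp16:.1f} GiB")
```

Under these assumptions that is roughly 26 GiB in fp32 versus 13 GiB in fp16, which is easily the difference between fitting on a single test GPU and OOMing.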
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for fixing!
* build
* build
* build
* build

Co-authored-by: ydshieh <[email protected]>
What does this PR do?

Fix GPU OOM for mistral.py::Mask4DTestHard.

Job run page: https://github.com/huggingface/transformers/actions/runs/9343352211/job/25712939733