Offload fixes #17810

sgugger · 2022-06-21T21:58:36Z

What does this PR do?

This PR fixes a few bugs in the current offload to disk implementation via Accelerate.

The offload_folder is not created if it doesn't exist, leading to cryptic errors about missing files.
When the model is a task model and the checkpoint is one of the base model (like for OPT), two issues arise:
- if offload_state_dict=True, the weights should be reloaded into model_to_load from the temporary offload
- all the weights offloaded to disk are missing the base_model_cls prefix, since they were offloaded as weights of model_to_load and not of model.
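
For illustration, the first fix (creating the offload folder up front) can be sketched as below. This is a minimal sketch, not the code from this PR: `ensure_offload_folder` is a hypothetical helper name, and it only assumes that `offload_folder` is either `None` or a path string.

```python
import os
import tempfile


def ensure_offload_folder(offload_folder):
    """Hypothetical helper: create the offload folder if it is missing.

    Returns the path unchanged so the call can be inlined.
    """
    if offload_folder is not None:
        # exist_ok=True makes the call safe when the folder already exists
        os.makedirs(offload_folder, exist_ok=True)
    return offload_folder


# Usage: the folder does not exist before the call and exists afterwards.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "offload")
    existed_before = os.path.isdir(target)
    ensure_offload_folder(target)
    folder_created = os.path.isdir(target)
    # A second call must not raise.
    ensure_offload_folder(target)
```

Creating the folder before any weight is written means a missing directory no longer surfaces later as a missing-file error.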

HuggingFaceDocBuilderDev · 2022-06-21T22:09:24Z

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten · 2022-06-21T22:54:33Z

src/transformers/modeling_utils.py

@@ -2449,6 +2451,15 @@ def _find_mismatched_keys(
                gc.collect()

            if offload_index is not None and len(offload_index) > 0:
+                if model != model_to_load:
+                    # We need to add the prefix of the base model


(nit) Maybe make the comment a bit more explicit? Took me a bit to figure out what is going on here

Suggested change

# We need to add the prefix of the base model

# We need to add the prefix of the base model `model_to_load` to be able to correctly load the weights into the task model `model`
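
To make the intent of this comment concrete, here is a rough sketch of the re-keying it describes. It is an assumption-laden illustration, not the merged code: `add_base_model_prefix` is a hypothetical helper, the layout of each index entry is made up for the example, and the prefix `"model"` is only an assumed value.

```python
def add_base_model_prefix(offload_index, prefix):
    """Hypothetical helper: re-key an offload index with the base model prefix.

    Weights offloaded as parameters of the base model have names relative to
    it; the task model expects the same names with the prefix prepended.
    """
    return {f"{prefix}.{name}": entry for name, entry in offload_index.items()}


# Assumed example entry; the real index format may differ.
index = {"decoder.embed_tokens.weight": {"dtype": "float16"}}
prefixed = add_base_model_prefix(index, "model")
```

The entries themselves are left untouched; only the keys change, so the files already written to disk stay valid.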

patrickvonplaten

Looks good to me! Maybe we could add a test as well, or is that too much of an edge case?

* Offload fixes * Add a test

sgugger requested a review from patrickvonplaten June 21, 2022 21:58

Offload fixes

5318cf8

sgugger force-pushed the offload_fixes branch from b0b7477 to 5318cf8 Compare June 21, 2022 21:59

patrickvonplaten reviewed Jun 21, 2022

patrickvonplaten approved these changes Jun 21, 2022

Add a test

5ead4bf

sgugger merged commit df8e680 into main Jun 22, 2022

sgugger deleted the offload_fixes branch June 22, 2022 16:23

younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 25, 2022

Offload fixes (huggingface#17810)

ac1e75d

* Offload fixes * Add a test

younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 29, 2022

Offload fixes (huggingface#17810)

9d96613

* Offload fixes * Add a test
