How to correctly load a model when using FSDP and init_module(empty_init=True) #19832
Unanswered
RuABraun asked this question in DDP / multi-GPU / multi-node
Can one just use `load_state_dict(path=..)`? That's the impression I get from the docs, although it's not clear to me whether that checkpoint is a special distributed one.
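For context, here is roughly the setup I have in mind (a minimal sketch, not from litgpt; the model, device count, and checkpoint path are all placeholders):

```python
import lightning as L
import torch

fabric = L.Fabric(strategy="fsdp", devices=2)  # arbitrary device count
fabric.launch()

# With empty_init=True the parameters are not materialized up front,
# on the assumption that a checkpoint will overwrite them anyway.
with fabric.init_module(empty_init=True):
    model = torch.nn.Linear(4096, 4096)  # stand-in for a real model

model = fabric.setup(model)

# Is this all that's needed? fabric.load restores the objects in `state` in place.
state = {"model": model}
fabric.load("checkpoint.pt", state)  # placeholder path
```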
The following code makes me think one needs a different approach:
https://github.com/Lightning-AI/litgpt/blob/main/litgpt/finetune/full.py#L148C9-L148C24

It's also odd that this code carries the comment "is agnostic to the strategy being used", yet litgpt still has an if-case for FSDP.
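My reading of the linked litgpt code is roughly this shape (a paraphrase from memory, not a verbatim copy; the actual file may differ):

```python
import torch
from lightning.fabric.strategies import FSDPStrategy

def load_checkpoint(fabric, model, checkpoint_path):
    # Despite the "agnostic to the strategy" comment, FSDP gets its own branch:
    if isinstance(fabric.strategy, FSDPStrategy):
        # load_raw streams a plain torch.save checkpoint into the sharded module
        fabric.load_raw(checkpoint_path, model)
    else:
        state_dict = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(state_dict)
```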
So my main questions are:

1. Does one need to initialize the model within a `with fabric.init_module(empty_init=False)` block? And what if part of the model is not sharded by FSDP (its modules are not in the `auto_wrap_policy`): can one init that inside `init_module()`?
2. When would `load_state_dict()` be wrong and `fabric.load_raw` be right? (See the sketch after this list.)
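To make question 2 concrete, this is the contrast I am asking about (continuing from the setup sketch above; paths are placeholders, and I am assuming the raw checkpoint came from plain `torch.save(model.state_dict(), ...)` while the other came from `fabric.save`):

```python
import torch

# (a) Plain PyTorch: load a raw state dict and apply it directly.
#     With FSDP the parameters are sharded (or never materialized,
#     given empty_init=True), so is this the wrong thing to do here?
state_dict = torch.load("raw_checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict)

# (b) Let Fabric route the raw checkpoint through the strategy:
fabric.load_raw("raw_checkpoint.pt", model)

# (c) For checkpoints written with fabric.save(...), fabric.load is the counterpart:
fabric.load("fabric_checkpoint.pt", {"model": model})
```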