How to correctly load a model when using FSDP and init_module(empty_init=True) #19832
Unanswered
RuABraun asked this question in DDP / multi-GPU / multi-node
Can one just use `load_state_dict(path=..)`? That's the impression I get from the docs, although it's not clear to me whether that checkpoint is a special distributed one.
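For context, here is roughly the setup I have in mind (a minimal sketch, not from litgpt; the model, device count, and checkpoint path are all placeholders):

```python
import lightning as L
import torch

fabric = L.Fabric(strategy="fsdp", devices=2)  # arbitrary device count
fabric.launch()

# With empty_init=True the parameters are not materialized up front,
# on the assumption that a checkpoint will overwrite them anyway.
with fabric.init_module(empty_init=True):
    model = torch.nn.Linear(4096, 4096)  # stand-in for a real model

model = fabric.setup(model)

# Is this all that's needed? fabric.load restores the objects in `state` in place.
state = {"model": model}
fabric.load("checkpoint.pt", state)  # placeholder path
```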
The following code makes me think one needs a different approach:
https://github.com/Lightning-AI/litgpt/blob/main/litgpt/finetune/full.py#L148C9-L148C24

It's also odd that this code carries the comment "is agnostic to the strategy being used", yet litgpt still has an if-case for FSDP.
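My reading of the linked litgpt code is roughly this shape (a paraphrase from memory, not a verbatim copy; the actual file may differ):

```python
import torch
from lightning.fabric.strategies import FSDPStrategy

def load_checkpoint(fabric, model, checkpoint_path):
    # Despite the "agnostic to the strategy" comment, FSDP gets its own branch:
    if isinstance(fabric.strategy, FSDPStrategy):
        # load_raw streams a plain torch.save checkpoint into the sharded module
        fabric.load_raw(checkpoint_path, model)
    else:
        state_dict = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(state_dict)
```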
So my main questions are:

1. Does one need to initialize the model within a `with fabric.init_module(empty_init=False)` block? And what if part of the model is not sharded by FSDP (its modules are not in the `auto_wrap_policy`): can one init that inside `init_module()`?
2. When would `load_state_dict()` be wrong and `fabric.load_raw` be right? (See the sketch after this list.)
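To make question 2 concrete, this is the contrast I am asking about (continuing from the setup sketch above; paths are placeholders, and I am assuming the raw checkpoint came from plain `torch.save(model.state_dict(), ...)` while the other came from `fabric.save`):

```python
import torch

# (a) Plain PyTorch: load a raw state dict and apply it directly.
#     With FSDP the parameters are sharded (or never materialized,
#     given empty_init=True), so is this the wrong thing to do here?
state_dict = torch.load("raw_checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict)

# (b) Let Fabric route the raw checkpoint through the strategy:
fabric.load_raw("raw_checkpoint.pt", model)

# (c) For checkpoints written with fabric.save(...), fabric.load is the counterpart:
fabric.load("fabric_checkpoint.pt", {"model": model})
```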