missing "adapter_model.bin" or "pytorch_model.bin" #2215

Open
zhao1402072392 opened this issue Jul 30, 2024 · 0 comments

zhao1402072392 commented Jul 30, 2024

When I run the following commands:

# run aggregator server
bash scripts/run_fedml_server.sh "$RUN_ID"

# run client(s)
bash scripts/run_fedml_client.sh 1 "$RUN_ID"
bash scripts/run_fedml_client.sh 2 "$RUN_ID"
bash scripts/run_fedml_client.sh 3 "$RUN_ID"

I get the following error:

File "/home//anaconda3/envs/fedllm/lib/python3.10/site-packages/fedml/cross_silo/client/fedml_trainer.py", line 83, in train
    weights = self.trainer.get_model_params()
File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 325, in get_model_params
    peft_state_dict = load_checkpoint(self.latest_checkpoint_dir)
File "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/run_fedllm.py", line 238, in load_checkpoint
    raise FileNotFoundError(
FileNotFoundError: Could not find either PEFT checkpoint in "/gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/adapter_model.bin" nor full checkpoint in /gpfs/work4/0/tese0660/projects/FedML/python/spotlight_prj/fedllm/.logs/FedML/1111/node_2/round_0_before_agg/pytorch_model.bin.
[2024-07-30 15:00:46,590] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 3673085
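
To narrow this down, it may help to see what the `round_0_before_agg` directory actually contains. The following is a minimal diagnostic sketch, not part of FedML: the helper name `inspect_checkpoint_dir` is hypothetical, and the `.safetensors` file names are an assumption, based on recent PEFT and transformers releases being able to save weights in safetensors format instead of `.bin`. Whether that is what happened here is not confirmed by the traceback.

```python
from pathlib import Path

# File names the error message says it looked for.
EXPECTED = ("adapter_model.bin", "pytorch_model.bin")
# Assumption: safetensors equivalents that newer PEFT/transformers
# versions may write instead of the .bin files.
SAFETENSORS_VARIANTS = ("adapter_model.safetensors", "model.safetensors")


def inspect_checkpoint_dir(checkpoint_dir):
    """Hypothetical helper: report which candidate weight files exist."""
    path = Path(checkpoint_dir)
    if not path.is_dir():
        # The directory itself was never created (or was removed).
        return {"exists": False, "found": [], "all_files": []}
    all_files = sorted(p.name for p in path.iterdir() if p.is_file())
    found = [name for name in EXPECTED + SAFETENSORS_VARIANTS if name in all_files]
    return {"exists": True, "found": found, "all_files": all_files}


if __name__ == "__main__":
    import sys

    # Pass the round_0_before_agg directory from the error as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(inspect_checkpoint_dir(target))
```

If the report shows `exists` as false, the checkpoint was never written. If it lists only a `.safetensors` file, the save format and the file names `load_checkpoint` looks for may not match.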

Could someone help me with the issue? Thanks!
