Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

运行程序遇到以下报错,怎么解决呀? #7

Open
ww249 opened this issue Sep 4, 2023 · 4 comments
Open

运行程序遇到以下报错,怎么解决呀? #7

ww249 opened this issue Sep 4, 2023 · 4 comments

Comments

@ww249
Copy link

ww249 commented Sep 4, 2023

Missing logger folder: outputs/aedet_lss_r50_256x704_128x128_24e_2key/lightning_logs
Restoring states from the checkpoint path at /home/ww/Coding/AeDet/data/nuscenes/nuscenes_12hz_infos_train.pkl
Traceback (most recent call last):
File "/home/ww/Coding/AeDet/exps/aedet/aedet_lss_r50_256x704_128x128_24e_2key.py", line 109, in
run_cli()
File "/home/ww/Coding/AeDet/exps/aedet/aedet_lss_r50_256x704_128x128_24e_2key.py", line 105, in run_cli
main(args)
File "/home/ww/Coding/AeDet/exps/aedet/aedet_lss_r50_256x704_128x128_24e_2key.py", line 75, in main
trainer.fit(model, ckpt_path=args.ckpt_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._call_and_handle_interrupt(
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1180, in _run
self._restore_modules_and_callbacks(ckpt_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1140, in _restore_modules_and_callbacks
self._checkpoint_connector.resume_start(checkpoint_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 84, in resume_start
self._loaded_checkpoint = self._load_and_validate_checkpoint(checkpoint_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 88, in _load_and_validate_checkpoint
loaded_checkpoint = self.trainer.strategy.load_checkpoint(checkpoint_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 316, in load_checkpoint
return self.checkpoint_io.load_checkpoint(checkpoint_path)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/plugins/io/torch_plugin.py", line 85, in load_checkpoint
return pl_load(path, map_location=map_location)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/pytorch_lightning/utilities/cloud_io.py", line 47, in load
return torch.load(f, map_location=map_location)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ww/.conda/envs/aedet2/lib/python3.8/site-packages/torch/serialization.py", line 780, in _legacy_load
raise RuntimeError("Invalid magic number; corrupt file?")
RuntimeError: Invalid magic number; corrupt file?

@fcjian
Copy link
Owner

fcjian commented Sep 4, 2023

What is your training script?

@ww249
Copy link
Author

ww249 commented Sep 11, 2023

sudo /home/ww/.conda/envs/aedet2/bin/python /home/ww/Coding/AeDet/exps/aedet/aedet_lss_r101_512x1408_256x256_24e_2key.py --amp_backend native -b 8 --gpus 1 --ckpt_path /home/ww/Coding/AeDet/data/nuScenes/nuscenes_12hz_infos_train.pkl

@fcjian
Copy link
Owner

fcjian commented Sep 11, 2023

--ckpt_path means the path of the model checkpoint, and you should remove it, namely:
sudo /home/ww/.conda/envs/aedet2/bin/python /home/ww/Coding/AeDet/exps/aedet/aedet_lss_r101_512x1408_256x256_24e_2key.py --amp_backend native -b 8 --gpus 1

@ww249
Copy link
Author

ww249 commented Sep 11, 2023

very very very appreciate it!!
this problem has been solved.
from now on, i'll order food delivery by Meittuan.
thanks again!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants