
mmseg - WARNING - The model and loaded state dict do not match exactly #15

OpenAI-chn opened this issue Sep 24, 2023 · 5 comments

@OpenAI-chn

```
mmseg - WARNING - The model and loaded state dict do not match exactly
size mismatch for stem.0.conv.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 3, 3, 3]).
size mismatch for stem.0.bn.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for stem.0.bn.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for stem.0.bn.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
```
It looks like the pre-trained weight file does not match the model. How should I fix this?
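For anyone hitting the same warning, a quick way to see exactly which parameters conflict is to compare shapes between the checkpoint and the current model. This is a minimal sketch: in practice the two dicts would come from `torch.load(ckpt_path, map_location="cpu")` (or its `"state_dict"` entry) and `model.state_dict()`; here plain shape tuples stand in for tensors.

```python
# Hypothetical helper: list parameters whose shapes differ between a
# checkpoint's state dict and the current model's state dict.
def shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for conflicting params."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes.keys() & model_shapes.keys()
        if ckpt_shapes[name] != model_shapes[name]
    }

# Example mirroring the warning above: the checkpoint stem has 32 channels,
# while the model built from the current config has a 16-channel stem.
ckpt = {"stem.0.conv.weight": (32, 3, 3, 3), "stem.0.bn.weight": (32,)}
model = {"stem.0.conv.weight": (16, 3, 3, 3), "stem.0.bn.weight": (16,)}
print(shape_mismatches(ckpt, model))
```

If every mismatch is a consistent factor (32 vs 16 here), the checkpoint was almost certainly exported from a wider variant of the backbone than the one the config builds.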

@dongbo1998

Please modify the path of the corresponding pre-training weights in the config.
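For reference, mmseg-style configs usually point at backbone weights in one of two places. The file name and checkpoint path below are assumptions, not the actual AFFormer values:

```python
# Hypothetical mmseg-style config fragment; adjust the path to wherever you
# saved the re-uploaded pre-trained weights.
checkpoint_file = 'pretrained/afformer_base.pth'  # assumed path

model = dict(
    type='EncoderDecoder',
    # Older mmseg configs load backbone weights via `pretrained`;
    # newer ones use init_cfg=dict(type='Pretrained', checkpoint=...).
    pretrained=checkpoint_file,
)
```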

@dongbo1998

Sorry, I have re-updated the pre-training weights. Please download and try again. If you have more questions, please contact me.

@OpenAI-chn
Author

> Sorry, I have re-updated the pre-training weights. Please download and try again. If you have more questions, please contact me.

Hello, thank you very much for your help. This time the pre-trained weight file loaded without any errors and noticeably improved performance on my own dataset B. I'd also like to ask: after training AFFormer-base on my self-constructed dataset A and using the result as pre-trained weights, why is there no significant improvement when fine-tuning on the small-scale dataset B?

@OpenAI-chn
Author

My dataset A has around 20,000 images, while dataset B has only a few hundred. Is it because dataset A is too small to serve as pre-training data? Were your pre-trained weight files trained on ImageNet?

@agfwhf

agfwhf commented Sep 11, 2024

How can this problem be solved?
```
Traceback (most recent call last):
  File "tools/train.py", line 250, in <module>
    main()
  File "tools/train.py", line 239, in main
    train_segmentor(
  File "/home/AFFormer/mmseg/apis/train.py", line 178, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/iter_based_runner.py", line 59, in train
    data_batch = next(data_loader)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/iter_based_runner.py", line 39, in __next__
    data = next(self.iter_loader)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1176, in _next_data
    raise StopIteration
StopIteration
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25137) of binary: /usr/bin/python3.8
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
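A `StopIteration` raised on the very first training batch usually means the DataLoader yielded zero batches, which in mmseg is most often an empty dataset: a wrong `data_root`/`img_dir`/`ann_dir` or an `img_suffix` that matches no files in the config. A minimal sanity check, bypassing the framework entirely (the directory path and suffix below are placeholders, not values from this repository):

```python
# Hypothetical debugging sketch: count files that match the suffix the
# dataset config expects. If the count is zero, the DataLoader will be
# empty and the first next() call raises StopIteration as in the traceback.
from pathlib import Path

def count_images(img_dir, suffix=".jpg"):
    """Recursively count files under img_dir ending in `suffix`."""
    return sum(1 for _ in Path(img_dir).rglob(f"*{suffix}"))

# Example usage (placeholder path):
#   n = count_images("data/my_dataset/img_dir/train", ".png")
#   A count of 0 means the config's paths or img_suffix are wrong.
```

It is also worth confirming the count is at least `samples_per_gpu * num_gpus`, since a distributed sampler can otherwise produce empty per-rank loaders.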
