Command run:
python generate_samples.py --model-parallel-size 2 --num-layers 32 --hidden-size 2560 --load ./80000 --num-attention-heads 32 --seq-length 1024 --max-position-embeddings 1024 --fp16 --cache-dir cache --out-seq-length 512 --temperature 0.9 --top_k 0 --top_p 0 --tokenizer-path bpe_3w_new/ --vocab-size 30000 --input-text example.txt
The error is as follows:
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1
using dynamic loss scaling
/home/troila/anaconda3/envs/test/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
building CPM model ...
number of parameters on model parallel rank 0: 2597073920
global rank 0 is loading checkpoint ./80000/80000/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "generate_samples.py", line 384, in <module>
main()
File "generate_samples.py", line 374, in main
model = setup_model(args)
File "generate_samples.py", line 345, in setup_model
args.iteration = load_checkpoint_model(model, args)
File "/home/hanlifei/CPM-Generate/utils.py", line 290, in load_checkpoint_model
model.load_state_dict(sd['module'])
File "/home/hanlifei/CPM-Generate/model/distributed.py", line 90, in load_state_dict
self.module.load_state_dict(state_dict, strict=strict)
File "/home/hanlifei/CPM-Generate/fp16/fp16.py", line 71, in load_state_dict
self.module.load_state_dict(state_dict, strict=strict)
File "/home/troila/anaconda3/envs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([15000, 2560]) from checkpoint, the shape in current model is torch.Size([30000, 2560]).
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.0.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.1.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.1.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.1.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.1.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.2.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.2.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.2.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.2.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.2.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.2.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.3.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.3.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.3.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.3.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.3.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.3.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.4.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.4.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.4.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.4.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.4.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.4.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.5.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.5.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.5.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.5.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.5.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.5.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.6.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.6.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.6.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.6.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.6.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.6.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.7.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.7.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.7.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.7.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.7.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.7.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.8.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.8.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.8.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.8.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.8.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.8.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.9.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.9.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.9.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.9.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.9.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.9.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.10.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.10.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.10.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.10.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.10.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.10.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.11.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.11.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.11.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.11.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.11.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.11.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.12.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.12.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.12.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.12.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.12.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.12.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.13.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.13.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.13.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.13.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.13.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.13.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.14.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.14.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.14.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.14.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.14.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.14.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.15.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.15.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.15.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.15.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.15.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.15.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.16.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.16.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.16.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.16.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.16.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.16.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.17.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.17.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.17.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.17.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.17.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.17.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.18.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.18.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.18.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.18.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.18.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.18.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.19.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.19.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.19.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.19.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.19.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.19.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.20.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.20.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.20.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.20.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.20.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.20.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.21.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.21.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.21.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.21.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.21.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.21.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.22.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.22.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.22.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.22.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.22.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.22.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.23.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.23.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.23.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.23.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.23.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.23.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.24.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.24.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.24.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.24.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.24.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.24.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.25.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.25.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.25.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.25.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.25.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.25.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.26.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.26.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.26.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.26.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.26.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.26.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.27.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.27.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.27.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.27.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.27.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.27.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.28.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.28.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.28.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.28.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.28.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.28.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.29.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.29.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.29.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.29.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.29.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.29.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.30.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.30.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.30.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.30.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.30.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.30.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
size mismatch for transformer.layers.31.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.31.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.31.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.31.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.31.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.31.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
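In every mismatch above, the checkpoint tensor is exactly half the size of the model tensor along one dimension. That pattern is consistent with a checkpoint shard saved for model-parallel size 2 being loaded into a model built with model-parallel size 1, which matches the "using world size: 1 and model-parallel size: 1" line in the log. This is an inference from the shapes, not a confirmed diagnosis. A minimal sketch for checking the ratio line by line; the names `partition_factor` and `PATTERN` are hypothetical helpers and not part of CPM-Generate:

```python
import re

# Matches one "size mismatch" line as printed by PyTorch's load_state_dict.
PATTERN = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]+)\]\)"
)


def _numel(shape_text):
    """Number of elements in a shape written as e.g. '3840, 2560'."""
    total = 1
    for dim in shape_text.split(","):
        total *= int(dim)
    return total


def partition_factor(line):
    """Return (param_name, model_numel / checkpoint_numel), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ratio = _numel(match.group("model")) / _numel(match.group("ckpt"))
    return match.group("name"), ratio


line = (
    "size mismatch for transformer.layers.25.attention.query_key_value.weight: "
    "copying a param with shape torch.Size([3840, 2560]) from checkpoint, "
    "the shape in current model is torch.Size([7680, 2560])."
)
print(partition_factor(line))
```

A factor of 2.0 on every reported parameter would suggest the checkpoint holds one of two model-parallel shards, so the model-parallel size actually in effect at load time is worth checking.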
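Environment: CentOS, with apex, deepspeed, and the other dependencies installed.
The working directory is the project root.
The pretrained model is stored under the project root at: 80000/80000/mp_rank_00_model_states.pt
Command: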
python generate_samples.py --model-parallel-size 2 --num-layers 32 --hidden-size 2560 --load ./80000 --num-attention-heads 32 --seq-length 1024 --max-position-embeddings 1024 --fp16 --cache-dir cache --out-seq-length 512 --temperature 0.9 --top_k 0 --top_p 0 --tokenizer-path bpe_3w_new/ --vocab-size 30000 --input-text example.txt
The error output is as follows:
Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))