Converting Mamba to tp4: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 #10966

Open
zixianwang2022 opened this issue Oct 21, 2024 · 0 comments
Describe the bug

I am trying to convert the default Mamba .nemo file (which I converted from the Hugging Face .pt checkpoint to .nemo) to tensor_parallel=4. I have been following the documentation here, but I get the following error.

[NeMo I 2024-10-21 06:50:54 megatron_init:314] Rank 0 has data parallel group : [0]                                                                                                                 
[NeMo I 2024-10-21 06:50:54 megatron_init:320] Rank 0 has combined group of data parallel and context parallel : [0]                                                                                
[NeMo I 2024-10-21 06:50:54 megatron_init:325] All data parallel group ranks with context parallel combined: [[0], [1], [2], [3]]                                                                   
[NeMo I 2024-10-21 06:50:54 megatron_init:328] Ranks 0 has data parallel rank: 0                                                                                                                    
[NeMo I 2024-10-21 06:50:54 megatron_init:336] Rank 0 has context parallel group: [0]                                                                                                               
[NeMo I 2024-10-21 06:50:54 megatron_init:339] All context parallel group ranks: [[0], [1], [2], [3]]                                                                                               
[NeMo I 2024-10-21 06:50:54 megatron_init:340] Ranks 0 has context parallel rank: 0                                                                                                                 
[NeMo I 2024-10-21 06:50:54 megatron_init:347] Rank 0 has model parallel group: [0, 1, 2, 3]                                                                                                        
[NeMo I 2024-10-21 06:50:54 megatron_init:348] All model parallel group ranks: [[0, 1, 2, 3]]                                                                                                       
[NeMo I 2024-10-21 06:50:54 megatron_init:357] Rank 0 has tensor model parallel group: [0, 1, 2, 3]                                                                                                 
[NeMo I 2024-10-21 06:50:54 megatron_init:361] All tensor model parallel group ranks: [[0, 1, 2, 3]]                                                                                                
[NeMo I 2024-10-21 06:50:54 megatron_init:362] Rank 0 has tensor model parallel rank: 0                                                                                                             
[NeMo I 2024-10-21 06:50:54 megatron_init:382] Rank 0 has pipeline model parallel group: [0]                                                                                                        
[NeMo I 2024-10-21 06:50:54 megatron_init:394] Rank 0 has embedding group: [0]                                                                                                                      
[NeMo I 2024-10-21 06:50:54 megatron_init:400] All pipeline model parallel group ranks: [[0], [1], [2], [3]]                                                                                        
[NeMo I 2024-10-21 06:50:54 megatron_init:401] Rank 0 has pipeline model parallel rank 0                                                                                                            
[NeMo I 2024-10-21 06:50:54 megatron_init:402] All embedding group ranks: [[0], [1], [2], [3]]                                                                                                      
[NeMo I 2024-10-21 06:50:54 megatron_init:403] Rank 0 has embedding rank: 0                                                                                                                         
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: deterministic_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: use_te_rng_tracker in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cross_entropy_loss_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_disable_qkv in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_disable_fc1 in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bootstrap_backend in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: overlap_p2p_comm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: batch_p2p_comm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: defer_embedding_wgrad_compute in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: wgrad_deferral_limit in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: pipeline_model_parallel_split_rank in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: _cpu_offloading_context in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_activations in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_weights in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo I 2024-10-21 06:50:54 tokenizer_utils:217] tokenizer_model: 
[NeMo I 2024-10-21 06:50:54 tokenizer_utils:218] /tmp/tmpozie5_vl/a649c6bb46eb404fac64c5ee8f43c407_mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model
[NeMo I 2024-10-21 06:50:55 megatron_base_model:604] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: deterministic_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: use_te_rng_tracker in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cross_entropy_loss_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: defer_embedding_wgrad_compute in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: wgrad_deferral_limit in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: pipeline_model_parallel_split_rank in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: _cpu_offloading_context in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_activations in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_weights in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:516] apply_query_key_layer_scaling is only enabled when using FP16, setting it to False and setting NVTE_APPLY_QK_LAYER_SCALING=0
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: first_pipeline_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: last_pipeline_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: activation_func_fp8_input_store in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: window_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: qk_layernorm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: test_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: calculate_per_token_loss in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: multi_latent_attention in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: memory_efficient_layer_norm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_margin in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_interval in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_amax_history_len in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_amax_compute_algo in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_dot_product_attention in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_multi_head_attention in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_shared_expert_intermediate_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_shared_expert_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_router_load_balancing_type in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_router_topk in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_router_pre_softmax in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_grouped_gemm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_aux_loss_coeff in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_z_loss_coeff in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_input_jitter_eps in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_token_dropping in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_token_dispatcher_type in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_per_layer_logging in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_expert_capacity_factor in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_pad_expert_input_to_capacity in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_token_drop_policy in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_layer_recompute in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: disable_parameter_transpose_cache in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: enable_cuda_graph in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: external_cuda_graph in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: config_logger_dir in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_mixer.py", line 168, in __init__
[rank0]:     self.in_proj = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 when instantiating TELayerNormColumnParallelLinear

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_layer.py", line 45, in __init__
[rank0]:     self.mixer = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_mixer.py", line 168, in __init__
[rank0]:     self.in_proj = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 when instantiating TELayerNormColumnParallelLinear when instantiating MambaMixer

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_block.py", line 155, in __init__
[rank0]:     layer = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_layer.py", line 45, in __init__
[rank0]:     self.mixer = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_mixer.py", line 168, in __init__
[rank0]:     self.in_proj = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 when instantiating TELayerNormColumnParallelLinear when instantiating MambaMixer when instantiating MambaLayer

[rank0]: During handling of the above exception, another exception occurred:
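
For what it's worth, 18560 is exactly 4 x 4640, which looks like the full (TP=1) output dimension of the Mamba mixer's in_proj weight versus its TP=4 shard. Below is a minimal sketch of that arithmetic, assuming the usual mamba2-8b hyperparameters (d_model=4096, expand=2, ngroups=8, d_state=128, headdim=64); these values are my assumption and are not read from the checkpoint.

# Sanity check of the sizes in the error above (assumed mamba2-8b hyperparameters).
d_model = 4096
expand = 2
ngroups = 8
d_state = 128
headdim = 64

d_inner = expand * d_model            # 8192
nheads = d_inner // headdim           # 128

# Mamba2 packs [z, x, B, C, dt] into a single in_proj matrix:
in_proj_out = 2 * d_inner + 2 * ngroups * d_state + nheads
print(in_proj_out)        # 18560 -> matches tensor a in the error
print(in_proj_out // 4)   # 4640  -> matches the TP=4 shard (tensor b)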

Steps/Code to reproduce bug

Please list minimal steps or code snippet for us to be able to reproduce the bug.
I am using the main branch.
Load the model from Hugging Face: https://huggingface.co/nvidia/mamba2-8b-3t-4k/tree/main
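
For reference, one way to fetch the checkpoint locally (the download method is not specified in the documented flow, and the target directory is up to you):

# Hypothetical download step using huggingface_hub; any local directory works.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="nvidia/mamba2-8b-3t-4k")
print(local_path)  # directory containing the downloaded checkpoint files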

CUDA_VISIBLE_DEVICES="0" python /opt/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
                                --input_name_or_path <path to the source pytorch model> \
                                --output_path <path to target .nemo model> \
                                --mamba_ssm_ngroups 8 \
                                --precision bf16 \
                                --tokenizer_model_dir=<path to tokenizer.model> # Remove this line (or set it to None) for 130m, 370m, 780m, 1.3b, and 2.7b models.
python /opt/NeMo/examples/nlp/language_modeling/mamba_change_num_partition.py \
       --model_file=<path to source .nemo model> \
       --target_file=<path to target .nemo model> \
       --tensor_model_parallel_size=1 \
       --target_tensor_model_parallel_size=4 \
       --precision=bf16 \
       --tokenizer_path=<path to tokenizer.model>
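
As an extra diagnostic (not part of the documented steps), a quick way to check the unsharded in_proj dimension would be to print the relevant shapes from the source PyTorch checkpoint; the path below is a placeholder and the exact key names depend on the checkpoint layout:

# Sketch: print the shape of every in_proj weight in the source .pt checkpoint.
import torch

state_dict = torch.load("<path to the source pytorch model>", map_location="cpu")
# Some checkpoints nest the tensors under a top-level "state_dict" key.
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]

for name, tensor in state_dict.items():
    if "in_proj" in name and hasattr(tensor, "shape"):
        # For mamba2-8b I would expect 18560 in the first dimension, per the arithmetic above.
        print(name, tuple(tensor.shape))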

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.

Expected behavior

It should work without errors when following the documentation.

Environment overview (please complete the following information)

docker build -f Dockerfile.ci -t nemo:latest . 

Add any other context about the problem here.
Hardware: 8x SXM H100.
