Converting Mamba to tp4: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 #10966

Open
zixianwang2022 opened this issue Oct 21, 2024 · 0 comments
Describe the bug

I am trying to convert the default Mamba .nemo file (which I converted from the Hugging Face .pt checkpoint to .nemo) to tensor_parallel=4. I have been following the documentation here, but I get the following error.

[NeMo I 2024-10-21 06:50:54 megatron_init:314] Rank 0 has data parallel group : [0]                                                                                                                 
[NeMo I 2024-10-21 06:50:54 megatron_init:320] Rank 0 has combined group of data parallel and context parallel : [0]                                                                                
[NeMo I 2024-10-21 06:50:54 megatron_init:325] All data parallel group ranks with context parallel combined: [[0], [1], [2], [3]]                                                                   
[NeMo I 2024-10-21 06:50:54 megatron_init:328] Ranks 0 has data parallel rank: 0                                                                                                                    
[NeMo I 2024-10-21 06:50:54 megatron_init:336] Rank 0 has context parallel group: [0]                                                                                                               
[NeMo I 2024-10-21 06:50:54 megatron_init:339] All context parallel group ranks: [[0], [1], [2], [3]]                                                                                               
[NeMo I 2024-10-21 06:50:54 megatron_init:340] Ranks 0 has context parallel rank: 0                                                                                                                 
[NeMo I 2024-10-21 06:50:54 megatron_init:347] Rank 0 has model parallel group: [0, 1, 2, 3]                                                                                                        
[NeMo I 2024-10-21 06:50:54 megatron_init:348] All model parallel group ranks: [[0, 1, 2, 3]]                                                                                                       
[NeMo I 2024-10-21 06:50:54 megatron_init:357] Rank 0 has tensor model parallel group: [0, 1, 2, 3]                                                                                                 
[NeMo I 2024-10-21 06:50:54 megatron_init:361] All tensor model parallel group ranks: [[0, 1, 2, 3]]                                                                                                
[NeMo I 2024-10-21 06:50:54 megatron_init:362] Rank 0 has tensor model parallel rank: 0                                                                                                             
[NeMo I 2024-10-21 06:50:54 megatron_init:382] Rank 0 has pipeline model parallel group: [0]                                                                                                        
[NeMo I 2024-10-21 06:50:54 megatron_init:394] Rank 0 has embedding group: [0]                                                                                                                      
[NeMo I 2024-10-21 06:50:54 megatron_init:400] All pipeline model parallel group ranks: [[0], [1], [2], [3]]                                                                                        
[NeMo I 2024-10-21 06:50:54 megatron_init:401] Rank 0 has pipeline model parallel rank 0                                                                                                            
[NeMo I 2024-10-21 06:50:54 megatron_init:402] All embedding group ranks: [[0], [1], [2], [3]]                                                                                                      
[NeMo I 2024-10-21 06:50:54 megatron_init:403] Rank 0 has embedding rank: 0                                                                                                                         
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: deterministic_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: use_te_rng_tracker in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cross_entropy_loss_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_disable_qkv in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_disable_fc1 in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bootstrap_backend in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: overlap_p2p_comm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: batch_p2p_comm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: defer_embedding_wgrad_compute in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: wgrad_deferral_limit in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: pipeline_model_parallel_split_rank in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: _cpu_offloading_context in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_activations in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_weights in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:54 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo I 2024-10-21 06:50:54 tokenizer_utils:217] tokenizer_model: 
[NeMo I 2024-10-21 06:50:54 tokenizer_utils:218] /tmp/tmpozie5_vl/a649c6bb46eb404fac64c5ee8f43c407_mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model
[NeMo I 2024-10-21 06:50:55 megatron_base_model:604] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: deterministic_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: use_te_rng_tracker in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_overlap_rs_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: tp_comm_atomic_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cross_entropy_loss_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: defer_embedding_wgrad_compute in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: wgrad_deferral_limit in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: pipeline_model_parallel_split_rank in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: _cpu_offloading_context in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_activations in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: cpu_offloading_weights in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:1189] The model: MegatronMambaModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:516] apply_query_key_layer_scaling is only enabled when using FP16, setting it to False and setting NVTE_APPLY_QK_LAYER_SCALING=0
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: first_pipeline_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: last_pipeline_num_layers in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: activation_func_fp8_input_store in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: window_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: qk_layernorm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: test_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: calculate_per_token_loss in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: multi_latent_attention in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: memory_efficient_layer_norm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_margin in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_interval in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_amax_history_len in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_amax_compute_algo in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_dot_product_attention in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: fp8_multi_head_attention in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_shared_expert_intermediate_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_shared_expert_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_router_load_balancing_type in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_router_topk in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_router_pre_softmax in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_grouped_gemm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_aux_loss_coeff in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_z_loss_coeff in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_input_jitter_eps in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_token_dropping in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_token_dispatcher_type in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_per_layer_logging in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_expert_capacity_factor in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_pad_expert_input_to_capacity in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_token_drop_policy in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: moe_layer_recompute in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: disable_parameter_transpose_cache in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: enable_cuda_graph in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: external_cuda_graph in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-21 06:50:55 megatron_base_model:577] The model: MegatronMambaModel() does not have field.name: config_logger_dir in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_mixer.py", line 168, in __init__
[rank0]:     self.in_proj = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 when instantiating TELayerNormColumnParallelLinear

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_layer.py", line 45, in __init__
[rank0]:     self.mixer = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_mixer.py", line 168, in __init__
[rank0]:     self.in_proj = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 when instantiating TELayerNormColumnParallelLinear when instantiating MambaMixer

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_block.py", line 155, in __init__
[rank0]:     layer = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_layer.py", line 45, in __init__
[rank0]:     self.mixer = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/ssm/mamba_mixer.py", line 168, in __init__
[rank0]:     self.in_proj = build_module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 104, in build_module
[rank0]:     raise type(e)(f"{str(e)} when instantiating {module.__name__}").with_traceback(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/transformer/spec_utils.py", line 97, in build_module
[rank0]:     return module(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/extensions/transformer_engine.py", line 333, in __init__
[rank0]:     _ = _initialize_affine_weight_cpu(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/megatron/core/tensor_parallel/layers.py", line 150, in _initialize_affine_weight_cpu
[rank0]:     weight.data.copy_(cpu_weight)
[rank0]: RuntimeError: The size of tensor a (18560) must match the size of tensor b (4640) at non-singleton dimension 0 when instantiating TELayerNormColumnParallelLinear when instantiating MambaMixer when instantiating MambaLayer

[rank0]: During handling of the above exception, another exception occurred:
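
For what it's worth, 18560 is exactly 4 x 4640, which looks like the full (TP=1) output dimension of the Mamba mixer's in_proj weight versus its TP=4 shard. Below is a minimal sketch of that arithmetic, assuming the usual mamba2-8b hyperparameters (d_model=4096, expand=2, ngroups=8, d_state=128, headdim=64); these values are my assumption and are not read from the checkpoint.

# Sanity check of the sizes in the error above (assumed mamba2-8b hyperparameters).
d_model = 4096
expand = 2
ngroups = 8
d_state = 128
headdim = 64

d_inner = expand * d_model            # 8192
nheads = d_inner // headdim           # 128

# Mamba2 packs [z, x, B, C, dt] into a single in_proj matrix:
in_proj_out = 2 * d_inner + 2 * ngroups * d_state + nheads
print(in_proj_out)        # 18560 -> matches tensor a in the error
print(in_proj_out // 4)   # 4640  -> matches the TP=4 shard (tensor b)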

Steps/Code to reproduce bug

Please list minimal steps or code snippet for us to be able to reproduce the bug.
I am using the main branch.
Load the model from Hugging Face: https://huggingface.co/nvidia/mamba2-8b-3t-4k/tree/main
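
For reference, one way to fetch the checkpoint locally (the download method is not specified in the documented flow, and the target directory is up to you):

# Hypothetical download step using huggingface_hub; any local directory works.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="nvidia/mamba2-8b-3t-4k")
print(local_path)  # directory containing the downloaded checkpoint files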

CUDA_VISIBLE_DEVICES="0" python /opt/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
                                --input_name_or_path <path to the source pytorch model> \
                                --output_path <path to target .nemo model> \
                                --mamba_ssm_ngroups 8 \
                                --precision bf16 \
                                --tokenizer_model_dir=<path to tokenizer.model> # Remove this line (or set it to None) for 130m, 370m, 780m, 1.3b, and 2.7b models.
python /opt/NeMo/examples/nlp/language_modeling/mamba_change_num_partition.py \
       --model_file=<path to source .nemo model> \
       --target_file=<path to target .nemo model> \
       --tensor_model_parallel_size=1 \
       --target_tensor_model_parallel_size=4 \
       --precision=bf16 \
       --tokenizer_path=<path to tokenizer.model>
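
As an extra diagnostic (not part of the documented steps), a quick way to check the unsharded in_proj dimension would be to print the relevant shapes from the source PyTorch checkpoint; the path below is a placeholder and the exact key names depend on the checkpoint layout:

# Sketch: print the shape of every in_proj weight in the source .pt checkpoint.
import torch

state_dict = torch.load("<path to the source pytorch model>", map_location="cpu")
# Some checkpoints nest the tensors under a top-level "state_dict" key.
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]

for name, tensor in state_dict.items():
    if "in_proj" in name and hasattr(tensor, "shape"):
        # For mamba2-8b I would expect 18560 in the first dimension, per the arithmetic above.
        print(name, tuple(tensor.shape))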

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.

Expected behavior

It should work without errors when following the documentation.

Environment overview (please complete the following information)

docker build -f Dockerfile.ci -t nemo:latest . 

Add any other context about the problem here.
Hardware: 8x SXM H100.
