[BUG] RuntimeError: Ninja is required to load C++ extensions #1687
Comments
Do you have
@jeffra Hi, when I run on two machines in parallel, the same problem occurs; however, a single machine does not have this problem. Any tips for me?
Hey @jeffra, when I ran ninja --version, there was a permission error. The workaround I found was to chmod 777 the folder it was trying to access, and then it worked. Was wondering if there is any other way.
I can run ninja --version, but I still get this error.
I had the same problem when I ran deepspeed with tmux/screen.
deepspeed doesn't seem to load the anaconda environment variables correctly in the case of multiple nodes. For example, my ninja path is /home/xxx/anaconda3/envs/NLP/bin/ninja, but deepspeed does not add this path to the PATH environment variable.
A temporary solution is to manually add the path of ninja to the PATH environment variable in the torch/utils/cpp_extension.py file.
@chinoll I have a similar problem. Where exactly do you add the .../bin/ninja path in the torch/utils/cpp_extension.py file?
That worked, thanks a lot @chinoll.
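For reference, here is a minimal sketch of the kind of PATH workaround described above, assuming a conda environment at the /home/xxx/anaconda3/envs/NLP path from the earlier comment (the exact place to put it, whether near the top of torch/utils/cpp_extension.py or at the top of your training script before any DeepSpeed ops are JIT-compiled, may vary):

```python
# Sketch of the PATH workaround described above; the conda env path is only an
# example taken from the earlier comment and should be adjusted to your setup.
import os

NINJA_DIR = "/home/xxx/anaconda3/envs/NLP/bin"  # hypothetical conda env bin dir

# Prepend the directory containing ninja so JIT compilation can find it.
if NINJA_DIR not in os.environ.get("PATH", "").split(os.pathsep):
    os.environ["PATH"] = NINJA_DIR + os.pathsep + os.environ.get("PATH", "")
```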
Just my conjecture for my scenario: it seems like deepspeed is using some cached torch extensions which point to files in an old conda environment to which I no longer have access. I deleted the cache.
This hypothesis makes sense to me. In my case, I'm using a conda environment, and directly calling the
This should be less intrusive, IMO, than modifying torch/utils/cpp_extension.py.
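A minimal sketch of this less intrusive route, assuming the default torch extension cache location of ~/.cache/torch_extensions (overridable via the TORCH_EXTENSIONS_DIR environment variable):

```python
# Clear the stale torch extension build cache mentioned above so DeepSpeed ops
# are JIT-rebuilt against the current conda environment on the next run.
import os
import shutil

cache_dir = os.environ.get(
    "TORCH_EXTENSIONS_DIR",
    os.path.expanduser("~/.cache/torch_extensions"),
)

if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)
    print(f"Removed {cache_dir}")
```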
Note that some versions of ninja have a bug where ninja --version prints the version correctly but returns a non-zero exit code.
In this case, try installing another version.
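PyTorch checks for ninja by running ninja --version in a subprocess and treats any failure, including a non-zero exit code, as "ninja not available", which is why this bug triggers the error even though the version string prints. A quick diagnostic sketch along those lines:

```python
# Mimics the kind of check torch.utils.cpp_extension performs. If the exit
# code is non-zero even though a version string is printed, you are likely
# hitting the ninja bug described above.
import subprocess

result = subprocess.run(["ninja", "--version"], capture_output=True, text=True)
print("stdout:", result.stdout.strip())
print("exit code:", result.returncode)

if result.returncode != 0:
    print("ninja --version returned a non-zero exit code; "
          "torch will treat ninja as unavailable.")
```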
It works, thanks a lot.
I finally fixed my error by running pip install ninja outside of my virtual environment!
It works for me, thx.
Hi,
I am getting the following error when running pretrain_gpt.sh
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2+cu111
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/qfs/people/shar703/scripts/mega_ai/deepspeed_megatron/DeepSpeed/deepspeed']
deepspeed info ................... 0.5.9+1d295ff, 1d295ff, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=1ac4a44 git_branch=main ****
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... infer
data_parallel_size .............................. 1
data_path ....................................... ['cord19/chemistry_cord19_abstract_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... False
deepspeed_activation_checkpointing .............. False
deepspeed_config ................................ None
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 1024
eod_mask_loss ................................... False
eval_interval ................................... 100
eval_iters ...................................... 10
evidence_data_path .............................. None
exit_duration_in_mins ........................... None
exit_interval ................................... None
ffn_hidden_size ................................. 4096
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 8
hidden_dropout .................................. 0.1
hidden_size ..................................... 1024
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 64
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ checkpoints/gpt2_345m
local_rank ...................................... None
log_batch_size_to_tensorboard ................... False
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... False
log_validation_ppl_to_tensorboard ............... False
loss_scale ...................................... None
loss_scale_window ............................... 1000
lr .............................................. 0.00015
lr_decay_iters .................................. 320000
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. 0.01
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 0
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 1024
memory_centric_tiled_linear ..................... False
merge_file ...................................... ../deepspeed_megatron/gpt_files/gpt2-merges.txt
micro_batch_size ................................ 4
min_loss_scale .................................. 1.0
min_lr .......................................... 0.0
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 1
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ checkpoints/gpt2_345m
save_interval ................................... 500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 1024
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 969, 30, 1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. None
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 1000
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... 500000
train_samples ................................... None
train_tokens .................................... None
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... ../deepspeed_megatron/gpt_files/gpt2-vocab.json
weight_decay .................................... 0.01
world_size ...................................... 1
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1.0
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2