[BUG] RuntimeError: Ninja is required to load C++ extensions #1687

Open
ShivamSharma2705 opened this issue Jan 10, 2022 · 16 comments
Labels
bug Something isn't working

Comments

@ShivamSharma2705

Hi,

I am getting the following error when running pretrain_gpt.sh


DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2+cu111
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/qfs/people/shar703/scripts/mega_ai/deepspeed_megatron/DeepSpeed/deepspeed']
deepspeed info ................... 0.5.9+1d295ff, 1d295ff, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=1ac4a44 git_branch=main ****
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... infer
data_parallel_size .............................. 1
data_path ....................................... ['cord19/chemistry_cord19_abstract_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... False
deepspeed_activation_checkpointing .............. False
deepspeed_config ................................ None
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 1024
eod_mask_loss ................................... False
eval_interval ................................... 100
eval_iters ...................................... 10
evidence_data_path .............................. None
exit_duration_in_mins ........................... None
exit_interval ................................... None
ffn_hidden_size ................................. 4096
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 8
hidden_dropout .................................. 0.1
hidden_size ..................................... 1024
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 64
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ checkpoints/gpt2_345m
local_rank ...................................... None
log_batch_size_to_tensorboard ................... False
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... False
log_validation_ppl_to_tensorboard ............... False
loss_scale ...................................... None
loss_scale_window ............................... 1000
lr .............................................. 0.00015
lr_decay_iters .................................. 320000
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. 0.01
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 0
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 1024
memory_centric_tiled_linear ..................... False
merge_file ...................................... ../deepspeed_megatron/gpt_files/gpt2-merges.txt
micro_batch_size ................................ 4
min_loss_scale .................................. 1.0
min_lr .......................................... 0.0
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 1
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ checkpoints/gpt2_345m
save_interval ................................... 500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 1024
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 969, 30, 1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. None
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 1000
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... 500000
train_samples ................................... None
train_tokens .................................... None
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... ../deepspeed_megatron/gpt_files/gpt2-vocab.json
weight_decay .................................... 0.01
world_size ...................................... 1
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1.0
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2

building GPT2BPETokenizer tokenizer ...
padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
initializing torch distributed ...
initializing tensor model parallel with size 1
initializing pipeline model parallel with size 1
setting random seeds to 1234 ...
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
compiling dataset index builder ...
make: Entering directory `/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for `default'.
make: Leaving directory `/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/data'

done with dataset index builder. Compilation time: 0.051 seconds
compiling and loading fused kernels ...
Traceback (most recent call last):
File "/people/shar703/anaconda3/envs/deepspeed/bin/ninja", line 33, in
sys.exit(load_entry_point('ninja', 'console_scripts', 'ninja')())
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/init.py", line 51, in ninja
raise SystemExit(_program('ninja', sys.argv[1:]))
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/init.py", line 47, in _program
return subprocess.call([os.path.join(BIN_DIR, name)] + args)
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 858, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/data/bin/ninja'
Traceback (most recent call last):
File "pretrain_gpt.py", line 231, in
pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/training.py", line 96, in pretrain
initialize_megatron(extra_args_provider=extra_args_provider,
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/initialize.py", line 89, in initialize_megatron
_compile_dependencies()
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/initialize.py", line 137, in _compile_dependencies
fused_kernels.load(args)
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/fused_kernels/init.py", line 71, in load
scaled_upper_triang_masked_softmax_cuda = _cpp_extention_load_helper(
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/fused_kernels/init.py", line 47, in _cpp_extention_load_helper
return cpp_extension.load(
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
return _jit_compile(
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1292, in _jit_compile
_write_ninja_file_and_build_library(
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1373, in _write_ninja_file_and_build_library
verify_ninja_availability()
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1429, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

ShivamSharma2705 added the bug (Something isn't working) label on Jan 10, 2022
@jeffra
Collaborator

jeffra commented Jan 10, 2022

Do you have ninja installed? The PyTorch code that raises this RuntimeError is attempting to run ninja --version. Does this command work for you?
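
For reference, a rough sketch of the kind of check PyTorch runs here (an approximation, not PyTorch's exact code): it shells out to ninja --version and treats any failure as ninja being unavailable.

import subprocess

def ninja_available() -> bool:
    # Approximation of the check: run `ninja --version` and treat any failure
    # (missing binary, permission error, non-zero exit code) as "not available".
    try:
        subprocess.check_output(["ninja", "--version"])
        return True
    except Exception:
        return False

if not ninja_available():
    raise RuntimeError("Ninja is required to load C++ extensions")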

@XiaoqingNLP

@jeffra Hi, when I run with two machines in parallel, the same problem occurs; however, a single machine does not have this problem. Do you have any tips for me?

@ShivamSharma2705
Author

Hey @jeffra, when I ran ninja --version, there was a permission error. The workaround I found was to chmod 777 the folder it was trying to access, and then it worked. I was wondering if there is any other way.
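
A narrower alternative to chmod 777, assuming the failure is the PermissionError on the bundled ninja binary shown in the traceback above, is to add execute permission to just that file. A minimal sketch (the path is the one from the traceback and will differ per install):

import os
import stat

# Hypothetical fix: add execute permission only to the bundled ninja binary,
# using the path from the PermissionError in the traceback above.
ninja_bin = ("/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/"
             "ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/data/bin/ninja")
mode = os.stat(ninja_bin).st_mode
os.chmod(ninja_bin, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)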

@JiyangZhang

I can run ninja --version, but still get this error..

@chinoll

chinoll commented May 16, 2022

I had the same problem when I ran deepspeed with tmux/screen

@chinoll

chinoll commented May 16, 2022

deepspeed doesn't seem to load the anaconda environment variables correctly in the case of multiple nodes. For example, my ninja path is /home/xxx/anaconda3/envs/NLP/bin/ninja, but deepspeed does not add this path to the PATH environment variable.

@chinoll

chinoll commented May 16, 2022

A temporary solution is to manually add the directory containing ninja to the PATH environment variable in the torch/utils/cpp_extension.py file.
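
For anyone wondering what that edit looks like, a minimal sketch of the idea (the /home/xxx/... path is just the example from the comment above; adjust it to your environment). It can go near the top of torch/utils/cpp_extension.py, or in your own launch script before the extensions are built:

import os

# Hypothetical workaround: prepend the conda environment's bin directory (which
# contains the ninja executable) to PATH so child processes can find it on every node.
NINJA_DIR = "/home/xxx/anaconda3/envs/NLP/bin"  # example path from the comment above
os.environ["PATH"] = NINJA_DIR + os.pathsep + os.environ.get("PATH", "")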

@joanrod

joanrod commented May 17, 2022

@chinoll I have a similar problem. Where exactly do you add the .../bin/ninja path in the torch/utils/cpp_extension.py file?

@chinoll

chinoll commented May 18, 2022

@chinoll I have a similar problem. Where exactly do you add the .../bin/ninja path in the torch/utils/cpp_extension.py file?

[image: screenshot showing where the ninja path is added in torch/utils/cpp_extension.py]

@joanrod

joanrod commented May 18, 2022

That worked, thanks a lot @chinoll

@tnq177

tnq177 commented Dec 20, 2022

Just my conjecture for my scenario: it seems like DeepSpeed was using some cached torch extensions that point to files in an old conda environment I no longer have access to. I deleted the cache (rm -rf /home/ubuntu/.cache/torch_extensions/py310_cu116/), forcing DS to rebuild the extensions, and it works again.
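
The same cleanup, sketched in Python for anyone scripting it (the py310_cu116 subdirectory name is just this example; it depends on your Python and CUDA versions):

import shutil
from pathlib import Path

# Hypothetical cleanup: delete the cached torch extensions so they are rebuilt
# against the current environment on the next run.
cache_dir = Path.home() / ".cache" / "torch_extensions" / "py310_cu116"
shutil.rmtree(cache_dir, ignore_errors=True)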

@manestay

manestay commented Aug 1, 2023

deepspeed doesn't seem to load the anaconda environment variables correctly in the case of multiple nodes. For example, my ninja path is /home/xxx/anaconda3/envs/NLP/bin/ninja, but deepspeed does not add this path to the PATH environment variable.

This hypothesis makes sense to me. In my case, I'm using a conda environment, and directly calling the deepspeed binary from that conda env. I guess that way the path isn't set properly. My fix is to use these lines:

ENV_PATH=/path/to/env
export PATH="${ENV_PATH}/:$PATH"
${ENV_PATH}deepspeed your_script_here

This should be less intrusive imo than modifying torch/utils/cpp_extension.py.

@Dixeran

Dixeran commented Sep 6, 2023

I can run ninja --version, but still get this error..

Note that some versions of ninja have a bug where the version is printed correctly but the command returns exit code 245, which causes an exception when detecting ninja. Check with:

ninja --version
echo $?

In this case, try installing another version.
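
To see why the exit code matters, here is a sketch using the same kind of subprocess call PyTorch relies on (an approximation, not PyTorch's exact code): a non-zero status raises even when the version was produced, so ninja gets reported as unavailable.

import subprocess

try:
    out = subprocess.check_output(["ninja", "--version"])
    print("ninja OK:", out.decode().strip())
except subprocess.CalledProcessError as e:
    # A buggy build can print its version yet exit with a non-zero status
    # (e.g. 245), which still makes the availability check fail.
    print("ninja exited with code", e.returncode)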

@Songqiw

Songqiw commented Jan 16, 2024

@chinoll I have a similar problem. Where exactly do you add the .../bin/ninja path in the torch/utils/cpp_extension.py file?

[image: screenshot showing where the ninja path is added in torch/utils/cpp_extension.py]

it works, thanks a lot

@xingyouxin

I can run ninja --version, but still get this error..

I finally fixed my error by running pip install ninja outside of my virtual environment!

@ywb2018

ywb2018 commented May 31, 2024

@chinoll I have a similar problem. Where exactly do you add the .../bin/ninja path in the torch/utils/cpp_extension.py file?

[image: screenshot showing where the ninja path is added in torch/utils/cpp_extension.py]

it works for me, thx
