Error building extension 'cpu_adam' #889

Closed
arthur-morgan-712 opened this issue Mar 24, 2021 · 30 comments

@arthur-morgan-712

Hey guys, I'm having a problem getting DeepSpeed working with XLM-Roberta. I'm trying to run it on an Amazon Linux machine, which is based on Red Hat. Here are the versions of some of the packages/dependencies I'm using:

cuda version: 10.2
transformers: 4.4.2
pytorch: 1.7.1
deepspeed: 0.3.13
gcc/c++/g++: (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)

I must admit I had some issues upgrading CUDA from the instance's default 10.0 to 10.2 and GCC from 4.8.5 to 7.2.1. But since I no longer get the errors about the torch and installed CUDA versions differing, or about GCC being older than version 5, I assume I'm in the clear.

Here's the essential part of the code I'm running (from a notebook):

import os
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '9994' # modify if RuntimeError: Address already in use
os.environ['RANK'] = "0"
os.environ['LOCAL_RANK'] = "0"
os.environ['WORLD_SIZE'] = "1"

from transformers import Trainer, TrainingArguments, XLMRobertaForSequenceClassification, XLMRobertaTokenizer

model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base')

training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    save_steps=500,
    save_total_limit=2,
    deepspeed="my_ds_config.json"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

Here's the content of my config file:

{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },

    "zero_optimization": {
        "stage": 2,
       "allgather_partitions": true,
       "allgather_bucket_size": 2e8,
       "reduce_scatter": true,
       "reduce_bucket_size": 2e8,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "cpu_offload": true
    },

    "optimizer": {
        "type": "Adam",
        "params": {
            "adam_w_mode": true,
            "lr": 3e-5,
            "betas": [ 0.9, 0.999 ],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    },

    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,
            "warmup_num_steps": 500
        }
    }
}
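A side note on this config, offered as a sketch rather than a fix for the build error: a DeepSpeed 0.7.7 log further down this thread warns that `cpu_offload` is deprecated in favor of `offload_optimizer`. Assuming the newer form is `"offload_optimizer": {"device": "cpu"}` inside `zero_optimization`, a small helper could rewrite an older config like the one above. The name `migrate_cpu_offload` is hypothetical and not part of DeepSpeed.

```python
import json


def migrate_cpu_offload(config):
    """Return a copy of a DeepSpeed config dict with the deprecated
    zero_optimization.cpu_offload flag rewritten as offload_optimizer.

    Assumption: the replacement form is {"device": "cpu"}.
    """
    # Round-trip through JSON for a cheap deep copy of a JSON-style dict.
    migrated = json.loads(json.dumps(config))
    zero = migrated.get("zero_optimization", {})
    # Drop the old flag; only add the new block if offload was enabled.
    if zero.pop("cpu_offload", False):
        zero.setdefault("offload_optimizer", {"device": "cpu"})
    return migrated


old = {"zero_optimization": {"stage": 2, "cpu_offload": True}}
print(json.dumps(migrate_cpu_offload(old), indent=4))
```

The input dict is left untouched, so the original file can still be used with an older DeepSpeed release.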

Here's the output of ds_report:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
/bin/sh: line 0: type: llvm-config: not found
/bin/sh: line 0: type: llvm-config-9: not found
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 10.2
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.3.13+22d5a1f, 22d5a1f, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.2

And finally, here's the stack trace:

[2021-03-24 15:29:36,478] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.13, git-hash=unknown, git-branch=unknown
[2021-03-24 15:29:36,494] [INFO] [engine.py:77:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module cpu_adam, skipping build step...
Loading extension module cpu_adam...
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-131-cc14ac05ecbb> in <module>
     30 )
     31 
---> 32 trainer.train()

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
    901         delay_optimizer_creation = self.sharded_ddp is not None and self.sharded_ddp != ShardedDDPOption.SIMPLE
    902         if self.args.deepspeed:
--> 903             model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
    904             self.model = model.module
    905             self.model_wrapped = model  # will get further wrapped in DDP

~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/integrations.py in init_deepspeed(trainer, num_training_steps)
    416         model=model,
    417         model_parameters=model_parameters,
--> 418         config_params=config,
    419     )
    420 

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/__init__.py in initialize(args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config_params)
    123                                  dist_init_required=dist_init_required,
    124                                  collate_fn=collate_fn,
--> 125                                  config_params=config_params)
    126     else:
    127         assert mpu is None, "mpu must be None with pipeline parallelism"

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in __init__(self, args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config_params, dont_change_device)
    181         self.lr_scheduler = None
    182         if model_parameters or optimizer:
--> 183             self._configure_optimizer(optimizer, model_parameters)
    184             self._configure_lr_scheduler(lr_scheduler)
    185             self._report_progress(0)

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_optimizer(self, client_optimizer, model_parameters)
    596                 logger.info('Using client Optimizer as basic optimizer')
    597         else:
--> 598             basic_optimizer = self._configure_basic_optimizer(model_parameters)
    599             if self.global_rank == 0:
    600                 logger.info(

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_basic_optimizer(self, model_parameters)
    665                     optimizer = DeepSpeedCPUAdam(model_parameters,
    666                                                  **optimizer_parameters,
--> 667                                                  adamw_mode=effective_adam_w_mode)
    668                 else:
    669                     from deepspeed.ops.adam import FusedAdam

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/adam/cpu_adam.py in __init__(self, model_params, lr, bias_correction, betas, eps, weight_decay, amsgrad, adamw_mode)
     76         DeepSpeedCPUAdam.optimizer_id = DeepSpeedCPUAdam.optimizer_id + 1
     77         self.adam_w_mode = adamw_mode
---> 78         self.ds_opt_adam = CPUAdamBuilder().load()
     79 
     80         self.ds_opt_adam.create_adam(self.opt_id,

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in load(self, verbose)
    213             return importlib.import_module(self.absolute_name())
    214         else:
--> 215             return self.jit_load(verbose)
    216 
    217     def jit_load(self, verbose=True):

~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in jit_load(self, verbose)
    250             extra_cuda_cflags=self.nvcc_args(),
    251             extra_ldflags=self.extra_ldflags(),
--> 252             verbose=verbose)
    253         build_duration = time.time() - start_build
    254         if verbose:

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1089     if isinstance(cuda_sources, str):
   1090         cuda_sources = [cuda_sources]
-> 1091 
   1092     cpp_sources.insert(0, '#include <torch/extension.h>')
   1093 

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1315 
   1316 
-> 1317 def verify_ninja_availability():
   1318     r'''
   1319     Raises ``RuntimeError`` if `ninja <https://ninja-build.org/>`_ build system is not

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
   1697                       sources,
   1698                       objects,
-> 1699                       ldflags,
   1700                       library_target,
   1701                       with_cuda) -> None:

~/anaconda3/envs/python3/lib/python3.6/imp.py in find_module(name, path)
    295         break  # Break out of outer loop when breaking out of inner loop.
    296     else:
--> 297         raise ImportError(_ERR_MSG.format(name), name=name)
    298 
    299     encoding = None

ImportError: No module named 'cpu_adam'

Thanks in advance for your help!

@RezaYazdaniAminabadi
Contributor

Hi @arthur-morgan-712

I am not sure what exactly is happening here: the trace log says it is trying to load the extension module cpu_adam, yet ds_report says it is not installed! Maybe the system is somehow caching this module.
Can you remove the /tmp/torch_extensions folder (rm -rf /tmp/torch_extensions/*) and retry?

Thanks.
Reza
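For anyone scripting this cleanup from a notebook, here is a minimal sketch. It assumes the stale build lives in a per-op subfolder of the extensions root; the helper name `clear_cached_op` is hypothetical. Note that the log above reports /home/ec2-user/.cache/torch_extensions as the extensions root, so both locations are worth checking.

```python
import os
import shutil


def clear_cached_op(op_name, roots):
    """Delete the cached build folder for one op under each candidate root.

    Returns the list of directories that were actually removed.
    """
    removed = []
    for root in roots:
        target = os.path.join(os.path.expanduser(root), op_name)
        if os.path.isdir(target):
            shutil.rmtree(target)
            removed.append(target)
    return removed


# Example call (commented out so nothing is deleted by accident):
# clear_cached_op("cpu_adam", ["/tmp/torch_extensions", "~/.cache/torch_extensions"])
```

After removing the folder, the next run should rebuild the op from scratch and print the full compiler output.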

@arthur-morgan-712
Author

Hi Reza,

Thanks for your reply! I tried removing it, it did have cpu_adam as a subfolder, however the issue still persists.

Funnily enough, I tried running this on Colab and it seemed to have loaded it:

Loading extension module cpu_adam...
Time to load cpu_adam op: 33.38163924217224 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000030, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1

And Colab's ds_report also states that it hasn't been installed:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.7.1+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.13+22d5a1f, 22d5a1f, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0

However, I did get another error on Colab (tcmalloc: large alloc 1200709632 bytes == .... Killing subprocess 346), which I assume is because the 11 GB video card was still too small for this task. So maybe cpu_adam not showing up as installed isn't out of the ordinary?

@arthur-morgan-712
Author

Quick update, I ran the command from the notebook (running run_seq2seq.py) on the Amazon Linux instance and it seems to have provided a more detailed stack trace, I'm pasting it here:

Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda-10.2/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -c /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
FAILED: cpu_adam.o 
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda-10.2/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -c /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:4:10: fatal error: omp.h: No such file or directory
 #include <omp.h>
          ^~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1539, in _run_ninja_build
    env=env)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "examples/seq2seq/run_seq2seq.py", line 650, in <module>
    main()
  File "examples/seq2seq/run_seq2seq.py", line 590, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/ec2-user/SageMaker/checkstep-research/t5/transformers/src/transformers/trainer.py", line 892, in train
    model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
  File "/home/ec2-user/SageMaker/checkstep-research/t5/transformers/src/transformers/integrations.py", line 417, in init_deepspeed
    config_params=config,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/__init__.py", line 125, in initialize
    config_params=config_params)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 183, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 598, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 667, in _configure_basic_optimizer
    adamw_mode=effective_adam_w_mode)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/adam/cpu_adam.py", line 78, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 215, in load
    return self.jit_load(verbose)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 252, in jit_load
    verbose=verbose)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1202, in _jit_compile
    with_cuda=with_cuda)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1300, in _write_ninja_file_and_build_library
    error_prefix="Error building extension '{}'".format(name))
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Killing subprocess 6428
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
    main()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 161, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ec2-user/anaconda3/envs/python3/bin/python3.6', '-u', 'examples/seq2seq/run_seq2seq.py', '--local_rank=0', '--model_name_or_path', 'google/mt5-small', '--output_dir', 'output_dir', '--adam_eps', '1e-06', '--evaluation_strategy=steps', '--do_train', '--label_smoothing', '0.1', '--learning_rate', '3e-5', '--logging_first_step', '--logging_steps', '1000', '--max_source_length', '128', '--max_target_length', '128', '--num_train_epochs', '1', '--overwrite_output_dir', '--per_device_train_batch_size', '16', '--predict_with_generate', '--sortish_sampler', '--val_max_target_length', '128', '--warmup_steps', '500', '--max_train_samples', '2000', '--max_val_samples', '500', '--task', 'translation_en_to_ro', '--dataset_name', 'wmt16', '--dataset_config', 'ro-en', '--source_prefix', 'translate English to Romanian: ', '--deepspeed', 'ds_config.json', '--fp16']' returned non-zero exit status 1.
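The line that matters is easy to miss in output this long. As a rough aid, the sketch below scans a captured build log for the GCC-style "fatal error: ...: No such file or directory" message seen above. The helper name `find_missing_headers` is hypothetical, and the pattern only covers this one message shape.

```python
import re

# Matches the GCC-style message for a header that could not be found.
MISSING_HEADER = re.compile(r"fatal error: (\S+): No such file or directory")


def find_missing_headers(build_log):
    """Return the sorted, de-duplicated header names reported as missing."""
    return sorted(set(MISSING_HEADER.findall(build_log)))


sample = (
    "cpu_adam.cpp:4:10: fatal error: omp.h: No such file or directory\n"
    " #include <omp.h>\n"
    "compilation terminated.\n"
)
print(find_missing_headers(sample))
```

Applied to the trace above, this points at omp.h, the OpenMP header that normally ships with the compiler's development files.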

@arthur-morgan-712
Author

I was able to get the example from the notebook going after I downgraded the DeepSpeed version to 0.3.10. I do have a follow-up question though: Correct me if I'm wrong, but the only way to use DeepSpeed would be to use the HuggingFace Trainer class? At least that's what I can find on HuggingFace (also trying to implement a custom script without using Trainer resulted in an unrecognized arguments: --local_rank=0 error, even though I wasn't passing that argument). If that's the case, does this mean I cannot use DeepSpeed to train a classification model passing different class weights to each class, because there's no such parameter to pass to Trainer?

@RezaYazdaniAminabadi
Contributor

RezaYazdaniAminabadi commented Mar 25, 2021

We already have examples for running for some transformer networks. For this argument, I think you might just add local_rank to your parser arguments the same as here.

@saichandrapandraju

I was able to get the example from the notebook going after I downgraded the DeepSpeed version to 0.3.10.

So is this the issue with 0.3.13 of DeepSpeed? Because I'm facing the same issue as well with 0.3.13.

Also, are you able to run HuggingFace-4.4.2 with DeepSpeed-0.3.10? I think you should've downgraded to HuggingFace-4.3.x.

@saichandrapandraju

Hi @arthur-morgan-712 ,

Could you try building deepspeed ops while installing as suggested here?

@stas00
Collaborator

stas00 commented Mar 26, 2021

We already have examples for running for some transformer networks. For this argument, I think you might just add local_rank to your parser arguments the same as here.

This is no longer needed in deepspeed since #825 and transformers master has been adjusted accordingly. You just need to have env LOCAL_RANK to be set.
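For a custom script that has to work with both launcher behaviors, one possible sketch is to accept an optional `--local_rank` argument and fall back to the `LOCAL_RANK` environment variable. The function name `parse_local_rank` and the `-1` default for "not distributed" are illustrative assumptions, not DeepSpeed conventions taken from this thread.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Read the local rank from --local_rank if given, else from LOCAL_RANK.

    Returns -1 (an assumed sentinel) when neither is set.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", -1)),
    )
    # parse_known_args ignores the script's other arguments instead of
    # failing with "unrecognized arguments".
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


print(parse_local_rank(["--local_rank=0", "--do_train"]))
```

Because the environment variable is read inside the function, it reflects whatever the launcher exported at the time the script parses its arguments.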

I do have a follow-up question though: Correct me if I'm wrong, but the only way to use DeepSpeed would be to use the HuggingFace Trainer class?

Not at all. You can do your own integration and not rely on the HF Trainer.

If you do use the transformers Trainer for the time being, while this is all new, you must use the transformers master branch, as deepspeed-related updates are made frequently.

If you have build problems please make sure you read:
https://huggingface.co/transformers/main_classes/trainer.html#installation-notes
though looking at OP I think you have all the right components. Just check that PATH/LD_LIBRARY_PATH are good.

Perhaps try to pre-build deepspeed: #885 (comment)
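As a sketch of what pre-building could look like when driven from Python: the block below only assembles the pip command and environment, and leaves running them to the caller. It assumes the `DS_BUILD_CPU_ADAM=1` install-time switch described in the DeepSpeed installation docs applies to your version; the helper name `prebuild_cpu_adam_command` is hypothetical.

```python
import os
import sys


def prebuild_cpu_adam_command():
    """Build the command and environment for installing DeepSpeed with the
    cpu_adam op compiled at install time instead of JIT-compiled at runtime.

    Assumption: DS_BUILD_CPU_ADAM=1 is honored by the installed release.
    """
    env = dict(os.environ)
    env["DS_BUILD_CPU_ADAM"] = "1"
    cmd = [sys.executable, "-m", "pip", "install", "deepspeed", "--no-cache-dir"]
    return cmd, env


cmd, env = prebuild_cpu_adam_command()
print(" ".join(cmd))
# To actually run it:
# import subprocess; subprocess.run(cmd, env=env, check=True)
```

A missing compiler component would still break the build, but the failure would surface at install time with the full compiler output rather than in the middle of a training run.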

@RezaYazdaniAminabadi
Contributor

Thanks @stas00 for clarifying this : )

@stas00
Collaborator

stas00 commented Mar 26, 2021

And the error is right there in your report: #889 (comment)

c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda-10.2/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -c /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:4:10: fatal error: omp.h: No such file or directory
 #include <omp.h>

You're missing the right build tools. omp.h is missing. It should be in your gcc dev package. e.g. on my machine it's under:

/usr/lib/gcc/x86_64-linux-gnu/6/include/omp.h
/usr/lib/gcc/x86_64-linux-gnu/7/include/omp.h
/usr/lib/gcc/x86_64-linux-gnu/9/include/omp.h

@RezaYazdaniAminabadi, perhaps ds_report could somehow check that all the build components are there? e.g. it could attempt to build some dummy very basic extension when run? not sure on the details. Clearly here gcc-dev tools are either missing or misconfigured.
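A very rough sketch of such a probe, assuming a GCC-compatible `c++` on PATH that accepts `-fopenmp` and `-fsyntax-only`: it asks the compiler to parse a tiny file that includes omp.h and reports whether that succeeded. The function name `can_compile_openmp` is hypothetical and nothing like it is claimed to exist in ds_report.

```python
import os
import shutil
import subprocess
import tempfile

PROBE_SOURCE = "#include <omp.h>\nint main() { return omp_get_max_threads() > 0 ? 0 : 1; }\n"


def can_compile_openmp(compiler="c++"):
    """Return True if `compiler` can parse a file that includes omp.h."""
    if shutil.which(compiler) is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        with open(src, "w") as handle:
            handle.write(PROBE_SOURCE)
        try:
            # -fsyntax-only stops after parsing, so nothing is linked or written.
            result = subprocess.run(
                [compiler, "-fopenmp", "-fsyntax-only", src],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            return False
        return result.returncode == 0


print("OpenMP headers usable:", can_compile_openmp())
```

On the machine from the original report this would be expected to print False, since the same include fails in the real build.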

@Alx-AI

Alx-AI commented Jul 15, 2021

Still running into:
ImportError: No module named 'cpu_adam'

CUDA 10.2
deepspeed 0.4.3
pytorch 1.8.1

Tried down grading deepspeed to 0.3.10 and ran into:
"deepspeed>=0.4.3 is required for a normal functioning of this module, but found deepspeed==0.3.10."

Are there any other potential solutions to date, or is this still open?

@ziweiji

ziweiji commented Nov 17, 2021

I ran into this too.
Before the AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam', the error output showed that it could not create a directory in /tmp/torch_extensions/build for cpu_adam.
So I changed DEFAULT_TORCH_EXTENSION_PATH in the file /anaconda3/envs/XXXXX/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py,
and then it worked.

@stas00
Collaborator

stas00 commented Nov 17, 2021

This sounds like a permission issue. Try to set TMPDIR to another dir that is writable by you?

e.g.:

mkdir ~/tmp
export TMPDIR=~/tmp
... do the build here ...

@rowanworth

Try to set TMPDIR to another dir that is writable by you?

That won't work because deepspeed hardcodes the default extension path to be /tmp/torch_extensions

The default is not used if TORCH_EXTENSIONS_DIR is set in the environment, but it would certainly be an improvement for it to follow TMPDIR (especially as TORCH is not in the whitelist of environment variable prefixes which the deepspeed distributed launcher automatically plumbs through to worker processes)
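A minimal sketch of that workaround from inside the training script: set `TORCH_EXTENSIONS_DIR` to a directory you can write to before deepspeed is imported. The helper name `use_writable_extensions_dir` and the default location under the home directory are illustrative choices. Since the launcher may not forward this variable to workers, setting it inside the script itself sidesteps that.

```python
import os


def use_writable_extensions_dir(path=None):
    """Point TORCH_EXTENSIONS_DIR at a writable directory and return it."""
    if path is None:
        # Assumed default; any user-writable location would do.
        path = os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions")
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError("extensions directory is not writable: " + path)
    os.environ["TORCH_EXTENSIONS_DIR"] = path
    return path


# Call this at the top of the script, before `import deepspeed`:
# use_writable_extensions_dir()
```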

@arain60gb

@stas00 @RezaYazdaniAminabadi
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:

  • Avoid using tokenizers before the fork if possible
  • Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    [INFO|trainer.py:414] 2023-01-09 19:06:58,180 >> Using amp fp16 backend
    [2023-01-09 19:06:58,187] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown
    [2023-01-09 19:06:58,191] [WARNING] [config_utils.py:67:process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
    [2023-01-09 19:07:05,242] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
    Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
    Creating extension directory /root/.cache/torch_extensions/py38_cu117/cpu_adam...
    Detected CUDA files, patching ldflags
    Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/cpu_adam/build.ninja...
    Building extension module cpu_adam...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    [1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
    FAILED: cpu_adam.o
    c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
    In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0,
    from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16,
    from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:11,
    from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:1:
    /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
    #include <cuda_profiler_api.h>
    ^~~~~~~~~
    compilation terminated.
    [2/3] /root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -D_CUDA_NO_HALF_CONVERSIONS -D_CUDA_NO_BFLOAT16_CONVERSIONS -D_CUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U_CUDA_NO_HALF_OPERATORS -U_CUDA_NO_HALF_CONVERSIONS_ -U_CUDA_NO_HALF2_OPERATORS_ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
    FAILED: custom_cuda_kernel.cuda.o
    /root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -D_CUDA_NO_HALF_CONVERSIONS -D_CUDA_NO_BFLOAT16_CONVERSIONS_ -D_CUDA_NO_HALF2_OPERATORS_ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U_CUDA_NO_HALF_OPERATORS_ -U_CUDA_NO_HALF_CONVERSIONS_ -U_CUDA_NO_HALF2_OPERATORS_ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
    In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0,
    from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16,
    from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu:1:
    /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
    #include <cuda_profiler_api.h>
    ^~~~~~~~~
    compilation terminated.
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
    File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
    File "/root/anaconda3/envs/bitten/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "run_clm.py", line 478, in <module>
main()
File "run_clm.py", line 441, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/trainer.py", line 1112, in train
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/deepspeed.py", line 355, in deepspeed_init
model, optimizer, _, lr_scheduler = deepspeed.initialize(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/__init__.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 330, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1195, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1266, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
self.ds_opt_adam = CPUAdamBuilder().load()
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 460, in load
return self.jit_load(verbose)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 495, in jit_load
op_module = load(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7efc99bbd1f0>
Traceback (most recent call last):
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-01-09 19:07:12,824] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 7728
[2023-01-09 19:07:12,826] [ERROR] [launch.py:324:sigkill_handler]
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

@stas00
Collaborator

stas00 commented Jan 9, 2023

@arthur-morgan-712, you have a problem with your cuda environment:

/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10:
 fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>

Install the CUDA toolkit properly, including all the dev header files, then run again and it should work.

On Ubuntu I usually recommend NVIDIA's pre-packaged CUDA .deb files, but be careful: the latest CUDA is already at 12.x, so make sure you install the same major version as your PyTorch build uses, most likely cuda-11.x. I think 11.8 is the latest in that line.

E.g. on my system the missing file is here:

/usr/local/cuda-11.8/targets/x86_64-linux/include/cuda_profiler_api.h
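
A quick way to check whether the header exists anywhere on the machine is to search the usual CUDA locations. The following is only a minimal sketch and not part of DeepSpeed: the helper name `find_cuda_header` and the list of candidate roots are assumptions, so adjust them to your install.

```python
import glob
import os


def find_cuda_header(header, roots):
    """Return every path under the given CUDA roots that contains `header`."""
    hits = []
    for root in roots:
        # Headers live either directly in <root>/include or, for the
        # NVIDIA packages, in <root>/targets/<arch>/include.
        patterns = [
            os.path.join(root, "include", header),
            os.path.join(root, "targets", "*", "include", header),
        ]
        for pattern in patterns:
            hits.extend(glob.glob(pattern))
    return sorted(set(hits))


if __name__ == "__main__":
    # Candidate roots are guesses: CUDA_HOME, the active conda env, /usr/local/cuda*.
    candidates = [os.environ.get("CUDA_HOME"), os.environ.get("CONDA_PREFIX")]
    candidates += glob.glob("/usr/local/cuda*")
    roots = [r for r in candidates if r]
    found = find_cuda_header("cuda_profiler_api.h", roots)
    print("\n".join(found) if found else "cuda_profiler_api.h not found")
```

If nothing is printed for the environment that nvcc comes from, the toolkit there is most likely missing its dev headers.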

@Chiang97912

In my case, this bug was caused by a ninja compile error. You can change directory to ~/.cache/torch_extensions/cpu_adam, then run ninja -v to see the error details.
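
The exact build directory varies: the logs in this thread show a versioned subdirectory such as py38_cu117 under the cache root. Below is a minimal sketch for locating the directory and re-running ninja there. The helper names are hypothetical, and it assumes ninja is on PATH and that the cache lives in TORCH_EXTENSIONS_DIR or ~/.cache/torch_extensions.

```python
import glob
import os
import shutil
import subprocess


def find_extension_build_dirs(name, cache_root=None):
    """Return directories in the torch extensions cache that hold build.ninja for `name`."""
    if cache_root is None:
        cache_root = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            os.path.expanduser("~/.cache/torch_extensions"),
        )
    # Match both <root>/<name>/build.ninja and <root>/<subdir>/<name>/build.ninja.
    patterns = [
        os.path.join(cache_root, name, "build.ninja"),
        os.path.join(cache_root, "*", name, "build.ninja"),
    ]
    dirs = {os.path.dirname(p) for pattern in patterns for p in glob.glob(pattern)}
    return sorted(dirs)


def rerun_ninja(build_dir):
    """Run `ninja -v` in build_dir and return (exit code, combined output)."""
    proc = subprocess.run(
        ["ninja", "-v"],
        cwd=build_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    if shutil.which("ninja") is None:
        print("ninja not found on PATH")
    else:
        for build_dir in find_extension_build_dirs("cpu_adam"):
            code, output = rerun_ninja(build_dir)
            print(f"{build_dir}: ninja exited with {code}")
            print(output)
```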

@loadams
Contributor

loadams commented Aug 18, 2023

Closing this issue as the original issue was resolved. If anyone is having issues with this, please open a new issue and link this one and we would be happy to take a look.

@zydmtaichi

Hi @chiang,
I followed your advice and ran ninja -v under /root/.cache/torch_extensions/py310_cu118/cpu_adam, and got the error below. Could you please check and tell me how to fix this problem?

[1/1] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so 
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

@stas00
Collaborator

stas00 commented Jul 31, 2024

/usr/bin/ld: cannot find -lcurand

It can't find your CUDA install. If you have it installed, search for where libcurand.so is on your filesystem.

E.g. on my machine it's /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcurand.so.10.3.2.106, so in that case adding export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.1/targets/x86_64-linux/lib should help find that shared object. You need to edit this path to point to where your CUDA is installed, if it is installed at all.

But most likely you don't have CUDA installed. You can install it via apt https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation or even inside a conda environment via https://anaconda.org/conda-forge/cudatoolkit if you don't have sudo access (but I see it's only 11.8 there; I don't know whether you need cuda-12.x or not).

I haven't tried it, but this is probably the right package if you want to install it into your conda environment, and it has all the latest versions as well: https://anaconda.org/nvidia/cuda-libraries. Use the same version as your PyTorch. To check which CUDA version your PyTorch uses, run:

python -c 'import torch; print(f"pt={torch.__version__}, cuda={torch.version.cuda}")'
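
To compare that with the toolkit that nvcc belongs to, something like the following may help. It is only a sketch: `parse_nvcc_release` and `same_major` are hypothetical helpers, and the parsing assumes the usual "release X.Y" line printed by nvcc --version.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract 'X.Y' from the 'release X.Y' line of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_major(version_a, version_b):
    """True when both versions are known and share the same major number."""
    if not version_a or not version_b:
        return False
    return version_a.split(".")[0] == version_b.split(".")[0]


if __name__ == "__main__":
    toolkit = None
    nvcc = shutil.which("nvcc")
    if nvcc:
        out = subprocess.run(
            [nvcc, "--version"], stdout=subprocess.PIPE, text=True
        ).stdout
        toolkit = parse_nvcc_release(out)
    try:
        import torch

        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    print(f"nvcc={toolkit}, torch cuda={torch_cuda}, "
          f"same major={same_major(toolkit, torch_cuda)}")
```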

@zydmtaichi

Hi @stas00,
it's not working. I changed the LD_LIBRARY_PATH environment variable, but the error is the same as before.

@zydmtaichi

@stas00 please see the details below:

[1/1] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so 
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

@zydmtaichi

I exported LD_LIBRARY_PATH first and then ran ninja -v in the folder in question, but the linker error did not change. Why?

@stas00
Collaborator

stas00 commented Jul 31, 2024

I see you have used /root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib for LD_LIBRARY_PATH

Did you check that you have libcurand.so under /root/miniconda3/envs/llmtest/lib/python3.10/site-packages/torch/lib?

And which conda package did you install?

@zydmtaichi

I linked libcurand.so into that folder and still saw no change. I did find that the build works if I add the link directory -L/usr/local/cuda/lib64 to ldflags in the build.ninja file.

It seems DeepSpeed generates a wrong ninja config, which makes the build fail. I think this is a bug that needs a fix.

@zydmtaichi

@stas00 for details, please refer to issue 5813 that I mentioned above.

@stas00
Collaborator

stas00 commented Jul 31, 2024

I link the libcurand.so to the folder and still no change. and i find it works good if i add a link dir -L/usr/local/cuda/lib64 in ldflags of build.ninja file.

In which case you need to set:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

To automate this see: https://askubuntu.com/questions/210884/setting-ld-library-path-for-cuda

The other solution that often helps is to set CUDA_HOME

export CUDA_HOME=/usr/local/cuda

@zydmtaichi

Nope. LD_LIBRARY_PATH takes effect at run time, so it is only used when the program is loaded. Right now cpu_adam fails at build time, in the link step, so LD_LIBRARY_PATH doesn't help. If I manually modify build.ninja and run ninja -v under ~/.cache/torch_extensions/py310_cu118/cpu_adam, the build succeeds. However, if I run the whole training program from the start, DeepSpeed overwrites build.ninja with the wrong settings again, and it still crashes at build time.
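
One thing that may be worth trying, untested against this setup: GCC consults the LIBRARY_PATH environment variable when resolving -l options at link time, whereas LD_LIBRARY_PATH is read by the dynamic loader at run time. Since build.ninja gets regenerated, setting the variable in the environment before the JIT build starts avoids editing the file. A minimal sketch, assuming c++ is GCC and that libcurand.so lives in /usr/local/cuda/lib64; `prepend_path` and `add_link_dir` are hypothetical helpers:

```python
import os


def prepend_path(current, directory):
    """Prepend `directory` to a path list, dropping duplicates and empty entries."""
    parts = [p for p in (current or "").split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


def add_link_dir(directory, env=os.environ):
    """Expose `directory` to GCC's link step through LIBRARY_PATH."""
    env["LIBRARY_PATH"] = prepend_path(env.get("LIBRARY_PATH"), directory)
    return env["LIBRARY_PATH"]


if __name__ == "__main__":
    # Assumed location; point this at the directory that contains libcurand.so.
    cuda_lib_dir = "/usr/local/cuda/lib64"
    print("LIBRARY_PATH =", add_link_dir(cuda_lib_dir))
    # Start the training code after this point so the JIT build inherits the variable.
```

Empty entries are filtered out on purpose, because an empty component in a path list is commonly treated as the current directory.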

@stas00
Collaborator

stas00 commented Jul 31, 2024

What I shared works for me. That's what I use on my desktop to build deepspeed.

@zydmtaichi

What I shared works for me. That's what I use on my desktop to build deepspeed.

But it doesn't work for me. To be more specific, I ran into this problem when using LLaMA-Factory integrated with DeepSpeed to train a model. You can follow the link below for more info (skip the Chinese and focus on the output and logs if you have difficulty reading Chinese content).

hiyouga/LLaMA-Factory#5020
