This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

multi-GPU training throws an illegal memory access #32

Closed
zdwong opened this issue Jan 25, 2018 · 64 comments

Comments

@zdwong

zdwong commented Jan 25, 2018

When I use one GPU to train, there is no problem. But when I use two or four GPUs, the problem comes up. The log output:

terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
*** Aborted at 1516866180 (unix time) try "date -d @1516866180" if you are using GNU date ***
terminate called recursively
terminate called recursively
terminate called recursively
PC: @ 0x7ff67559f428 gsignal
terminate called recursively
terminate called recursively
E0125 07:43:00.745853 55683 pybind_state.h:422] Exception encountered running PythonOp function: RuntimeError: [enforce fail at context_gpu.h:307] error == cudaSuccess. 77 vs 0. Error at: /mnt/hzhida/project/caffe2/caffe2/core/context_gpu.h:307: an illegal memory access was encountered

At:
/mnt/hzhida/facebook/detectron/lib/ops/generate_proposals.py(101): forward
*** SIGABRT (@0x3e80000d84f) received by PID 55375 (TID 0x7ff453fff700) from PID 55375; stack trace: ***
terminate called recursively
@ 0x7ff675945390 (unknown)
@ 0x7ff67559f428 gsignal
@ 0x7ff6755a102a abort
@ 0x7ff66f37e84d __gnu_cxx::__verbose_terminate_handler()
@ 0x7ff66f37c6b6 (unknown)
@ 0x7ff66f37c701 std::terminate()
@ 0x7ff66f3a7d38 (unknown)
@ 0x7ff67593b6ba start_thread
@ 0x7ff67567141d clone
@ 0x0 (unknown)
Aborted (core dumped)

@yousongzhu

I got the same error. The difference is that when I use one GPU or two GPUs, there is no problem. But when using 4 GPUs to train Mask R-CNN (mask_rcnn_R-101-FPN) or RetinaNet (retinanet_R-101-FPN), the same problem occurs.

@lwher

lwher commented Jan 25, 2018

I have the same problem when I train the tutorial_Res50 network with two or more GPUs.

@jwnsu

jwnsu commented Jan 25, 2018

Encountered the same issue when specifying GPU ids other than the lowest ones (e.g. '1,3,5,7' for 4 GPUs). If the lowest GPU ids are specified, training runs fine.

@rbgirshick
Contributor

@jwnsu: we're working on a fix so that when CUDA_VISIBLE_DEVICES does not use the lowest ids training still works. Thanks for reporting and diagnosing.

@rbgirshick
Contributor

Hi @jwnsu, @coolbrain, @tshizys, @lwher: we are unable to reproduce this issue on our side.

Can you each provide some more information that might reveal a common pattern?

In particular:

  • Operating system: ?
  • Compiler version: ?
  • CUDA version: ?
  • cuDNN version: ?
  • NVIDIA driver version: ?
  • GPU models (for all devices if they are not all the same): ?
  • Anything else that seems relevant: ?

Here's what we see when training, for example, with GPU ids 1,3,5,7:

CUDA_VISIBLE_DEVICES=1,3,5,7 python2 tools/train_net.py --cfg configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml OUTPUT_DIR /tmp/dbg-cvd-train TRAIN.DATASETS "('coco_2014_minival',)" NUM_GPUS 4

Every 0.1s: nvidia-smi                                                                                                                                                                                                                                                                                                                             Fri Jan 26 09:09:26 2018

Fri Jan 26 09:09:26 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           On   | 0000:07:00.0     Off |                  Off |
|  0%   42C    P8    17W / 250W |      0MiB / 12209MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M40           On   | 0000:08:00.0     Off |                  Off |
|  0%   51C    P0   144W / 250W |   7214MiB / 12209MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M40           On   | 0000:09:00.0     Off |                  Off |
|  0%   38C    P8    19W / 250W |      0MiB / 12209MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M40           On   | 0000:0A:00.0     Off |                  Off |
|  0%   52C    P0   220W / 250W |   7502MiB / 12209MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M40           On   | 0000:0B:00.0     Off |                  Off |
|  0%   40C    P8    17W / 250W |      0MiB / 12209MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M40           On   | 0000:0C:00.0     Off |                  Off |
|  0%   60C    P0    85W / 250W |   7081MiB / 12209MiB |     48%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M40           On   | 0000:0D:00.0     Off |                  Off |
|  0%   40C    P8    20W / 250W |      0MiB / 12209MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla M40           On   | 0000:0E:00.0     Off |                  Off |
|  0%   56C    P0    81W / 250W |   7494MiB / 12209MiB |     40%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1   2871837    C   ..............gcc-5-glibc-2.23/bin/python2.7  7210MiB |
|    3   2871837    C   ..............gcc-5-glibc-2.23/bin/python2.7  7498MiB |
|    5   2871837    C   ..............gcc-5-glibc-2.23/bin/python2.7  7077MiB |
|    7   2871837    C   ..............gcc-5-glibc-2.23/bin/python2.7  7490MiB |
+-----------------------------------------------------------------------------+

@zdwong
Author

zdwong commented Jan 27, 2018

Operating system: Ubuntu 16.04
Compiler version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0
CUDA version: 8.0
cuDNN version: v5.1
NVIDIA driver version: 384.111

nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00001543:00:00.0 Off |                  Off |
| N/A   42C    P0    41W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00003134:00:00.0 Off |                  Off |
| N/A   42C    P0    39W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 00004975:00:00.0 Off |                  Off |
| N/A   38C    P0    41W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000F3E6:00:00.0 Off |                  Off |
| N/A   38C    P0    40W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

@yousongzhu

Operating system: CentOS Linux release 7.1.1503
Compiler version: gcc version 4.8.2
CUDA version: CUDA 8.0
cuDNN version: cuDNN 6.0.21
NVIDIA driver version: 375.26
GPU models: 4x GeForce GTX TITAN X (12G)

nvidia-smi:
[image: nvidia-smi output]

When using 4 GPUs (0,1,2,3) to train Mask R-CNN (e2e_mask_rcnn_R-101-FPN), RetinaNet (retinanet_R-101-FPN), or Faster R-CNN (e2e_faster_rcnn_R-50-FPN), the error "context_gpu.h:307: an illegal memory access was encountered" or "context_gpu.h:170. Encountered CUDA error: an illegal memory access was encountered Error from operator: input: "gpu_0/retnet_cls_pred_fpn3_b_grad" input: "gpu_2/retnet_cls_pred_fpn3_b_grad" output: "gpu_0/retnet_cls_pred_fpn3_b_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }" occurs.

But using one GPU or two GPUs (0,1 or 2,3), training proceeds normally.
Thanks.

@rbgirshick
Contributor

@jwnsu: looking at your error more closely ("invalid device ordinal"), it looks like you're trying to train with a config set up for 8 GPUs while restricting the process to have access to only 4 (via CUDA_VISIBLE_DEVICES). The "invalid device ordinal" error arises because it's trying to create ops on devices that the process does not have access to.

@rbgirshick
Contributor

rbgirshick commented Jan 27, 2018

@coolbrain, @tshizys: thanks for the details. What happens if you use two GPUs using ids {0,2}, {0,3}, {1,2}, or {1,3}?

@jwnsu

jwnsu commented Jan 27, 2018

@rbgirshick you are right, I picked the wrong config file (with the 8-GPU setting) to try yesterday. Just tried again with the right config file (4 GPUs; the error occurs with GPU ids "1,2,4,5", while "0,1,2,3" works fine), and the error is now similar to what others are seeing:

I0127 09:06:48.220716 10872 context_gpu.cu:325] Total: 20748 MB
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what():  [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator: 
input: "gpu_0/retnet_bbox_pred_fpn3_b_grad" input: "gpu_2/retnet_bbox_pred_fpn3_b_grad" output: "gpu_0/retnet_bbox_pred_fpn3_b_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
  what():  [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator: 
input: "gpu_2/retnet_cls_conv_n3_fpn3" input: "gpu_2/__m13_shared" output: "gpu_2/__m13_shared" name: "" type: "ReluGradient" arg { name: "cudnn_exhaustive_search" i: 0 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 cuda_gpu_id: 2 } engine: "CUDNN" is_gradient_op: true
*** Aborted at 1517072808 (unix time) try "date -d @1517072808" if you are using GNU date ***
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
PC: @     0x7fd71f6bd428 gsignal
*** SIGABRT (@0x3e900002a18) received by PID 10776 (TID 0x7fd548e3d700) from PID 10776; stack trace: ***
    @     0x7fd71fa63390 (unknown)
    @     0x7fd71f6bd428 gsignal
    @     0x7fd71f6bf02a abort
    @     0x7fd71b51c84d __gnu_cxx::__verbose_terminate_handler()
    @     0x7fd71b51a6b6 (unknown)
    @     0x7fd71b51a701 std::terminate()
    @     0x7fd71b545d38 (unknown)
    @     0x7fd71fa596ba start_thread
    @     0x7fd71f78f41d clone
    @                0x0 (unknown)
./itrain4.sh: line 9: 10776 Aborted                 (core dumped) python2 tools/train_net.py --multi-gpu-testing --cfg configs/iret-rn50-fpn-voc.yaml OUTPUT_DIR ./output

@rbgirshick
Contributor

@coolbrain, @tshizys: one shot in the dark is to switch the all-reduce implementation to nccl by passing USE_NCCL True to train_net.py, as in:

python2 tools/train_net.py --multi-gpu-testing \
  --cfg configs/getting_started/tutorial_2gpu_e2e_faster_rcnn_R-50-FPN.yaml \
  OUTPUT_DIR /tmp/output USE_NCCL True

This will require Caffe2 to have been built with nccl ops -- I'm not sure if this is done by default or will require some work to rebuild Caffe2 with nccl support.
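A quick way to check whether a given Caffe2 build already has the NCCL op registered (just a sketch; core.IsOperator queries the operator registry of the running build):

from caffe2.python import core
print(core.IsOperator('NCCLAllreduce'))  # True only if this build includes the NCCL ops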

@yousongzhu

@rbgirshick, when using two GPUs, i.e. {0,2}, {0,3}, {1,2}, or {1,3}, the error still exists. Here are the details, using {0,3} and training RetinaNet (retinanet_R-101-FPN) as an example:

F0128 12:09:08.461153 4938 context_gpu.cu:387] Error at: /home/yszhu/local/caffe2/caffe2/core/context_gpu.cu:387: an illegal memory access was encountered
*** Check failure stack trace: ***
terminate called recursively
terminate called recursively
*** Aborted at 1517112548 (unix time) try "date -d @1517112548" if you are using GNU date ***
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/fpn_6_relu" input: "gpu_0/fpn_7_w" input: "gpu_0/__m23_shared" output: "gpu_0/fpn_7_w_grad" output: "gpu_0/fpn_7_b_grad" output: "gpu_0/__m22_shared" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" is_gradient_op: true
@ 0x7f2bdf712772 google::LogMessage::Fail()
PC: @ 0x0 (unknown)
*** SIGABRT (@0x3e8000012b7) received by PID 4791 (TID 0x7f2a6effd700) from PID 4791; stack trace: ***
@ 0x7f2bdf7126ce google::LogMessage::SendToLog()
@ 0x7f2c2670e130 (unknown)
@ 0x7f2bdf71204c google::LogMessage::Flush()
@ 0x7f2c25c6a5d7 __GI_raise
@ 0x7f2bdf71556d google::LogMessageFatal::~LogMessageFatal()
@ 0x7f2c25c6bcc8 __GI_abort
@ 0x7f2c1b1b1965 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f2bdfdd1180 caffe2::CUDAContext::Delete()
@ 0x7f2c1b1af946 (unknown)
@ 0x7f2be27f42d9 std::_Sp_counted_base<>::_M_release()
@ 0x7f2c1b1af973 std::terminate()
@ 0x7f2c1b2062c5 (unknown)
@ 0x7f2bdfd377d1 caffe2::Tensor<>::ResizeLike<>()
@ 0x7f2c26706df5 start_thread
@ 0x7f2bdfd6e3e2 ZN6caffe210CuDNNState7executeIRZNS_19CudnnConvGradientOp13DoRunWithTypeIffffffffEEbvEUlPS0_E1_EEvP11CUstream_stOT
@ 0x7f2c25d2b1ad __clone
@ 0x7f2bdfd707e1 caffe2::CudnnConvGradientOp::DoRunWithType<>()
@ 0x0 (unknown)

[image]

The form of the error is not exactly the same each time, but it is always "Encountered CUDA error: an illegal memory access was encountered".

@yousongzhu

I also rebuilt Caffe2 with nccl-1.3.5 (following https://caffe2.ai/docs/getting-started.html?platform=centos&configuration=cloud#null__troubleshooting):

[image]

and switched the all-reduce implementation to NCCL by passing USE_NCCL True to train_net.py, as in:

python2 tools/train_net.py --multi-gpu-testing \
  --cfg configs/12_2017_baselines/retinanet_R-101-FPN_1x_4gpus.yaml \
  OUTPUT_DIR results_retinanet_R-101-FPN_1x_4gpus_model USE_NCCL True

The error disappeared ^--^ both when using all four GPUs {0,1,2,3} and when using any two GPUs {0,2}, {0,3}, {1,2}, {1,3}.
@rbgirshick, thanks very much.

@lwher

lwher commented Jan 29, 2018

Hi, I enabled the NCCL op to train the tutorial network and the error above disappeared. However, the program hangs after loading data and occupies 100% CPU all the time.

.......
I0129 03:25:13.106998 118074 context_gpu.cu:321] GPU 0: 2175 MB
I0129 03:25:13.107028 118074 context_gpu.cu:321] GPU 1: 2078 MB
I0129 03:25:13.107045 118074 context_gpu.cu:321] GPU 2: 2266 MB
I0129 03:25:13.107059 118074 context_gpu.cu:321] GPU 3: 1860 MB
I0129 03:25:13.107072 118074 context_gpu.cu:325] Total: 8381 MB
I0129 03:25:13.122316 118079 context_gpu.cu:321] GPU 0: 2195 MB
I0129 03:25:13.122344 118079 context_gpu.cu:321] GPU 1: 2145 MB
I0129 03:25:13.122361 118079 context_gpu.cu:321] GPU 2: 2267 MB
I0129 03:25:13.122378 118079 context_gpu.cu:321] GPU 3: 1924 MB
I0129 03:25:13.122395 118079 context_gpu.cu:325] Total: 8532 MB
I0129 03:25:13.151623 118079 context_gpu.cu:321] GPU 0: 2245 MB
I0129 03:25:13.151650 118079 context_gpu.cu:321] GPU 1: 2159 MB
I0129 03:25:13.152823 118079 context_gpu.cu:321] GPU 2: 2269 MB
I0129 03:25:13.153623 118079 context_gpu.cu:321] GPU 3: 2020 MB
I0129 03:25:13.154454 118079 context_gpu.cu:325] Total: 8694 MB
I0129 03:25:13.186017 118079 context_gpu.cu:321] GPU 0: 2260 MB
I0129 03:25:13.186053 118079 context_gpu.cu:321] GPU 1: 2214 MB
I0129 03:25:13.186067 118079 context_gpu.cu:321] GPU 2: 2279 MB
I0129 03:25:13.186077 118079 context_gpu.cu:321] GPU 3: 2080 MB
I0129 03:25:13.186089 118079 context_gpu.cu:325] Total: 8835 MB
I0129 03:25:13.215306 118076 context_gpu.cu:321] GPU 0: 2310 MB
I0129 03:25:13.215342 118076 context_gpu.cu:321] GPU 1: 2269 MB
I0129 03:25:13.215351 118076 context_gpu.cu:321] GPU 2: 2308 MB
I0129 03:25:13.215368 118076 context_gpu.cu:321] GPU 3: 2081 MB
I0129 03:25:13.215384 118076 context_gpu.cu:325] Total: 8970 MB
I0129 03:25:13.307595 118084 context_gpu.cu:321] GPU 0: 2310 MB
I0129 03:25:13.307623 118084 context_gpu.cu:321] GPU 1: 2301 MB
I0129 03:25:13.307641 118084 context_gpu.cu:321] GPU 2: 2391 MB
I0129 03:25:13.307652 118084 context_gpu.cu:321] GPU 3: 2104 MB
I0129 03:25:13.307665 118084 context_gpu.cu:325] Total: 9108 MB
I0129 03:25:13.324935 118077 context_gpu.cu:321] GPU 0: 2312 MB
I0129 03:25:13.324965 118077 context_gpu.cu:321] GPU 1: 2313 MB
I0129 03:25:13.324982 118077 context_gpu.cu:321] GPU 2: 2452 MB
I0129 03:25:13.324993 118077 context_gpu.cu:321] GPU 3: 2171 MB
I0129 03:25:13.325011 118077 context_gpu.cu:325] Total: 9250 MB
I0129 03:25:13.343673 118080 context_gpu.cu:321] GPU 0: 2336 MB
I0129 03:25:13.343698 118080 context_gpu.cu:321] GPU 1: 2380 MB
I0129 03:25:13.343715 118080 context_gpu.cu:321] GPU 2: 2468 MB
I0129 03:25:13.343731 118080 context_gpu.cu:321] GPU 3: 2233 MB
I0129 03:25:13.343747 118080 context_gpu.cu:325] Total: 9417 MB
I0129 03:25:13.369802 118085 cuda_nccl_gpu.cc:110] Creating NCCLContext for key: 0:0,1,2,3,
I0129 03:25:13.381914 118076 context_gpu.cu:321] GPU 0: 2361 MB
I0129 03:25:13.381942 118076 context_gpu.cu:321] GPU 1: 2453 MB
I0129 03:25:13.381961 118076 context_gpu.cu:321] GPU 2: 2524 MB
I0129 03:25:13.381978 118076 context_gpu.cu:321] GPU 3: 2247 MB
I0129 03:25:13.381995 118076 context_gpu.cu:325] Total: 9587 MB
I0129 03:25:13.613253 118083 context_gpu.cu:321] GPU 0: 2388 MB
I0129 03:25:13.613292 118083 context_gpu.cu:321] GPU 1: 2525 MB
I0129 03:25:13.613301 118083 context_gpu.cu:321] GPU 2: 2524 MB
I0129 03:25:13.613308 118083 context_gpu.cu:321] GPU 3: 2310 MB
I0129 03:25:13.613315 118083 context_gpu.cu:325] Total: 9748 MB

the program hangs......

my environment:
Operating system: Ubuntu 16.04
Compiler version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0
CUDA version: 8.0
cuDNN version: v5.1
NVIDIA driver version: 384.111

nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00001543:00:00.0 Off |                  Off |
| N/A   42C    P0    41W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00003134:00:00.0 Off |                  Off |
| N/A   42C    P0    39W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 00004975:00:00.0 Off |                  Off |
| N/A   38C    P0    41W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000F3E6:00:00.0 Off |                  Off |
| N/A   38C    P0    40W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

@rbgirshick
Contributor

@lwher: that's unfortunate. The reason we don't use NCCL by default is that it's prone to causing deadlocks, which is what I think you're seeing.

@zdwong
Author

zdwong commented Jan 29, 2018

After rebuilding Caffe2 with NCCL, I reran the program with this script:
python tools/train_net.py \
  --multi-gpu-testing \
  --cfg configs/getting_started/tutorial_4gpu_e2e_faster_rcnn_R-50-FPN.yaml \
  OUTPUT_DIR ./output USE_NCCL True

It throws this error:

Creating NCCLContext for key: 0:0,1,2,3,
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at cuda_nccl_gpu.cc:40] status == ncclSuccess. 2 vs 0. Error at: /mnt/hzhida/project/caffe2/caffe2/contrib/nccl/cuda_nccl_gpu.cc40: system error Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" input: "gpu_2/rpn_cls_logits_fpn2_w_grad" input: "gpu_3/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" output: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_2/rpn_cls_logits_fpn2_w_grad" output: "gpu_3/rpn_cls_logits_fpn2_w_grad" name: "" type: "NCCLAllreduce" device_option { device_type: 1 cuda_gpu_id: 0 }
*** Aborted at 1517210588 (unix time) try "date -d @1517210588" if you are using GNU date ***
PC: @ 0x7ff1e0383428 gsignal
*** SIGABRT (@0x3e800007a46) received by PID 31302 (TID 0x7fefb5ffb700) from PID 31302; stack trace: ***
I0129 07:23:08.187249 31591 cuda_nccl_gpu.cc:110] Creating NCCLContext for key: 0:0,1,2,3,

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
terminate called recursively
@ 0x7ff1e0729390 (unknown)
I0129 07:23:08.188051 31592 context_gpu.cu:321] GPU 0: 2466 MB
I0129 07:23:08.188074 31592 context_gpu.cu:321] GPU 1: 2387 MB
I0129 07:23:08.188091 31592 context_gpu.cu:321] GPU 2: 2311 MB
I0129 07:23:08.188099 31592 context_gpu.cu:321] GPU 3: 2382 MB
I0129 07:23:08.188107 31592 context_gpu.cu:325] Total: 9548 MB
@ 0x7ff1e0383428 gsignal
@ 0x7ff1e038502a abort
@ 0x7ff1da16284d __gnu_cxx::__verbose_terminate_handler()
@ 0x7ff1da1606b6 (unknown)
@ 0x7ff1da160701 std::terminate()
@ 0x7ff1da18bd38 (unknown)
@ 0x7ff1e071f6ba start_thread
@ 0x7ff1e045541d clone
@ 0x0 (unknown)
Aborted (core dumped)

Running Environment:
Operating system: Ubuntu 16.04
Compiler version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0
CUDA version: 8.0
cuDNN version: v5.1
NVIDIA driver version: 384.111

nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00001543:00:00.0 Off |                  Off |
| N/A   42C    P0    41W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00003134:00:00.0 Off |                  Off |
| N/A   42C    P0    39W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 00004975:00:00.0 Off |                  Off |
| N/A   38C    P0    41W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000F3E6:00:00.0 Off |                  Off |
| N/A   38C    P0    40W / 150W |      0MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

@ir413
Contributor

ir413 commented Jan 29, 2018

One additional note about NCCL: Caffe2 builds with NCCL by default so there is no need to rebuild it.

@Yangqing
Contributor

Yangqing commented Jan 30, 2018

Jumping onto this: since the illegal memory access is from the Add operator, you might want to check whether direct peer access is available between the GPUs you are using. The current Add op relies on it, and if it is not available we might indeed want to fix the code. Basically, to check, in Python, do:

from caffe2.python import workspace
print(workspace.GetCudaPeerAccessPattern())

Could you paste the output of that for debugging? (Especially, if you are using CUDA_VISIBLE_DEVICES, make sure you invoke python with that too)
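For anyone else gathering this, a slightly longer sketch (run it under the same CUDA_VISIBLE_DEVICES you train with) that prints the pattern and flags GPU pairs without direct peer access:

from caffe2.python import workspace

pattern = workspace.GetCudaPeerAccessPattern()  # boolean matrix over the visible GPUs
print(pattern)
num_gpus = pattern.shape[0]
for i in range(num_gpus):
    for j in range(i + 1, num_gpus):
        if not pattern[i][j]:
            print('no direct peer access between visible GPUs %d and %d' % (i, j))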

@jwnsu

jwnsu commented Jan 30, 2018

@Yangqing output from your two debug lines:

[[ True  True False False]
[ True  True False False]
[False False  True  True]
[False False  True  True]]

thx for looking into this issue (and ... caffe/caffe2 frameworks!)

@Yangqing
Contributor

@jwnsu thanks! Just to confirm, so the Add operator is adding tensors across gpu {0,1} and {2,3} right? (I assume it is adding stuff together from the 4 gpus).

@jwnsu

jwnsu commented Jan 30, 2018

It's a 4-GPU config, with GPU ids specified as "0,1,2,4" (via CUDA_VISIBLE_DEVICES). If GPU ids are configured as "0,1,2,3" (the lowest GPU ids), it works fine without any error.

@Liang-Sen

@Yangqing
My Linux server has 4 M60 GPUs.
This is my workspace.GetCudaPeerAccessPattern() output:
[[ True False False False]
[False True False False]
[False False True False]
[False False False True]]

I can train the net fine using 1 GPU, but when I train using 2 or 4 GPUs, I run into the same problems as above, even when I set USE_NCCL True.

@Yangqing
Contributor

Thanks guys. This verifies my assumption that the illegal memory access comes from the Add op not properly handling cross-device communications when peer access is not enabled. Will issue a fix.
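To illustrate the idea (an editorial sketch with hypothetical names, not the actual fix): when peer access is missing, the all-reduce can stage the Add on one device through plain copies instead of adding across devices directly, roughly what muji's AllreduceFallback does:

from caffe2.python import core
from caffe2.proto import caffe2_pb2

def allreduce_no_p2p(net, blobs, gpu_ids):
    # Accumulate everything on the first GPU with ordinary copies (no peer
    # access required), then broadcast the reduced blob back to the other GPUs.
    master = core.DeviceOption(caffe2_pb2.CUDA, gpu_ids[0])
    acc = blobs[0]
    for blob in blobs[1:]:
        tmp = net.Copy(blob, str(acc) + '_tmp_copy', device_option=master)
        net.Add([acc, tmp], acc, device_option=master)
    for gpu_id, blob in zip(gpu_ids[1:], blobs[1:]):
        net.Copy(acc, blob, device_option=core.DeviceOption(caffe2_pb2.CUDA, gpu_id))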

@JohnnyGambler

JohnnyGambler commented Jan 31, 2018

Same problem with cross-device communication...
This machine can use 4 GPUs [0,1,2,3]:
[image]
This machine can use [0,1] and [2,3]:
[image]

BTW, I have used 12 CPUs and 4 Titan X GPUs to train a 3D Faster R-CNN in the PyTorch framework. Why doesn't PyTorch have this problem?

@illutheplanet

Can anybody tell me whether I can run Mask R-CNN with only one GPU?

@yuzcccc

yuzcccc commented Apr 25, 2018

@daquexian I tried your PR, it works!!! Thanks very much

@Feynman27

@daquexian This PR doesn't appear to work for me. I'm experiencing deadlocks while using a single GPU without NCCL and also while using 2 GPUs with USE_NCCL True. After changing muji.py according to your PR and running with 2 GPUs with USE_NCCL True, I'm still experiencing a deadlock; the training just pauses at random iteration numbers.

@daquexian
Contributor

daquexian commented May 2, 2018 via email

@Feynman27

Feynman27 commented May 2, 2018

Maybe I'm missing something, but if I set USE_NCCL=False, and use your modified muji.py and muji_test.py PR, I get the original error:

I0502 14:35:57.192476 79712 context_gpu.cu:318] Total: 23025 MB
E0502 14:35:58.382604 79711 net_dag.cc:195] Exception from operator chain starting at '' (type 'Add'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:156] . Encountered CUDA error: an illegal memory access was encountered Error from operator: 
input: "gpu_0/rpn_cls_logits_fpn2_b_grad" input: "gpu_1/rpn_cls_logits_fpn2_b_grad" output: "gpu_0/rpn_cls_logits_fpn2_b_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
E0502 14:35:58.382622 79712 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'Add'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:156] . Encountered CUDA error: an illegal memory access was encountered Error from operator: 
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
F0502 14:35:58.382670 79711 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0502 14:35:58.382683 79712 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
E0502 14:35:58.383510 79709 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator: 
input: "gpu_1/fpn_res3_3_sum" input: "gpu_1/conv_rpn_fpn2_w" input: "gpu_1/__m18_shared" output: "_gpu_1/conv_rpn_fpn2_w_grad_autosplit_2" output: "_gpu_1/conv_rpn_fpn2_b_grad_autosplit_2" output: "_gpu_1/fpn_res3_3_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 1 } engine: "CUDNN" is_gradient_op: true
E0502 14:35:58.383541 79713 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at conv_op_cudnn.cc:1290] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /home/markable-ai/pytorch/caffe2/operators/conv_op_cudnn.cc:1290: CUDNN_STATUS_EXECUTION_FAILED Error from operator: 
input: "gpu_3/conv_rpn_fpn4" input: "gpu_3/rpn_bbox_pred_fpn2_w" input: "gpu_3/rpn_bbox_pred_fpn4_grad" output: "_gpu_3/rpn_bbox_pred_fpn2_w_grad_autosplit_1" output: "_gpu_3/rpn_bbox_pred_fpn2_b_grad_autosplit_1" output: "gpu_3/__m13_shared" name: "" type: "ConvGradient" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 3 } engine: "CUDNN" is_gradient_op: true
E0502 14:35:58.383591 79706 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator: 
input: "gpu_3/conv_rpn_fpn3" input: "gpu_3/rpn_cls_logits_fpn2_w" input: "gpu_3/rpn_cls_logits_fpn3_grad" output: "_gpu_3/rpn_cls_logits_fpn2_w_grad_autosplit_2" output: "_gpu_3/rpn_cls_logits_fpn2_b_grad_autosplit_2" output: "_gpu_3/conv_rpn_fpn3_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 3 } engine: "CUDNN" is_gradient_op: true
F0502 14:35:58.382683 79712 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encounteredF0502 14:35:58.434631 79709 context_gpu.h:107] FCheck failed: error == cudaSuccess an illegal memory access was encountered0502 14:35:58.434648 79713 c*** Check failure stack trace: ***
E0502 14:35:58.383741 79700 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator: 
input: "gpu_3/conv_rpn_fpn2" input: "gpu_3/rpn_cls_logits_fpn2_w" input: "gpu_3/rpn_cls_logits_fpn2_grad" output: "_gpu_3/rpn_cls_logits_fpn2_w_grad_autosplit_3" output: "_gpu_3/rpn_cls_logits_fpn2_b_grad_autosplit_3" output: "_gpu_3/conv_rpn_fpn2_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 3 } engine: "CUDNN" is_gradient_op: true
Aborted (core dumped)

I'm using CUDA 9.1 and cuDNN 7.1 with 4 V100s.

@daquexian
Contributor

daquexian commented May 2, 2018

@Feynman27 Could you tell me which branch (like Allreduce4, Allreduce4Group2, Allreduce2, or others) of Allreduce in the updated muji.py is entered? You might want to add some print statements in these branches to find out. And what happens if you replace the implementation of Allreduce with a call to AllreduceFallback? It would also be great if you could provide your GPU access pattern as in #32 (comment). Thanks!
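One low-effort way to see which branch runs (a sketch; it assumes the training process imports muji from the build tree and that these branch functions exist in your copy) is to wrap them with a print near the top of tools/train_net.py, before the model is built:

from caffe2.python import muji

def _traced(fn):
    # Print the branch name every time the all-reduce dispatcher calls it.
    def wrapper(*args, **kwargs):
        print('muji branch taken: %s' % fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

for name in ('Allreduce2', 'Allreduce4', 'Allreduce4Group2',
             'Allreduce8', 'AllreduceFallback'):
    if hasattr(muji, name):
        setattr(muji, name, _traced(getattr(muji, name)))

Since muji.Allreduce resolves these names at call time, patching the module attributes should be enough to reveal which one is taken.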

@Feynman27

Allreduce4 is being called. The gpu access pattern is:

>>> from caffe2.python import workspace
>>> print(workspace.GetCudaPeerAccessPattern())
[[ True False False False]
 [False  True False False]
 [False False  True False]
 [False False False  True]]

I'll try calling AllreduceFallback.

@Feynman27

Feynman27 commented May 2, 2018

Calling AllreduceFallback gives a similar error as above:

I0502 17:08:51.294476 88651 context_gpu.cu:318] Total: 22524 MB
E0502 17:08:52.009866 88659 net_dag.cc:195] Exception from operator chain starting at '' (type 'Add'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:156] . Encountered CUDA error: an illegal memory access was encountered Error from operator: 
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
F0502 17:08:52.009990 88659 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
E0502 17:08:52.010440 88651 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator: 
input: "gpu_2/fpn_res3_3_sum" input: "gpu_2/conv_rpn_fpn2_w" input: "gpu_2/__m15_shared" output: "_gpu_2/conv_rpn_fpn2_w_grad_autosplit_2" output: "_gpu_2/conv_rpn_fpn2_b_grad_autosplit_2" output: "_gpu_2/fpn_res3_3_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 2 } engine: "CUDNN" is_gradient_op: true
E0502 17:08:52.010524 88663 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator: 
input: "gpu_1/fpn_res2_2_sum" input: "gpu_1/conv_rpn_fpn2_w" input: "gpu_1/__m12_shared" output: "_gpu_1/conv_rpn_fpn2_w_grad_autosplit_3" output: "_gpu_1/conv_rpn_fpn2_b_grad_autosplit_3" output: "_gpu_1/fpn_res2_2_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 1 } engine: "CUDNN" is_gradient_op: true
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encountered
*** Check failure stack trace: ***
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encounteredF0502 17:08:52.061641 88651 context_gpu.hF107] 502 17:Ch:ck failed: error == cudaSuccess 52.061651 88663 context_gpu.h:
E0502 17:08:52.010577 88653 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator: 
input: "gpu_0/fpn_res4_22_sum" input: "gpu_0/conv_rpn_fpn2_w" input: "gpu_0/__m15_shared" output: "_gpu_0/conv_rpn_fpn2_w_grad_autosplit_1" output: "_gpu_0/conv_rpn_fpn2_b_grad_autosplit_1" output: "_gpu_0/fpn_res4_22_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" is_gradient_op: true
*** Check failure stack trace: ***
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encounteredF0502 17:08:52.061641 88651 context_gpu.hF107] 502 17:Ch:ck failed: error == cudaSuccess 52.061651 88663 context_gpu.h:
07] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encounteredF0502 17:08:52.061641 88651 context_gpu.hF107] 502 17:Ch:ck failed: error == cudaSuccess 52.061651 88663 context_gpu.h:
07] Check failed: error == cudaSuccess an illegal memory access was encounteredF0502 17:08:52.061749 88653 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
Aborted (core dumped

@daquexian
Contributor

daquexian commented May 3, 2018

@Feynman27 That's strange. According to your GPU access pattern, AllreduceFallback, not Allreduce4, should be called. And when you called AllreduceFallback manually, the error message doesn't appear to come from AllreduceFallback. Did you change the muji.py in the right folder? For example, if the Python package of caffe2 is in /usr/lib/python/site-packages/caffe2, then changing the muji.py in caffe2's source folder (like ~/caffe2/python) will not work.

@yuzcccc

yuzcccc commented May 3, 2018

@Feynman27 did you rebuild caffe2?

@Feynman27

Feynman27 commented May 3, 2018

@daquexian The caffe2 package is installed under pytorch/caffe2, not /usr/lib/python/site-packages/caffe2 or anything else. I've set my $PYTHONPATH to look in this directory. I've also confirmed this by:

Python 2.7.14 |Anaconda, Inc.| (default, Mar 27 2018, 17:29:31) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe2
>>> caffe2.__file__
'/home/markable-ai/pytorch/build/caffe2/__init__.pyc'
>>> from caffe2.python import muji
>>> muji.__file__
'/home/markable-ai/pytorch/build/caffe2/python/muji.pyc'
>>> 

I simply modified the muji.py file under pytorch/caffe2/python/muji.py.

@yuzcccc I didn't rebuild caffe2, but why would I have to? I'm only modifying a python file.

@daquexian
Contributor

@Feynman27 I think you should modify muji.py under /home/markable-ai/pytorch/build/caffe2/python/muji.py

@Feynman27

Feynman27 commented May 3, 2018

Yep, that was my oversight. Good catch. I was modifying pytorch/caffe2/python/muji.py and should have modified pytorch/build/caffe2/python/muji.py.

@daquexian
Contributor

@Feynman27 I'm happy to see it working :)
@Yangqing Could you please review my PR pytorch/pytorch#6896? It may help many Detectron users :)

@Feynman27

@daquexian Unfortunately, I still seem to be experiencing deadlocks.

@daquexian
Contributor

@Feynman27 Hmm.. What is the value of USE_NCCL? It should be False

@Feynman27

Yes, USE_NCCL was set to false.

@daquexian
Contributor

@Feynman27 Sorry, I have no idea why it would cause a deadlock. It's hard for me to reproduce.

@Feynman27

Fair enough. For all I know, the deadlock I'm experiencing could be unrelated to whether or not GPU peer access is enabled. Your PR definitely allowed me to start training with USE_NCCL=False. I'm running on Azure machines, so it could be related to running on their VMs. I've started training on local machines with 2 TitanXs and the training seems to be progressing just fine.

@mks0601

mks0601 commented May 11, 2018

@daquexian Thanks! Your PR worked for me!

@gadcam
Contributor

gadcam commented May 28, 2018

Looks like this issue can be closed.

@rbgirshick
Contributor

@gadcam thanks for helping to identify issues that can be closed!

For this one, I'd like to leave it open until there's a fix merged into Caffe2.

@daquexian
Contributor

@rbgirshick Unfortunately no one has reviewed my PR :|

@daquexian
Contributor

@rbgirshick Thanks! My PR pytorch/pytorch#6896 has been merged. It looks like this issue can be closed :)
