multi-GPU training throws an illegal memory access #32
I got the same error. The difference is that when I use one GPU or two GPUs, there is no problem. But when using 4 GPUs to train Mask RCNN (mask_rcnn_R-101-FPN) or RetinaNet (retinanet_R-101-FPN), the same problem occurs. |
I have the same problem when I train the tutorial_Res50 network with two or more GPUs. |
Encountered same issue when specifying GPU ids (i.e. different from lowest ids, e.g. '1,3,5,7' for 4 GPUs). If lowest GPU ids are specified, training goes on fine. |
@jwnsu: we're working on a fix so that when |
Hi @jwnsu, @coolbrain, @tshizys, @lwher: we are unable to reproduce this issue on our side. Can you each provide some more information that might reveal a common pattern? In particular:
Here's what we see when training, for example, with GPU ids 1,3,5,7:
|
Operating system: Ubuntu 16.04 nvidia-smi: |
Operating system: CentOS Linux release 7.1.1503 When using 4 GPUs (0,1,2,3) to train Mask RCNN (e2e_mask_rcnn_R-101-FPN), RetinaNet (retinanet_R-101-FPN) or Faster RCNN (e2e_faster_rcnn_R-50-FPN), the error “context_gpu.h:307: an illegal memory access was encountered” or “context_gpu.h:170. Encountered CUDA error: an illegal memory access was encountered Error from operator: input: "gpu_0/retnet_cls_pred_fpn3_b_grad" input: "gpu_2/retnet_cls_pred_fpn3_b_grad" output: "gpu_0/retnet_cls_pred_fpn3_b_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 } ” occurs. But when using one GPU or two GPUs (0,1 or 2,3), training runs normally. |
@jwnsu: looking at your error more closely ("invalid device ordinal"), it looks like you're trying to train with a config set up for 8 GPUs but restricting the process to have only access to 4 (via |
@coolbrain, @tshizys: thanks for the details. What happens if you use two GPUs using ids {0,2}, {0,3}, {1,2}, or {1,3}? |
@rbgirshick you are right, I picked the wrong config file (with an 8-GPU setting) to try yesterday. Just tried again with the right config file (4 GPUs; the error occurs with GPU ids "1,2,4,5", while "0,1,2,3" works fine), and the error is now similar to what others are seeing:
|
@coolbrain, @tshizys: one shot in the dark is to switch the all-reduce implementation to nccl by passing USE_NCCL True to train_net.py.
This will require Caffe2 to have been built with nccl ops -- I'm not sure if this is done by default or will require some work to rebuild Caffe2 with nccl support. |
@rbgirshick, when using two GPUs, i.e. {0,2}, {0,3}, {1,2}, or {1,3}, the error still exists. Here are the details, using {0,3} and training RetinaNet (retinanet_R-101-FPN) as an example:
F0128 12:09:08.461153 4938 context_gpu.cu:387] Error at: /home/yszhu/local/caffe2/caffe2/core/context_gpu.cu:387: an illegal memory access was encountered
The exact form of the error is not the same each time, but it is always "Encountered CUDA error: an illegal memory access was encountered". |
I also rebuilt caffe2 with nccl-1.3.5 (following https://caffe2.ai/docs/getting-started.html?platform=centos&configuration=cloud#null__troubleshooting) and switched the all-reduce implementation to nccl by passing USE_NCCL True to train_net.py, as in: python2 tools/train_net.py --multi-gpu-testing The error disappeared ^--^ both when using four GPUs {0,1,2,3} and with any pair of GPUs {0,2}, {0,3}, {1,2}, {1,3}. |
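For reference, the full command follows the standard Detectron train_net.py pattern; the sketch below is illustrative (the config path and output directory are placeholders), and only --multi-gpu-testing plus the trailing USE_NCCL True override come from this thread:
python2 tools/train_net.py \
    --multi-gpu-testing \
    --cfg configs/your_config.yaml \
    OUTPUT_DIR /tmp/detectron-output \
    USE_NCCL True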
Hi, I enabled the nccl op to train the tutorial_network and the error above disappeared. However, the program hangs after loading data and occupies 100% CPU all the time. ....... the program hangs ...... my environment: nvidia-smi: |
@lwher: that's unfortunate. The reason we don't use NCCL by default is that it's prone to causing deadlocks, which is what I think you're seeing. |
After rebuilding caffe2 with NCCL, I reran the program with this script: It throws this error:
Creating NCCLContext for key: 0:0,1,2,3,
You should always run with libnvidia-ml.so that is installed with your
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
You should always run with libnvidia-ml.so that is installed with your
Running Environment: nvidia-smi: |
One additional note about NCCL: Caffe2 builds with NCCL by default so there is no need to rebuild it. |
Jumping onto this: since the illegal memory access is from the Add operator, you might want to check if direct peer access is available between the gpus that you are using. The current Add op relies on that, and if it is not available we might indeed want to fix the code. Basically, to do so, in python, do:
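The check being referred to is the two-line snippet whose output is pasted in later replies:
>>> from caffe2.python import workspace
>>> print(workspace.GetCudaPeerAccessPattern())
A True entry at row i, column j means GPU i can directly access GPU j's memory; the muji Add path assumes this is available.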
Could you paste the output of that for debugging? (Especially, if you are using CUDA_VISIBLE_DEVICES, make sure you invoke python with that too) |
@Yangqing output from your two debug lines:
thx for looking into this issue (and ... caffe/caffe2 frameworks!) |
@jwnsu thanks! Just to confirm, so the Add operator is adding tensors across gpu {0,1} and {2,3} right? (I assume it is adding stuff together from the 4 gpus). |
It's a 4-GPU config, with GPU ids specified as "0,1,2,4" (via CUDA_VISIBLE_DEVICES). If GPU ids are configured as "0,1,2,3" (the lowest GPU ids), it works fine without any error. |
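A hedged illustration of why the id set matters (the workspace helpers below exist in caffe2, but treat the session as a sketch): CUDA_VISIBLE_DEVICES renumbers devices inside the process, so "0,1,2,4" is seen as logical GPUs 0-3, and the peer-access topology of those physical GPUs can differ from that of "0,1,2,3":
$ CUDA_VISIBLE_DEVICES=0,1,2,4 python2
>>> from caffe2.python import workspace
>>> workspace.NumCudaDevices()             # the process sees 4 logical devices
4
>>> workspace.GetCudaPeerAccessPattern()   # topology of the visible GPUs only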
@Yangqing I can train the net using 1 GPU without problems, but when I train using 2 or 4 GPUs, I run into the same problems as above, even if I set NCCL = True. |
Thanks guys. This verifies my assumption that the illegal memory access comes from the Add op not properly handling cross-device communications when peer access is not enabled. Will issue a fix. |
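The fix that eventually landed (daquexian's muji patch, discussed below) follows this idea; here is a minimal sketch under the usual caffe2 net-building API, with illustrative names and not the merged code: copy each peer GPU's gradient onto one GPU with Copy ops (a device-to-device memcpy, which needs no peer access), sum there, then copy the result back, so no kernel ever dereferences another device's pointer.
from caffe2.python import core
from caffe2.proto import caffe2_pb2

def allreduce_without_peer_access(net, blobs, gpu_ids):
    # blobs[i] lives on gpu_ids[i]; afterwards every blob holds the sum.
    master = core.DeviceOption(caffe2_pb2.CUDA, gpu_ids[0])
    # Stage copies of the peer blobs onto the first GPU; a memcpy can cross
    # devices without peer access, unlike a kernel reading a remote pointer.
    staged = [blobs[0]]
    for blob in blobs[1:]:
        staged.append(net.Copy(blob, str(blob) + '_on_gpu%d' % gpu_ids[0],
                               device_option=master))
    # Sum in place on the first GPU: every input now lives on that device.
    reduced = net.Sum(staged, blobs[0], device_option=master)
    # Broadcast the reduced gradient back to the other GPUs.
    for blob, gpu in zip(blobs[1:], gpu_ids[1:]):
        net.Copy(reduced, blob,
                 device_option=core.DeviceOption(caffe2_pb2.CUDA, gpu))
    return reduced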
Can anybody tell me whether I can run Mask R-CNN with only one GPU? |
@daquexian I tried your PR, it works!!! Thanks very much |
@daquexian This PR doesn't appear to work for me. I'm experiencing deadlocks while using a single GPU without NCCL and also while using 2 GPUs with USE_NCCL True. After changing muji.py according to your PR and running with 2 GPUs with USE_NCCL True, I'm still experiencing a deadlock; the training just pauses at random iteration numbers. |
Thanks for trying :) You don't need to set USE_NCCL=True if you use my PR. NCCL and "muji" are two different GPU communication methods. My PR is a patch for muji, which previously required GPU peer access; it does not affect NCCL. Just set USE_NCCL=False and my PR will work.
…On Wed, May 2, 2018, 2:51 AM Thomas Balestri ***@***.***> wrote:
@daquexian <https://github.com/daquexian> This PR doesn't appear to work
for me. I'm experiencing deadlocks while using a single GPU without NCCL
and also while using 2 GPUs with USE_NCCL True. After changing muji.py
according to your PR and running with 2 GPUs with USE_NCCL True, I'm
still experiencing a deadlock; the training just pauses at random iteration
numbers.
|
Maybe I'm missing something, but if I set USE_NCCL=False and use your modified muji.py and muji_test.py from the PR, I get the original error:
I0502 14:35:57.192476 79712 context_gpu.cu:318] Total: 23025 MB
E0502 14:35:58.382604 79711 net_dag.cc:195] Exception from operator chain starting at '' (type 'Add'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:156] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_b_grad" input: "gpu_1/rpn_cls_logits_fpn2_b_grad" output: "gpu_0/rpn_cls_logits_fpn2_b_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
E0502 14:35:58.382622 79712 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'Add'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:156] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
F0502 14:35:58.382670 79711 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0502 14:35:58.382683 79712 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
E0502 14:35:58.383510 79709 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator:
input: "gpu_1/fpn_res3_3_sum" input: "gpu_1/conv_rpn_fpn2_w" input: "gpu_1/__m18_shared" output: "_gpu_1/conv_rpn_fpn2_w_grad_autosplit_2" output: "_gpu_1/conv_rpn_fpn2_b_grad_autosplit_2" output: "_gpu_1/fpn_res3_3_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 1 } engine: "CUDNN" is_gradient_op: true
E0502 14:35:58.383541 79713 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at conv_op_cudnn.cc:1290] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /home/markable-ai/pytorch/caffe2/operators/conv_op_cudnn.cc:1290: CUDNN_STATUS_EXECUTION_FAILED Error from operator:
input: "gpu_3/conv_rpn_fpn4" input: "gpu_3/rpn_bbox_pred_fpn2_w" input: "gpu_3/rpn_bbox_pred_fpn4_grad" output: "_gpu_3/rpn_bbox_pred_fpn2_w_grad_autosplit_1" output: "_gpu_3/rpn_bbox_pred_fpn2_b_grad_autosplit_1" output: "gpu_3/__m13_shared" name: "" type: "ConvGradient" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 3 } engine: "CUDNN" is_gradient_op: true
E0502 14:35:58.383591 79706 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator:
input: "gpu_3/conv_rpn_fpn3" input: "gpu_3/rpn_cls_logits_fpn2_w" input: "gpu_3/rpn_cls_logits_fpn3_grad" output: "_gpu_3/rpn_cls_logits_fpn2_w_grad_autosplit_2" output: "_gpu_3/rpn_cls_logits_fpn2_b_grad_autosplit_2" output: "_gpu_3/conv_rpn_fpn3_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 3 } engine: "CUDNN" is_gradient_op: true
F0502 14:35:58.382683 79712 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encounteredF0502 14:35:58.434631 79709 context_gpu.h:107] FCheck failed: error == cudaSuccess an illegal memory access was encountered0502 14:35:58.434648 79713 c*** Check failure stack trace: ***
E0502 14:35:58.383741 79700 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator:
input: "gpu_3/conv_rpn_fpn2" input: "gpu_3/rpn_cls_logits_fpn2_w" input: "gpu_3/rpn_cls_logits_fpn2_grad" output: "_gpu_3/rpn_cls_logits_fpn2_w_grad_autosplit_3" output: "_gpu_3/rpn_cls_logits_fpn2_b_grad_autosplit_3" output: "_gpu_3/conv_rpn_fpn2_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 3 } engine: "CUDNN" is_gradient_op: true
Aborted (core dumped)
I'm using CUDA 9.1 and cuDNN 7.1 with 4 V100s. |
@Feynman27 Could you tell me which branch(like |
>>> from caffe2.python import workspace
>>> print(workspace.GetCudaPeerAccessPattern())
[[ True False False False]
[False True False False]
[False False True False]
[False False False True]]
I'll try calling |
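A diagonal-only pattern like this means no GPU pair has direct peer access, which is exactly the case the muji Add path did not handle. A quick programmatic check might look like the following sketch (plain numpy over the array returned above):
>>> import numpy as np
>>> from caffe2.python import workspace
>>> pattern = workspace.GetCudaPeerAccessPattern()
>>> off_diag = pattern[~np.eye(len(pattern), dtype=bool)]
>>> bool(off_diag.any())   # False here: no cross-GPU peer access, so prefer NCCL or the muji fallback
False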
Calling I0502 17:08:51.294476 88651 context_gpu.cu:318] Total: 22524 MB
E0502 17:08:52.009866 88659 net_dag.cc:195] Exception from operator chain starting at '' (type 'Add'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:156] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
F0502 17:08:52.009990 88659 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
E0502 17:08:52.010440 88651 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator:
input: "gpu_2/fpn_res3_3_sum" input: "gpu_2/conv_rpn_fpn2_w" input: "gpu_2/__m15_shared" output: "_gpu_2/conv_rpn_fpn2_w_grad_autosplit_2" output: "_gpu_2/conv_rpn_fpn2_b_grad_autosplit_2" output: "_gpu_2/fpn_res3_3_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 2 } engine: "CUDNN" is_gradient_op: true
E0502 17:08:52.010524 88663 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator:
input: "gpu_1/fpn_res2_2_sum" input: "gpu_1/conv_rpn_fpn2_w" input: "gpu_1/__m12_shared" output: "_gpu_1/conv_rpn_fpn2_w_grad_autosplit_3" output: "_gpu_1/conv_rpn_fpn2_b_grad_autosplit_3" output: "_gpu_1/fpn_res2_2_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 1 } engine: "CUDNN" is_gradient_op: true
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encountered
*** Check failure stack trace: ***
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encounteredF0502 17:08:52.061641 88651 context_gpu.hF107] 502 17:Ch:ck failed: error == cudaSuccess 52.061651 88663 context_gpu.h:
E0502 17:08:52.010577 88653 net_dag.cc:195] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.cu:336] error == cudaSuccess. 77 vs 0. Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:336: an illegal memory access was encountered Error from operator:
input: "gpu_0/fpn_res4_22_sum" input: "gpu_0/conv_rpn_fpn2_w" input: "gpu_0/__m15_shared" output: "_gpu_0/conv_rpn_fpn2_w_grad_autosplit_1" output: "_gpu_0/conv_rpn_fpn2_b_grad_autosplit_1" output: "_gpu_0/fpn_res4_22_sum_grad_autosplit_0" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" is_gradient_op: true
*** Check failure stack trace: ***
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encounteredF0502 17:08:52.061641 88651 context_gpu.hF107] 502 17:Ch:ck failed: error == cudaSuccess 52.061651 88663 context_gpu.h:
07] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0502 17:08:52.010545 88660 context_gpu.cu:387] Error at: /home/markable-ai/pytorch/caffe2/core/context_gpu.cu:387: an illegal memory access was encounteredF0502 17:08:52.061641 88651 context_gpu.hF107] 502 17:Ch:ck failed: error == cudaSuccess 52.061651 88663 context_gpu.h:
07] Check failed: error == cudaSuccess an illegal memory access was encounteredF0502 17:08:52.061749 88653 context_gpu.h:107] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
Aborted (core dumped) |
@Feynman27 It's strange. According to your gpu access pattern, |
@Feynman27 did you rebuild caffe2? |
@daquexian The caffe2 package is installed under Python 2.7.14 |Anaconda, Inc.| (default, Mar 27 2018, 17:29:31)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe2
>>> caffe2.__file__
'/home/markable-ai/pytorch/build/caffe2/__init__.pyc'
>>> from caffe2.python import muji
>>> muji.__file__
'/home/markable-ai/pytorch/build/caffe2/python/muji.pyc'
>>>
I simply modified the muji.py file. @yuzcccc I didn't rebuild caffe2, but why would I have to? I'm only modifying a python file. |
@Feynman27 I think you should modify |
Yep, that was my oversight. Good catch. I was modifying |
@Feynman27 I'm happy to see it working :) |
@daquexian Unfortunately, I still seem to be experiencing deadlocks. |
@Feynman27 Hmm.. What is the value of |
Yes, |
@Feynman27 Sorry, I have no idea why it would cause a deadlock. It's hard for me to reproduce. |
Fair enough. For all I know, the deadlock I'm experiencing could be unrelated to whether or not GPU peer access is enabled. Your PR definitely allowed me to start training with |
@daquexian Thanks! Your PR worked for me! |
Looks like this issue can be closed. |
@gadcam thanks for helping to identify issues that can be closed! For this one, I'd like to leave it open until there's a fix merged into Caffe2. |
@rbgirshick Unfortunately no one reviews my PR :| |
@rbgirshick Thanks! My PR pytorch/pytorch#6896 has been merged. It looks like this issue can be closed :) |
When I use one GPU to train, there is no problem. But when I use two or four GPUs, the problem appears. The log output:
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
*** Aborted at 1516866180 (unix time) try "date -d @1516866180" if you are using GNU date ***
terminate called recursively
terminate called recursively
terminate called recursively
PC: @ 0x7ff67559f428 gsignal
terminate called recursively
terminate called recursively
E0125 07:43:00.745853 55683 pybind_state.h:422] Exception encountered running PythonOp function: RuntimeError: [enforce fail at context_gpu.h:307] error == cudaSuccess. 77 vs 0. Error at: /mnt/hzhida/project/caffe2/caffe2/core/context_gpu.h:307: an illegal memory access was encountered
At:
/mnt/hzhida/facebook/detectron/lib/ops/generate_proposals.py(101): forward
*** SIGABRT (@0x3e80000d84f) received by PID 55375 (TID 0x7ff453fff700) from PID 55375; stack trace: ***
terminate called recursively
@ 0x7ff675945390 (unknown)
@ 0x7ff67559f428 gsignal
@ 0x7ff6755a102a abort
@ 0x7ff66f37e84d __gnu_cxx::__verbose_terminate_handler()
@ 0x7ff66f37c6b6 (unknown)
@ 0x7ff66f37c701 std::terminate()
@ 0x7ff66f3a7d38 (unknown)
@ 0x7ff67593b6ba start_thread
@ 0x7ff67567141d clone
@ 0x0 (unknown)
Aborted (core dumped)