Testing Fail #26

dilincv · 2014-01-10T16:57:53Z

Hi, I've run into a problem. When I run train_net.bin on my own data, it crashes during testing. The log shows:

I0111 00:39:27.884718 8026 solver.cpp:84] Testing net
F0111 00:39:28.925695 8026 syncedmem.cpp:45] Check failed: (cudaMalloc(&gpu_ptr_, size_)) == cudaSuccess (2 vs. 0)
*** Check failure stack trace: ***
@ 0x7f1b7e10bb5d google::LogMessage::Fail()
@ 0x7f1b7e10fb77 google::LogMessage::SendToLog()
@ 0x7f1b7e10d9f9 google::LogMessage::Flush()
@ 0x7f1b7e10dcfd google::LogMessageFatal::~LogMessageFatal()
@ 0x436d57 caffe::SyncedMemory::mutable_gpu_data()
@ 0x4208fe caffe::Blob<>::mutable_gpu_data()
@ 0x445dd4 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x42842a caffe::Net<>::ForwardPrefilled()
@ 0x41d319 caffe::Solver<>::Test()
@ 0x41e705 caffe::Solver<>::Solve()
@ 0x40b8dd main
@ 0x30b9c1ecdd (unknown)
@ 0x40b739 (unknown)

Aborted (core dumped)

I use exactly the network architecture defined in "imagenet.prototxt" and "imagenet_val.prototxt". My training and testing datasets are about 20G and contain 200,000 images cropped to 256*256.

I'm a little confused, so I'd like to ask for help here. Many thanks!

sguada · 2014-01-10T17:03:37Z

That error is a CUDA out-of-memory error: cudaMalloc returned code 2
(cudaErrorMemoryAllocation). Your GPU doesn't have enough memory to load
the test net. You can reduce the number of images per test batch in
imagenet_val.prototxt to avoid that.

Sergio
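
To see why a smaller test batch helps: the memory a blob needs grows linearly with the number of images per batch. The following is a minimal Python sketch of that arithmetic, not part of Caffe. The function name `blob_bytes` is hypothetical, and the 96 x 55 x 55 shape is only an assumed example of a first convolution output; substitute the shapes of your own network.

```python
# Rough estimate of how much memory one float32 blob needs for a given batch size.
# Illustrative only: the shape below is an assumption, not read from any prototxt.

BYTES_PER_FLOAT = 4  # single-precision float


def blob_bytes(num, channels, height, width):
    """Bytes needed for one float32 array of shape (num, channels, height, width)."""
    return num * channels * height * width * BYTES_PER_FLOAT


# Assumed example shape for a single image's output at one layer.
CHANNELS, HEIGHT, WIDTH = 96, 55, 55

for batch in (50, 25, 10):
    size = blob_bytes(batch, CHANNELS, HEIGHT, WIDTH)
    print("batch %3d -> %.1f MB for this blob" % (batch, size / 1e6))
```

Every blob in the net scales the same way, so halving the test batch size roughly halves the activation memory the test net needs, while the weights stay the same size.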

dilincv · 2014-01-10T18:01:34Z

Thanks! It works for me.

Yangqing closed this as completed Jan 10, 2014

shelhamer added the downstream problem? label Feb 25, 2014

johnswan mentioned this issue Mar 19, 2014

train_net.bin crash when solving with own data (out of memory) #241

Closed

lukeyeager added a commit to lukeyeager/caffe that referenced this issue Sep 3, 2015

Merge pull request BVLC#26 from slayton58/cnmem_fix

b0df955

Fix potential CNMEM_NOT_INITIALIZED errors

chensiqin mentioned this issue Nov 28, 2015

Output accuracies per class. #2935

Merged

shuguang101 mentioned this issue Jan 20, 2018

Segmentation Fault: 11 - OSX high sierra - please Help #6019

Open

dkoes added a commit to gnina/caffe that referenced this issue Jun 4, 2018

Merge pull request BVLC#26 from gnina/gan

136fabd

extend pycaffe solver API for GAN training
