Testing Fail #26

Closed
dilincv opened this issue Jan 10, 2014 · 2 comments

Comments

dilincv commented Jan 10, 2014

Hi, I've run into a problem. When I run train_net.bin on my own data, the testing phase crashes. The logged output is:

I0111 00:39:27.884718 8026 solver.cpp:84] Testing net
F0111 00:39:28.925695 8026 syncedmem.cpp:45] Check failed: (cudaMalloc(&gpu_ptr_, size_)) == cudaSuccess (2 vs. 0)
*** Check failure stack trace: ***
@ 0x7f1b7e10bb5d google::LogMessage::Fail()
@ 0x7f1b7e10fb77 google::LogMessage::SendToLog()
@ 0x7f1b7e10d9f9 google::LogMessage::Flush()
@ 0x7f1b7e10dcfd google::LogMessageFatal::~LogMessageFatal()
@ 0x436d57 caffe::SyncedMemory::mutable_gpu_data()
@ 0x4208fe caffe::Blob<>::mutable_gpu_data()
@ 0x445dd4 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x42842a caffe::Net<>::ForwardPrefilled()
@ 0x41d319 caffe::Solver<>::Test()
@ 0x41e705 caffe::Solver<>::Solve()
@ 0x40b8dd main
@ 0x30b9c1ecdd (unknown)
@ 0x40b739 (unknown)

Aborted (core dumped)

I use exactly the network architecture defined in "imagenet.prototxt" and "imagenet_val.prototxt". My training and testing datasets are each about 20G and contain 200,000 images cropped to 256*256.

I'm a little confused at this point, so I'd like to ask for help here. Many thanks!

Contributor

sguada commented Jan 10, 2014

That error is a CUDA out-of-memory failure: cudaMalloc returned error code 2
instead of cudaSuccess (0). Your GPU doesn't have enough memory to load the
test net. You can reduce the number of images per batch for testing in
imagenet_val.prototxt to avoid that.

Sergio
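
To see why a smaller test batch helps: the memory needed for each blob grows linearly with the number of images per batch, so cutting the batch size cuts the allocation by the same factor. The following is only a rough sketch, assuming 4-byte float values; the function names and the example shapes are hypothetical and chosen for illustration, not read from Caffe or from the prototxt files mentioned above.

```python
# Rough estimate of how blob memory scales with the test batch size.
# Assumptions (not from the thread): values are stored as 4-byte floats,
# and the per-image shapes below are illustrative placeholders only.

BYTES_PER_FLOAT = 4


def blob_bytes(batch_size, channels, height, width):
    """Bytes needed for one 4-D blob of shape (N, C, H, W)."""
    return batch_size * channels * height * width * BYTES_PER_FLOAT


def total_mib(batch_size, per_image_shapes):
    """Sum the sizes of several blobs for a given batch size, in MiB."""
    total = sum(blob_bytes(batch_size, c, h, w) for (c, h, w) in per_image_shapes)
    return total / (1024 ** 2)


# Hypothetical per-image shapes: an input crop and one convolution output.
example_shapes = [(3, 227, 227), (96, 55, 55)]

if __name__ == "__main__":
    for n in (256, 50, 10):
        print(f"batch size {n:3d}: about {total_mib(n, example_shapes):.1f} MiB")
```

A real network holds many more blobs (plus weights), so the absolute numbers here are far below actual usage; the point is only the linear dependence on batch size.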

Author

dilincv commented Jan 10, 2014

Thanks! It works for me.

lukeyeager added a commit to lukeyeager/caffe that referenced this issue Sep 3, 2015
Fix potential CNMEM_NOT_INITIALIZED errors
dkoes added a commit to gnina/caffe that referenced this issue Jun 4, 2018
extend pycaffe solver API for GAN training