MemoryDataLayer Problem #2334
Hi,

when choosing MemoryDataLayer as input in Python I get segmentation faults and `Check failed: status == CUBLAS_STATUS_SUCCESS (14 vs. 0) CUBLAS_STATUS_INTERNAL_ERROR`.

For small data sets it runs for a while until the Python kernel crashes with the above error; for large data sets the error occurs immediately. Tested on both OSX and Linux with the latest master branch, same issue on both. However, when using the HDF5 layer as input for exactly the same data / network, it works perfectly. I used the networks as defined in the tutorial, adapted to use MemoryDataLayer:
http://nbviewer.ipython.org/github/BVLC/caffe/blob/tutorial/examples/01-learning-lenet.ipynb

I am also training / testing on the MNIST data set. Setting up the data and the solver is the same as in the example code.

Comments
How are you initially setting train_set_x and y?
I tried with MNIST data and also tried zero data.
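A minimal sketch of the kind of setup being discussed, assuming a solver prototxt whose net begins with a MemoryData layer; the file name and the random stand-in data are placeholders, while `set_input_arrays` is the actual pycaffe call:

```python
import numpy as np
import caffe

caffe.set_mode_gpu()
# Hypothetical solver prototxt whose net begins with a MemoryData layer.
solver = caffe.SGDSolver('lenet_memory_solver.prototxt')

n = 500  # sample count; MemoryDataLayer wants a multiple of its batch_size
train_set_x = np.random.rand(n, 1, 28, 28).astype(np.float32)  # stand-in for 28x28 MNIST images
train_set_y = np.random.randint(0, 10, size=n).astype(np.float32)  # one label per image

# Zero-data variant used for debugging:
# train_set_x = np.zeros((n, 1, 28, 28), dtype=np.float32)

solver.net.set_input_arrays(train_set_x, train_set_y)
solver.step(1)
```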
Will hopefully play with this this weekend, as it's something I need to achieve soon. Hope I'm able to shed some light.
Does this happen in `Debug` mode?
I tested it on many machines, such as Ubuntu 14.04 with a GeForce GTX 760, and on a MacBook Pro (Late 2013) with OSX 10.10.3. It happens in release mode. The stack trace:
Right, can you try in `Debug` mode? That's just the error above, not the stack trace; was anything printed below? (I believe there might be an issue where stack traces don't get printed in pycaffe, so there might not be; in that case you would need to use gdb.)
Yes, still the same error in `Debug` mode:
Okay, thanks. In that case, can you run it under gdb and get a stack trace?
Interestingly enough, when I run it in gdb:
It seems like it doesn't get the right pointer for the train data set. Here is some output of the memory layer that I print to the console during training/testing:
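One way such printing can be done from the Python side, continuing the sketch above; the blob names `data` and `label` are assumptions about the net definition:

```python
# Inspect what the MemoryDataLayer actually feeds forward (blob names
# 'data' and 'label' are assumptions about the net definition).
solver.step(1)
print('data  min/max:', solver.net.blobs['data'].data.min(),
      solver.net.blobs['data'].data.max())
print('label min/max:', solver.net.blobs['label'].data.min(),
      solver.net.blobs['label'].data.max())
```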
Ah, that's what I was looking for; I forgot that those checks are only there in `Debug` mode. Testing first is normal (unless you set the option `test_initialization: false` in the solver).
Yes, I checked the input in Python with min/max; it's fine. When using zero input, same problem:
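The min/max check and the zero-input experiment, sketched with the same placeholder arrays as above:

```python
import numpy as np

# Value range, dtype and layout of the arrays handed to set_input_arrays.
print(train_set_x.min(), train_set_x.max())   # sane values, no garbage
print(train_set_x.dtype)                      # must be float32
print(train_set_x.flags['C_CONTIGUOUS'])      # must be C-contiguous

# Zero input reproduces the same crash:
train_set_x = np.zeros_like(train_set_x)
solver.net.set_input_arrays(train_set_x, train_set_y)
```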
Interestingly enough, the data seems to be initialized correctly. When printing right after `set_input_arrays`:
I am coming closer to the source of the error. When I make the training set smaller, say from 500 to 250 elements, it works without problems. Just a reminder: I am working with 28x28 images, so the amount of memory required is not very large, and the machine(s) I use have sufficient memory (128 GB).
How about 257 items? You see where I'm going...
Well, I was using a batch size of 50 so far, so the next bigger multiple of 50 is 300, which also gives me an error. However, when setting the batch size to 1, I also get an error with 250 elements, although a different one (see below). I just noticed that I also happen to get this error with a batch size of 50 and 250 training samples.
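One constraint worth keeping in mind when varying these sizes: to my knowledge MemoryDataLayer requires the sample count to be a multiple of the layer's batch_size, so a Python-side guard might look like this (the batch size value is a placeholder):

```python
import numpy as np

# Trim the arrays so the sample count is a multiple of the batch size,
# since MemoryDataLayer checks n % batch_size == 0 when the data is set.
batch_size = 50  # must match the MemoryData layer's batch_size
n = (len(train_set_x) // batch_size) * batch_size
solver.net.set_input_arrays(
    np.ascontiguousarray(train_set_x[:n]),
    np.ascontiguousarray(train_set_y[:n]))
```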
Interesting: when setting the number of training elements to 257 with a batch size of 1, I also get the good old label error:
And 255?
That's more or less fine; that's just the label check firing.
I think it is a problem of boost::python. When calling `set_input_arrays`, the pointer to the underlying data does not seem to stay valid.
For the meantime, I make a deep copy of the array and then it works fine.
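A sketch of that workaround from the Python side, using the placeholder arrays from above: give the layer its own contiguous copies and keep them referenced, so the buffer handed over through boost::python cannot be freed or moved.

```python
import numpy as np

# Deep-copy workaround: give the layer contiguous float32 copies and keep
# the references alive for as long as the net uses them.
data_copy = np.ascontiguousarray(train_set_x, dtype=np.float32).copy()
labels_copy = np.ascontiguousarray(train_set_y, dtype=np.float32).copy()
solver.net.set_input_arrays(data_copy, labels_copy)
solver.solve()
# data_copy / labels_copy must stay referenced until solving finishes.
```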
Good to hear. If you've got working code I'd really appreciate seeing it.
Sure, you can have a look. It's just minor modifications: https://github.com/TJKlein/caffe/commit/5f1bb97a587043dbe0892466b866abfe4c76804c
I have been facing the same problem for a few days too. I also believe the problem is in the Python part and/or the memory layer part, since it works when I convert my data to an LMDB and run it from the terminal itself. So @TJKlein, I made the line-by-line changes from your commit and did a `make all`. All went fine until I ran `make runtest` and got the following failed test:

So, are you sure the problem is really solved for you?
That's weird. For me it works well (on three different machines) and it passes all the tests. Technically, I am working on a different branch, but I don't see why that should have any effect.
Oh, so now I did a `make clean` and rebuilt, and got no errors. Seems like the Makefile isn't that complete after all.
Thanks for the fix. One minor issue: on top of the fix, I initialized `data_` and `labels_` to `NULL` in the constructor, and then my tests passed. Your test passed because, luckily, your memory allocator initializes the memory to 0.
I don't know why the label is constrained to 1D: `top[1]->Reshape(label_shape);` ???
I have faced the same problem when using SegNet to do semantic segmentation. The reason turned out to be mislabeled ground truth.
Hi @baidut, I have the same problem. I tried to change the label images, but it didn't help. Any ideas?
Hi @zeevikal, make sure your label images are saved as .png so that they really store 0s and 1s (JPEG compression corrupts the label values). In MATLAB:

```matlab
I = imread('old_label.jpg');
imwrite(im2bw(I), 'new_label.png');
```

If not, your problem should not be related to the dataset labels; please try other solutions.
Hi @baidut, my label images are *.png (grayscale without alpha channel). I didn't understand "to store 0s and 1s"; what do you mean? By the way, I'm coding in Python. Thanks a lot!
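Since you're in Python: a rough equivalent of the MATLAB conversion above, sketched with Pillow and numpy (the file names are placeholders):

```python
import numpy as np
from PIL import Image

# Load the old label image as grayscale and threshold it to 0/1, like im2bw.
label = np.array(Image.open('old_label.jpg').convert('L'))
binary = (label > 127).astype(np.uint8)

# PNG is lossless, so the 0s and 1s are stored exactly as written.
Image.fromarray(binary).save('new_label.png')
```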
@baidut Hi, I'm facing the same problem. I have checked that my ground-truth images are 0/1 images saved as .png, but when I use SoftmaxLoss I still hit this problem; when I use SigmoidCrossEntropy, the problem is gone. Can you help me with this?
Closing according to #5528. |