Can't serialize when protobuf size > 2gb #2006
Comments
Yeah -- this is a limitation commented on in #1756. Of the two methods suggested there, (1) could be a quick workaround if you need it, while (2) seems more sustainable.
Okay, #1756 leaves something to be desired in terms of checks at the time of writing a blob, which seems to be the more likely time you would encounter this issue. Message types have a ByteSize() method that could be checked, although its computation is not free and its result is not safe from overflow above 2 GB. (2) and (3) would work, but migrating away from protobufs seems a little extreme given how central they are to the Caffe ecosystem. For now, I stopped serializing layers containing duplicate shared parameter blobs, which decreased my writes by a factor of 4 and brought me under the limit. Thanks for the quick reply; it helped a lot in my decision making. Somehow I missed those other issues despite googling several times.
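For what it's worth, a rough sketch of that last workaround might look like the following. This is illustrative only, not Caffe's actual snapshot code; SnapshotWithoutSharedCopies and owns_params are made-up names for the example.

```cpp
// Rough illustrative sketch (not Caffe's snapshot code): when assembling the
// NetParameter to write, drop the parameter blobs of layers that merely share
// weights owned by another layer, so each shared buffer is serialized once
// instead of once per sharing layer.
#include <vector>
#include "caffe/proto/caffe.pb.h"

caffe::NetParameter SnapshotWithoutSharedCopies(
    const std::vector<caffe::LayerParameter>& layers,
    const std::vector<bool>& owns_params) {  // true if layer i owns its blobs
  caffe::NetParameter snapshot;
  for (size_t i = 0; i < layers.size(); ++i) {
    caffe::LayerParameter* layer = snapshot.add_layer();
    *layer = layers[i];
    if (!owns_params[i]) {
      layer->clear_blobs();  // skip the duplicate copy of shared weights
    }
  }
  return snapshot;
}
```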
Right, there is an open issue for weight sharing polish that includes […]
@shelhamer Okay, I've opened up some of my code here: https://github.com/Russell91/nlp_caffe. There's really too much to submit as a single pull request, but many of the weight sharing issues have been solved by accumulating diffs in a master buffer as you go through the backward() calls. If you take a look and let me know what you would want to see in a pull request to dev, I'll try to put something together.
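For readers following along, here is a much-simplified sketch of that accumulation idea. Blob, param_owners, and AccumulateSharedDiffs are stand-ins for illustration, not the actual nlp_caffe code.

```cpp
// Simplified sketch: during backward(), gradients of parameters that share
// weights are summed into a single "master" diff buffer owned by one of them,
// so the shared weight sees the combined contribution exactly once.
#include <cstddef>
#include <vector>

struct Blob {
  std::vector<float> data;
  std::vector<float> diff;
};

// param_owners[i] == -1 means parameter i owns its buffers; otherwise it is
// the index of the owning parameter whose diff accumulates the shared grads.
void AccumulateSharedDiffs(std::vector<Blob>& params,
                           const std::vector<int>& param_owners) {
  for (std::size_t i = 0; i < params.size(); ++i) {
    const int owner = param_owners[i];
    if (owner < 0) continue;  // this parameter already owns its diff
    for (std::size_t k = 0; k < params[i].diff.size(); ++k) {
      params[owner].diff[k] += params[i].diff[k];
    }
  }
}
```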
@shelhamer Have you already evaluated Google FlatBuffers?
I'm getting segfaults from WriteProtoToBinaryFile with large models. It seems that protobuf simply doesn't support writing messages greater than 2 GB (marbl/harvest-tools#3). Does anyone have a workaround they would suggest? It would be nice if we added a check on the message size to give a nicer error message, if nothing else (a sketch of such a check is included after the backtrace below).
Here are the details:
I0228 14:22:15.306717 23672 solver.cpp:355] Snapshotting to /snapshots/caffe_iter_10.caffemodel
Program received signal SIGSEGV, Segmentation fault.
0x000000000047fa74 in caffe::LayerParameter::GetCachedSize (
this=0x903d621f263c2284) at .build_debug/src/caffe/proto/caffe.pb.h:2063
The backtrace from gdb:
#0 0x000000000047fa74 in caffe::LayerParameter::GetCachedSize (
#1 0x000000000048cedd in google::protobuf::internal::WireFormatLite::WriteMessageNoVirtualToArray<caffe::LayerParameter> (field_number=2, value=...,
#2 0x0000000000433cbe in caffe::NetParameter::SerializeWithCachedSizesToArray
#3 0x00007ffff2094d0c in google::protobuf::MessageLite::SerializePartialToCodedStream(google::protobuf::io::CodedOutputStream*) const ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#4 0x00007ffff2094dc5 in google::protobuf::MessageLite::SerializeToCodedStream(google::protobuf::io::CodedOutputStream*) const ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#5 0x00007ffff2094f01 in google::protobuf::MessageLite::SerializeToZeroCopyStream(google::protobuf::io::ZeroCopyOutputStream*) const ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#6 0x00007ffff20ea20b in google::protobuf::Message::SerializeToOstream(std::ostream*) const () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#7 0x00000000004a98fe in caffe::WriteProtoToBinaryFile (proto=...,
#8 0x0000000000497249 in caffe::Solver::Snapshot (this=0x50f9e90)
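For illustration, a guard along the lines suggested above could fail with a readable error instead of a segfault. This is only a sketch, not an actual Caffe patch: the function name is hypothetical, and it assumes a protobuf release that provides Message::ByteSizeLong(), since the int-returning ByteSize() itself overflows above 2 GB.

```cpp
// Illustrative only: refuse to serialize messages over protobuf's 2 GB limit
// with a clear error instead of crashing inside the serializer.
#include <cstddef>
#include <fstream>
#include <google/protobuf/message.h>
#include <glog/logging.h>

void WriteProtoToBinaryFileChecked(const google::protobuf::Message& proto,
                                   const char* filename) {
  const size_t kProtobufLimit = 1ULL << 31;  // 2 GB serialized-size limit
  const size_t byte_size = proto.ByteSizeLong();
  CHECK_LT(byte_size, kProtobufLimit)
      << "Cannot snapshot: message is " << byte_size
      << " bytes, but protobuf refuses messages larger than 2 GB.";
  std::ofstream output(filename, std::ios::trunc | std::ios::binary);
  CHECK(proto.SerializeToOstream(&output))
      << "Failed to write proto to " << filename;
}
```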