Can't serialize when protobuf size > 2gb #2006
Comments
Yeah -- this is a limitation commented on in #1756. Of the two methods suggested there, (1) could be a quick workaround if you need it, while (2) seems more sustainable.
Okay, #1756 leaves something to be desired in terms of checks at the time of writing a blob, which seems to be the more likely time you would encounter this issue. Message types have a ByteSize() method that could be checked, although its computation is not free and its result is not safe from overflow above 2 GB. (2) and (3) would work, but migrating away from protobufs seems a little extreme given how central they are to the Caffe ecosystem. For now, I stopped serializing layers containing duplicate shared parameter blobs, which decreased my writes by a factor of 4 and brought me under the limit. Thanks for the quick reply; it helped a lot in my decision making. Somehow I missed those other issues despite googling several times.
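For what it's worth, a rough sketch of that last workaround might look like the following. This is illustrative only, not Caffe's actual snapshot code; SnapshotWithoutSharedCopies and owns_params are made-up names for the example.

```cpp
// Rough illustrative sketch (not Caffe's snapshot code): when assembling the
// NetParameter to write, drop the parameter blobs of layers that merely share
// weights owned by another layer, so each shared buffer is serialized once
// instead of once per sharing layer.
#include <vector>
#include "caffe/proto/caffe.pb.h"

caffe::NetParameter SnapshotWithoutSharedCopies(
    const std::vector<caffe::LayerParameter>& layers,
    const std::vector<bool>& owns_params) {  // true if layer i owns its blobs
  caffe::NetParameter snapshot;
  for (size_t i = 0; i < layers.size(); ++i) {
    caffe::LayerParameter* layer = snapshot.add_layer();
    *layer = layers[i];
    if (!owns_params[i]) {
      layer->clear_blobs();  // skip the duplicate copy of shared weights
    }
  }
  return snapshot;
}
```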
Right, there is an open issue for weight sharing polish that includes […]
@shelhamer Okay, I've opened up some of my code here: https://github.com/Russell91/nlp_caffe. There's really too much to submit as a single pull request, but many of the weight sharing issues have been solved by accumulating diffs in a master buffer as you go through the backward() calls. If you take a look and let me know what you would want to see in a pull request to dev, I'll try to put something together.
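For readers following along, here is a much-simplified sketch of that accumulation idea. Blob, param_owners, and AccumulateSharedDiffs are stand-ins for illustration, not the actual nlp_caffe code.

```cpp
// Simplified sketch: during backward(), gradients of parameters that share
// weights are summed into a single "master" diff buffer owned by one of them,
// so the shared weight sees the combined contribution exactly once.
#include <cstddef>
#include <vector>

struct Blob {
  std::vector<float> data;
  std::vector<float> diff;
};

// param_owners[i] == -1 means parameter i owns its buffers; otherwise it is
// the index of the owning parameter whose diff accumulates the shared grads.
void AccumulateSharedDiffs(std::vector<Blob>& params,
                           const std::vector<int>& param_owners) {
  for (std::size_t i = 0; i < params.size(); ++i) {
    const int owner = param_owners[i];
    if (owner < 0) continue;  // this parameter already owns its diff
    for (std::size_t k = 0; k < params[i].diff.size(); ++k) {
      params[owner].diff[k] += params[i].diff[k];
    }
  }
}
```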
@shelhamer Have you already evaluated Google FlatBuffers?
I'm getting segfaults from WriteProtoToBinaryFile with large models. It seems that protobuf simply doesn't support writing messages greater than 2 GB (marbl/harvest-tools#3). Does anyone have a workaround they would suggest? It would be nice if we added a check on the message size to give a nicer error message, if nothing else (a sketch of such a check is included after the backtrace below).
Here are the details:
I0228 14:22:15.306717 23672 solver.cpp:355] Snapshotting to /snapshots/caffe_iter_10.caffemodel
Program received signal SIGSEGV, Segmentation fault.
0x000000000047fa74 in caffe::LayerParameter::GetCachedSize (
this=0x903d621f263c2284) at .build_debug/src/caffe/proto/caffe.pb.h:2063
The backtrace from gdb:
#0 0x000000000047fa74 in caffe::LayerParameter::GetCachedSize (
#1 0x000000000048cedd in google::protobuf::internal::WireFormatLite::WriteMessageNoVirtualToArray<caffe::LayerParameter> (field_number=2, value=...,
#2 0x0000000000433cbe in caffe::NetParameter::SerializeWithCachedSizesToArray
#3 0x00007ffff2094d0c in google::protobuf::MessageLite::SerializePartialToCodedStream(google::protobuf::io::CodedOutputStream*) const ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#4 0x00007ffff2094dc5 in google::protobuf::MessageLite::SerializeToCodedStream(google::protobuf::io::CodedOutputStream*) const ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#5 0x00007ffff2094f01 in google::protobuf::MessageLite::SerializeToZeroCopyStream(google::protobuf::io::ZeroCopyOutputStream*) const ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#6 0x00007ffff20ea20b in google::protobuf::Message::SerializeToOstream(std::ostream*) const () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.8
#7 0x00000000004a98fe in caffe::WriteProtoToBinaryFile (proto=...,
#8 0x0000000000497249 in caffe::Solver::Snapshot (this=0x50f9e90)
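For illustration, a guard along the lines suggested above could fail with a readable error instead of a segfault. This is only a sketch, not an actual Caffe patch: the function name is hypothetical, and it assumes a protobuf release that provides Message::ByteSizeLong(), since the int-returning ByteSize() itself overflows above 2 GB.

```cpp
// Illustrative only: refuse to serialize messages over protobuf's 2 GB limit
// with a clear error instead of crashing inside the serializer.
#include <cstddef>
#include <fstream>
#include <google/protobuf/message.h>
#include <glog/logging.h>

void WriteProtoToBinaryFileChecked(const google::protobuf::Message& proto,
                                   const char* filename) {
  const size_t kProtobufLimit = 1ULL << 31;  // 2 GB serialized-size limit
  const size_t byte_size = proto.ByteSizeLong();
  CHECK_LT(byte_size, kProtobufLimit)
      << "Cannot snapshot: message is " << byte_size
      << " bytes, but protobuf refuses messages larger than 2 GB.";
  std::ofstream output(filename, std::ios::trunc | std::ios::binary);
  CHECK(proto.SerializeToOstream(&output))
      << "Failed to write proto to " << filename;
}
```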