mv startup segfault
- merged to master - January 19, 2016
- code complete - January 7, 2016
- development started - January 7, 2016
A customer reported random eleveldb.so segfaults during Riak startup with the newly released Riak 2.1.3. The segfault occurred in VersionSet::PickCompaction(), a function that contained no new code. Analysis of the code paths that can reach PickCompaction() led to the observation that a background thread could use a new DBImpl object before its construction completed.
This behavior results from the interaction of two features created years apart. The flexcache feature (automated cache sizing) added a global list that tracks every DBImpl object created. Tracking the objects lets code dynamically adjust the cache sizes of all DBImpl objects as DBImpl objects are created and destroyed. Recent compaction selection code also uses this global DBImpl list to initiate grooming compactions based upon time and/or the completion of grooming in a different DBImpl object.
The original flexcache feature did not need to worry about secondary threads interacting with a DBImpl object during its creation, so the call that adds the new DBImpl to the global list was placed somewhat arbitrarily within the constructor. As a result, the new DBImpl was added to the global list before it was fully initialized. Once an object is on the list, other threads can use it, and the new grooming code did. When the background compaction logic used the uninitialized components of the DBImpl object, the result was a segfault.
The DBImpl::DBImpl constructor now adds the new object to the global list of all DBImpl objects only after initialization completes.
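The ordering fix can be illustrated with a minimal sketch. It is not the leveldb code: `DBImplSketch`, `g_db_list`, `g_db_list_mutex`, and `VisitAllDatabases()` are hypothetical stand-ins for DBImpl, the flexcache global list, and the background grooming/cache pass that walks that list. The constructor registers the object as its very last step, and the destructor unregisters it first, so any thread that reaches an object through the list sees it fully initialized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

class DBImplSketch;  // hypothetical stand-in for DBImpl

// Hypothetical global registry, analogous to the flexcache list of all DBImpl objects.
std::mutex g_db_list_mutex;
std::vector<DBImplSketch*> g_db_list;

class DBImplSketch {
 public:
  explicit DBImplSketch(const std::string& name)
      : name_(name), cache_size_(0), initialized_(false) {
    // All member initialization happens first...
    cache_size_ = 8 * 1024 * 1024;  // placeholder value, not a leveldb default
    initialized_ = true;

    // ...and registration is the constructor's final step, so other threads
    // never reach a partially constructed object through the list.
    std::lock_guard<std::mutex> lock(g_db_list_mutex);
    g_db_list.push_back(this);
  }

  ~DBImplSketch() {
    // Unregister before any member is torn down.
    std::lock_guard<std::mutex> lock(g_db_list_mutex);
    g_db_list.erase(std::remove(g_db_list.begin(), g_db_list.end(), this),
                    g_db_list.end());
  }

  bool initialized() const { return initialized_; }
  std::size_t cache_size() const { return cache_size_; }

 private:
  std::string name_;
  std::size_t cache_size_;
  bool initialized_;
};

// Hypothetical background pass standing in for grooming or cache-resizing code.
// Returns how many registered objects it visited.
std::size_t VisitAllDatabases() {
  std::lock_guard<std::mutex> lock(g_db_list_mutex);
  std::size_t visited = 0;
  for (const DBImplSketch* db : g_db_list) {
    assert(db->initialized());  // holds because registration happens last
    ++visited;
  }
  return visited;
}
```

Holding the mutex while walking the list also means a destructor blocks until the pass finishes, so an object cannot be torn down while the pass is looking at it.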
This still leaves an opening for similar segfaults should someone derive a class from DBImpl: the DBImpl constructor would register the object before the derived class's constructor has run. Deriving from DBImpl is not currently supported, so the code does not defend against it. The likely fix would be to add the DBImpl object to the global list in the DB::Open() routine, after the "new DBImpl" call.
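The suggested DB::Open() approach can be sketched as follows, assuming registration moves entirely out of the constructor into a factory function. `DBSketch`, `DerivedDBSketch`, `OpenSketch()`, `CloseSketch()`, and `ListKinds()` are hypothetical names, not the leveldb API (the real `DB::Open()` has a different signature). Because registration happens only after `new` returns, every base and derived constructor has finished, and the symmetric `CloseSketch()` unregisters before `delete` starts destroying the object.

```cpp
#include <algorithm>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical base class standing in for DBImpl; its constructor does NOT register.
class DBSketch {
 public:
  virtual ~DBSketch() = default;
  virtual std::string Kind() const { return "base"; }
};

// Hypothetical derived class, the case the wiki notes is not currently supported.
class DerivedDBSketch : public DBSketch {
 public:
  std::string Kind() const override { return "derived"; }
};

std::mutex g_list_mutex;
std::vector<DBSketch*> g_list;

// Registers only after `new` returns, i.e. after every constructor in the
// hierarchy has completed (mirrors adding to the list after "new DBImpl").
template <typename T>
T* OpenSketch() {
  T* db = new T();
  std::lock_guard<std::mutex> lock(g_list_mutex);
  g_list.push_back(db);
  return db;
}

// Unregisters before deletion, so no other thread can reach the object
// through the list while its destructors run.
void CloseSketch(DBSketch* db) {
  {
    std::lock_guard<std::mutex> lock(g_list_mutex);
    g_list.erase(std::remove(g_list.begin(), g_list.end(), db), g_list.end());
  }
  delete db;
}

// Hypothetical background pass: reports the dynamic type of each registered object.
std::vector<std::string> ListKinds() {
  std::lock_guard<std::mutex> lock(g_list_mutex);
  std::vector<std::string> kinds;
  for (const DBSketch* db : g_list) kinds.push_back(db->Kind());
  return kinds;
}
```

Had the base constructor registered the object instead, a background thread could reach a `DerivedDBSketch` before its derived part was constructed, which is the same class of bug described above.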