You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The splitstore compaction algorithm currently includes a sorting step, whereby objects to be deleted are kept in memory and sorted so that the constituents of an object are never deleted before the object itself.
This ensures that there can be no dangling references, which is a critical safety property for two reasons:
the VM cannot recreate an object with dangling references during compaction
compaction is crash-safe -- there are no danglig references if lotus crashes during cold object purge.
The Problem
Unfortunately, this important property does not come for free:
it takes a long time (up to 50% of compaction time) to sort objects, as computing object weights needs to walk all those objects and recursively compute their weights.
it uses a lot of transient memory, which is linear in the size of the cold object set. This creates problems and can lead to OOMs in memory constrained situations (eg lotus running with 32G or less RAM). Furthermore, it makes it very difficult or impossible to recover from long down times where the node needs to resync -- see Issues with syncing from scratch and long resyncs in the splitstore #6769.
Sort-free Compaction
With the introduction of on-disk marksets, we have pieces of the solution already.
The key property of the reworked algorithm is that we consider marking atomic, Once we are done marking, everything not in the markset is considered deleted and the VM must recreate the object, which can now be tracked in the markset; this can be accomplished at the interface boundary.
The basic idea is that we keep both the cold set and the markset on disk, flushed at the end of marking, and recovery of a failed purge if we happen to crash during the critical section (when we actually delete objects in the hotstore).
The critical property we must ensure is that the markset is always flushed to disk before we start deletion and that the markset themselves are crash-safe. This will need some work in our current implementation, both in the map markset backend so that it is flushed on disk and in the badger markset to flush and directly write once we have flushed.
Another observation is that with cold object reification (see #6726) we only need to do this when we operate with the discard coldstore.
It is fine to leave dangling references in the hotstore if we do have a coldstore to fallback on, as these will be fixed once there is an access to them with reification -- a nice self-healing property.
The text was updated successfully, but these errors were encountered:
Context
The splitstore compaction algorithm currently includes a sorting step, whereby objects to be deleted are kept in memory and sorted so that the constituents of an object are never deleted before the object itself.
This ensures that there can be no dangling references, which is a critical safety property for two reasons:
The Problem
Unfortunately, this important property does not come for free:
Sort-free Compaction
With the introduction of on-disk marksets, we have pieces of the solution already.
The key property of the reworked algorithm is that we consider marking atomic, Once we are done marking, everything not in the markset is considered deleted and the VM must recreate the object, which can now be tracked in the markset; this can be accomplished at the interface boundary.
The basic idea is that we keep both the cold set and the markset on disk, flushed at the end of marking, and recovery of a failed purge if we happen to crash during the critical section (when we actually delete objects in the hotstore).
The critical property we must ensure is that the markset is always flushed to disk before we start deletion and that the markset themselves are crash-safe. This will need some work in our current implementation, both in the map markset backend so that it is flushed on disk and in the badger markset to flush and directly write once we have flushed.
Another observation is that with cold object reification (see #6726) we only need to do this when we operate with the discard coldstore.
It is fine to leave dangling references in the hotstore if we do have a coldstore to fallback on, as these will be fixed once there is an access to them with reification -- a nice self-healing property.
The text was updated successfully, but these errors were encountered: