
Splitstore: Rework the Compaction algorithm to eliminate sorting #7137

Closed
vyzo opened this issue Aug 19, 2021 · 0 comments · Fixed by #8008
vyzo commented Aug 19, 2021

Context

The splitstore compaction algorithm currently includes a sorting step, whereby objects to be deleted are kept in memory and sorted so that the constituents of an object are never deleted before the object itself (a sketch of the underlying weight computation follows the list below).
This ensures that there can be no dangling references, which is a critical safety property for two reasons:

  • the VM cannot recreate an object with dangling references during compaction
  • compaction is crash-safe -- there are no dangling references if lotus crashes during the cold object purge
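For intuition, here is a minimal sketch of the kind of weight computation the sort relies on; the names (`getLinks`, `weight`) are illustrative stand-ins, not the actual splitstore code. An object's weight bounds the depth of its DAG, so purging in descending-weight order deletes parents before their constituents -- but computing it requires walking every cold object:

```go
package splitstore

import (
	"github.com/ipfs/go-cid"
)

// getLinks returns the child CIDs of an object (illustrative signature).
type getLinks func(c cid.Cid) ([]cid.Cid, error)

// weight computes 1 + max(weight of children), memoized. Sorting cold
// objects by descending weight guarantees that a parent is purged before
// any of its constituents -- but the recursive walk over every cold
// object is what makes the sort expensive in both time and memory.
func weight(c cid.Cid, links getLinks, memo map[cid.Cid]int) (int, error) {
	if w, ok := memo[c]; ok {
		return w, nil
	}
	children, err := links(c)
	if err != nil {
		return 0, err
	}
	w := 1
	for _, child := range children {
		cw, err := weight(child, links, memo)
		if err != nil {
			return 0, err
		}
		if cw+1 > w {
			w = cw + 1
		}
	}
	memo[c] = w
	return w, nil
}
```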

The Problem

Unfortunately, this important property does not come for free:

  • it takes a long time (up to 50% of compaction time) to sort objects, as computing an object's weight requires walking the object and recursively computing the weights of everything it references.
  • it uses a lot of transient memory, linear in the size of the cold object set. This creates problems and can lead to OOMs in memory-constrained situations (e.g. lotus running with 32G or less RAM). Furthermore, it makes it very difficult or impossible to recover from long downtimes where the node needs to resync -- see Issues with syncing from scratch and long resyncs in the splitstore #6769.

Sort-free Compaction

With the introduction of on-disk marksets, we already have some of the pieces of the solution.

The key property of the reworked algorithm is that we consider marking atomic: once we are done marking, everything not in the markset is considered deleted, and any object the VM recreates must itself be tracked in the markset. This can be accomplished at the interface boundary, as the sketch below illustrates.
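The following is a hedged sketch of tracking writes at the interface boundary; the types and field names are stand-ins, not the actual lotus splitstore API. While compaction is in flight, every object written is also added to the markset, so a recreated object can never be purged:

```go
package splitstore

import (
	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// Minimal stand-ins for the real types; illustrative only.
type Blockstore interface {
	Put(blocks.Block) error
}

type MarkSet interface {
	Mark(cid.Cid) error
	Flush() error
}

type SplitStore struct {
	hot        Blockstore
	markset    MarkSet
	compacting func() bool // reports whether compaction is in flight
}

// Put intercepts VM writes at the interface boundary: any object
// (re)created while compaction is in flight is added to the markset,
// so everything not in the markset can safely be considered deleted.
func (s *SplitStore) Put(blk blocks.Block) error {
	if err := s.hot.Put(blk); err != nil {
		return err
	}
	if s.compacting() {
		return s.markset.Mark(blk.Cid())
	}
	return nil
}
```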

The basic idea is that we keep both the cold set and the markset on disk, flushed at the end of marking, enabling recovery from a failed purge if we happen to crash during the critical section (when we actually delete objects in the hotstore).

The critical property we must ensure is that the markset is always flushed to disk before we start deletion, and that the marksets themselves are crash-safe. This will need some work in our current implementation, both in the map markset backend, so that it can be flushed to disk, and in the badger markset, so that it flushes and then writes directly once we have flushed. The sketch below outlines the resulting ordering of the critical section.
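Putting the pieces together, the critical section could be ordered as follows. This is a sketch under the assumptions above, extending the stand-in types from the previous snippet; mark, flushColdSet, setCheckpoint, purge, and clearCheckpoint are hypothetical helpers, not the actual lotus function names:

```go
// doCompact sketches the crash-safe ordering: the markset and the cold
// set must be durably on disk before the first hotstore delete.
func (s *SplitStore) doCompact() error {
	// 1. mark reachable objects; marking is treated as atomic, and the
	//    Put hook above tracks anything the VM recreates from here on
	if err := s.mark(); err != nil {
		return err
	}
	// 2. flush the markset and the cold set to disk; both must be
	//    crash-safe before we touch the hotstore
	if err := s.markset.Flush(); err != nil {
		return err
	}
	if err := s.flushColdSet(); err != nil {
		return err
	}
	// 3. checkpoint: we are entering the critical section
	if err := s.setCheckpoint(); err != nil {
		return err
	}
	// 4. purge cold objects from the hotstore; if lotus crashes here,
	//    recovery replays the purge from the persisted cold set,
	//    skipping anything that is in the persisted markset
	if err := s.purge(); err != nil {
		return err
	}
	return s.clearCheckpoint()
}
```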

Another observation is that with cold object reification (see #6726) we only need to do this when we operate with the discard coldstore.
It is fine to leave dangling references in the hotstore if we do have a coldstore to fall back on, as these will be fixed on first access through reification -- a nice self-healing property, sketched in the read path below.
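A hedged sketch of that self-healing read path, using the stand-in types from above with Get added to the Blockstore interface and a cold field on SplitStore (again, illustrative names only): a dangling reference in the hotstore heals itself the first time it is read, because the object is fetched from the coldstore and written back.

```go
// Get falls back to the coldstore on a hotstore miss and writes the
// object back to the hotstore, healing a dangling reference on its
// first access (cf. reification in #6726).
func (s *SplitStore) Get(c cid.Cid) (blocks.Block, error) {
	blk, err := s.hot.Get(c)
	if err == nil {
		return blk, nil
	}
	// hotstore miss: fall back to the coldstore and reify
	blk, err = s.cold.Get(c)
	if err != nil {
		return nil, err
	}
	// write-back makes the reference whole again
	if err := s.hot.Put(blk); err != nil {
		return nil, err
	}
	return blk, nil
}
```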

@vyzo vyzo self-assigned this Aug 19, 2021
@vyzo vyzo added epic/splitstore team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs labels Aug 19, 2021
@jennijuju jennijuju added this to the v1.13.3 milestone Dec 8, 2021
@jennijuju jennijuju modified the milestones: v1.13.3, v1.15.0 Jan 2, 2022
@jennijuju jennijuju modified the milestones: v1.15.0, v1.15.1 Jan 17, 2022