
kvs object store needs garbage collection #258

Closed · garlick opened this issue Jul 10, 2015 · 5 comments
@garlick (Member) commented Jul 10, 2015

The kvs object store currently grows without bound.

This is hard to address in the current design. Reference counting would add overhead to the system. Periodically walking the namespace to find disconnected objects ignores the fact that eventually consistent slaves may still be using those objects, or that clients may be traversing old versions of the namespace after kvsdir_t is turned into a "snapshot reference" per issue #64.

@trws (Member) commented Mar 25, 2018

This is potentially becoming relevant again. A long-running job I have for testing longevity has reached 200,000 jobs, none of which use the kvs for IO, and its sqlite database is up to 26GB at this point.

Do you think it would be reasonable to do something like etcd, where older versions can be traversed, but only up to a certain distance back? In their case the limit is a certain number of updates to a value, on the order of 1000 or so, but we might be able to establish a limit beyond which a client couldn't expect to look back in time, to ease this a bit.
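
For comparison, here is a toy sketch of that bounded look-back window. It is illustrative only; the class and its behavior are made up for this comment, not etcd's or Flux's actual mechanism:

```python
from collections import OrderedDict

class BoundedHistory:
    """Toy store that keeps only the last max_versions root snapshots,
    loosely modeled on etcd's compaction window (hypothetical)."""

    def __init__(self, max_versions=1000):
        self.max_versions = max_versions
        self.versions = OrderedDict()  # version number -> root blobref
        self.current = 0

    def commit(self, rootref):
        self.current += 1
        self.versions[self.current] = rootref
        # Compact: drop versions that fall outside the window.
        while len(self.versions) > self.max_versions:
            self.versions.popitem(last=False)

    def lookup(self, version):
        if version not in self.versions:
            raise LookupError(f"version {version} has been compacted")
        return self.versions[version]
```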

@trws (Member) commented Mar 28, 2018

An idle thought: if we could tell the kvs explicitly that nothing will ever look at a given key again, would it be reasonable to clear out all data that was ever used to represent that key? I'm thinking specifically of something we could include with purge to take care of old lwj data that we're explicitly deleting.

@garlick (Member, Author) commented Mar 29, 2018

Hmm, not sure how that would work, since content blobs can be pointed to from multiple keys/directories and the content store is inherently deduplicating, with no refcount/back-reference data kept with the blobs.

It seems like we need something like git-gc here, to identify "unreachable objects". For us this is complicated by multiple namespaces sharing one content store, the possible existence of content references outside of any KVS namespace, and the difficulty of taking the KVS offline for any length of time to walk every reference.

I like the etcd idea. I wonder as a first cut if we could add an epoch to each content blob and then periodically walk the namespace(s), updating the epoch for all currently-referenced blobs? Then purge all blobs whose epoch is older than some threshold.
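
A rough sketch of that first cut, with entirely hypothetical content_store and blob helpers (the real content service keeps no per-blob metadata like this today):

```python
import time

def epoch_gc(content_store, namespace_roots, max_age):
    """Epoch-based GC sketch (all APIs hypothetical): stamp every blob
    reachable from a current namespace root, then purge blobs whose
    stamp is older than max_age seconds."""
    now = time.time()

    # Mark: walk each namespace from its root, refreshing epochs.
    for rootref in namespace_roots:
        seen = set()
        stack = [rootref]
        while stack:
            blobref = stack.pop()
            if blobref in seen:
                continue
            seen.add(blobref)
            blob = content_store.load(blobref)
            blob.epoch = now
            stack.extend(blob.child_refs())  # empty for leaf blobs

    # Sweep: purge anything whose epoch fell behind the threshold.
    for blobref in list(content_store.refs()):
        if now - content_store.load(blobref).epoch > max_age:
            content_store.delete(blobref)
```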

The other thing that seems worth pursuing is to add some "persistence flags" to a namespace (see the sketch after this list) to handle:

  • namespaces that need no persistence at all, like the per-job PMI namespace
  • namespaces that might only need their "final" snapshot captured (some jobs maybe?)
  • namespaces that require strong persistence (on disk after every commit, say)
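
To make that concrete, the three cases might map onto something like this (names are illustrative, not a proposed Flux API):

```python
from enum import Enum

class Persistence(Enum):
    """Hypothetical per-namespace persistence levels (not a Flux API)."""
    NONE = "none"      # never written to the backing store (e.g. per-job PMI)
    FINAL = "final"    # only the final root snapshot is captured
    STRONG = "strong"  # flushed to disk after every commit
```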

Another thought is that the current "write back" cache on rank 0 might have some opportunity to avoid writing some objects to the backing store at all, e.g. if they are "dereferenced" before being written. Content blobs in flight could carry some additional flags that affect their persistence.

Just thinking out loud really, more discussion/thought needed.

@trws mentioned this issue Jul 10, 2018
@SteVwonder (Member) commented:

Per the coffee discussion today:

> For us this is complicated by multiple namespaces sharing one content store

@trws suggested that we could have a separate content store for each namespace. For a job's guest namespace, the final "snapshot" could be copied into the main content store and the guest content store deleted.
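
A sketch of that copy step, assuming hypothetical guest_store/main_store objects with has/load/store methods:

```python
def export_final_snapshot(guest_store, main_store, final_rootref):
    """Hypothetical sketch: copy every blob reachable from a guest
    namespace's final root into the main content store, after which
    the guest store can be deleted wholesale."""
    stack = [final_rootref]
    while stack:
        blobref = stack.pop()
        if main_store.has(blobref):
            continue  # deduplicated: already present in the main store
        blob = guest_store.load(blobref)
        main_store.store(blobref, blob)
        stack.extend(blob.child_refs())  # empty for leaf blobs
```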

> It seems like we need something like git-gc here, to identify "unreachable objects".

It was also mentioned that, for our TOSS4 timeline, we could start running this gc process after an instance restart, as long as the job shells don't try to access "old"/"stale" references (that code will need some auditing).

@garlick (Member, Author) commented Feb 24, 2022

Let's say that this issue can be closed if we can garbage collect the content store on the way up from an instance restart, based on following the last-written root blobref checkpoint, and deleting everything that's not referenced.

Sort of like WALL-E. The trash piles up, then we send Flux away until the robots finish cleaning up. What could go wrong?

Let's save KVS redesign with refcounting for another day/issue.
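
In pseudocode, the restart-time sweep described above might look like this (helper names hypothetical; the instance is down, so nothing races the walk):

```python
def gc_on_restart(content_store, checkpoint_rootref):
    """Restart-time GC sketch (hypothetical APIs): mark everything
    reachable from the last checkpointed root blobref, then delete
    the rest."""
    reachable = set()
    stack = [checkpoint_rootref]
    while stack:
        blobref = stack.pop()
        if blobref in reachable:
            continue
        reachable.add(blobref)
        stack.extend(content_store.load(blobref).child_refs())

    for blobref in list(content_store.refs()):
        if blobref not in reachable:
            content_store.delete(blobref)
```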

@garlick added this to the flux-core v0.39.0 milestone May 2, 2022
garlick added a commit to garlick/flux-core that referenced this issue May 2, 2022
Problem: a system instance that runs flux-dump(1) from rc3
might get killed by systemd TimeoutStopSec.

Have flux-shutdown(1) arrange for the dump.  If the instance is
being shut down by this method, then systemctl stop is not being run,
so TimeoutStopSec does not apply.

Fixes flux-framework#258
@mergify (bot) closed this as completed in 358f21b May 2, 2022