
implement mark and sweep garbage collection #1420

Merged
merged 5 commits into dev0.4.0 from feat/mark-n-sweep on Jul 11, 2015

Conversation

whyrusleeping
Member

This PR implements mark and sweep garbage collection and completely removes the concept of indirect pinning.

@whyrusleeping whyrusleeping added the status/in-progress In progress label Jun 23, 2015
@whyrusleeping
Member Author

Not yet ready for merge; I forgot to take into account that, after rebasing on dev0.4.0, pins are stored in the blockstore, so I have to make sure to mark those during GC.

@jbenet
Member

jbenet commented Jun 25, 2015

RFCR?

@whyrusleeping
Member Author

@jbenet Yeah, I could use some CR. Not quite done with things, but the basics are there.

return nil
}

func GC(ctx context.Context, bs bstore.Blockstore, pn pin.Pinner) (<-chan key.Key, error) {
Member

Describe the algorithm in a comment here. It's important for this sort of thing that we have a record of what it should be conceptually, since that often may not match what it is.
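
For reference, a self-contained sketch of the mark-and-sweep pass being requested; the interfaces below are simplified stand-ins for the real bstore.Blockstore and pin.Pinner (and the real GC streams keys over a channel, per the signature above, rather than returning a slice):

package gc

import "context"

// Key stands in for the real key.Key type.
type Key string

// Blockstore is a stand-in exposing only what GC needs.
type Blockstore interface {
    AllKeys(ctx context.Context) ([]Key, error)
    DeleteBlock(k Key) error
    Links(k Key) ([]Key, error) // child links of the node stored under k
}

// Pinner is a stand-in for the pin state.
type Pinner interface {
    DirectKeys() []Key    // pinned, children not kept
    RecursiveKeys() []Key // pinned along with every descendant
    InternalKeys() []Key  // keys the pinner itself stores in the blockstore
}

// GC removes every block not reachable from a pin.
//
// Mark: seed a "keep" set with all direct, recursive, and internal pins,
// then walk the graph under each recursive pin, adding every descendant.
// Sweep: enumerate all keys in the blockstore and delete any key not in
// the set, returning the removed keys.
func GC(ctx context.Context, bs Blockstore, pn Pinner) ([]Key, error) {
    keep := make(map[Key]bool)

    var mark func(k Key) error
    mark = func(k Key) error {
        if keep[k] {
            return nil // already visited; graphs may share subtrees
        }
        keep[k] = true
        children, err := bs.Links(k)
        if err != nil {
            return err
        }
        for _, c := range children {
            if err := mark(c); err != nil {
                return err
            }
        }
        return nil
    }

    for _, k := range pn.RecursiveKeys() {
        if err := mark(k); err != nil {
            return nil, err
        }
    }
    for _, k := range pn.DirectKeys() {
        keep[k] = true
    }
    for _, k := range pn.InternalKeys() {
        keep[k] = true
    }

    // Sweep: anything unmarked is garbage.
    all, err := bs.AllKeys(ctx)
    if err != nil {
        return nil, err
    }
    var removed []Key
    for _, k := range all {
        if !keep[k] {
            if err := bs.DeleteBlock(k); err != nil {
                return nil, err
            }
            removed = append(removed, k)
        }
    }
    return removed, nil
}

The essential invariant is that the sweep only runs against a completed mark set, which is why the locking discussion later in this thread matters.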

@jbenet jbenet self-assigned this Jun 30, 2015

// GCSet currently implemented in memory, in the future, may be bloom filter or
// disk backed to conserve memory.
gcs := NewGCSet()
Member

now that we have prometheus, it might be interesting to mark some of our buffer data structures to see if any of them are responsible for bad memory spikes.

Member Author

What would this look like in code?

Member

Make a special buffer with a prometheus gauge or something?
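
One way that instrumentation could look, assuming the standard prometheus Go client; the metric name and this string-keyed GCSet shape are illustrative, not what the PR ships:

package gcset

import "github.com/prometheus/client_golang/prometheus"

// gcSetSize exposes the current size of the in-memory GC mark set, so
// memory spikes during GC show up in monitoring.
var gcSetSize = prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "ipfs_gcset_keys",
    Help: "Number of keys currently held in the GC mark set.",
})

func init() {
    prometheus.MustRegister(gcSetSize)
}

// GCSet is a map-backed key set that updates the gauge as it grows.
// A real version would also zero the gauge when the set is discarded.
type GCSet struct {
    keys map[string]struct{}
}

func NewGCSet() *GCSet {
    return &GCSet{keys: make(map[string]struct{})}
}

func (s *GCSet) Add(k string) {
    if _, ok := s.keys[k]; !ok {
        s.keys[k] = struct{}{}
        gcSetSize.Inc()
    }
}

func (s *GCSet) Has(k string) bool {
    _, ok := s.keys[k]
    return ok
}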

@jbenet
Member

jbenet commented Jul 1, 2015

My biggest comment here is that we must address what happens when things are pinned during GC. Currently, GC defers "what keys to keep or delete" to both the pin state and the blockstore, neither of which mutexes with GC. A pin or an add (which pins) during GC could be incorrect ("lose user data" incorrect).

Suggested approaches below.

(easy) add a gc lock

Add a lock (similar to an RWLock) which mutexes between "GC" and "Adds+Pins". Meaning, it allows concurrent access for any add and pin operation, but blocks GC during them, and likewise blocks them during GC.

This will preserve fast operation during most uses, but will peg the node a bit during GC. Similar to how allocation + GC interact in many memory GCs.

I don't expect GC to be run so often that this lock becomes a problem. However, it may have weird effects on a node that GCs automatically upon hitting a "gc threshold" very near the total amount of data pinned.

(hard) add add/pin log

Similar to the lock, but instead of blocking add+pins, it will instead generate a (persistent) log of operations that can be replayed to ensure the adds+pins can be properly applied after gc ends.

This is much harder to write + test, but would make add/pins not block.
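
A very compact sketch of the queue-and-replay mechanics jbenet describes; a real version would persist the log to disk and also handle the data of in-flight adds, and every name here is made up for illustration:

package pinlog

import "sync"

// Op records a pin request made while GC was running.
type Op struct {
    Key       string
    Recursive bool
}

// PinLog applies pins immediately in normal operation, but while GC is
// running it appends them to a log and replays the log once GC ends,
// so pinners never block.
type PinLog struct {
    mu      sync.Mutex
    gcing   bool
    pending []Op
    apply   func(Op) // the real pin operation
}

func NewPinLog(apply func(Op)) *PinLog { return &PinLog{apply: apply} }

func (l *PinLog) Pin(op Op) {
    l.mu.Lock()
    defer l.mu.Unlock()
    if l.gcing {
        l.pending = append(l.pending, op) // defer until GC finishes
        return
    }
    l.apply(op)
}

// BeginGC and EndGC bracket a collection; EndGC replays the logged ops.
func (l *PinLog) BeginGC() {
    l.mu.Lock()
    l.gcing = true
    l.mu.Unlock()
}

func (l *PinLog) EndGC() {
    l.mu.Lock()
    ops := l.pending
    l.pending = nil
    l.gcing = false
    l.mu.Unlock()
    for _, op := range ops {
        l.apply(op)
    }
}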

if err != nil {
return err
}
}
Member

my impression is that without concurrency, this will sequentially do IO + mutate memory, losing lots of time in context switches with the kernel. Perhaps one good hack might be to have 2+ concurrent goroutines calling AddDag? That would need some mutex on the map, which might actually be enough to kill the improvement... but it is IO + lots of syscalls that we might schedule better...

Member Author

I think we should only worry about complicating this once we notice a perf hit from it.

Member

sgtm! 👍

@whyrusleeping whyrusleeping mentioned this pull request Jul 1, 2015
@whyrusleeping
Member Author

Yeah, my thought was to add an RWMutex to the blockstore where the only taker of the write portion of the lock is the GC routine (the easy route you describe above). We can transition to the log later; it won't require a migration or anything, so I'm fine taking the quick and easy route for now.
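
A minimal sketch of that lock, assuming plain sync.RWMutex semantics; the method names are illustrative, not necessarily what lands in the PR:

package blockstore

import "sync"

// GCLocker mutexes GC against adds and pins: any number of adds/pins can
// hold the read side concurrently, while the GC routine is the only taker
// of the write side, excluding everything else for the duration of a sweep.
type GCLocker struct {
    mu sync.RWMutex
}

// PinLock is taken around an add or pin; call the returned func to release.
func (l *GCLocker) PinLock() func() {
    l.mu.RLock()
    return l.mu.RUnlock
}

// GCLock is taken by the GC routine; it waits for in-flight adds/pins to
// finish and keeps new ones out until released.
func (l *GCLocker) GCLock() func() {
    l.mu.Lock()
    return l.mu.Unlock
}

An add path would then read defer gcl.PinLock()(), which is the same Lock() func() shape that shows up later in this thread.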

}

nd, err := lnk.GetNode(ctx, serv)
err = <-FetchGraph(ctx, nd, serv)
Member

This way of doing it spawns a goroutine per node and keeps them all alive until they all get in. A concurrent approach with channels may fare better.

Member Author

Yeah, I can make that better.

@whyrusleeping
Member Author

@jbenet I improved FetchGraph to still use one goroutine per node fetched, but the goroutine exits as soon as that node is received, as opposed to waiting until all its children are fetched. This would be easier if you could select on an array of channels; I considered using reflection, but decided pretty code > slightly more efficient code for now.
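
The shape of that improvement, sketched with stand-in types; getLinks stands in for fetching a node and reading its links (the real code works on *merkledag.Node and a DAGService), and it must be safe for concurrent use:

package dag

import "sync"

// fetchGraph spawns one goroutine per node, but each goroutine exits as
// soon as its own node has been handled, rather than staying alive until
// its whole subtree is in; a mutex-guarded set avoids refetching shared
// subtrees.
func fetchGraph(root string, getLinks func(string) []string) {
    var (
        mu   sync.Mutex
        seen = make(map[string]bool)
        wg   sync.WaitGroup
    )
    var fetch func(k string)
    fetch = func(k string) {
        defer wg.Done() // this goroutine ends here, not when children finish
        mu.Lock()
        if seen[k] {
            mu.Unlock()
            return
        }
        seen[k] = true
        mu.Unlock()

        for _, child := range getLinks(k) { // fetches the node's links
            wg.Add(1)
            go fetch(child)
        }
    }
    wg.Add(1)
    go fetch(root)
    wg.Wait() // every node fetched once all goroutines have exited
}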

@whyrusleeping
Member Author

Separating the locking code on the blockstore into a separate interface complicates things... the write_cache wrapper now needs to cast the blockstore it's wrapping to the locking blockstore in order to provide the same interface.

@whyrusleeping
Member Author

I'm gonna YOLO-cast it:

func (w *writecache) Lock() func() {
    return w.blockstore.(GCBlockstore).Lock()
}
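
If the YOLO part ever bites, a comma-ok variant of the same cast fails with a clearer message; behavior is otherwise identical (a sketch, not what the PR does):

func (w *writecache) Lock() func() {
    gcbs, ok := w.blockstore.(GCBlockstore)
    if !ok {
        panic("writecache: wrapped blockstore is not a GCBlockstore")
    }
    return gcbs.Lock()
}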

@@ -8,7 +8,7 @@ import (
)

// WriteCached returns a blockstore that caches up to |size| unique writes (bs.Put).
-func WriteCached(bs Blockstore, size int) (Blockstore, error) {
+func WriteCached(bs Blockstore, size int) (*writecache, error) {
Member

Is this ok to return, since *writecache isn't exported?

Member Author

Yes, this is the way we should be doing interfaces in Go. That way, in the calling code, I can do either:

var bs Blockstore
bs, err = WriteCached(base, size)

OR

var gcbs GCBlockstore
gcbs, err = WriteCached(base, size)

without any ugly casts or failure conditions.

Member

What I mean is getting into an annoying case where I need the real type of the thing but don't have it -- maybe we should also export WriteCache? shrug It's fine, I guess.

Member Author

I'm fine with exporting WriteCache.

Member

👍
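
A self-contained illustration of the point whyrusleeping makes above, with all types simplified: returning the concrete *writecache lets the caller choose whichever interface it needs, with no type assertion.

package main

type Blockstore interface {
    Put(block []byte) error
}

type GCBlockstore interface {
    Blockstore
    Lock() func()
}

// writecache satisfies both interfaces.
type writecache struct{}

func (w *writecache) Put(block []byte) error { return nil }
func (w *writecache) Lock() func()           { return func() {} }

// WriteCached returns the concrete type, not an interface.
func WriteCached() *writecache { return &writecache{} }

func main() {
    var bs Blockstore = WriteCached()     // compiles: *writecache is a Blockstore
    var gcbs GCBlockstore = WriteCached() // compiles: ...and a GCBlockstore
    _, _ = bs, gcbs
}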

@whyrusleeping
Member Author

Some of the tests in t0080-repo.sh seem wrong... for example: https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0080-repo.sh#L47

That one expects all the child objects of the welcome docs dir to show up in the recursive refs listing.

@whyrusleeping
Member Author

We should also take another look at https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0080-repo.sh#L93

because we aren't storing things in leveldb anymore...

@jbenet
Member

jbenet commented Jul 4, 2015

@whyrusleeping

> Some of the tests in t0080-repo.sh seem wrong... for example: https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0080-repo.sh#L47
>
> That one expects all the child objects of the welcome docs dir to show up in the recursive refs listing.

Yeah, that looks incorrect to me too.

> We should also take another look at https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0080-repo.sh#L93
>
> because we aren't storing things in leveldb anymore...

Agreed; does it pass now?

@jbenet
Member

jbenet commented Jul 7, 2015

@whyrusleeping

  • Thought we'd split the blocks changes into their own PR before full GC?
  • Also, what's up with tons of commits from tv's pinning changes leaking in here? Is it from the block changes + rebasing? Could we apply -> -> ? Otherwise, the changeset is enormous and very hard to check.
  • Also, the tests didn't like this roll (CircleCI had an outage and was resource constrained); I've re-rolled them.

@whyrusleeping
Member Author

@jbenet The commits from the pinning stuff keep leaking in since this is based on dev0.4.0 and I keep rebasing that branch, so these commits don't match up and it looks weird. I'll fix.

@whyrusleeping whyrusleeping force-pushed the feat/mark-n-sweep branch 2 times, most recently from 7580503 to 9202d50 Compare July 7, 2015 16:12
@whyrusleeping
Member Author

Alright, I separated out the blocks and merkledag changes into PR #1453; merge that first. Then this PR will be a single commit.

@@ -250,21 +250,11 @@ Defaults to "direct".
return nil, u.ErrCast()
}
out := new(bytes.Buffer)
if typeStr == "indirect" && count {
Member

I believe this needs to be brought back (likely with a mark and sweep implementation).

Member Author

No, it's okay that this is gone. You can still specify that you want to see indirect keys, but we aren't going to bother with counting them anymore. That's too much hassle and gains us nothing.

License: MIT
Signed-off-by: Jeromy <[email protected]>

dont GC blocks used by pinner

License: MIT
Signed-off-by: Jeromy <[email protected]>

comment GC algo

License: MIT
Signed-off-by: Jeromy <[email protected]>

add lock to blockstore to prevent GC from eating wanted blocks

License: MIT
Signed-off-by: Jeromy <[email protected]>

improve FetchGraph

License: MIT
Signed-off-by: Jeromy <[email protected]>

separate interfaces for blockstore and GCBlockstore

License: MIT
Signed-off-by: Jeromy <[email protected]>

reintroduce indirect pinning, add enumerateChildren dag method

License: MIT
Signed-off-by: Jeromy <[email protected]>
@@ -15,54 +16,40 @@ type KeyRemoved struct {
}

func GarbageCollect(n *core.IpfsNode, ctx context.Context) error {
Member

we should flip the params here: GarbageCollect(ctx, n)

Member Author

We've always had the core.* functions take the node as their first argument. You want to change this everywhere?

Member

Ah, even when there is a ctx involved? I guess that makes sense.

@whyrusleeping whyrusleeping added this to the IPFS 0.4.0 milestone Jul 11, 2015
@whyrusleeping
Member Author

The tests look good. Just so many of those random failures...

jbenet added a commit that referenced this pull request Jul 11, 2015
implement mark and sweep garbage collection
@jbenet jbenet merged commit 95df5a1 into dev0.4.0 Jul 11, 2015
@jbenet jbenet removed the status/in-progress In progress label Jul 11, 2015
@jbenet jbenet deleted the feat/mark-n-sweep branch July 11, 2015 00:57