
Link caching #3642

Closed
wants to merge 1 commit into from

Conversation

@Voker57 (Contributor) commented Jan 28, 2017

This PR introduces link caching in GetLinks, dramatically speeding up, among other things, pinning large sets of data already present in the DB.

Performance on my machine:
Pinning 1.7GB without cache: 1.08m
Pinning the same with cache: ~1s

Includes #3598.
Fixes #3505.
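
Roughly, the approach looks like this (a minimal sketch only; Link, KVStore, and NodeGetter are illustrative stand-ins rather than this PR's actual types, and only the /local/links key prefix comes from this branch):

package linkcache

import (
    "context"
    "encoding/json"
)

// Link is an illustrative stand-in for the merkledag link type.
type Link struct {
    Name string
    Size uint64
    Cid  string
}

// KVStore stands in for the repo datastore used as the persistent cache.
type KVStore interface {
    Get(key string) (value []byte, found bool, err error)
    Put(key string, value []byte) error
}

// NodeGetter stands in for the existing "fetch the node and decode its links" path.
type NodeGetter interface {
    GetLinks(ctx context.Context, c string) ([]Link, error)
}

type CachedLinkService struct {
    cache KVStore
    nodes NodeGetter
}

// GetLinks answers from the /local/links cache when it can; otherwise it
// falls back to fetching the node and stores the result for next time.
func (s *CachedLinkService) GetLinks(ctx context.Context, c string) ([]Link, error) {
    key := "/local/links/" + c
    if raw, found, err := s.cache.Get(key); err == nil && found {
        var links []Link
        if json.Unmarshal(raw, &links) == nil {
            return links, nil
        }
    }
    links, err := s.nodes.GetLinks(ctx, c)
    if err != nil {
        return nil, err
    }
    if raw, err := json.Marshal(links); err == nil {
        _ = s.cache.Put(key, raw) // best-effort; a failed cache write is not fatal
    }
    return links, nil
}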

@Kubuxu (Member) commented Jan 28, 2017

Apart from the test failing, could you compare it against a memory-only ARC cache? A persistent cache raises some problems (for example, deletion).
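
For comparison, the memory-only ARC variant would look something like this (a sketch using hashicorp/golang-lru's ARC cache; Link and the fetch callback are the same kind of illustrative stand-ins as in the sketch above):

package arclinks

import (
    "context"

    lru "github.com/hashicorp/golang-lru"
)

// Link is an illustrative stand-in for the merkledag link type.
type Link struct {
    Name string
    Size uint64
    Cid  string
}

// ARCLinkCache keeps links only in memory; it is fast but empties on
// every daemon restart, which is the trade-off discussed below.
type ARCLinkCache struct {
    arc   *lru.ARCCache
    fetch func(ctx context.Context, c string) ([]Link, error)
}

func NewARCLinkCache(size int, fetch func(ctx context.Context, c string) ([]Link, error)) (*ARCLinkCache, error) {
    arc, err := lru.NewARC(size)
    if err != nil {
        return nil, err
    }
    return &ARCLinkCache{arc: arc, fetch: fetch}, nil
}

// GetLinks serves links from the in-memory ARC cache when possible and
// otherwise fetches them and remembers the result.
func (s *ARCLinkCache) GetLinks(ctx context.Context, c string) ([]Link, error) {
    if v, ok := s.arc.Get(c); ok {
        return v.([]Link), nil
    }
    links, err := s.fetch(ctx, c)
    if err != nil {
        return nil, err
    }
    s.arc.Add(c, links)
    return links, nil
}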

@Voker57 (Contributor, Author) commented Jan 28, 2017

@Kubuxu I don't think a memory-only cache would bring a considerable improvement, and it would obviously only last until the ipfs daemon is restarted (if a daemon is even running).
Deleting or updating the data should not be required, since DAG nodes are immutable.

@ghost commented Jan 28, 2017

I'm not generally opposed to this (although it touches a lot of code).

Here's a different idea though: a while ago we discussed embedding a graph db like cayley as a linkstore. Use cases for this include smarter GC algorithms and IPLD selectors.

@rht (Contributor) commented Jan 28, 2017

Looks neat -- I think the idea in the third commit of this PR could be repurposed as PBNode.UnmarshalLinks (read only the links part of the dagnode, i.e. up to the links/data separator, 0xa). No cache would be necessary, and it would likely have a smaller implementation surface.
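
Concretely, dag-pb serializes the repeated Links field (tag 0x12) ahead of the Data field (tag 0x0a), so a decoder can collect the link bytes and stop at the first 0x0a tag without reading the data at all. A rough sketch of that idea, assuming the canonical field ordering; this is not the proposed PBNode.UnmarshalLinks itself:

package pblinks

import (
    "encoding/binary"
    "fmt"
)

// RawLinks returns the serialized bytes of each link field in a dag-pb
// node, stopping as soon as the Data field's tag (0x0a) is seen so the
// data itself is never read or copied. Decoding each link into a
// name/size/CID struct is left out of this sketch.
func RawLinks(buf []byte) ([][]byte, error) {
    var links [][]byte
    for len(buf) > 0 {
        tag, n := binary.Uvarint(buf)
        if n <= 0 {
            return nil, fmt.Errorf("malformed field tag")
        }
        if tag == 0x0a { // Data field reached: everything after this point is skipped
            break
        }
        if tag != 0x12 { // dag-pb only has Links (0x12) and Data (0x0a)
            return nil, fmt.Errorf("unexpected field tag %#x", tag)
        }
        buf = buf[n:]
        size, m := binary.Uvarint(buf)
        if m <= 0 || uint64(len(buf)-m) < size {
            return nil, fmt.Errorf("truncated link field")
        }
        buf = buf[m:]
        links = append(links, buf[:size])
        buf = buf[size:]
    }
    return links, nil
}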

@whyrusleeping (Member) commented

@Voker57 Nice work! Those are some serious perf improvements. However, the changeset is uncomfortably large. There's a lot we can do to break this up. First, I'll review and merge #3598. Then I have two ideas for moving forward with the rest of it.

We can keep this functionality in the dagservice (increasing the complexity and responsibilities of that one object). If we go this route, I'd like to see the NewDagService constructor remain the same and just have it set the new field to nil, so we don't have to change every invocation of it through the whole codebase (and external things also using that function). We can then add a new constructor, NewDagServiceWithLinkCache (or something less wordy).

We can also extract this functionality into a LinkService abstraction that does this caching, and have the dagservice accept one of those in its extended constructor. Doing it this way should really clean up the interfaces a lot and allow us to mock this out and replace it more easily.
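
In rough Go terms, option 2's LinkService abstraction could even sit behind option 1's constraint of leaving NewDagService untouched. An illustrative sketch, not the actual merkledag code; only the constructor names come from the paragraphs above:

package dagservice

import "context"

// Link and Blocks are illustrative stand-ins for the real types.
type Link struct {
    Name string
    Size uint64
    Cid  string
}

type Blocks interface{}

// LinkService is option 2's abstraction: something that can answer
// GetLinks, possibly backed by a cache, and easy to mock out or replace.
type LinkService interface {
    GetLinks(ctx context.Context, c string) ([]Link, error)
}

type DAGService struct {
    blocks Blocks
    links  LinkService // nil means "no link cache; fetch whole nodes"
}

// NewDagService keeps its current signature so no call sites change
// (option 1's constraint); the new field simply stays nil.
func NewDagService(bs Blocks) *DAGService {
    return &DAGService{blocks: bs}
}

// NewDagServiceWithLinkCache is the extended constructor that wires in a
// caching LinkService.
func NewDagServiceWithLinkCache(bs Blocks, ls LinkService) *DAGService {
    return &DAGService{blocks: bs, links: ls}
}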

Beyond that, we need to think about cleaning up the datastore (this adds a not-insignificant storage-space overhead) and also about tests.

Thanks again!

@rht (Contributor) commented Jan 29, 2017

Caching doesn't change the fact that every node has to have a counterpart that contains just the links but no data. Wouldn't it be sufficient to just read the link bytes of the serialized node stream, without additional caching?

@Voker57 (Contributor, Author) commented Jan 29, 2017

@rht yeah, that sounds viable; however, it requires a fixed node marshaling format. With the filestore it's going to get a bit complicated and will need an expanded blockstore interface.

@rht (Contributor) commented Jan 29, 2017

@Voker57 does "fixed node marshaling format" refer to having the marshaled data be deterministic/stable?
I see this PR as an under-the-hood optimization for DAGService.GetLinks (@whyrusleeping's option 1). And the only extension needed for the multiblockstore is a prefix path specifying where the node data should be read from. In short, I don't see the complications.

On another note, using an in-memory linkstore/DAG cache sounds like the reverse of what zfs does (an in-memory block dedup table/DDT cache). I haven't figured out why that is the case for zfs, other than suspecting that it is optimized for reads/writes on a single file rather than frequent fs DAG walks (the use case for pin and gc -- but then, zfs does gc well too).

@Voker57 (Contributor, Author) commented Jan 29, 2017

@rht

And the only extension needed for the multiblockstore is a prefix path specifying where the node data should be read from. In short, I don't see the complications.

That's assuming the data is stored in flatfs. What if it's, for example, stored in the filestore and the links are in a different place? The blockstore would then need a GetLinks method or something similar. Returning a physical file path from the blockstore would greatly restrict development of other blockstore backends.
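
In other words, the extension would look something like an optional capability interface rather than exposed file paths (illustrative sketch only; none of these names exist in the codebase):

package blockstore

import "context"

// Blockstore is a trimmed-down stand-in for the real interface.
type Blockstore interface {
    Get(ctx context.Context, c string) ([]byte, error)
}

// LinkReader is an optional capability a backend (flatfs, filestore, ...)
// can implement if it knows how to return a node's raw link bytes without
// reading the whole block.
type LinkReader interface {
    GetLinks(ctx context.Context, c string) ([][]byte, error)
}

// Links prefers the capability when the backend has it and otherwise
// falls back to reading the full block and extracting the links from it.
func Links(ctx context.Context, bs Blockstore, c string, extract func([]byte) ([][]byte, error)) ([][]byte, error) {
    if lr, ok := bs.(LinkReader); ok {
        return lr.GetLinks(ctx, c)
    }
    block, err := bs.Get(ctx, c)
    if err != nil {
        return nil, err
    }
    return extract(block)
}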

@rht (Contributor) commented Jan 30, 2017

case flatfs:
Reading just the link bytes up to 0xa should be sufficient.

case filestore:
I haven't fully scanned through the filestore implementation to see where the fs links are stored. I am currently assuming that it stores DAG nodes as usual in a flatfs, but each file's content is stored as a serialized PosInfo object (i.e. "symlinks to a slice of a file"). It operates at a layer between the blockstore and the dagnode, so it doesn't see dagnode links. UnmarshalLinks should still suffice, since the links and the PosInfo metadata are still separated by 0xa.

@rht (Contributor) commented Jan 30, 2017

Here's a different idea though: a while ago we discussed embedding a graph db like cayley as a linkstore. Use cases for this include smarter GC algorithms and IPLD selectors.

Where was this discussed? Is there a link/issue I could look at?

As an aside, does anyone in the community object to having an incentivized bounty on perf improvements (kind of like http://www.spacex.com/hyperloop)? That was how NixOS/nix#341 was pulled off.

@Voker57 (Contributor, Author) commented Feb 13, 2017

Will rework this to make it easier to use EnumerateChildrenAsync, and include the changes from feat/frugal-enumerate.

An inline review comment from a project member on this diff hunk:

if err != nil {
    return err
}
if present == true {

Can be just present.

@kevina (Contributor) commented Feb 17, 2017

Note #3700 is related and will likely conflict with this one.

@whyrusleeping added this to the Ipfs 0.4.7 milestone Feb 17, 2017
@whyrusleeping (Member) commented

Yeah, #3700 has been merged now. It should make this a bit different (hopefully easier).

I think we should try out what @rht suggested: reading just the links from the object on disk. It will be really awkward to wire through (since the datastore interface doesn't allow partial reads), but if we can prove that it will be efficient, it's worth doing: I've been wanting to change the datastore interface to return an io.Reader for quite a while, and that would let us make this change much more easily.
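
Hypothetically, an io.Reader-based datastore would let the links-only read fall out naturally. A sketch of the idea, not the real go-datastore API:

package dsstream

import (
    "context"
    "io"
)

// StreamingDatastore is the hypothetical partial-read interface: Get
// hands back a reader instead of a fully materialized []byte.
type StreamingDatastore interface {
    GetReader(ctx context.Context, key string) (io.ReadCloser, error)
}

// LinksOnly runs a decoder that consumes only as much of the value as it
// needs (e.g. a dag-pb link parser that stops at the 0x0a Data tag) and
// then closes the reader without draining the rest of the block.
func LinksOnly(ctx context.Context, ds StreamingDatastore, key string, decode func(io.Reader) ([][]byte, error)) ([][]byte, error) {
    r, err := ds.GetReader(ctx, key)
    if err != nil {
        return nil, err
    }
    defer r.Close()
    return decode(r)
}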

@Voker57 force-pushed the feat/linkdb branch 2 times, most recently from 12d29e9 to 2ce89db on June 28, 2017 at 17:29
@Voker57 force-pushed the feat/linkdb branch 5 times, most recently from a047c47 to d6f43ff on July 14, 2017 at 21:49
@Voker57 (Contributor, Author) commented Jul 14, 2017

Rebased, fixed the tests, and addressed the review; please let me know if any other adjustments are needed.

…inks

GetLinks() now uses a link cache in the datastore under /local/links and
will not necessarily fetch the linked nodes. This also affects
EnumerateChildrenAsync and EnumerateChildren; their previous behaviour
can be reproduced by using FetchingVisitor from the merkledag module.

Add `ipfs repo flushlinkcache`

License: MIT
Signed-off-by: Iaroslav Gridin <[email protected]>
Labels: need/review (Needs a review)
Successfully merging this pull request may close these issues: ipfs pin add is very slow / hanging
7 participants