Link caching #3642
Apart from the failing tests, could you compare this against a memory-only ARC cache? A persistent cache raises some problems (for example, deletion).
@Kubuxu I don't think a memory-only cache would bring a considerable improvement, and it would obviously only work until the ipfs daemon is restarted (if a daemon is even running).
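For comparison with the persistent cache, a memory-only link cache can be sketched in a few lines of Go. This is a minimal sketch under stated assumptions: the type and method names are hypothetical, CIDs are plain strings, and eviction is simple LRU rather than ARC (ARC additionally balances recency against frequency). It illustrates the trade-off in the two comments above: the cache is bounded and vanishes on restart, but deleting a block only means dropping one in-memory entry.

```go
package main

import (
	"container/list"
	"fmt"
)

// memLinkCache is a hypothetical memory-only link cache keyed by CID string.
// It uses plain LRU eviction for brevity (not ARC). Nothing survives a restart.
type memLinkCache struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // cid -> element holding *cacheEntry
}

type cacheEntry struct {
	cid   string
	links []string // child CIDs
}

func newMemLinkCache(capacity int) *memLinkCache {
	if capacity < 1 {
		capacity = 1 // eviction below assumes at least one slot
	}
	return &memLinkCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

// Get returns the cached links for cid and marks the entry as recently used.
func (c *memLinkCache) Get(cid string) ([]string, bool) {
	el, ok := c.entries[cid]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*cacheEntry).links, true
}

// Put stores links for cid, evicting the least recently used entry when full.
func (c *memLinkCache) Put(cid string, links []string) {
	if el, ok := c.entries[cid]; ok {
		el.Value.(*cacheEntry).links = links
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*cacheEntry).cid)
	}
	c.entries[cid] = c.order.PushFront(&cacheEntry{cid: cid, links: links})
}

// Delete drops an entry, e.g. when the underlying block is garbage-collected.
func (c *memLinkCache) Delete(cid string) {
	if el, ok := c.entries[cid]; ok {
		c.order.Remove(el)
		delete(c.entries, cid)
	}
}

func main() {
	c := newMemLinkCache(2)
	c.Put("QmA", []string{"QmB", "QmC"})
	c.Put("QmB", nil)
	c.Get("QmA")      // QmA becomes most recently used
	c.Put("QmC", nil) // cache is full, so QmB (least recently used) is evicted
	_, ok := c.Get("QmB")
	fmt.Println(ok) // false
}
```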
I'm not generally opposed to this (although it touches a lot of code). Here's a different idea, though: a while ago we discussed embedding a graph DB like Cayley as a linkstore. Use cases for this include smarter GC algorithms and IPLD selectors.
Looks neat. I think the idea in the third commit of this PR could be repurposed as …
@Voker57 Nice work! Those are some serious perf improvements. However, the changeset is uncomfortably large, and there's a lot we can do to break it up. First, I'll review and merge #3598. Then I have two ideas for moving forward with the rest of this.
We can keep this functionality in the dagservice (increasing the complexity and responsibilities of that one object). If we go this route, I'd like to see the …
We can also extract this functionality into a …
Beyond that, we need to think about cleaning up the datastore (this adds a not-insignificant storage overhead) and also about tests. Thanks again!
Caching doesn't change the fact that every node has to have a counterpart that contains just its links, without the data. Wouldn't it be sufficient to read just the link bytes of the serialized node, without any additional caching?
@rht Yeah, that sounds viable. However, it requires a fixed node marshaling format; with filestore it's going to get a bit complicated and will need an expanded blockstore interface.
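To make the "read just the link bytes" idea concrete, here is a sketch that walks a serialized node at the protobuf wire level and keeps only the link hashes. It assumes the dag-pb layout (PBNode: Links = field 2, Data = field 1; PBLink: Hash = field 1, Tsize = field 3), and the helper names are hypothetical. It still needs the whole block in memory; it only avoids decoding the data payload, which is why partial reads from the datastore would matter for a real implementation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readField reads one protobuf key and, for length-delimited fields, its
// payload. Only wire types 0 (varint) and 2 (bytes) are handled, which is
// all this dag-pb sketch needs.
func readField(buf []byte) (field, wt uint64, payload, rest []byte, err error) {
	key, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, 0, nil, nil, errors.New("bad key varint")
	}
	buf = buf[n:]
	field, wt = key>>3, key&7
	switch wt {
	case 0: // varint: skip the value
		_, m := binary.Uvarint(buf)
		if m <= 0 {
			return 0, 0, nil, nil, errors.New("bad varint")
		}
		return field, wt, nil, buf[m:], nil
	case 2: // length-delimited: return a sub-slice, no copy
		l, m := binary.Uvarint(buf)
		if m <= 0 || uint64(len(buf)-m) < l {
			return 0, 0, nil, nil, errors.New("bad length")
		}
		buf = buf[m:]
		return field, wt, buf[:l], buf[l:], nil
	default:
		return 0, 0, nil, nil, fmt.Errorf("unsupported wire type %d", wt)
	}
}

// linkHashes returns the raw Hash bytes of every PBLink in a serialized
// PBNode, skipping the Data field without decoding it.
func linkHashes(node []byte) ([][]byte, error) {
	var hashes [][]byte
	for len(node) > 0 {
		field, wt, payload, rest, err := readField(node)
		if err != nil {
			return nil, err
		}
		node = rest
		if field != 2 || wt != 2 {
			continue // Data (field 1) or unknown field: skipped
		}
		// Inside a PBLink, Hash is field 1.
		for link := payload; len(link) > 0; {
			f, w, p, r, err := readField(link)
			if err != nil {
				return nil, err
			}
			if f == 1 && w == 2 {
				hashes = append(hashes, p)
			}
			link = r
		}
	}
	return hashes, nil
}

func main() {
	node := []byte{
		0x12, 0x06, // PBNode field 2 (Links), length 6
		0x0a, 0x02, 0x01, 0x02, // PBLink.Hash = {1, 2}
		0x18, 0x05, // PBLink.Tsize = 5
		0x0a, 0x03, 'a', 'b', 'c', // PBNode field 1 (Data) = "abc", skipped
	}
	hashes, err := linkHashes(node)
	fmt.Println(hashes, err) // [[1 2]] <nil>
}
```

Because it depends on field numbers and wire types, this only works while the marshaling format stays fixed, which is exactly the caveat raised in the reply above.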
@Voker57 Does "fixed node marshaling format" refer to having the marshaled data deterministic/stable? On another note, an in-memory linkstore/DAG cache sounds like the reverse of what ZFS does (an in-memory block dedup table, or DDT, cache). I haven't figured out why that is the case for ZFS, other than suspecting that it is optimized for read/write on a single file rather than for frequent filesystem DAG walks (the use case for pin and GC; then again, ZFS does GC well too).
That's assuming the data is stored in flatfs. What if, for example, it's stored in filestore and the links live in a different place? The blockstore then needs a GetLinks method or something similar. Returning a physical file path from the blockstore would greatly restrict the development of other blockstore backends.
case flatfs:
case filestore:
Where is this discussed? Is there a link or issue I could look at? As an aside, would anyone in the community object to having an incentivized bounty for perf improvements (kind of like http://www.spacex.com/hyperloop)? That is how NixOS/nix#341 was pulled off.
I will rework this to make EnumerateChildrenAsync easier to use and to include the changes from feat/frugal-enumerate.
blockservice/blockservice.go
if err != nil {
	return err
}
if present == true {
`present == true` can be just `present`.
Note that #3700 is related and will likely conflict with this one.
Yeah, #3700 has been merged now. It should make this a bit different (hopefully easier). I think we should try out what @rht suggested: reading just the links from the object on disk. It will be really awkward to wire through (since the datastore interface doesn't allow partial reads), but if we can prove that this is efficient, it would be worth it; I've been wanting to change the datastore interface to return an …
Rebased, fixed the tests and addressed the review. Please let me know if any other adjustments are needed.
…inks
GetLinks() now uses a link cache in the datastore under /local/links and will not necessarily fetch linked nodes. This also affects EnumerateChildrenAsync and EnumerateChildren; their previous behaviour can be reproduced by using FetchingVisitor from the merkledag module.
Add `ipfs repo flushlinkcache`
License: MIT
Signed-off-by: Iaroslav Gridin <[email protected]>
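The commit message describes GetLinks consulting a link cache under /local/links and only fetching the node on a miss. A minimal sketch of that read-through behaviour, assuming an in-memory map in place of the repo datastore and JSON for the stored link list (the PR's actual encoding and any key layout beyond the /local/links prefix are not shown here); cachedLinks, mapDatastore and Flush are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// linkCachePrefix follows the /local/links namespace named in the commit message.
const linkCachePrefix = "/local/links/"

// mapDatastore is a hypothetical in-memory stand-in for the repo datastore.
type mapDatastore map[string][]byte

// cachedLinks is a hypothetical read-through link cache over a datastore.
// fetch loads and decodes the full node (possibly from the network) on a miss.
type cachedLinks struct {
	ds    mapDatastore
	fetch func(cid string) ([]string, error)
}

// GetLinks answers from the cache when possible and only calls fetch on a miss.
func (c *cachedLinks) GetLinks(cid string) ([]string, error) {
	key := linkCachePrefix + cid
	if raw, ok := c.ds[key]; ok {
		var links []string
		if err := json.Unmarshal(raw, &links); err != nil {
			return nil, err
		}
		return links, nil
	}
	links, err := c.fetch(cid)
	if err != nil {
		return nil, err
	}
	raw, err := json.Marshal(links)
	if err != nil {
		return nil, err
	}
	c.ds[key] = raw
	return links, nil
}

// Flush deletes every key under the cache prefix and returns how many were
// removed; conceptually what a flush command has to do. Other keys are kept.
func (c *cachedLinks) Flush() int {
	n := 0
	for k := range c.ds {
		if strings.HasPrefix(k, linkCachePrefix) {
			delete(c.ds, k) // deleting during range is safe in Go
			n++
		}
	}
	return n
}

func main() {
	fetches := 0
	c := &cachedLinks{
		ds: mapDatastore{"/blocks/other": []byte("unrelated")},
		fetch: func(cid string) ([]string, error) {
			fetches++
			return []string{cid + "-child"}, nil
		},
	}
	c.GetLinks("QmA") // miss: fetches the node and fills /local/links/QmA
	c.GetLinks("QmA") // hit: served from the cache, no fetch
	flushed := c.Flush()
	fmt.Println(fetches, flushed, len(c.ds)) // 1 1 1
}
```

Flushing everything under the prefix is a blunt but simple answer to the deletion problem raised early in the thread: stale entries for removed blocks disappear, at the cost of rebuilding the cache on the next walk.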
This PR introduces link caching in GetLinks, dramatically speeding up, among other things, the pinning of large data sets that are already present in the local datastore.
Performance on my machine:
Pinning 1.7GB without cache: 1.08m
Pinning the same with cache: ~1s
Includes #3598.
Fixes #3505.