-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch Hamt Children Nodes in Parallel #4979
Conversation
f78d58b
to
b6ba27d
Compare
7537e29
to
c6be2e2
Compare
If we're looking for more perf, using bitswap sessions for this would help, heres a small patch that enables that: https://github.com/ipfs/go-ipfs/compare/feat/ls-session?expand=1 |
Flagging as WIP please remove the flag and ping me when done. |
@kevina What's the latest here? Eagerly awaiting this change! :) |
@ajbouh I think right now we just need some cleanup and review. cc @magik6k and @schomatis for review |
So, right now my only knowledge of HAMT directories is what @Stebalien explained to me in #5157 (comment), I definitely want to learn more about them as they are in the Files API milestone roadmap, but that will take me some time so I can't contribute any meaningful review at the moment. |
@schomatis no real knowledge of HAMTs in particular is needed for this review. This is just a prefetching algorithm that should work over any tree. |
Great, I'll take a look at it then. (GitHub is acting weird.) |
Yes. It's been not doing well today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a partial review only of the fetcher, not the modified HAMT logic in hamt.go
(which I'm not familiar with).
unixfs/hamt/fetcher.go
Outdated
"context" | ||
"time" | ||
//"fmt" | ||
//"os" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove these?
unixfs/hamt/fetcher.go
Outdated
|
||
var log = logging.Logger("hamt") | ||
|
||
// fetcher implements a background fether to retrieve missing child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fether -> fetcher
unixfs/hamt/fetcher.go
Outdated
|
||
// fetcher implements a background fether to retrieve missing child | ||
// shards in large batches. It attempts to retrieves the missing | ||
// shards in an order that allow streaming of the complete hamt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this order is referring to.
unixfs/hamt/fetcher.go
Outdated
dserv ipld.DAGService | ||
|
||
reqRes chan *Shard | ||
result chan result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add some documentation for these two fields?
unixfs/hamt/fetcher.go
Outdated
todo jobStack // stack of jobs that still need to be done | ||
jobs map[*Shard]*job // map of all jobs in which the results have not been collected yet | ||
|
||
// stats relevent for streaming the complete hamt directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relevent -> relevant
unixfs/hamt/fetcher.go
Outdated
} | ||
|
||
func (f *fetcher) mainLoop() { | ||
var want *Shard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still uncertain of how want
works, can we document this field?
fetched.vals[string(no.Node.Cid().Bytes())] = hamt | ||
} | ||
for _, job := range bj.jobs { | ||
job.res = fetched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all of the jobs get all of the results? (Related to comment at line 90.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. And I do this for code simplicity and performance reasons.
unixfs/hamt/fetcher.go
Outdated
delete(f.jobs, j.id) | ||
f.doneCnt-- | ||
if len(j.res.errs) != 0 { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a single error the entire children fetching chain is cut? Could that be documented in the fetcher
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the question.
Edit: Sorry the question makes sense now, it has been a while since I last touched this code.
unixfs/hamt/fetcher.go
Outdated
case bj := <-f.done: | ||
f.doneCnt += len(bj.jobs) | ||
f.cidCnt += len(bj.cids) | ||
f.launch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble following the launch()
/idle
dynamic but this unconditional launch()
call seems to be guaranteeing that we don't reach a deadlock state where the idle
is false
and no one can call launch()
to unblock it, right?
unixfs/hamt/fetcher.go
Outdated
// the result of the batch request and not just the single job. In | ||
// particular, if the 'errs' field is empty the 'vals' of the result | ||
// is guaranteed to contain the all the missing child shards, but the | ||
// map may also contain child shards of other jobs in the batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand, so I may get more than I asked for? Or rather, many different get
operation results may be returned in a single result
of a particular get
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You get more than you ask for.
@kevina Most of the review comments are minor details, so feel free to ignore them, the only relevant two are at line 152 and 162. |
unixfs/hamt/hamt.go
Outdated
|
||
func (ds *Shard) missingChildShards() []*cid.Cid { | ||
if len(ds.children) != len(ds.nd.Links()) { | ||
panic("inconsistent lengths between children array and Links array") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any scenarios that this could happen with a maliciously crafted node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. The reason I panic here is because I am unsure what to do with the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the reason for the panic is because it should not happen. This is only called inside preloadChildren
which already verifies the link.
Edit: This needs more careful thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. This should be fixed now.
unixfs/hamt/hamt.go
Outdated
// are not already loaded they are fetched in parallel using GetMany | ||
func (ds *Shard) preloadChildren(ctx context.Context, f *fetcher) error { | ||
if len(ds.children) != len(ds.nd.Links()) { | ||
return fmt.Errorf("inconsistent lengths between children array and Links array") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels inconsistent to panic over this in one place, but throw an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
37fb98b
to
c1605a6
Compare
Note: Removed the use of bitswap sessions as I don't fully understand that code and I think it is causing tests to fail. Someone else can p.r. that in separately. |
1aae2e2
to
1f37b9c
Compare
…atches License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
762311a
to
80f212c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only looked at the fetcher implementation. Logic seems to make sense, but is a bit too hard to read.
Instead of using go logger I'd use go-log - https://github.com/ipfs/go-log/blob/5dc2060baaf8db344f31dafd852340b93811d03f/log.go#L63-L77 (See ipfs/notes#277 for more on tracing stuff)
|
||
// startFetcher starts a new fetcher in the background | ||
func startFetcher(ctx context.Context, dserv ipld.DAGService) *fetcher { | ||
log.Infof("fetcher: starting...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Debugf
/ remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather keep this at the same level of the rest of the stats output for consistency.
// The recommend minimum size is thus a size slightly larger than the | ||
// maximum number children in a HAMT node (which is the largest number | ||
// of CIDs that could be requested in a single job) or 256 + 64 = 320 | ||
const batchSize = 320 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd open an issue to run benchmarks with different values of this once this PR is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, but I already did do some informal benchmarking to arrive at this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevina is that benchmarking reproducible? Do you have scripts we can run to try it out and rederive the number ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually. That number is based more on reason (as outlined on comments) then benchmarks. I do not recommend we go below that size. Larger values might be beneficial at the increase of resource usage. I don't have any formal benchmarks, I mostly just observed the behavior when fetching a large directory. I can create an issue to consider increasing the value if desirable.
unixfs/hamt/fetcher.go
Outdated
} | ||
|
||
// get gets the missing child shards for the hamt object. | ||
// The missing children for the passed in shard is returned. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/shard is returned/shard are returned
unixfs/hamt/fetcher.go
Outdated
dserv ipld.DAGService | ||
|
||
reqRes chan *Shard // channel for requesting the children of a shard | ||
result chan result // channel for retrieving the results of the request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this requestCh
/ resultCh
res result | ||
} | ||
|
||
func (f *fetcher) mainLoop() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd split this function into smaller functions as this is rather hard to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that might be helpful, but I would rather do this in a separate p.r., as I have other higher priority things I want to work on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't necessary agree it will be an improvement, but see my other comments.
j.idx = -1 | ||
} | ||
|
||
func (f *fetcher) launch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd split this up too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
for no := range ch { | ||
if no.Err != nil { | ||
fetched.errs = append(fetched.errs, no.Err) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not abort early here (close context and return)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's unnecessary. The fetcher can continue even if it encounters an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can see, at least in https://github.com/ipfs/go-ipfs/pull/4979/files#diff-689271378656b0bd9fc790b4a0a2b784R393 we throw the result away if there are errors, so it would make sense to not fetch more stuff if we know it won't be used (I might be wrong here though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that code code also aborts the context so the amount of extra work done is minimal.
hamt, err := NewHamtFromDag(f.dserv, no.Node) | ||
if err != nil { | ||
fetched.errs = append(fetched.errs, err) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here (abort early)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
Um I already do: import (
...
logging "gx/ipfs/QmcVVHfdyv15GVPk7NrxdWjh2hLVccXnoD8j2tyQShiXJb/go-log"
)
var log = logging.Logger("hamt") Or I am I missing something? |
What I meant is that it would probably be better / more useful to use logging spans here (i.e. https://github.com/ipfs/go-log/blob/5dc2060baaf8db344f31dafd852340b93811d03f/log.go#L63-L77) |
unixfs/hamt/fetcher.go
Outdated
for { | ||
select { | ||
case id := <-f.requestCh: | ||
if want != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we could just pull most of the code in this select case into a separate function, making this flow easier to read and reason about (also removing the need for continues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I agree this will be helpful. I will do this later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay done.
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
I am not at all familiar with that code or the proper usage of it. |
@Stebalien do you now consider this p.r dead. The last time this got some activity I was in the middle of base32 work so I didn't want to spend a lot of time on this and the remaining reviews I saw more as nits then anything else. My apologizes if this caused this P.R. to get dropped. |
It was a combination of everyone being too busy to give a careful review this patch being complicated enough to make such a review difficult and time consuming. It's also so targeted at one use-case (sharded directories) that the it's unclear if the burden of maintaining all of this machinery outweighs the benefit. It may turn out that we can't do any better but, for now, @hannahhoward has picked up this work and is starting with some simpler approaches. Eventually, I'd like to have a more generalized pre-fetcher (that can prefetch a DAG up to some node-"class"). For now, just focus on the base32 stuff. That's absolutely critical for the browser folks. |
closing per previous comments and ipfs/go-unixfs#19 being merged |
Closes #4908