Fetch Hamt Children Nodes in Parallel #4979

kevina · 2018-04-26T07:02:28Z

whyrusleeping · 2018-05-01T05:54:20Z

If we're looking for more perf, using bitswap sessions for this would help. Here's a small patch that enables that: https://github.com/ipfs/go-ipfs/compare/feat/ls-session?expand=1

Kubuxu · 2018-05-01T14:31:21Z

Flagging as WIP; please remove the flag and ping me when done.

ajbouh · 2018-07-17T20:56:45Z

@kevina What's the latest here? Eagerly awaiting this change! :)

whyrusleeping · 2018-07-17T21:06:30Z

@ajbouh I think right now we just need some cleanup and review. cc @magik6k and @schomatis for review

schomatis · 2018-07-17T21:38:18Z

So, right now my only knowledge of HAMT directories is what @Stebalien explained to me in #5157 (comment), I definitely want to learn more about them as they are in the Files API milestone roadmap, but that will take me some time so I can't contribute any meaningful review at the moment.

whyrusleeping · 2018-07-17T21:39:16Z

@schomatis no real knowledge of HAMTs in particular is needed for this review. This is just a prefetching algorithm that should work over any tree.
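
To make "a prefetching algorithm that should work over any tree" concrete, here is a minimal sketch of batched prefetching over a generic tree. It is not the code from this PR: the `node`, `store`, `getMany`, and `prefetch` names are hypothetical, and the in-memory map merely stands in for a DAG service that can fetch several nodes per round trip.

```go
package main

import "fmt"

// node is a hypothetical tree node; children holds the keys of nodes
// that have not been loaded yet.
type node struct {
	key      string
	children []string
}

// store is a hypothetical backing store standing in for a DAG service.
type store map[string]node

// getMany stands in for one batched round trip.
func (s store) getMany(keys []string) []node {
	out := make([]node, 0, len(keys))
	for _, k := range keys {
		if n, ok := s[k]; ok {
			out = append(out, n)
		}
	}
	return out
}

// prefetch visits every node reachable from root, asking for at most
// batchSize keys per call, and reports how many batched calls it made.
func prefetch(s store, root string, batchSize int) (visited []string, batches int) {
	if batchSize < 1 {
		batchSize = 1 // avoid an empty batch that would never make progress
	}
	pending := []string{root}
	for len(pending) > 0 {
		n := batchSize
		if n > len(pending) {
			n = len(pending)
		}
		batch := pending[:n]
		pending = pending[n:]
		batches++
		for _, nd := range s.getMany(batch) {
			visited = append(visited, nd.key)
			pending = append(pending, nd.children...)
		}
	}
	return visited, batches
}

func main() {
	s := store{
		"root": {key: "root", children: []string{"a", "b"}},
		"a":    {key: "a", children: []string{"c"}},
		"b":    {key: "b"},
		"c":    {key: "c"},
	}
	visited, batches := prefetch(s, "root", 10)
	fmt.Println(len(visited), "nodes fetched in", batches, "batched calls")
}
```

Nothing in the sketch depends on the nodes being HAMT shards, which is the point of the comment above: the walk only needs a way to list a node's unloaded children and a way to fetch many keys at once.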

schomatis · 2018-07-17T21:40:23Z

Great, I'll take a look at it then.

(GitHub is acting weird.)

whyrusleeping · 2018-07-17T21:42:00Z

(GitHub is acting weird.)

Yes. It hasn't been doing well today.

schomatis

This is a partial review only of the fetcher, not the modified HAMT logic in hamt.go (which I'm not familiar with).

schomatis · 2018-07-20T11:36:52Z

unixfs/hamt/fetcher.go

+	"context"
+	"time"
+	//"fmt"
+	//"os"


Could we remove these?

schomatis · 2018-07-20T11:48:41Z

unixfs/hamt/fetcher.go

+
+var log = logging.Logger("hamt")
+
+// fetcher implements a background fether to retrieve missing child


fether -> fetcher

schomatis · 2018-07-20T11:48:44Z

unixfs/hamt/fetcher.go

+
+// fetcher implements a background fether to retrieve missing child
+// shards in large batches.  It attempts to retrieves the missing
+// shards in an order that allow streaming of the complete hamt


I'm not sure what this order is referring to.

schomatis · 2018-07-20T11:49:31Z

unixfs/hamt/fetcher.go

+	dserv ipld.DAGService
+
+	reqRes chan *Shard
+	result chan result


Could we add some documentation for these two fields?

schomatis · 2018-07-20T11:50:04Z

unixfs/hamt/fetcher.go

+	todo      jobStack        // stack of jobs that still need to be done
+	jobs      map[*Shard]*job // map of all jobs in which the results have not been collected yet
+
+	// stats relevent for streaming the complete hamt directory


relevent -> relevant

schomatis · 2018-07-20T12:23:23Z

unixfs/hamt/fetcher.go

+}
+
+func (f *fetcher) mainLoop() {
+	var want *Shard


I'm still uncertain of how want works, can we document this field?

schomatis · 2018-07-20T14:46:58Z

unixfs/hamt/fetcher.go

+			fetched.vals[string(no.Node.Cid().Bytes())] = hamt
+		}
+		for _, job := range bj.jobs {
+			job.res = fetched


Do all of the jobs get all of the results? (Related to comment at line 90.)

Yes. And I do this for code simplicity and performance reasons.

schomatis · 2018-07-20T14:54:46Z

unixfs/hamt/fetcher.go

+	delete(f.jobs, j.id)
+	f.doneCnt--
+	if len(j.res.errs) != 0 {
+		return


If there is a single error the entire children fetching chain is cut? Could that be documented in the fetcher?

~~I am not sure I understand the question.~~

Edit: Sorry the question makes sense now, it has been a while since I last touched this code.

schomatis · 2018-07-20T15:07:14Z

unixfs/hamt/fetcher.go

+		case bj := <-f.done:
+			f.doneCnt += len(bj.jobs)
+			f.cidCnt += len(bj.cids)
+			f.launch()


I'm having trouble following the launch()/idle dynamic but this unconditional launch() call seems to be guaranteeing that we don't reach a deadlock state where the idle is false and no one can call launch() to unblock it, right?

schomatis · 2018-07-20T15:08:34Z

unixfs/hamt/fetcher.go

+// the result of the batch request and not just the single job.  In
+// particular, if the 'errs' field is empty the 'vals' of the result
+// is guaranteed to contain the all the missing child shards, but the
+// map may also contain child shards of other jobs in the batch


Not sure I understand, so I may get more than I asked for? Or rather, many different get operation results may be returned in a single result of a particular get?

You get more than you ask for.
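
A minimal sketch of the "you get more than you ask for" behaviour, under simplifying assumptions: the `result` fields are named after the ones in the diff, but the value type is reduced to a string, `res` is a pointer for brevity, and `runBatch`, `own`, and `fakeFetch` are hypothetical helpers rather than functions from this PR.

```go
package main

import "fmt"

// result is a simplified stand-in for the batch result discussed above:
// vals is keyed by CID, errs collects any fetch errors.
type result struct {
	vals map[string]string
	errs []error
}

// job is one caller's request for the missing children of a shard.
type job struct {
	want []string // CIDs this job asked for
	res  *result  // shared with every other job in the same batch
}

// runBatch issues one combined fetch for all jobs and hands each job the
// same result, so a job may see more entries than it requested.
func runBatch(jobs []*job, fetch func(cids []string) map[string]string) {
	var cids []string
	for _, j := range jobs {
		cids = append(cids, j.want...)
	}
	shared := &result{vals: fetch(cids)}
	for _, j := range jobs {
		j.res = shared
	}
}

// own returns only the values the job asked for, ignoring the extras.
func own(j *job) []string {
	out := make([]string, 0, len(j.want))
	for _, c := range j.want {
		if v, ok := j.res.vals[c]; ok {
			out = append(out, v)
		}
	}
	return out
}

// fakeFetch is a hypothetical fetch that "loads" every requested CID.
func fakeFetch(cids []string) map[string]string {
	m := make(map[string]string, len(cids))
	for _, c := range cids {
		m[c] = "shard:" + c
	}
	return m
}

func main() {
	j1 := &job{want: []string{"a", "b"}}
	j2 := &job{want: []string{"c"}}
	runBatch([]*job{j1, j2}, fakeFetch)
	fmt.Println(len(j2.res.vals), own(j2))
}
```

Sharing one result avoids splitting the fetched map per job, at the cost that each caller must look up only the CIDs it cares about.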

schomatis · 2018-07-20T15:15:41Z

@kevina Most of the review comments are minor details, so feel free to ignore them; the only two relevant ones are at lines 152 and 162.

whyrusleeping · 2018-07-20T15:41:43Z

unixfs/hamt/hamt.go

+
+func (ds *Shard) missingChildShards() []*cid.Cid {
+	if len(ds.children) != len(ds.nd.Links()) {
+		panic("inconsistent lengths between children array and Links array")


Are there any scenarios that this could happen with a maliciously crafted node?

Not sure. The reason I panic here is because I am unsure what to do with the error.

~~Actually the reason for the panic is because it should not happen. This is only called inside preloadChildren which already verifies the link.~~

Edit: This needs more careful thought.

Okay. This should be fixed now.

whyrusleeping · 2018-07-20T15:43:42Z

unixfs/hamt/hamt.go

+// are not already loaded they are fetched in parallel using GetMany
+func (ds *Shard) preloadChildren(ctx context.Context, f *fetcher) error {
+	if len(ds.children) != len(ds.nd.Links()) {
+		return fmt.Errorf("inconsistent lengths between children array and Links array")


feels inconsistent to panic over this in one place, but throw an error here.
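
One way to make the two call sites consistent is to route both through a single check that returns an error. The sketch below is only an illustration of that idea, not the fix that was eventually committed: `shard`, `checkLengths`, and `errInconsistent` are hypothetical, and child shards are reduced to strings where an empty string means "not loaded".

```go
package main

import (
	"errors"
	"fmt"
)

// shard is a hypothetical, simplified stand-in for the HAMT Shard:
// children[i] is empty when the i-th link has not been loaded.
type shard struct {
	children []string
	links    []string
}

var errInconsistent = errors.New("inconsistent lengths between children array and Links array")

// checkLengths is the single place the invariant is verified.
func (s *shard) checkLengths() error {
	if len(s.children) != len(s.links) {
		return errInconsistent
	}
	return nil
}

// missingChildShards returns the links whose children are not loaded,
// reporting a malformed node as an error rather than panicking.
func (s *shard) missingChildShards() ([]string, error) {
	if err := s.checkLengths(); err != nil {
		return nil, err
	}
	var missing []string
	for i, c := range s.children {
		if c == "" {
			missing = append(missing, s.links[i])
		}
	}
	return missing, nil
}

func main() {
	good := &shard{children: []string{"x", ""}, links: []string{"l0", "l1"}}
	missing, err := good.missingChildShards()
	fmt.Println(missing, err)

	bad := &shard{children: []string{"x"}, links: []string{"l0", "l1"}}
	missing, err = bad.missingChildShards()
	fmt.Println(missing, err)
}
```

Returning an error in both places also addresses the earlier concern about maliciously crafted nodes, since untrusted input can then never crash the process through this path.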

kevina · 2018-07-20T21:24:48Z

Note: Removed the use of bitswap sessions as I don't fully understand that code and I think it is causing tests to fail. Someone else can p.r. that in separately.


magik6k

Only looked at the fetcher implementation. Logic seems to make sense, but is a bit too hard to read.

Instead of using go logger I'd use go-log - https://github.com/ipfs/go-log/blob/5dc2060baaf8db344f31dafd852340b93811d03f/log.go#L63-L77 (See ipfs/notes#277 for more on tracing stuff)

magik6k · 2018-07-23T10:58:35Z

unixfs/hamt/fetcher.go

+
+// startFetcher starts a new fetcher in the background
+func startFetcher(ctx context.Context, dserv ipld.DAGService) *fetcher {
+	log.Infof("fetcher: starting...")


log.Debugf / remove

I would rather keep this at the same level of the rest of the stats output for consistency.

magik6k · 2018-07-23T11:07:17Z

unixfs/hamt/fetcher.go

+// The recommend minimum size is thus a size slightly larger than the
+// maximum number children in a HAMT node (which is the largest number
+// of CIDs that could be requested in a single job) or 256 + 64 = 320
+const batchSize = 320


I'd open an issue to run benchmarks with different values of this once this PR is merged

I can do that, but I already did do some informal benchmarking to arrive at this value.

@kevina is that benchmarking reproducible? Do you have scripts we can run to try it out and rederive the number ourselves?

Actually, that number is based more on reasoning (as outlined in the comments) than on benchmarks. I do not recommend we go below that size. Larger values might be beneficial at the cost of increased resource usage. I don't have any formal benchmarks; I mostly just observed the behavior when fetching a large directory. I can create an issue to consider increasing the value if desirable.
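
The arithmetic behind the value can be written out as follows. The constant names `maxChildren` and `headroom` and the `fits` helper are hypothetical, and reading the extra 64 as slack for an already partly filled batch is an assumption; the diff comment only says the size should be slightly larger than the maximum number of children.

```go
package main

import "fmt"

const (
	maxChildren = 256 // largest number of CIDs a single job can request, per the diff comment
	headroom    = 64  // assumed slack, taken from the "256 + 64" in the diff comment
	batchSize   = maxChildren + headroom
)

// fits reports whether a job of n CIDs can join a batch that already
// holds queued CIDs without exceeding batchSize.
func fits(queued, n int) bool {
	return queued+n <= batchSize
}

func main() {
	fmt.Println(batchSize)
	fmt.Println(fits(64, maxChildren)) // a full job still fits behind 64 queued CIDs
	fmt.Println(fits(65, maxChildren)) // one more queued CID and it no longer fits
}
```

Going below 256 would mean a single job for a full HAMT node could never be served by one batch, which matches the recommendation above not to reduce the value.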

magik6k · 2018-07-23T11:10:00Z

unixfs/hamt/fetcher.go

+}
+
+// get gets the missing child shards for the hamt object.
+// The missing children for the passed in shard is returned.  The


s/shard is returned/shard are returned

magik6k · 2018-07-23T11:19:55Z

unixfs/hamt/fetcher.go

+	dserv ipld.DAGService
+
+	reqRes chan *Shard // channel for requesting the children of a shard
+	result chan result // channel for retrieving the results of the request


I'd call this requestCh / resultCh

magik6k · 2018-07-23T11:30:21Z

unixfs/hamt/fetcher.go

+	res  result
+}
+
+func (f *fetcher) mainLoop() {


I'd split this function into smaller functions as this is rather hard to read

I agree that might be helpful, but I would rather do this in a separate p.r., as I have other higher priority things I want to work on.

As this is new code, and not a refactor, I'd prefer getting it a bit cleaner before merging it in. @kevina If you'd like to hack on other things, maybe @magik6k could help out cleaning things up?

I also don't necessarily agree it will be an improvement, but see my other comments.

magik6k · 2018-07-23T11:52:38Z

unixfs/hamt/fetcher.go

+	j.idx = -1
+}
+
+func (f *fetcher) launch() {


I'd split this up too

magik6k · 2018-07-23T12:00:48Z

unixfs/hamt/fetcher.go

+		for no := range ch {
+			if no.Err != nil {
+				fetched.errs = append(fetched.errs, no.Err)
+				continue


Any reason to not abort early here (close context and return)?

Because it's unnecessary. The fetcher can continue even if it encounters an error here.

From what I can see, at least in https://github.com/ipfs/go-ipfs/pull/4979/files#diff-689271378656b0bd9fc790b4a0a2b784R393 we throw the result away if there are errors, so it would make sense to not fetch more stuff if we know it won't be used (I might be wrong here though)

Note that the code also aborts the context, so the amount of extra work done is minimal.
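
The two options being weighed here can be compared in a small sketch. It uses only the standard library; `nodeOption`, `produce`, `collectAll`, `collectAbort`, and `demo` are hypothetical stand-ins, with `produce` playing the role of a batched get that streams one result or error per node.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// nodeOption is a hypothetical per-node result: a value or an error.
type nodeOption struct {
	val string
	err error
}

var errMissing = errors.New("missing block")

// produce streams opts until they run out or ctx is cancelled.
func produce(ctx context.Context, opts []nodeOption) <-chan nodeOption {
	ch := make(chan nodeOption)
	go func() {
		defer close(ch)
		for _, o := range opts {
			select {
			case ch <- o:
			case <-ctx.Done():
				return
			}
		}
	}()
	return ch
}

// collectAll keeps reading after an error and records every error,
// which is the behaviour shown in the diff excerpt above.
func collectAll(ctx context.Context, opts []nodeOption) (vals []string, errs []error) {
	for o := range produce(ctx, opts) {
		if o.err != nil {
			errs = append(errs, o.err)
			continue
		}
		vals = append(vals, o.val)
	}
	return vals, errs
}

// collectAbort cancels the producer and returns on the first error,
// which is the alternative suggested in the review.
func collectAbort(ctx context.Context, opts []nodeOption) ([]string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops the producer goroutine on early return
	var vals []string
	for o := range produce(ctx, opts) {
		if o.err != nil {
			return vals, o.err
		}
		vals = append(vals, o.val)
	}
	return vals, nil
}

// demo runs both strategies over the same input.
func demo() (keptAll, errsAll, keptAbort int, abortErr error) {
	opts := []nodeOption{{val: "a"}, {err: errMissing}, {val: "b"}}
	vals, errs := collectAll(context.Background(), opts)
	early, err := collectAbort(context.Background(), opts)
	return len(vals), len(errs), len(early), err
}

func main() {
	fmt.Println(demo())
}
```

If the caller discards the whole result whenever any error is present, the extra values kept by `collectAll` are wasted work, which is the reviewer's argument; if the surrounding context is cancelled soon after anyway, the difference is small, which is the author's.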

magik6k · 2018-07-23T12:01:34Z

unixfs/hamt/fetcher.go

+			hamt, err := NewHamtFromDag(f.dserv, no.Node)
+			if err != nil {
+				fetched.errs = append(fetched.errs, err)
+				continue


And here (abort early)

kevina · 2018-07-23T15:17:27Z

Instead of using go logger I'd use go-log

Um I already do:

import (
	...
	logging "gx/ipfs/QmcVVHfdyv15GVPk7NrxdWjh2hLVccXnoD8j2tyQShiXJb/go-log"
)

var log = logging.Logger("hamt")

Or am I missing something?

magik6k · 2018-07-23T15:34:02Z

Instead of using go logger I'd use go-log

am I missing something?

What I meant is that it would probably be better / more useful to use logging spans here (i.e. https://github.com/ipfs/go-log/blob/5dc2060baaf8db344f31dafd852340b93811d03f/log.go#L63-L77)

whyrusleeping · 2018-07-23T16:36:54Z

unixfs/hamt/fetcher.go

+	for {
+		select {
+		case id := <-f.requestCh:
+			if want != nil {


seems like we could just pull most of the code in this select case into a separate function, making this flow easier to read and reason about (also removing the need for continues)

Okay. I agree this will be helpful. I will do this later today.

kevina · 2018-07-23T22:55:45Z

What I meant is that it would probably be better / more useful to use logging spans here (i.e. https://github.com/ipfs/go-log/blob/5dc2060baaf8db344f31dafd852340b93811d03f/log.go#L63-L77)

I am not at all familiar with that code or the proper usage of it.

kevina · 2018-10-03T00:12:57Z

@Stebalien do you now consider this p.r. dead? The last time this got some activity I was in the middle of base32 work, so I didn't want to spend a lot of time on this, and I saw the remaining reviews more as nits than anything else. My apologies if this caused this P.R. to get dropped.

Stebalien · 2018-10-03T01:01:56Z

It was a combination of everyone being too busy to give a careful review and this patch being complicated enough to make such a review difficult and time consuming. It's also so targeted at one use-case (sharded directories) that it's unclear if the burden of maintaining all of this machinery outweighs the benefit. It may turn out that we can't do any better but, for now, @hannahhoward has picked up this work and is starting with some simpler approaches. Eventually, I'd like to have a more generalized pre-fetcher (that can prefetch a DAG up to some node-"class").

For now, just focus on the base32 stuff. That's absolutely critical for the browser folks.

hannahhoward · 2018-10-15T15:42:04Z

closing per previous comments and ipfs/go-unixfs#19 being merged

kevina requested a review from Kubuxu as a code owner April 26, 2018 07:02

ghost assigned kevina Apr 26, 2018

ghost added the status/in-progress In progress label Apr 26, 2018

kevina force-pushed the kevina/parallel-hamt-fetch branch 3 times, most recently from f78d58b to b6ba27d Compare April 29, 2018 21:55

kevina mentioned this pull request Apr 29, 2018

Sharded directory fetching is unusably slow #4908

Closed

kevina changed the title ~~WIP: Fetch Hamt Children Nodes in Parallel~~ Fetch Hamt Children Nodes in Parallel Apr 29, 2018

kevina force-pushed the kevina/parallel-hamt-fetch branch 2 times, most recently from 7537e29 to c6be2e2 Compare April 30, 2018 10:18

Kubuxu added the status/WIP This a Work In Progress label May 1, 2018

kevina mentioned this pull request May 3, 2018

Datastore benchmarks #4870

Closed

whyrusleeping added the topic/perf Performance label Jul 17, 2018

schomatis reviewed Jul 20, 2018

whyrusleeping reviewed Jul 20, 2018

kevina force-pushed the kevina/parallel-hamt-fetch branch from 37fb98b to c1605a6 Compare July 20, 2018 21:17

kevina force-pushed the kevina/parallel-hamt-fetch branch 2 times, most recently from 1aae2e2 to 1f37b9c Compare July 21, 2018 11:11

kevina removed the status/WIP This a Work In Progress label Jul 22, 2018

kevina added 12 commits July 22, 2018 23:29

Add background go routine to retrieve missing child shards in large b…

e5dfefc

…atches License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Simplify interface. Collect additional stats.

0ca431f

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Rework retrieval strategy.

75b5619

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Collect some additional stats.

78a65cf

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Code documentation clean ups.

2ed4711

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Enhance code documentation.

30b971c

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Code documentation enhancements and general cleanups.

111f92a

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Properly deal with error condition in missingChildShards

7976b20

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Document jobStack.

37628bd

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Wait for background batch job to complete before exiting fetcher.

b5e0d71

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Documentation improvements.

26f0e87

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Address code review.

80f212c

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

kevina force-pushed the kevina/parallel-hamt-fetch branch from 762311a to 80f212c Compare July 23, 2018 03:33

magik6k reviewed Jul 23, 2018

Address code review.

50ac94b

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

whyrusleeping reviewed Jul 23, 2018

Minor Refactor.

94303ee

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

hannahhoward mentioned this pull request Sep 21, 2018

Add sessions when fetching MerkleDAG in LS #5509

Merged

kevina mentioned this pull request Oct 3, 2018

Use EnumerateChildrenAsync in for enumerating HAMT links ipfs/go-unixfs#19

Merged

Stebalien added the status/deferred Conscious decision to pause or backlog label Oct 3, 2018

hannahhoward closed this Oct 15, 2018

ghost removed status/deferred Conscious decision to pause or backlog status/in-progress In progress labels Oct 15, 2018

Stebalien deleted the kevina/parallel-hamt-fetch branch February 28, 2019 22:44

