Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write providers to disk to avoid memory leaks #2860

Merged
merged 3 commits into from
Jun 28, 2016

Conversation

whyrusleeping
Copy link
Member

This is a rough first hack on this. It works as intended, but i'm not happy with the actual way of putting the information on disk.

Currently, it puts the list of peers providing a given key to the datastore at /providers/<KEY>. This means that everytime a provider is added, we write marshal all the providers and write to the datastore.

License: MIT
Signed-off-by: Jeromy [email protected]

@whyrusleeping
Copy link
Member Author

I'm thinking it might be best to just write /providers/<KEY>/<peerid> for each provider entry instead of storing the whole set at once

@whyrusleeping
Copy link
Member Author

@lgierth @Kubuxu @kevina If you guys have thoughts or CR for this, that would be much appreciated

@whyrusleeping
Copy link
Member Author

This also has the super cool advantage of persisting providers information across node reboots

@whyrusleeping
Copy link
Member Author

@Kubuxu @jbenet @kevina I'm converting keys and peerIDs to hex before passing them to the datastore NewKey constructor. This is a workaround for the go-datastore bug #2601

Before merging this, I think we might want to decide if this is the way we want to do this moving forward, or if we should attempt a fix at the go-datastore level (allowing us to skip the hex encoding)

@kevina
Copy link
Contributor

kevina commented Jun 21, 2016

@whyrusleeping #2601 is causing some problems for me in the filestore, in particular the filestore maintenance commands will report the mangled hash which can confuse users at best and at also cause problems when trying to access the block outside of the filestore, I think we should work to fix this once and for all. I image any fix will likely introduce a repo. change. Let's take this discussion over to #2601.

@@ -81,7 +81,7 @@ func NewProviderManager(ctx context.Context, local peer.ID, dstore ds.Datastore)
const providersKeyPrefix = "/providers/"

func mkProvKey(k key.Key) ds.Key {
return ds.NewKey(providersKeyPrefix + hex.EncodeToString([]byte(k)))
return ds.NewKey(providersKeyPrefix + base64.StdEncoding.EncodeToString([]byte(k)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping I would use RawURLEncoding. (1) StdEncoding includes '/' in the Alphabet (see https://tools.ietf.org/html/rfc4648) URLEncoding does not (hence it's name) (2) The RAW form does not include the unnecessary '=' padding character.

@whyrusleeping whyrusleeping added this to the Ipfs 0.4.3 milestone Jun 21, 2016
@whyrusleeping
Copy link
Member Author

@kevina @Kubuxu could I get you guys to review this PR? Would be very helpful

@Kubuxu
Copy link
Member

Kubuxu commented Jun 22, 2016

SGTM but I would like one more pair of eyes on it.

@kevina
Copy link
Contributor

kevina commented Jun 22, 2016

@whyrusleeping there is a failed test, should we worry about it?

@Kubuxu
Copy link
Member

Kubuxu commented Jun 22, 2016

@kevina no it isn't connected

@Kubuxu
Copy link
Member

Kubuxu commented Jun 22, 2016

@whyrusleeping t0060-daemon.sh compares daemon output in tests. It needs fixing.

@whyrusleeping
Copy link
Member Author

@Kubuxu thats fixed in #2891

KeysOnly: true,
Prefix: providersKeyPrefix,
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping, using dstore.Query will work, but it is very slow. The Query mechanism has a lot of overhead. The data is being sent to you via an unbuffered channel in a separate go routine. In my own filestore code I was able to get a modest speedup by increasing the channel buffer size used the query, but it was still slow (speed up of 2 to 3x). By querying the leveldb directly I was able to get a 10-12x speedup when doing a filestore ls. (See ipfs-filestore#10) I do not know how bad it will hurt performance here but it is something to keep it mind if you notice a slowdown after they code is deployed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevina even with KeysOnly set to true? Now that you mention this, i think i'll throw in some timers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping Yes, in fact that is how I was using Query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping would it be possible to improve the perf of that call, I depend on it in bloom PR (with AllKeysChan), better perf on it, shorter bloom build time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a quick hack you can try and increase the channel buffer size here https://github.com/ipfs/go-datastore/blob/master/query/query.go#L185.

A more long term solution might be to rewrite the query interface not use a direct iterator rather then using Goroutines, however Disk IO may nullify this benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://ewencp.org/blog/golang-iterators/ for some performance comparisons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, using a channel as an iterator sucks. If one of you wants to work on improving the perf of query that would be great.

We could change the interface to not use a channel, and have it instead just return the next value directly. Then on top of that we could provide a method for turning the direct query result into a channel buffered one for usecases that need it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for not using channels there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you can avoid using a query entirely if you store the records as ipfs objects properly, and only use leveldb to store key -> hash_of_providers_object, similar to how the pinset is stored

@kevina
Copy link
Contributor

kevina commented Jun 23, 2016

SGTM. As far as I can tell I don't see any real problems.

License: MIT
Signed-off-by: Jeromy <[email protected]>
@Kubuxu Kubuxu added the RFM label Jun 26, 2016
License: MIT
Signed-off-by: Jeromy <[email protected]>
@whyrusleeping
Copy link
Member Author

Choo Choo!

@whyrusleeping whyrusleeping merged commit 5592144 into master Jun 28, 2016
@whyrusleeping whyrusleeping deleted the feat/provide-storage branch June 28, 2016 17:23
ghost pushed a commit that referenced this pull request Jul 28, 2016
This reverts commit 5592144, reversing
changes made to 3b2993d.

License: MIT
Signed-off-by: Lars Gierth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants