This repository has been archived by the owner on Aug 19, 2022. It is now read-only.

reduce allocations and garbage collect the in-memory peerstore #39

Merged
Stebalien merged 3 commits into master from fix/allocations on Oct 4, 2018

Conversation

Stebalien
Member

  1. Revert to the memory-heavy, allocation-light in-memory representation. We switched to arrays to save memory but, unfortunately, that started killing us on allocations.
  2. Add a once-per-hour GC cycle. We GC when we read a peer's addresses, but we don't do that very often.

Now, there are better ways to GC and we should probably use them. However, this is probably a decent stop-gap.
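A minimal sketch of the once-per-hour GC described above, using illustrative type and field names rather than the actual code in this PR:

```go
package main

import (
	"sync"
	"time"
)

// Illustrative types only; the real memoryAddrBook in this PR differs.
type expiringAddr struct {
	Addr    string
	Expires time.Time
}

type memoryAddrBook struct {
	addrmu sync.Mutex
	addrs  map[string]map[string]expiringAddr // peer ID -> addr -> entry
	lastGC time.Time
}

const gcInterval = time.Hour

// gc drops expired addresses at most once per gcInterval. It is only called
// from operations that already hold addrmu, so it piggybacks on their locking
// and takes no lock itself.
func (mab *memoryAddrBook) gc() {
	if time.Since(mab.lastGC) < gcInterval {
		return
	}
	now := time.Now()
	for p, amap := range mab.addrs {
		for k, a := range amap {
			if a.Expires.Before(now) {
				delete(amap, k)
			}
		}
		if len(amap) == 0 {
			delete(mab.addrs, p)
		}
	}
	mab.lastGC = now
}

func main() {
	mab := &memoryAddrBook{addrs: make(map[string]map[string]expiringAddr)}
	mab.addrmu.Lock()
	mab.gc()
	mab.addrmu.Unlock()
}
```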

We switched to a slice to reduce the amount of memory the peerstore ended up
taking up; unfortunately, this really killed us in allocations. Looking at
go-ipfs profiles, I'm worried that memory fragmentation is killing us, so I'd
like to revert to the old behavior.

Note: The real solution here is dealing with "address abusers".
There are better ways to do this, but pausing dialing once an hour likely isn't
going to break anything and is the simplest approach.
@ghost ghost assigned Stebalien Oct 2, 2018
@ghost ghost added the status/in-progress In progress label Oct 2, 2018
@Stebalien Stebalien requested a review from raulk October 2, 2018 17:54
@raulk raulk left a comment (Member)

LGTM.

subManager: NewAddrSubManager(),
}
}

func (mab *memoryAddrBook) gc() {
Member

Could we add a comment to clarify that gc() is called within the context of an add/update, hence it piggybacks on their locking of the addr map?

Member Author

Done.

@@ -152,55 +156,42 @@ func (mab *memoryAddrBook) UpdateAddrs(p peer.ID, oldTTL time.Duration, newTTL t
mab.addrmu.Lock()
Member

This PR doesn't change the locking, but since update and add operations are scoped to a peer, we might benefit from using a striped lock vs a global lock. Any thoughts?
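For illustration, a striped lock along these lines might look like the following sketch (hypothetical names, not part of this PR): the peer ID is hashed onto one of N mutexes, so peer-scoped operations only contend with peers that land on the same stripe.

```go
package main

import (
	"hash/fnv"
	"sync"
)

const numStripes = 32

type stripedAddrBook struct {
	locks [numStripes]sync.Mutex
	addrs [numStripes]map[string][]string // peer ID -> addresses
}

func newStripedAddrBook() *stripedAddrBook {
	b := &stripedAddrBook{}
	for i := range b.addrs {
		b.addrs[i] = make(map[string][]string)
	}
	return b
}

// stripe picks the lock/map shard for a given peer ID.
func stripe(p string) int {
	h := fnv.New32a()
	h.Write([]byte(p))
	return int(h.Sum32() % numStripes)
}

func (b *stripedAddrBook) AddAddr(p, addr string) {
	i := stripe(p)
	b.locks[i].Lock()
	defer b.locks[i].Unlock()
	b.addrs[i][p] = append(b.addrs[i][p], addr)
}

func main() {
	b := newStripedAddrBook()
	b.AddAddr("QmPeer", "/ip4/127.0.0.1/tcp/4001")
}
```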

Member Author

I believe the tricky part would be iterating over the map. We'd need to copy the peer IDs into a separate array to do that.

Now, we could use a sync.Map but I'm worried about the memory overhead.


Really, I think the correct solution is a tiered store (with rotation) or something like that. That is:

  • 1m tier
  • 10m tier
  • 1hr tier
  • 1day tier
  • infinity? // GC once every N days.

For each tier, we'd have an "expiring" map and a "live" map. Every time period, we'd delete the expiring map, move the live map to the expiring map, and create a new live map.

There are probably other ways to do this but this seems like the easiest solution to me.
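A rough sketch of one such tier, with rotation between a live and an expiring map (illustrative only, not code from this PR):

```go
package main

import "time"

// addrTier is a hypothetical single tier: entries are written into the live
// map; on each rotation the expiring map is dropped, live becomes expiring,
// and a fresh live map is created. An entry therefore survives between one
// and two rotation periods.
type addrTier struct {
	period   time.Duration
	live     map[string][]string // peer ID -> addresses
	expiring map[string][]string
	lastRot  time.Time
}

func newAddrTier(period time.Duration) *addrTier {
	return &addrTier{
		period:   period,
		live:     make(map[string][]string),
		expiring: make(map[string][]string),
		lastRot:  time.Now(),
	}
}

// rotate drops everything that has aged out of the expiring map.
func (t *addrTier) rotate(now time.Time) {
	if now.Sub(t.lastRot) < t.period {
		return
	}
	t.expiring = t.live
	t.live = make(map[string][]string)
	t.lastRot = now
}

func (t *addrTier) add(p, addr string) {
	t.live[p] = append(t.live[p], addr)
}

func (t *addrTier) get(p string) []string {
	return append(append([]string(nil), t.live[p]...), t.expiring[p]...)
}

func main() {
	// Tiers roughly matching the list above: 1m, 10m, 1h, 1d.
	tiers := []*addrTier{
		newAddrTier(time.Minute),
		newAddrTier(10 * time.Minute),
		newAddrTier(time.Hour),
		newAddrTier(24 * time.Hour),
	}
	tiers[0].add("QmPeer", "/ip4/127.0.0.1/tcp/4001")
	tiers[0].rotate(time.Now().Add(2 * time.Minute))
	_ = tiers[0].get("QmPeer")
}
```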

Member

We could also try out sync.Map if it is a problem.

Member Author

> Now, we could use a sync.Map but I'm worried about the memory overhead.

But yeah, we could try it and see.
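For reference, a sketch of what the sync.Map variant under discussion might look like (hypothetical, not part of this PR); each entry carries extra interface and bookkeeping overhead, which is the memory concern raised above:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type expiringAddr struct {
	Addr    string
	Expires time.Time
}

// addrs maps peer ID (string) -> *sync.Map of addr string -> expiringAddr,
// so peer-scoped reads and writes avoid a single global mutex.
var addrs sync.Map

func addAddr(p, addr string, ttl time.Duration) {
	inner, _ := addrs.LoadOrStore(p, &sync.Map{})
	inner.(*sync.Map).Store(addr, expiringAddr{Addr: addr, Expires: time.Now().Add(ttl)})
}

func main() {
	addAddr("QmPeer", "/ip4/127.0.0.1/tcp/4001", time.Hour)
	addrs.Range(func(p, inner interface{}) bool {
		inner.(*sync.Map).Range(func(_, a interface{}) bool {
			fmt.Println(p, a.(expiringAddr).Addr)
			return true
		})
		return true
	})
}
```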

Member

I missed that in your message. Sorry.

@Stebalien Stebalien merged commit 2a27e91 into master Oct 4, 2018
@Stebalien Stebalien deleted the fix/allocations branch October 4, 2018 00:43
@ghost ghost removed the status/in-progress In progress label Oct 4, 2018
@rob-deutsch commented Oct 9, 2018

For what it's worth, I just 'benchmarked' this PR, and while allocations (HeapObjects) have generally gone down, total heap size (HeapAlloc and HeapSys) has gone up.

In the image below:

  • base refers to ipfs/kubo@41a73885e
  • head refers to this PR layered on top of the go-libp2p-peerstore that's being used.

Samples were taken from each executable at 60-second intervals. I use the term 'benchmark' relatively loosely, because both ipfs executables are running live, not in a controlled environment. The screenshot below covers a 6-hour period and is consistent with what I saw over the full 18 hours it was running.

[Screenshot: heap metrics for base vs. head over ~6 hours, taken 2018-10-09 12:06:36]
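For context, heap samples like these can be collected from a running Go process with runtime.ReadMemStats; a minimal sketch follows (this is an assumption about the method, the comment above does not say how the numbers were gathered):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Print the three heap metrics mentioned above every 60 seconds.
func main() {
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		fmt.Printf("HeapAlloc=%d HeapSys=%d HeapObjects=%d\n",
			m.HeapAlloc, m.HeapSys, m.HeapObjects)
		time.Sleep(60 * time.Second)
	}
}
```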
