reduce allocations and garbage collect the in-memory peerstore #39
Conversation
We switched to a slice to reduce the amount of memory the peerstore took up. Unfortunately, this really hurt us in allocations. Looking at go-ipfs profiles, I'm worried that memory fragmentation is killing us, so I'd like to revert to the old behavior. Note: the real solution here is dealing with "address abusers".
There are better ways to do this, but pausing dialing once an hour likely isn't going to break anything and is the simplest approach.
LGTM.
		subManager: NewAddrSubManager(),
	}
}

func (mab *memoryAddrBook) gc() {
Could we add a comment to clarify that gc() is called within the context of an add/update, hence it piggybacks on their locking of the addr map?
Done.
@@ -152,55 +156,42 @@ func (mab *memoryAddrBook) UpdateAddrs(p peer.ID, oldTTL time.Duration, newTTL t
	mab.addrmu.Lock()
This PR doesn't change the locking, but since update and add operations are scoped to a peer, we might benefit from using a striped lock vs a global lock. Any thoughts?
I believe the tricky part would be iterating over the map. We'd need to copy the peer IDs into a separate array to do that.
Now, we could use a sync.Map, but I'm worried about the memory overhead.
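A rough sketch of the striped-lock idea under discussion (all names hypothetical, not from the repo): the map is split into fixed stripes keyed by a hash of the peer ID, so per-peer adds and updates only contend within one stripe, while whole-book iteration has to visit every stripe and copy the peer IDs out.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const nStripes = 16

// stripedBook guards the address map with per-stripe locks instead of a
// single global mutex.
type stripedBook struct {
	mu    [nStripes]sync.Mutex
	addrs [nStripes]map[string][]string // peer ID -> addresses
}

func newStripedBook() *stripedBook {
	sb := &stripedBook{}
	for i := range sb.addrs {
		sb.addrs[i] = make(map[string][]string)
	}
	return sb
}

// stripe maps a peer ID to one of the lock stripes.
func stripe(peerID string) int {
	h := fnv.New32a()
	h.Write([]byte(peerID))
	return int(h.Sum32() % nStripes)
}

// AddAddr only contends with other peers that hash to the same stripe.
func (sb *stripedBook) AddAddr(peerID, addr string) {
	i := stripe(peerID)
	sb.mu[i].Lock()
	defer sb.mu[i].Unlock()
	sb.addrs[i][peerID] = append(sb.addrs[i][peerID], addr)
}

// Peers illustrates the downside raised above: iterating the whole book
// means locking each stripe in turn and copying the peer IDs out.
func (sb *stripedBook) Peers() []string {
	var ids []string
	for i := range sb.addrs {
		sb.mu[i].Lock()
		for id := range sb.addrs[i] {
			ids = append(ids, id)
		}
		sb.mu[i].Unlock()
	}
	return ids
}

func main() {
	sb := newStripedBook()
	sb.AddAddr("QmPeerA", "/ip4/1.2.3.4/tcp/4001")
	sb.AddAddr("QmPeerB", "/ip4/5.6.7.8/tcp/4001")
	fmt.Println(len(sb.Peers()))
}
```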
Really, I think the correct solution is a tiered store (with rotation) or something like that. That is:
- 1m tier
- 10m tier
- 1hr tier
- 1day tier
- infinity? // GC once every N days.
For each tier, we'd have an "expiring" map and a "live" map. Every time period, we'd delete the expiring map, move the live map to the expiring map, and create a new live map.
There are probably other ways to do this but this seems like the easiest solution to me.
We could also try out sync.Map if it is a problem.
> Now, we could use a sync.Map but I'm worried about the memory overhead.

But yeah, we could try it and see.
I missed that in your message. Sorry.
For what it's worth, I just 'benchmarked' this PR: while allocations (HeapObjects) have generally gone down, total heap size (HeapAlloc and HeapSys) has gone up.
Samples were taken from each executable at 60-second intervals. I use the term 'benchmark' relatively loosely, because both ipfs executables are running live; they are not in a controlled environment. The screenshot covers a 6-hour period and is consistent with what I'd seen for the full 18 hours it was running.
Now, there are better ways to GC and we should probably use them. However, this is probably a decent stop-gap.