
Measuring memory usage #5530

Open

rob-deutsch opened this issue Sep 27, 2018 · 18 comments
@rob-deutsch
Contributor

I have a question about measuring and understanding go-ipfs' memory usage, because I'm getting conflicting signals.

I'm doing this because go-ipfs is using a lot of memory on my docker instance (~1GB) and I want to understand why.

I've been following the instructions in debug-guide.md, but I'm getting different results from what my OS is telling me.

My OS is telling me that I'm using around 256MB:

$ ps aux | head -n1
USER               PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
$ ps aux | grep ipfs
rob              16719   0.5  3.1 558590036 256180 s002  Ss+   2:24pm   6:28.99 ipfs daemon
$

But pprof tells me that it's only using 76.38MB:

$ curl localhost:5001/debug/pprof/heap > ipfs.heap
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 78010    0 78010    0     0  3822k      0 --:--:-- --:--:-- --:--:-- 4009k
$ go tool pprof /Users/rob/golang/bin/ipfs ipfs.heap 
File: ipfs
Type: inuse_space
Time: Sep 27, 2018 at 5:21pm (AEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top5
Showing nodes accounting for 38.29MB, 50.14% of 76.38MB total
Showing top 5 nodes out of 207
      flat  flat%   sum%        cum   cum%
   13.47MB 17.63% 17.63%    22.97MB 30.07%  gx/ipfs/QmQjMHF8ptRgx4E57UFMiT4YM6kqaJeYxZ1MCDX23aw4rK/golang-lru/simplelru.(*LRU).Add
    9.50MB 12.44% 30.07%     9.50MB 12.44%  container/list.(*List).PushFront
    6.31MB  8.26% 38.34%     6.31MB  8.26%  gx/ipfs/Qmda4cPRvSRyox3SqgJN6DfSZGU5TtHufPTp9uXjFj71X6/go-libp2p-peerstore/pstoremem.(*memoryAddrBook).AddAddrs
    4.52MB  5.91% 44.25%     4.52MB  5.91%  gx/ipfs/QmZt87ZHYGaZFBrtGPEqYjAC2yNAhgiRDBmu8KaCzHjx6h/yamux.newSession
    4.50MB  5.89% 50.14%     4.50MB  5.89%  gx/ipfs/QmYmsdtJ3HsodkePE3eU3TsCaP2YvPZJ4LoXnNkDE5Tpt7/go-multiaddr.NewMultiaddrBytes
(pprof) 

What am I doing wrong?

@Kubuxu
Member

Kubuxu commented Sep 27, 2018

pprof shows only the space used by live allocations on the heap. ps tells you how much memory the Go runtime has grabbed from the system for the heap, stacks, and so on.

@rob-deutsch
Contributor Author

Thanks for the quick response. That makes sense.

What's the best way to get a feel for what's actually using all the memory?

I'm looking to run IPFS on systems that don't have 1GB available for IPFS, and I'm wondering if there's low-hanging fruit for me to tackle.

@Stebalien
Member

Some of it will be stacks (goroutines). However, those should generally only use on the order of a few megabytes in total.

I have a sneaking suspicion that the rest is memory fragmentation. Basically:

  1. We allocate something and go doesn't have space for it.
  2. Go asks the kernel for a page to put this data in.
  3. Go fills up this page with a bunch of random allocations.
  4. We free a bunch of stuff but not everything in the page.

At this point, we're not using the entire page but go can't give it back to the OS. Go will use the rest of the space in these pages, but we may not need them (and/or the gaps in allocations are too small to be useful).

This is exacerbated by:

  1. Go-ipfs's bursty allocation patterns. We'll often do a bunch of work all at once, allocate a bunch of tiny things, and then free them. Unfortunately, if we also happen to allocate a long-lived object in one of these pages full of short-lived objects, we'll have to keep the whole page around even after we free the short-lived objects. High performance systems with allocation patterns like this will actually use separate allocators/memory pools to keep these kinds of allocations on separate pages. Unfortunately, go doesn't really support custom allocators.

  2. Garbage collection. I'm guessing (I haven't confirmed this) that go doesn't wait for a full GC to finish before asking the kernel to allocate a new page. That means when allocating and deallocating really quickly, we can end up asking the operating system for significantly more memory than we actually use at any given point in time. Unfortunately, due to (1), we may end up having to keep this memory reserved because we may be using small pieces of it.


TL;DR: reduce allocation frequency and use memory pools (see sync.Pool).
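
For illustration, a minimal sketch of the pooling pattern meant here, using sync.Pool to reuse scratch buffers instead of allocating a fresh one on every call (the buffer size and the process function are made up for this example):

package main

import (
    "fmt"
    "sync"
)

// bufPool hands out reusable 4KiB scratch buffers so a hot path doesn't
// allocate (and later garbage-collect) a fresh slice on every call.
var bufPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func process(data []byte) int {
    buf := bufPool.Get().([]byte) // reuse an idle buffer if one exists
    defer bufPool.Put(buf)        // hand it back for the next caller

    return copy(buf, data) // do some work in the scratch buffer
}

func main() {
    fmt.Println(process([]byte("hello")))
}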

@rob-deutsch
Contributor Author

rob-deutsch commented Sep 29, 2018

Thanks for the insight! Maybe that actually makes it easier to fix, because no core logic has to be changed, just the creation/deletion of objects. However, investigating this issue seems to be tough.

Ignore the following. I think I confirmed your suspicion, although the numbers don't quite reconcile.

What I've been looking into now is how I can confirm that suspicion, but I haven't come up with any reliable methods.

How do I find out how many bytes of pages Go is currently holding to support the 76.38MB of actual objects on the heap? I'm very open to suggestions.

I've come up with 3 options, none of which work:

1. I've played with the inuse_space and alloc_space options for pprof, but they just tell me the 76.38MB figure or the cumulative total ever allocated (something like 518GB).

2. I've also looked into the MemStats struct from the runtime package, but it doesn't seem to have the desired info (see the sketch after this list for how I'm reading it).

3. golang/go/issues/11035 showed some form of investigation, but it wasn't terribly concrete: it just tracks total memory use over time against the number of objects allocated over time. I describe this as "not concrete" because a) there could be different types/sizes of objects being allocated, and b) it can't be used to do a point-in-time analysis without history.
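
For reference, a minimal sketch of how the relevant MemStats fields can be read directly (the field selection is just what seems useful here):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m) // snapshot of the runtime's own view of the heap

    // HeapAlloc: bytes of allocated (not yet freed) heap objects.
    // HeapInuse: bytes in spans with at least one object (includes fragmentation).
    // HeapSys:   bytes of heap address space obtained from the OS.
    fmt.Printf("HeapAlloc=%d HeapInuse=%d HeapIdle=%d HeapSys=%d HeapReleased=%d\n",
        m.HeapAlloc, m.HeapInuse, m.HeapIdle, m.HeapSys, m.HeapReleased)
}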

Any suggestions on how to dig deeper into this?

@rob-deutsch
Contributor Author

rob-deutsch commented Sep 29, 2018

Okay, I've been starting to make some progress on this. It turns out that MemStats from the runtime package might have the info we want, and that go-ipfs is kind enough to expose it (I'll submit a PR adding it to the documentation).

However, I'm slightly concerned that the numbers don't reconcile to ps aux. For example, I'm surprised that MemStats is reporting less heap than the total RES size.

Also, according to the MemStats documentation, HeapSys "includes virtual address space that has been reserved but not yet used, which consumes no physical memory". The documentation claims that the loss due to fragmentation should be HeapInuse less HeapAlloc (or only 80MB in my case).
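
(Concretely, with the numbers below: HeapInuse 180,756,480 - HeapAlloc 100,528,888 ≈ 80.2MB.)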

  1. debug/vars says that HeapSys is 331MB:
{
  "Alloc": 100528888,
  "TotalAlloc": 25294914672,
  "Sys": 378910968,
  "Lookups": 162513,
  "Mallocs": 291384450,
  "Frees": 290363035,
  "HeapAlloc": 100528888,
  "HeapSys": 331874304,
  "HeapIdle": 151117824,
  "HeapInuse": 180756480,
  "HeapReleased": 82337792,
  "HeapObjects": 1021415,
  "StackInuse": 26738688,
  "StackSys": 26738688,
  "MSpanInuse": 3169048,
  "MSpanSys": 3407872,
  "MCacheInuse": 3472,
  "MCacheSys": 16384,
  "BuckHashSys": 2340107,
  "GCSys": 13131776,
  "OtherSys": 1401837,
  "NextGC": 190323504,
  "LastGC": 1538202755353124600,
  "PauseTotalNs": 75400404,
  "NumGC": 328,
  "NumForcedGC": 0,
  "GCCPUFraction": 0.0014981951587293512,
  "EnableGC": true,
  "DebugGC": false
}
  2. ps says RES is 316MB:
$ ps aux | head -n1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps aux | grep ipfs
ipfs         6  6.7  7.8 1000736 316252 ?      Sl   03:48  12:38 ipfs daemon --migrate=true
  3. pprof heap says I'm using 87MB on the heap:
File: ipfs
Build ID: f3d5672af853da042619ce8ddf88d293648641ed
Type: inuse_space
Time: Sep 29, 2018 at 4:31pm (AEST)
Showing nodes accounting for 87.51MB, 100% of 87.51MB total
      flat  flat%   sum%        cum   cum%
   15.42MB 17.63% 17.63%    23.42MB 26.77%  gx/ipfs/QmVYxfoJQiZijTgPNHCHgHELvQpbsJNTg6Crmc3dQkj3yy/golang-lru/simplelru.(*LRU).Add
        8MB  9.14% 26.77%        8MB  9.14%  container/list.(*List).insertValue (inline)
     7.53MB  8.60% 35.37%     7.53MB  8.60%  bufio.NewReaderSize (inline)
     5.87MB  6.70% 42.07%     5.87MB  6.70%  gx/ipfs/QmZR2XWVVBCtbgBWnQhWk2xcQfaR3W8faQPriAiaaj7rsr/go-libp2p-peerstore.(*AddrManager).AddAddrs

@Stebalien
Member

The documentation claims that the loss due to fragmentation should be HeapInuse less HeapAlloc (or only 80MB in my case).

So, that's the upper bound on memory that can't be reused (I think?). However, there may be some that can be reused but isn't and also can't be returned (maybe?). Unfortunately, the documentation isn't very clear on this point.

I really don't know what's going on with those numbers, but I'm guessing it has to do with the fact that the OS is free to do whatever it wants with that memory and Go can't accurately guess what it has actually done.

@rob-deutsch
Contributor Author

Thanks for the response @Stebalien. I think there are currently a lot of unknowns. It's possible that OS ambiguity is a cause here, although I'm not 100% convinced yet, but even if that's the case it's not the only cause. pprof says the heap is using 80MB but MemStats reports that HeapInUse is 180MB.

In general my desire is to understand why actual memory usage is so much higher than the pprof heap figure plus the MemStats stack size (~100MB). Or, to narrow it a bit, why the MemStats HeapSys is so much bigger than the pprof heap figure.

There is very little information about investigating what Go's memory allocator and garbage collector are actually doing. I am currently investigating using runtime/debug.WriteHeapDump (format here) to get more insight into what's happening.
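
For anyone following along, a rough sketch of the kind of hack I mean; the endpoint path and dump location are made up for illustration:

package main

import (
    "net/http"
    "os"
    "runtime"
    "runtime/debug"
)

func main() {
    // Hypothetical debug endpoint: write a heap dump to a file on request.
    http.HandleFunc("/debug/heapdump", func(w http.ResponseWriter, r *http.Request) {
        f, err := os.Create("/tmp/ipfs.heapdump")
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer f.Close()

        runtime.GC()                // run a GC first so the dump reflects live objects
        debug.WriteHeapDump(f.Fd()) // format is described in the doc linked above
        w.Write([]byte("wrote /tmp/ipfs.heapdump\n"))
    })
    http.ListenAndServe("localhost:6061", nil)
}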

@rob-deutsch
Contributor Author

rob-deutsch commented Oct 3, 2018

Is it okay if I use this issue for a log of my continuing investigation and experiments? I'm also happy to take any thoughts/suggestions that people have.

I hacked go-ipfs such that it would serve me debug.WriteHeapDump() files, and it went a decent way toward clearing up what is actually happening here. The total size of the objects on the heap, in the example above, is HeapAlloc's 100MB (100528888 bytes). pprof heap's figure of 87MB was taken right after GC, but HeapAlloc is a point-in-time figure and includes GC-able objects.

Summary

Go's MemStats HeapSys is reporting 331MB. This is the amount that go-ipfs has taken from the OS for the heap. It is made up as follows:

  1. 87MB: Actual go-ipfs object data. This is the theoretical minimum data that needs to be on the heap. The rest is 'waste'.
  2. 13MB: go-ipfs object data that's likely ready to be GC'd.
  3. 80MB: go-ipfs fragmentation and Go runtime sizeclass waste (probably skewed to fragmentation)
  4. 151MB: Go runtime's hoarding of memory. It's refusing to give it back to the OS, I think because go-ipfs has a relatively high 'allocation velocity' which it wants to serve quickly.
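
(As a sanity check, these four components sum to 87 + 13 + 80 + 151 = 331MB, matching the HeapSys figure.)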

Note that Go has told the OS that 82MB of this 332MB isn't strictly being used (HeapReleased), which is probably why ps is reporting only 316MB of memory usage. However, according to this guy, and in my own experience, Linux doesn't usually pay much attention to it.

Conclusion and Next Steps

While fragmentation is certainly an issue, it seems that high allocation/deallocation velocity might be a bigger issue. The Go runtime is hoarding a lot of memory, and I would suggest that go-ipfs could/should better utilise the Go runtime's memory allocator.

There are two things I'd like to look into next:

  1. Confirm the hypothesis. Or, to put it in other words, confirm that Go is hoarding the 151MB of memory because of what it's observing from go-ipfs.

  2. Confirm that the profile of the 'waste' I've observed in this example is generally applicable, and wasn't just a one-off. E.g. maybe it was just happenstance that Go's hoarding was causing the problem in this example? Maybe another part (e.g. fragmentation) is usually the cause?

Incidentally, I was hoping that debug.WriteHeapDump() would be totally comprehensive and include all the info required, but that doesn't seem to be the case. At the moment I don't think it's possible to replicate pprof's heap dump using just debug.WriteHeapDump(). For starters, I don't think it's possible to determine an object's type, nor where in the code it was created.

Detailed Explanation

In summary I think that the heap usage is built up in the following way:

  1. pprof's heap figure of 87.51MB needs to be on the heap at the given point in time. This is the theoretical minimum amount of heap memory that go-ipfs needs on a continuous basis.

However, go-ipfs has a relatively high 'allocation velocity' (i.e. it's continually creating/destroying objects) and this adds some overhead between GC runs. In this case 13MB.

  2. MemStats' HeapAlloc value of 100.5MB includes the total amount of actual data on the heap at the moment, even though some is ready to be purged by the GC.

However, there's the memory fragmentation issue that was raised in this thread, and also Go's memory allocator is based on fixed size class chunks, which are always slightly bigger than the underlying data. In this case there is 80.2MB of wasted space.

  3. MemStats' HeapInuse value of 180.75MB includes the amount of this wasted space. It's the amount that the Go allocator strictly needs from the OS at this point in time.

But Go keeps hold of some other memory to reduce how often it needs to go to the OS to ask for more. In theory this is fine, but in this example there's really no need for it to keep hold of 151MB when the underlying memory requirement of go-ipfs is relatively constant. I suspect this is also due to the above-mentioned high allocation velocity.

  4. MemStats' HeapSys value of 332MB includes the amount that Go is holding on to even though we don't need it.

HOWEVER, Go tries to be nice about the memory it's holding on to: it tells the OS which parts it is holding but not actually using, so that the OS can release the physical RAM backing them. In this case 82MB.

There are two issues with relying on this:

  1. According to this guy, and in my own experience, Linux doesn't seem to pay much attention to it.
  2. Go only flags this after the memory has been unused for 5 minutes.
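
If you don't want to wait those 5 minutes, the runtime can be asked to return memory immediately; a minimal sketch (not something go-ipfs does, just the standard library call):

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func main() {
    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)

    // Force a GC and ask the runtime to return as much memory to the OS
    // as possible, instead of waiting for the background scavenger.
    debug.FreeOSMemory()

    runtime.ReadMemStats(&after)
    fmt.Printf("HeapReleased: %d -> %d bytes\n", before.HeapReleased, after.HeapReleased)
}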

@rob-deutsch
Contributor Author

rob-deutsch commented Oct 3, 2018

So I've been playing with Prometheus and Grafana, because go-ipfs is awesome and has Prometheus built in!

I think that this graph confirms the 2nd item that I wanted to confirm, i.e. that the Go runtime's hoarding is the biggest contributor to heap waste.

I think that @Stebalien might've been right on the money in his first reply. go-ipfs is allocating bucketloads of short-lived objects.

  • Blue area is the Go runtime's hoarding. (absolute value is HeapSys)
  • Yellow area is fragmentation and sizeclass waste. (HeapInuse minus HeapAlloc)
  • Green area is objects, both live and ready to be GC'd (HeapAlloc)
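
(For anyone reproducing this: assuming the default Go collector is registered, these show up in Prometheus as go_memstats_heap_sys_bytes, go_memstats_heap_inuse_bytes, and go_memstats_heap_alloc_bytes.)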

[screenshot: stacked graph of heap usage, 2018-10-03 13:05]

@rob-deutsch
Contributor Author

rob-deutsch commented Oct 4, 2018

Progress:

  1. Confirmed that 'allocation velocity' is an issue. I did this by setting GOGC=10 (i.e. asking Go to garbage collect when the heap is 10% larger than the heap at the last GC), down from 100, and instantly saved >100MB of RAM. The downside is that the GC is now responsible for ~3% of CPU time instead of 0.3%. The more elegant solution to wasted memory is to reduce 'allocation velocity'. (See the sketch at the end of this comment.)

[screenshot: heap usage graph after setting GOGC=10, 2018-10-04 11:41]

  2. Started to identify hotspots. peerstore.AddAddrs() is coming up a lot in my analyses. The changes made for Optimize for memory libp2p/go-libp2p-peerstore#15 were effective, but they could be improved by minimising the allocations required.

Also, I just noticed #3318 about reducing go-ipfs's memory usage. I will try and keep this issue related to measuring and understanding memory usage.

[edit]: Ignore the below. This doc gives more info on Pacer and it seems it's irrelevant. I think Pacer is only concerned with giving sufficient resources/time to the GC to prevent GOGC being breached.

  3. Is the Pacer a simpler solution? I thought that there were two ways to minimise the wasted-memory effects of high allocation velocity, GCing more often or rewriting code, but during my Googling I've come across this fantastic keynote from July.

This has left me with a question: is the Pacer not doing its best managing the garbage that peerstore.AddAddrs() creates because of how go-ipfs uses that method? go-ipfs spawns many goroutines, each of which calls the method once, and maybe this is preventing the Pacer from properly managing the garbage this method creates? I don't understand the Pacer enough to form a solid hypothesis on this.
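
Re item 1 above: for completeness, the GOGC=10 experiment can also be done programmatically; a minimal sketch, not something wired into go-ipfs:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to running with GOGC=10: trigger a GC once the heap has
    // grown 10% past the live heap left by the previous collection.
    old := debug.SetGCPercent(10)
    fmt.Printf("GC target changed from %d%% to 10%%\n", old)

    // ... run the workload; expect lower heap retention at the cost of more GC CPU time.
}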

@Stebalien
Member

With respect to libp2p/go-libp2p-peerstore#15, I've just reverted that in libp2p/go-libp2p-peerstore#39 and introduced naive garbage collection of stale peer addresses. I'm hoping that'll help significantly (and that the GC will offset the additional memory usage) but we'll see.

I've also been fixing some issues with our multiaddrs:

It also looks like bitswap is causing quite a bit of allocation. We really need to get sessions #3786 working; that'll reduce the number of bitswap "wantlists" each peer gets.

@Stebalien
Member

Also, this is truly amazing progress!

@rob-deutsch
Contributor Author

We need to be slightly careful when looking at the above numbers. I had a suspicion, which I think I've confirmed, that Prometheus itself also has a high allocation velocity (when hitting it once a second).

In particular, it's generating a lot of garbage in its calls to flate.NewWriter and MetricFamilyToText.

There are a few different ways of solving this, but my gut is telling me the simplest is to run Prometheus in a separate process.

The alternative is to run our own separate pprof heap scraper (a rough sketch of what I mean is below the list), but then we lose:

  1. All the other good metrics that Prometheus monitors
  2. Nice integration with Grafana
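
For concreteness, a rough sketch of the scraper I have in mind; the output filenames and the 30-second interval are just examples:

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
    "time"
)

// scrapeHeap saves one pprof heap profile from the daemon's API port.
func scrapeHeap(i int) error {
    resp, err := http.Get("http://localhost:5001/debug/pprof/heap")
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    f, err := os.Create(fmt.Sprintf("ipfs-%d.heap", i))
    if err != nil {
        return err
    }
    defer f.Close()

    _, err = io.Copy(f, resp.Body)
    return err
}

func main() {
    // Grab a profile every 30 seconds; inspect later with `go tool pprof`.
    for i := 0; ; i++ {
        if err := scrapeHeap(i); err != nil {
            fmt.Fprintln(os.Stderr, "scrape failed:", err)
        }
        time.Sleep(30 * time.Second)
    }
}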

@Stebalien
Member

That's... annoying. I wonder if we could optimize Prometheus. We could consider adding a metrics plugin interface for running stuff like this in a separate process but that's a non-trivial undertaking.

@rob-deutsch
Contributor Author

I'm hesitant to start optimising Prometheus lest we end up saying "To optimise ipfs let's optimise Prometheus. To optimise Prometheus let's optimise [a dependency]. Etc. Etc."

There might be a sneaky way to run Prometheus in a separate, tightly bound process, e.g. fork ipfs with an argument that kicks off a separate codepath, but I don't have capacity to investigate this right now.

@sanderpick
Contributor

@rob-deutsch - Apologies for the off-topic question, but I'm doing some digging myself and am curious how you're visualizing / grokking the output of debug.WriteHeapDump. The methods outlined here seem to be out of date and there's very little info on the web.

@rob-deutsch
Contributor Author

@sanderpick - If I recall correctly, I wrote a small Go program to parse the heap dump according to this doc. I did not end up using this in my analysis, because it just didn't have the depth of information I wanted/needed.

My analysis here was done using runtime.MemStats, pprof, and Prometheus. go-ipfs had very nice integration of pprof and Prometheus.

@sanderpick
Contributor

Okay cool, thanks for the info 👍
