runtime: GC freeing goroutines memory but then taking it again #8832
When will this issue be fixed? It happens in our production project too.
It's targeted for Go 1.5. Any updates will occur on this thread.
I'm trying to get a sense of the use case here. You have 100,000 goroutines; could you indicate …
I ask this since the answers will help dictate the best solution. An optimal solution for a goroutine …
I don't see the behavior the OP sees at tip. The RSS drops to 46MB and stays there.
go version devel +f8176f8 Mon Feb 9 18:20:28 2015 +0000 linux/amd64
It is still there. Slower, but I see it going up to 56 MB before I stopped it, in about half an hour.
So it is, if you wait long enough (for me, 16 minutes after the free) it starts growing. Very strange. The runtime doesn't allocate a single MSpan during that time, so I'm inclined to think this is a bug in linux/madvise. But I haven't been able to reproduce it outside the Go runtime. I'll keep looking.
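(For reference, the kind of test program under discussion looks roughly like the sketch below. The original reporter's code isn't shown in this thread, so the goroutine count, stack use, and timings are assumptions: spawn many goroutines, let them exit, force freed memory back to the OS, then watch RSS.)

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/debug"
	"sync"
	"time"
)

// rss reports this process's resident set size in bytes,
// read from /proc/self/statm (Linux only).
func rss() int64 {
	f, err := os.Open("/proc/self/statm")
	if err != nil {
		return -1
	}
	defer f.Close()
	var size, resident int64
	fmt.Fscan(f, &size, &resident)
	return resident * int64(os.Getpagesize())
}

func main() {
	// Burst of short-lived goroutines with grown stacks.
	var wg sync.WaitGroup
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var buf [8192]byte // force a larger stack
			_ = buf
			time.Sleep(time.Second)
		}()
	}
	wg.Wait()

	runtime.GC()
	debug.FreeOSMemory() // releases freed pages to the OS

	// RSS should drop and stay low; in this bug it creeps back up later.
	for {
		fmt.Printf("RSS = %d MB\n", rss()>>20)
		time.Sleep(time.Minute)
	}
}
```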
Could it be that somehow our GC tries to scan or otherwise read freed MSpans?
I suppose it is possible, but I don't see how that doesn't happen in the first 16 minutes and then starts happening thereafter. Lots of GCs happen in that window.
I replaced the madvise call with munmap. It fixed the problem and the program never crashed. That tells us that the runtime never touches those pages after it is done with them, and that the bug is probably in madvise. Or munmap is buggy as well...
Have you changed sysUsed? How?
No, I have not changed sysUsed. With munmap in place of madvise, if the program tried to use any sysUnused page again, it would fault. This can't be checked in, of course, but it's fine for debugging this problem.
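(The debugging change described above would look roughly like this; a sketch assuming the shape of runtime/mem_linux.go at the time, not submittable code:)

```go
// Debugging hack in runtime/mem_linux.go (sketch only, cannot be
// checked in). Unmapping instead of advising means any later access
// to a released page faults immediately, so a clean run shows the
// runtime never touches these pages again.
func sysUnused(v unsafe.Pointer, n uintptr) {
	// was: madvise(v, n, _MADV_DONTNEED)
	munmap(v, n)
}
```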
I do not see this bug on my home Ubuntu box:
$ uname -a
…
Might be something OS-specific. siritinga, what's your machine running?
On an Amazon EC2 instance (c4.xlarge, Ubuntu Server 14.04 LTS), the memory comes back really fast, about 4 minutes after it gets madvised:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
…
Linux ip-172-30-0-129 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Still don't know what is going on.
I don't have access right now to the computer where I did the test to check the kernel version, but it is Ubuntu 14.04 LTS, 64 bits, with the latest updates applied.
Update #8832. This is probably not the root cause of the issue. Resolve TODO about setting unusedsince on a wrong span.
Change-Id: I69c87e3d93cb025e3e6fa80a8cffba6ad6ad1395
Reviewed-on: https://go-review.googlesource.com/4390
Reviewed-by: Keith Randall <[email protected]>
I have successfully reproduced this outside of Go. I hacked the runtime to log every time it does an mmap, munmap, or madvise. A little C program replays the log and exhibits the same behavior.

Part of the problem may be the number of madvise calls. There are over 4000 of them in this example. The resulting heap has lots of 1-page in-use spans interspersed with small (typically 1-10 pages) not-in-use spans. I'm not sure why our in-use pages are so scattered. I'll look into that; it is not the root cause of this bug but may be contributing.

I'll see if I can send the extracted example to the linux folks.
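(The replay tool itself isn't attached to this thread; the original was a small C program. The sketch below is a rough Go stand-in for the access pattern described, with made-up sizes: one big mapping, every page touched, then thousands of small MADV_DONTNEED holes punched between single in-use pages.)

```go
package main

import (
	"syscall"
	"time"
)

const page = 4096

func main() {
	// One large anonymous mapping standing in for the heap arena.
	heap, err := syscall.Mmap(-1, 0, 64<<20,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}

	// Touch every page so the whole region is resident.
	for i := 0; i < len(heap); i += page {
		heap[i] = 1
	}

	// Punch thousands of small MADV_DONTNEED holes: one in-use page
	// followed by a 4-page released span, repeated across the region.
	for off := 0; off+5*page <= len(heap); off += 5 * page {
		if err := syscall.Madvise(heap[off+page:off+5*page], syscall.MADV_DONTNEED); err != nil {
			panic(err)
		}
	}

	// Park; watch RSS from outside with ps or /proc/<pid>/status.
	time.Sleep(time.Hour)
}
```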
Testing on my system (Ubuntu Linux 14.04.1), it fails to grow at all. Dumb question: is swap disabled on the systems where you see this behaviour?
Nice repro. Wacky, but nice. I guess we can leave this issue open until we hear back from the kernel folks to see whether there is something we can do to avoid the odd behaviour.
Both. Swap is disabled on the ec2 instance where this happens quickest. It also happens on my laptop where swap is enabled, although it takes a lot longer (2 min vs. 45 min).
Kernel bug report: https://bugzilla.kernel.org/show_bug.cgi?id=93111
Patch is out to work around the kernel bug. The other part of this issue is why our heap is so fragmented in the first place. It deserves its own issue, #9869.
CL https://golang.org/cl/15191 mentions this issue.
This fixes an issue where the runtime panics with "out of memory" or "cannot allocate memory" even though there's ample memory by reducing the number of memory mappings created by the memory allocator.

Commit 7e1b61c worked around issue #8832 where Linux's transparent huge page support could dramatically increase the RSS of a Go process by setting the MADV_NOHUGEPAGE flag on any regions of pages released to the OS with MADV_DONTNEED. This had the side effect of also increasing the number of VMAs (memory mappings) in a Go address space because a separate VMA is needed for every region of the virtual address space with different flags. Unfortunately, by default, Linux limits the number of VMAs in an address space to 65530, and a large heap can quickly reach this limit when the runtime starts scavenging memory.

This commit dramatically reduces the number of VMAs. It does this primarily by only adjusting the huge page flag at huge page granularity. With this change, on amd64, even a pessimal heap that alternates between MADV_NOHUGEPAGE and MADV_HUGEPAGE must reach 128GB to reach the VMA limit. Because of this rounding to huge page granularity, this change is also careful to leave large used and unused regions huge page-enabled.

This change reduces the maximum number of VMAs during the runtime benchmarks with GODEBUG=scavenge=1 from 692 to 49.

Fixes #12233.

Change-Id: Ic397776d042f20d53783a1cacf122e2e2db00584
Reviewed-on: https://go-review.googlesource.com/15191
Reviewed-by: Keith Randall <[email protected]>
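(The key arithmetic in this fix is rounding the region whose huge page flag changes inward to whole huge pages. Below is a userspace sketch of that rule, not the runtime's actual code; the 2MB huge page size, the helper name, and the sizes are assumptions, and the real change lives in the runtime's allocator via the CL above.)

```go
package main

import (
	"syscall"
	"unsafe"
)

// Assumed 2MB transparent huge pages (the amd64 default).
const hugePageSize = 2 << 20

// noHugePagesInner applies MADV_NOHUGEPAGE only to the whole huge pages
// contained in b, leaving partially covered huge pages at the edges
// untouched, so flags (and hence VMAs) change only at 2MB boundaries.
func noHugePagesInner(b []byte, base uintptr) error {
	beg := (base + hugePageSize - 1) &^ uintptr(hugePageSize-1) // round start up
	end := (base + uintptr(len(b))) &^ uintptr(hugePageSize-1)  // round end down
	if beg >= end {
		return nil // region contains no whole huge page; change nothing
	}
	return syscall.Madvise(b[beg-base:end-base], syscall.MADV_NOHUGEPAGE)
}

func main() {
	b, err := syscall.Mmap(-1, 0, 16<<20,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	if err := noHugePagesInner(b, uintptr(unsafe.Pointer(&b[0]))); err != nil {
		panic(err)
	}
}
```

With flags changing only on 2MB boundaries, alternating used and released spans can no longer create a new VMA per small span, which is how the change keeps the mapping count far below Linux's default limit of 65530.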
CL https://golang.org/cl/16980 mentions this issue.
…age granularity
This is the same change as CL 15191 above, with the same commit message.
Change-Id: Ic397776d042f20d53783a1cacf122e2e2db00584
Reviewed-on: https://go-review.googlesource.com/15191
Reviewed-by: Keith Randall <[email protected]>
Reviewed-on: https://go-review.googlesource.com/16980
Run-TryBot: Austin Clements <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Russ Cox <[email protected]>