Garbage collection is not working in this case on linux #25884

Closed
atbug opened this issue Feb 5, 2018 · 19 comments
Labels
GC Garbage collector

Comments

@atbug

atbug commented Feb 5, 2018

This issue is also posted on https://discourse.julialang.org/t/garbage-collection-is-not-working-in-this-case-on-linux/8798 but has not received any attention there. Since this may be a bug, I am reposting it here:

I have a minimal working example here:

# This file is saved as ts.jl
struct NoTBModel
    norbits::Int64
    positions::Dict{Vector{Int64},Vector{Matrix{Float64}}}
end


function NoTBModel(norbits::Int64)
    return NoTBModel(
        norbits,
        Dict{Vector{Int64},Vector{Matrix{Float64}}}()
        )
end

function populateR(nm::NoTBModel, R::Vector{Int64})
    nm.positions[R] = [zeros(nm.norbits, nm.norbits) for i in 1:3]
    nm.positions[-R] = [zeros(nm.norbits, nm.norbits) for i in 1:3]
end

function createmodel()
    cellindex = [
        0   0   0;
       -1   0  -2;
       -1   0  -1;
       -1   0   0;
       -1   0   1;
       -1   0   2;
       -1   1  -1;
       -1   1   0;
        0  -1  -2;
        0  -1  -1;
    ]

    # cellindex = rand(Int64, 10, 3) # uncommenting this line solves the problem.

    nm = NoTBModel(
        2000,  # It seems changing 2000 to 2500 also solves the problem.
    )
    for i in 1:10
        populateR(nm, cellindex[i, :])
    end
    return nothing
end

Then in REPL, I type

julia> include("ts.jl")
createmodel (generic function with 1 method)

julia> createmodel()

julia> gc()

However, the memory used by the NoTBModel created in createmodel is not released. Strangely, changing only one line of the code (see the comment in the code) solves the problem. I tracked memory usage with top on Linux. This issue does NOT happen on Windows.
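For scale, the following sketch estimates how much memory createmodel() allocates and still holds just before it returns, assuming the cellindex rows and norbits = 2000 from ts.jl above. populateR stores both R and -R, and the row [0, 0, 0] maps to a single key, so 19 distinct keys with three 2000×2000 Float64 matrices each are expected.

```julia
# Rough estimate of what createmodel() keeps live just before it returns.
# Assumes the cellindex rows and norbits = 2000 from ts.jl above.
cellindex = [
     0   0   0;
    -1   0  -2;
    -1   0  -1;
    -1   0   0;
    -1   0   1;
    -1   0   2;
    -1   1  -1;
    -1   1   0;
     0  -1  -2;
     0  -1  -1;
]
norbits = 2000

# populateR stores both R and -R; [0, 0, 0] and its negation are the same key.
distinct = Set{Vector{Int}}()
for i in 1:size(cellindex, 1)
    R = cellindex[i, :]
    push!(distinct, R)
    push!(distinct, -R)
end

per_matrix = norbits^2 * sizeof(Float64)        # bytes per zeros(norbits, norbits)
live_bytes = length(distinct) * 3 * per_matrix  # 3 matrices per key

println(length(distinct), " keys, ", live_bytes, " bytes")
```

That comes to 1,824,000,000 bytes (about 1.8 GB, roughly 1.7 GiB), in line with the 1.8 to 1.9 GB figures reported in this thread. A possibly relevant detail, offered only as a conjecture: each 2000×2000 matrix is 32,000,000 bytes, just under glibc's default maximum dynamic mmap threshold on 64-bit systems (32 MiB, i.e. 33,554,432 bytes), while a 2500×2500 matrix (50,000,000 bytes) is above it, which might affect whether glibc serves these blocks with mmap (returned to the kernel on free) or from its heap.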

Tested on

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40* (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)

and

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
@ararslan
Member

ararslan commented Feb 5, 2018

Try running gc() multiple times. For example, the BenchmarkTools package internally uses a function that runs gc() four times in a row to ensure everything is garbage collected.
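A minimal sketch of that suggestion, written for Julia 1.x where the call is GC.gc() (on 0.6, as in the original report, it is gc()). repeated_gc is a hypothetical name, not the BenchmarkTools function mentioned above.

```julia
# Hypothetical helper: run a full garbage collection several times in a row.
# Julia >= 0.7 spells it GC.gc(); on Julia 0.6 the equivalent call is gc().
function repeated_gc(n::Integer = 4)
    for _ in 1:n
        GC.gc()
    end
    return nothing
end

repeated_gc()
```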

@atbug
Author

atbug commented Feb 5, 2018

@ararslan I tried. It didn't work.

@yuyichao
Contributor

yuyichao commented Feb 5, 2018

What values do you see, and which value did you read (edit: from top)?

@atbug
Author

atbug commented Feb 5, 2018

Before uncommenting that line (cellindex = rand(Int64, 10, 3), as described in the OP):
[image]
After uncommenting that line (no issue here):
[image]

@atbug
Author

atbug commented Jun 6, 2018

Confirmed again on 0.7.0-alpha.

@atbug
Author

atbug commented Sep 22, 2018

Confirmed again on 1.0.0

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

@nalimilan
Member

FWIW, in my experience the same kind of thing happens with R, which releases memory much better on Windows than on Linux. See https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14611 for a more detailed explanation by R developers.

@atbug
Author

atbug commented Oct 10, 2018

@ararslan I understand your point. However, it would still be hard to explain, since uncommenting the line cellindex = rand(Int64, 10, 3) changes the behaviour significantly, as explained in #25884 (comment).

@jebej
Contributor

jebej commented Oct 11, 2018

As an extra data point, on Windows 7 with Julia 1.0.1, the memory does get released after a single GC.gc() call (memory usage goes from 2GB back down to 200MB).

@chethega
Contributor

As an extra data point, I cannot reproduce this on Linux either, i.e. the memory gets released after a single GC.gc() call. I tested both master and the Arch Linux binaries:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-5###U CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)

@Mistguy Did you use a source build or a binary distribution?

@nalimilan
Member

I see it here on Fedora 28 (kernel 4.18.7, glibc 2.27) both with a recent git master and 1.0.1 (official binaries). Memory usage goes down from 1.9GB to 1.7GB after calling GC.gc(), but it doesn't go below that.

What's your Linux kernel and glibc version?

Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

@atbug
Author

atbug commented Oct 12, 2018

@chethega I am using the official binary, on linux kernel 4.4.155.

atbug closed this as completed on Oct 12, 2018
atbug reopened this on Oct 12, 2018
@nalimilan
Member

@chethega Since you say you cannot reproduce the problem, can you give more details on your system? Could other people try on Linux too?

BTW, there is news on the R bug I mentioned (see also the linked thread). Apparently glibc's malloc does not release freed memory back to the kernel, but alternative malloc implementations like jemalloc are much better at this. Calling malloc_trim during GC can also help. See also how Python handles this.
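A minimal sketch of the malloc_trim idea from the Julia side, assuming Linux with glibc (malloc_trim is a glibc extension, so the symbol is not available on, e.g., musl-based systems). gc_and_trim is a hypothetical helper: it first runs a full collection so that unreachable objects are actually freed, then asks glibc to hand free heap memory back to the kernel. malloc_trim returns 1 if any memory was released and 0 otherwise.

```julia
# Hypothetical helper: collect garbage, then ask glibc to return free heap memory to the OS.
# Assumes Linux with glibc; malloc_trim is a glibc extension (not on musl, macOS, Windows).
function gc_and_trim()
    GC.gc()  # full collection first, so unreachable objects are freed
    if Sys.islinux()
        # C signature: int malloc_trim(size_t pad); pad = 0 keeps no extra slack at the heap top.
        # Returns 1 if some memory was released to the system, 0 otherwise.
        return ccall(:malloc_trim, Cint, (Csize_t,), 0) == 1
    end
    return false
end
```

Whether doing this automatically inside the GC is worth its cost is exactly the open question in this thread; the sketch only shows the manual call.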

@chethega
Contributor

I tried again.

Procedure: copy-paste your setup code into the REPL. Run createmodel() a couple of times. Run GC.gc() a couple of times. Repeat these operations (also running createmodel() multiple times in a row without a manual collection in between). Check the output of top to see memory consumption: RES is either about 1.8g or very small. After createmodel() and before GC.gc(), memory consumption is always high (unsurprisingly). Memory consumption never increases with subsequent calls of createmodel(), so there is no memory leak; the only question is whether the memory is returned to the kernel or kept by Julia.

Results: Returning memory to the kernel is unreliable. Sometimes Julia returns the memory to the kernel, but sometimes it gets stuck and keeps it. The retained memory is reused for subsequent allocations (including allocations not made by createmodel()). Is this an intended heuristic? (Always returning memory to the kernel and requesting fresh memory is expensive!)

I noticed that subsequent larger allocations (e.g. a single 2g buffer) don't reuse the memory. If we keep the page mappings then this is unavoidable (because the new buffer must be contiguous), but we could remap the old pages. I think that is the job of the libc, though?

System: Archlinux with Julia Version 1.0.1 (official archlinux binary) and Julia Version 1.1.0-DEV.678 and:

$ uname -rvm
4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 22:06:45 UTC 2018 x86_64
$ ldd --version
ldd (GNU libc) 2.28
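The RES value that top reports can also be read from inside the REPL, which makes the before/after comparison in the procedure above easier to script. A minimal sketch, assuming Linux: rss_kib is a hypothetical helper that parses the VmRSS line of /proc/self/status (reported by the kernel in kB) and returns nothing on other platforms.

```julia
# Hypothetical helper: current resident set size of this Julia process in KiB.
# Linux-only assumption: reads the "VmRSS:" line of /proc/self/status.
function rss_kib()
    Sys.islinux() || return nothing
    for line in eachline("/proc/self/status")
        if startswith(line, "VmRSS:")
            # The line looks like "VmRSS:    1843200 kB"; take the number.
            return parse(Int, split(line)[2])
        end
    end
    return nothing
end

# Example usage with the ts.jl code from the original report:
#   include("ts.jl")
#   before = rss_kib(); createmodel(); GC.gc(); after = rss_kib()
```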

@mauro3
Contributor

mauro3 commented Mar 17, 2019

On Arch (4.20.8-arch1-1-ARCH #1 SMP PREEMPT Wed Feb 13 00:06:58 UTC 2019 x86_64), I can reproduce it on Julia 0.6 but not on Julia 1.1. (Aside: here is a Ruby blog post (posted on Slack by Stefan) that looks at using malloc_trim, which was mentioned above.)

@nalimilan
Member

Interesting! Has anything changed since 0.6 that could explain the difference? Have you tried with official binaries, custom builds, or Arch packages?

I wonder whether the Julia GC calls malloc_trim in some cases.

@mauro3
Contributor

mauro3 commented Mar 17, 2019

These are both source builds. I built Julia-0.6 on 19 Feb (so after the last system-update) and Julia 1.1 on 22 Jan (so before the last system-update).

@StefanKarpinski
Member

StefanKarpinski commented Mar 17, 2019

Related issue in Ruby:

https://www.joyfulbikeshedding.com/blog/2019-03-14-what-causes-ruby-memory-bloat.html

Seems probable that we should be calling malloc_trim sometimes.

@ViralBShah
Member

Closing as a duplicate of #30653.
