DArray : memory not fully recovered upon gc() #8912

Closed
amitmurthy opened this issue Nov 6, 2014 · 15 comments
Labels
parallelism Parallel or distributed computation

Comments

@amitmurthy
Contributor

DArrays do not seem to be fully garbage collected.

context: https://groups.google.com/d/msg/julia-users/O8-Axv7wVZE/Xe-_8LIGKhAJ

@amitmurthy amitmurthy added the parallelism Parallel or distributed computation label Nov 6, 2014
@amitmurthy
Contributor Author

Starting with one Julia worker and executing the following multiple times:

a=remotecall(2, ()->ones(10^8));
a=1;gc(); Base.flush_gc_msgs();remotecall(2,gc);

I notice a leak the first time around, but not on subsequent invocations of the remotecall.

However, if I change the remotecall to remotecall_fetch and execute

a=remotecall_fetch(2, ()->ones(10^8));
a=1;gc(); Base.flush_gc_msgs();remotecall(2,gc);

then additional memory is retained in the worker on every invocation. There is also an increase in the memory size of the master, but only the first time.
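
To quantify this between runs, something along these lines can be used. This is only a sketch: worker_rss is a hypothetical helper, and it relies on the same old remotecall_fetch(pid, f) calling convention used above.

# Hypothetical diagnostic helper: ask worker 2 to report its own resident
# set size (in KB) via ps, using the era's remotecall_fetch(pid, f) form.
worker_rss() = remotecall_fetch(2, () -> chomp(readall(`ps -o rss= -p $(getpid())`)))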

I suspect there are two issues here:

I am pretty sure one is #6597, since both remotecall and remotecall_fetch result in a new task on the worker.

The other is probably #4508 / #6876

@nowozin

nowozin commented Nov 11, 2014

I noticed another thing which may be related or a separate issue:

On Windows 8.1, 64-bit, Julia 0.4.0-dev+1318 (7a7110b), when using distribute() to create a DArray over 12 cores from an array of size 100, each Julia subprocess appears to use the full amount of memory.

That is, the process that calls distribute(A) uses 2.6GB of memory, of which around 1.8GB is the array A itself, and afterwards all 11 worker processes (from addprocs(11)) each use the same amount of memory, 2.6GB. This was not the case three weeks ago on 0.4.0-dev.

Another symptom is that the distribute() call is much slower than before, and during the several minutes it takes, only a few Julia processes are active. Initially the main Julia process uses 8% CPU (1/12) and one other Julia process also uses 8% CPU, while the remaining 10 processes show 0% CPU activity. After a minute three processes are each using 8%, after another minute four, and so on, until all processes are active and 100% of the CPU is utilized. Then the distribute() call returns.

This is very different from what I observed when originally developing the code a few weeks ago and I have not changed it since.

@denizyuret
Contributor

Here are a couple more problem reports for reference:
https://groups.google.com/d/topic/julia-users/q39vyGQF4Fs/discussion
https://groups.google.com/d/topic/julia-users/zsT2qfwDuHA/discussion

As a workaround, removing all the workers (rmprocs(workers())) and restarting them (addprocs(ncpu)) every iteration seems to work.
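
A minimal sketch of that workaround, assuming ncpu, niter, and a per-task function work(x) are defined elsewhere:

for i = 1:niter
    addprocs(ncpu)                   # fresh workers for this iteration
    results = pmap(work, workers())  # run the distributed work
    # ... consume results ...
    rmprocs(workers())               # tear the workers down so their memory goes back to the OS
end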

I suspect the problem may go deeper than distributed arrays: In my case the distributed array is not that big, but the result I fetch from the workers (via pmap) is. The memory usage is consistent with those fetched values not being properly garbage collected. I will post a simple example to replicate if I can come up with one.

@denizyuret
Contributor

OK, here is the example:

mypid = ccall((:getpid, "libc"), Int32, ())   # pid of the master process
addprocs(10)
for i=1:10
    # each of the 10 workers allocates and returns a ~1GB Float64 array (2^27 elements)
    p=pmap(workers()) do x
        rand(1<<27)
    end
    p=nothing
    @everywhere gc()
    # print the master's memory usage after collection
    run(pipe(`ps auxww`,`grep $mypid`))
end

and here is the output (roughly 10GB of memory is added to the master every iteration):

dyuret   31594  4.3  4.0 19759168 10685080 pts/2 Rl+ 21:19   0:12 julia
dyuret   31594  6.1  8.0 30292596 21218468 pts/2 Sl+ 21:19   0:17 julia
dyuret   31594  7.8 11.9 40782492 31708344 pts/2 Sl+ 21:19   0:23 julia
dyuret   31594  9.4 15.9 51276484 42202348 pts/2 Sl+ 21:19   0:28 julia
dyuret   31594 10.9 19.9 61762284 52688156 pts/2 Sl+ 21:19   0:33 julia
dyuret   31594 12.3 23.8 72248084 63173968 pts/2 Sl+ 21:19   0:39 julia
dyuret   31594 13.7 27.8 82733884 73659792 pts/2 Sl+ 21:19   0:44 julia
dyuret   31594 15.1 31.7 93219684 84145604 pts/2 Sl+ 21:19   0:49 julia
dyuret   31594 16.4 35.7 103705484 94631416 pts/2 Sl+ 21:19   0:54 julia
dyuret   31594 17.7 35.7 103705484 94631424 pts/2 Sl+ 21:19   0:59 julia

@samuela
Contributor

samuela commented Apr 20, 2015

What's the status on this? Until we have a fix, this issue seems significant enough to warrant labeling DArrays as an experimental feature. At the very least I think something should be mentioned about this in the docs.

@jiahao
Member

jiahao commented Apr 20, 2015

@samuela DArrays no longer exist in Base.

@samuela
Contributor

samuela commented Apr 20, 2015

Oh, cool beans! Should this issue be closed then? What should be used in place of DArrays now?

@pao
Member

pao commented Apr 20, 2015

https://github.com/JuliaParallel/DistributedArrays.jl took over the DArray code.

I'm not sure where the actual issue lies, but since it's been cross-referenced by folks who should have some idea, I'll leave this open.

@amitmurthy
Contributor Author

After 3bbc5fc

mypid = getpid()
addprocs(10)
for i=1:200
    p=pmap(workers()) do x
        ones(10^7)
    end
    p=nothing
    @everywhere gc()
    run(pipe(`ps auxww`,`grep $mypid`))
end

I find that the master slowly grows to around 10GB of resident memory after 20 iterations, which then holds steady for the next 180 iterations. This 10GB is not released even after all the iterations complete.

@carnaval , @vtjnash any explanation for this?

@vtjnash
Member

vtjnash commented Apr 23, 2015

It seems possible that gc() is just not getting called enough locally to propagate the gc messages, but I'm not sure that fully explains it.
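
One way to test that hypothesis is to force the flush explicitly, reusing the pattern from the first comment in this thread (a sketch only, not a confirmed fix):

gc()                    # finalize local remote references so release messages get queued
Base.flush_gc_msgs()    # push any pending free-remote-ref messages to the workers
@everywhere gc()        # then collect on every process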

@jakebolewski
Member

Why would that impact memory usage on the master node?

@amitmurthy
Contributor Author

With

julia> using DistributedArrays

julia> for i in 1:100
         d=dones(2*10^8)
         a=convert(Array,d)
         @everywhere gc()
       end

top shows

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                             
 4459 amitm     20   0 19.787g 5.408g   2160 R  91.1 34.7   1:07.21 julia                                                                                               
 4467 amitm     20   0 14.397g 4.622g  10132 S  19.0 29.7   0:14.08 julia                                                                                               
 4468 amitm     20   0 13.001g 2.461g  10124 S  18.3 15.8   0:13.14 julia

with the master process varying between 30% and 40% of system memory and the workers between 15% and 30%.

It is no longer a leak: the loop runs to completion, but at the end the memory is not released. Does libuv or the malloc implementation cache memory buffers in anticipation of future use?

I'll test this on OS X later in the day to see whether the behavior is limited to Linux.
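
One way to probe the malloc-caching question on Linux (a sketch only; malloc_trim is a glibc extension, and this is purely diagnostic, not a fix):

# Ask glibc's malloc on every process to return free arena pages to the OS,
# then re-check resident sizes; if RES drops, the allocator was caching the memory.
@everywhere ccall(:malloc_trim, Cint, (Csize_t,), 0)
run(pipe(`ps auxww`, `grep julia`))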

@vtjnash
Member

vtjnash commented Apr 23, 2015

libuv tries really hard not to allocate anything, but malloc and the Julia GC will hold onto some amount of memory. 10GB sounds a bit high, although I guess if there were something live on every page, it would have trouble releasing the memory fully.

@amitmurthy
Contributor Author

@vtjnash, can you take a look at #10960 sometime? While debugging parallel code, it is difficult to make progress when there is no guarantee that finalizers on RemoteRefs are being called in all circumstances.

@amitmurthy
Contributor Author

Closed by 6b94780.
