OOM despite --heap-size-hint #50658

Closed
ufechner7 opened this issue Jul 24, 2023 · 15 comments
Labels
GC Garbage collector

Comments

@ufechner7

ufechner7 commented Jul 24, 2023

julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 4 on 8 virtual cores
Environment:
  JULIA_CONDAPKG_OFFLINE = yes

I often see my code being killed by the out-of-memory (OOM) killer. This happens when using pmap, but also when running single-threaded, single-process code from the REPL that repeatedly allocates a lot of memory. I tried adding --heap-size-hint, but it did not help.

My workaround: I added the following code to all functions that allocate a lot:

# Force a full collection when less than 6 GiB of system memory is free
if Sys.free_memory()/2^30 < 6.0
    GC.gc()
end

This should not be needed: the garbage collector should, on its own, do a full collection before the system runs out of memory.
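The workaround above can be wrapped in a small helper so the threshold lives in one place. This is a minimal sketch, not part of any package: the name `maybe_full_gc` and its 6 GiB default are hypothetical, taken from the snippet above. Note that `Sys.free_memory()` reports system-wide free memory, so other processes (for example pmap workers) also influence when it fires.

```julia
# Hypothetical helper generalizing the workaround: if free system memory
# drops below `threshold_gib` GiB, force a full garbage collection.
# Returns true if a collection was triggered, false otherwise.
function maybe_full_gc(threshold_gib::Real = 6.0)
    free_gib = Sys.free_memory() / 2^30   # Sys.free_memory() returns bytes
    if free_gib < threshold_gib
        GC.gc(true)                       # `true` requests a full collection
        return true
    end
    return false
end
```

Calling `maybe_full_gc()` at the start of each allocation-heavy function reproduces the original workaround.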

@jishnub
Contributor

jishnub commented Jul 24, 2023

Could you post a small example that leads to the error? This would help a lot in narrowing the issue down

@elextr

elextr commented Jul 25, 2023

Possibly a duplicate of #42566; see from here on.

@ufechner7
Author

> Could you post a small example that leads to the error? This would help a lot in narrowing the issue down

It happens reproducibly with my production code, but I am not allowed to share it. So far it has not happened with the smaller code examples I tried; I will keep trying to create an MWE.

@ufechner7
Author

> Possibly a duplicate of #42566; see from here on.

But in #42566 they say that "GC.gc(true); GC.gc() Does not fix it."

But for me, GC.gc() does free the unreleased memory, so it might be a different issue.

@elextr

elextr commented Jul 25, 2023

> But for me, GC.gc() does free the unreleased memory, so it might be a different issue.

Indeed, if manually running the GC stops the OOM killer from bumping your process off, then the problem is likely not a failure to return freed memory to the system, but how the GC knows that an OOM is approaching, so that it can work harder to collect unreferenced memory. IIRC there are several Julia issues about that, but of course my search for them is failing just now.

@vchuravy
Member

What --heap-size-hint did you set?

@vchuravy vchuravy changed the title from "Garbage collector not working" to "OOM despite --heap-size-hint" Jul 25, 2023
@vchuravy vchuravy added the GC Garbage collector label Jul 25, 2023
@ufechner7
Author

ufechner7 commented Jul 25, 2023

> What --heap-size-hint did you set?

julia -J bin/kps-image-1.9.so --project -i -q -p 16 --heap-size-hint=1G

And I have 32 GB of memory.
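Because --heap-size-hint is a per-process setting, one rough way to choose a value for a setup like this (one master plus 16 workers from `-p 16` on 32 GB) is to split part of total RAM evenly across all processes. This is a sketch under loose assumptions: the helper name `heap_hint_flag` and the default `fraction = 0.5` (leaving headroom for sysimages, shared libraries and the OS) are hypothetical choices, not recommendations from the Julia developers. It relies only on the documented `M` (megabyte) suffix of the flag.

```julia
# Hypothetical helper: divide `fraction` of total RAM evenly across `nprocs`
# Julia processes and format the result as a --heap-size-hint flag in MiB.
function heap_hint_flag(total_bytes::Integer, nprocs::Integer; fraction::Real = 0.5)
    nprocs > 0 || throw(ArgumentError("nprocs must be positive"))
    per_proc_mib = floor(Int, fraction * total_bytes / nprocs / 2^20)
    return "--heap-size-hint=$(per_proc_mib)M"
end

# The setup described above: 32 GiB of RAM, 1 master + 16 workers.
flag = heap_hint_flag(32 * 2^30, 17)
println(flag)   # prints --heap-size-hint=963M

# On the current machine one could instead use Sys.total_memory():
# heap_hint_flag(Sys.total_memory(), 17)
```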

@oscardssmith
Member

It would be good to see if this is happening on recent Julia nightlies. @gbaraldi's recent GC logic changes should have fixed this.

@elextr
Copy link

elextr commented Jul 26, 2023

Just a note: IIUC, the OOM killer is triggered by the total memory usage of your cgroup, not just the parent process, so it would likely include the memory used by any worker processes as well as by the parent.

Does --heap-size-hint propagate to the workers?
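Since the OOM killer looks at all processes together, it can help to check the aggregate footprint of the master and its workers. A minimal sketch using `Distributed` and `Sys.maxrss()`; the function name `total_peak_rss_gib` is hypothetical. It is only a rough proxy: `Sys.maxrss` reports *peak* rather than current resident memory, and pages shared between processes are counted once per process.

```julia
using Distributed

# Sum the peak resident set size (Sys.maxrss, in bytes) of the master and
# all workers, converted to GiB. With no workers, procs() == [1].
function total_peak_rss_gib()
    total = sum(remotecall_fetch(Sys.maxrss, p) for p in procs())
    return total / 2^30
end

println("aggregate peak RSS ≈ ", round(total_peak_rss_gib(); digits = 2), " GiB")
```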

@vchuravy
Copy link
Member

How big is bin/kps-image-1.9.so? Or, right after starting Julia, how much memory does ps aux say you are using?

--heap-size-hint is currently not strict: it only measures the live heap, not the sysimage, shared libraries, etc.
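To see the gap between what the GC counts and what the operating system sees, one can compare the live GC heap with the process's peak resident set size. A sketch, assuming `Base.gc_live_bytes()` is available; it is an internal, unexported function and may change between Julia versions. A full collection is run first so the live-heap figure is not stale.

```julia
# Compare the live GC heap (what --heap-size-hint is compared against)
# with the peak RSS of the whole process (which also includes the
# sysimage, shared libraries such as LLVM, and other non-GC memory).
GC.gc()                                      # refresh GC statistics
live_mib = Base.gc_live_bytes() / 2^20       # internal, unexported API
peak_rss_mib = Sys.maxrss() / 2^20           # bytes -> MiB
println("live GC heap ≈ ", round(live_mib; digits = 1), " MiB")
println("peak RSS     ≈ ", round(peak_rss_mib; digits = 1), " MiB")
```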

@MilesCranmer
Copy link
Member

> @elextr Does --heap-size-hint propagate to the workers?

I also noticed this in a different context. It seems the interaction between worker processes and --heap-size-hint is not yet defined (?). I posted an issue here: #50673.
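Until propagation is defined, one option is to pass the flag to workers explicitly through the `exeflags` keyword of `Distributed.addprocs`. A minimal sketch, assuming (as the worker command line from ps aux in this thread suggests, at least for Julia 1.9) that workers do not inherit --heap-size-hint from the master; the 1G value simply mirrors the command line above.

```julia
using Distributed

# Assumption: workers do not inherit --heap-size-hint from the master,
# so hand it to each worker explicitly via exeflags.
new_workers = addprocs(2; exeflags = `--heap-size-hint=1G`)

# ... run pmap / remotecall work here ...

rmprocs(new_workers)   # shut the workers down again
```

Launching workers this way instead of with `-p 16` keeps the per-worker flags visible in the code that starts them.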

@MilesCranmer
Copy link
Member

@oscardssmith can you link the PRs you mentioned?

@oscardssmith
Copy link
Member

#50144

@ufechner7
Copy link
Author

ufechner7 commented Jul 26, 2023

> How big is bin/kps-image-1.9.so? Or, right after starting Julia, how much memory does ps aux say you are using?
>
> --heap-size-hint is currently not strict: it only measures the live heap, not the sysimage, shared libraries, etc.

ufechner@ufryzen:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            30Gi        11Gi        12Gi        74Mi       6,7Gi        18Gi
Swap:          1,9Gi          0B       1,9Gi

and in ps aux, 16 lines like this one (one per worker):

ufechner   10971  3.5  3.5 2397148 1143096 ?     Ssl  08:18   0:07 /home/ufechner/packages/julias/julia-1.9/bin/julia -Cnative -J/home/ufechner/repos/WindTurbines/bin/kps-image-1.9.so -g1 --bind-to 127.0.0.1 --worker

and

ufechner@ufryzen:~/repos/WindTurbines/bin$ ls -lah kps-image-1.9.so 
-rwxrwxr-x 1 ufechner ufechner 808M jul 24 11:48 kps-image-1.9.so
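The numbers above can be put together with a little arithmetic. Each worker's RSS of 1143096 KiB is about 1.09 GiB, so the 16 workers alone sum to roughly 17.4 GiB, even though free -h reports only 11Gi used. The most likely explanation (an interpretation, not something measured in this thread) is that RSS counts shared pages, such as the 808M file-backed sysimage mapped by every worker, once per process, so summing RSS overstates the real footprint.

```julia
# Numbers copied from the `ps aux` output above.
rss_kib_per_worker = 1_143_096   # RSS column of ps aux, in KiB
n_workers = 16

per_worker_gib = rss_kib_per_worker / 2^20   # KiB -> GiB
summed_rss_gib = n_workers * per_worker_gib  # naive sum over workers

println("per worker ≈ ", round(per_worker_gib; digits = 2), " GiB")  # 1.09
println("16 workers ≈ ", round(summed_rss_gib; digits = 1), " GiB")  # 17.4
```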

@vtjnash
Copy link
Member

vtjnash commented Feb 11, 2024

We have further updated the heuristics, so the GC should try harder to avoid exceeding this memory limit. However, we don't control how much memory is required by external libraries (e.g. LLVM), so we expect precompilation to take substantial amounts of memory and to be possible only on large build machines; that is a requirement for building, but not for running.

@vtjnash vtjnash closed this as completed Feb 11, 2024

7 participants