-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fallback if we have make a weird GC decision. #50682
Conversation
I will do a follow up PR to refactor the GC num stuff because there is currently an accounting bug somewhere and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The currently unreliable live_bytes
metric seems to be still in use for deciding when to do a full sweep (among other criteria):
if (live_bytes > max_total_memory) {
sweep_full = 1;
}
Edit: Without a heap-size-hint, cgroup restriction or the change I suggested below this would currently never trigger since the max memory is 2PB by default. But with my change this might cause the GC to always do full sweeps when when live_bytes
is unreasonably big due to the miscount.
@@ -3612,10 +3619,10 @@ void jl_gc_init(void) | |||
total_mem = uv_get_total_memory(); | |||
uint64_t constrained_mem = uv_get_constrained_memory(); | |||
if (constrained_mem > 0 && constrained_mem < total_mem) | |||
total_mem = constrained_mem; | |||
jl_gc_set_max_memory(constrained_mem - 250*1024*1024); // LLVM + other libraries need some amount of memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to set the max memory based on total_mem
as well, when it is non-zero and constrained_mem
is not set? Maybe like this:
jl_gc_set_max_memory(constrained_mem - 250*1024*1024); // LLVM + other libraries need some amount of memory | |
total_mem = constrained_mem; | |
if (total_mem > 0) | |
jl_gc_set_max_memory(total_mem - 250*1024*1024); // LLVM + other libraries need some amount of memory |
Currently the maximum without a heap hint is very large:
julia> Base.format_bytes(@ccall jl_gc_get_max_memory()::UInt64)
"2.000 PiB"
(since uv_get_constrained_memory()
returns something like "8192.000 PiB" without any cgroup restrictions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats the idea, if you don't set an explicit hint we will just follow the heap resizing algorithm.
So increase the amount for 32 bit and tests
is this ready to merge? |
I believe finally yes! |
#endif | ||
if (jl_options.heap_size_hint) | ||
jl_gc_set_max_memory(jl_options.heap_size_hint); | ||
jl_gc_set_max_memory(jl_options.heap_size_hint - 250*1024*1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to protect this from underflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jl_gc_set_max_memory
protects from underflow, but it doesn't warn the user. Maybe it should?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to warn then either
If something odd happens during GC (the PC goes to sleep) or a very big transient the heuristics might make a bad decision. What this PR implements is if we try to make our target more than double the one we had before we fallback to a more conservative method. This fixes the new issue @vtjnash found in #40644 for me. (cherry picked from commit ab94fad)
Backported PRs: - [x] #50637 <!-- Remove SparseArrays legacy code --> - [x] #50665 <!-- print `@time` msg into print buffer --> - [x] #50523 <!-- Avoid generic call in most cases for getproperty --> - [x] #50635 <!-- `versioninfo()`: include build info and unofficial warning --> - [x] #50670 <!-- Make reinterpret specialize fully. --> - [x] #50666 <!-- include `--pkgimage=no` caches for stdlibs --> - [x] #50765 - [x] #50764 - [x] #50768 - [x] #50767 - [x] #50618 <!-- inference: continue const-prop' when concrete-eval returns non-inlineable --> - [x] #50689 <!-- Attach `tanpi` docstring to method --> - [x] #50671 <!-- Fix rdiv of complex lhs by real factorizations --> - [x] #50598 <!-- only limit types in stack traces in the REPL --> - [x] #50766 <!-- Don't partition alwaysinline functions --> - [x] #50771 <!-- re-allow non-string values in ENV `get!` --> - [x] #50682 <!-- Add fallback if we have make a weird GC decision. --> - [x] #50781 <!-- fix `bit_map!` with aliasing --> - [x] #50172 <!-- print feature flags used for matching pkgimage --> - [x] #50844 <!-- Bump OpenBLAS binaries to use the new GEMM multithreading threshold --> - [x] #50826 <!-- Update dependency builds --> - [x] #50845 <!-- fix #50438, use default pool for at-threads --> - [x] #50568 <!-- `Array(::AbstractRange)` should return an `Array` --> - [x] #50655 <!-- fix hashing regression. --> - [x] #50779 <!-- Minor refactor to image generation --> - [x] #50791 <!-- Make symbols internal in jl_create_native, and only externalize them when partitioning --> - [x] #50724 <!-- Merge opaque closure modules with the rest of the workqueue --> - [x] #50738 <!-- Add alignment to constant globals --> - [x] #50871 <!-- macOS: Don't inspect dead threadtls during exception handling. --> Need manual backport: Contains multiple commits, manual intervention needed: Non-merged PRs with backport label: - [ ] #50850 <!-- Remove weird Rational dispatch and add pi functions to list --> - [ ] #50823 <!-- Make ranges more robust with unsigned indexes. --> - [ ] #50809 <!-- Limit type-printing in MethodError --> - [ ] #50663 <!-- Fix Expr(:loopinfo) codegen --> - [ ] #50594 <!-- Disallow non-index Integer types in isassigned --> - [ ] #50385 <!-- Precompile pidlocks: add to NEWS and docs --> - [ ] #49805 <!-- Limit TimeType subtraction to AbstractDateTime -->
If something odd happens during GC (the PC goes to sleep) or a very big transient the heuristics might make a bad decision. What this PR implements is if we try to make our target more than double the one we had before we fallback to a more conservative method. This fixes the new issue @vtjnash found in JuliaLang#40644 for me.
If something odd happens during GC (the PC goes to sleep) or a very big transient the heuristics might make a bad decision. What this PR implements is if we try to make our target more than double the one we had before we fallback to a more conservative method. This fixes the new issue @vtjnash found in #40644 for me.
If something odd happens during GC (the PC goes to sleep) or a very big transient the heuristics might make a bad decision. What this PR implements is if we try to make our target more than double the one we had before we fallback to a more conservative method. This fixes the new issue @vtjnash found in #40644 for me.