Have jl_realloc_aligned /try/ realloc, then fallback to malloc if not aligned #32320
… aligned

Since there isn't a system realloc_aligned on non-windows systems, we will simply call realloc, and hope that it will either manage to grow the existing allocation, or if it had to move it, hope that the new allocation is correctly aligned. If not, we will manually redo the allocation via (malloc_aligned, copy, free).

Note that this makes growing large arrays drastically faster:

```
$ julia-master -e 'b = collect(1:2^30); @time push!(b, 1);'
  5.433267 seconds (121 allocations: 327.686 MiB, 0.02% gc time)
$ ./julia -e 'b = collect(1:2^30); @time push!(b, 1);'
  0.003441 seconds (128 allocations: 327.689 MiB, 31.64% gc time)
```

Down to 0.003441 seconds from the previous 5.433267 seconds to double an array of 2^30 integers. :)
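For readers skimming the thread, the strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `sketch_*` names are hypothetical, `posix_memalign` stands in for whatever aligned allocator the runtime uses, and it assumes (as the PR does on non-Windows systems) that aligned blocks may be passed to `realloc` and `free`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's aligned malloc.
 * `align` must be a power of two and a multiple of sizeof(void *). */
static void *sketch_malloc_aligned(size_t size, size_t align)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}

/* Try plain realloc first; fall back to (malloc_aligned, copy, free)
 * only when realloc moved the block to a misaligned address. */
static void *sketch_realloc_aligned(void *ptr, size_t old_size,
                                    size_t new_size, size_t align)
{
    assert(new_size > 0);
    assert(align != 0 && (align & (align - 1)) == 0);

    void *q = realloc(ptr, new_size);
    if (q == NULL)
        return NULL;                 /* old block is still valid here */
    if (((uintptr_t)q & (align - 1)) == 0)
        return q;                    /* fast path: grown in place or moved to an aligned address */

    /* Slow path: realloc moved the data somewhere misaligned. */
    void *r = sketch_malloc_aligned(new_size, align);
    if (r == NULL) {
        /* Sketch-only policy: the old pointer is already gone, so we give up.
         * A real implementation would need a better failure story. */
        free(q);
        return NULL;
    }
    memcpy(r, q, old_size < new_size ? old_size : new_size);
    free(q);
    return r;
}
```

Note that the slow path copies the data twice (once inside `realloc`, once in `memcpy`), which is the cost the review comments below are concerned about for small allocations.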
I want to add some benchmarks and do proper performance testing before we merge this, but the initial results seem promising! It's easy, clean, and seems like a reasonable improvement. :)
Could we try only using this for large arrays? I.e. in form of a Reason is that
Mmm, bummer. Yeah, that seems like a reasonable approach to me... But then the trouble is picking what that cutoff should be.
One possibility is to go with glibc cutoffs and see whether other users complain. That would be e.g. here, i.e. 32 MB. Afaik Windows is perfectly capable of aligned realloc, so there is no issue there. Any Apple or BSD users to chime in?

A somewhat ugly approach could be to have some global variable that tracks the largest observed realloc alignment failure, and use malloc+memcpy for everything at or below this size. Checking against this is cheap (as long as we don't share a cache line with anything that is frequently written in multithreaded contexts). The actual limits are mostly unpredictable (glibc uses a dynamic threshold), but that way we would adapt to whatever our allocator/OS does.
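To make the adaptive idea above concrete, here is one way it might look. This is only a sketch of the suggestion, not something in the PR: all `sketch_*` names are hypothetical, relaxed atomics are an assumption about what is sufficient for a heuristic, and the cache-line padding mentioned above is left out for brevity.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Largest request size for which realloc has been seen to return a
 * misaligned block. A real version would keep this on its own cache line. */
static _Atomic size_t sketch_largest_misaligned = 0;

/* Only try realloc for sizes above every failure observed so far. */
static int sketch_should_try_realloc(size_t new_size)
{
    return new_size > atomic_load_explicit(&sketch_largest_misaligned,
                                           memory_order_relaxed);
}

/* Raise the recorded threshold to new_size if it is larger (atomic max). */
static void sketch_note_misaligned(size_t new_size)
{
    size_t cur = atomic_load_explicit(&sketch_largest_misaligned,
                                      memory_order_relaxed);
    while (new_size > cur &&
           !atomic_compare_exchange_weak_explicit(&sketch_largest_misaligned,
                                                  &cur, new_size,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* cur was refreshed by the failed CAS; loop re-checks the condition */
    }
}

static void *sketch_adaptive_realloc(void *ptr, size_t old_size,
                                     size_t new_size, size_t align)
{
    assert(new_size > 0);
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t ncopy = old_size < new_size ? old_size : new_size;

    if (sketch_should_try_realloc(new_size)) {
        void *q = realloc(ptr, new_size);
        if (q == NULL)
            return NULL;
        if (((uintptr_t)q & (align - 1)) == 0)
            return q;
        sketch_note_misaligned(new_size);  /* remember this failure */
        ptr = q;                           /* data now lives in the moved block */
    }

    /* Manual path: aligned malloc, copy, free. */
    void *r = NULL;
    if (posix_memalign(&r, align, new_size) != 0) {
        free(ptr);   /* sketch-only failure policy; the data is lost */
        return NULL;
    }
    memcpy(r, ptr, ncopy);
    free(ptr);
    return r;
}
```

The threshold only ever grows in this sketch, so a single misaligned result for a very large request would disable the fast path for everything smaller; whether that is acceptable is exactly the kind of thing benchmarks would need to show.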
Any updates on this? Inefficient
@NHDaly, for me actually worse timing before this change (before killed) on (and "100% of which was recompilation" is simply wrong?!):
I have "10060,6 free" in top.
Since there isn't a system realloc_aligned on non-windows systems, we
will simply call realloc, and hope that it will either manage to grow
the existing allocation, or if it had to move it, hope that the new
allocation is correctly aligned. If not, we will manually redo the
allocation via (malloc_aligned, copy, free).
Note that this makes the growth event on large arrays drastically faster:
Down to 0.003441 seconds from the previous 5.433267 seconds to double an
array of 2^30 integers. :)
(Of course, if you're growing an array from empty, one element at a time, this cost
would be amortized over all those insertions, so for such a task, this should end
up being around a 2x improvement.)
This is the solution to problem (2.) of #28588.
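
The amortization remark in the commit message can be checked with a small counting model. The sketch below is illustrative only: it assumes capacity starts at 1 and exactly doubles on each growth (the real growth policy may differ), and it assumes the worst case where every growth copies all existing elements. The `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t writes;  /* element stores performed by the pushes themselves */
    size_t copies;  /* elements copied because a growth moved the buffer */
} sketch_cost;

/* Count the work for pushing n elements one at a time into an array
 * whose capacity doubles whenever it is full. */
static sketch_cost sketch_push_cost(size_t n)
{
    sketch_cost c = {0, 0};
    size_t len = 0, cap = 1;
    for (size_t i = 0; i < n; i++) {
        if (len == cap) {
            assert(cap <= SIZE_MAX / 2);  /* guard against overflow when doubling */
            c.copies += len;              /* worst case: the whole array is moved */
            cap *= 2;
        }
        len++;
        c.writes++;
    }
    return c;
}
```

Under these assumptions, `sketch_push_cost(1024)` gives 1024 writes and 1023 copies, so the copying is about as much work as the insertions themselves. Avoiding the copies would therefore save roughly half the memory traffic, which is in line with the "around a 2x" estimate above rather than the much larger speedup seen for a single growth event.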