close #8269: grow arrays incrementally by factors of 1.5 rather than 2 #16305
Conversation
Should there be some docs about this? I don't see anything in the current docs about the efficiency of functions like `push!`.
@JaredCrean2, good point; I added some docs to the `push!` docstring.
base/docs/helpdb/Base.jl (Outdated)
@@ -1406,6 +1406,10 @@ julia> push!([1, 2, 3], 4, 5, 6)
6
```

Internally, the storage for the array is increased exponentially (by 50%)
as needed, so that calling `push!` ``n`` times on an empty array
involves only ``O(\log n)`` allocations and ``O(n)`` time.
that should be \\log n
Thanks, fixed.
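As a rough illustration of the O(log n) reallocation count described in the added docs (this sketch is not part of the patch; it just watches the data pointer from the REPL), one can count how often the buffer actually moves while pushing:

```julia
# Illustration only: count how often the array's data pointer changes while
# push!-ing n elements. Each change means the buffer was reallocated and moved,
# so this is a lower bound on reallocations (realloc may also grow in place).
function count_moves(n)
    a = Int[]
    moves = 0
    p = Ptr{Int}(0)
    for i in 1:n
        push!(a, i)
        q = pointer(a)
        if q != p
            moves += 1
            p = q
        end
    end
    return moves
end

count_moves(10^6)  # a few dozen, roughly log(10^6) / log(growth factor)
```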
Is there a sensible way to try factors for the growth other than 2 and 1.5?
@TotalVerb, probably most C compilers do this, but C programmers traditionally read …
@dpsanders, you can try anything you want. Honestly, I doubt the cost of the arithmetic is significant here, so you could just try …
Okay, I ran a benchmark of `push!` with various growth factors. There is a clear penalty for making the growth factor too small, and if anything the optimum growth factor below 2 is around 1.7. However, I would take small differences in timing here with a large grain of salt, because it depends on things like our garbage-collection algorithm (not to mention on the hardware, in this case my 2014 2.15GHz i7 laptop). 1.5 still seems like a reasonable compromise, minimizing over-allocation without sacrificing much if any performance.
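For anyone who wants to poke at other factors without rebuilding Julia, here is a user-level sketch of the kind of simulation involved (the real constant lives in the C array code, so this only approximates the effect; `GrowBuf` and the sizes below are made up for illustration):

```julia
# Simulate a growable buffer with a configurable growth factor and time
# appending n elements. copyto! stands in for the realloc done by the runtime.
mutable struct GrowBuf
    data::Vector{Int}
    len::Int
end
GrowBuf() = GrowBuf(Vector{Int}(undef, 4), 0)

function grow_push!(b::GrowBuf, x, factor)
    if b.len == length(b.data)
        newcap = max(b.len + 1, floor(Int, b.len * factor))
        newdata = Vector{Int}(undef, newcap)
        copyto!(newdata, 1, b.data, 1, b.len)
        b.data = newdata
    end
    b.len += 1
    b.data[b.len] = x
    return b
end

function bench(factor, n = 10^7)
    b = GrowBuf()
    @time for i in 1:n
        grow_push!(b, i, factor)
    end
end

# run each twice; the first call includes compilation
bench(1.5); bench(1.7); bench(2.0)
```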
I always found that https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md gives a nice reasoning for choosing 1.5.
Interesting. The statement cited from https://blog.mozilla.org/nnethercote/2014/11/04/please-grow-your-buffers-exponentially/ is misleading: "Forever" here is only three allocations, since 1 + ϕ = ϕ^2. The next interesting number is χ = 1.8392867552, the "golden number" of the tribonacci sequence, with 1 + χ + χ^2 = χ^3, which allows reuse after four steps. Choosing 1.75 (a bit smaller than 1.83 to allow for bookkeeping) also makes sense. Edit: Whether 1 + χ + χ^2 = χ^3 or 1 + χ + χ^2 = χ^4 is the relevant condition depends on the implementation of realloc.
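To make the excluding/including distinction concrete, here is a small simulation of my own (not from the thread): it finds the first grow step at which the previously freed buffers could hold the new one, with `include_current` toggling whether the still-live buffer counts as reusable, which is exactly the realloc-dependent question above:

```julia
# For growth factor r, find the first reallocation where the freed space is
# big enough to hold the new buffer. include_current = true lets the currently
# live buffer count as reusable (i.e. realloc can recycle it in place).
function first_reuse(r; include_current = false, maxsteps = 50)
    sizes = [1.0]
    for n in 1:maxsteps
        newsize = r^n
        avail = include_current ? sum(sizes) : sum(sizes[1:end-1])
        avail >= newsize && return n
        push!(sizes, newsize)
    end
    return nothing
end

first_reuse(1.5)   # reuse after a handful of grows
first_reuse(2.0)   # nothing: with factor 2 the freed space is never enough
ϕ = (1 + sqrt(5)) / 2
first_reuse(ϕ)                          # nothing: the freed space never quite catches up
first_reuse(ϕ, include_current = true)  # reuses almost immediately, since 1 + ϕ = ϕ^2
```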
Specifically, it'll be good to see how the benchmarks in the … @nanosoldier …
Reading the most recent status made me realize that @nanosoldier …
@jrevels, that is cool stuff. (…)
Technically, shouldn't it compare to the …
If the merge can be performed automatically (without conflicts), then it will use the merge commit of the PR and compare it against whatever commit you specified with `vs`.
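(The actual trigger comments are elided above; per the Nanosoldier documentation, a typical invocation looks something like `@nanosoldier runbenchmarks(ALL, vs = ":master")`, where `vs` picks the commit to compare against.)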
@jrevels, any way to see the full error log on nanosoldier? Hopefully the failure is not the fault of this PR, since Travis and AppVeyor passed?
Not at this time, unfortunately. It seems Nanosoldier just had trouble posting the final "job completed" comment; it wasn't the fault of this PR. Weird, since some other jobs just finished and the comments were correctly posted there... seems I still have some kinks to work out since the most recent refactor. Besides that last comment error, the entire job completed and the report was uploaded here. You can play with these same benchmarks locally using BaseBenchmarks.jl.
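For example, something along these lines should run the relevant group locally (the "array" group name is my assumption; check BaseBenchmarks.jl for the actual suite layout):

```julia
# Rough sketch of running the array benchmarks locally; group names and the
# exact API may differ, see the BaseBenchmarks.jl README.
using BaseBenchmarks, BenchmarkTools

BaseBenchmarks.load!("array")                       # load just the array group
results = run(BaseBenchmarks.SUITE["array"]; verbose = true)
```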
It is relevant to note that all the growth benchmarks have a final size of 2^k.
Doesn't the theoretical argument for the golden ratio being optimal, plus your benchmarks showing that the optimal ratio is around 1.6, argue pretty strongly for the golden ratio actually being a good choice?
@StefanKarpinski The conclusion of the theoretical arguments really depends on whether the new resized array should fit into the space occupied by all previously resized arrays excluding or including the space occupied by the current array. If it is excluding, then 1.6... is not optimal; on the contrary, it is the smallest factor for which the sum of all previous sizes, excluding the current one, is never enough. It took me a moment, too. I would go with @stevengj's dictum: "But you never want to trust simple performance models too much without checking them on actual hardware, which is almost always vastly more complicated than the theory."
Good point, it would be better to test more varied sizes... In the meantime, it's probably worth running the whole suite here to test all of our application-based benchmarks (the previous run was only the array benchmarks). @nanosoldier …
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
Isn't "checking them on actual hardware" exactly what @stevengj did? |
Looks like the nanosoldier is not happy with this.
Simulations are good; it is the theoretical arguments that do not give clear answers. By the way, @stevengj, I think … would be a relevant test: to have at least one array which is not at the top of the heap, where growing to the right/upwards is easy.
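The snippet itself is elided above, but a test in that spirit (my reconstruction, not the original code) could interleave pushes to two arrays so that neither buffer can simply keep growing into adjacent free space:

```julia
# Sketch: interleave push! on two arrays so neither buffer sits alone at the
# top of the heap, making cheap in-place growth less likely.
function interleaved_push(n)
    a = Int[]
    b = Int[]
    for i in 1:n
        push!(a, i)
        push!(b, i)
    end
    return length(a) + length(b)
end

@time interleaved_push(10^7)  # run twice; the first call includes compilation
```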
@StefanKarpinski, if you take my benchmark as gospel, then 1.7 is even better. However: …
Do we want to revisit this at some point?
Growing an array to a size of 10^8 must be rare. The common case is probably growing thousands of arrays to sizes of 10^3 or 10^5. I'd benchmark these cases as well before making a decision.
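A sketch of what benchmarking that common case might look like (the counts and sizes are just the ones mentioned above, and BenchmarkTools.jl is assumed to be installed):

```julia
# Sketch: grow many independent arrays to modest sizes, per the comment above.
using BenchmarkTools

function grow_many(narrays, len)
    total = 0
    for _ in 1:narrays
        a = Int[]
        for i in 1:len
            push!(a, i)
        end
        total += length(a)
    end
    return total
end

@btime grow_many(10_000, 1_000)    # thousands of arrays of size 10^3
@btime grow_many(1_000, 100_000)   # arrays of size 10^5
```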
Yes, but I should emphasize that it's not completely a question of performance (and there is bound to be some variability between different benchmarks); there's also a space/time tradeoff here.
If the array is large, and on a 64-bit system, the additional space should only be "address space", not "allocated memory". With paging, the system will also be able to reuse previously allocated memory since fragmentation is (in principle) limited to the OS page size, and (in practice) to 128 kByte or so, for GNU malloc.
Rebased, re-running benchmarks to see if anything has changed: @nanosoldier …
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
Something worth noting is that none of these have a memory difference of more than 10% in phi's favor. To me, this suggests that either our theory is really wrong, or something is messed up with the implementation. (Or allocated but unused memory isn't included; is that true? Because if so, it should probably be fixed.)
Note that there shouldn't be any expected memory saving in the benchmarks that have repeated array growing, since those numbers do not take memory reuse into account.
Bump. Seems like the decision is to use 1.5 as a good compromise; any reason this got stale?
Probably worth trying again after four years.
Rebased. As @JeffBezanson commented in #32035 (comment), we may want to grow by a larger factor for small-size arrays until the length reaches some threshold.
But we could always close this PR in favor of #32035.
Closing this in favor of #32035.
As discussed in #8269, many authors seem to consider this to be a good idea. The patch seems trivial.
I tried a quick benchmark, and the performance ~~difference was barely measurable (if anything, a 1% slowdown)~~ (whoops, screwed up the patch) was _3–15% faster_ after this patch. It's apparently faster in both theory and practice, and wastes less memory. Seems worthwhile.