Array push! seems to grow amortized O(N)... #28588
general array-insert is |
This is normal |
So... this is probably an orthogonal, unrelated problem, but @JeffBezanson and I have found that we're manually doing a copy when growing the array (lines 840 to 844 in a2149f8):
This seems to be because there doesn't exist a reasonable method to realloc aligned data, so we have to manually re-do the allocation and copy. It seems like we could just call realloc anyway and check whether the result is aligned. That's especially true when the arrays get huge -- because in that case it's pretty hard to imagine realloc not producing an aligned memory region, so it's probably worth the potential double allocation. EDIT: This would make it slower, but it should be an amortized constant-factor increase, not an order-of-magnitude increase. |
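The "call realloc and just check the alignment" idea can be sketched roughly like this (a hand-wavy Julia sketch of logic that would really live in the C runtime; the alignment constant and function name here are made up for illustration, not the actual array.c code):

const ALIGNMENT = 64   # assumed alignment requirement, not the runtime's real constant

# Sketch only: try realloc first; fall back to an aligned malloc + copy if unlucky.
function grow_buffer(old::Ptr{UInt8}, oldsz::Integer, newsz::Integer)
    p = Ptr{UInt8}(Libc.realloc(old, newsz))      # may move (and copy) the data for us
    if UInt(p) % ALIGNMENT == 0
        return p                                  # fast path: result happens to be aligned
    end
    # Slow path: the potential "double allocation" -- allocate an aligned block and copy.
    raw = Ptr{UInt8}(Libc.malloc(newsz + ALIGNMENT))
    aligned = Ptr{UInt8}(div(UInt(raw) + ALIGNMENT - 1, ALIGNMENT) * ALIGNMENT)
    unsafe_copyto!(aligned, p, oldsz)
    Libc.free(p)
    return aligned    # real code would also have to remember `raw` in order to free it later
end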
Sorry, the problem description wasn't clear. Yes, this is |
This is still an |
Yeah, agreed. That's what I meant by saying this is orthogonal. I shouldn't have qualified that with
Huh, yeah, that's interesting. I'm interested to hear more about that, but that makes sense. |
I still think we should use the technique of calling realloc and then checking whether the result is aligned. |
(I'm happy to branch this conversation to another thread, though, if we want?) |
So, to continue the original conversation here: we discovered the actual problem. To summarize what we discussed offline at JuliaCon:
However, upon more investigation with Valentin, we think that's not quite right. The behavior we noticed above should actually be summarized as follows:
And that actually seems somewhat reasonable! That change was introduced here: Now, despite that being significantly more reasonable than we originally thought, I still think it might be related to the performance described above. It still means that above a certain size, the arrays grow by a constant amount -- just a much larger constant than growing by 1. So I think that would still result in quadratic growth of the total cost to insert N elements (i.e. amortized O(N) per insertion)? So this still bears some thinking about: is that acceptable? |
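To put a number on that, here's a back-of-the-envelope model (my own sketch, not a measurement from this thread) of how many elements get copied in total while push!-ing N elements, comparing a multiplicative growth factor with a fixed growth chunk; the chunk size below is arbitrary:

# Counts the total elements copied across all grow events, assuming each grow
# copies the entire current buffer.
function total_copied(N; factor = 2.0, chunk = 0)
    cap, copied = 1, 0
    for n in 1:N
        if n > cap
            copied += cap                                        # copy the old buffer
            cap = chunk > 0 ? cap + chunk : ceil(Int, cap * factor)
        end
    end
    return copied
end

total_copied(10^7)                 # multiplicative: O(N) total copies => amortized O(1) per push!
total_copied(10^7, chunk = 10^4)   # fixed chunk: ~N^2 / (2*chunk) copies => amortized O(N) per push!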
BUT THEN, something else changed that introduced a significant slowdown in array growth between Julia 0.4 and 0.5. As far as I can tell, it's a constant slowdown of around 1.7x. We see a pretty big slowdown between 0.4 and 0.5, and it's not related to the above change; the above change was introduced 5 years ago, in Julia 0.3. Here are the plots comparing 0.4-0.7: In particular, here's the change between 0.4 and 0.5. 0.4 looks deceptively flat, but it actually has the same shape, just scaled down by a constant: So all of them suffer from this same problem of greater-than-constant amortized insertion cost. |
Sorry, and just to clarify, the v0.4 to v0.5 regression seems to be always around 2x:

julia0.7> v05[2] ./ v04[2]
6-element Array{Float64,1}:
2.048994351707916
1.7865544298045446
1.1380501128347926
0.9317471797454173
0.9720312759644577
3.3472547019000554 |
So, NEXT STEPS:
|
What about 1.0? Where is that relative to 0.5? |
Something in the sigmoid family, maybe? Probably best to write down the properties we want it to have and then derive a function that has them. |
CC: @vchuravy |
I didn't actually measure 1.0, because I assumed it was the same as 0.7. I can do that in a bit, though. Assuming it is, 0.7 was the same as 0.5. |
Oh, all the benchmarks were labeled 0.4 or 0.5, so I assumed. |
Leaving breadcrumbs to #25879, which might not be needed if we fix the performance of |
Ah, sorry, I didn't include a screenshot, but here are the full results:
|
EDIT: I forgot I ran the 0.4 and 0.5 tests on julia-debug, but 1.0 on julia. After switching them all to the regular julia binary, the observations below still hold.

Okay, profiling update. (Is there a better place to be posting these thoughts/updates? Maybe in a performance Slack channel or something? Let me know.) So @vchuravy helped me set up a profiling script, which I then profiled on my Mac using Xcode's Instruments:

# profilegrowth.jl (Note: this file targets julia 0.4 and 0.5)
function testfunc(capacity)
    a = zeros(capacity÷2)
    ccall(:jl_array_sizehint, Void, (Array, Cint), a, capacity) # For the julia 1.0 test, I changed Void to Nothing
end
testfunc(10)

function test(iterations)
    for _ in 1:iterations
        testfunc(10^8)
    end
end
test(1)
test(10)

Some interesting early observations:
EDIT: And julia 1.0's profile is almost identical to 0.5's. (What I thought was a speed-up was actually just due to using julia-debug for 0.4 and 0.5 but plain julia for 1.0.) Zero-filling in 1.0 is now slightly faster than in 0.4, though, so that's cool!

So then, digging into those profiles, from what I can tell, the extra time seems to come from the fact that 0.5 onwards actually copies the array data when growing, where 0.4 mostly does not. So actually, as best as I can tell, this might be a symptom of the problem Jeff originally identified above (#28588 (comment)) and which I broke out into a separate issue here (#29526). That is, 0.4 was avoiding the expensive operation of actually copying the data, since most of the time it can get away with just growing the memory in-place! Am I understanding that correctly? And if so, is that only true because this is a toy example that does nothing else besides allocate that array? In the real world, would we expect memory to be more fragmented than this, such that that optimization matters less? I think it would probably be good to profile a bunch of normal example workflows to compare the CPU time spent in these grow-and-copy operations.

This also makes sense as an explanation of the ~2x slowdown: we're now touching all the data twice, once during allocation and once during the grow event. |
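One cheap way to sanity-check the grow-in-place vs. allocate-and-copy hypothesis (just an experiment idea, not something that was run in this thread) is to watch whether the array's data pointer survives a grow:

# If the buffer can be grown in place, the data pointer stays the same;
# if the grow falls back to allocate-and-copy, the pointer changes.
a  = zeros(10^6)
p1 = pointer(a)
sizehint!(a, 10^7)   # force the buffer to grow far beyond its current capacity
p2 = pointer(a)
p1 == p2             # true  => grown in place (old data untouched)
                     # false => a new buffer was allocated and the data copied over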
Oops, I just realized I used julia-debug binaries for 0.4 and 0.5, but julia for 1.0. The results are still similar, but let me update the numbers above. |
I watched a talk recently that seems quite relevant to this (https://youtu.be/2YXwg0n9e7E?t=2193); it compares different strategies for growing vectors. I paraphrase:
|
@KristofferC Yeah, so the equivalent C++ (compiled with -O3) has a very similar profile. It spends ~30% of its time zeroing the initial array, and an equal ~30% of its time copying that data to a new array. Although C++ is hindered by spending an extra 30% of its time zeroing out the rest of the array after growing, whereas Julia chooses to leave that data uninitialized. It's hard to directly compare the total times, since the Julia runs included parsing and compiling the code as well, and I'm not sure how much of that contributed to the final profiles, but still, it seems that the C++ is actually a good amount faster despite doing the extra work that 0.5 and 1.0 do... It would be interesting to know why! :) Here's the C++ file I used:

// Compiled via: clang++ --std=c++11 -O3 profilegrowth.cc
#include <vector>

using namespace std;

void testfunc(int capacity) {
    auto a = vector<int>(capacity/2);
    a.resize(capacity);
}

void test(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        testfunc((int)1e8);
    }
}

int main() {
    // "Warm up" to maintain correlation with the julia script, which runs these to precompile
    testfunc(10);
    test(1);

    // Actual test
    test(10);
    return 0;
} |
This is a fun talk, thanks for sharing! Agreed that it seems quite relevant, and that your summary is good. |
Hi again!

First, a recap: There are two problems we've discovered:
The solution to problem (2) is probably to follow the suggested behavior in this stackoverflow post:

Problem (1), however, is straightforward. All the literature I have read on this topic says you should never grow by a constant amount; the growth should always be a scale factor of the current array size. So we should fix that. The next remaining question is: what should the growth factor be?

Picking a scaling factor

There is apparently a lot of interesting debate on this topic. It's summarized well in the wikipedia article on dynamic arrays, which also includes a table of the growth factors used in various languages:
Introductory CS courses explain that you double an array when it needs to grow. However, some sources such as Facebook's folly claim that 2x growth is provably the worst choice, "because it never allows the vector to reuse any of its previously-allocated memory": I think that this stackoverflow answer explained the concept well: From what I understand, conceptually any growth factor less than or equal to the golden ratio allows reusing previous allocation slots (see the small sketch after this comment), but people prefer 1.5 for other reasons.

Changing growth factor based on physical RAM

The approach Julia took to limit allocation size as you approach the total system RAM is interesting. One thing we could maybe consider is gradually slowing the growth factor as the array gets larger, rather than switching to a constant growth size like we do now. (For example, we'd discussed other interesting growth functions like sqrt(x) in the past.) But I think that this is probably a bad idea. Here are some reasons:
Next Steps

So given all of that, we have a few decisions to make. I'll list them and my proposals here:
How does all of that sound? :) |
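To make the golden-ratio claim above concrete, here's a toy model (my own sketch, assuming an allocator that can only reuse the contiguous hole left behind by this same vector's earlier buffers):

# When growing from s_{k-1} to s_k, the old buffer is still live, so the new block
# can only slot into the hole left by s_0 + s_1 + ... + s_{k-2}.
function grows_until_reuse(r; start = 16, max_grows = 100)
    live  = float(start)   # size of the currently live buffer
    freed = 0.0            # total size of all previously freed buffers
    for k in 1:max_grows
        next = ceil(live * r)
        next <= freed && return k    # the new buffer fits into the freed prefix
        freed += live                # the current buffer gets freed by this grow
        live = next
    end
    return nothing                   # never fits within max_grows grows
end

grows_until_reuse(2.0)   # nothing: 2^k > 1 + 2 + ... + 2^(k-2), so 2x never reuses the hole
grows_until_reuse(1.5)   # a small number: with 1.5x the hole eventually fits the new buffer

In this model, the condition r^k <= 1 + r + ... + r^(k-2) becomes satisfiable for large k only when r(r-1) <= 1, i.e. r <= (1+sqrt(5))/2, which is where the golden-ratio bound comes from.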
I've opened #32035, which implements the proposals above; we can use it as a strawperson to investigate the best fix. |
Average insertion time for inserting n elements:

Total insertion time for inserting n elements:

Preliminarily, this behavior seems to hold for both push-front and push-back from 0.5 onwards. Still running the profiles.
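For reference, a minimal way to gather this kind of average-insertion-time data (just a sketch, not the original benchmark script; it uses the 1.0 names, where push-front is pushfirst! rather than the older unshift!):

# Average per-element insertion time for push-back and push-front, over a range of n.
function avg_insert_time(f, n)
    a = Int[]
    t = @elapsed for i in 1:n
        f(a, i)
    end
    return t / n
end

avg_insert_time(push!, 10)                             # warm up / compile
ns    = 10 .^ (3:7)
back  = [avg_insert_time(push!, n)      for n in ns]   # push-back
front = [avg_insert_time(pushfirst!, n) for n in ns]   # push-front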