Performance regression when indexing a 4+ dimensional array #11819

Closed
bfredl opened this issue Jun 23, 2015 · 8 comments · Fixed by #11833
Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)

Comments

@bfredl
Contributor

bfredl commented Jun 23, 2015

Recently, after updating Julia some time within the last week or so, I have been seeing a serious performance regression in some code that uses arrays with 4 or more dimensions (my original application code used 6-dimensional arrays):

function v4test()
    l = 10
    v = zeros(Float64, (l, l, l, l))
    for i4 = 1:l
        for i3 = 1:l
            for i2 = 1:l
                for i1 = 1:l
                    v[i1,i2,i3,i4] = i1*i2
                end
            end
        end
    end
    v
end
@time v4test();
@code_llvm v4test()

The LLVM code shows that setindex! isn't statically inlined; instead a dynamic call is made in the innermost loop. There is no problem with 3-dimensional arrays. I believe this is a different problem than #11787, as the regression is much bigger (50x compared to Julia 0.3) and only affects 4+ dimensional arrays.
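For comparison, here is a minimal 3-dimensional sketch of the same loop (the v3test name is just for illustration); in this case the indexing is inlined as expected:

function v3test()
    l = 10
    v = zeros(Float64, (l, l, l))
    for i3 = 1:l
        for i2 = 1:l
            for i1 = 1:l
                v[i1,i2,i3] = i1*i2
            end
        end
    end
    v
end
@time v3test();
@code_llvm v3test()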

I don't remember the last version that worked without the slowdown, but it was definitely after the Tuplecalypse.

@tkelman added the performance (Must go faster) label on Jun 23, 2015
@bfredl
Contributor Author

bfredl commented Jun 23, 2015

Incidentally, it seems to work well with Base.checkbounds(v, i1,i2,i3,i4); v[sub2ind(size(v),i1,i2,i3,i4)] = i1*i2. I'll use that as a workaround for now.
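Spelled out in the loop from above, the workaround looks roughly like this (v4test_linear is just a sketch of the idea, not my actual code):

function v4test_linear()
    l = 10
    v = zeros(Float64, (l, l, l, l))
    for i4 = 1:l, i3 = 1:l, i2 = 1:l, i1 = 1:l
        Base.checkbounds(v, i1, i2, i3, i4)            # manual bounds check
        v[sub2ind(size(v), i1, i2, i3, i4)] = i1*i2    # linear index avoids the dynamic setindex! call
    end
    v
end
@time v4test_linear();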

@simonster
Member

Most likely related to #9622 (see #9622 (comment)). Not sure exactly what commit of Julia you're on, but it sounds like updating may help, since 6a3c173 improved the situation a bit.

@bfredl
Contributor Author

bfredl commented Jun 23, 2015

I tried using the latest master, so I do have that commit. Does MAX_TUPLETYPE_LEN mean that static dispatch doesn't work with more than MAX_TUPLETYPE_LEN arguments? That would explain why it works with sub2ind directly, at least up to 7 dimensions.

@bfredl closed this as completed on Jun 23, 2015
@simonster reopened this on Jun 23, 2015
@simonster
Member

I can reproduce this on master too. Not sure if there's been a regression since 6a3c173 or if not all the cases are fixed, but I think it's a good idea to leave this issue open until it's fixed. This would be a bad performance regression to leave in the final release.

@simonster added the regression (Regression in behavior compared to a previous version) label on Jun 23, 2015
@mbauman
Member

mbauman commented Jun 23, 2015

Instead of relying upon Julia to do the sub2ind for the builtin Array types, let's go back to defining all of the getindex(A::Array, i1::Int, i2::Int, …) manually. It was an interesting experiment, but I think the additional complexity on the Julia side outweighs the necessity of these things being as fast as possible.
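For illustration, the 4-argument case might look roughly like the following sketch (written in terms of sub2ind and linear indexing just to show the shape; the real definitions would live in Base and may differ):

# Sketch only, not the actual Base code.
import Base: getindex, setindex!
getindex(A::Array, i1::Int, i2::Int, i3::Int, i4::Int) =
    A[sub2ind(size(A), i1, i2, i3, i4)]
setindex!(A::Array, x, i1::Int, i2::Int, i3::Int, i4::Int) =
    (A[sub2ind(size(A), i1, i2, i3, i4)] = x; A)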

@bfredl
Contributor Author

bfredl commented Jun 23, 2015

Perhaps MAX_TUPLETYPE_LEN could be increased to, say, 12 or 16? Whether or not it is changed, shouldn't there be a warning when the limit is exceeded? Even a somewhat confusing warning (due to vararg expansion deep down in a call stack and so on) is much better than a silent 50x overhead when adding an extra parameter to a function call.

@mbauman
Member

mbauman commented Jun 23, 2015

Aha, I figured it out. This one stems from removing map from to_index (6a3c173). While that was the right direction, tuples don't get the same sort of special treatment for recursive inlining that arguments do. The right thing to do here is to remove to_index(::Tuple) and instead always splat the arguments, with definitions like:

to_index() = ()
to_index(i1, I...) = (to_index(i1), to_index(I...)...)

(and also adjusting all the call sites to splat their tuples)… but that hits a stack overflow within femtolisp when computing the primes during bootstrap.
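For instance, a call site that currently passes a whole tuple to to_index would change roughly like this (hypothetical call site, shown only to illustrate the splatting):

# before: the index tuple is converted in one call
J = to_index(I)
# after: splat, so each element goes through the recursive definitions above
J = to_index(I...)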

@mbauman
Member

mbauman commented Jun 23, 2015

Ah, of course, those definitions won't work. Amazing how describing the problem helps you see it.
