-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sorting wrt multiple arrays #12
Comments
The following works well actually. @time sortperm(collect(zip(a, b)))
#elapsed time: 1.230827695 seconds (63991856 bytes allocated, 9.36% gc time) Is this the correct way to do it then? I'm not sure I understand the difference between these syntaxes. |
Don't time code in global scope (see the Julia performance tips). If you do: s0(a,b) = sortperm([(a[i], b[i]) for i in 1:length(a)])
s1(a,b) = sortperm(collect(zip(a, b)))
@time s0(a,b);
@time s1(a,b); then you will get almost identical timings. |
It should in principle be possible to do this without making a copy of the arrays, by defining a new type ZipVector{T,S} <: AbstractVector{Tuple{T,S}}
a::Vector{T}
b::Vector{S}
end
Base.getindex(z::ZipVector, i::Int) = (a[i],b[i])
Base.size(z::ZipVector) = size(a)
n = 1_000_000
a = rand(1:n, n)
b = rand(1:10, n)
@time sortperm(ZipVector(a,b)); but it is insanely slow (in Julia 0.4): It seems like |
It definitely seems to be function foo(X)
for (a,b) in X
a + b
end
end
@time foo(ZipVector(a,b)); and got |
Okay, it is a type-inference problem: Z = ZipVector(a,b)::ZipVector{Int64,Int64}
Base.Test.@inferred getindex(Z, 1) gives
|
I think you mean: Base.getindex(z::ZipVector, i::Int) = (z.a[i],z.b[i])
Base.size(z::ZipVector) = size(z.a) |
Oh, it is just a stupid typo. I was returning Base.getindex(z::ZipVector, i::Int) = (z.a[i],z.b[i]) gives
which is about 30% slower than the version above that makes a copy of the array. As @JeffBezanson remarked to me, making a copy is O(n) while the sort performs O(n log n) comparisons, so it is typically going to be preferable for cache locality to make a copy (unless maybe the arrays are already in-cache). But it may be nice for other reasons to do it without making a copy, and it is nice that this is (a) possible and (b) not too much slower. |
@simonster, yes, you spotted it as I was writing up my last post. |
Thanks a lot for getting back to me. The |
For me, |
What's the best way to sort lexicographically wrt 2 arrays a and b ?
The code below is quite slow, so I'm wondering whether there is a better solution
The text was updated successfully, but these errors were encountered: