Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zip() has limits: bug or needs documenting? #20615

Closed
cormullion opened this issue Feb 15, 2017 · 9 comments
Closed

zip() has limits: bug or needs documenting? #20615

cormullion opened this issue Feb 15, 2017 · 9 comments
Labels
collections Data structures holding multiple items, e.g. sets compiler:inference Type inference regression Regression in behavior compared to a previous version

Comments

@cormullion
Copy link
Contributor

Travis found this bug for me on 0.6: It seems that zip() can handle up to a certain number of inputs, then complains.

julia> a = 1:5
1:5

julia> b = ["e","d","b","c","a"]
5-element Array{String,1}:
 "e"
 "d"
 "b"
 "c"
 "a"

julia> for e in zip(a, b)
           println(e)
       end
(1, "e")
(2, "d")
(3, "b")
(4, "c")
(5, "a")

julia> for e in zip(a, b, a, b, a, b, a, b, a, b, a)
           println(e)
       end
(1, "e", 1, "e", 1, "e", 1, "e", 1, "e", 1)
(2, "d", 2, "d", 2, "d", 2, "d", 2, "d", 2)
(3, "b", 3, "b", 3, "b", 3, "b", 3, "b", 3)
(4, "c", 4, "c", 4, "c", 4, "c", 4, "c", 4)
(5, "a", 5, "a", 5, "a", 5, "a", 5, "a", 5)

julia> for e in zip(a, b, a, b, a, b, a, b, a, b, a, b)
           println(e)
       end
ERROR: type Array has no field a
Stacktrace:
 [1] start at ./iterators.jl:161 [inlined] (repeats 10 times)
 [2] anonymous at ./<missing>:?

Version

julia> versioninfo()
Julia Version 0.6.0-dev.2635
Commit dc2459d (2017-02-13 07:46 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-2500S CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)

@fredrikekre
Copy link
Member

I did some digging on this. If I annotate start, next and done with @noinline for Zip it works:

julia> A = 1:2;

julia> zw = zip(A, A, A, A, A, A, A, A, A, A, A); # this length work on master

julia> z =  zip(A, A, A, A, A, A, A, A, A, A, A, A); # does not work on master

julia> start(z)
(1, (1, (1, (1, (1, (1, (1, (1, (1, (1, (1, 1)))))))))))

Too much inlining when the Zip is this deep? I guess using @noinline is not a good solution since that would slow down for smaller Zips?

(It was a bit confusing debugging this, since start(z) returns tuple(start(z.a), start(z.z)) which works in the REPL, since z.z are on "working depth" 😄 )

@fredrikekre
Copy link
Member

Did some benchmarking and using @noinline seems to result in a significant slowdown for shorter Zips but actually a speedup for Zip's deeper than 5:

zip depth master with @noinline
3 38.363 ns (0 allocations: 0 bytes) 928.281 ns (0 allocations: 0 bytes)
4 38.374 ns (0 allocations: 0 bytes) 1.759 μs (0 allocations: 0 bytes)
5 38.366 ns (0 allocations: 0 bytes) 2.519 μs (0 allocations: 0 bytes)
6 205.323 μs (1438 allocations: 60.95 KiB) 59.108 μs (534 allocations: 37.36 KiB)
7 250.649 μs (1740 allocations: 79.83 KiB) 109.841 μs (1736 allocations: 131.30 KiB)
8 321.720 μs (2465 allocations: 115.41 KiB) 187.674 μs (2657 allocations: 190.06 KiB)
9 383.511 μs (3169 allocations: 157.84 KiB) 263.626 μs (4059 allocations: 323.11 KiB)
10 459.869 μs (3898 allocations: 204.67 KiB) 356.791 μs (5184 allocations: 410.23 KiB)
11 560.129 μs (7642 allocations: 466.77 KiB) 474.821 μs (7189 allocations: 642.20 KiB)

Could it be worth introducing Zip3, Zip4 and Zip5 (like there is is Zip1 and Zip2) and then have @noinline for Zip? 😄

@timholy
Copy link
Member

timholy commented Feb 15, 2017

Related:

julia/base/tuple.jl

Lines 128 to 140 in cb1aae9

# stop inlining after some number of arguments to avoid code blowup
Any16{N} = Tuple{Any,Any,Any,Any,Any,Any,Any,Any,
Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any,N}}
All16{T,N} = Tuple{T,T,T,T,T,T,T,T,
T,T,T,T,T,T,T,T,Vararg{T,N}}
function map(f, t::Any16)
n = length(t)
A = Array{Any}(n)
for i=1:n
A[i] = f(t[i])
end
(A...,)
end

@martinholters
Copy link
Member

Doing @code_warntype reveals that for the working case, the innermost Zip2 is treated as Base.Iterators.Zip2{_,_} where _<:UnitRange, while for the non-working case, it's a Base.Iterators.Zip2. Might be related?

@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Feb 15, 2017
@martinholters
Copy link
Member

Looking closer at the @code_warntype output, there is

      SSAValue(0) = (Core.getfield)(z::Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip2{UnitRange{Int64},UnitRange{Int64}}}}}}}}}}}}, :z)::Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{UnitRange{Int64},Base.Iterators.Zip{_,_}}}}}}}}} where _<:Base.Iterators.Zip2

near the top. The innermost type here is Base.Iterators.Zip{_,_} where _ <: Base.Iterators.Zip2. Is that just mis-printing, or is that actually wrongly inferred? The first _ in there should obviously not be a Zip2.

@martinholters
Copy link
Member

Looks like a bug in limit_type_depth:

julia> T=Core.Inference.limit_type_depth(typeof(z.z), 0);

julia> T.var
_<:Base.Iterators.Zip2

julia> T2 = T.body.types[2].types[2].types[2].types[2].types[2].types[2].types[2].types[2]
Base.Iterators.Zip{_<:Base.Iterators.Zip2,_<:Base.Iterators.Zip2}

julia> T2.parameters[1] === T2.parameters[2]
true

Obviously, this would need to be two different TypeVars with different upper bounds.

@fredrikekre
Copy link
Member

fredrikekre commented Feb 16, 2017

This works on v0.5 actually:

start(zip(ntuple(i->1:2, N)...))

for arbitrary N although quite slow for N larger than, say, 1000. So technically a regression.

@cormullion
Copy link
Contributor Author

@fredrikekre Yes, the bug started to occur when I started Travis testing on v0.6.

@martinholters
Copy link
Member

limit_type_depth was heavily modified as part of #18457. I suspect that's what broke this.

@pabloferz pabloferz added the regression Regression in behavior compared to a previous version label Feb 17, 2017
JeffBezanson pushed a commit that referenced this issue Feb 20, 2017
…0626)

Allow `limit_type_depth` to introduce more than one new `TypeVar`

Assert `limit_type_depth` returns a supertype of its input

Fixes #20615.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets compiler:inference Type inference regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

6 participants