Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for julia 0.7 #3

Closed
lhupe opened this issue Jun 11, 2018 · 15 comments
Closed

Support for julia 0.7 #3

lhupe opened this issue Jun 11, 2018 · 15 comments

Comments

@lhupe
Copy link

lhupe commented Jun 11, 2018

Sadly, this package does not load in julia 0.7 alpha as of today (11 June).
Precompilation fails with this error message

julia> using Unrolled
┌ Warning: Deprecated syntax `(loopbody...)` at /home/███/.julia/packages/Unrolled/eA4l/src/Unrolled.jl:33.
│ Use `(loopbody...,)` instead.
└ @ nothing Unrolled.jl:33
[ Info: Precompiling module Unrolled
┌ Warning: Deprecated syntax `(loopbody...)` at /home/███/.julia/packages/Unrolled/eA4l/src/Unrolled.jl:33.
│ Use `(loopbody...,)` instead.
└ @ nothing Unrolled.jl:33
ERROR: LoadError: syntax: expected ")"
Stacktrace:
 [1] include at ./boot.jl:314 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1071
 [3] include(::Module, ::String) at ./sysimg.jl:29
 [4] top-level scope
 [5] eval at ./boot.jl:316 [inlined]
 [6] eval(::Expr) at ./client.jl:394
 [7] macro expansion at ./none:3 [inlined]
 [8] top-level scope at ./<missing>:0
in expression starting at /home/███/.julia/packages/Unrolled/eA4l/src/Unrolled.jl:33
ERROR: Failed to precompile Unrolled to /home/███/.julia/compiled/v0.7/Unrolled/BnVL.ji.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]
 [2] compilecache(::Base.PkgId) at ./loading.jl:1207
 [3] _require(::Base.PkgId) at ./loading.jl:1007
 [4] require(::Base.PkgId) at ./loading.jl:878
 [5] require(::Module, ::Symbol) at ./loading.jl:873

Will support for 0.7 (and eventually 1.0) be added to Unrolled.jl?

@cstjean
Copy link
Owner

cstjean commented Jun 11, 2018

Yes! As mentioned in #1 (comment), this macro should be a lot simpler in 0.7. I sadly can't promise when I'll get to it, but PRs are welcome.

@Datseris
Copy link

Datseris commented Jun 11, 2018

Unfortunately both me and @lhupe know almost nothing about metaprogramming, so we can't really help with PRs.

DynamicalBilliards.jl however is an excellent real-life usage and test-case of your package. If you progress this into 0.7, we will gladly test it and report possible issues. You can let us know of a working transition branch if and when one opens up and we can test it there.

@cstjean
Copy link
Owner

cstjean commented Jun 12, 2018

Should be fixed on master. Let me know how it goes, I haven't tested it thoroughly.

@lhupe
Copy link
Author

lhupe commented Jun 12, 2018

Well, we can definitely import Unrolled now. However, while testing this function

function next_collision(p::AbstractParticle{T}, bt::Tuple)::Tuple{T,Int} where {T}
    findmin(unrolled_map(x -> collisiontime(p, x), bt))
end

we noticed a massive performance loss compared to v0.6.3.

> bd = billiard_stadium(); p = randominside(bd)
> @btime next_collision($p, $(b.obstacles))

On the old julia version, the function took about 40 ns, now it needs 210 ns.
We could not pinpoint the problem to unrolled_map because the @code_unrolled macro doesn't work for functions without @unrollin their definition (?), but further testing showed that Base.map and unrolled_map seem to perform almost equally.
However, the problem definitely doesn't come from colisiontime, which needs between 2 and 10 ns and is called 4 times in total for my example.

@lhupe lhupe closed this as completed Jun 12, 2018
@lhupe lhupe reopened this Jun 12, 2018
@lhupe
Copy link
Author

lhupe commented Jun 12, 2018

Dammit. I always press the wrong button

@Datseris
Copy link

Datseris commented Jun 12, 2018

I am making this a bit more concrete. The function we are talking about is what Lukas cited,

@inline next_collision(p::AbstractParticle, bt::Billiard) =
    next_collision(p, bt.obstacles)

function next_collision(
    p::AbstractParticle{T}, bt::Tuple)::Tuple{T,Int} where {T}
    findmin(Unrolled.unrolled_map(x -> collisiontime(p, x), bt))
end

We have checked and the function collisiontime itself is actually faster on Julia 0.7.

We use the following code to have reproducible results:

using DynamicalBilliards, BenchmarkTools

bt = billiard_sinai()
p = Particle(0.1,0.1, π/8)

@btime next_collision($p, $bt)

On the master branch of DynamicalBilliards, which operates on Julia 0.6 and uses Unrolled v0.0.2, we have the result:

  39.253 ns (0 allocations: 0 bytes)
(0.398…, 1)

On the branch 0.7 of DynamicalBilliards, which operates on Julia 0.7 and uses the master branch of Unrolled we get:

julia> @btime next_collision($p, $bt)
  209.870 ns (3 allocations: 144 bytes)
(0.39757685533455417, 1)

I think it is clear that the problem is in the allocations, but I do not really know where it comes from. Also, due to the nature of the complexity we have (which requires us to use Unrolled in the first place), I am not sure on how to make a MWE to post on discourse.

@Datseris
Copy link

I have to report something more: Unrolled as a package in general seems to have no impact.

Compare the following two functions:

function next_collision(
    p::AbstractParticle{T}, bt::Tuple)::Tuple{T,Int} where {T}
    tmin::T = T(Inf)
    ind::Int = 0
    for i in eachindex(bt)
        tcol::T = collisiontime(p, bt[i])
        # Set minimum time:
        if tcol < tmin
            tmin = tcol
            ind = i
        end
    end#obstacle loop
    return tmin, ind
end

@unroll function next_collision2(p::AbstractParticle{T}, bt::Tuple) where {T}
    tmin::T = T(Inf)
    ind::Int = 0
    i::Int = 0
    @unroll for obst in bt
        tcol::T = collisiontime(p, obst)
        i+=1
        # Set minimum time:
        if tcol < tmin
            tmin = tcol
            ind = i
        end
    end#obstacle loop
    return tmin, ind
end

They seem to have indentical performance:

julia> @btime next_collision2($p, $bt.obstacles)
  196.204 ns (2 allocations: 128 bytes)
(0.39757685533455417, 1)

julia> @btime next_collision($p, $bt.obstacles)
  201.187 ns (2 allocations: 128 bytes)
(0.39757685533455417, 1)

(julia 0.7 and master branch of Unrolled)

@Datseris
Copy link

I have also started a discussion here: https://discourse.julialang.org/t/manually-unroll-operations-with-objects-of-tuple/11604

on how to make a "minimal unrolled version" of the function we need, so I can better pinpoint where is the regression (Unrolled is too complicated for me to understand :D)

@cstjean
Copy link
Owner

cstjean commented Jun 12, 2018

Thank you for the detailed report. I'll look into it when I get a chance, hopefully tonight. Unrolled relies on compiler details to a certain extent, so I'm not surprised that performance decreased, but there should be a fix. If not, then maybe we can file it as a performance regression.

@Datseris
Copy link

Hi cstjean, thanks a lot for helping us with this.

@Datseris
Copy link

@cstjean big update. Unrolled may be totally fine after all, see here: https://discourse.julialang.org/t/manually-unroll-operations-with-objects-of-tuple/11604/18

If indeed it is StaticArray's fault, I am very sorry for the confusion. I just "assumed" that static arrays were ok in 0.7

@cstjean
Copy link
Owner

cstjean commented Jun 13, 2018

No worries. But if I'm reading this right, on top of the SVector problems, it's also slower with tuples, right? If so, we should investigate where that slowdown comes from. unrolled_map is pretty straight-forward. Have you tried manually unrolling

function next_collision(p::AbstractParticle{T}, bt::Tuple)::Tuple{T,Int} where {T}
    findmin(unrolled_map(x -> collisiontime(p, x), bt))
end

(for, say, a tuple of three elements) and comparing performance?

@Datseris
Copy link

It may not be at all that it is slower. When I run the code of my own meta-programmed unrolled function, i.e.

@generated function nc_meta(p::Particle{T}, bt::TUP) where {T, TUP}
    L = length(TUP.parameters)

    out = quote
        i = 0; ind = 0
        tmin = T(Inf)
    end

    for j=1:L
        push!(out.args,
              quote
                  let x = bt[$j]
                tcol = collisiontime(p, x)
                # Set minimum time:
                if tcol < tmin
                  tmin = tcol
                  ind = $j
                end
              end
          end
              )
    end
    push!(out.args, :(return tmin, ind))
    return out
end

I got same speed with the 0.6 version. I am assuming that Unrolled will do the same as well, but I will test it after static arrays have been updated (so I am sure that I know where the problems are ;) ).

I am closing this for now, and when I test it thoroughly, after StaticArrays are good to go I will open a separate issue if need be!

@lhupe lhupe closed this as completed Jun 13, 2018
@cstjean
Copy link
Owner

cstjean commented Jun 13, 2018

Ok. But I would still be curious to try

function next_collision(p::AbstractParticle{T}, bt::Tuple{Any, Any, Any})::Tuple{T,Int} where {T}
    findmin((collisiontime(p, bt[1]), collisiontime(p, bt[2]), collisiontime(p, bt[3])))
end

vs. unrolled_map

@Datseris
Copy link

Yeap, I will do that once I am sure StaticArrays are good to go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants