Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particles as Distribution parameters #22

Open
cscherrer opened this issue Jun 28, 2019 · 24 comments
Open

Particles as Distribution parameters #22

cscherrer opened this issue Jun 28, 2019 · 24 comments

Comments

@cscherrer
Copy link

Really enjoying this package, thank you for your work on it.

For Soss.jl, here's a simple model:

m = @model begin
    x ~ Normal(0,1)
    y ~ Normal(x^2, 1) |> iid(10)
end

For simple examples like this, I can now use MonteCarloMeasurements to get

julia> particles(m)
(x = 0.00257 ± 1.0, y = Particles{Float64,1000}[1.0 ± 1.7, 1.0 ± 1.7, 1.0 ± 1.7, 0.998 ± 1.7, 1.0 ± 1.8, 1.0 ± 1.7, 0.998 ± 1.7, 0.997 ± 1.7, 0.999 ± 1.7, 0.996 ± 1.8])

The way I get there... it's not so pretty. Let's start with μ and σ:

julia> μ = Particles(1000,Normal(0,1))
Part1000(0.000235 ± 1.0)

julia> σ = Particles(1000,Normal(0,1))^2
Part1000(1.008 ± 1.5)

Surprisingly, this works just fine:

julia> Normal(μ,σ)
Normal{Particles{Float64,1000}}(
μ: 0.000235 ± 1.0
σ: 1.01 ± 1.5
)

Now trying the obvious thing,

julia> x = rand(Normal(μ,σ))
Part1000(-0.03248 ± 1.0)

It thinks it works, but it's really horribly broken:

julia> (x-μ)/σ
Part1000(-0.03246 ± 2.6e-8)

So I got it to work with a little helper function:

N = 1000
parts(x::Normal{P} where {P <: AbstractParticles}) = Particles(length(x.μ), Normal()) * x.σ + x.μ
parts(x::Sampleable{F,S}) where {F,S} = Particles(N,x)
parts(x::Integer) = parts(float(x))
parts(x::Real) = parts(repeat([x],N))
parts(x::AbstractArray) = Particles(x)
parts(p::Particles) = p 

So now I have

julia> x = parts(Normal(μ,σ))
Part1000(0.05171 ± 2.37)

julia> (x-μ)/σ
Part1000(0.0007729 ± 1.0)

Much better.

This is fine for one distribution, but most don't compose as nicely as a normal.

Many distributions break entirely, seemingly because of argument checking in Distributions.jl:

julia> Bernoulli(Particles(1000,Beta(2,3)))
ERROR: TypeError: non-boolean (Particles{Float64,1000}) used in boolean context
Stacktrace:
 [1] macro expansion at /home/chad/.julia/packages/Distributions/Iltex/src/utils.jl:5 [inlined]
 [2] Bernoulli{Particles{Float64,1000}}(::Particles{Float64,1000}) at /home/chad/.julia/packages/Distributions/Iltex/src/univariate/discrete/bernoulli.jl:31
 [3] Bernoulli(::Particles{Float64,1000}) at /home/chad/.julia/packages/Distributions/Iltex/src/univariate/discrete/bernoulli.jl:37
 [4] top-level scope at none:0

Do you see a path toward...

  1. Being able to build distributions with particle parameters in this way?
  2. Getting rand to work properly?
@baggepinnen
Copy link
Owner

Hey and thanks!
This seems like a cool use case that I admittedly have never thought about before. Let me see if I have understood what it is you are trying to do:

  1. You are using particles to represent distributions of parameters of distributions, resulting in a (pseudo) type Distribution{Particles}
  2. You want to be able to sample from these Distribution{Particles} using rand(d::Distribution{Particles})

How is this sampling expected to behave? Sample one distribution of the 1000 distributions represented by Distribution{Particles{1000}} and then sample a value from this distribution?

I can see three ways of representing these distributions:

  1. Particles{Particles}
  2. Distribution{Particles}
  3. Particles{Distribution}

Is it obvious that Distribution{Particles} is the preferred way? I have tried to make particles work together with ForwardDiff.Dual and have come to realize that it is not at all clear whether Particles{Dual} or Dual{Particles} are to be preferred, it really depends on situation and how you want to dispatch. I often had to convert between the two.

@cscherrer
Copy link
Author

This seems like a cool use case that I admittedly have never thought about before.

Thanks!

How is this sampling expected to behave? Sample one distribution of the 1000 distributions represented by Distribution{Particles{1000}} and then sample a value from this distribution?

That's how I would expect it to behave, yes.

Is it obvious that Distribution{Particles} is the preferred way?

Not at all, Particles{Distribution} could be better. I think Particles{Particles} is right out, because maybe other applications will be more interested in, say, logpdf than rand. I was thinking Distribution{Particles} because it seemed more consistent with your approach for Particles to only wrap primitives. Didn't know about the Dual issues at the time.

@bhgomes
Copy link

bhgomes commented Jun 28, 2019

@cscherrer just a question. I was looking at your example code and tried it out and I ran into a few issues. Here was my sequence:

using MonteCarloMeasurements, Distributions
μ = Particles(1000,Normal(0,1))
σ = Particles(1000,Normal(0,1))^2
Normal(μ,σ)

where I get the error:

ERROR: Comparison operators are not well defined for uncertain values and are currently turned off. Call `unsafe_comparisons(true)` to enable comparison operators for particles using the current reduction function Statistics.mean. Change this function using `set_comparison_function(f)`.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] _comparioson_operator at /Users/brandongomes/.julia/packages/MonteCarloMeasurements/nd0Nh/src/particles.jl:344 [inlined]
 [3] <=(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/MonteCarloMeasurements/nd0Nh/src/particles.jl:360
 [4] >=(::Particles{Float64,1000}, ::Particles{Float64,1000}) at ./operators.jl:333
 [5] macro expansion at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/utils.jl:5 [inlined]
 [6] Normal{Particles{Float64,1000}}(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/univariate/continuous/normal.jl:35
 [7] Normal(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/univariate/continuous/normal.jl:41
 [8] top-level scope at none:0

So now I continue as follows:

julia> unsafe_comparisons(true)
[ Info: Unsafe comparisons using the function `Statistics.mean` has been enabled globally. Use `@unsafe` to enable in a local expression only or `unsafe_comparisons(false)` to turn off unsafe comparisons
julia> Normal(μ,σ)
Normal{Particles{Float64,1000}}(
μ: 0.000425 ± 1.0
σ: 0.999 ± 1.4
)

So that works! Then I try the Bernoulli:

julia> Bernoulli(Particles(1000,Beta(2,3)))
Bernoulli{Particles{Float64,1000}}(
p: 0.4 ± 0.2
)

and it works just fine. If I turn unsafe_comparisons back off, I get the same error as the normal distribution. I don't get the error you got. I was wondering if this makes sense?

@cscherrer
Copy link
Author

cscherrer commented Jun 28, 2019

@baggepinnen understands this much better then I do, but I think the issue is how to compute a<b when a and b are Particles:

julia> unsafe_comparisons(true)
[ Info: Unsafe comparisons using the function `Statistics.mean` has been enabled globally. Use `@unsafe` to enable in a local expression only or `unsafe_comparisons(false)` to turn off unsafe comparisons

julia> Particles()<Particles()
false

julia> unsafe_comparisons(false)

julia> for f in [<=, >=, <, >]
           register_primitive(f)
       end

julia> Particles()<Particles()
Part500(0.494 ± 0.5)

unsafe_comparisons just does one comparison, using the means. For particle-wise computations, you need to register_primitive

Oh, I see, Bernoulli breaks because the particles are coerced to be floating point. @baggepinnen, is it reasonable to have Particles of Bool, Int, etc?

@baggepinnen
Copy link
Owner

Welcome @bhgomes . Your error message is different (hopefully improved) since you use a newer version of the package. @cscherrer is correct in asserting that the error is thrown when particles appear in comparisons, e.g., p<0, as it is not always clear how you want this comparison to be handled. I called the function to allow comparisons unsafe_comparisons to emphasize that one should really think through if this is what one really wants. My long-term plan is to write a compiler pass that identifies when Particles{Bool} appear in a boolean context and transforms this code to a loop over a function containing the control flow, but this is above my head at the moment, and register_primitive will have to do for now, with limitations.

It is certainly reasonable to have Particles{Bool/Int} etc. in correct situation, e.g., to represent binary distributions or histograms.

The problem with Distribution{Particles} vs Particles{Distribution} is that one of the types behaves (dispatches) as a Distribution whereas the other behaves as Particles. Ideally, you would like the composite type to behave as both a distribution and as particles, at least in some situations, but this requires you to either convert between the two representations or implement quite a lot of new methods.
The easiest way forward, i suspect, is to identify which of the types Particles/Distribution have more special methods implemented for it, and let this type be the outermost. I have a feeling this leads to Particles{Distribution}. I do however believe that the representation as Distribution{Particles} is easier to reason about and sample from. Indeed, I can't even imagine how a sample would be drawn from Particles{Distribution}.

@baggepinnen
Copy link
Owner

I'm struggling with coming up with a good solution to this. I think the following does what it's supposed to, but is horribly inefficient

import MonteCarloMeasurements.particletype
struct ParticleDistribution{D}
    d::D
end

function ParticleDistribution(d::Type{<:Distribution}, p...)
    @unsafe ParticleDistribution(d(p...))
end
particletype(pd::ParticleDistribution) = particletype(getfield(pd.d,1))


function Base.rand(d::ParticleDistribution{D}) where D
    T,N = particletype(d)
    i = rand(1:N)
    d2 = MonteCarloMeasurements.replace_particles(d.d,P->P isa AbstractParticles, P->P[i])
    rand(d2)
end

pd = ParticleDistribution(Bernoulli, Particles(1000,Beta(2,3)))
@btime rand($pd) # 822 ns
@btime rand(Bernoulli(0.3)) # 4 ns

@cscherrer
Copy link
Author

It seems that relative to rand(Bernoulli(0.3)), rand($pd) does 1000x the work in 200x the time, is that right? I'm missing how this shows inefficiency. But it's early here, so maybe I just need coffee :)

@baggepinnen
Copy link
Owner

It doesn't really do 1000x the work, it just draws one random number, selects that distribution and then draws a random number from that, so two random numbers in total. The inefficiency comes from replace_particles which does a lot of runtime code introspection, which is a problem I have been struggling to get around. One potential solution is to use the method linked in #21

@cscherrer
Copy link
Author

Are the isa calls in replace_particles all compiled away? If not, would it be faster to turn each of these into a different method?

Or if generated functions would help, I've had good luck with GeneralizedGenerated.jl, which gives a lot more flexibility than built-in @generated

@baggepinnen
Copy link
Owner

baggepinnen commented Jan 27, 2020

Some progress

using MonteCarloMeasurements, Distributions
import MonteCarloMeasurements.particletypetuple
struct ParticleDistribution{D,P}
    d::D
    constructor::P
end

function ParticleDistribution(constructor::Type{<:Distribution}, p...)
    @unsafe ParticleDistribution(constructor(p...), constructor)
end
particletypetuple(pd::ParticleDistribution) = particletypetuple(getfield(pd.d,1))
particletypetuple(::Type{D}) where D <: ParticleDistribution = particletypetuple(getfield(pd.d,1))

@generated function Base.rand(d::ParticleDistribution{D}) where D
    nfields = fieldcount(D)
    indtuple = ()
    N = 0
    for field in fieldnames(D)
        FT = fieldtype(D, field)
        if FT <: AbstractParticles
            N = particletypetuple(FT)[2]
            indtuple = (indtuple..., :(d.d.$field[i]))
        else
            indtuple = (indtuple..., :(d.d.$field))
        end
    end
    tupleex = Expr(:tuple, indtuple...)
    quote
        i = rand(1:$N)
        d2 = d.constructor($(tupleex)...)
        # d2 = newstruct(d.constructor{typeof.($(tupleex))...}, $(tupleex)...) # This can be used to bypas a missing inner constructor, is a bit slower though
        rand(d2)
    end
end

pd = ParticleDistribution(Bernoulli, Particles(1000,Beta(2,3)))
@btime rand($pd) # 194.651 ns (2 allocations: 32 bytes)
@btime rand(Bernoulli(0.3)) # 10.050 ns (0 allocations: 0 bytes)

pd = ParticleDistribution(Normal, Particles(1000,Normal(10,3)), Particles(1000,Normal(2,0.1)))
@btime rand($pd) # 149.837 ns (4 allocations: 80 bytes)
@btime rand(Normal(10,2)) # 12.788 ns (0 allocations: 0 bytes)

Some assumptions made:

@baggepinnen
Copy link
Owner

baggepinnen commented Feb 1, 2020

I have been thinking more about this and might have changed my mind; Particles are a subtype of Real, and contains a collection of Real samples such that each sample on its own is useful and can be propagated through the same functions as Particles can be applied to. If a hierarchical distribution was represented by a collection of individual distributions, we would have the same situation, i.e., Particles{Distribution} <: Distribution. Each sample would itself be a distribution, just like a sample of Particles{Real} is a Real. This would make sampling a random number from the distribution very easy, just sample an index and then sample from that distribution. I am not sure how the calculation of logpdf would be, just average all sample logpdfs?

Where the reverse representation, i.e., Distribution{Particles} comes in handy is during construction and display, it is very convenient to construct a hierarchical distribution by thinking about the outer distribution's parameters in terms of particles. This can be facilitated with special constructors though, and is decoupled from the internal representation.

Is there any situation in which computation is made easier by the representation as Distribution{Particles} like the example in my last post?

@baggepinnen
Copy link
Owner

Note to self: It's quite possible that StructArrays.jl can significantly simplify how we deal with the Particles{Distribution} / Distribution{Particles} dilemma.

@baggepinnen
Copy link
Owner

baggepinnen commented Apr 13, 2020

This is starting to become useful now. With the code below, particle distributions are represented as a list of regular distributions, but are constructed and printed as if they were Distributions{Particles}. Drawing random numbers from such a distribution is very fast, it costs roughly the same as drawing one integer and one number from the underlying distribution.

This gets around the problem with distributions performing arg checks inside their constructors since the constructor is called with normal scalars.

@cscherrer How would you envision calculating the logpdf for such a distribution? I am not sure which one of

logpdf(pd,x) = mean(logpdf(d,x) for d in pd.d)
pdf(pd,x)    = mean(pdf(d,x) for d in pd.d)

is correct.

using MonteCarloMeasurements, Distributions
import MonteCarloMeasurements: nparticles, indexof_particles

struct ParticleDistribution{D,P}
    d::D
    constructor::P
end

function ParticleDistribution(constructor::Type{<:Distribution}, p...)
    dists = [constructor(getindex.(p, i)...) for i in 1:nparticles(p[1])]
    ParticleDistribution(dists, constructor)
end

function Base.rand(d::ParticleDistribution{D}) where D
    ind = rand(1:length(d.d))
    rand(d.d[ind])
end

function Base.show(io::IO, d::ParticleDistribution)
    T = eltype(d.d)
    fields = map(fieldnames(T)) do fn
        getfield.(d.d, fn)
    end
    println(io, "Particle", T, "(")
    for (i,fn) in enumerate(fieldnames(T))
        println(io, string(fn), ": ", Particles(fields[i]))
    end
    println(io, ")")
end

pd = ParticleDistribution(Bernoulli, Particles(1000,Beta(2,3)))
@btime rand($pd) #  23.304 ns (0 allocations: 0 bytes)
@btime rand(Bernoulli(0.3)) # 10.050 ns (0 allocations: 0 bytes)

pd = ParticleDistribution(Normal, Particles(1000,Normal(10,3)), Particles(1000,Normal(2,0.1)))
@btime rand($pd) #  27.726 ns (0 allocations: 0 bytes)
@btime rand(Normal(10,2)) # 12.788 ns (0 allocations: 0 bytes)

julia> pd
ParticleNormal{Float64}(
μ: 10.0 ± 3.0
σ: 2.0 ± 0.1
)

@cscherrer
Copy link
Author

This looks great! I wonder, how far can "particle semantics" be pushed? For the last pd in your example, could rand(pd) return a Particles by drawing one sample from each ParticleDistribution?

Pretend ParticleDistribution had a .particles field (guess getproperty could even make this a reality). Then something like

rand(pd) = Particles(rand.(pd.particles))
logpdf(pd, x) = Particles(logpdf.(pd.particles, x))

There may be a more efficient implementation, but would this work? At what point do the semantics break down?

@baggepinnen
Copy link
Owner

I'm sure it can be made to work, but I'm struggling to understand what it would represent.

  • If a distribution with particle fields represent a hierarchical distribution, I would still expect a scalar random number to be drawn from it. If a random draw from the distribution is itself a distribution (of particles), I'm not sure how one would plug it into code that expects a regular distribution. The same for logpdf, how would one use this particle representation of the logpdf?
  • If it instead represents a distribution of distributions, it makes sense to return either a random draw from all inner distributions, or a random Distribution from the collection.
  • Maybe a hierarchical distribution and a distribution of distributions are the same thing and I'm just not used to think about this :P

I am very open to be convinced of its utility, it feels like hitherto untrodden land :)

@cscherrer
Copy link
Author

In a typical MCM use case, say you have some function f(x::Real) that returns another Real. If you give that function a Particles, you might expect it to return another Particles, which you can use to propagate uncertainty.

Now what if f looks like, say

function f(x)
    return rand(Normal(x,1))
end

This fits into the same framework, the only difference is that now there's uncertainty other than from the particles themselves.

@cscherrer
Copy link
Author

This comes up all the time in probabilistic programming. We have uncertainty on our parameter estimates (because they're samples from the posterior distribution), which we propagate through the observation model for forecasting or posterior predictive checks.

@baggepinnen
Copy link
Owner

I buy that! This branch now follows the semantics you outlined, i.e.

rand(pd) = Particles(rand.(pd.particles))
logpdf(pd, x) = Particles(logpdf.(pd.particles, x))

It appears to be reasonably efficient to use the naive (but type stable) way of drawing random numbers.

julia> pd = ParticleDistribution(Normal, 1±0.1, 1±0.1)
ParticleNormal{Float64}(
 μ: 1.0 ± 0.1
 σ: 1.0 ± 0.1
)

julia> rand(pd)
Part10000(1.012 ± 1.01)

julia> @btime rand($pd)
  116.768 μs (3 allocations: 78.22 KiB)
Part10000(0.996 ± 0.997)

julia> @btime 1 .+ 0.1 .* randn(10000);
  89.188 μs (4 allocations: 156.41 KiB)

@cscherrer
Copy link
Author

For some cases there are some nice optimizations, e.g.

rand(ParticleDistribution(Normal, m, s)) == m + s * Particles(Normal())

(assuming I got that right)

@baggepinnen
Copy link
Owner

With the current representation we have an array of structs.

m + s * Particles(Normal())

implies the struct of arrays representation.
Struct of arrays is often (but not always) faster, but also harder to use. With the struct of arrays (Distribution{Particles}), we are back at the problems with Distribution constructors etc. and that special methods would have to be implemented for each subtype of Distribution, alternatively use the slower fallback outlined in #22 (comment).

One could also store both representations inside a ParticleDistribution and use whichever is most efficient for a particular task, at the expense of a 2x memory overhead.

The naive sampling outlined in my previous post is actually nearly just as fast as the struct-of-arrays version, Julia must do a very good job optimizing the broadcasting of rand in Particles(rand.(rng, d.d))

julia> @btime rand($pd);
  119.941 μs (3 allocations: 78.22 KiB)

julia> @btime $m + $s*Particles{Float64, 10000}(randn(10000));
  101.810 μs (9 allocations: 234.66 KiB)

@cscherrer
Copy link
Author

That's really close! I'd guess the latter might be a little more stable, since it takes advantage of stratified sampling. But I'm not sure about that.

You mentioned https://github.com/JuliaArrays/StructArrays.jl, did that seem to fit here?

@baggepinnen
Copy link
Owner

It might be a bit more stable thanks to the systematic sampling, but that is significantly slower

julia> @btime $m + $s*Particles{Float64, 10000}(systematic_sample(10000, Normal()));
  535.076 μs (13 allocations: 391.06 KiB)

mostly due to quantile being slower than randn

julia> @btime quantile(Normal(), 0.6);
  18.838 ns (0 allocations: 0 bytes)

julia> 0.018*10000
180.0

but also since the particles after systematic sampling have to be permuted.

@cscherrer
Copy link
Author

Oh interesting, I thought Particles() used systematic sampling by default. Think I'm misunderstanding something

@baggepinnen
Copy link
Owner

No you are correct, it's just in my previous benchmark I bypassed that by calling randn manually, so I made the systematic sampling explicit so it would be clearer what's going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants