Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unitful support? #1413

Open
NAThompson opened this issue Oct 30, 2021 · 16 comments
Open

Unitful support? #1413

NAThompson opened this issue Oct 30, 2021 · 16 comments

Comments

@NAThompson
Copy link

NAThompson commented Oct 30, 2021

Using Unitful v1.9.1 and Distributions v0.25.23, it appears Unitful numbers are not supported:

julia> using Unitful
julia> using Distributions
julia> μ = 1.0u"s"
julia> σ = 0.01u"s"
julia> Normal(μ, σ)
ERROR: MethodError: no method matching Normal(::Quantity{Float64, 𝐓, Unitful.FreeUnits{(s,), 𝐓, nothing}}, ::Quantity{Float64, 𝐓, Unitful.FreeUnits{(s,), 𝐓, nothing}})

Could this be supported without undue pain?

@tfiers
Copy link

tfiers commented Jan 17, 2022

I am wondering the same.

Replacing some Reals by Numbers, and 0s by zero(x)s, already makes a surprising number of things work:

> using Distributions
> using Unitful: mV, Hz, s

> n = Normal(0mV, 1mV)
Normal{Unitful.Quantity{…}}=0 mV, σ=1 mV)

> pdf(n, 3mV)
0.0044318484119380075 mV^-1

> λ = 4Hz;
> e = Exponential(1/λ)
Exponential{Unitful.Quantity{Float64, 𝐓, Unitful.FreeUnits{(Hz^-1,), 𝐓, nothing}}}  (θ=0.25 Hz^-1)

> rand(e) |> s
0.6119753615525725 s

How fun!

Of course more patches are needed.
And some support from Unitful's side (randn(::AbstractRNG, ::Type{Quantity{T,D,U}) eg).
And some distributions will need an extra "scale" or "units" parameter: μ and σ of LogNormal's default parametrisation are unitless, and do not give enough information to implement cdf and rand etc with units.

@devmotion
Copy link
Member

Just replacing Real with Number is not sufficient for Normal as the example above shows - pdf, logpdf, cdf, logcdf, etc. should all return unitless numbers. Thus unfortunately I assume one would have to modify each distribution very carefully and hence probably it's easier to change each one separately.

@tfiers
Copy link

tfiers commented Jan 17, 2022

pdf does have units, namely the reciprocal of the independent variable's.
(As the pdf is a density, we have eg 0.24 probability mass per 1 mV, at some voltage).

You can see this by doing dimensional analysis on pdf expressions.
Eg for Exponential: pdf(x) = 1/θ exp(–x/θ). The scale parameter θ has the same units as x, so the pdf has one over these units.
Or for Normal: pdf(x) =

where both μ and σ are in data units.

cdf is unitless (it's a probability, and also the integral of pdf(x) * dx, where dx has data units; another way to see what the units of pdf should be).
logcdf is then of course also unitless.
logpdf is unitless as well (as is every logarithm), but a conversion factor is needed (to get eg log(0.24/mV * mV)).

Samples (rand) and quantiles of course have units, as do various summary statistics.

Indeed many patches besides Real -> Number are needed. (Although this simple change already goes a long way, thanks to Julia's generic algorithm culture. But Distributions still has a bunch of non generic code, like 0 instead of zero, or eltype(::Sampleable) being hardcoded as Int/Float64).
And of course an extensive test suite is a must.
I also agree that each distribution needs to be manually examined.

@devmotion
Copy link
Member

pdf does have units, namely the reciprocal of the independent variable's.

Not in general but only for distributions without atoms - ie. eg. not for DiscreteDistribution, censored distributions (open PR), or the Normal(0, 0) limit case (which is supported).

Thus it seems supporting the limit case Normal(0, 0) or censored distributions created from ContinuousDistributions would lead to type instabilities or inconsistencies.

(Clearly, logpdf is always unitless but one has to deal with a possibly arbitrary unit constant.)

@sethaxen
Copy link
Contributor

It's unfortunate we use pdf to refer to the PDF and the PMF. For a discrete distribution, it's returning probabilities so it should be unit less, but I think for mixtures of continuous and discrete distributions, where it's a density function, the PMF of the atomic component should be implicitly multiplied by a Dirac delta function so that expectations could be taken via integrals. And by it's definition of integrating to 1, Dirac delta has units of the inverse of the variable.

@devmotion
Copy link
Member

Another - but probably non-standard - approach would be to choose different base measures (they're all implicit anyway currently) that ensure that pdf is always unitless. That shifts the unit constant choice from logpdf to pdf and would work with atoms and Diracs without unit. This might seem a bit strange but in the case of Normal it would be similar to a Lebesgue measure that is scaled by 1/sqrt(2pi*sigma^2) as base measure.

@devmotion
Copy link
Member

It's unfortunate we use pdf to refer to the PDF and the PMF. For a discrete distribution, it's returning probabilities so it should be unit less

It's a density in the discrete case as well, just with respect to the counting measure. So the name pdf makes sense. Nevertheless it would be helpful for eg. Truncated and Censored if one could query the (possibly zero) probability of a specific value.

@KronosTheLate
Copy link

+1 that unitful support would be cool ^_^

@tfiers
Copy link

tfiers commented Oct 20, 2022

I gave this a go a while back: master...tfiers:Distributions.jl:master

I implemented unit support for Exponential and LogNormal, and wrote tests and documentation.

Changes done to make this work:

  • Replace many ::Real types by ::RealQuantity (besides in lognormal.jl and exponential.jl, also in univariates.jl and genericrand.jl).
    That type is a union of Real and Unitful.Quantity{<:Real}

    • To be less verbose, I suppose the ::Real typings could be dropped altogether.
  • In exponential.jl and lognormal.jl, go through each function (zval, entropy, logpdf, ..) and add / unit(d), * unit(d), * unit(d)^2..., in appropriate places. Also replace some one(x)s by oneunit(x), and some 0s by zero(x).


  • I added units.jl, which does nothing more than define the RealQuantity alias,
    and the unit(d) mentioned above: Unitful.unit(::D) where {D <: Distribution} = unit(eltype(D)).
    (Note that oneunit(x) is provided by Base Julia).

    • The unit(d) = unit(eltype(D)) definition works nicely for Exponential, but not for LogNormal: in the (μ,σ) parametrization, both parameters are in the unitless logarithmic domain. So I added a type parameter to store the unit: struct LogNormal{T<:Real,U<:Units}. It can be supplied in a constructor, and is NoUnits by default.

I made sure that there was no change in current usage of Distributions.jl.
Most unit operations are done in the type-domain, so the runtime impact should be minimal or zero (though I have not benchmarked that).

I started a docs page with a list of "Supported distributions" (just the two), and a guide for how to add unit support to a new distribution.

@tfiers
Copy link

tfiers commented Oct 20, 2022

supporting the limit case Normal(0, 0)

pdf(Normal(0mV, σ)) for σ → 0mV should be ∞ / mV no? I.e. same units (mV⁻¹) as for other values, so type stable.

@tfiers
Copy link

tfiers commented Nov 2, 2022

using Unitful currently takes 1 to 3 seconds (1.0 seconds on replit, 1.6 to 2.6 seconds on my laptop).

Therefore, to minimize the impact of unit support in Distributions.jl on the 99% of users that don't need it, there first needs to be created a lightweight package defining just the interface/API for working with units (UnitfulCore or UnitsInterface or sth like that). StaticArrays with StaticArraysCore is an example of this principle.

Instead of importing the full Unitful package, Distributions will then instead import just the interface package, with minimal impact on load time.

@tfiers
Copy link

tfiers commented Jan 23, 2023

An update, from the discussion at

with Julia 1.9's 'extension packages' (https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)), a unitful core or interface package is no longer needed, and the full Unitful would be added as a [weakdeps], and there would presumably be a UnitfulExt module and entry under [extensions] in Distributions.jl's Project.toml.

Though, given that is 1.9+ only, I'm not sure how this interacts with Distributions.jl's current support for Julias down to 1.0


UPDATE: See this helpful summary by mbauman and chrisr that anwers these questions: https://discourse.julialang.org/t/package-extensions-for-julia-1-9/93397/5?u=tfiers

@juliohm
Copy link
Contributor

juliohm commented Jan 21, 2025

Is there a consensus on how to address this issue?

I am seeing open PRs for Exponential, but that is it.

What would be the minimum amount of work that one would need to put in a PR to get this feature gradually added? Julia has package extensions nowadays, so the Unitful.jl load time is no longer an issue.

@devmotion
Copy link
Member

No, there isn't. IMO the problem is more a conceptual one and the fact that it would be require changes everywhere since otherwise you end up in a messy half-baked situation as has been the case for multiple years after moving from hardcoded Float64 to Real. There are also more pressing items that also involve changes throughout Distributions, in particular since it is just a convenient feature but not something that's absolutely necessary - you can just perform calculations after stripping units.

@juliohm
Copy link
Contributor

juliohm commented Jan 21, 2025

require changes everywhere since otherwise you end up in a messy half-baked situation

Isn't a half-baked situation better than never-baked situation in this case? What is the problem with releasing patches whenever a new distribution is generalized to Number? Are you concerned that some distributions may not be well-defined with units at all?

@devmotion
Copy link
Member

Maybe some distributions might be problematic. The more common problem though, I assume, is that many parameters that currently have the same type T <: Real would need different type parameters, which 1) would require to change type parameters, dispatches etc. and 2) would most likely introduce quite a few type stability problems.

I've experienced this Float64->Real transition, and there are still problems to fix even though that happened a long time ago. Since IMO it's really just a convenience thing, I'm strongly against a half-baked solution. There's just too many other things we could and should spend time on rather than cleaning up support for unitful numbers IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants