Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type of return value of density, cdf and quantile is inconsistent #99

Open
venpopov opened this issue Apr 1, 2024 · 1 comment
Open
Milestone

Comments

@venpopov
Copy link
Contributor

venpopov commented Apr 1, 2024

While doing some testing I noticed that density, cdf and quantile sometimes return vectors, sometimes lists:

library(distributional)

density(dist_normal(), c(0,1))                     # returns list
#> [[1]]
#> [1] 0.3989423 0.2419707
density(c(dist_normal(),dist_normal()), c(0,1))    # returns list
#> [[1]]
#> [1] 0.3989423 0.2419707
#> 
#> [[2]]
#> [1] 0.3989423 0.2419707
density(dist_normal(), 0)                          # returns vector
#> [1] 0.3989423
density(c(dist_normal(),dist_normal()), 0)         # returns vector
#> [1] 0.3989423 0.3989423


cdf(dist_normal(), c(0,1))                         # returns list
#> [[1]]
#> [1] 0.5000000 0.8413447
cdf(c(dist_normal(),dist_normal()), c(0,1))        # returns list
#> [[1]]
#> [1] 0.5000000 0.8413447
#> 
#> [[2]]
#> [1] 0.5000000 0.8413447
cdf(dist_normal(), 0)                              # returns vector
#> [1] 0.5
cdf(c(dist_normal(),dist_normal()), 0)             # returns vector
#> [1] 0.5 0.5


quantile(dist_normal(), c(0,1))                    # returns list
#> [[1]]
#> [1] -Inf  Inf
quantile(c(dist_normal(),dist_normal()), c(0,1))   # returns list
#> [[1]]
#> [1] -Inf  Inf
#> 
#> [[2]]
#> [1] -Inf  Inf
quantile(dist_normal(), 0)                         # returns vector
#> [1] -Inf
quantile(c(dist_normal(),dist_normal()), 0)        # returns vector
#> [1] -Inf -Inf


# all return lists
generate(dist_normal(), 2)                         
#> [[1]]
#> [1] 1.7134674 0.4435964
generate(c(dist_normal(),dist_normal()), 2)
#> [[1]]
#> [1] -0.8288996  1.3283409
#> 
#> [[2]]
#> [1]  0.3532802 -0.7265644
generate(c(dist_normal(),dist_normal()), c(5,10))
#> [[1]]
#> [1] -0.24069096 -0.69155357  0.75666937  1.41401325 -0.07069785
#> 
#> [[2]]
#>  [1] -0.5421478 -1.2236593 -0.4537853 -0.3443427  0.2597258 -1.8010175
#>  [7]  0.8553006 -0.9100003 -0.3662697 -0.1896853

Created on 2024-04-01 with reprex v2.1.0

And the expected output type is not documented for these functions. I think it would be good to have consistent output type. For example, I was surprised that density(dist_norma(), 0) returns a scalar, but that density(dist_normal(), c(0,1)) returns a list, while with two distributions and single point we get a vector.

One possibility is to always return a list, where each element corresponds to one distribution, like it is now for some of the outputs. Although I believe this would be desirable, it is a breaking change that might affect other packages. In any case, the return value type should be documented.

Additionally, it could also be nice to have an option simplify2array as an argument.

@mitchelloharawild mitchelloharawild added this to the v1.0.0 milestone Apr 19, 2024
@mitchelloharawild
Copy link
Owner

The documentation should definitely be improved to explain how the distributions and arguments are vectorised. Essentially the operation's argument (e.g. density(at=)) is vectorised across the distributions and the result is simplified if possible. While I agree that returning lists with one element is more consistent, it would be less efficient and less usable - it could be a non-default option, like drop = FALSE.

The issue with simplify2array() is that the vectorisation needs to work with multivariate distributions which already produces arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants