Proposal for variance and centering functions #137

Closed
jvdp1 opened this issue Feb 2, 2020 · 0 comments
Labels: "implementation" (Implementation in experimental and submission of a PR), "topic: mathematics" (linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...)

Comments

@jvdp1
Member

jvdp1 commented Feb 2, 2020

Based on discussions in #113, #3, #128, I would like to propose the following addition to stdlib_experimental_stats:

var - variance of array elements

Description

Returns the variance of all the elements of array, or of the elements of array along dimension dim if provided. If mask is provided, only the elements for which the corresponding element of mask is true are taken into account.

The variance is defined as the best unbiased estimator and is computed as:

 var(array) = 1/(n-1) sum_i (array(i) - mean(array))^2

where n is the number of elements taken into account.
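
For reference, here is the same definition in LaTeX notation, followed by a worked case for the six values 1, 2, ..., 6 used in the example further down. The symbols x_i, n and \bar{x} are shorthand introduced here for array(i), the number of elements, and mean(array); they are not part of the proposal.

```latex
\[
\mathrm{var}(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]
% Worked case: x = (1, 2, 3, 4, 5, 6), n = 6
\[
\bar{x} = \frac{21}{6} = 3.5,
\qquad
\sum_{i=1}^{6} (x_i - \bar{x})^2
  = 6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25 = 17.5,
\qquad
\mathrm{var}(x) = \frac{17.5}{5} = 3.5
\]
```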

Syntax

result = var(array [, mask])

result = var(array, dim [, mask])

Arguments

array: Shall be an array of type integer or real.

dim: Shall be a scalar of type integer with a value in the range from 1 to n, where n is the rank of array.

mask (optional): Shall be of type logical and either be a scalar or an array of the same shape as array.

Return value

If array is of type real, the result is of the same type as array.
If array is of type integer, the result is of type double precision.

If dim is absent, a scalar with the variance of all elements in array is returned. Otherwise, an array of rank n-1, where n equals the rank of array, and a shape similar to that of array with dimension dim dropped is returned.

If mask is specified, the result is the variance of all elements of array corresponding to true elements of mask. If every element of mask is false, the result is IEEE NaN.

Example

program demo_var
    use stdlib_experimental_stats, only: var
    implicit none
    real :: x(1:6) = [ 1., 2., 3., 4., 5., 6. ]
    print *, var(x)                            !returns __TOBECOMPLETED__
    print *, var( reshape(x, [ 2, 3 ] ))       !returns __TOBECOMPLETED__
    print *, var( reshape(x, [ 2, 3 ] ), 1)    !returns [__TOBECOMPLETED__]
    print *, var( reshape(x, [ 2, 3 ] ), 1,&
                  reshape(x, [ 2, 3 ] ) > 3.)  !returns [__TOBECOMPLETED__]
end program demo_var
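
The placeholder values in the example can be worked out without the proposed function. The following is a minimal sketch, not the stdlib implementation: var_1d, var_sketch and demo_var_sketch are hypothetical names, the helper handles only rank-1 real arrays, and the dim = 1 cases are obtained by looping over columns. Returning NaN when no element is selected follows the "Return value" section above; it relies on the compiler supporting the intrinsic ieee_arithmetic module.

```fortran
module var_sketch
    use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
    implicit none
    private
    public :: var_1d
contains
    ! Two-pass sample variance of a rank-1 real array.
    ! Hypothetical helper for illustration only.
    pure function var_1d(x, mask) result(res)
        real, intent(in) :: x(:)
        logical, intent(in), optional :: mask(:)
        real :: res
        logical :: m(size(x))
        real :: mean_x
        integer :: n

        ! Without a mask, every element is selected.
        m = .true.
        if (present(mask)) m = mask
        n = count(m)

        ! No selected element: return NaN, as stated in the proposal.
        if (n == 0) then
            res = ieee_value(res, ieee_quiet_nan)
            return
        end if

        ! Pass 1: mean of the selected elements.
        mean_x = sum(x, mask=m) / real(n)
        ! Pass 2: sum of squared deviations, normalized by n - 1.
        ! For n == 1 this evaluates 0/0 at run time.
        res = sum((x - mean_x)**2, mask=m) / real(n - 1)
    end function var_1d
end module var_sketch

program demo_var_sketch
    use var_sketch, only: var_1d
    implicit none
    real :: x(6) = [1., 2., 3., 4., 5., 6.]
    real :: y(2, 3)
    integer :: j

    y = reshape(x, [2, 3])
    print *, var_1d(x)                                  ! all elements
    print *, var_1d(reshape(y, [size(y)]))              ! all elements of the 2x3 array
    print *, (var_1d(y(:, j)), j = 1, 3)                ! along dim = 1
    print *, (var_1d(y(:, j), y(:, j) > 3.), j = 1, 3)  ! along dim = 1, masked
end program demo_var_sketch
```

Working through the definition by hand, the first two calls give 3.5 and the unmasked dim = 1 call gives 0.5 for each of the three columns. In the masked call the last column (5, 6) gives 0.5, the first column has no selected element, and the second column has a single selected element, for which the formula evaluates 0/0.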

To be discussed (not exhaustive):

  • Based on the discussions in Style guide #3, I suggest first implementing a two-pass algorithm. Other algorithms can be implemented later, as proposed in Trade-off between efficiency and robustness/accuracy #134. Supporting dim and mask in the API means the function will not be as simple as the one in the #3 comment.

  • Centering an array along a dimension (e.g., x(:, i) - mean(x, 2)) will most likely require a loop. To keep the implementation of var clean, I propose adding a function center that performs the different kinds of centering of an array x; var would call it for the centering. However, I am concerned about the efficiency of this approach, especially memory usage, since center could need an additional temporary array.

  • The proposed name for the variance function is var. But what about variance (or other propositions)?
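
To make the first two points concrete, here is a rough sketch of a centering function and a variance function that calls it, restricted to rank-2 real arrays without mask. The names center_2d, var_2d, center_sketch and demo_center_sketch are hypothetical and the interface is an assumption, not the proposed API. It uses spread instead of an explicit loop, which illustrates the memory concern: the centered array is a full-size temporary.

```fortran
module center_sketch
    implicit none
    private
    public :: center_2d, var_2d
contains
    ! Subtract from each element the mean taken along dimension dim (1 or 2).
    ! Hypothetical helper; name and interface are illustrative only.
    pure function center_2d(x, dim) result(res)
        real, intent(in) :: x(:, :)
        integer, intent(in) :: dim
        real :: res(size(x, 1), size(x, 2))

        ! sum(x, dim) / n is the rank-1 array of means along dim.
        ! spread() replicates it back to the shape of x, which avoids
        ! an explicit loop but creates a temporary array.
        res = x - spread(sum(x, dim) / real(size(x, dim)), dim, size(x, dim))
    end function center_2d

    ! Two-pass variance along dim: center first, then sum the squares.
    pure function var_2d(x, dim) result(res)
        real, intent(in) :: x(:, :)
        integer, intent(in) :: dim
        real :: res(size(x, 3 - dim))   ! extent of the dimension that is kept

        res = sum(center_2d(x, dim)**2, dim) / real(size(x, dim) - 1)
    end function var_2d
end module center_sketch

program demo_center_sketch
    use center_sketch, only: center_2d, var_2d
    implicit none
    real :: y(2, 3)

    y = reshape([1., 2., 3., 4., 5., 6.], [2, 3])
    print *, center_2d(y, 1)   ! each column centered on its own mean
    print *, var_2d(y, 1)      ! variance of each column
    print *, var_2d(y, 2)      ! variance of each row
end program demo_center_sketch
```

A loop-based version that accumulates the squared deviations column by column would avoid the full-size temporary, at the cost of a less compact implementation.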

Equivalent functions in other languages:
Octave var
R var
Julia var
Numpy var

Requesting feedback from (at least) @certik @milancurcic @ivan-pi @aradi @leonfoks

jvdp1 added the labels "topic: mathematics" (linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...) and "idea" (Proposition of an idea and opening an issue to discuss it) on Feb 2, 2020
jvdp1 added the label "implementation" (Implementation in experimental and submission of a PR) and removed the label "idea" on Feb 18, 2020
jvdp1 closed this as completed on Feb 18, 2020