Proposal for variance and centering functions #137
Labels: implementation (Implementation in experimental and submission of a PR); topic: mathematics (linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...)
Based on discussions in #113, #3, #128, I would like to propose the following addition to `stdlib_experimental_stats`:

`var` - variance of array elements

Description

Returns the variance of all the elements of `array`, or of the elements of `array` along dimension `dim` if provided, and if the corresponding element in `mask` is `true`.

The variance is defined as the best unbiased estimator and is computed as:
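Assuming the estimator meant here is the usual Bessel-corrected sample variance, it can be written for the n selected elements x_1, ..., x_n (all of `array`, or those along `dim`, restricted by `mask`) as:

```latex
\operatorname{var}(x) = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```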
Syntax
result = var(array [, mask])
result = var(array, dim [, mask])
Arguments
`array`: Shall be an array of type `integer` or `real`.

`dim`: Shall be a scalar of type `integer` with a value in the range from 1 to n, where n is the rank of `array`.

`mask` (optional): Shall be of type `logical` and either be a scalar or an array of the same shape as `array`.

Return value
If `array` is of type `real`, the result is of the same type as `array`.

If `array` is of type `integer`, the result is of type `double precision`.

If `dim` is absent, a scalar with the variance of all elements in `array` is returned. Otherwise, an array of rank n-1, where n equals the rank of `array`, and a shape similar to that of `array` with dimension `dim` dropped is returned.

If `mask` is specified, the result is the variance of all elements of `array` corresponding to `true` elements of `mask`. If every element of `mask` is `false`, the result is IEEE `NaN`.

Example
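A minimal sketch of the semantics described above, restricted to rank-1 double precision arrays with an optional array-valued `mask`. The names `var_sketch_mod` and `var_sketch` are hypothetical and chosen for illustration only; this is not the proposed stdlib implementation, and it does not cover `dim`, scalar masks, or integer input.

```fortran
module var_sketch_mod
  ! Hypothetical illustration only; not the proposed stdlib implementation.
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Two-pass unbiased variance of a rank-1 array, with an optional mask.
  function var_sketch(array, mask) result(res)
    real(dp), intent(in) :: array(:)
    logical, intent(in), optional :: mask(:)
    real(dp) :: res
    logical :: use_elem(size(array))
    real(dp) :: mean
    integer :: n

    ! Without a mask, every element takes part.
    use_elem = .true.
    if (present(mask)) use_elem = mask
    n = count(use_elem)

    ! No element selected: the variance is undefined, so return NaN.
    if (n == 0) then
      res = ieee_value(res, ieee_quiet_nan)
      return
    end if

    ! First pass: mean of the selected elements.
    mean = sum(array, mask=use_elem) / n
    ! Second pass: sum of squared deviations, divided by n - 1.
    ! For n == 1 this evaluates 0/0, which is NaN under IEEE arithmetic.
    res = sum((array - mean)**2, mask=use_elem) / (n - 1)
  end function var_sketch
end module var_sketch_mod
```

With `x = [1, 2, 3, 4]`, `var_sketch(x)` gives 5/3, and `var_sketch(x, x > 2)` gives 0.5.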
To be discussed (not exhaustive):
Based on discussions in Style guide #3, I suggest first implementing a two-pass algorithm. Other algorithms can be implemented later, as proposed in Trade-off between efficiency and robustness/accuracy #134. Allowing `dim` and `mask` in the API will not lead to a function as simple as in #3 comment.

The centering of an array along a dimension (e.g., `x(:, i) - mean(x, 2)`) will most likely require a loop. To have a clean implementation of the function `var`, I propose to add a function `center` to perform the different centerings of an array `x`, and `var` would call it for the centering. However, I am concerned about efficiency with this proposal (especially memory usage, since an additional temporary array could be needed for the function `center`).

The proposed name for the variance function is `var`. But what about `variance` (or other suggestions)?

Others:
Octave var
R var
Julia var
Numpy var
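To make the centering question above concrete, here is a rough sketch for rank-2 double precision arrays without `mask`. The names `center_sketch_mod`, `center_sketch` and `var_dim_sketch` are hypothetical; the interface of the proposed `center` is not fixed in this issue, so the signature below is an assumption.

```fortran
module center_sketch_mod
  ! Hypothetical illustration; names and interfaces are assumptions.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Return x with the mean along dimension dim subtracted (rank 2 only).
  ! The function result is a full-size temporary, which is the memory
  ! cost raised in the discussion.
  function center_sketch(x, dim) result(xc)
    real(dp), intent(in) :: x(:, :)
    integer, intent(in) :: dim
    real(dp) :: xc(size(x, 1), size(x, 2))
    real(dp) :: mu(size(x, 3 - dim))
    integer :: i

    ! Means along dim: one per column (dim = 1) or one per row (dim = 2).
    mu = sum(x, dim) / size(x, dim)
    if (dim == 1) then
      do i = 1, size(x, 2)
        xc(:, i) = x(:, i) - mu(i)
      end do
    else
      do i = 1, size(x, 2)
        xc(:, i) = x(:, i) - mu
      end do
    end if
  end function center_sketch

  ! Two-pass variance along dim, built on center_sketch.
  function var_dim_sketch(x, dim) result(res)
    real(dp), intent(in) :: x(:, :)
    integer, intent(in) :: dim
    real(dp) :: res(size(x, 3 - dim))

    res = sum(center_sketch(x, dim)**2, dim) / (size(x, dim) - 1)
  end function var_dim_sketch
end module center_sketch_mod
```

Calling `center_sketch` keeps `var_dim_sketch` to a single line, at the price of one temporary the size of `x`. A variant that accumulates the squared deviations inside the loop would avoid that temporary but would duplicate the centering logic.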
Requesting feedback from (at least) @certik @milancurcic @ivan-pi @aradi @leonfoks