analysis.stats.pearsonr and masked arrays #1534

rcomer · 2015-01-22T13:12:05Z

Suppose you have two matching sets of data, but with missing values in different places:

import iris
import numpy as np
import numpy.ma as npma

# Two masks that hide a different element in each dataset
mask1 = np.zeros(7, dtype=bool)
mask2 = np.zeros(7, dtype=bool)
mask1[2] = True
mask2[4] = True

# Identical data (0..6) in both cubes; only the masks differ
cube1 = iris.cube.Cube(
    npma.MaskedArray(np.arange(7), mask=mask1),
    dim_coords_and_dims=[(iris.coords.DimCoord(np.arange(7), long_name='blah'), 0)])

cube2 = iris.cube.Cube(
    npma.MaskedArray(np.arange(7), mask=mask2),
    dim_coords_and_dims=[(iris.coords.DimCoord(np.arange(7), long_name='blah'), 0)])

The correlation should be 1, but you get different values depending on which function you use:

import iris.analysis.stats as istats
import scipy.stats.mstats as spsm

print(istats.pearsonr(cube1, cube2).data)
print(spsm.pearsonr(cube1.data, cube2.data))
print(npma.corrcoef(cube1.data, cube2.data))

The npma function gives 1.0, but the iris and scipy functions both give 0.963... The scipy function already has a lot of discussion over here: scipy/scipy#3645.
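For illustration, here is a minimal NumPy-only sketch of one way a value near 0.963 can arise from this data: each array's mean and variance are taken over its own unmasked points, while the cross term only sees points that are unmasked in both. The helper name `pearson_separate_masks` is hypothetical, and this is an assumption about the kind of calculation involved, not a copy of the iris or scipy implementation.

```python
import numpy as np
import numpy.ma as npma

# Same data as in the report: 0..6, with a different element masked in each array.
x = npma.masked_array(np.arange(7.0), mask=[0, 0, 1, 0, 0, 0, 0])
y = npma.masked_array(np.arange(7.0), mask=[0, 0, 0, 0, 1, 0, 0])


def pearson_separate_masks(a, b):
    """Hypothetical helper: Pearson's r where each array keeps its own mask."""
    # Each mean ignores only that array's own masked points.
    da = a - a.mean()
    db = b - b.mean()
    # The product is masked wherever either input is masked,
    # so the cross term uses fewer points than the two variance terms.
    numerator = (da * db).sum()
    denominator = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float(numerator / denominator)


r_separate = pearson_separate_masks(x, y)
print(r_separate)
```

The mismatch between the five points in the numerator and the six points in each variance term is what pulls the result below 1.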

esc24 · 2015-01-22T14:49:00Z

@niallrobinson added the pearsonr functionality to Iris. I'd be interested in his thoughts.

niallrobinson · 2015-01-26T14:21:56Z

It's not something that I remember being aware of. My initial thought is that your statement "The correlation should be 1" isn't as obvious as it first sounds. Pearson's r depends on the length of the arrays, which is ambiguous in this case.

That said, the expected behaviour is probably what you describe, and what they landed on in the scipy discussion, i.e. your effective datasets are arrays A and B, both masked with maskA OR maskB. I'll make a PR.
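A rough sketch of the "maskA OR maskB" idea, using plain NumPy masked arrays rather than cubes. The variable names are made up for the example, and this is not the code from the eventual PR.

```python
import numpy as np
import numpy.ma as npma

# Arrays A and B from the report: identical values, different masks.
a = npma.masked_array(np.arange(7.0), mask=[0, 0, 1, 0, 0, 0, 0])
b = npma.masked_array(np.arange(7.0), mask=[0, 0, 0, 0, 1, 0, 0])

# A point is dropped from both arrays if it is masked in either one.
common_mask = npma.getmaskarray(a) | npma.getmaskarray(b)
a_common = npma.masked_array(npma.getdata(a), mask=common_mask)
b_common = npma.masked_array(npma.getdata(b), mask=common_mask)

# With the same five points left in both arrays, an ordinary
# correlation on the unmasked values is well defined.
r_common = np.corrcoef(a_common.compressed(), b_common.compressed())[0, 1]
print(r_common)
```

Both arrays now have the same effective length, which removes the ambiguity about which length Pearson's r should use.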

rcomer · 2015-09-29T15:20:24Z

Now that #1748 is merged I guess we can close this?

ajdawson · 2015-09-29T15:22:55Z

Agreed.

niallrobinson mentioned this issue Jan 28, 2015

Changed Pearson r to handle masked data in a more reasonable way #1540

Closed

rcomer mentioned this issue Aug 3, 2015

pearsonr apply common mask #1748

Merged

ajdawson closed this as completed Sep 29, 2015
