Pairwise versions for rolling_cov, ewmcov and expanding_cov #4950

Merged
merged 5 commits into from
Mar 28, 2014
68 changes: 56 additions & 12 deletions doc/source/computation.rst
@@ -59,6 +59,19 @@ The ``Series`` object has a method ``cov`` to compute covariance between series
 Analogously, ``DataFrame`` has a method ``cov`` to compute pairwise covariances
 among the series in the DataFrame, also excluding NA/null values.

+.. _computation.covariance.caveats:

+.. note::

+    Assuming the missing data are missing at random, this results in an
+    unbiased estimate of the covariance matrix. However, for many applications
+    this estimate may not be acceptable because the estimated covariance matrix
+    is not guaranteed to be positive semi-definite. This could lead to
+    estimated correlations having absolute values which are greater than one,
+    and/or a non-invertible covariance matrix. See `Estimation of covariance
+    matrices <http://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices>`_
+    for more details.
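
A small contrived example may make the caveat in this note concrete. The sketch below is not part of the change itself: it assumes a recent pandas imported as ``pd``, and the data are made up so that each pair of columns is observed together on a different block of rows. Under those assumptions the pairwise-complete covariance matrix has a negative eigenvalue, and the correlation implied by its entries exceeds one:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: every pair of columns is observed together on a
# different block of three rows.
frame = pd.DataFrame({
    "a": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "b": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "c": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

# Each entry uses only the rows where both columns are present.
cov = frame.cov()

# A positive semi-definite matrix has no negative eigenvalues; this one does.
min_eigenvalue = np.linalg.eigvalsh(cov.to_numpy()).min()

# Correlation implied by the covariance entries for columns a and b.
implied_corr_ab = cov.loc["a", "b"] / np.sqrt(cov.loc["a", "a"] * cov.loc["b", "b"])

print(min_eigenvalue < 0, implied_corr_ab > 1)
```

The variances on the diagonal use six observations per column while each off-diagonal entry uses only three, which is what lets the entries become mutually inconsistent.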

 .. ipython:: python

    frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
@@ -99,6 +112,12 @@ correlation methods are provided:

 All of these are currently computed using pairwise complete observations.

+.. note::

+    Please see the :ref:`caveats <computation.covariance.caveats>` associated
+    with this method of calculating correlation matrices in the
+    :ref:`covariance section <computation.covariance>`.
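
To illustrate what "pairwise complete observations" means in practice, here is a minimal sketch, assuming a recent pandas imported as ``pd`` and randomly generated data with arbitrarily placed missing values. It checks that one entry of ``DataFrame.corr`` matches the correlation computed over only the rows where both columns of that pair are present:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((200, 3)), columns=["a", "b", "c"])

# Knock out different rows in each column (an arbitrary pattern).
frame.iloc[::5, 0] = np.nan
frame.iloc[::7, 1] = np.nan
frame.iloc[::3, 2] = np.nan

# Entry (a, b) of the pairwise-complete correlation matrix ...
pairwise_ab = frame.corr().loc["a", "b"]

# ... matches the correlation over rows where both a and b are present,
# whether or not c is missing on those rows.
both_present = frame[["a", "b"]].dropna()
manual_ab = both_present["a"].corr(both_present["b"])

# Dropping every row with any missing value keeps fewer rows for this pair.
listwise_rows = len(frame.dropna())

print(np.isclose(pairwise_ab, manual_ab), listwise_rows < len(both_present))
```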

 .. ipython:: python

    frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
@@ -325,11 +344,14 @@ Binary rolling moments
 two ``Series`` or any combination of ``DataFrame/Series`` or
 ``DataFrame/DataFrame``. Here is the behavior in each case:

-- two ``Series``: compute the statistic for the pairing
+- two ``Series``: compute the statistic for the pairing.
 - ``DataFrame/Series``: compute the statistics for each column of the DataFrame
-  with the passed Series, thus returning a DataFrame
-- ``DataFrame/DataFrame``: compute statistic for matching column names,
-  returning a DataFrame
+  with the passed Series, thus returning a DataFrame.
+- ``DataFrame/DataFrame``: by default compute the statistic for matching column
+  names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
+  passed, the statistic is computed for each pair of columns, returning a
+  ``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
+  <stats.moments.corr_pairwise>`).
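
The cases listed above can be sketched as follows. This is an illustration under stated assumptions rather than the API this change documents: it uses the method-based ``.rolling()`` interface of newer pandas releases instead of ``rolling_cov``, and in those releases the ``pairwise=True`` result is a ``DataFrame`` with a (date, column) ``MultiIndex`` rather than a ``Panel``. The data, window length and lagged frame are made up for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=100, freq="D")
df = pd.DataFrame(rng.standard_normal((100, 4)), index=index, columns=list("ABCD"))
window = 20

# Series/Series: a single Series of rolling covariances.
series_pair = df["A"].rolling(window).cov(df["B"])

# DataFrame/Series: one column of results per DataFrame column.
frame_series = df.rolling(window).cov(df["B"])

# DataFrame/DataFrame, default: only matching column names are paired.
left = df[["A", "B"]]
right = df[["A", "B"]].shift(1)  # lagged copy, purely for illustration
matched = left.rolling(window).cov(right)

# DataFrame/DataFrame with pairwise=True: every column of one frame is
# paired with every column of the other, one matrix per date.
every_pair = left.rolling(window).cov(right, pairwise=True)

print(series_pair.shape, frame_series.shape, matched.shape, every_pair.shape)
```

The matching-columns result has one row per date, while the pairwise result has one row per (date, column) combination, so it holds a full matrix for every date.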

For example:

@@ -340,20 +362,42 @@ For example:

 .. _stats.moments.corr_pairwise:

-Computing rolling pairwise correlations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Computing rolling pairwise covariances and correlations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In financial data analysis and other fields it's common to compute correlation
-matrices for a collection of time series. More difficult is to compute a
-moving-window correlation matrix. This can be done using the
-``rolling_corr_pairwise`` function, which yields a ``Panel`` whose ``items``
-are the dates in question:
+In financial data analysis and other fields it's common to compute covariance
+and correlation matrices for a collection of time series. Often one is also
+interested in moving-window covariance and correlation matrices. This can be
+done by passing the ``pairwise`` keyword argument, which in the case of
+``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
+question. In the case of a single DataFrame argument, the ``pairwise`` argument
+can even be omitted:

+.. note::

+    Missing values are ignored and each entry is computed using the pairwise
+    complete observations. Please see the :ref:`covariance section
+    <computation.covariance>` for :ref:`caveats
+    <computation.covariance.caveats>` associated with this method of
+    calculating covariance and correlation matrices.

 .. ipython:: python

-   correls = rolling_corr_pairwise(df, 50)
+   covs = rolling_cov(df[['B','C','D']], df[['A','B','C']], 50, pairwise=True)
+   covs[df.index[-50]]

+.. ipython:: python

+   correls = rolling_corr(df, 50)
    correls[df.index[-50]]
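
As a rough check of what each per-date item holds, the sketch below compares the matrix for one date with the ordinary correlation matrix of the window ending on that date. It assumes the method-based ``.rolling()`` interface of newer pandas releases, where the result is a ``DataFrame`` with a (date, column) ``MultiIndex`` instead of a ``Panel``, and it builds its own made-up ``df`` rather than reusing the one from this page:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=200, freq="D")
df = pd.DataFrame(rng.standard_normal((200, 4)), index=index, columns=list("ABCD"))
window = 50

# One correlation matrix per date, stacked into a (date, column) MultiIndex.
correls = df.rolling(window).corr()

# The matrix for a single date ...
date = df.index[-50]
rolling_matrix = correls.loc[date]

# ... agrees with the plain correlation matrix of the window ending on it.
window_matrix = df.loc[:date].tail(window).corr()

print(np.allclose(rolling_matrix.to_numpy(), window_matrix.to_numpy()))
```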

+.. note::

+    Prior to version 0.14 this was available through ``rolling_corr_pairwise``,
+    which is now deprecated and is simply syntactic sugar for calling
+    ``rolling_corr(..., pairwise=True)``. It is likely to be removed in a
+    future release.

 You can efficiently retrieve the time series of correlations between two
 columns using ``ix`` indexing:

13 changes: 13 additions & 0 deletions doc/source/v0.14.0.txt
@@ -183,6 +183,19 @@ These are out-of-bounds selections

 Because of the default `align` value changes, coordinates of bar plots are now located on integer values (0.0, 1.0, 2.0 ...). This is intended to place bar plots on the same coordinates as line plots. However, bar plots may differ unexpectedly when you manually adjust the bar location or drawing area, such as using `set_xlim`, `set_ylim`, etc. In these cases, please modify your script to match the new coordinates.

+- A ``pairwise`` keyword was added to the statistical moment functions
+  ``rolling_cov``, ``rolling_corr``, ``ewmcov``, ``ewmcorr``,
+  ``expanding_cov``, ``expanding_corr`` to allow the calculation of moving
+  window covariance and correlation matrices (:issue:`4950`). See
+  :ref:`Computing rolling pairwise covariances and correlations
+  <stats.moments.corr_pairwise>` in the docs.

+  .. ipython:: python

+     df = DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
+     covs = rolling_cov(df[['A','B','C']], df[['B','C','D']], 5, pairwise=True)
+     covs[df.index[-1]]


 MultiIndexing Using Slicers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~