coords for pairs of variables. #1627

pscicluna · 2021-03-22T00:50:37Z

Short Description

HI! I'm trying to understand how I can make the coords keyword of many arviz plotting routines to behave in a certain way. I hope explain the problem well enough, I'll gladly try to clarify if not.

To outline my problem, I'm inferring the contents of a correlation matrix with PyMC3. Now, because the matrix is symmetrical, I only want to use the upper (or lower, if it makes a difference) off-diagonal elements, that is to use elements [0,1], [1,2], [0,2] if it's a 3-dimensional problem.

However, plot_pair seems to interpret coords as a "slice" along a dimension, such that if I set coords to [0,1] on dimension one of the correlation matrix and [1,2] on dimension two, I will also get the plots for [1,1]. Is there any way that I can leave this one out automatically, or do I have to manually iterate over a set of axes to get this?

Code Example or link

Here's a short example, you can find a more complete version of the code for context in this repo.

import numpy as np
import pymc3 as pm
import arviz as az


mu = np.array([10., 30.])
sigma = np.array([20., 40.])
rho = -0.7
cov = np.zeros((len(mu), len(mu)))
cov[0,:] = [sigma[0]**2, rho*sigma[0]*sigma[1]]
cov[1,:] = [rho*sigma[1]*sigma[0], sigma[1]**2]

data = np.random.multivariate_normal(mu, cov, size=100)
ndim = data.shape[1]

with pm.Model() as model:
            #we put weakly informative priors on the means and standard deviations of the multivariate normal distribution
            mu = pm.Normal("mu", mu=mu_prior[0], sigma=mu_prior[1], shape=ndim)
            sigma = pm.HalfCauchy.dist(sigma_prior)
            #and a prior on the covariance matrix which weakly penalises strong correlations
            chol, corr, stds = pm.LKJCholeskyCov("chol", n=ndim, eta=2.0, sd_dist=sigma, compute_corr=True)
            #the prior gives us the Cholesky Decomposition of the covariance matrix, so for completeness we can calculate that determinisitically
            cov = pm.Deterministic("cov", chol.dot(chol.T))

            #and now we can put our observed values into a multivariate normal to complete the model
            vals = pm.MvNormal('vals', mu=mu, chol=chol, observed=data)

trace = pm.sample(
                steps, tune=tune, target_accept=0.9, compute_convergence_checks=False,return_inferencedata=True
            )

coords = {"chol_corr_dim_0":[0], "chol_corr_dim_1":[1]} #For 2-D data I can do this
                        #But it's not clear how I can just get the upper triangle in 3 or more dimensions

plot_vars = ['mu', 'chol_corr']
az.plot_pair(trace,
                     var_names = plot_vars,
                     coords = coords,
                     kind="kde",
                     marginals=True,
                     point_estimate="mean",
                     show=True,
            )

Thanks in advance for any input you can give me!

The text was updated successfully, but these errors were encountered:

OriolAbril · 2021-03-22T08:01:04Z

Pointwise or non-slice selection with xarray is something that always feels very complicated to me, with very obscure docs. You should be able to perform pointwise selection using DataArrays whose dimension is not present in the xarray object you want to index. Here is an example.

Our dataset has 4 dimensions, and we want to perform pointwise selection with values over 2 of them (I think it should work with as many dims as desired though).

import xarray as xr
import arviz as az

ds = az.dict_to_dataset(
    {"s": np.random.normal(size=(4, 100, 3, 2))}, 
    dims={"s": ["time", "sensor"]}, 
    coords={"sensor": ["A", "B"]}
)

Output:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 100, pointwise_sel: 3)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
    time     (pointwise_sel) int64 0 2 2
    sensor   (pointwise_sel) <U1 'A' 'A' 'B'
Dimensions without coordinates: pointwise_sel
Data variables:
    s        (chain, draw, pointwise_sel) float64 2.155 -0.3476 ... 0.305

We can now select a subset with:

coords = {
     'sensor': xr.DataArray(['A', 'A', 'B'], dims=['pointwise_sel']),
     'time': xr.DataArray([0, 2, 2], dims=['pointwise_sel'])
 }
ds.sel(coords)

whose output is:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 100, pointwise_sel: 3)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
    time     (pointwise_sel) int64 0 2 2
    sensor   (pointwise_sel) <U1 'A' 'A' 'B'
Dimensions without coordinates: pointwise_sel
Data variables:
    s        (chain, draw, pointwise_sel) float64 2.155 -0.3476 ... 0.305

ArviZ uses .sel under the hood, so using this code below will work

az.plot_posterior(ds, coords=coords)

There is one important caveat though, which is that now the original coordinate values that we used for indexing have been converted to coordinated without dimension, and they are not shown in the plot. As a comparison:

az.plot_posterior(ds)

With the latest (yet to be released at the time of writing) updates to ArviZ, you could define custom labellers to take care of that. See the label guide for info on labellers and installation guide for info on installing the development version.

And thanks for reporting! I think I'll use this as the example for the last section of the label guide 😄

The emerging idea/pattern could be:

subset_ds = ds.sel(coords)
# some extra work to get the "proper" labels
labeller = # define labeller with the processed ds
az.plot_xyz(subset_ds, labeller=labeller)

pscicluna · 2021-03-22T08:35:03Z

Thanks, that's awesome! I'll give that a try

Any idea if it's possible to hack some sort of custom labelling in v0.11.0? I'm stuck with that to make arviz play nice with PyMC3...

OriolAbril · 2021-03-22T08:43:32Z

I don't think it's possible :/.

What may be easier is to get latest ArviZ to play nice with old PyMC3 if you are interested in submitting a PR for that. I think that the only breaking incompatibility is that we removed geweke from arviz/stats/diagnostics.py, it had been basically broken for a while, it only worked on 1d arrays which little to no arviz data are, nobody complained about this which for us meant that nobody was using it and we broke no tests removing it (we only test on latest release and dev pymc3 versions).

If you are willing to take this on and locally test on your older pymc3 versions, submitting a PR to add geweke back with no code in the function, only raising a not implemented error should fix the compatibility issue and allow to work with latest arviz as long as geweke is not used.

pscicluna · 2021-03-24T06:52:29Z

Ah, okay, thanks for explaining that. If I get the chance, I will attempt the PR, but seeing as this is quite a new package I might as well also force an update to the latest versions of pymc and arviz once the custom labellers are available and I have figured out how to use them :)

Thanks for all the help!

OriolAbril · 2021-03-26T05:15:11Z

I have added a PR to extend the label guide and show how to keep the right labels #1635

OriolAbril · 2022-11-23T08:54:52Z

I'm trying to do some clean up, so I'll close this as it seems resolved, feel free to reopen if it is not the case.

OriolAbril mentioned this issue Mar 24, 2021

Multilevel modeling pymc-devs/pymc-examples#47

Closed

OriolAbril changed the title ~~plot_pair: coords for pairs of variables.~~ coords for pairs of variables. Mar 26, 2021

This was referenced Mar 29, 2021

LKJ pymc-devs/pymc-examples#53

Closed

Add a coords/dims guide #1645

Open

review-notebook-app bot mentioned this issue Nov 9, 2021

update multilevel modeling notebook pymc-devs/pymc-examples#246

Merged

OriolAbril closed this as completed Nov 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

coords for pairs of variables. #1627

coords for pairs of variables. #1627

pscicluna commented Mar 22, 2021 •

edited

Loading

OriolAbril commented Mar 22, 2021

pscicluna commented Mar 22, 2021

OriolAbril commented Mar 22, 2021

pscicluna commented Mar 24, 2021

OriolAbril commented Mar 26, 2021

OriolAbril commented Nov 23, 2022

coords for pairs of variables. #1627

coords for pairs of variables. #1627

Comments

pscicluna commented Mar 22, 2021 • edited Loading

Short Description

Code Example or link

OriolAbril commented Mar 22, 2021

pscicluna commented Mar 22, 2021

OriolAbril commented Mar 22, 2021

pscicluna commented Mar 24, 2021

OriolAbril commented Mar 26, 2021

OriolAbril commented Nov 23, 2022

pscicluna commented Mar 22, 2021 •

edited

Loading