Incorporate categorical labels in plots/summary #1182

Closed
oscarbranson opened this issue May 12, 2020 · 6 comments

@oscarbranson

oscarbranson commented May 12, 2020

I'm using arviz in conjunction with pymc3 to fit a model with categorical variables. When using arviz to view or plot variables, the names associated with the categories are replaced by their integer codes, which makes the output harder to interpret. It would be nice if there were a way to restore meaningful category names, perhaps during creation of the InferenceData object?

For example, plot_forest:

import arviz as az
import pandas as pd
import pymc3 as pm

data_path = "https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/NWOGrants.csv"
nwo = pd.read_csv(data_path, delimiter=';')

gender = pd.Categorical(nwo['gender'])  # 0 is female, 1 is male

with pm.Model() as m_1:
    a = pm.Normal('a', 0, 1.5, shape=gender.categories.size)
    
    p = pm.invlogit(a[gender.codes])
    # Category labels are removed by the use of `.codes`, which provides integer labels.
    # This is necessary because theano throws a RecursionError if the categories are not integers.
    # This is the information that it would be nice to re-incorporate in arviz. 
    
    award = pm.Binomial('award', nwo.applications, p, observed=nwo.awards)
    
    trace_1 = pm.sample()

az.plot_forest(trace_1, combined=True)

[Image: az.plot_forest output for trace_1]

It would be nice if there were a way to re-incorporate the category labels lost through having to provide the categories as integers (gender.codes).

@OriolAbril
Member

Hi, you can use the dims and coords arguments when creating the inference data object. There are some examples in the cookbook, one of them using from_pymc3. In your case, I think it will look similar to the following:

...
    idata = az.from_pymc3(trace_1, dims={"a": ["gender"]}, coords={"gender": nwo['gender']})

az.plot_forest(idata, combined=True)

This will show the labels as in the plot_forest API examples. It should work with all plots, but not with az.summary; that feature is still pending, see #1091.
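For reference, here is a minimal sketch of the same dims/coords idea that runs without sampling. It uses az.from_dict instead of from_pymc3, with random numbers standing in for the posterior of `a`. The labels "f" and "m" and the array shape are assumptions made for illustration, and the keyword-style from_dict call assumes an ArviZ release that supports it (the 0.x series current at the time of this thread).

```python
import numpy as np
import arviz as az

# Fake posterior draws standing in for `a`: shape (chains, draws, n_categories).
rng = np.random.default_rng(0)
draws = rng.normal(size=(2, 100, 2))

# Hypothetical category labels, one per category (not one per data row).
labels = ["f", "m"]

# dims names the extra dimension of `a`; coords attaches labels to that dimension.
idata = az.from_dict(
    posterior={"a": draws},
    dims={"a": ["gender"]},
    coords={"gender": labels},
)

# The third dimension of `a` is now named "gender" and can be selected by label.
print(idata.posterior["a"].dims)
print(idata.posterior["a"].sel(gender="m").shape)
```

Passing an object built this way to az.plot_forest should label the rows by category rather than by integer index.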

@oscarbranson
Author

Thanks @OriolAbril - all working now, though in my example it was coords={"gender": gender.categories} because it expects an iterable with len(it) = n_categories.
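The difference between the full column and the categories can be seen with a toy example. This is only a sketch: the six values below are made up and stand in for nwo['gender'].

```python
import pandas as pd

# Toy column standing in for nwo['gender'] (values are made up).
column = pd.Series(["m", "f", "m", "f", "m", "f"])
gender = pd.Categorical(column)

# One integer code per data row: this is what indexes `a` in the model.
codes = gender.codes.tolist()

# One label per category: this is the length that `coords` expects.
labels = gender.categories.tolist()

print(len(column), len(labels))

# Each code maps back to its label by position in `categories`.
recovered = [labels[c] for c in codes]
print(recovered)
```

The column has one entry per row, while `categories` has one entry per distinct value, which matches the size of `a` (shape=gender.categories.size).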

My issue here was finding out how to do this. As far as I can tell this functionality only appears in that one cookbook example because most examples are based on an arviz dataset where all the labels are already correct. It would be great if there were some info on this in the InferenceData docs.

@OriolAbril
Member

We would like to have a notebook for every converter function to cover these aspects, however this will probably take some time. In the meantime, how does adding a link to the cookbook in the API section of each converter sound? Also in InferenceData docs.

I would not add it directly to InferenceData docs, because it is not recommended to create InferenceData objects directly using the InferenceData class. Quickstart and cookbook notebooks will generally be a better reference.

And thanks for opening the issue, we really value feedback on documentation.

@oscarbranson
Author

The notebook sounds like a good idea, but I appreciate this is a lot of time and work. I agree that this doesn't really belong in the InferenceData docs, and think that adding a link to the cookbook in the API section (I assume to the from_pymc3 doc) would be a very useful stopgap.

Thanks for taking the comments on board! All my interactions with you guys have been 💯!

@OriolAbril
Member

I have been making some tweaks to the docs: from_pymc3 will have a link to the cookbook, and the cookbook will have a table of contents to ease navigation. Links point to the updated docs (hosted in my fork until merged in #1184).

@oscarbranson
Author

Great, thanks @OriolAbril. I think this can probably be closed now?
