Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add givens argument to DensityDist #4433

Closed
wants to merge 4 commits into from

Conversation

OriolAbril
Copy link
Member

@OriolAbril OriolAbril commented Jan 23, 2021

This closes #4002 and should finish with all the DensityDist issues and complains when trying to
pass FreeRVs to the observed kwarg. This was originally possible and at some point it broke due
to ArviZ trying to store the variables passed as observed in the observed_data group. This PR
implements @rpgoldman idea of "splitting" the original unintuitive observed argument into
observed (which must be data) and givens which can be FreeRVs to maintain all the initial
flexibility of DensityDist.

The current proposal keeps the data attribute unchanged and adds a givens attribute which ArviZ
can use to filter all the FreeRVs in data and avoid crashing.

Note: To actually work, this PR also needs arviz-devs/arviz#1520

Checklist

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 23, 2021

One question. I usually use DensityDist as follows:

def mydist(var1, var1):
  self.var1 = var1
  self.var2 = var2

  def logp(value):
    return value * self.var1 + self.var2**2

with pm.Model() as m:
  norm = pm.Normal("norm", 0, 1)
  halfnorm = pm.HalfNormal("halfnorm", 1)
  like = pm.DensityDist("like", mydist(norm, halfnorm), observed=data)

Is there a problem with this approach? It looks much more in line with how normal distributions are used. If it works, why not just force this way of using DensityDist (by checking observed is just data as you are doing but without the new argument)?

Edit: Oh, I see this is designed to address the use of non theano functions.

@OriolAbril
Copy link
Member Author

There already are checks in place (that I kept) to make sure that the variable passed as observed is not a FreeRV nor an ObservedRV, however, these checks are only triggered when a single variable is passed. In the case of multiple observed variables, when a dict is used as observed no check is triggered, a MultiObservedRV is created straight away and it works, and it has worked since the dawn of time it looks like (until a couple ArviZ releases back).

We considered forcing this way of using DensityDist but there was no consensus and eventuallly we compromised on this observed/givens split to try and get the best of both worlds: extreme flexibility as well as clarity and predictability for ArviZ. There are many people, devs and users alike, who use DensityDist without passing data as observed. In addition to the issue above which refers to the blackbox likelihood example in pymc docs, there are also issues about this

For better or for worse, using DensityDist like this is a feature whose usage is encouraged by the official documentation.

@ricardoV94
Copy link
Member

Thanks for the detailed answer. I would think a single standard would be better, but I understand there is some considerable baggage to this issue.

Should we then have a brief notebook that shows the different compatible ways of using the DensityDist?

I worry that for novices facing issues (me often included), having different ways of instantiating it makes it more difficult to pinpoint where the problems are coming from (e.g. is it a syntax issue or a model/statistical one?).

@ricardoV94
Copy link
Member

Maybe add one example with the givens in the docstrings?

@OriolAbril
Copy link
Member Author

Should we then have a brief notebook that shows the different compatible ways of using the DensityDist?

Yes, that is the plan, especially since part of the original issue was with documentation using model parameters as observed. I have started with pymc-devs/pymc-examples#28 and I am planning to also extend the examples in the docstring or add a DensityDist guide as an extra example.

I worry that for novices facing issues (me often included), having different ways of instantiating it makes it more difficult to pinpoint where the problems are coming from (e.g. is it a syntax issue or a model/statistical one?).

I completely understand that, it happens to me too, and especially if shapes are involved I think it happens to everyone. I have tried to be as explicit and vigilant as possible to catch issues early on and raise informative error messages, maybe there are even a couple extra ones that can be added.

@OriolAbril OriolAbril changed the title Add givens argument to DensityDist [WIP] Add givens argument to DensityDist Jan 23, 2021
@ricardoV94
Copy link
Member

TIL: givens is the plural of given. I was going to ask why not just call it given.

Comment on lines +436 to +437
givens : dict, optional
Model variables on which the DensityDist is conditioned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not include this in the arguments to __init__?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that givens (or given if it's renamed) will never get to be passed to __init__ because it's popped from kwargs in __new__, but truth be told, even after getting this to work I am not yet completely sure I understand new vs init. I have added it to the docstring anyways because of its special use tied to DensityDist and MultiObservedRV to make sure it's documented somewhere people searching for it can find it.

@OriolAbril
Copy link
Member Author

TIL: givens is the plural of given. I was going to ask why not just call it given.

We can call it given, I have no preference whatsoever

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 24, 2021

We can call it given, I have no preference whatsoever

I am not a native English speaker, so that's probably why I was surprised. Just to be clear, I am not defending that it be (or not) changed. I just wrote it to save googling time if someone else happened to be as surprised as I was.

@OriolAbril
Copy link
Member Author

I guess now it may be better to wait until after #4440, not sure about the timeline/versions

@michaelosthege
Copy link
Member

I guess now it may be better to wait until after #4440, not sure about the timeline/versions

This will take a while. I think we will soon make a patch release 3.11.1 to fix some things that just came up after 3.11.0 was released. If you can get this PR to the finish line, consider adding it to the milestone.

@OriolAbril
Copy link
Member Author

What is the released date we are aiming for with 3.11.1 roughly?

@michaelosthege
Copy link
Member

What is the released date we are aiming for with 3.11.1 roughly?

We could get it done this week if we want to. The two PRs in the milestone are at the finish line and the issue is just optional.

@michaelosthege
Copy link
Member

This PR touches APIs that are heavily refactored with v4. @brandonwillard how do you think we should proceed with this PR ?

@brandonwillard
Copy link
Contributor

how do you think we should proceed with this PR ?

That's a tough question, because—right now—v4 removes the DensityDist idea entirely. As I understand it, a DensityDist would simply be a user-defined RandomVariable Op with or without an Op.perform (similar to a DensityDist with or without a random implementation). We could reproduce/emulate the DensityDist feature by providing a factory that creates a RandomVariable and registers a logpt dispatch function for said RandomVariable using the user-provided log-likelihood function. This seems to be what DensityDist does—more or less.

Regardless, we can merge this and I can simply rebase the v4 branch; that won't be any trouble

@OriolAbril
Copy link
Member Author

I don't really care if this gets merged or not (personally I would have already removed DensityDist), but I think that having this factory that you mention would be a great idea, people seem to love the DensityDist idea and are quite attached to it 🤷 even in cases when using a pm.Potential would be basically the same or even easier.

Therefore, I think the best approach would be to at some point (not sure when) add a FutureWarning to DensityDist with some guidance on how to update the code, either mentioning this factory or a page about "update your densitydists"

@michaelosthege
Copy link
Member

I don't really care if this gets merged or not (personally I would have already removed DensityDist), but I think that having this factory that you mention would be a great idea, people seem to love the DensityDist idea and are quite attached to it 🤷 even in cases when using a pm.Potential would be basically the same or even easier.

Therefore, I think the best approach would be to at some point (not sure when) add a FutureWarning to DensityDist with some guidance on how to update the code, either mentioning this factory or a page about "update your densitydists"

It "too late" for a future warning. Instead it sounds like the DensityDist will be either rewritten for 4.0 with a similar API, removed entirely or removed and re-introduced in >4.0.

@OriolAbril
Copy link
Member Author

sounds good, I will close this PR then and the related ones

@OriolAbril OriolAbril closed this Feb 27, 2021
@jduerholt
Copy link

Hi,

I am new to pyMC3 and need to use a black box likelihood model as described here https://docs.pymc.io/notebooks/blackbox_external_likelihood.html. I stumbled above the arviz error message also described here https://stackoverflow.com/questions/62800160/theano-error-when-using-pymc3-theano-gof-fg-missinginputerror.

Following this conversation, I am not sure if it is again possible to use an external likelihood or not?

Best,

Johannes

@michaelosthege
Copy link
Member

@jduerholt should be - the backbox likelihood examples are getting updated in this PR: pymc-devs/pymc-examples#28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

blackbox_external_likelihood notebook is broken
7 participants