
About log likelihood of data point (logpX) #89

Open
Meng-Wei opened this issue Sep 20, 2019 · 2 comments
Comments

@Meng-Wei

Hi there! I am trying to calculate the log-likelihood of individual data points.

To do so, I directly modified the provided loss function:
[Screenshot: modified loss function for computing logpX]
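In case the screenshot is hard to read, this is roughly what I did (a minimal sketch, assuming the repo's bits_x loss is the negative log-likelihood in bits per dimension; the function name logpX is my own):

```python
import numpy as np

def logpX(bits_x, image_shape=(32, 32, 3)):
    """Convert the per-dimension bits_x loss back to log p(x) in nats.

    Assumes bits_x is the negative log-likelihood in bits per dimension
    (the quantity the training loss averages). For CIFAR-10 the input
    has 32 * 32 * 3 = 3072 dimensions.
    """
    num_dims = np.prod(image_shape)
    return -bits_x * num_dims * np.log(2.0)
```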

Then I tested it on the CIFAR-10 dataset and got the following:
[Figure: log-likelihood results on CIFAR-10 test images]

This result is similar to the one in the paper "Do Deep Generative Models Know What They Don't Know?" (https://arxiv.org/abs/1810.09136), so I assumed the "logpX" function is correct.

The problem: when I sample from the model trained on CIFAR-10, the sampled images have overall higher log-likelihoods than the real images (the std values below are in units of 0.1, i.e. 9 means 0.9; sorry for the ambiguity):

[Figures: log-likelihood histograms for samples drawn at several std settings]

I am not sure why this happens. Is this a bug, or is it expected behavior? Thank you in advance!

@Meng-Wei (Author)

Moreover, is "logpX" - "logpZ" = "logpDet"?
And should "logpDet" be the same (or almost the same) for different data points?
Thank you!
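To spell out the notation: for a flow z = f(x), the change-of-variables identity is

$$\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f}{\partial x} \right|,$$

so I am asking whether "logpDet" is exactly this log-determinant term.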

@ikrets commented Oct 19, 2019

I am not an author of Glow, but my understanding is that you are sampling with temperature, i.e. using the density p(x)^(1/T^2) instead of p(x). The std you mention is presumably this T parameter. The effect of sampling with temperature T < 1 is that higher-likelihood samples are favored. I think you will get the expected histogram with std = 1.
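A minimal sketch of what sampling with temperature means in the latent space (my reading of the idea, not the repo's exact API; sample_z is a hypothetical helper):

```python
import numpy as np

def sample_z(shape, temperature=1.0, rng=None):
    """Draw a Glow-style latent with temperature: z ~ N(0, T^2 I).

    Scaling the prior's std by T is equivalent (up to normalization)
    to sampling from p(z)^(1/T^2). For T < 1 the samples concentrate
    in the high-density region, so the decoded images score a higher
    logpX than typical real images.
    """
    rng = np.random.default_rng() if rng is None else rng
    return temperature * rng.standard_normal(shape)
```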

As for the second question, the formula is true, and logpDet shouldn't be equal for different data points. From an exercise with flows on 2D data, I know it varied by a large margin across the dataset I had. I don't know whether the same holds for high-dimensional datasets, but my guess would be yes.
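For intuition, here is a toy RealNVP-style affine coupling on 2D inputs (the scale function is a hypothetical stand-in for a learned network); its log-determinant visibly depends on the data point:

```python
import numpy as np

def coupling_forward(x):
    """Toy affine coupling: y1 = x1, y2 = x2 * exp(s(x1)).

    The Jacobian is triangular, so log|det J| = s(x1),
    which changes from one data point to the next.
    """
    s = np.tanh(x[..., 0])  # stand-in for a learned scale net
    y = np.stack([x[..., 0], x[..., 1] * np.exp(s)], axis=-1)
    return y, s  # s is log|det J| per example

x = np.array([[0.1, -0.5], [2.0, 1.0]])
_, logdet = coupling_forward(x)
print(logdet)  # two clearly different values
```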
