
Calculating FID #5

Open
mmuckley opened this issue Jan 26, 2023 · 3 comments

Comments

@mmuckley

Hello, thanks for publishing this paper and repo.

I am curious about reproducing the results in the paper. I applied the Gaussian blur model to the first 1,000 images of FFHQ-256 as per Issue #4, but when using torch-fidelity I don't reproduce the FID numbers. If I include torch-fidelity's image resizing, I get 29.3. If I don't include image resizing, I get 37.0. Both of these are pretty far away from the paper value of 44.05.

Could you provide some more details on how to reproduce the numbers of Table 1?
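
For reference, a minimal sketch of how FID between two image folders can be computed with torch-fidelity (the folder names are placeholders, and any resizing beyond torch-fidelity's defaults is not shown):

```python
import torch_fidelity

# Minimal sketch: FID between a folder of reconstructions and a folder of
# reference FFHQ-256 images. Folder names below are placeholders, not the
# actual setup used for the numbers above.
metrics = torch_fidelity.calculate_metrics(
    input1='recon_ffhq256',  # reconstructed images saved as image files
    input2='ffhq256_ref',    # reference images
    cuda=True,
    fid=True,
)
print(metrics['frechet_inception_distance'])
```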

@z-fabian

z-fabian commented Feb 9, 2023

This might not apply to your case, but one discrepancy we found is that the authors normalize images to the (0, 1) range using each image's own min and max before saving, instead of clipping them to (-1, 1) and then mapping to (0, 1) by adding 1 and dividing by 2.

This can be a problem if the reconstructions have outliers that are not clipped, since they skew the whole range. In fact, the same normalization is also applied to the labels (the ground-truth images) after they are loaded and before they are saved, so the saved labels are not necessarily equal to the loaded ones.
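
For illustration, a minimal sketch of the two normalizations (function names are mine, not from the repo):

```python
import torch

def normalize_clip(x):
    # Clip to [-1, 1], then map to [0, 1] by adding 1 and dividing by 2.
    return (x.clamp(-1.0, 1.0) + 1.0) / 2.0

def normalize_minmax(x):
    # Rescale by the image's own min and max; a single outlier pixel
    # skews the range of the whole image.
    return (x - x.min()) / (x.max() - x.min())

# A reconstruction nominally in [-1, 1] with one outlier pixel:
x = torch.linspace(-1.0, 1.0, 16)
x[0] = -3.0
print(normalize_clip(x))    # in-range pixels unaffected by the outlier
print(normalize_minmax(x))  # in-range pixels compressed toward 0.5
```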

@neginraoof

Hey @mmuckley,
I'm trying to reproduce the Table 1 FID scores, and I'm unable to match the FFHQ random-inpainting results. I'm wondering if I'm missing some preprocessing steps. I'm using the FFHQ-256 set from https://www.kaggle.com/datasets/xhlulu/flickrfaceshq-dataset-nvidia-resized-256px.

@z-fabian

z-fabian commented May 9, 2023

FID can also differ depending on whether the reconstructions are compared only to the validation set or to the training and validation sets combined. FID is typically much worse when computed against fewer reference samples.
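
For example, a sketch of computing both numbers with torch-fidelity (directory names are placeholders):

```python
import torch_fidelity

# Sketch: FID of the same reconstructions against two different reference
# sets. Directory names are placeholders.
fid_val = torch_fidelity.calculate_metrics(
    input1='recon_ffhq256', input2='ffhq256_val', cuda=True, fid=True,
)['frechet_inception_distance']

fid_train_val = torch_fidelity.calculate_metrics(
    input1='recon_ffhq256', input2='ffhq256_train_val', cuda=True, fid=True,
)['frechet_inception_distance']

# The smaller reference set typically gives a higher (worse) FID.
print(fid_val, fid_train_val)
```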
