
Query: FFHQ 256x256 #19

Open

KomputerMaster64 opened this issue Sep 16, 2022 · 4 comments

Comments

@KomputerMaster64

Respected sir,
Thank you for sharing the implementation and weights for the DDGAN model. I am comparing DDGAN with other generative models for image generation, and I wanted to train the model on the FFHQ 256x256 dataset. To obtain the 256x256 version, one has to download the 1024x1024 version first (the dataset preparation method is given in the NVIDIA NVAE repository). However, I am facing an issue: the FFHQ 1024x1024 dataset is almost 90 GB in size, which exceeds the limits of my current resources.

I thought of downloading the resized FFHQ 256x256 version from Kaggle instead, but I am not sure the pre-processing scripts will work with it. I humbly request you to guide me.

P.S. I would be grateful if you could share a DDGAN model pre-trained on the FFHQ 256x256 dataset.
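For context on the preparation step mentioned above, the downsampling itself is simple; below is a minimal sketch (the file name and the LANCZOS resampling filter are assumptions, and the actual NVAE scripts may do this differently):

from PIL import Image

# Hypothetical example: downsample a single FFHQ 1024x1024 image to 256x256.
im = Image.open("00000.png")                  # one 1024x1024 FFHQ image (assumed name)
im = im.resize((256, 256), Image.LANCZOS)     # resampling filter is an assumption
im.save("00000_256.png")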

@KomputerMaster64
Author

I am trying to train the DDGAN model on the FFHQ 256x256 dataset. I have used the resized FFHQ 256x256 dataset from Kaggle, since the FFHQ 1024x1024 dataset is about 90 GB in size, which exceeds the limits of my resources.


The Kaggle dataset ships as archive.zip, which contains a directory "resized" holding the 70k .jpg files.


The file structure is as follows:
archive.zip
└── resized/
    └── (70k .jpg images)


I am using Google Drive and Colab notebooks for the implementation, with CODE_DIR = "/content/drive/MyDrive/Repositories/NVAE" and DATA_DIR = "/content/drive/MyDrive/Repositories/NVAE/dataset_nvae". When I run the command !python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train, I get the following error message:

Traceback (most recent call last):
  File "create_ffhq_lmdb.py", line 70, in <module>
    main(args.split, args.ffhq_img_path, args.ffhq_lmdb_path)
  File "create_ffhq_lmdb.py", line 46, in main
    im = Image.open(img_path)
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2843, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/55962.png'
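
The traceback suggests the script is looking for .png files while the Kaggle archive contains .jpg files. A quick check along these lines (a sketch; the directory path is the one used above) shows what the folder actually holds:

import os
from collections import Counter

# Count the files per extension in the extracted "resized" folder.
img_dir = "/content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized"
exts = Counter(os.path.splitext(f)[1].lower() for f in os.listdir(img_dir))
print(exts)                              # e.g. Counter({'.jpg': 70000}) if the archive is intact
print(sorted(os.listdir(img_dir))[:3])   # first few names, to confirm the %05d naming scheme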

@KomputerMaster64
Author

KomputerMaster64 commented Sep 16, 2022

I altered line 45 from img_path = os.path.join(ffhq_img_path, '%05d.png' % i) to img_path = os.path.join(ffhq_img_path, '%05d.jpg' % i), since the Kaggle FFHQ 256x256 resized dataset contains .jpg image files.
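
For reference, a sketch of what the change amounts to; the helper below is hypothetical (not part of the NVAE script) and simply tolerates either extension:

import os

def resolve_img_path(ffhq_img_path, i):
    """Hypothetical helper: return the path of image index i, accepting either
    the original .png naming or the Kaggle archive's .jpg naming."""
    for ext in ('.png', '.jpg'):
        candidate = os.path.join(ffhq_img_path, '%05d%s' % (i, ext))
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError('no image found for index %05d in %s' % (i, ffhq_img_path))

# Line 45 of create_ffhq_lmdb.py would then read:
#   img_path = resolve_img_path(ffhq_img_path, i)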


With this change, the command !python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train now produces the following output:

100
200
300
400
500
600
700
800
900
1000
1100
1200
...

@KomputerMaster64
Author

I cross-checked the files that were unzipped. The number of files should be 70k, but after repeated unzipping operations I am only able to extract 50k to 52k images, even though the cell output shows that the last file unzipped was 69999.jpg.



I am using a Google Colab notebook together with Google Drive for the implementation.

Command used: !unzip images1024x1024.zip -d $DATA_DIR/ffhq/

Last few lines of the output of the cell:

  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69990.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69991.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69992.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69993.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69994.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69995.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69996.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69997.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69998.jpg  
  inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69999.jpg 


[Screenshot: Google Drive contents after the unzip operation]
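
A quick consistency check along the lines below (a sketch; the directory path is the one used above) counts the extracted files and reports which indices are missing. Unzipping to the Colab VM's local /content disk first and copying to Drive afterwards can also help, since the Drive mount can be slow with tens of thousands of small files and its listing can lag behind what was actually written.

import os

# Check how many of the 70k images actually made it onto Drive, and which are missing.
img_dir = "/content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized"

present = {f for f in os.listdir(img_dir) if f.endswith(".jpg")}
expected = {"%05d.jpg" % i for i in range(70000)}

missing = sorted(expected - present)
print("found %d of %d files" % (len(present), len(expected)))
print("first few missing:", missing[:10])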

@KomputerMaster64
Author

I altered line 45 from img_path = os.path.join(ffhq_img_path, '%05d.png' % i) to img_path = os.path.join(ffhq_img_path, '%05d.jpg' % i), since the Kaggle FFHQ 256x256 resized dataset contains .jpg image files. With this change, the command !python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train produces the following output:

100
200
300
400
500
600
700
800
900
1000
1100
1200
...

After executing the command !python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train, I initially get the following output, which suggests that the training set has been converted into an LMDB dataset:

48600
48700
48800
48900
49000
...
62800
62900
63000
added 63000 items to the LMDB dataset.


However, about two minutes later, the output above changes to the following:

48600
48700
48800
48900
49000
49100
...
    main(args.split, args.ffhq_img_path, args.ffhq_lmdb_path)
  File "create_ffhq_lmdb.py", line 55, in main
    print('added %d items to the LMDB dataset.' % count)
lmdb.Error: mdb_txn_commit: Disk quota exceeded



This behaviour is not observed for the validation set.
I would be grateful for your guidance.
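
For what it's worth, mdb_txn_commit: Disk quota exceeded means the filesystem the LMDB is being written to (here, Google Drive) hit its storage quota, not that the script itself failed. One possible workaround, sketched below under the assumption that the script accepts any --ffhq_lmdb_path, is to build the LMDB on the Colab VM's local disk and copy it to Drive afterwards, provided the Drive quota can hold the finished database:

import shutil
import subprocess

# Build the LMDB on the Colab VM's local disk, which is not subject to the Drive quota.
local_lmdb = "/content/ffhq-lmdb"
drive_lmdb = "/content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/ffhq-lmdb"

subprocess.run(
    ["python", "create_ffhq_lmdb.py",
     "--ffhq_img_path=/content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/",
     "--ffhq_lmdb_path=" + local_lmdb,
     "--split=train"],
    check=True,
)

# Copy the finished LMDB directory to Drive in one step.
shutil.copytree(local_lmdb, drive_lmdb)

If Drive cannot hold both the JPEGs and the LMDB, training directly from the local copy (or freeing Drive space first) may be the only option.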
