- Download ImageNet64 from the ImageNet Download Page.
- Extract the `*.zip` files to unveil the dataset batches, including `train_data_batch_1` to `train_data_batch_10` and `val_data`.
Organize the extracted data into:
- Training data: `downloadeddata/train`
- Validation data: `downloadeddata/val`
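
Before running the scripts, it can help to confirm that every batch file landed in the right folder. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_batches` is hypothetical, and it assumes the extracted batch files carry no file extension and sit directly inside the two directories listed above.

```python
from pathlib import Path

# Batch file names as listed in this README.
# Assumption: the extracted files have no extension.
TRAIN_BATCHES = [f"train_data_batch_{i}" for i in range(1, 11)]
VAL_BATCHES = ["val_data"]


def missing_batches(root="downloadeddata"):
    """Return the expected batch files that are absent under `root`."""
    root = Path(root)
    expected = [root / "train" / name for name in TRAIN_BATCHES]
    expected += [root / "val" / name for name in VAL_BATCHES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_batches()
    if missing:
        print("Missing batch files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected batch files found.")
```

An empty result means the layout matches what this README describes; any listed path points at a file that still needs to be extracted or moved.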
Install the required dependencies to ensure the scripts run smoothly:
`pip install -r requirements.txt`
- Extracting Images: Run `extractimages.py` to decode and store images in PNG format, categorizing them into directories named after ImageNet synset IDs.
- Data Clustering: Execute `clustereddata.py` to organize images based on clustered labels. For instance, all cat images (n02124075, n02123394, n02123159, n02123597, n02123045, n02127052) are merged into a single cluster, which makes the dataset easier to manage for training. The script also balances the dataset by equalizing the number of images across clusters.
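
The clustering and balancing idea can be illustrated with a short sketch. This is not the code in `clustereddata.py`: the function names, the cluster name `"cat"`, and the balancing strategy (downsampling every cluster to the size of the smallest one) are assumptions made for illustration. Only the synset IDs come from the example above.

```python
import random
from collections import defaultdict

# Synset IDs from the cat example in this README; the cluster name is illustrative.
CAT_SYNSETS = [
    "n02124075", "n02123394", "n02123159",
    "n02123597", "n02123045", "n02127052",
]
SYNSET_TO_CLUSTER = {synset: "cat" for synset in CAT_SYNSETS}


def cluster_images(images_by_synset, synset_to_cluster):
    """Merge per-synset image lists into per-cluster lists.

    Synsets without an entry in the mapping are skipped.
    """
    clusters = defaultdict(list)
    for synset, images in images_by_synset.items():
        cluster = synset_to_cluster.get(synset)
        if cluster is not None:
            clusters[cluster].extend(images)
    return dict(clusters)


def balance_clusters(clusters, seed=0):
    """Downsample every cluster to the size of the smallest one.

    Assumed strategy; the actual script may equalize counts differently.
    """
    if not clusters:
        return {}
    target = min(len(images) for images in clusters.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return {name: rng.sample(images, target) for name, images in clusters.items()}
```

Here `images_by_synset` would map each synset directory name to the list of PNG paths it contains. Downsampling discards images from larger clusters, so a different strategy (for example oversampling) may be preferable when data is scarce.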
For comprehensive label mappings and insights into data clustering, refer to the following resources: