NOTE: This script is for non-commercial research or educational use only.
Use this script to cut ImageNet, Pascal VOC, and Common Objects in Context (COCO) datasets down to a smaller number of images.
Download the script cut_dataset.py.
To cut an ImageNet or Pascal VOC dataset, run the following command in a terminal after specifying the parameters:
python C:/Users/Downloads/cut_dataset.py \
--source_archive_dir=<full_path_to_source_archive> \
--output_size=<number_of_images> \
--output_archive_dir=<path_to_output_archive> \
--dataset_type=imagenet \
--first_image=<image_number>
This command runs the script with the following arguments:
Parameter | Explanation |
---|---|
--source_archive_dir=<full_path_to_source_archive> | Full path to the downloaded archive, including the file name |
--output_size=<number_of_images> | Number of images to keep in the smaller dataset |
--output_archive_dir=<path_to_output_archive> | Full path to the directory for the smaller dataset, excluding the archive name |
--dataset_type=<dataset_type> | Type of the source dataset (imagenet or voc) |
--first_image=<image_number> | Optional. The number of the image to start cutting from. Specify it if you want to split your dataset into train/val subsets. The default is 0. |
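
The --first_image parameter can be used to produce train and validation subsets from one archive by running the script twice. The sketch below only builds the two command lines. It assumes that the script keeps output_size consecutive images starting at first_image, which the table above suggests but does not state outright. The helper name build_split_commands, the example paths, and the subset sizes are hypothetical, and separate output directories are used in case the output archive name is the same for both runs.

```python
def build_split_commands(script, source_archive, train_dir, val_dir,
                         n_train, n_val, dataset_type="imagenet"):
    """Build two cut_dataset.py command lines: one for train, one for val.

    Assumption: the script keeps `output_size` images starting at `first_image`,
    so the val subset starts right after the last train image.
    """
    def command(output_dir, size, first_image):
        return [
            "python", script,
            f"--source_archive_dir={source_archive}",
            f"--output_size={size}",
            f"--output_archive_dir={output_dir}",
            f"--dataset_type={dataset_type}",
            f"--first_image={first_image}",
        ]

    train_cmd = command(train_dir, n_train, 0)        # images 0 .. n_train-1
    val_cmd = command(val_dir, n_val, n_train)        # next n_val images
    return train_cmd, val_cmd


# Hypothetical paths and sizes, for illustration only.
train_cmd, val_cmd = build_split_commands(
    script="cut_dataset.py",
    source_archive="/data/imagenet.tar",
    train_dir="/data/cut/train",
    val_dir="/data/cut/val",
    n_train=800,
    n_val=200,
)
print(" ".join(train_cmd))
print(" ".join(val_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the script. Building the arguments as a list avoids shell quoting problems with paths that contain spaces.
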
To cut a COCO dataset, which ships images and annotations as separate archives, run the following command in a terminal after specifying the parameters:
python C:/Users/Downloads/cut_dataset.py \
--source_images_archive_dir=<full_path_to_source_images_archive> \
--source_annotations_archive_dir=<full_path_to_source_annotations_archive> \
--output_size=<number_of_images> \
--output_archive_dir=<path_to_output_archive> \
--dataset_type=coco \
--first_image=<image_number>
This command runs the script with the following arguments:
Parameter | Explanation |
---|---|
--source_images_archive_dir=<full_path_to_source_images_archive> | Full path to the downloaded archive with images, including the file name |
--source_annotations_archive_dir=<full_path_to_source_annotations_archive> | Full path to the downloaded archive with annotations, including the file name |
--output_size=<number_of_images> | Number of images to keep in the smaller dataset |
--output_archive_dir=<path_to_output_archive> | Full path to the directory for the smaller dataset, excluding the archive name |
--dataset_type=<dataset_type> | Type of the source dataset (coco) |
--first_image=<image_number> | Optional. The number of the image to start cutting from. Specify it if you want to split your dataset into train/val subsets. The default is 0. |
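
The two commands differ only in how the source archives are passed: ImageNet and Pascal VOC take a single --source_archive_dir, while COCO takes separate image and annotation archives. As a rough sketch of that difference, the hypothetical helper build_cut_args below picks the matching set of arguments for a dataset type and rejects combinations the tables above do not describe. The function name, its parameters, and the example paths are assumptions made for illustration and are not part of cut_dataset.py.

```python
def build_cut_args(dataset_type, output_size, output_archive_dir,
                   source_archive=None, images_archive=None,
                   annotations_archive=None, first_image=0):
    """Return the cut_dataset.py arguments that match `dataset_type`."""
    if dataset_type in ("imagenet", "voc"):
        # Single archive that holds the whole dataset.
        if source_archive is None:
            raise ValueError(f"{dataset_type} requires source_archive")
        sources = [f"--source_archive_dir={source_archive}"]
    elif dataset_type == "coco":
        # Images and annotations are downloaded as two archives.
        if images_archive is None or annotations_archive is None:
            raise ValueError("coco requires images_archive and annotations_archive")
        sources = [
            f"--source_images_archive_dir={images_archive}",
            f"--source_annotations_archive_dir={annotations_archive}",
        ]
    else:
        raise ValueError(f"unsupported dataset type: {dataset_type}")

    return sources + [
        f"--output_size={output_size}",
        f"--output_archive_dir={output_archive_dir}",
        f"--dataset_type={dataset_type}",
        f"--first_image={first_image}",
    ]


# Hypothetical archive names, for illustration only.
coco_args = build_cut_args(
    "coco",
    output_size=500,
    output_archive_dir="/data/cut",
    images_archive="/data/coco_images.zip",
    annotations_archive="/data/coco_annotations.zip",
)
print(["python", "cut_dataset.py"] + coco_args)
```
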