Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Suggestion] Regarding ImageFolder #145

Closed
rtqichen opened this issue Apr 12, 2017 · 8 comments
Closed

[Suggestion] Regarding ImageFolder #145

rtqichen opened this issue Apr 12, 2017 · 8 comments

Comments

@rtqichen
Copy link
Contributor

The current ImageFolder is much too restrictive, as is evident by the number of PRs trying to add various functionalities.

Would it make sense to simply have find_classes [1] be a callable argument similar to how a custom loader can be specified? ImageFolder can then be made recursive without impacting the current functionality, and both existing PRs (#100, #47) would be special cases of this. This change would allow ImageFolder to be a much more versatile image loader.

[1] https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py#L17

@lucasb-eyer
Copy link

Or, since it's already a class, make find_classes a method with the default implementation as provided and new datasets should just inherit and overwrite it.

@greaber
Copy link

greaber commented May 17, 2017

I guess is_image_file and make_dataset should also be methods and also make_dataset should be modified to only look in the directories returned by find_classes.

It would be nice if PyTorch got a generic DataFolder class that could be used for any modality, and ImageFolder could inherit from that and just specify some sensible defaults for images. I am using ImageFolder for audio, and I have to either rename my audio files with extensions normally used for images or used a hacked version of ImageFolder. is_image_file could be renamed is_example_file (or maybe there is a better non-image-specific name).

Possibly it would be best to refactor find_classes to call an is_class method analogous to is_example_file.

Finally, maybe is_example_file should be passed the whole path of the file relative to self.root as a convenience so that the user doesn't have to rename or move files in order to make which files are selected depend on the class. This would be convenient if the user wants to exclude some files but they are named the same as other files that should be included.

@fmassa
Copy link
Member

fmassa commented Sep 3, 2017

I think we can make find_classes be either an optional arguments in the constructor or a method of the class.
But adding the same support for make_dataset might not be necessary, as there would be no functionality left in ImageFolder and would be just better to create a new dataset.

I think that having a DataFolder base class would be good, but might be out of the scope of this repo, which is vision-specific.

What do you think? Would you be willing to send a PR improving ImageFolder with support for a custom find_classes?

@alykhantejani
Copy link
Contributor

I'm not sure about how many use cases there are to supply a custom find_classes or a make_dataset (which are the main guts of this class).

How about moving is_image_file, make_dataset and find_classes into the ImageFolder class, so if someone really wants different behavior (or to support other image extensions) then can just subclass ImageFolder and override this functionality?

Choco31415 added a commit to Choco31415/vision that referenced this issue Jun 4, 2018
@Choco31415
Copy link
Contributor

Choco31415 commented Jun 4, 2018

@fmassa , it's currently not possible to only override find_classes because make_dataset requires that sub folders are the class names. Specifically class_to_idx[target] line 54.

Considering that overriding one requires overriding the other, what if there were a wrapper function in DatasetFolder, called _load_dataset_outline()? It would return classes, class_to_idx, and samples. The DatasetFolder default implementation would call find_classes and make_dataset like the current behavior.

@Choco31415
Copy link
Contributor

Choco31415 commented Jun 4, 2018

Edit, I found a solution to my concerns. Submitting a new PR now.

@lucasb-eyer
Copy link

These really ought to be methods of the class such that they can individually be overwritten in a user's child-class IMO.

fmassa pushed a commit that referenced this issue Jun 6, 2018
* Addresses issue #145 as per @fmessa's suggestion.

* Removed blank line for styling.
@fmassa
Copy link
Member

fmassa commented Jun 6, 2018

Fixed via #145.

@fmassa fmassa closed this as completed Jun 6, 2018
varunagrawal pushed a commit to varunagrawal/vision that referenced this issue Jul 23, 2018
* Addresses issue pytorch#145 as per @fmessa's suggestion.

* Removed blank line for styling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants