Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding fvgc_aircraft dataset #5178

Merged
merged 13 commits into from
Jan 14, 2022
Merged

Adding fvgc_aircraft dataset #5178

merged 13 commits into from
Jan 14, 2022

Conversation

yiwen-song
Copy link
Contributor

@yiwen-song yiwen-song commented Jan 9, 2022

Per #5108

image

cc @pmeier

@facebook-github-bot
Copy link

facebook-github-bot commented Jan 9, 2022

💊 CI failures summary and remediations

As of commit caf762b (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sallysyw, thanks a lot for the PR. I've one question before I actually dive into the PR: why do we want restructure the data into an image folder hierarchy? IIUC, we could also work with the given format, which I would prefer.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR Yiwen!

This looks great, I made a few minor comments below. As Philip noted above, we tend not to change the dataset underlying structure, partly so that users can also just manually download the files, should they want to. Was there a specific reason to re-organise the dataset structure?

Thanks!

torchvision/datasets/fvgc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fvgc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fvgc_aircraft.py Outdated Show resolved Hide resolved
test/test_datasets.py Outdated Show resolved Hide resolved
test/test_datasets.py Outdated Show resolved Hide resolved
@yiwen-song
Copy link
Contributor Author

Hey @sallysyw, thanks a lot for the PR. I've one question before I actually dive into the PR: why do we want restructure the data into an image folder hierarchy? IIUC, we could also work with the given format, which I would prefer.

Good point! I did that for no specific reason - just follow what VISSL did. But I agree that we'd better not copy these images to another folder after downloading to avoid additional overhead. Let me send a patch to address this issue. Thanks!

@pmeier @NicolasHug

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sallysyw, I have some minor comments inline.

Apart from that I'm wondering if it makes sense to add a parameter to select the different "levels" of the dataset.

  • Model, e.g. Boeing 737-76J. Since certain models are nearly visually indistinguishable, this level is not used in the evaluation.
  • Variant, e.g. Boeing 737-700. A variant collapses all the models that are visually indistinguishable into one class. The dataset comprises 102 different variants.
  • Family, e.g. Boeing 737. The dataset comprises 70 different families.
  • Manufacturer, e.g. Boeing. The dataset comprises 41 different manufacturers.

We could have a level="variant" that also allows "family" and "manufacturer". From what I see, we would only need to change the files we read from based on this parameter.

torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
test/test_datasets.py Outdated Show resolved Hide resolved
@NicolasHug
Copy link
Member

Apart from that I'm wondering if it makes sense to add a parameter to select the different "levels" of the dataset.
...
We could have a level="variant" that also allows "family" and "manufacturer". From what I see, we would only need to change the files we read from based on this parameter.

That's a great point. Ultimately our goal with this dataset is to support our colleagues which are implementing FLAVA, and the "reference" in this case is https://github.com/facebookresearch/vissl/blob/main/extra_scripts/datasets/create_fgvc_aircraft_data_files.py. From what I understand, they would only need to use the "variant" categories. Perhaps it would be OK to just implement the "variant" category in this PR to keep it simple, and to implement the rest if users request more categories?

@yiwen-song
Copy link
Contributor Author

Thanks both of you for the review. For the categories, I think for completeness I will provide a parameter called "level" or "annotation_level" and set the default to be "variant", this way it covers the existing functionalities in FLAVA and still giving users flexibility to use other levels' annotations of this dataset.

@pmeier @NicolasHug

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits and one question inline. Otherwise LGTM. Thanks a lot @sallysyw!

torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
test/test_datasets.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
torchvision/datasets/fgvc_aircraft.py Outdated Show resolved Hide resolved
@yiwen-song yiwen-song merged commit adf8466 into pytorch:main Jan 14, 2022
@yiwen-song yiwen-song deleted the fvgc branch January 14, 2022 22:59
facebook-github-bot pushed a commit that referenced this pull request Jan 17, 2022
Summary:
* add fvgc_aircraft dataset

* add docstring & remove useless import

* resolve lint issue

* address comments

* adding more annotation level

* nit

* address comments

* Apply suggestions from code review

* unify format

* remove useless line

Reviewed By: NicolasHug

Differential Revision: D33618172

fbshipit-source-id: ce6471096d8527b08373061e8ec2059a25f96f1d

Co-authored-by: Philip Meier <[email protected]>
@pmeier pmeier mentioned this pull request Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants