Add widerface dataset #2883

jgbradley1 · 2020-10-23T19:25:48Z

This PR addresses issue #1627 and adds a dataset that is more representative (and harder) of generic face detection.

Main changes in the PR include

adds WIDERFace dataset as an option
improves the method to check and download files from google drive when download quota has been exceeded. See the Google drive API docs for more details, A quota limit exceeded will return a 403 response. Checking for the correct response code before performing a string search saves a significant amount of time for the most general use case (i.e. when the quota limit has not been exceeded).

codecov · 2020-10-26T22:46:56Z

Codecov Report

Merging #2883 (c11858f) into master (36daee3) will decrease coverage by 0.84%.
The diff coverage is 51.31%.

@@            Coverage Diff             @@
##           master    #2883      +/-   ##
==========================================
- Coverage   73.41%   72.57%   -0.85%     
==========================================
  Files          99      100       +1     
  Lines        8801     8947     +146     
  Branches     1389     1419      +30     
==========================================
+ Hits         6461     6493      +32     
- Misses       1915     2033     +118     
+ Partials      425      421       -4

Impacted Files	Coverage Δ
torchvision/datasets/folder.py	`85.54% <ø> (ø)`
torchvision/models/detection/anchor_utils.py	`89.61% <0.00%> (-2.50%)`	⬇️
torchvision/models/quantization/inception.py	`31.45% <0.00%> (ø)`
torchvision/datasets/widerface.py	`15.70% <15.70%> (ø)`
torchvision/models/inception.py	`85.03% <33.33%> (+0.39%)`	⬆️
torchvision/models/detection/retinanet.py	`75.87% <50.00%> (-0.21%)`	⬇️
torchvision/datasets/__init__.py	`100.00% <100.00%> (ø)`
torchvision/models/densenet.py	`83.58% <100.00%> (+0.76%)`	⬆️
torchvision/models/detection/_utils.py	`85.43% <100.00%> (+0.39%)`	⬆️
torchvision/models/detection/backbone_utils.py	`95.55% <100.00%> (+1.26%)`	⬆️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36daee3...1f0223c. Read the comment docs.

…into add-widerface-dataset

jgbradley1 · 2020-11-10T04:52:36Z

I think it would be preferable instead to always return the "raw" mode as a dict. I know we have target_type in CelebA already, but I would prefer if the output type of the datasets was always the same.

@fmassa I need some clarification. Are you suggesting to remove the target_type argument entirely and always return a dictionary? For example, the raw annotation for one image with one face would be:

{
  "raw": tensor([[983, 296, 7, 9, 2, 0, 0, 1, 1, 0]]), # torch.Size([1, 10])
  "bbox": tensor([[983, 296, 7, 9]]),                  # torch.Size([1, 4])
  "attributes": tensor([[2, 0, 0, 1, 1, 0]])           # torch.Size([1, 6])
}

Should the raw key-value pair be left as part of the dict? The tensors will change shape based on the number of faces in an image but hopefully you get the idea from the example above.

If target_type is removed, this may change how target_transform is handled. I used CelebA as a template for applying target_transform. Is there a better Dataset class I should reference as the standard?

…into add-widerface-dataset

jgbradley1 · 2021-01-04T05:00:18Z

@fmassa and @pmeier I'm just picking this PR back up. It got lost during the holidays I think.

I've removed the dataset citation for now and removed the target_type argument as requested earlier. Also updated the output type to always be a dict.

Do you have any other suggestions or is this about ready to merge?

pmeier

If I understand your implementation correctly, the target is always returned as None for the test split. If that is the case, I would like to hear @fmassa s thoughts about this, since it would create precedence. As of now, all our datasets always return a valid target.

Other than that two minor nitpicks.

torchvision/datasets/widerface.py

Co-authored-by: Philip Meier <[email protected]>

jgbradley1 · 2021-01-04T23:47:49Z

If I understand your implementation correctly, the target is always returned as None for the test split. If that is the case, I would like to hear @fmassa s thoughts about this, since it would create precedence. As of now, all our datasets always return a valid target.

Ground truth annotations for the WIDERFace test split are not released publicly (as noted in the description section of the main website). In this case, the only options we can provide here is either:

drop support for the test split or
support reading the test split (to improve ease of evaluation) and return an empty target

A comment has been added to getitem to explain the empty target to users.

Actual evaluation on the test split requires a submission to the dataset author. I prefer option 2 as it would encourage users to go for SOTA on the dataset.

fmassa · 2021-01-06T16:17:04Z

Hi @jgbradley1

Sorry for the delay.

I think I would be ok letting the dataset return a target=None for the test split. This way, the return types are always consistent.

If we were to return either image or image, target, then I would be a bit more reticent to do it, but returning image, None I think it's fine.

jgbradley1 · 2021-01-08T05:36:44Z

Looks like there is a problem with a couple of the conda CI builds (unrelated to my changes as far I can tell). I don't think it's anything I can resolve from my end.

pmeier

Thanks a lot @jgbradley1!

Summary: * initial commit of widerface dataset * comment out old code * improve parsing of annotation files * code cleanup and fix docstring comments * speed up check for quota exceeded * cleanup print statements * reformat code and remove print statements * minor code cleanup and reformatting * add more comments * reuse variable * reverse formatting changes * fix flake8 errors * add type annotations * fix mypy errors * add a base_folder to root directory * some formatting fixes * GDrive threshold does not throw 403 error * testing new download logic * cleanup logic for download and integrity check * use a better variable name * format fix * reorder list in docstring * initial widerface unit test - fails on MD5 check * use list of dictionaries to store dataset * fix docstring formatting * remove unnecessary error checking * fix type checker error * revert typo fix * rename var constants, use file context manager, verify str args * fix flake8 error * fix checking target_type argument values * create uncompressed dataset folders * cleanup unit tests for widerface * use correct os function * add more info to docstring * disable unittests for windows * fix _check_integrity logic * update docstring * remove citation * remove target_type option * fix formatting issue * remove comment and add more info to docstring * update type annotations * restart CI jobs Reviewed By: datumbox Differential Revision: D25954560 fbshipit-source-id: 213da6919cda16e03d4a6be66eaa5eaa220cd8d0 Co-authored-by: Philip Meier <[email protected]> Co-authored-by: Joshua Bradley <[email protected]> Co-authored-by: Philip Meier <[email protected]> Co-authored-by: vfdev <[email protected]>

jgbradley1 added 14 commits October 21, 2020 01:26

initial commit of widerface dataset

2dd7d6c

Merge branch 'master' into add-widerface-dataset

edc3a9a

comment out old code

f9e31c9

improve parsing of annotation files

e4ee45f

code cleanup and fix docstring comments

9eae696

speed up check for quota exceeded

1fbd0b7

cleanup print statements

75c620d

Merge branch 'master' into add-widerface-dataset

9637659

reformat code and remove print statements

a82d7b5

minor code cleanup and reformatting

bba0db2

add more comments

0c33f5f

reuse variable

a7c0b30

reverse formatting changes

40cde34

Merge branch 'master' into add-widerface-dataset

bb50718

jgbradley1 mentioned this pull request Oct 23, 2020

Add VGGface2 dataset #1193

Closed

jgbradley1 and others added 5 commits October 26, 2020 03:03

fix flake8 errors

48a620f

add type annotations

bc8c35b

fix mypy errors

e0b8664

Merge branch 'master' into add-widerface-dataset

40e9823

add a base_folder to root directory

2e73130

jgbradley1 added 9 commits October 27, 2020 13:16

some formatting fixes

4de06aa

Merge branch 'master' into add-widerface-dataset

c28966f

GDrive threshold does not throw 403 error

70dc752

testing new download logic

4d2506f

cleanup logic for download and integrity check

6f76fd7

Merge branch 'master' into add-widerface-dataset

5a55195

use a better variable name

9c6d02c

Merge branch 'add-widerface-dataset' of github.com:jgbradley1/vision …

57f3777

…into add-widerface-dataset

format fix

2f76d94

jgbradley1 added 11 commits November 11, 2020 14:39

disable unittests for windows

2813d4e

Merge branch 'master' into add-widerface-dataset

9984146

fix _check_integrity logic

f5981ed

update docstring

a4d3051

Merge branch 'master' into add-widerface-dataset

7f6c327

Merge branch 'master' into add-widerface-dataset

6513f7f

Merge branch 'master' into add-widerface-dataset

871088d

Merge branch 'master' into add-widerface-dataset

1beba85

Merge branch 'add-widerface-dataset' of github.com:jgbradley1/vision …

7845d45

…into add-widerface-dataset

remove citation

2dcd8c8

remove target_type option

95d6708

pmeier reviewed Jan 4, 2021

View reviewed changes

torchvision/datasets/widerface.py Outdated Show resolved Hide resolved

torchvision/datasets/widerface.py Outdated Show resolved Hide resolved

jgbradley1 and others added 2 commits January 4, 2021 18:07

fix formatting issue

00448e9

Co-authored-by: Philip Meier <[email protected]>

remove comment and add more info to docstring

752ed0d

jgbradley1 added 3 commits January 6, 2021 20:29

update type annotations

31b0122

Merge branch 'master' into add-widerface-dataset

f8ef3d3

restart CI jobs

02ae27c

Merge branch 'master' into add-widerface-dataset

6c4a1e8

pmeier approved these changes Jan 8, 2021

View reviewed changes

Merge branch 'master' into add-widerface-dataset

00f24a8

vfdev-5 merged commit d0063f3 into pytorch:master Jan 11, 2021

jgbradley1 deleted the add-widerface-dataset branch January 22, 2021 14:52

pmeier mentioned this pull request Feb 4, 2021

Addition of WIDER Face Dataset to Datasets #1627

Closed

pmeier mentioned this pull request Mar 19, 2021

enable tests for WIDERFace on Windows #3589

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add widerface dataset #2883

Add widerface dataset #2883

jgbradley1 commented Oct 23, 2020 •

edited

Loading

codecov bot commented Oct 26, 2020 •

edited

Loading

jgbradley1 commented Nov 10, 2020 •

edited

Loading

jgbradley1 commented Jan 4, 2021

pmeier left a comment

jgbradley1 commented Jan 4, 2021 •

edited

Loading

fmassa commented Jan 6, 2021

jgbradley1 commented Jan 8, 2021

pmeier left a comment

Add widerface dataset #2883

Add widerface dataset #2883

Conversation

jgbradley1 commented Oct 23, 2020 • edited Loading

codecov bot commented Oct 26, 2020 • edited Loading

Codecov Report

jgbradley1 commented Nov 10, 2020 • edited Loading

jgbradley1 commented Jan 4, 2021

pmeier left a comment

Choose a reason for hiding this comment

jgbradley1 commented Jan 4, 2021 • edited Loading

fmassa commented Jan 6, 2021

jgbradley1 commented Jan 8, 2021

pmeier left a comment

Choose a reason for hiding this comment

jgbradley1 commented Oct 23, 2020 •

edited

Loading

codecov bot commented Oct 26, 2020 •

edited

Loading

jgbradley1 commented Nov 10, 2020 •

edited

Loading

jgbradley1 commented Jan 4, 2021 •

edited

Loading