Mobile support, AutoAugment, improved IO and more
This release improves mobile support with new mobile-friendly models, pre-compiled Android binaries available in Maven, and an Android demo app. It also improves image IO and provides new data augmentations, including AutoAugment.
Highlights
Better mobile support
torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks.
The C++ operators have also been improved so that they can be compiled and run on Android, and pre-compiled torchvision artifacts are published to jcenter. An example Android application showing how to use the torchvision ops is also provided (see #2897).
Classification
We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.
import torch
import torchvision
# Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
# m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
# Quantized Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
The pre-trained models have the following accuracies on ImageNet 2012 val:
Model | Top-1 Acc | Top-5 Acc |
---|---|---|
MobileNetV3 Large | 74.042 | 91.340 |
MobileNetV3 Large (Quantized) | 73.004 | 90.858 |
MobileNetV3 Small | 67.620 | 87.404 |
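For readers unfamiliar with the two columns, the sketch below shows one common way to compute top-1 and top-5 accuracy from raw classifier outputs. It is only an illustration: the helper name `topk_accuracy` and the toy logits are assumptions, not the evaluation code used to produce the table.

```python
import torch


def topk_accuracy(logits, target, ks=(1, 5)):
    """Return the top-k accuracy (in percent) for each k in `ks`."""
    maxk = max(ks)
    # Indices of the maxk highest-scoring classes per sample, shape (N, maxk)
    _, pred = logits.topk(maxk, dim=1)
    # True where a ranked prediction equals the ground-truth label
    hits = pred.eq(target.unsqueeze(1))
    return [hits[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]


# Toy batch: 3 samples, 6 classes (values invented for illustration)
logits = torch.tensor([
    [0.1, 0.9, 0.0, 0.0, 0.0, 0.0],  # class 1 ranks first
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],  # class 5 ranks last
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],  # class 4 ranks second
])
target = torch.tensor([1, 5, 4])

top1, top5 = topk_accuracy(logits, target)
print(f"top-1: {top1:.2f}%  top-5: {top5:.2f}%")
```

A sample counts as correct for top-5 when its label appears anywhere among the five highest-scoring classes, which is why top-5 is never lower than top-1.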
Object Detection
We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows:
import torch
import torchvision
# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
These models yield the following accuracies on COCO val2017 (full results available in #3265):
Model | mAP | mAP@50 | mAP@75 |
---|---|---|---|
Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0 | 23.2 |
Faster R-CNN MobileNetV3-Large FPN | 32.8 | 52.5 | 34.3 |
Semantic Segmentation
We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.
import torch
import torchvision
# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
The pre-trained models give the following results on the subset of COCO val2017 that contains the same 20 categories as those present in Pascal VOC (full results in #3276):
Model | mean IoU | global pixelwise accuracy |
---|---|---|
Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9 | 91.2 |
DeepLabV3 with Dilated MobileNetV3 Large Backbone | 60.3 | 91.2 |
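As a reminder of what the two columns measure, here is a minimal sketch that derives mean IoU and global pixelwise accuracy from a confusion matrix. It assumes the usual definitions of both metrics; the helper names and the tiny 2x2 label maps are hypothetical and are not taken from the reference scripts.

```python
import torch


def confusion_matrix(target, pred, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    idx = target.flatten() * num_classes + pred.flatten()
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    cm = cm.float()
    true_positive = cm.diag()
    # Fraction of all pixels that were labelled correctly
    global_acc = true_positive.sum() / cm.sum()
    # Per-class intersection over union, then averaged over classes
    # (a class absent from both maps would give 0/0 and needs extra handling)
    iou = true_positive / (cm.sum(0) + cm.sum(1) - true_positive)
    return global_acc.item(), iou.mean().item()


# Toy 2x2 label maps with two classes (values invented for illustration)
target = torch.tensor([[0, 0], [1, 1]])
pred = torch.tensor([[0, 1], [1, 1]])

global_acc, mean_iou = segmentation_metrics(confusion_matrix(target, pred, 2))
print(f"global accuracy: {global_acc:.3f}  mean IoU: {mean_iou:.3f}")
```

Pixel accuracy is dominated by large classes such as background, so two models can share the same accuracy while differing in mean IoU, as in the table above.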
Addition of the AutoAugment method
AutoAugment is a common data augmentation technique that can improve the accuracy of scene classification models. Although the augmentation policies are tied to the dataset on which they were learned, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.
TorchVision implements 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed and matched with existing transforms:
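from torchvision import transforms
# Standalone usage; `image` is a PIL image (or uint8 tensor) loaded elsewhere
t = transforms.AutoAugment()
transformed = t(image)
# Combined with existing transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.AutoAugment(),
    transforms.ToTensor()])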
Improved Image IO and on-the-fly image type conversions
All the read and decode methods of the io.image package have been updated to:
- Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
- Allow the on-the-fly conversion of an image from one type to another during read.
from torchvision.io.image import read_image, ImageReadMode
# keeps original type, channels unchanged
x1 = read_image("image.png")
# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)
# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)
# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)
# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
Python 3.9 and CUDA 11.1
This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418).
Backwards Incompatible Changes
- [Ops] Change default eps value of FrozenBN to better align with nn.BatchNorm (#2933)
- [Ops] Remove deprecated _new_empty_tensor. (#3156)
- [Transforms] ColorJitter gets its random params by calling get_params() (#3001)
- [Transforms] Change rounding of transforms on integer tensors (#2964)
- [Utils] Remove normalize from save_image (#3324)
New Features
- [Datasets] Add WiderFace dataset (#2883)
- [Models] Add MobileNetV3 architecture:
- [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (#3205)
- [Mobile] Add Android gradle project with demo test app (#2897)
- [Transforms] Implemented AutoAugment, along with required new transforms + Policies (#3123)
- [Ops] Added support of Autocast in all Operators: #2938, #2926, #2922, #2928, #2905, #2906, #2907, #2898
- [Ops] Add modulation input for DeformConv2D (#2791)
- [IO] Improved io.image with on-the-fly image type conversions: (#3193, #3069, #3024, #2988, #2984)
- [IO] Add option to write audio to video file (#2304)
- [Utils] Added a utility to draw bounding boxes (#2785, #3296, #3075)
Improvements
Datasets
- Concatenate small tensors in video datasets to reduce the use of shared file descriptor (#1795)
- Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
- Check if dataset file is located on Google Drive before downloading it (#3245)
- Improve Coco implementation (#3417)
- Make download_url follow redirects (#3236)
- make_dataset as staticmethod of DatasetFolder (#3215)
- Add a warning if any clip can't be obtained from a video in VideoClips. (#2513)
Models
- Improve error message in AnchorGenerator (#2960)
- Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
- Support for image with no annotations in RetinaNet (#3032)
- Change RoIHeads reshape to support empty batches. (#3031)
- Fixed typing exception throwing issues with JIT (#3029)
- Replace deprecated functional.sigmoid with torch.sigmoid in RetinaNet (#3307)
- Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
- Speedup RetinaNet's postprocessing (#2828)
Ops
- Added eps in the __repr__ of FrozenBN (#2852)
- Added __repr__ to MultiScaleRoIAlign (#2840)
- Exposing LevelMapper params in MultiScaleRoIAlign (#3151)
- Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)
Transforms
- adjust_hue now accepts tensors with one channel (#3222)
- Add fill color support for tensor affine transforms (#2904)
- Remove torchscript workaround for center_crop (#3118)
- Improved error message for RandomCrop (#2816)
IO
- Enable importing read_file and the other methods from torchvision.io (#2918)
- Accept Python bytes in _read_video_from_memory() (#3347)
- Enable rtmp timeout in decoder (#3076)
- Specify tls cert file to decoder through config (#3289, #3374)
- Add UUID in LOG() in decoder (#3080)
References
- Add weight averaging and storing methods in references utils (#3352)
- Adding Preset Transforms in reference scripts (#3317)
- Load variables when --resume /path/to/checkpoint --test-only (#3285)
- Updated video classification ref example with new transforms (#2935)
Misc
- Various documentation improvements (#3039, #3271, #2820, #2808, #3131, #3062, #3061, #3000, #3299, #3400, #2899, #2901, #2908, #2851, #2909, #3005, #2821, #2957, #3360, #3019, #3124, #3217, #2879, #3234, #3180, #3425, #2979, #2935, #3298, #3268, #3203, #3290, #3295, #3200, #2663, #3153, #3147, #3232)
- The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (#3259, #3378, #3408, #3373, #3290)
- Avoid some deprecation warnings from PyTorch (#3348)
- Ensure operators are added in C++ (#2798, #3091, #3391)
- Fixed compilation warnings on C++ codebase (#3390)
- CI Improvements (#3401, #3329, #2990, #2978, #3189, #3230, #3254, #2844, #2872, #2825, #3144, #3137, #2827, #2848, #2914, #3419, #2895, #2837)
- Installation improvements (#3302, #2969, #3113, #3202)
- CMake improvements (#2801, #2805, #3212, #3381)
Mobile
- Add Torch Selective macros in all C++ Ops for better support on mobile (#3218)
Code Quality, testing
- [BC-breaking] Modernized C++ codebase & made it mobile-friendly (25% faster to compile): #2885, #2891, #2892, #2893, #2905, #2906, #2907, #2938, #2944, #2945, #3011, #3020, #3097, #3105, #3134, #3135, #3143, #3146, #3154, #3156, #3163, #3218, #3308, #3311, #3312, #3326, #3350, #3390
- Cleaned up Python codebase & made it more Pythonic: #3263, #3239, #3059, #3055, #3045, #3382, #3159, #3171
- Improve type annotations (#3288, #3045, #2862, #2858, #2857, #2863, #2865, #2856, #2860, #2864, #2875, #2859, #2854, #2861, #3174, #3059)
- Code refactoring and static analysis improvements (#3379, #3335, #3229, #3204, #3095)
- Miscellaneous test improvements (#2966, #2965, #3018, #3035, #2961, #2806, #2812, #2815, #2834, #2874, #3099, #3092, #3160, #3103, #2971, #3023, #2803, #3136, #3319, #3310, #3287, #3033, #2983, #3386, #3369, #3116, #2985, #3320)
Bug Fixes
- [DATASETS] Fixes EMNIST split and label issues (#2673)
- [DATASETS] Fix overflow in STL10 fold reading (#3353)
- [MODELS] Fix incorrectly frozen BN on ResNet FPN backbone (#3396)
- [MODELS] Fix scriptability support in Inception V3 (#2976)
- [MODELS] Changed default value of eps in FrozenBatchNorm to match BatchNorm: #2940 #2933
- [MODELS] Fixed warning in models.detection.transforms.resize_image_and_masks. (#3237)
- [MODELS] Fix trainable_layers on RetinaNet (#3234)
- [MODELS] Fix ShuffleNetV2 ONNX model export issue. (#3158)
- [UTILS] Fixes no grad and range bugs in utils. (#3269)
- [UTILS] make_grid uses a more correct normalization (#2967)
- [OPS] fix GET_THREADS() for ROCm with DeformConv (#2997)
- [OPS] Fix NMS and IoU overflows for fp16 (#3383, #3382)
- [OPS] Fix ops registration on windows (#3380)
- [OPS] Fix initialisation bug on FeaturePyramidNetwork (#2954)
- [IO] Replace hardcoded error code with ENODATA (#3277)
- [REFERENCES] Fix repeated UserWarning and add more flexibility to reference code for segmentation tasks (#2886)
- [TRANSFORMS] Fix default fill value in RandomRotation (#3303)
- [TRANSFORMS] Correct aspect ratio sampling in transforms.RandomErasing (#3344)
- [TRANSFORMS] Fix CenterCrop for Tensor when size is greater than imgsize (#3333)
- [TRANSFORMS] Functional to_tensor returns float tensor of default dtype (#3398)
- [TRANSFORMS] Add explicit check for number of channels (#3013)
- [TRANSFORMS] pil_to_tensor with accimage backend now returns uint8 (#3109)
- [TRANSFORMS] Fix potential overflow in convert_image_dtype (#3107)
- [TRANSFORMS] Check num of channels on adjust_* transformations (#3069)