Video support, new datasets and models
This release adds support for video models and datasets, and brings several improvements.
Note: torchvision 0.4 requires PyTorch 1.2 or newer
Highlights
Video and IO
Video is now a first-class citizen in torchvision. The 0.4 release includes:
- efficient IO primitives for reading and writing video files
- Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with `torch.utils.data.DataLoader`
- Pre-trained models for action recognition, trained on Kinetics-400
- Training and evaluation scripts for reproducing the training results.
Writing your own video dataset is easy. We provide a utility class, `VideoClips`, that simplifies the task of enumerating all possible fixed-size clips in a list of video files by creating an index of all clips in a set of videos. It additionally allows specifying a fixed frame rate for the videos.
```python
from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio

    def __len__(self):
        return self.video_clips.num_clips()
```
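The indexing idea behind `VideoClips` can be sketched in plain Python. This is an illustrative toy, not torchvision's actual implementation: the helper name `enumerate_clips` and the hard-coded frame counts are hypothetical, and the real class reads frame counts from the video files and can resample them to a fixed frame rate.

```python
def enumerate_clips(num_frames, clip_length_in_frames=16, frames_between_clips=1):
    """Return (start, end) frame ranges for every fixed-size clip in one video."""
    clips = []
    start = 0
    while start + clip_length_in_frames <= num_frames:
        clips.append((start, start + clip_length_in_frames))
        start += frames_between_clips
    return clips

# A flat index over all videos: one (video_index, start_frame, end_frame)
# entry per clip, so a single integer idx can address any clip in the set.
video_lengths = [40, 20]  # hypothetical frame counts for two videos
index = [(vi, s, e)
         for vi, n in enumerate(video_lengths)
         for s, e in enumerate_clips(n)]

print(len(index))  # 25 clips from the first video + 5 from the second = 30
```

With a stride of `frames_between_clips=1`, consecutive clips overlap by 15 frames, which is why a 40-frame video yields 25 clips rather than 2.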
We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results reported in the papers where they were first introduced, along with the corresponding training scripts.
| model | clip @ 1 |
| --- | --- |
| r3d_18 | 52.748 |
| mc3_18 | 53.898 |
| r2plus1d_18 | 57.498 |
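These architectures live under `torchvision.models.video`. A minimal forward-pass sketch (assuming torchvision 0.4+ is installed; weights are left random here to avoid a download, so pass the pre-trained weights to reproduce the accuracies above):

```python
import torch
import torchvision

# Construct the 18-layer 3D ResNet with randomly initialized weights.
model = torchvision.models.video.r3d_18()
model.eval()

# Video clips are 5D tensors: (batch, channels, frames, height, width).
# 16 frames at 112x112 matches the clip size used for these models.
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    logits = model(clip)

print(tuple(logits.shape))  # one logit per Kinetics-400 class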
Bugfixes
- Change aspect ratio calculation formula in `references/detection` (#1194)
- Bug fixes in ImageNet (#1149)
- Fix `save_image` when height or width equals 1 (#1059)
- Fix STL10 `__repr__` (#969)
- Fix wrong behavior of `GeneralizedRCNNTransform` in Python2 (#960)
Datasets
New
- Add USPS dataset (#961)(#1117)
- Added support for the QMNIST dataset (#995)
- Add HMDB51 and UCF101 datasets (#1156)
- Add Kinetics400 dataset (#1077)
Improvements
- Miscellaneous dataset fixes (#1174)
- Standardize str argument verification in datasets (#1167)
- Always pass `transform` and `target_transform` to abstract dataset (#1126)
- Remove duplicate transform assignment in FakeDataset (#1125)
- Automatic extraction for Cityscapes Dataset (#1066) (#1068)
- Use joint transform in Cityscapes (#1024)(#1045)
- CelebA: track attr names, support split="all", code cleanup (#1008)
- Add folds option to STL10 (#914)
Models
New
- Add pretrained Wide ResNet (#912)
- Memory efficient densenet (#1003) (#1090)
- Implementation of the MNASNet family of models (#829)(#1043)(#1092)
- Add VideoModelZoo models (#1130)
Improvements
- Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
- Add checks to `roi_heads` in detection module (#1091)
- Make shallow copy of input list in `GeneralizedRCNNTransform` (#1085)(#1111)(#1084)
- Make MobileNetV2 number of channels divisible by 8 (#1005)
- typo fix: ouput -> output in Inception and GoogleNet (#1034)
- Remove empty proposals from the RPN (#1026)
- Remove empty boxes before NMS (#1019)
- Reduce code duplication in segmentation models (#1009)
- allow user to define residual settings in MobileNetV2 (#965)
- Use `flatten` instead of `view` (#1134)
Documentation
- Consistency in detection box format (#1110)
- Fix Mask R-CNN docs (#1089)
- Add paper references to VGG and Resnet variants (#1088)
- Doc and test fixes in `Normalize` (#1063)
- Add transforms doc to more datasets (#1038)
- Corrected typo: 5 to 0.5 (#1041)
- Update doc for `torchvision.transforms.functional.perspective` (#1017)
- Improve documentation for `fillcolor` option in `RandomAffine` (#994)
- Fix `COCO_INSTANCE_CATEGORY_NAMES` (#991)
- Added models information to documentation (#985)
- Add missing import in `faster_rcnn.py` documentation (#979)
- Improve `make_grid` docs (#964)
Tests
- Add test for SVHN (#1086)
- Add tests for Cityscapes Dataset (#1079)
- Update CI to Python 3.6 (#1044)
- Make `test_save_image` more robust (#1037)
- Add a generic test for the datasets (#1015)
- moved fakedata generation to separate module (#1014)
- Create imagenet fakedata on-the-fly (#1012)
- Minor test refactorings (#1011)
- Add test for CIFAR10(0) (#1010)
- Mock MNIST download for less flaky tests (#1004)
- Add test for ImageNet (#976)(#1006)
- Add tests for datasets (#966)
Transforms
Improvements
- Allow 'F' mode for 1-channel FloatTensor in `ToPILImage` (#1100)
- Add shear parallel to y-axis (#1070)
- Fix error message in `to_tensor` (#1000)
- Fix TypeError in `RandomResizedCrop.get_params` (#1036)
- Fix `normalize` for `dtype` other than `float32` (#1021)
Ops
- Renamed `vision.h` files to `vision_cpu.h` and `vision_cuda.h` (#1051)(#1052)
- Optimize `nms_cuda` by avoiding extra `torch.cat` call (#945)
Reference scripts
- Expose data-path in the detection reference scripts (#1109)
- Make `utils.py` work with pytorch-cpu (#1023)
- Add mixed precision training with Apex (#972)(#1124)
- Add reference code for similarity learning (#1101)
Build
- Add windows build steps and wheel build scripts (#998)
- add packaging scripts (#996)
- Allow forcing GPU build with `FORCE_CUDA=1` (#927)