Video support, new datasets and models

@fmassa fmassa released this 08 Aug 15:09
· 13 commits to v0.4.0 since this release
d31eafa

This release adds support for video models and datasets, and brings several improvements.

Note: torchvision 0.4 requires PyTorch 1.2 or newer

Highlights

Video and IO

Video is now a first-class citizen in torchvision. The 0.4 release includes:

  • efficient IO primitives for reading and writing video files
  • Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with torch.utils.data.DataLoader
  • Pre-trained models for action recognition, trained on Kinetics-400
  • Training and evaluation scripts for reproducing the training results.

Writing your own video dataset is easy. We provide a utility class, VideoClips, that simplifies the task of enumerating all possible fixed-size clips in a list of video files by creating an index of all clips in a set of videos. It also lets you specify a fixed frame rate for the videos.

from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio
    
    def __len__(self):
        return self.video_clips.num_clips()

We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results of the original papers in which they were first introduced, as well as the corresponding training scripts.

model         clip @ 1 accuracy (%)
r3d_18        52.748
mc3_18        53.898
r2plus1d_18   57.498

Bugfixes

  • change aspect ratio calculation formula in references/detection (#1194)
  • bug fixes in ImageNet (#1149)
  • fix save_image when height or width equals 1 (#1059)
  • Fix STL10 __repr__ (#969)
  • Fix wrong behavior of GeneralizedRCNNTransform in Python2. (#960)

Datasets

New

  • Add USPS dataset (#961)(#1117)
  • Added support for the QMNIST dataset (#995)
  • Add HMDB51 and UCF101 datasets (#1156)
  • Add Kinetics400 dataset (#1077)

Improvements

  • Miscellaneous dataset fixes (#1174)
  • Standardize str argument verification in datasets (#1167)
  • Always pass transform and target_transform to abstract dataset (#1126)
  • Remove duplicate transform assignment in FakeDataset (#1125)
  • Automatic extraction for Cityscapes Dataset (#1066) (#1068)
  • Use joint transform in Cityscapes (#1024)(#1045)
  • CelebA: track attr names, support split="all", code cleanup (#1008)
  • Add folds option to STL10 (#914)

Models

New

  • Add pretrained Wide ResNet (#912)
  • Memory efficient densenet (#1003) (#1090)
  • Implementation of the MNASNet family of models (#829)(#1043)(#1092)
  • Add VideoModelZoo models (#1130)

Improvements

  • Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
  • Add checks to roi_heads in detection module (#1091)
  • Make shallow copy of input list in GeneralizedRCNNTransform (#1085)(#1111)(#1084)
  • Make MobileNetV2 number of channel divisible by 8 (#1005)
  • typo fix: ouput -> output in Inception and GoogleNet (#1034)
  • Remove empty proposals from the RPN (#1026)
  • Remove empty boxes before NMS (#1019)
  • Reduce code duplication in segmentation models (#1009)
  • allow user to define residual settings in MobileNetV2 (#965)
  • Use flatten instead of view (#1134)

Documentation

  • Consistency in detection box format (#1110)
  • Fix Mask R-CNN docs (#1089)
  • Add paper references to VGG and Resnet variants (#1088)
  • Doc, Test Fixes in Normalize (#1063)
  • Add transforms doc to more datasets (#1038)
  • Corrected typo: 5 to 0.5 (#1041)
  • Update doc for torchvision.transforms.functional.perspective (#1017)
  • Improve documentation for fillcolor option in RandomAffine (#994)
  • Fix COCO_INSTANCE_CATEGORY_NAMES (#991)
  • Added models information to documentation. (#985)
  • Add missing import in faster_rcnn.py documentation (#979)
  • Improve make_grid docs (#964)

Tests

  • Add test for SVHN (#1086)
  • Add tests for Cityscapes Dataset (#1079)
  • Update CI to Python 3.6 (#1044)
  • Make test_save_image more robust (#1037)
  • Add a generic test for the datasets (#1015)
  • moved fakedata generation to separate module (#1014)
  • Create imagenet fakedata on-the-fly (#1012)
  • Minor test refactorings (#1011)
  • Add test for CIFAR10(0) (#1010)
  • Mock MNIST download for less flaky tests (#1004)
  • Add test for ImageNet (#976)(#1006)
  • Add tests for datasets (#966)

Transforms

Improvements

  • Allowing 'F' mode for 1 channel FloatTensor in ToPILImage (#1100)
  • Add shear parallel to y-axis (#1070)
  • fix error message in to_tensor (#1000)
  • Fix TypeError in RandomResizedCrop.get_params (#1036)
  • Fix normalize for different dtype than float32 (#1021)

Ops

  • Renamed vision.h files to vision_cpu.h and vision_cuda.h (#1051)(#1052)
  • Optimize nms_cuda by avoiding extra torch.cat call (#945)

Reference scripts

  • Expose data-path in the detection reference scripts (#1109)
  • Make utils.py work with pytorch-cpu (#1023)
  • Add mixed precision training with Apex (#972)(#1124)
  • Add reference code for similarity learning (#1101)

Build

  • Add windows build steps and wheel build scripts (#998)
  • add packaging scripts (#996)
  • Allow forcing GPU build with FORCE_CUDA=1 (#927)

Misc

  • Misc lint fixes (#1020)
  • Reraise error on failed downloading (#1013)
  • add more hub models (#974)
  • make C extension lazy-import (#971)