Video support, new datasets and models
This release adds support for video models and datasets, and brings several improvements.
Note: torchvision 0.4 requires PyTorch 1.2 or newer
Highlights
Video and IO
Video is now a first-class citizen in torchvision. The 0.4 release includes:
- efficient IO primitives for reading and writing video files
- Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with `torch.utils.data.DataLoader`
- Pre-trained models for action recognition, trained on Kinetics-400
- Training and evaluation scripts for reproducing the training results.
Writing your own video dataset is easy. We provide a utility class, `VideoClips`, that simplifies the task of enumerating all possible fixed-size clips in a list of video files by creating an index of all clips in a set of videos. It additionally allows specifying a fixed frame rate for the videos.
```python
from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio

    def __len__(self):
        return self.video_clips.num_clips()
```
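The indexing idea behind `VideoClips` can be sketched in plain Python. This is an illustrative toy, not torchvision's actual implementation: the helper name `enumerate_clips` and the hard-coded frame counts are hypothetical, and the real class reads frame counts from the video files and can resample them to a fixed frame rate.

```python
def enumerate_clips(num_frames, clip_length_in_frames=16, frames_between_clips=1):
    """Return (start, end) frame ranges for every fixed-size clip in one video."""
    clips = []
    start = 0
    while start + clip_length_in_frames <= num_frames:
        clips.append((start, start + clip_length_in_frames))
        start += frames_between_clips
    return clips

# A flat index over all videos: one (video_index, start_frame, end_frame)
# entry per clip, so a single integer idx can address any clip in the set.
video_lengths = [40, 20]  # hypothetical frame counts for two videos
index = [(vi, s, e)
         for vi, n in enumerate(video_lengths)
         for s, e in enumerate_clips(n)]

print(len(index))  # 25 clips from the first video + 5 from the second = 30
```

With a stride of `frames_between_clips=1`, consecutive clips overlap by 15 frames, which is why a 40-frame video yields 25 clips rather than 2.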
We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results reported in the papers where they were first introduced, along with the corresponding training scripts.
| model | clip @ 1 |
| --- | --- |
| r3d_18 | 52.748 |
| mc3_18 | 53.898 |
| r2plus1d_18 | 57.498 |
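These architectures live under `torchvision.models.video`. A minimal forward-pass sketch (assuming torchvision 0.4+ is installed; weights are left random here to avoid a download, so pass the pre-trained weights to reproduce the accuracies above):

```python
import torch
import torchvision

# Construct the 18-layer 3D ResNet with randomly initialized weights.
model = torchvision.models.video.r3d_18()
model.eval()

# Video clips are 5D tensors: (batch, channels, frames, height, width).
# 16 frames at 112x112 matches the clip size used for these models.
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    logits = model(clip)

print(tuple(logits.shape))  # one logit per Kinetics-400 class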
Bugfixes
- Change aspect ratio calculation formula in `references/detection` (#1194)
- Bug fixes in ImageNet (#1149)
- Fix `save_image` when height or width equals 1 (#1059)
- Fix STL10 `__repr__` (#969)
- Fix wrong behavior of `GeneralizedRCNNTransform` in Python2 (#960)
Datasets
New
- Add USPS dataset (#961)(#1117)
- Added support for the QMNIST dataset (#995)
- Add HMDB51 and UCF101 datasets (#1156)
- Add Kinetics400 dataset (#1077)
Improvements
- Miscellaneous dataset fixes (#1174)
- Standardize str argument verification in datasets (#1167)
- Always pass `transform` and `target_transform` to abstract dataset (#1126)
- Remove duplicate transform assignment in FakeDataset (#1125)
- Automatic extraction for Cityscapes Dataset (#1066) (#1068)
- Use joint transform in Cityscapes (#1024)(#1045)
- CelebA: track attr names, support split="all", code cleanup (#1008)
- Add folds option to STL10 (#914)
Models
New
- Add pretrained Wide ResNet (#912)
- Memory efficient densenet (#1003) (#1090)
- Implementation of the MNASNet family of models (#829)(#1043)(#1092)
- Add VideoModelZoo models (#1130)
Improvements
- Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
- Add checks to `roi_heads` in detection module (#1091)
- Make shallow copy of input list in `GeneralizedRCNNTransform` (#1085)(#1111)(#1084)
- Make MobileNetV2 number of channels divisible by 8 (#1005)
- typo fix: ouput -> output in Inception and GoogleNet (#1034)
- Remove empty proposals from the RPN (#1026)
- Remove empty boxes before NMS (#1019)
- Reduce code duplication in segmentation models (#1009)
- allow user to define residual settings in MobileNetV2 (#965)
- Use `flatten` instead of `view` (#1134)
Documentation
- Consistency in detection box format (#1110)
- Fix Mask R-CNN docs (#1089)
- Add paper references to VGG and Resnet variants (#1088)
- Doc and test fixes in `Normalize` (#1063)
- Add transforms doc to more datasets (#1038)
- Corrected typo: 5 to 0.5 (#1041)
- Update doc for `torchvision.transforms.functional.perspective` (#1017)
- Improve documentation for `fillcolor` option in `RandomAffine` (#994)
- Fix `COCO_INSTANCE_CATEGORY_NAMES` (#991)
- Added models information to documentation (#985)
- Add missing import in `faster_rcnn.py` documentation (#979)
- Improve `make_grid` docs (#964)
Tests
- Add test for SVHN (#1086)
- Add tests for Cityscapes Dataset (#1079)
- Update CI to Python 3.6 (#1044)
- Make `test_save_image` more robust (#1037)
- Add a generic test for the datasets (#1015)
- moved fakedata generation to separate module (#1014)
- Create imagenet fakedata on-the-fly (#1012)
- Minor test refactorings (#1011)
- Add test for CIFAR10(0) (#1010)
- Mock MNIST download for less flaky tests (#1004)
- Add test for ImageNet (#976)(#1006)
- Add tests for datasets (#966)
Transforms
Improvements
- Allow 'F' mode for 1-channel FloatTensor in `ToPILImage` (#1100)
- Add shear parallel to y-axis (#1070)
- Fix error message in `to_tensor` (#1000)
- Fix TypeError in `RandomResizedCrop.get_params` (#1036)
- Fix `normalize` for `dtype` other than `float32` (#1021)
Ops
- Renamed `vision.h` files to `vision_cpu.h` and `vision_cuda.h` (#1051)(#1052)
- Optimize `nms_cuda` by avoiding extra `torch.cat` call (#945)
Reference scripts
- Expose data-path in the detection reference scripts (#1109)
- Make `utils.py` work with pytorch-cpu (#1023)
- Add mixed precision training with Apex (#972)(#1124)
- Add reference code for similarity learning (#1101)
Build
- Add windows build steps and wheel build scripts (#998)
- add packaging scripts (#996)
- Allow forcing GPU build with `FORCE_CUDA=1` (#927)