All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Support for custom media types, new `PointCloud` media type, `DatasetItem.media` and `.media_as(type)` members (openvinotoolkit#539)
- [API] A way to request dataset and extractor media type with `media_type` (openvinotoolkit#539)
- BraTS format (import-only) (.npy and .nii.gz), new `MultiframeImage` media type (openvinotoolkit#628)
- Common Semantic Segmentation dataset format (import-only) (openvinotoolkit#685)
- An option to disable `data/` prefix inclusion in YOLO export (openvinotoolkit#689)
- New command `describe-downloads` to print information about downloadable datasets (openvinotoolkit#678)
- Detection for Cityscapes format (openvinotoolkit#680)
- Maximum recursion `--depth` parameter for `detect-dataset` CLI command (openvinotoolkit#680)
- An option to save a single subset in the `download` command (openvinotoolkit#697)
- Common Super Resolution dataset format (import-only) (openvinotoolkit#700)
- Kinetics 400/600/700 dataset format (import-only) (openvinotoolkit#706)
- NYU Depth Dataset V2 format (import-only) (openvinotoolkit#712)
- Skeleton annotation type (#6, #14, #15)
- Storing labels with the same name but with a different parent (#8)
- Functions to work with plain polygons (COCO-style) - `close_polygon`, `simplify_polygon` (#39)
- An option to specify scale factor in `resize` transform (#46)
- Skeleton support in datumaro format (#47)
- Support for Ultralytics YOLO formats (#50)
- Support for Ultralytics YOLO Classification format (#59)
- YOLO formats now merge default subset and train subset if both are present (#71)
- Support for tracks in Ultralytics YOLO formats (#70)
- `env.detect_dataset()` now returns a list of detected formats at all recursion levels instead of just the lowest one (openvinotoolkit#680)
- Open Images: allowed to store annotations file in root path as well (openvinotoolkit#680)
- Improved parsing error messages in COCO, VOC and YOLO formats (openvinotoolkit#684, openvinotoolkit#686, openvinotoolkit#687)
- YOLO format now supports almost any subset names, except `backup`, `names` and `classes` (instead of just `train` and `valid`). The reserved names now raise an error on exporting. (openvinotoolkit#688)
- [CLI] Removed the `--all` flag in `datum info`, added the `--json` flag, added `format` and `media type` fields in the `info` command output (#5)
- Item id in MOT format (#17)
- Annotation matching algorithm in `datumaro.components.operations.match_segments()` (#30)
- Automatic detection of `is_crowd` parameter is disabled in `segment_iou()`, added a separate function argument (turned off by default) (#41)
- `--save-images` is replaced with `--save-media` in CLI and converter API (openvinotoolkit#539)
- [API] `image`, `point_cloud` and `related_images` of `DatasetItem` are replaced with `media` and `media_as(type)` members and constructor parameters (openvinotoolkit#539)
- [API] `datumaro.util.annotation_util._get_bbox()` is renamed to `get_bbox()` (#41)
- TBD
- Collision between parents and names in LabelCategories (#51)
- Detection for LFW format (openvinotoolkit#680)
- Export of masks with background class with id != 0 in the VOC, KITTI and Cityscapes formats (#9, #16)
- Missing comparison of the base class attributes in the `Mask` class (#28)
- Image stats when no image info available for some images in the dataset (#29)
- Incorrect writing of `media` field in the Datumaro format, when there are specific media fields (#34)
- Added missing `PointCloud` media type in the datumaro module namespace (#34)
- Incorrect computation of binary mask bbox (missed 1 pixel of the size) (#41)
- `Dataset.get()` could ignore existing transforms in the dataset (#45)
- Failing `resize` transform for RLE masks (#46)
- TBD
- Ability to import a video as frames with the `video_frames` format and to split a video into frames with the `datum util split_video` command (openvinotoolkit#555)
- `--subset` parameter in the `image_dir` format (openvinotoolkit#555)
- `MediaManager` API to control loaded media resources at runtime (openvinotoolkit#555)
- Command to detect the format of a dataset (openvinotoolkit#576)
- More comfortable access to library API via `import datumaro` (openvinotoolkit#630)
- CLI command-like free functions (`export`, `transform`, ...) (openvinotoolkit#630)
- Reading specific annotation files for train dataset in Cityscapes (openvinotoolkit#632)
- Random sampling transforms (`random_sampler`, `label_random_sampler`) to create smaller datasets from bigger ones (openvinotoolkit#636, openvinotoolkit#640)
- API to report dataset import and export progress; API to report dataset import and export errors and take action (skip, fail) (supported in COCO, VOC and YOLO formats) (openvinotoolkit#650)
- Support for downloading the ImageNetV2 and COCO datasets (openvinotoolkit#653, openvinotoolkit#659)
- A way for formats to signal that they don't support detection (openvinotoolkit#665)
- Removal transforms to remove items/annotations/attributes from dataset (`remove_items`, `remove_annotations`, `remove_attributes`) (openvinotoolkit#670)
- Allowed direct file paths in `datum import`. Such sources are imported like when the `rpath` parameter is specified, however, only the selected path is copied into the project (openvinotoolkit#555)
- Improved `stats` performance, added new filtering parameters, image stats (`unique`, `repeated`) moved to the `dataset` section, removed `mean` and `std` from the `dataset` section (openvinotoolkit#621)
- Allowed `Image` creation from just `size` info (openvinotoolkit#634)
- Added image search in VOC XML-based subformats (openvinotoolkit#634)
- Added image path equality checks in simple merge, when applicable (openvinotoolkit#634)
- Supported saving box attributes when downloading the TFDS version of VOC (openvinotoolkit#668)
- Switched to a `pyproject.toml`-based build (openvinotoolkit#671)
- TBD
- Official support of Python 3.6 (due to its EOL) (openvinotoolkit#617)
- Backward compatibility annotation symbols in `components.extractor` (openvinotoolkit#630)
- Prohibited calling `add`, `import` and `export` commands without a project (openvinotoolkit#555)
- Calling `make_dataset` on empty project tree now produces the error properly (openvinotoolkit#555)
- Saving (overwriting) a dataset in a project when rpath is used (openvinotoolkit#613)
- Output image extension preserving in the `Resize` transform (openvinotoolkit#606)
- Memory overuse in the `Resize` transform (openvinotoolkit#607)
- Invalid image pixels produced by the `Resize` transform (openvinotoolkit#618)
- Numeric warnings that sometimes occurred in `stats` command (e.g. openvinotoolkit#607) (openvinotoolkit#621)
- Added missing item attribute merging in simple merge (openvinotoolkit#634)
- Inability to disambiguate VOC from LabelMe in some cases (openvinotoolkit#658)
- TBD
- Command to download public datasets (openvinotoolkit#582)
- Extension autodetection in `ByteImage` (openvinotoolkit#595)
- MPII Human Pose Dataset (import-only) (.mat and .json) (openvinotoolkit#584)
- MARS format (import-only) (openvinotoolkit#585)
- The `pycocotools` dependency lower bound is raised to `2.0.4`. (openvinotoolkit#449)
- `smooth_line` from `datumaro.util.annotation_util` - the function is renamed to `approximate_line` and has updated interface (openvinotoolkit#592)
- Python 3.6 support
- TBD
- Fails in multimerge when lines are not approximated and when there are no label categories (openvinotoolkit#592)
- Cannot convert LabelMe dataset, that has no subsets (openvinotoolkit#600)
- TBD
- Video reading API (openvinotoolkit#521)
- Python API documentation (openvinotoolkit#526)
- Mapillary Vistas dataset format (Import-only) (openvinotoolkit#537)
- Datumaro can now be installed on Windows on Python 3.9 (openvinotoolkit#547)
- Import for SYNTHIA dataset format (openvinotoolkit#532)
- Support of `score` attribute in KITTI detection (openvinotoolkit#571)
- Support for Accuracy Checker dataset meta files in formats (openvinotoolkit#553, openvinotoolkit#569, openvinotoolkit#575)
- Import for VoTT dataset format (openvinotoolkit#573)
- Image resizing transform (openvinotoolkit#581)
- The following formats can now be detected unambiguously: `ade20k2017`, `ade20k2020`, `camvid`, `coco`, `cvat`, `datumaro`, `icdar_text_localization`, `icdar_text_segmentation`, `icdar_word_recognition`, `imagenet_txt`, `kitti_raw`, `label_me`, `lfw`, `mot_seq`, `open_images`, `vgg_face2`, `voc`, `widerface`, `yolo` (openvinotoolkit#531, openvinotoolkit#536, openvinotoolkit#550, openvinotoolkit#557, openvinotoolkit#558)
- Allowed Pytest-native tests (openvinotoolkit#563)
- Allowed export options in the `datum merge` command (openvinotoolkit#545)
- Using `Image`, `ByteImage` from `datumaro.util.image` - these classes are moved to `datumaro.components.media` (openvinotoolkit#538)
- Equality comparison support between `datumaro.components.media.Image` and `numpy.ndarray` (openvinotoolkit#568)
- Bug #560: import issue with MOT dataset when using seqinfo.ini file (openvinotoolkit#564)
- Empty lines in VOC subset lists are not ignored (openvinotoolkit#587)
- TBD
- Import for CelebA dataset format. (openvinotoolkit#484)
- File `people.txt` became optional in LFW (openvinotoolkit#509)
- File `image_ids_and_rotation.csv` became optional in Open Images (openvinotoolkit#509)
- Allowed underscores (`_`) in subset names in COCO (openvinotoolkit#509)
- Allowed annotation files with arbitrary names in COCO (openvinotoolkit#509)
- The `icdar_text_localization` format is no longer detected in every directory (openvinotoolkit#531)
- Updated `pycocotools` version to 2.0.2 (openvinotoolkit#534)
- TBD
- TBD
- Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (openvinotoolkit#530)
- Exporting dataset without `color` attribute into the `icdar_text_segmentation` format (openvinotoolkit#556)
- TBD
- A new installation target: `pip install datumaro[default]`, which should be used by default. The simple `datumaro` is intended for library users. (openvinotoolkit#238)
- Dataset and project versioning capabilities (Git-like) (openvinotoolkit#238)
- "dataset revpath" concept in CLI, allowing to pass a dataset path with the dataset format in `diff`, `merge`, `explain` and `info` CLI commands (openvinotoolkit#238)
- `import`, `remove`, `commit`, `checkout`, `log`, `status`, `info` CLI commands (openvinotoolkit#238)
- `Coco*Extractor` classes now have an option to preserve label IDs from the original annotation file (openvinotoolkit#453)
- `patch` CLI command to patch datasets (openvinotoolkit#401)
- `ProjectLabels` transform to change dataset labels for merging etc. (openvinotoolkit#401, openvinotoolkit#478)
- Support for custom labels in the KITTI detection format (openvinotoolkit#481)
- Type annotations and docs for Annotation classes (openvinotoolkit#493)
- Options to control label loading behavior in `imagenet_txt` import (openvinotoolkit#434, openvinotoolkit#489)
- A project can contain and manage multiple datasets instead of a single one. CLI operations can be applied to the whole project, or to separate datasets. Datasets are modified inplace, by default (openvinotoolkit#328)
- CLI help for builtin plugins doesn't require project (openvinotoolkit#328)
- Annotation-related classes were moved into a new module, `datumaro.components.annotation` (openvinotoolkit#439)
- Rollback utilities replaced with Scope utilities (openvinotoolkit#444)
- The `Project` class from `datumaro.components` is changed completely (openvinotoolkit#238)
- `diff` and `ediff` are joined into a single `diff` CLI command (openvinotoolkit#238)
- Projects use new file layout, incompatible with old projects. An old project can be updated with `datum project migrate` (openvinotoolkit#238)
- Inheriting `CliPlugin` is not required in plugin classes (openvinotoolkit#238)
- `Importer`s do not create `Project`s anymore and just return a list of extractor configurations (openvinotoolkit#238)
- TBD
- `import`, `project merge` CLI commands (openvinotoolkit#238)
- Support for project hierarchies. A project cannot be a source anymore (openvinotoolkit#238)
- Project cannot have independent internal dataset anymore. All the project data must be stored in the project data sources (openvinotoolkit#238)
- `datumaro_project` format (openvinotoolkit#238)
- Unused `path` field of `DatasetItem` (openvinotoolkit#455)
- Deprecation warning in `open_images_format.py` (openvinotoolkit#440)
- `lazy_image` returning unrelated data sometimes (openvinotoolkit#409)
- Invalid call to `pycocotools.mask.iou` (openvinotoolkit#450)
- Importing of Open Images datasets without image data (openvinotoolkit#463)
- Return value type in `Dataset.is_modified` (openvinotoolkit#401)
- Remapping of secondary categories in `RemapLabels` (openvinotoolkit#401)
- VOC dataset patching for classification and segmentation tasks (openvinotoolkit#478)
- Exported mask label ids in KITTI segmentation (openvinotoolkit#481)
- Missing `label` for `Points` read in the LFW format (openvinotoolkit#494)
- TBD
- The Open Images format now supports bounding box and segmentation mask annotations (openvinotoolkit#352, openvinotoolkit#388).
- Bounding boxes values decrement transform (openvinotoolkit#366)
- Improved error reporting in `Dataset` (openvinotoolkit#386)
- Support ADE20K format (import only) (openvinotoolkit#400)
- Documentation website at https://openvinotoolkit.github.io/datumaro (openvinotoolkit#420)
- Datumaro no longer depends on scikit-image (openvinotoolkit#379)
- `Dataset` remembers export options on saving / exporting for the first time (openvinotoolkit#386)
- TBD
- TBD
- Application of `remap_labels` to dataset categories of different length (openvinotoolkit#314)
- Patching of datasets in formats (openvinotoolkit#348)
- Improved Cityscapes export performance (openvinotoolkit#367)
- Incorrect format of `*_labelIds.png` in Cityscapes export (openvinotoolkit#325, openvinotoolkit#342)
- Item id in ImageNet format (openvinotoolkit#371)
- Double quotes for ICDAR Word Recognition (openvinotoolkit#375)
- Wrong display of builtin formats in CLI (openvinotoolkit#332)
- Non utf-8 encoding of annotation files in Market-1501 export (openvinotoolkit#392)
- Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (openvinotoolkit#392)
- Saving of images with Unicode paths on Windows (openvinotoolkit#392)
- Calling `ProjectDataset.transform()` with a string argument (openvinotoolkit#402)
- Attributes casting for CVAT format (openvinotoolkit#403)
- Loading of custom project plugins (openvinotoolkit#404)
- Reading, writing anno file and saving name of the subset for test subset (openvinotoolkit#447)
- Fixed unsafe unpickling in CIFAR import (openvinotoolkit#362)
- Support for import/export zip archives with images (openvinotoolkit#273)
- Subformat importers for VOC and COCO (openvinotoolkit#281)
- Support for KITTI dataset segmentation and detection format (openvinotoolkit#282)
- Updated YOLO format user manual (openvinotoolkit#295)
- `ItemTransform` class, which describes item-wise dataset `Transform`s (openvinotoolkit#297)
- `keep-empty` export parameter in VOC format (openvinotoolkit#297)
- A base class for dataset validation plugins (openvinotoolkit#299)
- Partial support for the Open Images format; only images and image-level labels can be read/written (openvinotoolkit#291, openvinotoolkit#315).
- Support for Supervisely Point Cloud dataset format (openvinotoolkit#245, openvinotoolkit#353)
- Support for KITTI Raw / Velodyne Points dataset format (openvinotoolkit#245)
- Support for CIFAR-100 and documentation for CIFAR-10/100 (openvinotoolkit#301)
- Tensorflow AVX check is made optional in API and disabled by default (openvinotoolkit#305)
- Extensions for images in ImageNet_txt are now mandatory (openvinotoolkit#302)
- Several dependencies now have lower bounds (openvinotoolkit#308)
- TBD
- TBD
- Incorrect image layout on saving and a problem with encoding on loading (openvinotoolkit#284)
- An error when XPath filter is applied to the dataset or its subset (openvinotoolkit#259)
- Tracking of `Dataset` changes done by transforms (openvinotoolkit#297)
- Improved CLI startup time in several cases (openvinotoolkit#306)
- Known issue: loading CIFAR can result in arbitrary code execution (openvinotoolkit#327)
- Support for escaping in attribute values in LabelMe format (openvinotoolkit#49)
- Support for Segmentation Splitting (openvinotoolkit#223)
- Support for CIFAR-10/100 dataset format (openvinotoolkit#225, openvinotoolkit#243)
- Support for COCO panoptic and stuff format (openvinotoolkit#210)
- Documentation file and integration tests for Pascal VOC format (openvinotoolkit#228)
- Support for MNIST and MNIST in CSV dataset formats (openvinotoolkit#234)
- Documentation file for COCO format (openvinotoolkit#241)
- Documentation file and integration tests for YOLO format (openvinotoolkit#246)
- Support for Cityscapes dataset format (openvinotoolkit#249)
- Support for Validator configurable threshold (openvinotoolkit#250)
- LabelMe format saves dataset items with their relative paths by subsets without changing names (openvinotoolkit#200)
- Allowed arbitrary subset count and names in classification and detection splitters (openvinotoolkit#207)
- Annotation-less dataset elements now participate in subset splitting (openvinotoolkit#211)
- Classification task in LFW dataset format (openvinotoolkit#222)
- Testing is now performed with pytest instead of unittest (openvinotoolkit#248)
- TBD
- TBD
- Added support for auto-merging (joining) of datasets with no labels and having labels (openvinotoolkit#200)
- Allowed explicit label removal in `remap_labels` transform (openvinotoolkit#203)
- Image extension in CVAT format export (openvinotoolkit#214)
- Added a label "face" for bounding boxes in Wider Face (openvinotoolkit#215)
- Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (openvinotoolkit#216)
- Empty lines in YOLO annotations are ignored (openvinotoolkit#221)
- Export in VOC format when no image info is available (openvinotoolkit#239)
- Fixed saving attribute in WiderFace extractor (openvinotoolkit#251)
- TBD
- TBD
- Added an option to allow undeclared annotation attributes in CVAT format export (openvinotoolkit#192)
- COCO exports images in separate dirs by subsets. Added an option to control this (openvinotoolkit#195)
- TBD
- TBD
- Instance masks of `background` class no longer introduce an instance (openvinotoolkit#188)
- Added support for label attributes in Datumaro format (openvinotoolkit#192)
- TBD
- OpenVINO plugin examples (openvinotoolkit#159)
- Dataset validation for classification and detection datasets (openvinotoolkit#160)
- Arbitrary image extensions in formats (import and export) (openvinotoolkit#166)
- Ability to set a custom subset name for an imported dataset (openvinotoolkit#166)
- CLI support for NDR (openvinotoolkit#178)
- Common ICDAR format is split into 3 sub-formats (openvinotoolkit#174)
- TBD
- TBD
- The ability to work with file names containing Cyrillic and spaces (openvinotoolkit#148)
- Image reading and saving in ICDAR formats (openvinotoolkit#174)
- Unnecessary image loading on dataset saving (openvinotoolkit#176)
- Allowed spaces in ICDAR captions (openvinotoolkit#182)
- Saving of masks in VOC when masks are not requested (openvinotoolkit#184)
- TBD
- TBD
- TBD
- TBD
- TBD
- Images with no annotations are exported again in VOC formats (openvinotoolkit#123)
- Inference result for only one output layer in OpenVINO launcher (openvinotoolkit#125)
- TBD
- `Icdar13/15` dataset format (openvinotoolkit#96)
- Laziness, source caching, tracking of changes and partial updating for `Dataset` (openvinotoolkit#102)
- `Market-1501` dataset format (openvinotoolkit#108)
- `LFW` dataset format (openvinotoolkit#110)
- Support of polygons' and masks' confusion matrices and mismatching classes in `diff` command (openvinotoolkit#117)
- Add near duplicate image removal plugin (openvinotoolkit#113)
- Sampler Plugin that analyzes inference result from the given dataset and selects samples for annotation (openvinotoolkit#115)
- OpenVINO model launcher is updated for OpenVINO r2021.1 (openvinotoolkit#100)
- TBD
- TBD
- High memory consumption and low performance of mask import/export, #53 (openvinotoolkit#101)
- Masks, covered by class 0 (background), should be exported with holes inside (openvinotoolkit#104)
- `diff` command invocation problem with missing class methods (openvinotoolkit#117)
- TBD
- `WiderFace` dataset format (openvinotoolkit#65, openvinotoolkit#90)
- Function to transform annotations to labels (openvinotoolkit#66)
- Dataset splits for classification, detection and re-id tasks (openvinotoolkit#68, openvinotoolkit#81)
- `VGGFace2` dataset format (openvinotoolkit#69, openvinotoolkit#82)
- Unique image count statistic (openvinotoolkit#87)
- Installation with pip by name `datumaro`
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (openvinotoolkit#71)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (openvinotoolkit#71)
- `datum project ...` commands replaced with `datum ...` commands (openvinotoolkit#84)
- Supported more image formats in `ImageNet` extractors (openvinotoolkit#85)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (openvinotoolkit#86)
- Added max search depth in `ImageDir` format and importers (openvinotoolkit#86)
- `datum project ...` CLI context (openvinotoolkit#84)
- TBD
- Allow plugins inherited from `Extractor` (instead of only `SourceExtractor`) (openvinotoolkit#70)
- Windows installation with `pip` for `pycocotools` (openvinotoolkit#73)
- `YOLO` extractor path matching on Windows (openvinotoolkit#73)
- Fixed inplace file copying when saving images (openvinotoolkit#76)
- Fixed `labelmap` parameter type checking in `VOC` converter (openvinotoolkit#76)
- Fixed model copying on addition in CLI (openvinotoolkit#94)
- TBD
- `CamVid` dataset format (openvinotoolkit#57)
- Ability to install `opencv-python-headless` dependency with `DATUMARO_HEADLESS=1` environment variable instead of `opencv-python` (openvinotoolkit#62)
- Allow empty supercategory in COCO (openvinotoolkit#54)
- Allow Pascal VOC to search in subdirectories (openvinotoolkit#50)
- TBD
- TBD
- TBD
- TBD
- `ImageNet` and `ImageNetTxt` dataset formats (openvinotoolkit#41)
- TBD
- TBD
- TBD
- Default `label-map` parameter value for VOC converter (openvinotoolkit#34)
- Randomness of random split transform (openvinotoolkit#38)
- `Transform.subsets()` method (openvinotoolkit#38)
- Supported unknown image formats in TF Detection API converter (openvinotoolkit#40)
- Supported empty attribute values in CVAT extractor (openvinotoolkit#45)
- TBD
- `ByteImage` class to represent encoded images in memory and avoid recoding on save (openvinotoolkit#27)
- Implementation of format plugins simplified (openvinotoolkit#22)
- `default` is now a default subset name, instead of `None`. The values are interchangeable. (openvinotoolkit#22)
- Improved performance of transforms (openvinotoolkit#22)
- TBD
- `image/depth` value from VOC export (openvinotoolkit#27)
- Zero division errors in dataset statistics (openvinotoolkit#31)
- TBD
- `reindex` option in COCO and CVAT converters (openvinotoolkit#18)
- Support for relative paths in LabelMe format (openvinotoolkit#19)
- MOTS png mask format support (https://github.com/openvinotoolkit/datumaro/21)
- TBD
- TBD
- TBD
- TBD
- TBD
- Initial release
## [Unreleased]
### Added
- TBD
### Changed
- TBD
### Deprecated
- TBD
### Removed
- TBD
### Fixed
- TBD
### Security
- TBD