Update changelog and docs (cvat-ai#98)
* Update changelog

* Update docs
Maxim Zhiltsov authored Jan 23, 2021
1 parent 30c0648 commit e1ed5f6
Showing 5 changed files with 66 additions and 41 deletions.
8 changes: 4 additions & 4 deletions CHANGELOG.md
@@ -6,20 +6,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


## 01/19/2021 - Release v0.1.5
## 01/23/2021 - Release v0.1.5
### Added
- `WiderFace` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/65>, <https://github.com/openvinotoolkit/datumaro/pull/90>)
- Function to transform annotations to labels (<https://github.com/openvinotoolkit/datumaro/pull/66>)
- Task-specific Splitter (<https://github.com/openvinotoolkit/datumaro/pull/68>, <https://github.com/openvinotoolkit/datumaro/pull/81>)
- Dataset splits for classification, detection and re-id tasks (<https://github.com/openvinotoolkit/datumaro/pull/68>, <https://github.com/openvinotoolkit/datumaro/pull/81>)
- `VGGFace2` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/69>, <https://github.com/openvinotoolkit/datumaro/pull/82>)
- Unique image count statistic (<https://github.com/openvinotoolkit/datumaro/pull/87>)
- Installation with pip by name `datumaro`

### Changed
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- `Dataset` operations return `Dataset` instances, allowing operations to be chained (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- `datum project ...` commands replaced with `datum ...` commands (<https://github.com/openvinotoolkit/datumaro/pull/84>)
- Supported more image formats in `ImageNet` extractor (<https://github.com/openvinotoolkit/datumaro/pull/85>)
- Supported more image formats in `ImageNet` extractors (<https://github.com/openvinotoolkit/datumaro/pull/85>)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (<https://github.com/openvinotoolkit/datumaro/pull/86>)
- Added max search depth in `ImageDir` format and importers (<https://github.com/openvinotoolkit/datumaro/pull/86>)

9 changes: 5 additions & 4 deletions README.md
@@ -122,7 +122,7 @@ CVAT annotations ---> Publication, statistics etc.

[(Back to top)](#table-of-contents)

- Dataset reading, writing, conversion in any direction. Supported formats:
- Dataset reading, writing, conversion in any direction. [Supported formats](docs/user_manual.md#supported-formats):
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)
@@ -188,7 +188,7 @@ python -m virtualenv venv
Install Datumaro package:

``` bash
pip install 'git+https://github.com/openvinotoolkit/datumaro'
pip install datumaro
```

## Usage
@@ -234,13 +234,14 @@ dataset = dataset.transform(project.env.transforms.get('remap_labels'),
{'cat': 'dog', # rename cat to dog
'truck': 'car', # rename truck to car
'person': '', # remove this label
}, default='delete')
}, default='delete') # remove everything else

# iterate over dataset elements
for item in dataset:
    print(item.id, item.annotations)

# export the resulting dataset in COCO format
project.env.converters.get('coco').convert(dataset, save_dir='dst/dir')
dataset.export('dst/dir', 'coco')
```
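
The label remapping rules spelled out in the comments above (rename, remove on an empty target, and drop everything unlisted when `default='delete'`) can be illustrated in plain Python. This is a minimal sketch, not Datumaro's implementation: `remap_label` is a hypothetical helper, and the `'keep'` fallback value is an assumption made for illustration.

```python
def remap_label(label, mapping, default="keep"):
    """Return the new label name, or None if the label should be dropped.

    Hypothetical helper: it mirrors the rules in the README comments,
    not Datumaro's own code.
    """
    if label in mapping:
        target = mapping[label]
        return target or None  # an empty string means "remove this label"
    if default == "delete":
        return None  # unlisted labels are removed
    return label  # assumed 'keep' behaviour: unlisted labels pass through


mapping = {"cat": "dog", "truck": "car", "person": ""}
labels = ["cat", "truck", "person", "bicycle"]
remapped = [remap_label(name, mapping, default="delete") for name in labels]
print(remapped)  # ['dog', 'car', None, None]
```

Here `bicycle` is dropped only because `default='delete'` is passed; with the assumed `'keep'` fallback it would come through unchanged.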

> Check our [developer guide](docs/developer_guide.md) for additional information.
17 changes: 10 additions & 7 deletions docs/design.md
@@ -73,9 +73,11 @@ Datumaro is:

## RC 1 vision

In the first version Datumaro should be a project manager for CVAT.
It should only consume data from CVAT. The collected dataset
can be downloaded by user to be operated on with Datumaro CLI.
*CVAT integration*

Datumaro needs to be integrated with [CVAT](https://github.com/openvinotoolkit/cvat),
extending CVAT UI capabilities regarding task and project operations.
It should be capable of downloading and processing data from CVAT.

<!--lint disable fenced-code-flag-->
```
@@ -94,6 +96,7 @@ can be downloaded by user to be operated on with Datumaro CLI.

- [x] Python API for user code
- [x] Installation as a package
- [x] Installation with `pip` by name
- [x] A command-line tool for dataset manipulations

### Features
@@ -106,7 +109,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
- [x] YOLO
- [x] TF Detection API
- [ ] Cityscapes
- [ ] ImageNet
- [x] ImageNet

- Dataset visualization (`show`)
- [ ] Ability to visualize a dataset
@@ -117,7 +120,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
- [x] Object counts (detection scenario)
- [x] Image-Class distribution (classification scenario)
- [x] Pixel-Class distribution (segmentation scenario)
- [ ] Image similarity clusters
- [x] Image similarity clusters
- [ ] Custom statistics

- Dataset building
@@ -164,7 +167,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
### Optional features

- Dataset publishing
- [ ] Versioning (for annotations, subsets, sources, etc.)
- [x] Versioning (for annotations, subsets, sources, etc.)
- [ ] Blur sensitive areas on images
- [ ] Tracking of legal information
- [ ] Documentation generation
@@ -175,7 +178,7 @@ can be downloaded by user to be operated on with Datumaro CLI.

- Dataset and model debugging
- [ ] Training visualization
- [ ] Inference explanation (`explain`)
- [x] Inference explanation (`explain`)
- [ ] White-box approach

### Properties
67 changes: 42 additions & 25 deletions docs/developer_guide.md
@@ -38,28 +38,27 @@ Datumaro has a number of dataset and annotation features:
- various annotation operations

```python
from datumaro.components.project import Environment, Dataset
from datumaro.components.dataset import Dataset
from datumaro.components.extractor import Bbox, Polygon, DatasetItem

# Import and save a dataset
env = Environment()
dataset = env.make_importer('voc')('src/dir').make_dataset()
env.converters.get('coco').convert(dataset, save_dir='dst/dir')
# Import and export a dataset
dataset = Dataset.import_from('src/dir', 'voc')
dataset.export('dst/dir', 'coco')

# Create a dataset, convert polygons to masks, save in PASCAL VOC format
dataset = Dataset.from_iterable([
DatasetItem(id='image1', annotations=[
Bbox(x=1, y=2, w=3, h=4, label=1),
Polygon([1, 2, 3, 2, 4, 4], label=2, attributes={'occluded': True}),
]),
DatasetItem(id='image1', annotations=[
Bbox(x=1, y=2, w=3, h=4, label=1),
Polygon([1, 2, 3, 2, 4, 4], label=2, attributes={'occluded': True}),
]),
], categories=['cat', 'dog', 'person'])
dataset = dataset.transform(env.transforms.get('polygons_to_masks'))
env.converters.get('voc').convert(dataset, save_dir='dst/dir')
dataset = dataset.transform('polygons_to_masks')
dataset.export('dst/dir', 'voc')
```

### The Dataset class

The `Dataset` class from the `datumaro.components.project` module represents
The `Dataset` class from the `datumaro.components.dataset` module represents
a dataset, consisting of multiple `DatasetItem`s. Annotations are
represented by members of the `datumaro.components.extractor` module,
such as `Label`, `Mask` or `Polygon`. A dataset can contain items from one or
@@ -80,16 +79,19 @@ The main operation for a dataset is iteration over its elements.
An item corresponds to a single image, a video sequence, etc. There are also
a few other operations available, such as filtration (`dataset.select`) and
transformations (`dataset.transform`). A dataset can be created from extractors
or other datasets with `dataset.from_extractors` and directly from items with
`dataset.from_iterable`. A dataset is an extractor itself. If it is created from
multiple extractors, their categories must match, and their contents will be
merged.
or other datasets with `Dataset.from_extractors()` and directly from items with
`Dataset.from_iterable()`. A dataset is an extractor itself. If it is created
from multiple extractors, their categories must match, and their contents
will be merged.
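
The merge rule described above (categories must match, contents are merged) can be sketched without Datumaro itself. This is an illustration under stated assumptions, not the library's code: `merge_sources` is a hypothetical function, each source is modelled as a `(categories, items)` pair, and combining the annotations of items that share an `id` is an assumption about what "merged" means.

```python
def merge_sources(*sources):
    """Merge (categories, items) pairs into one (categories, items) pair.

    Hypothetical stand-in for the behaviour described in the text; each
    item is a plain dict {'id': str, 'annotations': list}.
    """
    if not sources:
        return [], []
    categories = sources[0][0]
    merged = {}
    for source_categories, items in sources:
        if source_categories != categories:
            raise ValueError("categories of all sources must match")
        for item in items:
            # Assumption: items sharing an id have their annotations combined.
            entry = merged.setdefault(
                item["id"], {"id": item["id"], "annotations": []})
            entry["annotations"].extend(item["annotations"])
    return categories, list(merged.values())


cats = ["cat", "dog"]
first = (cats, [{"id": "img1", "annotations": ["bbox"]}])
second = (cats, [{"id": "img1", "annotations": ["polygon"]},
                 {"id": "img2", "annotations": []}])
categories, items = merge_sources(first, second)
print([item["id"] for item in items])  # ['img1', 'img2']
```

Passing sources with different category lists raises `ValueError`, which is how this sketch models the "categories must match" requirement.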

A dataset item is an element of a dataset. Its `id` is the name of the
corresponding image. An item can also have image `attributes`,
an `image` and `annotations`.

```python
from datumaro.components.dataset import Dataset
from datumaro.components.extractor import Bbox, Polygon, DatasetItem

# create a dataset from other datasets
dataset = Dataset.from_extractors(dataset1, dataset2)

@@ -105,7 +107,7 @@ dataset = Dataset.from_iterable([
dataset = dataset.select(lambda item: len(item.annotations) != 0)

# change dataset labels
dataset = dataset.transform(project.env.transforms.get('remap_labels'),
dataset = dataset.transform('remap_labels',
{'cat': 'dog', # rename cat to dog
'truck': 'car', # rename truck to car
'person': '', # remove this label
@@ -116,8 +118,7 @@ for item in dataset:
    print(item.id, item.annotations)

# iterate over subsets
for subset_name in dataset.subsets():
    subset = dataset.get_subset(subset_name) # a dataset, again
for subset_name, subset in dataset.subsets().items():
    for item in subset:
        print(item.id, item.annotations)
```
@@ -129,6 +130,7 @@ persistence, of extending, and CLI operation for Datasets. A project can
be converted to a Dataset with `project.make_dataset`. Project datasets
can have multiple data sources, which are merged on dataset creation. They
can have a hierarchy. Project configuration is available in `project.config`.
A dataset can be saved in `datumaro_project` format.

The `Environment` class is responsible for accessing built-in and
project-specific plugins. For a project, there is an instance of
@@ -204,11 +206,12 @@ YoloConverter.convert(dataset, save_dir=dst_dir)

### Writing a plugin

A plugin is a Python module with any name, which exports some symbols.
To export a symbol, inherit it from one of special classes:
A plugin is a Python module with any name, which exports some symbols. Symbols
starting with `_` are not exported by default. To export a symbol,
inherit it from one of the special classes:

```python
from datumaro.components.extractor import Importer, SourceExtractor, Transform
from datumaro.components.extractor import Importer, Extractor, Transform
from datumaro.components.launcher import Launcher
from datumaro.components.converter import Converter
```
@@ -224,6 +227,19 @@ There is also an additional class to modify plugin appearance in command line:

```python
from datumaro.components.cli_plugin import CliPlugin

class MyPlugin(Converter, CliPlugin):
    """
    Optional documentation text, which will appear in command-line help
    """

    NAME = 'optional_custom_plugin_name'

    def build_cmdline_parser(self, **kwargs):
        parser = super().build_cmdline_parser(**kwargs)
        # set up argparse.ArgumentParser instance
        # the parsed args are supposed to be used as invocation options
        return parser
```

#### Plugin example
@@ -269,13 +285,14 @@ class MyTransform(Transform, CliPlugin):
`my_plugin2.py` contents:

```python
from datumaro.components.extractor import SourceExtractor
from datumaro.components.extractor import Extractor

class MyFormat: ...
class MyFormatExtractor(SourceExtractor): ...
class _MyFormatConverter(Converter): ...
class MyFormatExtractor(Extractor): ...

exports = [MyFormat] # explicit exports declaration
# MyFormatExtractor won't be exported
# MyFormatExtractor and _MyFormatConverter won't be exported
```
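
The export rules described in this section (names starting with `_` are skipped, classes derived from the special base classes are picked up, and an explicit `exports` list takes precedence) can be illustrated with a small standalone sketch. Everything here is hypothetical: `find_exports` is not a Datumaro function, and `Converter` and `Extractor` are empty stand-ins for the real base classes so that the example runs on its own.

```python
import types


class Converter:
    """Hypothetical stand-in for datumaro.components.converter.Converter."""


class Extractor:
    """Hypothetical stand-in for datumaro.components.extractor.Extractor."""


PLUGIN_BASES = (Converter, Extractor)


def find_exports(module):
    """Return the classes a plugin module would export under the rules above."""
    explicit = getattr(module, "exports", None)
    if explicit is not None:
        return list(explicit)  # an explicit declaration wins
    found = []
    for name, value in vars(module).items():
        if name.startswith("_"):
            continue  # private symbols are skipped by default
        if (isinstance(value, type) and issubclass(value, PLUGIN_BASES)
                and value not in PLUGIN_BASES):
            found.append(value)
    return found


class MyConverter(Converter): ...
class _Hidden(Converter): ...
class Helper: ...

plugin = types.ModuleType("my_plugin")
plugin.MyConverter = MyConverter
plugin._Hidden = _Hidden
plugin.Helper = Helper
implicit = [cls.__name__ for cls in find_exports(plugin)]
print(implicit)  # ['MyConverter']

plugin.exports = [Helper]  # explicit exports declaration
explicit = [cls.__name__ for cls in find_exports(plugin)]
print(explicit)  # ['Helper']
```

With an `exports` list present, only the listed symbols are returned, which matches the `my_plugin2.py` example where `MyFormatExtractor` and `_MyFormatConverter` are not exported.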

## Command-line
6 changes: 5 additions & 1 deletion docs/user_manual.md
@@ -45,11 +45,15 @@ python -m virtualenv venv

Install:
``` bash
# From PyPI:
pip install datumaro

# From the GitHub repository:
pip install 'git+https://github.com/openvinotoolkit/datumaro'
```

> You can change the installation branch with `...@<branch_name>`
> Also note `--force-reinstall` parameter in this case.
> Also use the `--force-reinstall` parameter in this case.
## Interfaces
