SDK: Add an adapter layer that presents a CVAT task as a torchvision dataset #5417
Conversation
label_id_to_index: Mapping[int, int]

class TaskDataset(torchvision.datasets.VisionDataset):
If we don't inherit from the class, what problems will we have?
But at the same time, if we want to test the feature, we need to install torch and torchvision explicitly, even though we don't otherwise need them. What do you think?
> If we don't inherit from the class, what problems will we have?

That depends on what the alternative is. The PyTorch `DataLoader` class requires the dataset to be an instance of `torch.utils.data.Dataset`, so we would at least have to inherit from that.

We could inherit directly from `Dataset` rather than `VisionDataset` (in fact, for a long time, that is what my prototype did). The `VisionDataset` class provides dataset transform support that helps the class behave more like other torchvision datasets; we could, in principle, reimplement that ourselves. I eventually decided to use `VisionDataset` because, IMO, if you're working with vision datasets in PyTorch, you'll want to use torchvision anyway for its image processing transforms.

We could implement multiple classes (e.g. `TaskDataset` inheriting from `Dataset` and `TaskVisionDataset` inheriting from `VisionDataset`) and multiple sets of extras (with torchvision and without), but that just seems like overcomplicating things.
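To make the trade-off concrete, here is a minimal sketch of what the `VisionDataset`-based approach buys us. The class name, the `root` value, and the frame/annotation loading helpers are placeholders for illustration, not the actual SDK code:

```python
from typing import Callable, Optional

import torchvision.datasets


class TaskVisionDataset(torchvision.datasets.VisionDataset):
    """Illustrative sketch only; not the actual SDK implementation."""

    def __init__(
        self,
        root: str,
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
    ) -> None:
        # VisionDataset stores transform/target_transform for us, so the
        # class behaves like other torchvision datasets out of the box.
        super().__init__(root, transform=transform, target_transform=target_transform)

    def __getitem__(self, index: int):
        image = self._load_frame(index)  # hypothetical helper
        target = self._load_annotations(index)  # hypothetical helper

        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            target = self.target_transform(target)

        return image, target

    def __len__(self) -> int:
        return self._num_frames  # hypothetical attribute
```

Since `VisionDataset` is itself a subclass of `torch.utils.data.Dataset`, instances of such a class work directly with `DataLoader`.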
> But at the same time, if we want to test the feature, we need to install torch and torchvision explicitly, even though we don't otherwise need them. What do you think?
I don't understand the question. Naturally we would have to install PyTorch and torchvision to test integration with them.
_ModelType = TypeVar("_ModelType")

_CACHE_DIR = Path(appdirs.user_cache_dir("CVAT_SDK", "CVAT.ai"))
Suggested change:
```diff
- _CACHE_DIR = Path(appdirs.user_cache_dir("CVAT_SDK", "CVAT.ai"))
+ _CACHE_DIR = Path(appdirs.user_cache_dir("cvat-sdk", "CVAT.ai"))
```
Applied.
label_id_to_index: Mapping[int, int]

class TaskDataset(torchvision.datasets.VisionDataset):
Where can I find ProjectDataset?
There is none at the moment; however, one could be built easily using `ConcatDataset`.

Frankly, I'm not sure if it would even make sense to create a `Dataset` out of the entire project - such a dataset would contain all of the subsets, so it wouldn't be suitable for either training or testing.
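For reference, a project-level dataset could be assembled along these lines; the `TaskVisionDataset` class (from the sketch above), its constructor arguments, and the task IDs are assumptions for illustration:

```python
from torch.utils.data import ConcatDataset

# Build one dataset per task, then concatenate them into a single dataset.
task_ids = [12, 34, 56]  # hypothetical task IDs
task_datasets = [TaskVisionDataset(client, task_id) for task_id in task_ids]

project_dataset = ConcatDataset(task_datasets)
print(len(project_dataset))  # total number of samples across all tasks
```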
Force-pushed from efaec34 to d19cea6
We might eventually support other types of datasets, so use a slightly more specific name.
Forcing the user to specify a filter ensures that we won't surprise them later by causing additional boxes to appear if we add support for more shape types.
Force-pushed from bdfd29b to 6986cb9
Some notes I made while developing this:

There's an issue with CVAT's data model that makes it somewhat difficult to translate CVAT tasks to datasets. In a dataset, a label is usually represented by a numeric index, but there's no corresponding concept in CVAT. CVAT labels have IDs, which are numeric, but they are not controllable by the user, and different projects will have different label IDs, even if their sets of labels are the same. Therefore, I don't think CVAT label IDs are suitable for use as indexes when working with neural networks.

In this PR, I assign indexes by enumerating all labels in order of ascending ID (see the first sketch below), which is at least somewhat controllable by the user (you can enter the labels in the correct order), but still not amazing. There's no way to determine from the CVAT UI which labels will have which indexes, and no way to affect the indexes once they've been created.

IMO, for better interoperability with deep learning frameworks, CVAT should assign each label a numeric index, and let the user customize it. Then we can use those assignments in the PyTorch adapter (and other adapters, should we make them). Another way to help the user in this regard would be to let them specify a custom label-ID-to-index mapping when instantiating the adapter class.

The way I extract the Nth frame in a chunk is by taking the Nth entry in the ZIP file's info list (see the second sketch below), which is the same method that the UI uses. It's not great, though; we're relying on the ZIP producer library that the server uses to record the files in the catalog in the order that they were specified, which doesn't seem especially safe to me. I don't think the order of files in the catalog is meant to be part of ZIP's data model. IMO, it would be better to record some sort of manifest in each chunk file that would give the member name for each frame index. Alternatively, this manifest could be part of the data metadata (i.e. the result of …).

I haven't yet implemented some features that could be useful in the adapter layer. Some of them are already documented in the docstrings (as "limitations"). Some others are: …
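As an aside, a minimal sketch of the index-assignment scheme described above, assuming `labels` is a list of label objects with an `id` attribute as returned by the SDK:

```python
# Enumerate labels in ascending order of their CVAT IDs to get
# deterministic, zero-based indexes for use with neural networks.
label_id_to_index = {
    label.id: index
    for index, label in enumerate(sorted(labels, key=lambda label: label.id))
}
```

And a sketch of the frame-extraction approach, with `chunk_path` and `frame_offset` as illustrative placeholders:

```python
import zipfile

# Take the Nth entry of the chunk archive's info list, relying on the
# server having written the members in frame order.
with zipfile.ZipFile(chunk_path) as chunk_zip:
    member = chunk_zip.infolist()[frame_offset]
    frame_bytes = chunk_zip.read(member)
```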
OK. It looks good to me as an initial version. Let's merge it.
Oh, I forgot another note: I'm currently wiping the entire task from the cache when a change in the … is detected. This is just because I wanted to protect against the scenario where a future version of CVAT does introduce a way to alter task data, and a user with an old version of the SDK receives stale data, because that version never purges data from the cache.

Ideally, there should be a way to independently verify how up-to-date each chunk is, so that we can redownload the absolute minimum amount of information needed. I was thinking we could use conditional HTTP queries (AKA …). Of course, this would need the server to support ….
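For what it's worth, the conditional-request idea could look roughly like this on the client side. The chunk URL and the cache helpers (`cached_etag`, `read_cached_chunk`, `save_cached_chunk`) are hypothetical:

```python
import requests

# Ask the server to send the chunk only if it changed since we cached it,
# using the standard ETag / If-None-Match mechanism.
headers = {}
if cached_etag is not None:
    headers["If-None-Match"] = cached_etag

response = requests.get(chunk_url, headers=headers)
if response.status_code == 304:
    # Not Modified: our cached copy is still current.
    chunk_bytes = read_cached_chunk()
else:
    response.raise_for_status()
    chunk_bytes = response.content
    save_cached_chunk(chunk_bytes, response.headers.get("ETag"))
```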
Personally, I like this variant, since it is specific to training. At least, as an option. On the other hand, the ability to specify it in CVAT directly would make the process more transparent and provide a single point of configuration for this. Many datasets rely on the label order.
Probably, it's better to keep cached chunks. CVAT doesn't allow changing data once it's uploaded, and there are no expected changes in this logic. The data can be changed, however, if we use a shared folder and caching is enabled. The data can also be changed in a project. Per-chunk update reporting sounds good to me, but it's probably quite far from what we have now.
I think we can allow the user to select a subset to work with.
This is now implemented in #5455.
Motivation and context
How has this been tested?
Unit tests; custom demo script.
Checklist
- [x] I submit my changes into the `develop` branch
- [ ] I have linked related issues (read github docs)
- [ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
- [x] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.