SDK: Add an adapter layer that presents a CVAT task as a torchvision dataset #5417
Conversation
label_id_to_index: Mapping[int, int]

class TaskDataset(torchvision.datasets.VisionDataset):
If we don't inherit from the class, what problems will we have?
But at the same time, if we want to test the feature, we need to install torch and torchvision explicitly, even though we don't otherwise need them. What do you think?
> If we don't inherit from the class, what problems will we have?

That depends on what the alternative is. The PyTorch `DataLoader` class requires the dataset to be an instance of `torch.utils.data.Dataset`, so we would at least have to inherit from that.

We could inherit directly from `Dataset` rather than `VisionDataset` (in fact, for a long time, that is what my prototype did). The `VisionDataset` class provides dataset transform support that helps the class behave more like other torchvision datasets; we could, in principle, reimplement that ourselves. I eventually decided to use `VisionDataset` because, IMO, if you're working with vision datasets in PyTorch, you'll want to use torchvision anyway for its image processing transforms.

We could implement multiple classes (e.g. `TaskDataset` inheriting from `Dataset` and `TaskVisionDataset` inheriting from `VisionDataset`) and multiple sets of extras (with torchvision and without), but that just seems like overcomplicating things.
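To make the trade-off concrete, here is a minimal sketch of what the `VisionDataset`-based approach buys us. The class name, the `root` value, and the frame/annotation loading helpers are placeholders for illustration, not the actual SDK code:

```python
from typing import Callable, Optional

import torchvision.datasets


class TaskVisionDataset(torchvision.datasets.VisionDataset):
    """Illustrative sketch only; not the actual SDK implementation."""

    def __init__(
        self,
        root: str,
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
    ) -> None:
        # VisionDataset stores transform/target_transform for us, so the
        # class behaves like other torchvision datasets out of the box.
        super().__init__(root, transform=transform, target_transform=target_transform)

    def __getitem__(self, index: int):
        image = self._load_frame(index)  # hypothetical helper
        target = self._load_annotations(index)  # hypothetical helper

        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            target = self.target_transform(target)

        return image, target

    def __len__(self) -> int:
        return self._num_frames  # hypothetical attribute
```

Since `VisionDataset` is itself a subclass of `torch.utils.data.Dataset`, instances of such a class work directly with `DataLoader`.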
> But at the same time, if we want to test the feature, we need to install torch and torchvision explicitly, even though we don't otherwise need them. What do you think?
I don't understand the question. Naturally we would have to install PyTorch and torchvision to test integration with them.
_ModelType = TypeVar("_ModelType")

_CACHE_DIR = Path(appdirs.user_cache_dir("CVAT_SDK", "CVAT.ai"))
Suggested change:
```diff
- _CACHE_DIR = Path(appdirs.user_cache_dir("CVAT_SDK", "CVAT.ai"))
+ _CACHE_DIR = Path(appdirs.user_cache_dir("cvat-sdk", "CVAT.ai"))
```
Applied.
label_id_to_index: Mapping[int, int]

class TaskDataset(torchvision.datasets.VisionDataset):
Where can I find ProjectDataset?
There is none at the moment; however, one could be built easily using `ConcatDataset`.

Frankly, I'm not sure if it would even make sense to create a `Dataset` out of the entire project - such a dataset would contain all of the subsets, so it wouldn't be suitable for either training or testing.
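For reference, a project-level dataset could be assembled along these lines; the `TaskVisionDataset` class (from the sketch above), its constructor arguments, and the task IDs are assumptions for illustration:

```python
from torch.utils.data import ConcatDataset

# Build one dataset per task, then concatenate them into a single dataset.
task_ids = [12, 34, 56]  # hypothetical task IDs
task_datasets = [TaskVisionDataset(client, task_id) for task_id in task_ids]

project_dataset = ConcatDataset(task_datasets)
print(len(project_dataset))  # total number of samples across all tasks
```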
Force-pushed from efaec34 to d19cea6
We might eventually support other types of datasets, so use a slightly more specific name.
Forcing the user to specify a filter ensures that we won't surprise them later by causing additional boxes to appear if we add support for more shape types.
Force-pushed from bdfd29b to 6986cb9
Some notes I made while developing this:

There's an issue with CVAT's data model that makes it somewhat difficult to translate CVAT tasks to datasets. In a dataset, a label is usually represented by a numeric index, but there's no corresponding concept in CVAT. CVAT labels have IDs, which are numeric, but they are not controllable by the user, and different projects will have different label IDs, even if their sets of labels are the same. Therefore, I don't think CVAT label IDs are suitable for use as indexes when working with neural networks.

In this PR, I assign indexes by enumerating all labels in order of ascending ID (see the first sketch below), which is at least somewhat controllable by the user (you can enter the labels in the correct order), but still not amazing. There's no way to determine from the CVAT UI which labels will have which indexes, and no way to affect the indexes once they've been created.

IMO, for better interoperability with deep learning frameworks, CVAT should assign each label a numeric index, and let the user customize it. Then we can use those assignments in the PyTorch adapter (and other adapters, should we make them). Another way to help the user in this regard would be to let them specify a custom label-ID-to-index mapping when instantiating the adapter class.

The way I extract the Nth frame in a chunk is by taking the Nth entry in the ZIP file's info list (see the second sketch below), which is the same method that the UI uses. It's not great, though; we're relying on the ZIP producer library that the server uses to record the files in the catalog in the order that they were specified, which doesn't seem especially safe to me. I don't think the order of files in the catalog is meant to be part of ZIP's data model. IMO, it would be better to record some sort of manifest in each chunk file that would give the member name for each frame index. Alternatively, this manifest could be part of the data metadata (i.e. the result of …).

I haven't yet implemented some features that could be useful in the adapter layer. Some of them are already documented in the docstrings (as "limitations"). Some others are: …
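As an aside, a minimal sketch of the index-assignment scheme described above, assuming `labels` is a list of label objects with an `id` attribute as returned by the SDK:

```python
# Enumerate labels in ascending order of their CVAT IDs to get
# deterministic, zero-based indexes for use with neural networks.
label_id_to_index = {
    label.id: index
    for index, label in enumerate(sorted(labels, key=lambda label: label.id))
}
```

And a sketch of the frame-extraction approach, with `chunk_path` and `frame_offset` as illustrative placeholders:

```python
import zipfile

# Take the Nth entry of the chunk archive's info list, relying on the
# server having written the members in frame order.
with zipfile.ZipFile(chunk_path) as chunk_zip:
    member = chunk_zip.infolist()[frame_offset]
    frame_bytes = chunk_zip.read(member)
```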
OK. It looks good to me as an initial version. Let's merge it.
Oh, I forgot another note: I'm currently wiping the entire task from the cache when a change in the … is detected. This is just because I wanted to protect against the scenario where a future version of CVAT does introduce a way to alter task data, and a user with an old version of the SDK receives stale data, because that version never purges data from the cache.

Ideally, there should be a way to independently verify how up-to-date each chunk is, so that we can redownload the absolute minimum amount of information needed. I was thinking we could use conditional HTTP queries (AKA …). Of course, this would need the server to support ….
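For what it's worth, the conditional-request idea could look roughly like this on the client side. The chunk URL and the cache helpers (`cached_etag`, `read_cached_chunk`, `save_cached_chunk`) are hypothetical:

```python
import requests

# Ask the server to send the chunk only if it changed since we cached it,
# using the standard ETag / If-None-Match mechanism.
headers = {}
if cached_etag is not None:
    headers["If-None-Match"] = cached_etag

response = requests.get(chunk_url, headers=headers)
if response.status_code == 304:
    # Not Modified: our cached copy is still current.
    chunk_bytes = read_cached_chunk()
else:
    response.raise_for_status()
    chunk_bytes = response.content
    save_cached_chunk(chunk_bytes, response.headers.get("ETag"))
```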
Personally, I like this variant, since it is specific to training. At least, as an option. On the other hand, the ability to specify it in CVAT directly would make the process more transparent and provide a single point of configuration for this. Many datasets rely on the label order.
Probably, it's better to keep cached chunks. CVAT doesn't allow changing data once it's uploaded, and there are no expected changes in this logic. The data can be changed, however, if we use a shared folder and caching is enabled. The data can also be changed in a project. Per-chunk update reporting sounds good to me, but it's probably quite far from what we have now.
I think we can allow the user to select a subset to work with.
This is now implemented in #5455.
Motivation and context
How has this been tested?
Unit tests; custom demo script.
Checklist
- [x] I submit my changes into the `develop` branch
- [ ] I have linked related issues (read github docs)
- [ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
- [x] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.