Combine Export or "Dump" multiple Jobs into one .zip with train/test/val splits #791
Comments
@Sparrowtech, we are going to introduce projects, where you can group similar tasks and then export images and annotations for the whole project. Will that work for you?
That would be great! Looking forward to the update. It would also be great to be able to export the images/annotations not only to one file but also as Train, Test, and Validation sets. Thanks!
@zhiltsov-max, I see that there has been some activity and that a "Release" has been made for this request. I'm not very familiar with how releases are made available on GitHub, whether for production or beta. A really simple question: is this something I can access today, or will it be included in a later release? Please advise if you don't mind.
@nmanovic, please answer here.
@Sparrowtech, the feature will be available in Release 1.0.0 (~ end of February next year). Within a week or two the first prototype will be merged into the develop branch. It would help if you could test the implementation and confirm that it is useful for you. We don't recommend using develop in production, but internally we use it for our own tasks. Does that answer your question?
Yes, thank you. I will look for the feature in the develop branch over the next few weeks.
Let's keep the issue open until it is resolved.
Currently, it is possible with Datumaro:
Keeping open as a request for:
when can we expect to have the
Done in #3365
WORKFLOW WORKAROUNDS: We've created individual "Jobs" to represent different classes of objects (e.g. car, truck, van, helicopter, airplane), largely because CVAT has difficulty loading very large datasets. Each CVAT Job represents ~2500 images and is typically around 1 GB in size, images and annotations combined. Currently there are ~60 different Jobs (object classes), totaling ~60 GB and ~150,000 images.
We routinely create specific datasets (10-20 object classes, or "Jobs"), which requires a lot of heavy lifting after export: merging tfrecords or XML files into one file or into batches, not to mention splitting into train/test/val sets. I know there are a lot of tools out there to help with pre-processing, and we currently use many of them.
It would be ideal to be able to choose "Car, Airplane, Helicopter, Bus, ..." from the dashboard and EXPORT THEM INTO ONE TASK, with the ability to choose the ratio of images assigned to the train/test/val sets, e.g. 70% train, 15% test, 15% val, resulting in .zip file(s) containing the images and annotations or the generated tfrecords. No extra processing for randomizing is needed: just take the split percentage from each Job and combine the results (e.g. into "Train"), ensuring well-balanced classes rather than relying on a later random split of the merged data, which gives no such guarantee.
Thanks!
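To make the request concrete, here is a minimal sketch of the per-Job split described above, under stated assumptions: it is not CVAT or Datumaro code, the function name `split_per_job` and the input layout (a dict mapping a Job name to its list of image file names) are hypothetical, and rounding at the split boundaries is one arbitrary choice among several. Each Job is cut by the same ratios and the pieces are then combined, so every class appears in train/test/val in roughly the same proportion.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical default ratios taken from the example in the request.
DEFAULT_RATIOS: Tuple[Tuple[str, float], ...] = (
    ("train", 0.70),
    ("test", 0.15),
    ("val", 0.15),
)


def split_per_job(
    jobs: Dict[str, Sequence[str]],
    ratios: Sequence[Tuple[str, float]] = DEFAULT_RATIOS,
    shuffle_seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Split every job by the same ratios, then merge the pieces per subset.

    jobs: hypothetical mapping of job name -> image file names.
    shuffle_seed: if given, items of each job are shuffled first;
                  by default the original order is kept.
    """
    if abs(sum(r for _, r in ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    rng = random.Random(shuffle_seed)
    combined: Dict[str, List[str]] = {name: [] for name, _ in ratios}

    for job_name in sorted(jobs):
        items = list(jobs[job_name])
        if shuffle_seed is not None:
            rng.shuffle(items)

        start = 0
        cumulative = 0.0
        for index, (subset, ratio) in enumerate(ratios):
            cumulative += ratio
            # The last subset takes the remainder so no item is dropped.
            is_last = index == len(ratios) - 1
            end = len(items) if is_last else round(cumulative * len(items))
            combined[subset].extend(items[start:end])
            start = end

    return combined
```

Called with, say, a 100-image "car" Job and a 20-image "truck" Job, the combined "train" subset receives 70% of each Job rather than 70% of a merged pool, which is the balance property asked for.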