When importing image files from the share, I do not have Copy data into CVAT checked, but the images are still zipped and copied into the cvat_server docker image under the paths /home/django/data/data/<id>/original/*.zip and /home/django/data/data/<id>/compressed/*.zip. With the checkbox checked, images are also copied to /home/django/data/data/<id>/raw/. When importing images that are already stored on my server into CVAT, I now have 3-4 copies of each image: the original images, the original images zipped, the compressed images zipped, and the raw images (if the checkbox is checked). Many people would like to have only one copy of the images on disk.
Notes:
The compressed/0.zip is used by the UI to load images when labeling. Maybe we want to keep a compressed version around for the web UI? A more memory-friendly solution might be progressive decoding or JPEG XL, where only part of the original image file is transferred if a lower resolution is desired (and the original file supports it).
The original/0.zip is used when exporting the dataset. Zipping the files doesn't do anything to reduce their size and it seems equally valid to provide the original images if they are available on the persistent share volume.
#2377 removed a single copy of the images in the raw folder.
#2862 would like to see the functionality from #2377 applied to cli usage.
#204 asked for this same functionality. Maybe I should re-open it, but enough time has passed and enough other developments have been made that I opted to create a new issue.
Expected Behaviour
While importing image files from the share with Copy data into CVAT unchecked, no images should be copied into the docker image, and available disk space should stay approximately the same. There should be a flag to duplicate this behavior when using the cli tool.
Current Behaviour
While importing image files from the share with Copy data into CVAT unchecked, images are zipped and copied into the cvat_server docker image under:
/home/django/data/data/<id>/original/0.zip
/home/django/data/data/<id>/compressed/0.zip
The checkbox does stop images from being copied to /home/django/data/data/<id>/raw/, but the cli does not have a flag to duplicate this behavior (duplicate of #2862)
Possible Solution
Could create soft-links to the original files and add support for serving image files instead of just zip.
Could keep track of where the files live in the share volume and serve them directly from there. For a compressed version, could support lower-resolution images for file formats that support progressive decoding (e.g. JPEG XL, FLIF).
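To make the soft-link idea concrete, here is a minimal sketch in Python. It is not CVAT code: link_share_files, its parameters, and the extension list are hypothetical names chosen for illustration, and it assumes the server could read images through symlinks instead of from zip chunks.

```python
import os
from pathlib import Path


def link_share_files(share_dir, task_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Create one symlink per image in task_dir, pointing at the file in share_dir.

    Hypothetical helper for illustration only; it is not part of CVAT.
    Returns the list of link paths for the images found on the share.
    """
    share_dir = Path(share_dir).resolve()
    task_dir = Path(task_dir)
    task_dir.mkdir(parents=True, exist_ok=True)

    linked = []
    for src in sorted(share_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        # Mirror the relative layout of the share inside the task directory.
        dst = task_dir / src.relative_to(share_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if not dst.is_symlink() and not dst.exists():
            os.symlink(src, dst)  # no image bytes are copied
        linked.append(dst)
    return linked
```

With this approach the task directory holds only links, so disk usage stays roughly constant regardless of how many images are imported. The trade-off is that the task breaks if files on the share are moved or deleted.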
Steps to Reproduce (for bugs)
Under advanced, make sure Copy data into CVAT is unchecked
Submit
Run docker exec -ti cvat ls -1v /home/django/data/data | tail -n1 to get the id
Run docker cp cvat:/home/django/data/data/<id> ./ to copy the files for local inspection. If you unzip <id>/original/0.zip and <id>/compressed/0.zip, you can verify that the files are derived from the original files.
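The manual check in the last step can be partly automated. The sketch below hashes every file under a directory of originals and reports which members of a zip archive are byte-identical to one of them. find_duplicates is a hypothetical helper name. It assumes original/0.zip stores the images unmodified, which this issue does not confirm; members of compressed/0.zip are re-encoded, so they would not be expected to match.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_bytes(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(zip_path, originals_dir):
    """Return names of zip members whose bytes match a file under originals_dir.

    Hypothetical helper for illustration; it only detects byte-identical copies.
    """
    original_hashes = {
        sha256_bytes(path.read_bytes())
        for path in Path(originals_dir).rglob("*")
        if path.is_file()
    }
    matches = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if sha256_bytes(archive.read(info)) in original_hashes:
                matches.append(info.filename)
    return matches
```

For example, find_duplicates("<id>/original/0.zip", "/path/to/share/images") would list the archive members that are exact copies of files on the share; both paths here are placeholders.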
Context
When importing images using the cli in an automated fashion, I found my import had halted because the hard drive ran out of space, even though I had 100 GB of free disk space before starting. The import also took much longer than expected, given that the files were already on the server.
@shortcipher3, Hi,
Currently, there are 2 ways to create an annotation task:
You do not enable the Use cache checkbox. In this case, the necessary chunks (e.g. /home/django/data/data/<id>/original/0.zip, /home/django/data/data/<id>/compressed/0.zip) are prepared during task creation and saved in the folders ../original/ and ../compressed/.
You enable the Use cache checkbox. In this case, the task is created on the fly, no data copies are created in the folders (original/compressed), and the necessary chunks are prepared as needed and stored in the cache.
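The difference between the two modes can be illustrated with a small, self-contained sketch. This is not CVAT's implementation: LazyChunkProvider, its parameters, and the in-memory cache are assumptions made for illustration. It shows the second mode, where a chunk is only zipped the first time it is requested and is then served from a cache.

```python
import io
import zipfile
from functools import lru_cache


class LazyChunkProvider:
    """Build zip chunks on first request and cache them (illustrative only)."""

    def __init__(self, frames, chunk_size=2, cache_size=8):
        self.frames = frames          # list of (name, bytes) pairs
        self.chunk_size = chunk_size
        self.build_count = 0          # how many chunks were actually built
        # Wrap the builder so repeated requests for a chunk hit the cache.
        self.get_chunk = lru_cache(maxsize=cache_size)(self._build_chunk)

    def _build_chunk(self, index):
        start = index * self.chunk_size
        selected = self.frames[start:start + self.chunk_size]
        if not selected:
            raise IndexError(f"chunk {index} is out of range")
        self.build_count += 1
        buffer = io.BytesIO()
        # ZIP_STORED: the images are already compressed, so do not deflate them.
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
            for name, data in selected:
                archive.writestr(name, data)
        return buffer.getvalue()
```

In the first mode, by contrast, every chunk would be built and written to disk during task creation, which is where the extra copies under original/ and compressed/ come from.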
Your Environment
git log -1: commit 472d535
docker version (e.g. Docker 17.0.05): 20.10.8