
CVAT server doesn't handle interrupted uploads well #5261

Open
SpecLad opened this issue Nov 9, 2022 · 2 comments
Labels: bug (Something isn't working), server
SpecLad commented Nov 9, 2022

When I was working on #4839, I checked how the server would handle an upload that was interrupted (e.g. by a network disconnection), and the result was not good. The server tries to read the entire request body into memory at once:

https://github.com/opencv/cvat/blob/e5d01359aa09521c9ca87f0ec77a6dede097211b/cvat/apps/engine/mixins.py#L94

If the upload is interrupted, that raises an exception, and all data that has already been transmitted is discarded. This is obviously suboptimal - the whole point of TUS is to support upload resumption, so if some data has been transmitted successfully, it should be written to the file, and the upload should then be able to resume from that point.
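For reference, TUS resumption works by the client asking the server how many bytes it has durably stored, then continuing from that offset: it sends a HEAD request to the upload URL, reads `Upload-Offset` from the response, and PATCHes the remaining data from there. A minimal sketch of the exchange after an interruption (the upload URL and offset here are hypothetical):

```http
HEAD /api/tasks/42/data/upload-uuid HTTP/1.1
Tus-Resumable: 1.0.0

HTTP/1.1 200 OK
Tus-Resumable: 1.0.0
Upload-Offset: 5242880

PATCH /api/tasks/42/data/upload-uuid HTTP/1.1
Tus-Resumable: 1.0.0
Upload-Offset: 5242880
Content-Type: application/offset+octet-stream
```

This is exactly why discarding already-received data defeats the protocol: the server can only report a nonzero `Upload-Offset` if it persisted what it received before the disconnection.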

The current behavior is also problematic from another perspective: by reading the entire upload into memory, we might potentially consume an unbounded amount of memory.

I think the right thing to do here is to read the uploaded file in small chunks, and write each chunk to the destination file before handling the next one. That will solve both problems.
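The chunk-by-chunk approach could look something like this (a hypothetical helper for illustration, not CVAT's actual code; the chunk size is an arbitrary choice):

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; keeps memory usage bounded


def append_body_to_file(body, dest_path):
    """Append a file-like request body to dest_path chunk by chunk.

    Returns the number of bytes written. If reading raises (e.g. the
    client disconnects mid-upload), everything written so far stays in
    the destination file, so a TUS client can later resume from that
    offset instead of starting over.
    """
    written = 0
    with open(dest_path, "ab") as dest:
        while True:
            chunk = body.read(CHUNK_SIZE)
            if not chunk:  # EOF: the body has been fully consumed
                break
            dest.write(chunk)
            written += len(chunk)
    return written
```

Since each chunk is written before the next one is read, at most `CHUNK_SIZE` bytes of upload data are held in memory at any time, which addresses both the resumption and the memory concerns.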

SpecLad commented Nov 9, 2022

The current behavior is also problematic from another perspective: by reading the entire upload into memory, we might potentially consume an unbounded amount of memory.

Okay, that wasn't quite accurate: the amount of memory is bounded by the DATA_UPLOAD_MAX_MEMORY_SIZE Django setting, which is currently set to 100 MiB. Still, we could avoid consuming even that much by reading the request body in chunks.
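For context, the bound comes from a standard Django setting; per the comment above, CVAT raises it to 100 MiB (shown here schematically, with the value quoted from this thread):

```python
# In the Django settings module: the largest request body Django will
# buffer in memory; exceeding it raises RequestDataTooBig when the
# body is read eagerly (e.g. via request.body).
DATA_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024  # 100 MiB
```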

SpecLad commented Nov 9, 2022

Come to think of it, this also means that we don't accept TUS upload requests with a chunk size greater than 100 MiB. That's not great, but also not terrible, since neither the UI nor the SDK should send chunks that large.

@nmanovic nmanovic added the enhancement New feature or request label Nov 19, 2022
@nmanovic nmanovic added bug Something isn't working server and removed enhancement New feature or request labels Jan 4, 2023