fix(job_attachments): distinguish large/small file uploads #114

Merged
merged 1 commit into from
Nov 28, 2023

@gahyusuh gahyusuh commented Nov 20, 2023

What was the problem/requirement? (What/Why)

Uploading multiple large files in parallel hurts the job bundle submission experience, because a failure partway through throws away all partial progress. Consider the difference:

  • Parallel large files:
    • Job bundles starts four 100 MB file uploads in parallel. It works for a while, and each upload gets to 53%. Then a failure stops everything.
    • Retry: everything restarts from scratch, even though about 200 MB of data was uploaded last time.
  • Serial large files:
    • Job bundles starts one 100 MB file upload and finishes it, then starts and finishes a second 100 MB upload. It starts the third, then fails (at the same 53% mark overall).
    • Retry: it restarts from the third upload.
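The comparison above can be checked with a little arithmetic. The sketch below is only an illustration, not code from this PR: `mb_to_upload_on_retry` is a hypothetical helper, and it assumes that a retry skips files that were fully uploaded and restarts any partially uploaded file from the beginning.

```python
from typing import List


def mb_to_upload_on_retry(
    file_sizes_mb: List[int], progress_percent: int, parallel: bool
) -> int:
    """Return how many MB a retry still has to upload after a failure.

    Assumption (hypothetical model): only fully uploaded files are skipped
    on retry; a partially uploaded file starts over from zero.
    """
    total = sum(file_sizes_mb)
    if parallel:
        # All files progress at the same rate, so none of them is complete
        # unless the whole batch reached 100%.
        completed = total if progress_percent >= 100 else 0
    else:
        # Files are uploaded one after another; count the ones that finished
        # before the failure.
        uploaded = total * progress_percent / 100
        completed = 0
        for size in file_sizes_mb:
            if uploaded < size:
                break
            completed += size
            uploaded -= size
    return total - completed
```

With four 100 MB files and a failure at 53% overall, this model gives 400 MB left to upload in the parallel case and 200 MB in the serial case.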

What was the solution? (How)

Large files are now uploaded serially.

  • Split the input files to upload into two separate queues based on their sizes, one for smaller files and another for larger ones.
    • Size threshold separating large from small files: 160 MB (= 20 * 8 MB). This is a multiple of the part size, 8 MB, which is the default value configured for multi-part uploads.
  • Process each queue differently.
    • First, process the whole "small file" queue with parallel object uploads.
    • Then, once the small files are done, process the whole "large file" queue with serial object uploads (but still with parallel multi-part upload within each object).
  • For example: a job bundle has 15 1-MB files and 5 1-GB files to upload. Those files are split into a 'small file' queue (the 15 1-MB files) and a 'large file' queue (the 5 1-GB files). First, the 15 small files are uploaded in parallel, and then the 5 large files are uploaded serially.
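The two-queue approach described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in `upload.py`: the names `split_by_size`, `upload_all`, `upload_one`, and `max_workers` are hypothetical, 8 MB is taken as 8 * 1024 * 1024 bytes, and files exactly at the threshold are treated as small. The multi-part upload inside each object is left to the `upload_one` callback.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Assumed values: 8 MB part size, threshold of 20 parts (160 MB).
MULTIPART_PART_SIZE = 8 * 1024 * 1024
LARGE_FILE_THRESHOLD = 20 * MULTIPART_PART_SIZE

FileEntry = Tuple[str, int]  # (path, size in bytes)


def split_by_size(
    files: List[FileEntry], threshold: int = LARGE_FILE_THRESHOLD
) -> Tuple[List[FileEntry], List[FileEntry]]:
    """Split files into (small, large) queues, preserving input order."""
    small: List[FileEntry] = []
    large: List[FileEntry] = []
    for path, size in files:
        # Assumption: a file exactly at the threshold counts as small.
        (large if size > threshold else small).append((path, size))
    return small, large


def upload_all(
    files: List[FileEntry],
    upload_one: Callable[[str], None],
    max_workers: int = 4,
) -> None:
    """Upload small files in parallel, then large files one at a time."""
    small, large = split_by_size(files)

    # Small-file queue: parallel object uploads. Leaving the `with` block
    # waits for all of them; list() re-raises any worker exception here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(upload_one, [path for path, _ in small]))

    # Large-file queue: serial object uploads, so an interrupted run leaves
    # the earlier large files fully uploaded.
    for path, _ in large:
        upload_one(path)
```

Passing a callback that records each path (for example a list's `append` method) makes it easy to confirm that the large files are always handled last and in order.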

What is the impact of this change?

Better UX. In particular, when a job submission with multiple large files is canceled during upload, resubmitting the same job later can be much faster, because the large files that already finished uploading do not need to be uploaded again.

How was this change tested?

  • Unit tests: ran hatch run lint && hatch run test and ensured all tests passed.
  • Test script: ran scripted_tests/upload_cancel_test.py to verify that the files in the 'large file' queue were uploaded serially.
  • End-to-end test: verified that job submission still works as normal.

Was this change documented?

No.

Is this a breaking change?

No.

@gahyusuh gahyusuh force-pushed the gahyusuh/serial_large_files branch from cc6ec8e to 4d86874 Compare November 20, 2023 18:05
@gahyusuh gahyusuh marked this pull request as ready for review November 20, 2023 18:15
@gahyusuh gahyusuh requested a review from a team as a code owner November 20, 2023 18:15
@gahyusuh gahyusuh force-pushed the gahyusuh/serial_large_files branch from 4d86874 to 60f4fd0 Compare November 20, 2023 18:29

@marofke marofke left a comment

Good stuff! We've been wanting to make this improvement for a while, so it's nice to see it go in. Just a few nit comments.

src/deadline/job_attachments/upload.py (2 resolved review threads)
@gahyusuh gahyusuh force-pushed the gahyusuh/serial_large_files branch from 60f4fd0 to b3b061d Compare November 21, 2023 20:16
@gahyusuh gahyusuh force-pushed the gahyusuh/serial_large_files branch from b3b061d to 93cff4e Compare November 26, 2023 03:28
- Split the input files to upload into two separate queues based on their sizes, one for smaller files and another for larger ones.
- First, process the whole "small file" queue with parallel object uploads. Then, once the small files are done, process the whole "large file" queue with serial object uploads (but still parallel multi-part upload).

Signed-off-by: Gahyun Suh <[email protected]>
@gahyusuh gahyusuh force-pushed the gahyusuh/serial_large_files branch from 93cff4e to 6897ba7 Compare November 28, 2023 19:15
@gahyusuh gahyusuh merged commit 03edab1 into mainline Nov 28, 2023
18 checks passed
@gahyusuh gahyusuh deleted the gahyusuh/serial_large_files branch November 28, 2023 20:58