
Add back compression and encryption #877

Merged 18 commits into main on Jul 11, 2023
Conversation

lynnliu030 (Contributor)

No description provided.

sarahwooders and others added 12 commits June 15, 2023 09:37
S3 currently has no proper bucket cleanup after testing, leading to `TooManyBuckets` errors once enough tests have run. This adds the cleanup logic to every integration test.

Furthermore, the Hadoop JDK installs were removed, as they should no longer be required.
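A minimal sketch of the cleanup pattern this commit describes, assuming a pytest fixture and a test-bucket naming prefix (both hypothetical, not necessarily Skyplane's actual test code):

```python
import boto3
import pytest

TEST_BUCKET_PREFIX = "skyplane-test-"  # assumed naming convention for test buckets

@pytest.fixture
def s3_test_bucket():
    s3 = boto3.resource("s3")
    bucket = s3.create_bucket(Bucket=f"{TEST_BUCKET_PREFIX}example")
    try:
        yield bucket
    finally:
        # Delete all objects first, then the bucket itself, so repeated test
        # runs do not accumulate buckets and hit TooManyBuckets.
        bucket.objects.all().delete()
        bucket.delete()
```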
Instead of storing the vCPU limits as a hardcoded variable inside Planner, this adds a new .csv file that includes that information; the file is read when the Planner is created (see the sketch below).

Test cases that include fake quota limits are also added.
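As a rough illustration of the quota-loading flow described in this commit (the file path and CSV column names are assumptions for the sketch, not the actual Skyplane schema):

```python
import csv

def load_vcpu_quota(path: str = "aws_quota.csv") -> dict:
    """Return a mapping of region -> vCPU limit read from a quota CSV."""
    quota = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            quota[row["region"]] = int(row["vcpu_limit"])
    return quota

class Planner:
    def __init__(self, quota_file: str = "aws_quota.csv"):
        # Quotas are read once when the Planner is created, replacing the
        # previously hardcoded limits.
        self.vcpu_quota = load_vcpu_quota(quota_file)
```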
Edited the AzureBlobInterface class:
* Wrote the logic for staging/uploading a block for a multipart upload in the method upload_object.
* Created two functions to 1) initiate the multipart upload and 2) complete the multipart upload.
* Since Azure works differently from S3 and GCS in that it doesn't provide a global upload ID for a destination object, the destination object name is used as the upload ID to stay consistent with the other object stores. This pseudo upload ID keeps track of which destination object each block and its block ID belong to in the CopyJob/SyncJob.
* Upon completion of uploading/staging all blocks, all blocks for a destination object are committed together (see the sketch after this commit message).

More things to consider about this implementation:

Upload ID handling: Azure doesn't really have a concept equivalent to AWS's upload IDs. Instead, blobs are created immediately and blocks are associated with a blob via block IDs. The workaround of using the blob name as the upload ID should work, since upload_id is only used to distinguish between requests in the finalize() method.

Block IDs: It's worth noting that Azure requires all block IDs for a blob to be the same length. This is handled by padding the block index to the number of digits in the maximum block count Azure supports (50,000), i.e. 5 digits, so each ID has length len("{5-digit block index}{destination_object_key}").

---------

Co-authored-by: Sarah Wooders <[email protected]>
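A hedged sketch of the block staging/commit flow described in this commit, using the azure-storage-blob v12 SDK; the function name, block sizing, and client setup are assumptions rather than the actual AzureBlobInterface code:

```python
from azure.storage.blob import BlobBlock, BlobClient

def upload_in_blocks(blob: BlobClient, data: bytes, block_size: int, dest_key: str):
    # Stage each chunk as a block. Block IDs are padded to a fixed width
    # (5 digits, matching Azure's 50,000-block maximum) plus the destination
    # key, so all IDs for a blob have the same length.
    block_ids = []
    for i in range(0, len(data), block_size):
        block_id = f"{i // block_size:05d}{dest_key}"
        blob.stage_block(block_id=block_id, data=data[i:i + block_size])
        block_ids.append(block_id)

    # "Completing" the upload commits all staged blocks at once; the blob name
    # itself acts as the pseudo upload ID, since Azure has no S3/GCS-style
    # global upload ID.
    blob.commit_block_list([BlobBlock(block_id=bid) for bid in block_ids])
```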
* Modified the tests so that they load from an actual quota file instead of a dictionary defined inline.
* Modified the Planner so that it can accept a file name for the quota limits (defaulting to the Skyplane config quota files).
* Added more tests for error conditions (no quota file is provided, and a quota file is provided but the requested region is not included in it); a sketch follows below.

---------

Co-authored-by: Sarah Wooders <[email protected]>
Co-authored-by: Asim Biswal <[email protected]>
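A hypothetical pytest sketch of the error conditions listed above, reusing the `Planner` sketch from the earlier commit; the constructor signature and exception type are assumptions, not the actual test code:

```python
import pytest

def test_missing_quota_file():
    # No quota file exists at the given path: construction should fail.
    with pytest.raises(FileNotFoundError):
        Planner(quota_file="does_not_exist.csv")

def test_region_not_in_quota_file(tmp_path):
    # Quota file exists but does not contain the requested region.
    quota_file = tmp_path / "quota.csv"
    quota_file.write_text("region,vcpu_limit\nus-east-1,128\n")
    planner = Planner(quota_file=str(quota_file))
    assert "ap-south-1" not in planner.vcpu_quota
```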
Base automatically changed from 0.3.1-release to main June 22, 2023 22:20
@sarahwooders (Contributor)

So in the current implementation, my understanding is that each time data moves between VMs it is compressed and decompressed? This is fine for now, but maybe we should add an issue so that in the future the data is compressed/encrypted just once, when it is read from the object store.

@sarahwooders (Contributor) left a comment

Added a few minor nits but looks great otherwise! Feel free to merge once cleaned up.

Resolved review threads: skyplane/api/dataplane.py (outdated), skyplane/api/dataplane.py (outdated), skyplane/planner/planner.py
@lynnliu030 (Contributor, Author)

So in the current implementation, my understanding is that each time data moves between VMs it is compressed and decompressed? This is fine for now, but maybe we should add an issue so that in the future the data is compressed/encrypted just once, when it is read from the object store.

@sarahwooders I don't think so? All of these are specified in the gateway program; data will only be decompressed or decrypted if you set that explicitly in the gateway programs of VMs located in the destination regions.
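To make the point concrete, here is a purely conceptual illustration (not Skyplane's gateway program API): compression and encryption happen once at the source gateway, intermediate VMs forward the payload untouched, and only a destination gateway whose program includes the corresponding operators reverses them.

```python
import zlib
from cryptography.fernet import Fernet

key = Fernet.generate_key()

def source_gateway(chunk: bytes) -> bytes:
    # Compress then encrypt once, when the chunk is first read.
    return Fernet(key).encrypt(zlib.compress(chunk))

def relay_gateway(payload: bytes) -> bytes:
    # Intermediate hops just forward bytes; no decompress/decrypt operators.
    return payload

def destination_gateway(payload: bytes) -> bytes:
    # Only gateways whose program includes these operators undo them.
    return zlib.decompress(Fernet(key).decrypt(payload))

assert destination_gateway(relay_gateway(source_gateway(b"hello"))) == b"hello"
```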

@sarahwooders sarahwooders merged commit f4fea11 into main Jul 11, 2023
@sarahwooders sarahwooders deleted the compress_encrypt branch July 11, 2023 18:35