
Remove dependencies on specific cloud provider packages in TransferJob and update test deprovision bucket names #884

Merged
47 commits merged into skyplane-project:main on Jun 23, 2023

Conversation

sarahwooders
Contributor

No description provided.

sarahwooders and others added 30 commits June 15, 2023 06:57
…ne-project#865)

S3 buckets are currently not cleaned up properly after testing, which leads to `TooManyBuckets` errors once enough tests have run. This adds the cleanup logic to every integration test.

Furthermore, the Hadoop JDK installs were removed, as they should no longer be required.
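
A minimal sketch of what per-test bucket cleanup can look like, assuming a pytest fixture and boto3; the fixture name, bucket naming scheme, and region handling are illustrative assumptions, not the exact code added in this PR:

```python
import uuid
import boto3
import pytest

@pytest.fixture
def s3_test_bucket():
    # Hypothetical fixture: create a uniquely named bucket for the test,
    # then always tear it down so repeated runs do not hit TooManyBuckets.
    s3 = boto3.resource("s3")
    bucket_name = f"skyplane-integration-{uuid.uuid4().hex[:8]}"  # assumed naming scheme
    bucket = s3.create_bucket(Bucket=bucket_name)  # region configuration omitted for brevity
    try:
        yield bucket_name
    finally:
        bucket.objects.all().delete()  # objects must be removed before the bucket
        bucket.delete()
```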
Instead of storing the vCPU limits as a hardcoded variable inside the Planner, this adds a new .csv file containing that information, which is read when the Planner is created.

Test cases that use fake quota limits were also written.
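
A minimal sketch of reading per-region vCPU quota limits from a CSV at Planner construction time; the file path, column names, and method names below are assumptions for illustration, not the repository's actual code:

```python
import csv

class Planner:
    def __init__(self, quota_file: str = "aws_quota.csv"):  # hypothetical default path
        self.quota_limits = {}
        with open(quota_file, newline="") as f:
            for row in csv.DictReader(f):
                # e.g. a row like {"region": "us-east-1", "vcpu_limit": "128"}
                self.quota_limits[row["region"]] = int(row["vcpu_limit"])

    def vcpu_limit(self, region: str) -> int:
        return self.quota_limits[region]
```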
Edited the AzureBlobInterface class:
* Wrote the logic for staging/uploading a block of a multipart upload in the upload_object method.
* Created two functions to 1) initiate the multipart upload and 2) complete the multipart upload.
* Since Azure, unlike S3 and GCS, does not provide a global upload ID for a destination object, the destination object name is used as the upload ID to stay consistent with the other object stores. This pseudo-upload ID tracks which blocks (and their block IDs) belong to which destination object in the CopyJob/SyncJob.
* Once all blocks have been uploaded/staged, all blocks for a destination object are committed together (a sketch of this flow follows this list).
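
A minimal sketch, using the azure-storage-blob v12 client, of how the destination blob name can serve as a pseudo upload ID: blocks are staged against the destination blob, tracked per upload ID, and committed together at the end. The class and method names (initiate_multipart_upload, upload_object, complete_multipart_upload) mirror the description above, but the exact signatures and bookkeeping are assumptions, not the PR's code:

```python
from azure.storage.blob import BlobServiceClient, BlobBlock

class AzureMultipartSketch:
    def __init__(self, connection_string: str, container: str):
        service = BlobServiceClient.from_connection_string(connection_string)
        self.container = service.get_container_client(container)
        self.pending_blocks = {}  # pseudo upload ID (blob name) -> staged block IDs

    def initiate_multipart_upload(self, dst_object: str) -> str:
        # Azure has no server-side upload ID, so the destination blob name stands in for one.
        self.pending_blocks[dst_object] = []
        return dst_object

    def upload_object(self, upload_id: str, part_number: int, data: bytes):
        blob = self.container.get_blob_client(upload_id)
        # All block IDs for a blob must have the same length; zero-pad the part number.
        block_id = f"{part_number:05d}{upload_id}"
        blob.stage_block(block_id=block_id, data=data)
        self.pending_blocks[upload_id].append(block_id)

    def complete_multipart_upload(self, upload_id: str):
        blob = self.container.get_blob_client(upload_id)
        # Commit every staged block for this destination object in part order.
        block_ids = sorted(self.pending_blocks.pop(upload_id))
        blob.commit_block_list([BlobBlock(block_id=b) for b in block_ids])
```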

More things to consider about this implementation:

Upload ID handling: Azure doesn't really have a concept equivalent to AWS's upload IDs. Instead, blobs are created immediately and blocks are associated with a blob via block IDs. My workaround of using the blob name as the upload ID should work, since I only use upload_id to distinguish between requests in the finalize() method.

Block IDs: It's worth noting that Azure requires all block IDs within a blob to be the same length. This is handled by zero-padding the block index to five digits (Azure supports at most 50,000 blocks per blob, so five digits suffice) and appending the destination object key, so every block ID for a given destination object has the same length (see the snippet below).
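
An illustrative helper showing the fixed-length block ID format described above; the function name is hypothetical:

```python
def make_block_id(block_index: int, dst_object_key: str) -> str:
    # 50,000 is Azure's per-blob block cap, so five digits cover the index;
    # the shared dst_object_key suffix keeps all IDs for one blob the same length.
    return f"{block_index:05d}{dst_object_key}"
```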

---------

Co-authored-by: Sarah Wooders <[email protected]>
* Modified the tests so that they load from an actual quota file instead of a dictionary defined inline.
* Modified the Planner so that it can accept a file name for the quota limits (defaulting to the Skyplane config quota files).
* Added more tests for error conditions: no quota file is provided, and a quota file is provided but does not include the requested region (a test sketch follows this list).
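
A hypothetical pytest sketch of the two error conditions, written against the Planner sketch shown earlier in this thread; the exact exceptions raised and parameter names are assumptions, not the repository's actual tests:

```python
import pytest

def test_missing_quota_file(tmp_path):
    # No quota file exists at the given path.
    with pytest.raises(FileNotFoundError):
        Planner(quota_file=str(tmp_path / "does_not_exist.csv"))

def test_region_not_in_quota_file(tmp_path):
    # Quota file exists but does not contain the requested region.
    quota_file = tmp_path / "quota.csv"
    quota_file.write_text("region,vcpu_limit\nus-east-1,128\n")
    planner = Planner(quota_file=str(quota_file))
    with pytest.raises(KeyError):
        planner.vcpu_limit("ap-south-1")
```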

---------

Co-authored-by: Sarah Wooders <[email protected]>
Co-authored-by: Asim Biswal <[email protected]>
@sarahwooders sarahwooders changed the title Remove dependencies on specific cloud provider packages in TransferJob Remove dependencies on specific cloud provider packages in TransferJob and update test deprovision bucket names Jun 23, 2023
@sarahwooders sarahwooders merged commit e9ce64f into skyplane-project:main Jun 23, 2023