Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add as_url method to each CloudPath implementation with ability to generate presigned_urls #236

Merged
merged 26 commits into from
Feb 17, 2024

Conversation

kabirkhan
Copy link
Contributor

@kabirkhan kabirkhan commented Jun 3, 2022

Overview

Adds an as_url method that returns the https:// address of the object or generates a presigned URL if presign=True is passed.

Based on #235

Usage

from cloudpathlib import CloudPath

path = CloudPath("az://tests/test.txt")

print(path.as_url())
# https://myaccount.blob.core.windows.net/tests/text.txt

print(path.as_url(presign=True))
# https://myaccount.blob.core.windows.net/tests/text.txt?se=...[OTHER_SAS_TOKEN_PARAMS]'

print(path.as_url(presign=True, expire_seconds=10))
# URL will now expire in 10 seconds
# https://myaccount.blob.core.windows.net/tests/text.txt?se=...[OTHER_SAS_TOKEN_PARAMS]'

Open Questions

1. What should we call this method? as_url would be fine but there's already the Pathlib as_uri method so it's probably confusing.

Maybe we make as_uri return the default https:// address for the blob instead of the str of the Path. Then we have this method for generating presigned urls be more specific (e.g. get_signed_url)
So for the above example:

from cloudpathlib import CloudPath

path = CloudPath("az://tests/test.txt")

print(path.as_uri())
# https://myaccount.blob.core.windows.net/tests/text.txt
# 
# Note: Currently this would return the str of Path so:
# "az://tests/text.txt"

print(path.get_signed_url())
# https://myaccount.blob.core.windows.net/tests/text.txt?se=...[OTHER_SAS_TOKEN_PARAMS]'

@pjbull
Copy link
Member

pjbull commented Jun 17, 2022

@kabirkhan Thanks for this, I think the implementation is looking great. A few questions:

(1) Do you know if any of the provider specific SDKs support getting the URL? Might be more reliable than the f-string version, even in the non-presigned case
(2) The S3 will need to handle if a custom endpoint url is set (e.g., on a MinIO server)
(3) I'm partial to using as_url and having that support the presigning, and also leaving as_uri alone. I do definitely see the point you brought up that it is potentially confusing, but IMO at least for S3 there's enough treatment of the s3://... form as a URI that having both options may provide utility (even if you can get the as_uri with str already).

Other than those qs, this will need:

  • tests
  • documentation updates

Also, do you have creds that you can use to test the various live versions of the services? If not, I can share some with you.

@pjbull
Copy link
Member

pjbull commented Jul 22, 2023

@kabirkhan I know this is a little old, but I'm pushing to get some valuable stale PRs in if possible. Would you have time to wrap the implementation here with the items above?

@kabirkhan
Copy link
Contributor Author

Hey @pjbull, I found some time! lol sorry for the super late reply here. I ended up not needing this for my work but it popped in my head that I never finished this and it's legitimately a useful addition.

Finished up the main code changes and this now uses the SDK of each provider to get the public URLs as well so local emulators should work.

Started working on the tests as well but not all the there yet.

@kabirkhan kabirkhan marked this pull request as ready for review February 2, 2024 20:01
@pjbull
Copy link
Member

pjbull commented Feb 2, 2024

Awesome, thanks @kabirkhan! I actually had just had to reach into the S3Client and do this with the boto client directly, so excited to get this in.

I'll take a look this weekend.

Copy link

codecov bot commented Feb 15, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (991704a) 94.3% compared to head (5867217) 93.9%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #236     +/-   ##
========================================
- Coverage    94.3%   93.9%   -0.5%     
========================================
  Files          22      22             
  Lines        1560    1615     +55     
========================================
+ Hits         1472    1517     +45     
- Misses         88      98     +10     
Files Coverage Δ
cloudpathlib/azure/azblobclient.py 94.2% <100.0%> (+0.3%) ⬆️
cloudpathlib/azure/azblobpath.py 93.3% <100.0%> (+0.6%) ⬆️
cloudpathlib/gs/gsclient.py 93.5% <100.0%> (-1.1%) ⬇️
cloudpathlib/gs/gspath.py 94.6% <100.0%> (+0.5%) ⬆️
cloudpathlib/s3/s3client.py 92.9% <100.0%> (-1.1%) ⬇️
cloudpathlib/s3/s3path.py 98.1% <100.0%> (+0.1%) ⬆️
cloudpathlib/cloudpath.py 93.3% <66.6%> (-0.2%) ⬇️
cloudpathlib/local/localpath.py 90.0% <50.0%> (-4.5%) ⬇️
cloudpathlib/client.py 88.1% <66.6%> (-1.5%) ⬇️
cloudpathlib/local/localclient.py 94.6% <50.0%> (-2.1%) ⬇️

@pjbull pjbull changed the base branch from master to 236-live-tests February 17, 2024 17:15
@pjbull
Copy link
Member

pjbull commented Feb 17, 2024

Thanks @kabirkhan! Moving to local branch to run the live backend tests

@pjbull pjbull merged commit e362366 into drivendataorg:236-live-tests Feb 17, 2024
25 of 27 checks passed
pjbull added a commit that referenced this pull request Feb 17, 2024
* Add `as_url` method to each CloudPath implementation with ability to generate presigned_urls (#236)

* Add ability to set endpoint_url from AWS_ENDPOINT_URL env variable. Very useful for localstack while waiting for upstream PR from boto3: boto/boto3#2746

* add test

* rm print

* lint

* initial implementation of as_url for presigned urls

* add az implementation

* fix non-presigned urls for each cloud

* fix az presigned url

* actually test and fix urls across implementations. conditional keys

* use sdks to get presigned and public urls (gs and s3)

* use sdk to get public url as well

* rm extra commented out

* add basic implementations for new url methods for local

* black format

* rm unused import

* fix sig

* add url stuff to the mock test client

* format

* fix up tests for each specific rig

* don't use a test that requires creds in s3 specific

* lint

* rm path checks that wouldn't work on windows for local mocks

* format

* update docs

* move default implementation to cloudpath

* Update tests and documentation

---------

Co-authored-by: Kabir Khan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants