-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed bug #237 skylark cp does not warn user if object does not exist #248
Conversation
@@ -211,6 +211,7 @@ def run_replication_plan(self, job: ReplicationJob) -> ReplicationJob: | |||
"azure", | |||
"gcp", | |||
], f"Only AWS, Azure, and GCP are supported, but got {job.dest_region}" | |||
assert len(job.src_objs) > 0, f"Found {len(job.src_objs)} matching objects. Objects do not exist in src bucket or bucket is empty" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some feedback: return errors to users by using an if statement then printing a readable error (e.g. logger.error), then raise typer.Abort()
@milesturin I added you as a reviewer since you raised the initial issue. Would this fix the original issue? |
@ethanmehta unfortunately these do not fix the issue. Try running skylark cp on a nonexistent object, if the fix is successful it will report that the object does not exist to the user, if not it will print |
…loud to cloud with s3 source bucket
…if the source bucket or object do not exist, bug #237
Thanks @milesturin. It should work now. If a user uses If a user uses Is this the desired behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanmehta Seems to work well, I left a few small suggestions but other than that it looks good to me. @parasj I can only speak to the s3 interface stuff so take a quick glance and make sure it looks good for the replication stuff as well.
skylark/cli/cli.py
Outdated
@@ -126,6 +126,12 @@ def cp( | |||
# Set up replication topology | |||
if solve: | |||
objs = list(src_client.list_objects(path_src)) | |||
if not objs: | |||
logger.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change this to a logger.error
since it causes program termination. Also, no need to make the argument an f string since we aren't inserting any data into it. Finally I'd say make it a one liner since the parameter is small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make the error slightly more specific too, something like "Specified object does not exist."
total_bytes = 0.0 | ||
for obj in object_interface.list_objects(prefix=src_key): | ||
sub_key = obj.key[len(src_key) :] | ||
sub_key = sub_key.lstrip("/") | ||
dest_path = dst / sub_key | ||
total_bytes += _copy(obj, dest_path) | ||
obj_count += 1 | ||
|
||
if not obj_count: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestions as before
skylark/cli/cli_helper.py
Outdated
@@ -230,6 +238,13 @@ def replicate_helper( | |||
else: | |||
# make replication job | |||
objs = list(ObjectStoreInterface.create(topo.source_region(), source_bucket).list_objects(src_key_prefix)) | |||
if not objs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestions as before
@@ -23,6 +25,9 @@ def __init__(self, aws_region, bucket_name, use_tls=True, part_size=None, throug | |||
self.auth = AWSAuthentication() | |||
self.aws_region = self.infer_s3_region(bucket_name) if aws_region is None or aws_region == "infer" else aws_region | |||
self.bucket_name = bucket_name | |||
if not self.bucket_exists(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just change it to a logger.error
skylark/obj_store/s3_interface.py
Outdated
region = s3_client.get_bucket_location(Bucket=bucket_name).get("LocationConstraint", "us-east-1") | ||
return region if region is not None else "us-east-1" | ||
except: | ||
logger.warn("Specified bucket does not exist.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion as before
skylark/cli/cli_helper.py
Outdated
f"Objects do not exist." | ||
) | ||
raise typer.Abort() | ||
return 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this return
, it'll never get run
@@ -11,6 +11,7 @@ | |||
from skylark.utils import logger | |||
from tqdm import tqdm | |||
import pandas as pd | |||
import typer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this gets used
skylark/obj_store/s3_interface.py
Outdated
@@ -22,6 +24,9 @@ def __init__(self, aws_region, bucket_name, use_tls=True, part_size=None, throug | |||
self.auth = AWSAuthentication() | |||
self.aws_region = self.infer_s3_region(bucket_name) if aws_region is None or aws_region == "infer" else aws_region | |||
self.bucket_name = bucket_name | |||
if not self.bucket_exists(): | |||
logger.warn("Specified bucket does not exist.") | |||
raise typer.Abort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanmehta Can we raise an exception other than the typer ones? S3Interface is in the API so it shouldn't import this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanmehta check this page out, there are a number of exceptions regarding missing objects in botocore, might be able to find an applicable one in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a new exceptions.py
file with cloud provider independent exceptions, so we should make a new exception class (e.g. MissingBucketException or MissingObjectException)
@ethanmehta I resolved a merge conflict that I introduced in main. |
…b.com/parasj/skylark into dev/ethanmehta/bug-warn-objects-exist
….py. Skylark raises these instead of typer.Abort.
Should be good to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@ethanmehta Feel free to merge this PR after resolving merge conflicts! |
Fixes #237
In
run_replication_plan
method ofReplicatorClient
, added assertion warning user that no matching objects were found with the specified source bucket and prefix. Tested withskylark cp
method.