Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bug #237 skylark cp does not warn user if object does not exist #248

Merged
merged 12 commits into from
Apr 26, 2022

Conversation

ethanmehta
Copy link
Contributor

@ethanmehta ethanmehta commented Apr 4, 2022

Fixes #237

In run_replication_plan method of ReplicatorClient, added assertion warning user that no matching objects were found with the specified source bucket and prefix. Tested with skylark cp method.

@parasj parasj requested a review from milesturin April 4, 2022 22:01
@@ -211,6 +211,7 @@ def run_replication_plan(self, job: ReplicationJob) -> ReplicationJob:
"azure",
"gcp",
], f"Only AWS, Azure, and GCP are supported, but got {job.dest_region}"
assert len(job.src_objs) > 0, f"Found {len(job.src_objs)} matching objects. Objects do not exist in src bucket or bucket is empty"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some feedback: return errors to users by using an if statement then printing a readable error (e.g. logger.error), then raise typer.Abort()

@parasj
Copy link
Contributor

parasj commented Apr 4, 2022

@milesturin I added you as a reviewer since you raised the initial issue. Would this fix the original issue?

… method of s3_interface.py, if source bucket does not exist. bug #237
… of s3_interface.py, if source bucket does not exist. bug #237
@milesturin
Copy link
Contributor

@ethanmehta unfortunately these do not fix the issue. Try running skylark cp on a nonexistent object, if the fix is successful it will report that the object does not exist to the user, if not it will print Downloading: 0.00B [00:00, ?B/s]

@ethanmehta
Copy link
Contributor Author

Thanks @milesturin. It should work now.

If a user uses skylark cp s3 local and the s3 bucket or object do not exist, skylark will warn and abort.

If a user uses skylark cp s3 s3 and the s3 source bucket or object do not exist, skylark will warn and abort.

Is this the desired behavior?

Copy link
Contributor

@milesturin milesturin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ethanmehta Seems to work well, I left a few small suggestions but other than that it looks good to me. @parasj I can only speak to the s3 interface stuff so take a quick glance and make sure it looks good for the replication stuff as well.

@@ -126,6 +126,12 @@ def cp(
# Set up replication topology
if solve:
objs = list(src_client.list_objects(path_src))
if not objs:
logger.warning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this to a logger.error since it causes program termination. Also, no need to make the argument an f string since we aren't inserting any data into it. Finally I'd say make it a one liner since the parameter is small.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make the error slightly more specific too, something like "Specified object does not exist."

total_bytes = 0.0
for obj in object_interface.list_objects(prefix=src_key):
sub_key = obj.key[len(src_key) :]
sub_key = sub_key.lstrip("/")
dest_path = dst / sub_key
total_bytes += _copy(obj, dest_path)
obj_count += 1

if not obj_count:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestions as before

@@ -230,6 +238,13 @@ def replicate_helper(
else:
# make replication job
objs = list(ObjectStoreInterface.create(topo.source_region(), source_bucket).list_objects(src_key_prefix))
if not objs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestions as before

@@ -23,6 +25,9 @@ def __init__(self, aws_region, bucket_name, use_tls=True, part_size=None, throug
self.auth = AWSAuthentication()
self.aws_region = self.infer_s3_region(bucket_name) if aws_region is None or aws_region == "infer" else aws_region
self.bucket_name = bucket_name
if not self.bucket_exists():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just change it to a logger.error

region = s3_client.get_bucket_location(Bucket=bucket_name).get("LocationConstraint", "us-east-1")
return region if region is not None else "us-east-1"
except:
logger.warn("Specified bucket does not exist.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion as before

f"Objects do not exist."
)
raise typer.Abort()
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this return, it'll never get run

@@ -11,6 +11,7 @@
from skylark.utils import logger
from tqdm import tqdm
import pandas as pd
import typer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this gets used

@@ -22,6 +24,9 @@ def __init__(self, aws_region, bucket_name, use_tls=True, part_size=None, throug
self.auth = AWSAuthentication()
self.aws_region = self.infer_s3_region(bucket_name) if aws_region is None or aws_region == "infer" else aws_region
self.bucket_name = bucket_name
if not self.bucket_exists():
logger.warn("Specified bucket does not exist.")
raise typer.Abort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ethanmehta Can we raise an exception other than the typer ones? S3Interface is in the API so it shouldn't import this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ethanmehta check this page out, there are a number of exceptions regarding missing objects in botocore, might be able to find an applicable one in there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a new exceptions.py file with cloud provider independent exceptions, so we should make a new exception class (e.g. MissingBucketException or MissingObjectException)

@parasj
Copy link
Contributor

parasj commented Apr 15, 2022

@ethanmehta I resolved a merge conflict that I introduced in main.

@ethanmehta
Copy link
Contributor Author

Should be good to merge.

Copy link
Contributor

@parasj parasj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@parasj
Copy link
Contributor

parasj commented Apr 25, 2022

@ethanmehta Feel free to merge this PR after resolving merge conflicts!

@parasj parasj added the bug Something isn't working label Apr 25, 2022
@parasj parasj added this to the Bugs milestone Apr 25, 2022
@ethanmehta ethanmehta merged commit bdea2ea into main Apr 26, 2022
@ethanmehta ethanmehta deleted the dev/ethanmehta/bug-warn-objects-exist branch April 26, 2022 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[bug] skylark cp does not warn user if object does not exist
3 participants