Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doesObjectExist should not return true or false based on assumptions #2342

Closed
3 tasks done
maiermic opened this issue Oct 29, 2021 · 10 comments · Fixed by #2424
Closed
3 tasks done

doesObjectExist should not return true or false based on assumptions #2342

maiermic opened this issue Oct 29, 2021 · 10 comments · Fixed by #2424
Assignees
Labels
bug This issue is a bug.

Comments

@maiermic
Copy link

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug
If the request made in doesObjectExist fails for any reason, it should throw an exception instead of returning false based on any assumptions (e.g. bucket does not exist if endpoint can not be reached). Otherwise, the result may be wrong, i.e. false is returned, even though the object does actually exist.

I guess it is a similar issue as in the Java SDK. You may consider to introduce doesBucketExistV2 as well

Due to some concerns about internal service teams being tightly coupled to the underlying API call we decided to rollback this change so that doesBucketExist continues to use headBucket. We've deprecated this method and introduced a doesBucketExistV2 which uses the new getBucketAcl approach.

Version of AWS SDK for PHP?
Example: v3.191.6

Version of PHP (php -v)?
7.4

To Reproduce (observed behavior)
To use an endpoint URL that can not be resolved, is IMO the easiest way to get an error in the request. Otherwise, it depends on your endpoint configuration. You can run the following example without any changes to the configuration:

$client = new \Aws\S3\S3Client([
  'version' => 'latest',
  'region' => 'eu-central-1',
  'endpoint' => 'http://endpoint-that-can-not-be-resolved',
  'use_path_style_endpoint' => true,
  'credentials' => [
    'key' => 's3CredentialsKey',
    'secret' => 's3CredentialsSecret',
  ],
]);

$bucketName = 'bucketName';
$doesExist = $client->doesObjectExist($bucketName, 'bucketKey');
// should not be reached, since exception should be thrown by doesObjectExist:
var_dump($doesExist); // false
// the following throws an exception as expected
$client->createBucket(['Bucket' => $bucketName]);

You can also reproduce this issue with your own endpoint if you use invalid credentials to trigger a 403. In that case, doesObjectExist should throw an exception, too.

Expected behavior
doesObjectExist should throw an exception.

Additional context
In my private project, I had an issue, when I uploaded a lot of files. I used doesObjectExist to check if I already uploaded the file. This case was actually meant to be used, if I have to run the script again. However, in the first run, doesObjectExist returned true in some cases (0.1%), even though the file did actually not exist and hence, wasn't uploaded. Are there any assumptions, where true is returned? I'm not sure how to reproduce this consistently or how to log relevant information if the result was wrong, but I guess that requests have failed with another error as I described in the reproduce section of this issue.

@maiermic maiermic added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 29, 2021
@maiermic maiermic changed the title doesObjectExist should not return false based on assumptions doesObjectExist should not return true or false based on assumptions Oct 29, 2021
@stobrien89 stobrien89 self-assigned this Feb 8, 2022
@stobrien89
Copy link
Member

Hi @maiermic,

Thanks for raising this! We'll probably go the route of a doesObjectExistV2 to avoid breaking anyone relying on the existing behavior. Don't have an ETA as of yet, but I'll let you know as soon as I have an update.

@stobrien89 stobrien89 removed the needs-triage This issue or PR still needs to be triaged. label Feb 8, 2022
@maiermic
Copy link
Author

maiermic commented Feb 9, 2022

@stobrien89 Thank you for looking into this 😃

@adrianbecker013
Copy link

What is the cost of doesObjectExist()?

@stobrien89
Copy link
Member

stobrien89 commented Apr 1, 2022

@adrianbecker013, it's the equivalent of a single HEAD request/headObject call.

@stobrien89
Copy link
Member

@maiermic,

I know it's been a while, but would it be possible for you to pinpoint whether or not the operation was returning true (for non-existent objects) in the case of an AccessDenied error code? I'm wrapping up the implementation and am trying to determine if this is the culprit.

@maiermic
Copy link
Author

maiermic commented Apr 4, 2022

@stobrien89 I'm not sure I understood your question correctly. Regarding my private project that I mention in the issue description in Additional context: doesObjectExist returned true for actually non-existent objects, even though the credentials were valid and the same as for calls that returned false as expected (object existed). However, I don't know anything about the responses (status codes) of those requests.

I'm wrapping up the implementation and am trying to determine if this is the culprit.

It looks like the cause of the case I mentioned to reproduce the issue in another way:

You can also reproduce this issue with your own endpoint if you use invalid credentials to trigger a 403.

I would not catch any of these exceptions. However, I don't think that I got a status code 403 as response in my private project. According to this table, AccessDenied should only be returned in case of status code 403. However, there is another table Asynchronous Error Codes that associates AccessDenied with status code 200. I'm not sure, which table applies in my case. By the way, where is the return value of $e->getAwsErrorCode() set?

Maybe, it helps to take a look, how they fixed it in the Java SDK. I asked for the relevant changes.

@stobrien89
Copy link
Member

Hi @maiermic,

Sorry for the confusion. I was just asking in the off chance you had debugged this down to S3ClientTrait level. Thank you for the information!

The return value for getAwsErrorCode is set here. It's possible that we were looking for the asynchronous AccessDenied case, because so far I've been unable to produce an AccessDenied error code in manual testing.

As far as the new implementation is concerned, I want to stick with HeadObject and HeadBucket (working on DoesBucketExist as well) if possible, due to the low cost. I plan to throw an exception for any unsuccessful call aside from a 404 Not Found (or possibly AccessDenied error code), which seems to be the only 'correct' way to determine that an object does not exist. Any thoughts/feedback are welcome.

@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@stobrien89
Copy link
Member

Hi @maiermic,

Thanks again for the feedback. The new methods have been implemented and will be available in tomorrow's release.

@maiermic
Copy link
Author

@stobrien89 Thank you very much for fixing this issue 👍
@SamRemis Thank you very much for reviewing the PR 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants