-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try harder to make believable directories #3775
Try harder to make believable directories #3775
Conversation
Previous implementation of fake folders just looked for a trailing '/'. Now we also list the files with this prefix and use this to determine whether the input is a folder. This will detect 'dir' in addition to just 'dir/' before. This version will still say 'yes' to anything that ends in a slash, even if they don't exist, maintaining the previous behavior.
This PR should fix issue #3774 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chingor13 @JesseLovelace please also take a look
...ib/google-cloud-nio/src/main/java/com/google/cloud/storage/contrib/nio/CloudStoragePath.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should note somewhere that this check now makes an additional list operation. usePseudoDirectories()
appears to default to true
.
It also seems like a psuedo directory is the failure case for many operations. A user might be better served by turning off psuedo directories and handling any exception if they don't want an additional request per file specified.
...ib/google-cloud-nio/src/main/java/com/google/cloud/storage/contrib/nio/CloudStoragePath.java
Show resolved
Hide resolved
...ib/google-cloud-nio/src/main/java/com/google/cloud/storage/contrib/nio/CloudStoragePath.java
Show resolved
Hide resolved
Improved performance by moving the pseudodir code to only run on file not found, leaving the common-case path to run faster. Added tests for pseudodirs, and one in LocalStorage too. Improved comments following reviewer suggestions.
@jean-philippe-martin The thing I'm most concerned about is that we're silently adding additional list requests and it's opaque and possibly unexpected. A user upgrading may see many more list requests and at scale this could cost them $$$. |
@chingor13 you're right, there would be more list requests: but only when the user tries to access files that do not exist (and do not end in slash). This is not a very common operation at all. And there is an easy way to make sure no list operation is sent: disable pseudo-directories (of course that has some drawbacks too), or whenever using directories, make sure to append a final "/" at the end so it's clear it's not a file name. We could conceivably add a third intermediate mode ("pseudo-dirs that require slash", effectively what pseudo-dir mode was before this PR), but having more modes multiplies the maintenance and testing costs. I don't think it would be worth it. What we can do is make sure the pseudodir code only triggers when the file isn't there, and I've done that already. We could search for other optimization opportunities. The other thing we can do is update the documentation. I'll be happy to do this if you think it'll be helpful. |
@jean-philippe-martin I think extra calls are happening on more than just missing file failures. If you run a simple snippet from the google-cloud-nio README: try (CloudStorageFileSystem fs = CloudStorageFileSystem.forBucket("bucket")) {
Path path = fs.getPath("lolcat.csv");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
} The above will now make an additional API call to list if the file exists. |
Oh you're right, it does! Let's see if I can avoid it. |
@chingor13 I've pushed an update that removes the call from that case and a few others. |
Previous implementation of fake folders just looked for a trailing
/
. Now we also list the files with this prefix and use this to determine whether the input is a folder. This will detectdir
in addition to justdir/
before.This version will still say 'yes' to anything that ends in a slash, even if they don't exist, maintaining the previous behavior.
This PR also adds support for MAX_VALUES to our fake storage provider, and adds an integration test for finding directories (passing).
Fixes #3774