Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][FS][Azure] Fix CreateDir and DeleteDir on hierarchical namespace accounts where the directory has a trailing / #40052

Closed
Tom-Newton opened this issue Feb 12, 2024 · 5 comments · Fixed by #40054
Assignees
Milestone

Comments

@Tom-Newton
Copy link
Contributor

Tom-Newton commented Feb 12, 2024

Describe the bug, including details regarding any error messages, version, and platform.

Child of #18014

Manually modifying #40021 to run the Python tests against a hierarchical namespace account has highlighted some missing test coverage in the existing C++ tests and some cases that we need to fix.

Currently the following fail on hierarchical namespace storage accounts.

fs->CreateDir("directory/")
fs->DeleteDir("directory/")

They fail with

Failed to delete a directory: directory/: https://tomtesthns.blob.core.windows.net/ea119933-c9d3-11ee-989a-71cec6336ac8/directory/ Azure Error: [InvalidUri] 400 The request URI is invalid.
The request URI is invalid.
RequestId:c9ad826a-101f-0005-5be0-5d0db4000000
Time:2024-02-12T18:24:12.9974541Z
Request ID: c9ad826a-101f-0005-5be0-5d0db4000000

Removing the trailing slash solves the problem.

I haven't tested but I expect DeleteDirContents probably has the same issue.

Component(s)

C++

@Tom-Newton
Copy link
Contributor Author

take

felipecrv pushed a commit that referenced this issue Feb 13, 2024
… issues on hierarchical namespace accounts (#40054)

### Rationale for this change
There were the following failure cases
```
fs->CreateDir("directory/")
``` 
```
fs->DeleteDir("directory/")
``` 
They fail with 
```
Failed to delete a directory: directory/: https://tomtesthns.blob.core.windows.net/ea119933-c9d3-11ee-989a-71cec6336ac8/directory/ Azure Error: [InvalidUri] 400 The request URI is invalid.
The request URI is invalid.
RequestId:c9ad826a-101f-0005-5be0-5d0db4000000
Time:2024-02-12T18:24:12.9974541Z
Request ID: c9ad826a-101f-0005-5be0-5d0db4000000
```

### What changes are included in this PR?
Add tests to cover these cases. 
Remove trailing slashes to avoid these issues. 

### Are these changes tested?
Yes. I added new test cases to cover these cases. I was rather torn about whether to add precise tests like I've done or to duplicate every test case we had and run it a second time with trailing slashes.

### Are there any user-facing changes?
Fixed bug on `CreateDir` and `DeleteDir`.

* Closes: #40052

Authored-by: Thomas Newton <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
@felipecrv felipecrv added this to the 16.0.0 milestone Feb 13, 2024
@pitrou pitrou modified the milestones: 16.0.0, 15.0.1 Feb 14, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
… slash issues on hierarchical namespace accounts (apache#40054)

### Rationale for this change
There were the following failure cases
```
fs->CreateDir("directory/")
``` 
```
fs->DeleteDir("directory/")
``` 
They fail with 
```
Failed to delete a directory: directory/: https://tomtesthns.blob.core.windows.net/ea119933-c9d3-11ee-989a-71cec6336ac8/directory/ Azure Error: [InvalidUri] 400 The request URI is invalid.
The request URI is invalid.
RequestId:c9ad826a-101f-0005-5be0-5d0db4000000
Time:2024-02-12T18:24:12.9974541Z
Request ID: c9ad826a-101f-0005-5be0-5d0db4000000
```

### What changes are included in this PR?
Add tests to cover these cases. 
Remove trailing slashes to avoid these issues. 

### Are these changes tested?
Yes. I added new test cases to cover these cases. I was rather torn about whether to add precise tests like I've done or to duplicate every test case we had and run it a second time with trailing slashes.

### Are there any user-facing changes?
Fixed bug on `CreateDir` and `DeleteDir`.

* Closes: apache#40052

Authored-by: Thomas Newton <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
@raulcd
Copy link
Member

raulcd commented Feb 20, 2024

@pitrou @felipecrv trying to cherry-pick this fix is causing a bunch of conflicts on azurefs.cc. I can see there has been several new additions for 16.0.0:
#38703
#39718
#38704

This is the conflict:

diff --cc cpp/src/arrow/filesystem/azurefs.cc
index 730adab,1175059..0000000
--- a/cpp/src/arrow/filesystem/azurefs.cc
+++ b/cpp/src/arrow/filesystem/azurefs.cc
@@@ -1649,20 -1856,29 +1650,27 @@@ class AzureFileSystem::Impl 
    /// \pre location.container is not empty.
    /// \pre location.path is not empty.
    Status DeleteDirOnFileSystem(const DataLake::DataLakeFileSystemClient& adlfs_client,
 -                               const AzureLocation& location, bool recursive,
 -                               bool require_dir_to_exist,
 -                               Azure::Nullable<std::string> lease_id = {}) {
 +                               const AzureLocation& location) {
      DCHECK(!location.container.empty());
      DCHECK(!location.path.empty());
++<<<<<<< HEAD
 +    auto directory_client = adlfs_client.GetDirectoryClient(location.path);
 +    // XXX: should "directory not found" be considered an error?
++=======
+     auto directory_client = adlfs_client.GetDirectoryClient(
+         std::string(internal::RemoveTrailingSlash(location.path)));
+     DataLake::DeleteDirectoryOptions options;
+     options.AccessConditions.LeaseId = std::move(lease_id);
++>>>>>>> 0dbbd43 (GH-40052: [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054))
      try {
 -      auto response = recursive ? directory_client.DeleteRecursive(options)
 -                                : directory_client.DeleteEmpty(options);
 -      // Only the "*IfExists" functions ever set Deleted to false.
 -      // All the others either succeed or throw an exception.
 -      DCHECK(response.Value.Deleted);
 -    } catch (const Storage::StorageException& exception) {
 -      if (exception.ErrorCode == "FilesystemNotFound" ||
 -          exception.ErrorCode == "PathNotFound") {
 -        if (require_dir_to_exist) {
 -          return PathNotFound(location);
 -        }
 +      auto response = directory_client.DeleteRecursive();
 +      if (response.Value.Deleted) {
          return Status::OK();
 +      } else {
 +        return StatusFromErrorResponse(directory_client.GetUrl(), *response.RawResponse,
 +                                       "Failed to delete a directory: " + location.path);
        }
 +    } catch (const Storage::StorageException& exception) {
        return ExceptionToStatus(exception, "Failed to delete a directory: ", location.path,

I suppose this is correct an I should just pick the incoming change but just wanted to confirm.

@pitrou
Copy link
Member

pitrou commented Feb 20, 2024

Or perhaps avoid cherry-picking this change?

@felipecrv
Copy link
Contributor

@raulcd I agree with @pitrou here: don't cherry-pick this change. Things are still moving a lot in azurefs.cc and I believe all the current users are involved in the development of the feature and looking closely at these bugs.

@raulcd
Copy link
Member

raulcd commented Feb 20, 2024

That makes sense to me. I'll just change the milestone to 16.0.0.
Thanks

@raulcd raulcd modified the milestones: 15.0.1, 16.0.0 Feb 20, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
… slash issues on hierarchical namespace accounts (apache#40054)

### Rationale for this change
There were the following failure cases
```
fs->CreateDir("directory/")
``` 
```
fs->DeleteDir("directory/")
``` 
They fail with 
```
Failed to delete a directory: directory/: https://tomtesthns.blob.core.windows.net/ea119933-c9d3-11ee-989a-71cec6336ac8/directory/ Azure Error: [InvalidUri] 400 The request URI is invalid.
The request URI is invalid.
RequestId:c9ad826a-101f-0005-5be0-5d0db4000000
Time:2024-02-12T18:24:12.9974541Z
Request ID: c9ad826a-101f-0005-5be0-5d0db4000000
```

### What changes are included in this PR?
Add tests to cover these cases. 
Remove trailing slashes to avoid these issues. 

### Are these changes tested?
Yes. I added new test cases to cover these cases. I was rather torn about whether to add precise tests like I've done or to duplicate every test case we had and run it a second time with trailing slashes.

### Are there any user-facing changes?
Fixed bug on `CreateDir` and `DeleteDir`.

* Closes: apache#40052

Authored-by: Thomas Newton <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
… slash issues on hierarchical namespace accounts (apache#40054)

### Rationale for this change
There were the following failure cases
```
fs->CreateDir("directory/")
``` 
```
fs->DeleteDir("directory/")
``` 
They fail with 
```
Failed to delete a directory: directory/: https://tomtesthns.blob.core.windows.net/ea119933-c9d3-11ee-989a-71cec6336ac8/directory/ Azure Error: [InvalidUri] 400 The request URI is invalid.
The request URI is invalid.
RequestId:c9ad826a-101f-0005-5be0-5d0db4000000
Time:2024-02-12T18:24:12.9974541Z
Request ID: c9ad826a-101f-0005-5be0-5d0db4000000
```

### What changes are included in this PR?
Add tests to cover these cases. 
Remove trailing slashes to avoid these issues. 

### Are these changes tested?
Yes. I added new test cases to cover these cases. I was rather torn about whether to add precise tests like I've done or to duplicate every test case we had and run it a second time with trailing slashes.

### Are there any user-facing changes?
Fixed bug on `CreateDir` and `DeleteDir`.

* Closes: apache#40052

Authored-by: Thomas Newton <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment