Azure Blob Storage Scaler doesn't list blobs recursively #1789
Thanks for the report! @ahmelsayed, can you look into this?
Based on my knowledge of the Java SDK, listing by hierarchy is non-recursive. If you want to list everything under a hierarchy, you need to check the …
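A minimal Go sketch of the point above. It does not call Azure: listByHierarchy (a hypothetical helper) simulates one hierarchy listing with a prefix and a delimiter over in-memory blob names, rolling nested names up into prefixes the way the REST API reports BlobPrefix entries, and countRecursive (also hypothetical) issues a further listing for every prefix it gets back. The sample names are the six test blobs that appear in the listing responses in this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listByHierarchy simulates a single hierarchical listing call. Names whose
// remainder (after the prefix) contains the delimiter are rolled up into one
// prefix entry ending in the delimiter; everything else is returned as a blob.
func listByHierarchy(names []string, prefix, delim string) (blobs, prefixes []string) {
	seen := map[string]bool{}
	for _, name := range names {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		rest := name[len(prefix):]
		// Guard against an empty delimiter: strings.Index(rest, "") is always 0.
		if i := strings.Index(rest, delim); delim != "" && i >= 0 {
			p := prefix + rest[:i+len(delim)]
			if !seen[p] {
				seen[p] = true
				prefixes = append(prefixes, p)
			}
			continue
		}
		blobs = append(blobs, name)
	}
	sort.Strings(prefixes)
	return blobs, prefixes
}

// countRecursive counts every blob under prefix by listing one level at a
// time and descending into each returned prefix with another listing.
func countRecursive(names []string, prefix, delim string) int {
	blobs, prefixes := listByHierarchy(names, prefix, delim)
	total := len(blobs)
	for _, p := range prefixes {
		total += countRecursive(names, p, delim)
	}
	return total
}

func main() {
	names := []string{
		"subdir/subsub/test6.txt", "subdir/test4.txt", "subdir/test5.txt",
		"test1.txt", "test2.txt", "test3.txt",
	}
	blobs, prefixes := listByHierarchy(names, "", "/")
	fmt.Println(len(blobs), prefixes)           // 3 [subdir/]
	fmt.Println(countRecursive(names, "", "/")) // 6
}
```

A single hierarchy call at the root sees 3 blobs plus the subdir/ prefix; only the recursive walk reaches all 6. A flat listing (no delimiter) returns the same total in one call, which is the server-side alternative this issue asks for.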
Looking at the REST API the Go library uses, that sounds like the same behaviour: https://docs.microsoft.com/en-gb/rest/api/storageservices/enumerating-blob-resources The following would have been in the response:
Which indeed needs to be fed into a subsequent request. Reading the Azure docs for the Java SDK, you can see the subtle differences between calling the … Looking back at the REST API docs, which both of these libraries call underneath, the only difference appears to be that the Hierarchy listing provides a … I tested this out using the … This returned the following once I'd uploaded some test files in subdirectories:
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="https://*******.blob.core.windows.net/testing">
<Blobs>
<Blob>
<Name>subdir/subsub/test6.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/subdir/subsub/test6.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:43 GMT</Last-Modified>
<Etag>0x8D9116DC7117ABE</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>subdir/test4.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/subdir/test4.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:33 GMT</Last-Modified>
<Etag>0x8D9116DC0F607C8</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>subdir/test5.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/subdir/test5.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:36 GMT</Last-Modified>
<Etag>0x8D9116DC2A71325</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>test1.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/test1.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:21 GMT</Last-Modified>
<Etag>0x8D9116DBA09FFAB</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>test2.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/test2.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:18 GMT</Last-Modified>
<Etag>0x8D9116DB7C9E0E1</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>test3.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/test3.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:25 GMT</Last-Modified>
<Etag>0x8D9116DBBFFE8C6</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
</Blobs>
<NextMarker />
</EnumerationResults>
Hitting this endpoint would yield the result I'm looking for, 6 blobs, instead of the 3 that are returned when I call with the …
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="https://*******.blob.core.windows.net/testing">
<Delimiter>/</Delimiter>
<Blobs>
<BlobPrefix>
<Name>subdir/</Name>
</BlobPrefix>
<Blob>
<Name>test1.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/test1.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:21 GMT</Last-Modified>
<Etag>0x8D9116DBA09FFAB</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>test2.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/test2.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:18 GMT</Last-Modified>
<Etag>0x8D9116DB7C9E0E1</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
<Blob>
<Name>test3.txt</Name>
<Url>https://*******.blob.core.windows.net/testing/test3.txt</Url>
<Properties>
<Last-Modified>Fri, 07 May 2021 15:35:25 GMT</Last-Modified>
<Etag>0x8D9116DBBFFE8C6</Etag>
<Content-Length>0</Content-Length>
<Content-Type>text/plain</Content-Type>
<Content-Encoding />
<Content-Language />
<Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
<Cache-Control />
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
</Properties>
</Blob>
</Blobs>
<NextMarker />
</EnumerationResults>
Passing the parameter for …
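A small Go sketch, standard library only, that parses trimmed copies of the two responses above with encoding/xml and counts the Blob and BlobPrefix elements; it reproduces the 6 versus 3 (plus one prefix) difference. The struct keeps only the element names visible in those responses and ignores Url and Properties. It does not call Azure, and countListing is a hypothetical helper, not part of KEDA or the Azure SDK.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// enumerationResults keeps only what is needed for counting: blob names,
// rolled-up BlobPrefix names, and the continuation marker.
type enumerationResults struct {
	Blobs      []string `xml:"Blobs>Blob>Name"`
	Prefixes   []string `xml:"Blobs>BlobPrefix>Name"`
	NextMarker string   `xml:"NextMarker"`
}

// countListing parses one listing response body and returns the number of
// blobs, the number of prefixes, and the NextMarker (empty on the last page).
func countListing(body []byte) (int, int, string, error) {
	var r enumerationResults
	if err := xml.Unmarshal(body, &r); err != nil {
		return 0, 0, "", err
	}
	return len(r.Blobs), len(r.Prefixes), r.NextMarker, nil
}

// Trimmed copies of the two responses above (Url and Properties removed).
const flatBody = `<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="https://*******.blob.core.windows.net/testing">
  <Blobs>
    <Blob><Name>subdir/subsub/test6.txt</Name></Blob>
    <Blob><Name>subdir/test4.txt</Name></Blob>
    <Blob><Name>subdir/test5.txt</Name></Blob>
    <Blob><Name>test1.txt</Name></Blob>
    <Blob><Name>test2.txt</Name></Blob>
    <Blob><Name>test3.txt</Name></Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>`

const delimitedBody = `<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="https://*******.blob.core.windows.net/testing">
  <Delimiter>/</Delimiter>
  <Blobs>
    <BlobPrefix><Name>subdir/</Name></BlobPrefix>
    <Blob><Name>test1.txt</Name></Blob>
    <Blob><Name>test2.txt</Name></Blob>
    <Blob><Name>test3.txt</Name></Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>`

func main() {
	for _, body := range []string{flatBody, delimitedBody} {
		blobs, prefixes, next, err := countListing([]byte(body))
		if err != nil {
			panic(err)
		}
		fmt.Printf("blobs=%d prefixes=%d nextMarker=%q\n", blobs, prefixes, next)
	}
}
```

A non-empty NextMarker would have to be sent back as the marker of a follow-up request; both sample responses end with an empty one, so each listing fits on a single page.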
Yup, so testing using the following, I still don't get any jobs running when I upload a blob to a subdirectory.
Which would suggest to me that somewhere the value for …
Again, I'm not a Go dev by any stretch, but this looks like the potential culprit: keda/pkg/scalers/azure_blob_scaler.go, lines 82 to 84 in c2ad43e
I think a value of …
It would be good to be able to put a 'folder' (with a dynamic name) containing multiple files in blob storage and have it count as 1. Maybe this could be optional, so the scaler either counts a folder as 1 or counts all of the files individually. I know they aren't folders as such, but it's how Azure represents them. The use case behind this is that we are using Cognitive Services and putting the results into blob storage for the KEDA function to take care of, but it creates folders with random long names and puts the files inside them.
This would also be useful for us. We process batch requests and scale our functions based on the number of blobs. We separate our blobs into folders so batches don't get mixed.
Closes #1789 Signed-off-by: Ahmed ElSayed <[email protected]>
Regarding recursive listing of blobs, @jasonpaige is right. The delimiter is valid as … Regarding your scenario, @kevinmatthews-kpmg, I'm assuming that what you mean is, for example, if you have a container
You want to be able to count the "folders" under
Would be 2 (…)
type: azure-blob
metadata:
  blobContainerName: foo
  blobPrefix: "somefolder"
  blobDelimiter: "/"
  count: "blobs" # "blobs" for the current behavior, or "prefixes" to count "folders" under /{blobContainerName}/{blobPrefix}.....{blobDelimiter}
The …
type: azure-blob
metadata:
  blobContainerName: foo
  blobPrefix: ""
  blobDelimiter: "/"
  count: "blobs" # "blobs" for the current behavior, or "prefixes" to count "folders" under /{blobContainerName}/{blobPrefix}.....{blobDelimiter}
will be 1 (…)
For this issue, I opened this PR: #2036. One complication: it seems that the scaler was appending …
Sounds good to me; let's open an issue to track this.
@ahmelsayed Instead of counting folders, I think we should support scaling based on blob count and container count, both of which go recursively through all sub-containers. If I configure …
I wouldn't be afraid to introduce this breaking change in the next release (with proper documentation) if y'all think this change would be beneficial for users in v2.
For my scenario, we use Cognitive Services, and its output is stored in a storage account. However, it writes the output into the container under a random string for the "folder". So let's say the structure is as follows:
/cog1234/index.json
/cog4531/index.json
I'd want the above to count the "folders" at the top level, so 2 in this case. This would allow me to fire up to 2 pods from the scaler, read the index.json file within each, and process whatever files are in the files directory inside the top-level directory. Does that make sense? Thanks, Kevin
I certainly think it does, but it's different from the scenario of @jasonpaige & @joachimgoris, who want to scale based on the number of blobs and not the containers. Since this was originally reported for blob count, I'd recommend creating a new feature request that links to this one for context, so we track both. @ahmelsayed, are you up for implementing both?
We've agreed to make the following changes:
Thanks so much, everyone! Great to see this getting added to v2.5 as well! 👍
@jasonpaige thanks, but let's keep this open until the PR is merged :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
I can take this up, @kedacore/keda-contributors. |
Proposal
I'm not sure if this is a bug in the current implementation, but given the default values, if I upload a blob to
foo/bar/blob.txt
the scaler will not "see" the file in order to count it. I think (with next to zero Go knowledge) this is because keda/pkg/scalers/azure/azure_blob.go (line 26 in c2ad43e) calls
ListBlobsHierarchySegment
whereas
ListBlobsFlatSegment
tells the Azure API to "flatten" the list of files on the server side prior to returning the list.
If this is a bug, then it would be great to get it fixed or the docs updated to make this clear. However, if this is intended behaviour, it would be great to have this as a new feature whereby a developer can pass a switch to the trigger metadata:
Use-Case
When I upload a blob whose name includes a directory structure, it is still included in the blobCount used to trigger the Scale Target.
Given this directory tree:
The call to GetAzureBlobListLength would return 2. With the proposed feature in place, GetAzureBlobListLength would return 4.
Anything else?
No response