Gateway listing only 1000 first object of prefix/dir #150
Comments
I just confirmed in the AWS documentation that a single listing request returns at most 1,000 results. I imagine the only way to solve this is to add paging support to the directory listing stylesheet and the list proxy JavaScript. Perhaps the XSL stylesheet could grab the key name of the last key returned and pass it in a link as a query parameter, whereby nginx would then issue a request to S3 with that key name set as the marker. The open question is how to sanitize that input so that it does not become a vector for injection attacks. This would be a great PR.
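To make the idea concrete, here is a minimal Python sketch of the "grab the last key and link to it" step, independent of the gateway's actual XSLT and njs code. It assumes the listing uses the original ListObjects response shape, where `IsTruncated` signals more results and `NextMarker` is only present in some responses, so the last `Key` is used as a fallback. The function name `next_page_link` and the `marker` query parameter on the gateway side are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace used by S3 listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def next_page_link(listing_xml: str, base_path: str) -> Optional[str]:
    """Return a link to the next page, or None if the listing is complete.

    `base_path` is the directory URL on the gateway; the `marker` query
    parameter name is an assumption made for this sketch.
    """
    root = ET.fromstring(listing_xml)
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    if truncated.strip().lower() != "true":
        return None

    # Prefer NextMarker when S3 supplies it; otherwise fall back to the
    # last key on the current page.
    marker = root.findtext("s3:NextMarker", namespaces=S3_NS)
    if not marker:
        keys = [k.text for k in root.findall("s3:Contents/s3:Key", S3_NS) if k.text]
        if not keys:
            return None
        marker = keys[-1]

    # urlencode percent-encodes the key so it cannot add extra parameters.
    return f"{base_path}?{urlencode({'marker': marker})}"
```

The same logic could live in the stylesheet or in the proxy script; the point is only that the link is derived from the listing response and encoded before it is written into the page.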
Hi folks, I am actually interested in this feature and am happy to give its implementation a try. I was wondering what your recommendations are for sanitizing the inputs. Would storing the nextToken from the S3 response in a cache and comparing it to the token in the request, as well as to the request parameters, be sufficient? EDITED: I just realized that the keyval cache is only available in the paid version of nginx.
Hi @xquek, thank you very much for your interest in helping out! In terms of storing references to the [...] As a new(ish) maintainer I haven't dug too deeply into the XSLT side of things, but if we wanted "next" and "previous" functionality we could do something like pushing and popping the cursor into the shared dict. xslt_string_param may be helpful in getting things from JavaScript land into the XSLT template. As for sanitization, maybe @dekobon could comment more on their concerns. Populating the "next" and "previous" links in the template using values from the S3 object listing response seems safe enough to me, but it's possible I'm missing something. Happy to continue discussing as you get into the change; these are just my initial thoughts. Ref:
My thought on sanitization is that we need to be sure that only your expected inputs make it into the eventual S3 URL. Otherwise, arbitrary access to the S3 bucket may be possible. I think characters like the following may be the most problematic: As for storing the cursor for the next or previous token, it seems like you could write that to the HTML output generated by the XSLT page. I may be missing something, but I'm not sure you need to use a shared dictionary for that. I imagine something like a link that goes:
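One way to act on the "only expected inputs" concern is an allowlist check on the marker before it is placed in the upstream URL. The Python sketch below is an illustration, not the gateway's implementation: the allowed character set follows the characters AWS describes as safe for object key names, the 1024 limit mirrors the maximum key length, and the requirement that the marker stay under the requested prefix is an extra assumption. `safe_marker` is a hypothetical helper name, and a bucket with keys outside this character set would need a wider rule.

```python
import re
from typing import Optional
from urllib.parse import quote

# Assumption: only characters AWS lists as safe for key names are accepted.
_ALLOWED_MARKER = re.compile(r"[A-Za-z0-9!_.*'()/\-]{1,1024}")


def safe_marker(raw: str, prefix: str) -> Optional[str]:
    """Return a percent-encoded marker, or None if the input is rejected."""
    if not _ALLOWED_MARKER.fullmatch(raw):
        return None
    # The marker should point inside the directory being listed.
    if not raw.startswith(prefix):
        return None
    # Encode everything, including "/", so the value stays a single
    # query-string value in the S3 request.
    return quote(raw, safe="")
```

Rejecting outright rather than trying to repair the value keeps the behavior easy to reason about: a rejected marker can simply fall back to the first page.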
@dekobon thank you for your perspective. The assumption around a cache comes from the fact that S3 does not send any kind of cursor representing the previous page, only the next page. So if you wanted to enable "previous" button functionality, you'd need to keep track of at least one token for the previous page.
Couldn't the token for the previous link be passed as a URL parameter when you click the next button? This value could then potentially be passed to the XSL sheet.
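A rough Python sketch of that idea follows. It carries the markers of earlier pages in the URL itself, so no server-side cache is needed. The parameter names `marker` and `prev`, and the choice to carry the full history rather than a single previous token, are assumptions made for illustration; carrying the full history lets "previous" work more than one page back, at the cost of URLs that grow with page depth.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def _link(base_path: str, marker: str, history: List[str]) -> str:
    """Build a page URL; an empty marker means the first page."""
    params = []
    if marker:
        params.append(("marker", marker))
    params.extend(("prev", h) for h in history)
    return f"{base_path}?{urlencode(params)}" if params else base_path


def page_links(
    base_path: str,
    current: str,
    history: List[str],
    next_marker: Optional[str],
) -> Dict[str, Optional[str]]:
    """Return "next" and "prev" links for the page identified by `current`.

    `history` holds the markers of the non-first pages visited before the
    current one, oldest first, as read back from the `prev` parameters.
    """
    next_link = None
    if next_marker:
        # Push the current page onto the history carried by the next link.
        next_history = history + [current] if current else history
        next_link = _link(base_path, next_marker, next_history)

    prev_link = None
    if current:
        # Pop the most recent marker; an empty history means page one.
        prev_link = _link(base_path, history[-1] if history else "", history[:-1])

    return {"next": next_link, "prev": prev_link}
```

Every value read back from `prev` is client-controlled, so it would need the same validation as the marker itself before being used.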
Ah right, because the next link will go through the gateway as well, so we'd be able to pull it out, maybe using a
Final update: I spoke with @dekobon offline and wanted to clarify a misunderstanding I had regarding sanitization. Currently the S3 gateway strips any query parameters before passing requests on to S3 (this was the part I didn't know). So in order to pass the
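Since the gateway drops all query parameters today, paging would mean letting a known parameter through while continuing to drop everything else. The Python sketch below shows that allowlist pattern in isolation; it is not the gateway's code, and both `FORWARDED_PARAMS` and the `marker` name are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

# Assumption: only these parameter names may reach S3.
FORWARDED_PARAMS = ("marker",)


def upstream_query(client_query: str) -> str:
    """Rebuild the query string for S3 from allowlisted parameters only."""
    kept = {}
    for name, value in parse_qsl(client_query):
        # Keep the first occurrence of each allowed name, drop the rest.
        if name in FORWARDED_PARAMS and name not in kept:
            kept[name] = value
    # Re-encoding the decoded values normalizes whatever the client sent.
    return urlencode(kept)
```

Because the query string is rebuilt rather than filtered in place, anything the client sends outside the allowlist never appears in the upstream request.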
Thanks, @4141done! Just wondering if you would like me to pick this up. Thank you!
Please do! Excited to see someone tackle this one!
Describe the bug
The gateway lists only the first 1000 objects of a prefix/dir.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
List all files in the current prefix/dir. Or display a maximum number of files based on some setting, with or without pagination.
Your environment