Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix remove_dot_segments #827

Merged
merged 3 commits into from
Mar 21, 2016
Merged

Fix remove_dot_segments #827

merged 3 commits into from
Mar 21, 2016

Conversation

migtor
Copy link
Contributor

@migtor migtor commented Mar 3, 2016

In the RFC 3986, section 5.2.4, 2A and 2B, it's indicated to remove /. or /.. but only if . or .. are complete path segments. The problem was that these were always removed, so an URL like /.example/ would become /example/, then the signature would be rejected by the services.

Rewrote the function in a (I think) more pythonic way. Also, added two tests to cover those cases.

@kyleknap
Copy link
Contributor

Looks good to me. @jamesls any thoughts?

@jamesls
Copy link
Member

jamesls commented Mar 10, 2016

Taking a look...

@jamesls
Copy link
Member

jamesls commented Mar 11, 2016

Is this change conceptually switching elif url.startswith('/.'): to : elif url == "/."?

It's a little hard to review because we're refactoring along with fixing the bug. I slightly prefer keeping the original format because it matches the description in the RFC. I do appreciate a more pythonic approach, but mirroring the RFC makes it easier to verify.

@migtor
Copy link
Contributor Author

migtor commented Mar 14, 2016

I think the RFC algorithm and the one proposed here are equivalent if we take into account that we also need to remove consecutive slashes. In this case it makes sense to process the URL path as a list of segments. In my understanding of the RFC, if we consider complete segments and forget about the slashes, the algorithm is just:

  • If the segment is ., just remove it.
  • If the segment is .., remove it and remove the previous segment as well (if any).

The paths can be seen as Unix directory paths, so for example /home/./user1/../user2 becomes /home/user2, that's what the algorithm is doing, and what I understand the RFC is trying to do as well. The RFC requires this because the dots are only present in relative references, the absolute URL path should never have these dots, so they are 'resolved'.

IMHO the RFC code is harder to understand, but makes sense for example for C implementations of the algorithm, which is usually the case in the examples given by RFCs.

@jamesls jamesls assigned jamesls and unassigned kyleknap Mar 21, 2016
@jamesls
Copy link
Member

jamesls commented Mar 21, 2016

Ok sounds good. I made one small adjustment for an empty string, but otherwise left everything unmodified.

Thanks for the pull request.

@jamesls jamesls merged commit 6354ccb into boto:develop Mar 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants