Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do away with using URLComponents for urlURL #199

Merged
merged 3 commits into from
Jun 5, 2019
Merged

Do away with using URLComponents for urlURL #199

merged 3 commits into from
Jun 5, 2019

Conversation

pushkarnk
Copy link
Contributor

HTTPServerRequest.urlURL is currently built using URLComponents. Though
convenient, this approach has proven to be convoluted because of percent
encoding restrictions. The alternative is to fall back to building a URL String
, something that we did before adopting URLComponents. We can chose to keep
the percent encoding in HTTPServerRequest.

An additional change is in the initializer of ClientRequest. Though we need
to add percent encoding wherever necessary, we should not percent encode
an percent-encoded path.

@pushkarnk
Copy link
Contributor Author

Tested this with: Kitura, SwiftMetrics, Kitura-Sample and Kitura-CouchDB

@pushkarnk
Copy link
Contributor Author

pushkarnk commented May 3, 2019

We're basically reverting 46fbee0 and a few other commits that fixed regressions of this. Also, the ~2% perf gain by using URLComponents is offset by multiple calls to addingPercentEncoding\removingPercentEncoding

@pushkarnk pushkarnk requested a review from djones6 May 3, 2019 04:42
@@ -329,7 +329,11 @@ public class ClientRequest {
if thePath.first != "/" {
thePath = "/" + thePath
}
path = thePath.addingPercentEncoding(withAllowedCharacters: NSCharacterSet.urlQueryAllowed) ?? thePath
if thePath.contains("%") == false {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% is not a permissible character in any of the URL components. It can be safely assumed that if a URL contains % it is already percent-encoded.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's really quite dangerous though, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi ClientRequest accepts the path as a String, which may or may not be percent encoded. If we don't make this assumption, it is difficult to handle both the cases. Do you see any other unhandled cases here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pushkarnk what if the path contains a % sign? Then that also needs to be percent encoded to %25. For example:

$ curl http://httpbin.org/anything/%%
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>

vs.

$ curl http://httpbin.org/anything/%25%25
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Host": "httpbin.org", 
    "User-Agent": "curl/7.54.0"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "86.190.4.221, 86.190.4.221", 
  "url": "https://httpbin.org/anything/%25%25"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Johannes makes a good point, it's not safe to assume the presence of a % character means the string is escaped. It seems to me that there needs to be a contract here: either the user must percent-escape parameters before passing them, or they must not do so. At the moment we don't indicate which we are expecting.

Even parsing a string to detect whether there are any invalid escape sequences isn't safe - what if my path was to a resource named %25? I'd want the result to be %2525, but there's no way for us to mechanically tell whether the resource is really % or %25 without the user indicating whether the string should be escaped or has been already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to detect whether a string is already percent encoded, wouldn't it be possible to percent decode it and compare to the original string?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not actually possible to decide if a String is already percent escaped. For example

https://httpbin.org/anything/%25%25

could be percent escaped or not. Maybe the user meant https://httpbin.org/anything/%% (in which case https://httpbin.org/anything/%25%25 was percent escaped) or maybe the user literally meant https://httpbin.org/anything/%25%25 in which case https://httpbin.org/anything/%25%25 was not percent escaped and the percent escaped form would be https://httpbin.org/anything/%2525%2525.

So in the end we can only ever do a heuristic or (much better) let the user tell us if they have or have not previously percent escaped the string, right?

Copy link
Contributor Author

@pushkarnk pushkarnk May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. But I wonder how libcurl decides this. I mentioned an example in the previous comment, libcurl seems to percent encode https://httpbin.org/anything/% but it doesn't percent-encode https://httpbin.org/anything/%%. And Kitura-net's ClientRequest delegates this decision (%-encode or not) to libcurl.

So, the ClientRequest API hasn't been designed with this percent-encoding consideration. Now, if we decide to add an extra Bool parameter to the initialisers to indicate if a passed URL string is percent-encoded, a couple of issues crop up:

  1. What's the default parameter value - true or false? Some existing users might have assumed ClientRequest does the %-encoding, others might have assumed it doesn't. Their assumptions may be based on how libcurl and hence ClientRequest behaved. Their code might break if we assume any one default parameter value.

  2. If we don't have a default value for the new parameter, we're anyway breaking existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that behaviour of libcurl surprising, and on that basis I don't think we should attempt to emulate it. We do need predictable behaviour, that is appropriate by default for the majority, with the ability to cater for minority cases.

Rather than a boolean true/false, what about an enum of always and automatic, where the default is automatic:

  • always indicates that the values need to be percent-escaped, so a literal % should translate into %25,
  • automatic means that percent-escaping should be applied to the values if they do not already contain a % character.

To my mind, leaving the default on automatic ought to cover the vast majority of uses, and was the original line of thinking for this PR. Users who know that the data they are passing in is definitely not escaped, and which may contain a literal %, can select always to handle it.

I was initially going to suggest a never case, but that's equivalent to automatic when the content has already been escaped. There might be a slight performance benefit from not having to search the value for the % symbol which might make it worth including, in cases where you know that escaping is not necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On closer inspection, I misunderstood your current solution. Checking for % and that the URL is valid is sufficient to cover almost every case: A URL that needs escaping (even if it happens to contain a %) will be escaped, and an already valid one will be left as-is.

The only case that it doesn't allow for, then, is when the URL being provided is valid, but should still be escaped (as in the case where %25 is intended literally, not as an escape sequence). This would require an additional option that the user can set.

But this is already an improvement on Kitura-net, I think the chances of encountering such a case are low in practice, and such a parameter could be added in the future if required.

djones6
djones6 previously approved these changes May 7, 2019
Copy link
Contributor

@djones6 djones6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@djones6 djones6 dismissed their stale review May 16, 2019 12:30

See thread above.

Pushkar N Kulkarni added 3 commits May 27, 2019 18:08
`HTTPServerRequest.urlURL` is currently built using `URLComponents`. Though
convenient, this approach has proven to be convoluted because of percent
encoding restrictions. The alternative is to fall back to building a URL String
, something that we did before adopting URLComponents. We can chose to keep
the percent encoding in HTTPServerRequest.

An additional change is in the initializer of ClientRequest. Though we need
to add percent encoding wherever necessary, we should not percent encode
an percent-encoded path.
Copy link
Contributor

@djones6 djones6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this fix for now, and we can easily revisit if there's a real demand for user control of percent-escaping.

@djones6 djones6 merged commit 43d9383 into master Jun 5, 2019
@djones6 djones6 deleted the issue198 branch June 7, 2019 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants