Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PSR-7] Scheme delimiters clarification #480

Closed

Conversation

weierophinney
Copy link
Contributor

A discussion on the mailing list raised the point that : is the actual scheme delimiter. In reviewing RFC 3986 section 3, I noted that : is the end delimiter for a scheme, and // is the opening delimiter for the authority. Since the UriInterface does not allow setting the authority in a single go, the only time the // delimiter will likely be used is as part of the scheme, making the current wording valid. However, the wording should also note that a trailing : delimiter MUST be removed as well.

This patch updates the UriInterface's withScheme() and getScheme() methods to clarify that either the : or ://, when present, MUST be removed.

@weierophinney weierophinney changed the title [PSR-7] Errata: scheme delimiters [PSR-7] Scheme delimiters clarification Apr 1, 2015
@weierophinney
Copy link
Contributor Author

Pinging @simensen for review.

@simensen
Copy link
Contributor

simensen commented Apr 2, 2015

👍

* provided includes the "://" delimiter, it MUST be removed.
* provided includes either the closing scheme delimiter (":") or the
* combination of the closing scheme delimiter and the opening authority
* delimiter ("://"), these strings should be removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is should written lowercase here but in other places in capital letters, like 2 lines below?

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

I really wonder why withScheme and getScheme enforce handling of these separators. RFC 3986 does explicitly never include these separators in the definition of the scheme, e.g. scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ). So why enforce something that is not according to the standard?

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

If this trimming is added, why not also add trimming for whitespace and so forth? This shows that this is arbitrary IMO.

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

Also an URI must not have an authority part. So scheme:/path is also valid. So why does it not also trim :/?

@weierophinney
Copy link
Contributor Author

@Tobion Thanks for the feedback; I'll wait for others before incorporating any changes. I like the idea of just not mentioning the delimiters at all; it would simplify it greatly and remove ambiguity.

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

@weierophinney do you know why this was incorparated in the first place? Never seen this handling before in any http library.

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

What I would instead add, is that getScheme MUST return the scheme in lowercase as this according to the standard

Although schemes are case-
insensitive, the canonical form is lowercase and documents that
specify schemes must do so with lowercase letters. An implementation
should accept uppercase letters as equivalent to lowercase in scheme
names (e.g., allow "HTTP" as well as "http") for the sake of
robustness but should only produce lowercase scheme names for
consistency.

This would actually be quite important for the usability. As a user you can simply do if ($request->getScheme() === 'https') without worrying about the case.

@simensen
Copy link
Contributor

simensen commented Apr 3, 2015

If this is still the direction we are heading, I'm 👍 on removing the discussion of the delimiters altogether and possibly moving that to the metadoc.

@jrnickell
Copy link

@weierophinney The verbiage "If no scheme is present, this method MUST return an empty string." seems to conflict with section 3: "The scheme and path components are required, though the path may be empty (no characters)"

Should a URI always have a scheme? I have an implementation that I was working on some time ago. I would be glad to share it if you are interested. In my case, I followed the relative resolution algorithm from the RFC to make a static constructor. Once the Uri instance is created, it had to have a scheme.

I agree that value objects like these should be immutable as well. Thanks for all of your hard work :)

Something like this:

/**
 * Creates a Uri instance from a base URI and relative reference
 *
 * @see [Relative Resolution](http://tools.ietf.org/html/rfc3986#section-5.2)
 *
 * @param UriInterface|string $base      A Uri instance or URI string
 * @param string              $reference A relative URI reference
 * @param boolean             $strict    Whether strict parsing is enabled
 *
 * @return UriInterface
 *
 * @throws Exception When the base or reference are invalid
 */
public static function resolve($base, $reference, $strict = true);

The "strict" argument in my class was a flag for this part of the algorithm in 5.2.2:

if ((not strict) and (R.scheme == Base.scheme)) then

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

An empty path value is equivalent to removing the path.

Should also be removed IMO as a path is required but can be empty. So this sentence makes no sense to me.

@nyamsprod
Copy link

@weierophinney I've review the changes and @Tobion seems to highlights the same problems that I had with the previous text. My original take was to allow scheme AND scheme: only. The reason I'm okay with the latter input is when I look at other languages. In JavaScript for instance:

alert(window.location.protocol); //will returns 'http:' or 'https:'

So allowing the latter representation may ease working with output from other languages.

On the other hand, the // should never be accepted in any situation but should be added if an authority part is present when generated the string representation of an URL, when using the __toString method.

This way we can ensure that:

  • data submitted to withScheme are correct (without using any trim function);
  • We are not promoting erroneous interpretation of the scheme component.

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

Since UriInterface is made to be failure tolerant, I agree that withScheme should only remove a trailing :, but not anything other. So we have:

  • withScheme: allows trailing :, e.g. http:
  • withPath: allows missing start slash, e.g. path/foo
  • withQuery: allows starting ?
  • withFragment allows starting #

Edit: I just realize withPath without start slash is optional (MAY) which is kinda inconsistent. Either we do enforce failure tolerance everywhere or nowhere.

@nyamsprod
Copy link

@Tobion maybe adding

  • withHost: allows trailing: . eg. example.com.

@Tobion
Copy link
Contributor

Tobion commented Apr 3, 2015

@nyamsprod yes why not. In theory example.com != example.com. since it depends on the network configuration. So example.com could map to example.com.mydomain.com because without the dot, it can be relative to a corporate intranet for example. But I guess we can just ignore that and simply make example.com == example.com. and remove the dot.
After all PSR-7 is not a generic URI implementation but opinionated for http and must not cover every single use-case. Another example is path/foo != /path/foo in URI terms, e.g. mailto:[email protected] that we also do not cover.

A [discussion on the mailing list](https://groups.google.com/d/msgid/php-fig/db3d5421-30a5-4135-9f9e-fe432f03d719%40googlegroups.com?utm_medium=email&utm_source=footer) raised the point that
`:` is the actual scheme delimiter. In reviewing [RFC 3986 section 3](https://tools.ietf.org/html/rfc3986#section-3),
I noted that `:` is the end delimiter for a scheme, and `//` is the opening
delimiter for the authority. Since the `UriInterface` does not allow setting the
authority in a single go, the only time the `//` delimiter will likely be used
is as part of the scheme, making the current wording valid. However, the wording
should also note that a trailing `:` delimiter MUST be removed as well.

Created an "Errata" section that adds this; we can vote on it post-acceptance.
Instead of errata, we'll now address this directly in the specification.
@weierophinney weierophinney force-pushed the errata/psr7-uri-scheme branch from d1fea64 to aec885a Compare April 7, 2015 18:55
- Addressed feedback from @Tobion about specifying that `getScheme()` MUST
  normalize to lowercase.
- Removed information about `//` authority delimiter altogether. Kept
  information about `:` delimiter per feedback from @nyamsprod.
- Removed details abou scheme delimiter and case from `withScheme()`, and kept
  it specifically in only `getScheme()`.
@weierophinney
Copy link
Contributor Author

@Tobion and @nyamsprod — I've incorporated your feedback at this time; please review! :)

@nyamsprod
Copy link

👍

* an instance that contains the specified scheme. If the scheme
* provided includes the "://" delimiter, it MUST be removed.
* This method MUST retain the state of the current instance, and return a
* new instance that contains the specified scheme.
*
* Implementations SHOULD restrict values to "http", "https", or an empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this restriction is pointless. If it's about validation then it's useless since one cannot rely on it as it may accept other schemes as well.
So I would reverse the logic: "Implementations MUST support schemes "http", "https" and an empty string but MAY accept other schemes if required."

- MUST support http and https and empty schemes. Credit to @Tobion
@weierophinney
Copy link
Contributor Author

@Tobion : Updated.

* an instance that contains the specified scheme. If the scheme
* provided includes the "://" delimiter, it MUST be removed.
* This method MUST retain the state of the current instance, and return a
* new instance that contains the specified scheme.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wrongly reverts stuff: new vs an

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh - bad rebase. I'll get it fixed.

- `s/a new instance/an instance/`
@weierophinney
Copy link
Contributor Author

@Tobion updated; please review.

@@ -1161,10 +1161,14 @@ interface UriInterface
* Implementations SHOULD restrict values to "http", "https", or an empty
* string but MAY accommodate other schemes if required.
*
* If no scheme is present, this method MUST return an empty string.
* The string MUST omit the closing scheme delimiter (":"), if present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be moved to withScheme to be consistent with withQuery and withFragment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, @Tobion: should I move it there, or remove that verbiage from withQuery()/withFragment() (as it's now in the getQuery()/getFragment() methods)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tobion I chose to remove the verbiage from each of withQuery() and withFragment(), as normalization is already properly detailed in the get*() methods for each component.

- Per comment from @Tobion, removing the verbiage for trimming the leading
  delimiter from each of the `withQuery()` and `withFragment()` methods, as the
  relevant location for detailing normalization is in the `get*()` methods (and
  the verbiage already exists there).
- for consistency with RFC 3986
@weierophinney
Copy link
Contributor Author

@pmjones and/or @simensen — this one is ready for merge.

@Tobion
Copy link
Contributor

Tobion commented Apr 9, 2015

The more I think about this the more I'm sure we should NOT add this auto-removal of delimiters.
It makes the behavior error prone and magic. And magic is never good.
For example what would $request->withQuery('??%3F=bar')->getQuery() result in?
It is not forbidden to have a ? (encoded) in the query param name. And browsers also do handle it when it's not encoded, e.g. http://example.org??%3F=bar will result in _GET["??"] = 'bar'.

But when you add the auto-trimming there are 4 different possible outcomes which will result in different parameters:

  • only the first ? is removed by one implementation
  • one implementation uses ltrim and removes two ?
  • one user actually wants the first ? to be there, then he would need to magically add one or use the encoded version
  • one implementation might even wrongly handle the trimming after parse_str which could even remove the encoded version %3F because the statement is now in getQuery instead of withQuery.

So IMO such magic does not belong in a standard. And if this trimming is about helping users minimize errors, then it's a different layer where such thing can be handled. For example asking if one really intends to use a question mark in the query string, e.g. in a route. Or logging a warning or whatever.

@weierophinney
Copy link
Contributor Author

@Tobion Respectfully, I disagree on this point. The point is to ensure that you can construct the full URI and know that the delimiters are not present when you fetch each component. In other words, the data present is the component data only, no delimiters. What we're specifically suggesting trimming here are the delimiters as specified for the given component, in the given position only.

This is, in fact, precisely what parse_url() already does:

$url = 'https://mwop.net:9993/path?foo=bar#fragment';

var_export(parse_url($url));

/*
Results in:

array (
  'scheme' => 'https',
  'host' => 'mwop.net',
  'port' => 9993,
  'path' => '/path',
  'query' => 'foo=bar',
  'fragment' => 'fragment',
)
*/

I'm leaving this verbiage in, with the changes as noted in this PR.

@Tobion
Copy link
Contributor

Tobion commented Apr 9, 2015

If it's just about saying that the query does not include the ? delimiters then you must rephrase the text.
This is kinda obvious since that is already defined in RFC 3986.
Omitting/removing as stated sofar in PSR-7 is something completely different.

@Tobion
Copy link
Contributor

Tobion commented Apr 9, 2015

I guess you misunderstood me. Otherwise this whole discussion so far doesn't make sense.

@weierophinney
Copy link
Contributor Author

@Tobion Then please clarify your position.

The thread started with @nyamsprod 's suggestion that we strip a trailing : from schemes as well as ://. We then decided that we should omit ONLY :, as // is a delimiter for the authority, not the scheme. Those changes were made.

Additionally, the discussion of normalization was moved only to the get*() methods, to allow for lazy-normalization, and to indicate what the consumer can expect as a return value.

I've continued to make changes as you find new issues.

However, honestly, it feels like any change I make, you're specifically trying to find a reason why it's wrong or how I misinterpreted your intent. As such: please clarify EXACTLY what you would like to see. It may be best to do this in a new pull request, so that there can be no room for me misinterpreting your intent.

I already have a 👍 from @nyamsprod , and, considering this was raised on their behalf, I consider it ready. Give me a PR with the changes you would like to see so we can compare.

@Tobion
Copy link
Contributor

Tobion commented Apr 10, 2015

I'll provide a PR.

@weierophinney
Copy link
Contributor Author

Closing in favor of #503.

@weierophinney weierophinney deleted the errata/psr7-uri-scheme branch January 5, 2018 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants