Add a neato informative table of various URL pieces #337
Comments
I think so, although maybe a table is better as that would be more accessible I suspect. Note also that |
Oops, an off-by-one.
That does indeed sound like a nice idea.
I did so intentionally, as it's not a concept intrinsically related to URL parsing, but rather more about Web apps/security. And because it's hard. I did a table version with a few variants. The first is a straight translation of the SVG graph. The second is closer to the version in the Node.js doc. The third is the same as the second but has |
Alright, I'm game. We should be able to get the links to work with SVG too. I'm not really sure if we can make all of it equally accessible though. |
Personally I like the first table one, possibly with additional text-align center. I also think it might be interesting to have a counterpart table that is about the URL record terms, instead of the API? (E.g. scheme instead of protocol, query instead of search, fragment instead of hash.) Maybe that wouldn't be that helpful though. |
It's probably useful as there are some interesting differences between the two. Bit unclear where the table should be located at that point, but maybe we could put it in an Appendix? |
So far the url structure was heavily inspired by whatwg/url#337. I initially only wanted to make some tweaks to it to improve querying, but I realised I never felt fully comfortable with the field names used here. So I looked at the URL parsers of different languages such as Go, Ruby, and Python; their outputs are surprisingly similar to one another but not consistent with WHATWG. The change made here brings the field names closer to what most URL parsers output.
* Change structure of URL
* fix description of query field
* switch it to integer
* Add dot to host.name to make it consistent
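To make the naming difference concrete, here is a minimal sketch using Python's standard `urllib.parse.urlsplit`, one of the parsers the commit message above refers to. The example URL is made up for illustration, and `urlsplit` follows the RFC-style model rather than the WHATWG parsing algorithm, so this only compares field names; it is not WHATWG-conformant parsing.

```python
from urllib.parse import urlsplit

# Hypothetical example URL, chosen only to exercise every component.
parts = urlsplit("https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash")

# Python's names are the keys; the closest WHATWG URL API getter is noted on the right.
fields = {
    "scheme": parts.scheme,      # API: protocol (which also includes the trailing ":")
    "username": parts.username,  # API: username
    "password": parts.password,  # API: password
    "hostname": parts.hostname,  # API: hostname
    "port": parts.port,          # API: port (a string there, an int here)
    "path": parts.path,          # API: pathname
    "query": parts.query,        # API: search (which also includes the leading "?")
    "fragment": parts.fragment,  # API: hash (which also includes the leading "#")
}

for name, value in fields.items():
    print(f"{name:9} {value!r}")
```

The Python names line up with the URL record terms (scheme, query, fragment) rather than with the API getters (protocol, search, hash), which is the inconsistency the commit message describes.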
What is the status of this issue? The issue that I submitted today, "Documentation on URL syntax", has been closed and deferred to this issue, which has been in play for over a year. In the meantime, the only documentation I've found that lays out URL syntax is the series of steps in section 4.5, which requires some figuring out to understand. So are we going to add something to fix that? |
@EnnexMB it basically needs someone to work on it and resolve the open questions above. |
Ok, can I help with this? It seems that the two open questions are:
What have I left out? If Timothy will post the code behind his graphic, I'll work on editing it to implement the suggestions above. |
Sounds good, thanks. What do you think @TimothyGu? As for placement, the top of section 4 might also work, given that it illustrates the relationship between various subsections. |
@EnnexMB Thanks for your interest in this. The WIP are unfortunately on my laptop that has seen some physical damage since the time I created them. I'll try to recover the files tonight.
For the first (https://user-images.githubusercontent.com/1538624/30042227-a0f3af64-9222-11e7-96a4-39c0cf11d279.png) it was just a manually created SVG. For the second it was a pretty standard HTML table with the spec's default styling.
NB: what's optional is quite different for different URL schemes. The URN at the end is a good indicator of that. In fact, for non-special URLs only the scheme is required and nothing else –
I'd be okay removing that. It's not really a component of the URL but rather a byproduct, so may not fit in that table. |
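On the point about optionality varying by scheme, a rough sketch with Python's `urllib.parse.urlsplit` shows how few components some URLs carry. The sample inputs and the `present_components` helper are hypothetical, and `urlsplit` is only a stand-in: it does not implement the URL Standard's parser, so treat the output as illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical inputs: a URN, a bare scheme, and a typical special (https) URL.
samples = ["urn:isbn:0451450523", "a:", "https://example.com/"]


def present_components(url: str) -> list[str]:
    """Return the names of the components that urlsplit reports as non-empty."""
    parts = urlsplit(url)
    named = {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    return [name for name, value in named.items() if value]


for url in samples:
    print(f"{url:24} -> {present_components(url)}")
```

Any compact notation for optionality in the first row of a table would have to cope with cases like these, where everything after the scheme can be absent.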
Hi @TimothyGu, were you able to recover the file? I don't think we need the first one, since it seems to be superseded by the second, the table version. Standard HTML is fine, and it could help to start from the structure you've already created, rather than starting from scratch. Of course, if you want to move forward with the changes yourself, that would be great too. But I'd be happy to do it if that would help. I understand that optionality is complex. I had in mind to devise some compact way to represent it in the first row of the table. You (and others) might want to take a look at the formula in my original post and see if you agree with the optionality as represented there by square brackets. (I just now edited with a correction.) It does have everything optional except |
@TimothyGu, any luck getting that file? I really think that one way or another we should get this done. |
@EnnexMB Sorry about the delay, but yes! Here's the diff for the table version: https://gist.github.com/5eb111b5021b338d516e97225a65bed4 Here's the SVG if you're interested. Note the https://gist.github.com/bf539f420463bab1eb7426cff267a5b4 (drawing2.svg has the fonts embedded) Please go ahead and work on it. I won't be able to do so myself and I really appreciate your stepping up. |
Thank you @TimothyGu. I need some help with the format of the file in the first link. Can someone send me a link to documentation on the diff format used there? I Googled "diff file" and don't see anything relevant. |
I found https://www.thegeekstuff.com/2014/12/patch-command-examples/. The document being patched is the source file for the URL Standard by the way, |
@EnnexMB Oops, I’m sorry to have missed your comment on the gist itself. What @annevk gave should work, though I would personally do this:
|
Okay, I'm sorry, but I still need a bit more help here. I think the problem is that this all started when I was reading the URL standard and posted an issue about it, which landed me here in GitHub, but I have no experience in GitHub. So when I'm told to use I Googled Sorry to distract from the thread topic by needing some guidance. |
It's for the command line, e.g., the Terminal application on macOS. And yeah, you'd need to have such tooling installed (for macOS you'll get prompted to install it). To help you, I applied the diff to url.bs and copied the result to https://html5.org/temp/url.bs. |
Edit, Sept. 15: Disregard this post, and see my next one below. Okay, thank you. That gave me a helpful starting point. I don't know how to include HTML in this post, so I've inserted two images of what I've done and then after those images, I provide a link to the HTML file that generated both of them. Here is @TimothyGu's third table with the changes I suggested and some additional changes: In addition, I've done some further work to present an alternative proposal, which has three parts.
In the following image, the underlined text is working links in the HTML file linked further below. The two images above were generated in an HTML file using the same CSS as the URL standard. However, that didn't handle conversion of the double-brace wrappers used in @TimothyGu's code, so I converted those to The HTML file is posted at Gist, and I don't see a way to link to it so it can be read directly by your browser. So to see it as intended, you will have to copy it into your own htm file and view it in your browser from there. If someone will tell me a better way to do this in the future, I will do that. |
Alright, hold on a second. Disregard my previous post from a few days ago. I was just reading up on CSS syntax and in sections 4.1 and 5.1 came upon railroad diagrams. It's a far better way to represent syntax than my home-spun graphical representation above. I found a website for generating them, and here is the result for URLs: Along with that graphic, there is an htm file that shows that diagram with links on the element names to the relevant sections of the URL Standard, along with another representation of the syntax in EBNF notation, which is the code used to generate the diagram. As above, the htm file is saved as a Gist, and I wish I knew a way to post it so it would load directly in your browser, but I don't. From my previous post, the table of element conditions might still be useful. I'd say disregard all the rest. |
See #24 on some previous work done on creating a formal grammar for URLs, perhaps displayed through railroad diagrams (see http://intertwingly.net/stories/2014/10/20/Url.xhtml). In my opinion, RR diagrams and formal grammar solve a different problem, and a version of what I had should be enough just for a simple overview of URLs, which is what this bug is all about. |
The RR diagrams you linked to are very complex and, as you say, solve a different problem than we are discussing here. The RR diagram I posted is very simple and contains the same information as in your table plus information on optionality of elements. Do you have an idea of how to convey that optionality information in your table? That was what I was getting at with the graphical representation, but I think the RR diagram does it much better. Whether we use the RR diagram or a version of that table or something else, I would like to suggest that this issue be brought to a conclusion by posting something in the standard to give readers an easy way to understand the syntax of URLs. |
The railroad diagrams were intended to replace section 4.3 (writing) and 4.4 (parser). Some problems were:
If you don't plan to replace those sections and only offer them as non-normative guidance, then 2 and 3 go away. |
@rubys, would you like to work with me (or without me) on this? You obviously have far more experience and expertise on it and have already done a lot of the work. I wouldn't want to reinvent your wheel. It sounds like there's interest in using something now if it's either perfect or non-normative. |
@EnnexMB, I'm willing to help, but there seems to be some confusion. For example, I don't believe that the railroad diagrams were ever meant to replace any existing sections, and if I ever gave that impression, I apologize. Nor do I believe that they were meant to be normative (my memory is fuzzy on this point, perhaps they were initially proposed as such, but if so, we quickly determined that they were best non-normative). Beyond that, there is an even bigger disconnect. To illustrate, look at the original table and note that it uses the word I guess what I am getting at is that there may be multiple issues here, and they aren't mutually exclusive. It may be worthwhile adding multiple graphics to different sections. Finally, yes, I'm willing to help. If you have something you would like to see in the document and can show it displaying in a web page, I can review it and do the command line magic to make a pull request for you. If what you produce addresses this issue, that's great. But if not, that's not a problem either. |
Hi @rubys, I'm glad you're willing to put those misunderstandings behind us and move forward with this. We do need to figure out the matter of terminology you mentioned. Let me ask this question. I see two possibilities:
If the first case is true, then perhaps each box of the diagrams could include both terms, i.e., it would be bilingual. Regarding a web page that displays a candidate of something to go in the Standard, I'd like to suggest that we're talking about something on the spectrum between my diagram and your diagrams. One problem with my diagram is that it doesn't include the case of relative URLs. But what I like about my diagram is that it summarizes the whole sequence of absolute URLs in one diagram (albeit with a line break). Your diagrams go into much more detail and therefore cover the content of my diagram in at least four different diagrams, as listed above. It seems that both approaches are worthwhile for seeing both the forest and the trees. In addition to @rubys, it would be helpful if @annevk and anyone else chimes in if you ever feel that we're going off in a direction that's not going to work. It would be unpleasant for us to develop something, only to be told later it's not suitable. |
I have added syntax diagrams to the Wikipedia pages on URNs and URIs. Those diagrams are generated directly from the syntax code posted on those pages (which was there before). The portion of the URI article that includes that drawing is transcluded (automatically copied) to the page on URLs. That means that other people at Wikipedia have decided that the syntax of URIs and URLs is the same. I don't know if that's correct or not. There is a contradiction between the URI/URL diagram and the one I originally proposed above. The diagram above shows the path as optional, but the syntax in Wikipedia (based on RFC 3986) shows it as required. So I suppose this is another error in that original diagram. If either of those diagrams posted in Wikipedia is incorrect, or if it is incorrect that URI and URL syntax are the same thing, then either feel free to edit the Wikipedia pages or let me know what the problems are and I'll get them corrected. The URI/URL syntax drawing does not have the level of detail in my original diagram or in @rubys's diagrams. I won't enhance the diagrams in Wikipedia until a new diagram or diagrams have been vetted and approved here. |
There's been no response on either of my last two posts for two weeks. I don't know if this is because my questions were deemed too dumb to comment on or too difficult to answer. I do think @rubys was right when he said that the conflict in terminology is an important place to start. But whereas he suggested choosing one form of terminology based on who the audience is, I'm suggesting sorting out and resolving the conflict so that all audiences can talk with each other and be understood. Can we do that? If we can, then we can proceed to make up a useful and correct (albeit nonnormative) illustration of the syntax. Regarding URIs and URLs, there is some disagreement in the world about whether they are synonymous or not. It would seem that the folks who set the standard for URLs would be a good authority for establishing the correct answer to that. And when we have that answer, we'll know whether the illustration of the URL syntax also applies to URIs synonymously or needs to be adjusted to apply to URIs. Are we going to move forward to get an illustration of the syntax done? |
It's not really clear to me what questions you have, I only count one question mark in the preceding two posts. Here's my view on the terminology:
|
Thanks @annevk. In your answer 1, it sounds like you're saying that the terminologies are synonymous. Therefore, the diagram I first posted above can be made bilingual by inserting the corresponding API terms in brackets below the non-API terms where they are different, as follows: @rubys, does this resolve your concern about the terminology? Can we move forward with developing the correct syntax diagrams in this way? @annevk, in your answer 2, is the view of the URL Standard authoritative, or is there some competing body that could disagree with you? If this view is authoritative, then I could propose that Wikipedia state that the term "URI" is deprecated and when we finish the syntax diagram here, that should be posted on Wikipedia in the URL article with reference to the new diagram in the URL Standard. |
The IETF would likely disagree. |
Okay, thank you. |
Sorry, I think it would be nicer to always list the second term, even if it's identical, and link it to its definition. |
Yeah, listing both names in all boxes could make it clearer, even if repetitive. |
Note that they don't really match. E.g. if the scheme is " |
Okay, thank you @domenic. This is the question I was originally asking. If they don't match exactly, then they are not synonymous and the bilingual diagram above is not appropriate. In that case, either:
I think the second one (incorporate the differences) would be best if it can be done reasonably well, as it would help people understand the relationship between the terminologies. So, is there a place that lays out the exact relationship between the terminologies, i.e., the differences that you are referring to? |
The getter algorithms in https://url.spec.whatwg.org/#dom-url-href |
Okay, could you help by providing a translation of those algorithms to a set of correspondence rules, like the one you stated above, that scheme I wonder how that rule applies. The scheme is always followed by ":", so from that rule, it looks like the definition of |
The set of rules are described by the algorithms, no? |
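As an informal reading of what such correspondence rules might look like for three of the getters, here is a small Python sketch. The `api_view` helper is hypothetical, the rules are paraphrased rather than quoted from the Standard, and `urllib.parse.urlsplit` is used only as a stand-in for the URL parser, so the getter algorithms linked above remain the authoritative source.

```python
from urllib.parse import urlsplit


def api_view(url: str) -> dict[str, str]:
    """Derive API-style values (protocol, search, hash) from record-style ones."""
    parts = urlsplit(url)  # stand-in parser; NOT the WHATWG parsing algorithm
    return {
        # protocol is the scheme followed by ":".
        "protocol": parts.scheme + ":",
        # search is "" when the query is missing or empty, else "?" + query.
        "search": "?" + parts.query if parts.query else "",
        # hash is "" when the fragment is missing or empty, else "#" + fragment.
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


print(api_view("https://example.com/a?b=c#d"))
print(api_view("https://example.com/a?#"))
```

The second call illustrates why the two vocabularies are not simply synonyms: the delimiters are part of the API values, and an empty query or fragment yields an empty string rather than a lone "?" or "#".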
It would be good to name and identify both the domain-names in the host-name (separated by dots) and the path-components (aka 'folder names') in the path (separated by slashes). |
As suggested in #337 by David Singer. This also formalizes single-dot and double-dot URL path segments as proper concepts and allows them to be part of the data structure rather than writing section, which is much more sound.
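A minimal Python sketch of the two finer-grained pieces discussed here, dot-separated host labels and slash-separated path segments, might look like the following. The `labels_and_segments` helper and the sample URL are hypothetical, and `urllib.parse.urlsplit` is a stand-in that leaves single-dot and double-dot segments in place so they can be seen.

```python
from urllib.parse import urlsplit


def labels_and_segments(url: str) -> tuple[list[str], list[str]]:
    """Split the host into dot-separated labels and the path into slash-separated segments."""
    parts = urlsplit(url)  # stand-in parser; it does not resolve dot segments
    labels = (parts.hostname or "").split(".")
    # Drop the leading "/" so that the first segment is not an empty string.
    segments = parts.path.removeprefix("/").split("/")
    return labels, segments


labels, segments = labels_and_segments("https://sub.example.com/docs/./guide/../index.html")
print(labels)
print(segments)
# Single-dot and double-dot segments, picked out by name.
print([s for s in segments if s in (".", "..")])
```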
Basically copy the bottom half of this: https://nodejs.org/api/url.html#url_url_strings_and_url_objects
(We could presumably SVG-ize it so it's a little prettier.)
Via the thread at https://twitter.com/wa7son/status/886982643463708673