u- parsing should always do relative URL resolution #10

Zegnat · 2017-07-29T12:51:52Z

This question is separate from but affects #9.

Currently the parsing description for u- properties is as follows:

if a.u-x[href] or area.u-x[href], then get the href attribute

else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute

else if video.u-x[poster], then get the poster attribute

else if object.u-x[data], then get the data attribute

if there is a gotten value, return the normalized absolute URL of it, following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <base> element, if any).

else parse the element for the value-class-pattern. If a value is found, return it.

else if abbr.u-x[title], then return the title attribute

else if data.u-x[value] or input.u-x[value], then return the value attribute

else return the textContent of the element after removing all leading/trailing whitespace and nested <script> & <style> elements.

Note that URL normalisation is applied on the fifth point. Values gained from VCP, abbr, data, or input are never normalised. Is this really correct?
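
To make the ordering concrete, here is a minimal Python sketch of the steps as quoted above. It is an illustration only, not any parser's actual code: the function name `parse_u_current`, the two lookup tables, and the example base URL are all hypothetical, the value-class-pattern step is omitted, and `urllib.parse.urljoin` stands in for the document language's own URL resolution rules.

```python
from urllib.parse import urljoin

# Hypothetical table: URL attributes whose values ARE resolved today.
URL_ATTRS = {
    "a": ["href"], "area": ["href"],
    "img": ["src"], "audio": ["src"], "source": ["src"],
    "video": ["src", "poster"],
    "object": ["data"],
}
# Hypothetical table: attributes whose values are returned as-is today.
PLAIN_ATTRS = {"abbr": "title", "data": "value", "input": "value"}


def parse_u_current(tag, attrs, text, base_url):
    """Sketch of the current u-* order (value-class-pattern omitted)."""
    for name in URL_ATTRS.get(tag, []):
        if name in attrs:
            # Only values from URL attributes reach the resolution step.
            return urljoin(base_url, attrs[name])
    plain = PLAIN_ATTRS.get(tag)
    if plain and plain in attrs:
        return attrs[plain]  # returned without resolution
    return text.strip()      # returned without resolution


base = "https://example.com/feed"  # assumed page URL
from_a = parse_u_current("a", {"href": "#partial-feed"}, "Partial Feed", base)
from_data = parse_u_current("data", {"value": "#partial-feed"}, "Partial Feed", base)
print(from_a)     # https://example.com/feed#partial-feed
print(from_data)  # #partial-feed
```

The same relative reference comes out resolved from `a[href]` and untouched from `data[value]`, which is the asymmetry in question.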

I ran into an issue here when implementing a partial feed. In this case I did not want the feed title to link to itself as that made no sense in relation to the surrounding HTML. Thus I opted for data instead of a:

<div class="h-feed" id="partial-feed">
  <h2 class="p-name"><data class="u-url" value="#partial-feed">Partial Feed</data></h2>
  …
</div>

However, because data[value] is never normalised, I am forced to write an absolute URL in there. That will hurt portability of the code.

I also think it is bad for input-based values. My reasoning is that a microformats editor should be able to use the same parsing algorithm on the editing interface and on the output. But if someone writes #fragment in an input-element text field, the algorithm will output #fragment, and if this is converted to an a-element on save, the same algorithm will output https://example.com/#fragment.

I propose moving the 5th point (“if there is a gotten value, return the normalized absolute URL […]”) as far down the list as possible. Is there any reason why this should not be done for specific elements? I am not sure about abbr, but I can’t come up with any abbr.u-x use-cases either.

If people can come up with good reasons why outputs for u- properties should not always be normalised on VCP and abbr I still propose to move the data/input case to be above the normalisation step.


tantek · 2017-09-22T16:36:26Z

Use-case makes sense to me. And the change is relatively simple (move the relative URL resolution step after all the sources of retrieving the value).

From a compat perspective it shouldn't break any existing working content, because such relative URLs outside of URL attributes don't work today anyway. The only "odd" side-effect that is possible is that some existing broken u-url property values may start suddenly "working".

In addition, if someone wants a non-relative-resolved "url" value from one of these elements, they can just use p-url instead, and that way still get the old behavior (no idea why you would want that, but just in case we're missing something).

aaronpk · 2017-09-22T16:38:51Z

I'm in favor of changing the u- parsing rule to always resolve URLs.

Another example of when you might want to use a <data> element instead of an <a> is to create a hidden link but not have the link be visible to screen readers or other consumers that are doing something with the HTML <a> semantic.

Supporting relative URL resolution on any element whose value came from a u- class seems consistent. It basically means the u- prefix tells the parser the value is a URL, whether that value comes from an <a href="" class="u-url"> or <data value="" class="u-url">, and should be resolved accordingly.
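
A rough Python sketch of that reading, where the value is collected from whichever source applies and resolution happens once at the end. The name `parse_u_proposed`, the `SOURCES` table, and the base URL are hypothetical; value-class-pattern handling is left out, and `urllib.parse.urljoin` is only an approximation of HTML's URL resolution.

```python
from urllib.parse import urljoin

# Hypothetical table: tag -> attributes checked in order, first match wins.
SOURCES = {
    "a": ["href"], "area": ["href"],
    "img": ["src"], "audio": ["src"], "source": ["src"],
    "video": ["src", "poster"], "object": ["data"],
    "abbr": ["title"], "data": ["value"], "input": ["value"],
}


def parse_u_proposed(tag, attrs, text, base_url):
    """Sketch of the proposed order: gather the raw value, then resolve it."""
    raw = next((attrs[n] for n in SOURCES.get(tag, []) if n in attrs), None)
    if raw is None:
        raw = text.strip()  # fall back to trimmed text content
    # Resolution is the last step, whatever the source of the value was.
    return urljoin(base_url, raw)


base = "https://example.com/feed"  # assumed page URL
via_a = parse_u_proposed("a", {"href": "#partial-feed"}, "Partial Feed", base)
via_data = parse_u_proposed("data", {"value": "#partial-feed"}, "Partial Feed", base)
print(via_a == via_data)  # True
```

Under this ordering the element type no longer changes the result for the same relative reference.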

tantek · 2017-09-22T16:43:47Z

We now have a pull request, jekyll/minima#160, that depends on this newer behavior, so let's get at least one parser implementing this (I'll add it to the spec as provisional) and either approvals or no objections from other implementers, so that we can move forward quickly (and make it official in the spec).

tantek · 2017-09-22T17:33:24Z

Since this greatly expands when relative URL resolution is done, this issue's resolution should depend on resolving #9 first.

bdesham · 2017-09-23T22:09:20Z

If I’m reading both correctly, this section on the “microformats2-parsing-faq” page on the wiki deals with this same topic.

Zegnat · 2017-09-23T22:38:03Z

@bdesham, yes, and that FAQ item will need updating if the proposed change from this issue is accepted.

The argument made there is that URLs being “displayed and used as is” by a browser should not be normalised, so microformats parsers will match browser output. This issue argues that doing that is not what is expected from microformats parsers.

tantek · 2018-04-17T17:49:47Z

Upon reconsideration, I retract my suggestion in #10 (comment) that "this issue's resolution should depend on resolving #9 first", and commented on how to orthogonally resolve issue #9 (http://tantek.com/2018/107/t1).

As promised in #10 (comment), I’ve added PROPOSED text inline in the u-* parsing section per the proposal of this issue: http://microformats.org/wiki/index.php?title=microformats2-parsing&diff=66782&oldid=66724.

I see github.com/aaronpk’s agreement with this proposal, and would like to see at least one, preferably 2-3, more parser developer(s) explicitly agreeing as well.

We also need to see this proposed change prototyped in at least one parser to make sure it is implementable (seems like it) and to see if there are any unintended consequences.

(Originally published at: http://tantek.com/2018/107/t2/)

tantek · 2018-04-18T05:02:55Z

Additionally there is a compelling use-case for this proposal:

Permalink pages which do not link to themselves or otherwise display their own URL.

This proposal would enable the relatively (so to speak) minimal markup:

<data class="u-url" value=""></data>

to provide the u-url for the h-entry of such permalink pages, instead of having to provide an absolute URL in the value attribute.

(Originally published at: http://tantek.com/2018/107/t3/)

Zegnat · 2018-04-20T10:18:27Z

I am definitely 👍 on this. Will free up some time to get a working implementation in the PHP parser.


willnorris · 2018-08-25T01:54:30Z

I'm fully supportive of this. I've made the change in the Go library (in a separate relurl branch for now) to see what tests will break, and the only one that does is microformats-v1/hcard/email. I'll prep a PR for the tests repo to fix this once this spec change goes in.

% go test .
--- FAIL: TestSuite (0.03s)
    --- FAIL: TestSuite/microformats-v1 (0.01s)
        --- FAIL: TestSuite/microformats-v1/hcard/email (0.00s)
                testsuite_test.go:130: Parse value differs:
                         {
                          items: [
                           {
                            properties: {
                             email: [
                              "mailto:[email protected]",
                        -     "[email protected]",
                        +     "http://example.com/[email protected]",
                              "mailto:[email protected]?subject=parser-test",
                        -     "[email protected]",
                        +     "http://example.com/[email protected]",
                             ],
                             name: [
                              "John Doe",
                             ],
                            },
                            type: [
                             "h-card",
                            ],
                           },
                          ],
                          rel-urls: {
                          },
                          rels: {
                          },
                         }
FAIL
FAIL    willnorris.com/go/microformats  0.036s
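
The shape of that diff can be reproduced with a small Python sketch. This is not the Go library's code: `urllib.parse.urljoin` stands in for the parser's resolver, and the address below is a made-up placeholder rather than the value used in the test suite.

```python
from urllib.parse import urljoin

base = "http://example.com/"                # base URL, as seen in the diff above
plain_text = "someone@example.org"          # hypothetical address, no scheme
with_scheme = "mailto:someone@example.org"  # same address with a mailto: scheme

# A bare address has no scheme, so it is treated as a relative path.
resolved_plain = urljoin(base, plain_text)
# A value with its own scheme is already absolute and is left alone.
resolved_mailto = urljoin(base, with_scheme)

print(resolved_plain)   # http://example.com/someone@example.org
print(resolved_mailto)  # mailto:someone@example.org
```

So only the values that were plain text change once every u- value is resolved; the mailto: values are unaffected.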

willnorris · 2018-08-25T01:58:11Z

The fact that only one test broke also suggests that we should add a few additional test cases to cover this change.

willnorris · 2018-08-25T02:41:42Z

This proposal would enable the relatively (so to speak) minimal markup:
<data class="u-url" value=""></data>

Even simpler, you could just have <data class="u-url">. Without a value attribute, it will go to text content parsing, which will still result in an empty string, which will be resolved the same.
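
As a quick check of that claim, a sketch using Python's `urllib.parse.urljoin` as a stand-in resolver (the page URL is hypothetical, and other resolvers may treat details such as fragments in the base URL differently):

```python
from urllib.parse import urljoin

page_url = "https://example.com/2018/04/permalink"  # hypothetical permalink

# <data class="u-url" value=""></data>: the value attribute is the empty string.
from_empty_value = urljoin(page_url, "")
# <data class="u-url"></data>: trimmed text content is also the empty string.
from_empty_text = urljoin(page_url, "   ".strip())

print(from_empty_value == page_url, from_empty_text == page_url)  # True True
```

Both forms resolve to the page's own URL here, which is what a permalink page needs for its u-url.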

updates tests to match microformats/microformats2-parsing#10 by fixing one broken test in v1/hcard/email, and adding a new test in v2/hcard-relativeurlsempty that will pass only with the new parsing rules implemented.

sknebel · 2018-10-04T16:17:42Z

This has two implementations now and as far as I can see no objections, and thus should be ready to be integrated into the spec.

sknebel · 2018-10-17T19:07:02Z

PR available for mf2py: microformats/mf2py#139

Zegnat · 2018-10-18T07:44:35Z

Something else that was brought up: empty <a> elements will throw errors on accessibility reporting tools. Yet several sites use them for hidden permalinks today. That is something we can get rid of once <data> can be used!

With two parsers updated and the mf2py PR pending, I feel like it should be made permanent in the spec. If there are no further objections, I'll update the wiki, at the latest during IWC this coming weekend.

tantek · 2018-12-24T21:36:36Z

Resolution: proposal accepted.

No objections in above discussion, and positive opinions (👍) from several implementors on the proposal.

Proposal implementations in the mf2py and microformats Go parsers are sufficient to demonstrate implementability and interoperability (with updated test cases), all as noted/linked in the issue thread.

Editing specification accordingly.

(Originally published at: http://tantek.com/2018/358/t4/)


tantek changed the title ~~When should u- values be normalised to absolute URLs?~~ u- parsing should always do relative URL resolution Sep 22, 2017

tantek mentioned this issue Apr 18, 2018

Bridgy Publish to GitHub should preserve markup in plaintext as such snarfed/bridgy#810

Closed

Zegnat mentioned this issue Apr 20, 2018

"return the normalized absolute URL" for invalid URLs? #9

Open

Zegnat added a commit to Zegnat/php-mf2 that referenced this issue Apr 20, 2018

Move resolve step last in u-* parsing

a60905d

This matches the proposed parsing change from microformats/microformats2-parsing#10.

Zegnat mentioned this issue Apr 20, 2018

Move resolve step last in u-* parsing microformats/php-mf2#170

Merged

willnorris added a commit to willnorris/microformats that referenced this issue Aug 25, 2018

move URL resolution to the end of u- processing

e195788

implements proposal in microformats/microformats2-parsing#10

willnorris mentioned this issue Aug 25, 2018

update tests to always resolve relative URLs microformats/tests#108

Merged

sknebel mentioned this issue Oct 16, 2018

always do relative URL resolution microformats/mf2py#138

Closed

tantek closed this as completed Dec 24, 2018

willnorris added a commit to willnorris/microformats that referenced this issue Dec 24, 2018

move URL resolution to the end of u- processing

0f19fbc

implements proposal in microformats/microformats2-parsing#10
