Prevent top-level navigation of data URLs #5279

annevk · 2020-02-12T14:08:25Z

It also seems that since we introduced this currentURL variable in process a navigate request we could move some of the scheme checks we do on response's location URL into the loop. That might be nicer?

One thing that's not entirely clear to me is if the user navigating the top-level to a data URL should be a thing that's supported by the standard or considered a UI feature. If it should be supported, I guess we want to check the source being null? It's not entirely clear to me.

At least two implementers are interested (and none opposed):
- Firefox
- Chrome
Tests are written and can be reviewed and commented upon at:
- HTML: top-level data: URLs web-platform-tests/wpt#17730
Implementation bugs are filed:
- Chrome: …
- Firefox: …
- Safari: …

(See WHATWG Working Mode: Changes for more details.)

💥 Error: Wattsi server error 💥

PR Preview failed to build. (Last tried on Jan 15, 2021, 7:59 AM UTC).

More

PR Preview relies on a number of web services to run. There seems to be an issue with the following one:

🚨 Wattsi Server - Wattsi Server is the web service used to build the WHATWG HTML spec.

🔗 Related URL

Parsing MDN data...
Parsing...

If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.

Tests: web-platform-tests/wpt#17730.

domenic · 2020-02-12T17:12:11Z

One thing that's not entirely clear to me is if the user navigating the top-level to a data URL should be a thing that's supported by the standard or considered a UI feature.

It seems to me that it should be supported by the standard. Otherwise, the moment you navigate to a data: URL using the location bar, you're in totally non-standard, anything-goes territory. And I think the standard probably has interesting things to say around what happens when you document.write() into such pages, or insert iframes, or...

Stepping back further, if we say that top-level navigation via the URL bar in general is not supported by the standard, then it'd be compliant to build a browser which totally ignores all of the HTML Standard, since it'd make up its own rules for how to process top-level navigations, and there's no clear point at which you re-enter the world of standardized behavior. E.g., clicking a link in such a scenario doesn't have to be considered navigating; the top-level content displayed could contain things that look a lot like links, but what says that they actually are? Maybe that's a bit too philosophical :).

Also, the spec says:

A user agent may provide various ways for the user to explicitly cause a browsing context to navigate, in addition to those defined in this specification.

whcih probably covers the URL bar.

So overall, I'm not super-comfortable with this PR as-is, because it seems to prohibit something that browsers do allow. We should at least include an XXX box about the situation, if we can't figure out a way to untangle it.

If it should be supported, I guess we want to check the source being null? It's not entirely clear to me.

Here you mean sourceBrowsingContext? Hmm. That seems reasonable on its face, but the spec says

Navigation always involves source browsing context , which is the browsing context which was responsible for starting the navigation.

which implies it will never be null. And I haven't checked thoroughly, but the source browsing context is passed through to a bunch of algorithms, which I doubt are all equipped to handle a null one.

I'd love to hear from @mikewest who wrote a lot of those algorithms (in CSP), as to what architecture makes sense here. Also @zetafunction and @jeremyroman who are Chromium folks who I know are both experts in navigation and experts in the navigation spec.

jeremyroman · 2020-02-12T18:50:10Z

I don't understand this well enough to comment offhand (and am going OOO shortly), but @zetafunction @csreis and @naskooskov would know about navigation.

csreis · 2020-02-13T00:27:10Z

I'm not sure his Github username, but please talk with [email protected], who implemented https://crbug.com/594215. That was when Chrome started blocking pages from navigating to top-level data URLs (due to URL spoof concerns), though users could still navigate to them manually in the address bar, etc.

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.

Automatic update from web-platform-tests HTML: top-level data: URLs Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279. -- wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5 wpt-pr: 17730

Automatic update from web-platform-tests HTML: top-level data: URLs Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279. -- wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5 wpt-pr: 17730 UltraBlame original commit: 2c0b0e26f5c9408f8ff6895a1a8803d72571d5d2

annevk · 2020-04-02T13:14:13Z

The idea that there's always a source browsing context is also why request's client never seems to be null. But they seem clearly wrong. If the user creates a browsing context and navigates it, there's no source of authority anywhere. Unfortunately, that means this depends on fixing #1130.

annevk · 2020-04-07T10:06:35Z

Per https://bugzilla.mozilla.org/show_bug.cgi?id=1403814 some non-user-initiatied top-level navigations to data URLs apparently still succeed in Chrome/Firefox. Hmm.

rniwa · 2020-08-12T23:29:39Z

Could someone clarify what the motivation of this change is?

rniwa · 2020-08-12T23:38:43Z

FWIW, Safari 14 (beta) disables navigations to data URLs from web pages but still allows users to navigate to a data URL manually.

domenic · 2020-08-13T00:13:49Z

I believe the motivation here is to spec the Safari 14 beta behavior. However, @annevk's pull request goes a bit further and prevents all top-level navigation to data URLs, kinda accidentally.

We then got into a semi-philosophical discussion as to whether the spec actually specifies how URL-bar navigations work at all. Since the spec isn't super-clear on that point, it's not obvious how to change @annevk's pull request to only prohibit navigation from web pages, without prohibiting URL bar navigations. And I guess there isn't interop on the blocking anyway, despite us earlier thinking that there was. So that's why things have stalled a bit.

rniwa · 2020-08-13T00:39:59Z

I believe the motivation here is to spec the Safari 14 beta behavior. However, @annevk's pull request goes a bit further and prevents all top-level navigation to data URLs, kinda accidentally.

Okay, that's good to know.

Per https://bugzilla.mozilla.org/show_bug.cgi?id=1403814 some non-user-initiatied top-level navigations to data URLs apparently still succeed in Chrome/Firefox. Hmm.

What are those cases?

annevk · 2020-08-17T05:51:18Z

If they result in a download. In Firefox you can also navigate to text/plain documents, though I would like us to stop doing that.

Aleksey-D-F · 2020-11-11T14:05:16Z

Dear @annevk,

A browser can load an XML document natively by navigating its URL and execute the associated transformations. On the client side, how to use this functionality to load natively a dynamically generated XML document from a string variable (without the use of XMLHttpRequest) so that all the transformations are executed by the browser itself rather than JS code using DOMParser, XSLTProcessor and other Web APIs?

The native XML and XSLT support by a browser without the need of heavy JS code is much cleaner and simpler. And this native functionality can be easily accessed via data: protocol, for which it was initially introduced many years ago. I know XML and XSLT, however learning JS and Web API peculiarities is out of my plans as it is very time-consuming because of a lot of browsers and version dependencies. It is better I develop yet one feature for processing LC-SRM data (chromatography-mass-spectrometry) in C++ rather than find a way of loading client-side XML document via Web API and JS. The same origin policy already broke native loading of a dynamically generated XML document on the client side.

The exact use case is that such XML documents contain parameters extracted from URL and serve as requests to an XML-XSL-SVG-based report (i.e. to its pages or other parts pointed by the URL) and associate corresponding XSL transformations depending on the kind of request.

bzbarsky · 2020-11-11T22:48:23Z

@Aleksey-D-F Does navigating to a blob: url address your use case?

annevk added the topic: navigation label Feb 12, 2020

annevk mentioned this pull request Feb 12, 2020

HTML: top-level data: URLs web-platform-tests/wpt#17730

Merged

Prevent top-level navigation of data URLs

2d9e6b0

Tests: web-platform-tests/wpt#17730.

annevk force-pushed the annevk/top-level-data-urls branch from 220f385 to 2d9e6b0 Compare February 12, 2020 16:56

annevk requested review from bzbarsky, domenic and mikewest February 12, 2020 16:57

annevk mentioned this pull request Feb 13, 2020

Can data: URLs be part of a secure context? w3c/webappsec-secure-contexts#69

Closed

annevk added a commit to web-platform-tests/wpt that referenced this pull request Feb 18, 2020

HTML: top-level data: URLs

8e19fa6

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.

Base automatically changed from master to main January 15, 2021 07:57

annevk mentioned this pull request May 14, 2021

Find the best terminology to restrict the usage of data urls w3ctag/design-reviews#635

Closed

domenic force-pushed the main branch from 0947339 to 1b5099f Compare July 27, 2021 17:07

annevk mentioned this pull request Sep 9, 2021

What should happen when a page redirects to a data: URL? #7045

Closed

domenic mentioned this pull request Sep 9, 2021

Do not allow redirecting to data: or about: URLs #7042

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent top-level navigation of data URLs #5279

Prevent top-level navigation of data URLs #5279

annevk commented Feb 12, 2020 •

edited by pr-preview bot

Loading

domenic commented Feb 12, 2020 •

edited

Loading

jeremyroman commented Feb 12, 2020

csreis commented Feb 13, 2020

annevk commented Apr 2, 2020

annevk commented Apr 7, 2020

rniwa commented Aug 12, 2020

rniwa commented Aug 12, 2020

domenic commented Aug 13, 2020

rniwa commented Aug 13, 2020

annevk commented Aug 17, 2020

Aleksey-D-F commented Nov 11, 2020 •

edited

Loading

bzbarsky commented Nov 11, 2020

Prevent top-level navigation of data URLs #5279

Are you sure you want to change the base?

Prevent top-level navigation of data URLs #5279

Conversation

annevk commented Feb 12, 2020 • edited by pr-preview bot Loading

💥 Error: Wattsi server error 💥

domenic commented Feb 12, 2020 • edited Loading

jeremyroman commented Feb 12, 2020

csreis commented Feb 13, 2020

annevk commented Apr 2, 2020

annevk commented Apr 7, 2020

rniwa commented Aug 12, 2020

rniwa commented Aug 12, 2020

domenic commented Aug 13, 2020

rniwa commented Aug 13, 2020

annevk commented Aug 17, 2020

Aleksey-D-F commented Nov 11, 2020 • edited Loading

bzbarsky commented Nov 11, 2020

annevk commented Feb 12, 2020 •

edited by pr-preview bot

Loading

domenic commented Feb 12, 2020 •

edited

Loading

Aleksey-D-F commented Nov 11, 2020 •

edited

Loading