Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent top-level navigation of data URLs #5279

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

annevk
Copy link
Member

@annevk annevk commented Feb 12, 2020

It also seems that since we introduced this currentURL variable in process a navigate request we could move some of the scheme checks we do on response's location URL into the loop. That might be nicer?

One thing that's not entirely clear to me is if the user navigating the top-level to a data URL should be a thing that's supported by the standard or considered a UI feature. If it should be supported, I guess we want to check the source being null? It's not entirely clear to me.

(See WHATWG Working Mode: Changes for more details.)


💥 Error: Wattsi server error 💥

PR Preview failed to build. (Last tried on Jan 15, 2021, 7:59 AM UTC).

More

PR Preview relies on a number of web services to run. There seems to be an issue with the following one:

🚨 Wattsi Server - Wattsi Server is the web service used to build the WHATWG HTML spec.

🔗 Related URL

Parsing MDN data...
Parsing...



If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.

@annevk annevk force-pushed the annevk/top-level-data-urls branch from 220f385 to 2d9e6b0 Compare February 12, 2020 16:56
@domenic
Copy link
Member

domenic commented Feb 12, 2020

One thing that's not entirely clear to me is if the user navigating the top-level to a data URL should be a thing that's supported by the standard or considered a UI feature.

It seems to me that it should be supported by the standard. Otherwise, the moment you navigate to a data: URL using the location bar, you're in totally non-standard, anything-goes territory. And I think the standard probably has interesting things to say around what happens when you document.write() into such pages, or insert iframes, or...

Stepping back further, if we say that top-level navigation via the URL bar in general is not supported by the standard, then it'd be compliant to build a browser which totally ignores all of the HTML Standard, since it'd make up its own rules for how to process top-level navigations, and there's no clear point at which you re-enter the world of standardized behavior. E.g., clicking a link in such a scenario doesn't have to be considered navigating; the top-level content displayed could contain things that look a lot like links, but what says that they actually are? Maybe that's a bit too philosophical :).

Also, the spec says:

A user agent may provide various ways for the user to explicitly cause a browsing context to navigate, in addition to those defined in this specification.

whcih probably covers the URL bar.

So overall, I'm not super-comfortable with this PR as-is, because it seems to prohibit something that browsers do allow. We should at least include an XXX box about the situation, if we can't figure out a way to untangle it.


If it should be supported, I guess we want to check the source being null? It's not entirely clear to me.

Here you mean sourceBrowsingContext? Hmm. That seems reasonable on its face, but the spec says

Navigation always involves source browsing context , which is the browsing context which was responsible for starting the navigation.

which implies it will never be null. And I haven't checked thoroughly, but the source browsing context is passed through to a bunch of algorithms, which I doubt are all equipped to handle a null one.

I'd love to hear from @mikewest who wrote a lot of those algorithms (in CSP), as to what architecture makes sense here. Also @zetafunction and @jeremyroman who are Chromium folks who I know are both experts in navigation and experts in the navigation spec.

@jeremyroman
Copy link
Collaborator

I don't understand this well enough to comment offhand (and am going OOO shortly), but @zetafunction @csreis and @naskooskov would know about navigation.

@csreis
Copy link

csreis commented Feb 13, 2020

I'm not sure his Github username, but please talk with [email protected], who implemented https://crbug.com/594215. That was when Chrome started blocking pages from navigating to top-level data URLs (due to URL spoof concerns), though users could still navigate to them manually in the address bar, etc.

annevk added a commit to web-platform-tests/wpt that referenced this pull request Feb 18, 2020
Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.
xeonchen pushed a commit to xeonchen/gecko that referenced this pull request Feb 21, 2020
Automatic update from web-platform-tests
HTML: top-level data: URLs

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.
--

wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5
wpt-pr: 17730
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Feb 21, 2020
Automatic update from web-platform-tests
HTML: top-level data: URLs

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.
--

wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5
wpt-pr: 17730
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Feb 23, 2020
Automatic update from web-platform-tests
HTML: top-level data: URLs

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.
--

wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5
wpt-pr: 17730

UltraBlame original commit: 2c0b0e26f5c9408f8ff6895a1a8803d72571d5d2
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Feb 23, 2020
Automatic update from web-platform-tests
HTML: top-level data: URLs

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.
--

wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5
wpt-pr: 17730

UltraBlame original commit: 2c0b0e26f5c9408f8ff6895a1a8803d72571d5d2
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Feb 23, 2020
Automatic update from web-platform-tests
HTML: top-level data: URLs

Non-user-initiated-top-level navigations to a data: URL need to fail. This still needs to be specified in detail and that is tracked in whatwg/html#5279.
--

wpt-commits: 8e19fa6bfbd17074c4810d6acbe6ab04acb648e5
wpt-pr: 17730

UltraBlame original commit: 2c0b0e26f5c9408f8ff6895a1a8803d72571d5d2
@annevk
Copy link
Member Author

annevk commented Apr 2, 2020

The idea that there's always a source browsing context is also why request's client never seems to be null. But they seem clearly wrong. If the user creates a browsing context and navigates it, there's no source of authority anywhere. Unfortunately, that means this depends on fixing #1130.

@annevk
Copy link
Member Author

annevk commented Apr 7, 2020

Per https://bugzilla.mozilla.org/show_bug.cgi?id=1403814 some non-user-initiatied top-level navigations to data URLs apparently still succeed in Chrome/Firefox. Hmm.

@rniwa
Copy link

rniwa commented Aug 12, 2020

Could someone clarify what the motivation of this change is?

@rniwa
Copy link

rniwa commented Aug 12, 2020

FWIW, Safari 14 (beta) disables navigations to data URLs from web pages but still allows users to navigate to a data URL manually.

@domenic
Copy link
Member

domenic commented Aug 13, 2020

I believe the motivation here is to spec the Safari 14 beta behavior. However, @annevk's pull request goes a bit further and prevents all top-level navigation to data URLs, kinda accidentally.

We then got into a semi-philosophical discussion as to whether the spec actually specifies how URL-bar navigations work at all. Since the spec isn't super-clear on that point, it's not obvious how to change @annevk's pull request to only prohibit navigation from web pages, without prohibiting URL bar navigations. And I guess there isn't interop on the blocking anyway, despite us earlier thinking that there was. So that's why things have stalled a bit.

@rniwa
Copy link

rniwa commented Aug 13, 2020

I believe the motivation here is to spec the Safari 14 beta behavior. However, @annevk's pull request goes a bit further and prevents all top-level navigation to data URLs, kinda accidentally.

Okay, that's good to know.

Per https://bugzilla.mozilla.org/show_bug.cgi?id=1403814 some non-user-initiatied top-level navigations to data URLs apparently still succeed in Chrome/Firefox. Hmm.

What are those cases?

@annevk
Copy link
Member Author

annevk commented Aug 17, 2020

If they result in a download. In Firefox you can also navigate to text/plain documents, though I would like us to stop doing that.

@Aleksey-D-F
Copy link

Aleksey-D-F commented Nov 11, 2020

Dear @annevk,

A browser can load an XML document natively by navigating its URL and execute the associated transformations. On the client side, how to use this functionality to load natively a dynamically generated XML document from a string variable (without the use of XMLHttpRequest) so that all the transformations are executed by the browser itself rather than JS code using DOMParser, XSLTProcessor and other Web APIs?

The native XML and XSLT support by a browser without the need of heavy JS code is much cleaner and simpler. And this native functionality can be easily accessed via data: protocol, for which it was initially introduced many years ago. I know XML and XSLT, however learning JS and Web API peculiarities is out of my plans as it is very time-consuming because of a lot of browsers and version dependencies. It is better I develop yet one feature for processing LC-SRM data (chromatography-mass-spectrometry) in C++ rather than find a way of loading client-side XML document via Web API and JS. The same origin policy already broke native loading of a dynamically generated XML document on the client side.

The exact use case is that such XML documents contain parameters extracted from URL and serve as requests to an XML-XSL-SVG-based report (i.e. to its pages or other parts pointed by the URL) and associate corresponding XSL transformations depending on the kind of request.

@bzbarsky
Copy link
Contributor

@Aleksey-D-F Does navigating to a blob: url address your use case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants