XML MIME types used for documents are not interoperable #7420

Open
annevk opened this issue Dec 14, 2021 · 9 comments
Labels: interop (Implementations are not interoperable with each other), topic: navigation

Comments

@annevk
Member

annevk commented Dec 14, 2021

As reported at https://bugzilla.mozilla.org/show_bug.cgi?id=1717560 different browsers have different fixed XML MIME type sets (rather than adhering to the +xml convention). This can sometimes be used to exploit unaware websites. I'm not inclined to treat this as a browser security problem, but the lack of interop is a browser problem we ought to address.

Examples:

  • data:application/atom+xml,<script xmlns="http://www.w3.org/1999/xhtml">alert(1)</script> executes in Chrome and Safari, downloads in Firefox.
  • data:application/mathml+xml,<script xmlns="http://www.w3.org/1999/xhtml">alert(1)</script> executes in Firefox, downloads in Chrome and Safari.

No browser appears to support +xml in general. My inclination is that we should try to fix this and align browsers with the standard. Thoughts on that?
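For reference, the MIME Sniffing Standard defines an XML MIME type as any MIME type whose subtype ends in "+xml" or whose essence is "text/xml" or "application/xml". Below is a minimal Python sketch of that check. The function names are illustrative, and the parameter stripping is a simplification of the standard's real MIME type parser.

```python
def mime_essence(mime_type: str) -> str:
    # Simplification: drop parameters and surrounding whitespace, then
    # lowercase. The MIME Sniffing Standard's real parser is stricter.
    return mime_type.split(";", 1)[0].strip().lower()


def is_xml_mime_type(mime_type: str) -> bool:
    """XML MIME type per the MIME Sniffing Standard: the subtype ends in
    "+xml", or the essence is text/xml or application/xml."""
    essence = mime_essence(mime_type)
    _type, sep, subtype = essence.partition("/")
    if not sep:
        return False  # not of the form type/subtype
    return subtype.endswith("+xml") or essence in ("text/xml", "application/xml")


if __name__ == "__main__":
    for t in ("application/atom+xml", "application/mathml+xml",
              "application/foo+xml", "text/foo+xml", "text/html"):
        print(f"{t}: {is_xml_mime_type(t)}")
```

Both application/atom+xml and application/mathml+xml from the examples pass this check, so under the +xml convention they would be treated alike.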

cc @gijsk @mfreed7 @cdumez

annevk added the topic: navigation, interop, and agenda+ labels on Dec 14, 2021
@domenic
Member

domenic commented Dec 14, 2021

So in particular this is about what path we go down in step 9 of https://html.spec.whatwg.org/#process-a-navigate-response, where we have two choices:

  1. "Not explicitly supported XML MIME type": produce an XML document, using the computed navigationParams. (Notably this includes the origin, so the resulting document could be same-origin and thus script-inspectable.)
  2. "Explicitly supported XML MIME type": proceed onward to steps 10 and 11, i.e. either display as an opaque-origin document with custom presentation, or hand off to external application/download.

Currently the spec lets engines decide what MIME types are in the "explicitly supported" set. And that seems pretty reasonable, e.g., Firefox and Safari might have MathML, whereas Chrome would not? So I'm not sure how much interop we want to demand here...
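To make the two paths concrete, here is a rough Python model of the step 9 branch described above. It assumes the response is already known to have an XML MIME type; `Path`, `step9_path`, and the two engine sets are hypothetical, chosen only to show how the same type can land on different paths depending on the engine-defined set.

```python
from enum import Enum


class Path(Enum):
    XML_DOCUMENT = "XML document using the computed navigationParams (incl. origin)"
    EXPLICITLY_SUPPORTED = "steps 10-11: opaque-origin presentation or download/external app"


def step9_path(essence: str, explicitly_supported: frozenset) -> Path:
    # Assumes `essence` is already known to be an XML MIME type.
    # Under the current spec, `explicitly_supported` is engine-defined.
    if essence in explicitly_supported:
        return Path.EXPLICITLY_SUPPORTED
    return Path.XML_DOCUMENT


# Hypothetical engine sets, for illustration only.
ENGINE_A = frozenset({"application/rss+xml"})
ENGINE_B = frozenset({"application/rss+xml", "application/mathml+xml"})

if __name__ == "__main__":
    for name, supported in (("A", ENGINE_A), ("B", ENGINE_B)):
        print(name, step9_path("application/mathml+xml", supported).name)
```

The default in this model is the XML document path; only types an engine opts into take the opaque-origin or download route.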

Do we have a complete matrix of all the types we'd want to consider? Maybe we should assemble one by looking at browser source code? Then we can run some tests.

@annevk
Member Author

annevk commented Dec 14, 2021

Note that per that definition it would mean that application/mathml+xml is an explicitly supported XML MIME type for Chrome and Safari (because they download), but not Firefox (because it does the normal document thing). And that all browsers have application/{random}+xml as explicitly supported XML MIME types... In general I would expect that browsers don't have explicitly supported XML MIME types. That was mainly for RSS from what I remember.

@domenic
Member

domenic commented Dec 14, 2021

Yeah, I think the term "explicitly supported" is bad, but I think maybe it's OK to have browser-dependent behavior as to whether things are downloaded/displayed-as-plugin vs. displayed-as-potentially-same-origin XML? But I dunno, maybe we could tighten that up a bit, or flip the default.

@annevk
Member Author

annevk commented Dec 14, 2021

Apart from RSS and perhaps XML formats that a native app consumes I have a hard time coming up with examples. I think I would prefer a world whereby we have a set of XML MIME types that always results in a document and all others result in a download/plugin/dispatch-to-native.
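One way to read this proposal, sketched in Python: a single fixed set of XML MIME types always produces a document, and every other type takes the non-document path. The contents of `DOCUMENT_XML_TYPES` are a placeholder, not a list agreed on in this thread.

```python
# Hypothetical allowlist: which XML MIME types belong here is exactly the
# open question of this issue.
DOCUMENT_XML_TYPES = frozenset({"text/xml", "application/xml",
                                "application/xhtml+xml", "image/svg+xml"})


def proposed_outcome(essence: str) -> str:
    # Flips the current default: only the fixed set yields a document;
    # everything else is downloaded, handed to a plugin, or dispatched
    # to a native application.
    if essence in DOCUMENT_XML_TYPES:
        return "document"
    return "download/plugin/native"
```

Because the set is fixed rather than engine-defined, every browser would give the same answer for a given type.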

@domenic
Member

domenic commented Mar 7, 2022

IMO best next steps are to try to determine how large the divergence is. We can do this either by code inspection or blackbox testing.

I'm trying to get help finding the Chromium code; if anyone from Mozilla or WebKit could help with those that'd also be lovely. /cc @cdumez. (Context: trying to figure out whether we can/should make browsers interoperable on which XML MIME types trigger the XML tree viewer vs. downloads vs. any other option.)

For blackbox testing, I created https://cool-massive-appendix.glitch.me/xml?contentType=application/xml , where you can change the contentType parameter. Remember to encode + as %2B. Basic results so far:

@domenic
Member

domenic commented Mar 7, 2022

@jeremyroman found some of the relevant Chromium code for me:

  • MIME sniffing
    • Any text/* except text/html, text/xml, and text/xsl will end up as a text document. So e.g. text/foo+xml is a text document.
    • Otherwise we use IsXMLMimeType which has special cases for text/xml, text/xsl, application/xml, and then uses a complicated RFC-based parser to count certain x/y+xml MIME types as XML.
  • XML tree viewer vs. not
    • If there's an XSL transform, use that
    • If there's no error (presumably an XML parsing error?) and no CSS and it's not SVG and we didn't see elements in "known namespaces", then use XML viewer mode
    • Otherwise... not sure exactly what the fallback is, but I suspect it could spit out either text, HTML, or SVG documents.
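The logic in the list above can be paraphrased in Python as follows. This is not Chromium's code: the function names are hypothetical, the "RFC-based parser" is replaced by a crude suffix check, and the fallback case is left open, as it is in the notes above.

```python
TEXT_EXCEPTIONS = frozenset({"text/html", "text/xml", "text/xsl"})
XML_SPECIAL_CASES = frozenset({"text/xml", "text/xsl", "application/xml"})


def is_xml_mime_type_approx(essence: str) -> bool:
    if essence in XML_SPECIAL_CASES:
        return True
    type_, sep, subtype = essence.partition("/")
    # Stand-in for Chromium's RFC-based parser: only checks for a non-empty
    # type and a subtype of the form "<something>+xml".
    return bool(sep and type_) and subtype.endswith("+xml") and len(subtype) > 4


def sniff_kind(essence: str) -> str:
    if essence.startswith("text/") and essence not in TEXT_EXCEPTIONS:
        return "text document"  # e.g. text/foo+xml
    if is_xml_mime_type_approx(essence):
        return "XML"
    return "not covered by this sketch"


def uses_xml_viewer(has_xsl: bool, parse_error: bool, has_css: bool,
                    is_svg: bool, saw_known_namespace: bool) -> bool:
    if has_xsl:
        return False  # the XSL transform is applied instead
    return not (parse_error or has_css or is_svg or saw_known_namespace)
```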

None of this yet explains why some cases end up downloaded. I suspect that happens in earlier code.

@smaug----

smaug---- commented Apr 7, 2022

Not including media and image types, this is what Gecko does:

(1) (X)HTML documents: text/html, application/x-view-source, application/xhtml+xml, application/vnd.wap.xhtml+xml
(application/x-view-source is converted internally to text/plain)

(2) XML document: text/xml, application/xml, application/mathml+xml, application/rdf+xml, text/rdf
(image/svg+xml is handled similarly to items in this group when loaded as a document.)

(3) Plain text: text/plain, text/css, text/cache-manifest, text/vtt, application/javascript, application/x-javascript, text/ecmascript, application/ecmascript, text/javascript, application/json (gets json viewer), text/json

(1) and (2) may execute scripts (except view-source which is converted to text/plain)

Random text/foo or text/foo+xml is downloaded.
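Summarized as a lookup: a hypothetical Python model of the grouping above, not Gecko's code. Media and image types other than SVG are out of scope, and any type not listed falls back to download.

```python
HTML_TYPES = frozenset({"text/html", "application/xhtml+xml",
                        "application/vnd.wap.xhtml+xml"})
XML_TYPES = frozenset({"text/xml", "application/xml", "application/mathml+xml",
                       "application/rdf+xml", "text/rdf",
                       "image/svg+xml"})  # SVG is handled similarly as a document
TEXT_TYPES = frozenset({"text/plain", "text/css", "text/cache-manifest", "text/vtt",
                        "application/javascript", "application/x-javascript",
                        "text/ecmascript", "application/ecmascript",
                        "text/javascript", "application/json", "text/json"})


def gecko_document_kind(essence: str) -> str:
    if essence == "application/x-view-source":
        return "text"  # converted internally to text/plain
    if essence in HTML_TYPES:
        return "html"  # may execute scripts
    if essence in XML_TYPES:
        return "xml"  # may execute scripts
    if essence == "application/json":
        return "text (JSON viewer)"
    if essence in TEXT_TYPES:
        return "text"
    return "download"  # e.g. text/foo, text/foo+xml
```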

XML viewer mode is used if the document is parsed as an XHTML/XML document, has neither XHTML nor SVG elements, and has no stylesheet (CSS or XSLT) linked from a header or from a processing instruction.
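The viewer condition is a simple conjunction. As a sketch (the class and field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ParsedDocument:
    parsed_as_xml: bool       # parsed as an XHTML/XML document
    has_xhtml_elements: bool
    has_svg_elements: bool
    has_style_link: bool      # CSS or XSLT, from a header or a processing instruction


def gecko_uses_xml_viewer(doc: ParsedDocument) -> bool:
    return (doc.parsed_as_xml
            and not doc.has_xhtml_elements
            and not doc.has_svg_elements
            and not doc.has_style_link)


if __name__ == "__main__":
    # A bare XML file with only custom elements gets the viewer...
    plain = ParsedDocument(True, False, False, False)
    # ...but the same file with an <?xml-stylesheet?> instruction does not.
    styled = ParsedDocument(True, False, False, True)
    print(gecko_uses_xml_viewer(plain), gecko_uses_xml_viewer(styled))  # True False
```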

@josepharhar
Contributor

I made a test page with all of the MIME types from the last comment loaded into iframes, and there are definitely some differences between what the browsers render: https://volcano-raspy-lead.glitch.me/
WebKit and Chromium seem to be mostly similar.
WebKit's and Chromium's XML tree viewers don't run in iframes, though :(

To add to domenic's comment, Chromium and WebKit also convert application/rss+xml and application/atom+xml to text/plain very early in the network stack, because of a security bug from 2009. There is a highly starred Chrome bug asking to remove this behavior so that these MIME types can be rendered with the XML tree viewer.

Here is a table for some MIME types ending in +xml:

| content-type | Firefox | Chromium | WebKit |
|---|---|---|---|
| application/rss+xml | download | plain text | plain text |
| application/atom+xml | download | plain text | plain text |
| application/mathml+xml | execute | download | download |
| application/foo+xml | download | download | download |

> None of this yet explains why some cases end up downloaded. I suspect that happens in earlier code.

Yeah, I'm not sure where that code is in Chromium either.

Another thing worth considering, which I found while trying to address that highly starred open Chrome bug: Chromium's XML tree viewer actually executes the XML file before rendering the tree view. So I guess that for MIME types we consider a security risk, the only options are rendering as plain text or downloading...?

@josepharhar
Contributor

Here is a more exhaustive table of test cases. In this analysis I'm not checking whether the XML tree viewer opens, only whether navigating directly to an XHTML document results in it being executed, shown as plain text, or downloaded.

Content types with non-interoperable behavior:

| content-type | Firefox | Chromium | WebKit |
|---|---|---|---|
| application/rss+xml | download | plain text | plain text |
| application/atom+xml | download | plain text | plain text |
| application/mathml+xml | execute | download | download |
| application/x-view-source | plain text? | download | download |
| application/vnd.wap.xhtml+xml | execute | download | execute |
| application/rdf+xml | execute | download | download |
| text/rdf | execute | download | download |
| text/foo | download | plain text | plain text |
| text/foo+xml | download | plain text | plain text |

Content types with interoperable behavior:

| content-type | behavior |
|---|---|
| application/foo+xml | download |
| application/xml | execute |
| text/plain | plain text |
| text/html | execute |
| application/xhtml+xml | execute |
| text/xml | execute |
| image/svg+xml | execute |
| text/css | plain text |
| text/cache-manifest | plain text |
| text/vtt | plain text |
| application/javascript | plain text |
| application/x-javascript | plain text |
| text/ecmascript | plain text |
| application/ecmascript | plain text |
| text/javascript | plain text |
| application/json | plain text |
| text/json | plain text |
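The results can also be checked mechanically against the +xml convention. This Python sketch copies the XML-related rows from the two tables above and lists every (type, browser) pair where a type that counts as an XML MIME type under the MIME Sniffing Standard was not executed. The data encoding and helper names are hypothetical.

```python
# Rows copied from the tables above; "execute" = rendered as a document.
BROWSERS = ("Firefox", "Chromium", "WebKit")
OBSERVED = {
    "application/rss+xml": ("download", "plain text", "plain text"),
    "application/atom+xml": ("download", "plain text", "plain text"),
    "application/mathml+xml": ("execute", "download", "download"),
    "application/vnd.wap.xhtml+xml": ("execute", "download", "execute"),
    "application/rdf+xml": ("execute", "download", "download"),
    "text/foo+xml": ("download", "plain text", "plain text"),
    "application/foo+xml": ("download", "download", "download"),
    "application/xml": ("execute", "execute", "execute"),
    "application/xhtml+xml": ("execute", "execute", "execute"),
    "text/xml": ("execute", "execute", "execute"),
    "image/svg+xml": ("execute", "execute", "execute"),
}


def is_xml_mime_type(essence: str) -> bool:
    _type, sep, subtype = essence.partition("/")
    return bool(sep) and (subtype.endswith("+xml")
                          or essence in ("text/xml", "application/xml"))


def not_rendered_as_xml(observed: dict) -> list:
    """(type, browser) pairs where an XML MIME type was not executed."""
    return [(t, b)
            for t, results in observed.items() if is_xml_mime_type(t)
            for b, r in zip(BROWSERS, results) if r != "execute"]


if __name__ == "__main__":
    for t, b in not_rendered_as_xml(OBSERVED):
        print(f"{b}: {t}")
```

With these rows, every browser shows gaps for the generic +xml types, consistent with the opening observation that no browser supports +xml in general.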

domenic added the agenda+ label on May 3, 2022
past removed the agenda+ label on Jun 2, 2022
Development

No branches or pull requests

5 participants