Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[meta] Adopting the Sanitizer API #596

Open
mozfreddyb opened this issue Jul 12, 2021 · 24 comments
Open

[meta] Adopting the Sanitizer API #596

mozfreddyb opened this issue Jul 12, 2021 · 24 comments
Labels

Comments

@mozfreddyb
Copy link
Contributor

As it was discussed in a previous webappsec meeting, @otherdaniel and I are hoping for the Sanitizer API to be adopted into the working group, when the time comes.

Filing this issue to make sure that this gets into an upcoming charter - eventually.

@mikewest
Copy link
Member

Hey Freddy! Are y'all ready to kick off a CfC to adopt the spec and publish an FPWD, or is this really just about making room in the charter? From my perspective, the charter's existing scope pretty clearly covers the santizer: does Mozilla think we should make changes to that text? Or simply list the Sanitizer as a potential deliverable?

/cc @samuelweiler @wseltzer @dveditz

@mozfreddyb
Copy link
Contributor Author

I think we all agree that this is in scope of the charter, but I also noted that it might be better to cover potential deliverables explicitly, rather than as an aside.
Speaking about explicit, the spec has two prototype implementations, one in Gecko and one in Blink.

Originally, I was hesitant to aim for FPWD already. This is because we have yet to resolve some important API concerns with regards to contextual HTML parsing and parser-options that are orthogonal to sanitization (due to declarative shadow dom).
But having looked at the FPWD of other specs (e.g., Subresource Integrity) it seems that significant changes are not a big deal. Specifically, the FPWD of Subresource Integrity supported all sorts of elements and had a different syntax for integrity metadata.
I'll bring the FPWD question to our Sanitizer meeting later today.

@mikewest
Copy link
Member

With my chair hat on, I see a proposed spec with shared interest, implementation, and tests that fits squarely into the scope the WG has carved out for itself. FPWDs are not done but a basis upon which further work will proceed, and changes are both expected and normal. This seems like a totally reasonable thing to adopt at this point.

That said, I'll happily leave the timing up to y'all. :)

@otherdaniel
Copy link
Member

Hi... just want to eplicitly support Frederik's request, and particularly aiming for CfC & FPWD. I'd also be happy with just adopting this as a future deliverable, if that's what the group recommends instead.

@mikewest
Copy link
Member

Ok. Thanks, Daniel! Freddy, are you happy with me sending a CfC to the group to adopt this as a deliverable?

@mozfreddyb
Copy link
Contributor Author

Yes. Please send the CfC, @mikewest!

@mikewest
Copy link
Member

WICG/sanitizer-api#114 and https://lists.w3.org/Archives/Public/public-webappsec/2021Jul/0008.html.

@samuelweiler, can we add this as a deliverable now, or do we need to wait for the CfC to run its course?

@samuelweiler
Copy link
Member

If we're adding it, I would prefer to let that CfC run its course.

I have mixed thoughts on adding to the charter at this stage, given that the AC review has happened, so TBD.

@annevk
Copy link
Member

annevk commented Jul 19, 2021

How will this work longer term with respect to HTML? I think that most of this would be better placed in the HTML specification and evolved alongside with it.

cc @domenic

@otherdaniel
Copy link
Member

How will this work longer term with respect to HTML? I think that most of this would be better placed in the HTML specification and evolved alongside with it.

Long-term, I'm happy to follow whatever advice the group has.

I think I don't even understand what difference this makes. What are the pros and cons of integrating specs vs a stand-alone doc? I suspect that short-term we're fine maintaining this as a seperate doc?

@annevk
Copy link
Member

annevk commented Jul 20, 2021

It's a lot easier to incorporate a WICG draft into the HTML Standard than a WG deliverable. We have a process of sorts for the latter (see https://github.com/w3c/whatwg-coord), but it's not straightforward and if we already know ahead of time where something should go I would prefer to avoid it.

@mikewest
Copy link
Member

With my chair hat off, and process concerns to the side:

Stand-alone documents seem like the most developer-friendly way of explaining a feature, especially when that feature is somewhat decoupled as a concept in itself. It seems reasonable to me for us to end up with some sort of document the WG produces that outlines the interface, the considerations that went into it, and the way it works. In particular, the Sanitizer interface, as well as the SanitizerConfig dictionary and the the list of built-in constants seem quite separable from HTML. The algorithms seem generally separable as well, depending on bits of HTML's public interface (though there may be private details there I'm not familiar with).

At the same time, there are maintenance benefits to putting things into HTML that belong in HTML. Skimming through the document, the Element.setHTML method jumps out as something that's monkey-patching HTML (or DOM Parsing, similar to the InnerHTML mixin, or the Element extensions?). Does anything else jump out at you as being more clearly HTMLish, @annevk?

Regarding the process concerns:

I agree that it would be best to do the work ahead of time to make whatever work we think should flow into WHATWG spec as straightforward as possible. It might be reasonable to explicitly describe those sections as monkey-patches, and to put them into a non-normative "Integration with HTML" section so that we can extract PRs easily later. @wseltzer, @samuelweiler, and/or @sideshowbarker might have helpful opinions on how to grease that particular track.

@annevk
Copy link
Member

annevk commented Jul 21, 2021

It makes more sense to me that parsing and sanitizing is co-located so they can evolve together. E.g., if we added a new element that can execute script, it would make sense to address that at the same time. I think web developers would benefit most from a clear article on MDN, not a specification.

@domenic
Copy link

domenic commented Jul 21, 2021

I've noticed this bias toward developer-friendly explanations in WebAppSec specs before, and how it causes these sort of integration and maintenance concerns. E.g. https://w3c.github.io/webappsec-secure-contexts/ (only section 3 is normative) and https://wicg.github.io/cross-origin-embedder-policy/ (the normative sections 2 and 3 there are obsolete and have been moved into HTML; see whatwg/html#5849).

These non-normative sections contain great material, and I am glad they are written by domain experts. But in my opinion this explainer-type stuff should be done in an explainer or MDN article. Whereas the normative spec text should be written in such a way as to allow best maintenance and integration.

@mikewest
Copy link
Member

I look forward to continuing to disagree with y'all. :) I'm happy for good content from specifications to make its way into documentation on MDN or Stack Overflow or web.dev or personal blogs, or etc. I don't think the existence of those sites justifies specifications being purely algorithmic, nor do I think it's reasonable to expect algorithms to be comprehensible in the absence of their underlying premises.

You provided good examples of documents that have mostly or in whole moved to HTML. Let's add one that started in HTML: https://html.spec.whatwg.org/multipage/origin.html#cross-origin-opener-policies is very well defined from a technical perspective, and (I'll assert) very hard to understand from any other. Or another from Fetch: https://fetch.spec.whatwg.org/#cross-origin-resource-policy-header is in a similar situation.

IMO, specifications are best when they answer the question of "Why?" alongside the "What?" To that end, a document that encompasses the considerations that went into the mechanism that we ended up compromising on would be a substantial boon. I don't claim to always do this well, and the counterpoints you're making about maintenance are real. But I think giving readers a full picture is critical, and I worry that's a bar we're missing.

@mikewest
Copy link
Member

(I should have mentioned: I'm pointing to COOP and CORP specifically not to blame the authors (who did excellent work!), but precisely because I feel responsible for them in one way or another, and see them as examples of an imbalance between explanation and algorithm that I helped create. Given infinite time, they're the kind of thing that I'd love to explain better in the specification.)

@domenic
Copy link

domenic commented Jul 22, 2021

Can you say more on why you think it's important for non-normative background material to live in specifications instead of explainers? I totally agree such documents should exist.

@msporny
Copy link
Member

msporny commented Jul 22, 2021

IMO, specifications are best when they answer the question of "Why?" alongside the "What?" To that end, a document that encompasses the considerations that went into the mechanism that we ended up compromising on would be a substantial boon. I don't claim to always do this well, and the counterpoints you're making about maintenance are real. But I think giving readers a full picture is critical, and I worry that's a bar we're missing.

Could. 👏 Not. 👏 Agree. 👏 More. 👏

Most specifications (W3C, WHATWG, IETF RFC) suffer from the problem that @mikewest highlights above. They don't spend enough time explaining:

"What does the ecosystem look like?"
https://w3c.github.io/vc-data-model/#ecosystem-overview

"What are the core concepts in this specification and how do they fit together?"
https://w3c.github.io/vc-data-model/#core-data-model

"What does an example use of this spec look like?"
https://w3c.github.io/vc-data-model/#concrete-lifecycle-example

So, a huge -1 to dense specifications that are purely algorithmic. There is no excuse for bad writing habits and vomiting WebIDL+algorithms into a document and calling it a day.

I know it takes a lot of work to write these things down, and sometimes your fellow WG members make it difficult to speak plainly, but it has always bothered me that folks seem to think "where you simply explain why you created the thing and how it's intended to be used" goes in another document... and just wanted to take a second to support @mikewest in his righteous quest to request that spec editors write with more empathy.

Thank you @mikewest, you have my support.

@mikewest
Copy link
Member

Can you say more on why you think it's important for non-normative background material to live in specifications instead of explainers?

Same reason I think it's important to put comments in code.

@wseltzer
Copy link
Member

On the process questions, a Working Group can discuss work that remains hosted in a CG, or can adopt it into the WG. Either can move to WHATWG if that's the appropriate place for ongoing specification. Some have expressed that the WG should publish a Candidate Recommendation Snapshot (Patent Review Draft) before migrating work, to solidify a broader set of royalty-free patent commitments.

@mikewest
Copy link
Member

Re-reading @msporny's comment above, the tone jumped out at me in a way that I should have addressed when I skimmed it this morning. Conclusions to the side, I don't think it's acceptable to characterize the examples we've discussed above as "bad writing habits" or "vomiting WebIDL+algorithms into a document and calling it a day", or to talk about good-faith disagreements about the best way to reach developers as "WG members mak[ing] it difficult to speak plainly". Please comment here with empathy.

@msporny
Copy link
Member

msporny commented Jul 23, 2021

Please comment here with empathy.

I apologize, the perceived tone was not my intent (and I was certainly not saying any of those things of the people in this thread)... I was also not talking about the examples that are being discussed here (I didn't even look at them), but rather examples elsewhere that seemed to meet @mikewest's description (without wanting to identify them directly and turn this into a finger pointing exercise).

It seemed like @mikewest's point was not being acknowledged, and I merely wanted to provide another voice in support of what he was saying.

My comments were meant to be delivered as a passionate plea to address the concerns @mikewest was expressing without making any sort of value judgements or pointing fingers at specific specs or people. I apologize for falling short of that goal and in the future will try to consider how my statements could unintendedly and negatively impact the people doing the work.

@domenic
Copy link

domenic commented Jul 23, 2021

I appreciate the apology, although I'm still not inclined to engage much in this thread given the direction it took.

All I'll say before departing is that I worry a lot about the confusion caused by these extra documents post-integration, especially if they get promoted to official working group products (e.g. via Patent Review Drafts). whatwg/html#5849 remains open, and the maintenance story around secure contexts was very frustrating and remains split across specifications to this day (with, roughly, the parts that so far haven't needed maintenance staying in WebAppSec who does not maintain them, vs. the parts that need maintenance moving to HTML). It would be much better if there were a single source of truth for these concepts in spec-land at least, so that people didn't see two specs for the same concept.

In other words, we're only just starting to get a handle on the confusion in https://wiki.whatwg.org/wiki/Fork_tracking. So creating a kind of inverse problem where things start out in w3c.github.io, wicg.github.io, or w3.org/TR, and have to move to HTML to get proper maintenance while leaving the old URLs full of out-of-sync non-normative documentation, is not a good outcome.

I'd strongly encourage the community to determine (1) in the long term, who will maintain a given work product (e.g. this has clearly not been WebAppSec for examples such as CORP, COEP, secure contexts, and referrer policy); (2) if the long-term maintenance is to be in HTML or Fetch or similar, do the work up front to prepare the way for there to be only one specification document at the end of the day. This means a commitment to upstreaming once incubation is finished, and a commitment to removing or redirecting all the original URLs.

And if you want to ensure the eventual, maintained specification has some non-normative "code comments" in it, then please be sure to include that in your upstreaming process. Don't do the current thing where a mostly-nonnormative document sits around in WebAppSec or WICG URL space for all time.

@msporny
Copy link
Member

msporny commented Jul 23, 2021

Don't do the current thing where a mostly-nonnormative document sits around in WebAppSec or WICG URL space for all time.

I do want to support @domenic's frustration here as well. Having up-to-date approachable explanations on things people are actively using is a problem w/ W3C TRs and some CG output (in general). I know many of the specs I've worked on suffer from this problem to one degree or another, largely due to WGs becoming inactive (and so @domenic's point about groups needing to do pre-planning on maintenance also resonates)y. That's partly the reason that Process 2020 allows updates to be made to specifications (even normative ones) and why Maintenance Mode WGs seem to be on the rise.

Sometimes this is a WG issue (because the decision is now up to the WGs, mostly, to continue maintenance)... and sometimes it's an availability issue (busy people are... busy, even with the best of intentions).

In an ideal world, specifications would be readable by people with a basic understanding in the space and those explanations would be kept up to date... and that's not necessarily a replacement for a clear article on MDN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants