Should the 'empty cell' definition ignore Default_ignoreable code points? #4854
Hmm. It's not clear to me what the right thing to do here is. These definitions serve rather different purposes. But, having them be aligned might be sort of nice? In particular, the CSS definition is about what is to be rendered. It's saying that certain characters should not be rendered. The HTML definition is about the data model. In particular the only usage of that definition is in the line "Remove all the empty cells from the header list." So it's trying to say that if you have an only-whitespace table header, then it's not a real table header; it's just kind of hanging out. For example the one in the corner of the following table:
The rules for what should be rendered on the screen, and the rules for interpreting the data model, are distinct. Generally we separate style and content, so there's no a priori reason why they should be the same. The more relevant question is, if someone puts a zero width space or zero width no-break space in that upper-left corner, should it still count as "not a real table header"? Probably, but it's not 100% obvious, since the author went and did something very specifically outside the beaten path of denoting not-real-headers using whitespace. Additionally, if we're going to increase the set of characters that allow you to denote something as "not a real table header", is default-ignorables really the right cutoff? I feel like there's a whole bunch of Unicode characters which, if they were the sole occupants of a So I'm not opposed to this change, but I'm also not sure it's worth tweaking this, and I find the reasoning so far a bit weaker than I'd like. /cc @tabatkins @fantasai if they have thoughts from the CSS side.
Wrt Default_Ignorable: that line isn't saying CSS shouldn't be paying attention to those characters ever, only that they are ignored when rendering the text. I'll clarify that point in css-text-3. (Also notice that this rule only applies to unsupported Default_Ignorable code points.) Wrt what HTML should say: imho the only characters that HTML should ignore as not being significant content are the document white space characters. This definition should be aligned with the Selectors definition of
…ring, not for all of CSS. whatwg/html#4854
+1 for making them consistent. My reason for raising this is that there are inconsistencies in white space definitions between various specs, and this can lead to weird situations where one spec considers something empty, but another doesn't. Accessibility APIs are among the main consumers of the headers algorithm that uses the 'empty cells' definition; they also use the https://www.w3.org/TR/accname-1.1/ algorithm to compute the value of various elements, including table headers. Problems arise if various specs disagree on what empty or whitespace means in document content (e.g.
U+FEFF Zero Width No-Break Space can appear when the author doesn't intend it, because its use as a UTF-16 BOM means it can sneak into the middle of content. For example, saving a blank Windows Notepad document produces a file containing only U+FEFF, which can appear in the middle of content by careless use of Edit: It's particularly hard to know this has happened because you usually need a hexdump to see the zero width character.
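To illustrate how a BOM can end up mid-content, here is a minimal sketch. It assumes files saved as UTF-8 with a signature (Python's `utf-8-sig` codec) rather than the UTF-16 case mentioned above, and the helper names `naive_concat_decode` and `safe_concat_decode` are hypothetical. Only a leading BOM is stripped on decode, so joining raw bytes leaves the second file's BOM in the text as U+FEFF.

```python
# Assumption: each "file" was saved as UTF-8 with a BOM (signature).
BOM = "\ufeff"

part_a = "Name".encode("utf-8-sig")  # b'\xef\xbb\xbfName'
part_b = "Age".encode("utf-8-sig")   # b'\xef\xbb\xbfAge'


def naive_concat_decode(*parts: bytes) -> str:
    """Join raw bytes first, then decode: only the leading BOM is removed."""
    return b"".join(parts).decode("utf-8-sig")


def safe_concat_decode(*parts: bytes) -> str:
    """Decode each part on its own so every leading BOM is removed."""
    return "".join(part.decode("utf-8-sig") for part in parts)


naive = naive_concat_decode(part_a, part_b)
safe = safe_concat_decode(part_a, part_b)

# The stray character is invisible when printed, so inspect code points.
print([hex(ord(ch)) for ch in naive])
print(BOM in naive, BOM in safe)
```

Printing the code points plays the role of the hexdump: the two strings look identical on screen, but only the naive one contains U+FEFF.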
I agree with @fantasai that HTML using White_Space rather than ASCII whitespace is suspect. Anything that is White_Space but not ASCII whitespace would already be outside the "beaten path". Created #4860.
For semantics of something being empty (table cells in this case) we should only consider ASCII whitespace, as we do elsewhere. Fixes #4854.
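As a rough sketch of what an ASCII-whitespace-only rule could look like when removing empty cells from a header list: the `Element` and `Cell` types and both function names below are hypothetical stand-ins for a real DOM, and the check assumes a cell is empty when it has no element children and its text consists only of ASCII whitespace (TAB, LF, FF, CR, SPACE).

```python
from dataclasses import dataclass, field
from typing import List, Union

# ASCII whitespace: TAB, LF, FF, CR, SPACE (vertical tab is not included).
ASCII_WHITESPACE = "\t\n\f\r "


@dataclass
class Element:
    """Hypothetical stand-in for a child element inside a cell."""
    tag: str


@dataclass
class Cell:
    """Hypothetical stand-in for a th/td: children are text or elements."""
    children: List[Union[str, Element]] = field(default_factory=list)


def is_empty_cell(cell: Cell) -> bool:
    """True if the cell has no elements and only ASCII whitespace text."""
    for child in cell.children:
        if isinstance(child, Element):
            return False
        if any(ch not in ASCII_WHITESPACE for ch in child):
            return False
    return True


def remove_empty_cells(header_list: List[Cell]) -> List[Cell]:
    """Drop empty cells, keeping the remaining headers in order."""
    return [cell for cell in header_list if not is_empty_cell(cell)]


corner = Cell([" \n"])            # whitespace-only corner header
zwsp = Cell(["\u200b"])           # zero width space only
named = Cell(["Name"])            # ordinary header text
icon = Cell([Element("img")])     # element child, no text

headers = remove_empty_cells([corner, zwsp, named, icon])
print(len(headers))
```

Under this rule the zero-width-space cell is kept as a header, which is the behaviour the thread is weighing against the wider Unicode-based definitions.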
There's also the a) the 'empty cells' definition in HTML and the This might be difficult to resolve. Looking at the definitions it seems that
My reading of the spec is .. but Edit: I had misread the spec, but I think there's still a difference, and edited the example to show this.
Some of these problems seem to be layering issues. The original intent of the header cells algorithm in HTML 4 was rendering by non-visual user agents (i.e. screen readers). So the question is: is this algorithm in the correct layer? Does anything other than a screen reader need to know about the association of cells and headers? If not, is the HTML spec the right place for a screen reader rendering algorithm? Screen readers do what they say on the tin and read out the screen, so they need to take screen rendering into account (e.g. CSS As currently specified the table headers algorithm discards empty cells that have been given a screen reader name by other means (e.g. HTML For example:
I think we should have some kind of algorithm that does not rely on CSS for obtaining this information, as it's a rather intrinsic part of data tables. The definition seems to match It is a little weird that
Agree with @annevk. I think the most important thing is to keep Fwiw,
Yes, anything that's trying to parse out or transform the data needs to know that.
No, but it's the right place to define the association of header and data cells.
Can you give me a couple of days on this? I'm investigating a related whitespace issue which might impact this.
Sure thing.
I'm happy with the outcome in #4860 - it makes things more consistent. I've been investigating the interop story for whitespace between different specs - it's not good: For ASCII whitespace different specs disagree on whether these are whitespace:
There are definite implementation dangers to trap the unwary: using a regex pattern like \s or a built-in isspace() function may cause unexpected results in a tokenizer.
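A small sketch of that trap, using Python as the example language (the function names are hypothetical): Python's `\s` on `str` patterns and `str.isspace()` follow Unicode-style whitespace, so they accept characters such as U+00A0 and U+000B that are not ASCII whitespace in the HTML sense (TAB, LF, FF, CR, SPACE).

```python
import re

# ASCII whitespace: TAB, LF, FF, CR, SPACE (vertical tab is not included).
ASCII_WHITESPACE = "\t\n\f\r "


def is_ascii_whitespace_only(text: str) -> bool:
    """Strict check: every character must be ASCII whitespace."""
    return all(ch in ASCII_WHITESPACE for ch in text)


def is_regex_whitespace_only(text: str) -> bool:
    """Looser check: Python's \\s also matches non-ASCII whitespace."""
    return re.fullmatch(r"\s*", text) is not None


samples = {
    "space+tab": " \t",
    "no-break space": "\u00a0",
    "vertical tab": "\x0b",
}

for label, text in samples.items():
    print(label, is_ascii_whitespace_only(text),
          is_regex_whitespace_only(text), text.isspace())
```

A tokenizer built on the looser checks would treat a cell containing only a no-break space as whitespace, while the strict check would not.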
XHTML 1.0 and HTML 4.01 are obsolete and are processed with an HTML5 processor, so that's fortunately no longer an issue. JavaScript's whitespace definition is based on that of Unicode and is therefore definitely different.
The definition used to discard empty table header cells doesn't discard cells containing only Default_ignorable characters:
https://html.spec.whatwg.org/multipage/tables.html#empty-cell
... but the CSS spec says that:
This leads to the following table having :