This repository has been archived by the owner on Jul 30, 2019. It is now read-only.

UTF-8 All The Things #1273

Merged
merged 9 commits into from
Mar 29, 2018

edent (Member) commented Mar 1, 2018

edent and others added 2 commits March 1, 2018 22:13
edent requested a review from chaals March 2, 2018 13:45
chaals mentioned this pull request Mar 2, 2018
chaals (Collaborator) left a comment

If we are going to make a change requiring UTF-8 from authors, we should ask for very wide review.

Meanwhile, I have requested a couple of small changes, at least for consideration.


In addition, due to a number of restrictions on <{meta}> elements, there can only be one
<code>meta</code>-based character encoding declaration per document.
Authoring tools should default to using <a>UTF-8</a> for newly-created documents. [[!ENCODING]]
Collaborator

If we require utf-8 then this (and similar requirements) have to be a must.

@@ -1417,24 +1414,31 @@

A <dfn>character encoding declaration</dfn> is a mechanism by which the <a>character encoding</a>
used to store or transmit a document is specified.

The only acceptable character encoding declaration for the modern web is <a>UTF-8</a>.
Collaborator

Is it possible to rephrase this less didactically?

Member Author

No :-)

That is, I'm not sure how else to phrase it in order to get the point across. Suggestions welcome.

Member Author

The problems outlined here go away when exclusively using UTF-8, which is one of the many reasons that is now the mandatory encoding for all things.
https://www.w3.org/TR/2018/CR-encoding-20180327/

Collaborator

yeah, this is hardly a high-order problem.

might end up interpreting supposedly benign plain text content as HTML tags and JavaScript.
</p>
<a state for="http-equiv" lt="content-type">encoding declaration state</a>, then the character
encoding used must be an <a>ASCII-compatible encoding</a>.
Collaborator

why doesn't this just require utf-8?

-The parameter's value must be one of the <a lt="character encoding">labels</a> of the <a>character encoding</a>
-used to serialize the file. [[!ENCODING]]
+The parameter's value must be an <a>ASCII case-insensitive</a> match for the string
+"<code>utf-8</code>". [[!ENCODING]]
Collaborator

If we just enforce a string, there is no normative dependency on the encoding spec here.
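
To illustrate the point about dependencies, here is a minimal sketch of the string check described in the proposed wording, assuming a conformance checker written in Python. The helper name `is_utf8_charset_value` is hypothetical and is not taken from the spec or from any real validator. It lowercases only the ASCII letters A to Z and compares against a fixed string, so it needs no table of encoding labels.

```python
def is_utf8_charset_value(value: str) -> bool:
    """Return True if value is an ASCII case-insensitive match for "utf-8".

    Hypothetical helper for illustration only.
    """
    # Lowercase only ASCII letters A-Z; every other character is left as-is,
    # so non-ASCII characters can never fold into a match.
    ascii_lowered = "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in value
    )
    return ascii_lowered == "utf-8"


if __name__ == "__main__":
    for candidate in ["utf-8", "UTF-8", "Utf-8", "utf8", " utf-8"]:
        print(repr(candidate), is_utf8_charset_value(candidate))
```

Under this sketch a value such as `utf8` is rejected, because the check is a plain string comparison rather than a lookup of encoding labels.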

edent (Member, Author) commented Mar 14, 2018

Regarding wide review: UTF-8 is now used by 91% of the web. https://w3techs.com/technologies/overview/character_encoding/all

Happy to hear arguments why it shouldn't be mandated.

chaals (Collaborator) left a comment

This needs to be noted in the changes section.

<dd>The charset parameter may be provided. The parameter's value must be "<code>utf-8</code>". This parameter serves no purpose; it is only allowed for compatibility with legacy servers.</dd>
<dt><code data-x="">charset</code></dt>
<dd>The charset parameter may be provided. The parameter's value must be "<code>utf-8</code>".
This parameter serves no purpose; it is only allowed for compatibility with legacy servers.
Collaborator

"This parameter is for compatibility with legacy servers", no?

edent (Member, Author) commented Mar 29, 2018

@chaals I have updated the Changes file.

chaals merged commit 8d82a21 into master Mar 29, 2018

r12a commented May 31, 2018

See comments from the i18n WG on this change at #1039

edent deleted the 1039-encoding-edent branch October 17, 2018 15:14