From b7b1a013421d9291a1ef468517b9370dfc66a18a Mon Sep 17 00:00:00 2001
From: Terence Eden
Date: Mon, 9 Jul 2018 08:55:00 +0100
Subject: [PATCH] Reinstate legacy encoding information (#1504)

Fixes #1039
---
 sections/iana.include                        | 5 ++++-
 sections/semantics-document-metadata.include | 6 +++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/sections/iana.include b/sections/iana.include
index 9b89f0ad71..5ce7cc0fc0 100644
--- a/sections/iana.include
+++ b/sections/iana.include
@@ -37,8 +37,11 @@
 :: The charset parameter may be provided to specify the document's character encoding,
    overriding any [=character encoding declarations=] in the document other than a Byte
    Order Mark (BOM).
-   The parameter's value must be an ASCII case-insensitive match for the string
+   For newly created documents, the parameter's value must be an ASCII case-insensitive match for the string
    "utf-8".
+   For legacy documents, the character encoding name given must be an
+   ASCII case-insensitive match for one of the labels
+   of the character encoding used to serialize the file. [[!ENCODING]]
 : Encoding considerations:
 :: 8bit (see the section on [=character encoding declarations=])
 : Security considerations:
diff --git a/sections/semantics-document-metadata.include b/sections/semantics-document-metadata.include
index 4d16b64b52..c0036207d4 100644
--- a/sections/semantics-document-metadata.include
+++ b/sections/semantics-document-metadata.include
@@ -1417,7 +1417,7 @@
  Regardless of whether a character encoding declaration is present or not, the actual
  character encoding used to encode the document must be UTF-8. [[!ENCODING]]

- The following restrictions apply to [=character encoding declarations=]:
+ The following restrictions apply to all [=character encoding declarations=]:

  * The character encoding declaration must be serialized without the use of character
    references or character escapes of any kind.
@@ -1426,6 +1426,10 @@
  * Due to a number of restrictions on <{meta}> elements, there can only be one
    meta-based character encoding declaration per document.

+ For legacy documents, the character encoding name given must be an ASCII case-insensitive
+ match for one of the labels of the character encoding used
+ to serialize the file. [[!ENCODING]]
+
  Authoring tools must default to using UTF-8 for newly-created documents. [[!ENCODING]]

  If an HTML document does not start with a BOM, and its encoding is not explicitly
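The two authoring rules the patch states for a declared encoding name can be sketched as a small check. This is a minimal illustration, not the spec's algorithm or any real validator: `ascii_ci_match`, `declared_charset_ok` and the `LABELS` table are hypothetical names, and `LABELS` is only a partial excerpt of the Encoding Standard's label table. "ASCII case-insensitive" is implemented by folding only A-Z, since Python's `str.lower()` is Unicode-aware.

```python
import string

# ASCII-only case folding: maps A-Z to a-z and leaves every other code
# point untouched. str.lower() is avoided because it is Unicode-aware
# (e.g. it maps U+212A KELVIN SIGN to "k").
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def ascii_ci_match(a: str, b: str) -> bool:
    """True if a and b are equal after ASCII-lowercasing both."""
    return ascii_lower(a) == ascii_lower(b)


# Hypothetical, partial excerpt of the Encoding Standard label table;
# the real table lists many more encodings and labels.
LABELS = {
    "UTF-8": ["unicode-1-1-utf-8", "unicode11utf8", "unicode20utf8",
              "utf-8", "utf8", "x-unicode20utf8"],
    "windows-1252": ["ascii", "cp1252", "iso-8859-1", "latin1",
                     "us-ascii", "windows-1252"],
}


def declared_charset_ok(declared: str, actual_encoding: str, legacy: bool) -> bool:
    """Hypothetical checker for the authoring rules in the patch.

    New documents: must actually be UTF-8, and the declared name must
    match the string "utf-8".
    Legacy documents: the declared name must match one of the labels of
    the encoding actually used to serialize the file.
    """
    if not legacy:
        return actual_encoding == "UTF-8" and ascii_ci_match(declared, "utf-8")
    return any(ascii_ci_match(declared, label)
               for label in LABELS.get(actual_encoding, []))


if __name__ == "__main__":
    print(declared_charset_ok("UTF-8", "UTF-8", legacy=False))          # True
    print(declared_charset_ok("utf8", "UTF-8", legacy=False))           # False
    print(declared_charset_ok("Latin1", "windows-1252", legacy=True))   # True
    print(declared_charset_ok("latin1", "UTF-8", legacy=True))          # False
```

Note that for a new document `"utf8"` is rejected even though it is a valid UTF-8 label: the new-document rule names the single string "utf-8", while the label set only applies to legacy documents.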