Commit

Clarify instance language around decoders and encoders
And also stop defaulting error mode in "run" and "process".

Fixes #240.
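
A minimal sketch of what the commit message describes, assuming a toy model rather than the specification's own algorithms: the caller creates the decoder instance (which may hold state) and passes the error mode explicitly, since "run" and "process" no longer default it. Every name below (`ToyUtf8Decoder`, `readItem`, `processItem`, `runDecoder`) is hypothetical, `null` stands in for end-of-queue, an empty queue is treated as ended instead of waiting, and the decoder only handles one- and two-byte UTF-8 sequences.

```typescript
// Toy model of the instance language; all names here are hypothetical.
type Item = number | null; // null stands in for end-of-queue
type Queue = Item[];
type DecoderErrorMode = "replacement" | "fatal";
type DecodeError = { error: number | null };
type HandlerResult = "finished" | "continue" | { items: number[] } | DecodeError;

// Simplified UTF-8 decoder instance: 1- and 2-byte sequences only.
class ToyUtf8Decoder {
  private codePoint = 0;   // state kept between handler calls
  private bytesNeeded = 0; // state kept between handler calls

  handler(input: Queue, item: Item): HandlerResult {
    if (item === null) {
      if (this.bytesNeeded === 0) return "finished";
      this.bytesNeeded = 0;
      return { error: null }; // input ended in the middle of a sequence
    }
    if (this.bytesNeeded === 0) {
      if (item <= 0x7f) return { items: [item] };
      if (item >= 0xc2 && item <= 0xdf) {
        this.bytesNeeded = 1;
        this.codePoint = item & 0x1f;
        return "continue";
      }
      return { error: null }; // toy limitation: longer sequences are rejected
    }
    this.bytesNeeded = 0;
    if (item < 0x80 || item > 0xbf) {
      input.unshift(item); // put the byte back so it is handled on its own
      return { error: null };
    }
    return { items: [(this.codePoint << 6) | (item & 0x3f)] };
  }
}

// Reads one item; end-of-queue is returned without being removed.
function readItem(queue: Queue): Item {
  if (queue.length === 0 || queue[0] === null) return null;
  return queue.shift() as number;
}

// "process": the error mode is a required argument, not defaulted.
function processItem(
  item: Item, decoder: ToyUtf8Decoder, input: Queue, output: Queue, mode: DecoderErrorMode,
): "finished" | "continue" | DecodeError {
  const result = decoder.handler(input, item);
  if (result === "continue") return result;
  if (result === "finished") {
    output.push(null);
    return result;
  }
  if ("items" in result) {
    output.push(...result.items);
    return "continue";
  }
  if (mode === "replacement") {
    output.push(0xfffd);
    return "continue";
  }
  return result; // "fatal": hand the error back to the caller
}

// "run": takes an existing instance instead of creating one itself.
function runDecoder(
  decoder: ToyUtf8Decoder, input: Queue, output: Queue, mode: DecoderErrorMode,
): "finished" | DecodeError {
  while (true) {
    const result = processItem(readItem(input), decoder, input, output, mode);
    if (result !== "continue") return result;
  }
}

// The same instance remembers the lead byte between two "process" calls.
const sharedInstance = new ToyUtf8Decoder();
const decodedOutput: Queue = [];
const afterLeadByte = processItem(0xc3, sharedInstance, [], decodedOutput, "replacement");
const afterTrailByte = processItem(0xa9, sharedInstance, [], decodedOutput, "replacement");
```

In this sketch both calls return continue, and only the second one pushes a code point (U+00E9) to the output. That carried-over state is why the instance, rather than the decoder definition, is what gets passed around.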
annevk authored Oct 26, 2020
1 parent c55584b commit 47a3e55
Showing 1 changed file with 55 additions and 66 deletions.
121 changes: 55 additions & 66 deletions encoding.bs
@@ -238,18 +238,17 @@ This specification does not provide wrapper algorithms that would combine with <
<h3 id=encoders-and-decoders>Encoders and decoders</h3>

<p>Each <a for=/>encoding</a> has an associated <dfn>decoder</dfn> and most of them have an
associated <dfn>encoder</dfn>. Each <a for=/>decoder</a> and <a for=/>encoder</a> have a
<dfn>handler</dfn> algorithm. A <a>handler</a> algorithm takes an input
associated <dfn>encoder</dfn>. Instances of <a for=/>decoders</a> and <a for=/>encoders</a> have a
<dfn>handler</dfn> algorithm and might also have state. A <a>handler</a> algorithm takes an input
<a for=/>I/O queue</a> and an <a for=list>item</a>, and returns
<dfn>finished</dfn>, one or more <a for=list>items</a>, <dfn>error</dfn>
optionally with a <a>code point</a>, or <dfn>continue</dfn>.

<p class="note no-backref">The <a>replacement</a> and <a>UTF-16BE/LE</a> <a for=/>encodings</a> have
no <a for=/>encoder</a>.

<p>An <dfn>error mode</dfn> as used below is "<code>replacement</code>" (default) or
"<code>fatal</code>" for a <a for=/>decoder</a> and "<code>fatal</code>" (default) or
"<code>html</code>" for an <a for=/>encoder</a>.
<p>An <dfn>error mode</dfn> as used below is "<code>replacement</code>" or "<code>fatal</code>" for
a <a for=/>decoder</a> and "<code>fatal</code>" or "<code>html</code>" for an <a for=/>encoder</a>.

<p class=note>An XML processor would set <a for=/>error mode</a> to "<code>fatal</code>".
[[XML]]
@@ -264,24 +263,17 @@ happening.
[[HTML]]

<p>To <dfn id=concept-encoding-run>run</dfn> an <a for=/>encoding</a>'s <a for=/>decoder</a> or
<a for=/>encoder</a> <var>encoderDecoder</var> with input <a for=/>I/O queue</a> <var>input</var>,
output <a for=/>I/O queue</a> <var>output</var>, and optional <a for=/>error mode</a>
<a for=/>encoder</a> instance <var>encoderDecoder</var> with <a for=/>I/O queue</a>
<var>input</var>, <a for=/>I/O queue</a> <var>output</var>, and <a for=/>error mode</a>
<var>mode</var>, run these steps:

<ol>
<li><p>If <var>mode</var> is not given, then set it to "<code>replacement</code>" if
<var>encoderDecoder</var> is a <a for=/>decoder</a>, otherwise "<code>fatal</code>".

<li><p>Let <var>encoderDecoderInstance</var> be a new <var>encoderDecoder</var>.

<li>
<p>While true:

<ol>
<li><p>Let <var>result</var> be the result of
<a>processing</a> the result of
<a>reading</a> from <var>input</var> for
<var>encoderDecoderInstance</var>, <var>input</var>, <var>output</var>, and
<li><p>Let <var>result</var> be the result of <a>processing</a> the result of <a>reading</a> from
<var>input</var> for <var>encoderDecoder</var>, <var>input</var>, <var>output</var>, and
<var>mode</var>.

<li><p>If <var>result</var> is not <a>continue</a>, then return <var>result</var>.
@@ -290,28 +282,23 @@ output <a for=/>I/O queue</a> <var>output</var>, and optional <a for=/>error mod

<p>To <dfn id=concept-encoding-process>process</dfn> an <a for=list>item</a> <var>item</var> for an
<a for=/>encoding</a>'s <a for=/>encoder</a> or <a for=/>decoder</a> instance
<var>encoderDecoderInstance</var>, <a for=/>I/O queue</a> <var>input</var>, output
<a for=/>I/O queue</a> <var>output</var>, and optional <a for=/>error mode</a> <var>mode</var>, run
these steps:
<var>encoderDecoder</var>, <a for=/>I/O queue</a> <var>input</var>, <a for=/>I/O queue</a>
<var>output</var>, and <a for=/>error mode</a> <var>mode</var>, run these steps:

<ol>
<li><p>If <var>mode</var> is not given, then set it to "<code>replacement</code>" if
<var>encoderDecoderInstance</var> is a <a for=/>decoder</a> instance, otherwise
"<code>fatal</code>".

<li><p>Assert: if <var>encoderDecoderInstance</var> is an <a for=/>encoder</a> instance,
<var>mode</var> is not "<code>replacement</code>".
<li><p>Assert: if <var>encoderDecoder</var> is an <a for=/>encoder</a> instance, <var>mode</var> is
not "<code>replacement</code>".

<li><p>Assert: if <var>encoderDecoderInstance</var> is a <a for=/>decoder</a> instance,
<var>mode</var> is not "<code>html</code>".
<li><p>Assert: if <var>encoderDecoder</var> is a <a for=/>decoder</a> instance, <var>mode</var> is
not "<code>html</code>".

<li><p>Assert: if <var>encoderDecoderInstance</var> is an <a for=/>encoder</a> instance,
<var>item</var> is not a <a>surrogate</a>.
<li><p>Assert: if <var>encoderDecoder</var> is an <a for=/>encoder</a> instance, <var>item</var> is
not a <a>surrogate</a>.

<li><p>Let <var>result</var> be the result of running <var>encoderDecoderInstance</var>'s
<a>handler</a> on <var>input</var> and <var>item</var>.
<li><p>Let <var>result</var> be the result of running <var>encoderDecoder</var>'s <a>handler</a> on
<var>input</var> and <var>item</var>.

<li><p>If <var>result</var> is <a>continue</a>, return <var>result</var>.
<li><p>If <var>result</var> is <a>continue</a>, then return <var>result</var>.

<li>
<p>Otherwise, if <var>result</var> is <a>finished</a>:
@@ -327,8 +314,8 @@ these steps:
<p>Otherwise, if <var>result</var> is one or more <a for=list>items</a>:

<ol>
<li><p>Assert: if <var>encoderDecoderInstance</var> is a <a for=/>decoder</a> instance,
<var>result</var> does not contain any <a>surrogates</a>.
<li><p>Assert: if <var>encoderDecoder</var> is a <a for=/>decoder</a> instance, <var>result</var>
does not contain any <a>surrogates</a>.

<li><p><a>Push</a> <var>result</var> to <var>output</var>.
</ol>
@@ -1005,8 +992,8 @@ queue of scalar values <var>output</var> (default « »), run these steps:
<li><p>If <var>buffer</var> does not match 0xEF 0xBB 0xBF, <a>prepend</a> <var>buffer</var> to
<var>ioQueue</var>.

<li><p><a>Run</a> <a>UTF-8</a>'s <a for=/>decoder</a> with <var>ioQueue</var> and
<var>output</var>.
<li><p><a>Run</a> an instance of <a>UTF-8</a>'s <a for=/>decoder</a> with <var>ioQueue</var>,
<var>output</var>, and "<code>replacement</code>".

<li><p>Return <var>output</var>.
</ol>
@@ -1015,8 +1002,8 @@ queue of scalar values <var>output</var> (default « »), run these steps:
optional I/O queue of scalar values <var>output</var> (default « »), run these steps:

<ol>
<li><p><a>Run</a> <a>UTF-8</a>'s <a for=/>decoder</a> with <var>ioQueue</var> and
<var>output</var>.
<li><p><a>Run</a> an instance of <a>UTF-8</a>'s <a for=/>decoder</a> with <var>ioQueue</var>,
<var>output</var>, and "<code>replacement</code>".

<li><p>Return <var>output</var>.
</ol>
@@ -1028,7 +1015,7 @@ given an optional I/O queue of scalar values <var>output</var> (default « »),
-->

<ol>
<li><p>Let <var>potentialError</var> be the result of <a>running</a> <a>UTF-8</a>'s
<li><p>Let <var>potentialError</var> be the result of <a>running</a> an instance of <a>UTF-8</a>'s
<a for=/>decoder</a> with <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".

<li><p>If <var>potentialError</var> is an <a>error</a>, then return failure.
@@ -1078,8 +1065,8 @@ these steps:
than anything else. In a context where HTTP is used this is in violation of the semantics of the
`<code>Content-Type</code>` header.

<li><p><a>Run</a> <var>encoding</var>'s <a for=/>decoder</a> with <var>ioQueue</var> and
<var>output</var>.
<li><p><a>Run</a> an instance of <var>encoding</var>'s <a for=/>decoder</a> with
<var>ioQueue</var>, <var>output</var>, and "<code>replacement</code>".

<li><p>Return <var>output</var>.
</ol>
@@ -1135,12 +1122,12 @@ is safe as it never triggers <a>errors</a>. [[HTML]]
<ol>
<li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>.

<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.
<li><p>Return an instance of <var>encoding</var>'s <a for=/>encoder</a>.
</ol>

<p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these
steps:
<a for=/>encoder</a> instance <var>encoder</var> and an I/O queue of bytes <var>output</var>, run
these steps:

<ol>
<li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with
@@ -1156,10 +1143,10 @@ steps:

<div class=note id=pit-of-iso-2022-jp>
<p>This is a legacy hook for URL percent-encoding. The caller will have to keep an
<a for=/>encoder</a> alive as the <a>ISO-2022-JP encoder</a> can be in two different states when
returning an <a>error</a>. That also means that if the caller emits bytes to encode the error in
some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F, 0x1B, 0x5C,
and 0x7E. [[URL]]
<a for=/>encoder</a> instance alive as the <a>ISO-2022-JP encoder</a> can be in two different
states when returning an <a>error</a>. That also means that if the caller emits bytes to encode the
error in some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F,
0x1B, 0x5C, and 0x7E. [[URL]]

<p>In particular, if upon returning an <a>error</a> the <a>ISO-2022-JP encoder</a> is in the
<a lt="ISO-2022-JP decoder Roman">Roman</a> state, the caller cannot output 0x5C (\) as it will not
@@ -1171,7 +1158,7 @@ steps:

<p>The return value is either the number representing the <a>code point</a> that could not be
encoded or null, if there was no <a>error</a>. When it returns non-null the caller will have to
invoke it again, supplying the same <a for=/>encoder</a> and a new output I/O queue.
invoke it again, supplying the same <a for=/>encoder</a> instance and a new output I/O queue.
</div>


@@ -1268,7 +1255,7 @@ interface mixin TextDecoderCommon {
<dd>An <a for=/>encoding</a>.

<dt><dfn for=TextDecoderCommon oldids=textdecoder-decoder,textdecoderstream-decoder>decoder</dfn>
<dd>A <a for=/>decoder</a>.
<dd>A <a for=/>decoder</a> instance.

<dt><dfn for=TextDecoderCommon oldids=textdecoder-stream,textdecoderstream-stream,textdecodercommon-stream>I/O queue</dfn>
<dd>An <a for=/>I/O queue</a> of bytes.
@@ -1419,10 +1406,10 @@ method steps are:

<ol>
<li><p>If <a>this</a>'s <a for=TextDecoder>do not flush</a> is false, then set <a>this</a>'s
<a for=TextDecoderCommon>decoder</a> to a new <a for=/>decoder</a> for <a>this</a>'s
<a for=TextDecoderCommon>encoding</a>, <a>this</a>'s <a for=TextDecoderCommon>I/O queue</a> to the
<a for=/>I/O queue</a> of bytes « <a>end-of-queue</a> », and <a>this</a>'s
<a for=TextDecoderCommon>BOM seen</a> to false.
<a for=TextDecoderCommon>decoder</a> to a new instance of <a>this</a>'s
<a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>, <a>this</a>'s
<a for=TextDecoderCommon>I/O queue</a> to the <a for=/>I/O queue</a> of bytes
« <a>end-of-queue</a> », and <a>this</a>'s <a for=TextDecoderCommon>BOM seen</a> to false.

<li><p>Set <a>this</a>'s <a for=TextDecoder>do not flush</a> to
<var>options</var>["{{TextDecodeOptions/stream}}"].
@@ -1554,8 +1541,8 @@ constructor steps are to do nothing.
<li><p>Let <var>item</var> be the result of
<a>reading</a> from <var>input</var>.

<li><p>Let <var>result</var> be the result of <a>processing</a> <var>item</var> for the
<a>UTF-8 encoder</a>, <var>input</var>, <var>output</var>.
<li><p>Let <var>result</var> be the result of <a>processing</a> <var>item</var> for an instance
of the <a>UTF-8 encoder</a>, <var>input</var>, <var>output</var>, and "<code>fatal</code>".

<li>
<p>Assert: <var>result</var> is not <a>error</a>.
@@ -1582,6 +1569,8 @@ method steps are:
<a lt="get a reference to the buffer source">getting a reference to the bytes held by</a>
<var>destination</var>.

<li><p>Let <var>encoder</var> be an instance of the <a>UTF-8 encoder</a>.

<li>
<p>Let <var>unused</var> be the <a for=/>I/O queue</a> of scalar values « <a>end-of-queue</a> ».

@@ -1597,8 +1586,8 @@ method steps are:
<ol>
<li><p>Let <var>item</var> be the result of <a>reading</a> from <var>source</var>.

<li><p>Let <var>result</var> be the result of running the <a>UTF-8 encoder</a>'s <a>handler</a>
on <var>unused</var> and <var>item</var>.
<li><p>Let <var>result</var> be the result of running <var>encoder</var>'s <a>handler</a> on
<var>unused</var> and <var>item</var>.

<li><p>If <var>result</var> is <a>finished</a>, then <a for=iteration>break</a>.

@@ -1738,8 +1727,8 @@ constructor steps are:
<li><p>set <a>this</a>'s <a for=TextDecoderCommon>ignore BOM</a> to
<var>options</var>["{{TextDecoderOptions/ignoreBOM}}"].

<li><p>Set <a>this</a>'s <a for=TextDecoderCommon>decoder</a> to a new <a for=/>decoder</a> for
<a>this</a>'s <a for=TextDecoderCommon>encoding</a>, and set <a>this</a>'s
<li><p>Set <a>this</a>'s <a for=TextDecoderCommon>decoder</a> to a new instance of <a>this</a>'s
<a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>, and set <a>this</a>'s
<a for=TextDecoderCommon>I/O queue</a> to a new <a for=/>I/O queue</a>.

<li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
@@ -1846,7 +1835,7 @@ TextEncoderStream includes GenericTransformStream;

<dl>
<dt><dfn for=TextEncoderStream>encoder</dfn>
<dd>An <a for=/>encoder</a>.
<dd>An <a for=/>encoder</a> instance.

<dt><dfn for=TextEncoderStream>pending high surrogate</dfn>
<dd>Null or a <a for=/>surrogate</a>, initially null.
@@ -1887,8 +1876,8 @@ textReadable
constructor steps are:

<ol>
<li><p>Set <a>this</a>'s <a for=TextEncoderStream>encoder</a> to <a>UTF-8</a>'s
<a for=/>encoder</a>.
<li><p>Set <a>this</a>'s <a for=TextEncoderStream>encoder</a> to an instance of the
<a>UTF-8 encoder</a>.

<li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
and runs the <a>encode and enqueue a chunk</a> algorithm with <a>this</a> and <var>chunk</var>.
@@ -1953,8 +1942,8 @@ constructor steps are:
value</a> algorithm with <var>encoder</var>, <var>item</var> and <var>input</var>.

<li><p>If <var>result</var> is not <a>continue</a>, then <a>process</a> <var>result</var> for
<a for=TextEncoderStream>encoder</a>, <var>input</var>, <var>output</var>.

<var>encoder</var>'s <a for=TextEncoderStream>encoder</a>, <var>input</var>, <var>output</var>,
and "<code>fatal</code>".
</ol>
</ol>

@@ -2023,7 +2012,7 @@ that are split between strings. [[!INFRA]]
to be more accurate in deployed content. Therefore it is not part of the <a>UTF-8 decoder</a>
algorithm but rather the <a>decode</a> and <a>UTF-8 decode</a> algorithms.

<p><a>UTF-8</a>'s <a for=/>decoder</a>'s has an associated
<p><a>UTF-8</a>'s <a for=/>decoder</a> has an associated
<dfn>UTF-8 code point</dfn>, <dfn>UTF-8 bytes seen</dfn>, and
<dfn>UTF-8 bytes needed</dfn> (all initially 0), a <dfn>UTF-8 lower boundary</dfn>
(initially 0x80), and a <dfn>UTF-8 upper boundary</dfn> (initially 0xBF).
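
The per-instance state touched by this diff (for UTF-8, the code point, bytes seen, and bytes needed) is observable through the `TextDecoder` API. A short sketch, assuming a runtime that exposes `TextDecoder` as a global (browsers and current Node.js do); the variable names are illustrative only:

```typescript
// U+20AC (euro sign) is 0xE2 0x82 0xAC in UTF-8; split it across two chunks.
const streamingDecoder = new TextDecoder("utf-8");

// With { stream: true } the decoder instance is kept, so the two pending
// bytes stay in its state and nothing is emitted yet.
const firstChunkText = streamingDecoder.decode(new Uint8Array([0xe2, 0x82]), { stream: true });

// The final byte completes the sequence held by the same instance.
const secondChunkText = streamingDecoder.decode(new Uint8Array([0xac]));

// A fresh instance has no pending state, so a lone continuation byte is an
// error, which the default "replacement" error mode turns into U+FFFD.
const freshInstanceText = new TextDecoder("utf-8").decode(new Uint8Array([0xac]));
```

Here `firstChunkText` is the empty string, `secondChunkText` is "€", and `freshInstanceText` is U+FFFD. This matches the `decode()` method steps above, which only create a new decoder instance when the do not flush flag is false.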