From 47a3e55bf5ad15d02f5a228ac093e2aa4cbe010c Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Mon, 26 Oct 2020 18:23:03 +0100
Subject: [PATCH] Clarify instance language around decoders and encoders

And also stop defaulting error mode in "run" and "process".

Fixes #240.
---
 encoding.bs | 121 ++++++++++++++++++++++++----------------------------
 1 file changed, 55 insertions(+), 66 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index 7afd377..f95805e 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -238,8 +238,8 @@ This specification does not provide wrapper algorithms that would combine with <

Encoders and decoders

Each encoding has an associated decoder and most of them have an
-associated encoder. Each decoder and encoder have a
-handler algorithm. A handler algorithm takes an input
+associated encoder. Instances of decoders and encoders have a
+handler algorithm and might also have state. A handler algorithm takes an input
I/O queue and an item, and returns finished, one or more items, error
optionally with a code point, or continue.
@@ -247,9 +247,8 @@ optionally with a code point, or continue.

The replacement and UTF-16BE/LE encodings have no encoder.

-An error mode as used below is "replacement" (default) or
-"fatal" for a decoder and "fatal" (default) or
-"html" for an encoder.
+An error mode as used below is "replacement" or "fatal" for
+a decoder and "fatal" or "html" for an encoder.

An XML processor would set error mode to "fatal". [[XML]] @@ -264,24 +263,17 @@ happening. [[HTML]]

To run an encoding's decoder or
-encoder encoderDecoder with input I/O queue input,
-output I/O queue output, and optional error mode
+encoder instance encoderDecoder with I/O queue
+input, I/O queue output, and error mode
mode, run these steps:

    -
  1. If mode is not given, then set it to "replacement" if - encoderDecoder is a decoder, otherwise "fatal". - -

  2. Let encoderDecoderInstance be a new encoderDecoder. -

  3. While true:

      -
    1. Let result be the result of - processing the result of - reading from input for - encoderDecoderInstance, input, output, and +

    2. Let result be the result of processing the result of reading from + input for encoderDecoder, input, output, and mode.

    3. If result is not continue, then return result. @@ -290,28 +282,23 @@ output I/O queue output, and optional error mod

      To process an item item for an encoding's encoder or decoder instance
-encoderDecoderInstance, I/O queue input, output
-I/O queue output, and optional error mode mode, run
-these steps:
+encoderDecoder, I/O queue input, I/O queue
+output, and error mode mode, run these steps:

        -
      1. If mode is not given, then set it to "replacement" if - encoderDecoderInstance is a decoder instance, otherwise - "fatal". - -

      2. Assert: if encoderDecoderInstance is an encoder instance, - mode is not "replacement". +

      3. Assert: if encoderDecoder is an encoder instance, mode is + not "replacement". -

      4. Assert: if encoderDecoderInstance is a decoder instance, - mode is not "html". +

      5. Assert: if encoderDecoder is a decoder instance, mode is + not "html". -

      6. Assert: if encoderDecoderInstance is an encoder instance, - item is not a surrogate. +

      7. Assert: if encoderDecoder is an encoder instance, item is + not a surrogate. -

      8. Let result be the result of running encoderDecoderInstance's - handler on input and item. +

      9. Let result be the result of running encoderDecoder's handler on + input and item. -

      10. If result is continue, return result. +

      11. If result is continue, then return result.

      12. Otherwise, if result is finished: @@ -327,8 +314,8 @@ these steps:

        Otherwise, if result is one or more items:

          -
        1. Assert: if encoderDecoderInstance is a decoder instance, - result does not contain any surrogates. +

        2. Assert: if encoderDecoder is a decoder instance, result + does not contain any surrogates.

        3. Push result to output.
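As a rough illustration of "run" and "process" after this change, the Python sketch below passes the error mode explicitly on every call instead of defaulting it. Everything concrete here is an assumption made for illustration: I/O queues are modeled as `collections.deque` objects ending in an `END_OF_QUEUE` sentinel, handler results are the strings `"finished"`/`"continue"`, a list of items, or a hypothetical `Error` record, and `AsciiDecoder`/`AsciiEncoder` are toy coders, not encodings defined by this specification.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

END_OF_QUEUE = object()  # stands in for the end-of-queue item
FINISHED, CONTINUE = "finished", "continue"


@dataclass
class Error:
    code_point: Optional[int] = None  # hypothetical error record


class AsciiDecoder:
    # Hypothetical toy decoder: 0x00-0x7F decode to themselves, anything else errors.
    is_decoder = True

    def handler(self, io_queue, item):
        if item is END_OF_QUEUE:
            return FINISHED
        return [item] if item < 0x80 else Error()


class AsciiEncoder:
    # Hypothetical toy encoder: code points below 0x80 encode to one byte.
    is_decoder = False

    def handler(self, io_queue, item):
        if item is END_OF_QUEUE:
            return FINISHED
        return [item] if item < 0x80 else Error(item)


def read(io_queue):
    # end-of-queue is returned but never removed from the queue
    if io_queue[0] is END_OF_QUEUE:
        return END_OF_QUEUE
    return io_queue.popleft()


def process(item, coder, io_queue, output, mode):
    assert coder.is_decoder or mode != "replacement"
    assert not coder.is_decoder or mode != "html"
    assert coder.is_decoder or item is END_OF_QUEUE or not 0xD800 <= item <= 0xDFFF
    result = coder.handler(io_queue, item)
    if result == CONTINUE:
        return result
    if result == FINISHED:
        output.append(END_OF_QUEUE)
        return result
    if isinstance(result, Error):
        if mode == "fatal":
            return result
        if mode == "replacement":
            output.append(0xFFFD)
        else:  # "html": prepend "&#<decimal code point>;" to the input
            reference = f"&#{result.code_point};"
            io_queue.extendleft(reversed([ord(c) for c in reference]))
        return CONTINUE
    output.extend(result)  # one or more items
    return CONTINUE


def run(coder, io_queue, output, mode):
    # mode has no default: every caller states the error mode explicitly.
    while True:
        result = process(read(io_queue), coder, io_queue, output, mode)
        if result != CONTINUE:
            return result
```

With these toys, running `AsciiDecoder()` over the bytes 0x68 0xFF 0x69 in "replacement" mode yields U+0068, U+FFFD, U+0069, while "fatal" stops at the first error. Running `AsciiEncoder()` over "aé" in "html" mode produces the bytes of `a&#233;`, because the error prepends a decimal character reference to the input, which is then encoded normally.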

        @@ -1005,8 +992,8 @@ queue of scalar values output (default « »), run these steps:
      13. If buffer does not match 0xEF 0xBB 0xBF, prepend buffer to ioQueue. -

      14. Run UTF-8's decoder with ioQueue and - output. +

      15. Run an instance of UTF-8's decoder with ioQueue, + output, and "replacement".

      16. Return output.
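The UTF-8 decode steps in this hunk strip a leading byte order mark and then run a new UTF-8 decoder instance with an explicit "replacement" error mode. A short Python approximation follows. It assumes Python's built-in UTF-8 codec with `errors="replace"` as a stand-in for the specification's decoder; whether its replacement behavior matches the specification in every edge case is not established here. The function name `utf8_decode` is hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def utf8_decode(data: bytes) -> str:
    # If the first three bytes are not a UTF-8 BOM they remain part of the input.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Each bytes.decode() call uses a fresh decoder, and the error mode is
    # spelled out ("replace" standing in for "replacement") rather than defaulted;
    # Python's own default would be "strict".
    return data.decode("utf-8", errors="replace")
```

Only one BOM is removed: input consisting of two BOMs decodes to a single U+FEFF. Python's `utf-8-sig` codec performs the same stripping; the explicit check is kept here to mirror the steps.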

      @@ -1015,8 +1002,8 @@ queue of scalar values output (default « »), run these steps: optional I/O queue of scalar values output (default « »), run these steps:
        -
      1. Run UTF-8's decoder with ioQueue and - output. +

      2. Run an instance of UTF-8's decoder with ioQueue, + output, and "replacement".

      3. Return output.

      @@ -1028,7 +1015,7 @@ given an optional I/O queue of scalar values output (default « »), -->
        -
      1. Let potentialError be the result of running UTF-8's +

      2. Let potentialError be the result of running an instance of UTF-8's decoder with ioQueue, output, and "fatal".

      3. If potentialError is an error, then return failure. @@ -1078,8 +1065,8 @@ these steps: than anything else. In a context where HTTP is used this is in violation of the semantics of the `Content-Type` header. -

      4. Run encoding's decoder with ioQueue and - output. +

      5. Run an instance of encoding's decoder with + ioQueue, output, and "replacement".

      6. Return output.

      @@ -1135,12 +1122,12 @@ is safe as it never triggers errors. [[HTML]]
      1. Assert: encoding is not replacement or UTF-16BE/LE. -

      2. Return encoding's encoder. +

      3. Return an instance of encoding's encoder.

      To encode or fail an I/O queue of scalar values ioQueue given an
-encoder encoder and an I/O queue of bytes output, run these
-steps:
+encoder instance encoder and an I/O queue of bytes output, run
+these steps:

      1. Let potentialError be the result of running encoder with @@ -1156,10 +1143,10 @@ steps:

        This is a legacy hook for URL percent-encoding. The caller will have to keep an
- encoder alive as the ISO-2022-JP encoder can be in two different states when
- returning an error. That also means that if the caller emits bytes to encode the error in
- some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F, 0x1B, 0x5C,
- and 0x7E. [[URL]]
+ encoder instance alive as the ISO-2022-JP encoder can be in two different
+ states when returning an error. That also means that if the caller emits bytes to encode the
+ error in some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F,
+ 0x1B, 0x5C, and 0x7E. [[URL]]

        In particular, if upon returning an error the ISO-2022-JP encoder is in the Roman state, the caller cannot output 0x5C (\) as it will not @@ -1171,7 +1158,7 @@ steps:

        The return value is either the number representing the code point that could not be encoded or null, if there was no error. When it returns non-null the caller will have to
- invoke it again, supplying the same encoder and a new output I/O queue.
+ invoke it again, supplying the same encoder instance and a new output I/O queue.
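The note on encode or fail says the caller keeps one encoder instance alive across calls and supplies a new output I/O queue each time. The Python sketch below shows that calling pattern only. `AsciiEncoder` is a hypothetical, stateless toy, so it does not reproduce the ISO-2022-JP state issue; an ISO-2022-JP encoder instance would carry its state from one call to the next, which is why it must be reused. The `%26%23...%3B` fallback is modeled loosely on URL percent-encoding and uses only ASCII bytes outside 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E; `encode_with_fallback` is a hypothetical name, not an algorithm from either specification.

```python
from collections import deque
from typing import Optional

END_OF_QUEUE = object()  # stands in for the end-of-queue item


class AsciiEncoder:
    # Hypothetical encoder: code points below 0x80 encode as one byte; anything
    # else is an error carrying that code point.
    def handler(self, item):
        if item is END_OF_QUEUE:
            return "finished"
        if item < 0x80:
            return [item]
        return ("error", item)


def encode_or_fail(io_queue: deque, encoder: AsciiEncoder, output: list) -> Optional[int]:
    # Runs the encoder in the "fatal" error mode: stop at the first error.
    while True:
        item = io_queue[0] if io_queue[0] is END_OF_QUEUE else io_queue.popleft()
        result = encoder.handler(item)
        if result == "finished":
            output.append(END_OF_QUEUE)
            return None
        if isinstance(result, tuple):  # ("error", code point)
            output.append(END_OF_QUEUE)
            return result[1]
        output.extend(result)


def encode_with_fallback(text: str) -> str:
    # One encoder instance and one input queue are kept for the whole string;
    # each call to encode_or_fail() gets a fresh output queue.
    encoder = AsciiEncoder()
    io_queue = deque([ord(c) for c in text] + [END_OF_QUEUE])
    pieces = []
    while True:
        output = []
        failed = encode_or_fail(io_queue, encoder, output)
        pieces.append(bytes(b for b in output if b is not END_OF_QUEUE).decode("ascii"))
        if failed is None:
            return "".join(pieces)
        pieces.append(f"%26%23{failed}%3B")
```

For "aéb" the first call stops at U+00E9 and returns 233; the second call, with the same encoder and queue, resumes at "b", giving `a%26%23233%3Bb`.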

        @@ -1268,7 +1255,7 @@ interface mixin TextDecoderCommon {
        An encoding.
        decoder -
        A decoder. +
        A decoder instance.
        I/O queue
        An I/O queue of bytes. @@ -1419,10 +1406,10 @@ method steps are:
        1. If this's do not flush is false, then set this's
- decoder to a new decoder for this's
- encoding, this's I/O queue to the
- I/O queue of bytes « end-of-queue », and this's
- BOM seen to false.
+ decoder to a new instance of this's
+ encoding's decoder, this's
+ I/O queue to the I/O queue of bytes
+ « end-of-queue », and this's BOM seen to false.

        2. Set this's do not flush to options["{{TextDecodeOptions/stream}}"]. @@ -1554,8 +1541,8 @@ constructor steps are to do nothing.

        3. Let item be the result of reading from input. -

        4. Let result be the result of processing item for the - UTF-8 encoder, input, output. +

        5. Let result be the result of processing item for an instance + of the UTF-8 encoder, input, output, and "fatal".

        6. Assert: result is not error. @@ -1582,6 +1569,8 @@ method steps are: getting a reference to the bytes held by destination. +

        7. Let encoder be an instance of the UTF-8 encoder. +

        8. Let unused be the I/O queue of scalar values « end-of-queue ». @@ -1597,8 +1586,8 @@ method steps are:

          1. Let item be the result of reading from source. -

          2. Let result be the result of running the UTF-8 encoder's handler - on unused and item. +

          3. Let result be the result of running encoder's handler on + unused and item.

          4. If result is finished, then break. @@ -1738,8 +1727,8 @@ constructor steps are:

          5. set this's ignore BOM to options["{{TextDecoderOptions/ignoreBOM}}"]. -

          6. Set this's decoder to a new decoder for - this's encoding, and set this's +

          7. Set this's decoder to a new instance of this's + encoding's decoder, and set this's I/O queue to a new I/O queue.

          8. Let transformAlgorithm be an algorithm which takes a chunk argument @@ -1846,7 +1835,7 @@ TextEncoderStream includes GenericTransformStream;

            encoder -
            An encoder. +
            An encoder instance.
            pending high surrogate
            Null or a surrogate, initially null. @@ -1887,8 +1876,8 @@ textReadable constructor steps are:
              -
            1. Set this's encoder to UTF-8's - encoder. +

            2. Set this's encoder to an instance of the + UTF-8 encoder.

            3. Let transformAlgorithm be an algorithm which takes a chunk argument and runs the encode and enqueue a chunk algorithm with this and chunk. @@ -1953,8 +1942,8 @@ constructor steps are: value algorithm with encoder, item and input.

            4. If result is not continue, then process result for - encoder, input, output. - + encoder's encoder, input, output, + and "fatal".

          @@ -2023,7 +2012,7 @@ that are split between strings. [[!INFRA]] to be more accurate in deployed content. Therefore it is not part of the UTF-8 decoder algorithm but rather the decode and UTF-8 decode algorithms. -

          UTF-8's decoder's has an associated +

          UTF-8's decoder has an associated UTF-8 code point, UTF-8 bytes seen, and UTF-8 bytes needed (all initially 0), a UTF-8 lower boundary (initially 0x80), and a UTF-8 upper boundary (initially 0xBF).
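The last hunk concerns the state held by UTF-8's decoder. The Python sketch below models one decoder instance, following the UTF-8 decoder's handler steps as defined elsewhere in this specification (they are not reproduced in this patch); it is an illustration, not a conformance-tested implementation. It shows why "an instance of UTF-8's decoder" is the right phrasing: the code point, bytes seen, bytes needed, and the two boundaries live on the instance, so feeding one instance two chunks can complete a sequence that neither chunk holds alone. The `decode_replacement` helper, the result strings, and the `deque` queue model are assumptions made for illustration.

```python
from collections import deque

END_OF_QUEUE = object()  # stands in for the end-of-queue item
FINISHED, CONTINUE, ERROR = "finished", "continue", "error"


class Utf8Decoder:
    """One UTF-8 decoder instance; the state below belongs to the instance."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.code_point = 0
        self.bytes_seen = 0
        self.bytes_needed = 0
        self.lower_boundary = 0x80
        self.upper_boundary = 0xBF

    def handler(self, io_queue, byte):
        if byte is END_OF_QUEUE:
            if self.bytes_needed != 0:
                self.bytes_needed = 0  # report the truncated sequence once
                return ERROR
            return FINISHED
        if self.bytes_needed == 0:
            if byte <= 0x7F:
                return [byte]
            if 0xC2 <= byte <= 0xDF:
                self.bytes_needed, self.code_point = 1, byte & 0x1F
            elif 0xE0 <= byte <= 0xEF:
                if byte == 0xE0:
                    self.lower_boundary = 0xA0  # reject overlong forms
                if byte == 0xED:
                    self.upper_boundary = 0x9F  # reject surrogates
                self.bytes_needed, self.code_point = 2, byte & 0xF
            elif 0xF0 <= byte <= 0xF4:
                if byte == 0xF0:
                    self.lower_boundary = 0x90  # reject overlong forms
                if byte == 0xF4:
                    self.upper_boundary = 0x8F  # stay at or below U+10FFFF
                self.bytes_needed, self.code_point = 3, byte & 0x7
            else:
                return ERROR
            return CONTINUE
        if not self.lower_boundary <= byte <= self.upper_boundary:
            self._reset()
            io_queue.appendleft(byte)  # restore the byte so it is read again
            return ERROR
        self.lower_boundary, self.upper_boundary = 0x80, 0xBF
        self.code_point = (self.code_point << 6) | (byte & 0x3F)
        self.bytes_seen += 1
        if self.bytes_seen != self.bytes_needed:
            return CONTINUE
        code_point = self.code_point
        self.code_point = self.bytes_needed = self.bytes_seen = 0
        return [code_point]


def decode_replacement(decoder, data, flush=True):
    """Hypothetical helper: feed bytes to an existing instance in "replacement" mode."""
    io_queue = deque(data)
    if flush:
        io_queue.append(END_OF_QUEUE)
    out = []
    while io_queue:
        item = io_queue[0] if io_queue[0] is END_OF_QUEUE else io_queue.popleft()
        result = decoder.handler(io_queue, item)
        if result == FINISHED:
            break
        if result == ERROR:
            out.append(0xFFFD)
        elif result != CONTINUE:
            out.extend(result)
    return "".join(map(chr, out))
```

For example, passing `b"\xe2"` with `flush=False` and then `b"\x82\xac"` to the same instance yields "€", whereas a fresh instance given `b"\xe2\x82"` followed by end-of-queue yields a single U+FFFD.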