Document how to handle CBOR objects concatenated in a stream #30

Sekenre · 2018-11-06T19:08:35Z

An EOF while decoding should not be used to signal the end of a stream.

If you're decoding a set of CBOR objects that are just concatenated together in a file without delimiters then do this:

from cbor2 import CBORDecoder
with open('myfile.cbor', 'rb') as f:
    decoder = CBORDecoder(f)
    while f.peek(1): # Check that there's data at the next file position
        print(decoder.decode())
    print('Done!')

Otherwise wrap the items in an indefinite length array.

This is equivalent to how you would decode CBOR from a network socket or queue: you always expect a complete item, and if you don't get one, the data is corrupt. Hence it does not make sense to pass EOF errors up to the reader.

Sekenre · 2018-11-07T10:56:55Z

According to some discussion on the CBOR mailing list, most decoders should stop when they reach the end of the outermost item or report 'garbage' at the end of the file.

Sekenre · 2019-10-28T15:45:46Z

This will be fixed once pull request #61 has been finalized. Calling loads(sequence=True) and expecting an indefinite length array seems like the way to go.

Sekenre · 2019-10-29T10:53:09Z

@waveform80 gave a detailed breakdown of why calling cbor2.load(f) multiple times to iterate over a concatenated stream of bare CBOR objects is a good option.

#61 (comment)

mikenerone · 2020-04-10T16:54:01Z

Just a thought: it would be convenient if a CBORDecoder was an iterable that yielded successive decode() results.

davepeck · 2024-11-20T00:34:47Z

Everywhere I use cbor2 I find myself rewriting these two functions:

import typing as t
import cbor2
from io import BytesIO

def cbor_load_seq(
    fp: t.IO[bytes],
    tag_hook: t.Callable[[cbor2.CBORDecoder, cbor2.CBORTag], t.Any] | None = None,
    object_hook: t.Callable[[cbor2.CBORDecoder, dict], t.Any] | None = None,
    str_errors: t.Literal["strict", "error", "replace"] = "strict",
) -> t.Iterator[object]:
    decoder = cbor2.CBORDecoder(
        fp, tag_hook=tag_hook, object_hook=object_hook, str_errors=str_errors
    )
    while True:
        try:
            yield decoder.decode()
        except EOFError:
            break


def cbor_loads_seq(
    b: bytes,
    tag_hook: t.Callable[[cbor2.CBORDecoder, cbor2.CBORTag], t.Any] | None = None,
    object_hook: t.Callable[[cbor2.CBORDecoder, dict], t.Any] | None = None,
    str_errors: t.Literal["strict", "error", "replace"] = "strict",
) -> t.Iterator[object]:
    with BytesIO(b) as bio:
        yield from cbor_load_seq(bio, tag_hook, object_hook, str_errors)

I guess that's a roundabout way of saying +1 to this; I'd love to even have cbor_load_seq() and cbor_loads_seq() be part of the library itself.

Sekenre added the documentation label Nov 6, 2018
