Document how to handle CBOR objects concatenated in a stream #30

Open
Sekenre opened this issue Nov 6, 2018 · 5 comments

Comments

@Sekenre
Collaborator

Sekenre commented Nov 6, 2018

An EOF while decoding should not be used to signal the end of a stream.

If you're decoding a set of CBOR objects that are just concatenated together in a file without delimiters then do this:

from cbor2 import CBORDecoder
with open('myfile.cbor', 'rb') as f:
    decoder = CBORDecoder(f)
    while f.peek(1): # Check that there's data at the next file position
        print(decoder.decode())
    print('Done!')

Otherwise wrap the items in an indefinite length array.

This is equivalent to how you would decode CBOR from a network socket or queue: you always expect a complete item, and if you don't get one, the data is corrupted. Hence it does not make sense to pass EOF errors up to the reader.
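For the indefinite-length array option mentioned above, here is a minimal sketch of the framing, assuming the items are already encoded as CBOR bytes. The helper name wrap_indefinite_array is hypothetical and not part of cbor2; the two marker bytes come from the CBOR specification (0x9f opens an indefinite-length array, 0xff is the "break" stop code).

```python
from typing import Iterable

INDEFINITE_ARRAY_START = b"\x9f"  # major type 4 (array), additional info 31
BREAK = b"\xff"                   # "break" stop code that closes the array


def wrap_indefinite_array(encoded_items: Iterable[bytes]) -> bytes:
    """Frame already-encoded CBOR items as one indefinite-length array.

    Hypothetical helper for illustration; it does not validate its input.
    """
    return INDEFINITE_ARRAY_START + b"".join(encoded_items) + BREAK


# 0x01 and 0x02 are the CBOR encodings of the integers 1 and 2.
framed = wrap_indefinite_array([b"\x01", b"\x02"])
print(framed.hex())  # 9f0102ff
```

Decoding the framed bytes with a single cbor2.loads() call should then return one list, so the decoder itself knows where the data ends.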

@Sekenre
Collaborator Author

Sekenre commented Nov 7, 2018

According to some discussion on the CBOR mailing list, most decoders should stop when they reach the end of the outermost item or report 'garbage' at the end of the file.

@Sekenre
Collaborator Author

Sekenre commented Oct 28, 2019

This will be fixed once pull request #61 has been finalized. Calling loads(sequence=True) and expecting an indefinite length array seems like the way to go.

@Sekenre
Collaborator Author

Sekenre commented Oct 29, 2019

@waveform80 gave a detailed breakdown of why calling cbor2.load(f) multiple times to iterate over a concatenated stream of bare CBOR objects is a good option.

#61 (comment)

@mikenerone

Just a thought: it would be convenient if a CBORDecoder was an iterable that yielded successive decode() results.
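Until something like that exists, a small generator gets close. This is only a sketch: iter_decoded is a hypothetical name, and it assumes the stream is buffered and exposes peek(), as in the first comment's example. The decoder is passed in as a plain callable, so the snippet below runs with a stand-in that reads one byte per "item" rather than a real CBORDecoder.

```python
import io
from typing import Any, Callable, Iterator


def iter_decoded(decode_one: Callable[[], Any], fp: io.BufferedReader) -> Iterator[Any]:
    """Yield decode_one() results until fp has no bytes left.

    Hypothetical helper: fp must be the same buffered stream decode_one reads from.
    """
    # peek() returns b"" only at end of stream, so the loop ends cleanly
    # between items. An EOF raised inside decode_one() (a truncated item)
    # is not caught here and propagates to the caller.
    while fp.peek(1):
        yield decode_one()


# Stand-in decoder for illustration: each "item" is a single byte.
stream = io.BufferedReader(io.BytesIO(b"\x01\x02\x03"))
items = list(iter_decoded(lambda: stream.read(1)[0], stream))
print(items)  # [1, 2, 3]
```

With cbor2, the intended call would be along the lines of iter_decoded(CBORDecoder(f).decode, f) for a file f opened in 'rb' mode.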

@davepeck

Everywhere I use cbor2, I find myself rewriting these two functions:

import typing as t
import cbor2
from io import BytesIO

def cbor_load_seq(
    fp: t.IO[bytes],
    tag_hook: t.Callable[[cbor2.CBORDecoder, cbor2.CBORTag], t.Any] | None = None,
    object_hook: t.Callable[[cbor2.CBORDecoder, dict], t.Any] | None = None,
    str_errors: t.Literal["strict", "error", "replace"] = "strict",
) -> t.Iterator[object]:
    decoder = cbor2.CBORDecoder(
        fp, tag_hook=tag_hook, object_hook=object_hook, str_errors=str_errors
    )
    while True:
        try:
            yield decoder.decode()
        except EOFError:
            # Any EOF ends iteration, including one raised partway
            # through a truncated item.
            break


def cbor_loads_seq(
    b: bytes,
    tag_hook: t.Callable[[cbor2.CBORDecoder, cbor2.CBORTag], t.Any] | None = None,
    object_hook: t.Callable[[cbor2.CBORDecoder, dict], t.Any] | None = None,
    str_errors: t.Literal["strict", "error", "replace"] = "strict",
) -> t.Iterator[object]:
    with BytesIO(b) as bio:
        yield from cbor_load_seq(bio, tag_hook, object_hook, str_errors)

I guess that's a roundabout way of saying +1 to this; I'd love to even have cbor_load_seq() and cbor_loads_seq() be part of the library itself.
