Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C-extension does allow customizing serialization #192

Open
2 tasks done
vamega opened this issue Nov 15, 2023 · 2 comments
Open
2 tasks done

C-extension does allow customizing serialization #192

vamega opened this issue Nov 15, 2023 · 2 comments
Labels

Comments

@vamega
Copy link

vamega commented Nov 15, 2023

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

cbor2 version

5.5.1

Python version

3.11

What happened?

The behaviour of the serializer differs between the C extension and the Python implementation of this library.

The Python implementation allows customizing the by updating the value of the _encoders map for known types. The C-implementation as an optimization does not use this map and defines it's own implementations of encoders for the known types. This comment on the original issue adding the C implementation shows that it was intended that the C implementation allow for the canonical parameter to be set to a value of 2 to opt into the slower codepath where the Python defined encoders were used.

The two changes below appear to be the cause of this:

A few questions here:

  1. I recognize that modifying _encoders is not necessarily a supported interface for the library given how the variable is named? If that's not the case, is there no supported way to perform this substitution?
  2. If allowing the C implementation to use the Python defined encoders, how should this be exposed? Go back to overloading the canonical parameter? Introduce an Enum as a value for that field?

How can we reproduce the bug?

The following test case should show the difference.
The python implementation will pass, the C implementation will fail.

import calendar

def test_encoder_replacement(impl):
    def to_epoch(dt: datetime) -> float:
        millis = int(dt.microsecond / 1000.0) / 1000.0
        return calendar.timegm(dt.astimezone(timezone.utc).timetuple()) + millis


    def encode_datetime(self, obj: datetime):
        self.encode_float(to_epoch(obj))


    dt = datetime(2013, 3, 21, 20, 4, 0, tzinfo=timezone.utc)
    expected = unhexlify("fb41d452d9ec000000")
    with BytesIO() as fp:
        encoder = impl.CBOREncoder(fp, canonical=2)
        encoder._encoders[datetime] = encode_datetime
        encoder.encode(dt)
        serialized = fp.getvalue()
@vamega vamega added the bug label Nov 15, 2023
@vamega
Copy link
Author

vamega commented Nov 15, 2023

Curiously enough, this could have been missed in version 5.4.3 if people were accessing the encoder via cbor2.encoder instead of the top level module. The C extension didn't shadow namespaces other than the top level one.

@agronholm
Copy link
Owner

None of the serialization libraries I know of allow overriding the handling of built-in types. This was never supported, as you correctly deduced from the naming of the _encoders private variable. What's your use case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants