Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bigint encoding takes quadratic O(n^2) time #29

Closed
Sekenre opened this issue Nov 6, 2018 · 3 comments
Closed

Bigint encoding takes quadratic O(n^2) time #29

Sekenre opened this issue Nov 6, 2018 · 3 comments
Assignees

Comments

@Sekenre
Copy link
Collaborator

Sekenre commented Nov 6, 2018

The code seems to be a long integer time complexity O (n ^ 2)?
Let's say I have a 10,000 bit integer

Originally posted by @fsssosei in #28 (comment)

The repeated divisions and inserting at the beginning of the list get noticeably slow once you get to about 100000 digits.

>>> timeit(lambda: cbor2.dumps(2**200), number=50000)
1.3028403969080813
>>> timeit(lambda: cbor2.dumps(2**2000), number=50000)
12.468400787521446
>>> timeit(lambda: cbor2.dumps(2**20000), number=50000)
710.3362544665841
>>> 
@Sekenre
Copy link
Collaborator Author

Sekenre commented Nov 6, 2018

I have a fix for this in my repo and will merge it in once I've done some more testing.

The algorithm used for bigint encoding comes from Stackoverflow. and is perfectly reasonable for arbitrary bases in C but carries a lot of overhead in Python.

The most efficient method I've found so far converts the float into it's hexadecimal representation and then unhexlify's the string.

@Sekenre Sekenre self-assigned this Nov 6, 2018
@Sekenre
Copy link
Collaborator Author

Sekenre commented Nov 6, 2018

Performance improvement on python 3.6

>>> timeit(lambda: cbor2.dumps(2**20000), number=50000)
3.508080048715488

Sekenre added a commit that referenced this issue Nov 12, 2018
Solution: Encode to base16 string and then decode to bytes.

* Most performant big integer encodings for 2.7 and >=3.2
@Sekenre
Copy link
Collaborator Author

Sekenre commented Nov 12, 2018

Fix has been merged, will be in next release

@Sekenre Sekenre closed this as completed Nov 12, 2018
Sekenre added a commit to Sekenre/cbor2 that referenced this issue Nov 12, 2018
…gronholm#31)

Solution: Encode to base16 string and then decode to bytes.

* Most performant big integer encodings for 2.7 and >=3.2

Problem: Bug agronholm#15, CBOR maps can contain unhashable types as keys

Solution: Use a context manager for decoding keys and set items which
makes sure to use a hashable type.

Also indefinite length bytestring decoding returns bytes() instead of
bytearray().

Fixes for flake8 tests

Problem: HashableMap is unclear and bogus

Solution: Make it clear we are creating an immutable type and call it
FrozenDict to be like the builtin frozenset()

Remove premature optimization

Add a test for nested immutable maps

Test should verify a value that is not nested in a key uses mutable type

Whitespace fixes

Add tests for unused magic methods on FrozenDict

An extra space is a terrible thing

Fixing merge conflict:

Switched from absolute to relative imports

This was requested by Mercurial developers to facilitate vendoring. See agronholm#20 for the discussion.

Make the immutable flag a read-only property and document it's use

Added FrozenDict to encoder for roundtrip decode->encode

Minor whitespace fix
vamega pushed a commit to vamega/cbor2 that referenced this issue Nov 15, 2023
…gronholm#31)

Solution: Encode to base16 string and then decode to bytes.

* Most performant big integer encodings for 2.7 and >=3.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant