
gh-118610: Centralize power caching in _pylong.py #118611

Merged (24 commits) on May 8, 2024
Conversation

tim-one
Member

@tim-one tim-one commented May 5, 2024

A new internal function computes, in advance, all and only the powers various other functions will need. While a few percent faster overall (timing roundtrip int(str(n)) on "large" n), the primary point is to free the other functions from needing to embed their own ad-hoc caching schemes.

Still a work in progress. For now, both the old and new versions of the client functions are present, and new setold() and setnew() functions allow dynamically switching between them. That's for testing, and all that will eventually be removed.
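To make the idea concrete, here is a rough sketch of what such a centralized helper might look like. This is illustrative only, not the actual `compute_powers()` in `_pylong.py`; the name `compute_powers_sketch` and its signature are invented:

```python
def compute_powers_sketch(exponents, base):
    """Hypothetical sketch: given the set of exponents a conversion
    will need, return {e: base**e}, building each power from smaller
    cached powers (one multiplication each) where possible instead of
    calling `**` from scratch every time."""
    powers = {}
    for e in sorted(exponents):
        half = e >> 1
        if half in powers and (e - half) in powers:
            # One big multiplication of two already-cached powers.
            powers[e] = powers[half] * powers[e - half]
        elif (e - 1) in powers:
            powers[e] = powers[e - 1] * base
        else:
            # Fall back to `**` only when nothing cached helps.
            powers[e] = base ** e
    return powers

# Example: a divide-and-conquer conversion that halves its "digit"
# count would ask for exponents like {3, 6, 12}.
powers = compute_powers_sketch({3, 6, 12}, 10)
```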

@tim-one tim-one self-assigned this May 6, 2024
@tim-one
Member Author

tim-one commented May 6, 2024

I've been testing roundtrip times (and verifying results) by running this:

from random import randrange
from time import perf_counter as now
import _pylong
bits = 500
while bits <= 10_000_000_000:
    print("bits", format(bits, "_"))
    hibit = 1 << (bits - 1)
    for i in range(3):
        lo = randrange(hibit)
        n = lo + hibit
        assert n.bit_length() == bits

        _pylong.setold()
        s = now()
        nn = int(str(n))
        f = now()
        e1 = f - s
        assert n == nn

        _pylong.setnew()
        s = now()
        nn = int(str(n))
        f = now()
        e2 = f - s
        assert n == nn
        print(f"   {e1:.3f} {e2:.3f} {e2/e1}")
    bits += (bits >> 1) + randrange(-9, 10)

An example of an unusually good bit length:

bits 1_113_174
   0.273 0.257 0.9410583385184018
   0.272 0.261 0.9590880969910299
   0.274 0.259 0.9467701893597175

Under 1 means the new code is faster. So about 5% speedup there.

And an example of an unusually bad bit length:

bits 28_529_466
   24.035 23.962 0.9969707640927915
   24.068 23.994 0.9969205537287908
   24.059 24.350 1.0120921891569215

So no apparent difference there.

Typical:

bits 42_794_206
   45.060 43.988 0.9762041206675952
   44.566 43.998 0.9872495022740332
   44.655 43.813 0.98114660667433

So about 2% faster.

Speed isn't the primary point of this PR, but it's good that it generally helps a bit.

I'm not 100% sure, but I expect the best cases are ones where compute_powers() finds a faster way to compute all the powers needed than the various ad-hoc caching schemes happened to find.

Toward that end, it's only the largest powers needed that really matter. Computing the very largest power needed can consume a significant fraction of the total conversion time.
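As a rough illustration of that last point (a hypothetical demo, not part of the PR): once the half-size power is already cached, the largest power costs one big multiplication, while a from-scratch `**` has to redo all the intermediate squarings:

```python
# Hypothetical demo: deriving the largest power from a cached half
# versus recomputing it with `**`. Timings will vary by machine.
from time import perf_counter as now

w = 200_000                   # an even exponent, for a clean split
half = 5 ** (w // 2)          # pretend this is already in the cache

s = now()
a = half * half               # one multiplication of cached halves
t_mul = now() - s

s = now()
b = 5 ** w                    # recompute from scratch
t_pow = now() - s

assert a == b
# t_mul is typically well under t_pow, since `**` must rebuild all
# the intermediate squarings that led up to the final product.
```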

Member

@serhiy-storchaka serhiy-storchaka left a comment


If the goal is simplification of the code, I do not see how is this change an improvement. My implementation of int_to_decimal_string() is about 20 lines long (ignoring comments), and the caching of 10**w adds only 3 lines. w2pow() and w5pow() add 12 lines each. compute_powers() is almost 50 lines long.

@tim-one
Member Author

tim-one commented May 6, 2024

If the goal is simplification of the code

The primary goal, yes, but not the only one. We'll have to disagree about whether it's simpler. As I noted when reviewing your code, the caching scheme you made up, while brief, was quite unlike the caching scheme the other two functions used (which were very similar to each other). That raises the bar for understanding the code as a whole. All caching logic is in one place now.

For the other two functions, their caching logic consumed a very large proportion of their total code.

compute_powers() is larger than any of the earlier schemes because it's trying to do the best (reasonably) possible job of computing all the needed powers quickly. That's a goal too, but not the primary one.

@tim-one
Member Author

tim-one commented May 6, 2024

BTW, even in your function, centralizing the caching logic reduces the body of its inner() from 8 lines to 5, all of which is now top-level logic directly needed to compute its result. Putting low-level caching logic in there too distracted from that.

@tim-one tim-one linked an issue May 6, 2024 that may be closed by this pull request
Member

@serhiy-storchaka serhiy-storchaka left a comment


Looks correct to me.

When you finish with this, please do not forget to edit the commit message when merging the PR.

tim-one and others added 3 commits May 7, 2024 13:26
Spelling error.

Co-authored-by: Serhiy Storchaka <[email protected]>
@tim-one
Member Author

tim-one commented May 7, 2024

please do not forget to edit the commit message when merging the PR.

Last time I did "squash & merge", GitHub just went ahead and did it - it didn't give me a chance to edit. We'll see ☹️

@serhiy-storchaka
Member

It happens. If for some reason GitHub fails to merge the first time, do not try to press the button again. First reload the page.

@tim-one
Member Author

tim-one commented May 7, 2024

Let me explain "simplify" a bit more. The new function obviously isn't "simpler". The comment before it ends with:

the primary intent is to simplify the functions using this, by eliminating the need for them to craft their own ad-hoc caching schemes.

That's the point. The 3 (so far) client functions are inarguably simpler after the change, and that's the primary point. Base conversion is the primary job of this module, and the functions doing that are the focus. Computing powers of bases is a technical detail that can be (and now is) delegated to an expert about that alone. "Separation of concerns."

@Yhg1s
Member

Yhg1s commented May 7, 2024

Do you think this should make it into 3.13, @tim-one?

@tim-one
Member Author

tim-one commented May 7, 2024

Do you think this should make it into 3.13, @tim-one?

Not needed. The primary point was to simplify the internal coding. The only visible effect should be speedups of at most a few percent when doing giant int<->string conversions. I doubt that really matters much to anyone.

@tim-one tim-one merged commit 2f0a338 into python:main May 8, 2024
31 checks passed
@tim-one tim-one deleted the intconv branch May 8, 2024 00:09
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024

A new `compute_powers()` function computes all and only the powers of the base the various base-conversion functions need, as efficiently as reasonably possible (turns out that invoking `**` is needed at most once). This typically gives a few % speedup, but the primary point is to simplify the base-conversion functions, which no longer need their own, ad hoc, and less efficient power-caching schemes.

Co-authored-by: Serhiy Storchaka <[email protected]>
Successfully merging this pull request may close these issues.

Centralize power caching in _pylong.py