
gh-118610: Centralize power caching in _pylong.py #118611

Merged (24 commits) on May 8, 2024
Conversation

tim-one
Member

@tim-one tim-one commented May 5, 2024

A new internal function computes, in advance, all and only the powers various other functions will need. While a few percent faster overall (timing roundtrip int(str(n)) on "large" n), the primary point is to free the other functions from needing to embed their own ad-hoc caching schemes.

Still a work in progress. For now, both the old and new versions of the client functions are present, and new setold() and setnew() functions allow dynamically switching between them. That's for testing, and all that will eventually be removed.
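To make the idea concrete, here is a rough sketch of what such a centralized helper might look like. This is illustrative only, not the actual `compute_powers()` in `_pylong.py`; the name `compute_powers_sketch` and its signature are invented:

```python
def compute_powers_sketch(exponents, base):
    """Hypothetical sketch: given the set of exponents a conversion
    will need, return {e: base**e}, building each power from smaller
    cached powers (one multiplication each) where possible instead of
    calling `**` from scratch every time."""
    powers = {}
    for e in sorted(exponents):
        half = e >> 1
        if half in powers and (e - half) in powers:
            # One big multiplication of two already-cached powers.
            powers[e] = powers[half] * powers[e - half]
        elif (e - 1) in powers:
            powers[e] = powers[e - 1] * base
        else:
            # Fall back to `**` only when nothing cached helps.
            powers[e] = base ** e
    return powers

# Example: a divide-and-conquer conversion that halves its "digit"
# count would ask for exponents like {3, 6, 12}.
powers = compute_powers_sketch({3, 6, 12}, 10)
```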

@tim-one tim-one self-assigned this May 6, 2024
@tim-one
Member Author

tim-one commented May 6, 2024

I've been testing roundtrip times (and verifying results) by running this:

from random import randrange
from time import perf_counter as now
import _pylong
bits = 500
while bits <= 10_000_000_000:
    print("bits", format(bits, "_"))
    hibit = 1 << (bits - 1)
    for i in range(3):
        lo = randrange(hibit)
        n = lo + hibit
        assert n.bit_length() == bits

        _pylong.setold()
        s = now()
        nn = int(str(n))
        f = now()
        e1 = f - s
        assert n == nn

        _pylong.setnew()
        s = now()
        nn = int(str(n))
        f = now()
        e2 = f - s
        assert n == nn
        print(f"   {e1:.3f} {e2:.3f} {e2/e1}")
    bits += (bits >> 1) + randrange(-9, 10)

An example of an unusually good bit length:

bits 1_113_174
   0.273 0.257 0.9410583385184018
   0.272 0.261 0.9590880969910299
   0.274 0.259 0.9467701893597175

Under 1 means the new code is faster. So about 5% speedup there.

And an example of an unusually bad bit length:

bits 28_529_466
   24.035 23.962 0.9969707640927915
   24.068 23.994 0.9969205537287908
   24.059 24.350 1.0120921891569215

So no apparent difference there.

Typical:

bits 42_794_206
   45.060 43.988 0.9762041206675952
   44.566 43.998 0.9872495022740332
   44.655 43.813 0.98114660667433

So about 2% faster.

Speed isn't the primary point of this PR, but it's good that it generally helps a bit.

I'm not 100% sure, but I expect the best cases are ones where compute_powers() finds a faster way to compute all the powers needed than the various ad-hoc caching schemes happened to find.

Toward that end, it's only the largest powers needed that really matter. Computing the very largest power needed can consume a significant fraction of the total conversion time.
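As a rough illustration of that last point (a hypothetical demo, not part of the PR): once the half-size power is already cached, the largest power costs one big multiplication, while a from-scratch `**` has to redo all the intermediate squarings:

```python
# Hypothetical demo: deriving the largest power from a cached half
# versus recomputing it with `**`. Timings will vary by machine.
from time import perf_counter as now

w = 200_000                   # an even exponent, for a clean split
half = 5 ** (w // 2)          # pretend this is already in the cache

s = now()
a = half * half               # one multiplication of cached halves
t_mul = now() - s

s = now()
b = 5 ** w                    # recompute from scratch
t_pow = now() - s

assert a == b
# t_mul is typically well under t_pow, since `**` must rebuild all
# the intermediate squarings that led up to the final product.
```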

Member

@serhiy-storchaka serhiy-storchaka left a comment


If the goal is simplification of the code, I do not see how is this change an improvement. My implementation of int_to_decimal_string() is about 20 lines long (ignoring comments), and the caching of 10**w adds only 3 lines. w2pow() and w5pow() add 12 lines each. compute_powers() is almost 50 lines long.

@tim-one
Member Author

tim-one commented May 6, 2024

If the goal is simplification of the code

The primary goal, yes, but not the only one. We'll have to disagree about whether it's simpler. As I noted when reviewing your code, the caching scheme you made up, while brief, was quite unlike the caching scheme the other two functions used (which were very similar to each other). That raises the bar for understanding the code as a whole. All caching logic is in one place now.

For the other two functions, their caching logic consumed a very large proportion of their total code.

compute_powers() is larger than any of the earlier schemes because it's trying to do the best (reasonably) possible job of computing all the needed powers quickly. That's a goal too, but not the primary one.

@tim-one
Member Author

tim-one commented May 6, 2024

BTW, even in your function, centralizing the caching logic reduces the body of its inner() from 8 lines to 5, all of which is now top-level logic directly needed to compute its result. Putting low-level caching logic in there too distracted from that.

@tim-one tim-one linked an issue May 6, 2024 that may be closed by this pull request
Member

@serhiy-storchaka serhiy-storchaka left a comment


Looks correct to me.

When you finish with this, please do not forget to edit the commit message when merging the PR.

tim-one and others added 3 commits May 7, 2024 13:26
Spelling error.

Co-authored-by: Serhiy Storchaka <[email protected]>
@tim-one
Member Author

tim-one commented May 7, 2024

please do not forget to edit the commit message when merging the PR.

Last time I did "squash & merge", GitHub just went ahead and did it - it didn't give me a chance to edit. We'll see ☹️

@serhiy-storchaka
Member

It happens. If for some reason GitHub fails to merge the first time, do not try to press the button again. First reload the page.

@tim-one
Member Author

tim-one commented May 7, 2024

Let me explain "simplify" a bit more. The new function obviously isn't "simpler". The comment before it ends with:

the primary intent is to simplify the functions using this, by eliminating the need for them to craft their own ad-hoc caching schemes.

That's the point. The 3 (so far) client functions are inarguably simpler after the change, and that's the primary point. Base conversion is the primary job of this module, and the functions doing that are the focus. Computing powers of bases is a technical detail that can be (and now is) delegated to an expert about that alone. "Separation of concerns."

@Yhg1s
Member

Yhg1s commented May 7, 2024

Do you think this should make it into 3.13, @tim-one?

@tim-one
Member Author

tim-one commented May 7, 2024

Do you think this should make it into 3.13, @tim-one?

Not needed. The primary point was to simplify the internal coding. The only visible effect should be speedups of at most a few percent when doing giant int<->string conversions. I doubt that really matters much to anyone.

@tim-one tim-one merged commit 2f0a338 into python:main May 8, 2024
31 checks passed
@tim-one tim-one deleted the intconv branch May 8, 2024 00:09
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024

A new `compute_powers()` function computes all and only the powers of the base the various base-conversion functions need, as efficiently as reasonably possible (turns out that invoking `**` is needed at most once). This typically gives a few % speedup, but the primary point is to simplify the base-conversion functions, which no longer need their own, ad hoc, and less efficient power-caching schemes.

Co-authored-by: Serhiy Storchaka <[email protected]>
Successfully merging this pull request may close these issues.

Centralize power caching in _pylong.py