Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-124688: _decimal: Get a module state from ctx objects for performance #124691

Merged
merged 6 commits into from
Sep 28, 2024

Conversation

neonene
Copy link
Contributor

@neonene neonene commented Sep 27, 2024

@vstinner
Copy link
Member

Do you have a benchmark showing the effect of this change?

@neonene
Copy link
Contributor Author

neonene commented Sep 27, 2024

_dec_mpd_radix() is added (818c24f) so as not to be passed a different object, not for performance.

@rhettinger rhettinger removed their request for review September 27, 2024 18:15
@neonene
Copy link
Contributor Author

neonene commented Sep 27, 2024

My different PGO builds says 3-5% faster on the telco benchmark.

@vstinner
Copy link
Member

My different PGO builds says 3-5% faster on the telco benchmark.

Would you mind to share your results (before/after, difference)?

@neonene
Copy link
Contributor Author

neonene commented Sep 27, 2024

On release builds, the result was as fast as the global state access: #123100 (comment) (PR with alternatives: 1.08x )

@neonene
Copy link
Contributor Author

neonene commented Sep 27, 2024

Picked a few telco results on Windows PGO builds:

a) main with d69529d restored (hard to evaluate main as-is)

9.68 ms +- 0.17 ms -> 8.94 ms +- 0.15 ms: 1.08x faster  # PR (ctx object)
9.68 ms +- 0.17 ms -> 8.76 ms +- 0.11 ms: 1.10x faster  # global state access

b) (a) with some patches

8.99 ms +- 0.11 ms -> 8.49 ms +- 0.16 ms: 1.06x faster  # PR (ctx object)
8.99 ms +- 0.11 ms -> 8.44 ms +- 0.20 ms: 1.07x faster  # global state access

c) (b) with 1 diff in PyType_GetBaseByToken()

8.91 ms +- 0.21 ms -> 8.78 ms +- 0.16 ms: 1.02x faster  # PR (ctx object)
8.91 ms +- 0.21 ms -> 8.25 ms +- 0.17 ms: 1.08x faster  # global state access

@vstinner
Copy link
Member

Sorry, I don't know how to read your 6 benchmarks. I don't understand what you compared. I don't know what are "some patches" nor why you mention PyType_GetBaseByToken(), it doesn't help me to compare this PR.

So I just compared the main branch to this PR:

Mean +- std dev: [ref] 8.18 ms +- 0.28 ms -> [change] 7.53 ms +- 0.18 ms: 1.09x faster

Nice! 1.09x faster is worth it.

@vstinner vstinner enabled auto-merge (squash) September 28, 2024 15:03
@vstinner vstinner merged commit dc12237 into python:main Sep 28, 2024
36 checks passed
@picnixz
Copy link
Contributor

picnixz commented Sep 28, 2024

I added the skip news since Victor enabled auto-merge. If a NEWS entry is really needed, we can add it later in a separate commit though.

@vstinner
Copy link
Member

Merged, thanks @neonene!

@neonene neonene deleted the decstate branch September 28, 2024 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants