gh-112075: refactor dictionary lookup functions for better re-usability #114629
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Refactor dictionary lookup functions for better usability
Free threaded builds of Python will need to expand upon the lookup functions to have thread-safe versions of them that can be run without the dictionary being locked. These lookups will just be subtly different - they'll use atomic loads and they'll need to contain some extra checks for values that may change in flight.
Currently there are 3 loookup functions so for free-threaded builds we'll need to get 6 of them. That's going to be a good amount of copy and pasting with sometimes subtle differences. So I'm curious if people would prefer an approach like this as opposed to the copy and pasting. I don't want to use macros as that'd destroy debugging. If we're concerned about perf implications for non-mainstream compilers or don't like this approach I can just go ahead with copy and pasting.
This factors these functions into one core function which contains the loop, and 3 comparison helper functions which implement the different comparisons. In free-threaded versions we'll get modified versions of these comparisons as that's where the differences will be.
This still generates nearly identical code in LTO builds, here's the before https://pastebin.com/UZdhKc6f and the after: https://pastebin.com/sdhcFZP5. Even in non-LTO builds the code is remarkably similar (with the new code maybe resembling the LTO code a little bit more as for some reason the non-LTO code gets a weird jmp at the beginning).
There is technically one change here in that the loop unrolling has been applied to all versions of the code, although we could easily have 2 versions of the loop function with one unrolled and one not-unrolled.
Perf seems to be be mostly neutral to me: https://pastebin.com/LPJYZAj4
dict
objects thread-safe in--disable-gil
builds #112075