bpo-40255: Implement Immortal Instances - Optimization 1 #31488

Closed

@eduardo-elizondo eduardo-elizondo commented Feb 22, 2022

Immortalizing Interned Strings

This is an optimization on top of PR19474.

The improvement here relies on the assumption that all interned strings stay alive for the entire lifetime of the runtime. Thus, whenever a string is interned, it is automatically immortalized.
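For readers less familiar with interning, here is a minimal Python-level sketch of the guarantee this optimization builds on: equal strings passed through `sys.intern` resolve to one shared object. Immortalization itself is not portably observable from Python, so the sketch only illustrates the interning half. The string contents are arbitrary.

```python
import sys

# Build the strings at runtime so the compiler cannot merge them as constants.
parts = ["immortal", "instances"]
a = "_".join(parts)
b = "_".join(parts)

# The two strings compare equal regardless of whether they share storage.
equal_before = a == b

# Interning maps both to one canonical object held in the interned table.
ia = sys.intern(a)
ib = sys.intern(b)
same_object = ia is ib

print(equal_before, same_object)
```

Because every later lookup returns that same canonical object, the interned table keeps it reachable for as long as the entry exists, which is why treating it as immortal is a plausible simplification.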

Benchmark Results

Overall: 1% slower compared to the master branch

pyperformance results
2to3: Mean +- std dev: [cpython_master] 432 ms +- 15 ms -> [immortal_instances_opt1] 437 ms +- 15 ms: 1.01x slower
chaos: Mean +- std dev: [cpython_master] 126 ms +- 4 ms -> [immortal_instances_opt1] 122 ms +- 5 ms: 1.03x faster
deltablue: Mean +- std dev: [cpython_master] 7.35 ms +- 0.20 ms -> [immortal_instances_opt1] 7.49 ms +- 0.22 ms: 1.02x slower
fannkuch: Mean +- std dev: [cpython_master] 664 ms +- 15 ms -> [immortal_instances_opt1] 654 ms +- 20 ms: 1.02x faster
float: Mean +- std dev: [cpython_master] 128 ms +- 4 ms -> [immortal_instances_opt1] 133 ms +- 3 ms: 1.04x slower
go: Mean +- std dev: [cpython_master] 244 ms +- 10 ms -> [immortal_instances_opt1] 232 ms +- 5 ms: 1.05x faster
html5lib: Mean +- std dev: [cpython_master] 97.9 ms +- 4.2 ms -> [immortal_instances_opt1] 99.7 ms +- 5.3 ms: 1.02x slower
json_dumps: Mean +- std dev: [cpython_master] 19.2 ms +- 0.7 ms -> [immortal_instances_opt1] 19.7 ms +- 0.7 ms: 1.02x slower
json_loads: Mean +- std dev: [cpython_master] 39.5 us +- 0.9 us -> [immortal_instances_opt1] 39.1 us +- 1.3 us: 1.01x faster
logging_format: Mean +- std dev: [cpython_master] 10.4 us +- 0.3 us -> [immortal_instances_opt1] 10.7 us +- 0.4 us: 1.03x slower
meteor_contest: Mean +- std dev: [cpython_master] 164 ms +- 5 ms -> [immortal_instances_opt1] 161 ms +- 6 ms: 1.02x faster
nbody: Mean +- std dev: [cpython_master] 163 ms +- 6 ms -> [immortal_instances_opt1] 160 ms +- 4 ms: 1.02x faster
nqueens: Mean +- std dev: [cpython_master] 159 ms +- 5 ms -> [immortal_instances_opt1] 149 ms +- 5 ms: 1.07x faster
pathlib: Mean +- std dev: [cpython_master] 28.5 ms +- 0.7 ms -> [immortal_instances_opt1] 27.5 ms +- 0.7 ms: 1.04x faster
pickle: Mean +- std dev: [cpython_master] 16.0 us +- 0.5 us -> [immortal_instances_opt1] 16.3 us +- 0.4 us: 1.02x slower
pickle_dict: Mean +- std dev: [cpython_master] 37.3 us +- 0.6 us -> [immortal_instances_opt1] 35.4 us +- 1.0 us: 1.05x faster
pickle_list: Mean +- std dev: [cpython_master] 5.77 us +- 0.24 us -> [immortal_instances_opt1] 5.70 us +- 0.15 us: 1.01x faster
pickle_pure_python: Mean +- std dev: [cpython_master] 572 us +- 14 us -> [immortal_instances_opt1] 582 us +- 24 us: 1.02x slower
pidigits: Mean +- std dev: [cpython_master] 284 ms +- 15 ms -> [immortal_instances_opt1] 273 ms +- 8 ms: 1.04x faster
pyflate: Mean +- std dev: [cpython_master] 770 ms +- 28 ms -> [immortal_instances_opt1] 742 ms +- 24 ms: 1.04x faster
python_startup: Mean +- std dev: [cpython_master] 12.6 ms +- 0.4 ms -> [immortal_instances_opt1] 13.0 ms +- 0.6 ms: 1.04x slower
python_startup_no_site: Mean +- std dev: [cpython_master] 8.89 ms +- 0.39 ms -> [immortal_instances_opt1] 8.97 ms +- 0.35 ms: 1.01x slower
raytrace: Mean +- std dev: [cpython_master] 529 ms +- 16 ms -> [immortal_instances_opt1] 539 ms +- 10 ms: 1.02x slower
regex_compile: Mean +- std dev: [cpython_master] 233 ms +- 6 ms -> [immortal_instances_opt1] 243 ms +- 6 ms: 1.04x slower
regex_dna: Mean +- std dev: [cpython_master] 239 ms +- 6 ms -> [immortal_instances_opt1] 244 ms +- 7 ms: 1.02x slower
regex_effbot: Mean +- std dev: [cpython_master] 4.53 ms +- 0.12 ms -> [immortal_instances_opt1] 4.73 ms +- 0.16 ms: 1.04x slower
regex_v8: Mean +- std dev: [cpython_master] 33.2 ms +- 0.8 ms -> [immortal_instances_opt1] 33.6 ms +- 0.9 ms: 1.01x slower
richards: Mean +- std dev: [cpython_master] 82.8 ms +- 3.7 ms -> [immortal_instances_opt1] 86.2 ms +- 5.2 ms: 1.04x slower
scimark_fft: Mean +- std dev: [cpython_master] 571 ms +- 12 ms -> [immortal_instances_opt1] 603 ms +- 19 ms: 1.06x slower
scimark_lu: Mean +- std dev: [cpython_master] 195 ms +- 6 ms -> [immortal_instances_opt1] 211 ms +- 4 ms: 1.08x slower
scimark_monte_carlo: Mean +- std dev: [cpython_master] 116 ms +- 5 ms -> [immortal_instances_opt1] 119 ms +- 4 ms: 1.03x slower
scimark_sor: Mean +- std dev: [cpython_master] 211 ms +- 6 ms -> [immortal_instances_opt1] 222 ms +- 4 ms: 1.05x slower
scimark_sparse_mat_mult: Mean +- std dev: [cpython_master] 8.28 ms +- 0.40 ms -> [immortal_instances_opt1] 8.51 ms +- 0.23 ms: 1.03x slower
spectral_norm: Mean +- std dev: [cpython_master] 208 ms +- 9 ms -> [immortal_instances_opt1] 211 ms +- 4 ms: 1.01x slower
sympy_expand: Mean +- std dev: [cpython_master] 878 ms +- 34 ms -> [immortal_instances_opt1] 860 ms +- 24 ms: 1.02x faster
sympy_integrate: Mean +- std dev: [cpython_master] 35.2 ms +- 1.0 ms -> [immortal_instances_opt1] 35.6 ms +- 1.0 ms: 1.01x slower
sympy_sum: Mean +- std dev: [cpython_master] 291 ms +- 13 ms -> [immortal_instances_opt1] 300 ms +- 8 ms: 1.03x slower
sympy_str: Mean +- std dev: [cpython_master] 514 ms +- 11 ms -> [immortal_instances_opt1] 522 ms +- 17 ms: 1.02x slower
telco: Mean +- std dev: [cpython_master] 10.4 ms +- 0.2 ms -> [immortal_instances_opt1] 10.6 ms +- 0.5 ms: 1.02x slower
unpack_sequence: Mean +- std dev: [cpython_master] 77.9 ns +- 2.3 ns -> [immortal_instances_opt1] 78.8 ns +- 2.3 ns: 1.01x slower
unpickle_list: Mean +- std dev: [cpython_master] 6.77 us +- 0.25 us -> [immortal_instances_opt1] 6.90 us +- 0.16 us: 1.02x slower
unpickle_pure_python: Mean +- std dev: [cpython_master] 463 us +- 17 us -> [immortal_instances_opt1] 448 us +- 9 us: 1.03x faster
xml_etree_parse: Mean +- std dev: [cpython_master] 245 ms +- 6 ms -> [immortal_instances_opt1] 250 ms +- 11 ms: 1.02x slower
xml_etree_generate: Mean +- std dev: [cpython_master] 146 ms +- 4 ms -> [immortal_instances_opt1] 144 ms +- 5 ms: 1.01x faster
xml_etree_process: Mean +- std dev: [cpython_master] 102 ms +- 2 ms -> [immortal_instances_opt1] 100.0 ms +- 3.0 ms: 1.02x faster

Benchmark hidden because not significant (6): django_template, hexiom, logging_silent, logging_simple, unpickle, xml_etree_iterparse

Geometric mean: 1.01x slower
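As a rough illustration of how a geometric mean summarizes per-benchmark ratios, the sketch below recomputes it for three rows copied from the table above. It is not the pyperformance implementation, the helper name `geometric_mean_ratio` is made up for this example, and a three-row subset will not reproduce the overall 1.01x figure.

```python
import math

# (master, opt1) mean timings in ms, copied from three rows of the table above.
samples = {
    "2to3": (432.0, 437.0),
    "chaos": (126.0, 122.0),
    "nqueens": (159.0, 149.0),
}

def geometric_mean_ratio(pairs):
    """Geometric mean of new/old ratios; > 1 means slower, < 1 means faster."""
    logs = [math.log(new / old) for old, new in pairs]
    return math.exp(sum(logs) / len(logs))

ratio = geometric_mean_ratio(samples.values())
print(f"{ratio:.3f}x relative to master (subset only)")
```

Averaging in log space keeps a 2x slowdown and a 2x speedup symmetric, which an arithmetic mean of ratios would not.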

Implementation Details

Any time PyUnicode_InternInPlace is called, the string is automatically immortalized. This also reuses the existing string state mechanism by setting the state to SSTATE_INTERNED_IMMORTAL. Given that all interned strings are now immortal, the use of SSTATE_INTERNED_MORTAL has been deprecated.
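The behavior described above can be sketched with a toy Python model. This is not the C implementation: `ToyStr`, `interned`, and `intern_in_place` are hypothetical names, and the numeric state values are assumed to match the SSTATE_* macros in CPython's unicodeobject.h.

```python
# Assumed to mirror the SSTATE_* macros; everything else is a toy model.
SSTATE_NOT_INTERNED = 0
SSTATE_INTERNED_MORTAL = 1  # no longer assigned under this change
SSTATE_INTERNED_IMMORTAL = 2

class ToyStr:
    """Stand-in for a unicode object with a refcount and an interned state."""

    def __init__(self, value):
        self.value = value
        self.refcnt = 1
        self.state = SSTATE_NOT_INTERNED
        self.immortal = False

    def incref(self):
        # Immortal objects ignore refcount updates.
        if not self.immortal:
            self.refcnt += 1

    def decref(self):
        if not self.immortal:
            self.refcnt -= 1

interned = {}  # value -> canonical ToyStr

def intern_in_place(s):
    """Return the canonical object for s.value, immortalizing it on first insert."""
    existing = interned.setdefault(s.value, s)
    if existing is s and s.state == SSTATE_NOT_INTERNED:
        s.state = SSTATE_INTERNED_IMMORTAL
        s.immortal = True
    return existing
```

In this model a second `ToyStr` with the same value is left untouched and the first, already immortal, object is returned, so there is never a point at which an interned entry could carry the mortal state.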

Interned String Finalization

The current change does not attempt to fully clean up the strings during runtime shutdown. The main reason is that interned strings include both statically allocated strings (i.e., Py_Identifiers) and dynamically allocated strings (i.e., those created with PyUnicode_New). During runtime shutdown, it is hard to determine which strings in the interned dictionary are statically allocated and which are dynamically allocated. Blindly calling the deallocation function would end up calling PyMem_RawFree on statically allocated memory, which would crash the program.

In theory, this could be fixed by marking the static strings differently from the dynamically allocated ones. If we could distinguish them during runtime shutdown, we could run the deallocation function on the dynamic strings only. However, all the fields within these instances are already in use, so there is no clean way to embed this information. Another option is to use the memory range of _PyRuntime.global_objects.singletons.strings, which spans sizeof(_Py_global_strings) bytes, and treat anything outside that range as a dynamic string. However, this relies on exposing the internal _Py_global_strings struct. Given that there is no clean option, I've decided to leave the interned strings as is and let the OS free the memory after the runtime shuts down.
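The address-range idea can be illustrated with a small ctypes sketch. `ToyGlobalStrings` is a hypothetical stand-in for `_Py_global_strings`, and the block here is allocated by ctypes rather than being truly static; only the range test itself is the point.

```python
import ctypes

class ToyGlobalStrings(ctypes.Structure):
    # Hypothetical stand-in: one contiguous block holding all "static" strings.
    _fields_ = [
        ("first", ctypes.c_char * 16),
        ("second", ctypes.c_char * 16),
    ]

static_block = ToyGlobalStrings()
BASE = ctypes.addressof(static_block)
SIZE = ctypes.sizeof(ToyGlobalStrings)

def is_static(addr):
    """True if addr falls inside [BASE, BASE + SIZE)."""
    return BASE <= addr < BASE + SIZE

# An address inside the block, and one from a separate allocation.
static_addr = BASE + ToyGlobalStrings.second.offset
dynamic_buf = ctypes.create_string_buffer(16)
dynamic_addr = ctypes.addressof(dynamic_buf)

print(is_static(static_addr), is_static(dynamic_addr))
```

A shutdown pass built on such a check would free only the entries for which the test is false. The cost, as noted above, is that the finalization code needs to know the layout and size of the internal struct.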

https://bugs.python.org/issue40255
