Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-40255: Implement Immortal Instances - Optimizations Combined #31491

Conversation

eduardo-elizondo
Copy link
Contributor

@eduardo-elizondo eduardo-elizondo commented Feb 22, 2022

This is an optimization on top of PR19474.

It combines PR31488, PR31489, and PR31490 into a single change to measure the combined performance benefits.

These results do not change too much from what was already achieved independently by these optimizations (as some of the immortalized instances start overlaping with each other). That being said, performance will keep scaling as the application scales as well. The current microbenchmarks do not measure applications that contain hundreds of imports or thousands of interned strings. Nonetheless, it is still worthwhile to consider all of these improvements in conjunction when thinking about larger scale applications.

Benchmark Results

Overall: 0% faster compared to the main branch and the highest number of non-stat significant benchmarks (18)

pyperformance results
2to3: Mean +- std dev: [cpython_master] 432 ms +- 15 ms -> [immortal_instances_opt_combined_v3] 451 ms +- 16 ms: 1.04x slower
chaos: Mean +- std dev: [cpython_master] 126 ms +- 4 ms -> [immortal_instances_opt_combined_v3] 123 ms +- 4 ms: 1.03x faster
deltablue: Mean +- std dev: [cpython_master] 7.35 ms +- 0.20 ms -> [immortal_instances_opt_combined_v3] 7.74 ms +- 0.41 ms: 1.05x slower
django_template: Mean +- std dev: [cpython_master] 62.2 ms +- 2.0 ms -> [immortal_instances_opt_combined_v3] 63.7 ms +- 2.4 ms: 1.02x slower
fannkuch: Mean +- std dev: [cpython_master] 664 ms +- 15 ms -> [immortal_instances_opt_combined_v3] 677 ms +- 18 ms: 1.02x slower
float: Mean +- std dev: [cpython_master] 128 ms +- 4 ms -> [immortal_instances_opt_combined_v3] 135 ms +- 7 ms: 1.05x slower
go: Mean +- std dev: [cpython_master] 244 ms +- 10 ms -> [immortal_instances_opt_combined_v3] 228 ms +- 14 ms: 1.07x faster
json_dumps: Mean +- std dev: [cpython_master] 19.2 ms +- 0.7 ms -> [immortal_instances_opt_combined_v3] 20.1 ms +- 0.8 ms: 1.04x slower
logging_format: Mean +- std dev: [cpython_master] 10.4 us +- 0.3 us -> [immortal_instances_opt_combined_v3] 11.0 us +- 0.4 us: 1.06x slower
logging_silent: Mean +- std dev: [cpython_master] 201 ns +- 8 ns -> [immortal_instances_opt_combined_v3] 205 ns +- 7 ns: 1.02x slower
logging_simple: Mean +- std dev: [cpython_master] 9.77 us +- 0.32 us -> [immortal_instances_opt_combined_v3] 10.2 us +- 0.4 us: 1.04x slower
nqueens: Mean +- std dev: [cpython_master] 159 ms +- 5 ms -> [immortal_instances_opt_combined_v3] 154 ms +- 3 ms: 1.03x faster
pickle: Mean +- std dev: [cpython_master] 16.0 us +- 0.5 us -> [immortal_instances_opt_combined_v3] 16.6 us +- 0.7 us: 1.04x slower
pickle_dict: Mean +- std dev: [cpython_master] 37.3 us +- 0.6 us -> [immortal_instances_opt_combined_v3] 35.6 us +- 2.1 us: 1.05x faster
pidigits: Mean +- std dev: [cpython_master] 284 ms +- 15 ms -> [immortal_instances_opt_combined_v3] 273 ms +- 9 ms: 1.04x faster
pyflate: Mean +- std dev: [cpython_master] 770 ms +- 28 ms -> [immortal_instances_opt_combined_v3] 746 ms +- 24 ms: 1.03x faster
python_startup: Mean +- std dev: [cpython_master] 12.6 ms +- 0.4 ms -> [immortal_instances_opt_combined_v3] 11.6 ms +- 0.6 ms: 1.08x faster
python_startup_no_site: Mean +- std dev: [cpython_master] 8.89 ms +- 0.39 ms -> [immortal_instances_opt_combined_v3] 8.11 ms +- 0.43 ms: 1.10x faster
raytrace: Mean +- std dev: [cpython_master] 529 ms +- 16 ms -> [immortal_instances_opt_combined_v3] 544 ms +- 17 ms: 1.03x slower
scimark_fft: Mean +- std dev: [cpython_master] 571 ms +- 12 ms -> [immortal_instances_opt_combined_v3] 589 ms +- 13 ms: 1.03x slower
scimark_lu: Mean +- std dev: [cpython_master] 195 ms +- 6 ms -> [immortal_instances_opt_combined_v3] 205 ms +- 7 ms: 1.05x slower
scimark_sor: Mean +- std dev: [cpython_master] 211 ms +- 6 ms -> [immortal_instances_opt_combined_v3] 216 ms +- 6 ms: 1.03x slower
scimark_sparse_mat_mult: Mean +- std dev: [cpython_master] 8.28 ms +- 0.40 ms -> [immortal_instances_opt_combined_v3] 8.66 ms +- 0.30 ms: 1.05x slower
sympy_expand: Mean +- std dev: [cpython_master] 878 ms +- 34 ms -> [immortal_instances_opt_combined_v3] 850 ms +- 27 ms: 1.03x faster
sympy_integrate: Mean +- std dev: [cpython_master] 35.2 ms +- 1.0 ms -> [immortal_instances_opt_combined_v3] 36.6 ms +- 1.5 ms: 1.04x slower
sympy_sum: Mean +- std dev: [cpython_master] 291 ms +- 13 ms -> [immortal_instances_opt_combined_v3] 309 ms +- 7 ms: 1.06x slower
sympy_str: Mean +- std dev: [cpython_master] 514 ms +- 11 ms -> [immortal_instances_opt_combined_v3] 535 ms +- 13 ms: 1.04x slower
telco: Mean +- std dev: [cpython_master] 10.4 ms +- 0.2 ms -> [immortal_instances_opt_combined_v3] 10.7 ms +- 0.5 ms: 1.03x slower
unpack_sequence: Mean +- std dev: [cpython_master] 77.9 ns +- 2.3 ns -> [immortal_instances_opt_combined_v3] 70.6 ns +- 2.0 ns: 1.10x faster
unpickle_list: Mean +- std dev: [cpython_master] 6.77 us +- 0.25 us -> [immortal_instances_opt_combined_v3] 6.95 us +- 0.22 us: 1.03x slower
xml_etree_parse: Mean +- std dev: [cpython_master] 245 ms +- 6 ms -> [immortal_instances_opt_combined_v3] 238 ms +- 5 ms: 1.03x faster
xml_etree_generate: Mean +- std dev: [cpython_master] 146 ms +- 4 ms -> [immortal_instances_opt_combined_v3] 143 ms +- 6 ms: 1.02x faster
xml_etree_process: Mean +- std dev: [cpython_master] 102 ms +- 2 ms -> [immortal_instances_opt_combined_v3] 100 ms +- 3 ms: 1.02x faster

Benchmark hidden because not significant (18): hexiom, html5lib, json_loads, meteor_contest, nbody, pathlib, pickle_list, pickle_pure_python, regex_compile, regex_dna, regex_effbot, regex_v8, richards, scimark_monte_carlo, spectral_norm, unpickle, unpickle_pure_python, xml_etree_iterparse

Geometric mean: 1.00x faster

https://bugs.python.org/issue40255

@kumaraditya303
Copy link
Contributor

You can use this in deepfrozen modules to get this even faster see

self.write(f".ob_refcnt = 999999999,")

@ericsnowcurrently
Copy link
Member

Overall: 0% faster compared to the main branch

That's great news! I'm going to update PEP 683 with the outcome and some of the details.

@brettcannon brettcannon removed their request for review February 23, 2022 21:49
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Here are some random review comments. Hopefully they're helpful. I decided to only review the final PR (with all optimizations). I skipped the .py files and a few others for now.

immortalize_object(PyObject *obj, PyObject *Py_UNUSED(ignored))
{
_Py_SetImmortal(obj);
/* Special case for PyCodeObjects since they don't have a tp_traverse */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various fields below are tuples and the individual items in the tuples should also become immortal, and for co_consts this should recurse down. Maybe whenever we make a tuple immortal we should immortalize all its items?

Comment on lines +2001 to +2002
Py_TYPE(FROM_GC(gc))->tp_traverse(
FROM_GC(gc), (visitproc)immortalize_object, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this loop find code objects contained inside other code objects? (I don't know what exactly is contained in permanent_generation.head.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code objects are not tracked by GC.
And most tuples are not tracked too.

So we need to find code and tuples via function objects, module global dict, class namespace dict, etc.

@@ -1829,6 +1829,10 @@ PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals,
if (mod == NULL) {
goto error;
}
// Immortalize top level modules
if (tstate->recursion_limit - tstate->recursion_remaining == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? I put a printf here and it doesn't seem to be immortalizing most of the frozen modules:

Immortalizing <module 'winreg' (built-in)>
Immortalizing <module '_frozen_importlib_external' (frozen)>
Immortalizing <module 'zipimport' (frozen)>
Immortalizing <module 'encodings' from 'C:\\Users\\gvanrossum\\cpython\\Lib\\encodings\\__init__.py'>  
Immortalizing <module '_winapi' (built-in)>
Immortalizing <module 'encodings.mbcs' from 'C:\\Users\\gvanrossum\\cpython\\Lib\\encodings\\mbcs.py'> 
Immortalizing <module '_signal' (built-in)>
Immortalizing <module 'io' (frozen)>
Immortalizing <module 'site' (frozen)>

(Most frozen modules are imported at a much higher recursion level, either 7, 13 or 18.)

@@ -145,6 +167,20 @@ static inline Py_ssize_t Py_SIZE(const PyVarObject *ob) {
}
#define Py_SIZE(ob) Py_SIZE(_PyVarObject_CAST_CONST(ob))

PyAPI_FUNC(PyObject *) _PyGC_ImmortalizeHeap(void);
PyAPI_FUNC(PyObject *) _PyGC_TransitiveImmortalize(PyObject *obj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This name is awkward, I'd expect some variation on "immortalize transitively".)

Comment on lines -1371 to +1383
if (weaklist != NULL) { \
if (stdlib_list != NULL) { \
PyObject *list = user_weaklist; \
if (PySequence_Contains(stdlib_list, name)) { \
list = stdlib_weaklist; \
} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be (hyper-)correct you probably need to check for list != NULL after this, since it might just be possible that stdlib_list is not NULL but stdlib_weaklist or user_weaklist is NULL.

Comment on lines +1481 to +1482
finalize_modules_clear_weaklist(PyThreadState *tstate,
PyInterpreterState *interp,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the interpreter state easily found from the thread state? So you'd only need a tstate arg.

// detect those modules which have been held alive.
PyObject *weaklist = finalize_remove_modules(modules, verbose);
// This prepares two lists, the user defined list of modules as well
// as stdlib list of modules. The user modules will be destroyed first in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// as stdlib list of modules. The user modules will be destroyed first in
// as the stdlib list of modules. The stdlib modules will be destroyed after all user modules in

(And probably reflow.)

PyObject *key, *value;


/* First, clear only names starting with a single underscore */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray comment?

Also, no phase deletes __builtins__, right?

}
}
}

void
_PyModule_Clear(PyObject *m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is no longer used. Is that right?

@@ -1994,7 +1995,9 @@ _Py_NewReference(PyObject *op)
#ifdef Py_REF_DEBUG
_Py_RefTotal++;
#endif
Py_SET_REFCNT(op, 1);
/* Do not use Py_SET_REFCNT to skip the Immortal Instance check. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Do not use Py_SET_REFCNT to skip the Immortal Instance check. This
/* Do not use Py_SET_REFCNT -- it skips the Immortal Instance check. This

}
if (from_prev) {
_PyGCHead_SET_PREV(from_next, from_prev);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed at all?
And please use 4-space indent.

}

PyObject *
_PyGC_ImmortalizeHeap(void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this name. This function don't relating to "Heap" at all.

And please move the { to next line.

Comment on lines +2001 to +2002
Py_TYPE(FROM_GC(gc))->tp_traverse(
FROM_GC(gc), (visitproc)immortalize_object, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code objects are not tracked by GC.
And most tuples are not tracked too.

So we need to find code and tuples via function objects, module global dict, class namespace dict, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants