bpo-40255: Implement Immortal Instances - Optimizations Combined #31491

eduardo-elizondo · 2022-02-22T05:36:32Z

This is an optimization on top of PR19474.

It combines PR31488, PR31489, and PR31490 into a single change to measure the combined performance benefits.

These results do not change much from what each optimization already achieved independently, since some of the immortalized instances start overlapping with each other. That said, the benefit should keep scaling as the application scales. The current microbenchmarks do not measure applications that contain hundreds of imports or thousands of interned strings. Nonetheless, it is still worthwhile to consider all of these improvements together when thinking about larger-scale applications.

Benchmark Results

Overall: 0% faster compared to the main branch, with the highest number of benchmarks that are not statistically significant (18)

pyperformance results

2to3: Mean +- std dev: [cpython_master] 432 ms +- 15 ms -> [immortal_instances_opt_combined_v3] 451 ms +- 16 ms: 1.04x slower
chaos: Mean +- std dev: [cpython_master] 126 ms +- 4 ms -> [immortal_instances_opt_combined_v3] 123 ms +- 4 ms: 1.03x faster
deltablue: Mean +- std dev: [cpython_master] 7.35 ms +- 0.20 ms -> [immortal_instances_opt_combined_v3] 7.74 ms +- 0.41 ms: 1.05x slower
django_template: Mean +- std dev: [cpython_master] 62.2 ms +- 2.0 ms -> [immortal_instances_opt_combined_v3] 63.7 ms +- 2.4 ms: 1.02x slower
fannkuch: Mean +- std dev: [cpython_master] 664 ms +- 15 ms -> [immortal_instances_opt_combined_v3] 677 ms +- 18 ms: 1.02x slower
float: Mean +- std dev: [cpython_master] 128 ms +- 4 ms -> [immortal_instances_opt_combined_v3] 135 ms +- 7 ms: 1.05x slower
go: Mean +- std dev: [cpython_master] 244 ms +- 10 ms -> [immortal_instances_opt_combined_v3] 228 ms +- 14 ms: 1.07x faster
json_dumps: Mean +- std dev: [cpython_master] 19.2 ms +- 0.7 ms -> [immortal_instances_opt_combined_v3] 20.1 ms +- 0.8 ms: 1.04x slower
logging_format: Mean +- std dev: [cpython_master] 10.4 us +- 0.3 us -> [immortal_instances_opt_combined_v3] 11.0 us +- 0.4 us: 1.06x slower
logging_silent: Mean +- std dev: [cpython_master] 201 ns +- 8 ns -> [immortal_instances_opt_combined_v3] 205 ns +- 7 ns: 1.02x slower
logging_simple: Mean +- std dev: [cpython_master] 9.77 us +- 0.32 us -> [immortal_instances_opt_combined_v3] 10.2 us +- 0.4 us: 1.04x slower
nqueens: Mean +- std dev: [cpython_master] 159 ms +- 5 ms -> [immortal_instances_opt_combined_v3] 154 ms +- 3 ms: 1.03x faster
pickle: Mean +- std dev: [cpython_master] 16.0 us +- 0.5 us -> [immortal_instances_opt_combined_v3] 16.6 us +- 0.7 us: 1.04x slower
pickle_dict: Mean +- std dev: [cpython_master] 37.3 us +- 0.6 us -> [immortal_instances_opt_combined_v3] 35.6 us +- 2.1 us: 1.05x faster
pidigits: Mean +- std dev: [cpython_master] 284 ms +- 15 ms -> [immortal_instances_opt_combined_v3] 273 ms +- 9 ms: 1.04x faster
pyflate: Mean +- std dev: [cpython_master] 770 ms +- 28 ms -> [immortal_instances_opt_combined_v3] 746 ms +- 24 ms: 1.03x faster
python_startup: Mean +- std dev: [cpython_master] 12.6 ms +- 0.4 ms -> [immortal_instances_opt_combined_v3] 11.6 ms +- 0.6 ms: 1.08x faster
python_startup_no_site: Mean +- std dev: [cpython_master] 8.89 ms +- 0.39 ms -> [immortal_instances_opt_combined_v3] 8.11 ms +- 0.43 ms: 1.10x faster
raytrace: Mean +- std dev: [cpython_master] 529 ms +- 16 ms -> [immortal_instances_opt_combined_v3] 544 ms +- 17 ms: 1.03x slower
scimark_fft: Mean +- std dev: [cpython_master] 571 ms +- 12 ms -> [immortal_instances_opt_combined_v3] 589 ms +- 13 ms: 1.03x slower
scimark_lu: Mean +- std dev: [cpython_master] 195 ms +- 6 ms -> [immortal_instances_opt_combined_v3] 205 ms +- 7 ms: 1.05x slower
scimark_sor: Mean +- std dev: [cpython_master] 211 ms +- 6 ms -> [immortal_instances_opt_combined_v3] 216 ms +- 6 ms: 1.03x slower
scimark_sparse_mat_mult: Mean +- std dev: [cpython_master] 8.28 ms +- 0.40 ms -> [immortal_instances_opt_combined_v3] 8.66 ms +- 0.30 ms: 1.05x slower
sympy_expand: Mean +- std dev: [cpython_master] 878 ms +- 34 ms -> [immortal_instances_opt_combined_v3] 850 ms +- 27 ms: 1.03x faster
sympy_integrate: Mean +- std dev: [cpython_master] 35.2 ms +- 1.0 ms -> [immortal_instances_opt_combined_v3] 36.6 ms +- 1.5 ms: 1.04x slower
sympy_sum: Mean +- std dev: [cpython_master] 291 ms +- 13 ms -> [immortal_instances_opt_combined_v3] 309 ms +- 7 ms: 1.06x slower
sympy_str: Mean +- std dev: [cpython_master] 514 ms +- 11 ms -> [immortal_instances_opt_combined_v3] 535 ms +- 13 ms: 1.04x slower
telco: Mean +- std dev: [cpython_master] 10.4 ms +- 0.2 ms -> [immortal_instances_opt_combined_v3] 10.7 ms +- 0.5 ms: 1.03x slower
unpack_sequence: Mean +- std dev: [cpython_master] 77.9 ns +- 2.3 ns -> [immortal_instances_opt_combined_v3] 70.6 ns +- 2.0 ns: 1.10x faster
unpickle_list: Mean +- std dev: [cpython_master] 6.77 us +- 0.25 us -> [immortal_instances_opt_combined_v3] 6.95 us +- 0.22 us: 1.03x slower
xml_etree_parse: Mean +- std dev: [cpython_master] 245 ms +- 6 ms -> [immortal_instances_opt_combined_v3] 238 ms +- 5 ms: 1.03x faster
xml_etree_generate: Mean +- std dev: [cpython_master] 146 ms +- 4 ms -> [immortal_instances_opt_combined_v3] 143 ms +- 6 ms: 1.02x faster
xml_etree_process: Mean +- std dev: [cpython_master] 102 ms +- 2 ms -> [immortal_instances_opt_combined_v3] 100 ms +- 3 ms: 1.02x faster

Benchmark hidden because not significant (18): hexiom, html5lib, json_loads, meteor_contest, nbody, pathlib, pickle_list, pickle_pure_python, regex_compile, regex_dna, regex_effbot, regex_v8, richards, scimark_monte_carlo, spectral_norm, unpickle, unpickle_pure_python, xml_etree_iterparse

Geometric mean: 1.00x faster
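For readers who want to sanity-check a summary figure like the one above, here is a minimal sketch of how per-benchmark factors combine into a geometric mean. The helper names `parse_speedup` and `geometric_mean` are made up for this illustration and are not part of pyperformance. The real summary is computed over all benchmarks, including the 18 hidden ones, and from unrounded means, so feeding in only the rounded factors listed here will not reproduce it exactly.

```python
import math
import re

# Matches the trailing "1.04x slower" / "1.10x faster" part of a result line.
_FACTOR = re.compile(r"(\d+(?:\.\d+)?)x (faster|slower)\s*$")


def parse_speedup(line):
    """Return the speedup ratio (>1 means faster) encoded in one result line."""
    match = _FACTOR.search(line)
    if match is None:
        raise ValueError(f"no speedup factor found in: {line!r}")
    factor = float(match.group(1))
    # "slower" lines are inverted so every ratio uses the same direction.
    return factor if match.group(2) == "faster" else 1.0 / factor


def geometric_mean(ratios):
    """Geometric mean computed as exp(mean(log(ratio)))."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Two lines copied from the results above, used only as sample input.
sample_lines = [
    "chaos: Mean +- std dev: [cpython_master] 126 ms +- 4 ms -> "
    "[immortal_instances_opt_combined_v3] 123 ms +- 4 ms: 1.03x faster",
    "float: Mean +- std dev: [cpython_master] 128 ms +- 4 ms -> "
    "[immortal_instances_opt_combined_v3] 135 ms +- 7 ms: 1.05x slower",
]

if __name__ == "__main__":
    ratios = [parse_speedup(line) for line in sample_lines]
    print(f"geometric mean of sample: {geometric_mean(ratios):.3f}x")
```

Because gains and losses multiply rather than add, a 2x speedup and a 2x slowdown cancel out exactly, which is why a mix of small wins and small regressions lands near 1.00x.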

https://bugs.python.org/issue40255

kumaraditya303 · 2022-02-22T06:51:56Z

You can use this in deepfrozen modules to make this even faster; see:

cpython/Tools/scripts/deepfreeze.py

Line 140 in 74127b8

self.write(f".ob_refcnt = 999999999,")

ericsnowcurrently · 2022-02-22T17:38:22Z

Overall: 0% faster compared to the main branch

That's great news! I'm going to update PEP 683 with the outcome and some of the details.

gvanrossum

Nice work! Here are some random review comments. Hopefully they're helpful. I decided to only review the final PR (with all optimizations). I skipped the .py files and a few others for now.

gvanrossum · 2022-03-01T21:20:25Z

Modules/gcmodule.c

+immortalize_object(PyObject *obj, PyObject *Py_UNUSED(ignored))
+{
+    _Py_SetImmortal(obj);
+    /* Special case for PyCodeObjects since they don't have a tp_traverse */


Various fields below are tuples and the individual items in the tuples should also become immortal, and for co_consts this should recurse down. Maybe whenever we make a tuple immortal we should immortalize all its items?
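To make the "recurse down" idea concrete, here is a rough Python-level sketch of the set of objects such a traversal would have to visit. It is not the C implementation from this PR: `reachable_from_code` is a hypothetical helper, and which code object fields to follow (`co_consts`, `co_names`, `co_varnames` here) is an assumption made for illustration.

```python
import types


def reachable_from_code(code):
    """Collect objects reachable from a code object through its tuple fields,
    recursing into nested tuples, frozensets and nested code objects."""
    seen = {}          # id -> object; de-duplicates and keeps objects alive
    stack = [code]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        if isinstance(obj, types.CodeType):
            # The tuple fields themselves are visited, then their items.
            stack.append(obj.co_consts)
            stack.append(obj.co_names)
            stack.append(obj.co_varnames)
        elif isinstance(obj, (tuple, frozenset)):
            stack.extend(obj)
    return list(seen.values())


def outer():
    pair = (1, (2, 3))

    def inner():
        return "nested"

    return pair, inner


if __name__ == "__main__":
    for obj in reachable_from_code(outer.__code__):
        print(type(obj).__name__, repr(obj)[:60])
```

The code object for `inner` only shows up because the walk descends into `outer.__code__.co_consts`, which is the case the comment above is asking about.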

gvanrossum · 2022-03-01T21:32:38Z

Modules/gcmodule.c

+        Py_TYPE(FROM_GC(gc))->tp_traverse(
+              FROM_GC(gc), (visitproc)immortalize_object, NULL);


Will this loop find code objects contained inside other code objects? (I don't know what exactly is contained in permanent_generation.head.)

gvanrossum · 2022-03-01T21:35:38Z

Python/import.c

@@ -1829,6 +1829,10 @@ PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals,
        if (mod == NULL) {
            goto error;
        }
+        // Immortalize top level modules
+        if (tstate->recursion_limit - tstate->recursion_remaining == 1) {


Does this work? I put a printf here and it doesn't seem to be immortalizing most of the frozen modules:

Immortalizing <module 'winreg' (built-in)>
Immortalizing <module '_frozen_importlib_external' (frozen)>
Immortalizing <module 'zipimport' (frozen)>
Immortalizing <module 'encodings' from 'C:\\Users\\gvanrossum\\cpython\\Lib\\encodings\\__init__.py'>
Immortalizing <module '_winapi' (built-in)>
Immortalizing <module 'encodings.mbcs' from 'C:\\Users\\gvanrossum\\cpython\\Lib\\encodings\\mbcs.py'>
Immortalizing <module '_signal' (built-in)>
Immortalizing <module 'io' (frozen)>
Immortalizing <module 'site' (frozen)>

(Most frozen modules are imported at a much higher recursion level, either 7, 13 or 18.)
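The effect described here can be sketched with a toy model. Everything in it is hypothetical: the dependency graph is invented, and import nesting depth stands in for the C recursion depth that the real check measures. The point is only that a `depth == 1` test catches modules imported directly from the outermost level and misses anything pulled in as a dependency.

```python
# Hypothetical dependency graph: module name -> modules it imports.
DEPENDENCIES = {
    "site": ["os", "io"],
    "os": ["stat", "posixpath"],
    "io": ["abc"],
    "stat": [],
    "posixpath": ["genericpath"],
    "abc": [],
    "genericpath": [],
}


def simulate_imports(roots, dependencies):
    """Return {module name: nesting depth at which it was first imported}."""
    loaded = {}

    def load(name, depth):
        if name in loaded:      # already imported, nothing to do
            return
        loaded[name] = depth
        for dep in dependencies.get(name, ()):
            load(dep, depth + 1)

    for root in roots:
        load(root, 1)
    return loaded


def immortalized_at_top_level(loaded):
    """Modules that would pass a 'depth == 1' check."""
    return sorted(name for name, depth in loaded.items() if depth == 1)


if __name__ == "__main__":
    loaded = simulate_imports(["site"], DEPENDENCIES)
    print("first-import depth:", loaded)
    print("passes depth == 1: ", immortalized_at_top_level(loaded))
```

In this toy graph only `site` passes the check, while the six modules it pulls in are first imported at depth 2 or deeper.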

gvanrossum · 2022-03-01T21:39:26Z

Include/object.h

@@ -145,6 +167,20 @@ static inline Py_ssize_t Py_SIZE(const PyVarObject *ob) {
 }
 #define Py_SIZE(ob) Py_SIZE(_PyVarObject_CAST_CONST(ob))

+PyAPI_FUNC(PyObject *) _PyGC_ImmortalizeHeap(void);
+PyAPI_FUNC(PyObject *) _PyGC_TransitiveImmortalize(PyObject *obj);


(This name is awkward, I'd expect some variation on "immortalize transitively".)

gvanrossum · 2022-03-01T21:51:32Z

Python/pylifecycle.c

-        if (weaklist != NULL) { \
+        if (stdlib_list != NULL) { \
+            PyObject *list = user_weaklist; \
+            if (PySequence_Contains(stdlib_list, name)) { \
+                list = stdlib_weaklist; \
+            } \


To be (hyper-)correct you probably need to check for list != NULL after this, since it might just be possible that stdlib_list is not NULL but stdlib_weaklist or user_weaklist is NULL.

gvanrossum · 2022-03-01T21:53:53Z

Python/pylifecycle.c

+finalize_modules_clear_weaklist(PyThreadState *tstate,
+                                PyInterpreterState *interp,


Isn't the interpreter state easily found from the thread state? So you'd only need a tstate arg.

gvanrossum · 2022-03-01T21:56:41Z

Python/pylifecycle.c

-    // detect those modules which have been held alive.
-    PyObject *weaklist = finalize_remove_modules(modules, verbose);
+    // This prepares two lists, the user defined list of modules as well
+    // as stdlib list of modules. The user modules will be destroyed first in


Suggested change

-    // as stdlib list of modules. The user modules will be destroyed first in
+    // as the stdlib list of modules. The stdlib modules will be destroyed after all user modules in

(And probably reflow.)
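The ordering that the comment describes (user modules first, stdlib modules after all of them) can be sketched in Python as follows. This is only an illustration: `split_modules` and `teardown_order` are hypothetical names, and using `sys.stdlib_module_names` (available on Python 3.10+) as the source of stdlib names is an assumption, not necessarily how the PR builds its `stdlib_list`.

```python
import sys


def split_modules(names, stdlib_names=None):
    """Partition module names into (user, stdlib), preserving input order."""
    if stdlib_names is None:
        # Assumption: fall back to an empty set on interpreters older than 3.10.
        stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    user, stdlib = [], []
    for name in names:
        top_level = name.partition(".")[0]   # "json.decoder" -> "json"
        (stdlib if top_level in stdlib_names else user).append(name)
    return user, stdlib


def teardown_order(names, stdlib_names=None):
    """User modules are finalized first, stdlib modules after all of them."""
    user, stdlib = split_modules(names, stdlib_names)
    return user + stdlib


if __name__ == "__main__":
    modules = ["os", "myapp", "json.decoder", "myapp.models"]
    print(teardown_order(modules, stdlib_names={"os", "json"}))
```

Keeping the two groups as separate lists mirrors the `user_weaklist` / `stdlib_weaklist` split in the diff and makes the relative order explicit.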

gvanrossum · 2022-03-01T22:03:09Z

Objects/moduleobject.c

+    PyObject *key, *value;
+
+
+    /* First, clear only names starting with a single underscore */


Stray comment?

Also, no phase deletes __builtins__, right?
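For context on the phases being discussed, here is a Python rendering of the two-phase clearing as I understand the existing `_PyModule_ClearDict` logic: first the names with a single leading underscore, then everything else except `__builtins__`. It is a sketch for discussion rather than a transcription of the C code in this PR, and `clear_module_dict` is a made-up name.

```python
def clear_module_dict(namespace):
    """Set module globals to None in two phases; return the names cleared, in order."""
    cleared = []

    # Phase 1: names starting with a single underscore (but not two).
    for key, value in list(namespace.items()):
        if value is None or not isinstance(key, str):
            continue
        if key.startswith("_") and not key.startswith("__"):
            namespace[key] = None
            cleared.append(key)

    # Phase 2: every remaining name except __builtins__, which is kept.
    for key, value in list(namespace.items()):
        if value is None or not isinstance(key, str):
            continue
        if key != "__builtins__":
            namespace[key] = None
            cleared.append(key)

    return cleared


if __name__ == "__main__":
    ns = {"_private": 1, "public": 2, "__builtins__": object(), "__name__": "m"}
    print(clear_module_dict(ns))
    print("__builtins__ kept:", ns["__builtins__"] is not None)
```

In this sketch no phase touches `__builtins__`, which matches the question asked above.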

gvanrossum · 2022-03-01T22:04:38Z

Objects/moduleobject.c

+        }
+    }
+}
+
 void
 _PyModule_Clear(PyObject *m)


IIUC this is no longer used. Is that right?

gvanrossum · 2022-03-01T22:05:42Z

Objects/object.c

@@ -1994,7 +1995,9 @@ _Py_NewReference(PyObject *op)
 #ifdef Py_REF_DEBUG
    _Py_RefTotal++;
 #endif
-    Py_SET_REFCNT(op, 1);
+    /* Do not use Py_SET_REFCNT to skip the Immortal Instance check. This


Suggested change

-    /* Do not use Py_SET_REFCNT to skip the Immortal Instance check. This
+    /* Do not use Py_SET_REFCNT -- it skips the Immortal Instance check. This
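To illustrate the distinction the comment is drawing, here is a toy model of a checked refcount setter versus a direct write. All names and values are hypothetical (`IMMORTAL_REFCNT`, `ToyObject`, `set_refcnt_checked`, `new_reference`), and the model assumes that the checked setter leaves immortal objects untouched; it makes no claim about why `_Py_NewReference` needs the direct write.

```python
IMMORTAL_REFCNT = 1 << 62   # hypothetical sentinel, not CPython's actual value


class ToyObject:
    """Stand-in for an object header with a single refcount field."""

    def __init__(self):
        self.refcnt = 0


def is_immortal(obj):
    return obj.refcnt >= IMMORTAL_REFCNT


def set_refcnt_checked(obj, value):
    """Checked setter (assumed behaviour): ignores writes to immortal objects."""
    if is_immortal(obj):
        return
    obj.refcnt = value


def new_reference(obj):
    """Initialise the count by writing the field directly, with no check."""
    obj.refcnt = 1


if __name__ == "__main__":
    obj = ToyObject()
    obj.refcnt = IMMORTAL_REFCNT
    set_refcnt_checked(obj, 1)
    print("after checked set:", is_immortal(obj))   # the check blocked the write
    new_reference(obj)
    print("after direct write:", obj.refcnt)
```

With either wording of the comment, the behaviour being documented is the direct write; the suggested rewording only changes which reading of "skip" the sentence invites.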

methane · 2022-03-09T00:24:08Z

Modules/gcmodule.c

+    }
+    if (from_prev) {
+      _PyGCHead_SET_PREV(from_next, from_prev);
+    }


Why is this change needed at all?
And please use 4-space indent.

methane · 2022-03-09T00:24:32Z

Modules/gcmodule.c

+}
+
+PyObject *
+_PyGC_ImmortalizeHeap(void) {


I don't like this name. This function doesn't relate to "Heap" at all.

And please move the { to the next line.

methane · 2022-03-09T00:28:02Z

Modules/gcmodule.c

+        Py_TYPE(FROM_GC(gc))->tp_traverse(
+              FROM_GC(gc), (visitproc)immortalize_object, NULL);


Code objects are not tracked by the GC, and most tuples are not tracked either.

So we need to find code and tuples via function objects, module global dict, class namespace dict, etc.
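This can be checked from Python with `gc.is_tracked`. The sketch below also shows the indirect route suggested above: iterate over the objects the GC does know about, pick out functions, and hop to their `__code__`. The helper name `code_objects_via_functions` is made up, and the printed results depend on the interpreter version and build, so no output is claimed here.

```python
import gc
import types


def code_objects_via_functions():
    """Find code objects by hopping from GC-tracked functions to __code__."""
    found = []
    for obj in gc.get_objects():
        # Exact type check avoids triggering custom __class__ lookups.
        if type(obj) is types.FunctionType:
            found.append(obj.__code__)
    return found


def sample():
    return (1, 2, 3)


if __name__ == "__main__":
    print("function tracked:", gc.is_tracked(sample))
    print("code tracked:    ", gc.is_tracked(sample.__code__))
    print("tuple tracked:   ", gc.is_tracked(sample()))
    print("code objects found via functions:", len(code_objects_via_functions()))
```

A walk that starts only from GC-tracked containers therefore needs an extra step like this one (and similar ones for module and class dicts) to reach code objects and their constant tuples.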

bpo-40255: Implement Immortal Instances - Optimizations Combined

a72b3cd

eduardo-elizondo requested review from pablogsal, brettcannon, encukou, ericsnowcurrently, ncoghlan, warsaw and vsajip as code owners February 22, 2022 05:36

bedevere-bot added the awaiting review label Feb 22, 2022

the-knights-who-say-ni added the CLA signed label Feb 22, 2022

corona10 requested a review from vstinner February 22, 2022 05:55

eduardo-elizondo mentioned this pull request Feb 22, 2022

gh-84436: Implement Immortal Objects #19474

Merged

brettcannon removed their request for review February 23, 2022 21:49

gvanrossum reviewed Mar 1, 2022

methane reviewed Mar 9, 2022

ezio-melotti removed the CLA signed label Jul 13, 2022

eduardo-elizondo mannequin mentioned this pull request Jun 6, 2022

Fixing Copy on Writes from reference counting and immortal objects #84436

Closed

eduardo-elizondo closed this Apr 12, 2023

eduardo-elizondo commented Feb 22, 2022 (edited by encukou)
