move importlib functions to within C #408

ahgamut · 2022-05-14T18:24:43Z

@jart this moves some of the frequently-used functions from Python code in _bootstrap_external.py to C in import.c.

All tests pass in MODE=, MODE=tiny.

The idea is that almost every line of Python code can cause a memory allocation, and so functions that are frequently called end up wasting memory repeatedly. By moving the Python code to C, it becomes more clear how the function calls occur, and we can attempt more optimizations at the lower C level.

For example: during APE startup, _validate_bytecode_header and code_to_bytecode are called back-to-back for every .pyc import. This leads to a repeated sequence like below

    def get_code(self, fullname):
        # small UnicodeObject allocation for path
        path = self.get_filename(fullname)
        # allocate a large BytesObject from the `.pyc` file
        data = self.get_data(path)
        # allocate another large BytesObject because slicing data[:12] after validating bytecode header
        bytes_data = _validate_bytecode_header(data, name=fullname, path=path)
        # allocate a CodeObject after evaluating the BytesObject
        return _compile_bytecode(bytes_data, name=fullname, bytecode_path=path)

In C we can mostly avoid the BytesObjects allocations and many function calls by just calling PyMarshal_ReadObjectFromFile with a valid FILE *.

It's a small speedup, but it adds up if all tests run slightly faster than before.

when _bootstrap_external is undergoing setup(), I modify the functions and classes in its namespace (ie its __dict__) with those from _imp.

ahgamut · 2022-05-14T18:27:14Z

Testing on MODE=dbg is showing some ubsan errors.

ahgamut · 2022-05-14T18:53:34Z

diff --git a/libc/zipos/read.c b/libc/zipos/read.c
index 2767dd10c..4fc80fa4d 100644
--- a/libc/zipos/read.c
+++ b/libc/zipos/read.c
@@ -43,7 +43,7 @@ ssize_t __zipos_read(struct ZiposHandle *h, const struct iovec *iov,
   x = y = opt_offset != -1 ? opt_offset : h->pos;
   for (i = 0; i < iovlen && y < h->size; ++i, y += b) {
     b = min(iov[i].iov_len, h->size - y);
-    memcpy(iov[i].iov_base, h->mem + y, b);
+    if(iov[i].iov_base) memcpy(iov[i].iov_base, h->mem + y, b);
   }
   if (opt_offset == -1) h->pos = y;
   return y - x;

silences the warnings.

test_modulefinder triggers a ubsan error.

.
�[J�[1;31masan error�[0m: heap use after free 1-byte load at 0x100081d7261c shadow 0x200903a64c3
o/dbg/third_party/python/pythontester.com.dbg
                                        x                                       
�[31mFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF�[39m
    |-3     |-3     |-3     |-3     |-3     |-3     |-3     |-3     |-3         
████████████████████████████████████████████████████████████████████████████████
000000400000-000001a7d000 .text
000001a7d000-0000038f9000 .data
00007fff0000-00008000ffff
000080070000-0000808affff
02008fff0000-0200903dffff ←shadow
020090690000-020090dbffff
020091b80000-020091b9ffff
0e007fff0000-0e008004ffff
100040330000-10004038ffff
100080000000-100081e8ffff ←address
1000834f0000-100086e3ffff
6ffffffe0000-7000001dffff

the crash was caused by
0x000000000044a283: __die at libc/log/die.c:42
0x0000000001761113: __asan_report_load at libc/intrin/asan.c:1207
0x0000000001460de6: r_byte at third_party/python/Python/marshal.c:762
0x0000000001460f81: r_object at third_party/python/Python/marshal.c:963
0x0000000001471fa3: read_object at third_party/python/Python/marshal.c:1517
0x000000000147226b: marshal_loads at third_party/python/Python/marshal.c:1818
0x0000000000ef05f4: _PyMethodDef_RawFastCallKeywords at third_party/python/Objects/call.c:673
0x0000000000ef0ef2: _PyCFunction_FastCallKeywords at third_party/python/Objects/call.c:752
0x0000000001385386: call_function at third_party/python/Python/ceval.c:4829
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3375
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eee7ba: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:457
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eee7ba: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:457
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x000000000137d531: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x0000000000efa7a9: method_call at third_party/python/Objects/classobject.c:331
0x0000000000ef83bc: PyObject_Call at third_party/python/Objects/call.c:271
0x00000000013464e2: do_call_core at third_party/python/Python/ceval.c:4917
0x0000000001362c29: _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3476
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fccf8: slot_tp_call at third_party/python/Objects/typeobject.c:6267
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000135fb72: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x0000000000efa7a9: method_call at third_party/python/Objects/classobject.c:331
0x0000000000ef83bc: PyObject_Call at third_party/python/Objects/call.c:271
0x00000000013464e2: do_call_core at third_party/python/Python/ceval.c:4917
0x0000000001362c29: _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3476
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fccf8: slot_tp_call at third_party/python/Objects/typeobject.c:6267
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000135fb72: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x0000000000efa7a9: method_call at third_party/python/Objects/classobject.c:331
0x0000000000ef83bc: PyObject_Call at third_party/python/Objects/call.c:271
0x00000000013464e2: do_call_core at third_party/python/Python/ceval.c:4917
0x0000000001362c29: _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3476
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fccf8: slot_tp_call at third_party/python/Objects/typeobject.c:6267
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000135fb72: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fbb9d: slot_tp_init at third_party/python/Objects/typeobject.c:6502
0x00000000010e74b0: type_call at third_party/python/Objects/typeobject.c:958
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000137a42d: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3375
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x000000000134e933: PyEval_EvalCodeEx at third_party/python/Python/ceval.c:4230
0x000000000134e993: PyEval_EvalCode at third_party/python/Python/ceval.c:709
0x0000000001338a25: builtin_exec_impl at third_party/python/Python/bltinmodule.c:1164
0x00000000013393ad: builtin_exec at ./third_party/python/Python/clinic/bltinmodule.inc:284
0x0000000000ef05f4: _PyMethodDef_RawFastCallKeywords at third_party/python/Objects/call.c:673
0x0000000000ef0ef2: _PyCFunction_FastCallKeywords at third_party/python/Objects/call.c:752
0x0000000001381af3: call_function at third_party/python/Python/ceval.c:4829
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eee7ba: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:457
0x000000000137d531: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef84ef: PyObject_Call at third_party/python/Objects/call.c:252
0x000000000071146d: RunModule at third_party/python/Modules/main.c:250
0x0000000000714dca: Py_Main at third_party/python/Modules/main.c:783
0x000000000048b199: RunPythonModule at third_party/python/repl.c:333
0x0000000000402a33: main at third_party/python/repl.c:353
0x000000000040a1b5: cosmo at libc/runtime/cosmo.S:69
0x00000000004021a4: _start at libc/crt/crt.S:85

There was a comment there warning of memory faults. asan/ubsan proved it right.

instead of reading the FILE* in little parts via PyMarshal_ReadObjectFromFile, now I just load the whole file into memory and call PyMarshal_ReadObjectFromString instead. This prevents ubsan warnings.

ahgamut · 2022-05-14T19:41:14Z

The following change enabled me to fix my code that was causing the ubsan warning:

diff --git a/libc/intrin/ubsan.c b/libc/intrin/ubsan.c
index ec357de0f..c65eb4a88 100644
--- a/libc/intrin/ubsan.c
+++ b/libc/intrin/ubsan.c
@@ -216,6 +216,7 @@ static void __ubsan_warning(const struct UbsanSourceLocation *loc,
                             const char *description) {
   kprintf("%s:%d: %subsan warning: %s is undefined behavior%s\n", loc->file,
           loc->line, SUBTLE, description, RESET);
+  __die();
 }
 
 dontdiscard __ubsan_die_f *__ubsan_abort(const struct UbsanSourceLocation *loc,

ahgamut · 2022-05-14T20:09:34Z

Now all tests pass in MODE=, MODE=tiny, and MODE=dbg.

now we know what happens inside, so we just call the actual function PyEval_EvalCode with the right setup.

ahgamut · 2022-05-14T21:27:05Z

@jart e043046 and fccb0f1 is all the code that needs to be changed in _bootstrap.py for locking during imports

spec._initializing is only there for a single check + optimization within import.c that ends up calling _lock_unlock_module. Since we are ignoring locks for the time being, this check can also be ignored.

jart · 2022-05-26T04:32:08Z

-    memcpy(iov[i].iov_base, h->mem + y, b);
+    if(iov[i].iov_base) memcpy(iov[i].iov_base, h->mem + y, b);

To avoid undefined behavior there, you'd like want:

-    memcpy(iov[i].iov_base, h->mem + y, b);
+    if(b) memcpy(iov[i].iov_base, h->mem + y, b);

jart · 2022-05-26T04:36:47Z

It's a small speedup, but it adds up if all tests run slightly faster than before.

I'd be interested in seeing the numbers. Startup latency is something I value. It's difficult to do with a change like this, because the system calls themselves usually have a lot of overhead. For instance, a stat() system call can be on the order of a microsecond. Whereas a crossover between Python and C my best guess (not having measured it) would probably be around 100ns. There's also some latency when assets are pulled out and inflated from the zip.

ahgamut · 2022-05-27T02:41:49Z

So I implemented this because I read a blog post about Python startup times, and the question I had was "How can we use the benefits of APE ZIP store + Cosmopolitan Libc to speedup imports in APE Python?"

I tried to do this earlier when submitting #248, but at that time I wasn't as good at writing CPython with Cosmopolitan Libc's support.

Performance is measured in that blog post by just checking time for importing all available stdlib packages.
Simple gist from that post modified for APE Python: import_stdlib.py

time with current HEAD commit and MODE=optlinux:

real	0m0.390s
user	0m0.378s
sys	0m0.013s

time with current HEAD commit + this PR merged and MODE=optlinux:

real	0m0.366s
user	0m0.350s
sys	0m0.017s

If you like I can run the pyperformance benchmark for startup times. We can also look at python.com -m cProfile import_stdlib.py to see how the calls are distributed.

For instance, a stat() system call can be on the order of a microsecond. Whereas a crossover between Python and C my best guess (not having measured it) would probably be around 100ns. There's also some latency when assets are pulled out and inflated from the zip.

By moving things like this into C, ftrace/strace logs become clearer when trying imports, and so this part of import process (most used during startup) will be easier to optimize in C than in Python. (Debugging the C is easier too thanks to the wonderful ubsan runtime ❤️)

For example, we can add a bool into the SourcelessFileLoader struct to avoid an unnecessary stat call, which is a change that would be difficult to observe+do/not be as effective if done in Python. I imagine other ZIP store/Cosmo-specific tricks also become viable.

ahgamut added 3 commits May 14, 2022 23:23

add C impls of functions in _bootstrap_external

d1278fb

override methods in _bootstrap_external

b71fda9

when _bootstrap_external is undergoing setup(), I modify the functions and classes in its namespace (ie its __dict__) with those from _imp.

skip regex-matching test for error reports

cf00bd8

ahgamut added 2 commits May 15, 2022 00:46

fix refcount issue after ubsan pointed out error

ea2a5ff

There was a comment there warning of memory faults. asan/ubsan proved it right.

fix issue pointed out by ubsan warning

0632853

instead of reading the FILE* in little parts via PyMarshal_ReadObjectFromFile, now I just load the whole file into memory and call PyMarshal_ReadObjectFromString instead. This prevents ubsan warnings.

change methods to use METH_FASTCALL

6f8c108

ahgamut added 2 commits May 15, 2022 02:28

remove hacky PyBuiltin_Exec

d5543c8

now we know what happens inside, so we just call the actual function PyEval_EvalCode with the right setup.

remove locking context managers

e043046

ahgamut added 3 commits May 15, 2022 03:47

PyEval_EvalCode is just a wrapper

a0c31d2

remove spec._initializing + related locking check

fccb0f1

spec._initializing is only there for a single check + optimization within import.c that ends up calling _lock_unlock_module. Since we are ignoring locks for the time being, this check can also be ignored.

SourcelessFileLoader needn't be a full object

154de5a

ahgamut added 2 commits May 27, 2022 07:34

Merge branch 'master' into import-modc

86ddfa2

fix weird test flake in sys

f8b56b7

jart approved these changes May 27, 2022

View reviewed changes

jart merged commit 7e9fb0a into jart:master May 27, 2022

ahgamut mentioned this pull request Jun 16, 2022

add sys.meta_path entry for APE zip store #425

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

move importlib functions to within C #408

move importlib functions to within C #408

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022 •

edited

Loading

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022 •

edited

Loading

jart commented May 26, 2022

jart commented May 26, 2022

ahgamut commented May 27, 2022 •

edited

Loading

move importlib functions to within C #408

move importlib functions to within C #408

Conversation

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022 • edited Loading

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022

ahgamut commented May 14, 2022 • edited Loading

jart commented May 26, 2022

jart commented May 26, 2022

ahgamut commented May 27, 2022 • edited Loading

ahgamut commented May 14, 2022 •

edited

Loading

ahgamut commented May 14, 2022 •

edited

Loading

ahgamut commented May 27, 2022 •

edited

Loading