Freeze/link all stdlib modules imported during startup. #82

Closed
23 of 46 tasks
ericsnowcurrently opened this issue Aug 9, 2021 · 16 comments

@ericsnowcurrently
Collaborator

ericsnowcurrently commented Aug 9, 2021

During runtime startup we import a bunch of modules, particularly if the site module is used (i.e. when -S is not passed). Any of these that aren't frozen/linked incur substantial I/O overhead. We already freeze importlib and link a number of builtin modules. The idea here is that we freeze/link the rest.

To do:

after:

investigate:

  • create a flamegraph for "after"
  • figure out where all the unexpected syscalls are coming from (e.g. 25 stats, down from 80)
  • get a clear picture of where the other syscalls come from (e.g. "rt_sigaction", "mmap")

related:


Candidate solutions to the editing-stdlib-.py-files problem, from bpo-45020:

  • use a command-line flag to opt-out of frozen modules?
  • use a build flag to opt out (e.g. a configure flag or a new Py_NO_FROZEN or even Py_DEBUG)?
  • ignore frozen modules if it's a dev build?
  • (note: importlib._bootstrap and importlib._bootstrap_external must always be frozen, regardless of a flag)
  • accommodate users of an installed Python that sometimes edit stdlib modules while debugging code?
  • always emit a warning if a frozen module is ignored (in favor of the source module)?
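
The candidates above can be combined into a single decision rule. The following is only a sketch of one possible policy, not what CPython implements: the function name `should_use_frozen` and its parameters `opt_out`, `dev_build`, and `warn` are hypothetical, chosen to mirror the bullets (a flag, a dev build, an optional warning, and the two importlib modules that must stay frozen).

```python
import warnings

# Per the note above, these two must stay frozen regardless of any opt-out.
ALWAYS_FROZEN = frozenset({
    "importlib._bootstrap",
    "importlib._bootstrap_external",
})


def should_use_frozen(name, *, opt_out=False, dev_build=False, warn=True):
    """Hypothetical policy: decide whether `name` is imported from its frozen copy."""
    if name in ALWAYS_FROZEN:
        # The import system itself cannot be loaded from source.
        return True
    if opt_out or dev_build:
        if warn:
            warnings.warn(
                f"ignoring frozen module {name!r} in favor of its source",
                ImportWarning,
                stacklevel=2,
            )
        return False
    return True


if __name__ == "__main__":
    for module_name in ("importlib._bootstrap", "os"):
        print(module_name, should_use_frozen(module_name, dev_build=True, warn=False))
```

Under this sketch, someone editing `os.py` in a dev build would get the source module, while the bootstrap modules are unaffected by either switch.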
@ericsnowcurrently ericsnowcurrently self-assigned this Aug 9, 2021
@gvanrossum
Collaborator

It would be nice to be able to put numbers on how substantial the I/O overhead really is.

@ericsnowcurrently
Collaborator Author

ericsnowcurrently commented Aug 12, 2021

I've found that the following .py modules get imported during startup (some only without -S) and have frozen them:

without site (python -S):

  • abc
  • codecs
  • encodings.* # we need all of them since the one we use is determined at runtime
  • io

with site:

  • _collections_abc
  • _sitebuiltins
  • genericpath
  • os
  • posixpath (or ntpath)
  • site
  • stat
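
One way to reproduce lists like these is to start a fresh interpreter with and without -S and compare `sys.modules`. This is a minimal sketch, assuming the interpreter under test is the one running the script (`sys.executable`); the helper name `startup_modules` is made up for illustration, and the output will differ by Python version and platform.

```python
import subprocess
import sys


def startup_modules(*flags):
    """Return the names in sys.modules of a fresh interpreter started with `flags`."""
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.split())


if __name__ == "__main__":
    without_site = startup_modules("-S")
    with_site = startup_modules()
    print("imported even with -S:", sorted(without_site))
    print("added by site:", sorted(with_site - without_site))
```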

See my branch: https://github.com/ericsnowcurrently/cpython/tree/frozen-modules-cleanup

before adding stdlib modules:
# repeated 100 times and averaged
$ strace -c ./python -c pass
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 20.43    0.001504          22        68         0 rt_sigaction
 17.32    0.001275          15        80         8 stat
  9.62    0.000708          15        47         0 fstat
  7.46    0.000549          15        35         0 read
  7.35    0.000541          17        31         2 openat
  6.66    0.000490          15        32         0 close
  6.20    0.000456          15        30         3 lseek
  5.62    0.000414          19        21         0 mmap
  4.21    0.000310          16        19        17 ioctl
  3.77    0.000277          23        12         0 mprotect
  2.46    0.000181          15        12         0 getdents
  1.70    0.000125          15         8         0 brk
  1.42    0.000105          14         7         7 access
  1.02    0.000075          25         3         0 fcntl
  0.60    0.000044          14         3         0 dup
  0.46    0.000034          33         1         0 getrandom
  0.41    0.000030          30         1         0 gettid
  0.39    0.000029           9         3         0 munmap
  0.31    0.000023          23         1         0 futex
  0.30    0.000022          21         1         0 arch_prctl
  0.29    0.000022          21         1         0 prlimit64
  0.29    0.000021          21         1         0 set_robust_list
  0.29    0.000021          21         1         0 set_tid_address
  0.29    0.000021          21         1         0 rt_sigprocmask
  0.23    0.000017          16         1         0 sysinfo
  0.19    0.000014          14         1         0 getcwd
  0.19    0.000014          13         1         1 readlink
  0.16    0.000012          11         1         0 getgid
  0.15    0.000011          10         1         0 getegid
  0.12    0.000009           8         1         0 geteuid
  0.11    0.000008           8         1         0 getuid
  0.00    0.000000           0         1         0 execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.007362          17       427        38 total
$ time ./python -c pass

real    0m0.024s
user    0m0.020s
sys     0m0.004s
$ time ./python -c pass

real    0m0.029s
user    0m0.025s
sys     0m0.004s
$ time ./python -c pass

real    0m0.028s
user    0m0.027s
sys     0m0.001s

after adding stdlib modules:
# repeated 100 times and averaged
$ strace -c ./python -c pass
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 15.22    0.000731          10        68         0 rt_sigaction
 10.97    0.000527          25        21         0 mmap
 10.35    0.000497          19        25         5 stat
  9.74    0.000468          27        17         2 openat
  8.47    0.000407          20        20         0 fstat
  7.97    0.000383          21        18         0 close
  5.27    0.000253          21        12         0 mprotect
  4.99    0.000240          26         9         0 read
  4.88    0.000234          23        10         0 getdents
  3.28    0.000158          19         8         0 brk
  2.64    0.000127          18         7         7 access
  2.10    0.000101          16         6         4 ioctl
  2.04    0.000098          48         2         0 fcntl
  1.97    0.000095          23         4         3 lseek
  1.04    0.000050          16         3         0 dup
  0.75    0.000036          35         1         0 gettid
  0.72    0.000035          34         1         0 sysinfo
  0.70    0.000034          33         1         0 set_robust_list
  0.67    0.000032          32         1         0 getrandom
  0.63    0.000030          30         1         0 set_tid_address
  0.63    0.000030          10         3         0 munmap
  0.62    0.000030          29         1         0 futex
  0.62    0.000030          29         1         0 rt_sigprocmask
  0.60    0.000029          28         1         0 prlimit64
  0.56    0.000027          27         1         1 readlink
  0.55    0.000026          26         1         0 getcwd
  0.51    0.000025          24         1         0 getgid
  0.44    0.000021          21         1         0 getegid
  0.44    0.000021          21         1         0 getuid
  0.40    0.000019          19         1         0 arch_prctl
  0.21    0.000010          10         1         0 geteuid
  0.00    0.000000           0         1         0 execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.004806          19       249        22 total
$ time ./python -c pass

real    0m0.025s
user    0m0.025s
sys     0m0.001s
$ time ./python -c pass

real    0m0.026s
user    0m0.021s
sys     0m0.005s
$ time ./python -c pass

real    0m0.023s
user    0m0.022s
sys     0m0.001s

(Note that I ran this comparison in a VM, so the pure numbers aren't so reliable.)
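
To compare the two `strace -c` summaries without reading them row by row, the call counts can be parsed and subtracted. This is a rough sketch that assumes the six-column layout shown above (with the syscall name last and the call count in the fourth column); `parse_strace_summary` and `call_deltas` are illustrative names, not part of any tool.

```python
def parse_strace_summary(text):
    """Map syscall name -> call count from `strace -c` summary text."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a percentage such as "17.32"; skip headers and rules.
        if len(parts) < 5 or not parts[0].replace(".", "", 1).isdigit():
            continue
        counts[parts[-1]] = int(parts[3])
    return counts


def call_deltas(before, after):
    """Return after-minus-before call counts for every syscall seen in either run."""
    names = sorted(set(before) | set(after))
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


if __name__ == "__main__":
    # Two rows copied from each of the tables above.
    before = parse_strace_summary(
        " 17.32    0.001275          15        80         8 stat\n"
        "100.00    0.007362          17       427        38 total\n"
    )
    after = parse_strace_summary(
        " 10.35    0.000497          19        25         5 stat\n"
        "100.00    0.004806          19       249        22 total\n"
    )
    for name, delta in call_deltas(before, after).items():
        print(f"{name:10} {delta:+d}")
```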

Overall, it seems like there is a real, positive impact on performance (roughly 10%?), though the difference isn't huge (and the VM makes it a little cloudy).

@ericsnowcurrently
Collaborator Author

ericsnowcurrently commented Aug 13, 2021

I re-ran the comparison 100 times and averaged the results:

before freezing those stdlib modules:

$ time ./python -c pass

real:    0m0.023
user:    0m0.020
sys:     0m0.003

after freezing those stdlib modules:

$ time ./python -c pass

real:    0m0.020
user:    0m0.018
sys:     0m0.002

So I'm seeing a 15% improvement, even after factoring in disk caching (again, running in a VM). I'm still going to check those other "todo" items above.
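
An averaged measurement like the one above can be approximated from Python itself. This is a minimal sketch, not the method used for the numbers in this thread: it times whole subprocess launches with `time.perf_counter`, so it includes process-spawn overhead that `time` reports differently, and the helper name `time_startup` is an assumption.

```python
import statistics
import subprocess
import sys
import time


def time_startup(python=sys.executable, args=("-c", "pass"), runs=100):
    """Return (mean, standard deviation) in seconds of launching `python` with `args`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([python, *args], check=True)
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


if __name__ == "__main__":
    mean, spread = time_startup()
    print(f"mean {mean * 1000:.1f} ms, stdev {spread * 1000:.1f} ms")
```

Running it once against each build (by passing the path to each `./python` as `python`) gives two means to compare; reporting the spread alongside helps judge whether a few milliseconds of difference is noise.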

@ericsnowcurrently
Collaborator Author

ericsnowcurrently commented Aug 26, 2021

@gvanrossum, if I exclude encodings.* from freezing, the results are basically the same. (It's actually slower: 21ms instead of 20ms.)

@ericsnowcurrently
Collaborator Author

I've verified that all the startup modules are either builtin or frozen:
$ ./python -c 'import sys^Mfor name in sorted(sys.modules): print(f"{name:30} {sys.modules[name]}")'
__main__                       <module '__main__' (built-in)>
_abc                           <module '_abc' (built-in)>
_codecs                        <module '_codecs' (built-in)>
_collections_abc               <module '_collections_abc' (frozen)>
_frozen_importlib              <module '_frozen_importlib' (frozen)>
_frozen_importlib_external     <module '_frozen_importlib_external' (frozen)>
_imp                           <module '_imp' (built-in)>
_io                            <module '_io' (built-in)>
_signal                        <module '_signal' (built-in)>
_sitebuiltins                  <module '_sitebuiltins' (frozen)>
_stat                          <module '_stat' (built-in)>
_thread                        <module '_thread' (built-in)>
_warnings                      <module '_warnings' (built-in)>
_weakref                       <module '_weakref' (built-in)>
abc                            <module 'abc' (frozen)>
builtins                       <module 'builtins' (built-in)>
codecs                         <module 'codecs' (frozen)>
encodings                      <module 'encodings' (frozen)>
encodings.aliases              <module 'encodings.aliases' (frozen)>
encodings.utf_8                <module 'encodings.utf_8' (frozen)>
genericpath                    <module 'genericpath' (frozen)>
io                             <module 'io' (frozen)>
marshal                        <module 'marshal' (built-in)>
os                             <module 'os' (frozen)>
os.path                        <module 'posixpath' (frozen)>
posix                          <module 'posix' (built-in)>
posixpath                      <module 'posixpath' (frozen)>
site                           <module 'site' (frozen)>
stat                           <module 'stat' (frozen)>
sys                            <module 'sys' (built-in)>
time                           <module 'time' (built-in)>
zipimport                      <module 'zipimport' (frozen)>
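
The same check can be written as a small script that classifies each loaded module by the `origin` of its import spec instead of relying on the module repr. This is a sketch; `classify` and `startup_module_report` are illustrative names, and modules without a spec (such as `__main__` when run as a script) are simply reported as "other".

```python
import sys


def classify(module):
    """Return 'built-in', 'frozen', or 'other' based on the module's import spec."""
    spec = getattr(module, "__spec__", None)
    origin = getattr(spec, "origin", None)
    if origin in ("built-in", "frozen"):
        return origin
    return "other"


def startup_module_report(modules=None):
    """Map module name -> classification, sorted by name."""
    modules = sys.modules if modules is None else modules
    return {name: classify(mod) for name, mod in sorted(modules.items())}


if __name__ == "__main__":
    for name, kind in startup_module_report().items():
        print(f"{name:30} {kind}")
```

Any startup module reported as "other" would be one that is still loaded from the filesystem.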

@gvanrossum
Collaborator

This is awesome. Maybe move the generated files to a subdir like clinic?

@markshannon
Member

markshannon commented Sep 20, 2021

One way to remove the regenerated file problem in PRs is to only generate one file.
One 1M-line file that no one can read is no more or less incomprehensible than 100 10K-line files that no one can read, but the GitHub PR experience is vastly better.

Another thing we can do is to put only two bytes per line, not 16, in the output of the bytecode. Since we often add or remove a few instructions here and there, the diffs can be very large as subsequent lines get shifted.
With only two bytes (one instruction) per line, the diffs should be much more localized, preventing the size of the repo from ballooning. And drop the whitespace on the left to keep the file size down.

E.g.

    100,19,100,20,147,1,100,21,100,22,147,1,100,23,100,24,

would become

100,19,
100,20,
147,1,
100,21,
100,22,
147,1,
100,23,
100,24,

which is only 3 bytes larger, but produces much smaller diffs should one or two instructions be changed.
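
Both claims (the size cost and the smaller diffs) can be checked with a short script. This is only an illustration of the layout argument: `format_bytes` and `changed_line_count` are made-up helpers, the second data set is arbitrary numbers rather than real frozen-module output, and line changes are counted with `difflib.SequenceMatcher` rather than with git's diff algorithm.

```python
import difflib


def format_bytes(values, per_line, indent=""):
    """Render byte values as comma-terminated rows, `per_line` values per row."""
    rows = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        rows.append(indent + "".join(f"{v}," for v in chunk))
    return "\n".join(rows) + "\n"


def changed_line_count(old_text, new_text):
    """Count lines removed from old_text plus lines added in new_text."""
    matcher = difflib.SequenceMatcher(
        a=old_text.splitlines(), b=new_text.splitlines(), autojunk=False
    )
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# The row from the comment above: 16 values on one indented line vs. 2 per line.
example = [100, 19, 100, 20, 147, 1, 100, 21, 100, 22, 147, 1, 100, 23, 100, 24]
wide = format_bytes(example, per_line=16, indent="    ")
narrow = format_bytes(example, per_line=2)
size_difference = len(narrow) - len(wide)

# Arbitrary data: prepend one two-byte "instruction" and compare the diffs.
old = list(range(64))
new = [200, 201] + old
wide_changes = changed_line_count(
    format_bytes(old, 16, "    "), format_bytes(new, 16, "    ")
)
narrow_changes = changed_line_count(format_bytes(old, 2), format_bytes(new, 2))

print("extra bytes for the narrow layout:", size_difference)
print("changed lines, 16 per line:", wide_changes)
print("changed lines, 2 per line:", narrow_changes)
```

In the wide layout every row after the insertion point shifts, so all of them show up in the diff; in the narrow layout only the inserted row does.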

Forget readability of generated code. Minimizing the number of files and keeping diffs down are much more important to ease of development.

@gvanrossum
Collaborator

But it’s marshal data, and the constants, names etc. are not aligned on even bytes. So maybe one byte per line?
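
The point about marshal data can be seen by serializing a tiny code object. This sketch uses a made-up one-line module and the standard `marshal` module; it shows that the serialized blob carries names and other metadata alongside the instructions, so a fixed two-bytes-per-line layout would not line up with instruction boundaries throughout the file.

```python
import marshal

# A made-up stand-in for a frozen module's source.
source = "answer = 6 * 7\n"
code = compile(source, "<frozen example>", "exec")

# The serialized form is marshal data for the whole code object.
blob = marshal.dumps(code)

# Round-trip: load the marshal data back and execute it.
namespace = {}
exec(marshal.loads(blob), namespace)

print("bytecode length:", len(code.co_code))
print("marshal blob length:", len(blob))
print("name stored in blob:", b"answer" in blob)
```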

@ericsnowcurrently
Collaborator Author

Note that the generated .h files are no longer in the repo, so they don't show up in diffs or PRs any more. Thus we don't need to worry about strategies to mitigate the earlier annoyances.

> One way to remove the regenerated file problem in PRs is to only generate one file.

Good point. Having a single .h file would also mean a single include in frozen.c (and we wouldn't have to generate the includes any more). So this may be worth doing even though .h files aren't in the repo.

> Another thing we can do is to put only two bytes per line, not 16, in the output of the bytecode. Since we often add or remove a few instructions here and there, the diffs can be very large as subsequent lines get shifted.

Good point, though there isn't much value to changing this now.

@ericsnowcurrently ericsnowcurrently changed the title from "[Experiment] Freeze/link all stdlib modules imported during startup." to "Freeze/link all stdlib modules imported during startup." on Sep 30, 2021
@pxeger

pxeger commented Oct 6, 2021

Probably too late now, but you could have got around the PR/diff problem with a setting in a .gitattributes file

@ericsnowcurrently
Collaborator Author

> Probably too late now, but you could have got around the PR/diff problem with a setting in a .gitattributes file

Thanks for the tip. We tried doing that but it only helped with some of the problems with having the frozen .h files in the repo.

@ericsnowcurrently
Collaborator Author

FYI, @FFY00 is helping out with some of the remaining work related to frozen modules.

@gvanrossum
Collaborator

Maybe we should put this epic to bed? We've done what we could, and at least on systems using GCC or clang the results are looking good.

@ericsnowcurrently
Collaborator Author

Yeah. There are lingering odds and ends but nothing essential.

Repository owner moved this from Todo to Done in Fancy CPython Board Mar 14, 2022