add sys.meta_path entry for APE zip store #425
CosmoImporter.find_spec is a function similar to the find_spec methods found in _bootstrap.py and _bootstrap_external.py. It checks for built-in modules, frozen modules, and modules that may be within the zip store of the APE. To look within the zip store, it performs up to two additional stat calls, but this is balanced by the fact that no more stat calls are needed when loading a pyc file from the zip store. CosmoImporter calls small functions written within _bootstrap.py to create the correct ModuleSpec objects. This is done because ModuleSpec.__init__ requires origin and is_package to be specified as kwargs, which is not easy to do from within C.
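A rough Python sketch of the lookup order described above, for illustration only. The real CosmoImporter is written in C; `SketchImporter`, its `table` dict, and the `/zip/.python/hypo_mod.pyc` path are hypothetical names, and the dict replaces the stat calls that the PR performs against the zip store.

```python
import importlib.machinery as machinery


class SketchImporter:
    """Hypothetical Python stand-in for the C-level CosmoImporter."""

    # Assumed layout: module name -> (path of the .pyc, is_package).
    table = {}

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        # 1. and 2. Defer to the stock built-in and frozen finders.
        for finder in (machinery.BuiltinImporter, machinery.FrozenImporter):
            spec = finder.find_spec(fullname, path, target)
            if spec is not None:
                return spec
        # 3. Consult the table standing in for the zip store.
        entry = cls.table.get(fullname)
        if entry is None:
            return None  # let the next sys.meta_path entry try
        location, is_package = entry
        loader = machinery.SourcelessFileLoader(fullname, location)
        # origin and is_package are keyword-only, as noted above.
        return machinery.ModuleSpec(
            fullname, loader, origin=location, is_package=is_package
        )


# Register one made-up entry and look it up.
SketchImporter.table["hypo_mod"] = ("/zip/.python/hypo_mod.pyc", False)
hypo_spec = SketchImporter.find_spec("hypo_mod")
```

Returning `None` for an unknown name is what lets the remaining `sys.meta_path` entries handle the import.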
test_cmd_line did not account for sys.meta_path having a different first entry: in the isolated mode test, CosmoImporter would pick up uuid.pyc from the zip store and report no errors in both cases. The test now removes CosmoImporter from sys.meta_path, which restores the expected behavior.
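One way a test could drop a finder from a meta path list is sketched below. This is an assumption about the approach, not the actual change made to test_cmd_line; `without_finder` and the placeholder `CosmoImporter` class are hypothetical.

```python
import importlib.machinery as machinery


def without_finder(meta_path, name):
    """Return a copy of meta_path without finders whose class is called name."""
    def finder_name(finder):
        # Entries may be classes (like BuiltinImporter) or instances.
        return finder.__name__ if isinstance(finder, type) else type(finder).__name__
    return [f for f in meta_path if finder_name(f) != name]


class CosmoImporter:  # placeholder so the sketch runs on stock CPython
    pass


original = [CosmoImporter, machinery.BuiltinImporter, machinery.FrozenImporter]
trimmed = without_finder(original, "CosmoImporter")
```

In an actual test the result would presumably be assigned back with `sys.meta_path[:] = ...` inside the child interpreter.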
All tests pass in |
- malloc+qsort table entries for frozen modules
- malloc+qsort table entries for builtin modules
- static+qsort table entries for zip cdir modules used in startup
- use bsearch instead of linear search for module names
- use zip cdir table entries to avoid wasting stat calls
- SourcelessFileLoader can't rely on stat when loading code
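A rough Python sketch of the sort-then-binary-search idea in the list above, not the C code from this commit: `sorted` and `bisect_left` stand in for qsort and bsearch, and the module names and zip paths are made up for illustration.

```python
from bisect import bisect_left


def build_table(entries):
    """Sort (name, location) pairs once and split them into parallel lists."""
    ordered = sorted(entries)  # plays the role of qsort
    names = [name for name, _ in ordered]
    locations = [location for _, location in ordered]
    return names, locations


def lookup(names, locations, name):
    """Binary search for name; return its location, or None if absent."""
    i = bisect_left(names, name)  # plays the role of bsearch
    if i < len(names) and names[i] == name:
        return locations[i]
    return None


# Hypothetical startup modules and their (assumed) zip store paths.
names, locations = build_table([
    ("os", "/zip/.python/os.pyc"),
    ("abc", "/zip/.python/abc.pyc"),
    ("io", "/zip/.python/io.pyc"),
])
```

A hit answers the "does this module exist, and where" question without a stat call; a miss falls through to the normal lookup path.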
- the identifiers were used only in 1 function each
- stinfo risks getting mixed in between different imports
- stinfo was carrying over values from previous calls
_testcapi.run_in_subinterp makes a questionable PyImport_Cleanup(), which we work around by having a separate cleanup function for the lookup tables. also added a separate initialization function for the lookup tables for symmetry. some commented out code in the previous commit messed with GC, it was made visible again.
Ok I implemented a simple lookup table with binary search to check for modules. It doesn't seem to have improved on the
with merging this PR it becomes
The table |
All |
The hardest import form to speed up is "from x import y", because I have no idea whether y is a module or not. If y is a module, "from x import y" behaves similarly to "import x.y; y = x.y", which means I can bypass some checks by doing the import and the assignment separately. If y is a function, "from x import y" is not equivalent to that form, and the "import x.y" will trigger a ModuleNotFoundError. Unfortunately I can't check for or propagate such errors during the default APE startup without adding a whole bunch of checks, so I'd prefer to avoid these kinds of imports unless absolutely necessary. In this PR I avoid three such function imports in io.py and ntpath.py. The fix is to import the module wholesale (which happens underneath anyway) and then just look up the attribute: "import x; y = x.y", essentially the same as it would be if y were a module.
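The difference can be seen on stock CPython with the json package, used here only as a convenient example (it is not one of the modules changed in this PR):

```python
import importlib
import json
import json.decoder

# Case 1: y is a module. "import x.y; y = x.y" and "from x import y"
# end up binding the same module object.
decoder = json.decoder
from json import decoder as decoder_via_from
same_module = decoder is decoder_via_from

# Case 2: y is a function. Treating "json.dumps" as a dotted module
# name fails, because there is no such submodule.
try:
    importlib.import_module("json.dumps")
    raised = False
except ModuleNotFoundError:
    raised = True

# The workaround described above: import the module wholesale,
# then look up the attribute.
dumps = json.dumps
```

The last line works whether the attribute is a function or a submodule that has already been imported, so no error handling is needed around it.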
Another interesting thing I noticed is that APE Python startup time is now competitive with Anaconda Python 3.7:
Would be nice to get a separate computer to measure benchmarks without jitter.
it only provides a marginal boost over the existing setup, plus reasoning about package existence became difficult (what if the user deletes one of the files that have an entry in the table?)
- CosmoImporter calls the other two Importer's checks internally
- ExtensionFileLoader will not be called because static
So the above comparisons might now be inaccurate, because I removed the hardcoded table that allowed skipping some |
and add an error message if failure
In jart#248 we changed _bootstrap_external to check for bytecode first, and then check for source files when loading in the import process. Now that we have our own meta_path entry, this change can be undone.
@jart this PR adds a `CosmoImporter` as the first entry in `sys.meta_path`, thereby allowing a glance (and room for optimizations) into another part of Python's import process at the C level.

Entries in `sys.meta_path` return a `ModuleSpec` instance if they can find a particular module, otherwise they return `None`.

With `CosmoImporter` it is now possible to do things like:

- `std::map<modulename, location_in_zip_store>` to avoid `stat` syscalls for modules that are part of the startup process

With `CosmoImporter`, the `SourcelessFileLoader` added in #408 now has an additional attribute to avoid a `stat` syscall.

For a rough benchmark I again use the `import_stdlib.py` from #408, which imports all available packages in the APE stdlib.

with `MODE=optlinux` and current HEAD commit c06ffd4:

with `MODE=optlinux` and current HEAD commit merged with this PR:

Not much, but by writing better versions of `is_builtin` and `find_frozen` and a lookup table for zip store modules, it can be reduced further.