Faster, parallel type checking #933
Comments
To give an idea of the scope of this, I think it would be useful to know precisely what data needs to be shared. Is it enough to give the slave processes the semantically analyzed tree of the modules it depends on? Idea: In this case we can envision changing the current |
I explained a possible serialization format in #932. We should be able to use the same format to share data about modules in parallel builds. A slave process could work like this, I think:
I'm less clear about how the coordinator would work. Some ideas:
|
Hi, are there any updates? Thanks. |
This would be really cool. Is there any way I can help? |
@manmartgarc Any help would be very welcome! I'm happy to chat about the details over Zoom if you are interested in working on this (you can find my email in the mypy commit history). |
Currently I invoke mypy on all my files like |
@jgarvin In my experience, running mypy on only part of a codebase sometimes gives different results from a full mypy run. However, I believe the intention is for this to work, and any deviation is really a bug. I assume you would need to use separate cache directories so the processes don't corrupt each other's caches (not sure this is true though). Also just noting, when doing a full mypy run, there should usually not be any need to filter files with |
Hmm, the mypy cache on a relatively small project (<20k lines) is ~40MB. So on a 128-core machine with one cache per process I'd be spending about 5GB of disk on mypy caches (128 × 40MB). In the grand scheme of things that is not a huge amount of disk space, but I have to imagine touching that much disk would slow down the checking. I looked at the caches to see what was responsible for the size, and it seems to be a ton of JSON files, which is not a very compact format. Maybe pickle or SQLite would work better. |
@jgarvin #15731 and #15981 include some methods for reducing the size of the cached JSON files. I mentioned it in the comments somewhere, but pickling might take up more space: we only store certain fields in the JSON files, so pickling could include data we don't need or want (I have yet to look into this). Mypy does have a SQLite cache option, though it basically just stores the JSON data and filename in a table; see #3456 (comment). |
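To make the "JSON data and filename in a table" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table name `cache_files`, its columns, and the helper functions are hypothetical and chosen for illustration only; this is not mypy's actual cache schema or code.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a single-file cache. Table layout is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache_files ("
        "path TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def write_entry(conn, path, payload):
    """Store one cache entry as compact JSON text, keyed by its file name."""
    text = json.dumps(payload, separators=(",", ":"))
    conn.execute(
        "INSERT OR REPLACE INTO cache_files (path, data) VALUES (?, ?)",
        (path, text),
    )
    conn.commit()


def read_entry(conn, path):
    """Return the decoded entry, or None if the file name is not cached."""
    row = conn.execute(
        "SELECT data FROM cache_files WHERE path = ?", (path,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = open_cache()
write_entry(conn, "pkg/mod.data.json", {"names": ["f", "g"]})
print(read_entry(conn, "pkg/mod.data.json"))
```

A layout like this mostly cuts the number of files touched on disk; the stored payload is still JSON, so it would not by itself make each entry smaller.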
If we have a sufficiently large program to type check we should be able to speed up type checking significantly by using multiple type checker processes that work in parallel.
One potential way to implement this:
For this to really work we are going to need a way of efficiently communicating type check results for a module between processes (to avoid type checking shared dependencies multiple times). Having a JSON serialization format (see #932) would be sufficient.
Additionally we need a quick way of figuring out the dependency graph of all modules (or at least an approximation of it). We'll probably have to cache that between runs and update it incrementally, similar to the approach outlined in #932.
So how much would this help? Obviously this depends on the shape of the dependency graph. Under reasonable assumptions we should be able to hit an effective level of parallelism of at least 4 for larger programs, but I wouldn't be surprised if we could get even better than that. Cyclic module dependencies can add a limit to how far we can parallelize. We can probably estimate the achievable level of parallelism for a particular program by analyzing the module dependency graph.
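As a rough illustration of that estimate, the sketch below collapses import cycles into single units (Tarjan's strongly connected components) and divides the total number of modules by the length of the longest dependency chain. It assumes every module costs one unit of work and that a cycle must be checked serially by one process; the function name and the input shape (`deps` mapping a module to the modules it imports) are made up for this example and are not part of mypy.

```python
def estimate_parallelism(deps):
    """Return total work / critical path for a module dependency graph.

    deps maps a module name to the set of module names it imports.
    Note: the recursive traversal is fine for a sketch but could hit
    Python's recursion limit on very deep import chains.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []  # Tarjan emits each component after the ones it depends on

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in deps.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            sccs.append(component)

    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    for node in sorted(nodes):
        if node not in index:
            visit(node)

    scc_of = {m: i for i, comp in enumerate(sccs) for m in comp}
    finish = [0] * len(sccs)  # earliest finish time of each component
    for i, comp in enumerate(sccs):
        start = max(
            (finish[scc_of[d]] for m in comp for d in deps.get(m, ())
             if scc_of[d] != i),
            default=0,
        )
        finish[i] = start + len(comp)

    critical_path = max(finish, default=0)
    return len(nodes) / critical_path if critical_path else 0.0


example = {
    "app": {"a", "b", "c", "d"},
    "a": {"base"}, "b": {"base"}, "c": {"base"}, "d": {"base"},
    "base": set(),
}
print(estimate_parallelism(example))
```

In this example the six modules form a chain of length three (`base`, then any of `a` to `d`, then `app`), so the ratio is 2.0. A graph that is one large import cycle would give 1.0, which is the limit on parallelism mentioned above.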
This is probably only worth implementing after we have incremental type checking (#932), and we should preserve incrementalism -- i.e., we'd only type check modules modified since the last run and modules that depend on them.
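The set of modules to recheck can be sketched as a reverse-dependency walk: start from the modified modules and follow "is imported by" edges until nothing new is reached. This is only an illustration of the rule stated above; the function name and the `deps` mapping (module to the modules it imports) are hypothetical, and a real implementation could prune the set further.

```python
from collections import defaultdict, deque


def modules_to_recheck(deps, modified):
    """Return the modified modules plus everything that depends on them."""
    # Invert the import graph: imported module -> modules that import it.
    imported_by = defaultdict(set)
    for module, imports in deps.items():
        for imported in imports:
            imported_by[imported].add(module)

    stale = set(modified)
    queue = deque(modified)
    while queue:
        current = queue.popleft()
        for dependent in imported_by.get(current, ()):
            if dependent not in stale:
                stale.add(dependent)
                queue.append(dependent)
    return stale


deps = {
    "app": {"lib"},
    "lib": {"base"},
    "other": {"base"},
    "base": set(),
}
print(sorted(modules_to_recheck(deps, {"lib"})))
```

Here a change to `lib` marks `lib` and `app` as stale, while `base` and `other` can reuse their earlier results. Only the stale modules would then need to be distributed across the parallel processes.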