Compress cached JSON files #15981

Open
wants to merge 5 commits into
base: master

dosisod
Contributor

@dosisod dosisod commented Aug 29, 2023

Closes #15731

@dosisod
Contributor Author

dosisod commented Aug 29, 2023

At some point we should probably rename the .json cache files to .json.gz since they're gzipped now. The .json extension is sprinkled in a lot of different places though, so some refactoring would need to be done to fully complete this. Should this be done in this PR, or is it ok to leave it for another PR? Thanks!
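
For illustration, here is a minimal sketch of what writing and reading a gzipped JSON cache file could look like with the standard library. The helper names `write_cache_json` and `read_cache_json` are hypothetical and are not taken from this PR or from mypy's code; the `.json.gz` suffix in the usage note is only the naming proposed above.

```python
import gzip
import json


def write_cache_json(path, data):
    """Serialize `data` as JSON and store it gzip-compressed at `path`.

    Hypothetical helper for illustration only, not mypy's actual cache writer.
    """
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    with gzip.open(path, "wb") as f:
        f.write(payload)


def read_cache_json(path):
    """Read a gzip-compressed JSON file written by write_cache_json."""
    with gzip.open(path, "rb") as f:
        return json.loads(f.read().decode("utf-8"))
```

With helpers like these, a call such as `write_cache_json("a.meta.json.gz", {"mtime": 0})` followed by `read_cache_json("a.meta.json.gz")` would round-trip the dictionary, and the file on disk would no longer be readable as plain JSON.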

Collaborator

@hauntsaninja hauntsaninja left a comment

Let's do that in this PR!

This seems to introduce new issues (as expected), particularly when manually
modifying/creating JSON files during testing. To get around this I should
either make a `gzfile` directive, or add a `gzipped` flag to the `file`
directive.
@dosisod
Contributor Author

dosisod commented Sep 14, 2023

@hauntsaninja it would appear that some of the mypy tests rely on manually creating JSON files:

-- This is a heinous hack, but we simulate having a invalid cache by clobbering
-- the proto deps file with something with mtime mismatches.
[file ../.mypy_cache/3.8/@deps.meta.json.2]
{"snapshot": {"__main__": "a7c958b001a45bd6a2a320f4e53c4c16", "a": "d41d8cd98f00b204e9800998ecf8427e", "b": "d41d8cd98f00b204e9800998ecf8427e", "builtins": "c532c89da517a4b779bcf7a964478d67"}, "deps_meta": {"@root": {"path": "@root.deps.json", "mtime": 0}, "__main__": {"path": "__main__.deps.json", "mtime": 0}, "a": {"path": "a.deps.json", "mtime": 0}, "b": {"path": "b.deps.json", "mtime": 0}, "builtins": {"path": "builtins.deps.json", "mtime": 0}}}

Now that gzipped files are expected, what would be the best way to get around this? One of the things I considered was adding a gzip_file directive to the test data runner that's similar to file, except it gzips the data before writing it to disk. Does that seem like a good route, or do you see a better option?
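
As a rough sketch of the idea (not the actual test data runner code), a `gzip_file`-style directive could boil down to a file writer that optionally compresses the content before writing it. The function name `write_test_file` and its `gzipped` parameter are assumptions made for this example.

```python
import gzip
import os


def write_test_file(root, rel_path, content, gzipped=False):
    """Write `content` under `root`, gzip-compressing it first if requested.

    Hypothetical stand-in for what a `gzip_file` directive (or a `gzipped`
    flag on `file`) might do; it is not part of mypy's test framework.
    """
    path = os.path.join(root, rel_path)
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    data = content.encode("utf-8")
    if gzipped:
        # mtime=0 keeps the compressed bytes identical across runs.
        data = gzip.compress(data, mtime=0)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

A test case could then keep its JSON readable in the test data file while the bytes on disk match what a gzip-aware cache reader expects.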

@github-actions
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

graphql-core (https://github.com/graphql-python/graphql-core): typechecking got 1.05x slower (295.0s -> 311.0s)
(Performance measurements are based on a single noisy sample)

@JukkaL
Collaborator

JukkaL commented Sep 14, 2023

Since this breaks some workflows that parse or compress the mypy cache files, I wonder if this should be treated as a breaking change, which would benefit from a feature flag to enable/disable compression. In the first release with the feature we'd have the flag off by default, and we could turn it on by default in the next feature release. See our policy for backward incompatible changes for more information.
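
To make the opt-in idea concrete, here is a small sketch of a writer controlled by a flag, paired with a reader that accepts both formats by checking for the gzip magic bytes. No flag name was proposed in this thread; the `compress` parameter and the function names below are hypothetical.

```python
import gzip
import json

# Every gzip stream starts with these two bytes. Valid JSON text cannot
# start with 0x1f, so this check cannot misclassify a plain JSON file.
GZIP_MAGIC = b"\x1f\x8b"


def dump_cache(data, compress=False):
    """Return the bytes to store; compression is off unless requested."""
    raw = json.dumps(data).encode("utf-8")
    return gzip.compress(raw) if compress else raw


def load_cache(blob):
    """Parse cache bytes, whether or not they were gzip-compressed."""
    if blob[:2] == GZIP_MAGIC:
        blob = gzip.decompress(blob)
    return json.loads(blob.decode("utf-8"))
```

A reader written this way would let caches produced with the flag on or off be loaded by the same code, although external tools that parse the files directly would still need updating before compression becomes the default.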

Development

Successfully merging this pull request may close these issues.

Feature: Reduce Mypy's Cache Size
3 participants