Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review tree_mft.py with mypy --strict #81

Conversation

ajnelson-nist
Copy link
Contributor

No description provided.

The `Mmap` class in `tree_mft.py` appears to be a redundant and fallen-
behind implementation of the `Mmap` class in `BinaryParser.py`.  This
patch uses the central implementation instead, which happens to enable
`mpy --strict` to pass.

These changes were tested in Pythons 3.8 through 3.12.  No effects were
observed on Make-managed files.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist ajnelson-nist marked this pull request as ready for review October 17, 2023 21:39
@ajnelson-nist ajnelson-nist marked this pull request as draft October 17, 2023 23:13
@ajnelson-nist
Copy link
Contributor Author

Sorry, converting to draft.

mypy --strict is being lax with requiring some type signatures along the call path, and it appears to hide a type-incompatibility.

Changing BinaryParser.Mmap.__enter__ to return mmap.mmap, accurately reflecting what it's doing now, causes another type error to arise:

indxparse/list_mft.py:385: error: Argument 1 to "MFTEnumerator" has incompatible type "mmap"; expected "array[Any]"  [arg-type]

I had thought mypy --strict would flag that, or flag the missing signatures on __enter__, but apparently not.

mypy version: 1.6.0.

I'll work through this tomorrow.

@williballenthin
Copy link
Owner

memory is cheap these days (it wasn't when the script was first written) so i think it might be reasonable to remove the mmap code. just read the whole MFT into memory as a bytearray.

alternatively, since Python 3.2 mmap supports being used as a context manager directly, so we should use that, not our own (probably broken) implementation.

This patch includes some updates to type signatures so `mypy` continues
to pass as called in the top Makefile.

These changes were tested in Pythons 3.8 through 3.12.  No effects were
observed on Make-managed files.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
This patch adds type annotations along the flow from some parameters
named `buf` were changed from being typed `array.array` to `mmap.mmap`.
This review seems to show there are no type conflicts along these
specific changes, according to `mypy` as it runs in CI.

These changes were tested in Pythons 3.8 through 3.12.  No effects were
observed on Make-managed files.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist
Copy link
Contributor Author

memory is cheap these days (it wasn't when the script was first written) so i think it might be reasonable to remove the mmap code. just read the whole MFT into memory as a bytearray.

alternatively, since Python 3.2 mmap supports being used as a context manager directly, so we should use that, not our own (probably broken) implementation.

Removing BinaryParser.Mmap and replacing its usage with two contexts induces some type revisions in MFT.py, changing some variables named buf and self._buf from the uniformly applied array.array to a few instances of mmap.mmap. However, I also checked over the information flow from those instances, and things seem to work out - see the second patch I just pushed (72cfe41).

So, if the overall decision is to keep mmap.mmap, things appear to work out with how they're currently written.

If the overall decision is instead to just read everything into memory, we keep type uniformity at the expense of RAM pressure from significant MFT sizes.

Unfortunately, I'm not sure there is currently a clean solution that lets some paths into the MFT.py classes use mmap, and others use something else like starting with array.arrays. I'm basing this on a not-yet-complete read of PEP 688, via Python typing Issue 593. The Buffer protocol is only implemented in Python 3.12, and only available as a kind of stub type hint in prior Python versions. And from this comment/code block in Typeshed, it looks like there is currently not a difference between readable and writable buffers.

So, it'd be fine to me whichever way this decision goes - keep mmap, or read everything up front. My personal preference is to keep mmap, because there might be some cases in the code where this will be a question about disk images. I can't recall seeing one offhand, but I think as the CLI testing expands, one's gonna be found.

@ajnelson-nist ajnelson-nist marked this pull request as ready for review October 18, 2023 14:57
@williballenthin
Copy link
Owner

sounds good to me - lets keep mmap.

I do recall seeing some MFTs from file servers that were multiple GBs, which would put pressure on some laptops these days.

@ajnelson-nist
Copy link
Contributor Author

All righty! Keeping mmap, this is ready for your review and merge.

Copy link
Owner

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@williballenthin williballenthin merged commit eec3035 into williballenthin:master Oct 18, 2023
2 checks passed
@ajnelson-nist ajnelson-nist deleted the review_tree_mft_py_with_mypy_strict branch October 18, 2023 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants