Split load_file based on return type #4823
If the only value of splitting load_file into two functions is type checking, we could solve this by doing:

if typing.TYPE_CHECKING:
    @typing.overload
    def load_file(
        fname: Union[str, os.PathLike],
        *,
        read_cb: Optional[Callable[[int], None]] = None,
        quiet: bool = False,
        decode: typing.Literal[True] = True
    ) -> str:
        ...
    @typing.overload
    def load_file(
        fname: Union[str, os.PathLike],
        *,
        read_cb: Optional[Callable[[int], None]] = None,
        quiet: bool = False,
        decode: typing.Literal[False] = False
    ) -> bytes:
        ...
def load_file(
    fname: Union[str, os.PathLike],
    *,
    read_cb: Optional[Callable[[int], None]] = None,
    quiet: bool = False,
    decode: bool = False,
) -> Union[str, bytes]:
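For reference, here is a minimal runnable sketch of the overload approach suggested above. It assumes `decode` defaults to `False` (as in the final signature of the snippet), so only the `Literal[False]` overload carries a default. The function body is a hypothetical stand-in rather than cloud-init's implementation, and `read_cb` and `quiet` are omitted for brevity.

```python
import os
import tempfile
from typing import Literal, Union, overload


@overload
def load_file(fname: Union[str, os.PathLike], *, decode: Literal[True]) -> str: ...


@overload
def load_file(
    fname: Union[str, os.PathLike], *, decode: Literal[False] = False
) -> bytes: ...


def load_file(
    fname: Union[str, os.PathLike], *, decode: bool = False
) -> Union[str, bytes]:
    # Hypothetical stand-in body: read bytes, decode only on request.
    with open(fname, "rb") as stream:
        contents = stream.read()
    return contents.decode("utf-8") if decode else contents


# Usage: a type checker infers `str` for `text` and `bytes` for `raw`.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as stream:
        stream.write("héllo\n".encode("utf-8"))
    text = load_file(path, decode=True)
    raw = load_file(path)
```

One limitation of this approach: a caller that passes a plain `bool` variable for `decode` matches neither overload, so a third overload returning `Union[str, bytes]` would be needed to cover that case.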
Logically this doesn't do much, but it helps static type-analysis tools and simplifies reasoning about what the functions do. Whether `encode` is turning binary into UTF-8 text or the inverse of that may not be obvious to readers of the code, so human grokking is easier with this change.
I have one nit, but feel free to take it or leave it.
Also, I raised it in a comment (and I'm fine with this being a separate change), but it is probably worth looking into someday whether reading with `open()` in text mode would be more performant than manually decoding. I would expect that it is written in C and is therefore probably faster than post-processing in Python, but I may be wrong; that's just a hunch.
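One way to check that hunch is to time both approaches. This is only a sketch, not a benchmark from the PR: the helper names `read_then_decode`, `read_text_mode`, and `compare` are hypothetical, and timings will vary with file size and platform. Text mode translates newlines by default, so `newline=""` is passed to keep the two results identical.

```python
import os
import tempfile
import timeit
from typing import Dict


def read_then_decode(path: str) -> str:
    # Read raw bytes, then decode in Python.
    with open(path, "rb") as stream:
        return stream.read().decode("utf-8")


def read_text_mode(path: str) -> str:
    # Let the io layer decode; newline="" disables newline translation.
    with open(path, "r", encoding="utf-8", newline="") as stream:
        return stream.read()


def compare(path: str, repeat: int = 50) -> Dict[str, float]:
    # Total seconds spent in `repeat` calls of each reader.
    return {
        "read_then_decode": timeit.timeit(
            lambda: read_then_decode(path), number=repeat
        ),
        "read_text_mode": timeit.timeit(
            lambda: read_text_mode(path), number=repeat
        ),
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as stream:
        stream.write("first line\r\nsecond líne\n".encode("utf-8") * 1000)
    same = read_then_decode(path) == read_text_mode(path)
    timings = compare(path)
```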
@@ -124,18 +125,14 @@ def lsb_release():
     return data


-def decode_binary(blob, encoding="utf-8"):
+def decode_binary(blob: Union[str, bytes], encoding="utf-8") -> str:
Since `blob.decode()` is simpler than `import util` -> `util.decode_binary(blob)`, this helper really only has value when the type of `blob` is unknown or only known at runtime. With typing improving our understanding of variables throughout the codebase, I don't think it really makes sense to use it at new call sites unless the type really is variable at runtime.
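To illustrate the distinction, here is a minimal sketch. The signature matches the one in the diff, but the body is an assumption (a helper that passes `str` through unchanged and decodes `bytes`); it is not copied from the PR.

```python
from typing import Union


def decode_binary(blob: Union[str, bytes], encoding: str = "utf-8") -> str:
    # Assumed behavior: text passes through, bytes are decoded.
    if isinstance(blob, str):
        return blob
    return blob.decode(encoding)


# When the type is statically known to be bytes, the builtin is enough.
known_bytes = b"hostname\n"
direct = known_bytes.decode("utf-8")

# The helper only earns its keep when the input may be either type.
mixed = [b"bytes value", "text value"]
normalized = [decode_binary(item) for item in mixed]
```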
-def encode_text(text, encoding="utf-8"):
+def encode_text(text: Union[str, bytes], encoding="utf-8") -> bytes:
Same comment as above, but for `.encode()`.
    read_cb: Optional[Callable[[int], None]] = None,
    quiet: bool = False,
) -> str:
    return decode_binary(load_binary_file(fname, read_cb=read_cb, quiet=quiet))
Per the comment above, this could also just use the builtin `.decode()` method, since we know the type returned by `load_binary_file()`:
    return load_binary_file(fname, read_cb=read_cb, quiet=quiet).decode()
`line` gets reused in two different loops in cc_mounts, causing mypy to think the later line has the same type as the early line. Change `line` to `entry` to make mypy happy.
The `load_file` utility function has a `decode` flag to determine whether to return string or bytes. Having two possible return types makes reasoning about its usage hard. Instead, create a `load_text_file` function that always returns strings and replace all calls of `load_file` that were returning strings.
In the last commit, all `load_file` calls that returned strings were replaced with `load_text_file`. Now that they have been replaced, all remaining calls to `load_file` return bytes, so we can remove the `decode` parameter and make the naming more explicit.
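The end state described by these commits can be sketched roughly as follows. The function names and parameters come from the PR; the bodies are assumptions for illustration only. In particular, the handling of `quiet` (swallowing a missing file) and `read_cb` (receiving the byte count) is guessed, not taken from the cloud-init source.

```python
import os
import tempfile
from typing import Callable, List, Optional, Union


def load_binary_file(
    fname: Union[str, os.PathLike],
    *,
    read_cb: Optional[Callable[[int], None]] = None,
    quiet: bool = False,
) -> bytes:
    try:
        with open(fname, "rb") as stream:
            contents = stream.read()
    except FileNotFoundError:
        # Assumption: quiet=True turns a missing file into empty content.
        if not quiet:
            raise
        contents = b""
    # Assumption: the callback is told how many bytes were read.
    if read_cb is not None:
        read_cb(len(contents))
    return contents


def load_text_file(
    fname: Union[str, os.PathLike],
    *,
    read_cb: Optional[Callable[[int], None]] = None,
    quiet: bool = False,
) -> str:
    # Always returns str, so callers never need a decode flag.
    return load_binary_file(fname, read_cb=read_cb, quiet=quiet).decode("utf-8")


sizes: List[int] = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hostname")
    with open(path, "wb") as stream:
        stream.write(b"example-host\n")
    as_bytes = load_binary_file(path, read_cb=sizes.append)
    as_text = load_text_file(path)
    missing = load_text_file(os.path.join(tmp, "absent"), quiet=True)
```

Each function now has a single return type, so neither overloads nor a `Union` return annotation are needed at call sites.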
d71f231 to 142b5d5
I'm going to leave it for now to avoid adding any additional complexity, but I think there are further improvements we could make here.
Proposed Commit Message
Additional Context
I did see a few places we could further refactor things, but I didn't want to complicate this PR any more than necessary. Other than the function definitions of `load_text_file` and `load_binary_file`, all other changes should be replacing the `load_file` calls as necessary. There is also one additional mypy-related commit that surfaced from these changes.
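The mypy-related rename can be reproduced in miniature. The sketch below is hypothetical (the function `first_fields` and its arguments are not from `cc_mounts`); it only shows why a loop variable reused with a different item type trips the type checker, and how a distinct name avoids it.

```python
from typing import List


def first_fields(fstab_text: str, parsed: List[List[str]]) -> List[str]:
    fields: List[str] = []
    # First loop: mypy infers `line` as str.
    for line in fstab_text.splitlines():
        if line.strip():
            fields.append(line.split()[0])
    # Reusing `line` here would assign a List[str] to a variable that
    # mypy already typed as str; a distinct name sidesteps the conflict.
    for entry in parsed:
        fields.append(entry[0])
    return fields


result = first_fields(
    "/dev/sda1 / ext4 defaults 0 1\n\n",
    [["/dev/sdb1", "/mnt", "ext4"]],
)
```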