Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split load_file based on return type #4823

Merged
merged 3 commits into from
Jan 31, 2024

Conversation

TheRealFalcon
Copy link
Member

@TheRealFalcon TheRealFalcon commented Jan 30, 2024

Proposed Commit Message

Use individual commits

Additional Context

I did see a few places we could further refactor things, but I didn't want to complicate this PR any more than necessary. Other than the function definitions of load_text_file and load_binary_file, all other changes should be replacing the load_file as necessary. There is also one additional mypy related commit that surfaced from these changes.

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@holmanb holmanb self-assigned this Jan 30, 2024
@aciba90
Copy link
Contributor

aciba90 commented Jan 31, 2024

If the value of spliting load_files in two functions is only type checking, we could solve this by doing:

if typing.TYPE_CHECKING:
    @typing.overload
    def load_file(
        fname: Union[str, os.PathLike],
        *,
        read_cb: Optional[Callable[[int], None]] = None,
        quiet: bool = False,
        decode: typing.Literal[True] = True
    ) -> str:
        ...


    @typing.overload
    def load_file(
       fname: Union[str, os.PathLike],
       *,
       read_cb: Optional[Callable[[int], None]] = None,
       quiet: bool = False,
       decode: typing.Literal[False] = False
    ) -> bytes:
        ...


def load_file(
    fname: Union[str, os.PathLike],
    *,
    read_cb: Optional[Callable[[int], None]] = None,
    quiet: bool = False,
    decode: bool = False,
) -> Union[str, bytes]:

Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically this doesn't do much, but it helps with typing static analysis tools and simplifies reasoning about what functions. Whether encode is turning binary into utf-8 text or the inverse of that is something that may not be obvious to readers of the code - human grokking is easier with this change.

I have one nit, but feel free to take it or leave it.

Also I raised it in a comment (and I'm fine with this being a separate change), but probably worth looking into someday in the future is whether reading open() in text mode would be more performant than manually encoding - I would expect that it is written in C and therefore is probably faster than post-processing in Python, but I may be wrong - that's just a hunch.

@@ -124,18 +125,14 @@ def lsb_release():
return data


def decode_binary(blob, encoding="utf-8"):
def decode_binary(blob: Union[str, bytes], encoding="utf-8") -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since blob.decode() is simpler than import util -> util.decode_binary(blob), this helper really only has value the type of blob is unknown or only known at runtime. With typing improving our understanding of variables throughout the codebase I don't think it really makes sense to use it at new call sites unless the type really is variable at runtime.



def encode_text(text, encoding="utf-8"):
def encode_text(text: Union[str, bytes], encoding="utf-8") -> bytes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, but for .encode().

read_cb: Optional[Callable[[int], None]] = None,
quiet: bool = False,
) -> str:
return decode_binary(load_binary_file(fname, read_cb=read_cb, quiet=quiet))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per the comment above, this could also just use the builtin .decode() method, since we know the type returned by load_binary_file():

    return load_binary_file(fname, read_cb=read_cb, quiet=quiet).decode()

`line` gets reused in two different loops in cc_mounts,
causing mypy to think the later line has the same type as the
early line. Change `line` to `entry` to make mypy happy.
)

The `load_file` utility function has a `decode` flag to determine
whether to return string or bytes. Having two possible return types
makes reasoning its usage hard. Instead create a `load_text_file`
function that always returns strings and replace all calls of
`load_file` that were returning strings.
…anonical#4823)

In the last commit, all `load_file` calls that returned strings were
replaced with `load_text_file`. Now that they have been replaced,
all remaining calls to `load_file` return bytes, so we can remove
the `decode` parameter and make the naming more explicit.
@TheRealFalcon
Copy link
Member Author

I'm going to leave it for now to not add any additional complexity, but I think there's further improvements we could make here.

@TheRealFalcon TheRealFalcon merged commit d27eab1 into canonical:main Jan 31, 2024
29 checks passed
TheRealFalcon added a commit that referenced this pull request Jan 31, 2024
`line` gets reused in two different loops in cc_mounts,
causing mypy to think the later line has the same type as the
early line. Change `line` to `entry` to make mypy happy.
TheRealFalcon added a commit that referenced this pull request Jan 31, 2024
The `load_file` utility function has a `decode` flag to determine
whether to return string or bytes. Having two possible return types
makes reasoning its usage hard. Instead create a `load_text_file`
function that always returns strings and replace all calls of
`load_file` that were returning strings.
@TheRealFalcon TheRealFalcon deleted the decode-file-load branch January 31, 2024 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants