Split `load_file` based on return type #4823

TheRealFalcon · 2024-01-30T17:05:27Z

Proposed Commit Message

Use individual commits

Additional Context

I did see a few places we could further refactor things, but I didn't want to complicate this PR any more than necessary. Other than the function definitions of load_text_file and load_binary_file, all other changes should be replacing the load_file as necessary. There is also one additional mypy related commit that surfaced from these changes.

Merge type

Squash merge using "Proposed Commit Message"
Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

cloudinit/util.py

aciba90 · 2024-01-31T16:13:10Z

If the value of spliting load_files in two functions is only type checking, we could solve this by doing:

if typing.TYPE_CHECKING:
    @typing.overload
    def load_file(
        fname: Union[str, os.PathLike],
        *,
        read_cb: Optional[Callable[[int], None]] = None,
        quiet: bool = False,
        decode: typing.Literal[True] = True
    ) -> str:
        ...


    @typing.overload
    def load_file(
       fname: Union[str, os.PathLike],
       *,
       read_cb: Optional[Callable[[int], None]] = None,
       quiet: bool = False,
       decode: typing.Literal[False] = False
    ) -> bytes:
        ...


def load_file(
    fname: Union[str, os.PathLike],
    *,
    read_cb: Optional[Callable[[int], None]] = None,
    quiet: bool = False,
    decode: bool = False,
) -> Union[str, bytes]:

holmanb

Logically this doesn't do much, but it helps with typing static analysis tools and simplifies reasoning about what functions. Whether encode is turning binary into utf-8 text or the inverse of that is something that may not be obvious to readers of the code - human grokking is easier with this change.

I have one nit, but feel free to take it or leave it.

Also I raised it in a comment (and I'm fine with this being a separate change), but probably worth looking into someday in the future is whether reading open() in text mode would be more performant than manually encoding - I would expect that it is written in C and therefore is probably faster than post-processing in Python, but I may be wrong - that's just a hunch.

holmanb · 2024-01-31T00:18:27Z

cloudinit/util.py

@@ -124,18 +125,14 @@ def lsb_release():
    return data


-def decode_binary(blob, encoding="utf-8"):
+def decode_binary(blob: Union[str, bytes], encoding="utf-8") -> str:


Since blob.decode() is simpler than import util -> util.decode_binary(blob), this helper really only has value the type of blob is unknown or only known at runtime. With typing improving our understanding of variables throughout the codebase I don't think it really makes sense to use it at new call sites unless the type really is variable at runtime.

holmanb · 2024-01-31T00:19:12Z

cloudinit/util.py



-def encode_text(text, encoding="utf-8"):
+def encode_text(text: Union[str, bytes], encoding="utf-8") -> bytes:


Same comment as above, but for .encode().

holmanb · 2024-01-31T00:20:56Z

cloudinit/util.py

+    read_cb: Optional[Callable[[int], None]] = None,
+    quiet: bool = False,
+) -> str:
+    return decode_binary(load_binary_file(fname, read_cb=read_cb, quiet=quiet))


per the comment above, this could also just use the builtin .decode() method, since we know the type returned by load_binary_file():

return load_binary_file(fname, read_cb=read_cb, quiet=quiet).decode()

`line` gets reused in two different loops in cc_mounts, causing mypy to think the later line has the same type as the early line. Change `line` to `entry` to make mypy happy.

) The `load_file` utility function has a `decode` flag to determine whether to return string or bytes. Having two possible return types makes reasoning its usage hard. Instead create a `load_text_file` function that always returns strings and replace all calls of `load_file` that were returning strings.

…anonical#4823) In the last commit, all `load_file` calls that returned strings were replaced with `load_text_file`. Now that they have been replaced, all remaining calls to `load_file` return bytes, so we can remove the `decode` parameter and make the naming more explicit.

TheRealFalcon · 2024-01-31T18:43:54Z

I'm going to leave it for now to not add any additional complexity, but I think there's further improvements we could make here.

`line` gets reused in two different loops in cc_mounts, causing mypy to think the later line has the same type as the early line. Change `line` to `entry` to make mypy happy.

The `load_file` utility function has a `decode` flag to determine whether to return string or bytes. Having two possible return types makes reasoning its usage hard. Instead create a `load_text_file` function that always returns strings and replace all calls of `load_file` that were returning strings.

holmanb reviewed Jan 30, 2024

View reviewed changes

cloudinit/util.py Show resolved Hide resolved

holmanb self-assigned this Jan 30, 2024

holmanb approved these changes Jan 31, 2024

View reviewed changes

TheRealFalcon added 3 commits January 31, 2024 11:56

refactor: Change variable name for consistent typing (canonical#4823)

d604b34

`line` gets reused in two different loops in cc_mounts, causing mypy to think the later line has the same type as the early line. Change `line` to `entry` to make mypy happy.

TheRealFalcon force-pushed the decode-file-load branch from d71f231 to 142b5d5 Compare January 31, 2024 18:14

TheRealFalcon merged commit d27eab1 into canonical:main Jan 31, 2024
29 checks passed

TheRealFalcon deleted the decode-file-load branch January 31, 2024 18:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split `load_file` based on return type #4823

Split `load_file` based on return type #4823

TheRealFalcon commented Jan 30, 2024 •

edited

Loading

aciba90 commented Jan 31, 2024 •

edited

Loading

holmanb left a comment

holmanb Jan 31, 2024

holmanb Jan 31, 2024

holmanb Jan 31, 2024

TheRealFalcon commented Jan 31, 2024



		def encode_text(text, encoding="utf-8"):
		def encode_text(text: Union[str, bytes], encoding="utf-8") -> bytes:

Split load_file based on return type #4823

Split load_file based on return type #4823

Conversation

TheRealFalcon commented Jan 30, 2024 • edited Loading

Proposed Commit Message

Additional Context

Merge type

aciba90 commented Jan 31, 2024 • edited Loading

holmanb left a comment

Choose a reason for hiding this comment

holmanb Jan 31, 2024

Choose a reason for hiding this comment

holmanb Jan 31, 2024

Choose a reason for hiding this comment

holmanb Jan 31, 2024

Choose a reason for hiding this comment

TheRealFalcon commented Jan 31, 2024

Split `load_file` based on return type #4823

Split `load_file` based on return type #4823

TheRealFalcon commented Jan 30, 2024 •

edited

Loading

aciba90 commented Jan 31, 2024 •

edited

Loading