Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear why iter_items should be favored over list_items #1775

Closed
EliahKagan opened this issue Dec 21, 2023 · 2 comments · Fixed by #1780
Closed

Unclear why iter_items should be favored over list_items #1775

EliahKagan opened this issue Dec 21, 2023 · 2 comments · Fixed by #1780

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Dec 21, 2023

In git.util.IterableObj, the list_items method recommends that iter_items be used instead, and begins to explain why, but it appears the actual explanation was never written:

GitPython/git/util.py

Lines 1255 to 1256 in 4023f28

:note: Favor the iter_items method as it will

The deprecated git.util.Iterable class has the same note in its list_items docstring:

GitPython/git/util.py

Lines 1218 to 1219 in 4023f28

:note: Favor the iter_items method as it will

In both cases, the subsequent non-blank line in the docstring documents what the list_items method itself returns, so the continuation truly is missing. The intended continuation does not seem to me to be something that can be inferred from the docstring as a whole. For example, here's the whole IterableObj.list_items docstring:

GitPython/git/util.py

Lines 1250 to 1258 in 4023f28

"""
Find all items of this type - subclasses can specify args and kwargs differently.
If no args are given, subclasses are obliged to return all items if no additional
arguments arg given.
:note: Favor the iter_items method as it will
:return: list(Item,...) list of item instances
"""

It may seem odd that, unlike #1712, I did not notice this when working on #1725. But I think the reason is not that the docstrings had made sense to me at that time, but instead that I had noticed the more minor (really, almost trivial) issue that perhaps the first paragraph should be split so that only its first sentence would be its summary line, decided to return to it later to consider that further, and then forgot about it.

The text "Favor the iter_items method as it will" appears to have been present, and its continuation absent, for as long as the surrounding code has been in GitPython. It was introduced in f4fa1cb along with the Iterable class itself.

In general, it may sometimes be preferable to obtain an iterator rather than a list or other sequence because it may not be necessary to materialize a collection just to iterate over it, and because unnecessary materialization can sometimes increase space usage. On the other hand, materialization guards against mutation of the original collection during iteration. But these are completely general ideas, not informed by the list_items docstrings nor even by any consideration specific to GitPython.

My guess is that the docstring intended to say something more specific, or at least to identify which general benefit of iter_items serves to recommend it. So I don't think I could propose a specific improvement to that documentation without insight into what was originally intended.

@Byron
Copy link
Member

Byron commented Dec 21, 2023

Thanks for bringing this up!

Solely on the basis of 'knowing myself a little' I'd think that the continuation would be a general listing of advantages of iterators. If iter_items is indeed implemented as iterator, it should be worth following up on finishing the note as the latency of collecting all items could be high, depending on the item at hand. Nowadays I'd argue that list_items() shouldn't exist in the first place, but it's too late for that now.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Dec 22, 2023
- Fill in the missing part of the explanation of why to favor
  iter_items over list_items in IterableObj and Iterable (gitpython-developers#1775).

- Move the explanation of how subclasses must treat arguments from
  the list_items methods to the iter_items methods, because the
  iter_items methdos are the ones that are abstract and must be
  overridden by a well-behaved subclass, and also because, since
  the iter_items methods are preferred for use, they should be the
  place where less duplicated shared documentation resides.

- Subtantially reword the existing documentation for clarity,
  especially regarding the signifance of extra args and kwargs.

- Put the iter_items method before (i.e. above) the list_items
  method (in each of the IterableObj and Iterable classes), because
  that method is the one that should be used more often, and
  because it is also now the one with the more detailed docstring.

- Remove and old comment on a return type that said exactly the
  exact same thing as the annotation.

- In Iterable, note deprecation more consistently (and thus in more
  places).

- Rewrite the IterableClassWatcher class docstring to explain
  exactly what that metaclass achieves.
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Dec 22, 2023
- Fill in the missing part of the explanation of why to favor
  iter_items over list_items in IterableObj and Iterable (gitpython-developers#1775).

- Move the explanation of how subclasses must treat arguments from
  the list_items methods to the iter_items methods, because the
  iter_items methdos are the ones that are abstract and must be
  overridden by a well-behaved subclass, and also because, since
  the iter_items methods are preferred for use, they should be the
  place where less duplicated shared documentation resides.

- Subtantially reword the existing documentation for clarity,
  especially regarding the signifance of extra args and kwargs.

- Put the iter_items method before (i.e. above) the list_items
  method (in each of the IterableObj and Iterable classes), because
  that method is the one that should be used more often, and
  because it is also now the one with the more detailed docstring.

- Remove and old comment on a return type that said exactly the
  exact same thing as the annotation.

- In Iterable, note deprecation more consistently (and thus in more
  places).

- Rewrite the IterableClassWatcher class docstring to explain
  exactly what that metaclass achieves.
@EliahKagan
Copy link
Contributor Author

EliahKagan commented Dec 22, 2023

If iter_items is indeed implemented as iterator

I would say it is implemented as an iterator. The more detailed picture is:

  • Hopefully code outside GitPython satisfies the documented contract, and GitPython can't be responsible for ensuring that.
  • If anything implements the deprecated git.util.Iterable anymore, it's code outside GitPython.
  • iter_items is abstract in git.util.IterableObj, which introduces it.
  • iter_items explicitly raises NotImplementedError in PushInfo and FetchInfo. Those concrete classes declare themselves to implement IterableObj yet, in effect, decline to implement this abstract method. However, because those classes don't implement list_items, whose base-class implementation uses iter_items, this is not a reason to avoid recommending iter_items over list_items in the base class docstrings.
  • Other classes that implement IterableObj have concrete implementations of iter_items that return iterators: either their own, or those introduced by an intermediate base class. I don't think I missed anything while looking into this.
  • The other iter_items method in the codebase belongs to SymbolicReference, and it returns an iterator. SymbolicReference does not inherit from IterableObj, nor from any other classes, except the implicit inheritance from object (not to be confused with GitPython's Object). I wonder if it would be beneficial to explain its relationship further in its class docstring; the description "Special case of a reference that is symbolic" might lead readers to think it is a subclass of some class representing a more general case, or that it is intended to be a subclass of Reference (which does implement IterableObj). But if I understand correctly, this is independent of any considerations in how IterableObj and its methods should be documented.

I've opened #1780 to propose some IterableObj and iter_items related improvements, including filling in the missing information about why iter_items should be preferred to list_items along the lines of what you've said here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants