Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve typing for BS4 element.Tag's get and get_attribute_list. #12840

Merged
merged 4 commits into from
Oct 19, 2024

Conversation

kemus
Copy link

@kemus kemus commented Oct 17, 2024

Currently,

  • get() will always have None as a possible return type, even if the default is set.
  • get_attribute_list() will never have None in the possible types inside the list it returns, even if the default allows for that.
  • (minor issue) get() and get_attribute_list()'s typing doesn't allow for default to have a type that isn't str or list[str], even though that's technically allowed by the implementation.

For reference, here's what the implementation of get and get_attribute_list looks like:

    def get(self, key, default=None):
        """Returns the value of the 'key' attribute for the tag, or
        the value given for 'default' if it doesn't have that
        attribute."""
        return self.attrs.get(key, default)

    def get_attribute_list(self, key, default=None):
        """The same as get(), but always returns a list.

        :param key: The attribute to look for.
        :param default: Use this value if the attribute is not present
            on this PageElement.
        :return: A list of values, probably containing only a single
            value.
        """
        value = self.get(key, default)
        if not isinstance(value, list):
            value = [value]
        return value

This change improves the type handling for these methods by using the same idea as stdlib/builtins.pyi uses for dict.get().

This is what dict.get()'s typing looks like:

  @overload
  def get(self, key: _KT) -> _VT | None: ...
  @overload
  def get(self, key: _KT, default: _VT) -> _VT: ...
  @overload
  def get(self, key: _KT, default: _T) -> _VT | _T: ...

Since Tag.get takes a str key and returns a str or list[str] if the attribute exists, we have types similar to a dict[str, str | list[str]]:

  @overload
  def get(self, key: str) -> str | list | None: ...
  @overload
  def get(self, key: str, default: str | list) -> str | list: ...
  @overload
  def get(self, key: str, default: _T) -> str | list | _T: ...

Then because str | list == str | list | str | list, we don't actually need the second overload:

  @overload
  def get(self, key: str) -> str | list | None: ...
  @overload
  def get(self, key: str, default: _T) -> str | list | _T: ...

We can also do something similar for get_attribute_list():

  @overload
  def get_attribute_list(self, key: str) -> list[str | None]: ...
  @overload
  def get_attribute_list(self, key: str, default: _T) -> list[str | _T]: ...

Except this isn't quite right -- if the key isn't found and the default is a list[_T], the implementation returns it instead of a list[list[_T]], so we need to unwrap it:

  @overload
  def get_attribute_list(self, key: str) -> list[str | None]: ...
  @overload
  def get_attribute_list(self, key: str, default: list[_T]) -> list[str | _T]: ...
  @overload
  def get_attribute_list(self, key: str, default: _T) -> list[str | _T]: ...

Here's an example of the types that mypy infers for a few cases before and after this change:

from __future__ import annotations

from bs4 import Tag

tag = Tag()
k = "class"
# str | list[str] | None -> str | list[str] | None (unchanged)
value_1 = tag.get(k)
# str | list[str] | None -> str | list[str] | None (unchanged)
value_2 = tag.get(k, default=None)
# str | list[str] | None -> str | list[str]
value_3 = tag.get(k, default="a")
# str | list[str] | None -> str | list[str]
value_4 = tag.get(k, default=["a"])
# type error on the default arg -> str | list[str] | int
value_5 = tag.get(k, default=1)
# type error on the default arg -> str | list[str] | list[int]
value_6 = tag.get(k, default=[1])

# list[str] -> list[str | None]
values_1 = tag.get_attribute_list(k)
# list[str] -> list[str | None]
values_2 = tag.get_attribute_list(k, default=None)
# list[str] -> list[str] (unchanged)
values_3 = tag.get_attribute_list(k, default="a")
# list[str] -> list[str] (unchanged)
values_4 = tag.get_attribute_list(k, default=["a"])
# type error on the default arg -> list[str | int]
values_5 = tag.get_attribute_list(k, default=1)
# type error on the default arg -> list[str | int]
values_6 = tag.get_attribute_list(k, default=[1])

Currently,
- `get()` will always have `None` as a possible return type,
  even if the default is set.
- `get()`'s typing is wrong if the default's type is not `str` or
  `list[str]`.
- `get_attribute_list()` will never have `None` in the possible types
  inside the list it returns, even if the default allows for that.
- `get_attribute_list()`'s typing is wrong if the default's type is not
  `str` or `list[str]`.

This change improves the type handling for these methods by using the
same idea as `stdlib/builtins.pyi` uses for `dict.get()`.

This is what `dict.get()`'s typing looks like:
```python
  @overload  # type: ignore[override]
  def get(self, key: _KT) -> _VT | None: ...
  @overload
  def get(self, key: _KT, default: _VT) -> _VT: ...
  @overload
  def get(self, key: _KT, default: _T) -> _VT | _T: ...
```

Since Tag.get takes a `str` key and returns a `str` or `list[str]`
if the attribute exists, we can do something like a
`dict[str, str | list[str]]`:
```python
  @overload  # type: ignore[override]
  def get(self, key: str) -> str | list | None: ...
  @overload
  def get(self, key: str, default: str | list) -> str | list: ...
  @overload
  def get(self, key: str, default: _T) -> str | list | _T: ...
```

Then because `str | list == str | list | str | list`, we can simplify:

```python
  @overload  # type: ignore[override]
  def get(self, key: str) -> str | list | None: ...
  @overload
  def get(self, key: str, default: _T) -> str | list | _T: ...
````

We can also do something similar for `get_attribute_list()`:

```python
  @overload
  def get_attribute_list(self, key: str) -> list[str | None]: ...
  @overload
  def get_attribute_list(self, key: str, default: _T) -> list[str | _T]: ...
```

Except this isn't quite right -- if default is a list, the
implementation returns it instead of a list[list], so we need to unwrap
it:
```python
  @overload
  def get_attribute_list(self, key: str) -> list[str | None]: ...
  @overload
  def get_attribute_list(self, key: str, default: list[_T]) -> list[str | _T]: ...
  @overload
  def get_attribute_list(self, key: str, default: _T) -> list[str | _T]: ...
```
@kemus
Copy link
Author

kemus commented Oct 17, 2024

If you want, I could also change it so we still only allow default to be str or list[str], I'm not necessarily attached to allowing any default value since I can't think of any cases where that's useful.

This comment has been minimized.

This comment has been minimized.

stubs/beautifulsoup4/bs4/element.pyi Outdated Show resolved Hide resolved
stubs/beautifulsoup4/bs4/element.pyi Outdated Show resolved Hide resolved
Co-authored-by: Sebastian Rittau <[email protected]>
@kemus
Copy link
Author

kemus commented Oct 19, 2024

Thanks for the suggestions @srittau -- I should've seen that I could combine those.

Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@srittau srittau merged commit 7043ec2 into python:main Oct 19, 2024
46 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants