Document "the Any trick" (#11117)
Avasam authored Dec 18, 2023
1 parent 6621586 commit 9d8188c
Showing 1 changed file with 55 additions and 0 deletions.
55 changes: 55 additions & 0 deletions CONTRIBUTING.md
@@ -570,6 +570,61 @@ It's a "to do" item and should be replaced if possible. `Any` is used when
it's not possible to accurately type an item using the current type system.
It should be used sparingly.

### "The `Any` trick"

Consider the following (simplified) signature of `re.Match[str].group`:

```python
from typing import Any

class Match:
    def group(self, __group: str | int) -> str | Any: ...
```

At first, the `str | Any` return type seems unnecessary and odd.
Because `Any` is compatible with every type, including `str`, you might expect
`str | Any` to be equivalent to `Any`, but it is not. To understand the difference,
let's look at what a type checker does with this simplified example:

Suppose you have a legacy system that for historical reasons has two kinds
of user IDs. Old IDs look like `"legacy_userid_123"` and new IDs look like
`"456_username"`. The function below is supposed to extract the name from a
new ID and return it uppercased (`"USERNAME"`), and to return `None` if you
give it a legacy ID.

```python
import re

def parse_name_from_new_id(user_id: str) -> str | None:
    match = re.fullmatch(r"\d+_(.*)", user_id)
    if match is None:
        return None
    name_group = match.group(1)
    return name_group.uper()  # This line is a typo (`uper` --> `upper`)
```

The `.group()` method returns `None` when the given group did not participate in the match.
For example, with a regex like `r"\d+_(.*)|legacy_userid_\d+"`, we would get a match whose `.group(1)` is `None` for the user ID `"legacy_userid_7"`.
In `parse_name_from_new_id`, however, the regex is written so that group 1 is part of every successful match, so `match.group(1)` cannot return `None`.
Match groups are almost always used in this way.
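
The runtime behavior described above is easy to check directly. This small sketch reuses the alternation regex from the example; the names `EITHER_ID`, `legacy`, `new`, `only_new` and `m` are illustrative only.

```python
import re

# Illustrative regex with an alternation: group 1 exists only in the first branch.
EITHER_ID = r"\d+_(.*)|legacy_userid_\d+"

legacy = re.fullmatch(EITHER_ID, "legacy_userid_7")
new = re.fullmatch(EITHER_ID, "456_username")
assert legacy is not None and new is not None  # both IDs match the regex

print(legacy.group(1))  # None: group 1 did not participate in this match
print(new.group(1))  # username

# With the regex used in parse_name_from_new_id, group 1 is part of every
# successful match, and a legacy ID simply does not match at all.
only_new = re.compile(r"\d+_(.*)")
m = only_new.fullmatch("456_username")
assert m is not None
print(m.group(1))  # username
print(only_new.fullmatch("legacy_userid_7"))  # None
```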

Let's now consider typeshed's `-> str | Any` annotation of the `.group()` method:

* `-> Any` would mean "please do not complain" to type checkers.
If `name_group` has type `Any`, you will get no error for the `uper` typo.
* `-> str` would mean "will always be a `str`", which is wrong, and would
cause type checkers to emit errors for code like `if name_group is None`.
* `-> str | None` means "you must check for None", which is correct but can get
annoying for some common patterns. Checks like `assert name_group is not None`
would need to be added in various places only to satisfy type checkers,
even when it is impossible to actually get a `None` value
(type checkers aren't smart enough to know this).
* `-> str | Any` means "must be prepared to handle a `str`". You will get an
error for `name_group.uper`, because it is not valid when `name_group` is a
`str`. But type checkers are happy with `if name_group is None` checks,
because the annotation says it can also be something other than a `str`.
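
To make the comparison concrete, here is a sketch using four hypothetical functions (`group_any`, `group_str`, `group_optional`, `group_str_or_any`; none of them are part of `re`), one per candidate annotation. Their bodies are never run; only what a type checker sees matters. The comments describe what a checker such as mypy or pyright is expected to report, but exact messages, and whether the `is None` check on a plain `str` is flagged at all, depend on the checker and its settings.

```python
from __future__ import annotations

from typing import Any


# Hypothetical stand-ins for the four candidate annotations of `.group()`.
def group_any() -> Any:
    raise NotImplementedError


def group_str() -> str:
    raise NotImplementedError


def group_optional() -> str | None:
    raise NotImplementedError


def group_str_or_any() -> str | Any:
    raise NotImplementedError


def with_any() -> None:
    name = group_any()
    name.uper()  # not reported: `Any` turns off checking, so the typo slips through


def with_str() -> None:
    name = group_str()
    if name is None:  # may be reported (e.g. as unreachable), depending on settings
        return


def with_optional() -> str:
    name = group_optional()
    assert name is not None  # needed only to satisfy the type checker
    return name.upper()


def with_str_or_any() -> str | None:
    name = group_str_or_any()
    if name is None:  # accepted: the `Any` part of the union may be None
        return None
    name.uper()  # reported: `str` has no attribute "uper"
    return name.upper()
```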

In typeshed we unofficially call returning `Foo | Any` "the Any trick".
We tend to use it whenever something can be `None`,
but requiring users to check for `None` would be more painful than helpful.
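
Putting it together, a minimal sketch of the fixed example shows that a `str | Any` return type accepts both styles: calling `.upper()` directly when the group always participates, and an explicit `None` check when it might not. The second function, `parse_name_defensively`, is a hypothetical variant that uses the alternation regex from earlier.

```python
from __future__ import annotations

import re


def parse_name_from_new_id(user_id: str) -> str | None:
    match = re.fullmatch(r"\d+_(.*)", user_id)
    if match is None:
        return None  # legacy ID: the regex does not match at all
    name_group = match.group(1)
    # Typo fixed; with `str | Any`, `.upper()` type-checks without a None check.
    return name_group.upper()


def parse_name_defensively(user_id: str) -> str | None:
    match = re.fullmatch(r"\d+_(.*)|legacy_userid_\d+", user_id)
    if match is None:
        return None
    name_group = match.group(1)
    # Here group 1 really can be None; the check is accepted because of the
    # `Any` half of the union.
    if name_group is None:
        return None
    return name_group.upper()
```

For example, `parse_name_from_new_id("456_username")` returns `"USERNAME"`, while both functions return `None` for a legacy ID.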

## Submitting Changes

Even more excellent than a good bug report is a fix for a bug, or the