Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: add type annotations to API surface #200

Closed
mitchellkennedy opened this issue Jul 29, 2021 · 19 comments
Closed

Enhancement: add type annotations to API surface #200

mitchellkennedy opened this issue Jul 29, 2021 · 19 comments
Labels

Comments

@mitchellkennedy
Copy link

I believe exceptions due to type errors when trying to parse the wrong datatype should be more clear.
I recently ran into a situation whereby I was inadvertently passing a PhoneNumber object type to the parse function. This resulted in the following error:
AttributeError: 'PhoneNumber' object has no attribute 'find'

Despite the fact I was passing a PhoneNumber object, it was not obvious to me in the moment that I needed to be passing a string based on the exception, which sent me investigating why that object lacked the attribute being called instead of letting me know that I was experiencing a TypeError and that the function was expecting a string in the first place.
This becomes more ambiguous if a user feeds the function an int, (which seems a much more likely mistake) which returns the following exception:
TypeError: object of type 'int' has no len()

The following code produces these exceptions depending on what is printed:

import phonenumbers

phone_number_int = 1111111111
phone_number_str = str(phone_number_int)
phone_number_object = phonenumbers.parse(phone_number_str, 'US')
phone_number_error = phonenumbers.parse(phone_number_int)

print(phone_number_error)

I believe a simple check of the number variable that raises a TypeError if not a string type and lets the user know the function is expecting a string would be helpful, and I would be happy to raise a pull request with this change.

Thank you.

@daviddrysdale
Copy link
Owner

Thanks for the suggestion. I've been thinking about the problem and I'm not currently inclined to add manual type checking for this particular entrypoint, for a few reasons:

  • The expected type is documented.
  • This is mostly only a problem when users are initially writing code that uses the library.
  • It might impact performance.
  • This is only one entrypoint among many.

I think the right solution for modern Python would be to add type annotations across the whole library API surface. However, there's a couple of problems with this:

  • It's a large change, which makes it harder to find time for it.
  • The library still supports Python 2.x, which restricts the available type annotation mechanisms.

(I'll leave this issue open in case I get superfluous leisure to investigate further)

@mitchellkennedy
Copy link
Author

I agree entirely with your assessment. Admittedly, I have not explored the library all that extensively and was assuming the parse function would be a singular- or one of a few functions that had a very high likelihood of a user accidentally passing an int.
On the matter of Python 2 compatibility, would PEP 484 as outlined here not be sufficient?

@daviddrysdale
Copy link
Owner

On the matter of Python 2 compatibility, would PEP 484 as outlined here not be sufficient?

I've only had a quick skim of the Python typing stuff, but it seems like earlier Python versions (2.x, maybe also 3.y for y<5) would need comment based type annotations, whereas later Python versions could use val: type annotations.

So adding annotations now would require the comment variant, but at some point in the future the direct annotations would be more appropriate.

(However, I may not have correctly understood what's available in what version.)

@mitchellkennedy
Copy link
Author

So adding annotations now would require the comment variant, but at some point in the future the direct annotations would be more appropriate.

I believe this to be correct. Using comment syntax should add no dependencies while being fully supported across all Python versions. While true that at some point in the future when older Python version support is dropped, it would be appropriate to migrate to function/variable annotations, refactoring should not be overly tedious.
Given these facts, should I, you, or anyone reading this find the time to implement this in a PR, would that design be acceptable to you?

@daviddrysdale daviddrysdale changed the title Exception unclear when parsing incorrect data type Enhancement: add type annotations to API surface Aug 4, 2021
@daviddrysdale
Copy link
Owner

Assuming that comment-based type annotations work across all Python versions (2.x & 3.x), then it would be great if someone had the time to do a PR to add them!

(If anyone does take this on, please drop a comment here to indicate interest and maybe start with a trial change that only updates a couple of API entrypoints.)

@mitchellkennedy
Copy link
Author

I had a few hours and added some test annotations to 7 functions: parse, as it was the one being discussed, and the first 6 sequentially in the file. This works as expected in Pycharm for all tested versions.
I do not currently have a working python2.5/6 install, but wanted to double check this is still being supported before going through the installation process.
I tried also to find any documentation on compatibility of annotations with pre-2.7 releases and found no results. I am hoping in the worst case, pre-2.7 releases simply treat that code as an actual comment.

@AA-Turner
Copy link
Contributor

AA-Turner commented Aug 23, 2021

@daviddrysdale / @mitchellkennedy -- I went through and had a go at this for a personal project a few months ago, using .pyi files (which sit along the normal .py files, alleviating the annotation syntax issues).

I was only using phonenumberslite, so the full library isn't annotated, but if its useful I'd be happy to try and integrate it here so the stubs can be "official"

https://github.com/AA-Turner/python-phonenumbers-stubs/tree/master/phonenumbers

Info on pyi / stub files: https://mypy.readthedocs.io/en/stable/stubs.html#creating-a-stub
I created the inital drafts of the stubs mostly with stubgen, and then went through manually checking that things made sense & adding where stubgen failed to infer types.

A

@daviddrysdale
Copy link
Owner

@AA-Turner , thanks for the suggestion – .pyi files sound like an interesting alternative. Do they work with IDEs like Pycharm?

More generally, I'm struggling a bit with using mypy to check things. Using @AA-Turner 's .pyi files I can get a clean check:

% mypy-3.9 phonenumbers --exclude phonenumbers/pb2 --show-error-code
Success: no issues found in 595 source files

but the checker doesn't seem to notice if I deliberately introduce a type error:

% git diff
diff --git a/python/phonenumbers/phonenumberutil.py b/python/phonenumbers/phonenumberutil.py
index 6a73a7c47b26..73b4c175957b 100644
--- a/python/phonenumbers/phonenumberutil.py
+++ b/python/phonenumbers/phonenumberutil.py
@@ -730,7 +730,7 @@ def length_of_geographical_area_code(numobj):
     if metadata.national_prefix is None and not numobj.italian_leading_zero:
         return 0
 
-    ntype = number_type(numobj)
+    ntype = number_type("string")
     country_code = numobj.country_code
     if (ntype == PhoneNumberType.MOBILE and
         (country_code in _GEO_MOBILE_COUNTRIES_WITHOUT_MOBILE_AREA_CODES)):

So I suspect I need to learn more about what Python type annotations are capable of.

@AA-Turner
Copy link
Contributor

.pyi files work with pycharm (if you middle click a standard library function there's often an asterisk next to the line number that will switch between the Python and the type annotations) - almost certainly works with VS code etc but don't have first hand experience.

Very odd on the deliberate error not being caught -- I don't have access to an IDE right now but should be able to check later in the week. This is more likely to be my misconfiguration than a problem in the type checker, if I had to guess.

A

@daviddrysdale
Copy link
Owner

Very odd on the deliberate error not being caught

It might well be user error…

@AA-Turner
Copy link
Contributor

AA-Turner commented Aug 27, 2021

Having done some reading, it isn't user error!

It turns out that mypy assumes .pyi files are canonical and so ignores the corresponding .py completely, including not type checking the .py files. I didn't realise this was the case, but this issue (python/mypy#3505) states as much. My previous usage of .pyi files was limited to knowing that they exist and the syntax, I hadn't actually used them for a project, as I prefer inline annotations. Sorry for this!

.pyi files would still be useful for external library consumers, specifying the types of the API surface, effectively. However, the other side of type-checking the library code itself doesn't exist. To get that benefit:

  • support for Python 2 would need to be dropped (which you've said you don't want to do xref Remove Testing and support for EOL Pythons #156), or
  • type comments would need to be used (for which there's less automatic tooling support to integrate into the main codebase as annotations, should that ever happen).

Therefore I would suggest for now just the "describe the API" part of static typing, with the "type check library code" bit to be deferred. This is mainly just to split up the work // for pragmatism purposes.


Out of interest, I managed to use https://github.com/ambv/retype to do the "type check library code" work -- this led to some corrections in the stubs, and also raised some type errors -- for example in PhoneMetadata.load_all, I believe the second loop should be using country_code not region_code on line 338 -- this came up as a type error of str used where int was expected to index the dictionary.

@classmethod
def load_all(kls):
"""Force immediate load of all metadata"""
# Force expansion of contents to lists because we invalidate the iterator
for region_code, loader in list(kls._region_available.items()):
if loader is not None: # pragma no cover
loader(region_code)
kls._region_available[region_code] = None
for country_code, loader in list(kls._country_code_available.items()):
if loader is not None:
loader(country_code)
kls._country_code_available[region_code] = None

Around 160 "errors" were raised in total -- nine tenths of these are due to nullable values, which could be tightened up via assert var is not None or etc -- mypy doesn't seem to like the design pattern of doing checks first and then calling a function that could return None, but the checks mean that it won't.

A

daviddrysdale added a commit that referenced this issue Aug 27, 2021
One minor, one more significant spotted at:
#200 (comment)
@daviddrysdale
Copy link
Owner

Thanks for the information; I think the distinction between "describe the API types" and "type-check the library itself" is very helpful.

In the near term, using .pyi files to address the former seems reasonable, particularly as it doesn't have any impact on the main codebase.

(In the longer term, I imagine I'll eventually drop support for Python 2.x and then the annotations can be merged from the .pyi files into var: Type-style annotations in the main codebase.)

Thanks also for the find in PhoneMetadata.load_all; I've got a fix in #205 (together with another more minor change that my experiments with type checking turned up). Were there any other findings that didn't seem benign?

@AA-Turner
Copy link
Contributor

How should we proceed RE adding .pyi files to the library? Do you want me to draft up a PR with the stubs I've done? (There would also need to be some minor packaging changes to pick up the files for distribution and signal that the library is typed, which I'd be happy to do)

Nothing else that I'd describe as bugs, though there are two things that the type checker throws a bit of a hissy fit over -- reusing variable names in the same scope with different types. This happens once in the function mentioned above (loader is a callable taking a string in the first loop and a callable taking an int in the second), and later on in PhoneMetadata.__init__ where id and kls_map are reused. This isn't a typing issue per se, more of limitations of current tooling -- but something that would need special handling if inline annotations were used (or a slight refactoring).

for region_code, loader in list(kls._region_available.items()):
if loader is not None: # pragma no cover
loader(region_code)
kls._region_available[region_code] = None
for country_code, loader in list(kls._country_code_available.items()):
if loader is not None:
loader(country_code)
kls._country_code_available[region_code] = None

if self.id == REGION_CODE_FOR_NON_GEO_ENTITY:
kls_map = PhoneMetadata._country_code_metadata
id = self.country_code
elif self.short_data:
kls_map = PhoneMetadata._short_region_metadata
id = self.id
else:
kls_map = PhoneMetadata._region_metadata
id = self.id
if id in kls_map:
other = kls_map[id]
if self != other:
raise Exception("Duplicate PhoneMetadata for %s (from %s:%s)" % (id, self.id, self.country_code))
else:
kls_map[id] = self

daviddrysdale added a commit that referenced this issue Aug 29, 2021
One minor, one more significant spotted at:
#200 (comment)

Also rename reused `loader` variable to help avoid false positives.
@daviddrysdale
Copy link
Owner

A PR with your stubs would be great, thank you.

There's a couple of things I'd like to investigate before merging, though:

  • I'd still like to have some sort of automation that works over the .pyi files (accepting that they just describe the API for external users). Is it possible to (say) type-check the test suite, treating it as a chunk of code that uses the library API? (The background concern here is that I don't personally use the type annotations, so I need something to tell me if I accidentally break them.)
  • Are .pyi files needed for all the autogenerated files under phonenumbers/*data/, or can they be skipped/omitted? These files are fairly internal-only, so it would be nice not to have lots of associated .pyi files if they're not needed.

I've also tried to refactor around the current false positives – does #205 look like it would fix things?

@AA-Turner
Copy link
Contributor

AA-Turner commented Sep 1, 2021

sort of automation

I found python/typeshed#754 and python/mypy#5028, which pointed me to mypy.stubtest -- have added a actions workflow to test with this. It seems that this is also how typeshed (the stdlib stub file repo) do their testing.

The test suite can't be easily used as it is untyped, so then you either get into a chicken and egg situation of factoring all of the test utilities into a distinct files that has a .pyi complement, or creating a fully-typed set of "tests" that don't check values but types -- I started doing this a bit before I found stubtest.

Are .pyi files needed for all the autogenerated files ... ?

No (ish) -- I've included the __init__.pyi files and geodata/locale.pyi as these are imported by other modules. So no hundreds of files!

PR 205

Looks good! The other types of places are in e.g. PhoneMetadata.XXX_for_region, it is possible for None to be returned, but a lot of implementation code expects a value and doesn't check for None -- technically this is a type error, the three ways to circumvent it would be by

  • raising an error in the function if no metadata are found (but breaks backwards compatibility)
  • always checking for is not None at call sites (which users using type checkers will likely end up doing)
  • statically defining the regions/codes that have metadata, and 'overloading' the function signature, defining that these valid values will always return metadata and others will return "None". But not sure this works as of the dynamic registration of metadata.

The same applies to phonenumberutil.(example_number | invalid_example_number | example_number_for_type | example_number_for_non_geo_entity | region_code_for_number | ndd_prefix_for_region) and to a lesser extent re_util.fullmatch, as it is only ever used in a boolean context.

mypy also found that shortnumberinfotest._parse is broken on the error condition -- self isn't passed in, and the format string takes 4 args whereas 3 are subbed in (need meta tests for test-suite utilities!)

A

@AA-Turner
Copy link
Contributor

AA-Turner commented Sep 1, 2021

Oh, and _BlockRange.__eq__ // _BlockRange.__ne__ -- these assume that the other operand is an instance of _BlockRange, which breaks liskov substitution on object.__(eq|ne)__.

Fix here would be to do as with e.g. PhoneNumber and return False or NotImplemented.

def __eq__(self, other):
    if not isinstance(other, _BlockRange):
        return NotImplemented
	return (self.start == other.start and self.end == other.end)

Now, I'd hope consumers aren't doing comparisons of internal private unicode helper types, but good to be thorough!

Edit: see PR #208 -- the issues above are more design decision level, so haven't done a PR for them (yet)

A

daviddrysdale added a commit that referenced this issue Sep 2, 2021
One minor, one more significant spotted at:
#200 (comment)

Also rename reused `loader` variable to help avoid false positives.
@daviddrysdale
Copy link
Owner

The other types of places are in e.g. PhoneMetadata.XXX_for_region, it is possible for None to be returned, but a lot of implementation code expects a value and doesn't check for None

I think most of these cases can't end up with None, because of earlier checks. I've added some asserts in #213.

mypy also found that shortnumberinfotest._parse is broken on the error condition -- self isn't passed in, and the format string takes 4 args whereas 3 are subbed in (need meta tests for test-suite utilities!)

Nice spot, fix in #213.

@daviddrysdale
Copy link
Owner

Fixed in #207 – many thanks @AA-Turner !

@AA-Turner
Copy link
Contributor

AA-Turner commented Sep 5, 2021

Brilliant! Thanks for the reviews :)

And if you move to Python versions that support inline annotations, the offer is very much open to move the stubs inline if wanted -- hopefully that would be a quicker review!

A

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants