Enhancement: add type annotations to API surface #200

mitchellkennedy · 2021-07-29T06:25:55Z

I believe exceptions due to type errors when trying to parse the wrong datatype should be more clear.
I recently ran into a situation whereby I was inadvertently passing a PhoneNumber object type to the parse function. This resulted in the following error:
AttributeError: 'PhoneNumber' object has no attribute 'find'

Despite the fact I was passing a PhoneNumber object, it was not obvious to me in the moment that I needed to be passing a string based on the exception, which sent me investigating why that object lacked the attribute being called instead of letting me know that I was experiencing a TypeError and that the function was expecting a string in the first place.
This becomes more ambiguous if a user feeds the function an int, (which seems a much more likely mistake) which returns the following exception:
TypeError: object of type 'int' has no len()

The following code produces these exceptions depending on what is printed:

import phonenumbers

phone_number_int = 1111111111
phone_number_str = str(phone_number_int)
phone_number_object = phonenumbers.parse(phone_number_str, 'US')
phone_number_error = phonenumbers.parse(phone_number_int)

print(phone_number_error)

I believe a simple check of the number variable that raises a TypeError if not a string type and lets the user know the function is expecting a string would be helpful, and I would be happy to raise a pull request with this change.

Thank you.

The text was updated successfully, but these errors were encountered:

daviddrysdale · 2021-08-03T10:45:26Z

Thanks for the suggestion. I've been thinking about the problem and I'm not currently inclined to add manual type checking for this particular entrypoint, for a few reasons:

The expected type is documented.
This is mostly only a problem when users are initially writing code that uses the library.
It might impact performance.
This is only one entrypoint among many.

I think the right solution for modern Python would be to add type annotations across the whole library API surface. However, there's a couple of problems with this:

It's a large change, which makes it harder to find time for it.
The library still supports Python 2.x, which restricts the available type annotation mechanisms.

(I'll leave this issue open in case I get superfluous leisure to investigate further)

mitchellkennedy · 2021-08-04T06:53:05Z

I agree entirely with your assessment. Admittedly, I have not explored the library all that extensively and was assuming the parse function would be a singular- or one of a few functions that had a very high likelihood of a user accidentally passing an int.
On the matter of Python 2 compatibility, would PEP 484 as outlined here not be sufficient?

daviddrysdale · 2021-08-04T07:23:12Z

On the matter of Python 2 compatibility, would PEP 484 as outlined here not be sufficient?

I've only had a quick skim of the Python typing stuff, but it seems like earlier Python versions (2.x, maybe also 3.y for y<5) would need comment based type annotations, whereas later Python versions could use val: type annotations.

So adding annotations now would require the comment variant, but at some point in the future the direct annotations would be more appropriate.

(However, I may not have correctly understood what's available in what version.)

mitchellkennedy · 2021-08-04T07:55:18Z

So adding annotations now would require the comment variant, but at some point in the future the direct annotations would be more appropriate.

I believe this to be correct. Using comment syntax should add no dependencies while being fully supported across all Python versions. While true that at some point in the future when older Python version support is dropped, it would be appropriate to migrate to function/variable annotations, refactoring should not be overly tedious.
Given these facts, should I, you, or anyone reading this find the time to implement this in a PR, would that design be acceptable to you?

daviddrysdale · 2021-08-04T08:14:44Z

Assuming that comment-based type annotations work across all Python versions (2.x & 3.x), then it would be great if someone had the time to do a PR to add them!

(If anyone does take this on, please drop a comment here to indicate interest and maybe start with a trial change that only updates a couple of API entrypoints.)

mitchellkennedy · 2021-08-04T10:13:43Z

I had a few hours and added some test annotations to 7 functions: parse, as it was the one being discussed, and the first 6 sequentially in the file. This works as expected in Pycharm for all tested versions.
I do not currently have a working python2.5/6 install, but wanted to double check this is still being supported before going through the installation process.
I tried also to find any documentation on compatibility of annotations with pre-2.7 releases and found no results. I am hoping in the worst case, pre-2.7 releases simply treat that code as an actual comment.

AA-Turner · 2021-08-23T01:35:48Z

@daviddrysdale / @mitchellkennedy -- I went through and had a go at this for a personal project a few months ago, using .pyi files (which sit along the normal .py files, alleviating the annotation syntax issues).

I was only using phonenumberslite, so the full library isn't annotated, but if its useful I'd be happy to try and integrate it here so the stubs can be "official"

https://github.com/AA-Turner/python-phonenumbers-stubs/tree/master/phonenumbers

Info on pyi / stub files: https://mypy.readthedocs.io/en/stable/stubs.html#creating-a-stub
I created the inital drafts of the stubs mostly with stubgen, and then went through manually checking that things made sense & adding where stubgen failed to infer types.

A

daviddrysdale · 2021-08-24T17:26:09Z

@AA-Turner , thanks for the suggestion – .pyi files sound like an interesting alternative. Do they work with IDEs like Pycharm?

More generally, I'm struggling a bit with using mypy to check things. Using @AA-Turner 's .pyi files I can get a clean check:

% mypy-3.9 phonenumbers --exclude phonenumbers/pb2 --show-error-code
Success: no issues found in 595 source files

but the checker doesn't seem to notice if I deliberately introduce a type error:

% git diff
diff --git a/python/phonenumbers/phonenumberutil.py b/python/phonenumbers/phonenumberutil.py
index 6a73a7c47b26..73b4c175957b 100644
--- a/python/phonenumbers/phonenumberutil.py
+++ b/python/phonenumbers/phonenumberutil.py
@@ -730,7 +730,7 @@ def length_of_geographical_area_code(numobj):
     if metadata.national_prefix is None and not numobj.italian_leading_zero:
         return 0
 
-    ntype = number_type(numobj)
+    ntype = number_type("string")
     country_code = numobj.country_code
     if (ntype == PhoneNumberType.MOBILE and
         (country_code in _GEO_MOBILE_COUNTRIES_WITHOUT_MOBILE_AREA_CODES)):

So I suspect I need to learn more about what Python type annotations are capable of.

AA-Turner · 2021-08-24T23:33:54Z

.pyi files work with pycharm (if you middle click a standard library function there's often an asterisk next to the line number that will switch between the Python and the type annotations) - almost certainly works with VS code etc but don't have first hand experience.

Very odd on the deliberate error not being caught -- I don't have access to an IDE right now but should be able to check later in the week. This is more likely to be my misconfiguration than a problem in the type checker, if I had to guess.

A

daviddrysdale · 2021-08-25T08:32:51Z

Very odd on the deliberate error not being caught

It might well be user error…

AA-Turner · 2021-08-27T02:28:40Z

Having done some reading, it isn't user error!

It turns out that mypy assumes .pyi files are canonical and so ignores the corresponding .py completely, including not type checking the .py files. I didn't realise this was the case, but this issue (python/mypy#3505) states as much. My previous usage of .pyi files was limited to knowing that they exist and the syntax, I hadn't actually used them for a project, as I prefer inline annotations. Sorry for this!

.pyi files would still be useful for external library consumers, specifying the types of the API surface, effectively. However, the other side of type-checking the library code itself doesn't exist. To get that benefit:

support for Python 2 would need to be dropped (which you've said you don't want to do xref Remove Testing and support for EOL Pythons #156), or
type comments would need to be used (for which there's less automatic tooling support to integrate into the main codebase as annotations, should that ever happen).

Therefore I would suggest for now just the "describe the API" part of static typing, with the "type check library code" bit to be deferred. This is mainly just to split up the work // for pragmatism purposes.

Out of interest, I managed to use https://github.com/ambv/retype to do the "type check library code" work -- this led to some corrections in the stubs, and also raised some type errors -- for example in PhoneMetadata.load_all, I believe the second loop should be using country_code not region_code on line 338 -- this came up as a type error of str used where int was expected to index the dictionary.

python-phonenumbers/python/phonenumbers/phonemetadata.py

Lines 327 to 338 in bd23712

    
           @classmethod 
        
           def load_all(kls): 
        
               """Force immediate load of all metadata""" 
        
               # Force expansion of contents to lists because we invalidate the iterator 
        
               for region_code, loader in list(kls._region_available.items()): 
        
                   if loader is not None:  # pragma no cover 
        
                       loader(region_code) 
        
                       kls._region_available[region_code] = None 
        
               for country_code, loader in list(kls._country_code_available.items()): 
        
                   if loader is not None: 
        
                       loader(country_code) 
        
                       kls._country_code_available[region_code] = None

Around 160 "errors" were raised in total -- nine tenths of these are due to nullable values, which could be tightened up via assert var is not None or etc -- mypy doesn't seem to like the design pattern of doing checks first and then calling a function that could return None, but the checks mean that it won't.

A

One minor, one more significant spotted at: #200 (comment)

daviddrysdale · 2021-08-27T07:36:12Z

Thanks for the information; I think the distinction between "describe the API types" and "type-check the library itself" is very helpful.

In the near term, using .pyi files to address the former seems reasonable, particularly as it doesn't have any impact on the main codebase.

(In the longer term, I imagine I'll eventually drop support for Python 2.x and then the annotations can be merged from the .pyi files into var: Type-style annotations in the main codebase.)

Thanks also for the find in PhoneMetadata.load_all; I've got a fix in #205 (together with another more minor change that my experiments with type checking turned up). Were there any other findings that didn't seem benign?

AA-Turner · 2021-08-29T01:14:04Z

How should we proceed RE adding .pyi files to the library? Do you want me to draft up a PR with the stubs I've done? (There would also need to be some minor packaging changes to pick up the files for distribution and signal that the library is typed, which I'd be happy to do)

Nothing else that I'd describe as bugs, though there are two things that the type checker throws a bit of a hissy fit over -- reusing variable names in the same scope with different types. This happens once in the function mentioned above (loader is a callable taking a string in the first loop and a callable taking an int in the second), and later on in PhoneMetadata.__init__ where id and kls_map are reused. This isn't a typing issue per se, more of limitations of current tooling -- but something that would need special handling if inline annotations were used (or a slight refactoring).

python-phonenumbers/python/phonenumbers/phonemetadata.py

Lines 331 to 338 in bd23712

    
           for region_code, loader in list(kls._region_available.items()): 
        
               if loader is not None:  # pragma no cover 
        
                   loader(region_code) 
        
                   kls._region_available[region_code] = None 
        
           for country_code, loader in list(kls._country_code_available.items()): 
        
               if loader is not None: 
        
                   loader(country_code) 
        
                   kls._country_code_available[region_code] = None

python-phonenumbers/python/phonenumbers/phonemetadata.py

Lines 543 to 557 in bd23712

    
           if self.id == REGION_CODE_FOR_NON_GEO_ENTITY: 
        
               kls_map = PhoneMetadata._country_code_metadata 
        
               id = self.country_code 
        
           elif self.short_data: 
        
               kls_map = PhoneMetadata._short_region_metadata 
        
               id = self.id 
        
           else: 
        
               kls_map = PhoneMetadata._region_metadata 
        
               id = self.id 
        
           if id in kls_map: 
        
               other = kls_map[id] 
        
               if self != other: 
        
                   raise Exception("Duplicate PhoneMetadata for %s (from %s:%s)" % (id, self.id, self.country_code)) 
        
           else: 
        
               kls_map[id] = self

One minor, one more significant spotted at: #200 (comment) Also rename reused `loader` variable to help avoid false positives.

daviddrysdale · 2021-08-30T07:38:02Z

A PR with your stubs would be great, thank you.

There's a couple of things I'd like to investigate before merging, though:

I'd still like to have some sort of automation that works over the .pyi files (accepting that they just describe the API for external users). Is it possible to (say) type-check the test suite, treating it as a chunk of code that uses the library API? (The background concern here is that I don't personally use the type annotations, so I need something to tell me if I accidentally break them.)
Are .pyi files needed for all the autogenerated files under phonenumbers/*data/, or can they be skipped/omitted? These files are fairly internal-only, so it would be nice not to have lots of associated .pyi files if they're not needed.

I've also tried to refactor around the current false positives – does #205 look like it would fix things?

AA-Turner · 2021-09-01T21:47:09Z

sort of automation

I found python/typeshed#754 and python/mypy#5028, which pointed me to mypy.stubtest -- have added a actions workflow to test with this. It seems that this is also how typeshed (the stdlib stub file repo) do their testing.

The test suite can't be easily used as it is untyped, so then you either get into a chicken and egg situation of factoring all of the test utilities into a distinct files that has a .pyi complement, or creating a fully-typed set of "tests" that don't check values but types -- I started doing this a bit before I found stubtest.

Are .pyi files needed for all the autogenerated files ... ?

No (ish) -- I've included the __init__.pyi files and geodata/locale.pyi as these are imported by other modules. So no hundreds of files!

PR 205

Looks good! The other types of places are in e.g. PhoneMetadata.XXX_for_region, it is possible for None to be returned, but a lot of implementation code expects a value and doesn't check for None -- technically this is a type error, the three ways to circumvent it would be by

raising an error in the function if no metadata are found (but breaks backwards compatibility)
always checking for is not None at call sites (which users using type checkers will likely end up doing)
statically defining the regions/codes that have metadata, and 'overloading' the function signature, defining that these valid values will always return metadata and others will return "None". But not sure this works as of the dynamic registration of metadata.

The same applies to phonenumberutil.(example_number | invalid_example_number | example_number_for_type | example_number_for_non_geo_entity | region_code_for_number | ndd_prefix_for_region) and to a lesser extent re_util.fullmatch, as it is only ever used in a boolean context.

mypy also found that shortnumberinfotest._parse is broken on the error condition -- self isn't passed in, and the format string takes 4 args whereas 3 are subbed in (need meta tests for test-suite utilities!)

A

AA-Turner · 2021-09-01T23:59:21Z

Oh, and _BlockRange.__eq__ // _BlockRange.__ne__ -- these assume that the other operand is an instance of _BlockRange, which breaks liskov substitution on object.__(eq|ne)__.

Fix here would be to do as with e.g. PhoneNumber and return False or NotImplemented.

def __eq__(self, other):
    if not isinstance(other, _BlockRange):
        return NotImplemented
	return (self.start == other.start and self.end == other.end)

Now, I'd hope consumers aren't doing comparisons of internal private unicode helper types, but good to be thorough!

Edit: see PR #208 -- the issues above are more design decision level, so haven't done a PR for them (yet)

A

One minor, one more significant spotted at: #200 (comment) Also rename reused `loader` variable to help avoid false positives.

daviddrysdale · 2021-09-04T09:44:03Z

The other types of places are in e.g. PhoneMetadata.XXX_for_region, it is possible for None to be returned, but a lot of implementation code expects a value and doesn't check for None

I think most of these cases can't end up with None, because of earlier checks. I've added some asserts in #213.

mypy also found that shortnumberinfotest._parse is broken on the error condition -- self isn't passed in, and the format string takes 4 args whereas 3 are subbed in (need meta tests for test-suite utilities!)

Nice spot, fix in #213.

daviddrysdale · 2021-09-05T16:56:59Z

Fixed in #207 – many thanks @AA-Turner !

AA-Turner · 2021-09-05T17:44:15Z

Brilliant! Thanks for the reviews :)

And if you move to Python versions that support inline annotations, the offer is very much open to move the stubs inline if wanted -- hopefully that would be a quicker review!

A

daviddrysdale changed the title ~~Exception unclear when parsing incorrect data type~~ Enhancement: add type annotations to API surface Aug 4, 2021

mitchellkennedy mentioned this issue Aug 4, 2021

Adding trial type annotations to functions with correct return types. #202

Closed

daviddrysdale mentioned this issue Aug 24, 2021

fixed wrong if condition due region could be false #204

Closed

daviddrysdale added a commit that referenced this issue Aug 27, 2021

Fix a couple of type errors

be4cf14

One minor, one more significant spotted at: #200 (comment)

daviddrysdale mentioned this issue Aug 27, 2021

Fix a couple of type errors #205

Merged

daviddrysdale added a commit that referenced this issue Aug 29, 2021

Fix a couple of type errors

56a0061

One minor, one more significant spotted at: #200 (comment) Also rename reused `loader` variable to help avoid false positives.

AA-Turner mentioned this issue Sep 1, 2021

Add API type definitions through stub files #207

Merged

AA-Turner mentioned this issue Sep 2, 2021

🏷️ Restore Liskov adherence in _BlockRange equality operators #208

Merged

daviddrysdale added a commit that referenced this issue Sep 2, 2021

Fix a couple of type errors

48d4bf1

One minor, one more significant spotted at: #200 (comment) Also rename reused `loader` variable to help avoid false positives.

daviddrysdale closed this as completed Sep 5, 2021

daviddrysdale mentioned this issue Sep 6, 2021

Check for non-None expectations #214

Open

daviddrysdale added the feature label Apr 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhancement: add type annotations to API surface #200

Enhancement: add type annotations to API surface #200

mitchellkennedy commented Jul 29, 2021

daviddrysdale commented Aug 3, 2021

mitchellkennedy commented Aug 4, 2021

daviddrysdale commented Aug 4, 2021

mitchellkennedy commented Aug 4, 2021

daviddrysdale commented Aug 4, 2021

mitchellkennedy commented Aug 4, 2021

AA-Turner commented Aug 23, 2021 •

edited

Loading

daviddrysdale commented Aug 24, 2021

AA-Turner commented Aug 24, 2021

daviddrysdale commented Aug 25, 2021

AA-Turner commented Aug 27, 2021 •

edited

Loading

daviddrysdale commented Aug 27, 2021

AA-Turner commented Aug 29, 2021

daviddrysdale commented Aug 30, 2021

AA-Turner commented Sep 1, 2021 •

edited

Loading

AA-Turner commented Sep 1, 2021 •

edited

Loading

daviddrysdale commented Sep 4, 2021

daviddrysdale commented Sep 5, 2021

AA-Turner commented Sep 5, 2021 •

edited

Loading

Enhancement: add type annotations to API surface #200

Enhancement: add type annotations to API surface #200

Comments

mitchellkennedy commented Jul 29, 2021

daviddrysdale commented Aug 3, 2021

mitchellkennedy commented Aug 4, 2021

daviddrysdale commented Aug 4, 2021

mitchellkennedy commented Aug 4, 2021

daviddrysdale commented Aug 4, 2021

mitchellkennedy commented Aug 4, 2021

AA-Turner commented Aug 23, 2021 • edited Loading

daviddrysdale commented Aug 24, 2021

AA-Turner commented Aug 24, 2021

daviddrysdale commented Aug 25, 2021

AA-Turner commented Aug 27, 2021 • edited Loading

daviddrysdale commented Aug 27, 2021

AA-Turner commented Aug 29, 2021

daviddrysdale commented Aug 30, 2021

AA-Turner commented Sep 1, 2021 • edited Loading

AA-Turner commented Sep 1, 2021 • edited Loading

daviddrysdale commented Sep 4, 2021

daviddrysdale commented Sep 5, 2021

AA-Turner commented Sep 5, 2021 • edited Loading

AA-Turner commented Aug 23, 2021 •

edited

Loading

AA-Turner commented Aug 27, 2021 •

edited

Loading

AA-Turner commented Sep 1, 2021 •

edited

Loading

AA-Turner commented Sep 1, 2021 •

edited

Loading

AA-Turner commented Sep 5, 2021 •

edited

Loading