Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type hints: Infer types from vegalite schema for autogenerated code #3208

Merged
merged 52 commits into from
Nov 22, 2023

Conversation

binste
Copy link
Contributor

@binste binste commented Sep 29, 2023

This will close #2854. See #2951 for an overview of the type hint effort

…e has multiple arguments with Undefined default value
…o not allow for combinations of different types in lists. This deals with the invariance of lists
…ect that the overload statements are correct
…move type hints from signatures in docs as they are already shown in docstrings
@binste binste mentioned this pull request Sep 29, 2023
14 tasks
@binste binste marked this pull request as ready for review October 1, 2023 13:58
@binste
Copy link
Contributor Author

binste commented Nov 5, 2023

Side note, please ignore if not related, is this related to this issue: #2877? Would it be possible to define a type for ArrayLike? Eg using range(-15, 46, 60) for scale-domain will currently raise:

SchemaValidationError: 'range(-15, 46, 60)' is an invalid value for `domain`. Valid values are of type 'array', 'string', or 'object'.

'unaggregated' was expected

Or is this merely a matter of converting an ArrayLike as the range in this case into a list within Altair, as is aimed for in #3237?

We'll need to first implement the conversion of anything that looks like a Sequence (or just any iterable?) to a list as in the Vega-Lite specification we need them to be lists (i.e. an array in JSON terminology). Afterwards, we could then update the type hints to accept any iterable/sequence. We might be able to stick with Sequence as a type hint.

@binste
Copy link
Contributor Author

binste commented Nov 5, 2023

GitHub shows our comments in this PR out of order for me :) Just something to be aware of:

image

@mattijn
Copy link
Contributor

mattijn commented Nov 5, 2023

Side note, please ignore if not related, is this related to this issue: #2877? Would it be possible to define a type for ArrayLike? Eg using range(-15, 46, 60) for scale-domain will currently raise:

SchemaValidationError: 'range(-15, 46, 60)' is an invalid value for `domain`. Valid values are of type 'array', 'string', or 'object'.

'unaggregated' was expected

Or is this merely a matter of converting an ArrayLike as the range in this case into a list within Altair, as is aimed for in #3237?

@binste
Copy link
Contributor Author

binste commented Nov 5, 2023

Thanks @binste. Can you elaborate a bit on the _ParameterProtocol type? You had to define it for circular reasons, but how can I make use of this type? See for example here, on hovering domain within .scale(domain=[-15, 45] ..: image It suggests that I can use something as such?

my_param = alt.param(value=[-15, 45])
.scale(domain=my_param ..)  # actually works :)

It means that wherever _ParameterProtocol is given, you can use an alt.Parameter which is usually created from alt.param or alt.selection_*. You would not use _ParameterProtocol directly, it's purely there for the benefit of a type checker. It's usually used for structural subtyping ("duck-typing"). For example, if you have a function that accepts any type which has a get_number method which returns an int and does not care about anything else about this type, you can do:

class Proto(Protocol):
    def get_number(self) -> int:
        ...

and use this as a type hint:

class A:
    def get_number(self) -> int:
         return 2
    def other_method(self) -> str:
         return "something"


def extract_number(x: Proto) -> int:
     return x.get_number()


a = A()
extract_number(a)  # Mypy will accept this

As you mentioned, here I had to use it to prevent circular imports as ideally, we could use alt.Parameter directly. Does that make sense? Happy to elaborate

Another question based on the type hints, and maybe it is something I've to get used to, but for me it was a bit confusing that Sequence[] is a type of list? You write a very valuable section on it around here, but I was wondering if we just could use list here? I understand that we then don't define what is possible within the list, but currently it feels a bit duplicated, that the mentioned duplicate types in the type hint are nested variants within the Sequence type.

We could use list which is more generic as it would accept a list with any arbitrary content. I think this is a trade off between how helpful a type checker can be (the more accurate type hints the better -> speaks in favour of Sequence[...] over list) vs. how helpful the function and method signatures are in an IDE (for advanced users, still speaks in favour of more complete type hints in my opinion, for the average user maybe in favour of list). I lean towards being more explicit. What do you think?

@binste binste closed this Nov 5, 2023
@binste binste reopened this Nov 5, 2023
@mattijn
Copy link
Contributor

mattijn commented Nov 6, 2023

Thanks for the clear explanation @binste. Much appreciated!

I feel there is a balance between how we suggest/hint how users should use the library and what is function-wise the correct type hint. It is great that we can make our type checker happy, but I'm hoping we also can use these type hints to be useful in communicating how users can use the altair API. I feel the battle that you've had with mypy (much appreciated!), but I'm trying to review as being an average user..

For my understanding, would it be possible to make a small example how this currently leads to circular imports?

I'm somehow hoping we can circumvent this definition of _ParameterProtocol by restructuring pieces of code or by introducing some more advanced OOP methods.

@binste
Copy link
Contributor Author

binste commented Nov 7, 2023

I feel the battle that you've had with mypy (much appreciated!), but I'm trying to review as being an average user..

That's much appreciated and exactly what this PR needs! It's easy to get too focused on just adding type hints everywhere and making mypy happy so I think the feedback from Joel and you have already improved this a lot :)

For my understanding, would it be possible to make a small example how this currently leads to circular imports?

I'm somehow hoping we can circumvent this definition of _ParameterProtocol by restructuring pieces of code or by introducing some more advanced OOP methods.

Sure! I hope as well that there is a feasible solution which I'm missing. Parameter is defined in api.py#L189. However, the autogenerated files core.py, channels.py, and mixin.py are imported at the beginning of api.py: https://github.com/binste/altair/blob/autoinfer_types/altair/vegalite/v5/api.py#L17. Taking core.py as an example, if we would import Parameter from api.py, then we have the circular import. Parameter also requires an import of the expr module which again imports core.py to complicate things.

To keep it maintainable, I think code should be imported from autogenerated code in schema folder -> 'hand-written' code in api.py, data.py, ... -> top-level imports making it all accessible under the altair namespace.

@mattijn
Copy link
Contributor

mattijn commented Nov 7, 2023

Thanks again for the extra explanations. I start to understand the issue. Would it be possible to define an abstract base class of the Parameter within core.py instead of the _ParameterProtocol? So the Parameter class within api.py can then become a subclass of the Parameter within core.py. Something as such:

api.py:
import core
import expr
from typing import Union, Optional
from typing import Dict as TypingDict


class Parameter(core.Parameter, expr.MySomethingExpr, object):
    _counter: int = 0

    @classmethod
    def _get_name(cls) -> str:
        cls._counter += 1
        return f"param_{cls._counter}"

    def to_dict(self) -> TypingDict[str, Union[str, dict]]:
        return {"param": self.name}

    def _to_expr(self) -> str:
        return f"{self.name}"

    def __init__(self, name: Optional[str] = None) -> None:
        if name is None:
            name = self._get_name()
        self.name = name


def selection(name) -> Parameter:
    parameter = Parameter(name)
    return parameter
core.py:
from abc import ABC, abstractmethod
from typing import Union
from typing import Dict as TypingDict


class Parameter(ABC):
    _counter: int

    @classmethod
    @abstractmethod
    def _get_name(cls) -> str:
        pass

    @abstractmethod
    def to_dict(self) -> TypingDict[str, Union[str, dict]]:
        pass

    @abstractmethod
    def _to_expr(self) -> str:
        pass


class MySomethingCore:
    def __init__(self, param: Parameter):
        self.param = param
expr.py:
class MySomethingExpr:
    ...

This code passed mypy and when in use, result in something as such:

import api
from core import MySomethingCore

my_sth_core = MySomethingCore(api.selection('my_param'))
type(my_sth_core.param), my_sth_core.param.name
(api.Parameter, 'my_param')

I've to admit, I tried to implement this POC within this PR, but I could not understand the errors that mypy gave. I've no idea if they were related to the change I tried to implement.

@binste
Copy link
Contributor Author

binste commented Nov 8, 2023

Thanks for the example and playing around with it! I can replicate the mypy errors when implementing it in this PR and I can't figure out what the issue is with this approach... It's not due to the naming of the class, I can rename the protocol to Parameter and everything works but as soon as it is a subclass of ABC it breaks down. Weird. If no one else has an idea here, I think we'll need to stick to it being a Protocol but we can of course change the naming.

I assume the improvement would be that Parameter is more intuitive for users then _ParameterProtocol right? I do see the value in the simpler name but I find classes with the same name rather tricky in Altair as everything is imported upwards with from ... import * statements. It then depends on the order of the imports here https://github.com/altair-viz/altair/blob/main/altair/vegalite/v5/__init__.py#L3 to make sure that the correct Parameter makes it to the top level.

How about renaming it to _Parameter? No user should instantiate a alt.Parameter directly anyway but instead use param so I think the underscore would not really matter. However, it would still clearly signal that this is not something to be used directly. I implemented it in the latest commit. What do you think @mattijn?

@mattijn
Copy link
Contributor

mattijn commented Nov 9, 2023

I agree with your point that it is not a good idea use the same name, make sense to do it like you've implemented it.

Another point. With this PR the tooltip looks like this:
image

The type hints are very detailed, but there is no extra documentation on the arguments.

But in the current main branch it looks as follows:
image

All the arguments are explained in detail after the Parameter heading.
Can we get these tooltips back again?

@binste
Copy link
Contributor Author

binste commented Nov 11, 2023

Oh, sure! I added it back in the latest commit:

image

The docstring got lost because I had to create a custom signature for encode and I forgot to add a docstring there so this should not be the case anywhere else.

@mattijn
Copy link
Contributor

mattijn commented Nov 11, 2023

Thanks @binste! Can you do the same for the mark_circle() etc, it seems still missing as well.

image

I was playing around and checking the type hints here and there (it's a lot, but informative!), but when I check the type hints of transform_filter I notice it accept also Any among others, that is a bit strange I think, and here I see it accept Parameter instead of _Parameter elsewhere.
image

@binste
Copy link
Contributor Author

binste commented Nov 11, 2023

The mark_ methods never had a more extensive docstring than this, it's the same on main. But I agree that we should have this! I opened #3262 to track it as I'd prefer to do it in a separate PR as it is not directly related to type hints.

Regarding transform_filter, this is because Pylance (and probably mypy as well) can't resolve expr.core.Expression due to how expr module is both a module but also a callable:

image

But good that you bring it up, I realised that if I change the import statement to from ...expr import core as _expr_core and then use that instead of expr.core, it works:

image

I only used _Parameter in the autogenerated files where otherwise we would have circular imports. In all other files, I went with Parameter directly. No preference if we want to consistently use the _Parameter protocol or use Parameter where possible (current implementation).

Copy link
Contributor

@mattijn mattijn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work here @binste! I approve:).

Since this involves the Altair API's user-facing side, having an additional review would be much appreciated. No rush, but would you be willing to have a look to his @jonmmease?

@binste
Copy link
Contributor Author

binste commented Nov 11, 2023

Thanks @mattijn for the very helpful review! :) Sounds good, I'll see if someone else finds a moment to have a look at the changes and in the meantime, I'll already try to prepare the final changes in a separate PR so that we can mark Altair as typed.

Copy link
Contributor

@jonmmease jonmmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for the epic work on this @binste! This is a great step forward for Altair as a foundational library in the PyData ecosystem.

After playing with the branch for a bit, I only came up with one non-blocking suggestion:

Would it be possible to replace BinParams, ImputeParams, TitleParams with the shorter Bin, Impute, and Title aliases?

Screenshot 2023-11-22 at 6 28 41 AM

@binste
Copy link
Contributor Author

binste commented Nov 22, 2023

Thanks @jonmmease for the kind words and the review! :)

I agree that this would be nicer. Unfortunately, these aliases are defined in api.py and we'd need them in the autogenerated files -> Leads to circular imports. It's the same reason why we have to use the _Parameter protocol instead of Parameter directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

infer argument types from vegalite schema
4 participants