Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP/CI: Demonstrate possible integration of VirtusLab's type tests #45561

Closed
wants to merge 2 commits into from
Closed

TYP/CI: Demonstrate possible integration of VirtusLab's type tests #45561

wants to merge 2 commits into from

Conversation

twoertwein
Copy link
Member

Demonstrate the integration of positive type tests from https://github.com/VirtusLab/pandas-stubs for the public API (tests import
some private modules?).

This PR should NOT be merged and this PR probably violates licenses/copyrights. It is probably best if a member of the VirtusLab re-creates this PR. This PR is intended only to demonstrate how the snippets from VirtusLab could be integrated.

The typing errors would need to be ignored (and if pyright needs further ignores, # pyright: <rulename> = false at the top of a file). Pyright checks could also be skipped for now. There might be better ways to exclude/include files with pyright.

xref #45252

@pep8speaks
Copy link

pep8speaks commented Jan 23, 2022

Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-23 18:06:54 UTC

@simonjayhawkins
Copy link
Member

Thanks @twoertwein it may be worth getting the typing check to green for the POC. (pep less important)

what's the proposal for dealing with "reportUnusedVariable". would we just disable that check on the test files?

df[["col1", "col2"]] = [[1, 2], [3, 4]]
df[s] = [5, 6]
df[a] = [[1, 2], [3, 4]]
df[select_df] = [1, 2, 3]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work: ValueError: cannot assign mismatch length to masked array

Copy link
Contributor

@zkrolikowski-vl zkrolikowski-vl Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Investigating this as a possible regression from pandas version 1.3.2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type tests finding non-typing bugs :) If this worked in previous versions, feel free to open an issue!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not first time that this happened. See #40440 originally spotted by @joannasendorek.

I've opened up an issue: #45593. I'll circumvent this for now in my PR which is pending.

@twoertwein
Copy link
Member Author

twoertwein commented Jan 23, 2022

what's the proposal for dealing with "reportUnusedVariable". would we just disable that check on the test files?

I now limit pyright to the same config as we use everywhere else, but I enable its general type error checking (the most important option).

edit:
There are still a few cases where mypy and pyright are not in sync: if a function has no return annotation, pyright tries to infer one. In case of read_csv: pyright infers TextFileReader | Unknown (Unkown since, it doesn't know what parser.read() returns), correct would be TextFileReader | DataFrame. If mypy had TextFileReader | DataFrame as a return annotation, it would also create a few more errors.

@twoertwein
Copy link
Member Author

Thanks @twoertwein it may be worth getting the typing check to green for the POC. (pep less important)

@simonjayhawkins Type checks pass now (except one issue existing on main) and the CI is otherwise mostly green (except the one test mentioned above).


rdf4: pd.DataFrame = pd.concat(list(map(lambda x: s2, ["some_value", 3])), axis=1)


def test_types_json_normalize() -> None:
data1: List[Dict[str, Any]] = [
{"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
{"name": {"given": "Mose", "family": "Regner"}},
{"name": {"given": "More", "family": "Regner"}},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make codespell happy

@zkrolikowski-vl
Copy link
Contributor

@twoertwein So as more stubs are added we'll keep removing the type: ignore comments?

Let me look at the the invalid test case, and then I'll open a PR of my own based on this one - based on your preference. I agree that it's probably realistically more straight-forward in terms of the copyright/license.

@twoertwein
Copy link
Member Author

@twoertwein So as more stubs are added we'll keep removing the type: ignore comments?

I think that is the idea. Ideally, we want to achieve that with inline annotations (and pyi for cython/c). Knowing where we need ignores also helps us to prioritize annotating code.

Let me look at the the invalid test case, and then I'll open a PR of my own based on this one - based on your preference. I agree that it's probably realistically more straight-forward in terms of the copyright/license.

Great! You could also create a PR where multiple people from the VirtusLab are involved, in case attribution is important. You are welcome to either copy&past my follow-up commit or apply it as a patch, which ever you prefer.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 24, 2022

@twoertwein Thanks for doing this.

It seems that this would use the code from Virtus Labs to check our existing type stubs within the pandas code. Then when we brought over the pyi files from the Microsoft stubs, it would do the same. Am I correct about that?

@twoertwein
Copy link
Member Author

It seems that this would use the code from Virtus Labs to check our existing type stubs within the pandas code.

Yes, this simply copies the test snippets from Virtus Labs (but not their pyi files). This PR tries to address only #45252 and not #45253.

Then when we brought over the pyi files from the Microsoft stubs, it would do the same. Am I correct about that?

I'm sorry, I don't understand what you mean. Merging some of the Microsoft pyi files for cython files (e.g., offesets.pyi #45331 and interval.pyi #44922) might help reduce the number of ignores.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 24, 2022

Then when we brought over the pyi files from the Microsoft stubs, it would do the same. Am I correct about that?

I'm sorry, I don't understand what you mean. Merging some of the Microsoft pyi files for cython files (e.g., offesets.pyi #45331 and interval.pyi #44922) might help reduce the number of ignores.

I was referring to #45253 .

Following up on the discussion we had at the January developer meeting, @simonjayhawkins brought up that our current typing tests are meant to test the types of our internals, while if we create .pyi files throughout the code, which benefit external users, it could hurt some of that internal testing. So I had the following idea, which I don't know whether it is possible. Assuming we have a bunch of .pyi files throughout the pandas code, which is how we distribute the code, what if we ran mypy and pyright without them (maybe just keeping the ones corresponding to cython code) to check the internals, and then we run the type checking again, using the tests here (typing/valid and typing/invalid) on the full code with all pyi files present. Then we have a testing scheme for the internals and for the external pyi files.

@twoertwein
Copy link
Member Author

If we have two sets of type annotations (more complete public pyi files and the in-progress inline annotations), it would be great to test both!

If pyi files take precedence over inline (I'm not sure about that), your idea should be easily possible (run mypy&pyright twice, but for the second time delete some of the pyi files).

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 24, 2022

If we have two sets of type annotations (more complete public pyi files and the in-progress inline annotations), it would be great to test both!

Yes, I agree with that.

If pyi files take precedence over inline (I'm not sure about that), your idea should be easily possible (run mypy&pyright twice, but for the second time delete some of the pyi files).

From what I understand the pyi files do take precedence, and I was thinking about the deletion idea as a way to satisfy both goals of type checking - internal for what we put in the code and external for what is in the pyi files.

@twoertwein
Copy link
Member Author

One option could be to add only as many external pyi files to satisfy the public tests. I think type checkers don't mind a mix of pyi and inline (a.pyi but b.py has inline). Over time, we could then try to completely remove the external pyi files.

@twoertwein
Copy link
Member Author

Closing this PR in favor of #45594.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants