Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully type-annotate and type-check everything #100

Merged
merged 24 commits into from
Mar 6, 2023
Merged

Fully type-annotate and type-check everything #100

merged 24 commits into from
Mar 6, 2023

Conversation

jwodder
Copy link
Member

@jwodder jwodder commented Dec 6, 2021

@satra At the moment, the following nontrivial issues are causing problems with the type checking:

  • The dynamic creation of enum types from dicts does not play well with mypy at all.
  • mypy does not like these two lines, and I'm not even convinced they work as intended anyway:

https://github.com/dandi/dandischema/blob/62067fbb0184d51f2dd9ee415cf014f74249cd78/dandischema/models.py#L137-L138

@jwodder jwodder added the minor Increment the minor version when merged label Dec 6, 2021
@jwodder jwodder requested a review from satra December 6, 2021 16:12
@codecov
Copy link

codecov bot commented Dec 6, 2021

Codecov Report

Patch coverage: 99.70% and project coverage change: +1.04 🎉

Comparison is base (5b48f26) 96.67% compared to head (0259c29) 97.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
+ Coverage   96.67%   97.71%   +1.04%     
==========================================
  Files          18       17       -1     
  Lines        1654     1751      +97     
==========================================
+ Hits         1599     1711     +112     
+ Misses         55       40      -15     
Flag Coverage Δ
unittests 97.71% <99.70%> (+1.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandischema/tests/utils.py 100.00% <ø> (ø)
dandischema/models.py 97.69% <99.49%> (+3.42%) ⬆️
dandischema/consts.py 100.00% <100.00%> (ø)
dandischema/datacite.py 95.45% <100.00%> (ø)
dandischema/digests/dandietag.py 95.58% <100.00%> (+0.77%) ⬆️
dandischema/digests/tests/test_dandietag.py 100.00% <100.00%> (ø)
dandischema/digests/tests/test_zarr.py 100.00% <100.00%> (ø)
dandischema/digests/zarr.py 98.73% <100.00%> (ø)
dandischema/exceptions.py 100.00% <100.00%> (ø)
dandischema/metadata.py 96.63% <100.00%> (ø)
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@satra satra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments. i'll address your comments in a separate thread.

@@ -134,7 +134,7 @@ def to_datacite(
NAME_PATTERN, contr_el.name
).pop()

if getattr(contr_el, "affiliation"):
if hasattr(contr_el, "affiliation") and contr_el.affiliation is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this not be replaced by the following pattern?

Suggested change
if hasattr(contr_el, "affiliation") and contr_el.affiliation is not None:
if getattr(contr_el, "affiliation", None):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using getattr, mypy can't tell whether contr_el.affiliation is None or not, so it'll warn when it's used in a non-None way inside the block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds like a deficiency in mypy since logically it cannot be None since then if would not evaluate to True. Do you know of the issue may be already in mypy on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python/mypy#11142 seems like a good starting point.

dandischema/exceptions.py Show resolved Hide resolved
@satra
Copy link
Member

satra commented Dec 13, 2021

The dynamic creation of enum types from dicts does not play well with mypy at all.

mypy doesn't tend to like anything dynamic. if you have a suggestion let me know. we need more info around those enums, so turning them into enums to start with will lose that capability.

mypy does not like these two lines, and I'm not even convinced they work as intended anyway:
dandischema/dandischema/models.py

this was added by @yarikoptic instead of manually writing schemaKey in every class. in terms of functionality i believe it works, can you suggested why you think it doesn't work?

The definition of wasGeneratedBy in Dandiset — which declares the field as a Optional[List[Project]] — is incompatible with the definition of the field in the supertype — which declares it as Optional[List[Activity]].

a Project is a subclass of Activity, which should be fine from a class perspective.

The definition of url in PublishedDandiset — which declares the field as a str — is incompatible with the definition of the field in the supertype — which declares it as Optional[HttpUrl]. If PublishedDandiset.url is changed to an HttpUrl, pydantic then complains about the field's regex parameter not applying.

pydantic doesn't allow regex's for url patterns. and we need to be more specific as to what that published url looks like. this is why it was changed to a str.

mypy doesn't like the overriding of HttpUrl based on an environment variable.

i don't have a good suggestion, but we will likely continue with dynamic configurations.

@jwodder
Copy link
Member Author

jwodder commented Dec 13, 2021

The dynamic creation of enum types from dicts does not play well with mypy at all.

mypy doesn't tend to like anything dynamic. if you have a suggestion let me know. we need more info around those enums, so turning them into enums to start with will lose that capability.

Need more info for what? In the current state, I don't see what is gained from the dynamic construction.

mypy does not like these two lines, and I'm not even convinced they work as intended anyway:
dandischema/dandischema/models.py

this was added by @yarikoptic instead of manually writing schemaKey in every class. in terms of functionality i believe it works, can you suggested why you think it doesn't work?

While the Field assignment works, the Literal annotation does not, as annotating a dict key doesn't do anything; quoting the Python docs: "For expressions as assignment targets, the annotations are evaluated if in class or module scope, but not stored."

The definition of wasGeneratedBy in Dandiset — which declares the field as a Optional[List[Project]] — is incompatible with the definition of the field in the supertype — which declares it as Optional[List[Activity]].

a Project is a subclass of Activity, which should be fine from a class perspective.

And yet mypy complains. I think it has something to do with the covariance/contravariance of List.

The definition of url in PublishedDandiset — which declares the field as a str — is incompatible with the definition of the field in the supertype — which declares it as Optional[HttpUrl]. If PublishedDandiset.url is changed to an HttpUrl, pydantic then complains about the field's regex parameter not applying.

pydantic doesn't allow regex's for url patterns. and we need to be more specific as to what that published url looks like. this is why it was changed to a str.

You could use a validator instead of a regex.

mypy doesn't like the overriding of HttpUrl based on an environment variable.

i don't have a good suggestion, but we will likely continue with dynamic configurations.

What's wrong with just always using AnyHttpUrl?

@satra
Copy link
Member

satra commented Dec 13, 2021

Need more info for what

for json-ld graph constructions. the dictionaries from which the enums are built have additional properties around those values.

You could use a validator instead of a regex.

we are validating through multiple routes using both python and json schema validator. regex's also apply to json schema. anyhttpurl is too general and also does not support regex's. the intent here is to be as specific as possible for both routes, and specialize in python only when necessary. regex's therefore support both routes to start with.

@jwodder
Copy link
Member Author

jwodder commented Dec 13, 2021

@satra

  • Is anything currently consuming the dicts for JSON-LD purposes? What other ways can the data be provided to such consumers?
  • Need to provide support for different "instances" of DANDI archive #76 and some other issues I've seen around about supporting alternative URLs imply that you may want to drop the regex restriction on PublishedDandiset.url soon. If that happens, could we just change that field and all other URL fields to plain AnyHttpUrl?

@jwodder jwodder mentioned this pull request Mar 2, 2022
@yarikoptic
Copy link
Member

should we resurrect this one? seems has aged enough, would need rebase etc...

@jwodder
Copy link
Member Author

jwodder commented Mar 28, 2022

@yarikoptic I would like to resurrect this, but in order to be worthwhile, we would need to deal with the various issues listed in the first comment. (cc @satra)

@satra
Copy link
Member

satra commented Mar 30, 2022

for AnyHttpUrl see this open issue: pydantic/pydantic#3143 (so just shifting to anyhttpurl won't solve the mypy issue). also we do have something specific that makes it httpurl, it seems forcing something to be more general just for static validation seems to be counter to the idea of specificity of elements.

i welcome suggestions, but we should not move away from our overall goals of maintaining both a json schema that can do a fair bit of validating and pydantic, which can add additional validators. the dandi web app uses the json schema for validation, so we can't simply move away from those elements.

@jwodder
Copy link
Member Author

jwodder commented Mar 31, 2022

I'm not saying we should move away from the JSON Schema, but I would like to know how the current enum metadata is consumed & used so that a static alternative can be devised. The dynamic enum generation is currently the biggest problem with getting mypy to be happy with this library, and I have yet to see a real reason why it's done the way it is.

@yarikoptic
Copy link
Member

I would like to revive this development so we could get dandischema annotated and then dandi-cli.
It seems that some (not sure if applicable) workaround for pydantic/AnyHttpUrl issue was suggested, see pydantic/pydantic#3143 (comment) - could it be applied here?
@jwodder, would you be kind to merge in master (not sure if comments would survive with rebase)?

@jwodder
Copy link
Member Author

jwodder commented Oct 6, 2022

@yarikoptic

would you be kind to merge in master

You mean rebase this PR against master, not merge this PR into master, right?

@yarikoptic
Copy link
Member

@yarikoptic

would you be kind to merge in master

You mean rebase this PR against master, not merge this PR into master, right?

correct. I guess I should have said "merge master in [this branch]" ;-)

@jwodder
Copy link
Member Author

jwodder commented Oct 6, 2022

@yarikoptic Rebased.

@yarikoptic
Copy link
Member

@satra - could you please re-review this PR, and follow up on existing discussions. It would be nice to finish up this PR and get it merged so the work doesn't get wasted and we get better type annotations checking through out.

@yarikoptic
Copy link
Member

Meanwhile: I've rebased and force-pushed to get a fresh state. Locally I get "Found 132 errors in 7 files (checked 21 source files)", so delaying finishing it up causes the growth of annotations to "catch up on". I can promise keeping an eye on this PR to get it over the finish line if @jwodder makes another push to add missing annotations.

dandischema/metadata.py Outdated Show resolved Hide resolved
@satra
Copy link
Member

satra commented Feb 17, 2023

@jwodder - there are several intentional tests for checking validation, which the type checker seems to pick up on as invalid.

@satra
Copy link
Member

satra commented Mar 4, 2023

fine with me.

@yarikoptic
Copy link
Member

which one is fine with you - 0.6.4 or 0.7.0 @satra?

also any change in schema version results in dandi-cli testing going kaboom since dandi-archive is not accepting future/unknown version of the schema, so I guess we nohow tune up the docker compose setup while testing dandischema to use this particular dandischema "version" @jwodder ?

@satra
Copy link
Member

satra commented Mar 4, 2023

i should have been clearer. for this 0.6.4 is fine. there are probably bigger changes coming, for which we will want a bump to 0.7.0. (btw, dandischema will probably go through a significant overhaul as we align neuro data models across projects. eta: 3 months or so)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor Increment the minor version when merged release Create a release when this pr is merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants