Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing mypy errors #3335

Merged
merged 2 commits into from
Feb 1, 2021
Merged

Fixing mypy errors #3335

merged 2 commits into from
Feb 1, 2021

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Feb 1, 2021

The python_type_check CI job is failing on Master. It's unclear whether this is the result of updating mypy or some other dependency. I am unable to get the same errors on my dev env but hopefully this PR resolves the issue.

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -182,4 +182,6 @@ def __load_folds(self, folds: Optional[int]) -> None:
with open(path_to_folds, 'r') as f:
str_idx = f.read().splitlines()[folds]
list_idx = np.fromstring(str_idx, dtype=np.uint8, sep=' ')
self.data, self.labels = self.data[list_idx, :, :, :], self.labels[list_idx]
self.data = self.data[list_idx, :, :, :]
if self.labels is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can self.labels actually be None in this code path?

If it can, then mypy properly caught a potential bug and we should probably add a test for it.

If it can't, mypy is unfortunately forcing us to obfuscate the code (as it often does...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be None. See:

def __loadfile(self, data_file: str, labels_file: Optional[str] = None) -> Tuple[np.ndarray, Optional[np.ndarray]]:
labels = None

@pmeier: Why are there no tests for the entire STL10 class? I think this is something that could be addressed on a separate PR, this one focuses on fixing master failures.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datumbox There are no tests for a lot of datasets, see #963 (comment). I'm on it.

Copy link
Member

@NicolasHug NicolasHug Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation of the function doesn't tell us much unfortunately. It tells us that self.labels can be None in general, but it doesn't tell whether it can be None in that particular code path.

As far as I can tell, __load_folds is only called right after __loadfile is called with labels_file, and so self.labels can never be None within __load_folds.
Unfortunately, this isn't something that mypy can figure out, but I feel like the fix hurts readability of the code.

Perhaps we should tell mypy to ignore this line instead?

Copy link
Contributor Author

@datumbox datumbox Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that based on the current code the self.labels seems to always has values. It's not clear whether the intention is to support also None values because other parts of the class seem to check on whether self.labels is None:

if self.labels is not None:
img, target = self.data[index], int(self.labels[index])
else:
img, target = self.data[index], None

At any case, there is no point assessing the removal of this check because without it the type_check continues to fail.

Perhaps on the future it's worth refactoring the class to simplify the logic and increase the readability. Would you like to create an issue with your points?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other parts of the class seem to also check for on whether self.labels is None

It makes sense to check in these parts because self.labels can indeed be None there (although I haven't double-checked).

But checking for None in __load_folds is misleading IMHO:

  • it suggests that labels can be None even though it can't
  • it doesn't say what to do if labels is None, i.e. there's no else clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicolasHug Sounds good. The scope of this PR is not to do massive rewrites on the dataset class bur rather fix the failing master and facilitate FBcode syncing. Please create an issue with your proposals describing which parts you think should be rewritten/simplified so that we can discuss it in more detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not suggesting to refactor the code, I'm suggesting to ignore mypy

@@ -57,8 +57,8 @@ def __init__(
import bz2
with bz2.open(full_path) as fp:
raw_data = [line.decode().split() for line in fp.readlines()]
imgs = [[x.split(':')[-1] for x in data[1:]] for data in raw_data]
imgs = np.asarray(imgs, dtype=np.float32).reshape((-1, 16, 16))
tmp_list = [[x.split(':')[-1] for x in data[1:]] for data in raw_data]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is typically an instance of perfectly fine code that mypy flags as unsafe and it leads to obfuscated code.

Maybe we can just avoid the tmp variable and directly declare

imgs = np.asarray(
	[...],
	dtype=...
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy marks the code because imgs is originally defined as a list of lists and then it becomes a numpy.

Not sure if inlining the expression increases readability or makes the code less obfuscated. Line 61 does already casting and reshaping, should we add splitting and iterating?

@codecov
Copy link

codecov bot commented Feb 1, 2021

Codecov Report

Merging #3335 (5196845) into master (b2cf604) will not change coverage.
The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3335   +/-   ##
=======================================
  Coverage   73.90%   73.90%           
=======================================
  Files         104      104           
  Lines        9618     9618           
  Branches     1543     1544    +1     
=======================================
  Hits         7108     7108           
  Misses       2028     2028           
  Partials      482      482           
Impacted Files Coverage Δ
torchvision/datasets/semeion.py 36.36% <ø> (+1.58%) ⬆️
torchvision/datasets/stl10.py 25.24% <0.00%> (-0.50%) ⬇️
torchvision/datasets/usps.py 32.35% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2cf604...5196845. Read the comment docs.

@pmeier
Copy link
Collaborator

pmeier commented Feb 1, 2021

Just a hunch: mypy==0.800 was released a week ago and the errors we are seeing might be stuff that wasn't checked before. I wager a guess if we pin

pip install --user --progress-bar off numpy mypy

to mypy==0.790 the errors will go away.

@datumbox
Copy link
Contributor Author

datumbox commented Feb 1, 2021

@pmeier Thanks for looking into this.

if we pin to mypy==0.790 the errors will go away.

I have not tried that but it's likely.

The thing is Mypy is right to report the errors so it might be worth fixing. The issue is that I've installed v0.800 on my laptop and I still get no errors on master... I also updated typing-extensions and mypy_extensions to the version that CI uses but still nothing... It would be good to know under which conditions these are flagged to ensure our dev env is aligned with the CI...

Any ideas what it might be?

@datumbox datumbox merged commit 859a535 into pytorch:master Feb 1, 2021
@datumbox datumbox deleted the ci/mypy_update branch February 1, 2021 15:56
@NicolasHug
Copy link
Member

I understand this is merged but for the record, I maintain that mypy is flagging false positives here, and that a more correct fix would be to ignore the mypy warnings (debatable for https://github.com/pytorch/vision/pull/3335/files#r567885550, but not for https://github.com/pytorch/vision/pull/3335/files#r567883994 IMHO).

facebook-github-bot pushed a commit that referenced this pull request Feb 4, 2021
Summary:
* Fixing mypy errors.

* Fixing typing issue.

Reviewed By: datumbox

Differential Revision: D26226616

fbshipit-source-id: 5ebd98d70d9ae6edc2b8e960c6cea38279fd60c1
@datumbox datumbox added the bug label Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants