-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing mypy errors #3335
Fixing mypy errors #3335
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -182,4 +182,6 @@ def __load_folds(self, folds: Optional[int]) -> None: | |||
with open(path_to_folds, 'r') as f: | |||
str_idx = f.read().splitlines()[folds] | |||
list_idx = np.fromstring(str_idx, dtype=np.uint8, sep=' ') | |||
self.data, self.labels = self.data[list_idx, :, :, :], self.labels[list_idx] | |||
self.data = self.data[list_idx, :, :, :] | |||
if self.labels is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can self.labels
actually be None in this code path?
If it can, then mypy properly caught a potential bug and we should probably add a test for it.
If it can't, mypy
is unfortunately forcing us to obfuscate the code (as it often does...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it can be None. See:
vision/torchvision/datasets/stl10.py
Lines 140 to 141 in 5196845
def __loadfile(self, data_file: str, labels_file: Optional[str] = None) -> Tuple[np.ndarray, Optional[np.ndarray]]: | |
labels = None |
@pmeier: Why are there no tests for the entire STL10 class? I think this is something that could be addressed on a separate PR, this one focuses on fixing master failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datumbox There are no tests for a lot of datasets, see #963 (comment). I'm on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation of the function doesn't tell us much unfortunately. It tells us that self.labels
can be None in general, but it doesn't tell whether it can be None
in that particular code path.
As far as I can tell, __load_folds
is only called right after __loadfile
is called with labels_file
, and so self.labels
can never be None
within __load_folds
.
Unfortunately, this isn't something that mypy
can figure out, but I feel like the fix hurts readability of the code.
Perhaps we should tell mypy to ignore this line instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that based on the current code the self.labels
seems to always has values. It's not clear whether the intention is to support also None values because other parts of the class seem to check on whether self.labels
is None:
vision/torchvision/datasets/stl10.py
Lines 120 to 123 in 5196845
if self.labels is not None: | |
img, target = self.data[index], int(self.labels[index]) | |
else: | |
img, target = self.data[index], None |
At any case, there is no point assessing the removal of this check because without it the type_check continues to fail.
Perhaps on the future it's worth refactoring the class to simplify the logic and increase the readability. Would you like to create an issue with your points?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other parts of the class seem to also check for on whether self.labels is None
It makes sense to check in these parts because self.labels
can indeed be None there (although I haven't double-checked).
But checking for None
in __load_folds
is misleading IMHO:
- it suggests that
labels
can be None even though it can't - it doesn't say what to do if
labels
is None, i.e. there's noelse
clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug Sounds good. The scope of this PR is not to do massive rewrites on the dataset class bur rather fix the failing master and facilitate FBcode syncing. Please create an issue with your proposals describing which parts you think should be rewritten/simplified so that we can discuss it in more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not suggesting to refactor the code, I'm suggesting to ignore mypy
@@ -57,8 +57,8 @@ def __init__( | |||
import bz2 | |||
with bz2.open(full_path) as fp: | |||
raw_data = [line.decode().split() for line in fp.readlines()] | |||
imgs = [[x.split(':')[-1] for x in data[1:]] for data in raw_data] | |||
imgs = np.asarray(imgs, dtype=np.float32).reshape((-1, 16, 16)) | |||
tmp_list = [[x.split(':')[-1] for x in data[1:]] for data in raw_data] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is typically an instance of perfectly fine code that mypy
flags as unsafe and it leads to obfuscated code.
Maybe we can just avoid the tmp variable and directly declare
imgs = np.asarray(
[...],
dtype=...
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mypy marks the code because imgs
is originally defined as a list of lists and then it becomes a numpy.
Not sure if inlining the expression increases readability or makes the code less obfuscated. Line 61 does already casting and reshaping, should we add splitting and iterating?
Codecov Report
@@ Coverage Diff @@
## master #3335 +/- ##
=======================================
Coverage 73.90% 73.90%
=======================================
Files 104 104
Lines 9618 9618
Branches 1543 1544 +1
=======================================
Hits 7108 7108
Misses 2028 2028
Partials 482 482
Continue to review full report at Codecov.
|
Just a hunch: vision/.circleci/config.yml.in Line 118 in b2cf604
to |
@pmeier Thanks for looking into this.
I have not tried that but it's likely. The thing is Mypy is right to report the errors so it might be worth fixing. The issue is that I've installed v0.800 on my laptop and I still get no errors on master... I also updated Any ideas what it might be? |
I understand this is merged but for the record, I maintain that |
Summary: * Fixing mypy errors. * Fixing typing issue. Reviewed By: datumbox Differential Revision: D26226616 fbshipit-source-id: 5ebd98d70d9ae6edc2b8e960c6cea38279fd60c1
The python_type_check CI job is failing on Master. It's unclear whether this is the result of updating mypy or some other dependency. I am unable to get the same errors on my dev env but hopefully this PR resolves the issue.