-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing mypy errors #3335
Fixing mypy errors #3335
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,8 +57,8 @@ def __init__( | |
import bz2 | ||
with bz2.open(full_path) as fp: | ||
raw_data = [line.decode().split() for line in fp.readlines()] | ||
imgs = [[x.split(':')[-1] for x in data[1:]] for data in raw_data] | ||
imgs = np.asarray(imgs, dtype=np.float32).reshape((-1, 16, 16)) | ||
tmp_list = [[x.split(':')[-1] for x in data[1:]] for data in raw_data] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is typically an instance of perfectly fine code that Maybe we can just avoid the tmp variable and directly declare imgs = np.asarray(
[...],
dtype=...
) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mypy marks the code because Not sure if inlining the expression increases readability or makes the code less obfuscated. Line 61 does already casting and reshaping, should we add splitting and iterating? |
||
imgs = np.asarray(tmp_list, dtype=np.float32).reshape((-1, 16, 16)) | ||
imgs = ((cast(np.ndarray, imgs) + 1) / 2 * 255).astype(dtype=np.uint8) | ||
targets = [int(d[0]) - 1 for d in raw_data] | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can
self.labels
actually be None in this code path?If it can, then mypy properly caught a potential bug and we should probably add a test for it.
If it can't,
mypy
is unfortunately forcing us to obfuscate the code (as it often does...)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it can be None. See:
vision/torchvision/datasets/stl10.py
Lines 140 to 141 in 5196845
@pmeier: Why are there no tests for the entire STL10 class? I think this is something that could be addressed on a separate PR, this one focuses on fixing master failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datumbox There are no tests for a lot of datasets, see #963 (comment). I'm on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation of the function doesn't tell us much unfortunately. It tells us that
self.labels
can be None in general, but it doesn't tell whether it can beNone
in that particular code path.As far as I can tell,
__load_folds
is only called right after__loadfile
is called withlabels_file
, and soself.labels
can never beNone
within__load_folds
.Unfortunately, this isn't something that
mypy
can figure out, but I feel like the fix hurts readability of the code.Perhaps we should tell mypy to ignore this line instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that based on the current code the
self.labels
seems to always has values. It's not clear whether the intention is to support also None values because other parts of the class seem to check on whetherself.labels
is None:vision/torchvision/datasets/stl10.py
Lines 120 to 123 in 5196845
At any case, there is no point assessing the removal of this check because without it the type_check continues to fail.
Perhaps on the future it's worth refactoring the class to simplify the logic and increase the readability. Would you like to create an issue with your points?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to check in these parts because
self.labels
can indeed be None there (although I haven't double-checked).But checking for
None
in__load_folds
is misleading IMHO:labels
can be None even though it can'tlabels
is None, i.e. there's noelse
clause.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug Sounds good. The scope of this PR is not to do massive rewrites on the dataset class bur rather fix the failing master and facilitate FBcode syncing. Please create an issue with your proposals describing which parts you think should be rewritten/simplified so that we can discuss it in more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not suggesting to refactor the code, I'm suggesting to ignore mypy