-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Egork/flake8 #171
base: main
Are you sure you want to change the base?
Egork/flake8 #171
Conversation
… specify it in pytest.ini
… and coverage reporting tool
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
Co-authored-by: Luca Soldaini <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.
Some questions:
1. Is this a flavor of PEP8 we like?
Some non-standard things I noticed:
- 120 chars lines
- ok with mix of single and double quotes
- non-multiple indentation allowed
- new lines on binary operators
As an example, I went with the following in smashed:
[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
(
__pycache__
| \.git
| \.mypy_cache
| \.pytest_cache
| \.vscode
| \.venv
| \bdist\b
| \bdoc\b
)
'''
[tool.isort]
profile = "black"
line_length = 79
multi_line_output = 3
[tool.autopep8]
max_line_length = 79
in-place = true
recursive = true
aggressive = 3
[tool.mypy]
python_version = 3.8
ignore_missing_imports = true
no_site_packages = true
allow_redefinition = false
[tool.mypy-tests]
strict_optional = false
[tool.flake8]
exclude = [
".venv/",
"tmp/"
]
per-file-ignores = [
'*.py:E203',
'__init__.py:F401',
'*.pyi:E302,E305'
]
(Note that the pyproject.toml
above requires flake8-pyi
and Flake8-pyproject
to run properly)
We could also align with other AI2 projects are using, e.g. Tango's.
2. we should probably adopt some auto-formatting tools we run before release
Again, in smashed I have a combo of flake8, isort, and autopep8 I ask contributors to run black . && flake8 . &&o isort .
3. Should we adopt mypy
too?
Probably not immediately? But again, other projects use it.
4. Some of the automatic refactor kills semantics of comments
I left a note in a couple of places where this happens.
@@ -288,7 +288,7 @@ def _simple_line_detection( | |||
Adapted from https://github.com/allenai/VILA/blob/e6d16afbd1832f44a430074855fbb4c3d3604f4a/src/vila/pdftools/pdfplumber_extractor.py#L24 | |||
|
|||
Modified Oct 2022 (kylel): Changed return value to be List[int] | |||
""" | |||
""" # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#noqa
is a blanket ignore, we should use specific errors ignore instead.
@@ -9,7 +8,7 @@ | |||
class BasePredictor: | |||
|
|||
################################################################### | |||
##################### Necessary Model Variables ################### | |||
# Necessary Model Variables # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shoudn't refactor this automatically
if next_row_first_token_text[-len(plural_suffix):] == plural_suffix: | ||
next_row_first_token_text = next_row_first_token_text[ | ||
: -len(plural_suffix) | ||
] | ||
: -len(plural_suffix) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not very legible
# input string list: [' Anon ', '1934', ' ', 'University and Educational Intelligence', ' ', 'Nature', | ||
# ' ', '133', ' ', '805–805'] | ||
# tokenization removes empty string: ['[CLS]', 'an', '##on', '1934', 'university', 'and', | ||
# 'educational', | ||
# 'intelligence', 'nature', '133', '80', '##5', '–', '80', '##5', '[SEP]'] | ||
# skipping empty string results in skipping word id: [None, 0, 0, 1, 3, 3, 3, 3, 5, 7, 9, 9, 9, 9, | ||
# 9, None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is now very hard to read.
from mmda.predictors.hf_predictors.bibentry_predictor.types import (BibEntryPredictionWithSpan, | ||
BibEntryStructureSpanGroups) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a ton of non-multiple-of-4 indentation added by this PR–are we ok with them?
from tqdm import tqdm | ||
import layoutparser as lp | ||
|
||
from mmda.types import Document, Box, BoxGroup, Metadata | ||
from mmda.types.names import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc, the star import was intentional here? or maybe it was intentional somewhere else.
|
||
from PIL.Image import Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicitly importing PIL
instead of type annotations via PIL.Image
might cause import errors if layoutparser
dependencies are not installed. Please check on a minimal installation.
mmda/types/old/boundingbox.old.py | ||
per-file-ignores = | ||
|
||
max-line-length = 119 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why 119 vs 79 vs something else?
- Why is
ai2_internal
,tests
, andexamples
not checked?
I agree and should probably asked before opening. Mix of styles in different parts of the code does not please my eyes.
Agree, might be worth formalizing style requirements for s2? Can even be part of 3 year vision plan (sharpen the saw)
|
Refactored the code to follow PEP8 style guide
Adding flake8 command to CICD pipeline