Egork/flake8 #171

egork520 · 2022-10-24T18:18:07Z

Refactored the code to follow PEP8 style guide

Adding flake8 command to CICD pipeline

soldni

In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.

Some questions:

1. Is this a flavor of PEP8 we like?

Some non-standard things I noticed:

120 chars lines
ok with mix of single and double quotes
non-multiple indentation allowed
new lines on binary operators

As an example, I went with the following in smashed:

[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
(
      __pycache__
    | \.git
    | \.mypy_cache
    | \.pytest_cache
    | \.vscode
    | \.venv
    | \bdist\b
    | \bdoc\b
)
'''

[tool.isort]
profile = "black"
line_length = 79
multi_line_output = 3

[tool.autopep8]
max_line_length = 79
in-place = true
recursive = true
aggressive = 3

[tool.mypy]
python_version = 3.8
ignore_missing_imports = true
no_site_packages = true
allow_redefinition = false

[tool.mypy-tests]
strict_optional = false

[tool.flake8]
exclude = [
    ".venv/",
    "tmp/"
]
per-file-ignores = [
    '*.py:E203',
    '__init__.py:F401',
    '*.pyi:E302,E305'
]

(Note that the pyproject.toml above requires flake8-pyi and Flake8-pyproject to run properly)

We could also align with what other AI2 projects are using, e.g. Tango's.

2. We should probably adopt some auto-formatting tools that we run before release

Again, in smashed I have a combo of flake8, isort, and autopep8; I ask contributors to run `black . && flake8 . && isort .`
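
A minimal sketch of how that chain could be wrapped in a script, assuming the three tools are installed and on the PATH. `CHECKS` and `run_checks` are hypothetical names used for illustration, not part of smashed or mmda, and the order simply mirrors the command quoted above.

```python
import subprocess
from typing import Callable, List, Sequence

# Tools and order copied from the command quoted above; "." is the repo root.
CHECKS: List[List[str]] = [
    ["black", "."],
    ["flake8", "."],
    ["isort", "."],
]


def run_checks(
    checks: Sequence[Sequence[str]],
    runner: Callable = subprocess.run,
) -> bool:
    """Run each command in order and stop at the first failure, like `&&`."""
    for command in checks:
        result = runner(list(command))
        if result.returncode != 0:
            return False
    return True
```

Calling `run_checks(CHECKS)` from the repository root would behave roughly like the shell one-liner; the `runner` parameter exists only so the function can be exercised without the tools installed.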

3. Should we adopt `mypy` too?

Probably not immediately? But again, other projects use it.

4. Some of the automatic refactoring kills the semantics of comments

I left a note in a couple of places where this happens.

soldni · 2022-10-24T20:23:33Z

mmda/parsers/pdfplumber_parser.py

@@ -288,7 +288,7 @@ def _simple_line_detection(
        Adapted from https://github.com/allenai/VILA/blob/e6d16afbd1832f44a430074855fbb4c3d3604f4a/src/vila/pdftools/pdfplumber_extractor.py#L24

        Modified Oct 2022 (kylel): Changed return value to be List[int]
-        """
+        """ # noqa


`# noqa` is a blanket ignore; we should ignore specific error codes instead.
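
A targeted suppression names the error code after a colon. The sketch below assumes the check being silenced is E501 (line too long), which this thread does not confirm; `vila_source_url` is a hypothetical helper that exists only to carry a long line.

```python
def vila_source_url() -> str:
    """Return the permalink that the docstring in the diff refers to."""
    # Only E501 is suppressed on the next line; any other flake8 error
    # reported for it would still surface.
    return "https://github.com/allenai/VILA/blob/e6d16afbd1832f44a430074855fbb4c3d3604f4a/src/vila/pdftools/pdfplumber_extractor.py#L24"  # noqa: E501
```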

soldni · 2022-10-24T20:24:13Z

mmda/predictors/base_predictors/base_predictor.py

@@ -9,7 +8,7 @@
 class BasePredictor:

    ###################################################################
-    ##################### Necessary Model Variables ###################
+    # Necessary Model Variables #


We shouldn't refactor this automatically.

soldni · 2022-10-24T20:24:57Z

mmda/predictors/heuristic_predictors/dictionary_word_predictor.py

+                if next_row_first_token_text[-len(plural_suffix):] == plural_suffix:
                    next_row_first_token_text = next_row_first_token_text[
-                        : -len(plural_suffix)
-                    ]
+                                                : -len(plural_suffix)
+                                                ]


this is not very legible
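
One possibly more legible shape for the same check is a small helper built on `str.endswith`. `strip_suffix` is a hypothetical name, not something that exists in mmda, and the guard against an empty suffix is an addition: without it, `text[:-0]` would return an empty string.

```python
def strip_suffix(text: str, suffix: str) -> str:
    """Return text without a trailing suffix, or text unchanged if absent."""
    # The `suffix and` guard avoids slicing with -0, which would drop everything.
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text
```

With a helper like this, the call site in the diff could shrink to a single assignment such as `next_row_first_token_text = strip_suffix(next_row_first_token_text, plural_suffix)`.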

soldni · 2022-10-24T20:25:31Z

mmda/predictors/hf_predictors/bibentry_predictor/predictor.py

+                # input string list: [' Anon ', '1934', ' ', 'University and Educational Intelligence', ' ', 'Nature',
+                # ' ', '133', ' ', '805–805']
+                # tokenization removes empty string: ['[CLS]', 'an', '##on', '1934', 'university', 'and',
+                # 'educational',
+                # 'intelligence', 'nature', '133', '80', '##5', '–', '80', '##5', '[SEP]']
+                # skipping empty string results in skipping word id: [None, 0, 0, 1, 3, 3, 3, 3, 5, 7, 9, 9, 9, 9,
+                # 9, None]


This comment is now very hard to read.

soldni · 2022-10-24T20:26:18Z

mmda/predictors/hf_predictors/bibentry_predictor/utils.py

+from mmda.predictors.hf_predictors.bibentry_predictor.types import (BibEntryPredictionWithSpan,
+                                                                    BibEntryStructureSpanGroups)


There are a ton of non-multiple-of-4 indentations added by this PR. Are we OK with them?
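
For comparison, a vertical hanging indent (isort's `multi_line_output = 3`, as in the smashed config earlier in this thread) keeps continuation lines at a multiple of four. This is only a sketch: standard-library names stand in for the long mmda import so that the snippet runs without mmda installed.

```python
# Stand-in for the mmda import in the diff above; each imported name sits on
# its own line, indented by exactly four spaces, with a trailing comma.
from collections import (
    OrderedDict,
    defaultdict,
)

# Trivial use of both names so the imports are not flagged as unused (F401).
counts = defaultdict(int)
counts["E501"] += 1
ordered = OrderedDict(sorted(counts.items()))
```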

soldni · 2022-10-24T20:27:17Z

mmda/predictors/lp_predictors.py

 from tqdm import tqdm
 import layoutparser as lp

 from mmda.types import Document, Box, BoxGroup, Metadata
-from mmda.types.names import *


iirc, the star import was intentional here? or maybe it was intentional somewhere else.

soldni · 2022-10-24T20:28:50Z

mmda/predictors/lp_predictors.py


+from PIL.Image import Image


Explicitly importing from PIL instead of using type annotations via PIL.Image might cause import errors if the layoutparser dependencies are not installed. Please check on a minimal installation.
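
One common way to keep the annotation without a runtime import is a `typing.TYPE_CHECKING` guard. The following is a minimal sketch, not a tested change to lp_predictors.py; `image_size` is a hypothetical function used only to show the pattern.

```python
from typing import TYPE_CHECKING, Tuple

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime, so a
    # missing Pillow install cannot raise ImportError here.
    from PIL.Image import Image


def image_size(image: "Image") -> Tuple[int, int]:
    """Return (width, height); the quoted annotation is not resolved at runtime."""
    return image.size
```

Whether this is preferable to the `PIL.Image` attribute style would still need the check on a minimal installation that the comment asks for.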

soldni · 2022-10-24T20:30:41Z

setup.cfg

+    mmda/types/old/boundingbox.old.py
+per-file-ignores =
+
+max-line-length = 119


Why 119 vs 79 vs something else?

Why are ai2_internal, tests, and examples not checked?

egork520 · 2022-10-24T20:52:47Z

In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.

I agree, and I should probably have asked before opening. The mix of styles in different parts of the code does not please my eyes.

Some questions:

1. Is this a flavor of PEP8 we like?

Some non-standard things I noticed:

120 chars lines
It is up for discussion; given the resolution of current screens, I personally prefer longer lines.

ok with mix of single and double quotes
Personally I am used to single quotes unless double is needed. It is up for discussion.

non-multiple indentation allowed
new lines on binary operators

We could also align with what other AI2 projects are using, e.g. Tango's.

Agreed. It might be worth formalizing style requirements for s2? It could even be part of the 3-year vision plan (sharpen the saw).

egork520 and others added 30 commits October 19, 2022 16:29

Moving vila test to a class so that pytest ./test works locally

4f4c5c6

Added notes on how to run unit tests locally

ff18984

Moving change of directory to the setUp class. Locally tests fails in pytest

7f7bd66

Moving change of directory to the setUp class. Locally tests fails in pytest

747ace4

Creating variable for the fixtures path

a425583

Adding pytest.ini no need to type the folder for the tests

38b76b3

Adding exact command for selecting tests, removing directory name, we specify it in pytest.ini

425b072

Adding test install dependencies, for running tests on multiple cpus, and coverage reporting tool

5c1ec12

Command for running tests on multiple cpus

062f953

Adding test requirements installation to the mmda-ci.yml

55689ae

Adding plugin for running converage more easily, removing coverage

9e1bf61

Adding coverage config file,

6fbf179

Adding coverage lower bound 57%, specifying module for which to measure coverage.

4035404

Removing individual config files in favore of setup.cfg file

9d9d4d7

Update tests/test_predictors/test_vila_predictors.py

7739963

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_predictors/test_vila_predictors.py

d0f7117

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_predictors/test_vila_predictors.py

faded77

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_predictors/test_vila_predictors.py

ac54653

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_predictors/test_figure_table_predictors.py

c1b97fd

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_parsers/test_pdf_plumber_parser.py

cd06980

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_parsers/test_pdf_plumber_parser.py

fb00985

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_parsers/test_pdf_plumber_parser.py

69e1c7b

Co-authored-by: Luca Soldaini <[email protected]>

Update tests/test_parsers/test_pdf_plumber_parser.py

55cd37d

Co-authored-by: Luca Soldaini <[email protected]>

Rolling in test dependencies into dev

ffa7b56

Fixing typos

c26c46b

Adding parameters for the coverage

5e4b09f

Specifying percentage of the coverage in the builds

45eff40

Updating comments about pytest

641b542

Merge branch 'main' of github.com:allenai/mmda into egork/flake8

cf0436b

First part of lint fixing

6ba65aa

egork520 added 4 commits October 24, 2022 11:12

Second part of lint fixing

690c690

Skip old code

7ad0d57

Adding flake8 run to the compile step

0a9786a

Adding a note on how to run the flake8 test

46961d6

egork520 requested review from soldni, geli-gel and kyleclo October 24, 2022 18:18

soldni reviewed Oct 24, 2022
