Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Used qpdf in all tests to compare resulting PDFs + improved development doc #33

Merged
merged 14 commits into from
Jan 8, 2021

Conversation

Lucas-C
Copy link
Member

@Lucas-C Lucas-C commented Jan 7, 2021

Pros:

  • ease debug a lot when the actual PDF differs from the expected one
  • test PDFs can easily be viewed and serve as a gallery of examples to users
  • fallback to hash-comparison if qpdf is not available

Cons:

  • require to store expected PDF in the git repo, but so far it's only 1,4MB in total

Note: it could be even better to use pdfrw to compare PDFs,
but sadly this does not support comparison so far: pmaupin/pdfrw#212

@Lucas-C Lucas-C requested a review from alexanderankin January 7, 2021 08:12
@Lucas-C Lucas-C force-pushed the qpdf branch 4 times, most recently from b376e18 to cd4b022 Compare January 7, 2021 08:36
@Lucas-C Lucas-C changed the title Used qpdf in all tests to compare resulting PDFs + improved development doc Used qpdf in all tests to compare resulting PDFs + improved development doc + removed tox Jan 7, 2021
@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

There is an issue with the tests that does not happen on my computer: they seem stuck on InsertImagesTest.test_insert_png

I'll have a look at it later on

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

The issue is solved, this is ready for review! :)

@alexanderankin
Copy link

tox was there to run tests on all version of python locally, is there no longer a way to do that? that makes sense to keep...

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Yeah, we can keep it for sure if you are used to it!

I removed it because I neither use it myself nor in the GitHub Actions pipeline: on my computer I handle virtualenvs myself with pew, and tox only brings me a slow overhead; on GitHub Actions it seems to prevent progressive display of test results.

Moreover, to me it can be a bit demotivating for newcomer contributors to require to run tests with tox, while simply running them with their current version of Python is good enough (and other tools like pytest or python -m unittest provide many useful features, like easily executing a single test).

I'm going to add a commit to this PR to restore the tox.ini file now.

.github/workflows/continuous-integration-workflow.yml Outdated Show resolved Hide resolved
docs/Development.md Show resolved Hide resolved
test/catalog_test.py Show resolved Hide resolved
test/cell_multi_cell_refactor/cell.py Outdated Show resolved Hide resolved
test/cell_multi_cell_refactor/multi_cell.py Show resolved Hide resolved
test/class_unit_test/shape_tests.py Show resolved Hide resolved
test/utilities.py Outdated Show resolved Hide resolved
tox.ini Outdated Show resolved Hide resolved
tox.ini Outdated Show resolved Hide resolved
@alexanderankin
Copy link

yeah I get what youre coming from, like its a dependency, and i use a virtualenv too, but its also super annoying in case python introduces breaking changes or new features in 3.8 or 3.9 that some one wants to add in here and only finds out when the Github Action fails.

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

OK, I understand your point.

Honestly, I do not see much benefit if having it in tox instead of in the GitHub Actions YAML:
one of the main benefit of a shared, automated CI pipeline is to detect this kind of regression, instead of having to install & test new Python (or libs) versions on one computer :)

@Lucas-C Lucas-C changed the title Used qpdf in all tests to compare resulting PDFs + improved development doc + removed tox Used qpdf in all tests to compare resulting PDFs + improved development doc Jan 7, 2021
@alexanderankin
Copy link

this test is hanging for me locally test_insert_jpg (image.image_types.insert_images.InsertImagesTest) ...

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

How did you run the tests? With tox?

Do you have any idea where exactly it hangs?

@alexanderankin
Copy link

alexanderankin commented Jan 7, 2021

figuring it out. ran it with tox, without qpdf

it hangs with python setup.py test as well, inside my virtualenvironment.

also pytest only finds some tests (not all)

hangs inside the assert_pdf_equal function, not sure where yet

@alexanderankin
Copy link

it hangs at:

                test.assertSequenceEqual(
                    actual_qpdf_file.readlines(), expected_qpdf_file.readlines()
                )

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Well spotted!

Could you try to replace .readlines() by .read().splitlines()?

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

@alexanderankin
Copy link

same effect

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Also, do you work under Linux or Windows? (to see if I can reproduce the issue on my side)

@alexanderankin
Copy link

linux

@alexanderankin
Copy link

ill try and come up with a docker command for it, but here's my debugging diff:

diff
diff --git a/test/cell_multi_cell_refactor/cell.py b/test/cell_multi_cell_refactor/cell.py
index 672a09b..e1cfcca 100644
--- a/test/cell_multi_cell_refactor/cell.py
+++ b/test/cell_multi_cell_refactor/cell.py
@@ -57,5 +57,6 @@ class CRefactorTest(unittest.TestCase):
             self, doc, "test_ln_positioning_and_page_breaking_for_cell.pdf"
         )
 
+
 if __name__ == "__main__":
     unittest.main()
diff --git a/test/cell_multi_cell_refactor/multi_cell.py b/test/cell_multi_cell_refactor/multi_cell.py
index df2f4da..5450706 100644
--- a/test/cell_multi_cell_refactor/multi_cell.py
+++ b/test/cell_multi_cell_refactor/multi_cell.py
@@ -68,5 +68,6 @@ class MCRefactorTest(unittest.TestCase):
             self, doc, "test_ln_positioning_and_page_breaking_for_multicell.pdf"
         )
 
+
 if __name__ == "__main__":
     unittest.main()
diff --git a/test/utilities.py b/test/utilities.py
index ce00565..998e80c 100644
--- a/test/utilities.py
+++ b/test/utilities.py
@@ -53,9 +53,13 @@ def assert_pdf_equal(test, pdf_or_tmpl, rel_expected_pdf_filepath, delete=True):
                         actual_qpdf_file.name,
                         expected_qpdf_file.name,
                     )
-                test.assertSequenceEqual(
-                    actual_qpdf_file.readlines(), expected_qpdf_file.readlines()
-                )
+                print("actual_lines = actual_qpdf_file.read().splitlines()")
+                actual_lines = actual_qpdf_file.read().splitlines()
+                print("expected_lines = expected_qpdf_file.read().splitlines()")
+                expected_lines = expected_qpdf_file.read().splitlines()
+                print("test.assertSequenceEqual(")
+                test.assertSequenceEqual(actual_lines, expected_lines)
+                print("test.assertSequenceEqual( done!")
         else:  # Fallback to hash comparison
             actual_hash = calculate_hash_of_file(actual_pdf_file.name)
             expected_hash = calculate_hash_of_file(expected_pdf_filepath)

@alexanderankin
Copy link

$ docker run --rm -it -w /app -v $(pwd):/app python bash
root@dockerhash:/app# apt-get update && apt-get install qpdf -y && python setup.py install && pip install tox && tox
...
test_insert_jpg (image.image_types.insert_images.InsertImagesTest) ... actual_lines = actual_qpdf_file.read().splitlines()
expected_lines = expected_qpdf_file.read().splitlines()
test.assertSequenceEqual(
^Cinterrupted
ERROR: got KeyboardInterrupt signal
____________________________________________________________________________________________ summary ____________________________________________________________________________________________
ERROR:  py36: InterpreterNotFound: python3.6
ERROR:   py37: keyboardinterrupt
ERROR:   py38: undefined
ERROR:   py39: undefined

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

I may not be able to look at it today due to some Windows update issues, but I'll try to solve this soon

Copy link

@alexanderankin alexanderankin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests pass on my machine but the machine im on right now only has python 3.8.5 so hopefully all the tests do actually work (not just the ones picked up by pytest) for the other python versions.

docs/Development.md Outdated Show resolved Hide resolved
test/utilities.py Outdated Show resolved Hide resolved
test/utilities.py Outdated Show resolved Hide resolved
test/utilities.py Outdated Show resolved Hide resolved
@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Given that the GitHub Actions pipeline was hanging before your commit, and that now it is passing for all 4 versions of Python, I think you found a fix! Congrats :)

I took good note not not to use tempfile.NamedTemporaryFile anymore...

@py-pdf py-pdf deleted a comment from alexanderankin Jan 7, 2021
@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

I think that while debugging & commiting on this branch you erased the changes I had pushed earlier on test/utilities.py:
9ee8a6e#diff-f996d7b465dbd822a0f768e107758e058ffd7612202f4ed541250e2166c9cf26

Give me a notice when you'll be done on this, and I'll make some final changes on test/utilities.py to restore those lost docstrings changes.

@alexanderankin
Copy link

one of the tests has a differing length sequence. sorry, yep - added the changes back in.

@alexanderankin
Copy link

alright im done working on this for today, not sure why the test is failing but it looks like its genuinely failing. maybe the whole tempfile approach wasnt worth it? I guess it might not have been worth it for me when i first made this test suite - dont remember why exactly i went with the way i did

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Ok, I'll take over then, no worry :)

I was able to reproduce the issue with Docker

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

I think the origin of the "hanging" call to assertSequenceEqual is the underlying call to difflib.
This comment by Tim Peters clearly states that its complexity is cubic to the number of lines of input: https://bugs.python.org/issue6931#msg223459

@alexanderankin
Copy link

that's, umm, hilariously bad...

So i guess i incorrectly identified the point of failure then and it can be reverted to namedtmpfile? i just tried it with io.open and it worked, with the length check being the real fix.

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

I'm on it :)

…e.assertSequenceEqual limitation due to difflib
@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Looks good to me :)

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 7, 2021

Just a notice: I created python-pillow/Pillow#5190 to investigate why the test_insert_jpg behaves differently on my Windows computer than on yours and in GitHub Pages.

This is non blocking though, and I had witnessed this inconsistent behaviour before this PR.

@Lucas-C
Copy link
Member Author

Lucas-C commented Jan 8, 2021

Thank you for the review & merge @alexanderankin !

@Lucas-C Lucas-C deleted the qpdf branch January 11, 2021 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants