Rethink the testing mechanism for images #963
I'm a fan of switching to That being said, I have begun to realize that it doesn't always seem like we're effectively testing all that much when we're just passing the same parameter to single-letter and PyGMT arguments and testing that the plots turn out the same. As we've discussed in #771, the things we should be most concerned about are the "Python" parts of the function, not the aliases that pass arguments to the GMT API. I think we can reduce the testing workload by consolidating some of the tests that test nothing more than aliases (which includes many of the tests I've written) and focus on the Python parts (my recent example is testing the Python parts of
Yes, I agree. This is what we must do. Still, the biggest challenge is how to make sure that the baseline images are correct. We had some discussions in #451. One solution is storing static images in a separate repository (e.g.
Another solution is generating baseline images by directly calling
I like this idea. I think it more effectively tests that the aliases and Python functions line up with the expected outcome from GMT, as opposed to seeing if passing the same arguments to PyGMT twice will produce different results. We assume that if the "correct" inputs are sent to GMT, the figure will turn out as expected, much like a reference image. The downside is that it requires contributors to learn GMT commands, but I don't think this is too advanced for someone wrapping a new module. Why wouldn't this also be applicable to grid tests? Since we use standard GMT-hosted grids, wouldn't we be able to add How would this work with
Yes, it sounds like a reasonable and valid assumption.
It can also be applied to grid tests.
pygmt/pygmt/tests/test_grd2cpt.py Lines 23 to 37 in e057927
Just take this test (written by you) as an example. I think I mentioned before that the test may still pass even if

```python
from pygmt import Figure, grd2cpt
from pygmt.clib import Session
from pygmt.helpers.testing import check_figures_equal


@check_figures_equal()
def test_grd2cpt(grid):
    """
    Test creating a CPT with grd2cpt to create a CPT based off a grid input and
    plot it with a color bar.
    """
    # reference image: built from raw GMT command-line arguments
    fig_ref = Figure()
    with Session() as lib:
        lib.call_module("basemap", "-Ba -JW0/15c -Rd")
        lib.call_module("grd2cpt", "@earth_relief_01d")
        lib.call_module("colorbar", "-Ba2000")
    # test image: built with the Pythonic PyGMT arguments
    # (`grid` is a pytest fixture defined elsewhere in the test module)
    fig_test = Figure()
    fig_test.basemap(frame="a", projection="W0/15c", region="d")
    grd2cpt(grid=grid)
    fig_test.colorbar(frame="a2000")
    return fig_ref, fig_test
```
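To make the mechanism concrete, here is a minimal sketch of what a figures-equal decorator does: the test returns a `(reference, test)` pair and the decorator fails if the two render differently. `check_outputs_equal` and the `render` callback are hypothetical stand-ins, not PyGMT's code; as far as I know, the real `check_figures_equal` saves both figures as PNG files and compares those images instead.

```python
import functools


def check_outputs_equal(render):
    """Hypothetical stand-in for a figures-equal decorator (not PyGMT's code).

    `render` converts a figure-like object into comparable data; the real
    PyGMT decorator saves both figures as images and compares the images.
    """

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            # The test body builds both figures and returns them as a pair.
            fig_ref, fig_test = test_func(*args, **kwargs)
            # Render each figure independently, then require identical output.
            if render(fig_ref) != render(fig_test):
                raise AssertionError("reference and test outputs differ")

        return wrapper

    return decorator


# Usage: "figures" here are just lists of GMT-style command strings.
@check_outputs_equal(render=tuple)
def demo_basemap_aliases():
    fig_ref = ["basemap -R0/360/0/1000 -JP6i -Bafg"]
    fig_test = ["basemap -R0/360/0/1000 -JP6i -Bafg"]
    return fig_ref, fig_test


demo_basemap_aliases()  # passes silently because both outputs match
```

Note that the decorator only checks that the two sides agree with each other; it says nothing about whether either side is correct, which is why writing the reference side with raw GMT arguments matters.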
@GenericMappingTools/python-contributors Does anyone have opinions on this? I'm in support of @seisman's example using
As a new PyGMT user and a long-time GMT user, it seems that the test method by If the right figure cannot be generated by PyGMT, I guess there are two possible reasons: 1) there are bugs in PyGMT; 2) there are bugs in GMT. We should fix the first kind in PyGMT and report the second kind upstream to GMT. But if the PyGMT project plans to develop more functions that are not in GMT, this testing mechanism will not work.
Any new functions in PyGMT would rely on GMT, so we can always find equivalent GMT command lines. For example, in the new
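Since every PyGMT call ends up as GMT command-line options, the reference side of a test can be written directly in GMT syntax. A minimal sketch of that translation, assuming a tiny hypothetical alias table; `ALIASES` and `to_gmt_args` are illustrative only, and PyGMT's real alias handling covers far more parameters and value types.

```python
# Hypothetical alias table for illustration only.
ALIASES = {"region": "R", "projection": "J", "frame": "B"}


def to_gmt_args(**kwargs):
    """Translate Pythonic keyword arguments into a GMT option string."""
    options = []
    for name, value in kwargs.items():
        flag = ALIASES[name]  # raises KeyError for unknown parameters
        if isinstance(value, (list, tuple)):
            # GMT joins multi-part values such as a region with slashes.
            value = "/".join(str(item) for item in value)
        options.append(f"-{flag}{value}")
    return " ".join(options)


args = to_gmt_args(region=[0, 360, 0, 1000], projection="P6i", frame="afg")
# args == "-R0/360/0/1000 -JP6i -Bafg"
```

For the test to be meaningful, the reference string (e.g. passed to `lib.call_module("basemap", ...)`) should be typed by hand, not produced by the same translation code under test.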
You're right, @seisman, so far I haven't compared the images in the tests the way you suggested. Hopefully I'll have time to work on this over the coming weekend.
@seisman Should we begin rewriting the tests, or should we wait until the GMT 6.2 release? I'm assuming we want to prioritize rewriting the tests that use
Not to throw a spanner in the works, but do we want to reconsider using There are also solutions like Alternatively, I wonder whether storing SVG instead of PNG would make things lighter. And yes, all this should be done closer to the GMT 6.2.0 release. We have the GMT dev tests set up on that CI for that matter, and should be able to fix most tests before the actual GMT 6.2.0 package is out on conda-forge.
I agree that
Not sure if it really works for us. How can we download and update baseline images if we want to run tests locally?
Unfortunately, GMT doesn't support SVG anymore (because recent Ghostscript versions dropped SVG support). Even if GMT could save figures in SVG format, I doubt it would work. The GMT project stores PS files (ASCII) in its repository, and the repository size still grows quickly because images (especially figures generated by grdimage) are saved as binary data inside the ASCII PS files (not sure if I explain it clearly, but you can plot an image using
The hash of the images will be stored in a I'll probably need to open up a demo PR to illustrate how things would work, but things we'll need to do are:
Yes, that would be better.
If it works, can we just store the PNG images in another GitHub repository?
Will need to have a think about where to store things as I create that PR. Probably won't have time to do this until v0.4.0, though. Edit: Just mirrored the PyGMT repo at https://dagshub.com/GenericMappingTools/pygmt. DAGsHub is a web platform for data version control (see FAQ). Give me a few days or weeks and I'll try to get a pipeline of some sort set up for us to start uploading images!
OK, #1036, which sets up data version control (DVC) for the PyGMT repo, has been merged. The new We will slowly migrate the tests from
Originally posted by @weiji14 in #1036 (comment)
Originally posted by @seisman in #1036 (comment)

I'd encourage everyone to use for their open PRs when creating test images, and feel free to ask any questions if things are unclear!
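For anyone new to DVC: as I understand it, the image bytes live in remote storage (here DAGsHub), while git only versions a small `.dvc` file that records an MD5 checksum of each tracked file. A minimal sketch of that content-hash idea using only `hashlib`; `content_record` is hypothetical and its dictionary layout is an approximation, not DVC's exact file format.

```python
import hashlib


def content_record(path, data):
    """Summarize an image by its MD5 checksum and size, in the spirit of
    DVC tracking, so git only has to store this small record."""
    return {"md5": hashlib.md5(data).hexdigest(), "size": len(data), "path": path}


old = content_record("baseline/test_basemap.png", b"abc")
new = content_record("baseline/test_basemap.png", b"abd")
# Any change to the image bytes changes the checksum, so an updated
# baseline image shows up as a tiny diff in the tracked record.
changed = old["md5"] != new["md5"]
```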
@weiji14 Perhaps you could open a separate issue or several issues with a list of TODOs so that people who want to help have a better idea of what to do.
Question: Do we want to do the migration before GMT 6.2.0 or after? I prefer to do the migration before v6.2.0, although it means more work for us. After bumping GMT to 6.2.0, most tests will fail due to the changes in GMT 6.2.0, but I feel it's also a good opportunity for us to learn about the GMT changes and find potential bugs by comparing the images generated by GMT 6.1.1 and 6.2.0. FYI, one week ago, I built the PyGMT documentation using GMT 6.2.0, and found several issues with the GMT dev version (GenericMappingTools/gmt#4955), and they were all fixed in less than one week!
I think we may still need
@weiji14, could I please be added to DAGsHub?
OK, done.
Done in #1108.
Waiting for the GMT v6.2.0 release.
Tracked by issue #1131.
Tracked by issue #1131.
I think we still need it when testing grids.
Yes, it's already documented in the contributing guides.
I just opened issue #1200 for discussion.
I just opened issue #1201 for discussion.
I think we can close the issue.
If you're unclear about how PyGMT tests images, please read the "Testing plots" section in the contributing guides first.
In short, for image-based tests, we need to specify a baseline (reference) image. When we make any changes to the code, we generate a new "test" image and compare it with the "baseline" image. If the two images differ, we know the changes broke the tests. The most important thing is to ensure that the "baseline" images are correct.
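Comparing a test image against a baseline usually allows a small tolerance rather than demanding identical files. As I understand it, matplotlib's `compare_images` (which pytest-mpl builds on) fails when the RMS difference between the two images exceeds a tolerance. A minimal pure-Python sketch of that idea, treating images as flat lists of grayscale pixel values; `rms_difference`, `images_match`, and the default tolerance are assumptions for illustration.

```python
import math


def rms_difference(baseline, test):
    """Root-mean-square difference between two equally sized pixel lists."""
    if len(baseline) != len(test):
        raise ValueError("images have different sizes")
    squared = [(b - t) ** 2 for b, t in zip(baseline, test)]
    return math.sqrt(sum(squared) / len(squared))


def images_match(baseline, test, tol=2.0):
    """Treat the images as equal when the RMS difference is within `tol`."""
    return rms_difference(baseline, test) <= tol


baseline = [0, 0, 255, 255]
print(images_match(baseline, [0, 0, 255, 251]))  # RMS 2.0 -> True
print(images_match(baseline, [0, 0, 255, 247]))  # RMS 4.0 -> False
```

Whatever the tolerance, this only tells us the new image matches the baseline; the baseline itself still has to be checked by a human.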
Currently, we have two different methods to generate the "baseline" image and compare them:

1. `@pytest.mark.mpl_image_compare`
2. `@check_figures_equal()`
The `@pytest.mark.mpl_image_compare` method is the most straightforward way to do image testing. Using this decorator, we need to generate baseline images, check their correctness, and store them in the repository (https://github.com/GenericMappingTools/pygmt/tree/master/pygmt/tests/baseline).

Pros:
Cons:
To avoid storing many large static images in the repository, we (mainly @weiji14 and @seisman) had some discussions (in #451, #522) and developed the `@check_figures_equal` decorator (#555, #590, #600).

Below is an example test using the `@check_figures_equal()` decorator:

pygmt/pygmt/tests/test_basemap.py Lines 67 to 77 in e057927
In this example, the baseline/reference image `fig_ref` is generated using `basemap(R="0/360/0/1000", J="P6i", B="afg")`, while the test image `fig_test` is generated using `basemap(region=[0, 360, 0, 1000], projection="P6i", frame="afg")`. We can't see what the baseline image looks like, but we're fairly confident that it is correct, because the `basemap` wrapper is very simple.

Pros:
Cons:
`J="X10c/10c"` is disallowed) as proposed in Disallow single character arguments #262 (also related to Fail for invalid input arguments #256), then most of the code for generating reference images will be invalid.

For some complicated wrappers, we can't even easily know whether the reference image is correct. For example,

pygmt/pygmt/tests/test_subplot.py Lines 30 to 42 in e057927
In this test, we expect the baseline image to have a 2-row-by-1-column subplot layout. However, if we make a silly mistake in `Figure.subplot`, resulting in a 1-row-by-2-column layout, the test still passes, because both the baseline and test images have the same "wrong" layout. Then the test is useless to us.

Almost every plotting tool has to decide whether to store static images in its repository. There are similar discussions in the upstream GMT project (GenericMappingTools/gmt#3470) and the matplotlib project (matplotlib/matplotlib#16447).
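The subplot pitfall can be reproduced in a few lines. This sketch uses a hypothetical, deliberately buggy `subplot_command` wrapper and assumes GMT's `subplot begin <nrows>x<ncols>` syntax: when both sides go through the same wrapper, the comparison passes; a hand-written GMT reference catches the bug.

```python
def subplot_command(nrows, ncols):
    """Hypothetical wrapper with a deliberate bug: rows and columns swapped."""
    return f"subplot begin {ncols}x{nrows}"  # bug: should be {nrows}x{ncols}


# check_figures_equal style: both sides go through the same buggy wrapper.
reference = subplot_command(nrows=2, ncols=1)
test = subplot_command(nrows=2, ncols=1)
same_path_passes = reference == test  # True, so the bug goes unnoticed

# Independent reference: the GMT command for a 2-row-by-1-column layout,
# written by hand instead of being generated by the wrapper.
independent_reference = "subplot begin 2x1"
independent_passes = test == independent_reference  # False, the bug is caught
```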
As we now have more active developers, I think we should rethink how we want to test PyGMT.