Is your feature request related to a problem?
Currently, tests/integration/test_diags.py runs the all_sets.cfg diagnostics, takes diffs of the resulting images, and compares them against a baseline (whatever is on Chrysalis). We set a diff threshold of 2% non-zero pixels. The issue with diffing two images is that any noise can break the test (e.g., a change in matplotlib formatting, a shifted legend, floating-point formatting, different font sizes). The baseline results sometimes need to be updated when matplotlib updates introduce side effects. The integration tests are challenging to debug and take a long time to run (#643), which bogs down development.
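For reference, below is a minimal sketch of the kind of pixel-diff check described above (the actual logic in tests/integration/test_diags.py may differ; the file names and the 2% threshold here are illustrative only):

```python
# Sketch of a pixel-diff check with a percentage threshold (illustrative;
# not the actual implementation in tests/integration/test_diags.py).
import numpy as np
from PIL import Image, ImageChops


def fraction_of_differing_pixels(actual_path: str, expected_path: str) -> float:
    actual = Image.open(actual_path).convert("RGB")
    expected = Image.open(expected_path).convert("RGB")
    # Per-pixel absolute difference; identical pixels become (0, 0, 0).
    diff = np.asarray(ImageChops.difference(actual, expected))
    nonzero = np.any(diff != 0, axis=-1).sum()
    return nonzero / (diff.shape[0] * diff.shape[1])


# A shifted legend or a font change can easily push this past a 2% threshold.
assert fraction_of_differing_pixels("actual.png", "expected.png") <= 0.02
```

Because every rendered pixel counts toward the threshold, purely cosmetic changes fail the test even when the plotted values are identical.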
For example, below are the actual image, the expected image, and the diff of the two. Notice that the diff is essentially just noise from the legend shifting over slightly and a change in the "Test" name.
Describe the solution you'd like
We should compare the underlying metrics in the .json files instead. Users should manually validate that the plots look as expected based on the metrics being plotted, since that is more reliable than pixel comparisons.
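A minimal sketch of what a metrics-based comparison could look like (the file names, the flat key/value structure of the .json metrics, and the tolerance are assumptions for illustration, not the actual e3sm_diags output format):

```python
# Sketch of comparing metrics .json files instead of images (illustrative;
# file layout and key structure are assumptions, not the real output format).
import json

import numpy as np


def assert_metrics_close(actual_path: str, expected_path: str, rtol: float = 1e-5) -> None:
    with open(actual_path) as f:
        actual = json.load(f)
    with open(expected_path) as f:
        expected = json.load(f)

    assert actual.keys() == expected.keys(), "Metrics keys differ"
    for key, expected_value in expected.items():
        if isinstance(expected_value, (int, float)):
            # A numerical tolerance avoids failures from floating-point formatting.
            np.testing.assert_allclose(actual[key], expected_value, rtol=rtol, err_msg=key)
        else:
            assert actual[key] == expected_value, f"Mismatch for {key}"


# Example usage (hypothetical file names):
# assert_metrics_close("actual_metrics.json", "baseline_metrics.json")
```

Comparing numbers with a relative tolerance sidesteps the rendering noise entirely, while still catching regressions in the computed diagnostics.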
Describe alternatives you've considered
No response
Additional context
No response