Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid a potentially expensive reflection-based loop in assertArrayEquals... #918

Merged
merged 1 commit into from
Sep 11, 2014

Conversation

eamonnmcmanus
Copy link
Contributor

..., in the usual case where the arrays are in fact exactly equal.

This is purely a performance optimization so there are no new tests. For primitive arrays, the Arrays.deepEquals method quickly dispatches to the appropriate overload of Arrays.equals, which simply loops over the elements in the expected way. Additionally, Arrays.equals(char[], char[]) is intrinsified by the HotSpot VM so the boost there should be even more substantial.

…als, in the usual case where the arrays are in fact exactly equal.
@kcooney
Copy link
Member

kcooney commented May 31, 2014

I would like to propose something more radical. See #924

@eamonnmcmanus
Copy link
Contributor Author

I guess I don't really see the advantage. It'll perform much better in the case where the assertion fails, but is that something we really care about? Otherwise it seems like 200 lines of code for not that much gain over the fix here. Additionally, it performs no optimizations at all for float and double arrays.

@kcooney
Copy link
Member

kcooney commented Jun 3, 2014

@eamonnmcmanus I've looked at the source of deepEquals() and isn't cheap.

As for float and double, it wouldn't be hard for me to do something simulator for float and double. Note that your optimization would make comparing float and double arrays much slower for a number of cases where the arrays were "equal" within the delta.

@eamonnmcmanus
Copy link
Contributor Author

So, I'm looking at the source of deepEquals here:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/Arrays.java

I don't see what it is that is not cheap. I recall that the original context of this change is that tests that compare very large arrays of primitives are slow. Calling deepEquals(new Object[] {a1}, new Object[] {a2}) has a slightly higher constant overhead than having a separate method for each kind of primitive array, but after that it should be equally fast. Additionally, equals(char[], char[]) is intrinsified so that particular case will likely be much faster. As for float and double arrays, I can't agree that they would be "much slower" than the existing code, since the cost of the Arrays.equals call is negligible compared to the cost of the reflective loop. We might want to handle that particular case with explicit code, if we think that people often compare big arrays of floats or doubles that are not exactly equal but equal within the tolerance. I still don't think there's much call for adding 200 lines of methods for every primitive array type.

@kcooney
Copy link
Member

kcooney commented Jun 3, 2014

For float and double arrays, if the arrays are not exactly equal but are equal within the delta, then your change will slow down the code vs. the current code.

I've updated my pull to reduce the logic duplication and handle double and float arrays. Let's see what the other JUnit owners think.

@eamonnmcmanus
Copy link
Contributor Author

I guess I'll just repeat that the slowdown will be trivial. If you are comparing 1000 floats against 1000 other nearly-equal floats, the runtime will be completely dominated by the reflective Array.get. The extra deepEquals call will be insignificant.

Anyway it looks like you're not interested in this fix, which is fine. It wasn't very hard to make. :-)

@marcphilipp
Copy link
Member

Both for #924 and this one, I really would love to see some performance numbers both for passing and failing assertions before/after the change.

@kcooney
Copy link
Member

kcooney commented Jun 3, 2014

@eamonnmcmanus that's one of the reasons why my solution avoids the reflective Array.get where possible

@marcphilipp I'm swamped for the remainder of the week, but I'm asking around to see if there's someone that can help.

@kcooney
Copy link
Member

kcooney commented Jul 8, 2014

One of my coworkers wrote some Caliper microbenchmarks for assertArrayEquals() and I ran them using the JUnit 4.10 implementation as a baseline. For arrays with 1,000,000 elements, the speedup for the changes in this pull request did not exceed 0.9%, and for many use cases it was slower than the existing JUnit 4.10 implementation.

@eamonnmcmanus
Copy link
Contributor Author

I have to say that I don't see how that can possibly be the case. What exactly was being measured?

@kcooney
Copy link
Member

kcooney commented Jul 8, 2014

Scratch that. Somoeone reviewing the changes caught a problem. I will rerun the benchmarks later this week.

I am trying to get them to contribute the benchmark back into JUnit so others could reproduce the results (I cannot contribute their changes)

@kcooney
Copy link
Member

kcooney commented Jul 29, 2014

I fixed the problem in the benchmarks and ran them again. Here are the results:

https://docs.google.com/spreadsheets/d/1BuOnxOz2xua6xcs9ix_PNbqhzHnh0pIobWmJdBbPo14/edit?usp=sharing

Pull request #924 speeds up assertArrayEquals() for all measured use cases, but the speed ups for two-dimensional arrays and multiple-dimension arrays are less that 7%. For other use cases, the speedup is over 98.3%

The speedups for this pull request (918) for single-dimensional arrays are over 98.8% except for arrays of doubles that are equivalent (within the delta) but not equals, which are slower by 1.6%. For two-dimensional arrays and multiple-dimension arrays, speedups are above 80.9%.

We could update pull #924 to use the methods in java.util.Arrays to quickly determine if the arrays are equal for all types except doubles. This would make #924 do well in all use cases, but, of course, would make the code more complicated.

We did some searches in the internal Google Java code base, and uses of assertArrayEquals() for float and double are quite uncommon, and the arrays being compared are not large. It's hard to say whether this is representative of code in the wild.

@stefanbirkner
Copy link
Contributor

We should update #924. Otherwise somebody will create an issue for the double arrays in the future.

@kcooney
Copy link
Member

kcooney commented Jul 30, 2014

@stefanbirkner #924 is a lot more complicated than this pull. I tried to simplify it, but I don't see how i can get it much simpler.

Do we really want to move to a more complicated solution just in case someone eventually complains about the performance of comparing large arrays of floating point values?

@stefanbirkner
Copy link
Contributor

I'm torn between the two solutions. @marcphilipp, @dsaff What do you think?

@dsaff
Copy link
Member

dsaff commented Jul 30, 2014

I'm terribly tempted by the simplicity of #918. Would it be possible to not regress on inexact matches by extracting a protected method called something like shortCircuitEquals, which we could override in InexactComparisonCriteria to do nothing?

Of course, that assumes that when comparing arrays of doubles, inexact-but-equivalent is a more common solution than exactly-equal, which I can't guess the truth of.

@eamonnmcmanus
Copy link
Contributor Author

To be clear, there is no change in behaviour in #918. It's just that the short-circuit test that arrays are equal (before iterating through them to see how they differ) won't apply to float or double arrays that differ but only within the tolerance. We could achieve similar short-circuiting for floats and doubles via a couple of instanceof checks and explicit loops for those two cases, which would allow us to avoid the expensive reflection-based loop for arrays that are equal within the tolerance. But my feeling from looking through Google's very large code base for calls to these methods is that it would not really be worthwhile. FP arrays are very rarely compared, and in every case I checked, the arrays were small (6 elements at most) so the performance hit of the reflection loop is not significant. The real win of this change is for comparing things like byte arrays, where it's easy to think of scenarios where you might be comparing arrays with thousands or even millions of elements. (Does the file I just read back match the bytes I wrote?)

@stefanbirkner
Copy link
Contributor

I'm convinced. Let us merge this pull request.

@kcooney
Copy link
Member

kcooney commented Jul 30, 2014

I think we could move this code change to ExactComparisonCriteria.arrayEquals(String message, Object expecteds, Object actuals) and then we would get the speedup for bytes, ints, etc, and no speed change for doubles and floats.

Edit: I meant ExactComparisonCriteria

@kcooney
Copy link
Member

kcooney commented Sep 9, 2014

No one's done the actual merge yet.

I decided to take a stab at simplifying and optimizing #924 and I pushed the new version. Benchmark results are in the "Run 2" rows here:

https://docs.google.com/spreadsheets/d/1BuOnxOz2xua6xcs9ix_PNbqhzHnh0pIobWmJdBbPo14/edit#gid=1430950594

The new version is faster than 4.12 in all test cases, and much faster than #918 for comparing two floating point arrays that are equivalent but not identical. Pull #918 is slightly faster for comparing two identical floating point arrays and Object arrays.

I'll leave it to the other maintainers to decide which pull to merge. Pull #918 is a smaller delta from the current code, while pull #924 is a rewrite of the array comparison assertion code.

@marcphilipp
Copy link
Member

I'm in favor of #918 because it simply calls the JDK implementation.

@kcooney
Copy link
Member

kcooney commented Sep 10, 2014

@marcphilipp both call the JDK implementation, and then fall back to a slower implementation if the JDK implementation indicates that the arrays are not equal. The code in #918 uses the existing code (ComparisonCriteria) as the fallback implementation, while #924 replaces the ComparisonCriteria with optimized code.

@dsaff
Copy link
Member

dsaff commented Sep 10, 2014

Kevin,

#924 would make sense rebased on top of #918, right? I'm thinking we merge #918 as a clear-win fix, and then consider #924. (I'm in favor of #924, but would rather not delay the quick touchdown while we debate whether to go for the extra point.)

@kcooney
Copy link
Member

kcooney commented Sep 11, 2014

@dsaff I am fine with that, as long as people are okay with a slowdown for large arrays of floating point values that are equals within the delta but not identical (which I fully admit may be an extremely rare thing to compare for most types of projects)..

@dsaff
Copy link
Member

dsaff commented Sep 11, 2014

@kcooney, I think that change is a decent trade-off, and I think Stefan and Marc agree. Going to pull the trigger on this, and then let's rebase #924 and continue discussion there (again, I'm in favor of #924, but could consider punting to the next release if others are uncomfortable with the complexity.)

dsaff pushed a commit that referenced this pull request Sep 11, 2014
Avoid a potentially expensive reflection-based loop in assertArrayEquals...
@dsaff dsaff merged commit c5a2e04 into junit-team:master Sep 11, 2014
@marcphilipp marcphilipp added this to the 4.12 milestone Sep 11, 2014
marcphilipp added a commit to marcphilipp/junit that referenced this pull request Sep 16, 2014
@marcphilipp marcphilipp mentioned this pull request Sep 16, 2014
kcooney added a commit that referenced this pull request Sep 16, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants