-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite CollectionUtils dedup to work with any type #85352
Conversation
CollectionUtils contains a method for sorting and deduplicating an hppc list of byte arrays. That is a very specific type. Yet the algorithm for deduplicating a sorted list is very simple and does not need to be specially typed. This commit removes the sort portion of the method, as that is already easily available (and timsort in Java should be just fine for this purpose, we don't need introsort), renames to unique, and makes the method take a generic List along with a Comparator. relates elastic#84735
Pinging @elastic/es-core-infra (Team:Core/Infra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much cleaner.
T prevValue = list.get(0); | ||
for (int i = 1; i < list.size(); ++i) { | ||
T nextValue = list.get(i); | ||
if (cmp.compare(nextValue, prevValue) != 0 && prevNdx++ != i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pre-increment rather than post-increment? Otherwise it looks to me like a list where all elements are unique would still overwrite all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it should be pre-increment.
@@ -47,6 +44,27 @@ public static boolean isEmpty(Object[] array) { | |||
return array == null || array.length == 0; | |||
} | |||
|
|||
public static <T> void unique(List<T> list, Comparator<T> cmp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadocs that the list must be sorted according to the given comparator?
Also a bit of a nit-pick, but since this modifies the list in-place, I feel like naming the method after a verb would be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to uniquify
if (list.size() <= 1) { | ||
return; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we verify that the list implements random access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm only requires a forward iterator. I've rewritten to use ListIterator instead of indices. The only caveat is that for LinkedList Java does not provide an efficient means to remove the rest of a list from a given point.
@jpountz I think I addressed all your comments, can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this looks good to me. One thing that makes me a bit nervous is that I'm unsure of the performance impact that this change would have for users of binary fields, but it would only possibly affect binary fields that are effectively multi-valued, which is a rare combination I suspect, so probably nothing to worry about?
server/src/main/java/org/elasticsearch/common/util/CollectionUtils.java
Outdated
Show resolved
Hide resolved
Yes that is my thinking. |
…tils.java Co-authored-by: Adrien Grand <[email protected]>
When a plugin specifies an elasticsearch version with an empty string in their plugin-descriptor.properties file, we don't enforce the version check. This PR ensures that we check the version string for empty as well as null value.
Fix small grammatical mistake. Closes elastic#85355
Since Java 9, the JDK has provided a means of parsing Java versions and getting the current Java version. That class obviates the need for the JavaVersion class of Elasticsearch. This commit removes the JavaVersion class in favor of Runtime.Version. Note that most of the changes here simply removed logic around versioning because this change is intended only for the master branch, where Java 17 is required.
This commit removes leftover sort methods in CollectionUtils which were only used in tests.
CollectionUtils contains a method for sorting and deduplicating an hppc
list of byte arrays. That is a very specific type. Yet the algorithm for
deduplicating a sorted list is very simple and does not need to be
specially typed.
This commit removes the sort portion of the method, as that is already
easily available (and timsort in Java should be just fine for this
purpose, we don't need introsort), renames to unique, and makes the
method take a generic List along with a Comparator.
relates #84735