Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove direct uses of hppc #84735

Closed
43 tasks done
rjernst opened this issue Mar 7, 2022 · 2 comments
Closed
43 tasks done

Remove direct uses of hppc #84735

rjernst opened this issue Mar 7, 2022 · 2 comments
Labels
:Core/Infra/Core Core issues without another label Meta Team:Core/Infra Meta label for core/infra team

Comments

@rjernst
Copy link
Member

rjernst commented Mar 7, 2022

This is a meta issue for removing most uses of hppc from Elasticsearch.

Like any complex software project, Elasticsearch uses various data structures from computer science: lists, maps, sets, etc. Java provides interfaces for these structures (Java Collections Framework, JCF), and concrete implementations using well known techniques, like a hashtable based Map implementation, HashMap. The Java provided implementations are used throughout Elasticsearch.

Additionally, Elasticsearch server depends on the HPPC library. It provides alternatives to Java’s builtin collections that attempt to provide efficient, fast and “open” implementations (open here means easily hackable). The library is mostly wrapped with our own classes, namely ImmutableOpenMap and ImmutableOpenIntMap, but it is also used sporadically throughout the codebase. But when to use the HPPC based collections vs the Java ones has never been well defined within the team.

HPPC has been around for more than a decade, and while it once provided major performance benefits over JCF, Java has changed a lot in that time, especially in hotspot. It is unclear if HPPC actually provides advantages over JCF anymore. On top of this, the HPPC based collections are more difficult to use, and don’t easily interoperate with JCF interfaces, requiring conversion, or more often, forcing a developer to use HPPC because an upstream object used it, like when dealing with cluster state.

We have decided to try to move away from using hppc. Doing so is tricky since swapping out JCF classes are not trivial in most uses. This meta issue lists direct uses of hppc that should be either removed entirely, or evaluated for performance differences with JCF. Whether performance tests are needed would depend on the use case, and whether it is expected to be sensitive to memory footprint, and the objects lifetime.

@rjernst rjernst added :Core/Infra/Core Core issues without another label Meta labels Mar 7, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Mar 7, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 7, 2022
This commit cleanups up allocation awareness integration tests to not
use hppc.

relates elastic#84735
elasticsearchmachine pushed a commit that referenced this issue Mar 7, 2022
This commit cleanups up allocation awareness integration tests to not
use hppc.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 8, 2022
This commit removes a couple trivial uses of hppc in x-pack tests.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 9, 2022
This commit removes a couple random uses of hppc maps in favor of
HashMap. The uses should not be memory sensitive.

relates elastic#84735
elasticsearchmachine pushed a commit that referenced this issue Mar 9, 2022
This commit removes a couple trivial uses of hppc in x-pack tests.

relates #84735
rjernst added a commit that referenced this issue Mar 9, 2022
This commit removes a couple random uses of hppc maps in favor of
HashMap. The uses should not be memory sensitive.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 10, 2022
The histogram field parses values and counts from document. Parsing the
document should not be the bottleneck in index, so using hppc classes
here appears unnecessary, especially given all the other overhead of I/O
for parsing a document. This commit converts these two uses to
ArrayList.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 10, 2022
This use was trivial, as it already was using references as the values
are String. This commit converts to using an ArrayList.

relates elastic#84735
rjernst added a commit that referenced this issue Mar 10, 2022
The histogram field parses values and counts from document. Parsing the
document should not be the bottleneck in index, so using hppc classes
here appears unnecessary, especially given all the other overhead of I/O
for parsing a document. This commit converts these two uses to
ArrayList.

relates #84735
rjernst added a commit that referenced this issue Mar 10, 2022
This use was trivial, as it already was using references as the values
are String. This commit converts to using an ArrayList.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 10, 2022
The LinearizabilityChecker has one use of hppc, where it uses a Long to
Object map. Given that the rest of the maps in the Cache class are Java
Maps, and this is only for testing, hppc does not seem necessary.
This commit converts the usage to Map.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 10, 2022
This method is only used for an assertion in test clusters. This commit
converts the hppc map to Java Map.

relates elastic#84735
rjernst added a commit that referenced this issue Mar 10, 2022
…84882)

This method is only used for an assertion in test clusters. This commit
converts the hppc map to Java Map.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Mar 14, 2022
A handful of server integ tests used hppc for local state to compare
with. None of these should be performance critical. This commit converts
remaining uses within server integ tests to Java collections.

relates elastic#84735
rjernst added a commit that referenced this issue Mar 15, 2022
A handful of server integ tests used hppc for local state to compare
with. None of these should be performance critical. This commit converts
remaining uses within server integ tests to Java collections.

relates #84735
rjernst added a commit that referenced this issue Mar 15, 2022
The LinearizabilityChecker has one use of hppc, where it uses a Long to
Object map. Given that the rest of the maps in the Cache class are Java
Maps, and this is only for testing, hppc does not seem necessary.
This commit converts the usage to Map.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 20, 2022
The percentiles bucket agg uses an hppc arraylist of doubles to store
the parsed percent values. This is a very small list and does not need
to be a native array. This commit changes to using a standard ArrayList.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 20, 2022
The http server transport uses an hppc integer set to find the first
open port. This commit changes it to use a standard HashSet.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 20, 2022
The builder for ImmutableOpenMap was changed to not inherit from hppc
types, but the builder from ImmutableOpenIntMap was missed. This commit
removes the base hppc interface from the builder and removes unused
methods, and converts one place that was using the leaked types.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 20, 2022
The CombinedDeletionPolicy keeps ref counts for each index snapshot
using an hppc primitive map. This commit converts it to use a standard
HashMap.

relates elastic#84735
rjernst added a commit that referenced this issue Apr 20, 2022
The builder for ImmutableOpenMap was changed to not inherit from hppc
types, but the builder from ImmutableOpenIntMap was missed. This commit
removes the base hppc interface from the builder and removes unused
methods, and converts one place that was using the leaked types.

relates #84735
rjernst added a commit that referenced this issue Apr 20, 2022
The http server transport uses an hppc integer set to find the first
open port. This commit changes it to use a standard HashSet.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 21, 2022
The LocalCheckpointTracker keeps mappings between sequence number and a
bitsets, using hppc primitive maps. This commit converts these to use
standard HashMaps.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 21, 2022
The translog writer uses hppc maps for mapping sequence numbers to
bitsets, and keeping track of which sequence numbers have been flushed
or not. This commit converts these uses to Java collections.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 21, 2022
The nested and reverse nested aggs use hppc maps and lists for mapping buckets to
ordinals. This commit converts these to use HashMaps and ArrayLists.

relates elastic#84735
rjernst added a commit that referenced this issue Apr 21, 2022
The CombinedDeletionPolicy keeps ref counts for each index snapshot
using an hppc primitive map. This commit converts it to use a standard
HashMap.

relates #84735
rjernst added a commit that referenced this issue Apr 21, 2022
The nested and reverse nested aggs use hppc maps and lists for mapping buckets to
ordinals. This commit converts these to use HashMaps and ArrayLists.

relates #84735
rjernst added a commit that referenced this issue Apr 25, 2022
The LocalCheckpointTracker keeps mappings between sequence number and a
bitsets, using hppc primitive maps. This commit converts these to use
standard HashMaps.

relates #84735
rjernst added a commit that referenced this issue Apr 25, 2022
The percentiles bucket agg uses an hppc arraylist of doubles to store
the parsed percent values. This is a very small list and does not need
to be a native array. This commit changes to using a standard ArrayList.

relates #84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 25, 2022
The top hits aggregator uses hppc for keeping track of leaf bucket
collectors. This commit converts it to use HashMap.

relates elastic#84735
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 25, 2022
The terms aggs take include/excludes for terms, and for numeric fields
use an hppc set of longs. This commit converts to using a HashSet.

relates elastic#84735
rjernst pushed a commit that referenced this issue Apr 26, 2022
Remove LongObjectHashMap, LongArrayList and LongProcedure from index translog files.

relates #84735
rjernst added a commit that referenced this issue Apr 26, 2022
The terms aggs take include/excludes for terms, and for numeric fields
use an hppc set of longs. This commit converts to using a HashSet.

relates #84735
rjernst added a commit that referenced this issue Apr 27, 2022
The top hits aggregator uses hppc for keeping track of leaf bucket
collectors. This commit converts it to use HashMap.

relates #84735
@rjernst
Copy link
Member Author

rjernst commented Apr 27, 2022

With the exception of ImmutableOpenIntMap and ImmutableOpenMap, the uses of hppc have been removed. I opened a followup for those classes, as they are used quite extensively: #86239

@rjernst rjernst closed this as completed Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Meta Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

2 participants