400x faster with linear hashing of the hash map entries #8425

JaroslavTulach · 2023-11-29T18:58:28Z

Pull Request Description

Fixes #5233 by removing EconomicMap & co. and using plain old linear hashing. Fixes #8090 by introducing StorageEntry.removed() instead of copying the builder on each removal.
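Assuming "linear hashing" here means open addressing with linear probing (not Litwin's incrementally growing "linear hashing" scheme), the core idea fits in a few lines of Java. This is a minimal, hypothetical sketch: `LinearProbingMap` and its nested `Entry` are illustrative names, not the PR's `EnsoHashMapBuilder` or `StorageEntry`. It shows how a removal can simply flag an entry, in the spirit of `StorageEntry.removed()`, instead of copying the table.

```java
import java.util.Objects;

/** Hypothetical sketch: open addressing with linear probing; removal only flags the entry. */
final class LinearProbingMap<K, V> {
  /** One occupied slot; {@code removed} acts as a tombstone. */
  static final class Entry<K, V> {
    final K key;
    final V value;
    boolean removed;

    Entry(K key, V value) {
      this.key = key;
      this.value = value;
    }
  }

  private Entry<K, V>[] table = newTable(8); // length is always a power of two
  private int used; // occupied slots, tombstones included

  @SuppressWarnings("unchecked")
  private static <K, V> Entry<K, V>[] newTable(int length) {
    return (Entry<K, V>[]) new Entry[length];
  }

  private int slotFor(Object key, int length) {
    int h = Objects.hashCode(key);
    return (h ^ (h >>> 16)) & (length - 1); // spread the high bits, then mask
  }

  V get(K key) {
    int mask = table.length - 1;
    // Walk the probe chain until an empty slot; tombstones are skipped, not treated as the end.
    for (int i = slotFor(key, table.length); table[i] != null; i = (i + 1) & mask) {
      Entry<K, V> e = table[i];
      if (!e.removed && Objects.equals(e.key, key)) {
        return e.value;
      }
    }
    return null;
  }

  boolean remove(K key) {
    int mask = table.length - 1;
    for (int i = slotFor(key, table.length); table[i] != null; i = (i + 1) & mask) {
      Entry<K, V> e = table[i];
      if (!e.removed && Objects.equals(e.key, key)) {
        e.removed = true; // just flag it: nothing is copied or shifted
        return true;
      }
    }
    return false;
  }

  void put(K key, V value) {
    remove(key); // keep at most one live entry per key
    if ((used + 1) * 2 > table.length) {
      rehash(); // load factor stays <= 1/2, so probing always reaches an empty slot
    }
    place(table, new Entry<>(key, value));
    used++;
  }

  int size() {
    int live = 0;
    for (Entry<K, V> e : table) {
      if (e != null && !e.removed) {
        live++;
      }
    }
    return live;
  }

  private void place(Entry<K, V>[] t, Entry<K, V> e) {
    int i = slotFor(e.key, t.length);
    while (t[i] != null) {
      i = (i + 1) & (t.length - 1); // linear probing: try the next slot, wrapping around
    }
    t[i] = e;
  }

  private void rehash() {
    Entry<K, V>[] bigger = newTable(table.length * 2);
    int live = 0;
    for (Entry<K, V> e : table) {
      if (e != null && !e.removed) { // tombstones are dropped only here
        place(bigger, e);
        live++;
      }
    }
    table = bigger;
    used = live;
  }
}
```

Because a flagged entry keeps its slot, the probe chains of other keys are never broken; flagged entries disappear only when the table is rebuilt.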

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

- All code follows the Java style guide.
- All code has been tested:
  - Unit tests continue to pass
  - Check the effect on benchmarks

JaroslavTulach · 2023-11-30T08:59:57Z

I assume the performance of EnsoHashMap should no longer be a problem.

org_enso_benchmarks_generated_Enso_Hash_Map_100000_Enso_Incremental achieves 4.06 ms per operation, which makes it slightly faster than the org_enso_benchmarks_generated_Enso_Hash_Map_100000_Java_Incremental benchmark.

org_enso_benchmarks_generated_Enso_Hash_Map_100000_Enso_Replacement has been sped up 400 times and is now just 50% slower than org_enso_benchmarks_generated_Enso_Hash_Map_100000_Java_Incremental. That's a good result, given that the Java HashMap doesn't need to preserve referential transparency at all.

hubertp

Minor questions, but other than that this looks nice!

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/EnsoHashMapBuilder.java

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/HashMapGetNode.java

Akirathan

Very nice performance improvement. Just a few typos in the Javadoc.

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/EnsoHashMapBuilder.java

…/hash/EnsoHashMapBuilder.java Co-authored-by: Pavel Marek <[email protected]>

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/HashMapToVectorNode.java

radeusgd · 2023-12-01T12:54:44Z

test/Table_Tests/src/Common_Table_Operations/Select_Columns_Spec.enso

+            tester = expect_column_names ["Test 1", "Test 2", "Test 3", "Test"]
            problems = [Duplicate_Output_Column_Names.Error ["Test", "Test", "Test"]]
            Problems.test_problem_handling action problems tester


I think I'm slightly confused.

Changing the ordering of elements in the Map shouldn't really change how rename_columns works, at least in theory. If it does, I guess we should look into it.

I don't think the ordering is the most crucial aspect, but the original behaviour shown in this test seems preferable: we want the column without a suffix to be the first in the list, not the last.

@jdunkerley shall I create a small bug report so that we can amend rename_columns to work again like before with the new Map?

Linear hashing for the entries

2eb5cdb

JaroslavTulach added the "CI: No changelog needed" label ("Do not require a changelog entry for this PR.") Nov 29, 2023

JaroslavTulach self-assigned this Nov 29, 2023

JaroslavTulach requested review from 4e6, hubertp and Akirathan as code owners November 29, 2023 18:58

JaroslavTulach requested a review from radeusgd November 29, 2023 18:58

enso-bot bot mentioned this pull request Nov 30, 2023

Unused arguments cause buildEngineDistribution to generate invalid caches and prevent tests requiring the affected file from running #8384

Closed

JaroslavTulach added 2 commits November 30, 2023 05:48

Only include entries visible in latest generation

3711af7

Better documentation

25105b3

JaroslavTulach linked an issue Nov 30, 2023 that may be closed by this pull request

Improving performance of Map.insert #8090

Closed

JaroslavTulach added 6 commits November 30, 2023 11:13

Special equality for NaN

707c4f7
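Under IEEE 754, `NaN` compares unequal to itself, so a map that compared `double` keys with `==` could store a `NaN` key but never find it again. The sketch below shows key equality and hashing that stay consistent with each other; `DoubleKeys`, `keysEqual` and `keyHash` are hypothetical names, and treating `0.0` and `-0.0` as the same key is an assumption about the desired semantics rather than something the PR states.

```java
/** Hypothetical sketch of map-key equality for doubles that treats every NaN as one key. */
final class DoubleKeys {
  static boolean keysEqual(double a, double b) {
    // `==` already treats 0.0 and -0.0 as equal; NaN needs an explicit check because NaN != NaN.
    return a == b || (Double.isNaN(a) && Double.isNaN(b));
  }

  static int keyHash(double a) {
    if (Double.isNaN(a)) {
      return 0x7ff80000; // arbitrary constant: every NaN bit pattern must share one bucket
    }
    if (a == 0.0) {
      return 0; // 0.0 and -0.0 are equal keys here, so they must hash identically
    }
    return Double.hashCode(a);
  }
}
```

The hash has to follow the equality: once all `NaN`s are equal keys, they must also hash to the same value, or lookups would probe the wrong slots.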

No need to compute vector of pairs in interpreter

0ed1db4

Just read the value if it is available

a09b770

Search only entries visible in given generation

0185fbb
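The commit titles "Only include entries visible in latest generation" and "Search only entries visible in given generation" suggest that immutable map values share one builder and tell their entries apart by generation. A minimal sketch of that idea, assuming each entry records the generation that added it and the one that superseded it; `GenerationalStore`, `Snapshot` and `visibleIn` are hypothetical names, and the copy fallback for older snapshots is an assumption, not a description of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: one append-only store shared by many immutable snapshots. */
final class GenerationalStore<K, V> {
  static final class Entry<K, V> {
    final K key;
    final V value;
    final int addedIn; // generation that inserted this entry
    int removedIn = Integer.MAX_VALUE; // generation that replaced it; MAX_VALUE means "never"

    Entry(K key, V value, int addedIn) {
      this.key = key;
      this.value = value;
      this.addedIn = addedIn;
    }

    boolean visibleIn(int generation) {
      return addedIn <= generation && generation < removedIn;
    }
  }

  private final List<Entry<K, V>> entries = new ArrayList<>();
  private int latest = 0; // newest generation handed out by this store

  /** The empty map: generation 0 sees no entries. */
  Snapshot initial() {
    return new Snapshot(0);
  }

  /** An immutable map view that only sees entries visible in its own generation. */
  final class Snapshot {
    private final int generation;

    private Snapshot(int generation) {
      this.generation = generation;
    }

    V get(K key) {
      // A linear scan stands in for the hash probe to keep the sketch short.
      for (Entry<K, V> e : entries) {
        if (e.visibleIn(generation) && Objects.equals(e.key, key)) {
          return e.value;
        }
      }
      return null;
    }

    Snapshot insert(K key, V value) {
      if (generation != latest) {
        // Appending here would leak the new entry into newer snapshots,
        // so copy this snapshot's visible entries into a fresh store instead.
        GenerationalStore<K, V> copy = new GenerationalStore<>();
        Snapshot s = copy.initial();
        for (Entry<K, V> e : entries) {
          if (e.visibleIn(generation)) {
            s = s.insert(e.key, e.value);
          }
        }
        return s.insert(key, value);
      }
      int next = generation + 1;
      for (Entry<K, V> e : entries) {
        if (e.visibleIn(generation) && Objects.equals(e.key, key)) {
          e.removedIn = next; // hidden from `next` on, still visible to older snapshots
        }
      }
      entries.add(new Entry<>(key, value, next));
      latest = next;
      return new Snapshot(next);
    }
  }
}
```

Inserting into the newest snapshot only appends one entry and bumps the generation, while every older snapshot keeps seeing exactly what it saw before.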

HashSize is derived from size

9c80ce3
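The commit title only says that the hash size is derived from the size. Under the assumption that the table length is a power of two kept at a load factor of at most 1/2, as is typical for linear probing, the derivation could look like this; `HashSize.hashSizeFor` is a hypothetical helper and the PR's actual formula may differ.

```java
/** Hypothetical helper: table length computed from the number of entries. */
final class HashSize {
  /** Power of two, at least 8, with room for {@code size} entries at load factor <= 1/2. */
  static int hashSizeFor(int size) {
    int wanted = Math.max(8, size * 2); // sketch only: ignores overflow for sizes above 2^30
    // Round up to a power of two so that `hash & (length - 1)` can replace `hash % length`.
    return Integer.highestOneBit(wanted - 1) << 1;
  }
}
```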

Order of elements in Map may not match the insertion order

23a5143
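Why the order can change: a hash table iterated slot by slot returns keys in the order of their (probed) hash positions, not the order in which they were added. A tiny hypothetical demonstration with an 8-slot table and `Integer` keys, whose hash code is the value itself:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: iteration order of a linear-probing table is slot order. */
final class SlotOrder {
  /** Inserts keys into an 8-slot linear-probing table and returns them in slot order. */
  static List<Integer> slotOrder(int... keys) {
    if (keys.length > 8) {
      throw new IllegalArgumentException("the 8-slot demo table holds at most 8 keys");
    }
    Integer[] slots = new Integer[8];
    for (int key : keys) {
      int i = Integer.hashCode(key) & 7; // home slot
      while (slots[i] != null) {
        i = (i + 1) & 7; // collision: probe the next slot, wrapping around
      }
      slots[i] = key;
    }
    List<Integer> inSlotOrder = new ArrayList<>();
    for (Integer k : slots) {
      if (k != null) {
        inSlotOrder.add(k);
      }
    }
    return inSlotOrder;
  }
}
```

For example, inserting 5, 3 and 12 lands them in slots 5, 3 and 4, so reading the table back yields [3, 12, 5].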

JaroslavTulach requested review from jdunkerley and GregoryTravis as code owners November 30, 2023 10:39

JaroslavTulach changed the title from "Linear hashing for the hash map entries" to "400x faster with linear hashing of the hash map entries" Nov 30, 2023

Avoid usage of Streams in PE code. NI doesn't like it.

8dcd11c
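Assuming "PE" means Truffle's partial evaluation and "NI" means GraalVM Native Image, the usual remedy is to replace a stream pipeline with an explicit loop, which leaves only plain control flow for the compiler to analyze. Both methods below compute the same count; `NoStreams` and its nested `Entry` are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical example: the same computation with and without java.util.stream. */
final class NoStreams {
  static final class Entry {
    final String key;
    final boolean removed;

    Entry(String key, boolean removed) {
      this.key = key;
      this.removed = removed;
    }
  }

  /** Stream version: concise, but it builds a stream pipeline and calls through lambdas. */
  static long liveCountWithStream(Entry[] entries) {
    return Arrays.stream(entries).filter(e -> e != null && !e.removed).count();
  }

  /** Loop version: same result, using only a plain loop and branches. */
  static int liveCountWithLoop(Entry[] entries) {
    int count = 0;
    for (Entry e : entries) {
      if (e != null && !e.removed) {
        count++;
      }
    }
    return count;
  }
}
```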

hubertp approved these changes Nov 30, 2023

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/EnsoHashMapBuilder.java (resolved)

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/HashMapGetNode.java (resolved)

Akirathan approved these changes Nov 30, 2023

JaroslavTulach and others added 6 commits November 30, 2023 13:59

Update engine/runtime/src/main/java/org/enso/interpreter/runtime/data…

cb03c17

…/hash/EnsoHashMapBuilder.java Co-authored-by: Pavel Marek <[email protected]>

Javadoc wording

6522d31

Another Javadoc fix

2c3100f

Don't throw IllegalStateException on interop failures

d047104

Make sure every object has some hashCode

4cb4570
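One safe way to guarantee that every key has some hash code is to fall back to a constant when an object offers none: equal objects still get equal hashes, and only probing efficiency suffers. Falling back to `System.identityHashCode` would be unsafe if such objects can be structurally equal. The `Hashable` interface and `hashOf` helper are hypothetical stand-ins; the PR's actual fallback is not described in this thread.

```java
/** Hypothetical sketch of a hash function that never fails to produce a value. */
final class HashFallback {
  /** Hypothetical capability for objects that know how to hash themselves. */
  interface Hashable {
    int hash();
  }

  static int hashOf(Object o) {
    if (o instanceof Hashable) {
      return ((Hashable) o).hash();
    }
    if (o instanceof String) {
      return o.hashCode(); // value-based hash, consistent with String.equals
    }
    // No known hash: a constant keeps "equal implies same hash" true for any equality,
    // at the cost of putting all such keys on the same probe chain.
    return 0;
  }
}
```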

Adjusting to changes in Map entries ordering

67bc16e

GregoryTravis approved these changes Nov 30, 2023

JaroslavTulach added 2 commits December 1, 2023 04:35

Returning back accidentally removed @Specialization

2718200

Sorting the result by value of x

705b877
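Because the new Map no longer iterates in insertion order, tests that print or compare its contents need a canonical order. The commit does this in Enso by sorting on `x`; the same idea in Java, with a hypothetical `sortedByValue` helper built on the standard `Map.Entry.comparingByValue()` comparator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical test helper: a canonical, hash-order-independent view of a map. */
final class CanonicalOrder {
  /** Copies the map's entries and sorts them by value. */
  static <K, V extends Comparable<? super V>> List<Map.Entry<K, V>> sortedByValue(Map<K, V> map) {
    List<Map.Entry<K, V>> list = new ArrayList<>(map.entrySet());
    list.sort(Map.Entry.comparingByValue());
    return list;
  }
}
```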

JaroslavTulach added the "CI: Ready to merge" label ("This PR is eligible for automatic merge") Dec 1, 2023

mergify bot merged commit 81f0645 into develop Dec 1, 2023
34 checks passed

mergify bot deleted the wip/jtulach/MapInsert_5233 branch December 1, 2023 06:43

radeusgd reviewed Dec 1, 2023

engine/runtime/src/main/java/org/enso/interpreter/runtime/data/hash/HashMapToVectorNode.java (resolved)

radeusgd reviewed Dec 1, 2023

enso-bot bot mentioned this pull request Dec 2, 2023

Internal Compiler Error exposed if duplicate field names are provided in a type #7962

Closed

JaroslavTulach mentioned this pull request Jan 5, 2024

Migrate WithWarnings to use EnsoHashMap to speed them up significantly #8682

Closed
