Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

400x faster with linear hashing of the hash map entries #8425

Merged
merged 18 commits into from
Dec 1, 2023

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Nov 29, 2023

Pull Request Description

Fixes #5233 by removing EconomicMap & co. and using plain old good linear hashing. Fixes #8090 by introducing StorageEntry.removed() rather than copying the builder on each removal.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • All code follows the
    Java,
  • All code has been tested:
    • Unit tests continue to pass
    • Check the effect on benchmarks

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Nov 30, 2023

I assume performance of the EnsoHashMap shall no longer be a problem.

obrazek

org_enso_benchmarks_generated_Enso_Hash_Map_100000_Enso_Incremental achieves 4.06ms per operation which makes it slightly faster than org_enso_benchmarks_generated_Enso_Hash_Map_100000_Java_Incremental benchmark.

obrazek

org_enso_benchmarks_generated_Enso_Hash_Map_100000_Enso_Replacement has been sped up 400 times and is now just 50% slower than the org_enso_benchmarks_generated_Enso_Hash_Map_100000_Java_Incremental - that's a good result given that the Java HashMap doesn't need to keep the referential transparency at all.

@JaroslavTulach JaroslavTulach linked an issue Nov 30, 2023 that may be closed by this pull request
@JaroslavTulach JaroslavTulach changed the title Linear hashing for the hash map entries 400x faster with linear hashing of the hash map entries Nov 30, 2023
Copy link
Collaborator

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor questions but other than that this looks nice!

Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice performance improvement. Just few typos in javadoc.

@JaroslavTulach JaroslavTulach added the CI: Ready to merge This PR is eligible for automatic merge label Dec 1, 2023
@mergify mergify bot merged commit 81f0645 into develop Dec 1, 2023
34 checks passed
@mergify mergify bot deleted the wip/jtulach/MapInsert_5233 branch December 1, 2023 06:43
Comment on lines +462 to 464
tester = expect_column_names ["Test 1", "Test 2", "Test 3", "Test"]
problems = [Duplicate_Output_Column_Names.Error ["Test", "Test", "Test"]]
Problems.test_problem_handling action problems tester
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm slightly confused.

Changing the ordering of elements of the Map shouldn't really change the way rename_columns works, at least in theory. I guess that if it did, we should look into it.

I believe the ordering is not most crucial, but the original behaviour presented in this test seems rather preferred - we want the column without suffix to be the first column on the list, not last.

@jdunkerley shall I create a small bug report so that we can amend rename_columns to work again like before with the new Map?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improving performance of Map.insert Get rid of TruffleBoundary in Map.insert and Map.get
6 participants