Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help to explain behavior of Java string hash code in comparison with e.G. SpookyV2 32 hash code #302

Open
korvin2000 opened this issue Oct 11, 2024 · 0 comments

Comments

@korvin2000
Copy link

korvin2000 commented Oct 11, 2024

It's not a real issue , but a question or strange behaviour I can't explain. 🔢 Sorry ;-/

I have used ported SpookyV2 .3 hash32 bit version library to java and can confirm the results in your tests:
for FooXXXXBar, FooBarXXXX, XXXXFooBar with 14776336 keys:
I have about 1x number of collisions as expected,
so roughly around 25389 +/- (so about (0.99x - 1.00x) as in your tests) .
But If I am doing the same test using standard and very very simple java hash code for strings
I have for all 3 cases 0 collisions. Simply no collisions at all!!!
and java hash code calculated in very simple way:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

int hashCode(const std::string& value) {
    int h = 0;
    if (h == 0 && !value.empty()) {
        for (char c : value) {
            h = 31 * h + c;
        }
    }
    return h;
}
int hashCode(const char *value) {
    int h = 0;
    int length = strlen(value);
    if (h == 0 && length > 0) {
        for (int i = 0; i < length; i++) {
            h = 31 * h + value[i];
        }
    }
    return h;
}

for example for
"FooBar0001" is "6FA1E30E" as hex
why such very simple implementation for a string value beats any over complex implementation in terms of collisions?
Many years ago I tried to find an hash code algorithm that would beat this simple implementation in java for text strings in term of collisions and tested about > 20 hash codes implementations, but I didn't find any better alternative ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant