Hashmaps #57
And behold, a generic hashmap. It is 10% faster despite using a worse hash function. (Edit: I ran the test again and it is not faster anymore. Edit 2: this needs to be broken down further. Insertions are 17 to 23% faster, but lookups are 30% slower. It is in the link, not in the PR.)
The hashmap of TFLRE is based on the semi-lock-free, thread-safe POCAHash* code in https://raw.githubusercontent.com/BeRo1985/poca/master/src/POCA.pas. In my POCA script engine, the POCAHash code works on new shadow copies in parallel during a resize until the resize is finished. This is background on why the FLRE hashmaps look unoptimized at first sight with regard to delete and resize operations: they originally came from POCA, where they had to work semi-lock-free with multithreaded POCA scripts using temporary shadow copies, and interact with the POCA garbage collector. See POCAHashResize, POCAHashPut, POCAHashPutCache, POCAHashSymbolChainCache and so on. A POCA hash type is like a Lua table type or a JavaScript/ECMAScript object; see https://github.com/BeRo1985/poca/wiki/Syntax
Still, it is not optimal for a single-threaded map. I keep learning new things about hashmaps. If the map stored the hash of each entity, it could be resized faster, because the keys would not have to be rehashed. But then it needs more memory, so perhaps not storing the hash is better. Python uses the same construction, with an array of entities and one array mapping hashes to entity indices, to minimize memory usage.
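To make the layout concrete, here is a minimal Python sketch of the "array of entities plus one index array" construction described above. It is only an illustration: the class name `CompactMap`, the linear probing, and the 2/3 load factor are assumptions of this sketch, and it is neither FLRE's code nor CPython's actual dict implementation. Hashes are deliberately not stored, so a resize has to rehash every key.

```python
class CompactMap:
    """Toy map: a dense entries array plus a sparse array of indices into it."""

    EMPTY = -1  # marks an unused slot in the sparse index array

    def __init__(self, capacity=8):
        # capacity must be a power of two so that "& mask" works as a modulo
        self.index = [self.EMPTY] * capacity  # slot -> position in self.entries
        self.entries = []                     # dense list of (key, value) pairs

    def _find_slot(self, key):
        # Linear probing; the hash is recomputed because entries do not store it.
        mask = len(self.index) - 1
        slot = hash(key) & mask
        while True:
            pos = self.index[slot]
            if pos == self.EMPTY or self.entries[pos][0] == key:
                return slot
            slot = (slot + 1) & mask

    def put(self, key, value):
        slot = self._find_slot(key)
        pos = self.index[slot]
        if pos != self.EMPTY:
            self.entries[pos] = (key, value)  # overwrite an existing key
            return
        self.index[slot] = len(self.entries)
        self.entries.append((key, value))
        if len(self.entries) * 3 >= len(self.index) * 2:
            self._resize()

    def get(self, key, default=None):
        pos = self.index[self._find_slot(key)]
        return default if pos == self.EMPTY else self.entries[pos][1]

    def _resize(self):
        # Only the index array is rebuilt; the entries stay where they are.
        new_index = [self.EMPTY] * (len(self.index) * 2)
        mask = len(new_index) - 1
        for pos, (key, _) in enumerate(self.entries):
            slot = hash(key) & mask  # rehash: the price of not storing hashes
            while new_index[slot] != self.EMPTY:
                slot = (slot + 1) & mask
            new_index[slot] = pos
        self.index = new_index


m = CompactMap()
for i in range(100):
    m.put(f"k{i}", i)
```

The index array holds only small integers, so it can be sparse without wasting much memory, while the entries array stays dense and keeps insertion order.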
Deletion becomes really difficult after removing the EntityToCell array. I could not get it working in the generic case. If the key is a pointer, deleted entries could be marked with nil, but that does not work for string keys, because nil is the empty string. A special string could mark deleted entries, but FPC does not allow inlining functions that do pointer(str) = ..., which made it inefficient. And there would be no special marker key in an int->int hashmap. However, if the map stored the hash in the entity array rather than recomputing it, it could mark deleted entries with a zero or -1 hash. And then one needs hashsets. Is a hashset a hashmap with a void value, or is a hashmap a hashset of pairs? An enumerator for the
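The idea of marking deleted entries through a stored hash can be sketched as follows, again in Python and only as an illustration. The names `TombstoneMap`, `DELETED` and `EMPTY`, the 31-bit hash mask, and the load factor are assumptions of this sketch, not anything taken from FLRE. Real hashes are masked to be non-negative, so -1 can never collide with a live entry, and the scheme works for string keys (including the empty string) and int->int maps alike.

```python
DELETED = -1  # sentinel stored in the hash field of a deleted entry
EMPTY = -1    # unused cell in the slot array


class TombstoneMap:
    def __init__(self, capacity=16):
        self.cells = [EMPTY] * capacity  # slot -> position in self.entries
        self.entries = []                # each entry is [stored_hash, key, value]
        self.live = 0                    # number of entries that are not deleted

    @staticmethod
    def _stored_hash(key):
        return hash(key) & 0x7FFFFFFF    # always >= 0, so never equal to DELETED

    def _lookup(self, key):
        h = self._stored_hash(key)
        mask = len(self.cells) - 1
        slot = h & mask
        while True:
            pos = self.cells[slot]
            if pos == EMPTY:
                return slot, None
            entry = self.entries[pos]
            # A deleted entry has hash -1, so it never matches and probing goes on.
            if entry[0] == h and entry[1] == key:
                return slot, pos
            slot = (slot + 1) & mask

    def put(self, key, value):
        slot, pos = self._lookup(key)
        if pos is not None:
            self.entries[pos][2] = value
            return
        self.cells[slot] = len(self.entries)
        self.entries.append([self._stored_hash(key), key, value])
        self.live += 1
        if len(self.entries) * 2 >= len(self.cells):
            self._rebuild()

    def get(self, key, default=None):
        _, pos = self._lookup(key)
        return default if pos is None else self.entries[pos][2]

    def delete(self, key):
        _, pos = self._lookup(key)
        if pos is None:
            return False
        self.entries[pos][:] = [DELETED, None, None]  # tombstone, cell untouched
        self.live -= 1
        return True

    def _rebuild(self):
        # Drop tombstones and reuse the stored hashes; no key is rehashed.
        self.entries = [e for e in self.entries if e[0] != DELETED]
        capacity = len(self.cells)
        while len(self.entries) * 2 >= capacity:
            capacity *= 2
        self.cells = [EMPTY] * capacity
        mask = capacity - 1
        for pos, entry in enumerate(self.entries):
            slot = entry[0] & mask
            while self.cells[slot] != EMPTY:
                slot = (slot + 1) & mask
            self.cells[slot] = pos


m = TombstoneMap()
m.put(1, "a")
m.put(17, "b")  # collides with key 1 in a 16-cell table
m.delete(1)     # leaves a tombstone that lookups of 17 must probe past
```

The trade-off is the one already mentioned: each entry grows by one hash field, but deletion no longer needs a reserved key value, and resizing can skip rehashing.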
I have closely examined the hashmap code, since I need a hashmap and FLRE's hashmap is faster than most. It is very impressive.
But there is a lot of duplicated code, as well as explicit freeing of things that are already freed by reference counting. This pull request removes both.
There are two things not addressed here.
First, EntityToCellIndex is only used to track deleted entities, so it could probably be removed by changing the detection to empty keys or values. That would reduce the memory use of the maps by roughly 25%.
Second, the same hashmap implementation occurs four times in FLRE. Surely these could be merged into a single generic hashmap?