Hi,
In HashedPartitioner you use Python's built-in hash function:
idx = hash(key) % size #line 12
Python's built-in hash is not consistent across processes; its result depends on the running Python environment.
For example, with hash randomization enabled (the default since Python 3.3), hash('123') can map to a different partition each time the Python process is restarted.
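To make this concrete, here is a minimal sketch that evaluates hash('123') in fresh interpreter processes. It assumes Python 3 and uses the PYTHONHASHSEED environment variable to pin the seed; the helper name hash_in_fresh_process and the partition count of 8 are made up for illustration and are not part of the library.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(key: str, seed: str) -> int:
    """Return hash(key) as computed by a newly started Python interpreter.

    `seed` is passed through PYTHONHASHSEED: either "random" or a decimal
    integer string. This helper is hypothetical, for demonstration only.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", f"print(hash({key!r}))"],
        env=env,
    )
    return int(out)


if __name__ == "__main__":
    size = 8  # assumed partition count
    for seed in ("1", "2", "random"):
        h = hash_in_fresh_process("123", seed)
        # Same expression as in HashedPartitioner: idx = hash(key) % size
        print(f"PYTHONHASHSEED={seed}: partition {h % size}")
```

With a fixed seed the result is reproducible, but any two processes started with different (or random) seeds can disagree on the partition for the same key.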
Would it be possible to use a different hash function instead? I'd recommend MurmurHash (e.g. mmh3).
Thanks
I think we should attempt to partition records consistently with the mainline Java client. The code there is fairly simple: abs(murmur2(key)) % numPartitions.
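As a rough illustration of that formula, here is a pure-Python sketch of 32-bit MurmurHash2 followed by the modulo step. The seed value 0x9747b28c and the little-endian byte handling are my assumptions about what the Java client does and should be checked against its source before relying on this; the function names murmur2 and partition_for are placeholders, not existing library API.

```python
def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 of `data`, returned as a signed 32-bit value
    (mimicking a Java int). Seed is an assumption, see the note above."""
    seed = 0x9747B28C  # assumed to match the Java client
    m = 0x5BD1E995
    r = 24
    mask = 0xFFFFFFFF

    length = len(data)
    h = (seed ^ length) & mask

    # Mix the input four bytes at a time, read as little-endian words.
    body_end = length - (length % 4)
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Fold in the remaining 1 to 3 bytes, if any.
    tail = data[body_end:]
    if len(tail) >= 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15

    # Reinterpret the unsigned 32-bit result as signed.
    return h - (1 << 32) if h & 0x80000000 else h


def partition_for(key: bytes, num_partitions: int) -> int:
    """abs(murmur2(key)) % numPartitions, as described above.

    Python's abs() never overflows, so the result is always in
    [0, num_partitions); how the Java side treats the most negative
    int is not modelled here.
    """
    return abs(murmur2(key)) % num_partitions


if __name__ == "__main__":
    print(partition_for(b"123", 8))
```

Because the function depends only on the key bytes, every process computes the same partition for the same key, independent of interpreter hash randomization.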
Changing the key partitioning function has implications for anyone running a KeyedProducer and relying on parallel consumers that are split by partitioned key. I think this change requires at least a minor version bump when released (0.10).