Add hashing function #225
Comments
Are you using Jsonnet via the API or the command line? If the former, then in the latest release you can register callbacks from the host language (C, Python, etc.), and these are available to Jsonnet code as functions. Such functions must be pure: they must always return the same output for the same input. Wrapping md5 or a similar hash would satisfy that criterion.
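For anyone on the API route, a callback of this kind might look roughly like the sketch below in Python. Assumptions, not confirmed by this thread: the `_jsonnet` Python module is installed, its `evaluate_snippet` accepts a `native_callbacks` mapping of name to `(parameter names, function)`, and the callback is reached from Jsonnet through `std.native`. The names `md5_hex`, `NATIVE_CALLBACKS` and `evaluate` are illustrative only, and hashing the UTF-8 bytes is just one possible choice.

```python
import hashlib


def md5_hex(s):
    """Pure function: the same input string always yields the same digest."""
    # Assumption: the string is hashed as its UTF-8 byte encoding.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


# Callback table in the shape the Python bindings are assumed to expect:
# name -> (tuple of parameter names, callable).
NATIVE_CALLBACKS = {"md5": (("s",), md5_hex)}

# Jsonnet side: look the callback up by name, then call it.
SNIPPET = 'std.native("md5")("foo")'


def evaluate(snippet):
    """Evaluate a snippet with the callback registered, if _jsonnet is installed."""
    try:
        import _jsonnet  # optional third-party module, not in the standard library
    except ImportError:
        return None
    return _jsonnet.evaluate_snippet(
        "snippet", snippet, native_callbacks=NATIVE_CALLBACKS
    )


if __name__ == "__main__":
    print(evaluate(SNIPPET))
```

Because the callback depends only on its argument, it meets the purity requirement described above.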
Alright, sweet! Thank you, that's very helpful.
I'm using jsonnet from the command line and would like to see common hash functions available in the std lib.
Other than md5, what would you use?
SHA1, for example, doesn't seem that useful in a config language. In principle, however, anything can be added as a native function if its absence is blocking adoption in a given domain.
MD5 would be enough for my purposes. My immediate use case is hashing the content of a Kubernetes ConfigMap resource and using the hash as a metadata label on the pod that consumes it. This would auto-redeploy the pod when the ConfigMap changes, something Kubernetes doesn't do on its own today.
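The idea can be sketched outside Jsonnet to show what the hash buys you. This is a minimal Python illustration, assuming the ConfigMap data is a plain string-to-string mapping; the label key `config-hash` and the helper names are hypothetical, not anything Kubernetes or Jsonnet defines.

```python
import hashlib
import json


def config_hash(data):
    """Digest of a ConfigMap's data that does not depend on key order."""
    # Serialize canonically so equal content always produces equal bytes.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def pod_metadata(name, configmap_data):
    """Pod metadata carrying the content hash as a label."""
    # "config-hash" is a hypothetical label key chosen for illustration.
    return {"name": name, "labels": {"config-hash": config_hash(configmap_data)}}


if __name__ == "__main__":
    before = pod_metadata("web", {"app.conf": "workers=2"})
    after = pod_metadata("web", {"app.conf": "workers=4"})
    # The label changes whenever the content changes, so the pod spec changes too.
    print(before["labels"] != after["labels"])
```

Any change to the ConfigMap content changes the label, which in turn changes the pod definition and triggers the redeploy.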
One can imagine use cases for generating unique names for things based on their content. MD5 would also suffice for that. Here's a question though: MD5 on sequences of bytes is straightforward, but on strings there is a question of representation. Should it be MD5 on the Unicode codepoints (32 bits each) or on the UTF-8 encoding? You get a different hash for the two cases. For example, Python rejects unicode strings that don't fit into ASCII.
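The representation question is easy to see in Python 3, where `hashlib` takes bytes only and so forces an explicit encoding. The sketch below is an illustration, not the example originally posted in this thread; it uses `utf-32-le` as a stand-in for "32 bits per codepoint".

```python
import hashlib

text = "h\u00e9llo"  # contains one non-ASCII codepoint (U+00E9)

# Python 3's hashlib accepts bytes only; passing a str raises TypeError.
try:
    hashlib.md5(text)
    str_accepted = True
except TypeError:
    str_accepted = False

# Same string, two byte representations, two different digests.
utf8_digest = hashlib.md5(text.encode("utf-8")).hexdigest()
utf32_digest = hashlib.md5(text.encode("utf-32-le")).hexdigest()

# For pure ASCII input, UTF-8 and ASCII bytes coincide, so the digests agree.
ascii_text = "hello"
ascii_same = (
    hashlib.md5(ascii_text.encode("utf-8")).digest()
    == hashlib.md5(ascii_text.encode("ascii")).digest()
)

if __name__ == "__main__":
    print(str_accepted, utf8_digest != utf32_digest, ascii_same)
```

Whichever representation is chosen has to be fixed as part of the function's definition, otherwise two implementations will disagree on non-ASCII input.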
Here's an implementation of md5 in Jsonnet that only works for ASCII strings. Two things were annoying while writing it: (1) there are no hex literals, and (2) there are no unsigned 32-bit integer operators. The first could easily be fixed, but the second wouldn't be sensible to add, so I had to work around it, for example by explicitly overflowing values at 2^32. It is quite slow: computing the md5 of all 10 MB of /usr/share/dict/words (after replacing non-unicode codepoints with 0) took 1m45s. However, it does copy the whole input several times in the pre-processing stage. A native implementation is pretty much instant. So there we go.
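The 2^32 workaround can be sketched as follows. This is a Python illustration of the general technique, not the Jsonnet code from this comment: unsigned 32-bit addition and rotation are emulated using only multiplication, integer division and modulo. The helper names `add32` and `rotl32` are made up for the example.

```python
MOD32 = 2 ** 32  # values are kept in the range [0, 2^32)


def add32(a, b):
    """Unsigned 32-bit addition: wrap around explicitly at 2^32."""
    return (a + b) % MOD32


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits, for 0 < n < 32."""
    # Low bits move up and are truncated to 32 bits ...
    shifted = (x * 2 ** n) % MOD32
    # ... while the top n bits wrap around to the bottom.
    wrapped = x // 2 ** (32 - n)
    return shifted + wrapped


if __name__ == "__main__":
    print(add32(MOD32 - 1, 1))   # wraps to 0
    print(rotl32(2 ** 31, 1))    # top bit rotates to the bottom, giving 1
```

MD5's round function is built from exactly these two operations plus bitwise logic, which is why a language without fixed-width integers needs the explicit wrap at every step.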
UTF-8 is fine for me, but worth seeing if anyone else has an opinion.
Hi @sparkprime, a hash function looks very useful actually. Do you see value in adding this to the stdlib?
Anyone still following this thread, note that a native std.md5("foo") now exists and is much faster than this version.
Would love to have a hash function available to me through the std library. Something like a UUID, md5, sha1, etc...
Something like:
I don't think I want to write this as a function in Jsonnet, as it's been done elsewhere a number of times and I'm concerned about performance issues.