
Provide functionality for calculating MD5 hashes of files #48

Open
janko opened this issue Apr 25, 2018 · 3 comments

janko commented Apr 25, 2018

First of all, thanks a lot for creating this very useful library! 🙏

I recently needed to calculate an MD5 hash of a File object, and while I saw the section in the README showing how to do that, I really didn't like how much custom code it involves.

I was wondering: could this functionality maybe be part of the library? For comparison, Ruby has a Digest::MD5 class which supports calculating a hash from a single string, incremental hashing in chunks, and calculating a hash from a file on disk:

Digest::MD5.hexdigest("string")
# or
md5 = Digest::MD5.new
md5.update("chunk1")
md5.update("chunk2")
md5.hexdigest
# or
Digest::MD5.file("/path/to/file").hexdigest

It took me quite a while to find a JavaScript library which simplifies reading a File object in chunks – chunked-file-reader – and it appears to work correctly (I get the same MD5 hash as with the snippet in the README here). So I came up with the following function:

function fileMD5 (file) {
  return new Promise(function (resolve, reject) {
    var spark  = new SparkMD5.ArrayBuffer(),
        reader = new ChunkedFileReader();

    // Feed each chunk of the file into the incremental hasher.
    reader.subscribe('chunk', function (e) {
      spark.append(e.chunk);
    });

    // Once the whole file has been read, finalize the hash.
    reader.subscribe('end', function (e) {
      var rawHash    = spark.end(true); // raw binary digest
      var base64Hash = btoa(rawHash);   // base64-encode it

      resolve(base64Hash);
    });

    reader.readChunks(file);
  });
}

Since it took me a while to come up with this solution, I was wondering if it made sense to have that built into spark-md5.


janko commented Apr 25, 2018

If not, I think it would be nice to show this example in the README, so that people are more willing to copy-paste it into their projects.


satazor commented Apr 26, 2018

Hello @janko-m. This library was primarily made to be used in browser-like environments. While it works in Node, using the native crypto module will be much faster.

Having a method to read files and calculate their hash would have to account for how files are actually read in each environment: browser-like or Node. Because of that, I don't think it makes much sense to have it built in.

I'm willing to improve the README to make it clear how to calculate the hash of a file in both browser and node environments. Could you make a PR to add examples of both environments? The current example is for a browser environment.

Does that make sense?


janko commented Apr 27, 2018

Hey @satazor, thanks for the quick answer.

Having a method to read files and calculate their hash would have to account for how files are actually read in each environment: browser-like or Node. Because of that, I don't think it makes much sense to have it built in.

I was under the impression that this library was already considered "browser-only", because, as you said, for Node there is already the crypto module (and hasha, which uses it). So I'm not sure I fully understand: if functionality for calculating a hash from a JavaScript File object were added, why would that mean it also has to support Node? If that's really the case, then I agree it wouldn't make much sense.

I'm willing to improve the README to make it clear how to calculate the hash of a file in both browser and node environments. Could you make a PR to add examples of both environments? The current example is for a browser environment.

My intention was only to simplify the browser example that's already there, since I only have experience using spark-md5 in the browser. Great, I'll send the PR then 👍

janko added a commit to janko/js-spark-md5 that referenced this issue Apr 27, 2018
The chunked-file-reader comes with the functionality of reading a file
in chunks, so we can simplify the file example a lot by offloading this
logic to that package. I think this will make it much more approachable
for people wanting to reuse that code.

The chunked-file-reader package uses `readAsArrayBuffer()`, and we cannot
use it for tests that use `readAsBinaryString()`.

Also, chunked-file-reader always uses `File.prototype.slice`, but I think that's OK now, since `blob.mozSlice()` is only needed for Firefox 12 and earlier. I don't know in which version Safari started supporting `File.prototype.slice`, but I tested that it works on Safari 11, which is the current latest version.

Closes satazor#48