Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No need to download a file when comparing the MD5 hash #88

Merged
merged 2 commits into from
Aug 15, 2013

Conversation

geedew
Copy link
Contributor

@geedew geedew commented Aug 15, 2013

Originally the file was needed to compare the MD5 by calculating it. Later, it was found the hash exists on the file's header. Found by @coen-hyde here geedew@fde0f93#commitcomment-3868274

@pifantastic
Copy link
Owner

This is a good catch. Have you pulled the latest master where the Travis build is fixed?

@geedew
Copy link
Contributor Author

geedew commented Aug 15, 2013

Yes I did. Travis failed on testDownload... no idea why. It may be a random thing?

Testing download.jslocalhost.localdomain - - [15/Aug/2013:13:44:43 UTC] "GET /test/a.txt HTTP/1.1" 200 2
- -> /test/a.txt
F.
>> testDownload
>> Error: Expected 1 assertions, 0 ran

@@ -406,7 +406,7 @@ exports.init = function (grunt) {
}).end();
} else {
// verify was truthy, so we need to make sure that this file is actually the file it thinks it is
client.getFile( dest, function(err, res) {
client.headFile( dest, function(err, res) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we call headFile for both cases (!options.verify and options.verify), can we simplify this code a bit?

@geedew
Copy link
Contributor Author

geedew commented Aug 15, 2013

It could definitely be simplified. I typically code for readability and let a minimizer handle the simplifying. Also, right now they are both block scoped inside their own functions. Where if I moved the if to inside one function, they would be merged (read: less separate in modularity/readability) further. I'm all up for simplifications; anything you would suggest?

@pifantastic
Copy link
Owner

Actually, looking at this more, I'm not sure what the verify flag is for. If the local file has been modified, it most certainly doesn't have the same MD5 hash anymore.

@geedew
Copy link
Contributor Author

geedew commented Aug 15, 2013

You might be reading it wrong. Here's how it flows.

  • if !verify, only upload if the file doesn't exist ( no attempt to over-write it)
  • if verify
    • and got a 404, upload it
    • and error or non-200, reject it
    • read local file, create md5 of it
    • if match, skip file
    • compare mtime
    • if local > remote, upload it

If the hashes do not match, then we know they are different, we then have to check mtime to verify that the local one is actually newer before uploading and overwriting. Imagine you were to git checkout an older branch. It's quite possible that you may push that older code, wiping out the newer code that is on S3, erroneously.

@pifantastic
Copy link
Owner

Ha, yeah, I apologize. I read that too quickly. Thanks for breaking it down.

It looks to me like this can be rewritten with something like this:

      client.headFile( dest, function(err, res) {
        var upload;

        // If the file was not found, then we should be able to continue with a normal upload procedure
        if (res && res.statusCode === 404) {
          upload = exports.upload( src, dest, opts);
          // pass through the dfd state
          upload.then( dfd.resolve, dfd.reject );
        } 
        else if (!res || err || res.statusCode !== 200 ) {
          dfd.reject(makeError(MSG_ERR_DOWNLOAD, src, err || res.statusCode));
        } 
        else {
          if (!options.verify) {
            return dfd.resolve(util.format(MSG_SKIP_SUCCESS, src));
          }

          // the file exists so let's check to make sure it's the right file, if not, we'll update it
          // Read the local file so we can get its md5 hash.
          fs.readFile(src, function (err, data) {
            var remoteHash, localHash;

            if (err) {
              dfd.reject(makeError(MSG_ERR_UPLOAD, src, err));
            }
            else {
              // The etag head in the response from s3 has double quotes around
              // it. Strip them out.
              remoteHash = res.headers.etag.replace(/"/g, '');

              // Get an md5 of the local file so we can verify the upload.
              localHash = crypto.createHash('md5').update(data).digest('hex');

              if (remoteHash === localHash) {
                // the file exists and is the same so do nothing with that
                dfd.resolve(util.format(MSG_SKIP_MATCHES, src));
              }
              else {
                fs.stat( src, function(err, stats) {
                  var remoteWhen, localWhen, upload;

                  if (err) {
                    dfd.reject(makeError(MSG_ERR_UPLOAD, src, err));
                  } 
                  else {
                    // which one is newer? if local is newer, we should upload it
                    remoteWhen = new Date(res.headers['last-modified'] || "0"); // earliest date possible if no header is returned
                    localWhen = new Date(stats.mtime || "1"); // make second earliest date possible if mtime isn't set

                    if( localWhen > remoteWhen ) {
                      // default is that local is newer, only upload when it is
                      upload = exports.upload( src, dest, opts);
                      // pass through the dfd state
                      upload.then( dfd.resolve, dfd.reject );
                    } else {
                      dfd.resolve(util.format(MSG_SKIP_OLDER, src));
                    }

                  }
                });
              }
            }
          });
        }
      }).end(); 

@geedew
Copy link
Contributor Author

geedew commented Aug 15, 2013

Looks good, updated!

@geedew
Copy link
Contributor Author

geedew commented Aug 15, 2013

Looks like Travis was happier that time too.

pifantastic added a commit that referenced this pull request Aug 15, 2013
No need to download a file when comparing the MD5 hash
@pifantastic pifantastic merged commit a97ef68 into pifantastic:master Aug 15, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants