Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add watch option for cache validation #13

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

viankakrisna
Copy link
Contributor

@viankakrisna viankakrisna commented Jul 4, 2017

1st pass of providing watch mode.
Right now this PR only hash the files and folders in watch settings to calculate the value of isCacheValid

To test:

  1. cd to examples/watch
  2. run npm start
  3. ctrl + c
  4. run npm start again to verify that cache is working
  5. modify package.json
  6. npm start -> cache is invalid, DLL is rebuilt

TODOS:
- [ ] Watch for the changes in these files and folder and rebuild without restarting devServer holding this one for future development

@viankakrisna viankakrisna changed the title [WIP] add watch option for cache validation add watch option for cache validation Jul 5, 2017
@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 5, 2017

I think running another watcher to watch the DLL watch list files on the fly is a bit overkill. So I'll hold that for now.

It's already working to validate the DLL cache every time there are changes in the contents of watch list on the next webpack start. works for folders too, so changing something in node_modules will trigger DLL recompilation on the next start.

@asfktz
Copy link
Owner

asfktz commented Jul 6, 2017

@viankakrisna this is awesome!
I plan to dive in more deeply tomorrow morning, to see what kind of sorcery you put into it (;

Running the watch examples made me very happy, it worked beautifully.

I noticed now there's no cleanup before each build, is it intentionally?
image

I think running another watcher to watch the DLL watch list files on the fly is a bit overkill. So I'll hold that for now.

That will be a great addition to be able to make changes to the files without re-running webpack.
I think we better get the current functionally shipped first, and then we will come back to it.

Many thanks 🙏

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 7, 2017

I noticed now there's no cleanup before each build, is it intentionally?

It unintentional 😅 I've updated the code so the build output will be in the separate folders based on the hash and get cleaned up accordingly.

That will be a great addition to be able to make changes to the files without re-running webpack.
I think we better get the current functionally shipped first, and then we will come back to it.

Yes, there's a whole lot about webpack hooks that I don't understand yet. I think this is more on your area, like how to retrigger compilation based on external file changes? Is adding more to compilationDependencies is the right way to do it?

Many thanks 🙏

Thanks to you I can reimplement this idea on a webpack plugin 🙏 😄

@asfktz
Copy link
Owner

asfktz commented Jul 7, 2017

Hi, @viankakrisna,

I would like to fix a few bug about multiple instances and I think It might be wiser to work on your branch instead of the master.

I saw that you already handle it using a counter and also store the environment.
That's nice. Do you think we can trust process.env.NODE_ENV?
About the counter. I fear it will bite us later. I would like to give my take on it.

I like what you did with the hash!

Do you feel that it's ready from your side?

Btw, how do you debug your code?
I use node --inspect to debug it with Chome's devtools
I think I'd be lost without it.

Do you know how to set it up? If not, tell me, I would gladly show you how.

@viankakrisna
Copy link
Contributor Author

Yeah, sorry the diff is a mess, I can't live without prettier anymore.... So i think all is well from my side. Right now i'm just trying to find a way to improve hashing perf, but haven't got anything substantial to add. Feel free to merge this to other branch on your repo / push to my branch.

Need to go to bed now :)

@asfktz
Copy link
Owner

asfktz commented Jul 7, 2017

I really wondered if you ever sleep 😉

Good night, my friend.

@asfktz
Copy link
Owner

asfktz commented Jul 8, 2017

Just wanted to let you know that I'm really happy with the way you handled the instances problem!

I fixed the cleanup and made a few style changes, but other than that, it worked like a charm.
I'm really happy that you decided to change the file structure of the .cache dir
It makes much more sense that way.

I saw what you mean about the perf problem of getContents, I comment it out for now.
I saw that you'd played around with making an async version of it, but you probably forced to keep it sync because you needed the "hash" to be available at the plugin's initialization stage.
I can make changes to allow it if you want.

Also, you can take advantage of bluebird's non standard api, it's very elegant and pleasant to use.

Take a lot at the cleanup function for examples:

The sync version of fs returns an array, which you can iterate over:

const cleanup = settings =>
  fs
    .readdirSync(cacheDir)
    .filter(fileName => fileName.includes(settings.nodeEnv))
    .filter(fileName => fileName.includes(settings.id))
    .forEach(p => del(path.join(cacheDir, p)));

In src/utils/fs I return bluebird.promisifyAll(fs).
That mean that now, every method of fs now also has an Async verstion too, which return a promise instead (for ex: fs.readfileAsync)

The beauty of bluebird's non-standard api is that now you can iterate over the promise's values as if it were just a regular array:

import fs from './utils/fs';

...

const cleanup = settings => () => {
  return fs
    .readdirAsync(cacheDir)
    .filter(fileName => fileName.includes(settings.nodeEnv))
    .filter(fileName => fileName.includes(settings.id))
    .each(p => del(path.join(cacheDir, p)));
};

This is equivalent to:

const cleanup = settings => () => {
  return fs
    .readdirAsync(cacheDir)
    .then(arr => {
      return arr.filter(fileName => fileName.includes(settings.nodeEnv));
    })
    .then(arr => {
      return arr.filter(fileName => fileName.includes(settings.id));
    })
    .then(arr => {
      return arr.each(p => del(path.join(cacheDir, p)));
    });
};

This is the branch worked on: https://github.com/asfktz/autodll-webpack-plugin/tree/pr/13
It a clone of yours with my changes.
I believe it's ready to be published, but I want to take another look later tonight.

Thank you for the awesome work you made (:

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 8, 2017

Nice! I like what you did there. And thanks for sharing the beauty of bluebird :)

The problem with getContents is that you need to make it synchronous else you get EMFILE: too many open files error. Here's an article describing the problem that I get https://medium.com/@adamhooper/node-synchronous-code-runs-faster-than-asynchronous-code-b0553d5cf54e

Prefer synchronous code, when it’s an option. It’s faster and simpler. For instance, reading lots of small files:

It's unfortunate if we ship without the ability to watch the changes of the contents of directories, maybe we can just enable it via a flag? getContents is what enables you to change files in node_modules and restart with a new hash.

@viankakrisna
Copy link
Contributor Author

Of course, there's a penalty in enabling folder hashing. But it makes the cache invalidation more accurate.

Btw, it's only enabled only when we define settings.watch so I think it's safe to uncomment those lines. Maybe rename it to watchPaths?

@asfktz
Copy link
Owner

asfktz commented Jul 8, 2017

I absolutely believe this is our next milestone.

But for now, I needed to ship a release to fix two critical bugs:

  • The one with the multiple instances
  • The default context was wrong

I help some big projects to implement the plugin and these two are show stoppers.

Now that this is out of the way, we can focus on detecting file changes.

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 8, 2017

Ok then, ship it! 👍
I'll keep up my fork to the latest changes.

@viankakrisna
Copy link
Contributor Author

I figure maybe instead of hashing the contents we could just calculate the hash via name + last modified time? Interestingly, reading last modified date is slower on my machine rather than reading the content itself. Will do proper reporting when I have time.

Btw, CI failing on node 4 because while testing, the hash on the first run is not written, and on node 8 because of unhandledRejection.

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 9, 2017

Note: It's nice if we can run tests locally for multiple versions of node. 😄

@asfktz
Copy link
Owner

asfktz commented Jul 10, 2017

Interestingly, reading last modified date is slower on my machine rather than reading the content itself

Really? that's interesting...

Maybe we can also find a way to reduce the number of files needed to be watched

For examples, if we setup the plugin like so:

new AutoDllPlugin({
  watch: true,
  entry: {
    vendor: [
      'react',
      'react-dom',
      'moment'
    ]
  }
})

When can only check for changes in react, react-dom and moment.

And if we want to add another file to the watch list we can do something like:

new AutoDllPlugin({
  watch: {
    files: [
      './package.json'
    ]
  },
  entry: {
    vendor: [
      'react',
      'react-dom',
      'moment'
    ]
  }
})

(Not sure about the format, but that's the main idea)

What do you think?

@viankakrisna
Copy link
Contributor Author

The problem with that is if somehow, moment has external dependency, and it is hoisted to the top level folder in node_modules, changes to that folder is not detected.

We already let the user to specify which folder or file they want to watch by giving them the watch option as array, if they don't want to watch node_modules they can remove that from the array. I think that's a fair tradeoff and simpler :) if in the future we want to add more setting to watch, we can detect if it is an object.

@viankakrisna
Copy link
Contributor Author

I think it's great if we want to give the option on how the hash is created though, eg. switch between reading content or last modified date.

new AutoDllPlugin({
  watch: {
    paths: [
      path.resolve('./package.json'),
      path.resolve('./node_modules')
    ],
    recursive: Boolean, // if we want to watch the paths recursively,
    hashSource: 'contents' // or 'modified'
  },
  entry: {
    vendor: [
      'react',
      'react-dom',
      'moment'
    ]
  }
})

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 10, 2017

Either way, I'm fine as long as the user can control the list of paths (file and folder) that we should watch for changes :)

@viankakrisna
Copy link
Contributor Author

Maybe the question is how to find a reasonable defaults for {watch: true}.
The choice is between

[
    path.resolve('package.json'), 
    path.resolve('yarn.lock'),
    path.resolve('package-lock.json'),
    path.resolve('node_modules')
]

or

[...entryFilesPath]

I'll go with the former because it's more magic and solve most edge cases

@asfktz
Copy link
Owner

asfktz commented Jul 10, 2017

The problem with that is if somehow, moment has external dependency, and it is hoisted to the top level folder in node_modules, changes to that folder is not detected.

How about reading the module's dependencies in its package.json?

react for example, depends on:

"dependencies": {
    "create-react-class": "^15.6.0",
    "fbjs": "^0.8.9",
    "loose-envify": "^1.1.0",
    "object-assign": "^4.1.0",
    "prop-types": "^15.5.10"
  },

Just read those and you're done.

@viankakrisna
Copy link
Contributor Author

Interesting, maybe we can use require.resolve for it?

@viankakrisna
Copy link
Contributor Author

but then, what if we change the file required from the main file?

@asfktz
Copy link
Owner

asfktz commented Jul 10, 2017

but then, what if we change the file required from the main file?

Not sure I understand, can you show me an example?

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 10, 2017

ok, so imagine this

require.resolve('package-a') resolves to node_modules/package-a/index.js

the content of node_modules/package-a/index.js:

const b = require('./someOtherModule.js')
module.exports = someFunction(b)

And then the user modify this ./someOtherModule.js. We'll miss updating the hash because we only watch node_modules/package-a/index.js, not node_modules/package-a/someOtherModule.js which affect the behavior of node_modules/package-a/index.js

@asfktz
Copy link
Owner

asfktz commented Jul 10, 2017

Hm... but npm modules are precompiled, isn't it already resolved at this stage?

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 10, 2017

hmm, I don't know what you mean by precompiled, but isn't importing like this means that it's a different file from the main entry point?
import Panel from 'react-bootstrap/lib/Panel

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 10, 2017

We could just watch the dependency folder from the entry files and its dependencies by parsing the package.json of entry files. How deep we want to go here? Do dependencies of the dependencies needs to be watched too? I think in the end we will watch most of the node_modules, so it's simpler to just watch it recursively in the first place....

@viankakrisna
Copy link
Contributor Author

btw, played around with adding a setting for different implementation of reading the hash source here

@asfktz
Copy link
Owner

asfktz commented Jul 10, 2017

Hm... but npm modules are precompiled, isn't it already resolved at this stage?

Sorry. My bad.

How deep we want to go here? Do dependencies of the dependencies needs to be watched too?

It depends on what are the use cases going to be.

For example:

If you just want to add a console.log inside react to check something,
you might not care so much about whether or not its dependencies are being watched.

If you're working on a PR on the other hand, you sure do, but then you probably gonna use a symlink instead anyway.

My original intention was that when you want to work on a module from the DLL,
you just comment it out from the entry list, do you things and put it back again.

So in my option:

  • for each entry module check:
    • is it a symlink?
      • remove it from the entry list (don't include it in the DLL in the first place)
    • is it a node module?
      • check the module's files (but not its dependencies)

I believe that should be enough for the initial release of the feature.

I someone will need more than that, he/she will open an Issue for it.

What do you think?

With that said, if you find a way to watch all the possible dependencies without affecting performance - that will be a whole different story.

@viankakrisna
Copy link
Contributor Author

With that said, if you find a way to watch all the possible dependencies without affecting performance - that will be a whole different story.

I think it's a trade off, and the user that configuring it should know that there will be a tradeoff for adding node_modules to the watch option. By default, we don't give them anything if they don't specify an array in watch option.

And rather than watch: true magic, I'd rather have the user configure what exactly they want to watch by watch: [...arrayOfAbsolutePath]

@viankakrisna
Copy link
Contributor Author

viankakrisna commented Jul 10, 2017

It could be ['./node_modules/react', './node_modules/react-bootstrap'], and we can still detect the changes for the files on these folders on the next start

run npm install before running examples on test

re-add watch examples
@viankakrisna viankakrisna force-pushed the watch branch 2 times, most recently from 7f4798f to d9820b5 Compare July 11, 2017 23:37
tweak test names
remove redundant io

return on catch
@viankakrisna
Copy link
Contributor Author

played around with this idea on this CRA PR facebook/create-react-app#2786

And the performance for mtime is consistently faster than that content (of course). 3s vs 9s for hashing CRA's node_modules.

@asfktz
Copy link
Owner

asfktz commented Jul 15, 2017

That's nice, @viankakrisna.

You're really doing a great job at improving the performance, I admire that.

But Ade, additional 3 seconds for a plugin that aim to improve performance is still quite a lot, even 1s is, especially if this is gonna be the defaults for create-react-app.

How about instead, provide the create-react-app user with a simple way to exclude that module from the DLL as long as he's making changes to it (from his package.json for example).

What do you think?

@viankakrisna
Copy link
Contributor Author

yeah but 3 second delay is only happen only if the user setup to watch node_modules. I think the real win from this pr is ability to watch changes to package.json, yarn.lock, and package-lock.json for cache invalidation (this surely wont take 3 seconds).

Watching node_modules is entirely for the user to make decision.

As for CRA, there are some discussion of placing src/vendor.js for splitting vendor chunks here So i think it can be applied for this plugin too. eg instead of listing down the deps in array we could accept a path to a file and build the dll from that instead. Watch here

Note: we should add a test that adding this plugin won't make the take longer than without it. Just to be sure :)

@viankakrisna
Copy link
Contributor Author

btw nice work on the docs! 👍

@viankakrisna
Copy link
Contributor Author

Also, i think the point of this PR is lost because it can be implemented from outside of this plugin. As long as we stringify and hash the settings object.
https://github.com/viankakrisna/create-react-app/blob/b21f775aeeadee8628fde79758eff0bb12041b45/packages/react-scripts/config/webpack.config.prod.js#L300-L305

@asfktz
Copy link
Owner

asfktz commented Jul 15, 2017

I think the real win from this pr is ability to watch changes to package.json, yarn.lock, and package-lock.json for cache invalidation

That already exists. This is why the .cache directory is inside node_modules.
NPM & Yarn delete that cache dir every time you use one of their commands (install, remove, update, etc)

That will invalidate the cache whether or not you used the --save flag to update package.json.
Which makes it more reliable, especially for NPM < 5.


I can add a way to tap into the plugin's cache invalidation, something like:

new AutoDllPlugin({
  ...

  shouldInvalidate (next) {
    // whether or not the cache should invalidate
    next(true);
  }
})

btw nice work on the docs! 👍

Thanks! much more works to be done

@danez
Copy link
Contributor

danez commented Jul 20, 2017

NPM & Yarn delete that cache dir every time you use one of their commands (install, remove, update, etc)

Where do you have this information from? I have never heard of this, and can also not reproduce this behaviour.
I currently try to investigate how this plugin determinds if node_modules are changed, but it seems it does not, which is the reason why I ship outdated dependencies, although package.json and shinkwrap changed. I think that the statement in the README

here are two conditions for triggering a new build on the next run:

Running npm install / remove / update package-name (or Yarn equivalent).
...

is not correct.

@viankakrisna
Copy link
Contributor Author

Reopening for visibility of more discussion 👍

@viankakrisna viankakrisna reopened this Jul 20, 2017
@asfktz
Copy link
Owner

asfktz commented Jul 20, 2017

I currently try to investigate how this plugin determinds if node_modules are changed, but it seems it does not, which is the reason why I ship outdated dependencies

@danez I'm sorry to hear about it. let's see what we can do to prevent it from happening again.

I use find-cache-dir to create the path for the cache directory.

sindresorhus/find-cache-dir#1 (comment)

Another reason for node_modules:

This means users will clear the cache themselves if they trash node_modules and reinstall (even if they don't know the cache is there). If we convince enough modules to adapt that pattern, it also makes trash node_modules/.cache a standard way to clear caches
avajs/ava#352 (comment)

I realized that I made a mistake.
The .cache directory does get deleted by yarn add/remove/upgrade package-name
But that's not true for NPM.

I apologize for the confusion. I should have double check that.

I will add package.json to the default cache invalidation.

Also, I'll add a new option in the config to allow the user to tap into the invalidation step:

new AutoDllPlugin({

  // always invalidate
  shouldInvalidate: true,

  invalidate: () => {
    // force invalidation for specific environment
    return process.env.NODE_ENV === 'production';
  }

  invalidate: (state) => {
    // read the current state

    state.rebuild // Boolean. Is it a regular build or a rebuild triggered by devServer

    // only build on first run, but not for rebuilds
    return !state.rebuild
  }

  invalidate: () => {
    // it also except a promise
    return new Promise((resolve, reject) => {
      fs.readFile('...', (err) => {
        if (err) {
          return reject(err);
        }

        resolve(true / false);
      })
    })
  }

  invalidate: (state, config, hash) => {
    // build your own hash
    // For example, below is the general idea for the new default.

    // you can gradually build your hash by calling hash.add() as many times you want
    // here, we add the config passed into the plugin.
    // hash is immutable.
    hash = hash.add(config);

    return new Promise(() => {
      var pkgPath = path.resolve(config.context, './package.json');
      
      fs.readFile(pkgPath, (err, content) => {
        if (err) throw err;

        // read package.json's content and add that too.
        hash = hash.add(content);
        
        // when you are done, return the hash.
        // when a hash returned instead of a boolean,
        // the plugin will compare it to previous one (if exists)
        // and then it stores the hash in the cache,
        // so it can be compared against the next one.
        resolve(hash);
      });
    });
  }
})

@danez
Copy link
Contributor

danez commented Jul 21, 2017

No need to apologize. What I did now is adding "postinstall": "rimraf ./node_modules/.cache" which works in our project, as we only call install if either package.json or npm-shrinkwrap.json is changed.

I like the idea to be able to modify the hash. I could then simply add the two files mentioned before to the hash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants