Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the travese if glob is filepath. #1385

Closed
wants to merge 1 commit into from

Conversation

SparshithNR
Copy link
Member

Check if the glob is a file path if so do not traverse folder for that glob.

Check if the glob is file path, if so do not traverse folder for that glob.
@SparshithNR
Copy link
Member Author

Please review @Krinkle

filteredGlobs.push( `${glob}/**/*.js` );
} else if ( stat && stat.isFile() ) {
files.push( glob );
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is return false meant to do? I suspect it may be intended for jQuery.each which uses it to break the for-loop, however Array#forEach does not do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. return false doesn't actually mean anything for forEach.

@Krinkle
Copy link
Member

Krinkle commented Apr 13, 2019

@SparshithNR Are you experiencing a performance issue without this change?

That is, is there a problem with deep recursion (which seems like a bug we can fix in findFiles), or is it just between checking 1 file and opening 1 directory. If the last case, I think maybe it is worth keeping the code simpler for a small difference in CLI response.

@Krinkle
Copy link
Member

Krinkle commented Apr 13, 2019

Review the test failures as well, it appears the current patch version may be causing a regression.

Let me know if you have any questions :)

@SparshithNR
Copy link
Member Author

@SparshithNR Are you experiencing a performance issue without this change?

That is, is there a problem with deep recursion (which seems like a bug we can fix in findFiles), or is it just between checking 1 file and opening 1 directory. If the last case, I think maybe it is worth keeping the code simpler for a small difference in CLI response.

  1. This change will avoid needlessly traverse for a file when we know the exact path. There is a good amount of performance benefit by doing this.
  2. when we run qunit tests/*-test.js the process.args will have all those files which match the tests/*-test.js format. We need not find any files by minmatch.
    Ex:
// print process.argv
process.argv.forEach(function (val, index, array) {
  console.log(index + ': ' + val);
});

and if I run this I get, which will avoid traversing tress for all these many files as we get all these files readly available which other wise we min-match with entire tree structure.

node test.js tests/*.js         
0: /Users/snrai/.nvm/versions/node/v8.11.3/bin/node
1: /Users/snrai/Projects/testGround/test.js
2: tests/eventEmitter.js
3: tests/test.js
4: tests/test_2.js
5: tests/test_3.js

@Krinkle
Copy link
Member

Krinkle commented Jun 27, 2020

@SparshithNR I'm sorry but I am not yet understanding the problem.

If the patterns are plain file names like tests/eventEmitter.js, these are given to findFiles() and start at the current directory, and narrows its subdirectores to tests/ (fast, no recursion), and then narrow its files to eventEmitter.js (again, no recursion).

Is that your understanding is as well, or did you find a bug? Perhaps it is traversing unrelated directories or doing redundant recursion. If so, then that is a bug I would like to fix instead.

If not, can you quantify what amount of performance would be gained? I was not able to notice any significant difference both with a small list and with a large list of files. I generally prefer not to add optimisation complexity in code paths unless they are "hot" code paths. That way, the overall code remains simple and intiutive to debug and understand, allowing bigger optimisations to be thought of and implemented.

@rwjblue
Copy link
Contributor

rwjblue commented Jun 29, 2020

FWIW, from my perspective there is definitely a problem to be solved in this space. The current globbing logic is extremely slow in folders with many files even if you specify a single file.

@rwjblue
Copy link
Contributor

rwjblue commented Jun 29, 2020

Steps to repro (in a repo I happened to have worked in not too long ago that I noticed was slow):

git clone [email protected]:tildeio/broccoli-typescript-compiler.git
cd broccoli-typescript-compiler
git submodule update --init
yarn install
node ./node_modules/.bin/qunit dist/tests/path-utils-test.js

The test is super simple (import a function, invoke it, add one or two assertions), but that command takes ~ 45s (on my machine). Nearly all of the time is taken up in utils.getFilesFromArgs:

qunit/src/cli/utils.js

Lines 56 to 83 in 0a0b7b8

function getFilesFromArgs( args ) {
let globs = args.slice();
// Default to files in the test directory
if ( !globs.length ) {
globs.push( "test/**/*.js" );
}
// For each of the potential globs, we check if it is a directory path and
// update it so that it matches the JS files in that directory.
globs = globs.map( glob => {
const stat = existsStat( glob );
if ( stat && stat.isDirectory() ) {
return `${glob}/**/*.js`;
} else {
return glob;
}
} );
const files = findFiles( process.cwd(), { match: globs } );
if ( !files.length ) {
error( "No files were found matching: " + args.join( ", " ) );
}
return files;
}

Krinkle pushed a commit that referenced this pull request Jul 30, 2020
Check if the argument is path to a simple known file, if so do not
traverse the entire folder for that glob only to find self.

Closes #1385.

Co-authored-by: Robert Jackson <[email protected]>
@Krinkle
Copy link
Member

Krinkle commented Jul 30, 2020

Steps to repro (in a repo I happened to have worked in not too long ago that I noticed was slow):

git clone [email protected]:tildeio/broccoli-typescript-compiler.git
cd broccoli-typescript-compiler
git submodule update --init
yarn install
node ./node_modules/.bin/qunit dist/tests/path-utils-test.js

Thanks, yeah ~40s vs a split second! I do note that if the file doesn't exist, the CLI remains slow for this case, so there's room for a future improvement to perhaps also discard the entry early on if there's no file/dir if the input is not glob-"ish".

Krinkle pushed a commit that referenced this pull request Jul 30, 2020
Check if the argument is path to a simple known file, if so do not
traverse the entire folder for that glob only to find self.

Closes #1385.

Co-authored-by: Robert Jackson <[email protected]>
@Krinkle Krinkle mentioned this pull request Jul 30, 2020
3 tasks
Krinkle pushed a commit that referenced this pull request Jul 30, 2020
Check if the argument is path to a simple known file, if so do not
traverse the entire folder for that glob only to find self.

Closes #1385.

Co-authored-by: Robert Jackson <[email protected]>
@Krinkle
Copy link
Member

Krinkle commented Jul 30, 2020

As part of landing I have:

  • … removed the no-op return false statement.
  • … updated the unit tests as the run time order is slightly different now with known files run ahead of the others.

@Krinkle Krinkle closed this in 194fa05 Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants