Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: undeprecate existsSync; use access instead of stat for performance #7455

Closed
wants to merge 3 commits into from

Conversation

dfabulich
Copy link
Contributor

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

fs

Description of change
  1. Improve the performance of existsSync by using access without throwing an ignored exception.
  2. Make exists faster by using access instead of stat (without returning an ignored stat result)
  3. Undeprecate existsSync because there's no alternative method that doesn't throw an exception.

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. labels Jun 28, 2016
@dfabulich
Copy link
Contributor Author

dfabulich commented Jun 28, 2016

See also interminable issue #1592 and PR #4217. I hope that we merge this PR #7455, but if the team refuses to merge this PR, then I hope one of #7455 or #4217 can be merged.

No new tests were added in this branch, but all existing tests pass, and I'm not exactly sure how to write a fair benchmark for this. Your suggestions are welcome.

@dfabulich dfabulich changed the title fs: switch exists/existsSync to use access (instead of stat, which is… fs: undeprecate existsSync; use access instead of stat for performance Jun 28, 2016
@thefourtheye
Copy link
Contributor

cc @nodejs/fs

@mscdex
Copy link
Contributor

mscdex commented Jun 28, 2016

Why un-deprecate it if it's essentially just going to be a wrapper for fs.access()? To me it doesn't make sense. I'd much rather see fs.existsSync() finally removed (as a semver-major) since it has been deprecated since io.js v1.0.0.

@dfabulich
Copy link
Contributor Author

  1. There's an enormous gripe thread about the deprecation of existsSync in issue gripe: deprecating fs.exists/existsSync #1592, which is summarized pretty well in PR fs: add accessibleSync() which returns a boolean #4217.

  2. accessSync throws an exception if the file isn't accessible — you're asserting that the file exists and is accessible. But most developers don't want/need the actual exception object, so they just discard it. (Look at the existing implementation of existsSync.)

Generating an exception just so you can ignore it hurts performance. In PR #4217 we're considering adding a new method accessibleSync that would return a boolean (fast!) instead, but that would add unnecessary methods to Node's core, when we could just undeprecate existsSync.

Furthermore, accessSync is just plain annoying to use for ordinary use cases.

if (fs.existsSync(".git/rebase-apply/rebasing")) {
  console.error("You are in the middle of a rebase!");
}

To write that with accessSync you have to use a catch block.

try {
  fs.accessSync(".git/rebase-apply/rebasing");
} catch (ignored) {
  console.error("You are in the middle of a rebase!");
}

Perhaps if annoynce were the only problem, we could all just use a one-liner npm module for existsSync, but there's also performance to consider. You just can't make accessSync fast without making it return a boolean, like existsSync does.

  1. We should never delete exists or existsSync. It would be much too disruptive to the node ecosystem.

@mscdex
Copy link
Contributor

mscdex commented Jun 28, 2016

I'm -1 on this for the reasons stated in the other, related issues.

@dfabulich
Copy link
Contributor Author

Please leave a corresponding vote on PR #4217, preferably a +1 vote.

@Fishrock123
Copy link
Contributor

We can't change the behavior to use access. It was discussed long ago that would be a breaking change too large to justify.

@saghul
Copy link
Member

saghul commented Jun 28, 2016

-1. To keep it short, what @mscdex and @Fishrock123 said.

@dfabulich
Copy link
Contributor Author

dfabulich commented Jun 28, 2016

@Fishrock123 I think you'll find that the current implementation behaves exactly the same as the old behavior.

I realize that a lot of ink has been spilled on this topic, but I have read the megathread, and can find no specific information about how this would break the behavior of old exists, and I have found a lot of rumors and hearsay about how the change would be huge, which are totally unsubstantiated.

For example, I tried reproducing the rumored Windows EPERM issue and could not. My implementation behaves the same as the old implementation when it comes to inaccessible files. (Turns out that when you don't have access to a file, you can't stat it, either! Lucky us.)

This PR passes all existing tests. If there's a bug in this implementation, please tell me what it is and I will fix it.

@trevnorris
Copy link
Contributor

I understand the urge to do operations like:

try {
  fs.accessSync(".git/rebase-apply/rebasing");
} catch (ignored) {
  console.error("You are in the middle of a rebase!");
}

But developers need to be aware that that is still a race condition. Instead the operation should be performed as such:

fs.open('.git/rebase-apply/rebasing', 'wx', (err, fd) => {
  if (err) throw err;  // the file already exists
  // Great! we have a lock on the file and can now perform our operations.
  doStuff(fd);
});

function doStuff(fd) {
  // Perform operations
  fs.closeSync(fd);
  fs.unlinkSync('.git/rebase-apply/rebasing');
}

Using the above the ability to have any two programs to race against the lock file should be removed. I may see the usefulness if checking file existence from a processes that isn't creating/removing the lock file, and you don't care about false positives. But this is conditional upon the processes that are creating/removing the lock file from doing so properly.

@dfabulich
Copy link
Contributor Author

TOCTOU bugs are a risk for existsSync, but also equally for accessSync. There is no TOCTOU-based argument for deprecating existsSync that wouldn't also apply to accessSync.

Thus, no matter how you feel about the TOCTOU risk, it's irrelevant to the two PRs I've filed, PR #4217 or PR #7455. @trevnorris I thought you and I had actually come to agreement on this in the thread for PR #4217 :-(

(For the record, I picked on the .git/rebase-apply/rebasing file because it's a 0-byte file that signals to other non-git processes that the user is in the middle of a rebase. It's a good example of "using the filesystem as an RPC mechanism" which we've discussed in other threads.

There is nothing I want to do with .git/rebase-apply/rebasing, other than detect its existence, so I can report to the user that she's in the middle of a rebase. I don't want to lock it for reading or writing or interact with it in any way except to check its existence. There's no TOCTOU possible here, because the file is never used except by checking its existence.)

@trevnorris
Copy link
Contributor

I understand. Just hashing through key points for the PR. Unless someone can create a test showing how using fs.access() has a different result than using fs.stat() in this situation for supported platforms, I'm fine with keeping fs.exists() (and by extension fs.existsSync()) around.

It may alleviate concerns if more documentation was added about the racy-ness of fs.exists() (I believe the reason there isn't such concern around fs.access() is because it will error, and that error will have specific information about why it couldn't be accessed). For example with the example I have above about how to use fs.open(). Instead of just having "use fs.open() instead!"

Also some documentation about the use cases. e.g. not using it to see if the file can be created, and instead using it for a lock file like you have shown above. but only of the application has no intent on creating the same lock file. this way any failures should be kept as false positives.

@ronkorving
Copy link
Contributor

ronkorving commented Jun 29, 2016

Agreed that documentation can completely alleviate the problem (after that, it's "use at your own risk"). 99% of the time, checking just for file existence causes races (although I'll acknowledge that there are valid use cases).

+1 if we add appropriate warnings (and an explanation of what the right way is to mess about with files) to exists-documentation.

@bnoordhuis
Copy link
Member

The documentation has been warning about that for three years (d97ea06), with seemingly little effect.

@ronkorving
Copy link
Contributor

Ah, apologies. I should've done some homework there. Some extra "how to deal with the open error" info (ie: check for ENOENT) would be welcome though. I'm not really invested in this feature tbh, and it seems more than enough people are quite opinionated about this one, so I'll stay out of it :)

@trevnorris
Copy link
Contributor

@bnoordhuis my suggestion is that the documentation be expanded with specifics. Like alternative code examples. We say to use fs.open() but what's the equivalent of fs.F_OK that way?

@dfabulich
Copy link
Contributor Author

I opened a separate PR #7832 around providing better examples for how to use access and exists.

@dfabulich
Copy link
Contributor Author

The documentation PR #7832 with code examples is getting closer to merging, but we're not much closer to having this PR #7455 or its evil twin #4217 either merged or closed. I was really hoping to land one of those PRs in Node 7, which seems to be fast approaching.

@trevnorris @bnoordhuis What's the next step on this PR?

@trevnorris
Copy link
Contributor

@Fishrock123 you have a qualm w/ this if it lands as semver-major?

@dfabulich
Copy link
Contributor Author

@Fishrock123 said earlier that it would be a large breaking change, but I don't think it is a breaking change; I can't repro any regressions in my implementation.

If anyone can reproduce a bug where my implementation differs from the old existsSync implementation, please let me know.

The existing implementation throws ignored exceptions, which is slow.
The new implementation returns a simple boolean without throwing.
access is faster than stat, because we don't have to return the
actual stat results.
There's no alternative method to use that doesn't throw an exception
when the file doesn't exist.
@Fishrock123
Copy link
Contributor

Fishrock123 commented Aug 22, 2016

I can't repro any regressions in my implementation.

We couldn't repo any problems with realpath either.

I'm not comfortable with this...

@Fishrock123
Copy link
Contributor

I'll stick it on the CTC agenda, I see #7455 (comment) as a good use-case.

I doubt we'll be able to switch the impl though.

@Fishrock123
Copy link
Contributor

@bnoordhuis you had Opinions here, could you take a look at that use-case?

@ChALkeR
Copy link
Member

ChALkeR commented Aug 22, 2016

Ok, my thoughts:

  1. I would prefer fs.exists() and fs.existsSync() implementation to not to be changed too much in order to preserve stability and avoid potential breakage. Potential hard-deprecation in future is better than introducing some subtle behaviour differences.
  2. Un-deprecating fs.exists() doesn't make any sense — it doesn't follow the callback convention and could be replacced with fs.access() everywhere, without requiring more code. fs.exists() is not needed, fs.existsSync() is the only one triggering questions here. I am aware that this PR un-deprecates only fs.existsSync(), but I needed this notice for further points.
  3. Telling that users shouldn't use .existsSync() because of a bad usage pattern will most likely not help — they would just find a worse way to do the same. E.g. a native module that implements .existsSync() and/or monkey-patches it into fs.
  4. What is wanted by the ecosystemd is a sane replacement for fs.existsSync(). Such replacement could be e.g. fs: add accessibleSync() which returns a boolean #4217, but that's only one possible solution. The solution there (introduce a new API instead of changing the old one) looks better to me, because this will be a clear opt-in for users and it will have a saner name. But it's still not the ideal solution, see below.
  5. We should hard-deprecate .exists() and .existsSync() once there is a clear upgrade path for the users for both cases. For .exists() there is atm, for .existsSync() — it introduces more code atm and people don't like the try-catch solution.
  6. An ideal solution would be to slowly phase out sync function with async/await, once that would be ready. So it would be something like accessible = await fs.access(file) instead of accessible = fs.exists(file) (not a mistype, it actually returns whether the file is accessible or not contrary to whether it exists or not, which is impossible).

So what I think we should do is keep those soft deprecated until async/await is there, and then hard-deprecate those, telling people to switch to fs.access, either callback or async/await-based.

What I perhaps could be fine with is introducing another API endpoint without touching these for a truly sync version, like fs.accessibleSync, but I currently fail to see why that is needed now and why that would be needed once we will have proper async/await.

@dfabulich
Copy link
Contributor Author

With async/await access, you'll have the same usability/perf bug as accessSync today, which is that access returns an error when you can't access the file, not a boolean. (It's an assertion that you can access the file.)

If you promisify fs today e.g. with bluebird, and you try to await access you'll have to wrap it in a try/catch and discard the exception.

If you want a fast, easy-to-use await API for access, you'd need a non-Sync accessible method that returns true if the file is accessible and false otherwise. I would be happy to add that to #4217 if that would move the conversation forward.

@ChALkeR
Copy link
Member

ChALkeR commented Aug 22, 2016

@dfabulich

With async/await access, you'll have the same usability/perf bug as accessSync today, which is that access returns an error when you can't access the file, not a boolean. (It's an assertion that you can access the file.)

Hm, I missed that, thanks.

Another issue with accessible(), as far as I remember, was — should it return return an error (or thow in *Sync version) under some circumstances? If yes — which? E.g. what should happen on invalid file names? Too long paths?

@dfabulich
Copy link
Contributor Author

I claim that accessible/existsSync would/should return true if you can access the file and false otherwise, for any reason. That mirrors the existing behavior of existsSync which fails if stat fails (again, for any reason).

@Fishrock123
Copy link
Contributor

Is there actually a case where the user cannot access the file, stat returns true, but that is what the user expects?

Or does stat only return true for non-accessible files on Unix?

@dfabulich
Copy link
Contributor Author

There are two questions here:

  1. Can stat succeed when access F_OK fails?
  2. Can stat fail when access F_OK succeeds?

I claim that the answer is "no" in both cases.

http://linux.die.net/man/2/access lists five ways access F_OK can fail: EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR

http://linux.die.net/man/2/stat indicates EACCES, ELOOP, ENAMETOOLONG, ENOENT, and ENOTDIR as possible failure modes, in addition to EBADF, ENOMEM, and EOVERFLOW which are the caller's fault.

To pick off the obvious cases, it seems pretty clear that if access fails with ELOOP, ENAMETOOLONG, or ENOTDIR, then stat would fail too, and vice versa. It can be dangerous to rely too much on intuition when it comes to FS error handling, but I can't even imagine a plausible test case where they might differ.

So I think we can boil this down to four sub questions with a matrix of stat/access and ENOENT/EACCES:

1a) Can stat succeed when access F_OK fails with ENOENT?
1b) Can stat succeed when access F_OK fails with EACCES?
2a) Can stat fail with ENOENT when access F_OK succeeds?
2b) Can stat fail with EACCES when access F_OK succeeds?

I claim that we can answer "no" to all of these sub-questions. Specifically, for the ENOENT cases 1a and 2a, I again can't even imagine sane FS behavior where stat and access could differ. Similarly, in 1b, I can't think of any way we could be denied permission to test for the existence of a file but be allowed to read the file's stats.

2b is the only case where I think there could possibly be a difference; it is at least imaginable that a FS could say "this file exists, but you're not allowed to stat it." If I understand correctly, this is what some folks claimed would happen in certain Windows permissions cases. But I tried testing this myself and I couldn't get it to repro.

In my tests, I never found a way to access a file without being allowed to stat it or vice versa. If anybody can exhibit a test case, please let me know.

@Fishrock123
Copy link
Contributor

I'm mostly worried about Windows tbh.

@Fishrock123
Copy link
Contributor

I think the best way forward here is to separate the un-deprecation from the underlying change. Is that possible @dfabulich?

I may need a refresher on what exactly was wrong with stat if they do indeed need to be together.

@dfabulich dfabulich mentioned this pull request Aug 31, 2016
3 tasks
@dfabulich
Copy link
Contributor Author

dfabulich commented Aug 31, 2016

There's now an open PR #8364 to undeprecate existsSync only. If y'all can see fit to merge that, that'd be great. If that happens, I'd then rebase this PR on that one.

The only reason they were combined in the first place was that in other threads folks told me that a PR like #8364 would not be merged, because there was never ever ever any good reason to use existsSync instead of accessSync. This PR was an attempt to justify existsSync on the grounds of both usability and performance.

If the usability argument has convinced the team, then we can merge #8364 and then consider the performance implications of using access instead of stat separately. But if a performance boost is required in order to justify undeprecating existsSync, then we're back where we started with this PR.

@Trott
Copy link
Member

Trott commented Sep 1, 2016

Are any @nodejs/collaborators strongly in favor of undeprecating fs.existsSync() as proposed here and/or an alternative along the lines of #4217? There are collaborators who have concerns and currently oppose it. Without a collaborator champion to help work things through the process, this is unlikely to land anytime soon.

@jasnell
Copy link
Member

jasnell commented Sep 1, 2016

+1 to separating the undeprecation from the internal change. I signed off on the other PR here. I think that PR should land before this one, and this one should be modified to just provide the internal change.

@Fishrock123
Copy link
Contributor

closing in favor of #8364

@Fishrock123 Fishrock123 closed this Oct 5, 2016
Fishrock123 pushed a commit that referenced this pull request Oct 5, 2016
This has been dragged through various long discussions and has been
elevated to the CTC multiple times.

As noted in
#7455 (comment),
while this API is still generally considered an anti-pattern, there are
still use-cases it is best suited for, such as checking if a git rebase
is in progress by looking if ".git/rebase-apply/rebasing" exists.

The general consensus is to undeprecate just the sync version, given
that the async version still has the "arguments order inconsistency"
problem.

The consensus at the two last CTC meetings this came up at was also
to undeprecate existsSync() but keep exists() deprecated.
See: #8242 &
#8330

(Description write-up by @Fishrock123)

Fixes: #1592
Refs: #4217
Refs: #7455
PR-URL: #8364

Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
jasnell pushed a commit that referenced this pull request Oct 6, 2016
This has been dragged through various long discussions and has been
elevated to the CTC multiple times.

As noted in
#7455 (comment),
while this API is still generally considered an anti-pattern, there are
still use-cases it is best suited for, such as checking if a git rebase
is in progress by looking if ".git/rebase-apply/rebasing" exists.

The general consensus is to undeprecate just the sync version, given
that the async version still has the "arguments order inconsistency"
problem.

The consensus at the two last CTC meetings this came up at was also
to undeprecate existsSync() but keep exists() deprecated.
See: #8242 &
#8330

(Description write-up by @Fishrock123)

Fixes: #1592
Refs: #4217
Refs: #7455
PR-URL: #8364

Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Fishrock123 pushed a commit that referenced this pull request Oct 11, 2016
This has been dragged through various long discussions and has been
elevated to the CTC multiple times.

As noted in
#7455 (comment),
while this API is still generally considered an anti-pattern, there are
still use-cases it is best suited for, such as checking if a git rebase
is in progress by looking if ".git/rebase-apply/rebasing" exists.

The general consensus is to undeprecate just the sync version, given
that the async version still has the "arguments order inconsistency"
problem.

The consensus at the two last CTC meetings this came up at was also
to undeprecate existsSync() but keep exists() deprecated.
See: #8242 &
#8330

(Description write-up by @Fishrock123)

Fixes: #1592
Refs: #4217
Refs: #7455
PR-URL: #8364

Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.