Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecating "exists" because of some use cases ignores legitimate use cases #2594

Closed
dionjwa opened this issue Sep 1, 2022 · 37 comments
Closed

Comments

@dionjwa
Copy link

dionjwa commented Sep 1, 2022

Latest docs: https://deno.land/[email protected]/fs/mod.ts?s=exists

"Checking the state of a file before using it causes a race condition. Perform the actual operation directly instead."

So it's saying instead of:

  1. Check if file/dir exists
  2. If it exists, do the thing

instead do

  1. Just do the thing on the file/dir (maybe handle errors)

Rather than that flow, I have this flow:

  1. Check if a file/dir exists
  2. If it doesn't exist, do expensive thing

This doesn't fit into "Perform the actual operation directly instead." scenario. I don't want to perform the operation if the directory or file exists. But I need to check that state beforehand. I have this pattern in many places.

What would the alternative be?

I can understand the documented scenario is an anti-pattern, but this feels like deprecating crucial functionality without a clear pattern for legitimate use cases.

@kitsonk kitsonk transferred this issue from denoland/deno Sep 1, 2022
@NotFounds
Copy link
Contributor

@kt3k
Copy link
Member

kt3k commented Sep 1, 2022

  1. Check if a file/dir exists
  2. If it doesn't exist, do expensive thing

If the check is for a file (not dir), and expensive thing is creating that file at the checked path, then that case can be covered by Deno.open() with createNew: true option, which creates a file only when the file doesn't exist.

@dionjwa
Copy link
Author

dionjwa commented Sep 2, 2022

@kt3k That assumes:

  • that you are only creating a single file (are you really going to do this check on every single file if there are lots?)
    • that creates many edge cases in case that state flow is interrupted. Often safer to restart and redo.
  • that the creation of the contents of the file should done after opening the file and possibly failing
  • that this whole song and dance with Deno.open etc etc etc is somehow a better pattern than just checking if the file/dir exists.

These feel like very complex, cumbersome, and non-intuitive proposals to a potential problem that isn't quantified nor experienced in the wild in many contexts, and especially not in my use cases. I will never experience the stated threat in the use cases for the tooling I build with deno (CLI tools, build tools, automation).

@iuioiua
Copy link
Contributor

iuioiua commented Sep 6, 2022

Related: #2125. Perhaps, this is a matter that requires more thorough documentation.

@iuioiua
Copy link
Contributor

iuioiua commented Sep 30, 2022

@kt3k, I think it's worth:

  1. Improving documentation, strongly and clearly recommending the "do the action directly" method.
  2. Ensure any instance of exists within std is removed in favour of "doing the action directly", following our own recommendation.
  3. Reverting the deprecation.

I'm happy to work on this. What do you think?

@timreichen
Copy link
Contributor

@iuioiua I think this is a good solution.

@dionjwa
Copy link
Author

dionjwa commented Oct 1, 2022

I feel like the use cases I put forward are not understood. All the suggestions are assuming that you want to write to the file, or create it if it doesn't exist, rather than take an unrelated action if the file exists. None of the suggestions work in this case, since I don't want to create the file or directory, just to check if it exists. I suppose we'll have to create a third-party module to cover this basic use-case.

@iuioiua
Copy link
Contributor

iuioiua commented Oct 7, 2022

It'd be worth pointing out that std/log has a valid use case:
https://github.com/denoland/deno_std/blob/f096e33013fad0dfbb5a63f9d7960622a2ffe120/log/handlers.ts#L202-L209

If exists is to be deprecated, a reasonable alternative for this use case has to be vetted.

@kt3k
Copy link
Member

kt3k commented Oct 9, 2022

There's types of usages like assertEquals(existsSync(srcFile), true); in copy and move testing. Usages in testing might be also another example of legit usage.

@iuioiua
Copy link
Contributor

iuioiua commented Oct 9, 2022

Do these use cases provide sufficient reasoning to undo the deprecation of exists for now? I think they do.

@martin-braun
Copy link
Contributor

martin-braun commented Oct 19, 2022

It'd be worth pointing out that std/log has a valid use case:

https://github.com/denoland/deno_std/blob/f096e33013fad0dfbb5a63f9d7960622a2ffe120/log/handlers.ts#L202-L209

If exists is to be deprecated, a reasonable alternative for this use case has to be vetted.

@iuioiua My recent research shows that this code would throw an unhandled exception on Windows, when the backup file exists, but is not readable. If you'd use isReadableFile as substitution, it would not detect files with missing read permissions. Attempting to write on a file that has missing read-permissions could cause data loss if write-permissions are present.

Your code sample shows me that isReadable would not be substitute in all circumstances. Maybe the undo of the deprecation of exists with additional implementations (dirExists and fileExists?) would be preferable, but it should return true when Deno.errors.PermissionDenied is thrown, which can only happen on Windows.

What are your thoughts?

@lino-levan
Copy link
Contributor

I agree with @martin-braun's analysis here. I would suggest that we deprecate (and eventually remove) exists, and replace it with dirExists and fileExists. I think it's pretty important to give people a route to migrate to newer versions of std. If we just remove exists, it is really going against this philosophy.

I, personally, have run into a very similar problem as @dionjwa where I needed to do something (not a file write), if something did not exist and would really like a non-deprecated solution.

@kt3k
Copy link
Member

kt3k commented Nov 2, 2022

@lino-levan See also #2785 . I believe it's the latest version of the proposal.

@martin-braun
Copy link
Contributor

martin-braun commented Nov 3, 2022

@kt3k Which is probably stale. I could maybe move forward when #2791 gets fixed, but I seriously question if the permission part should be included, or not.

Unfortunately, my last comment in this issue got no response. Maybe you could look into it, please?

My concern is, that there are use-cases in which fileExists is better than readableFile, for instance, when trying to ensure that a given file does not exist, thus safely being able to write a new file without data loss. (see #2594 (comment)).

On the other hand, not having a fileReadable would still require to use try-catch to handle permission issues, unless an unhandled error is appreciated.

Third option would be to simply implement fileExists and dirExists as well, probably in a different PR, but I don't really want to pollute the std, thus I'm hereby seeking advice.

@kt3k
Copy link
Member

kt3k commented Nov 3, 2022

@martin-braun
I probably don't understand the question well. It sounds just fine to return false when PermissionDenied thrown (on windows). What could be the issue in that case?

BTW I feel it's a little verbose to have both fileExists and dirExists. How about passing options like isDir: true, isFile: true instead?

@martin-braun
Copy link
Contributor

martin-braun commented Nov 3, 2022

@kt3k The primary goal was to undo the deprecation while improving the situation to reduce the potential for TOCTOU scenarios. Including permission checks was an idea and I was motivated to get consistent behavior between the platforms, thus making readable* instead of *exists functions. On Windows PermissionDenied is raised on failed stat access, on POSIX, it's not. Not good. Another issue is readable* would be problematic when checking if a file does not exist. It would return false, even when the file exists, but no read permissions are available to the user.

So instead, I'm playing with the thought to just re-implement *exists, again, but this is problematic as well:

  • The user still has to catch PermissionDenied for Windows
  • For POSIX systems the user cannot be sure the file can be read, until he/she tries to open the file or analyses its modes

The first point is crucial for me. People want exists back for the ease, but if it fails to address permission and type issues users fall in pits or introduce more TOCTOES.

The second point is important for better consistency between the platforms.

In regards of introducing isDir: true or isFile: true to have just one exists: Feels wrong to me, since the user can cause illogical states (both true or both false).

The bottom line is: Both paths fail to be consistent between platforms, both cannot catch all basic use-cases, I'm neither happy with my draft (which is blocked by unstable implementations), nor with the old exists which had many pitfalls.

@kt3k
Copy link
Member

kt3k commented Nov 3, 2022

In regards of introducing isDir: true or isFile: true to have just one exists: Feels wrong to me, since the user can cause illogical states (both true or both false).

We can throw TypeError for wrong usages.

I still don't see what's wrong about returning false in case of PermissionDenied

@martin-braun
Copy link
Contributor

martin-braun commented Nov 3, 2022

I still don't see what's wrong about returning false in case of PermissionDenied

In case of exists I would say the problem is that only Windows will throw PermissionDenied, thus we have inconsistent behavior between the platforms.

exists will produce different results between Windows and POSIX if the file is available, but cannot be read by the user.

We can throw TypeError for wrong usages.

Ok sure, we can do that.

So you think the best is to ignore the issue I just mentioned and simply re-implement exists with isFile and isDir settings and the user has to be prepared for any permission issues on POSIX systems down the line?

@lino-levan
Copy link
Contributor

I feel like there's some amount of miscommunication going on here.

I'm kind of on a fence here, but I think we should push for isReadable({file?:boolean, folder?:boolean}) (the types here aren't set up properly). I think there isn't much controversy on that front, and afaik the naming there is pretty clear. I think a PR with just isReadable makes sense.

I do get @martin-braun's point here which is, sometimes I really do want to check if a file exists and not just that it's readable by me (I think the examples they've provided show enough usecases for that). If we do decide to remove the current implementation and create another one:

  1. It should have the same arguments as isReadable
  2. It should be focused on determining if a file exists.
    • On POSIX, this is fine and there is no controversy.
    • On Windows, regardless of its readability (we will have a separate method if you want to check that), we want to be able to say that a file exists.
    • If it throws a permission error, it means it exists.
  3. The JSDoc for exists should mention, and recommend, using isReadable unless you really can't in your use case.

Regardless of which route we take, I think that the JSDocs for (both of) them should still recommend "just do the operation directly and catch errors" for the vast majority of things.

@martin-braun
Copy link
Contributor

martin-braun commented Nov 4, 2022

@lino-levan Now I'm actually interested in changing my PR to implement exists and isReadable together and redirect users in the docs from exists to isReadable for the 99% of the time that this is preferred. As soon as uid() and gid() are stabilized, there is really nothing that prevents me to do that.

One last thing though: I think that the 3 arguments rule holds usually, but having to define two arguments when a folder is checked (isReadable(false, true)) feels wrong. I think it would be much better to define a settings type, so you can do:

isReadable({
	folder: true
})

Thanks.

@lino-levan
Copy link
Contributor

@martin-braun denoland/deno#16424 just landed in deno 1.28.0, is there any chance you could update your PR to use uid() and guid()?

@martin-braun
Copy link
Contributor

martin-braun commented Nov 14, 2022

@lino-levan No worries, I'm aware of this and I'm going to tackle this soon, since the path is graded, now.

@martin-braun
Copy link
Contributor

martin-braun commented Nov 16, 2022

Idea for further improvement:

const isReadable = await fs.exists("path/to/file.txt", {
	isFile: true,
	isReadable: true // also require file to be readable
});

The idea is to not just combine the path type check, but also the permission check, into one function. It's not very pretensive against bad programming styles that might lead to race conditions (similarly to the former exists), but it certainly allows to avoid all the pitfalls that could only be solved by using try/catch/lstat previously.

  • On POSIX systems options.isReadable would run the permission check on stat.mode
  • On Windows systems, a PermissionDenied will make fs.exists return !options.isReadable

I'm curious if you agree or not.

@lino-levan
Copy link
Contributor

Double thumbs on this one, I feel like the syntax is a lot clearer. I do want to really make sure the documentation is on point if we are essentially un-deprecating exists.

Also bumping this:

The JSDoc for exists should mention, and recommend, using isReadable unless you really can't in your use case.

@martin-braun
Copy link
Contributor

martin-braun commented Nov 16, 2022

The JSDoc for exists should mention, and recommend, using isReadable unless you really can't in your use case.

There is also an option to opt-in for another function name and have isReadable: true (or maybe ignorePermissions: false) by default to address this passively. Although, I'd have hard times in naming such function. Any suggestions are welcome.

@iuioiua
Copy link
Contributor

iuioiua commented Nov 16, 2022

I think a sensible solution would be to add a second optional ExistOptions argument that can extend its functionality, including isFile, isReadable, isDirectory properties, etc.

Outside of that, I think exists should remain as is. The name exists may not be 100% technically correct. However, it best reflects its most common use case.

The documentation recommends not using it and provides examples of how to do so. Also, there appears to be enough evidence of valid use cases. For this reason, I think the deprecation should be undone, and the codebase should steer away from using it internally where reasonable.

Frankly, but respectfully, I think the amount of time and effort surrounding this function has been overplayed. It'd be nice for a sensible agreement to be met and move on.

@martin-braun
Copy link
Contributor

martin-braun commented Nov 17, 2022

@iuioiua I agree. My PR is almost ready, but I need denoland/deno#15576 to be addressed first. Let me explain:

Deno.errors.PermissionDenied (thrown by Deno.stat) is either raised from missing to provide the --allow-read flag or by trying to access a file that is protected by the file system. Only for the latter I have to return !options.isReadable, while I have to throw the error for the former. I will use Deno.permissions.query to distinguish between both cases like so:

// ...
  } catch (error) {
    if (error instanceof Deno.errors.NotFound) {
      return false;
    }
    if (error instanceof Deno.errors.PermissionDenied) {
      if (
        (await Deno.permissions.query({ name: "read", path })).state ===
        "granted"
      ) { // --allow-read not missing
        return !options?.isReadable; // PermissionDenied was raised by file system
      }
    }
    throw error;
  }
}

For existsSync I will need querySync, respectively.

I came to awareness after re-implementing the Scenes for missing --allow-read permission checks.

cc @lino-levan

@iuioiua
Copy link
Contributor

iuioiua commented Nov 17, 2022

I think a sensible solution would be to add a second optional ExistOptions argument that can extend its functionality, including isFile, isReadable, isDirectory properties, etc.

Outside of that, I think exists should remain as is. The name exists may not be 100% technically correct. However, it best reflects its most common use case.

The documentation recommends not using it and provides examples of how to do so. Also, there appears to be enough evidence of valid use cases. For this reason, I think the deprecation should be undone, and the codebase should steer away from using it internally where reasonable.

Frankly, but respectfully, I think the amount of time and effort surrounding this function has been overplayed. It'd be nice for a sensible agreement to be met and move on.

@kt3k and @cjihrig, what do you think?

@iuioiua
Copy link
Contributor

iuioiua commented Nov 17, 2022

@iuioiua I agree. My PR is almost ready, but I need denoland/deno#15576 to be addressed first. Let me explain:

Deno.errors.PermissionDenied (thrown by Deno.stat) is either raised from missing to provide the --allow-read flag or by trying to access a file that is protected by the file system. Only for the latter I have to return !options.isReadable, while I have to throw the error for the former. I will use Deno.permissions.query to distinguish between both cases like so:

// ...
  } catch (error) {
    if (error instanceof Deno.errors.NotFound) {
      return false;
    }
    if (error instanceof Deno.errors.PermissionDenied) {
      if (
        (await Deno.permissions.query({ name: "read", path })).state ===
        "granted"
      ) { // --allow-read not missing
        return !options?.isReadable; // PermissionDenied was raised by file system
      }
    }
    throw error;
  }
}

For existsSync I will need querySync, respectively.

I came to awareness after re-implementing the Scenes for missing --allow-read permission checks.

cc @lino-levan

The Deno.permissions API isn't needed because Deno.lstat requires --allow-read anyway.

@lino-levan
Copy link
Contributor

lino-levan commented Nov 17, 2022

@iuioiua If I'm understanding @martin-braun's logic, I think you're missing the point. We need to query the --allow-read permission to the tell the difference between whether the user just didn't specify --allow-read or if the file is just not readable by us on windows.

Now that I say that out loud, is there even value in differentiating between these two @martin-braun?

@iuioiua
Copy link
Contributor

iuioiua commented Nov 17, 2022

Yes, I understand. However, my statement still stands: Deno.permissions.querySync would always return true anyway since it's needed for Deno.lstat.

Now that I say that out loud, is there even value in differentiating between these two @martin-braun?

I think that's a good question. Perhaps, a simple boolean option could set whether the function throws or ignores a permission error might do the trick. What do we think?

@martin-braun
Copy link
Contributor

martin-braun commented Nov 17, 2022

@iuioiua If I'm understanding @martin-braun's logic, I think you're missing the point. We need to query the --allow-read permission to the tell the difference between whether the user just didn't specify --allow-read or if the file is just not readable by us on windows.

This is absolutely correct, @lino-levan.

Now that I say that out loud, is there even value in differentiating between these two @martin-braun?

Yes there is. If PermissionDenied is raised by lstat or stat due to a file permission error, we know that the file exists, but cannot be read. If the user called exists with isReadable: false (not caring about whether the file can be read or not), we must return true in that case, because we know the file exists, it's just protected by the file system and cannot be read from the user.

However, my statement still stands: Deno.permissions.querySync would always return true anyway since it's needed for Deno.lstat.

@iuioiua Please revisit my code snippet, I'm trying to query the permissions in the catch-block and lstat / stat could also raise PermissionDenied for other reasons, for instance when the current user has no read permission on the file on Windows systems. I've verified this case to myself.

I cannot just return false if PermissionDenied is raised, because it could be that the user called exists("file.txt", { isReadable: false }) on a protected existing file on Windows systems. exists needs to return true in this case.

This is the reason why I cannot throw an error from PermissionDenied either, because it could just be a file permission error that isn't relevant for the question "does the file exists?" (we know, it exists by this point), but rather for the question "can the file be read?", so we have to solve the final bit depending on the isReadable option.

Perhaps, a simple boolean option could set whether the function throws or ignores a permission error might do the trick. What do we think?

I have hard times to imagine how this would solve the situation. Why would exists need to raise an exception on a permission error when the error itself can reveal if the file exists or not? If the user misses --allow-read I will throw its exception in any case. The unfortunate reality is that Deno raises the same error for file permission errors and deno permission errors.

A suitable alternative to my approach would be to let Deno raise different errors for these scenarios.

@cjihrig
Copy link
Contributor

cjihrig commented Nov 17, 2022

@cjihrig, what do you think?

I think we should just have an exists() function that returns a boolean (or Promise<boolean>). I've personally witnessed Node.js try to get rid of fs.exists() and fs.existsSync() for years. I don't think I've ever seen developers as passionate about anything as they are about having this fundamentally broken API. So, let's just have the API with a warning in the docs and move on with life.

I feel pretty strongly that we should not add more options to this API.

@iuioiua
Copy link
Contributor

iuioiua commented Nov 17, 2022

Yeah, my suggestion of adding options came from hearing about people's edge cases from previous discussions. But again, they're edge cases. exists, in its current form, takes care of its primary use case the vast majority of the time.

Thinking about it further, I agree. I think the best solution is to just add sufficient documentation to explain these edge cases and move on, rather than having a function that's needlessly over-engineered.

@martin-braun
Copy link
Contributor

martin-braun commented Nov 17, 2022

That is very unfortunate, given that the implementation is almost ready to be reviewed (querySync is the last missing piece).

I get that enhancing a function which concept is not perceived in a positive manner feels counter-intuitive, but the proposed enhancements would improve the situation by covering all the possible pitfalls that would occur when using exists, because people want and continue to use exists, since there is no better alternative for the proposed exceptional cases.

A simple undo of the deprecation of exists will bring back the old problems that came along with it. My initial reaction was that people should then try and catch Deno.lstat or Deno.stat instead, since a simple exists call will potentially cause a high risk of un-expected errors that the programmer isn't really aware of, namely PermissionDenied or consequences due to a wrong file system item type (directory / file). And if the programmer is aware of such potential errors, he/she needs to catch those errors and thus could just call Deno.lstat or Deno.stat instead and have less code in doing so.

For me, simply using exists has so little benefit over the high risk of failure to pay in return, that I would rather vote for keeping exists deprecated than undoing the deprecation without the possibility to check for the file permissions and file type (directory or file) without try/catch. As soon as I have to use try/catch to cover those things, there is no legitimate use-case for exists at all.

I believe it is irresponsible to offer only a plain exists, because it could only be used in a correct way when trying to make sure a path is not blocked before (external) access, so when neither permissions or types matter, but an exception can still be raised due to the lack of file permissions on Windows systems. And please let's face it: We know exists will be used in many more cases that are less than ideal.

@iuioiua
Copy link
Contributor

iuioiua commented Mar 20, 2023

That is very unfortunate, given that the implementation is almost ready to be reviewed (querySync is the last missing piece).

I get that enhancing a function which concept is not perceived in a positive manner feels counter-intuitive, but the proposed enhancements would improve the situation by covering all the possible pitfalls that would occur when using exists, because people want and continue to use exists, since there is no better alternative for the proposed exceptional cases.

A simple undo of the deprecation of exists will bring back the old problems that came along with it. My initial reaction was that people should then try and catch Deno.lstat or Deno.stat instead, since a simple exists call will potentially cause a high risk of un-expected errors that the programmer isn't really aware of, namely PermissionDenied or consequences due to a wrong file system item type (directory / file). And if the programmer is aware of such potential errors, he/she needs to catch those errors and thus could just call Deno.lstat or Deno.stat instead and have less code in doing so.

For me, simply using exists has so little benefit over the high risk of failure to pay in return, that I would rather vote for keeping exists deprecated than undoing the deprecation without the possibility to check for the file permissions and file type (directory or file) without try/catch. As soon as I have to use try/catch to cover those things, there is no legitimate use-case for exists at all.

I believe it is irresponsible to offer only a plain exists, because it could only be used in a correct way when trying to make sure a path is not blocked before (external) access, so when neither permissions or types matter, but an exception can still be raised due to the lack of file permissions on Windows systems. And please let's face it: We know exists will be used in many more cases that are less than ideal.

Fair points.

@martin-braun
Copy link
Contributor

This can be closed now, since #2785 landed.

@kt3k kt3k closed this as completed Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants