Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is a FileSystemHandle? #59

Closed
a-sully opened this issue Oct 10, 2022 · 12 comments · Fixed by #96
Closed

What is a FileSystemHandle? #59

a-sully opened this issue Oct 10, 2022 · 12 comments · Fixed by #96
Labels
clarification Standard could be clearer

Comments

@a-sully
Copy link
Collaborator

a-sully commented Oct 10, 2022

TPAC highlighted a pretty substantial architectural difference between the Chromium and Firefox implementations of this API. Chromium's implementation maps a FileSystemHandle to a file path (which I had attempted to codify a while back), while Firefox's implementation suggests "they are references to objects resolved at creation time."

This has gone unnoticed until this point because no features have exposed this difference to the web.

But there are some significant web-observable implications to this choice, most notably around directory moves. Consider the following code, which moves a directory containing a child we have a handle to:

// Create file at /old/file.txt.
const dir_entry = root.getDirectoryHandle('old', { create: true });
const file_entry = dir_entry.getFileHandle('file.txt', { create: true });

// The file previously pointed to file at /old/file.txt now resides at /new/file.txt.
await dir_entry.move('new');

What happens next?

// PATH-BASED: The promise rejects with NotFoundError.
// REF-BASED:  The call succeeds and returns the File.
await file_entry.getFile();

// PATH-BASED: The promise rejects with NotFoundError.
// REF-BASED:  The promise resolves successfully and creates a new writable stream.
await file_entry.createWritable();

// PATH-BASED: The promise resolves to `null`.
// REF-BASED:  The promise resolves to 'file.txt' (same as before the move).
await dir_entry.resolve(file_entry);

// PATH-BASED: The isSameEntry() promise resolves to `false`.
// REF-BASED:  The isSameEntry() promise resolves to `true`.
const other_file = await dir_entry.getFileHandle('file.txt');
await other_file.isSameEntry(file_entry);

// PATH-BASED: The directory's ID has changed, along with its path.
// REF-BASED:  The directory's ID remains unchanged since before the move.
await dir_entry.getUniqueId();
// PATH-BASED: The file's ID remains unchanged, along with its path.
// REF-BASED:  The file's ID remains unchanged since before the move.
await file_entry.getUniqueId();

// This method is proposed in https://github.com/whatwg/fs/issues/38
// PATH-BASED: The promise presumably will resolve to `null`.
// REF-BASED:  The promise presumably will resolve to a copy of `dir_entry`.
await file_entry.getParent();

// PATH-BASED: getFile() is once again valid and returns the file at /old/file.txt.
//             The isSameEntry() promise resolves to `true`.
// REF-BASED:  getFile() returns the file at /new/file.txt.
//             The isSameEntry() promise resolves to `false`.
const old_dir = await root.getDirectoryHandle('old', { create: true });
const other_file = await old_dir.getFileHandle('file.txt', { create: true });
await file_entry.getFile();
await other_file.isSameEntry(file_entry);

Directory moves very blatantly expose this difference. It will be basically impossible to specify directory moves in a way that's consistent across platforms without being much more specific here. This also has implications for at least the getUniqueId()method and removed handles. Directory moves would also expose a difference in resolve() and a discussed-but-not-yet-formally-proposed getParent() method, among other things.

Looking at the code above, I tend to agree that a ref-based approach leads to outcomes which are more likely to align with user expectations, at least regarding directory moves. It would be nice if moving a directory didn't invalidate all child handles, for example. Meanwhile, there are some instances where a path-based approach arguably makes more sense. What happens to a FileSystemHandle when its underlying file is removed?

​​// Create a file.
const file_entry = root.getFileHandle('file.txt', { create: true });

// Remove the file.
await file_entry.remove();

// PATH-BASED: We can still write to a removed file.
// REF-BASED:  ????
await file_entry.createWritable();

// PATH-BASED: The isSameEntry() promise resolves to `true`.
// REF-BASED:  The isSameEntry() promise resolves to `false`, presumably?
const other_file = root.getFileHandle('file.txt', { create: true });
await other_file.isSameEntry(file_entry);

That being said, it seems reasonable to specify that a handle can still be written to if it's removed via this API, while handles removed by other means (such as by clearing site data or via an OS file manager) could be invalidated.

In summary...

One option is to never specify features that expose this implementation difference, such as directory moves. Unfortunately this is a pretty fundamental difference which I suspect will be hard to paper over as the API continues to evolve. To me, this is a very unsatisfying option. Consider a web IDE which just wants to rename src/foo/ to src/bar/, but is forced to recursively copy all contained files.

Another theoretical option is to accept that there will be cross-browser differences. Sure, moving a directory will invalidate all child handles, but you can re-acquire all the handles within the new directory. However, going down this path is bound to expose many more subtle cross-browser differences. For example, Chromium locks all ancestor directories when a file has an open writable to ensure its path does not change while it is being written to. Having an open writable will block moving a parent directory, which would be a confusing restriction in a reference-based design. This option would be bad for developers and the impacted users, but just listing it here for completeness.

The other option is to specify a requirement that a FileSystemHande is a reference (in the same way I had attempted to specify it’s a path here). That framework could be used to specify new methods such as move(), remove() and getUniqueId() in a way that’s consistent across browsers. This is our preferred option, but supporting this in Chromium would require a pretty substantial re-architecture that I'm hesitant to commit to without clear indication from other browsers and the developer community that handles should be based on references rather than paths...

@szewai does WebKit have an opinion here?

@a-sully a-sully added the clarification Standard could be clearer label Oct 10, 2022
@jesup
Copy link
Contributor

jesup commented Oct 13, 2022

For the case of createWritable() on a FileSystemFileHandle that's been removed, Firefox currently fails the createWritable with NotFoundError, which seems reasonable (we created a FileBlob with a path that doesn't lead to a file -> NotFoundError).

@jesup
Copy link
Contributor

jesup commented Oct 13, 2022

Open writables for us only lock the file, by the way. And with stable identifiers, move() of a file or directory doesn't actually cause a filesystem move, just a DB update.

@jesup
Copy link
Contributor

jesup commented Oct 17, 2022

Note also that resolve() (or whatever we rename it to) is also impacted, of course. With a ref-based approach, resolve() would return the new path; with a pure path-based approach, it would return the original path (which may no longer refer to a file). With an invalidation-based approach, which the currently-written spec seems to take, it should probably throw.

Invalidation would make all file/directoryhandles referring to or under a moved item become invalid, and would throw on most/all attempts to use them (but what about SyncAccessHandles? Thus your lock-on-parent-dirs approach I presume).

Pure-path based would be visible in that given a handles to a, a/b, c and c/b, a->move(d) would make the make a/b invalid, but a subsequent c->move(a) would make the a/b handle valid again (and make c/b invalid). Invalidation would have that as never being valid again. With references, a->move(d) would make a/b into d/b.

@jesup
Copy link
Contributor

jesup commented Oct 17, 2022

I prefer reference-based. It also simplifies thinking about SyncAccessHandles, since in a reference world, it's ok if an ancestor changes name, so there's no need to lock directories above it (and this would be the first idea of a directory lock here, which would be surprising to users).

@szewai
Copy link

szewai commented Oct 21, 2022

It seems in normal file system API, there will be two types of objects (classes): path and handle (entry, file descriptor), where path is stateless and handle is stateful. Path can be used for creating/deleting/renaming a file, and handle can be used to modify file content (e.g. c++ FileSystem library).

I think the original design is FileSystemHandle is path and AccessHandle is handle (TPAC 2021 notes). That means FileSystemHandle would not hold a reference to the underlying file but AccessHandle would, i.e. AccessHandle might be usable after file renaming, but FileSystemHandle is not (as its path is unchanged).

However the existing spec seems to indicate FileSystemHandle is handle, e.g. underlying file needs to be existent for creating a FileSystemHandle. I guess that's why we wonder if FileSystemHandle should be reference-based. I think another option to update existing interfaces and make FileSystemHandle actually path (e.g. adding a create function instead of using options in getFileHandle), which is resolved to an object when needed.

@jesup
Copy link
Contributor

jesup commented Oct 25, 2022

If FileSystemHandle is a path, then the option to create the file/dir is odd and doesn't make sense; in that case the option to create should be on createSyncAccessHandle/createWritable.
If it's a reference, then having the create option on it (as we do now) makes sense.

@jesup
Copy link
Contributor

jesup commented Jan 23, 2023

Resolving this issue blocks other work on the spec (and code/wpt's). Is there anyone else that needs to weigh in on this? Any other comments to resolve?

@szewai
Copy link

szewai commented Jan 25, 2023

I agree that this is the issue that we need to figure out first. Based on the discussion in #34 and other issues, I think we probably just want to mimic POSIX behavior and API.
That means FileSystemHandle is probably a path (that can be used to create, move, remove files and get file attribute), and FileSystemSyncAccessHandle and FileSystemWritableFileStream is probably fd (reference, that can be used to read/write file content).
We will need some changes on existing interfaces like @jesup mentioned: moving the create option to createSyncAccessHandle/createWritable (these functions might mimic open() in POSIX).

@a-sully
Copy link
Collaborator Author

a-sully commented Jan 25, 2023

TLDR I'm supportive of the approach @szewai laid out and (barring any major concerns from @jesup) we can file other issues to bikeshed the specifics. Making a decision here would go a long way towards unblocking a number of other issues

I agree that this is the issue that we need to figure out first. Based on the discussion in #34 and other issues, I think we probably just want to mimic POSIX behavior and API. That means FileSystemHandle is probably a path (that can be used to create, move, remove files and get file attribute), and FileSystemSyncAccessHandle and FileSystemWritableFileStream is probably fd (reference, that can be used to read/write file content).

+1

We looked quite hard into re-architecting around references, but outside of the OPFS this raises more questions than it solves and leads to behavioral inconsistencies that, frankly, I think would leave developers frustrated (e.g. say site A and B hold a handle to the same file. What happens to each site’s file handle if the file is moved… by site A? by another application on the machine? while the browser isn’t running? while a copy of the handle is stored in IndexedDB? if the site is registered for file change notifications WICG/file-system-access#72?). I know this spec isn't focused on the non-OPFS portion of the API, but a reference-based approach would certainly increase the effort needed to support even a subset of it (e.g. mozilla/standards-positions#738).

And at the end of the day, it would be quite nice to be able to tell developers "a FileSystemHandle maps to a path” without all the "gotcha"s

We will need some changes on existing interfaces like @jesup mentioned: moving the create option to createSyncAccessHandle/createWritable (these functions might mimic open() in POSIX).

Just to clarify, are you suggesting removing the create option from getFileHandle() and getDirectoryHandle(), or adding it to createSyncAccessHandle() and createWritable()?

Following the POSIX model, these are touch and mkdir, respectively. Removing the option from getDirectoryHandle() would remove the ability to create new directories altogether. Presumably we don't want this... (and deprecating this method would be a significant breaking change)

Adding the create option to createSyncAccessHandle() and createWritable() (in the spirit of matching POSIX open) seems reasonable. A couple follow-up questions that come to mind (which can probably be bikeshedded on another issue):

  • What should be the default behavior if create is not specified?
    • Currently (at least in Chromium), createSyncAccessHandle() will fail if the path doesn't exist, while createWritable() will not. It would be nice if this was consistent - probably by failing in both cases
  • Do we need some sort of "create empty self from handle" for a directory, as well?
    • If so, would it be worthwhile to have a dedicated create() API?
    • For files, this can otherwise be achieved by calling createWritable() and writing an empty file
    • That being said, re-creation of directories isn't as straightforward. What happens if someone tries to re-create an OPFS root Add FileSystemHandle.remove method #9 (comment)? There's an argument that you should only be able to re-create a directory if you have access to the parent, which makes Accessing the parent of FileSystemFileHandle #38 quite appealing

@szewai
Copy link

szewai commented Jan 26, 2023

Just to clarify, are you suggesting removing the create option from getFileHandle() and getDirectoryHandle(), or adding it to createSyncAccessHandle() and createWritable()?

I was thinking about adding it to createSyncAccessHandle() and createWritable(), and removing it from getFileHandle() and getDirectoryHandle().

Following the POSIX model, these are touch and mkdir, respectively. Removing the option from getDirectoryHandle() would remove the ability to create new directories altogether. Presumably we don't want this... (and deprecating this method would be a significant breaking change)

One issue about having create option on getDirectoryHandle() is we need to have parent handle to create this file/directory, while touch and mkdir can create file/directory directly from a path. So it seems we should add touch()/create() function on FileSystemHandle instead. This might be helpful if the handle is restored from IndexedDB.

What should be the default behavior if create is not specified?
Currently (at least in Chromium), createSyncAccessHandle() will fail if the path doesn't exist, while createWritable() will not. It would be nice if this was consistent - probably by failing in both cases

It could be similar to open() without O_CREAT flag.

Do we need some sort of "create empty self from handle" for a directory, as well?

Yes, this is my first thought.

That being said, re-creation of directories isn't as straightforward. What happens if someone tries to re-create an OPFS root #9 (comment)? There's an argument that you should only be able to re-create a directory if you have access to the parent, which makes #38 quite appealing

I am not sure I understand this. If we allow OPFS root directory to be removed, it seems fine to allow it to be recreated?

@a-sully
Copy link
Collaborator Author

a-sully commented Jan 26, 2023

Following the POSIX model, these are touch and mkdir, respectively. Removing the option from getDirectoryHandle() would remove the ability to create new directories altogether. Presumably we don't want this... (and deprecating this method would be a significant breaking change)

One issue about having create option on getDirectoryHandle() is we need to have parent handle to create this file/directory, while touch and mkdir can create file/directory directly from a path.

I think this is desirable behavior? touch() doesn't allow you to create an arbitrary path - the parent must exist for the operation to succeed. It's also the default behavior for mkdir() (the -p flag will create intermediate directories, if necessary).

I can get behind adding a create() method (perhaps optionally allowing the creation of intermediate directories), but I don't see the point of removing the option from getFileHandle() and getDirectoryHandle(). This would be a significant hit to backwards-compatibility for not much gain, AFAICT... I'd strongly prefer to not cause that kind of breakage if we don't have to.

What should be the default behavior if create is not specified?
Currently (at least in Chromium), createSyncAccessHandle() will fail if the path doesn't exist, while createWritable() will not. It would be nice if this was consistent - probably by failing in both cases

It could be similar to open() without O_CREAT flag.

SGTM 👍

That being said, re-creation of directories isn't as straightforward. What happens if someone tries to re-create an OPFS root #9 (comment)? There's an argument that you should only be able to re-create a directory if you have access to the parent, which makes #38 quite appealing

I am not sure I understand this. If we allow OPFS root directory to be removed, it seems fine to allow it to be recreated?

I agree that it seems reasonable to allow re-creating an OPFS root - I just want to make we behave consistently in this case:

let root = await navigator.storage.getDirectory();
await root.remove();
let newRoot = await navigator.storage.getDirectory();
await root.create();  // This is a no-op
assert(await newRoot.isSameEntry(root));

@a-sully
Copy link
Collaborator Author

a-sully commented Feb 24, 2023

I just put up #96, which sketches out what specifying a path-based approach could look like. It's still very much a WIP, but gives an example (with FileSystemFileHandle.getFile()) of what the API's methods could look like if a handle is explicitly associated with a path, which is then mapped to an entry.

Please feel free to leave feedback on that PR. Thanks!

annevk pushed a commit that referenced this issue Mar 22, 2023
A much more thorough (than #30) attempt at expressing that FileSystemHandle maps to a file path. At a high level, it proposes that:

* A FileSystemHandle has an associated "file system locator"
* A "file system locator" contains, among other things, a "file system path"
* A "file system entry" can be located given a "file system locator"
* You can get a "file system locator" from a "file system entry"

See #59 for the implications of this in practice.

Fixes #59.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification Standard could be clearer
Development

Successfully merging a pull request may close this issue.

3 participants