-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Give FileSystemHandles an associated path #96
Conversation
I expect the build to fail for now, since |
@@ -193,9 +204,18 @@ interface FileSystemHandle { | |||
}; | |||
</xmp> | |||
|
|||
A {{FileSystemHandle}} object represents a [=/file system entry=]. Each {{FileSystemHandle}} object is associated | |||
with an <dfn for=FileSystemHandle export>entry</dfn> (a [=/file system entry=]). Multiple separate objects implementing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Un-defining [=FileSystemHandle/entry=]
breaks most of this spec (and https://github.com/WICG/file-system-access), and a FileSystemHandle does conceptually have an associated entry... even if getting that entry requires looking it up using path.
Is it possible to have an "associated" member (i.e. [=FileSystemHandle/entry=]
) that's retrieved using an algorithm?
It would be something like:
Each {{FileSystemHandle}} object is associated with a
<dfn for=FileSystemHandle export>path</dfn> (a [=/file system path=])
which maps to an associated <dfn for=FileSystemHandle export>entry</dfn> (a [=/file system entry=])
using an [=implementation-defined=] mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't part of the problem that it doesn't always map to an entry? I think you'd have to change all call sites anyway to take that into account and given that it seems better to have a more explicit algorithm that takes you from one to the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's what I was thinking. It seems to me like the options are:
- a path can always map to an entry, which may "exist" or not (as currently sketched out in the
getFile()
method) - the locate method returns either a valid entry, or nothing. And then perhaps we can add a
[=FileSystemHandle/exists=]
algorithm for simplicity. The latter seems more intuitive to me. I can test this out in the next patch...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the locate method returns either a valid entry, or nothing
This is sketched out in the latest patch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment above on tying access checks to paths, the current chrome implementation (at least last I looked at it, which admittedly is a while ago, so maybe that has been changed already?) does not guarantee that two handles representing the same path behave the same in all aspects (since for handles acquired via DirectoryHandle.getFileHandle/getDirectoryHandle the access checks are done with the path of the original directory, rather than the path of the current handle).
I'm not saying that shouldn't be changed, but it's certainly something to keep in mind. Perhaps tying access checks to the FileSystemHandle rather than the entry would be a way around that, if we want to keep the same behavior? Although you'd have to figure out how to serialize that properly as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my confusion, but why would we need to serialize access checks if we were to move them from the "entry" to the FileSystemHandle?
index.bs
Outdated
@@ -59,6 +60,8 @@ different storage mechanism with a different API for such files. The entry point | |||
|
|||
A <dfn export id="entry">file system entry</dfn> is either a [=file entry=] or a [=directory entry=]. | |||
|
|||
<!-- TODO: Consider tying access checks to a path, to support handles which don't map to an entry --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, at least originally we intentionally did not tie access checks to paths, to provide more flexibility for implementations (and I don't think even today our implementation actually ties access checks to path either?). One case where this makes a difference is:
- get a read-only directory handle via a directory picker
- get a writable file handle, that happens to be in the above directory
- now get a file handle to the same file by calling getFileHandle on the directory handle
I'm pretty sure that in at least our implementation the latter file handle will still be read-only. (i.e. while we sort of tie access checks to paths, they're not tied to the path of the entry a handle refers to, rather they are tied to the path of the file or directory that a user at some point picked).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct that, at least in Chrome's current implementation, the file handle you get from calling getFileHandle()
on the directory handle is read-only, though it's debatable whether that's a feature or a bug. I can see this leading to confusion on the part of the website author and user. Regardless, that's a Chrome implementation detail...
The reason I mention this is because if we assert that an entry must exist on disk, then we can't use it to lock/check access to entries that don't exist, such as when locking the target file when renaming a file.
... thinking about this a bit more, perhaps the move()
case isn't actually an issue if we specify that all file system operations to the same queue (#95), since that would guarantee (at least within the OPFS) that the target file isn't created and locked while the move()
operation is in progress
Not sure what to do about access checks, though, if we assert that an "entry" exists on disk
@@ -193,9 +204,18 @@ interface FileSystemHandle { | |||
}; | |||
</xmp> | |||
|
|||
A {{FileSystemHandle}} object represents a [=/file system entry=]. Each {{FileSystemHandle}} object is associated | |||
with an <dfn for=FileSystemHandle export>entry</dfn> (a [=/file system entry=]). Multiple separate objects implementing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment above on tying access checks to paths, the current chrome implementation (at least last I looked at it, which admittedly is a while ago, so maybe that has been changed already?) does not guarantee that two handles representing the same path behave the same in all aspects (since for handles acquired via DirectoryHandle.getFileHandle/getDirectoryHandle the access checks are done with the path of the original directory, rather than the path of the current handle).
I'm not saying that shouldn't be changed, but it's certainly something to keep in mind. Perhaps tying access checks to the FileSystemHandle rather than the entry would be a way around that, if we want to keep the same behavior? Although you'd have to figure out how to serialize that properly as well.
@annevk do you have any idea why the build is failing? Does it not like "getting the entry"? |
I suspect it's failing because of "the the". Search for "typos" in https://resources.whatwg.org/build/deploy.sh. |
Ah, good catch. Thanks! |
index.bs
Outdated
exists at the path `data/drafts/example.txt` relative to the root directory of | ||
the [=origin private file system=], | ||
|locator|'s [=file system locator/kind=] has to be {{FileSystemHandleKind/"file"}}, | ||
|locator|'s [=file system locator/path=] might be « "`data`", "`drafts`", "`example.txt`" », and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say "might" here because it's still "implementation defined" how a "locator" maps to an "entry", and most of the methods (with isSameEntry()
and resolve()
being the exceptions) require the "entry" to exist to do anything useful. For example, take the getFileHandle(name)
method. It's basically:
- Given a "locator", locate
entry
(a dir entry) - Given
entry
, find thechild
whose name isname
- Create a FileSystemFileHandle given
child
, which involves "getting the locator" ofchild
So nowhere in this flow do we ever specify appending name
to this
's locator's path to get child
's locator's path
Relatedly, creating the root of an OPFS doesn't say anything about its locator's path
If we want to assert that the path above "has to be" /data/drafts/example.txt
(in list form) then we could:
- ensure that handles acquired from various API surfaces are given a defined path
- set the path of the root of an OPFS to a list containing one empty string
- outside of the OPFS (e.g. for a file picked from a file picker), let the path be "implementation defined"
- create handles from existing handles by manipulating the existing handle's locator's path
- add new versions of the "create a new FileSystem{File|Directory}Handle" algorithms that append to the path of the existing handle
I'm happy to make this change (either here or in a follow-up), but this is where getFileHandle()
and getDirectoryHandle()
get thorny, as @szewai pointed out #59 (comment)). Arguably, these methods could ignore the state on disk and just always return a handle whose locator has had its path appended to, without bothering to check what exists on disk....
... but this would be a breaking change, and a pretty substantial one where { create: true }
is currently passed
For { create: false }
with dir
that exists at path "/testing":
await dir.remove();
// Before: throws a NotFoundError
// After: succeeds - handle's path is "/testing/example.txt"
let handle = await dir.getFileHandle('example.txt', { create: false });
This is a web-exposed change, but arguably not that bad, especially if we add something like #80. Something that threw an exception before would longer throw.
For { create: true }
with dir
that exists at path "/testing" if we deprecate the "create" option:
// Before: creates a new file at "/testing/example.txt"
// After: handle's path is "/testing/example.txt" - but no file is created
let handle = await dir.getFileHandle('example.txt', { create: true });
// At this point, any method which cares about what's on disk now fails
// Before: creates a SyncAccessHandle
// After: throws a NotFoundError
let accessHandle = await handle.createSyncAccessHandle();
// Before: gets the file's contents
// After: throws a NotFoundError
let file = await handle.getFile();
I worry that deprecating this option would break a lot of existing code. Something that works now would consistently throw
My tentative thoughts are that we should just leave this as "might" in this PR, then make the changes above to turn this into a "has to be", then look into making getFileHandle()
and getDirectoryHandle()
(perhaps by adding an additional flag? We can hash that out later :) )
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a step back here to ensure I understand.
getFileHandle()
is exposed onFileSystemDirectoryHandle
.- A
FileSystemDirectoryHandle
is represented by a locator and as such when a method is invoked upon it that might fail if the locator no longer has a corresponding entry.
Why could we not append the given name to the locator's path and then see if an entry exists there? I think we can keep the existing behavior here, even if it's not necessarily the most natural given the "new" underpinnings.
I think for the OPFS root the path would indeed have to be the empty list. I'm not sure about making it contain an empty string. That only seems sensible if we did directories that way and you decided we wouldn't do that.
For file pickers you'd want the root to be a unique identifier and the path to be a list containing the file name, I think. At least by default <input type=file>
shouldn't start exposing more about the user's machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a step back here to ensure I understand.
getFileHandle()
is exposed onFileSystemDirectoryHandle
.- A
FileSystemDirectoryHandle
is represented by a locator and as such when a method is invoked upon it that might fail if the locator no longer has a corresponding entry.
Correct 👍
Why could we not append the given name to the locator's path and then see if an entry exists there? I think we can keep the existing behavior here, even if it's not necessarily the most natural given the "new" underpinnings.
That's basically what I was proposing on the other thread, so yeah that SGTM. I just wanted to explain what I understood of the other perspectives that were expressed (@szewai please correct me if I've misunderstood anything!)
The only nitpick here is that if { create: true }
currently fails rather than recursively create an entry. So we'd still need to check this
's locator first before appending the given name, or change the behavior
I think for the OPFS root the path would indeed have to be the empty list. I'm not sure about making it contain an empty string. That only seems sensible if we did directories that way and you decided we wouldn't do that.
Yeah I agree that having a list with an empty string isn't great, but the empty string is the name of the directory. We'd have to carve out an exception for OPFS roots in this (which isn't that big of a deal):
If [=locating an entry=] given [=/file system locator=] |locator| returns a
[=/file system entry=] |entry|, then |entry|'s [=file system entry/name=] must
be the same [=string=] as the last [=list/item=] of
|locator|'s [=file system locator/path=].
For file pickers you'd want the root to be a unique identifier and the path to be a list containing the file name, I think. At least by default
<input type=file>
shouldn't start exposing more about the user's machine.
I agree that this would be ideal, but consider the following sequence:
- show a file picker and select
data/drafts/example.txt
- show a directory picker and select
data
- resolve
data/drafts/example.txt
relative todata
(this should be[ 'drafts', 'example.txt' ]
)
This won't resolve correctly if the "file system path" of the first file is just [ 'example.txt' ]
We could specify that the last item of the "file system path" should be the entry's name, but I think the rest of the path probably needs to be implementation-defined? And since the "file system path" isn't expected to be exposed to the end user, I don't think it should really matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only nitpick here is that if { create: true } currently fails rather than recursively create an entry. So we'd still need to check this's locator first before appending the given name, or change the behavior
Can you elaborate on this, I'm not sure I follow.
but consider the following sequence:
Well, you could resolve with null in this case. We don't have to tell the website more than the user told them. If you want to make it work you'd have to do away with the concept of a root I think (that would have to be part of the path), but I'm not necessarily convinced it needs to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only nitpick here is that if { create: true } currently fails rather than recursively create an entry. So we'd still need to check this's locator first before appending the given name, or change the behavior
Can you elaborate on this, I'm not sure I follow.
// If dirHandle exists, success.txt is created within its directory
let handle = await dirHandle.getFileHandle('success.txt', { create: true });
await dirHandle.remove({ recursive: true });
// `dirHandle` no longer exists, so fail.txt is not created
let other = await dirHandle.getFileHandle('fail.txt', { create: true });
In this scenario, whether FileSystemDirectoryHandle.getFileHandle()
succeeds is dependent on whether this
exists. So in the algorithm for the method, we can't just append the given name to the locator's path and then see if an entry exists there. We have to check this
's locator first (or support recursive creation)
but consider the following sequence:
Well, you could resolve with null in this case. We don't have to tell the website more than the user told them. If you want to make it work you'd have to do away with the concept of a root I think (that would have to be part of the path), but I'm not necessarily convinced it needs to work.
In this scenario, I don't see much benefit to not allowing the site to know that there's a relation chain between these handles? If the site has access to a parent directory, it has the means to find this out anyways. And this applies to more than just the resolve()
method. The isSameEntry()
method also just checks the locator (it can't check the entry, since the entry may not exist) and needs to return true in this case, for example
let handleFromDirPicker = await (await dataHandle.getDirectoryHandle('drafts')).getFileHandle('example.txt');`
// ???
await handleFromFilePicker.isSameEntry(handleFromDirPicker);
The more generic version of this question is whether the site should be aware if the same file is selected on separate file picker invocations. For example, if another file picker is shown and data/drafts/example.txt
is once again selected. It's critical for our partners that isSameEntry()
returns true in this case:
await handleFromFilePicker.isSameEntry(sameHandlePickedAgain);
Taking a step back... it seems like a bad idea to have different locators that point to the same entry anyways?
A nice perk of letting implementations define the "file system path" to contain components that are invisible to the user in the non-OPFS case is that we can operate directly on that path in resolve()
and check for equality of "file system locators" by comparing its fields, rather than needing to resort to hand-wavy algorithms. The "implementation-defined" magic is limited to just the prepended path components for files outside of the OPFS
Yeah I agree that having a list with an empty string isn't great, but the empty string is the name of the directory. We'd have to carve out an exception for OPFS roots
By the way, any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a step back... it seems like a bad idea to have different locators that point to the same entry anyways?
I'm not sure. I'll try to get some clarity on this.
By the way, any thoughts on this?
It seems reasonable. We probably have to clarify somewhere that the name getter can return an invalid file name in this special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to assert that the path above "has to be"
/data/drafts/example.txt
(in list form) then we could:
- ensure that handles acquired from various API surfaces are given a defined path
- set the path of the root of an OPFS to a list containing one empty string
- outside of the OPFS (e.g. for a file picked from a file picker), let the path be "implementation defined"
- create handles from existing handles by manipulating the existing handle's locator's path
- add new versions of the "create a new FileSystem{File|Directory}Handle" algorithms that append to the path of the existing handle
This is all included in the latest patch, aside from creating handles outside of the OPFS (which will have to be done in a follow-up on https://github.com/WICG/file-system-access, for which I think I've exported all the terms I'll need... 🤞)
Also added a warning that the "file system path" shouldn't be exposed directly to the website process - it can only be implied by resolving the handle to a parent, as in this use case:
- show a file picker and select
data/drafts/example.txt
- show a directory picker and select
data
- resolve
data/drafts/example.txt
relative todata
(this should be[ 'drafts', 'example.txt' ]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for tackling this! I will attempt to give this a more detailed read through next week. Perhaps meanwhile @jesup or @asutherland can have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks pretty good to me. I found a couple of remaining nits. Can you please file the follow-ups so we can point to them?
Looking at this again I'm not sure I understand the setup around name.
I pushed a cleanup commit that removes some of the wording around name that seemed to duplicate other requirements. And where it was used incorrectly as a type. I think we can go further, but I'm not entirely sure how to reword the various children iterators. |
This looks good, assuming we agree below on how to handle the OPFS root. Thanks!
Are we sure we need to reword the various children iterators? It seems fine to use entries (as opposed to locators) for file system operations like reading the contents of an entry (i.e. the bytes of a file, or the children of a directory). We need some way to identify the children within a directory. Iterating through a directory doesn't make much sense if the files within it don't have a name... If the intention is that an "entry" represents an OS-level file/directory, then giving each entry a name (and contents, as mentioned above) is probably what we want. Alternatively, we could consider giving each entry a path... but that's not really how most file systems work. You can later get a file's path by walking up the directory chain, but the path isn't inherent to the file itself.
Yeah setting the path to "« the empty string »" is what I'd proposed earlier, so I'm on board with that :) #96 (comment) The entry itself still needs a name though, as per above. And we need |
Okay, that sounds reasonable. I guess that means we should keep the path[last]/name invariant in some form, probably as new bullet points in the lists I created. If you'll indulge me, what happens when you hold an entry named |
SGTM. I'll add this in the next patch
So, as of this PR, JS (i.e. a This isn't a huge deal, especially within the OPFS, since we don't expect users to be |
Fixes #59
A much more thorough (than #30) attempt at expressing that FileSystemHandles map to file paths. At a high level, it proposes that:
See #59 for the implications of this in practice.
(See WHATWG Working Mode: Changes for more details.)
Preview | Diff