How does this proposal relate to other filesystem web APIs? #4

Open
othermaciej opened this issue Oct 14, 2020 · 46 comments
@othermaciej

How does this API relate to File System Access API, File API and File and Directory Entries API? Those three technologies seem to relate to and integrate with each other to various extents, but this Explainer does not appear to be integrated with them at all.

It would be regrettable if the web platform ended up with multiple disjoint ways of accessing the filesystem, where FileHandle and FileSystemHandle are totally unrelated objects.

@guest271314

Those three technologies seem to relate to and integrate with each other to various extents, but this Explainer does not appear to be integrated with them at all.

File System Access API (Native File System) provides a means to read and write files on the local filesystem, from any origin. That specification also describes a method to write only to a sandboxed "origin" https://wicg.github.io/file-system-access/#sandboxed-filesystem. I have used that method locally for testing, though sparingly compared to using the File System Access API in conjunction with inotify-tools, outside the scope of the specification yet within its capabilities, to execute arbitrary shell scripts and run native applications https://github.com/guest271314/requestNativeScripts.

File API is used by HTML <input type="file">, among other APIs, including FormData, Drag and Drop, Clipboard API, et al.

File and Directory Entries API is suited for iterating directories and multiple files uploaded at <input type="file">, with and without the webkitdirectory, allowdirs and multiple attributes set https://stackoverflow.com/q/39664662. It is not meant for writing files, though technically a FileList could be programmatically created and set with arbitrary File objects using DataTransfer https://stackoverflow.com/questions/47119426/how-to-set-file-objects-and-length-property-at-filelist-object-where-the-files-a.

The FileSystem API is also still in existence https://stackoverflow.com/questions/37502091/how-to-use-webkitrequestfilesystem-at-file-protocol and in use in code at GitHub repositories.

The plain language of this explainer isolates access to "origin", "sandboxed" in some form of "storage". That model resembles FileSystem API.

Note: While user agents will typically implement this by persisting the contents of this origin private file system to disk, it is not intended that the contents are easily user accessible.

The technologies relate to the extent that File and Blob objects are exposed to the user for data storage in memory and on disk, not necessarily in the original intent of every API that happens to use Blob, File or FileList in its algorithms.

An API's use of "file", or formally File or Blob, does not mean that API has the same intent or purpose as File API, File System Access API, or File and Directory Entries API.

This explainer begins with the premise that some users expect a specification to be developed and implemented to isolate their data in a "sandbox", "not intended that the contents are easily user accessible", which is a reasonable use case and the clear intent of this explainer.

Ultimately the File or Blob, or other data storage technique employed, whether "sandboxed" or not, is written to the user's memory and/or hard disk https://stackoverflow.com/a/56419176.

Note, though not specified, it is also possible to read and write files stored at "sandboxed" origin (browser configuration folder) at command line https://stackoverflow.com/questions/36098129/how-to-write-in-file-user-directory-using-javascript/36098618#36098618.

File System Access API begins with the premise that the user themselves will provide permission to read and write directly to their own filesystem, un-"sandboxed". That is the distinction.

@guest271314

Note: read and write currently take a SharedArrayBuffers as a parameter. This is done to highlight the fact that it might be possible to observe changes to the buffer as the browser processes it. The implications of this and the possibility of using simpler ArrayBuffers are being discussed.

is interesting.

On a 32-bit system I found through testing that Chromium limits how far a SharedArrayBuffer from WebAssembly.Memory.grow() can actually grow https://bugs.chromium.org/p/v8/issues/detail?id=7881.
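To make the two observations above concrete, here is a minimal sketch, assuming a runtime that supports shared WebAssembly memory. The page counts are arbitrary illustration values, not the limits from the linked bug, and `tryGrow` is a hypothetical helper name.

```javascript
// Shared WebAssembly memory is backed by a SharedArrayBuffer.
// `initial` and `maximum` are counted in 64 KiB pages; values are illustrative.
const PAGE_SIZE = 64 * 1024;
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });

const before = memory.buffer;         // SharedArrayBuffer covering 1 page
const previousPages = memory.grow(1); // returns the size before growing, in pages
const after = memory.buffer;          // re-read the buffer to see the new length

// Two views over the same shared buffer observe each other's writes,
// which is the "changes can be observed" property the explainer note refers to.
const viewA = new Uint8Array(after);
const viewB = new Uint8Array(after);
viewA[0] = 42;

// grow() throws a RangeError once `maximum` (or an engine limit) is exceeded.
function tryGrow(mem, pages) {
  try {
    mem.grow(pages);
    return true;
  } catch (e) {
    return false;
  }
}
```

The only point of `tryGrow` is that a failed grow surfaces as an exception, so code relying on large shared buffers has to handle that case.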

File System Access API does not currently support reading a file while the file is being written as a stream https://bugs.chromium.org/p/chromium/issues/detail?id=1084880 without reading the entire file and checking size https://github.com/guest271314/captureSystemAudio#stream-file-being-written-at-local-filesystem-to-mediasource-capture-as-mediastream-record-with-mediarecorder-in-real-time, where ideally, we should be able to do

        start.onclick = async e => {
          class AudioWorkletProcessor {}
          class AudioWorkletNativeFileStream extends AudioWorkletProcessor {
            constructor(options) {
              super(options);
              this.byteSize = 512 * 344 * 60 * 50;
              this.memory = new Uint8Array(this.byteSize);
              Object.assign(this, options.processorOptions);
              this.port.onmessage = this.appendBuffers.bind(this);
            }
         // ..
         try {
            start.disabled = true;
            controller = new AbortController();
            signal = controller.signal;
            const { body: readable } = await fetch(
              'http://localhost:8000?start=true',
              {
                cache: 'no-store',
                signal,
              }
            );
            aw.port.postMessage(readable, [readable]);
          } catch (e) {
            console.warn(e);
          } finally {
             
          }
        };
        stop.onclick = e => {
          controller.abort();
          start.disabled = false;
        };

<?php 
  if (isset($_GET["start"])) {
    header("Access-Control-Allow-Origin: *");
    header("Content-Type: application/octet-stream");
    // passthru() writes the command's raw output straight to the response, so no echo is needed
    passthru("parec -v --raw -d alsa_output.pci-0000_00_1b.0.analog-stereo.monitor");
    exit();
  }

on a File being written to, substituting something like FileSystemFileHandle.readable for fetch(). Use of SharedArrayBuffer is also a distinction between this explainer and the APIs listed in the OP, for users that are expecting a "sandbox" storage specification and implementation.
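The consuming side of that idea can be sketched independently of where the stream comes from. Note that FileSystemFileHandle.readable is the wished-for property from the comment above, not an existing API; the function below only assumes a standard ReadableStream (for example a fetch() response body), and `consumeStream` and `onChunk` are hypothetical names.

```javascript
// Consume any ReadableStream of byte chunks incrementally, without first
// buffering the whole source. Returns the total number of bytes seen.
async function consumeStream(readable, onChunk) {
  const reader = readable.getReader();
  let total = 0;
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      total += value.byteLength;
      onChunk(value); // hand each chunk to the caller as soon as it arrives
    }
  } finally {
    reader.releaseLock();
  }
  return total;
}
```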

These are just observations.

I found this repository while experimenting with means of communication between localhost and any origin, presently with QuicTransport, to determine if there is any simpler way to run shell scripts and native applications from JavaScript in the browser and get the results as a file than the ways I have already achieved, namely using File System Access API and inotify-tools. The origin isolation mandate appears to exclude this API from that capability set, short of investing time into isolating where in the user data directory Chromium stores the data.

@othermaciej
Author

This explainer begins with the premise that some users expect a specification to be developed and implemented to isolate their data, in a "sandbox", "not intended that the contents are easily user accessible" which is a reasonable use case, and clear intent of this explainer.
...
File System Access API begins with the premise that the user themselves will provide permission to read and write directly to their own filesystem, un-"sandboxed". That is the distinction.

I believe this claim is incorrect. File System Access API offers access to an origin private filesystem in addition to providing ways for the user to grant access to portions of the native filesystem. Does this API provide access to the same virtual per-origin filesystem or a different one? The explainer is silent on this, but from the Chrome implementation bug it seems like it uses the same underlying storage.
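For reference, the origin private filesystem mentioned here is reached through navigator.storage.getDirectory(). The following is a minimal sketch, assuming a browser that implements that entry point and createWritable() on the returned handles. The storage object is passed in as a parameter only so the function can be exercised outside a browser, and `writeTextToOriginPrivateFile` is a hypothetical helper name.

```javascript
// Write a text file into the origin private file system.
// `storage` is expected to behave like navigator.storage.
async function writeTextToOriginPrivateFile(storage, name, text) {
  const root = await storage.getDirectory(); // sandboxed, per-origin root directory
  const handle = await root.getFileHandle(name, { create: true });
  const writable = await handle.createWritable();
  await writable.write(text);
  await writable.close(); // the contents become visible once the stream is closed
  return handle;
}
```

In a page this would be called as `await writeTextToOriginPrivateFile(navigator.storage, 'notes.txt', 'hello')`. The handle returned is the same FileSystemFileHandle type that the user-facing pickers produce, which is the integration point under discussion.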

Even if the claim was correct, it would still be wrong to have different and totally incompatible APIs for real files and sandboxed virtual files. That's imposing needless complexity on developers using the web platform.

@guest271314

@othermaciej FWIW I tend to agree with your general analysis. This could be incorporated into File System Access API, even the interesting experimental usage of SharedArrayBuffer. Presently I have no use case to test the API. I can just write files directly to the filesystem, without concerning myself with "origin" or "isolation" or searching through ~/.config/chromium to determine where the data is being stored. Chromium has already shipped the API. That does not mean other browsers need to follow just because Chromium did so.

I would much prefer a single File System Access API in Firefox over Firefox taking time to decide what NativeIO is and to compare it with File System Access API, particularly given that Mozilla recently announced changes in their operational structure. File System Access API, in general, provides coverage for both sandboxed and un-sandboxed file system reads and writes.

@othermaciej
Author

I agree that it would be better to fold in any new capabilities here to File System Access API. If there's a need for this proposal to be worked on separately before it can be merged, then monkey patching File System Access API would be a better temporary measure.

@guest271314

These experiments https://github.com/fivedots/nativeio-porting-tutorial, https://github.com/fivedots/emfs when expanded for given use cases can be useful towards developing the capability to establish persistent watching of a file or directory for events (read, write, modify) https://bugs.chromium.org/p/chromium/issues/detail?id=1019297 in JavaScript and to read a FileSystemFileHandle as the file is being written to by a non-web application, without the need to read the entire file multiple times, similar to or the same as I am able to stream data to the browser using fetch() and php passthru() (currently working on a QuicTransport() version). What is possible to achieve at proof-of-concepts relevant to complete control over the file system read, write, stream, events and notifications, et al. procedures at the above repositories should be possible to implement within the scope of not only private origin filesystem, also at native file system via File System Access API.

Monkey patches can lead to unintended consequences that remain in spite of an initial temporary intent. For example, there is still code in the wild that uses URL.createObjectURL(mediaStream) and otherwise works as intended kazuki/video-codec.js#12. File was temporarily transferable, then no longer transferable w3c/FileAPI#32. APIs should instead be conceived of as having the capacity to expand beyond initial use cases, without necessitating an entirely different specification that is later merged in; an open-mindedly conceived specification is well-equipped to merge additional sections of content into the overall document. I have been banned from several organizations that I attempted to contribute to or join, so take my input at face value; I just compose and test code in the field to the point it breaks.

@guest271314

@othermaciej

I agree that it would be better to fold in any new capabilities here to File System Access API.

What do the principal parties think about that idea?

@fivedots
Collaborator

I'd like to thank both of you for the feedback and discussion, it's been very helpful!

We are currently talking with the owners of the File System Access API to explore the possibilities. I'll update this issue soon with the conclusions that come out of that.

@othermaciej
Author

Thanks for the consideration, @fivedots . I'm pretty confident this strategy can work. If there's tricky design problems with it I'd be happy to help brainstorm solutions. The web will be much better off if we have just one API that offers virtual sandboxed filesystem access.

@fivedots
Collaborator

Hello, I wanted to give you an update. We are continuing our discussions with the Chrome storage team. Our main fear is that by merging with the File System Access API, we will compromise the goals of either the Origin Private File System or NativeIO. In particular, there is a risk that the high level concepts used in File System Access API might bind us to slower performance. We are considering adding a special set of functions to the Origin Private File System, but the risk there is that, by breaking symmetry with File System Access API, we will end up with a less coherent interface and more cognitive load on the developers.

That being said, we understand why simplifying the platform by merging (if we can do it without strongly impacting use cases) would lead to a better result. We are working on benchmarks that should shed some light on the compromises that would have to be made, after we have that data we can figure out which trade-off is the better one. My proposal is that we temporarily pause this discussion, I will ping it again when we have more data.

Also thanks for the offer @othermaciej, it would be great to collaborate. Maybe we can discuss it after the benchmarks are done!

@fivedots
Collaborator

fivedots commented Feb 5, 2021

Hello, I want to give you another short update:

We have made progress on creating a couple of benchmarks that should help inform our decision, the next steps are to make them easily reproducible and publish the results. We are still in quite close contact with the storage team, and the question of how we relate to Filesystem Access API is still at the top of our list.

We've had to pause our efforts on this to shift our focus to meeting a couple of important deadlines that are coming up. Still, I wanted to assure you we remember this issue and that we want to properly resolve it as soon as we get some spare cycles back.

I also wanted to point out that we've asked for an official position from WebKit here.

I'll keep you posted as things develop!

@othermaciej
Author

I'm not sure why benchmark results are relevant. If an implementation of this new API is faster than using the existing Filesystem Access API, then that would not be a reason to create a new wholly separate file API. Instead, it would suggest that either (a) the implementation of Filesystem Access API needs to be optimized; (b) additional API surface should be added to Filesystem Access API, if needed, to enable better efficiency; or (c) both. I do not see how any benchmark result would show that it is a good idea to create a completely disjoint notion of a file handle with its own new API, that cannot even be converted back and forth to the existing kind.

@guest271314

What is missing from File System Access API is

  • Direct, non-temporary, read and write to and from files
  • Capability to read and write only portion of a file, without reading the entire file

Of interest re "benchmarks": I do not know if Storage Foundation API has been approached by Chrome's "security team" to internally scan files written by the user with proprietary "Google Safe Browsing" https://safebrowsing.google.com/, which is not disclosed in the File System Access API specification whatsoever https://wicg.github.io/file-system-access/. That would certainly explain why File System Access API is slower than Storage Foundation API: the file is written twice.

The two list items above are criteria enough alone for an API that performs those two procedures without undue and undisclosed restrictions, as those are serious impairments that will continue to be a problem for users in the field that have to keep filing bugs to, perhaps, eventually, get the truth behind why File System Access API behaves the way that it does now https://bugs.chromium.org/p/chromium/issues/detail?id=1168715#c17.

@RReverser

Capability to read and write only portion of a file, without reading the entire file

You can certainly do both of those operations already. You can read slices from the file, as well as write to a given position.

@guest271314

You can certainly do both of those operations already. You can read slices from the file, as well as write to a given position.

Not without reading the entire file into memory. If you refute that fact kindly post a minimal, complete, verifiable example of that procedure, here.

And evidently proprietary "Google Safe Browsing" algorithms are being baked in to the process https://wicg.github.io/file-system-access/#malware-scans-and-safe-browsing-checks, which means that in the worst case scenario Google is reading and analyzing every byte written and read, and in the best case scenario "virus" and "malware" "protections" will still fail when the "virus" or "malware" is not already known to the algorithms https://security.stackexchange.com/a/202306.

@guest271314

Capability to read and write only portion of a file, without reading the entire file

You can certainly do both of those operations already. You can read slices from the file, as well as write to a given position.

If that were true and correct I would not have to read the entire file here https://github.com/guest271314/captureSystemAudio/blob/master/native_messaging/file_stream/app/captureSystemAudio.js#L159.

@kaizhu256

You can certainly do both of those operations already. You can read slices from the file, as well as write to a given position.

is there a pathway to integrate slicing and appending with wasm-sqlite? i would prefer if that's possible, but would settle for storage-foundation-api if its not.

@guest271314

@kaizhu256 I do not know what wasm-sqlite does. When I tried to read part of a file using File System Access API (Native File System), that was simply not possible right now. In brief see https://bugs.chromium.org/p/chromium/issues/detail?id=1084880. Reiterated in https://bugs.chromium.org/p/chromium/issues/detail?id=1168715, which reveals that Google "security team" is internally pushing their algorithms into the file writing and reading process https://bugs.chromium.org/p/chromium/issues/detail?id=1168715#c17

The problem here is indeed that our security team really wants us to perform safe browsing analysis of written files, before the files are available with their normal file name/extension (it is also for this reason that downloads are written to a temporary file, and only renamed once safe browsing checks pass).

If anything, that should be user opt-in, not compulsory. What happens when I turn off proprietary "Google Safe Browsing" in Chrome settings? The user does not know. Given that temporary files are still written when "Google Safe Browsing" is turned off, and that the reason given for writing temporary files is filtering through those unknown algorithms, either the "Google Safe Browsing" algorithms are still running even though I manually turned off that setting, or, if "Google Safe Browsing" really is turned off for File System Access API when I turn it off for the entire browser, temporary files are being written for no reason, because I have stated that I want no part of "Google Safe Browsing" by turning off the setting.

@ddumont

ddumont commented Feb 20, 2021

I would really like some of the concepts you talk about here to make it into the actual File System API. The shortcomings you mention above are a complete failure of that api and need to be solved, not side-stepped.

We are still in quite close contact with the storage team, and the question of how we relate to Filesystem Access API is still at the top of our list.

@fivedots I am very curious to know through what channels you are in close contact with the storage team. I very much would like to be able to speak with them about some of the issues linked here (one is a bug I opened)

I appreciate the effort involved in this project. I really do hope it pushes the chrome storage team to make their api better, because it's not very useful right now.

@jimmywarting

Reading and writing only a portion of a file is possible, even truncating/setLength.

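// Reading part of the file
const file = await fileHandle.getFile()
const bytes = await file.slice(0, 100).arrayBuffer()

// Writing part of the file (keepExistingData keeps the bytes outside the written range)
const writable = await fileHandle.createWritable({ keepExistingData: true })
await writable.write({ type: 'write', position: offset, data })
await writable.close()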

But the performance penalty of Chrome's atomic copy-modify-replace approach (the price of being more secure) is, for me as well, something that obviously needs to be addressed by bringing back/in some "inPlace" option.
I think they should at least not scan the private sandboxed storage.

@mkruisselbrink

This seems somewhat off-topic for this issue, but handle.getFile() indeed is not supposed to read the entire file; it merely stats the file to figure out the last-modified timestamp. I believe there are some edge cases with particular Chrome OS file system backends where calling getFile() will end up reading the entire "file" because that is the only way to determine, for example, the size of the file, but those edge cases should be pretty rare and shouldn't generally affect things.
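Under that description, a file can be walked in fixed-size ranges. The following is a minimal sketch, assuming only the standard Blob interface that File inherits from. `readInChunks` is a hypothetical helper name, and whether an engine touches bytes outside the requested range remains an implementation detail that this code cannot demonstrate.

```javascript
// Read a File/Blob in fixed-size slices. Each slice() call only describes a
// byte range; the bytes are materialized when arrayBuffer() is awaited.
async function* readInChunks(blob, chunkSize) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const slice = blob.slice(offset, offset + chunkSize);
    yield new Uint8Array(await slice.arrayBuffer());
  }
}
```

With a file handle this would be used as `for await (const chunk of readInChunks(await handle.getFile(), 64 * 1024)) { ... }`.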

@fivedots
Collaborator

I've hidden some comments in order to keep the issue on topic, i.e. Storage Foundation's relationship to other storage APIs and the state of our conversation with the Chrome Storage Team re: similarities with the Origin Private File System. If you'd like to continue the hidden detailed discussion, please open another issue.

@rstz
Contributor

rstz commented Apr 13, 2021

(CC: @annevk , since we got feedback from him regarding this issue)

Hello all,

We've been exploring a few ways to unify Storage Foundation with the File System Access API, by extending the surface of the Origin Private File System. The options we've considered so far can be found here. It would be great to have input on what you consider the best way forward, so please let us know what you think!

@annevk

annevk commented Apr 14, 2021

Thanks @rstz for following up! The stream-based approach looks promising and as @domenic notes an improvement over the status quo of FileSystemFileHandle. I've asked others at Mozilla to take a look as well.

@fivedots
Collaborator

fivedots commented Jun 2, 2021

After looking at the feedback on the options to merge Storage Foundation API and File System Access API (mentioned here), we've written a more concrete proposal. It describes a few extensions that could be made to the Origin Private File System in order to support our use cases.

Feedback is very welcome!

@jimmywarting

jimmywarting commented Jun 3, 2021

Creating a File through getFile() is possible when a lock is in place. The returned File behaves as it currently does in OPFS i.e. it is invalidated if file contents change after it was created. In our particular case this means that Files created while there is an active handle will be invalidated when a flush is executed (either explicitly through flush() or implicitly by the OS). It also means that these Files could be used to observe flushed changes done through the new API, even if a lock is still being held.

It would have been nice if it didn't invalidate the File, so that you could create a File instance plus an object URL and play a video while downloading it at the same time, for streaming compatibility (so that no modification to the file invalidates the File).

var x = await y.createFile('video.mp4')
await x.truncate(videoFileSize)
var file = await x.getFile()
video.src = URL.createObjectURL(file) // Start watching

// then download, write and stream the video at the same time
x.writer(buffer, offset)

Kind of how it works with VLC when you don't have the whole content yet: you can watch the video while it's still downloading.
I faced a similar issue before. I didn't want to use Media Source Extensions (since it's so complicated to support many formats, seeking and other stuff), so I ended up using a service worker + evt.respondWith(new Response(new ReadableStream(...))) instead.
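The service worker workaround mentioned here can be sketched roughly as follows. This is an illustration under assumptions rather than the commenter's actual code: `streamingResponse` and its `chunks` parameter (any iterable or async iterable of Uint8Array, standing in for data arriving from a download) are hypothetical.

```javascript
// Build a Response whose body is produced chunk by chunk. In a service worker
// the result could be passed to event.respondWith().
function streamingResponse(chunks, contentType) {
  const iterator = chunks[Symbol.asyncIterator]
    ? chunks[Symbol.asyncIterator]()
    : chunks[Symbol.iterator]();
  const body = new ReadableStream({
    // pull() is called whenever the consumer wants more data.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(value);
    },
  });
  return new Response(body, { headers: { 'Content-Type': contentType } });
}
```

Because the stream is pull-based, the consumer (for example a media element) drives how quickly chunks are requested.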

@mkruisselbrink

Unfortunately a File (or Blob) object fundamentally represents a fixed set of bytes, and a lot is built on top of that assumption. I.e. slicing a Blob, or creating another Blob containing a Blob, merely adds another reference to the same underlying data but doesn't copy any data (also, postMessaging a Blob to a different origin merely shares a reference to the same data). As such, any API based around Blob can't represent a file after it is modified. Being able to stream from a file while it is being written to is a useful feature though, and one that might be achieved by the "readable" exposed by the API proposed in this thread (although only if the readable and writable do not share a cursor? If they share the same cursor, presumably it wouldn't work to do a streaming read from the file while also writing to it).
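The fixed-bytes property can be seen from script with a small sketch, assuming a runtime with the standard Blob constructor. It does not show the internal reference sharing, which is not observable from script, and `demoBlobIsFixed` is a hypothetical name.

```javascript
// A Blob is a fixed sequence of bytes: later changes to the source data are
// not visible through it, and slice() yields another fixed Blob.
async function demoBlobIsFixed() {
  const source = new Uint8Array([1, 2, 3, 4]);
  const blob = new Blob([source]);
  source[0] = 99; // mutate the original typed array after construction
  const whole = new Uint8Array(await blob.arrayBuffer());
  const tail = new Uint8Array(await blob.slice(2).arrayBuffer());
  const combined = new Blob([blob, blob]); // a Blob built from other Blobs
  return { first: whole[0], tail: Array.from(tail), combinedSize: combined.size };
}
```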

@jimmywarting

jimmywarting commented Jun 3, 2021

I know that they are immutable and represent a fixed set of bytes, and that slicing and constructing a new blob from that blob would be tough (I built a spec compliant fetch-blob package after all, so I kind of know how they work internally).

My point was just to bring out the useful feature, if it were possible to do something like it, to try and hatch some new ideas/features.

I guess it may be possible with Chrome's old sandboxed filesystem, where you can get a filesystem URL.
I'm not sure if it works, but I think you are able to watch the movie while modifying/appending data to the file at the same time.

video.src = fileHandle.toURL() // "filesystem:https://example.com/temporary/video.mp4"

Then you are not using an immutable blob.


The browser will see that the response has a Content-Length and an Accept-Ranges: bytes header, so it can abort and just partially download a chunk from the beginning; when the browser needs more, it makes a new partial request (at which point you have already written the data beforehand). Knowing these headers also makes it seekable.
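The partial-request exchange described here can be sketched on the responding side as follows. This is a simplified illustration, assuming the standard Response and Blob interfaces. `rangeResponse` is a hypothetical helper, and multi-range and suffix ranges (such as bytes=-500) are deliberately left out.

```javascript
// Answer a single-range "Range: bytes=start-end" header from a Blob with a
// 206 Partial Content response, or the full body when no range was requested.
function rangeResponse(blob, rangeHeader) {
  const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader || '');
  if (!match) {
    return new Response(blob, {
      status: 200,
      headers: { 'Accept-Ranges': 'bytes', 'Content-Length': String(blob.size) },
    });
  }
  const start = Number(match[1]);
  const end = match[2] === '' ? blob.size - 1 : Math.min(Number(match[2]), blob.size - 1);
  if (start > end) {
    // The requested range lies outside the data that exists so far.
    return new Response(null, {
      status: 416,
      headers: { 'Content-Range': `bytes */${blob.size}` },
    });
  }
  return new Response(blob.slice(start, end + 1), {
    status: 206,
    headers: {
      'Accept-Ranges': 'bytes',
      'Content-Range': `bytes ${start}-${end}/${blob.size}`,
      'Content-Length': String(end - start + 1),
    },
  });
}
```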


What if you could create an object URL from a fileHandle or cursor and not from a blob? URL.createObjectURL(fileHandle)

@asutherland

In discussions relating to ServiceWorkers and I think Storage at TPAC, there's generally been a desire to specify/implement the existing HTMLMediaElement.srcObject more broadly, like on <img> as discussed in the WHATWG HTML spec, and avoid introducing any more uses of URL.createObjectURL since it's a very easy way to accidentally leak a lot of memory and because there's been inter-op and privacy/tracking problems in terms of where that URL is valid outside a given document.

Note that the discussion for ServiceWorkers was primarily dealing with letting Response objects be directly fed to DOM objects in Window contexts without needing to involve a ServiceWorker (or Blobs/Files).
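The memory concern with URL.createObjectURL comes from URLs that are never revoked. One common mitigation, sketched here under the assumption that the URL is only needed for the duration of a callback, is to pair creation and revocation in one place. `withObjectURL` and `use` are hypothetical names.

```javascript
// Scope an object URL to a callback and always revoke it afterwards, so the
// Blob it keeps alive can be released.
async function withObjectURL(blob, use) {
  const url = URL.createObjectURL(blob);
  try {
    return await use(url);
  } finally {
    URL.revokeObjectURL(url);
  }
}
```

This pattern does not address the interoperability or tracking concerns raised above; it only limits how long the Blob stays pinned.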

@jimmywarting

jimmywarting commented Jun 3, 2021

☝️ that's neat

@jespertheend

Fwiw I think the streams based approach is an excellent way to do it. Having this proposal be an extension on top of OPFS sounds ideal to me, and I’m glad this is being explored. This way OPFS and FSA can benefit from the same performance additions that Storage Foundation would add, without there being any confusion about which APIs to use.

@othermaciej
Author

After looking at the feedback on the options to merge Storage Foundation API and File System Access API (mentioned here), we've written a more concrete proposal. It describes a few extensions that could be made to the Origin Private File System in order to support our use cases.

Any new updates on this proposal? Does it seem likely to move forward?

Also, sorry for not giving feedback on this proposal earlier. I've circulated the proposal among some of my Apple colleagues, and we like the stream-based API; it seems like that's what was selected anyway.

We will have more comments on specific details once there's a PR (or delta draft) to review.

@tomayac
Contributor

tomayac commented Aug 6, 2021

There is now a concrete proposal to add a createAccessHandle() method to the FileSystemFileHandle object, which happens in the context of merging this API with the Origin Private File System.
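The proposal text itself is not quoted in this thread, so the exact shape of createAccessHandle() is not shown here. As a rough sketch only, the code below assumes the synchronous, worker-only access handle that browsers later shipped under the name createSyncAccessHandle(), with getSize(), write(buffer, { at }) and flush(). `appendBytes` is a hypothetical helper, and the handle is passed in so the function can be exercised outside a browser.

```javascript
// Append bytes in place through an access handle, without rewriting the file.
// Method names follow the sync access handle that later shipped in browsers,
// which is an assumption relative to the proposal referenced above.
function appendBytes(accessHandle, bytes) {
  const offset = accessHandle.getSize();                     // current end of file
  const written = accessHandle.write(bytes, { at: offset }); // returns bytes written
  accessHandle.flush();                                      // ask for the change to be persisted
  return offset + written;                                   // new file size
}
```

In a dedicated worker, the handle would come from `await fileHandle.createSyncAccessHandle()` on a file in the Origin Private File System, and should be released with `close()` when done.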
