Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: asynchronous blob backing store #10969

Merged
merged 33 commits into from
Jul 5, 2021
Merged

Conversation

jimmywarting
Copy link
Contributor

@jimmywarting jimmywarting commented Jun 15, 2021

This is my first time trying to contribute and learn the in and outs of how deno works internally - so be kind and helpful plz
I haven't learn how to fully work with all linting, (WPT) tests, webidl and what you can/can't do. And i do not know anything about rust, so i'm thankful for lucacasonato help on this PR.

Trying to solve some of the discussion that have been going on in #10539

  • Mainly that blob should be read lazily and copying (slicing a blob) don't increase the memory (it should just create an other references point with an other offset+size to start/stop read from/to)
  • And that is a step towards retrieving blob/files from the filesystem

Realized that i broke FormData encoder that is reading from the (now removed) _byteSequence - but it's fixed now
URL.createObjectURL also depended on _byteSequence - have also been refactored on the Rust side


Also did some manually testing on the new getFile() fn seems to work out nicely.
Got replaced \w BlobReferences - still missing BlobReferences.fromPath doe

Non essential for a MVP:

@CLAassistant
Copy link

CLAassistant commented Jun 15, 2021

CLA assistant check
All committers have signed the CLA.

@jimmywarting

This comment has been minimized.

Copy link
Member

@lucacasonato lucacasonato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that you started on this!

There needs to be some changes to file backed blobs, mainly that they should not use the standard Deno FS APIs, as those go via regular permission checks. Instead there should be some ops in Rust that can only write files to a specific tmp dir, and can only access these files (and nothing else on disk). To JS they would only be exposed as UUIDs. These ops would not require any permissions. I can add these ops if you don't feel comfortable with Rust or the deno ops system.

Additionally we need to figure out what our metric for creating file based blobs is. We always know the size of a blob at creation, so do automatically create a file backed blob if the blob is larger than a specific threshold? Do we create blobs once we have used a fixed amount of memory for blob backing memory allocations in the process? File backed blobs should be fully transparent to the user. They should have no knowledge of them existing.

As you mentioned, FormData currently synchronously reads from Blob. This is not possible anymore with file backed blobs, so the entire FormData serializer needs to be turned into something that returns either a ReadableStream or an async iterable of chunks. Additionally URL.createObjectURL also synchronously reads from Blob - that also needs to be updated. That will require some changes to how object url blobs are stored in Rust.

extensions/web/09_file.js Outdated Show resolved Hide resolved
extensions/web/09_file.js Outdated Show resolved Hide resolved
extensions/web/09_file.js Outdated Show resolved Hide resolved
@jimmywarting
Copy link
Contributor Author

jimmywarting commented Jun 15, 2021

To JS they would only be exposed as UUIDs. These ops would not require any permissions. I can add these ops if you don't feel comfortable with Rust or the deno ops system.

Yes plz, thank you. I have never worked with rust before.

There needs to be some changes to file backed blobs, mainly that they should not use the standard Deno FS APIs, as those go via regular permission checks. Instead there should be some ops in Rust that can only write files to a specific tmp dir, and can only access these files (and nothing else on disk). To JS they would only be exposed as UUIDs. These ops would not require any permissions. I can add these ops if you don't feel comfortable with Rust or the deno ops system.

Additionally we need to figure out what our metric for creating file based blobs is. We always know the size of a blob at creation, so do automatically create a file backed blob if the blob is larger than a specific threshold? Do we create blobs once we have used a fixed amount of memory for blob backing memory allocations in the process? File backed blobs should be fully transparent to the user. They should have no knowledge of them existing.

EDIT: Not at the moment. But it could be a grate idea to offload some of the memory to a tmp file in the background if they are not being used. But could instead do that in a new feature PR. Have manly just focus on getting support for async blob sources. Currently don't see offloading chunks as a huge benefit ATM. mainly wanted to be able to get a Blob/File from a filesystem and use them in FormData and other places await Deno.getFile('./readme.md', 'text/plain') (more like chrome sandboxed filesystem or the native-filesystem-access getFile function)

That was the main reason why I added the getFile() function to have it public available on Deno.getFile (not to read temp files that have been offloaded) 😜 Deno.open() returns a File, but not the kind of File i'm looking for that can be used with FormData.
I also wanted to be able to do: Deno.getFile(path).then(file => file.text())

FormData serializer needs to be turned into something that returns either a ReadableStream or an async iterable of chunks

...Or a blob 🙃 #10969 (comment) ☝️
quite simple to do actually, synchronous (with the help of async blob sources).

Additionally URL.createObjectURL also synchronously reads from Blob - that also needs to be updated. That will require some changes to how object url blobs are stored in Rust.

Again, would like to leave this up to you. haven't worked with Rust before. It would need references points i guess...


Will fix all you reviews

reverted some webidl, spec step
@andreubotella
Copy link
Contributor

andreubotella commented Jun 15, 2021

As you mentioned, FormData currently synchronously reads from Blob. This is not possible anymore with file backed blobs, so the entire FormData serializer needs to be turned into something that returns either a ReadableStream or an async iterable of chunks.

A couple months ago I worked on putting together a spec definition of multipart/form-data that could be included in the WHATWG standards (see whatwg/html#6424), including a serializer that returns a list of chunks which are either byte sequences or File objects: https://andreubotella.github.io/multipart-form-data/

@jimmywarting
Copy link
Contributor Author

jimmywarting commented Jun 15, 2021

Oh, speaking of serializing FormData boundary. On bad thing the blob spec is doing is lowercasing the hole mimetype

See: whatwg/html#6251

when, blob cast the type to all lowercase it looses it boundary value, and it's no longer the same
multipart/form-data; boundary=----WebKitFormBoundaryaUetjQwudca4vbSu
multipart/form-data; boundary=----webkitformboundaryauetjqwudca4vbsu

I guess they are planing on only lowercasing the mimetype + keys (but keeping the values as is)
example mimt/type; key=VaLuE
...the spec haven't been updated yet (i think)

we (in fetch-blob) also did this lowercase but removed it to be less strict

I think we should do this in the meanwhile:

-    return normalizedType.toLowerCase();
+    return normalizedType;

@lucacasonato
Copy link
Member

@jimmywarting Are you on the Deno discord (https://discord.gg/deno)? I have a basic implementation of the Rust side for this done, but have some questions - might be easier to chat there than through GitHub issues / PR review :-)

@jimmywarting
Copy link
Contributor Author

Are you on the Deno discord

@lucacasonato yes I'm
I joined the discord

provided a util fn to retrieve all BlobReferences as a flat array for createObjectURL
  Hold the parts in a WeakMap instead to access them from outside

Removed BlobDataItem (mostly do same thing as BlobReferences)
@lucacasonato
Copy link
Member

The remaining test failure is an issue with an upstream WPT test. PR: web-platform-tests/wpt#29517

@lucacasonato
Copy link
Member

Blocked on denoland/rusty_v8#706 for FinalizationRegistry support

@lucacasonato lucacasonato changed the title Async blob source refactor: asynchronous blob backing store Jul 3, 2021
@lucacasonato lucacasonato marked this pull request as ready for review July 3, 2021 19:02
@lucacasonato lucacasonato self-requested a review July 3, 2021 19:02
Copy link
Member

@lucacasonato lucacasonato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks a lot @jimmywarting, this is a fantastic great refactor to prepare us for the native file system API, and opaque file backed blobs.

@bartlomieju could you take a look too?

cli/file_fetcher.rs Show resolved Hide resolved
extensions/fetch/Cargo.toml Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants