Provide file streams instead of metadata to resolvers #13
After looking deeper into multipart forms and comparing the top parsers, I came to the same conclusion you did: formidable is the way to go. After digging into the library I found a way to skip the parsing. Formidable exposes an `onPart` method for users to override.
Example from their docs:

```js
form.onPart = function(part) {
  if (!part.filename) {
    // let formidable handle all non-file parts
    form.handlePart(part);
  }
}
```

Let formidable parse the field parts and skip the files:

```js
form.onPart = function(part) {
  if (!part.filename) {
    // let formidable handle all non-file parts
    form.handlePart(part);
  } else {
    // skip handlePart on files, just pass as stream
    this.emit('file', part.name, part);
  }
}
```

The parser will then return the parsed fields and the file streams:

```js
form.parse(req, (error, fields, files) => {
  // files["variables.avatar"] instanceof stream.Stream === true
})
```

I first introduced a new metadata field to carry the file size:

```json
{
  "variables.avatar": {
    "lastModified": 1503606654000,
    "name": "avatar.gif",
    "size": 317866,
    "type": "image/gif"
  }
}
```

But maybe embedding the metadata in the variables isn't such a bad idea?

```json
{
  "query": "mutation uploadAvatarMutation($id: ID! $avatar: Upload!) {uploadAvatar(id: $id avatar: $avatar) {resulturl} }",
  "variables": {
    "id": "598645558d905ae18524bc55",
    "avatar": {
      "lastModified": 1503606654000,
      "name": "avatar.gif",
      "size": 317866,
      "type": "image/gif"
    }
  },
  "operationName": "uploadAvatarMutation"
}
```

The metadata can be merged into the file object, then overwritten with the actual parsed file.
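The merge-then-overwrite idea can be sketched in a few lines (the helper name is illustrative, not part of the library):

```javascript
// Illustrative sketch: merge client-sent metadata into the file object, then
// let server-parsed values overwrite it, so spoofed metadata can't survive.
function mergeFileMetadata(clientMetadata, parsedFile) {
  return { ...clientMetadata, ...parsedFile };
}

const clientMetadata = {
  lastModified: 1503606654000,
  name: 'avatar.gif',
  size: 1, // a lying client
  type: 'image/gif'
};

// What the server actually observed while parsing the multipart part
const parsedFile = { name: 'avatar.gif', size: 317866, type: 'image/gif' };

const file = mergeFileMetadata(clientMetadata, parsedFile);
// file.size comes from the server; file.lastModified survives from the client
```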
I also created a scalar type to double-check the `Upload` type is a stream and definitely not the metadata:

```js
function coerceStream(value) {
  if (!(value instanceof stream.Stream)) {
    throw new TypeError('Field error: value is not an instance of Stream');
  }
  // do more checks on the properties and values
  return value;
}

export default new GraphQLScalarType({
  name: 'Upload',
  serialize: coerceStream,
  parseValue: coerceStream,
  parseLiteral: coerceStream
})
```

I'm working on putting the code I tested on a fork; let me know your thoughts on this approach.
Thanks for taking the time to dig in, this is going to be tricky! I updated the description with an ideal scenario, and my starter thoughts.
If we provide metadata from the client, developers will be tempted to trust it, which they should not. The current metadata comes from parsing the actual file that is received by the server. Also, metadata will increase the size of the request.
We are ready to start working on this. I created a GraphQL multipart request spec v2.0.0 draft that makes it possible to implement file deduplication, file upload streams in resolvers, and aborting file uploads in resolvers. At first we don't need to implement all the possible new features such as file deduplication; the main feature is streams in the resolvers. Once everything is working, I will publish new versions of all 3 projects simultaneously.
* New API to support the [GraphQL multipart request spec v2.0.0-alpha.2](https://github.com/jaydenseric/graphql-multipart-request-spec/releases/tag/v2.0.0-alpha.2). Files no longer upload to the filesystem; [readable streams](https://nodejs.org/api/stream.html#stream_readable_streams) are used in resolvers instead. Fixes [#13](#13).
* Export a new `Upload` scalar type to use in place of the old `Upload` input type. It represents a file upload promise that resolves an object containing `stream`, `filename`, `mimetype` and `encoding`.
* Deprecated the `uploadDir` middleware option.
* Added new `maxFieldSize`, `maxFileSize` and `maxFiles` middleware options.
* `graphql` is now a peer dependency.
* Middleware are now arrow functions.
New API to support the GraphQL multipart request spec v2. Fixes [#13](jaydenseric/graphql-upload#13).
I will flesh this description out in time. There are lots of little conversations about this hidden around.
Ideal scenario
A new user fills in a registration form. Fields submitted include `username` (String), `password` (String), `avatar` (File), and `banner` (File). The files are 4 MB each.

The non-file fields can be validated in the resolver as soon as the request comes in to the server, while the uploads asynchronously stream in. It turns out the `username` is already taken, so a validation error is thrown and the request is aborted. The user gets immediate feedback without having to wait for the whole upload to complete, and the server is not unnecessarily burdened.

Because the files are streams in the resolver, they can be individually forwarded into cloud storage, skipping temporary storage on the API server's memory or filesystem. This allows the first `avatar` file to begin uploading to cloud storage while the second `banner` file is still being received by the API. Because the files are not stored on the API server, data security compliance is simpler.

Thoughts
With this setup, a developer could even use different methods to store each file from a single request.
I think it is best to just substitute files for file streams in the resolvers; no more metadata. The API is less opinionated, easier to implement and less complicated to document. Users can extract whatever metadata is useful to them from the streams.
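For example, a resolver could derive trustworthy metadata straight from the bytes instead of a client-sent field (a sketch; the GIF and JPEG magic numbers are standard, the helper name is made up):

```javascript
// Sniff a mimetype from a file's leading bytes rather than trusting the client
function sniffMimetype(firstChunk) {
  if (firstChunk.slice(0, 4).toString('ascii') === 'GIF8') return 'image/gif';
  if (firstChunk.slice(0, 2).toString('hex') === 'ffd8') return 'image/jpeg';
  return 'application/octet-stream';
}

// In a resolver, this would run on the first chunk read from the upload stream
const type = sniffMimetype(Buffer.from('GIF89a'));
```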
To prevent file uploads from blocking the resolvers (as they do currently with formidable) we will probably need to provide a new `files` field in the multipart form sent from the client. This will contain a JSON array of all the file object paths, i.e. `["0.variables.avatar", "0.variables.banner"]`. It, along with the existing `operations` field, should be the very first fields in the multipart form, with all the files following. This will allow us to build an operations object with placeholder file streams to then pass into the GraphQL server and the resolvers. The form can then continue streaming up in parallel with the resolvers running; each file that gets parsed can be streamed into the relevant placeholder stream constructed earlier.

I suspect that to engineer the new system we might have to write a custom multipart form parser, because we need to create streams before the file fields are parsed, and options like formidable only create streams as they are encountered. We might also be better off writing the new functionality directly into the GraphQL server middlewares.
Also, I think when we move to streams we should start using a new word like "blob" instead of "file", because people will be able to use any sort of binary data that can't be JSON-encoded in their variables. Relevant: jaydenseric/extract-files#2.