
Improve file upload in new platform route handlers #56944

Open
pgayvallet opened this issue Feb 6, 2020 · 5 comments
Labels
discuss Feature:New Platform Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Contributor

File upload is something we overlooked regarding route handler features.

A concrete example of how file upload is handled in the legacy platform:

export const createImportRoute = (
  prereqs: Prerequisites,
  server: Hapi.Server,
  supportedTypes: string[]
) => ({
  path: '/api/saved_objects/_import',
  method: 'POST',
  config: {
    pre: [prereqs.getSavedObjectsClient],
    payload: {
      maxBytes: server.config().get('savedObjects.maxImportPayloadBytes'),
      output: 'stream',
      allow: 'multipart/form-data',
    },
    validate: {
      query: Joi.object()
        .keys({
          overwrite: Joi.boolean().default(false),
        })
        .default(),
      payload: Joi.object({
        file: Joi.object().required(),
      }).default(),
    },
  },
  async handler(request: ImportRequest, h: Hapi.ResponseToolkit) {
    const { savedObjectsClient } = request.pre;
    const { filename } = request.payload.file.hapi;
    const fileExtension = extname(filename).toLowerCase();
    if (fileExtension !== '.ndjson') {
      return Boom.badRequest(`Invalid file extension ${fileExtension}`);
    }
    return await importSavedObjects({
      supportedTypes,
      savedObjectsClient,
      readStream: createSavedObjectsStreamFromNdJson(request.payload.file),
      objectLimit: request.server.config().get('savedObjects.maxImportExportSize'),
      overwrite: request.query.overwrite,
    });
  },
});

  • the output and allow payload options must be specified (these options are already handled in NP):

    payload: {
      maxBytes: server.config().get('savedObjects.maxImportPayloadBytes'),
      output: 'stream',
      allow: 'multipart/form-data',
    },

  • the validation uses a raw Joi object to define the file:

    payload: Joi.object({
      file: Joi.object().required(),
    })

At the moment, the file is a readable stream enhanced by a Hapi wrapper that adds additional data such as the filename:

interface HapiReadableStream extends Readable {
  hapi: {
    filename: string;
  };
}

Then, when accessing the properties:

const { filename } = request.payload.file.hapi;

Questions

  • How should we define a file in route validation? Should we introduce schema.file() and/or schema.stream()?
  • Do we want to wrap the Hapi file wrapper to avoid explicit hapi usage when accessing the properties (request.payload.file.hapi.filename)? That sounds a little tricky/fragile, as we would need to inspect the hapi-parsed payload/body to convert their internal stream type to ours (a rough sketch follows below).
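
For the second question, a rough sketch of what such a wrapper could look like, reusing the HapiReadableStream interface above. KibanaFileStream and toKibanaFileStream are hypothetical names used purely for illustration, not an existing core API:

import { Readable } from 'stream';

// Hypothetical public type that core could expose instead of Hapi's `hapi` property.
interface KibanaFileStream extends Readable {
  filename: string;
}

// Hypothetical internal conversion from Hapi's parsed file stream to our own structure:
// reuse the underlying stream and lift the metadata to the top level.
function toKibanaFileStream(file: HapiReadableStream): KibanaFileStream {
  return Object.assign(file, { filename: file.hapi.filename });
}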
@pgayvallet pgayvallet added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:New Platform labels Feb 6, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@mshustov
Contributor

Should we introduce schema.file() and/or schema.stream()?

@kbn/config-schema already provides schema.stream.

Do we want to wrap the Hapi file wrapper to avoid explicit hapi usage when accessing the properties

We need to do something here, since the Hapi interface shouldn't leak outside. We can handle output: 'stream' and accepts: 'multipart/form-data' internally to extend the request object with multipart data, as Hapi does.
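
As an illustration of that idea, a minimal sketch of the kind of internal translation core could perform when registering the route. The option names on the left mirror the NP route options.body shape discussed later in this thread, the ones on the right are Hapi's payload config, and the function itself is hypothetical, not the actual HttpServer code:

// Hypothetical internal mapping from the NP route `options.body` to Hapi's `payload` config.
interface RouteBodyOptions {
  maxBytes?: number;
  output?: 'data' | 'stream';
  accepts?: string | string[];
  parse?: boolean;
}

function toHapiPayloadConfig(body: RouteBodyOptions) {
  return {
    maxBytes: body.maxBytes,
    output: body.output,
    allow: body.accepts,
    // With parse enabled, Hapi parses multipart/form-data and exposes file parts
    // as streams carrying the `hapi: { filename, headers }` metadata.
    parse: body.parse ?? true,
  };
}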

My only concern with the current implementation is that it stores the parsed content in memory. From the Hapi typings for payload.output:

 * * 'stream' - the incoming payload is made available via a Stream.Readable interface. If the payload is 'multipart/form-data' and parse is true, field values are presented as text while files are
 * provided as streams. File streams from a 'multipart/form-data' upload will also have a hapi property containing the filename and headers properties. Note that payload streams for multipart
 * payloads are a synthetic interface created on top of the entire mutlipart content loaded into memory. To avoid loading large multipart payloads into memory, set parse to false and handle the
 * multipart payload in the handler using a streaming parser (e.g. pez).

If the need to access the filename is the only reason for that, we could change our API not to use FormData (we send a single file anyway) and pass the filename via headers instead.
Side note: Hapi uses https://github.com/hapijs/subtext to parse the payload and https://github.com/hapijs/pez to parse multipart data.
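
For reference, a minimal sketch of what the headers-based alternative could look like with the NP router. The kbn-import-filename header name is made up for the example, and using schema.stream() for the whole body is an assumption, not an agreed-upon design:

router.post(
  {
    path: '/api/saved_objects/_import',
    options: {
      // The body is the raw file content instead of multipart/form-data.
      body: { maxBytes: maxImportPayloadBytes, output: 'stream' },
    },
    validate: {
      query: schema.object({ overwrite: schema.boolean({ defaultValue: false }) }),
      body: schema.stream(),
    },
  },
  async (context, request, response) => {
    // Hypothetical custom header carrying the original filename.
    const filename = request.headers['kbn-import-filename'];
    if (typeof filename !== 'string' || !filename.endsWith('.ndjson')) {
      return response.badRequest({ body: 'Invalid or missing filename header' });
    }
    // request.body is the readable stream of the uploaded file.
    // ... import from the stream ...
    return response.ok();
  }
);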

@pgayvallet
Contributor Author

If the need to access the filename is the only reason for that, we could change our API not to use FormData (we send a single file anyway) and pass the filename via headers instead.

multipart/form-data is a pain to handle properly, however it's one of the oldest HTTP transfer 'protocols' for files (and has been the only one for a very long time). I really think we should support it one way or another in core (unless we decide that it doesn't make sense in a modern SPA, which can be an option).

Also that would mean breaking changes in the SO import/export API, even if this API is 'experimental'.

We can handle output: 'stream' and accepts: 'multipart/form-data' internally to extend the request object with multipart data, as Hapi does.

We can do that. It feels like reinventing the wheel a little though, as our underlying HTTP framework already handles it for us. If the file.hapi access/naming is really a concern, we could probably provide our own file type and convert/construct our own file-stream structure from hapi's parsed content (that doesn't answer the next point though).

My only concern with the current implementation is that it stores the parsed content in memory.

I don't see any other option for using the payload content during the request's lifecycle without asking the handler to do the parsing itself, which sounds like a poor developer experience to me.

@mshustov
Contributor

We can do that. It feels like reinventing the wheel a little though, as our underlying HTTP framework already handles it for us. If the file.hapi access/naming is really a concern, we could probably provide our own file type and convert/construct our own file-stream structure from hapi's parsed content (that doesn't answer the next point though).

Yes, we can keep using hapi under the hood.
If we switch away from multipart/form-data, it doesn't change the logic a lot, since hapi provides the file content as a stream and we parse it manually anyway.


I think adding a wrapper around the hapi-parsed file is the simplest option at the moment.

@pgayvallet
Contributor Author

If we switch away from multipart/form-data, it doesn't change the logic a lot, since hapi provides the file content as a stream and we parse it manually anyway.

How do we handle routes with additional payload values without multipart though? Would we be sending a JSON body payload with the 'binary' content of the file in a field? For reference, the current multipart-based definition looks like this:

options: {
  body: {
    maxBytes: maxImportPayloadBytes,
    output: 'stream',
    accepts: 'multipart/form-data',
  },
},
validate: {
  body: schema.object({
    file: schema.stream(),
    retries: schema.arrayOf(
      schema.object({
        type: schema.string(),
        id: schema.string(),
        overwrite: schema.boolean({ defaultValue: false }),
        replaceReferences: schema.arrayOf(
          schema.object({
            type: schema.string(),
            from: schema.string(),
            to: schema.string(),
          }),
          { defaultValue: [] }
        ),
      })
    ),
  }),
},
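
For completeness, a rough sketch of what the handler side could look like with the validation above, assuming the file stream still carries the hapi metadata described earlier (a dedicated wrapper would eventually hide it). The handler shape follows the NP router signature; the response bodies are placeholders:

async (context, request, response) => {
  // With the validation above, request.body.file is the uploaded file stream
  // and request.body.retries is the parsed array of retry instructions.
  const { file, retries } = request.body;

  // Until a dedicated wrapper exists, the filename still comes from the hapi metadata.
  const { filename } = (file as HapiReadableStream).hapi;
  if (extname(filename).toLowerCase() !== '.ndjson') {
    return response.badRequest({ body: 'Invalid file extension' });
  }

  // ... consume the stream and apply the retries ...
  return response.ok();
}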
