Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new built-in middleware bodyLimit #2077

Closed
wants to merge 15 commits into from
Closed

feat: new built-in middleware bodyLimit #2077

wants to merge 15 commits into from

Conversation

EdamAme-x
Copy link
Contributor

Author should do the followings, if applicable

I saw this DISCUSSION and implemented it.
https://github.com/orgs/honojs/discussions/2048
It takes a little time to prepare an explanation.

  • Add tests
  • Run tests
  • yarn denoify to generate files for Deno

@EdamAme-x EdamAme-x marked this pull request as draft January 24, 2024 08:27
@EdamAme-x
Copy link
Contributor Author

EdamAme-x commented Jan 24, 2024

the little time, sry, i will provide an explanation without specific codes.
bodyParser is a middleware that enables pre-parsing of text, json, form, etc.
It is also possible to specify the maximum amount of data to be parsed.

I believe that it is also a (D)DoS countermeasure.

@EdamAme-x
Copy link
Contributor Author

I believe there are a couple of features missing on this.

The first is the error handler call from the context object mentioned earlier. (But this can be seen as already done.)

The second is when the maximum size is not specified.
the case where the maximum size is not specified.
This can be implemented immediately and will be implemented in the future. (Even tomorrow)

@EdamAme-x EdamAme-x marked this pull request as ready for review January 24, 2024 09:54
@EdamAme-x EdamAme-x marked this pull request as draft January 24, 2024 09:59
@EdamAme-x
Copy link
Contributor Author

Initially, the size limit is not set.

@EdamAme-x
Copy link
Contributor Author

The following test file will give you a rough idea of how to use it.

import { Hono } from '../../hono'
import { Uint, bodyParser } from '.'

describe('bodyParse works', () => {
  const app = new Hono()

  app.post(
    '/max-64mb/body',
    bodyParser({
      type: 'body',
      limit: 64 * Uint.mb,
    }),
    (c) => {
      return c.json({
        size: c.var.body<Blob>().size,
        type: c.var.body<Blob>().type,
      })
    }
  )

  it('Should return same body on POST', async () => {
    const data = {
      hono: 'is',
    }
    const res = await app.request(
      new Request('https://localhost/max-64mb/body', {
        method: 'POST',
        body: JSON.stringify(data),
      })
    )

    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.json()).toEqual({
      size: 13,
      type: 'text/plain;charset=utf-8',
    })
  })

  app.post(
    '/max-64mb/json',
    bodyParser({
      type: 'json',
      limit: 64 * Uint.mb,
    }),
    (c) => {
      return c.json(c.var.body())
    }
  )

  it('Should return same json on POST', async () => {
    const data = {
      hono: 'is',
    }
    const res = await app.request(
      new Request('https://localhost/max-64mb/json', {
        method: 'POST',
        body: JSON.stringify(data),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.json()).toEqual(data)
  })

  app.post(
    '/max-64mb/json-error',
    bodyParser({
      type: 'json',
      limit: 64 * Uint.mb,
    }),
    (c) => {
      return c.json(c.var.body())
    }
  )

  it('Should return json-error on POST', async () => {
    const data = `{
      hono: 'is',
    }cool`

    const res = await app.request(
      new Request('https://localhost/max-64mb/json-error', {
        method: 'POST',
        body: data,
      })
    )

    expect(res).not.toBeNull()
    expect(res.status).toBe(500)
    expect(await res.text()).toBe('Internal Server Error')
  })

  app.post(
    '/max-64mb/text',
    bodyParser({
      type: 'text',
      limit: 64 * Uint.mb,
    }),
    (c) => {
      return c.text(c.var.body<string>())
    }
  )

  it('Should return same text on POST', async () => {
    const data = 'hono is cool'
    const res = await app.request(
      new Request('https://localhost/max-64mb/text', {
        method: 'POST',
        body: data,
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })

  app.post(
    '/max-64mb/form',
    bodyParser({
      type: 'form',
      limit: 64 * Uint.mb,
    }),
    (c) => {
      return c.text(c.var.body<FormData>().toString())
    }
  )

  it('Should return same formData on POST', async () => {
    const data = new FormData()
    const res = await app.request(
      new Request('https://localhost/max-64mb/form', {
        method: 'POST',
        body: data,
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('[object FormData]')
  })

  app.use(
    '/max-32bit/*',
    bodyParser({
      type: 'body',
    })
  )

  app.all('/max-32bit', (c) => c.text('hono is cool'))

  it('Should return hono is cool on GET', async () => {
    const res = await app.request(
      new Request('https://localhost/max-32bit', {
        method: 'POST',
        body: JSON.stringify({
          hono: 'is',
        }),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })

  it('Should return hono is cool on POST', async () => {
    const res = await app.request(
      new Request('https://localhost/max-32bit', {
        method: 'POST',
        body: JSON.stringify({
          hono: 'is',
        }),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })

  it('Should return hono is cool on PUT', async () => {
    const res = await app.request(
      new Request('https://localhost/max-32bit', {
        method: 'PUT',
        body: JSON.stringify({
          hono: 'is',
        }),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })

  it('Should return hono is cool on PATCH', async () => {
    const res = await app.request(
      new Request('https://localhost/max-32bit', {
        method: 'PATCH',
        body: JSON.stringify({
          hono: 'is',
        }),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })

  app.use(
    '/max-8byte',
    bodyParser({
      type: 'text',
      limit: 8 * Uint.b,
    })
  )

  app.post('/max-8byte', (c) => c.text('hono is cool'))

  it('Should return 413 Request Entity Too Large on POST', async () => {
    const res = await app.request(
      new Request('https://localhost/max-8byte', {
        method: 'POST',
        body: JSON.stringify({
          hono: 'is',
        }),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(413)
    expect(await res.text()).toBe('413 Request Entity Too Large')
  })

  it('Should return hono is cool on POST', async () => {
    const res = await app.request(
      new Request('https://localhost/max-8byte', {
        method: 'POST',
        body: 'hono',
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })

  app.use(
    '/max-8byte-custom',
    bodyParser({
      type: 'text',
      limit: 8 * Uint.b,
      handler: (c) => {
        return c.text('not cool', 413)
      },
    })
  )

  app.post('/max-8byte-custom', (c) => c.text('hono is cool'))

  it('Should return not cool with 413 on POST', async () => {
    const res = await app.request(
      new Request('https://localhost/max-8byte-custom', {
        method: 'POST',
        body: JSON.stringify({
          hono: 'is',
        }),
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(413)
    expect(await res.text()).toBe('not cool')
  })

  it('Should return hono is cool on POST', async () => {
    const res = await app.request(
      new Request('https://localhost/max-8byte-custom', {
        method: 'POST',
        body: 'hono',
      })
    )
    expect(res).not.toBeNull()
    expect(res.status).toBe(200)
    expect(await res.text()).toBe('hono is cool')
  })
})

@EdamAme-x EdamAme-x marked this pull request as ready for review January 24, 2024 10:28
@yusukebe
Copy link
Member

Hi @EdamAme-x

I think the purpose of this feature is to limit the size. In that case, I think the name bodyParser is too generic. What do you think?

@EdamAme-x
Copy link
Contributor Author

To be honest, I was thinking that too.
Do you have any good ideas...?

@EdamAme-x
Copy link
Contributor Author

I thought about it for a while, and since the name of koa's middleware with equivalent functionality is also bodyParser, I feel that the name bodyParser is acceptable in light of this.

@goisaki
Copy link
Contributor

goisaki commented Jan 25, 2024

@EdamAme-x @yusukebe
Hmmm. The name bodyParser certainly doesn't seem right.
(If Hono uses bodyParser as an argument when registering a route, and it is usually omitted, then this name is not strange!)

In Koa, the middleware with this name is literally used to parse the body.
But in Hono, the place where this middleware is used is not to parse the body. That is why.
What about making it bodyLimit?
It seems simple and easy to understand

app.post(
  '/',
  bodyLimit({
    type: 'body',
    limit: 64 * Uint.mb,
  }),
  (c) => {}
)

… or bodyFilter?

@goisaki
Copy link
Contributor

goisaki commented Jan 25, 2024

@EdamAme-x
and if the limit could also be set as an array, i think it would be better.

@yusukebe
Copy link
Member

I like bodyLimit 👍

@EdamAme-x
Copy link
Contributor Author

Thanks. @nabeken5 @yusukebe
I like this too.

@EdamAme-x
Copy link
Contributor Author

@EdamAme-x

and if the limit could also be set as an array, i think it would be better.

Can you give me a sample code for this?

@EdamAme-x
Copy link
Contributor Author

I am thinking of narrowing this down to the ability to limit the size of the BODY.

I think Hono's function is sufficient for parsing.

@goisaki
Copy link
Contributor

goisaki commented Jan 25, 2024

I think there are times when we want to put a limit on a specific TYPE like this:

app.post(
  '/',
  bodyLimit([
    {
      type: 'text',
      limit: 0,
    },
    {
      type: 'json',
      limit: 64 * Uint.mb,
    },
    {
      type: 'form',
      limit: 0,
    }
  ]),
  (c) => {}
)

@goisaki
Copy link
Contributor

goisaki commented Jan 25, 2024

I am thinking of narrowing this down to the ability to limit the size of the BODY.

I think Hono's function is sufficient for parsing.

me too

switch (parseOptions.type) {
  case 'body':
    parsedData = blob
    break
  case 'text':
    parsedData = await blob.text()
    break
  case 'json':
    parsedData = JSON.parse(await blob.text())
    break
  case 'form':
    parsedData = await c.req.formData()
    break
  default:
    parsedData = await blob.text()
    break
}

https://github.com/EdamAme-x/hono/blob/e5afb9bca1124610adf52e34b384683e0301fb86/src/middleware/body-parser/index.ts#L50-L66

I feel like this process is not what bodyLimit should be doing, can it be omitted?

@EdamAme-x EdamAme-x marked this pull request as draft January 25, 2024 22:25
@EdamAme-x
Copy link
Contributor Author

EdamAme-x commented Jan 25, 2024

app.post(
  '/',
  bodyLimit([
    {
      type: 'text',
      limit: 0,
    },
    {
      type: 'json',
      limit: 64 * Uint.mb,
    },
    {
      type: 'form',
      limit: 0,
    }
  ]),
  (c) => {}
)

This idea is great!
thanks

@EdamAme-x EdamAme-x changed the title feat: new built-in middleware bodyParser feat: new built-in middleware bodyLimit Jan 26, 2024
@usualoma
Copy link
Member

usualoma commented Jan 26, 2024

Sorry if you all know this, but it's just a comment for your information. Please point out any errors in my understanding.

Need to limit size of body

It is very important to limit the size of body on the server side. For example, as @EdamAme-x wrote, it is necessary for countermeasures against DOS attacks.

What can we do with the Web Standard's API?

In hono, (await c.req.raw.blob()).size is probably the lowest level API call to get the size, but by the time you do this, the data has been read into memory. (In all environments -bun, deno, node and wrangler dev - the memory usage was used at this stage, up to the size of the body.)

I don't disagree that checking the size here before JSON.parse() etc. is a mitigating measure, so I don't object to the middleware implementation that is currently in progress. However, we should have a common understanding that ``rejecting a request before reading it into memory'' is not possible in hono.

const app = new Hono()

app.use('*', async (c, next) => {
  console.log((await c.req.raw.blob()).size) //   request body is already loaded into memory here.
  await new Promise((resolve) => setTimeout(resolve, Infinity))
  await next()
})

app.get('/', (c) => c.text('Hello World'))

Effective methods for countermeasures against DOS attacks

If you need to severely limit the size of the body in a production environment, you should consider the following method instead of hono.

@yusukebe
Copy link
Member

In hono, (await c.req.raw.blob()).size is probably the lowest level API call to get the size, but by the time you do this, the data has been read into memory. (In all environments -bun, deno, node and wrangler dev - the memory usage was used at this stage, up to the size of the body.)

I'm thinking the same thing. It may be out of Hono's scope, since the Web standards API does not have the feature to know the body size before loading contents on the memory. It is like a fact we can't get an IP address from a Request object.

If this feature is useful, it is to know the size of the JSON before parsing it, but it does not have that much impact.

Now, I'm asking if Cloudflare Workers can limit the size of a request body or not, but I think it can't do it.

@usualoma
Copy link
Member

@yusukebe

Now, I'm asking if Cloudflare Workers can limit the size of a request body or not, but I think it can't do it.

Thanks, I understand!

@yusukebe
Copy link
Member

Cloudflare Workers and Deno Deploy work with v8 isolate and often the process dies off by itself. Of course, we want to prevent DoS attacks, but this is no more important than a server that has to be up all the time. Also, a Fastly Compute application starts a process every time a request is made.

And with the Edge platform, the platform itself has an approach to defend against DoS attacks, so I don't think applications like Cloudflare Workers need to worry about it not so much.

We need to discuss which platforms Hono should target, but the mindset is different from the case that the server is always running like Node.js and Bun(but I think such like a Bun Deploy might be Edge-like) and the edge platform.

@EdamAme-x EdamAme-x marked this pull request as ready for review January 26, 2024 06:08
Copy link
Member

@usualoma usualoma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @EdamAme-x

content-type

If the Content-Type is not what is expected, the size will not be checked. I don't think this is the behavior you expect.

app.post(
  '/',
  bodyLimit({
    type: 'json',
    limit: 1,
  }),
  async (c) => {
    return c.text(JSON.stringify(await c.req.json()))
  }
)
$ curl -X POST -H "Content-Type: application/json" -d '{"message":"OK"}' http://127.0.0.1:8787
413 Request Entity Too Large                                                                        
$ curl -X POST -H "Content-Type: text/plain" -d '{"message":"OK"}' http://127.0.0.1:8787
{"message":"OK"}

in bun

This is FYI, but in bun, c.req.raw.clone() does not seem to get the original body, I think it is a bug in bun, but we need to be aware of this limitation if we merge this middleware.

oven-sh/bun#6348

c.req.raw.clone().blob()

This may depend on the execution environment, but when checked with workerd, it appears that the memory usage is larger, albeit temporarily, when c.req.raw.clone().blob() is executed compared to a simple c.req.parseBody(). We need to know that these overheads will occur.

app.post('/c.req.parseBody', async (c) => {
  const data = await c.req.parseBody()
  console.log(Object.keys(data))
  console.log((data['file1'] as File).size)
  console.log((data['file2'] as File).size)
  await new Promise((resolve) => setTimeout(resolve, 5000))
  return c.text('Hello')
})

app.post('/c.req.raw.blob', async (c) => {
  const data = await c.req.raw.blob()
  console.log(data.size)
  await new Promise((resolve) => setTimeout(resolve, 5000))
  return c.text('Hello')
})

app.post('/c.req.raw.clone.blob', async (c) => {
  const data = await c.req.raw.clone().blob()
  console.log(data.size)
  await new Promise((resolve) => setTimeout(resolve, 5000))
  return c.text('Hello')
})
$ curl -X POST -F file1=@100MB -F file2=@100MB http://127.0.0.1:8787/c.req.parseBody & sleep 3; ps -xm -o rss -p $(pgrep workerd)
   RSS
490800
# reload app
$ curl -X POST -F file1=@100MB -F file2=@100MB http://127.0.0.1:8787/c.req.raw.blob & sleep 3; ps -xm -o rss -p $(pgrep workerd)
   RSS
302080
# reload app
$ curl -X POST -F file1=@100MB -F file2=@100MB http://127.0.0.1:8787/c.req.raw.clone.blob & sleep 3; ps -xm -o rss -p $(pgrep workerd)
   RSS
648032

API

I don't think I understand what limit is if it is just bodyLimit and limit. If it is middleware to limit size, I think it is more appropriate to have the keyword size somewhere.

bodyLimit({
  type: 'json',
  maxSize: 1,
})
bodyLimit({
  type: 'json',
  size: 1,
})

src/middleware/body-limit/index.ts Outdated Show resolved Hide resolved
@usualoma
Copy link
Member

To be honest, in my opinion, although I admit that this middleware is effective in some situations, the results obtained may not be worth the cost of implementation and maintenance, since the scope of the web standard API does not allow checking before loading into memory, and a naive implementation will inevitably increase memory when checking.

@EdamAme-x
Copy link
Contributor Author

EdamAme-x commented Jan 27, 2024

yes.
I have been thinking about this issue...

Hi @EdamAme-x

content-type

If the Content-Type is not what is expected, the size will not be checked. I don't think this is the behavior you expect.

I will change to size. Thanks.

I will continue my research on this.

@yusukebe
Copy link
Member

@EdamAme-x @nabeken5 @usualoma

Hey, forks! I've implemented it with another approach and created PR #2103.

Check it!

@usualoma
Copy link
Member

@yusukebe
Wow, I missed raw.body.getReader(). I think this is an awesome idea that might solve the problem.

Here are the issues I have identified at #2103

@usualoma
Copy link
Member

usualoma commented Jan 27, 2024

For example, if it is the case that we only need to limit it to a value as small as 10 MB, I think the current implementation of #2103 will give us the results we are hoping for.

The use of memory consumption became larger at the stage of accepting requests in bun, and even when using #2103, memory consumption tended to be higher than in other runtimes. Whether #2103 improves the situation or not depends on the runtime.

@EdamAme-x EdamAme-x marked this pull request as draft January 28, 2024 01:58
@EdamAme-x
Copy link
Contributor Author

Sorry.
I had pull from the main branch by mistake and it was resolved.

@EdamAme-x
Copy link
Contributor Author

EdamAme-x commented Jan 28, 2024

I refactored this implementation as a pullrequest.

#2103

@usualoma
Copy link
Member

@EdamAme-x @yusukebe

Replacing c.req.raw may be controversial, and the error handling may not be correct since I haven't written any tests, but anyway, I think it is better to implement the size limit when using raw.body.getReader() like #2109

With #2109, any runtime will not consume significantly more memory than if it were not used bodyLimit middleware. (However, in the case of nodejs, honojs/node-server#133 must be merged)

@yusukebe
Copy link
Member

Hi @usualoma

#2109 is awesome! I've merged honojs/node-server#133 and released a new version of @hono/node-server includes the change.

@usualoma
Copy link
Member

Postscript.

I did a little more research and found that bun, deno, wrangler dev, node-server, or any environment, did not read data longer than the Content-Length specified.

const app = new Hono()
app.post('/', async (c) => c.text('hello: ' + await c.req.raw.text()))

For this script, for bun, deno, and wrangler dev, the following is true.

$ curl -X POST -d 'test' -H 'Content-Length: 2' -i localhost:$port/
HTTP/1.1 200 OK
Content-Length: 9
Content-Type: text/plain;charset=UTF-8

hello: te

On node-server I get "400 Bad Request".

Given this behavior, we would prefer to rely on Content-Length when it is present.

fdc2a43

@yusukebe
Copy link
Member

Given this behavior, we would prefer to rely on Content-Length when it is present.

I think this is good.

@usualoma
Copy link
Member

usualoma commented Feb 4, 2024

I think the items this middleware needs are as follows

  • No overhead on normal, non-malicious requests.
    • If there is overhead on normal requests, no one uses this middleware in the first place
  • Non-malicious, over-limit requests can be rejected immediately
  • Non-malicious, slightly special (e.g., chunked) requests may have some overhead
    • But not to the point of using several times more memory
  • We should make sure that a malicious request does not use more memory than a normal request

Also, we rarely want to limit the size per content type, and we think it is fine to limit the size to the entire target request. We think it is important to be able to use the system easily without having to specify difficult details.

I think the middleware shown in the following commit hash satisfies these conditions.
fdc2a43

@EdamAme-x is the one who pulled this middleware discussion to this point, so if we can continue with this PR, that is better, but if not, I think it is better to use #2103 and put @EdamAme-x as co-author.

@EdamAme-x
Copy link
Contributor Author

thanks.
I agree with you.

@yusukebe
Copy link
Member

yusukebe commented Feb 5, 2024

@usualoma

Thanks for the helpful comment!

I, too, would like @EdamAme-x to complete this PR.

However, it will be difficult to get it in time for the v4 release on Feb. 9, so let's do it after that. No need to rush.

@yusukebe yusukebe deleted the branch honojs:v4 February 9, 2024 05:53
@yusukebe yusukebe closed this Feb 9, 2024
@yusukebe
Copy link
Member

yusukebe commented Feb 25, 2024

Hi, folks!

I'm planning to merge #2103 into the "next" branch for the new release v4.1.0. I'll add @EdamAme-x as the co-author.

And thinking of adding #2109.

@usualoma

Replacing c.req.raw may be controversial, and the error handling may not be correct since I haven't written any tests, but anyway, I think it is better to implement the size limit when using raw.body.getReader() like #2109

One thing that concerns me about this. But I don't think c.req.raw will be a problem if it passes the proper tests.

So, can you please finish #2109?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants