[DISCUSS] Validate new document writes against max_http_request_size #1253
Conversation
The validation path is now the following: If a new doc body is > max_document_size, we throw an error. If a new attachment is > max_attachment_size, we throw an error. If the new doc body in combination with new and/or existing attachments is > max_http_request_size, we throw an error. This also sets the max_document_size to 2 GB, to restore 1.x and 2.0.x compatibility. Closes apache#1200
@@ -708,6 +709,9 @@ upgrade(#att{} = Att) ->
upgrade(Att) ->
    Att.

to_tuple(#att{name=Name, att_len=Len, type=Type, encoding=Encoding}) ->
not keen on the name. a record already is a tuple, so I'm guessing the only purpose here is to ignore other fields (and any new fields).
The only reason this exists is because couch_httpd_multipart:length_multipart_stream() makes assumptions about the format of the stub. This is usually called from chttpd. I didn’t find a way around this without exposing couch_att’s #att{} record.
But yeah, I’m very okay with any other name here. What would be better?
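For illustration, a plausible completion of the to_tuple/1 head from the diff above is sketched below. The actual body is not part of this hunk, and the exact tuple shape the multipart length calculation expects is an assumption here; the point is simply that the record is projected onto a plain tuple, dropping all other (and any future) fields.

```erlang
%% Guessed sketch, not the code from this PR: project the #att{} record onto
%% the plain tuple shape assumed by the multipart length calculation.
to_tuple(#att{name = Name, att_len = Len, type = Type, encoding = Encoding}) ->
    {Name, Len, Type, Encoding}.
```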
[16:26:10] <+jan____> rnewson: do you have a better name for to_tuple in https://github.com/apache/couchdb/pull/1253/files#r178029434?
[16:27:14] <+rnewson> I don't, proceed with to_tuple.
The Travis failures here point to https://github.com/apache/couchdb/blob/master/test/javascript/tests/attachments.js#L300-L301, where we allow attachment stubs to be written verbatim and without length info. There is code to resolve this, but it requires reading the attachment info from disk. I’m not yet implementing this because I want a review of the approach here first. Could be perf-prohibitive on the write path, tho.
Note that whatever we do here - especially if this PR is not merged into 2.2.0 - needs to be documented for the 2.2.0 release, in light of the concerns raised in #1304.
src/couch/src/couch_doc.erl (Outdated)
Boundary = couch_uuids:random(), % mock boundary, is only used for the length
Atts = lists:map(fun couch_att:to_tuple/1, Atts0),
{_, DocSum} = couch_httpd_multipart:length_multipart_stream(Boundary,
    ?JSON_ENCODE(Body), Atts),
When we re-encode, we use jiffy, but unless it is a request made by the replicator, the user probably didn't use Erlang to encode the data, so we could get a different value. There is also some performance loss in, say, re-encoding larger document bodies back to JSON just to check their size.
There is no canonical JSON encoding, and so no canonical encoded JSON size. Above we are calculating the encoded size using a conservative estimate (giving the user the benefit of the doubt) and with better performance (https://github.com/nickva/couch_ebench). Maybe make a version of the length calculation that takes sizes, and then we'd pass the already computed couch_ejson_size:encoded_size(Doc#doc.body) and 32 for the boundary size.
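As a rough illustration of that suggestion, here is a sketch of a size-based length calculation. This is not CouchDB's couch_httpd_multipart code: the function name and the PART_OVERHEAD constant standing in for boundary lines and part headers are assumptions made for the example.

```erlang
%% Sketch only: estimate the multipart stream length from already-known sizes
%% instead of re-encoding the body. PART_OVERHEAD is an assumed constant that
%% stands in for the per-part boundary line and part headers.
-define(PART_OVERHEAD, 128).

length_multipart_stream_from_sizes(BoundarySize, BodySize, AttLens) ->
    BodyPart = BoundarySize + ?PART_OVERHEAD + BodySize,
    AttParts = lists:sum([BoundarySize + ?PART_OVERHEAD + Len || Len <- AttLens]),
    Closing = BoundarySize + 4,  %% trailing "--" ++ Boundary ++ "--"
    BodyPart + AttParts + Closing.
```

The caller would then pass the already computed couch_ejson_size:encoded_size(Doc#doc.body) for BodySize and 32 for the boundary size, as suggested above.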
Thanks @nickva, I’ll give this a ponder!
I propose the following:
I don’t see this being finished any time soon, and since the end-result is functionally equivalent (sans the opt-in) for 2.2.0, this should not block 2.2.0.
@janl I'll try to make the function. It is close enough to the other one. Otherwise, I think we can keep it.
Use the already computed (conservative) body size. Switch multipart length calculation to accept body and boundary sizes. Issue apache#1200 Issue apache#1253
Better total (body + attachments) size checking for documents
When we talk about 64 MB, 2 GB etc., are we talking about default values that can be raised by the user, or hard limits that the end user cannot change? For example, if 64 MB is selected for 3.0, would individual users still be able to set 4 GB and more?

As far as we are concerned, we enjoy CouchDB for the ability to use it as a file server / streaming server and keep all the data in one place to ensure database consistency (compared to keeping references to third-party storage services such as S3, and then keeping everything in sync, ensuring links are not dead, etc. CouchDB is a black box neatly integrated with the application layer, thus greatly simplifying maintenance and system administration). Filtered replication is particularly nice in this regard, since it becomes extremely easy to create clusters of multimedia attachments to balance load. For example, one cluster may only contain replicated videos, and the application layer uses command-query separation to route the user to the right cluster depending on the type of query being performed. All of this is transparent from the application's perspective; the various addresses of clusters handling specific query types simply need to be configured during deployment, and further load balancing can be done at the DNS level.

CouchDB hugely simplifies the infrastructure work. The management of large multimedia clusters becomes a breeze with continuous filtered replication, and consumption by application users is very efficient thanks to CouchDB's support for HTTP range requests. It would be interesting to retain the ability to have large document / attachment / request sizes for those who know what they are doing.
@janl This is very stale. Any plans to get this in? I am preparing to mass-close old PRs that never got merged.
@bessbd I think what Jan proposed in #1253 (comment) is basically done. We've pushed the default limit in 3.x pretty low, and as you know 4.0 changes all the rules. I think it is probably safe to close this out, but I'd like to see @janl +1 that.
Is there any update on this issue?
this is very stale. |
This supersedes #1200.
New Behaviour
This variant introduces no new config variable and no formula; instead, there is a set of three hurdles each doc write has to pass. The validation path is now the following (see the sketch after the list):
- If a new doc body is > max_document_size, we throw an error.
- If a new attachment is > max_attachment_size, we throw an error.
- If the new doc body in combination with new and/or existing attachments is > max_http_request_size, we throw an error.
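As a rough illustration of these three hurdles, here is a minimal sketch. The function and error-tuple names are assumptions made for the example, and the real patch derives the combined size from the multipart stream length rather than a plain sum.

```erlang
%% Sketch only: the three size hurdles described above, with assumed names.
validate_sizes(BodySize, AttLens, MaxDocSize, MaxAttSize, MaxReqSize) ->
    case BodySize > MaxDocSize of
        true -> throw({request_entity_too_large, document});
        false -> ok
    end,
    case lists:any(fun(Len) -> Len > MaxAttSize end, AttLens) of
        true -> throw({request_entity_too_large, attachment});
        false -> ok
    end,
    case BodySize + lists:sum(AttLens) > MaxReqSize of
        true -> throw({request_entity_too_large, request});
        false -> ok
    end,
    ok.
```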
Notes
This is again just a sketch to show how something like this could look. The patch is fairly minimal, but it does include a full additional ?JSON_ENCODE of the doc body, and some munging of the attachment stubs, that I’d like to get a performance review for. I’m sure we can make this fast if we need to, but that would require a larger patch, so it’s this sketch for now.
Compatibility
This also sets the max_document_size to 2 GB, to restore 1.x and 2.0.x compatibility as per #1200 (comment)
I’d suggest we make this a BC change in 3.0 to the suggested 64 MB or whatever we feel is appropriate then.
Formalities