Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSS] CouchDB Request Size Limits #1200

Closed
wants to merge 1 commit into from
Closed

Conversation

janl
Copy link
Member

@janl janl commented Mar 3, 2018

Note: the text below is written in a style that would allow it to be included in the CouchDB 2.2.0 documentation and/or release notes.

CouchDB Request Size Limits

There are multiple configuration variables for CouchDB that determine request size limits. This document explains the configuration variables, how they work together, and why they exist in the first place

Why Limit Requests by Size

Allowing requests of unlimited size to any network server is a [denial of service vector](wikipedia: denial of service). To allow safe operation of CouchDB, even on a network with hostile third parties, various request size limits exist.

The Request Size Limits

max_http_request_size: the maximum number bytes a request to a CouchDB server can have.

max_document_size: the maximum number of bytes for a JSON document written to CouchDB.

max_attachment_size: the maximum number of bytes for any one attachment written to CouchDB.

Background

There are three distinct ways of getting data into CouchDB:

  1. The standard JSON Document API, which uses plain JSON, if binary data is involved, it has to be encoded as base64. The base64 option only exists for legacy reasons and it is not recommended to be used.
  2. The standalone attachment API, which allows transferring of binary attachment data without encoding as base64.
  3. The multipart HTTP API: it allows the mix of JSON data and binary attachment data without encoding as base64. The CouchDB replicator uses this.

In version 2.1, CouchDB started enforcing a 64MB limit for max_http_request_size on all requests, but did not apply this to the standalone attachment API.

This had the unfortunate side effect that one could create a doc that is smaller than max_http_request_size with an attachment that is bigger than max_http_request_size. In addition, one could create a doc with two or more attachments that were each smaller than max_http_request_size but together bigger than max_http_request_size. The result in this scenario now is that these documents could no longer be replicated to CouchDB nodes with the same default configuration (or even to the same node).

Regardless to say, this is a very unfortunate user experience: create a number of documents with attachments, and at some not immediately obvious point, replications start failing.

Large Documents and Attachments

While CouchDB works reasonably well with almost any sort of JSON data sizes and attachment sizes. The development team makes recommendations as to the various limits for ideal and optimal uses. CouchDB users may vary from these recommendations, but will need to be okay with the resulting operational implications, like increased CPU & RAM usage as well as increased latency for many core operations.

Before CouchDB 2.1.0 there were no real limits imposed, and before CouchDB 2.2.0 the available limits weren’t applied uniformly, leading to surprising behaviour as for example outlined above.

CouchDB 2.2.0 and later aims to have a complete set of limits that avoids any unexpected behaviour, but the limits imposed won’t be set by default in order to preserve backwards compatibility. Starting with

CouchDB 3.0.0 the recommended limits will be set by default and users migrating from earlier versions of CouchDB need to adjust them, if their use-case requires it. The CouchDB team might produce a utility script that would allow to determine the required settings from an existing CouchDB installation, if resources can be made available for this.

Starting with CouchDB 2.2.0, the CouchDB distribution will come with an additional configuration file local.ini-recommended* with the developer-recommended defaults and explanations for what happens when these defaults are exceeded.

An alternative solution could avoid using the max_attachments_per_doc and reject attachment additions based on the existing doc + attachments size plus the new attachment size, but this PR/Discussion suggests that having another config value with sensible defaults here will nudge users into doing the right thing

Limits by Version

In order to account for all use-cases and the interplay of the different APIs, CouchDB 2.2.0 introduces a new limit max_attachments_per_document. This allows the application of a formula to show the interplay of all limits:

max_http_request_size = max_document_size + multipart HTTP boundary data
                        + max_attachments_per_doc
                        * (max_attachment_size + multipart HTTP boundary data)

Using this formula, any doc update (JSON or attachments) can check whether it would exceed max_http_request_size which would cause replication to fail.

CouchDB Version max_http_request_size max_document_size max_attachment_size max_attachments_per_document*
2.0.0 and earlier unlimited 4GB N/a N/a
2.1.0 64MB 4GB Unlimited N/a
2.2.0* 64MB 4GB Unlimited Unlimited
3.0.0* 64MB 4MB 6MB 10
  • Proposed names and values

The table shows the approximate sizes (sans HTTP multipart boundaries) for all limits. CouchDB versions earlier than 3.0.0 will still encounter the behaviour of not being able to replicate documents that have attachments that alone or together exceed max_http_request_size.

Implementation

This draft implementation introduces the new max_attachments_per_document to show how it could work. Tests will need to be added to validate that all three API routes are covered (casual review suggests they are, but we do, of course need tests). I stopped short of adding tests so we can discuss the details of this suggestion first.

To 2.2.0 or not to 2.2.0

Since we started on the 2.2.0 milestone, this might be too big a thing to discuss and finish. I’d be very okay with bumping this to 2.3.0 as long as we document the behaviour in the 2.2.0 release notes.

@wohali
Copy link
Member

wohali commented Mar 3, 2018

I haven't read everything yet, but you are failing make check:

    couch_doc_tests: doc_from_multi_part_stream_test...*failed*
in function ets:lookup/2
  called as lookup(config,{"couchdb","max_attachments_per_document"})
in call from config_meck_original:get/3 (src/config.erl, line 148)
in call from config:get/3
  called as get("couchdb","max_attachments_per_document","infinity")
in call from couch_att:max_attachment_count/0 (src/couch_att.erl, line 723)
in call from couch_att:validate_attachment_count/1 (src/couch_att.erl, line 732)
in call from couch_doc:from_json_obj_validate/2 (src/couch_doc.erl, line 138)
in call from couch_doc:doc_from_multi_part_stream/4 (src/couch_doc.erl, line 466)
in call from couch_doc_tests:doc_from_multi_part_stream_test/0 (test/couch_doc_tests.erl, line 34)
**error:badarg
  output:<<"">>
    couch_doc_tests: doc_to_multi_part_stream_test...ok
    couch_doc_tests: len_doc_to_multi_part_stream_test...ok
    undefined
    *** context setup failed ***
**in function meck_proc:start/2 (src/meck_proc.erl, line 96)
  called as start(config,[passthrough])
in call from couch_doc_tests:mock_config/0 (test/couch_doc_tests.erl, line 138)
in call from couch_doc_tests:'-validate_docid_test_/0-fun-34-'/0 (test/couch_doc_tests.erl, line 80)
**error:{already_started,<0.28375.0>}
  [done in 0.156 s]
[done in 0.156 s]

@wohali
Copy link
Member

wohali commented Mar 3, 2018

OK, I've had a read through. Looks good, though I have some comments.

In 2.2.0, shouldn't we raise the default max_http_request_size or reduce the default max_document_size so the default configuration doesn't end up with docs too big to replicate?

We should support infinity as a valid value for all of these max_* settings. We already do support infinity for max_attachment_size`.

We should enforce max_attachment_* = 0 on the node-local port. We can either ignore, mirror, or hard-code the other limits on the node-local port. (And we should keep moving towards removing the node-local port entirely for 3.0.)

I am +1 on the idea of setting max_attachments_per_doc = 0 to disable attachments, and documenting this as a feature. We can even include text about it improving CouchDB performance, and point people at recommended external solutions for binary management if desired, especially if this is a pattern we are actively trying to discourage.

I am -0 on local.ini-recommended, as I feel our defaults should already be the recommendations, for any release. If those settings can't yet be the default (as they would be in 2.2.0 vs. 3.0.0) then we should stick them in local.ini and just comment them out. People are already used to un-commenting lines in local.ini for things they want, let's continue with that pattern.

The only way I'd think local.ini-* files would make sense is if we wanted to start shipping different settings for different standard configurations: standalone node, 3-node cluster, db-per-user setups, etc. and that's still a bit of a bandaid. (The Debian packaging attempts to hide this type of configuration from the user; why not remove the smarts from the package and stick them right in the product instead? This is well beyond the scope of this PR.)

As a side-note, I'd like to move towards any (and all?) default.ini settings to be commented out, meaning the file is effectively empty for a new install and acts almost as documentation. We see in the field plenty of people who keep their old default.ini files around - this is especially a problem with the older Docker containers - so it'd be nice to eliminate that problem. I know this is challenging for our handler definitions right now, but at least numerical settings like these max_* settings could be resolved.

Copy link
Member

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reads well and you cover all the bases. we should obviously discuss if this is the plan, but it's a good write up of a proposed direction.

@janl
Copy link
Member Author

janl commented Mar 3, 2018

make check

The code is just meant to demonstrate how this could look. This PR is mainly meant for discussion

.ini-*

Sorry for bringing this up, I'll open another issue to discuss, it's orthogonal to this discussion.

@janl
Copy link
Member Author

janl commented Mar 5, 2018

Summing up, here are the things we need to decide:

  1. What is the desired behaviour for CouchDB?
    1. Replication can fail if sum(attachments.size) > max_http_request_size (2.1.0 and later). We document this and move on.
    2. Things fail randomly plus a DOS vector, because, if at all, we have a 4GB limit for documents sizes (CouchDB 2.0.0 and earlier).
    3. A document that can be written to CouchDB can also be replicated to another CouchDB instance with the same configuration (including itself) AND there are request/document/attachment size limits in place that make it easy to operate CouchDB on the internet.

If iii., there are two options (so far, please suggest more ;):

  1. As outlined in this issue, use a combination of max_document_size, max_attachment_size and max_attachments_per_doc to stay under max_http_request_size to ensure smooth replication.
  2. alternatively: on document/attachment write, check against a max_http_request_size quota and reject doc updates, attachment updates, and attachment additions that would take a document over max_http_request_size.

  1. Should we decide on iii, what is the baseline for the new behaviour with regards to semantic versioning?
    1. We start enforcing new limits in the next minor version release of CouchDB (2.2.0 or 2.3.0 depending on how fast we can resolve this). This would mean breaking backwards compatibility with 2.0.0 and earlier releases, including all 1.x releases. If we do this, the release notes should be very upfront about this.
    2. Same as i., but we do this for the next major version of CouchDB (3.0.0), BC-breaks are of less concern here, although the upgrade notes should be extensive.

@nickva
Copy link
Contributor

nickva commented Mar 5, 2018

Good write-up @janl

max_http_request_size is a quick and dirty way of specifying limits but not very accurate and has loopholes in it still. Because we implemented other limits, it's value has decreased. Moreover, because we fixed how it checks its limit and fixed some bugs there, its default value started breaking users' code.

In general, if the cost of checking was the same, I'd think most users would rather pick the more specific limits (doc size, att size, max num of atts etc) than thinking of those limits and then applying some formula to calculate a max http request size.

Maybe max_http_request_size should be adjusted to an upper bound of max_doc_size + max attachment_size * max_num_attachment_per_doc expected by users. They are using attachments larger than 64Mb and they seem to work. We even fixed a few issue which prevented uploading of large attachments from timing out, so now chances are users would this limit more often. So maybe at least 500Mb (about 10x increase) or even 1G? Users which worry about DOS would have to apply limits as they see fit for their environment.

Replying to the summary specifically:

1.i & ii

From a users' perspective I think 1.i is buggy and is just as random of failure 1.ii. Since when replicating the only indication of a failed write is a failed doc write count bump that the majority of users don't know about, the breakage is quite insidious. It might lead to invalid backups and it might take years before users notice the missing data their backups. So I'd pick 1.ii as better than 1.i from this point of view. A DOS is terrible but at least immediately apparent. Attachments mysteriously disappearing during replication is only to be discovered much later later is a more serious issue. I can imagine already the "My db ate my data blog posts".

1.iii

I like the max_attachments_per_doc idea. So given a max doc size, max attachment size, max num of attachment / doc we could almost automatically generate a max http request size value for mp doc puts, attachment PUTs and single document updates.

There is one more place were we'd need a limit to completely constrain the http request size based on the more precise limits - _bulk_docs requests. We'd have to restrict the max number of docs posted there. Then given max doc size + max num of docs / _bulk_docs request we can automatically calculate a reasonable upper bound on request sizes to most "update/modify/create" APIs.

So I think I like 1.iii and it seems 1.iii.2 is similar to the idea of automatically deducing a max http request size from the other limits and rejecting it update based on it. But we' instead let users specify the more precise limits instead of using http max request size as the primary constraint.

(One exception I guess if users somehow have a broken proxy or some middleware that cannot cope with large requests and they need to specifically adjust the http request size).

2

I think we should increase max http request size, at least for 2.2.0 and document how users can apply limits to avoid DOS attacks and how max request size, max doc size and max attachment sizes are related. This would unbreak customers with 64Mb attachments. Maybe add the max_attachments_per_doc and max_docs_per_bulk_doc_request though these could be in 2.3.0 perhaps.

Some more random notes:

Another way to handle some of the issues is to teach the replicator to bisect attachment PUTs just like it bisect _bulk_docs when it receives a 413 response:

handle_flush_docs_result({error, request_body_too_large}, Target, DocList) ->
Len = length(DocList),
{DocList1, DocList2} = lists:split(Len div 2, DocList),
couch_log:notice("Replicator: couldn't write batch of size ~p to ~p because"
" request body is too large. Splitting batch into 2 separate batches of"
" sizes ~p and ~p", [Len, couch_replicator_api_wrap:db_uri(Target),
length(DocList1), length(DocList2)]),
flush_docs(Target, DocList1),
flush_docs(Target, DocList2);
that would be a pain to write for attachment mp parser streamer code though...

Also it is good to keep in mind that replicator could be running on a 3rd cluster (not target or source necessarily) and it would need to handle older or other alternative CouchDB implementations. In that respect it has to "auto-discover" setting by probing, bisecting and guessing.

In the table above, technically in <2.0.0 we didn't have max document size, but only max http request size. The setting was called max document size but was just bad name and as soon as we had a proper max document size we renamed it.

@elistevens
Copy link

I have a concrete use case that I don't think is well-served by iii.1:

  • I have some documents with many small attachments
  • I have some documents with one very large attachment

My understanding is that due to this, I'd have to choose between having an unrealistically large max_http_request_size or having a sane max_http_request_size but remain in the situation where docs+attachments could be written but not replicated (once usage patterns changed).

I would much prefer iii.2, where max_http_request_size becomes effectively max_doc_plus_attachments_size and at that point I don't care if there are attachment size or count limits because I can feel confident that no matter what mix of doc and attachments, the size will stay sane.

@elistevens
Copy link

I'd like to add that if iii.1 moves forward, I'd like to have some information about what impact setting max_http_request_size to large values would have. Things like:

  • Will setting the value to something large have a performance or behavior impact if all of the actual data flowing through the system remains small?
  • Are there corresponding timeouts that might need to be increased to accommodate larger data sizes moving through the system?
  • Can the value be set to something larger than physical RAM without swapping?

Etc.

Copy link
Member

@wohali wohali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @nickva 's comment on this issue in general. Thanks for the good analysis.

Autocalculating max_http_request_size from the other variables rather than having to set it explicitly would be nice. In that sense, 1.iii.2 meets POLA very nicely. Big +1 here.

I like Nick's idea on _bulk_docs, but we can't re-write older versions of the replicator. What would CouchDB 1.x do? At what point does it insert a break between docs in a _bulk_docs? There would certainly be performance implications if you set Couch to Jan's 3.0.0 defaults, limited _bulk_docs to ~100 docs, and then tried to replicate lots of ~1K documents. I assume that if 1.x hit a limit talking to 2.2.0, we'd just send a 413 back and it'd try again...but I'd like to test that explicitly before moving forward.

And what about other PUT/POST endpoints that can take potentially large bodies, such as POST /{db}/_design/{ddoc}/_view/{view} and the similar new endpoints introduced in a recent PR for databases? If we drop max_http_request_size, couldn't these blow up hugely? Sure we could add a max_multi_requests parameter, but the number of settings is starting to climb....

I almost feel like we could keep max_http_request_size just for these latter two cases, IF we document that it only applies to these endpoints, AND keep that documentation up to date for any additional endpoints of this type. We could rename it max_bulk_request_size so it's clearer that it applies only to bulk operations.

I see pluses and minuses to both of the above ideas (more max_* parameters vs. max_bulk_request_size). I have a slight preference for more max_* parameters, since that moves us towards a cardinality view that is more deterministic.

@janl
Copy link
Member Author

janl commented Mar 6, 2018

@elistevens thanks, this convinces me that max_attachments_per_doc is not a good idea and the proposed alternative is the way to go.

@wohali
Copy link
Member

wohali commented Mar 20, 2018

@janl What are we planning to do here for 2.2.0? This is a hot topic that needs resolution before we call an RC.

janl added a commit to janl/couchdb that referenced this pull request Mar 29, 2018
The validation path is now the following:

If a new doc body is > max_document_size, we throw an error.

If a new attachment is > max_attachment_size, we throw an error.

If the new doc body in combination with new and/or existing
attachments is > max_attachment_size, we throw an error.

This also sets the max_document_size to 2 GB, to restore 1.x and
2.0.x compatibility.

Closes apache#1200
janl added a commit to janl/couchdb that referenced this pull request Mar 29, 2018
The validation path is now the following:

If a new doc body is > max_document_size, we throw an error.

If a new attachment is > max_attachment_size, we throw an error.

If the new doc body in combination with new and/or existing
attachments is > max_http_request_size, we throw an error.

This also sets the max_document_size to 2 GB, to restore 1.x and
2.0.x compatibility.

Closes apache#1200
@janl
Copy link
Member Author

janl commented Mar 29, 2018

Thanks for the great discussion here. Closing this out in favour of the new proposed approach: #1253

nickva added a commit to cloudant/couchdb that referenced this pull request Jul 13, 2018
Use the already computed (conservative) body size.

Switch multipart length calcuation to accept body and boundary sizes.

Issue apache#1200
Issue apache#1253
nickva added a commit to cloudant/couchdb that referenced this pull request Jul 13, 2018
Use the already computed (conservative) body size.

Switch multipart length calcuation to accept body and boundary sizes.

Issue apache#1200
Issue apache#1253
@wohali wohali deleted the feat/max-attachment-count branch October 21, 2020 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants