
sql/kv: add guidance on large statements #7524

Closed
awoods187 opened this issue Jun 17, 2020 · 7 comments · Fixed by #10568
Labels
C-doc-improvement P-1 High priority; must be done this release

Comments

@awoods187 (Contributor) commented Jun 17, 2020

We discuss this briefly in #4409, and I recently filed an issue to improve the message in cockroachdb/cockroach#50330, but we should add guidance on large transactions. We have recently had customers tripped up by this limitation (cc @kai-niemi).

I've pulled out the relevant conversation [from Slack](https://cockroachlabs.slack.com/archives/CHVV403F0/p1590406019235800) below, which shows input from @bdarnell and @knz.

We currently have a 16 MiB limit due to our implementation of the Postgres network protocol. It's a magic number we could change, but we've mainly discussed that in the context of lowering it, not raising it. Write amplification is the main thing that prevents us from excelling at workloads with a lot of large blobs. There is also increased memory usage, and especially increased variance in memory usage, that can lead to instability. And if rows containing large blobs are not insert-only, you run the risk of hitting the maximum range size with MVCC history (less of an issue now with the larger default range size).

We can provide guidance that using placeholders will help, since that way you only get charged for the size of the value without the encoding necessary to fit it into a SQL statement.
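To make the encoding-overhead point concrete, here is a small sketch. The `blobs` table and the hex-literal form are illustrative assumptions; the idea is that inlining a binary value into the SQL text roughly doubles its size, while a bound parameter can travel as raw bytes.

```python
# Sketch: why a placeholder reduces statement size for a binary value.
# Assumption: the inlined form hex-encodes the blob ('\x...'), roughly
# doubling its size before it counts against the 16 MiB statement limit.
blob = bytes(range(256)) * 4096  # 1 MiB of sample data

# Inlined as a SQL literal, the blob is hex-encoded into the query text.
inlined = "INSERT INTO blobs (data) VALUES ('\\x" + blob.hex() + "')"

# With a placeholder, only the raw value is sent alongside the query text.
parameterized = ("INSERT INTO blobs (data) VALUES ($1)", blob)

assert len(inlined) > 2 * len(blob)  # the literal form is over 2x the raw size
```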

If large blobs are important to your application, I'd generally recommend storing them in S3 or similar and only storing references to them in your RDBMS.

See also:

@awoods187 awoods187 added A-kv-storage C-doc-improvement P-1 High priority; must be done this release labels Jun 17, 2020
@bdarnell (Contributor)

This slack thread was about large statements, not large transactions (#4409 was about transaction size).

@bdarnell bdarnell changed the title sql/kv: add guidance on large transactions sql/kv: add guidance on large statements Jun 17, 2020
@awoods187 (Contributor, Author)

Ah thanks for clarifying that!

@ericharmeling ericharmeling self-assigned this Jul 15, 2020
@ericharmeling ericharmeling added P-2 Normal priority; secondary task and removed P-1 High priority; must be done this release labels Jul 15, 2020
@awoods187 awoods187 added P-1 High priority; must be done this release and removed P-2 Normal priority; secondary task labels Aug 17, 2020
@awoods187 (Contributor, Author)

Bumping this up in priority. It's worth noting that cockroachdb/cockroach#50330 will be worked on soon.

@ericharmeling ericharmeling added P-2 Normal priority; secondary task and removed P-1 High priority; must be done this release labels Oct 26, 2020
@awoods187 (Contributor, Author)

I think we should improve this messaging and update it this release.

@awoods187 awoods187 added P-1 High priority; must be done this release and removed P-2 Normal priority; secondary task labels Jan 7, 2021
rmloveland added a commit that referenced this issue Apr 20, 2021
@ericharmeling (Contributor)

I'd like to add this to our list of known limitations (as I document limitations for the 21.1 release), but I'm not sure we should go into so much detail on write amplification/blob storage support in the known limitation description. We should keep it practical.

I'm thinking something like:

### Size limits on statement input from SQL clients

CockroachDB imposes a hard limit of 16 MiB on the data input for a single statement passed from a client (including the SQL shell).

As a workaround, we recommend that you do the following:

- Use parameterized statements, with placeholders instead of literal values.
- For large blob input, store the blobs somewhere your client can access them (using a cloud storage service, for example), and then reference their addresses from your statement.
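A minimal sketch of the blob-reference workaround. Here `sqlite3` and a temp directory stand in for CockroachDB and a cloud storage bucket, and all table and helper names are illustrative assumptions:

```python
# Sketch: keep the blob outside the database and store only a reference.
# sqlite3 and a temp directory stand in for CockroachDB and a blob store.
import hashlib
import pathlib
import sqlite3
import tempfile

store = pathlib.Path(tempfile.mkdtemp())  # stand-in for a storage bucket
db = sqlite3.connect(":memory:")          # stand-in for the database
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, blob_ref TEXT)")

def put_blob(data: bytes) -> str:
    """Upload the blob and return its address (content-addressed here)."""
    ref = hashlib.sha256(data).hexdigest()
    (store / ref).write_bytes(data)
    return ref

blob = b"a large payload " * 100000  # well past the statement-size comfort zone
ref = put_blob(blob)
# The statement only carries a short reference, not the blob itself.
db.execute("INSERT INTO docs (blob_ref) VALUES (?)", (ref,))

# Reading it back: fetch the reference, then the blob from storage.
(stored_ref,) = db.execute("SELECT blob_ref FROM docs").fetchone()
assert (store / stored_ref).read_bytes() == blob
```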

@bdarnell Am I oversimplifying this limitation and the recommendations?

Hopefully with that description, we can close this out.

@rmloveland I'll lean on Rich to develop a more complete resolution for documenting write amplification (#6433).

@bdarnell (Contributor)

I don't think parameterized statements help enough to be worth calling out. (It's at best a factor of 2 for the encoding overhead, right? An 8 MiB blob is not going to be a good experience even if parameterization makes the difference in whether it fits under the 16 MiB limit.)

I think the guidance is basically "don't do that" (if anything, you're undersimplifying :)). I'd put the note about cloud storage for blobs on the blob type page instead of tying it to the statement size limit, because blobs are inadvisable in CRDB well before you hit that limit; we recommend no more than 1 MiB per value.

@ericharmeling (Contributor)

SGTM. Thanks Ben!
