
Command is too large on insert #37764

Closed
JorritSalverda opened this issue May 23, 2019 · 7 comments

@JorritSalverda

Describe the problem

Since recently I started seeing errors like the following:

pq: command is too large: 123076554 bytes (max: 67108864)

when running the following query:

INSERT INTO
	build_logs
(
	repo_source,
	repo_owner,
	repo_name,
	repo_branch,
	repo_revision,
	steps
)
VALUES
(
	$1,
	$2,
	$3,
	$4,
	$5,
	$6
)

The only field that could potentially be too large is the steps field of type JSONB, but for the specific error above mentioning 123076554 bytes, the steps byte array was actually only 909502 bytes long.
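
To double-check that on the server side, a rough query like the following lists the largest stored documents (measuring the serialized JSONB as a string, which is only an approximation of what the command size counter actually measures):

SELECT id, length(steps::STRING) AS approx_steps_bytes
FROM build_logs
ORDER BY approx_steps_bytes DESC
LIMIT 10;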

The table it tries to insert into is created with:

CREATE TABLE build_logs (
	id INT NOT NULL DEFAULT unique_rowid(),
	repo_source STRING(256) NULL,
	repo_owner STRING(256) NULL,
	repo_name STRING(256) NULL,
	repo_branch STRING(256) NULL,
	repo_revision STRING(256) NULL,
	steps JSONB NULL,
	inserted_at TIMESTAMPTZ NULL DEFAULT now(),
	build_id INT NULL,
	CONSTRAINT "primary" PRIMARY KEY (id ASC),
	INVERTED INDEX build_logs_v2_steps (steps),
	INDEX build_logs_build_id_idx (build_id ASC),
	FAMILY "primary" (id, repo_source, repo_owner, repo_name, repo_branch, repo_revision, steps, inserted_at, build_id)
)

And currently has the following stats:

TABLE NAME   SIZE       RANGES   # OF COLUMNS   # OF INDICES
build_logs   21.2 GiB   787      9              3

I run a statefulset on Kubernetes Engine, with 5 nodes using the following resources:

        resources:
          limits:
            cpu: "4"
            memory: 12Gi
          requests:
            cpu: "1"
            memory: 12Gi

and 10Gi pd-ssd persistent volumes per pod.

I'm now running v19.1.1, but it started happening earlier, perhaps when the table filled up to a certain size, or when I upgraded to v19.1.0.

Environment:

  • CockroachDB version: v19.1.1
  • Server OS: docker image cockroachdb/cockroach:v19.1.1
  • Client app: golang 1.12.5, using library github.com/lib/pq v1.0.0
  • Kubernetes Engine: 1.12.7-gke.7, node type n1-highmem-8

Additional context
What was the impact?

Some of our build logs are missing due to this error.

@JorritSalverda
Author

I've seen issues #25233, #19304 and #25828, all mentioning the same type of failure. From those I understood it's an internal limit being hit, caused not so much by the size of my inserted data as by how the data is structured in CockroachDB. Judging from the comments in those issues the problem seemed to be under control, but apparently I still run into it in the latest and greatest.
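
For reference, the 67108864-byte maximum in the error matches the 64 MiB default of the kv.raft.command.max_size cluster setting, which bounds how large a single write command may be. A sketch of inspecting and raising it (raising it only masks the amplification rather than removing it):

SHOW CLUSTER SETTING kv.raft.command.max_size;
-- 128 MiB; a stopgap at best, since the real fix is reducing how much a
-- single INSERT has to write.
SET CLUSTER SETTING kv.raft.command.max_size = '128MiB';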

ricardocrdb self-assigned this May 29, 2019
@ricardocrdb

Hey @JorritSalverda

Hopefully we can shine a spotlight on what is causing this issue for you. After discussing it with one of the engineers here, we need a bit more background info: how many paths or components are there in the JSONB column? The inverted index is created and updated with an entry for each of those components, so a large document can quickly add up to an oversized command due to write amplification.

Let me know how it's set up and we can gauge how to proceed from there.
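
As a rough illustration of that amplification (a toy table, not your schema; inverted_demo is just a made-up name):

-- Every leaf value in a JSON document gets its own inverted index entry,
-- so one INSERT carries one index write per leaf on top of the row itself.
CREATE TABLE inverted_demo (
    id SERIAL PRIMARY KEY,
    doc JSONB,
    INVERTED INDEX doc_idx (doc)
);

INSERT INTO inverted_demo (doc) VALUES
    ('[{"step": "prefetch", "logLines": [{"line": 1, "text": "a"}, {"line": 2, "text": "b"}]}]');
-- Even this small document produces several index entries (step, plus line
-- and text for each logLines element). A build log with hundreds of
-- thousands of log lines turns a ~1 MB JSON value into a write command far
-- larger than the document itself.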

Cheers,
Ricardo

@ricardocrdb

Hey @JorritSalverda

Did you have the information regarding the JSONB components? Let me know if you have any other questions as well.

Cheers,
Ricardo

@JorritSalverda
Author

I didn't realise I had the inverted index on this column. Let me first find out whether I actually use it; if not, I can drop it to see if that resolves the issue.

The json in the steps column looks like this:

[
  {
     "step":"prefetch",
     "image":{
        "name":"extensions/prefetch",
        "tag":"stable",
        "isPulled":false,
        "imageSize":6904626,
        "pullDuration":2290292505,
        "isTrusted":true
     },
     "duration":17948577050,
     "logLines":[
        {
           "line":1,
           "timestamp":"2019-06-06T15:32:38.795066025Z",
           "streamType":"stdout",
           "text":"Starting estafette-extension-prefetch version 0.1.15...\n"
        },

        ...
     ],
     "exitCode":0,
     "status":"SUCCEEDED",
     "autoInjected":true
  },
  {
     "step":"git-clone",
     "image":{
        "name":"extensions/git-clone",
        "tag":"stable",
        "isPulled":true,
        "imageSize":26406304,
        "pullDuration":0,
        "isTrusted":true
     },
     "duration":10522259135,
     "logLines":[
        {
           "line":1,
           "timestamp":"2019-06-06T15:32:58.932115055Z",
           "streamType":"stdout",
           "text":"Starting estafette-extension-git-clone version 0.1.58...\n"
        },
        ...
     ],
     "exitCode":0,
     "status":"SUCCEEDED",
     "autoInjected":true
  },
  ...
]

The logLines property in particular can become very large, depending on how verbosely the build steps we run are logging.
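
A quick way to spot the worst offenders (jsonb_array_length only counts the top-level steps, but the inverted index also gets an entry for every field of every element of every logLines array inside them):

SELECT id, jsonb_array_length(steps) AS num_steps
FROM build_logs
ORDER BY num_steps DESC
LIMIT 10;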

@JorritSalverda
Author

I've dropped the inverted index and so far the errors haven't resurfaced.

If you want to do more investigation I'm happy to provide an export, but otherwise we can close the issue.
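
For anyone hitting the same thing, the drop itself is a one-liner (index name as in the CREATE TABLE above):

DROP INDEX build_logs@build_logs_v2_steps;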

@ricardocrdb

Hey @JorritSalverda

I think we found the cause of the issue, and I'm happy to hear that everything is working as it should. I will go ahead and close this issue. Don't hesitate to reach out in case of any other issues.

Cheers,
Ricardo

@wecall1

wecall1 commented Sep 21, 2023

I'm seeing this again

ERROR:  command is too large: 174059870 bytes (max: 67108864)

with this naive concat attempt:

INSERT INTO recordings (data, recording_id)
VALUES (
    (SELECT string_agg(c, '') AS data
     FROM (
         SELECT data AS c  -- , timestamp
         FROM chunks
         WHERE recording_id = 'anton'
         ORDER BY timestamp ASC
     ) d),
    'anton'
);

where each chunks.data value is ~94 KB of bytea.

P.S. Should I even mention that Postgres does not choke on this, even on Supabase's minimal configuration?
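
One way around the limit here would be to skip the materialized row entirely and aggregate at read time (the same subquery as above, just as a plain SELECT), so no single write command has to carry the full ~174 MB value:

SELECT string_agg(c, '') AS data
FROM (
    SELECT data AS c
    FROM chunks
    WHERE recording_id = 'anton'
    ORDER BY timestamp ASC
) d;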
