sentry: (0) fetcher.go:1336: Non-nullable column "%s:%s" with no value! Index scanned was %q with the index key columns (%s) and the values (%s) | string; string; string; string; string (see stack traces in additional data) #38577

Closed
cockroach-teamcity opened this issue Jun 30, 2019 · 13 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.

Comments

@cockroach-teamcity
Member

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/1091971862/?referrer=webhooks_plugin

Panic message:

(0) fetcher.go:1336: Non-nullable column "%s:%s" with no value! Index scanned was %q with the index key columns (%s) and the values (%s) | string; string; string; string; string
(see stack traces in additional data)

Stacktrace (expand for inline code snippets):

}
err := pgerror.NewAssertionErrorf(
"Non-nullable column \"%s:%s\" with no value! Index scanned was %q with the index key columns (%s) and the values (%s)",
in pkg/sql/row.(*Fetcher).finalizeRow
if rowDone {
err := rf.finalizeRow()
return rf.rowReadyTable.row, rf.rowReadyTable.desc.TableDesc(), rf.rowReadyTable.index, err
in pkg/sql/row.(*Fetcher).NextRow
) {
row, table, index, err := rf.NextRow(ctx)
if err != nil {
in pkg/sql/row.(*Fetcher).NextRowDecoded
for i := int64(0); i < chunkSize; i++ {
datums, _, _, err := cb.fetcher.NextRowDecoded(ctx)
if err != nil {
in pkg/sql/backfill.(*ColumnBackfiller).RunColumnBackfillChunk
var err error
key, err = cb.RunColumnBackfillChunk(
ctx,
in pkg/sql/distsqlrun.(*columnBackfiller).runChunk.func1
err := txn.exec(ctx, func(ctx context.Context, txn *Txn) error {
return retryable(ctx, txn)
})
in pkg/internal/client.(*DB).Txn.func1
}
err = fn(ctx, txn)
in pkg/internal/client.(*Txn).exec
txn.SetDebugName("unnamed")
err := txn.exec(ctx, func(ctx context.Context, txn *Txn) error {
return retryable(ctx, txn)
in pkg/internal/client.(*DB).Txn
var key roachpb.Key
err := cb.flowCtx.ClientDB.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
if cb.flowCtx.testingKnobs.RunBeforeBackfillChunk != nil {
in pkg/sql/distsqlrun.(*columnBackfiller).runChunk
var err error
sp.Key, err = b.chunks.runChunk(ctx, mutations, sp, chunkSize, b.spec.ReadAsOf)
if err != nil {
in pkg/sql/distsqlrun.(*backfiller).mainLoop
if err := b.mainLoop(ctx); err != nil {
b.output.Push(nil /* row */, &ProducerMetadata{Err: err})
in pkg/sql/distsqlrun.(*backfiller).Run
go func(i int) {
f.processors[i].Run(ctx)
f.waitGroup.Done()
in pkg/sql/distsqlrun.(*Flow).startInternal.func1

pkg/sql/row/fetcher.go in pkg/sql/row.(*Fetcher).finalizeRow at line 1336
pkg/sql/row/fetcher.go in pkg/sql/row.(*Fetcher).NextRow at line 1078
pkg/sql/row/fetcher.go in pkg/sql/row.(*Fetcher).NextRowDecoded at line 1097
pkg/sql/backfill/backfill.go in pkg/sql/backfill.(*ColumnBackfiller).RunColumnBackfillChunk at line 219
pkg/sql/distsqlrun/columnbackfiller.go in pkg/sql/distsqlrun.(*columnBackfiller).runChunk.func1 at line 102
pkg/internal/client/db.go in pkg/internal/client.(*DB).Txn.func1 at line 598
pkg/internal/client/txn.go in pkg/internal/client.(*Txn).exec at line 688
pkg/internal/client/db.go in pkg/internal/client.(*DB).Txn at line 597
pkg/sql/distsqlrun/columnbackfiller.go in pkg/sql/distsqlrun.(*columnBackfiller).runChunk at line 90
pkg/sql/distsqlrun/backfiller.go in pkg/sql/distsqlrun.(*backfiller).mainLoop at line 142
pkg/sql/distsqlrun/backfiller.go in pkg/sql/distsqlrun.(*backfiller).Run at line 86
pkg/sql/distsqlrun/flow.go in pkg/sql/distsqlrun.(*Flow).startInternal.func1 at line 566
Tag                  Value
Cockroach Release    v19.1.2
Cockroach SHA        cbd571c
Platform             linux amd64
Distribution         CCL
Environment          v19.1.2
Command              server
Go Version           go1.11.6
# of CPUs            12
# of Goroutines      284
@cockroach-teamcity cockroach-teamcity added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Jun 30, 2019
@thoszhang
Contributor

This is similar to #38292 (which originates from samplerProcessor, but the error is the same). It's from a 19.1.2 cluster, so unrelated to the SET NOT NULL change.

Unless I'm missing something, it's not even clear that this has anything to do with the column backfiller, since the column backfiller was just reading the existing rows in the primary key span when it found a null in an existing non-nullable column. I don't think we have enough information to take any action.
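For readers unfamiliar with where this error comes from, here is a minimal sketch of the kind of check the stack trace points at. It is not the actual `finalizeRow` code from `pkg/sql/row/fetcher.go`. The `column` type, the pointer-as-NULL convention, and the function signature are all assumptions made for illustration. The point is only that the reader (backfiller, table reader, or sampler) surfaces the missing value and does not have to be the one that caused it.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for a column descriptor.
type column struct {
	Name     string
	Nullable bool
}

// finalizeRow is an illustrative check run once a row has been fully decoded.
// A nil entry in row means no value was found for that column in the scanned
// index. A missing value is only legal when the column is nullable.
func finalizeRow(cols []column, row []*string, index string) error {
	if len(cols) != len(row) {
		return fmt.Errorf("row has %d values, expected %d", len(row), len(cols))
	}
	for i, c := range cols {
		if row[i] == nil && !c.Nullable {
			return fmt.Errorf(
				"non-nullable column %q with no value; index scanned was %q",
				c.Name, index)
		}
	}
	return nil
}

func main() {
	// Both columns are NOT NULL (Nullable defaults to false).
	cols := []column{{Name: "id"}, {Name: "name"}}
	id := "123"
	// The "name" value is absent from the primary index entry, so the
	// check fails regardless of which processor is doing the read.
	fmt.Println(finalizeRow(cols, []*string{&id, nil}, "primary"))
}
```

Under these assumptions, any scan of the affected span would hit the same error, which is consistent with the backfiller only being the first thing to read the bad row.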

@ghost

ghost commented Sep 11, 2019

I can confirm it (v19.1.4):

pq: internal error: Non-nullable column "name" with no value! Index scanned was "primary" with the index key columns (id) and the values ()
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:1336: in finalizeRow()
github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:1078: in NextRow()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/tablereader.go:165: in Next()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/tablereader.go:294: in Next()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/base.go:174: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:801: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:566: in func1()

@jordanlewis
Member

@Dendon, do you have any schema changes running? Is this a reproducible problem? Can you show us the query that was running that caused this issue, and the schema of the tables it was interacting with? Thank you!

@ghost

ghost commented Sep 11, 2019

I recently added 3 nodes which are still getting populated, but no schema changes running.
Well, it happened on the production system, so if you are asking whether I can get the error message to appear again, then yes: every query that relies on the primary key now fails on every table.

SELECT * FROM "table" WHERE "name" = 'name'; -> fails
while
SELECT * FROM "table"@"some_index_other_than_primary" WHERE "name" = 'name'; -> works

This represents the same structure, though is of course stripped down and renamed.

CREATE TABLE "db"."table"
(
  "id" SERIAL PRIMARY KEY
);

ALTER TABLE "db"."table" ADD COLUMN "name" STRING NOT NULL DEFAULT '' CREATE IF NOT EXISTS FAMILY "name";

@thoszhang
Contributor

thoszhang commented Sep 11, 2019

This seems to confirm that something else is causing nulls to be written to the primary key index, and that both the column backfiller and the select statements (as well as #38292) are just causing the problem to surface.

I don't have any ideas about this bug itself, but maybe we should start tracking these occurrences in an issue for SQL execution (since this probably isn't a Bulk IO bug).

@jordanlewis
Member

@Dendon so you're saying this cluster has been around for a long time but only just started seeing the problems after you started adding nodes?

@ghost

ghost commented Sep 11, 2019

Correct.
The cluster has been running as is for about 7 months without a node added or removed (just updated), and the problem started to arise a few hours after the nodes were added.

@jordanlewis
Member

Is the problem localized to a particular primary key? As a workaround, you should be able to alter the table to remove the not null constraint, delete and re-add the key. Would you be able to send us a debug.zip of your cluster?

@nvanbenschoten
Member

@Dendon is this a problem with any query on the primary key or just one specific row?

@ghost

ghost commented Sep 11, 2019

@jordanlewis It seems to spread based on which tables have already been rebalanced and which haven't been yet.
Wait, what key? You mean the indexes? Like I said, if any index is used other than primary, it's fine.
It's currently not really a problem, since every query run by the software also defines what index to use.

I created a debug.zip to see what it contains and already noticed quite a few things going wrong with the balancing for some reason, so it might be wise to hold off on sending it until the balancing is over and see which problems stick around.

@nvanbenschoten every query.

@ghost

ghost commented Sep 12, 2019

Not sure if it's related, but I just tried to add an index and it failed with the message:

pq: 13000743 entries, expected 13000742 violates unique constraint "name_id"

When checking the affected table, it seems that there is one row which is completely NULLed except for the id column.
Also interesting: "violates unique constraint". This was meant to be a normal index, not a constraint.

I also went and dropped the NOT NULL constraint, so running queries on primary works again.

EDIT: I checked a backup file, and the id of the NULLed row was indeed used before.
EDIT 2: It's getting stranger. When checking the table for a NULL column, the id mentioned before shows as NULLed, however when checking the table exactly for the id, the row is fine and not NULLed at all.

SELECT * FROM db.table WHERE name IS NULL;

          id         | name
+--------------------+------+
  123                | NULL

SELECT * FROM db.table WHERE id = 123;

          id         | name
+--------------------+-------+
  123                | test

@jordanlewis
Member

That last observation is caused by the index being inconsistent, which is the same problem as earlier. You could drop the index and rebuild it, or delete and re-add the key.

This being said it's certainly concerning. Would you be comfortable sharing data dumps with us? If not, a debug.zip would be useful.

@irfansharif
Contributor

I realize this was left open for the SQL team to address, but this is almost certainly fixed by #42833. So I'm going to go ahead and close it (it'll still be searchable should anyone run into the same).

6 participants