Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: panic: runtime error: index out of range #39540

Closed
maddyblue opened this issue Aug 11, 2019 · 5 comments · Fixed by #40335
Closed

sql: panic: runtime error: index out of range #39540

maddyblue opened this issue Aug 11, 2019 · 5 comments · Fixed by #40335
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sqlsmith S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@maddyblue
Copy link
Contributor

CREATE TABLE IF NOT EXISTS tab_orig AS
	SELECT
		g % 2 = 0 AS _bool, g::DECIMAL AS _decimal
	FROM
		generate_series(0, 5) AS g;

SET vectorize = experimental_on;

SELECT
	tab_426212._decimal - tab_426216._decimal
FROM
	tab_orig AS tab_426212,
	tab_orig AS tab_426214,
	tab_orig
	RIGHT JOIN tab_orig AS tab_426216 ON true
ORDER BY
	tab_426214._bool ASC;
panic: runtime error: index out of range [recovered]
	panic: panic while executing 1 statements: SELECT _._ - _._ FROM _ AS _, _ AS _, _ RIGHT JOIN _ AS _ ON _ ORDER BY _._ ASC; caused by runtime error: index out of range

goroutine 554 [running]:
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).closeWrapper(0xc001877b00, 0x4599760, 0xc00191c700, 0x38922e0, 0x6656f00)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:727 +0x330
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1(0xc001877b00, 0x4599760, 0xc00191c700)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:440 +0x61
panic(0x38922e0, 0x6656f00)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
math/big.nat.itoa(0xc000398b20, 0x1, 0x1, 0xc0004c2a00, 0xa, 0x0, 0xc0003660a0, 0x2)
	/usr/local/go/src/math/big/natconv.go:332 +0x562
math/big.(*Int).Text(...)
	/usr/local/go/src/math/big/intconv.go:24
math/big.(*Int).String(...)
	/usr/local/go/src/math/big/intconv.go:37
github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd.(*Decimal).Append(0xc00110a000, 0xc0003660c0, 0x1, 0xa, 0x47, 0x0, 0x2, 0xa)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd/format.go:60 +0x1d3
github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd.(*Decimal).Text(...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd/format.go:30
github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd.(*Decimal).String(...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd/format.go:36
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*DDecimal).Format(0xc00110a000, 0xc001924240)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/datum.go:1012 +0xcd
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*FmtCtx).formatNodeOrHideConstants(0xc001924240, 0x45253a0, 0xc00110a000)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/hide_constants.go:43 +0xab
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*FmtCtx).FormatNode(0xc001924240, 0x45253a0, 0xc00110a000)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/format.go:345 +0x95
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*writeBuffer).writeLengthPrefixedDatum(0xc001a087c0, 0x45fa120, 0xc00110a000)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/write_buffer.go:121 +0x78
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*writeBuffer).writeTextDatum(0xc001a087c0, 0x4599820, 0xc0016c8b40, 0x45fa120, 0xc00110a000, 0x6679a60, 0x0, 0x2)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/types.go:123 +0xee0
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).bufferRow(0xc001a08000, 0x4599820, 0xc0016c8b40, 0xc00162c000, 0x1, 0x1, 0x0, 0x0, 0x0, 0x6679a60, ...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:1095 +0x315
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*commandResult).AddRow(0xc001a08668, 0x4599820, 0xc0016c8b40, 0xc00162c000, 0x1, 0x1, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/command_result.go:179 +0x1b4
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLReceiver).Push(0xc0015d6000, 0xc0017e2ae0, 0x1, 0x1, 0x0, 0xc000000000)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:672 +0xa67
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.Run(0x4599760, 0xc001596880, 0x45ac120, 0xc000b8f100, 0x45742a0, 0xc0015d6000)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/base.go:174 +0x5e
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*ProcessorBase).Run(0xc000b8f100, 0x4599760, 0xc001596880)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:793 +0x92
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*Flow).Run(0xc001628a20, 0x4599820, 0xc0017e2480, 0x3e55f80, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:584 +0x1ec
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc0002883c0, 0xc001599620, 0xc000c505a0, 0xc001aac870, 0xc0015d6000, 0xc001877f28, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:333 +0x381
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun(0xc0002883c0, 0x4599820, 0xc0016c8b40, 0xc001877f28, 0xc001599620, 0xc000c505a0, 0x459b5e0, 0xc0016287e0, 0xc0015d6000, 0xc001877cd8)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:942 +0x20a
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine(0xc001877b00, 0x4599820, 0xc0016c8b40, 0xc001877e50, 0x3, 0x7f1b0bd7d4c8, 0xc001a08668, 0xc001aaca01, 0x0, 0x0, ...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:812 +0x370
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc001877b00, 0x4599820, 0xc0016c8b40, 0xc001877e50, 0x7f1b0bd7d4c8, 0xc001a08668, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:700 +0x75b
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc001877b00, 0x4599820, 0xc0016c8b40, 0x45a5a20, 0xc00102fe00, 0xc0017a7533, 0xbc, 0x0, 0x4, 0x0, ...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:416 +0xb64
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc001877b00, 0x4599820, 0xc0016c8b40, 0x45a5a20, 0xc00102fe00, 0xc0017a7533, 0xbc, 0x0, 0x4, 0x0, ...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:98 +0x4ec
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd(0xc001877b00, 0x4599760, 0xc00102fe40, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1211 +0x1aba
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc001877b00, 0x4599760, 0xc00191c700, 0xc000500e80, 0x5400, 0x15000, 0xc000500f18, 0xc0018ee6f0, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1140 +0x1a3
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn(0xc000832d80, 0x4599760, 0xc00191c700, 0xc001877b00, 0x5400, 0x15000, 0xc000500f18, 0xc0018ee6f0, 0x0, 0x0)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:442 +0xce
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1(0xc00186120f, 0xc0018e94e0, 0x4599760, 0xc00191c700, 0xc0018ee6f0, 0xc000832d80, 0xc001a08000, 0x45a3fa0, 0xc0018e94c0, 0xc001924300, ...)
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:584 +0x21c
created by github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync
	/home/mjibson/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:513 +0x17b
@maddyblue maddyblue added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sqlsmith labels Aug 11, 2019
@yuzefovich yuzefovich self-assigned this Aug 15, 2019
@yuzefovich
Copy link
Member

This panic is very weird. I've tried a few things to pin-point the cause to no success.

Some observations:

  • if we reduce the size of tab_orig to less than 6 rows, then the query succeeds. I believe number 6 here is important because the query "raises the cardinality of the table into the fourth power", and 6 is the smallest integer i such that i^4 > 1024. I increased coldata.BatchSize to 10240, and the query succeeds.
  • if we remove the ORDER BY clause, the query succeeds
  • if we modify the projection so that either there is no subtraction of decimals happens or one of the sides is constant, the query succeeds.

@jordanlewis jordanlewis added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Aug 19, 2019
@jordanlewis
Copy link
Member

So the root cause panic here is actually that we're trying to Stringify an apd.Decimal that has odd contents. The decimal looks like this:

d = {*github.com/cockroachdb/cockroach/pkg/sql/sem/tree.DDecimal | 0xc0015b8600} 
 Decimal = {github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd.Decimal} 
  Form = {github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/apd.Form} Finite (0)
  Negative = {bool} true
  Exponent = {int32} 0
  Coeff = {math/big.Int} 
   neg = {bool} false
   abs = {math/big.nat} len:1, cap:1
    0 = {math/big.Word} 0

I believe a big.nat that is merely a single 0 is invalid.

This all begs the question - how did we manage to get ourselves into a situation where we have an invalid big.nat? Since big.nat is a fully private struct, I don't think any of us have control over what goes into it. Unless we (apd.Decimal or exec) are misusing the API somehow?

In particular, I see one big difference between how non-vectorized uses apd.Decimal and how vectorized does. Here's the implementation of Sub for non-vectorized:

				l := &left.(*DDecimal).Decimal
				r := &right.(*DDecimal).Decimal
				dd := &DDecimal{}
				_, err := ExactCtx.Sub(&dd.Decimal, l, r)
				return dd, err

And here's the vectorized equivalent:

			arg1 := col1[int(i)]
			arg2 := col2[int(i)]
			if _, err := tree.DecimalCtx.Sub(&projCol[i], &arg1, &arg2); err != nil {
				execerror.NonVectorizedPanic(err)
			}

Note that in non-vectorized we always allocate a fresh decimal. In vectorized we hoped to not have to do that - and as a result we pass a "garbage" value (whatever used to be in projCol[i], which could be a zero value apd.Decimal or some random other apd.Decimal) as the first argument to tree.DecimalCtx.Sub.

I thought that this was valid because of the godoc of Sub:

// Sub sets d to the difference x-y.
func (c *Context) Sub(d, x, y *Decimal) (Condition, error) {
	return c.add(d, x, y, true)
}

Is this actually invalid? Are you not allowed to pass a non-zero value as the first argument of Sub? Should vectorized be clearing the result of projections before running the projection in the case of decimals?

@mjibson, I believe you might have an idea of the expected contract here?

@jordanlewis
Copy link
Member

As a datapoint, if I change the vectorized implementation to zero it, there's no panic.

			arg1 := col1[i]
			arg2 := col2[i]
			projCol[i] = apd.Decimal{}
			if _, err := tree.DecimalCtx.Sub(&projCol[i], &arg1, &arg2); err != nil {
				execerror.NonVectorizedPanic(err)
			}

@jordanlewis
Copy link
Member

jordanlewis commented Aug 20, 2019

Ah, I think the problem here is actually our implementation of copying decimals from one vector to another. We perform a shallow copy, which is something that is explicitly banned by big.Int and by proxy apd.Decimal. Instead, it looks like we have to call Set on the destination with the source as the argument.

@maddyblue
Copy link
Contributor Author

Good find. Also, you should be using the same apd context (ExactCtx) instead of DecimalCtx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sqlsmith S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants