roachtest: schemachange/random-load failed [relation "[1285]" does not exist] #129857

Closed
cockroach-teamcity opened this issue Aug 29, 2024 · 5 comments
Assignees
Labels
branch-release-24.2.1-rc C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Aug 29, 2024

roachtest.schemachange/random-load failed with artifacts on release-24.2.1-rc @ 41ff1f4fcbf6cd934b49c24a47a98be1f909e3e3:

(schemachange_random_load.go:123).runSchemaChangeRandomLoad: full command output in run_112042.985480706_n1_workload-run-schemac.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/schemachange/random-load/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-41738

@cockroach-teamcity cockroach-teamcity added branch-release-24.2.1-rc C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Aug 29, 2024
@rafiss rafiss changed the title roachtest: schemachange/random-load failed roachtest: schemachange/random-load failed [relation "[1285]" does not exist] Aug 30, 2024
@rafiss rafiss removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 30, 2024
@rafiss
Collaborator

rafiss commented Aug 30, 2024

E240829 11:29:21.633261 1 workload/cli/run.go:591  [-] 3  ***UNEXPECTED ERROR; Failed to generate a random operation: ERROR: relation "[1285]" does not exist (SQLSTATE 42P01)

similar to #129187

@fqazi
Collaborator

fqazi commented Sep 4, 2024

The scenario is 3 schema change operations in a single txn:

I240829 11:29:18.707951 4043 sql/table.go:122 ⋮ [T1,Vsystem,n1,client=10.142.2.26:45390,hostssl,user=‹roachprod›] 13542  queued new database schema change job 998967078774177793 for database 104
I240829 11:29:19.058891 4043 sql/table.go:210 ⋮ [T1,Vsystem,n1,client=10.142.2.26:45390,hostssl,user=‹roachprod›] 13543  queued new schema-change job 998967079924137985 for table 1197, mutation 2
I240829 11:29:19.188120 4043 sql/table.go:210 ⋮ [T1,Vsystem,n1,client=10.142.2.26:45390,hostssl,user=‹roachprod›] 13544  queued new schema-change job 998967080347598849 for table 1285, mutation 0

The first can be ignored; the other two add a foreign key and drop a table within a single transaction. I think that's how we hit this error. Let's see if we can reproduce it manually; otherwise we'll wait to see whether it recurs.
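
One way to spot this pattern in the node logs is to group the "queued new ... job" entries by client connection. The sketch below is illustrative only: the regular expression is an assumption derived from the three log lines above, `queuedJob` and `groupQueuedJobs` are hypothetical names, and a shared client address only suggests, rather than proves, that the jobs came from one transaction.

```go
package main

import (
	"fmt"
	"regexp"
)

// queuedJob describes one "queued new ... job" log entry.
// (Hypothetical type, used only for this sketch.)
type queuedJob struct {
	Client string // client address, e.g. 10.142.2.26:45390
	JobID  string // schema change job ID
	Kind   string // "database" or "table"
	DescID string // descriptor ID the job targets
}

// queuedJobRE is an assumption based on the log lines quoted in this issue.
var queuedJobRE = regexp.MustCompile(
	`client=([0-9.]+:[0-9]+).*queued new (?:database schema change|schema-change) job ([0-9]+) for (database|table) ([0-9]+)`)

// groupQueuedJobs returns the queued schema change jobs keyed by client address.
// Lines that do not match the pattern are skipped.
func groupQueuedJobs(lines []string) map[string][]queuedJob {
	byClient := make(map[string][]queuedJob)
	for _, line := range lines {
		m := queuedJobRE.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		job := queuedJob{Client: m[1], JobID: m[2], Kind: m[3], DescID: m[4]}
		byClient[job.Client] = append(byClient[job.Client], job)
	}
	return byClient
}

func main() {
	lines := []string{
		`I240829 11:29:18.707951 4043 sql/table.go:122 ⋮ [T1,Vsystem,n1,client=10.142.2.26:45390,hostssl,user=‹roachprod›] 13542  queued new database schema change job 998967078774177793 for database 104`,
		`I240829 11:29:19.058891 4043 sql/table.go:210 ⋮ [T1,Vsystem,n1,client=10.142.2.26:45390,hostssl,user=‹roachprod›] 13543  queued new schema-change job 998967079924137985 for table 1197, mutation 2`,
		`I240829 11:29:19.188120 4043 sql/table.go:210 ⋮ [T1,Vsystem,n1,client=10.142.2.26:45390,hostssl,user=‹roachprod›] 13544  queued new schema-change job 998967080347598849 for table 1285, mutation 0`,
	}
	for client, jobs := range groupQueuedJobs(lines) {
		fmt.Printf("%s queued %d jobs\n", client, len(jobs))
		for _, j := range jobs {
			fmt.Printf("  job %s -> %s %s\n", j.JobID, j.Kind, j.DescID)
		}
	}
}
```

A client with several table-level jobs queued within about a second of each other is a candidate for the multi-statement transaction described above.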

@annrpom
Contributor

annrpom commented Sep 9, 2024

maybe this?

[email protected]:26257/movr> create table t (i int);                           
CREATE TABLE

Time: 7ms total (execution 5ms / network 2ms)

[email protected]:26257/movr> create table k(j int unique);                     
CREATE TABLE

Time: 5ms total (execution 4ms / network 0ms)

[email protected]:26257/movr> begin;                                            
BEGIN

Time: 0ms total (execution 1ms / network 0ms)

[email protected]:26257/movr  OPEN> alter table t add constraint fk_i foreign   
                              -> key (i) references k (j);                   
ALTER TABLE

Time: 8ms total (execution 8ms / network 0ms)

[email protected]:26257/movr  OPEN> drop table k;                               
DROP TABLE

Time: 3ms total (execution 3ms / network 0ms)

[email protected]:26257/movr  OPEN> commit;                                     
*
* ERROR: Queued as error a240fe3d90e04cea9686244a2a223abc
*
ERROR: transaction committed but schema change aborted with error: (XX000): relation "t" (112): referenced table "k" (113) is dropped
SQLSTATE: XXA00
HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

--
Some of the non-DDL statements may have committed successfully, but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
--
See: https://go.crdb.dev/issue-v/42061/v24.2

@annrpom
Contributor

annrpom commented Sep 9, 2024

Actually ignore that; it looks like it was the other way around (drop table, add FK constraint)

1285/crdb_internal.jobs.txt:998967079924137985	SCHEMA CHANGE	"ALTER TABLE schemachange.schema_w16_230.table_w4_288 ADD CONSTRAINT \""table_w13_262_col26\r2_w13_268_table_w4_288_col288_w4_291_fk\"" FOREIGN KEY (col288_w4_291) REFERENCES schemachange.schema_w10_273.table_w13_262 (\""col26\r2_w13_268\"") ON DELETE CASCADE ON UPDATE CASCADE"		roachprod	{1197}	failed	NULL	2024-08-29 11:29:15.74048+00	2024-08-29 11:29:19.756021+00	2024-08-29 11:29:22.853533+00	2024-08-29 11:29:21.550038+00	0	NULL	"relation ""table_w4_288"" (1197): referenced table ""table_w13_262"" (1285) is dropped"	NULL	1750563244413838811	2024-08-29 11:29:20.590119+00	2024-08-29 11:29:50.590119+00	1	NULL	NULL

So while the workload was generating the statement above, the table we wanted to reference, table_w13_262 (1285), was dropped, and the following query that we were building failed with our relation "[1285]" does not exist error:

1285/nodes/1/crdb_internal.node_execution_insights.txt:17f02f196e36c8520000000000000001	dda6748b-b2c4-4c22-8125-5471f56de0bd	\xb64b14b31b778d4f	17f02f918b105d4e0000000000000001	\x8858c9901424f376	FailedExecution	{}	SELECT table_schema, table_name, column_name, crdb_sql_type, is_nullable FROM (SELECT table_schema, table_name, column_name, crdb_sql_type, is_nullable, contype, conkey FROM (SELECT table_schema, table_name, column_name, crdb_sql_type, is_nullable, ordinal_position, concat(table_schema, _, table_name)::REGCLASS::INT8 AS tableid FROM information_schema.columns WHERE column_name != _) AS cols JOIN (SELECT contype, conkey, conrelid FROM pg_catalog.pg_constraint) AS cons ON cons.conrelid = cols.tableid WHERE (((table_name SIMILAR TO _) AND ((contype = _) OR (contype = _))) AND (array_length(conkey, _) = _)) AND (conkey[_] = ordinal_position) ORDER BY random() LIMIT _)	Failed	2024-08-29 11:29:19.432994	2024-08-29 11:29:21.498564	f	roachprod	schemachange	schemachange	AgGM/v//nxoAANyCgICAgCAAAAADBw4B4Pz//58aAACQghAAAAADBwQJAAICAAAHDBgFCgYK	0	0	normal	0	NULL	{}	{}	NULL	{}	42P01	"relation ‹""[1285]""› does not exist"

from:

// randParentColumnForFkRelation fetches a column and table to use as the parent in a single-column foreign key relation.
// To successfully use a column as the parent, the column must be unique and must not be generated.
func (og *operationGenerator) randParentColumnForFkRelation(
	ctx context.Context, tx pgx.Tx, unique bool,
) (*tree.TableName, *column, error) {
	if err := og.setSeedInDB(ctx, tx); err != nil {
		return nil, nil, err
	}
	subQuery := strings.Builder{}
	subQuery.WriteString(`
		SELECT table_schema, table_name, column_name, crdb_sql_type, is_nullable, contype, conkey
		FROM (
			SELECT table_schema, table_name, column_name, crdb_sql_type, is_nullable, ordinal_position,
				concat(table_schema, '.', table_name)::REGCLASS::INT8 AS tableid
			FROM information_schema.columns
			WHERE column_name <> 'rowid'
		) AS cols
		JOIN (
			SELECT contype, conkey, conrelid
			FROM pg_catalog.pg_constraint
		) AS cons ON cons.conrelid = cols.tableid
		WHERE table_name SIMILAR TO 'table_w[0-9]_+%'
	`)

@rafiss
Collaborator

rafiss commented Oct 2, 2024

closing since 24.2.1 has been released

@rafiss rafiss closed this as completed Oct 2, 2024