Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: table creation freeze on version 2.0-beta #23718

Closed
thibaultmeyer opened this issue Mar 12, 2018 · 11 comments
Closed

sql: table creation freeze on version 2.0-beta #23718

thibaultmeyer opened this issue Mar 12, 2018 · 11 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@thibaultmeyer
Copy link

Description
I'm using Ebean ORM with Ebean migration (Java / JDBC). Everything works on CockroachDB 1.1.5 but with 2.0-beta, the process hang (freeze) on the creation of the table "db_migration".

During this "freeze", the instruction "SHOW TABLES;" hand too. If I kill the Java application, the currently blocked instruction "SHOW TABLES;" return results.

Ebean create the table with this SQL instruction:

CREATE TABLE db_migration (
    id INTEGER NOT NULL,
    mtype VARCHAR(1) NOT NULL,
    mstatus VARCHAR(10) NOT NULL,
    mversion VARCHAR(150) NOT NULL,
    mcomment VARCHAR(150) NOT NULL,
    mchecksum INTEGER NOT NULL,
    run_on TIMESTAMP NOT NULL,
    run_by VARCHAR(30) NOT NULL,
    run_time INTEGER NOT NULL,

    CONSTRAINT pk_db_migration PRIMARY KEY (id)
);

CockroachDB version

Build Tag:    v2.0-beta.20180305
Build Time:   2018/03/05 15:32:39
Distribution: CCL
Platform:     darwin amd64
Go Version:   go1.9.4
C Compiler:   4.2.1 Compatible Clang 3.8.0 (tags/RELEASE_380/final)
Build SHA-1:  59246c8375fcffeb24b87ff2f40ed81eb84c070d
Build Type:   release

Output from Java application

[info] c.z.h.HikariDataSource - HikariPool-1 - Starting...
[debug] c.z.h.p.HikariPool - HikariPool-1 - Added connection org.postgresql.jdbc.PgConnection@76c5e806
[info] c.z.h.HikariDataSource - HikariPool-1 - Start completed.
[info] p.a.d.DefaultDBApi - Database [default] connected at jdbc:postgresql://127.0.0.1:26257/documentation?autoReconnect=true&ApplicationName=documentation
[info] i.e.c.p.LoadContext - loaded properties from [ebean.properties]
[debug] c.z.h.p.HikariPool - HikariPool-1 - Before cleanup stats (total=1, active=0, idle=1, waiting=0)
[debug] c.z.h.p.HikariPool - HikariPool-1 - After cleanup  stats (total=1, active=0, idle=1, waiting=0)
[debug] c.z.h.p.HikariPool - HikariPool-1 - Added connection org.postgresql.jdbc.PgConnection@2793a2e8
[debug] c.z.h.p.HikariPool - HikariPool-1 - After adding stats (total=2, active=0, idle=2, waiting=0)
[info] i.e.EbeanVersion - ebean version: 11.14.3
[debug] c.z.h.p.ProxyConnection - HikariPool-1 - Executed rollback on connection org.postgresql.jdbc.PgConnection@76c5e806 due to dirty commit state on close().
[debug] i.e.s.d.BeanDescriptorManager - Entities[5]
[debug] i.e.d.DbOffline - reset
[debug] i.e.m.r.LocalMigrationResources - resources: [seeds/default-dev/R__1.sql]
[info] io.ebean.DDL - Executing create migration table - 1 statements
[debug] io.ebean.DDL - executing 1 of 1 create table db_migration ( id integer not null, mtype...
// HANG / FREEZE

Output from CockroachDB

I180312 12:37:00.155872 374 sql/exec_log.go:173  [n1,client=127.0.0.1:51280,user=root] exec "documentation" {} "SELECT version() AS version" {} 0.597 1 ""
I180312 12:37:00.160290 374 sql/exec_log.go:173  [n1,client=127.0.0.1:51280,user=root] exec "documentation" {} "SELECT current_schema()" {} 0.170 1 ""
I180312 12:37:00.167679 374 sql/exec_log.go:173  [n1,client=127.0.0.1:51280,user=root] exec "documentation" {} "SELECT NULL AS table_cat, n.nspname AS table_schem, c.relname AS table_name, CASE (n.nspname ~ '^pg_') OR (n.nspname = 'information_schema') WHEN true THEN CASE WHEN (n.nspname = 'pg_catalog') OR (n.nspname = 'information_schema') THEN CASE c.relkind WHEN 'r' THEN 'SYSTEM TABLE' WHEN 'v' THEN 'SYSTEM VIEW' WHEN 'i' THEN 'SYSTEM INDEX' ELSE NULL END WHEN n.nspname = 'pg_toast' THEN CASE c.relkind WHEN 'r' THEN 'SYSTEM TOAST TABLE' WHEN 'i' THEN 'SYSTEM TOAST INDEX' ELSE NULL END ELSE CASE c.relkind WHEN 'r' THEN 'TEMPORARY TABLE' WHEN 'p' THEN 'TEMPORARY TABLE' WHEN 'i' THEN 'TEMPORARY INDEX' WHEN 'S' THEN 'TEMPORARY SEQUENCE' WHEN 'v' THEN 'TEMPORARY VIEW' ELSE NULL END END WHEN false THEN CASE c.relkind WHEN 'r' THEN 'TABLE' WHEN 'p' THEN 'TABLE' WHEN 'i' THEN 'INDEX' WHEN 'S' THEN 'SEQUENCE' WHEN 'v' THEN 'VIEW' WHEN 'c' THEN 'TYPE' WHEN 'f' THEN 'FOREIGN TABLE' WHEN 'm' THEN 'MATERIALIZED VIEW' ELSE NULL END ELSE NULL END AS table_type, d.description AS remarks FROM pg_catalog.pg_namespace AS n, pg_catalog.pg_class AS c LEFT JOIN pg_catalog.pg_description AS d ON ((c.oid = d.objoid) AND (d.objsubid = 0)) LEFT JOIN pg_catalog.pg_class AS dc ON ((d.classoid = dc.oid) AND (dc.relname = 'pg_class')) LEFT JOIN pg_catalog.pg_namespace AS dn ON ((dn.oid = dc.relnamespace) AND (dn.nspname = 'pg_catalog')) WHERE ((c.relnamespace = n.oid) AND (n.nspname LIKE 'public')) AND (c.relname LIKE 'db_migration') ORDER BY table_type, table_schem, table_name" {} 5.444 0 ""
I180312 12:37:00.180320 374 sql/exec_log.go:173  [n1,client=127.0.0.1:51280,user=root] internal-exec "" {} "INSERT INTO system.public.eventlog(\"timestamp\", \"eventType\", \"targetID\", \"reportingID\", info) VALUES (now(), $1, $2, $3, $4)" {$1:"'create_table'", $2:"69", $3:"1", $4:"'{\"TableName\":\"documentation.public.db_migration\",\"Statement\":\"CREATE TABLE db_migration (id INTEGER NOT NULL, mtype VARCHAR(1) NOT NULL, mstatus VARCHAR(10) NOT NULL, mversion VARCHAR(150) NOT NULL, mcomment VARCHAR(150) NOT NULL, mchecksum INTEGER NOT NULL, run_on TIMESTAMP NOT NULL, run_by VARCHAR(30) NOT NULL, run_time INTEGER NOT NULL, CONSTRAINT pk_db_migration PRIMARY KEY (id))\",\"User\":\"root\"}'"} 2.577 1 ""
I180312 12:37:00.180446 374 sql/exec_log.go:173  [n1,client=127.0.0.1:51280,user=root] exec "documentation" {} "CREATE TABLE db_migration (id INTEGER NOT NULL, mtype VARCHAR(1) NOT NULL, mstatus VARCHAR(10) NOT NULL, mversion VARCHAR(150) NOT NULL, mcomment VARCHAR(150) NOT NULL, mchecksum INTEGER NOT NULL, run_on TIMESTAMP NOT NULL, run_by VARCHAR(30) NOT NULL, run_time INTEGER NOT NULL, CONSTRAINT pk_db_migration PRIMARY KEY (id))" {} 6.210 0 ""
I180312 12:37:00.491223 215 sql/exec_log.go:173  [n1] internal-exec "" {} "SELECT count(version) FROM system.public.lease WHERE ((\"descID\" = $1) AND (version = $2)) AND (expiration > $3)" {$1:"68", $2:"2", $3:"'2018-03-12 12:37:00.490083+00:00'"} 1.086 1 ""
I180312 12:37:09.212135 212 server/status/runtime.go:219  [n1] runtime stats: 68 MiB RSS, 157 goroutines, 74 MiB/31 MiB/122 MiB GO alloc/idle/total, 16 MiB/23 MiB CGO alloc/total, 122.70cgo/sec, 0.01/0.01 %(u/s)time, 0.00 %gc (1x)
I180312 12:37:19.212465 212 server/status/runtime.go:219  [n1] runtime stats: 71 MiB RSS, 157 goroutines, 81 MiB/25 MiB/122 MiB GO alloc/idle/total, 16 MiB/23 MiB CGO alloc/total, 81.20cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I180312 12:37:29.110967 239 sql/exec_log.go:173  [n1,client=127.0.0.1:51284,user=root] exec "" {} "SET extra_float_digits = 3" {} 0.184 0 ""
I180312 12:37:29.111435 239 sql/exec_log.go:173  [n1,client=127.0.0.1:51284,user=root] exec "documentation" {} "SET application_name = 'documentation'" {} 0.119 0 ""
I180312 12:37:29.112396 239 sql/exec_log.go:173  [n1,client=127.0.0.1:51284,user=root] exec "documentation" {} "SET timezone = 'UTC'" {} 0.251 0 ""
I180312 12:37:29.211535 210 gossip/gossip.go:487  [n1] gossip status (ok, 1 node)
gossip client (0/3 cur/max conns)
gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received)
I180312 12:37:29.211642 212 server/status/runtime.go:219  [n1] runtime stats: 74 MiB RSS, 159 goroutines, 87 MiB/19 MiB/122 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 79.71cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I180312 12:37:29.216558 223 sql/exec_log.go:173  internal-exec "" {} "SELECT id, payload FROM system.public.jobs WHERE status IN ($1, $2) ORDER BY created DESC" {$1:"'pending'", $2:"'running'"} 1.775 0 ""
I180312 12:37:39.211863 212 server/status/runtime.go:219  [n1] runtime stats: 79 MiB RSS, 159 goroutines, 94 MiB/13 MiB/122 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 79.30cgo/sec, 0.01/0.01 %(u/s)time, 0.00 %gc (0x)
I180312 12:37:49.212229 212 server/status/runtime.go:219  [n1] runtime stats: 81 MiB RSS, 159 goroutines, 100 MiB/7.1 MiB/122 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 80.70cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I180312 12:37:59.210760 212 server/status/runtime.go:219  [n1] runtime stats: 85 MiB RSS, 159 goroutines, 107 MiB/696 KiB/122 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 74.21cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I180312 12:37:59.217284 223 sql/exec_log.go:173  internal-exec "" {} "SELECT id, payload FROM system.public.jobs WHERE status IN ($1, $2) ORDER BY created DESC" {$1:"'pending'", $2:"'running'"} 1.020 0 ""
I180312 12:38:09.211651 212 server/status/runtime.go:219  [n1] runtime stats: 90 MiB RSS, 159 goroutines, 113 MiB/384 KiB/128 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 83.29cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I180312 12:38:19.211624 212 server/status/runtime.go:219  [n1] runtime stats: 96 MiB RSS, 159 goroutines, 120 MiB/96 KiB/134 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 73.00cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
W180312 12:38:26.914324 374 internal/client/txn.go:531  [n1,client=127.0.0.1:51280,user=root] failure aborting transaction: HandledRetryableTxnError: TransactionAbortedError: txn aborted "sql txn" id=40e2c4ae key=/Table/SystemConfigSpan/Start rw=true pri=0.01057810 iso=SERIALIZABLE stat=PENDING epo=0 ts=1520858220.155526399,0 orig=1520858220.155526399,0 max=1520858220.655526399,0 wto=false rop=false seq=13; abort caused by: context canceled
I180312 12:38:29.210026 210 gossip/gossip.go:487  [n1] gossip status (ok, 1 node)
gossip client (0/3 cur/max conns)
gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received)
I180312 12:38:29.210128 212 server/status/runtime.go:219  [n1] runtime stats: 101 MiB RSS, 152 goroutines, 126 MiB/872 KiB/142 MiB GO alloc/idle/total, 16 MiB/24 MiB CGO alloc/total, 86.61cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I180312 12:38:29.218543 223 sql/exec_log.go:173  internal-exec "" {} "SELECT id, payload FROM system.public.jobs WHERE status IN ($1, $2) ORDER BY created DESC" {$1:"'pending'", $2:"'running'"} 1.440 0 ""
^CI180312 12:38:31.345681 1 cli/start.go:650  received signal 'interrupt'
Note: a second interrupt will skip graceful shutdown and terminate forcefully
I180312 12:38:31.345859 1 cli/start.go:701  initiating graceful shutdown of server
initiating graceful shutdown of server
I180312 12:38:31.348153 474 sql/exec_log.go:173  internal-exec "" {} "DELETE FROM system.public.lease WHERE (\"descID\", version, \"nodeID\", expiration) = ($1, $2, $3, $4)" {$2:"1", $3:"1", $4:"'2018-03-12 12:43:14.26188+00:00'", $1:"56"} 2.080 1 ""
W180312 12:38:31.351204 22 vendor/google.golang.org/grpc/clientconn.go:1277  grpc: addrConn.transportMonitor exits due to: grpc: the connection is closing
I180312 12:38:31.351269 474 storage/engine/rocksdb.go:674  closing rocksdb instance at "/Users/meyer_t/Downloads/cockroach-data/cockroach-temp010543778"
W180312 12:38:31.351393 243 vendor/google.golang.org/grpc/clientconn.go:1158  grpc: addrConn.createTransport failed to connect to {anansi:26257 0  <nil>}. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
W180312 12:38:31.351560 243 vendor/google.golang.org/grpc/clientconn.go:1277  grpc: addrConn.transportMonitor exits due to: context canceled
I180312 12:38:31.352889 474 storage/engine/rocksdb.go:674  closing rocksdb instance at "/Users/meyer_t/Downloads/cockroach-data"
I180312 12:38:31.354248 1 cli/start.go:755  server drained and shutdown completed
server drained and shutdown completed
I180312 12:38:31.354413 1 cli/error.go:109  interrupted
Error: interrupted
Failed running "start"
@vivekmenezes
Copy link
Contributor

@0xBAADF00D I'm unable to reproduce this issue. Are you running it on a fresh cluster? Can you run the CREATE TABLE from the cockroach CLI?

@thibaultmeyer
Copy link
Author

thibaultmeyer commented Mar 12, 2018

Hello,

I'm running on a fresh node, single node, run with --insecure. I work on my dev computer. Tested with Windows and MacOS version of Cockroach.

The issue only appear on version 2.0-beta and only via Ebean Migration. The strange behavior is, if the table "db_migration" already exists, everything works well, all migration scripts are run with success. Only the table creation freeze.

I could create a sample Java project to test the bahavior if it could help you?

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Mar 12, 2018

We know that a CREATE TABLE that is attempting to create a table that was created by another transaction that has yet to commit can hang indefinitely. Something like

func TestTxnWithCreate(t *testing.T) {
	defer leaktest.AfterTest(t)()
	params, _ := tests.CreateTestServerParams()
	s, sqlDB, _ := serverutils.StartServer(t, params)
	defer s.Stopper().Stop(context.TODO())

	if _, err := sqlDB.Exec(`
CREATE DATABASE t;
CREATE TABLE t.kv (k CHAR PRIMARY KEY, v CHAR);
INSERT INTO t.kv VALUES ('a', 'b');
`); err != nil {
		t.Fatal(err)
	}

	tx, err := sqlDB.Begin()
	if err != nil {
		t.Fatal(err)
	}

// Create table but don't commit transaction
	_, _ = tx.Exec(`CREATE TABLE t.timestamp (k CHAR PRIMARY KEY, v CHAR);`)

	if _, err := sqlDB.Exec(`
		CREATE TABLE t.timestamp (k CHAR PRIMARY KEY, v CHAR);
		`); err != nil {
		t.Fatal(err)
	}

	_ = tx.Rollback()
}

And this has nothing to do with version 2.0

@thibaultmeyer
Copy link
Author

it is surely something else, the same code works with the version 1.1.5 but not with 2.0-beta.

@vivekmenezes vivekmenezes added this to the 2.0 milestone Mar 12, 2018
@vivekmenezes vivekmenezes self-assigned this Mar 12, 2018
@rbygrave
Copy link

rbygrave commented Mar 13, 2018

Note that the code (migration runner) in question is using a single JDBC Connection (not with autoCommit), creating the table and then selecting from that table. I suspect it is actually the select that is hanging and not the create table ... will look to confirm that later today (maybe a question of if commit() is needed after the create table and before the select from that table).

@rbygrave
Copy link

Yes, the hang is on the select query that follows the create table.

  • Using JDBC Connection with autoCommit = false
  • create table db_migration ...
  • select ... from db_migration ... ... this hangs here UNLESS there is a commit after the create table db_migration ...

Now in general (well, other databases) we ideally do not want to run DB Migration scripts with autoCommit = true ... and these scripts contain a mix of DDL and DML.

The question I'd have at the moment is what is the expected limitations with respect to transactions and running DDL. Is it required that there is a COMMIT after DDL statements?

I note that we are:

  • Maybe adding a commit after create table db_migration
  • Currently adding a commit after create index ... as otherwise a following alter table add foreign key fails

It seems that any DML executed on a newly created table in the same transaction (without a commit or autoCommit) hangs.

e.g.

10:50:56.643 [main] DEBUG io.ebean.DDL - executing 1 of 3 create table m3 (id integer, acol varchar(20), bcol timestamp)
10:50:56.653 [main] DEBUG io.ebean.DDL - executing 2 of 3 alter table m1 add column addcol varchar(10)
10:50:56.685 [main] DEBUG io.ebean.DDL - executing 3 of 3 insert into m3 (id, acol) VALUES (1, 'text with ; sign')

The insert into m3 is hanging ... much like the select .. from db_migration hangs without the commit.

If we run everything with autoCommit = true ... (and no explicit connection.commit()) then it all runs without any error.

@vivekmenezes
Copy link
Contributor

There are a few restrictions to running DDL statements within transactions, for instance you cannot DROP and CREATE the same table in the same transactions. But the transactions you point out should work. In particular they seem to be working with the 1.1.5 release as you mention.

I've tried to reproduce the problem with no luck.

Can you send me the exact sequence of statements within the transaction. Thanks

@vivekmenezes
Copy link
Contributor

I've added some examples in the attached PR that work fine on our master branch

@vivekmenezes
Copy link
Contributor

I'd appreciate it if you can send me the exact sequence of sql statements that were run within the transactions. Thanks for all the information so far!

@petermattis petermattis modified the milestones: 2.0, 2.0.x Apr 5, 2018
craig bot pushed a commit that referenced this issue Apr 16, 2018
23816: sql: stop using txn.OrigTimestamp to reset TableCollection r=vivekmenezes a=vivekmenezes

Release note: None

#23718 

24616: ui: upgrade React 15.4 => 16.3 r=vilterp a=vilterp

Fixes #18795

Release note: None

Co-authored-by: Vivek Menezes <[email protected]>
Co-authored-by: Pete Vilter <[email protected]>
craig bot pushed a commit that referenced this issue Apr 18, 2018
24874: backport-2.0: sql: stop using txn.OrigTimestamp to reset TableCollection r=vivekmenezes a=vivekmenezes

Backport 1/1 commits from #23816.

/cc @cockroachdb/release

---

sql: Stop using txn.OrigTimestamp to reset TableCollection
We use the transaction OrigTimestamp to figure out if a
transaction is being retried, triggering a reset of
TableCollection. This logic would get triggered when
running PREPARE within a transaction, because PREPARE
uses it's own transaction and thus has a different
OrigTimestamp.

Removed this logic because transaction retries trigger
resetting the TableCollection in the new conn-executor code.

fixes #24578

Release note sql: fix PREPARE hanging when run in the same transaction
as a CREATE TABLE

possibly  related #23718 

Co-authored-by: Vivek Menezes <[email protected]>
@knz
Copy link
Contributor

knz commented Apr 23, 2018

@vivekmenezes moving this to the bulk io project. Needs to be triaged.

@knz knz added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Apr 27, 2018
@jordanlewis jordanlewis added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 2, 2018
@vivekmenezes
Copy link
Contributor

I'm gonna close this issue. We made a number of improvements here. In particular we now allow running schema changes and DML statements in the same transaction as the CREATE TABLE statement. #24626

@vivekmenezes vivekmenezes removed this from the 2.0.x milestone May 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Projects
None yet
Development

No branches or pull requests

7 participants