Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/multitenancy: cascade logictest very slow under 3node-tenant config #53040

Closed
Tracked by #52617
rohany opened this issue Aug 19, 2020 · 7 comments · Fixed by #53142
Closed
Tracked by #52617

sql/multitenancy: cascade logictest very slow under 3node-tenant config #53040

rohany opened this issue Aug 19, 2020 · 7 comments · Fixed by #53142
Assignees
Labels
A-multitenancy Related to multi-tenancy C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@rohany
Copy link
Contributor

rohany commented Aug 19, 2020

This can reproduced with running

make test PKG=./pkg/ccl/logictestccl TESTS=TestTenantLogic/3node-tenant/cascade TESTFLAGS="-show-sql -test.v"

It seems like each statement takes a few seconds to run. This has been causing some flakes on CI as well. This might be related to #52567.

@rohany rohany added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 19, 2020
@asubiotto asubiotto added the A-multitenancy Related to multi-tenancy label Aug 19, 2020
@asubiotto
Copy link
Contributor

Adding "-show-logs" to the TESTFLAGS results in this output for a randomly-chosen drop statement that takes a couple of seconds:

DROP TABLE parent_multi;
I200819 15:57:12.669092 2958 sql/table.go:137  [sql,client=127.0.0.1:56886,hostssl,user=root] queued new schema change job 582556819522987793 for table 57, mutation 0
I200819 15:57:12.840589 2958 sql/table.go:137  [sql,client=127.0.0.1:56886,hostssl,user=root] queued new schema change job 582556820267083537 for table 58, mutation 0
I200819 15:57:13.021360 2958 sql/event_log.go:146  [sql,client=127.0.0.1:56886,hostssl,user=root] Event: "drop_table", target: 58, info: {TableName:test.public.child_multi_1 Statement:DROP TABLE child_multi_1 User:root CascadeDroppedViews:[]}
I200819 15:57:13.021445 2958 jobs/registry.go:297  [sql,client=127.0.0.1:56886,hostssl,user=root] scheduled jobs [582556819522987793 582556820267083537]
I200819 15:57:13.145835 19187 sql/catalog/lease/lease.go:946  new lease: 57("parent_multi") ver=4:1597852941.311780110,0, refcount=0
I200819 15:57:13.433507 2457 jobs/adopt.go:95  job 582556819522987793: resuming execution
I200819 15:57:13.504457 2457 jobs/adopt.go:95  job 582556820267083537: resuming execution
I200819 15:57:13.504574 19470 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556819522987793: stepping through state running with error: <nil>
I200819 15:57:13.542749 19470 sql/schema_changer.go:551  [sql,job=582556819522987793,scExec,table=57,mutation=0] schema change on "parent_multi" (v4) starting execution...
I200819 15:57:13.542798 19470 sql/schema_changer.go:942  [sql,job=582556819522987793,scExec,table=57,mutation=0] waiting for a single version...
I200819 15:57:13.645436 19577 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556820267083537: stepping through state running with error: <nil>
I200819 15:57:13.675884 19577 sql/schema_changer.go:551  [sql,job=582556820267083537,scExec,table=58,mutation=0] schema change on "child_multi_1" (v3) starting execution...
I200819 15:57:13.675935 19577 sql/schema_changer.go:396  [sql,job=582556820267083537,scExec,table=58,mutation=0] draining previous names
I200819 15:57:13.692368 19470 sql/schema_changer.go:944  [sql,job=582556819522987793,scExec,table=57,mutation=0] waiting for a single version... done (at v 4)
I200819 15:57:13.692401 19470 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556819522987793: stepping through state succeeded with error: <nil>
I200819 15:57:14.480930 19577 sql/schema_changer.go:465  [sql,job=582556820267083537,scExec,table=58,mutation=0] starting GC job 582556824717698833
I200819 15:57:14.840893 19577 sql/schema_changer.go:942  [sql,job=582556820267083537,scExec,table=58,mutation=0] waiting for a single version...
I200819 15:57:14.841017 19993 jobs/registry.go:984  [sql] SCHEMA CHANGE GC job 582556824717698833: stepping through state running with error: <nil>
I200819 15:57:14.926352 19577 sql/schema_changer.go:944  [sql,job=582556820267083537,scExec,table=58,mutation=0] waiting for a single version... done (at v 4)
I200819 15:57:14.926404 19577 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556820267083537: stepping through state succeeded with error: <nil>
I200819 15:57:15.878733 2958 sql/table.go:137  [sql,client=127.0.0.1:56886,hostssl,user=root] queued new schema change job 582556830298580753 for table 57, mutation 0
I200819 15:57:16.027268 2958 sql/table.go:137  [sql,client=127.0.0.1:56886,hostssl,user=root] queued new schema change job 582556830786037521 for table 59, mutation 0
I200819 15:57:16.161473 2958 sql/event_log.go:146  [sql,client=127.0.0.1:56886,hostssl,user=root] Event: "drop_table", target: 59, info: {TableName:test.public.child_multi_2 Statement:DROP TABLE child_multi_2 User:root CascadeDroppedViews:[]}
I200819 15:57:16.161557 2958 jobs/registry.go:297  [sql,client=127.0.0.1:56886,hostssl,user=root] scheduled jobs [582556830298580753 582556830786037521]
I200819 15:57:16.270411 21098 sql/catalog/lease/lease.go:946  new lease: 57("parent_multi") ver=5:1597852941.312780110,0, refcount=0
I200819 15:57:16.731598 2457 jobs/adopt.go:95  job 582556830298580753: resuming execution
I200819 15:57:16.808353 2457 jobs/adopt.go:95  job 582556830786037521: resuming execution
I200819 15:57:16.808400 21453 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556830298580753: stepping through state running with error: <nil>
I200819 15:57:16.832983 21453 sql/schema_changer.go:551  [sql,job=582556830298580753,scExec,table=57,mutation=0] schema change on "parent_multi" (v5) starting execution...
I200819 15:57:16.833037 21453 sql/schema_changer.go:942  [sql,job=582556830298580753,scExec,table=57,mutation=0] waiting for a single version...
I200819 15:57:16.942544 21453 sql/schema_changer.go:944  [sql,job=582556830298580753,scExec,table=57,mutation=0] waiting for a single version... done (at v 5)
I200819 15:57:16.942597 21453 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556830298580753: stepping through state succeeded with error: <nil>
I200819 15:57:16.950087 21592 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556830786037521: stepping through state running with error: <nil>
I200819 15:57:16.964345 21592 sql/schema_changer.go:551  [sql,job=582556830786037521,scExec,table=59,mutation=0] schema change on "child_multi_2" (v3) starting execution...
I200819 15:57:16.964383 21592 sql/schema_changer.go:396  [sql,job=582556830786037521,scExec,table=59,mutation=0] draining previous names
I200819 15:57:17.179230 386 server/status/runtime.go:522  [n1] runtime stats: 176 MiB RSS, 1083 goroutines, 73 MiB/31 MiB/134 MiB GO alloc/idle/total, 7.5 MiB/9.1 MiB CGO alloc/total, 1.8 CGO/sec, 22.7/10.1 %(u/s)time, 0.0 %gc (2x), 1.5 MiB/1.5 MiB (r/w)net
W200819 15:57:17.190415 29 server/node.go:748  [n1,summaries] health alerts detected: {Alerts:[{StoreID:1 Category:METRICS Description:ranges.underreplicated Value:36}]}
I200819 15:57:17.246715 1534 server/status/runtime.go:522  [n2] runtime stats: 177 MiB RSS, 1084 goroutines, 83 MiB/22 MiB/134 MiB GO alloc/idle/total, 12 MiB/13 MiB CGO alloc/total, 1.9 CGO/sec, 22.9/10.2 %(u/s)time, 0.0 %gc (2x), 1.5 MiB/1.5 MiB (r/w)net
I200819 15:57:17.290674 1820 server/status/runtime.go:522  [n3] runtime stats: 180 MiB RSS, 1083 goroutines, 93 MiB/14 MiB/136 MiB GO alloc/idle/total, 12 MiB/13 MiB CGO alloc/total, 1.9 CGO/sec, 23.0/10.3 %(u/s)time, 0.0 %gc (2x), 1.5 MiB/1.5 MiB (r/w)net
I200819 15:57:17.612811 21592 sql/schema_changer.go:465  [sql,job=582556830786037521,scExec,table=59,mutation=0] starting GC job 582556835155814161
I200819 15:57:18.184755 21592 sql/schema_changer.go:942  [sql,job=582556830786037521,scExec,table=59,mutation=0] waiting for a single version...
I200819 15:57:18.184931 22312 jobs/registry.go:984  [sql] SCHEMA CHANGE GC job 582556835155814161: stepping through state running with error: <nil>
I200819 15:57:18.292880 21592 sql/schema_changer.go:944  [sql,job=582556830786037521,scExec,table=59,mutation=0] waiting for a single version... done (at v 4)
I200819 15:57:18.292932 21592 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556830786037521: stepping through state succeeded with error: <nil>
I200819 15:57:19.214923 2958 sql/table.go:137  [sql,client=127.0.0.1:56886,hostssl,user=root] queued new schema change job 582556840846337809 for table 57, mutation 0
I200819 15:57:19.395003 2958 sql/event_log.go:146  [sql,client=127.0.0.1:56886,hostssl,user=root] Event: "drop_table", target: 57, info: {TableName:test.public.parent_multi Statement:DROP TABLE parent_multi User:root CascadeDroppedViews:[]}
I200819 15:57:19.395114 2958 jobs/registry.go:297  [sql,client=127.0.0.1:56886,hostssl,user=root] scheduled jobs [582556840846337809]
I200819 15:57:19.856408 2457 jobs/adopt.go:95  job 582556840846337809: resuming execution
I200819 15:57:19.966396 23493 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556840846337809: stepping through state running with error: <nil>
I200819 15:57:19.972538 23493 sql/schema_changer.go:551  [sql,job=582556840846337809,scExec,table=57,mutation=0] schema change on "parent_multi" (v6) starting execution...
I200819 15:57:19.972576 23493 sql/schema_changer.go:396  [sql,job=582556840846337809,scExec,table=57,mutation=0] draining previous names
I200819 15:57:20.184905 23493 sql/schema_changer.go:465  [sql,job=582556840846337809,scExec,table=57,mutation=0] starting GC job 582556844225111825
I200819 15:57:20.535573 23493 sql/schema_changer.go:942  [sql,job=582556840846337809,scExec,table=57,mutation=0] waiting for a single version...
I200819 15:57:20.535701 23924 jobs/registry.go:984  [sql] SCHEMA CHANGE GC job 582556844225111825: stepping through state running with error: <nil>
I200819 15:57:20.667356 23493 sql/schema_changer.go:944  [sql,job=582556840846337809,scExec,table=57,mutation=0] waiting for a single version... done (at v 7)
I200819 15:57:20.667386 23493 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582556840846337809: stepping through state succeeded with error: <nil>

Compared to a run on local with what I think is the same statement:

I200819 16:03:00.862973 1659 sql/table.go:137  [n1,client=127.0.0.1:56918,hostssl,user=root] queued new schema change job 582557960969420801 for table 57, mutation 0
I200819 16:03:00.864571 1659 sql/table.go:137  [n1,client=127.0.0.1:56918,hostssl,user=root] queued new schema change job 582557960974630913 for table 58, mutation 0
I200819 16:03:00.866447 1659 sql/event_log.go:146  [n1,client=127.0.0.1:56918,hostssl,user=root] Event: "drop_table", target: 58, info: {TableName:test.public.child_multi_1 Statement:DROP TABLE child_multi_1 User:root CascadeDroppedViews:[]}
I200819 16:03:00.866489 1659 jobs/registry.go:297  [n1,client=127.0.0.1:56918,hostssl,user=root] scheduled jobs [582557960969420801 582557960974630913]
I200819 16:03:00.867281 3988 sql/catalog/lease/lease.go:946  [n1] new lease: 57("parent_multi") ver=4:1597853284.728589549,0, refcount=0
I200819 16:03:00.875225 464 jobs/adopt.go:95  job 582557960969420801: resuming execution
I200819 16:03:00.875994 464 jobs/adopt.go:95  job 582557960974630913: resuming execution
I200819 16:03:00.876091 4071 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557960969420801: stepping through state running with error: <nil>
I200819 16:03:00.876559 4071 sql/schema_changer.go:551  [n1,job=582557960969420801,scExec,table=57,mutation=0] schema change on "parent_multi" (v4) starting execution...
I200819 16:03:00.876580 4071 sql/schema_changer.go:942  [n1,job=582557960969420801,scExec,table=57,mutation=0] waiting for a single version...
I200819 16:03:00.877210 4075 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557960974630913: stepping through state running with error: <nil>
I200819 16:03:00.877431 4075 sql/schema_changer.go:551  [n1,job=582557960974630913,scExec,table=58,mutation=0] schema change on "child_multi_1" (v3) starting execution...
I200819 16:03:00.877457 4075 sql/schema_changer.go:396  [n1,job=582557960974630913,scExec,table=58,mutation=0] draining previous names
I200819 16:03:00.878071 4071 sql/schema_changer.go:944  [n1,job=582557960969420801,scExec,table=57,mutation=0] waiting for a single version... done (at v 4)
I200819 16:03:00.878103 4071 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557960969420801: stepping through state succeeded with error: <nil>
I200819 16:03:00.883160 4075 sql/schema_changer.go:465  [n1,job=582557960974630913,scExec,table=58,mutation=0] starting GC job 582557961027223553
I200819 16:03:00.884926 4075 sql/schema_changer.go:942  [n1,job=582557960974630913,scExec,table=58,mutation=0] waiting for a single version...
I200819 16:03:00.885047 4199 jobs/registry.go:984  [n1] SCHEMA CHANGE GC job 582557961027223553: stepping through state running with error: <nil>
I200819 16:03:00.885924 4075 sql/schema_changer.go:944  [n1,job=582557960974630913,scExec,table=58,mutation=0] waiting for a single version... done (at v 4)
I200819 16:03:00.885952 4075 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557960974630913: stepping through state succeeded with error: <nil>
I200819 16:03:00.893425 1659 sql/table.go:137  [n1,client=127.0.0.1:56918,hostssl,user=root] queued new schema change job 582557961069297665 for table 57, mutation 0
I200819 16:03:00.895076 1659 sql/table.go:137  [n1,client=127.0.0.1:56918,hostssl,user=root] queued new schema change job 582557961074376705 for table 59, mutation 0
I200819 16:03:00.898147 1659 sql/event_log.go:146  [n1,client=127.0.0.1:56918,hostssl,user=root] Event: "drop_table", target: 59, info: {TableName:test.public.child_multi_2 Statement:DROP TABLE child_multi_2 User:root CascadeDroppedViews:[]}
I200819 16:03:00.898290 1659 jobs/registry.go:297  [n1,client=127.0.0.1:56918,hostssl,user=root] scheduled jobs [582557961069297665 582557961074376705]
I200819 16:03:00.899191 4205 sql/catalog/lease/lease.go:946  [n1] new lease: 57("parent_multi") ver=5:1597853284.729589549,0, refcount=0
I200819 16:03:00.906712 464 jobs/adopt.go:95  job 582557961069297665: resuming execution
I200819 16:03:00.907432 464 jobs/adopt.go:95  job 582557961074376705: resuming execution
I200819 16:03:00.907479 4299 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557961069297665: stepping through state running with error: <nil>
I200819 16:03:00.907690 4299 sql/schema_changer.go:551  [n1,job=582557961069297665,scExec,table=57,mutation=0] schema change on "parent_multi" (v5) starting execution...
I200819 16:03:00.907710 4299 sql/schema_changer.go:942  [n1,job=582557961069297665,scExec,table=57,mutation=0] waiting for a single version...
I200819 16:03:00.908176 4303 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557961074376705: stepping through state running with error: <nil>
I200819 16:03:00.908344 4303 sql/schema_changer.go:551  [n1,job=582557961074376705,scExec,table=59,mutation=0] schema change on "child_multi_2" (v3) starting execution...
I200819 16:03:00.908362 4303 sql/schema_changer.go:396  [n1,job=582557961074376705,scExec,table=59,mutation=0] draining previous names
I200819 16:03:00.908601 4299 sql/schema_changer.go:944  [n1,job=582557961069297665,scExec,table=57,mutation=0] waiting for a single version... done (at v 5)
I200819 16:03:00.908628 4299 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557961069297665: stepping through state succeeded with error: <nil>
I200819 16:03:00.912252 4303 sql/schema_changer.go:465  [n1,job=582557961074376705,scExec,table=59,mutation=0] starting GC job 582557961128443905
I200819 16:03:00.913971 4303 sql/schema_changer.go:942  [n1,job=582557961074376705,scExec,table=59,mutation=0] waiting for a single version...
I200819 16:03:00.914094 4379 jobs/registry.go:984  [n1] SCHEMA CHANGE GC job 582557961128443905: stepping through state running with error: <nil>
I200819 16:03:00.914908 4303 sql/schema_changer.go:944  [n1,job=582557961074376705,scExec,table=59,mutation=0] waiting for a single version... done (at v 4)
I200819 16:03:00.914938 4303 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557961074376705: stepping through state succeeded with error: <nil>
I200819 16:03:00.925331 1659 sql/table.go:137  [n1,client=127.0.0.1:56918,hostssl,user=root] queued new schema change job 582557961173467137 for table 57, mutation 0
I200819 16:03:00.927860 1659 sql/event_log.go:146  [n1,client=127.0.0.1:56918,hostssl,user=root] Event: "drop_table", target: 57, info: {TableName:test.public.parent_multi Statement:DROP TABLE parent_multi User:root CascadeDroppedViews:[]}
I200819 16:03:00.927909 1659 jobs/registry.go:297  [n1,client=127.0.0.1:56918,hostssl,user=root] scheduled jobs [582557961173467137]
I200819 16:03:00.937337 464 jobs/adopt.go:95  job 582557961173467137: resuming execution
I200819 16:03:00.938078 4316 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557961173467137: stepping through state running with error: <nil>
I200819 16:03:00.938229 4316 sql/schema_changer.go:551  [n1,job=582557961173467137,scExec,table=57,mutation=0] schema change on "parent_multi" (v6) starting execution...
I200819 16:03:00.938248 4316 sql/schema_changer.go:396  [n1,job=582557961173467137,scExec,table=57,mutation=0] draining previous names
I200819 16:03:00.940699 4316 sql/schema_changer.go:465  [n1,job=582557961173467137,scExec,table=57,mutation=0] starting GC job 582557961223602177
I200819 16:03:00.942587 4316 sql/schema_changer.go:942  [n1,job=582557961173467137,scExec,table=57,mutation=0] waiting for a single version...
I200819 16:03:00.942631 4516 jobs/registry.go:984  [n1] SCHEMA CHANGE GC job 582557961223602177: stepping through state running with error: <nil>
I200819 16:03:00.943406 4316 sql/schema_changer.go:944  [n1,job=582557961173467137,scExec,table=57,mutation=0] waiting for a single version... done (at v 7)
I200819 16:03:00.943430 4316 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582557961173467137: stepping through state succeeded with error: <nil>

cc @ajwerner do you have an idea of what could be going on?

@asubiotto asubiotto changed the title cascade logictest very slow under 3node-tenant config sql/multitenancy: cascade logictest very slow under 3node-tenant config Aug 19, 2020
@asubiotto
Copy link
Contributor

Smaller repro:

# LogicTest: 3node-tenant

statement ok
CREATE TABLE parent_multi (pa INT, pb INT, pc INT, UNIQUE INDEX (pa,pb,pc));
CREATE TABLE child_multi_1 (
  c INT,
  a INT,
  b INT,
  FOREIGN KEY (a,b,c) REFERENCES parent_multi(pa,pb,pc) ON DELETE CASCADE
);
CREATE TABLE child_multi_2 (
  b INT,
  c INT,
  a INT,
  FOREIGN KEY (a,b,c) REFERENCES parent_multi(pa,pb,pc) ON DELETE CASCADE
)

Both child tables need to be created to notice the slow down since the slow down happens during the child_multi_2 CREATE TABLE statement. Running this test with local runs in 0.2s while with 3node-tenant it jumps up to 2.6s.

Here are the fakedist (normal three-node config) logs:

I200820 11:05:36.820693 93 sql/logictest/logic.go:1173  rewrote:
CREATE TABLE child_multi_2 (b INT8, c INT8, a INT8, FOREIGN KEY (a, b, c) REFERENCES parent_multi (pa, pb, pc) ON DELETE CASCADE, FAMILY (c), FAMILY (b), FAMILY (a));

I200820 11:05:36.824349 2694 sql/table.go:125  [n1,client=127.0.0.1:61088,hostssl,user=root] queued new schema change job 582782605140885505 for table 55, mutation 0
I200820 11:05:36.825973 2694 sql/table.go:125  [n1,client=127.0.0.1:61088,hostssl,user=root] queued new schema change job 582782605146226689 for table 53, mutation 0
I200820 11:05:36.832860 2694 sql/event_log.go:146  [n1,client=127.0.0.1:61088,hostssl,user=root] Event: "create_table", target: 55, info: {TableName:test.public.child_multi_2 Statement:CREATE TABLE child_multi_2 (b INT8, c INT8, a INT8, FOREIGN KEY (a, b, c) REFERENCES parent_multi (pa, pb, pc) ON DELETE CASCADE, FAMILY (c), FAMILY (b), FAMILY (a)) User:root}
I200820 11:05:36.832983 2694 jobs/registry.go:297  [n1,client=127.0.0.1:61088,hostssl,user=root] scheduled jobs [582782605140885505 582782605146226689]
I200820 11:05:36.847413 456 jobs/adopt.go:95  job 582782605140885505: resuming execution
I200820 11:05:36.849388 456 jobs/adopt.go:95  job 582782605146226689: resuming execution
I200820 11:05:36.849805 2346 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582782605140885505: stepping through state running with error: <nil>
I200820 11:05:36.850240 2346 sql/schema_changer.go:547  [n1,job=582782605140885505,scExec,table=55,mutation=0] schema change on "child_multi_2" (v1) starting execution...
I200820 11:05:36.850274 2346 sql/schema_changer.go:356  [n1,job=582782605140885505,scExec,table=55,mutation=0] making table public
I200820 11:05:36.850290 2346 sql/schema_changer.go:938  [n1,job=582782605140885505,scExec,table=55,mutation=0] waiting for a single version...
I200820 11:05:36.851210 2350 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582782605146226689: stepping through state running with error: <nil>
I200820 11:05:36.855420 2346 sql/schema_changer.go:940  [n1,job=582782605140885505,scExec,table=55,mutation=0] waiting for a single version... done (at v 3)
I200820 11:05:36.893393 2350 sql/schema_changer.go:547  [n1,job=582782605146226689,scExec,table=53,mutation=0] schema change on "parent_multi" (v3) starting execution...
I200820 11:05:36.893436 2350 sql/schema_changer.go:938  [n1,job=582782605146226689,scExec,table=53,mutation=0] waiting for a single version...
I200820 11:05:36.894715 2350 sql/schema_changer.go:940  [n1,job=582782605146226689,scExec,table=53,mutation=0] waiting for a single version... done (at v 3)
I200820 11:05:36.894745 2350 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582782605146226689: stepping through state succeeded with error: <nil>
I200820 11:05:36.895400 2346 sql/schema_changer.go:938  [n1,job=582782605140885505,scExec,table=55,mutation=0] waiting for a single version...
I200820 11:05:36.900467 2346 sql/schema_changer.go:940  [n1,job=582782605140885505,scExec,table=55,mutation=0] waiting for a single version... done (at v 2)
I200820 11:05:36.900541 2346 jobs/registry.go:984  [n1] SCHEMA CHANGE job 582782605140885505: stepping through state succeeded with error: <nil>

Here are the 3node-tenant logs:

I200820 11:03:57.405197 100 sql/logictest/logic.go:1173  rewrote:
CREATE TABLE child_multi_2 (b INT8, c INT8, a INT8, FOREIGN KEY (a, b, c) REFERENCES parent_multi (pa, pb, pc) ON DELETE CASCADE, FAMILY (c), FAMILY (b, a));

I200820 11:03:57.434675 2769 sql/table.go:125  [sql,client=127.0.0.1:61054,hostssl,user=root] queued new schema change job 582782279376316177 for table 55, mutation 0
I200820 11:03:57.503812 2769 sql/table.go:125  [sql,client=127.0.0.1:61054,hostssl,user=root] queued new schema change job 582782279466198801 for table 53, mutation 0
I200820 11:03:57.653158 2769 sql/event_log.go:146  [sql,client=127.0.0.1:61054,hostssl,user=root] Event: "create_table", target: 55, info: {TableName:test.public.child_multi_2 Statement:CREATE TABLE child_multi_2 (b INT8, c INT8, a INT8, FOREIGN KEY (a, b, c) REFERENCES parent_multi (pa, pb, pc) ON DELETE CASCADE, FAMILY (c), FAMILY (b, a)) User:root}
I200820 11:03:57.653261 2769 jobs/registry.go:297  [sql,client=127.0.0.1:61054,hostssl,user=root] scheduled jobs [582782279376316177 582782279466198801]
I200820 11:03:57.919243 2454 jobs/adopt.go:95  job 582782279376316177: resuming execution
I200820 11:03:57.988774 2454 jobs/adopt.go:95  job 582782279466198801: resuming execution
I200820 11:03:57.988814 3827 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582782279376316177: stepping through state running with error: <nil>
I200820 11:03:58.027270 3827 sql/schema_changer.go:547  [sql,job=582782279376316177,scExec,table=55,mutation=0] schema change on "child_multi_2" (v1) starting execution...
I200820 11:03:58.027291 3827 sql/schema_changer.go:356  [sql,job=582782279376316177,scExec,table=55,mutation=0] making table public
I200820 11:03:58.027297 3827 sql/schema_changer.go:938  [sql,job=582782279376316177,scExec,table=55,mutation=0] waiting for a single version...
I200820 11:03:58.130916 3898 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582782279466198801: stepping through state running with error: <nil>
I200820 11:03:58.199877 3827 sql/schema_changer.go:940  [sql,job=582782279376316177,scExec,table=55,mutation=0] waiting for a single version... done (at v 3)
I200820 11:03:58.238704 3898 sql/schema_changer.go:547  [sql,job=582782279466198801,scExec,table=53,mutation=0] schema change on "parent_multi" (v3) starting execution...
I200820 11:03:58.238735 3898 sql/schema_changer.go:938  [sql,job=582782279466198801,scExec,table=53,mutation=0] waiting for a single version...
I200820 11:03:58.615447 179 gossip/gossip.go:1508  [n1] node has connected to cluster via gossip
I200820 11:03:58.615612 179 kv/kvserver/stores.go:255  [n1] wrote 2 node addresses to persistent storage
I200820 11:03:58.621401 3898 sql/schema_changer.go:940  [sql,job=582782279466198801,scExec,table=53,mutation=0] waiting for a single version... done (at v 3)
I200820 11:03:58.621437 3898 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582782279466198801: stepping through state succeeded with error: <nil>
I200820 11:03:58.667866 3827 sql/schema_changer.go:938  [sql,job=582782279376316177,scExec,table=55,mutation=0] waiting for a single version...
I200820 11:03:58.826073 1314 gossip/gossip.go:1508  [n2] node has connected to cluster via gossip
I200820 11:03:58.826344 1314 kv/kvserver/stores.go:255  [n2] wrote 2 node addresses to persistent storage
I200820 11:03:58.972898 3827 sql/schema_changer.go:940  [sql,job=582782279376316177,scExec,table=55,mutation=0] waiting for a single version... done (at v 2)
I200820 11:03:58.972940 3827 jobs/registry.go:984  [sql] SCHEMA CHANGE job 582782279376316177: stepping through state succeeded with error: <nil>

It looks like we have a couple of stages where I see a performance difference:

  1. From a queued schema change to resuming execution, ~400ms go by in 3node-tenant.
  2. All the waiting for a single version take a while > ~200ms.

Focusing on (2), there are some interesting time sinks when getting a table descriptor and leases:

I200820 11:45:57.902740 3906 sql/catalog/lease/lease.go:331  [sql,job=582790535472588561,scExec,table=55,mutation=0] time spent getting descriptor 142.378ms
I200820 11:45:58.129270 3906 sql/catalog/lease/lease.go:340  [sql,job=582790535472588561,scExec,table=55,mutation=0] counting leases count=0, err=<nil> time spent=226.493ms

An interesting note is that I don't see this slowdown when running these create table statements through an end-to-end tenant demo, so it must be something about this test setup. Some curious log messages where the "node has connected to cluster via gossip". I wondered if sleeping before running the test would make a difference and it seems to do so (this is with a 3s sleep):

I200820 11:48:44.874643 4984 sql/catalog/lease/lease.go:331  [sql,job=582791085590718225,scExec,table=55,mutation=0] time spent getting descriptor 859µs
[sql,job=582791085590718225,scExec,table=55,mutation=0] counting leases count=0, err=<nil> time spent=1.646ms

But the sleep with the addition of DROP TABLE statements still results in the DROP TABLE taking a while:

I200820 12:55:24.621385 5788 sql/catalog/lease/lease.go:331  [sql,job=582804189170968337,scExec,table=54,mutation=0] time spent getting descriptor 40.25ms
I200820 12:55:24.808857 5788 sql/catalog/lease/lease.go:340  [sql,job=582804189170968337,scExec,table=54,mutation=0] counting leases count=0, err=<nil> time spent=187.411ms

These are my findings so far. I tried tracing the whole create table, but nothing interesting is showing up.

@ajwerner
Copy link
Contributor

Yeah this is totally the rate limiting. Fun! I find that if I up the burst on the rate limit it changes the runtime to 700ms.

--- a/pkg/kv/kvserver/tenantrate/settings.go
+++ b/pkg/kv/kvserver/tenantrate/settings.go
@@ -61,7 +61,7 @@ var (
        requestBurstLimit = settings.RegisterPositiveIntSetting(
                "kv.tenant_rate_limiter.request.burst_limit",
                "per-tenant request burst limit in requests",
-               512)
+               1024)

@asubiotto
Copy link
Contributor

Nice find! I assume we can turn this off or just make it very large in logic tests. Is there some verbose logging that would have made this more obvious?

@ajwerner
Copy link
Contributor

😬 not currently but I can add slow logging. I'm not sure I want to set the threshold on slow logging to be less than 1s though but I can make it so that we log on all acquisitions with their duration with a vmodule

@asubiotto
Copy link
Contributor

Logging with a vmodule sounds better to me than trying to figure out a threshold that works. If it shows up in a trace, that's good enough.

@ajwerner
Copy link
Contributor

ajwerner commented Aug 20, 2020

See #53133

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants