ccl/sqlproxyccl/tenant: TestWatchPods failed #69220

Closed
cockroach-teamcity opened this issue Aug 22, 2021 · 8 comments · Fixed by #73946
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

Comments

@cockroach-teamcity
Member

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ d18da6c092bf1522e7a6478fe3973817e318c247:

=== RUN   TestWatchPods
    directory_test.go:69: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods921468159
    directory_test.go:530: starting tenant 20
    directory_test.go:535: tenant 20 started
    directory_test.go:104: 
        	Error Trace:	directory_test.go:104
        	Error:      	Should be empty, but was [127.0.0.1:39909]
        	Test:       	TestWatchPods
    panic.go:632: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods921468159
--- FAIL: TestWatchPods (1.77s)
Reproduce

To reproduce, try:

make stressrace TESTS=TestWatchPods PKG=./pkg/ccl/sqlproxyccl/tenant TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-race -parallel=4

/cc @cockroachdb/sql-experience @cockroachdb/server andy-kimball

This test on roachdash | Improve this report!

cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 22, 2021
@cockroach-teamcity
Member Author

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ c438563d64d6cd6493a59d84e65e62d3f6060725:

=== RUN   TestWatchPods
    directory_test.go:69: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods985514530
    directory_test.go:530: starting tenant 20
    directory_test.go:535: tenant 20 started
    directory_test.go:104: 
        	Error Trace:	directory_test.go:104
        	Error:      	Should be empty, but was [127.0.0.1:45709]
        	Test:       	TestWatchPods
    panic.go:613: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods985514530
--- FAIL: TestWatchPods (2.15s)
Reproduce

To reproduce, try:

make stressrace TESTS=TestWatchPods PKG=./pkg/ccl/sqlproxyccl/tenant TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-json

/cc @cockroachdb/sql-experience @cockroachdb/server andy-kimball


@cockroach-teamcity
Member Author

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ 649a8ca38cdc0883dd7a7811b189ea53af19f644:

=== RUN   TestWatchPods
    directory_test.go:70: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods386775890
    directory_test.go:534: starting tenant 20
    directory_test.go:539: tenant 20 started
    directory_test.go:106: 
        	Error Trace:	directory_test.go:106
        	Error:      	Should be empty, but was [127.0.0.1:45179]
        	Test:       	TestWatchPods
    panic.go:632: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods386775890
--- FAIL: TestWatchPods (2.04s)
Reproduce

To reproduce, try:

make stressrace TESTS=TestWatchPods PKG=./pkg/ccl/sqlproxyccl/tenant TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-json

/cc @cockroachdb/sql-experience @cockroachdb/server andy-kimball


@cockroach-teamcity
Member Author

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ 8ef90ca92676389da2c955054ab5d996bb9ae6ff:

=== RUN   TestWatchPods
    directory_test.go:70: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1328391944
    directory_test.go:534: starting tenant 20
    directory_test.go:539: tenant 20 started
    directory_test.go:106: 
        	Error Trace:	directory_test.go:106
        	Error:      	Should be empty, but was [127.0.0.1:45869]
        	Test:       	TestWatchPods
    panic.go:661: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1328391944
--- FAIL: TestWatchPods (12.02s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

Parameters in this failure:

  • GOFLAGS=-race -parallel=4


@cockroach-teamcity
Member Author

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ 38fd4064f40c574d0b2aee02f414787a2760fcec:

=== RUN   TestWatchPods
    directory_test.go:70: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1911827548
    directory_test.go:534: starting tenant 20
    directory_test.go:539: tenant 20 started
    directory_test.go:106: 
        	Error Trace:	directory_test.go:106
        	Error:      	Should be empty, but was [127.0.0.1:45071]
        	Test:       	TestWatchPods
    panic.go:661: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1911827548
--- FAIL: TestWatchPods (2.14s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

Parameters in this failure:

  • GOFLAGS=-race -parallel=4


@cockroach-teamcity
Member Author

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ 884088d64d0859d72e42ee4fde1cf65192f1cb99:

=== RUN   TestWatchPods
    directory_test.go:70: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1104339386
    directory_test.go:534: starting tenant 20
    directory_test.go:539: tenant 20 started
    directory_test.go:106: 
        	Error Trace:	directory_test.go:106
        	Error:      	Should be empty, but was [127.0.0.1:36341]
        	Test:       	TestWatchPods
    panic.go:661: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1104339386
--- FAIL: TestWatchPods (12.08s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

Parameters in this failure:

  • GOFLAGS=-race -parallel=4


@cockroach-teamcity
Member Author

ccl/sqlproxyccl/tenant.TestWatchPods failed with artifacts on master @ 150caaf45cf8633aca2e5d0ba7ef7bbc2a285efb:

=== RUN   TestWatchPods
    directory_test.go:70: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1678173623
    directory_test.go:533: starting tenant 20
    directory_test.go:538: tenant 20 started
    directory_test.go:106: 
        	Error Trace:	directory_test.go:106
        	Error:      	Should be empty, but was [127.0.0.1:37369]
        	Test:       	TestWatchPods
    panic.go:661: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestWatchPods1678173623
--- FAIL: TestWatchPods (12.10s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

Parameters in this failure:

  • GOFLAGS=-race -parallel=4


@jaylim-crl
Collaborator

jaylim-crl commented Dec 16, 2021

This should be fixed by #73946. Note that I wasn't able to reproduce this locally even after 5 minutes of stress runs. If it appears again, we can reopen the issue.

@imjching

imjching commented Dec 16, 2021

Actually, I was able to reproduce this on my Linux machine; it did not reproduce on my Mac.

...
GOFLAGS= go test  -race -mod=vendor -exec 'stress -timeout 5m' -tags 'gss make x86_64_pc_linux_gnu crdb_test' -ldflags '-X github.com/cockroachdb/cockroach/pkg/build.typ=development -extldflags "" -X "github.com/cockroachdb/cockroach/pkg/build.tag=v22.1.0-alpha.00000000-2162-g378fe960d2-dirty" -X "github.com/cockroachdb/cockroach/pkg/build.rev=378fe960d24b2e85e3165dc014589eec5c86446d" -X "github.com/cockroachdb/cockroach/pkg/build.cgoTargetTriple=x86_64-pc-linux-gnu"  ' -run "TestWatchPods" -timeout 0 ./pkg/ccl/sqlproxyccl/tenant  -v -args -test.timeout 5m
0 runs so far, 0 failures, over 5s
3 runs so far, 0 failures, over 10s
4 runs so far, 0 failures, over 15s
21 runs so far, 0 failures, over 20s
23 runs so far, 0 failures, over 25s
29 runs so far, 0 failures, over 30s
32 runs so far, 0 failures, over 35s
45 runs so far, 0 failures, over 40s
52 runs so far, 0 failures, over 45s
54 runs so far, 0 failures, over 50s
61 runs so far, 0 failures, over 55s
68 runs so far, 0 failures, over 1m0s
77 runs so far, 0 failures, over 1m5s
80 runs so far, 0 failures, over 1m10s
89 runs so far, 0 failures, over 1m15s
98 runs so far, 0 failures, over 1m20s
101 runs so far, 0 failures, over 1m25s
109 runs so far, 0 failures, over 1m30s
115 runs so far, 0 failures, over 1m35s
122 runs so far, 0 failures, over 1m40s
128 runs so far, 0 failures, over 1m45s
136 runs so far, 0 failures, over 1m50s
145 runs so far, 0 failures, over 1m55s
148 runs so far, 0 failures, over 2m0s
156 runs so far, 0 failures, over 2m5s
164 runs so far, 0 failures, over 2m10s
172 runs so far, 0 failures, over 2m15s
177 runs so far, 0 failures, over 2m20s
185 runs so far, 0 failures, over 2m25s
195 runs so far, 0 failures, over 2m30s
199 runs so far, 0 failures, over 2m35s

initialized metamorphic constant "coldata-batch-size" with value 408
initialized metamorphic constant "kv-batch-size" with value 1
initialized metamorphic constant "datum-row-converter-batch-size" with value 1
initialized metamorphic constant "row-container-rows-per-chunk-shift" with value 1
initialized metamorphic constant "inverted-joiner-batch-size" with value 1
initialized metamorphic constant "spilling-queue-initial-len" with value 1
initialized metamorphic constant "merge-joiner-groups-buffer" with value 3
initialized metamorphic constant "max-batch-size" with value 5320
initialized metamorphic constant "max-batch-byte-size" with value 2706399
initialized metamorphic constant "async-IE-result-channel-buffer-size" with value 30
I211216 22:15:04.240342 1 (gostd) rand.go:147  [-] 1  random seed: -5155978415146265208
=== RUN   TestWatchPods
    directory_test.go:70: test logs captured to: /tmp/logTestWatchPods1877801899
    directory_test.go:533: starting tenant 20
    directory_test.go:538: tenant 20 started
    directory_test.go:106: 
        	Error Trace:	directory_test.go:106
        	Error:      	Should be empty, but was [127.0.0.1:41707]
        	Test:       	TestWatchPods
    panic.go:661: -- test log scope end --
test logs left over in: /tmp/logTestWatchPods1877801899
--- FAIL: TestWatchPods (4.76s)
FAIL


ERROR: exit status 1

200 runs completed, 1 failures, over 2m36s
context canceled
FAIL
FAIL	github.com/cockroachdb/cockroach/pkg/ccl/sqlproxyccl/tenant	156.262s
FAIL
make: *** [Makefile:1079: stressrace] Error 1
[jay@archome] cockroach (jay/arch-base)$ 

Applied the changes from #73946, and it looks good:

[jay@archome] cockroach (jay/fix-69220)$ make stressrace TESTS=TestWatchPods PKG=./pkg/ccl/sqlproxyccl/tenant TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Running make with -j20
GOPATH set to /home/jay/workspace
mkdir -p lib
rm -f lib/lib{geos,geos_c}.so
cp -L /home/jay/workspace/native/x86_64-pc-linux-gnu/geos/lib/lib{geos,geos_c}.so lib
GOFLAGS= go test  -race -mod=vendor -exec 'stress -timeout 5m' -tags 'gss make x86_64_pc_linux_gnu crdb_test' -ldflags '-X github.com/cockroachdb/cockroach/pkg/build.typ=development -extldflags "" -X "github.com/cockroachdb/cockroach/pkg/build.tag=v22.1.0-alpha.00000000-2163-gfbb31c457e" -X "github.com/cockroachdb/cockroach/pkg/build.rev=fbb31c457edf35ea8da7ec31323d84eb4a11c0cf" -X "github.com/cockroachdb/cockroach/pkg/build.cgoTargetTriple=x86_64-pc-linux-gnu"  ' -run "TestWatchPods" -timeout 0 ./pkg/ccl/sqlproxyccl/tenant  -v -args -test.timeout 5m
0 runs so far, 0 failures, over 5s
5 runs so far, 0 failures, over 10s
6 runs so far, 0 failures, over 15s
22 runs so far, 0 failures, over 20s
26 runs so far, 0 failures, over 25s
31 runs so far, 0 failures, over 30s
32 runs so far, 0 failures, over 35s
44 runs so far, 0 failures, over 40s
55 runs so far, 0 failures, over 45s
55 runs so far, 0 failures, over 50s
65 runs so far, 0 failures, over 55s
71 runs so far, 0 failures, over 1m0s
77 runs so far, 0 failures, over 1m5s
84 runs so far, 0 failures, over 1m10s
92 runs so far, 0 failures, over 1m15s
101 runs so far, 0 failures, over 1m20s
107 runs so far, 0 failures, over 1m25s
114 runs so far, 0 failures, over 1m30s
117 runs so far, 0 failures, over 1m35s
129 runs so far, 0 failures, over 1m40s
136 runs so far, 0 failures, over 1m45s
140 runs so far, 0 failures, over 1m50s
149 runs so far, 0 failures, over 1m55s
159 runs so far, 0 failures, over 2m0s
165 runs so far, 0 failures, over 2m5s
169 runs so far, 0 failures, over 2m10s
178 runs so far, 0 failures, over 2m15s
186 runs so far, 0 failures, over 2m20s
191 runs so far, 0 failures, over 2m25s
195 runs so far, 0 failures, over 2m30s
203 runs so far, 0 failures, over 2m35s
211 runs so far, 0 failures, over 2m40s
216 runs so far, 0 failures, over 2m45s
224 runs so far, 0 failures, over 2m50s
231 runs so far, 0 failures, over 2m55s
237 runs so far, 0 failures, over 3m0s
245 runs so far, 0 failures, over 3m5s
249 runs so far, 0 failures, over 3m10s
259 runs so far, 0 failures, over 3m15s
265 runs so far, 0 failures, over 3m20s
273 runs so far, 0 failures, over 3m25s
282 runs so far, 0 failures, over 3m30s
287 runs so far, 0 failures, over 3m35s
294 runs so far, 0 failures, over 3m40s
303 runs so far, 0 failures, over 3m45s
312 runs so far, 0 failures, over 3m50s
319 runs so far, 0 failures, over 3m55s
325 runs so far, 0 failures, over 4m0s
328 runs so far, 0 failures, over 4m5s
337 runs so far, 0 failures, over 4m10s
346 runs so far, 0 failures, over 4m15s
349 runs so far, 0 failures, over 4m20s
356 runs so far, 0 failures, over 4m25s
365 runs so far, 0 failures, over 4m30s
371 runs so far, 0 failures, over 4m35s
377 runs so far, 0 failures, over 4m40s
382 runs so far, 0 failures, over 4m45s
392 runs so far, 0 failures, over 4m50s
397 runs so far, 0 failures, over 4m55s
405 runs so far, 0 failures, over 5m0s
412 runs so far, 0 failures, over 5m5s
416 runs so far, 0 failures, over 5m10s
424 runs so far, 0 failures, over 5m15s
434 runs so far, 0 failures, over 5m20s
438 runs so far, 0 failures, over 5m25s
445 runs so far, 0 failures, over 5m30s
452 runs so far, 0 failures, over 5m35s
461 runs so far, 0 failures, over 5m40s
^C462 runs completed, 0 failures, over 5m42s
context canceled
FAIL
FAIL	github.com/cockroachdb/cockroach/pkg/ccl/sqlproxyccl/tenant	342.231s
make: *** [Makefile:1079: stressrace] Error 1

craig bot pushed a commit that referenced this issue Dec 17, 2021
73500: kv,storage: persist gateway node id in transaction intents r=AlexTalks a=AlexTalks

This change augments the `TxnMeta` protobuf structure to include the
gateway node ID (responsible for initiating the transaction) when
serializing the intent.  By doing so, this commit enables the Contention
Event Store proposed in #71965, utilizing option 2.

Release note: None

73862: sql: add test asserting CREATE/USAGE on public schema r=otan a=rafiss

refs #70266

The public schema currently always has CREATE/USAGE privileges
for the public role. Add a test that confirms this.

Release note: None

73873: scdeps: tighten dependencies, log more side effects r=postamar a=postamar

This commit reworks the dependency injection for the event logger, among
other declarative schema changer dependencies. It also makes the test
dependencies more chatty in the side effects log.

Release note: None

73932: ui: select grants tab on table details page r=maryliag a=maryliag

Previously, when the grants view was selected on the Database
Details page, it was going to the Table Details with the Overview
tab selected.
With this commit, if the view mode selected is Grant, the grant
tab is selected on the Table Details page.

Fixes #68829

Release note: None

73943: cli: support --locality and --max-offset flags with sql tenant pods r=nvanbenschoten a=nvanbenschoten

This commit adds support for the `--locality` and `--max-offset` flags to the `cockroach mt start-sql` command.

The first of these is important because tenant SQL pods should know where they reside. This will be important in the future for multi-region serverless and also for projects like #72593.
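For context, a locality flag value is a comma-separated list of `key=value` tiers, such as `region=us-east1`. The Go sketch below parses such a value into ordered tiers and looks up the `region` tier, which is the information `gateway_region()` surfaces in the validation of this PR. `Tier`, `parseLocality`, and `region` are hypothetical helpers for illustration, not CockroachDB's actual parser; `strings.Cut` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// Tier is one key=value component of a locality (hypothetical type).
type Tier struct {
	Key, Value string
}

// parseLocality splits a value such as "region=us-east1,zone=us-east1-b"
// into ordered tiers, rejecting components that are not key=value.
func parseLocality(s string) ([]Tier, error) {
	if s == "" {
		return nil, nil
	}
	var tiers []Tier
	for _, part := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(part, "=")
		if !ok || k == "" || v == "" {
			return nil, fmt.Errorf("invalid locality tier %q: want key=value", part)
		}
		tiers = append(tiers, Tier{Key: k, Value: v})
	}
	return tiers, nil
}

// region returns the value of the "region" tier, or false if none is set.
func region(tiers []Tier) (string, bool) {
	for _, t := range tiers {
		if t.Key == "region" {
			return t.Value, true
		}
	}
	return "", false
}

func main() {
	tiers, err := parseLocality("region=us-east1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if r, ok := region(tiers); ok {
		fmt.Println("region:", r)
	} else {
		fmt.Println("no region set on the locality flag")
	}
}
```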

The second of these is important because the SQL pod's max-offset setting needs to be the same as the host cluster's. If we want to be able to configure the host cluster's maximum clock offset to some non-default value, we'll need SQL pods to be configured identically.

Validation of plumbing:
```sh
./cockroach start-single-node --insecure --max-offset=250ms
./cockroach sql --insecure -e 'select crdb_internal.create_tenant(2)'

 # verify --max-offset

./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0
 # CRDB crashes with error "locally configured maximum clock offset (250ms) does not match that of node [::]:62744 (500ms)"

./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0 --max-offset=250ms
 # successful

 # verify --locality

./cockroach sql --insecure --port=26258 -e 'select gateway_region()'

ERROR: gateway_region(): no region set on the locality flag on this node

./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0 --max-offset=250ms --locality=region=us-east1

./cockroach sql --insecure --port=26258 -e 'select gateway_region()'

  gateway_region
------------------
  us-east1
```

73946: ccl/sqlproxyccl: fix TestWatchPods under stressrace r=jaylim-crl a=jaylim-crl

Fixes #69220.
Regression from #67452.

In #67452, we omitted DRAINING pods from the tenant directory. Whenever a pod
goes into the DRAINING state, the pod watcher needs time to update the
directory. Not waiting for that while calling EnsureTenantAddr produces a
stale result. This commit updates TestWatchPods by polling on EnsureTenantAddr
until the pod watcher updates the directory.

Release note: None

73954: sqlsmith: don't compare voids for joins r=rafiss a=otan

No comparison expr is defined on voids, so don't generate comparisons
for them.

Resolves #73901
Resolves #73898
Resolves #73777

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
craig bot closed this as completed in 1120d60 Dec 17, 2021
craig bot closed this as completed in #73946 Dec 17, 2021
gustasva pushed a commit to gustasva/cockroach that referenced this issue Jan 4, 2022
Fixes cockroachdb#69220.
Regression from cockroachdb#67452.

In cockroachdb#67452, we omitted DRAINING pods from the tenant directory. Whenever a pod
goes into the DRAINING state, the pod watcher needs time to update the
directory. Not waiting for that while calling EnsureTenantAddr produces a
stale result. This commit updates TestWatchPods by polling on EnsureTenantAddr
until that has been completed.

Release note: None
4 participants