
sqlccl: IMPORT fails with "Failed to create ServerTransport: connection". No indication in admin UI. Only in logs. #25481

Closed
thstart opened this issue May 14, 2018 · 7 comments
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs S-1-blocking-adoption

Comments

@thstart

thstart commented May 14, 2018

BUG REPORT

commands:

n1:
/usr/local/bin/cockroach start \
  --insecure \
  --store=/TC.CockRoachDB.store \
  --host=<my ip address 1> \
  --http-port=9000 \
  --cache=25% \
  --max-sql-memory=25%

n2:
/usr/local/bin/cockroach start \
  --insecure \
  --store=/TC.CockRoachDB.store \
  --host=<my ip address 2> \
  --http-port=9000 \
  --cache=25% \
  --max-sql-memory=25% \
  --join=<my ip address 1>:26257

n3:
/usr/local/bin/cockroach start \
  --insecure \
  --store=/TC.CockRoachDB.store \
  --host=<my ip address 3> \
  --http-port=9000 \
  --cache=25% \
  --max-sql-memory=25% \
  --join=<my ip address 1>:26257

==========================

  1. Please supply the header (i.e. the first few lines) of your most recent log file:

Only on n2:

W180512 10:00:29.066054 19312608 vendor/google.golang.org/grpc/server.go:625 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: "GET /_status/vars HTTP/1""
I180512 10:00:35.632078 172 server/status/runtime.go:219 [n1] runtime stats: 1.6 GiB RSS, 149 goroutines, 475 MiB/95 MiB/779 MiB GO alloc/idle/total, 4.7 GiB/6.1 GiB CGO alloc/total, 1020.51cgo/sec, 0.13/0.03 %(u/s)time, 0.00 %gc (0x)
W180512 10:00:39.068422 19312637 vendor/google.golang.org/grpc/server.go:625 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: "GET /_status/vars HTTP/1""
I180512 10:00:45.755898 172 server/status/runtime.go:219 [n1] runtime stats: 1.6 GiB RSS, 149 goroutines, 326 MiB/216 MiB/779 MiB GO alloc/idle/total, 4.7 GiB/6.1 GiB CGO alloc/total, 1110.66cgo/sec, 0.21/0.05 %(u/s)time, 0.00 %gc (1x)
W180512 10:00:49.066495 19312882 vendor/google.golang.org/grpc/server.go:625 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: "GET /_status/vars HTTP/1""
I180512 10:00:55.629911 172 server/status/runtime.go:219 [n1] runtime stats: 1.6 GiB RSS, 149 goroutines, 436 MiB/129 MiB/779 MiB GO alloc/idle/total, 4.7 GiB/6.1 GiB CGO alloc/total, 1049.41cgo/sec, 0.14/0.02 %(u/s)time, 0.00 %gc (0x)
W180512 10:00:57.154530 54 vendor/google.golang.org/grpc/clientconn.go:1158 grpc: addrConn.createTransport failed to connect to {192.168.0.196:26257 0 }. Err :connection error: desc = "transport: failed to write client preface: io: read/write on closed pipe". Reconnecting...
W180512 10:00:59.067747 19312996 vendor/google.golang.org/grpc/server.go:625 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: "GET /_status/vars HTTP/1""
I180512 10:01:05.764751 172 server/status/runtime.go:219 [n1] runtime stats: 1.6 GiB RSS, 149 goroutines, 556 MiB/23 MiB/779 MiB GO alloc/idle/total, 4.7 GiB/6.1 GiB CGO alloc/total, 1111.12cgo/sec, 0.12/0.03 %(u/s)time, 0.00 %gc (0x)
W180512 10:01:09.068300 19313116 vendor/google.golang.org/grpc/server.go:625 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: "GET /_status/vars HTTP/1""

  2. Please describe the issue you observed:
    I ran the same IMPORT on 2018-05-13 and again on 2018-05-14, and it failed both times.
  • What did you do?

Performing an IMPORT.

  • What did you expect to see?
  1. The IMPORT to finish successfully.
  2. If there are communication issues, I expect the Admin UI to show them.
    There is no indication of such issues. I went back over 2018-05-13 and 2018-05-14, and Replicas per Node shows the same number for all 3 nodes: 7,334.
    The number increases slowly, but it is the same for all 3 nodes.
    How is this possible with a communication error on n2?
  • What did you see instead?

The IMPORT failed after running for 14 hours.

On the command line I got the following error:
W180514 20:47:42.317473 35807157 server/server.go:1761 [n2] error closing gzip response writer:
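Since the Admin UI shows nothing, the only signal is the log. A small filter makes the pattern in the n2 excerpt easier to see. The sketch below is a hypothetical helper, not CockroachDB tooling: the function names and the log-line regex are assumptions inferred from the excerpt itself. It extracts the `received bogus greeting from client` warnings and reports the gaps between them. The message says an HTTP/1 `GET /_status/vars` request reached the gRPC server; the roughly 10-second cadence in the excerpt is consistent with a periodic poller, but that is an inference, not something established in this thread.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt in this issue (an assumption, not a spec):
# <severity><YYMMDD> <HH:MM:SS.micros> <goroutine id> <file:line> <message>
LOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<gid>\d+) (?P<loc>\S+) (?P<msg>.*)$"
)
BOGUS_GREETING = "received bogus greeting from client"


def bogus_greeting_times(lines):
    """Timestamps of warning lines whose message mentions a bogus gRPC greeting."""
    times = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m["sev"] == "W" and BOGUS_GREETING in m["msg"]:
            stamp = m["date"] + " " + m["time"]
            times.append(datetime.strptime(stamp, "%y%m%d %H:%M:%S.%f"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive timestamps."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Lines copied from the n2 excerpt above (the runtime-stats line is truncated).
MSG = ('grpc: Server.Serve failed to create ServerTransport: connection error: desc = '
       '"transport: http2Server.HandleStreams received bogus greeting from client: '
       '"GET /_status/vars HTTP/1""')
SAMPLE = [
    f"W180512 10:00:29.066054 19312608 vendor/google.golang.org/grpc/server.go:625 {MSG}",
    "I180512 10:00:35.632078 172 server/status/runtime.go:219 [n1] runtime stats: 1.6 GiB RSS",
    f"W180512 10:00:39.068422 19312637 vendor/google.golang.org/grpc/server.go:625 {MSG}",
    f"W180512 10:00:49.066495 19312882 vendor/google.golang.org/grpc/server.go:625 {MSG}",
    f"W180512 10:00:59.067747 19312996 vendor/google.golang.org/grpc/server.go:625 {MSG}",
    f"W180512 10:01:09.068300 19313116 vendor/google.golang.org/grpc/server.go:625 {MSG}",
]

if __name__ == "__main__":
    for gap in gaps_seconds(bogus_greeting_times(SAMPLE)):
        print(f"{gap:.3f} s")
```

Run against the excerpt, every gap comes out within a few milliseconds of 10 seconds. Pointing `bogus_greeting_times` at a full log file (iterating over the open file) would show whether that cadence holds across the whole 14-hour run.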

thstart changed the title from "Failed to create ServerTransport: connection error. No indication in admin UI." to "Failed to create ServerTransport: connection error. No indication in admin UI. Only in logs." May 14, 2018
vivekmenezes added the A-disaster-recovery and O-support labels May 15, 2018
@vivekmenezes
Contributor

@thstart I think you should bring up the cluster, make sure that you're not seeing the gRPC warnings that appear in your logs, and only then run the IMPORT command.
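The suggestion above amounts to a pre-flight check: confirm a node's log is free of the gRPC transport warnings before starting the IMPORT. A minimal sketch of that check follows. The helper names, the marker substrings (copied from the warnings quoted in this issue), and the log-path argument are assumptions for illustration; none of this is part of CockroachDB.

```python
import sys

# Substrings copied from the gRPC warnings quoted in this issue (assumed markers).
GRPC_WARNING_MARKERS = (
    "failed to create ServerTransport",
    "addrConn.createTransport failed to connect",
)


def grpc_warnings(lines):
    """Return warning-level (W...) lines that contain one of the gRPC markers."""
    return [
        line for line in lines
        if line.startswith("W") and any(marker in line for marker in GRPC_WARNING_MARKERS)
    ]


def ready_for_import(log_path):
    """True when the log file at log_path contains none of the gRPC warnings."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        return not grpc_warnings(log_file)


if __name__ == "__main__":
    # Hypothetical usage: python preflight.py <path to a node's cockroach log>
    if ready_for_import(sys.argv[1]):
        print("no gRPC transport warnings found")
    else:
        print("gRPC transport warnings present; investigate before running IMPORT")
        sys.exit(1)
```

The check only covers what is already in the log when it runs. An IMPORT that takes many hours can still hit the warnings later, so it is a gate for starting the job, not a guarantee that the job will finish.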

@thstart
Author

thstart commented May 15, 2018

I did, and got the same messages again.

@thstart
Author

thstart commented May 15, 2018

I restarted CockroachDB on n2 and attempted the IMPORT again, and after 14 hours got the same result. This is a serious issue; has nobody reported it until now?

knz changed the title from "Failed to create ServerTransport: connection error. No indication in admin UI. Only in logs." to "sqlccl: IMPORT fails with "Failed to create ServerTransport: connection". No indication in admin UI. Only in logs." May 15, 2018
knz added the C-bug label May 15, 2018
@tim-o
Contributor

tim-o commented May 15, 2018

Hey @thstart - before we dig any further, could you tell us a bit more about your cluster configuration? It'd be helpful to know for each node:

  1. Where is it hosted? (e.g., locally, or on Azure / Digital Ocean / AWS / GCE / another provider? Are all nodes hosted in the same DC or in different DCs?)
  2. How many cores are available per node?
  3. How much RAM is available per node?
  4. How much storage is available per node?

Thanks again for your patience.

@tim-o
Contributor

tim-o commented May 15, 2018

Adding context from Gitter: @thstart split the insert into 50 inserts of 2M each and is still seeing slow performance.

@thstart
Author

thstart commented May 15, 2018

  1. It is self-hosted on my own servers.
  2. Mac minis:
    n1: 2.8 GHz Intel Core i5 / 16 GB 1800 MHz DDR3
    n2: 2.8 GHz Intel Core i5 / 16 GB 1800 MHz DDR3
    n3: 3 GHz Intel Core i7 / 16 GB 1600 MHz DDR3
    These kinds of servers are doing just fine; I have 3 others that
    run RocksDB in a production environment.
    I like CockroachDB and wanted to use it.
    The communication problem is the only problem
    I have.
  3. 16 GB
  4. 2 TB
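Both memory flags in the start commands are given as percentages. Assuming they are taken against each node's physical memory (the 16 GB reported here), the per-node budget works out as below. The helper names are hypothetical, and the sketch ignores everything else the process and the operating system allocate.

```python
def percent_to_fraction(value):
    """Convert a percentage string such as '25%' into a fraction (0.25)."""
    value = value.strip()
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage like '25%', got {value!r}")
    return float(value[:-1]) / 100


def memory_budget(total_gb, cache="25%", max_sql_memory="25%"):
    """Split total_gb between --cache, --max-sql-memory, and everything else."""
    cache_gb = total_gb * percent_to_fraction(cache)
    sql_gb = total_gb * percent_to_fraction(max_sql_memory)
    return {
        "cache_gb": cache_gb,
        "max_sql_memory_gb": sql_gb,
        "other_gb": total_gb - cache_gb - sql_gb,
    }


if __name__ == "__main__":
    # 16 GB per node, as reported above; flag values from the start commands.
    print(memory_budget(16))
    # -> {'cache_gb': 4.0, 'max_sql_memory_gb': 4.0, 'other_gb': 8.0}
```

So each node sets aside about 4 GB for the cache and up to 4 GB for SQL working memory, leaving roughly 8 GB for everything else on the machine.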

The problem is not the slow performance but this communication issue. Of the new IMPORT attempts, the ones that failed show this communication problem as the reason for the failure.

@maddyblue
Contributor

Our current best guess is that this is #25480. We think one of the nodes is having liveness or other issues that cause the worker to die, which in turn causes the failure. We're not sure when we will prioritize that, but you can subscribe to that issue to see updates.
