
Error executing schema change: failed to update job / does not exist #39255

Closed
stickenhoffen opened this issue Aug 2, 2019 · 16 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Comments

stickenhoffen commented Aug 2, 2019

Describe the problem

The following lines are continually logged across three nodes in a cluster:

W190802 04:50:00.605588 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469519352643780609: job with ID 469519352643780609 does not exist
W190802 04:50:01.458351 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469519352643780609: job with ID 469519352643780609 does not exist
W190802 04:50:02.314610 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469491126704537601: job with ID 469491126704537601 does not exist
W190802 04:50:03.159336 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469530263381803009: job with ID 469530263381803009 does not exist
W190802 04:50:03.946957 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469492583951958017: job with ID 469492583951958017 does not exist
W190802 04:50:04.838493 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469492912178266113: job with ID 469492912178266113 does not exist
W190802 04:50:05.706084 195 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 469503700072857601: job with ID 469503700072857601 does not exist
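
For reference, one way to check whether a job ID from these messages still exists is to query the jobs list from a SQL shell. A minimal sketch, assuming an insecure node reachable on localhost (an empty result confirms the schema changer is referencing a job that is gone):

cockroach sql --insecure --host=localhost -e \
  "SELECT job_id, job_type, status FROM [SHOW JOBS] WHERE job_id = 469519352643780609;"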

To Reproduce

I dropped some indexes and recreated them. I'm not exactly sure when the errors started.
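
For context, the schema changes were along these lines; the table and index names here are hypothetical, not the actual schema:

cockroach sql --insecure -e "DROP INDEX orders@orders_customer_id_idx;"
cockroach sql --insecure -e "CREATE INDEX orders_customer_id_idx ON orders (customer_id);"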

Expected behavior

No errors.

Environment:

  • CCL v19.1.3 @ 2019/07/08 18:24:39 (go1.11.6)
  • Ubuntu 18.04.2 LTS
  • Three-node cluster (4 vCPU / 16 GB GCP instances)

Additional context
What was the impact?

Increased CPU load.

@mattcrdb mattcrdb added the O-community Originated from the community label Aug 5, 2019
roncrdb commented Aug 6, 2019

Hey @stickenhoffen,

It appears that you may be running into a known issue, #38088. Could you send us over a debug zip?

I've shared a Drive folder for you to upload the debug zip file to.
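
In case it's useful, a debug zip can be generated with the cockroach CLI; a minimal sketch, assuming an insecure node reachable on localhost:

cockroach debug zip ./debug.zip --insecure --host=localhost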

Please let me know when you've added the debug zip.

Thanks,

Ron

@roncrdb roncrdb self-assigned this Aug 6, 2019
roncrdb commented Aug 8, 2019

Hey @stickenhoffen,

Just wanted to follow up to see if you're able to send over that debug zip.

roncrdb commented Aug 14, 2019

Hey @stickenhoffen,

Just wanted to follow up, are you still seeing this issue?

Let us know!

stickenhoffen commented Aug 14, 2019 via email

@ricardocrdb

Hello @stickenhoffen

Have you had any luck reproducing the issue? Let us know so we can continue troubleshooting and help you get it resolved.


stickenhoffen commented Aug 27, 2019

Hi @ricardocrdb

I've dropped and recreated a few indexes, and although I'm no longer seeing the schema change error in the logs, CPU usage is climbing rapidly on all three nodes, as before.

I've uploaded a debug ZIP.

There are also several jobs related to indexing that are at "Waiting For GC TTL".

Edit: all those waiting for GC are "DROP INDEX".
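
For reference, a minimal sketch of how those jobs can be inspected from a SQL shell and, if appropriate, how the GC window can be shortened so dropped-index data is reclaimed sooner (the database and table names below are hypothetical):

cockroach sql --insecure -e "SELECT job_id, description, running_status FROM [SHOW JOBS] WHERE job_type = 'SCHEMA CHANGE';"
cockroach sql --insecure -e "ALTER TABLE mydb.mytable CONFIGURE ZONE USING gc.ttlseconds = 600;"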

@ricardocrdb

Hey @stickenhoffen

Would it be possible to get the output of http://<adminUrl>/debug/pprof/ui/profile/ from each of the three nodes and screenshot it for us? You can put the screenshots in the same Google Drive location as the debug zip. I am seeing various connection errors on all three nodes, and this page should give us a little more information.

W190827 08:47:46.049802 39986968 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:47:48.931406 39987187 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:47:49.132941 39987293 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I190827 08:47:55.926647 169 server/status/runtime.go:500  [n1] runtime stats: 5.8 GiB RSS, 199 goroutines, 672 MiB/296 MiB/1.1 GiB GO alloc/idle/total, 3.7 GiB/4.6 GiB CGO alloc/total, 862.6 CGO/sec, 198.6/34.8 %(u/s)time, 0.0 %gc (5x), 14 MiB/24 MiB (r/w)net
W190827 08:47:56.051428 39987943 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:47:58.933189 39988351 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:47:59.153501 39988424 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I190827 08:48:05.928882 169 server/status/runtime.go:500  [n1] runtime stats: 5.8 GiB RSS, 206 goroutines, 408 MiB/532 MiB/1.1 GiB GO alloc/idle/total, 3.7 GiB/4.6 GiB CGO alloc/total, 674.0 CGO/sec, 194.0/31.9 %(u/s)time, 0.0 %gc (4x), 13 MiB/22 MiB (r/w)net
W190827 08:48:06.052746 39989374 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:48:08.934566 39989671 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:48:09.135798 39989684 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I190827 08:48:15.930985 169 server/status/runtime.go:500  [n1] runtime stats: 5.8 GiB RSS, 204 goroutines, 444 MiB/498 MiB/1.1 GiB GO alloc/idle/total, 3.7 GiB/4.6 GiB CGO alloc/total, 410.2 CGO/sec, 185.9/34.6 %(u/s)time, 0.0 %gc (4x), 3.4 MiB/22 MiB (r/w)net
W190827 08:48:16.053974 39990231 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:48:18.946795 39990432 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190827 08:48:19.138098 39990401 vendor/google.golang.org/grpc/server.go:666  grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I190827 08:48:22.879095 165 gossip/gossip.go:557  [n1] gossip status (ok, 3 nodes)
gossip client (1/3 cur/max conns)
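
If it's easier than screenshots, a raw CPU profile can also be pulled from each node and attached; a minimal sketch, assuming the standard pprof endpoint on the Admin UI address (the output can be inspected locally with go tool pprof):

curl -o node1.prof "http://<adminUrl>/debug/pprof/profile?seconds=30"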

Let me know when those screenshots have been sent over.

@stickenhoffen

@ricardocrdb uploaded as requested.

@ricardocrdb

Hey @stickenhoffen

Based on those screenshots, nothing jumps out at me at this point. Are you still seeing high CPU usage while waiting for the GC TTL? If possible, another screenshot of the CPU usage from the Admin UI would help, along with the job ID of the DROP INDEX. Let me know once that is uploaded.

@awoods187 awoods187 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. labels Aug 30, 2019
@stickenhoffen

Hi @ricardocrdb

A couple of schoolboy errors on my part. First, we had a cron job hammering a huge table, running functionality that is no longer even required. We were also missing a whole bunch of indexes, and the two together were killing the cluster.

Second, I wasn't using the healthz endpoint for health checks; I was using TCP socket checks, which accounts for the HTTP/2 error lines.
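
For anyone else who sees those gRPC "failed to receive the preface" warnings: a plain HTTP health check avoids the half-open TCP connections that cause them. A minimal sketch, assuming an insecure node with the Admin UI on port 8080 and the documented readiness endpoint:

curl -f "http://localhost:8080/health?ready=1"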

So for now, everything looks good. I'm very happy! :)

Thanks.

@ricardocrdb

Hey @stickenhoffen

That's great to hear, I am glad everything is working smoothly. I will go ahead and close this issue out. If anything else comes up, feel free to open a new issue.

Cheers,
Ricardo

@hollinwilkins

Hey, can we reopen this issue? I am actively getting this error. It is causing clients to be unable to connect to the database, and when I look in the UI, none of my databases are listed.

hollinwilkins commented Oct 4, 2019

@ricardocrdb Also, when I go to the "Jobs" tab in the UI, the spinner spins forever and never loads. Looks like it is stuck on this API call: _admin/v1/jobs?status=&type=0&limit=50
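
One way to tell whether the hang is server-side rather than in the UI is to hit the same endpoint directly; a minimal sketch, assuming an insecure Admin UI on port 8080:

curl -m 30 "http://localhost:8080/_admin/v1/jobs?status=&type=0&limit=50"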

  • Running version: v19.1.2
  • Running in Kubernetes
  • Deployed using stable helm chart
  • Insecure mode

Please help. It doesn't look like any data was lost, but I cannot access the cluster from the JDBC Postgres client.

I CAN access the cluster and run queries using the cockroachdb client.

@hollinwilkins

Funnily enough, I just deleted all the pods, let them come back up, and it seems to be working fine now. If this happens again in the future, how can I help you debug it? Is there any debug information I could save somewhere?
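
In case it recurs, a minimal sketch of what could be captured before restarting anything; the pod name and in-container paths are assumptions based on the stable Helm chart and official image:

kubectl logs cockroachdb-0 --tail=10000 > cockroachdb-0.log
kubectl exec cockroachdb-0 -- /cockroach/cockroach debug zip /cockroach/debug.zip --insecure
kubectl cp cockroachdb-0:/cockroach/debug.zip ./debug.zip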

@hollinwilkins

Spoke too soon; it is still happening. I do have access to my databases now from the JDBC Postgres driver.

irfansharif commented Feb 28, 2020

@hollinwilkins: we seem to have missed your messages here; hopefully you were able to find a resolution elsewhere. We recently did some work in this area in #44299.
