Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core changefeed: unknown ParentID #51983

Closed
HeavyHorst opened this issue Jul 28, 2020 · 15 comments
Closed

Core changefeed: unknown ParentID #51983

HeavyHorst opened this issue Jul 28, 2020 · 15 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

@HeavyHorst
Copy link

Describe the problem

I imported a MySQL dump and then deleted some tables I didn't need.
After that everything worked fine.

Today I wanted to try core changefeeds and run into this error message:

EXPERIMENTAL CHANGEFEED FOR TABLE overlaydata;
ERROR: table "users" has unknown ParentID 62

To Reproduce

Sorry, I'm not quite sure.

Expected behavior
I thought I would see row level changes.

Additional data / screenshots

CREATE TABLE overlaydata (
  context VARCHAR(150) NOT NULL,
  key VARCHAR(150) NOT NULL,
  value STRING NOT NULL,
  last_changed TIMESTAMPTZ NOT NULL DEFAULT current_timestamp():::TIMESTAMPTZ,
  CONSTRAINT "primary" PRIMARY KEY (context ASC, key ASC),
  FAMILY "primary" (context, key, value, last_changed)
)
CONFIGURE ZONE USING
range_min_bytes = 0,
range_max_bytes = 0,
gc.ttlseconds = 90000,
num_replicas = 0,
constraints = [''],
lease_preferences = [['']]

Environment:
CockroachDB version [v20.1.3]
Server OS: [Linux/Ubuntu 20.04 LTS]
Client app [cockroach sql]

@blathers-crl
Copy link

blathers-crl bot commented Jul 28, 2020

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/bulk-io (found keywords: import,changefeed)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jul 28, 2020
@spaskob
Copy link
Contributor

spaskob commented Jul 28, 2020

Thank for filing this issue! Did you by any chance delete any databases, ie by using DROP DATABASE ...?
Can you post the result of SELECT * FROM namespace.descriptors;?

@HeavyHorst
Copy link
Author

Did you by any chance delete any databases, ie by using DROP DATABASE ...?

I actually imported a current database dump under a different name and then imported the new data into the existing one

Like this:

  UPSERT INTO overlaydata (context, "key", "value", last_changed)
    (SELECT context, "key", "value", last_changed FROM bla.overlaydata WHERE last_changed > sometimestamp)

I dropped the temp-Database after that.

Can you post the result of SELECT * FROM namespace.descriptors;

SELECT * FROM namespace.descriptors;
ERROR: relation "namespace.descriptors" does not exist
SQLSTATE: 42P01

@spaskob
Copy link
Contributor

spaskob commented Jul 28, 2020

I apologize the correct query to run is SELECT * FROM system.namespace;

@HeavyHorst
Copy link
Author

Ah, thank you!

Here is the output:

 SELECT * FROM system.namespace;
  parentID | parentSchemaID |              name               | id
-----------+----------------+---------------------------------+-----
         0 |              0 | defaultdb                       | 50
         0 |              0 | erwin                           | 75
         0 |              0 | postgres                        | 51
         0 |              0 | system                          |  1
         1 |              0 | public                          | 29
         1 |             29 | comments                        | 24
         1 |             29 | descriptor                      |  3
         1 |             29 | eventlog                        | 12
         1 |             29 | jobs                            | 15
         1 |             29 | lease                           | 11
         1 |             29 | locations                       | 21
         1 |             29 | namespace                       |  2
         1 |             29 | namespace2                      | 30
         1 |             29 | protected_ts_meta               | 31
         1 |             29 | protected_ts_records            | 32
         1 |             29 | rangelog                        | 13
         1 |             29 | replication_constraint_stats    | 25
         1 |             29 | replication_critical_localities | 26
         1 |             29 | replication_stats               | 27
         1 |             29 | reports_meta                    | 28
         1 |             29 | role_members                    | 23
         1 |             29 | role_options                    | 33
         1 |             29 | settings                        |  6
         1 |             29 | statement_bundle_chunks         | 34
         1 |             29 | statement_diagnostics           | 36
         1 |             29 | statement_diagnostics_requests  | 35
         1 |             29 | table_statistics                | 20
         1 |             29 | ui                              | 14
         1 |             29 | users                           |  4
         1 |             29 | web_sessions                    | 19
         1 |             29 | zones                           |  5
        50 |              0 | public                          | 29
        51 |              0 | public                          | 29
        62 |             29 | users                           | 71
        75 |              0 | public                          | 29
        75 |             29 | file_event_log                  | 77
        75 |             29 | incloud_auto_inc                | 78
        75 |             29 | overlaydata                     | 80
        75 |             29 | users_auto_inc                  | 83
        85 |             29 | incloud                         | 89
        85 |             29 | users                           | 94
(41 rows)

Time: 39.165547ms

@spaskob
Copy link
Contributor

spaskob commented Jul 29, 2020

Thank you, I will need some more data to determine what is going on, please post the result of here
SELECT * FROM system.descriptor; or use SendSafely after you create a ticket in our https://cockroachdb.zendesk.com/agent/dashboard. Definitely use SendSafely if you want to keep the metadata about you table descriptors private.
Please also send a debug.zip via SendSafely as well and attach the link to the issue.

@HeavyHorst
Copy link
Author

I created ticket https://support.cockroachlabs.com/hc/en-us/requests/5941.
Could you explain how i use SendSafely?

@spaskob
Copy link
Contributor

spaskob commented Jul 30, 2020

Thanks for the patience! It looks like you succeeded to upload the files.

After investigating the conclusion is that you dropped a database that contained table users but table users was not deleted and was left with a valid ParentID which is to say without a valid database to contain it. This is a known issue (#50997 ) which we have fixed in version 20.1.4 which will should be released in the next few days.

The current corruption in the cluster will require a manual fix and I will post instructions shortly.

@spaskob
Copy link
Contributor

spaskob commented Jul 30, 2020

cc @vy-ton

@spaskob
Copy link
Contributor

spaskob commented Jul 30, 2020

Here are the steps for the fix:

  1. Create a new database CREATE DATABASE _crl_tmp;

  2. Get the ID of the new database SELECT id FROM system.namespace WHERE name = "_crl_tmp"

  3. Grant privileges to system table hack

-- must be done as root, cannot be done as any other user
INSERT INTO system.users VALUES ('node', NULL, true);
GRANT node TO root;
  1. Get the hex encoded descriptors for the abandoned tables.
    SELECT id, encode(descriptor, 'hex') FROM system.descriptor WHERE id = 71;

  2. Pipe the hex encoded descriptors to this tool with the flag --parent-id <database id from 2)> and save that SQL somewhere

package main

import (
	"bufio"
	"encoding/hex"
	"flag"
	"fmt"
	"os"
	"strings"

	"github.com/cockroachdb/cockroach/pkg/sql/sqlbase"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
	"github.com/cockroachdb/cockroach/pkg/util/protoutil"
)

func main() {
	newParent := flag.Int("parent-id", 0, "new parent for the table")
	flag.Parse()
	if *newParent == 0 {

	}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {

		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			panic(fmt.Errorf("got %q fields, expected 2", fields))
		}
		encoded, err := hex.DecodeString(fields[1])
		if err != nil {
			panic(fmt.Errorf("failed to decode hex descriptor %d: %v", fields[0], err))
		}
		var desc sqlbase.Descriptor
		if err := protoutil.Unmarshal(encoded, &desc); err != nil {
			panic(fmt.Errorf("failed to unmarshal descriptor %d: %v", fields[0], err))
		}
		tab := desc.GetTable()
		tab.Version++
		tab.ModificationTime = hlc.Timestamp{}

		for i := range tab.Columns {
			c := &tab.Columns[i]
			if c.DefaultExpr != nil && strings.Contains(*c.DefaultExpr, "nextval") {
				uniqueRowID := "unique_rowid()"
				c.DefaultExpr = &uniqueRowID
				c.UsesSequenceIds = nil
			}
		}
		tab.ParentID = sqlbase.ID(*newParent)
		encoded, err = protoutil.Marshal(&desc)
		if err != nil {
			panic(fmt.Errorf("failed to marshal descriptor %d: %v", fields[0], err))
		}
		fmt.Printf("UPDATE system.descriptor SET descriptor = decode('%s', 'hex') WHERE id = %s;\n", hex.EncodeToString(encoded), fields[0])
		fmt.Printf("UPDATE system.namespace SET \"parentID\" = %d WHERE id = %s;\n", *newParent, fields[0])
	}
}
  1. Run the outputted sql

  2. Remove the granted privileges

REVOKE node FROM root;
DELETE FROM system.users WHERE username = 'node';

@HeavyHorst
Copy link
Author

Thank you! I was able to resolve the corruption.

If someone has the same problem:

  1. You need to clone cockroachdb (https://wiki.crdb.io/wiki/spaces/CRDB/pages/181338446/Getting+and+building+from+source) and run "make execgen" to get rid of the build errors
  2. You need to pipe the data like so: echo "94 longhexdata" | ./theprogram --parent-id=96
  3. I had more than one corrupt table with the same name what lead to an duplicate key error:
ERROR: duplicate key value (parentID,parentSchemaID,name)=(95,29,'users') violates unique constraint "primary"

I just created another temp table (_crl_tmp_two) and ran the described steps again.

@HeavyHorst
Copy link
Author

HeavyHorst commented Jul 30, 2020

@spaskob Can I safely delete the Databases _crl_tmp and _crl_tmp_two or should i wait for version 20.1.4?

@spaskob
Copy link
Contributor

spaskob commented Jul 31, 2020 via email

@spaskob
Copy link
Contributor

spaskob commented Aug 11, 2020

Please send a new debug.zip regarding the webui bug.

@HeavyHorst
Copy link
Author

Done: See: #52422

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

No branches or pull requests

2 participants