backupccl: empty splits in tpc-c 5k restore of new_order table #26375

Closed
a-robinson opened this issue Jun 4, 2018 · 25 comments
@a-robinson
Contributor

As brought up in #26059 (comment), when I restore the TPC-C 5k workload fixture onto a 15-node roachprod cluster, I get a bunch of empty splits in the tpcc.new_order table. Specifically, I run

$ roachprod run $CLUSTERNAME:16 "./workload.LATEST fixtures load tpcc {pgurl:1} --warehouses=5000"

which internally runs:

RESTORE TABLE csv.new_order FROM 'gs://cockroach-fixtures/workload/tpcc/version=2.0.1,interleaved=false,seed=1,warehouses=5000/new_order' WITH into_db = $2

Once the restore is done, I always get:

root@localhost:26257/defaultdb> show testing_ranges from table tpcc.new_order;
+-----------------+-----------------+----------+-----------+--------------+
|    Start Key    |     End Key     | Range ID | Replicas  | Lease Holder |
+-----------------+-----------------+----------+-----------+--------------+
| NULL            | /0/1/2101/0     |      834 | {9,10,13} |           13 |
| /0/1/2101/0     | /272/10/2564    |     1146 | {9,13,14} |           13 |
| /272/10/2564    | /272/10/2564/0  |     1148 | {1,4,9}   |            4 |
| /272/10/2564/0  | /521/6/2127     |     1205 | {9,10,13} |           10 |
| /521/6/2127     | /521/6/2127/0   |     1263 | {9,10,13} |            9 |
| /521/6/2127/0   | /770/1/2590     |     1269 | {1,5,9}   |            9 |
| /770/1/2590     | /770/1/2590/0   |      927 | {9,10,13} |           13 |
| /770/1/2590/0   | /1018/7/2153    |     1204 | {4,9,13}  |           13 |
| /1018/7/2153    | /1018/7/2153/0  |     1206 | {5,9,13}  |           13 |
| /1018/7/2153/0  | /1267/2/2616    |     1262 | {4,10,13} |           10 |
| /1267/2/2616    | /1267/2/2616/0  |     1268 | {5,9,13}  |            5 |
| /1267/2/2616/0  | /1515/8/2179    |     1356 | {5,9,13}  |            9 |
| /1515/8/2179    | /1515/8/2179/0  |     1006 | {9,10,13} |           13 |
| /1515/8/2179/0  | /1764/3/2642    |     1210 | {5,9,13}  |           13 |
| /1764/3/2642    | /1764/3/2642/0  |     1264 | {5,9,13}  |            9 |
| /1764/3/2642/0  | /1948/8/2638    |     1351 | {5,9,13}  |            9 |
| /1948/8/2638    | /1948/8/2638/0  |     1358 | {5,13,14} |            5 |
| /1948/8/2638/0  | /2197/4/2201    |     1424 | {9,10,13} |           10 |
| /2197/4/2201    | /2197/4/2201/0  |     1143 | {4,9,10}  |            9 |
| /2197/4/2201/0  | /2445/9/2664    |     1270 | {1,5,9}   |            1 |
| /2445/9/2664    | /2445/9/2664/0  |     1355 | {9,10,13} |           13 |
| /2445/9/2664/0  | /2605/5/2267    |     1421 | {9,10,13} |           10 |
| /2605/5/2267    | /2605/5/2267/0  |     1427 | {9,10,13} |           13 |
| /2605/5/2267/0  | /2853/10/2730   |     1461 | {8,9,10}  |            9 |
| /2853/10/2730   | /2853/10/2730/0 |     1202 | {9,10,13} |            9 |
| /2853/10/2730/0 | /2921/10/2590   |     1357 | {5,9,13}  |            9 |
| /2921/10/2590   | /2921/10/2590/0 |     1360 | {9,10,13} |           13 |
| /2921/10/2590/0 | /3170/6/2153    |     1426 | {9,10,13} |            9 |
| /3170/6/2153    | /3170/6/2153/0  |     1430 | {9,10,13} |            9 |
| /3170/6/2153/0  | /3419/1/2616    |     1465 | {9,10,13} |           10 |
| /3419/1/2616    | /3419/1/2616/0  |     1208 | {3,9,10}  |           10 |
| /3419/1/2616/0  | /3667/7/2179    |     1359 | {5,6,9}   |            9 |
| /3667/7/2179    | /3667/7/2179/0  |     1423 | {2,9,13}  |            9 |
| /3667/7/2179/0  | /3916/2/2642    |     1429 | {9,13,15} |           15 |
| /3916/2/2642    | /3916/2/2642/0  |     1464 | {4,7,9}   |            9 |
| /3916/2/2642/0  | /4164/8/2205    |     1467 | {9,10,13} |            9 |
| /4164/8/2205    | /4164/8/2205/0  |     1265 | {9,10,13} |           13 |
| /4164/8/2205/0  | /4413/3/2668    |     1428 | {9,10,13} |            9 |
| /4413/3/2668    | /4413/3/2668/0  |     1463 | {9,10,13} |            9 |
| /4413/3/2668/0  | /4661/9/2231    |     1466 | {9,10,13} |           13 |
| /4661/9/2231    | /4661/9/2231/0  |     1468 | {3,7,10}  |           10 |
| /4661/9/2231/0  | /4910/4/2694    |     1469 | {9,10,13} |            9 |
| /4910/4/2694    | /4910/4/2694/0  |     1352 | {9,10,13} |            9 |
| /4910/4/2694/0  | /4999/10/3001   |     1422 | {1,8,10}  |            1 |
| /4999/10/3001   | NULL            |     1425 | {9,10,13} |            9 |
+-----------------+-----------------+----------+-----------+--------------+
(45 rows)

All of those ranges from key to key/0 are useless and will always be empty. As mentioned on #26059, I think there's a decent chance this is just because the fixture being restored was generated from a cluster that had partitioned the table and was running on a build from before #24896 was fixed, but it was suggested that the RESTORE team look into it since it happens consistently.

@petermattis
Collaborator

I'm also seeing these empty ranges on one of the bank fixtures:

root@localhost:26257/defaultdb> select * from [show testing_ranges from table bank.bank] limit 10;
+-----------+----------+----------+----------+--------------+
| Start Key | End Key  | Range ID | Replicas | Lease Holder |
+-----------+----------+----------+----------+--------------+
| NULL      | /0/0     |       22 | {1,3,5}  |            1 |
| /0/0      | /3273    |       37 | {1,3,5}  |            1 |
| /3273     | /3273/0  |       38 | {4,7,9}  |            4 |
| /3273/0   | /6546    |       23 | {2,5,7}  |            5 |
| /6546     | /6546/0  |       25 | {6,7,10} |            6 |
| /6546/0   | /9819    |       27 | {2,6,7}  |            6 |
| /9819     | /9819/0  |       29 | {6,7,10} |            6 |
| /9819/0   | /13092   |       61 | {5,7,9}  |            9 |
| /13092    | /13092/0 |       63 | {3,4,9}  |            4 |
| /13092/0  | /16365   |       65 | {5,6,7}  |            6 |
+-----------+----------+----------+----------+--------------+

This is used by the clearrange test. The fixture itself is gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank. No partitioning is involved so there might still be a bug here.

@mjibson Can someone from bulkio take a look soon?

@petermattis
Collaborator

Btw, I created a smaller bank fixture and store dump last week and it is showing the same problem:

root@localhost:26257/defaultdb> select * from [show testing_ranges from table bank.bank] limit 10;
+-----------+----------+----------+----------+--------------+
| Start Key | End Key  | Range ID | Replicas | Lease Holder |
+-----------+----------+----------+----------+--------------+
| NULL      | /0/0     |       24 | {1}      |            1 |
| /0/0      | /3273    |       29 | {1}      |            1 |
| /3273     | /3273/0  |       31 | {1}      |            1 |
| /3273/0   | /6546    |       34 | {1}      |            1 |
| /6546     | /6546/0  |       38 | {1}      |            1 |
| /6546/0   | /9819    |       41 | {1}      |            1 |
| /9819     | /9819/0  |       44 | {1}      |            1 |
| /9819/0   | /12074   |       47 | {1}      |            1 |
| /12074    | /12074/0 |       50 | {1}      |            1 |
| /12074/0  | /15347   |       55 | {1}      |            1 |
+-----------+----------+----------+----------+--------------+

So whatever the problem is, it isn't due to an old fixture or old restore code.

@petermattis
Collaborator

This definitely seems to be something coming out of the restore code. To reproduce, run the following commands on a local one-node cluster:

roachprod wipe local
roachprod start local

roachprod sql local -- -e "create database bank"
roachprod sql local -- -e "restore csv.bank from 'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=10000,seed=1/bank' with into_db = 'bank'"
roachprod sql local -- -e "show testing_ranges from table bank.bank"

In the logs I see:

I180605 16:51:04.350983 754 storage/replica_command.go:863  [n1,s1,r24/1:/{Table/53-Max}] initiating a split of this range at key /Table/53/1/3273/0 [r25]
I180605 16:51:04.416558 756 storage/replica_command.go:863  [n1,s1,r29/1:/Table/53/1/{0/0-3273/0}] initiating a split of this range at key /Table/53/1/3273 [r30]

I'm tracking down where these splits are coming from in the restore code.

@petermattis
Collaborator

The splits are coming from ccl/backupccl/restore.go:splitAndScatter. The incoming import spans have these tiny ranges:

I180605 17:02:41.251022 713 ccl/backupccl/restore.go:763  [n1] 0: import span /Table/53/1-/Table/53/1/0/0
I180605 17:02:41.251044 713 ccl/backupccl/restore.go:763  [n1] 1: import span /Table/53/1/0/0-/Table/53/1/3273
I180605 17:02:41.251054 713 ccl/backupccl/restore.go:763  [n1] 2: import span /Table/53/1/3273-/Table/53/1/3273/0
I180605 17:02:41.251064 713 ccl/backupccl/restore.go:763  [n1] 3: import span /Table/53/1/3273/0-/Table/53/1/6546
I180605 17:02:41.251073 713 ccl/backupccl/restore.go:763  [n1] 4: import span /Table/53/1/6546-/Table/53/1/6546/0
I180605 17:02:41.251082 713 ccl/backupccl/restore.go:763  [n1] 5: import span /Table/53/1/6546/0-/Table/53/1/9819
I180605 17:02:41.251091 713 ccl/backupccl/restore.go:763  [n1] 6: import span /Table/53/1/9819-/Table/53/1/9819/0
I180605 17:02:41.251100 713 ccl/backupccl/restore.go:763  [n1] 7: import span /Table/53/1/9819/0-/Table/53/1/10000
I180605 17:02:41.251109 713 ccl/backupccl/restore.go:763  [n1] 8: import span /Table/53/1/10000-/Table/53/2

So this split code is doing the right thing given its input. What is calling it with these spans?
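
A minimal, self-contained sketch of the splitting behavior just described (illustrative Go, not the real backupccl code; the span values are copied from the log above): splitAndScatter issues one split per import-span boundary, whether or not the span contains any data, so a gap span like /Table/53/1/3273-/Table/53/1/3273/0 ends up as its own, permanently empty range.

// Illustrative sketch only; not CockroachDB source.
package main

import "fmt"

type importSpan struct{ start, end string }

// splitKeys mimics the relevant part of splitAndScatter: one split per
// import-span boundary, regardless of whether the span contains data.
func splitKeys(spans []importSpan) []string {
	keys := make([]string, 0, len(spans))
	for _, sp := range spans {
		keys = append(keys, sp.end)
	}
	return keys
}

func main() {
	spans := []importSpan{
		{start: "/Table/53/1", end: "/Table/53/1/0/0"},
		{start: "/Table/53/1/0/0", end: "/Table/53/1/3273"},
		{start: "/Table/53/1/3273", end: "/Table/53/1/3273/0"}, // gap span -> empty range
		{start: "/Table/53/1/3273/0", end: "/Table/53/1/6546"},
	}
	for _, k := range splitKeys(spans) {
		fmt.Println("split at", k)
	}
}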

@petermattis
Collaborator

The backup files cover the following key spans:

/Table/53/1/0/0-/Table/53/1/3273
/Table/53/1/3273/0-/Table/53/1/6546
/Table/53/1/6546/0-/Table/53/1/9819
/Table/53/1/9819/0-/Table/53/1/10000

Notice that the key boundaries do not line up: the end key of one file is not the start key of the next. The code that creates the import spans from the backup file spans appears to be generating extra, empty spans to cover the gaps between these files.

Cc @dt, @danhhz
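
A rough sketch of how non-aligned file spans could produce those extra import spans (an assumption about the covering logic; the real code lives in makeImportSpans, and the types and names below are made up): every hole between backup files gets its own entry with no files attached, and that entry later becomes an empty range.

// Illustrative sketch only; not the real makeImportSpans.
package main

import "fmt"

type entry struct {
	start, end string
	files      []string
}

// coverTable tiles the table span with entries: one per backup file, plus
// one per uncovered gap between files (those gaps carry no files).
func coverTable(tableStart, tableEnd string, fileSpans []entry) []entry {
	var out []entry
	cur := tableStart
	for _, f := range fileSpans {
		if cur < f.start { // string compare stands in for key comparison
			out = append(out, entry{start: cur, end: f.start}) // gap: no files
		}
		out = append(out, f)
		cur = f.end
	}
	if cur < tableEnd {
		out = append(out, entry{start: cur, end: tableEnd})
	}
	return out
}

func main() {
	files := []entry{
		{start: "/Table/53/1/0/0", end: "/Table/53/1/3273", files: []string{"1.sst"}},
		{start: "/Table/53/1/3273/0", end: "/Table/53/1/6546", files: []string{"2.sst"}},
	}
	for i, e := range coverTable("/Table/53/1", "/Table/53/2", files) {
		fmt.Printf("%d: import span %s-%s files=%v\n", i, e.start, e.end, e.files)
	}
}

Running this prints the same shape as the log above: spans like /Table/53/1/3273-/Table/53/1/3273/0 come out with files=[], which is exactly what splitAndScatter then turns into empty ranges.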

@petermattis
Collaborator

@dt I'm tossing this in your direction, as I've reached the limits of my naive debugging. Backup seems to be creating a BackupDescriptor with files that span discontiguous key ranges (i.e., the end key for one file span is not the start key for the next). Can this be fixed up in the restore code (i.e., in `makeImportSpans`)? That doesn't seem quite kosher, but I'm not familiar with this code.

I haven't given this bug a severity label, but it is somewhat serious. AFAICT, we're creating double the number of ranges on restore that we should be.

@maddyblue
Contributor

I'm looking into this now.

maddyblue assigned maddyblue and unassigned dt Jun 5, 2018
@petermattis
Collaborator

A thought for a fix: splitAndScatter should only split at importSpan entries that contain files. The empty ranges correspond to importEntry values with len(importEntry.files) == 0.

@petermattis
Collaborator

@mjibson This "works" (for some definition of I-don't-know-what-I'm-doing-but-it-produces-the-result-I-want):

diff --git a/pkg/ccl/backupccl/restore.go b/pkg/ccl/backupccl/restore.go
index 279d1795c..eacef0ebb 100644
--- a/pkg/ccl/backupccl/restore.go
+++ b/pkg/ccl/backupccl/restore.go
@@ -716,13 +716,17 @@ rangeLoop:
                                        return nil, hlc.Timestamp{}, err
                                }
                        }
+                       // We only need to create import entries for spans that contain files.
+                       if len(files) > 0 {
+                               requestEntries = append(requestEntries, importEntry{
+                                       Span:      roachpb.Span{Key: importRange.Start, EndKey: importRange.End},
+                                       entryType: request,
+                                       files:     files,
+                               })
+                       }
+               } else {
                        // If needed is false, we have data backed up that is not necessary
                        // for this restore. Skip it.
-                       requestEntries = append(requestEntries, importEntry{
-                               Span:      roachpb.Span{Key: importRange.Start, EndKey: importRange.End},
-                               entryType: request,
-                               files:     files,
-                       })
                }
        }
        return requestEntries, maxEndTime, nil

@maddyblue
Contributor

This currently looks like an IMPORT problem, since the generated SSTs don't align perfectly. I'm checking.

@dt
Member

dt commented Jun 5, 2018

I think IMPORT might have created them here, but emptied ranges in a backed up cluster would too. @petermattis's patch makes sense -- there's no reason to make a range that we're not going to import any data into, so filtering out those importEntry values avoids some pointless make-work.

@petermattis
Collaborator

Note that the backups were created via:

./workload fixtures make bank --payload-bytes=10240 --ranges=0 --rows=20000 --csv-server=http://localhost:8081

I believe that internally creates a backup from CSVs via some transform magic. I used the following script to create the backup:

#!/bin/bash

roachprod wipe $1
roachprod put $1 ./cockroach-linux-2.6.32-gnu-amd64 ./cockroach
roachprod put $1 bin.docker_amd64/workload ./workload

for rows in 10000; do
  roachprod wipe $1
  roachprod start $1
  for i in $(seq 1 6); do
    roachprod ssh $1:$i -- \
      "./workload csv-server --port=8081 > logs/workload-csv-server.log 2>&1 < /dev/null &"
  done

  gsutil -m rm -rf gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=${rows},seed=1/
  roachprod ssh $1:1 -- \
    "./workload fixtures make bank --payload-bytes=10240 --ranges=0 --rows=${rows} --csv-server=http://localhost:8081 {pgurl:1}"
done

@dt
Member

dt commented Jun 5, 2018

(To clarify: my 👍 on filtering in RESTORE is in addition to, not instead of, fixing IMPORT. If IMPORT is making these spans in these backups, it probably also makes them when run directly, skipping RESTORE and any potential fixes we add there, so we still need to fix the root cause either way.)

@maddyblue
Contributor

Looking into IMPORT: it creates these keys with the extra /0 in transform mode, so we can probably fix this in two places.
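
For context on that /0 (my reading, not something stated explicitly in the thread): in CockroachDB's pretty-printed keys, the trailing /0 is the column-family suffix on a row's KV key, so an exclusive end key written without the suffix stops just short of the next file's first real key and leaves a sliver behind. A toy illustration:

// Illustrative only: pretty-printed strings, not real CockroachDB key encoding.
package main

import "fmt"

// rowKey builds the pretty-printed form of a row's KV key; the trailing
// component is the column-family ID (0 for the primary family).
func rowKey(tableID, indexID, pk, family int) string {
	return fmt.Sprintf("/Table/%d/%d/%d/%d", tableID, indexID, pk, family)
}

func main() {
	fileAEnd := "/Table/53/1/3273"       // exclusive end key recorded without the suffix
	fileBStart := rowKey(53, 1, 3273, 0) // first actual row key in the next file
	fmt.Printf("sliver between %s and %s -> one empty range\n", fileAEnd, fileBStart)
}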

@danhhz
Contributor

danhhz commented Jun 5, 2018

but emptied ranges in a backed up cluster would too

Unless something has broken, backup doesn't create a files entry in the descriptor for empty ranges; this comes up in incremental backups of tables that aren't changing. The top-level spans field on BackupDescriptor exists so that we can tell which spans were covered but emitted as empty. IMPORT should do the same thing when making the backup descriptor (so we aren't unnecessarily inflating the size of the descriptor), but there's no reason we shouldn't also add Peter's check in restore.
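
A sketch of the convention being described, using made-up type and field names rather than the real BackupDescriptor proto: file entries are recorded only for spans that contained data, while the top-level spans list records everything the backup covered.

// Illustrative only; not the real backupccl descriptor code.
package main

import "fmt"

type exportedFile struct {
	span    string
	entries int // number of KVs exported for this span
}

type backupDescriptor struct {
	Spans []string       // everything the backup covered, including empty spans
	Files []exportedFile // only the spans that actually contained data
}

func makeDescriptor(covered []string, exported []exportedFile) backupDescriptor {
	d := backupDescriptor{Spans: covered}
	for _, f := range exported {
		if f.entries > 0 { // skip empty ranges, as BACKUP does
			d.Files = append(d.Files, f)
		}
	}
	return d
}

func main() {
	d := makeDescriptor(
		[]string{"/Table/53/1-/Table/53/2"},
		[]exportedFile{
			{span: "/Table/53/1/0/0-/Table/53/1/3273", entries: 3273},
			{span: "/Table/53/1/3273-/Table/53/1/3273/0", entries: 0}, // no file entry
		},
	)
	fmt.Printf("%d file entries, %d covered spans\n", len(d.Files), len(d.Spans))
}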

@dt
Member

dt commented Jun 5, 2018

Ah, right, I forgot about that. Given that, we should probably warn if this filter ever sees anything, since that would indicate we messed up somewhere.

@petermattis
Collaborator

Sounds like you guys have this in hand. Please take over my patch as I'm not planning to wrap it into a PR and add tests.

craig bot pushed a commit that referenced this issue Jun 7, 2018
26452: importccl: make spans overlap during IMPORT with transform r=mjibson a=mjibson

Force the backup descriptor spans to overlap correctly. This prevents
various extra ranges from being created during RESTORE of data created
with IMPORT with transform. The IMPORT commands will need to be re-run
to generate correct spans in the descriptors.

Fixes #26375

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
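
A guess at the shape of the fix described in the commit message (this is not the actual #26452 diff; the names are illustrative): rewrite the descriptor's file spans so that each file begins where the previous one ends, leaving no uncovered sliver for RESTORE to turn into an empty range.

// Illustrative only; not the actual importccl change.
package main

import "fmt"

type fileSpan struct{ Start, End string }

// makeContiguous closes the gaps between consecutive file spans so the
// files tile the keyspace with nothing left uncovered between them.
func makeContiguous(files []fileSpan) []fileSpan {
	for i := 1; i < len(files); i++ {
		files[i].Start = files[i-1].End
	}
	return files
}

func main() {
	files := []fileSpan{
		{Start: "/Table/53/1/0/0", End: "/Table/53/1/3273"},
		{Start: "/Table/53/1/3273/0", End: "/Table/53/1/6546"}, // previously began after a gap
	}
	fmt.Println(makeContiguous(files))
}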
craig bot closed this as completed in #26452 Jun 7, 2018
@petermattis
Collaborator

@dt, @mjibson I didn't see a PR go by that includes my change. Did you guys decide not to go with that as well? I just want to know whether it is still in progress or whether we should be rebuilding all of the workload fixtures.

petermattis reopened this Jun 7, 2018
@maddyblue
Contributor

We decided your change isn't necessary to fix the underlying issue. Regenerate the fixtures and things should be fine.

@petermattis
Collaborator

We decided your change isn't necessary to fix the underlying issue. Regenerate the fixtures and things should be fine.

I agree, though there are a crap-ton of fixtures to regenerate. The bank fixtures are 2.1TB in size (and there are many of them). The tpcc fixtures are 15.6TB in size.

@maddyblue
Contributor

I can submit another PR with your patch, then, if you want. Our concern was that it could hide real problems, since nothing should be generating backup descriptors with non-adjacent spans.

@maddyblue
Contributor

OK, I did a test with your patch, importing the old data. It kind of works. First, the original duplicate ranges for some test data after a RESTORE:

root@:26257/defaultdb> show testing_ranges from table t;
+-----------+---------+----------+----------+--------------+
| Start Key | End Key | Range ID | Replicas | Lease Holder |
+-----------+---------+----------+----------+--------------+
| NULL      | /1/0    |       23 | {1}      |            1 |
| /1/0      | /3      |       28 | {1}      |            1 |
| /3        | /3/0    |       29 | {1}      |            1 |
| /3/0      | /6      |       24 | {1}      |            1 |
| /6        | /6/0    |       30 | {1}      |            1 |
| /6/0      | /9      |       31 | {1}      |            1 |
| /9        | /9/0    |       26 | {1}      |            1 |
| /9/0      | /11     |       32 | {1}      |            1 |
| /11       | NULL    |       33 | {1}      |            1 |
+-----------+---------+----------+----------+--------------+
(9 rows)

With your patch applied, the same RESTORE produces:

root@:26257/defaultdb> show testing_ranges from table t;
+-----------+---------+----------+----------+--------------+
| Start Key | End Key | Range ID | Replicas | Lease Holder |
+-----------+---------+----------+----------+--------------+
| NULL      | /1/0    |       33 | {1}      |            1 |
| /1/0      | /3/0    |       41 | {1}      |            1 |
| /3/0      | /6/0    |       45 | {1}      |            1 |
| /6/0      | /9/0    |       42 | {1}      |            1 |
| /9/0      | NULL    |       44 | {1}      |            1 |
+-----------+---------+----------+----------+--------------+
(5 rows)

So we reduced the number of ranges by N/2-1, not quite the expected N/2, because of the initial NULL - /1/0 range. In addition, all of the ranges still have the bogus /0 at the end. I'm not convinced this halfway-good result is worth putting in a patch for this specific problem (IMPORT with transform on data produced before yesterday).

We could make further changes to RESTORE so that it does smarter things, but I'm relatively scared to do that, because messing with keys and split locations isn't the safest thing. I don't think avoiding the cost of just regenerating this data is worth that risk, especially if we consider the amount of dev time it'd take to come up with a safe patch.

@petermattis
Collaborator

So we reduced the number of ranges by N/2-1, not quite the expected N/2, because of the initial NULL - /1/0 range. In addition, all of the ranges still have the bogus /0 at the end. I'm not convinced this halfway-good result is worth putting in a patch for this specific problem (IMPORT with transform on data produced before yesterday).

The /0 isn't part of any key, but that doesn't make it bogus. The NULL - /1/0 range is expected: you might have keys such as /0 or /-1, and that is the range they would fall in.

We could make further changes to RESTORE so that it does smarter things, but I'm relatively scared to do that, because messing with keys and split locations isn't the safest thing. I don't think avoiding the cost of just regenerating this data is worth that risk, especially if we consider the amount of dev time it'd take to come up with a safe patch.

I wouldn't want to do anything beyond my patch; it completely solves the empty-range problem for my use cases. I'll defer to your judgment on whether it is too risky or fragile.

@vivekmenezes
Contributor

@mjibson do you want to do anything further here? Thanks

@maddyblue
Contributor

Nope, I think we can close this.
