Do in batches suggestion. #2

JoeGruffins · 2021-04-28T03:12:12Z

No description provided.

JoeGruffins · 2021-04-28T03:16:39Z

client/db/bolt/upgrades.go

@@ -74,7 +79,7 @@ func (db *BoltDB) upgradeDB() error {
 		return nil
 	}

-	db.log.Infof("Upgrading database from version %d to %d", version, DBVersion)
+	fmt.Printf("Upgrading database from version %d to %d\n", version, DBVersion)


Sorry, I got sloppy with the logs. I will fix them if this looks like something you can use.

chappjc

This all looks great. I just wonder if we can tolerate a rollback that will leave an update partially completed. I'd argue that (a) a later chunk has little potential to error if the first Update chunk succeeded, and (b) we have a backup file created immediately prior that we can use.
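To make the partial-completion concern concrete, here is a minimal sketch of the batching pattern. It assumes a hypothetical in-memory stand-in (`memStore`) for the transactional store, not the bbolt API, and `addFee` is an invented per-entry upgrade. Each batch commits or rolls back as a unit, so a failure in a later batch leaves the earlier batches committed.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var errBadEntry = errors.New("bad entry")

// memStore is a hypothetical in-memory stand-in for a transactional
// key-value store. It is NOT the bbolt API.
type memStore struct{ data map[string]string }

// update runs fn against a copy of the data and commits the copy only if
// fn returns nil, mimicking an all-or-nothing write transaction.
func (s *memStore) update(fn func(tx map[string]string) error) error {
	tx := make(map[string]string, len(s.data))
	for k, v := range s.data {
		tx[k] = v
	}
	if err := fn(tx); err != nil {
		return err // rollback: the copy is discarded
	}
	s.data = tx
	return nil
}

// upgradeInBatches applies upgrade to every entry, batchSize entries per
// transaction. It returns the number of entries committed before any error.
func upgradeInBatches(s *memStore, batchSize int, upgrade func(string) (string, error)) (int, error) {
	if batchSize <= 0 {
		return 0, errors.New("batch size must be positive")
	}
	keys := make([]string, 0, len(s.data))
	for k := range s.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	total := 0
	for start := 0; start < len(keys); start += batchSize {
		end := start + batchSize
		if end > len(keys) {
			end = len(keys)
		}
		batch := keys[start:end]
		err := s.update(func(tx map[string]string) error {
			for _, k := range batch {
				nv, err := upgrade(tx[k])
				if err != nil {
					return err
				}
				tx[k] = nv
			}
			return nil
		})
		if err != nil {
			// Earlier batches stay committed; only this batch rolled back.
			return total, fmt.Errorf("batch at key %q failed after %d committed entries: %w", batch[0], total, err)
		}
		total += len(batch)
	}
	return total, nil
}

// addFee is a hypothetical per-entry upgrade that rejects the value "bad".
func addFee(v string) (string, error) {
	if v == "bad" {
		return "", errBadEntry
	}
	return v + "+fee", nil
}

func main() {
	s := &memStore{data: map[string]string{"k1": "a", "k2": "b", "k3": "c", "k4": "bad", "k5": "e"}}
	n, err := upgradeInBatches(s, 2, addFee)
	fmt.Println("committed:", n, "err:", err)
	fmt.Println("k1:", s.data["k1"], "k3:", s.data["k3"])
}
```

In this run the second batch fails, so `k1` and `k2` carry the upgraded value while `k3` through `k5` do not. That mixed state is what the backup file would have to recover from.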

This looks like a good solution to me anyway. I may merge this into the dcrdex PR and tweak a few things like the logs.

chappjc · 2021-04-28T16:57:41Z

@JoeGruffins I pushed another commit to the db-nocopy branch, which created conflicts. I resolved them and got the following, for guidance: https://github.com/chappjc/dcrdex/tree/upgradebatches

At this point though, I'm perplexed by the need for the reloadMatchProof upgrades: they don't change what's loaded subsequently, because DecodeMatchProof returns the same result for any encoding version (so far).

Anyway, with your PR rebased on the updated upstream PR:

$    go test -v -count 1 -run TestUpgradeDB -memprofile=mem.out
=== RUN   TestUpgradeDB
Upgrading database from version 1 to 3
2021-04-28 11:39:34.276 [DBG] db_TEST: Upgrading to version 2...
Adding max fee rates to orders in the database.  This may take a while...
Updated 20000 entries (20000 total)
Updated 20000 entries (40000 total)
Updated 20000 entries (60000 total)
Updated 20000 entries (80000 total)
Updated 20000 entries (100000 total)
Updated 20000 entries (120000 total)
Updated 20000 entries (140000 total)
Updated 20000 entries (160000 total)
Updated 20000 entries (180000 total)
Updated 20000 entries (200000 total)
Updated 20000 entries (220000 total)
Updated 20000 entries (240000 total)
Updated 20000 entries (260000 total)
Updated 20000 entries (280000 total)
Updated 20000 entries (300000 total)
Done updating orders.  Total entries: 300598 in 978ms
2021-04-28 11:39:35.255 [DBG] db_TEST: Upgrading to version 3...
Reloading match proofs in the database.  This may take a while...
Done updating match proofs.  Total entries: 578 in 768ms
--- PASS: TestUpgradeDB (4.17s)
PASS
ok  	decred.org/dcrdex/client/db/bolt	4.268s

profile:

File: bolt.test
Type: alloc_space
Time: Apr 28, 2021 at 11:39am (CDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 1582.42MB, 99.43% of 1591.46MB total
Dropped 9 nodes (cum <= 7.96MB)
Showing top 20 nodes out of 37
      flat  flat%   sum%        cum   cum%
  446.15MB 28.03% 28.03%   446.15MB 28.03%  go.etcd.io/bbolt.(*node).put
  296.67MB 18.64% 46.68%   351.17MB 22.07%  go.etcd.io/bbolt.(*Bucket).openBucket
  228.68MB 14.37% 61.04%   228.68MB 14.37%  go.etcd.io/bbolt.(*node).read
  173.60MB 10.91% 71.95%   173.60MB 10.91%  decred.org/dcrdex/dex/encode.ExtractPushes
  121.01MB  7.60% 79.56%   121.01MB  7.60%  go.etcd.io/bbolt.(*Cursor).search
  105.54MB  6.63% 86.19%   105.54MB  6.63%  decred.org/dcrdex/client/db.decodeMatchProof_v2
   74.76MB  4.70% 90.89%   522.94MB 32.86%  go.etcd.io/bbolt.(*Bucket).Bucket
   73.51MB  4.62% 95.50%   302.19MB 18.99%  go.etcd.io/bbolt.(*Bucket).node
   54.50MB  3.42% 98.93%    54.50MB  3.42%  go.etcd.io/bbolt.newBucket (inline)
    4.50MB  0.28% 99.21%   756.83MB 47.56%  go.etcd.io/bbolt.(*Bucket).Put
    3.50MB  0.22% 99.43%    14.05MB  0.88%  go.etcd.io/bbolt.(*Bucket).spill
         0     0% 99.43%   279.14MB 17.54%  decred.org/dcrdex/client/db.DecodeMatchProof
         0     0% 99.43%   105.54MB  6.63%  decred.org/dcrdex/client/db.decodeMatchProof_v1 (inline)
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.(*BoltDB).upgradeDB
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.NewDB
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.TestUpgradeDB
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.TestUpgradeDB.func1
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.doUpgrade
         0     0% 99.43%   582.16MB 36.58%  decred.org/dcrdex/client/db/bolt.reloadMatchProofs
         0     0% 99.43%   582.16MB 36.58%  decred.org/dcrdex/client/db/bolt.reloadMatchProofs.func1

I'm just still unsure about the safety of upgrading in separate batches without ability to either roll all the way back or resume the upgrade if it gets interrupted.
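One possible way to make a batched upgrade resumable is to store a progress marker in the same transaction as each batch. The sketch below assumes a hypothetical in-memory stand-in (`kv`) rather than the bbolt API, and `progressKey` and `maxBatches` are invented for illustration. It also assumes non-empty entry keys and an upgrade that must not be applied twice to the same entry.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// progressKey is a hypothetical metadata key holding the last upgraded key.
const progressKey = "\x00upgrade_progress"

var errInterrupted = errors.New("interrupted")

// kv is a hypothetical in-memory stand-in for a transactional store; it is
// not the bbolt API. update commits fn's changes only when fn returns nil.
type kv struct{ data map[string]string }

func (s *kv) update(fn func(tx map[string]string) error) error {
	tx := make(map[string]string, len(s.data))
	for k, v := range s.data {
		tx[k] = v
	}
	if err := fn(tx); err != nil {
		return err
	}
	s.data = tx
	return nil
}

// resumeUpgrade upgrades every entry after the stored progress marker,
// batchSize entries per transaction. maxBatches < 0 means no limit; a
// non-negative value simulates an interruption after that many batches.
func resumeUpgrade(s *kv, batchSize, maxBatches int, upgrade func(string) string) error {
	if batchSize <= 0 {
		return errors.New("batch size must be positive")
	}
	last := s.data[progressKey] // empty when starting fresh
	keys := make([]string, 0, len(s.data))
	for k := range s.data {
		if k != progressKey && k > last {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	for n := 0; len(keys) > 0; n++ {
		if maxBatches >= 0 && n >= maxBatches {
			return errInterrupted
		}
		batch := keys
		if len(batch) > batchSize {
			batch = batch[:batchSize]
		}
		keys = keys[len(batch):]
		err := s.update(func(tx map[string]string) error {
			for _, k := range batch {
				tx[k] = upgrade(tx[k])
			}
			// The marker commits atomically with the batch it describes.
			tx[progressKey] = batch[len(batch)-1]
			return nil
		})
		if err != nil {
			return fmt.Errorf("upgrade batch failed: %w", err)
		}
	}
	// All entries done: drop the marker so a later upgrade starts fresh.
	return s.update(func(tx map[string]string) error {
		delete(tx, progressKey)
		return nil
	})
}

func main() {
	s := &kv{data: map[string]string{"a": "1", "b": "1", "c": "1", "d": "1", "e": "1"}}
	mark := func(v string) string { return v + "+" }
	fmt.Println("first run:", resumeUpgrade(s, 2, 1, mark))
	fmt.Println("second run:", resumeUpgrade(s, 2, -1, mark))
	fmt.Println("a:", s.data["a"], "e:", s.data["e"])
}
```

Because the marker and the batch commit together, a restart never re-applies the upgrade to an entry that was already converted. Whether that bookkeeping is worth the added complexity is the open question here.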

chappjc · 2021-04-28T17:15:20Z

client/db/bolt/upgrades.go

+				fmt.Printf("Updated %d entries (%d total)\n", numUpdated, totalUpdated)
+				continue
+			}
+			return err


If we hit here, that means the user has to restore their db backup, right? Perhaps we should log a message saying so.
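A message along those lines could be sketched as follows. The helper name `batchUpgradeError`, the sample error, and the backup path are all hypothetical; the only assumption about the surrounding code is that it knows how many entries were committed and where the backup file was written.

```go
package main

import (
	"errors"
	"fmt"
)

// errDiskFull is a hypothetical failure used only for this illustration.
var errDiskFull = errors.New("disk full")

// batchUpgradeError is a hypothetical helper: it wraps a failed batch's
// error with how far the upgrade got and where the pre-upgrade backup is.
// Wrapping with %w keeps the original error available to errors.Is.
func batchUpgradeError(cause error, committed int, backupPath string) error {
	return fmt.Errorf("database upgrade stopped after %d committed entries; "+
		"restore the backup at %q before retrying: %w", committed, backupPath, cause)
}

func main() {
	// The path below is a placeholder, not the client's real backup location.
	err := batchUpgradeError(errDiskFull, 40000, "backup/dexc.db.bak")
	fmt.Println(err)
	fmt.Println("caused by disk full:", errors.Is(err, errDiskFull))
}
```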

chappjc · 2021-04-28T19:36:37Z

I've since pushed a second commit, so there are probably yet more conflicts. We're trying 1070 on a big DB to see if we need to make the batch changes as in this PR. If we do go this route, I think we'll at least need clear recovery messages about the backup db, an automatic restore of the backup file, or more sophisticated upgrade resumption like Dave described in the dexdev chan. I hope none of it is necessary though, because it'll get hairy. Thanks for looking into this issue @JoeGruffins!

EDIT: I also had to rebase 1070 on release-v0.2 for some easier testing.

JoeGruffins · 2021-04-29T00:13:50Z

Feel free to close if not going to use.

I don't know exactly how bolt works. But whether all changes for a tx are held in memory and applied at once, or changes are applied incrementally while the original is kept for rollback, it would make sense that txs with a lot of changes use a lot of memory, and that doing everything in one transaction will OOM at some point.

I really don't know though.

chappjc · 2021-04-29T00:30:41Z

I really don't know either. We may very well need this approach. Let's just hang on to it for now.

Reports are that the most challenging DBs so far have managed the upgrade with the upstream changes, but clearly we have scaling issues. Thanks for this work.

chappjc · 2021-04-29T04:19:19Z

Upstream branch got deleted so this PR auto-closed, but this is still on the table.

chappjc and others added 3 commits April 26, 2021 15:18

client/db/bolt: omit copy of buffer used only in db txns

cdb18b3

try each upgrade in its own tx

914d140

Do in batches suggestion.

e79326b

JoeGruffins mentioned this pull request Apr 28, 2021

client/db/bolt: omit copy of buffer and upgrade in separate db txns decred/dcrdex#1070

Merged

JoeGruffins commented Apr 28, 2021

chappjc approved these changes Apr 28, 2021

chappjc reviewed Apr 28, 2021

chappjc force-pushed the db-nocopy branch from 68a6cb3 to 2ba2aeb Compare April 28, 2021 19:42

chappjc force-pushed the db-nocopy branch 2 times, most recently from 61190fd to bbd3bfd Compare April 29, 2021 03:29

chappjc closed this Apr 29, 2021

chappjc deleted the branch chappjc:db-nocopy April 29, 2021 03:52
