Do in batches suggestion. #2

Closed
wants to merge 3 commits into from

Conversation

JoeGruffins

No description provided.

@@ -74,7 +79,7 @@ func (db *BoltDB) upgradeDB() error {
return nil
}

db.log.Infof("Upgrading database from version %d to %d", version, DBVersion)
fmt.Printf("Upgrading database from version %d to %d\n", version, DBVersion)
Author

Sorry, got sloppy with the logs. Will fix if this looks like something you can use.

Owner

@chappjc chappjc left a comment

This all looks great. I just wonder if we can tolerate a rollback that will leave an update partially completed. I'd argue this (a) has little potential to error if the first Update chunk succeeded and (b) we have a backup file created immediately prior that we can use.

This looks like a good solution to me anyway. May merge this into the dcrdex PR and tweak a few things like the logs.

@chappjc
Owner

chappjc commented Apr 28, 2021

@JoeGruffins I pushed another commit to the db-nocopy branch and made conflicts. I did a resolve and got the following, for guidance: https://github.com/chappjc/dcrdex/tree/upgradebatches

At this point though, I'm perplexed by the need for the reloadMatchProof upgrades since it doesn't change what's loaded subsequently because DecodeMatchProof returns the same result for any encoding version (so far).

Anyway, with your PR rebased on the updated upstream PR:

$    go test -v -count 1 -run TestUpgradeDB -memprofile=mem.out
=== RUN   TestUpgradeDB
Upgrading database from version 1 to 3
2021-04-28 11:39:34.276 [DBG] db_TEST: Upgrading to version 2...
Adding max fee rates to orders in the database.  This may take a while...
Updated 20000 entries (20000 total)
Updated 20000 entries (40000 total)
Updated 20000 entries (60000 total)
Updated 20000 entries (80000 total)
Updated 20000 entries (100000 total)
Updated 20000 entries (120000 total)
Updated 20000 entries (140000 total)
Updated 20000 entries (160000 total)
Updated 20000 entries (180000 total)
Updated 20000 entries (200000 total)
Updated 20000 entries (220000 total)
Updated 20000 entries (240000 total)
Updated 20000 entries (260000 total)
Updated 20000 entries (280000 total)
Updated 20000 entries (300000 total)
Done updating orders.  Total entries: 300598 in 978ms
2021-04-28 11:39:35.255 [DBG] db_TEST: Upgrading to version 3...
Reloading match proofs in the database.  This may take a while...
Done updating match proofs.  Total entries: 578 in 768ms
--- PASS: TestUpgradeDB (4.17s)
PASS
ok  	decred.org/dcrdex/client/db/bolt	4.268s

profile:

File: bolt.test
Type: alloc_space
Time: Apr 28, 2021 at 11:39am (CDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 1582.42MB, 99.43% of 1591.46MB total
Dropped 9 nodes (cum <= 7.96MB)
Showing top 20 nodes out of 37
      flat  flat%   sum%        cum   cum%
  446.15MB 28.03% 28.03%   446.15MB 28.03%  go.etcd.io/bbolt.(*node).put
  296.67MB 18.64% 46.68%   351.17MB 22.07%  go.etcd.io/bbolt.(*Bucket).openBucket
  228.68MB 14.37% 61.04%   228.68MB 14.37%  go.etcd.io/bbolt.(*node).read
  173.60MB 10.91% 71.95%   173.60MB 10.91%  decred.org/dcrdex/dex/encode.ExtractPushes
  121.01MB  7.60% 79.56%   121.01MB  7.60%  go.etcd.io/bbolt.(*Cursor).search
  105.54MB  6.63% 86.19%   105.54MB  6.63%  decred.org/dcrdex/client/db.decodeMatchProof_v2
   74.76MB  4.70% 90.89%   522.94MB 32.86%  go.etcd.io/bbolt.(*Bucket).Bucket
   73.51MB  4.62% 95.50%   302.19MB 18.99%  go.etcd.io/bbolt.(*Bucket).node
   54.50MB  3.42% 98.93%    54.50MB  3.42%  go.etcd.io/bbolt.newBucket (inline)
    4.50MB  0.28% 99.21%   756.83MB 47.56%  go.etcd.io/bbolt.(*Bucket).Put
    3.50MB  0.22% 99.43%    14.05MB  0.88%  go.etcd.io/bbolt.(*Bucket).spill
         0     0% 99.43%   279.14MB 17.54%  decred.org/dcrdex/client/db.DecodeMatchProof
         0     0% 99.43%   105.54MB  6.63%  decred.org/dcrdex/client/db.decodeMatchProof_v1 (inline)
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.(*BoltDB).upgradeDB
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.NewDB
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.TestUpgradeDB
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.TestUpgradeDB.func1
         0     0% 99.43%  1591.46MB   100%  decred.org/dcrdex/client/db/bolt.doUpgrade
         0     0% 99.43%   582.16MB 36.58%  decred.org/dcrdex/client/db/bolt.reloadMatchProofs
         0     0% 99.43%   582.16MB 36.58%  decred.org/dcrdex/client/db/bolt.reloadMatchProofs.func1

I'm still unsure about the safety of upgrading in separate batches without the ability to either roll all the way back or resume the upgrade if it gets interrupted.

fmt.Printf("Updated %d entries (%d total)\n", numUpdated, totalUpdated)
continue
}
return err
Owner

If we hit here, that means the user has to restore their db backup right? Perhaps a message.

@chappjc
Owner

chappjc commented Apr 28, 2021

I've since pushed a second commit, so yet more conflicts probably. We're trying 1070 on a big DB to see if we need to make the batch changes as in this PR. If we do go this route, I think we'll at least need clear recovery messages about the backup db, an automatic restore of the backup file, or more sophisticated upgrade resumption like Dave described in the dexdev chan. I hope none of it is necessary though because it'll get hairy. Thanks for looking into this issue @JoeGruffins!

EDIT: I also had to rebase 1070 on release-v0.2 for some easier testing.

@JoeGruffins
Author

Feel free to close if not going to use.

I don't know exactly how bolt works, but if all changes for a tx are saved in memory and done at once, or changes are done incrementally and the original is saved for rollback, it would make sense that txs with a lot of changes would use a lot of memory, and doing everything with one transaction will oom at some point.

I really don't know though.

@chappjc
Owner

chappjc commented Apr 29, 2021

I really don't know either. We may very well need this approach. Let's just hang on to it for now.

Reports are that the most challenging DBs so far have managed the upgrade with the upstream changes, but clearly we have scaling issues. Thanks for this work.

@chappjc chappjc force-pushed the db-nocopy branch 2 times, most recently from 61190fd to bbd3bfd Compare April 29, 2021 03:29
@chappjc chappjc closed this Apr 29, 2021
@chappjc chappjc deleted the branch chappjc:db-nocopy April 29, 2021 03:52
@chappjc
Owner

chappjc commented Apr 29, 2021

Upstream branch got deleted so this PR auto-closed, but this is still on the table.
