
7.3 database migration is slow unless I turn off fsync/fdatasync #4441

Closed
vi opened this issue Mar 27, 2019 · 32 comments · Fixed by #4480

Comments

@vi
Contributor

vi commented Mar 27, 2019

Tribler version/branch+revision:

release-7.3.0-beta1 branch, 65d974c

Operating system and version:

Linux

Steps to reproduce the behavior:

Use 7.2 and earlier for a while, then run 7.3.

Expected behavior:

Database migration is fast enough to complete overnight, even on HDD.

Actual behavior:

It converts only 0.7 batches per second and seems to fully synchronize to disk after every batch.

Disabling filesystem sync globally or using eatmydata bumps the speed up to 60 batches per second.

@vi
Contributor Author

vi commented Mar 27, 2019

Should it do more batches per database transaction, committing e.g. every 20 seconds instead of after every N entries?
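
For illustration, a minimal sketch of that idea in Python (the read_batches/convert_batch names are hypothetical, not Tribler's actual migration code): group many small batches into one SQLite transaction and commit only once a time budget has elapsed, so the fsync cost is paid roughly every 20 seconds instead of after every batch.

import sqlite3
import time

COMMIT_INTERVAL = 20.0  # seconds between commits (illustrative value)

def migrate_all(old_conn: sqlite3.Connection,
                new_conn: sqlite3.Connection,
                read_batches, convert_batch):
    """Hypothetical driver: read_batches(old_conn) yields lists of legacy rows,
    convert_batch(new_conn, batch) inserts them into the new schema."""
    last_commit = time.time()
    for batch in read_batches(old_conn):
        convert_batch(new_conn, batch)        # many INSERTs, no commit yet
        if time.time() - last_commit >= COMMIT_INTERVAL:
            new_conn.commit()                 # one fsync per ~20 seconds
            last_commit = time.time()
    new_conn.commit()                         # flush whatever is left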

@Dmole
Contributor

Dmole commented Mar 28, 2019

Maybe address the bloat before migrating:

rm -r /tmp/*tribler*
rm -r ~/.Tribler/{collected_*,dlcheckpoints,logs,sqlite/tribler.sdb,sqlite/dispersy.db}
# or this on V7.3.0+ # rm -r .Tribler/{dlcheckpoints,logs}
sqlite3 ~/.Tribler/sqlite/trustchain.db "delete from blocks where insert_time < '$(date --date="30 day ago" +%Y-%m-%d)';vacuum;"

@qstokkink
Contributor

@Dmole we're hard at work on downsizing and speeding things up. Starting from 7.3, collected_*, sqlite/tribler.sdb and sqlite/dispersy.db will be deprecated (and will be removed once we have extracted the data; this extraction is what causes the long upgrading step).

Furthermore, if you don't want to store any logs, you can edit your logger.conf file not to write to disk.

The dlcheckpoints directory is needed for our libtorrent dependency; we can't do much about that.

Lastly, trustchain.db "delete from blocks where insert_time < '$(date --date="30 day ago" +%Y-%m-%d)';vacuum;" also removes part of your own blockchain. For now this is allowed, but it will cause you to void your own bandwidth tokens.

@Dmole
Contributor

Dmole commented Mar 28, 2019

Thanks, I noticed the 7.3 improvement; that's why I have the "# or this on V7.3.0+ ..." line.

dlcheckpoints and old blocks do not appear to be required in V7.3.0B (bandwidth tokens remain, Tribler works)

Removing the old blocks dropped the file size from 212 MB to 12 MB.

@qstokkink
Contributor

What we store on the blockchain is the change and the running token total, something like:

Genesis <- (up:1, down:2, total:-1) <- (up:3, down:0, total:2) <- (up:0, down:1, total:1)

Right now we assume the last delivered block is correct and simply use the running total, so everything works.

However, if/once we actually require a consistent chain and you do not have the previous entries, your tokens cannot be verified. In the last example, if you only present the last block your token count would be -1 instead of 1. This could screw you in the future.

Note that you can keep your own blocks in the SQL query by not deleting entries whose public_key or link_public_key equals your own public key.
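
For anyone tempted by the cleanup command above, a hedged sketch of that filtered delete using Python's sqlite3 module; the blocks table and the insert_time, public_key and link_public_key columns come from the comments above, while the placeholder key and the assumption that keys are stored as raw BLOBs are mine.

import os
import sqlite3
from datetime import datetime, timedelta

DB_PATH = os.path.expanduser("~/.Tribler/sqlite/trustchain.db")
MY_KEY = bytes.fromhex("00" * 37)  # placeholder: substitute your own public key

cutoff = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
conn = sqlite3.connect(DB_PATH)
with conn:  # commits the DELETE on success
    # Drop old blocks, but keep every block that involves your own key.
    conn.execute(
        "DELETE FROM blocks WHERE insert_time < ? "
        "AND public_key != ? AND link_public_key != ?",
        (cutoff, MY_KEY, MY_KEY),
    )
conn.execute("VACUUM")  # reclaim the freed space; runs outside a transaction
conn.close()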

@Dmole
Contributor

Dmole commented Mar 28, 2019

When the blockchain is finally enforced in some future version all bandwidth tokens will be reset (as they were in the past) to prevent exploitation (because they will be linked to real $) (right?).
So for <=7.3.0-beta1 there is no downside.

@qstokkink
Contributor

Indeed, short term it's a non-issue.

I'd hate to see anyone in the future blindly copy-pasting that command and voiding their currency though.

@devos50
Contributor

devos50 commented Mar 29, 2019

When the blockchain is finally enforced in some future version all bandwidth tokens will be reset (as they were in the past) to prevent exploitation (because they will be linked to real $) (right?).

Our goal is indeed to link our bandwidth currency with other currencies, but we are not ready for this yet.

@Dmole
Contributor

Dmole commented Mar 29, 2019

So short term Tribler can be light on storage, but long term is there a plan to make the chain scale?

Maybe:

1. Agree on a checkpoint transaction once a year.
2. Have each client make a dummy transaction against the annual checkpoint.
3. Truncate transactions before the second-to-last checkpoint.
4. The checkpoint transaction would need to be agreed upon in a distributed fashion, e.g. the first transaction of the UTC year (pre/post-dated transactions should be invalidated).

That scheme should reliably reduce the chain size from O(time*clients) to O(clients) with the limitation that if you don't use the software for a year your history is lost.

@qstokkink
Contributor

We actually have a similar checkpoint-based system available to us: #2457

The nasty part is that we then need some form of consensus.

@Dmole
Contributor

Dmole commented Mar 29, 2019

Yeah there are a bunch of related complications that should be worked out before the chain is actually used for anything, but I was just pointing out that getting the chain size down to O(1) should be one of the goals.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole , if you remove tribler.sdb like that, there will be nothing to migrate from. You will effectively start with a brand-new empty database. I do not recommend doing it this way, though, because you will lose your personal channel (if you had one before) and you will not benefit from having legacy entries in your local DB during the transition period.

@Dmole
Contributor

Dmole commented Mar 31, 2019

@ichorid that's no loss for me because until Tribler gets HTML channel pages (or a meta search), Tribler channels are useless to me. AKA

tor->tpb/rut/etc->tribler

is more practical for the masses.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole, one of our goals is just that: building a distributed replacement for the tracker system. It would be very nice if you described what you think is necessary for that. We can discuss it in another, more relevant issue: #3615

@vi
Contributor Author

vi commented Mar 31, 2019

Most of the discussion here is off-topic anyway.

What about SQLite's (or another database's) synchronisation and transactions? I expect it to be a rather easy fix, something like pragma synchronous = off;.
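
For reference, this is the kind of switch meant here; a sketch only, with a hypothetical database path, and whether Tribler can safely do this is exactly what is discussed below.

import sqlite3

conn = sqlite3.connect("metadata.db")  # hypothetical path to the new database

# Trade durability for speed while migrating:
conn.execute("PRAGMA synchronous = OFF")      # no fsync after each transaction
conn.execute("PRAGMA journal_mode = MEMORY")  # keep the rollback journal off disk

# ... bulk-insert the converted entries here ...

conn.commit()
conn.execute("PRAGMA synchronous = FULL")     # restore SQLite's durable default
conn.close()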

@ichorid
Contributor

ichorid commented Mar 31, 2019

@vi , unfortunately, if we disable synchronization, the DB will become corrupted in case of a sudden power outage. Fixing this will require us to add recovery procedures, which can add to the code complexity.
So, that's not an easy decision.

@Dmole
Contributor

Dmole commented Mar 31, 2019

Adding

cp -r .Tribler .Tribler-v7.2

might be a good idea anyway to mitigate regressions (it would have helped historically). A manual restore is better than nothing. @vi, how big is/was your .Tribler folder? (Unbounded space is as bad as unbounded time.)

@vi
Contributor Author

vi commented Mar 31, 2019

@ichorid, just increase the batch size (or do multiple batches per database transaction) so that it takes around 30 seconds. That would preserve syncs, but make them rarer.

Currently, on an HDD, sync time dominates the actual calculations.

@vi
Contributor Author

vi commented Mar 31, 2019

how big is/was your .Tribler folder?

After migration:

$ du -sh  ~/.Tribler/*/
0	/home/tribler/.Tribler/channels/
16K	/home/tribler/.Tribler/collected_metadata/
4.5G	/home/tribler/.Tribler/collected_torrents/
4.2M	/home/tribler/.Tribler/dlcheckpoints/
0	/home/tribler/.Tribler/icons/
1.0G	/home/tribler/.Tribler/sqlite/
0	/home/tribler/.Tribler/wallet/

Heavy-weight directories are symlinked to a separate XFS filesystem.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@vi, the batch size is determined dynamically, so processing of each batch never takes more than 0.5 seconds. When we increase the batch size to last more than 0.5 seconds, the Twisted reactor becomes unresponsive, network packets get dropped, etc. Migrating a 700 MB Tribler database on a fast SSD takes about 2 hours in background mode (with <0.5 sec batches), and only 10 minutes if everything is processed offline in a single batch.

So, it is a tough choice: do we force our users to wait 10-∞ minutes staring at the "Spinning Gears" screen, or allow them to use the new version of Tribler instantly, but with an annoying "Converting" message for 2-∞ hours?
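
A rough sketch of the kind of dynamic batch sizing described above (fetch_rows and convert_rows are hypothetical stand-ins; the real code targets reactor responsiveness rather than this exact formula):

import time

class AdaptiveBatcher:
    """Grow or shrink the batch so that each one stays near a wall-clock budget."""

    def __init__(self, target_seconds=0.5, initial_size=100):
        self.target = target_seconds
        self.size = initial_size

    def run_batch(self, fetch_rows, convert_rows):
        rows = fetch_rows(self.size)   # read up to `size` legacy rows
        start = time.time()
        convert_rows(rows)             # write them into the new database
        elapsed = time.time() - start
        if rows and elapsed > 0:
            # Scale the next batch towards the time budget, within sane bounds.
            self.size = max(10, min(10_000, int(self.size * self.target / elapsed)))
        return len(rows)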

@Dmole
Contributor

Dmole commented Mar 31, 2019

There are other choices (backup + no-sync, or trim bloat first).

I thought collected_torrents was dead #3960
(an app requiring you to symlink folders for it to work is not user-friendly)
1 GB of DB is pushing it, so Tribler should address DB size before it gets out of hand anyway.

@vi
Contributor Author

vi commented Mar 31, 2019

Those 2 hours can actually span multiple days. And it is bad if Tribler can't migrate the database overnight.

One way is to measure sync time and adjust batch size, so that sync time does not dominate.

Twisted reactor becomes unresponsive

Can't it be done in the background? Or enlarge the batches if there are no user interactions for a while.

10-∞ minutes staring at the "Spinning Gears"

Bad idea. Without a progress bar it would look as if Tribler simply does not work.

@vi
Contributor Author

vi commented Mar 31, 2019

What about using two SQLite databases for the migration? Until the migration is done, the new database is in unsynced, unsafe mode. In case of corruption due to a surprise shutdown, the migration can just be restarted from the old database. Upon completion, the new database replaces the old one and syncing is enabled on it.
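
A sketch of that scheme under those assumptions (file names and the convert callback are hypothetical): build the new database with sync disabled, and only move it into place, with sync restored, once the conversion has finished; a crash before the final rename simply means restarting the migration from the untouched old database.

import os
import sqlite3

OLD_DB = "tribler.sdb"            # read-only source (hypothetical file names)
TMP_DB = "metadata.db.migrating"  # unsynced scratch copy
NEW_DB = "metadata.db"            # final destination

def migrate(convert):
    """convert(old_conn, new_conn) copies all entries across."""
    old = sqlite3.connect(f"file:{OLD_DB}?mode=ro", uri=True)
    new = sqlite3.connect(TMP_DB)
    new.execute("PRAGMA synchronous = OFF")       # fast but unsafe while migrating
    try:
        convert(old, new)
        new.commit()
        new.execute("PRAGMA synchronous = FULL")  # back to the durable default
    finally:
        old.close()
        new.close()
    os.replace(TMP_DB, NEW_DB)                    # atomic swap once complete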

@vi
Contributor Author

vi commented Mar 31, 2019

I thought collected_torrents was dead #3960

It does indeed look unused. Is it safe to delete, or may its content be used for tests, etc.?

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole, collected_torrents are dead, indeed. We only convert the bare minimum of metadata from tribler.sdb that fits the new format. And there is no way to tell the good torrents from the bad ones, except by looking at the torrent health. Each torrent health request takes 5-15 seconds, so we would have to pipeline thousands of these per second.

Can't it be done in the background?

It is already done in a background thread. The problem is, writing to the database locks it for the main thread.

One way is to measure sync time and adjust batch size, so that sync time does not dominate.

As I wrote above, we already dynamically adjust the batch size so that it reaches the maximum size that does not affect other stuff running on the reactor. Targeting efficiency instead is useless: if the efficiency threshold is higher than the interactivity threshold, the Tribler core becomes unresponsive, and if it is lower than the interactivity threshold, the interactivity threshold will still offer more efficiency.

What about using two SQLite databases for the migration?

This is exactly how it works now: we open the old database read-only, and just create the corresponding entries in the new one. Unfortunately, the process is dominated by disk synchronization. If we turn off the synchronization, we face the possibility of corruption of the new DB on a power outage. In addition, we would have to force a Tribler restart at the end of the conversion process to turn it on again, which we would rather avoid.

As I told you, the trade-offs are complex, and we have not decided yet how to do this. Basically, there are two ways of doing it:

  1. Do everything in the background (that's what we do now). Pros: immediate Tribler experience. Cons: takes a very long time (but will be restarted from a checkpoint in case Tribler is closed during the process!)
  2. Do everything upfront at the first start of Tribler, showing the progress bar and "skip conversion" button. Pros: very fast (can even use synchronous = off!). Cons: impatient users can become frustrated by lack of content when they click the "skip" button.

@vi
Contributor Author

vi commented Mar 31, 2019

possibility of corruption of the new DB on a power outage
(two databases) This is exactly how it works now

Until the old read-only database is deleted, the new database is not yet the one that matters. Corruption can be handled by restarting the migration.

Anyway, it may be nice to have a "turbo mode" button on the migration banner for turning off synchronisation (after accepting a confirmation that warns about power outages). This is how I personally handled it (using external means to turn off sync).

@ichorid
Contributor

ichorid commented Mar 31, 2019

Maybe we should really opt for the "offline migration + progress bar + skip button" thing. I would like to hear other team members' opinions on that. @qstokkink, @xoriole, @devos50?

@Dmole
Contributor

Dmole commented Mar 31, 2019

Online is better.

ichorid, vi and I used two different solutions that are not on your "straw man" alternatives list; others will likely encounter this issue when 7.3 leaves beta, so if it's too much trouble to improve the code, it may be helpful to list the alternatives in the release notes.

@vi
Contributor Author

vi commented Mar 31, 2019

I'm not sure how easy it is to turn off syncing externally on Windows and Mac.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole, you are definitely right that we should put a comment about the alternative options in the release notes. However, we would like to come to a solution that is acceptable to the majority of our users, who are non-programmers. We cannot rely on them reading (and understanding!) the documentation.

What may look like a "straw man" to a technical person can look quite different to an ordinary user.

@vi
Contributor Author

vi commented Mar 31, 2019

By the way, is it a good idea to ask for confirmation before migrating the database to the new version when starting a non-release (devel or beta) Tribler? If the user does not agree, Tribler could just quit.

@ichorid
Contributor

ichorid commented Apr 1, 2019

@vi , we want our beta to be as close as possible to a real end-user experience. One of the more controversial questions was just that: the unavoidable migration procedure. And, in this topic, we're getting the necessary feedback.
