
7.3 database migration is slow unless I turn off fsync/fdatasync #4441

Closed
vi opened this issue Mar 27, 2019 · 32 comments · Fixed by #4480

Comments

@vi
Contributor

vi commented Mar 27, 2019

Tribler version/branch+revision:

release-7.3.0-beta1 branch, 65d974c

Operating system and version:

Linux

Steps to reproduce the behavior:

Use 7.2 and earlier for a while, then run 7.3.

Expected behavior:

Database migration is fast enough to complete overnight, even on HDD.

Actual behavior:

It converts only 0.7 batches per second and seems to fully synchronize to disk after every batch.

Disabling filesystem sync globally or using eatmydata bumps the speed up to 60 batches per second.

@vi
Contributor Author

vi commented Mar 27, 2019

Should it do more batches per database transaction, committing e.g. every 20 seconds instead of after every N entries?
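
For illustration, a minimal sketch of that idea in Python (the read_batches/convert_batch names are hypothetical, not Tribler's actual migration code): group many small batches into one SQLite transaction and commit only once a time budget has elapsed, so the fsync cost is paid roughly every 20 seconds instead of after every batch.

import sqlite3
import time

COMMIT_INTERVAL = 20.0  # seconds between commits (illustrative value)

def migrate_all(old_conn: sqlite3.Connection,
                new_conn: sqlite3.Connection,
                read_batches, convert_batch):
    """Hypothetical driver: read_batches(old_conn) yields lists of legacy rows,
    convert_batch(new_conn, batch) inserts them into the new schema."""
    last_commit = time.time()
    for batch in read_batches(old_conn):
        convert_batch(new_conn, batch)        # many INSERTs, no commit yet
        if time.time() - last_commit >= COMMIT_INTERVAL:
            new_conn.commit()                 # one fsync per ~20 seconds
            last_commit = time.time()
    new_conn.commit()                         # flush whatever is left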

@Dmole
Contributor

Dmole commented Mar 28, 2019

Maybe address the bloat before migrating:

rm -r /tmp/*tribler*
rm -r ~/.Tribler/{collected_*,dlcheckpoints,logs,sqlite/tribler.sdb,sqlite/dispersy.db}
# or this on V7.3.0+ # rm -r .Tribler/{dlcheckpoints,logs}
sqlite3 ~/.Tribler/sqlite/trustchain.db "delete from blocks where insert_time < '$(date --date="30 day ago" +%Y-%m-%d)';vacuum;"

@qstokkink
Contributor

@Dmole we're hard at work on downsizing and speeding things up. Starting from 7.3, collected_*, sqlite/tribler.sdb and sqlite/dispersy.db will be deprecated (and will be removed once we have extracted the data; this extraction is what causes the long upgrading step).

Furthermore, if you don't want to store any logs, you can edit your logger.conf file not to write to disk.

The dlcheckpoints directory is needed for our libtorrent dependency; we can't do much about that.

Lastly, trustchain.db "delete from blocks where insert_time < '$(date --date="30 day ago" +%Y-%m-%d)';vacuum;" also removes part of your own blockchain. For now this is allowed, but it will cause you to void your own bandwidth tokens.

@Dmole
Contributor

Dmole commented Mar 28, 2019

Thanks, I noticed the 7.3 improvement; that's why I have the "# or this on V7.3.0+ ..." line.

dlcheckpoints and old blocks do not appear to be required in V7.3.0B (bandwidth tokens remain, Tribler works)

Removing the old blocks dropped the file size from 212 MB to 12 MB.

@qstokkink
Contributor

What we store on the blockchain is the change and the running token total, something like:

Genesis <- (up:1, down:2, total:-1) <- (up:3, down:0, total:2) <- (up:0, down:1, total:1)

Right now we assume the last delivered block is correct and simply use the running total, so everything works.

However, if/once we actually require a consistent chain and you do not have the previous entries, your tokens cannot be verified. In the last example, if you only present the last block your token count would be -1 instead of 1. This could screw you in the future.

Note that you can keep your own blocks in the SQL query by not deleting entries whose public_key or link_public_key equals your own public key.
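
For anyone tempted by the cleanup command above, a hedged sketch of that filtered delete using Python's sqlite3 module; the blocks table and the insert_time, public_key and link_public_key columns come from the comments above, while the placeholder key and the assumption that keys are stored as raw BLOBs are mine.

import os
import sqlite3
from datetime import datetime, timedelta

DB_PATH = os.path.expanduser("~/.Tribler/sqlite/trustchain.db")
MY_KEY = bytes.fromhex("00" * 37)  # placeholder: substitute your own public key

cutoff = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
conn = sqlite3.connect(DB_PATH)
with conn:  # commits the DELETE on success
    # Drop old blocks, but keep every block that involves your own key.
    conn.execute(
        "DELETE FROM blocks WHERE insert_time < ? "
        "AND public_key != ? AND link_public_key != ?",
        (cutoff, MY_KEY, MY_KEY),
    )
conn.execute("VACUUM")  # reclaim the freed space; runs outside a transaction
conn.close()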

@Dmole
Contributor

Dmole commented Mar 28, 2019

When the blockchain is finally enforced in some future version all bandwidth tokens will be reset (as they were in the past) to prevent exploitation (because they will be linked to real $) (right?).
So for <=7.3.0-beta1 there is no downside.

@qstokkink
Contributor

Indeed, short term it's a non-issue.

I'd hate to see anyone in the future blindly copy-pasting that command and voiding their currency though.

@devos50
Contributor

devos50 commented Mar 29, 2019

When the blockchain is finally enforced in some future version all bandwidth tokens will be reset (as they were in the past) to prevent exploitation (because they will be linked to real $) (right?).

Our goal is indeed to link our bandwidth currency with other currencies, but we are not ready for this yet.

@Dmole
Contributor

Dmole commented Mar 29, 2019

So short term Tribler can be light on storage, but long term is there a plan to make the chain scale?

Maybe:

1. Agree on a checkpoint transaction once a year.
2. Have each client make a dummy transaction against the annual checkpoint.
3. Truncate transactions before the second-to-last checkpoint.
4. The checkpoint transaction would need to be agreed upon in a distributed fashion, e.g. the first transaction of the UTC year (pre/post-dated transactions should be invalidated).

That scheme should reliably reduce the chain size from O(time*clients) to O(clients) with the limitation that if you don't use the software for a year your history is lost.

@qstokkink
Contributor

We actually have a similar checkpoint-based system available to us: #2457

The nasty part is that we then need some form of consensus.

@Dmole
Contributor

Dmole commented Mar 29, 2019

Yeah there are a bunch of related complications that should be worked out before the chain is actually used for anything, but I was just pointing out that getting the chain size down to O(1) should be one of the goals.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole , if you remove tribler.sdb like that, there will be nothing to migrate from. You will effectively start with a brand-new empty database. I do not recommend doing it this way, though, because you will lose your personal channel (if you had one before) and you will not benefit from having legacy entries in your local DB during the transition period.

@Dmole
Contributor

Dmole commented Mar 31, 2019

@ichorid that's no loss for me because until Tribler gets HTML channel pages (or a meta search), Tribler channels are useless to me. AKA

tor->tpb/rut/etc->tribler

is more practical for the masses.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole, one of our goals is just that: building a distributed replacement for the tracker system. It would be very nice if you described what you think is necessary for that. We can discuss it in another, more relevant issue: #3615

@vi
Contributor Author

vi commented Mar 31, 2019

Most of the discussion here is off-topic anyway.

What about SQLite's (or another database's) synchronisation and transactions? I expect it to be a rather easy fix, something like pragma synchronous = off;.
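
For reference, this is the kind of switch meant here; a sketch only, with a hypothetical database path, and whether Tribler can safely do this is exactly what is discussed below.

import sqlite3

conn = sqlite3.connect("metadata.db")  # hypothetical path to the new database

# Trade durability for speed while migrating:
conn.execute("PRAGMA synchronous = OFF")      # no fsync after each transaction
conn.execute("PRAGMA journal_mode = MEMORY")  # keep the rollback journal off disk

# ... bulk-insert the converted entries here ...

conn.commit()
conn.execute("PRAGMA synchronous = FULL")     # restore SQLite's durable default
conn.close()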

@ichorid
Contributor

ichorid commented Mar 31, 2019

@vi , unfortunately, if we disable synchronization, the DB will become corrupted in case of a sudden power outage. Fixing this will require us to add recovery procedures, which can add to the code complexity.
So, that's not an easy decision.

@Dmole
Contributor

Dmole commented Mar 31, 2019

Adding

cp -r .Tribler .Tribler-v7.2

might be a good idea anyway to mitigate regressions (it would have helped historically). A manual restore is better than nothing. @vi, how big is/was your .Tribler folder? (Unbounded space is as bad as unbounded time.)

@vi
Contributor Author

vi commented Mar 31, 2019

@ichorid, just increase the batch size (or do multiple batches per database transaction) so that it takes around 30 seconds. That would preserve syncs, but make them rarer.

Currently, on an HDD, sync time dominates the actual calculations.

@vi
Contributor Author

vi commented Mar 31, 2019

how big is/was your .Tribler folder?

After migration:

$ du -sh  ~/.Tribler/*/
0	/home/tribler/.Tribler/channels/
16K	/home/tribler/.Tribler/collected_metadata/
4.5G	/home/tribler/.Tribler/collected_torrents/
4.2M	/home/tribler/.Tribler/dlcheckpoints/
0	/home/tribler/.Tribler/icons/
1.0G	/home/tribler/.Tribler/sqlite/
0	/home/tribler/.Tribler/wallet/

Heavy-weight directories are symlinked to a separate XFS filesystem.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@vi, the batch size is determined dynamically, so processing of each batch never takes more than 0.5 seconds. When we increase the batch size to last more than 0.5 seconds, the Twisted reactor becomes unresponsive, network packets get dropped, etc. Migrating a 700 MB Tribler database on a fast SSD takes about 2 hours in background mode (with <0.5 sec batches), and only 10 minutes if everything is processed offline in a single batch.

So, it is a tough choice: do we force our users to wait 10-∞ minutes staring at the "Spinning Gears" screen, or allow them to use the new version of Tribler instantly, but with an annoying "Converting" message for 2-∞ hours?
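
A rough sketch of the kind of dynamic batch sizing described above (fetch_rows and convert_rows are hypothetical stand-ins; the real code targets reactor responsiveness rather than this exact formula):

import time

class AdaptiveBatcher:
    """Grow or shrink the batch so that each one stays near a wall-clock budget."""

    def __init__(self, target_seconds=0.5, initial_size=100):
        self.target = target_seconds
        self.size = initial_size

    def run_batch(self, fetch_rows, convert_rows):
        rows = fetch_rows(self.size)   # read up to `size` legacy rows
        start = time.time()
        convert_rows(rows)             # write them into the new database
        elapsed = time.time() - start
        if rows and elapsed > 0:
            # Scale the next batch towards the time budget, within sane bounds.
            self.size = max(10, min(10_000, int(self.size * self.target / elapsed)))
        return len(rows)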

@Dmole
Contributor

Dmole commented Mar 31, 2019

There are other choices (backup + no-sync, or trim bloat first).

I thought collected_torrents was dead #3960
(an app requiring you to symlink folders for it to work is not user-friendly)
1 GB of DB is pushing it, so Tribler should address DB size before it gets out of hand anyway.

@vi
Contributor Author

vi commented Mar 31, 2019

Those 2 hours can actually span multiple days. And it is bad if Tribler can't migrate the database overnight.

One way is to measure sync time and adjust batch size, so that sync time does not dominate.

Twisted reactor becomes unresponsive

Can't it be done in the background? Or enlarge the batches if there are no user interactions for a while.

10-∞ minutes staring at the "Spinning Gears"

Bad idea. Without a progress bar it would look as if Tribler simply does not work.

@vi
Contributor Author

vi commented Mar 31, 2019

What about using two SQLite databases for the migration? Until the migration is done, the new database is in unsynced, unsafe mode. In case of corruption due to a surprise shutdown, the migration can just be restarted from the old database. Upon completion, the new database replaces the old one and syncing is enabled on it.
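
A sketch of that scheme under those assumptions (file names and the convert callback are hypothetical): build the new database with sync disabled, and only move it into place, with sync restored, once the conversion has finished; a crash before the final rename simply means restarting the migration from the untouched old database.

import os
import sqlite3

OLD_DB = "tribler.sdb"            # read-only source (hypothetical file names)
TMP_DB = "metadata.db.migrating"  # unsynced scratch copy
NEW_DB = "metadata.db"            # final destination

def migrate(convert):
    """convert(old_conn, new_conn) copies all entries across."""
    old = sqlite3.connect(f"file:{OLD_DB}?mode=ro", uri=True)
    new = sqlite3.connect(TMP_DB)
    new.execute("PRAGMA synchronous = OFF")       # fast but unsafe while migrating
    try:
        convert(old, new)
        new.commit()
        new.execute("PRAGMA synchronous = FULL")  # back to the durable default
    finally:
        old.close()
        new.close()
    os.replace(TMP_DB, NEW_DB)                    # atomic swap once complete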

@vi
Contributor Author

vi commented Mar 31, 2019

I thought collected_torrents was dead #3960

It does indeed look unused. Is it safe to delete, or may its content be used for tests, etc.?

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole, collected_torrents are dead, indeed. We only convert the bare minimum of metadata from tribler.sdb that fits the new format. And there is no way to tell the good torrents from the bad ones, except by looking at the torrent health. Each torrent health request takes 5-15 seconds, so we would have to pipeline thousands of these per second.

Can't it be done in the background?

It is already done in a background thread. The problem is, writing to the database locks it for the main thread.

One way is to measure sync time and adjust batch size, so that sync time does not dominate.

As I wrote above, we already dynamically adjust the batch size so that it reaches the maximum size that does not affect other stuff running on the reactor. Targeting efficiency instead is useless: if the efficiency threshold is higher than the interactivity threshold, the Tribler core becomes unresponsive, and if it is lower than the interactivity threshold, the interactivity threshold will still offer more efficiency.

What about using two SQLite databases for the migration?

This is exactly how it works now: we open the old database read-only, and just create the corresponding entries in the new one. Unfortunately, the process is dominated by disk synchronization. If we turn off the synchronization, we face the possibility of corruption of the new DB on a power outage. In addition, we would have to force a Tribler restart at the end of the conversion process to turn it on again, which we would rather avoid.

As I told you, the trade-offs are complex, and we have not decided yet how to do this. Basically, there are two ways of doing it:

  1. Do everything in the background (that's what we do now). Pros: immediate Tribler experience. Cons: takes a very long time (but will be restarted from a checkpoint in case Tribler is closed during the process!)
  2. Do everything upfront at the first start of Tribler, showing the progress bar and "skip conversion" button. Pros: very fast (can even use synchronous = off!). Cons: impatient users can become frustrated by lack of content when they click the "skip" button.

@vi
Contributor Author

vi commented Mar 31, 2019

possibility of corruption of the new DB on a power outage
(two databases) This is exactly how it works now

Until the old read-only database is deleted, the new database is not yet the one that matters. Corruption can be handled by restarting the migration.

Anyway, it may be nice to have a "turbo mode" button on the migration banner for turning off synchronisation (after accepting a confirmation that warns about power outages). This is how I personally handled it (using external means to turn off sync).

@ichorid
Contributor

ichorid commented Mar 31, 2019

Maybe we should really opt for the "offline migration + progress bar + skip button" thing. I would like to hear other team members' opinions on that. @qstokkink, @xoriole, @devos50?

@Dmole
Contributor

Dmole commented Mar 31, 2019

Online is better.

ichorid, vi and I used two different solutions that are not on your "straw man" alternatives list; others will likely encounter this issue when 7.3 leaves beta, so if it's too much trouble to improve the code, it may be helpful to list the alternatives in the release notes.

@vi
Contributor Author

vi commented Mar 31, 2019

I'm not sure how easy it is to turn off syncing externally on Windows and Mac.

@ichorid
Contributor

ichorid commented Mar 31, 2019

@Dmole, you are definitely right that we should put a comment about the alternative options in the release notes. However, we would like to come to a solution that is acceptable to the majority of our users, who are non-programmers. We cannot rely on them reading (and understanding!) the documentation.

What may look like a "straw man" to a technical person can look quite different to an ordinary user.

@vi
Contributor Author

vi commented Mar 31, 2019

By the way, is it a good idea to ask for confirmation before migrating the database to the new version when starting a non-release (devel or beta) Tribler? If the user does not agree, Tribler could just quit.

@ichorid
Contributor

ichorid commented Apr 1, 2019

@vi , we want our beta to be as close as possible to a real end-user experience. One of the more controversial questions was just that: the unavoidable migration procedure. And, in this topic, we're getting the necessary feedback.
