Reduce IO activity from 2.4 GByte/hour #564
Tribler is doing way too much database activity (about 600 to 1000 operations per minute when idle). I'll check if we can optimize it.
12 May 2014: 538 MByte/hour (b8ec22b branch).
I'm guessing the proposed solution is to group transactions?
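For illustration, a minimal sketch of what grouping could look like with plain sqlite3; the `Torrent` table and its columns are hypothetical stand-ins, the point being that many inserts inside a single transaction cost one journal sync instead of one per statement:

```python
import sqlite3

# Hypothetical example: batch many inserts into one transaction so SQLite
# only syncs the journal/WAL once instead of once per statement.
def insert_torrents_batched(db_path, rows):
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # opens a transaction, commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO Torrent (infohash, name) VALUES (?, ?)",
                rows,
            )
    finally:
        conn.close()
```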
@brandc thanks for reviving this issue. It's been a whole year since I've measured the IO performance. We don't know where this is coming from.
I certainly hope it gets fixed somewhat soon, so I don't have to run Tribler on a RAM disk with scripts to back it up periodically.
@synctext how did you get to 538 MB/hour? If I add the write_bytes and read_bytes and divide that by 1024*1024, I end up with 8127.37 MB. Assuming TIME+ 2h 41min = 161 minutes, I obtain 3028.83 MB/hour. So I guess this run took around 16h? But I cannot find that in your screenshot.
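For reference, a small sketch of how such an MB/hour figure can be derived from the process I/O counters, assuming psutil is available and the platform exposes per-process io_counters():

```python
import time
import psutil

# Hypothetical helper: sample a process's cumulative read/write bytes and
# convert the delta into MB/hour, mirroring the
# (read_bytes + write_bytes) / 1024 / 1024 calculation discussed above.
def io_mb_per_hour(pid, sample_seconds=60):
    proc = psutil.Process(pid)
    start = proc.io_counters()
    time.sleep(sample_seconds)
    end = proc.io_counters()
    delta_bytes = ((end.read_bytes - start.read_bytes)
                   + (end.write_bytes - start.write_bytes))
    mb = delta_bytes / (1024 * 1024)
    return mb * (3600.0 / sample_seconds)
```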
Using my laptop, I ran Tribler (with the MultiChain crawler enabled) for 2 hours and captured the total I/O. From the screenshot above, I see that only around ~70-75 MB/hour is done in I/O. I did use my laptop, which is only an i5, with a 14 mbit connection, but I guess Tribler running idle won't do 1.5 MB/s. All in all I think in the above screenshots the amount of I/O was divided by the
Interesting. Try to reproduce the old results with older code, like b8ec22b. Please ensure you are connectable on several ports (DHT, libtorrent, Dispersy).
Plus: f2889f4. A nice experiment would be to measure IO intensity over 6 different versions, then the 7th, Laurens' version. Plot the release date on the X axis and the IO per hour on the Y axis, for instance. Each run just 1 hour with a clean megacache; also plot collected torrents + AllChannel votes (easy in the debug panel with the 'other stuff' option on). CPU user time would also be a nice evolution to plot across the 7 versions.
AFAIK I was connectable on all ports. Since I am neither seeding nor downloading, libtorrent should not do much, if anything. I will try that particular commit tomorrow; it will be interesting to observe the amount of I/O there. Regarding the plot, that might indeed be interesting. I'll try to do that tomorrow.
I have managed to get Tribler 6.3.5 working on the Ubuntu 15.10 machine. It looks like the IO numbers mentioned above are accurate: 10 minutes in, 300 MB of disk writes have already been done. It may actually be higher than 600 MB/hour. Wow. Edit: after ~1 hour, it read a stunning 26.51 GB of data and wrote 1659.38 MB of data. This brings it to 28805 MB/hour.
I note that on a fresh start, both 6.3.5 and 6.4.3 read a lot of data, ~100 MB. Edit: 6.4.3 reads 19.81 GB and writes 850.30 MB in ~1 hour. This means 21135 MB/hour.
6.5.2 reads 74.44 MB and writes 1530.72 MB in ~1 hour. This means it does 1605 MB/hour. Apparently the virtual machines on a 100 mbit connection do a lot more I/O; perhaps this plays a role, maybe in both connectivity and download speed of content?
With the twistd plugin: 6.6.0pre-exp1 read 4 KB and wrote 2.59 GB in ~1 hour. This means an I/O activity of 2652 MB/hour. These are all significantly higher than what @synctext measured or what my laptop running on my 14 mbit connection did. Odd.
Good progress. Key insight: cost vs. benefit.
No, these were the official .deb files that ran, so no modifications in the code.
No code changes needed; it's all in the debug panel.
OK, so I have measured both the I/O calls and times in Tribler as well as in Dispersy, using an empty megacache (.Tribler state dir), and ran it for one hour. It turns out Tribler was blocked for 607.723 seconds doing I/O; this result does not include LevelDB, as I am not sure yet whether it has blocking behavior. Dispersy blocked a lot less, 30.399 seconds. In total Tribler thus did 638.122 seconds of I/O, meaning the main thread was blocked 18% of the time. A breakdown was generated:
Tribler:
Dispersy:
607 out of 3600 seconds (you mixed up your . and , btw). Note that the critical info is which IO calls are the heaviest: execute, fetchall, or the community.py calls.
Yes, this info is all captured as well. Lines are in the form of:
Edit: I just noticed the helper is a decorator function; I'll check if I can get the function one frame later so the actual caller shows.
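A rough sketch of the kind of instrumentation being described, assuming a decorator-based helper; the names, frame index, and log format are illustrative, not the actual Tribler code:

```python
import functools
import time
import traceback

# Illustrative sketch (not the real Tribler helper): time a blocking database
# call and report the caller one frame above this wrapper, so the log shows
# the real call site rather than the decorator itself.
def log_blocking_io(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.time() - start
            # [-1] is this wrapper frame, so [-2] is the code that actually
            # invoked the decorated database helper.
            caller = traceback.extract_stack()[-2]
            print("%s:%s %s blocked for %.3fs"
                  % (caller.filename, caller.lineno, func.__name__, elapsed))
    return wrapper
```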
@whirm I remembered you explained this to me (was doubting myself), thanks for the confirmation 👍
Jenkins needs IO read/write intensity tracking.
It seems that database I/O has a huge impact on the reactor. I observed this during one of my experiments with TrustChain. Within a network of 100 nodes where every node creates a TrustChain record every second, I get the following evolution of the total number of blocks.
With a file database:
When storing all blocks in dictionaries:
I think this experiment gives us a reliable baseline for further improvements and optimizations.
Several things are off here.
I placed a timer in the
We should explore whether delegating
Preliminary observations for 7.3. In the idle state (channels database live for a couple of days, 50 channels, 600 MB database, 600k torrents) on an SSD: ~50 KB of writes per second (0.25% of the peak I/O). In the "channel processing" state (when inserting downloaded channel contents into the database) on an HDD: ~600 KB of writes per second (50% of the peak I/O). Processing a 500k-torrent channel on an HDD takes more than 3 days! The database grows to ~500 MB during this time, but, you guessed it, the total write I/O spent by Tribler during this time is about 130 GBytes. That is a crazy amount of write amplification. The reason for this is the reactor-congestion-control algorithm we employ to split channel processing into small chunks so the reactor is able to process other stuff. On slower media like HDDs it becomes super-inefficient. There are two ways to solve this, and both require very invasive changes to the current Tribler architecture:
1. Funnel channel-processing database writes through a queue manager, so they can be batched and prioritized without hogging the reactor.
2. Move GigaChannel processing into a separate process with its own database access.
Also, both ways require blocking write access to the GigaChannel database during channel processing.
Having done option 2 before myself, I agree with the Pony author: option 1 is the way to go if it is a viable alternative.
Option 1 sounds easier to implement. We could first verify the performance gains of option 1 with some (small-scale) experiments.
Option 2 should be relatively easy to implement because GigaChannel makes (almost) no use of other parts of Tribler. The only two calls that it uses are "download a torrent" and "check if some torrent is already downloaded". However, we'll still have to implement something like a simplified priority queue even if we go with a separate process.
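A minimal sketch of what such a simplified priority queue could look like on top of asyncio; the DBQueue name and the callable-based job interface are assumptions, not existing Tribler code. Lower priority values run first, so bulk channel processing can be queued at a low priority while user-facing queries jump ahead:

```python
import asyncio
import itertools

# Hypothetical sketch: database jobs are plain callables, executed one at a
# time by a single consumer task, ordered by priority (lower value = more urgent).
class DBQueue:
    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    async def put(self, priority, job):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        await self._queue.put((priority, next(self._order), job, future))
        return await future  # resolves once the consumer has run the job

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            _, _, job, future = await self._queue.get()
            try:
                # Offload the blocking database work so the event loop stays responsive.
                future.set_result(await loop.run_in_executor(None, job))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                self._queue.task_done()
```

Usage would then be something like `await queue.put(0, urgent_query)` for user-facing work and `await queue.put(10, bulk_insert_chunk)` for channel processing, with both callables being hypothetical examples.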
Now that we use AIOHTTP, we can start gradually making GigaChannel requests asynchronous by wrapping Pony calls into coroutines running on a background thread. This will allow us to effectively realize option 1, employing the asyncio reactor as a kind of "queue manager".
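A minimal sketch of that approach, assuming a synchronous Pony query function (here called `get_torrents_from_db`, a hypothetical stand-in) that wraps its work in its own db_session; a single-threaded executor serializes database access while the async handler simply awaits the result:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One worker thread serializes all Pony ORM access, so queries never run
# concurrently and never block the asyncio event loop.
db_executor = ThreadPoolExecutor(max_workers=1)

async def get_torrents(query, limit=50):
    loop = asyncio.get_running_loop()
    # get_torrents_from_db is a hypothetical synchronous Pony query function,
    # expected to open its own @db_session internally.
    return await loop.run_in_executor(
        db_executor, lambda: get_torrents_from_db(query, limit)
    )
```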
I believe that this is indeed the proper way to manage all our operations in Tribler (and IPv8). The first release(s) of IPv8 used the reactor thread purely for socket listening and processed each incoming packet on one of the available thread pools. However, DAS5 experiments showed that this heavily impacted anonymous download speed, so we switched to packet processing on the reactor thread. During the Noodle release, we realised that it's actually the other way around and handling packets is better done in the thread pool, at least for TrustChain block processing. Now that we have switched to
Excessive disk writes have been a long-standing Tribler problem.
16 April 2014: 623 MByte/hour
17 May 2013: 660 MByte/hour
See below the total IO activity of running Tribler f2889f4 for 16 hours. Test conditions: using the -O optimized mode, no Dispersy debug panel, no downloads, no searches executed, and no GUI activity of any kind for 16 hours. The megacache was fully bootstrapped before the experiment: a 547 MB dispersy.db and a 784 MB tribler.sdb.
With 22% average CPU utilization (as a fraction of 1 core), this is now a nice value. This has been an issue since #15, but it is now addressed with the Bloom skip and re-use heuristic.
Related ticket: #8 (running Tribler for 1 hour and observing resource usage).