Better database for Tribler/ Prevent Trustchain exit nodes wiping out #4471

Closed
grimadas opened this issue Apr 25, 2019 · 18 comments

Comments

@grimadas
Contributor

grimadas commented Apr 25, 2019

Current problems for scalability

The current database does not scale well when handling a large number of Trustchain records. Beyond a certain number of records, exit nodes show degraded performance.
One solution is to wipe the database and start over, but that results in a loss of identity and reputation.
Alternatively, we can improve the underlying Trustchain database. I think a good start is to look at the workload and see how other databases would handle it.

Abstract Database module

Currently we have at least three different entry points to the database, with hardcoded SQL queries in the codebase: Upgrade, IPv8/Database.py and Metastore using pony-orm.
To simplify migration from one database to another, and so improve scalability and latency, we need to abstract database access (we might want to look into graph or key/value databases in the future).

A good starting point would be a database adapter and an sqlite implementation of it.
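
As a rough illustration of what such an adapter could look like, here is a minimal sketch. The names `BlockStore` and `SqliteBlockStore`, the method set, and the table layout are all hypothetical and are not taken from Tribler or IPv8; the only real API used is Python's standard `sqlite3` module.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class BlockStore(ABC):
    """Hypothetical adapter interface (not an existing Tribler/IPv8 class)."""

    @abstractmethod
    def put(self, public_key: bytes, sequence_number: int, payload: bytes) -> None:
        ...

    @abstractmethod
    def get(self, public_key: bytes, sequence_number: int) -> Optional[bytes]:
        ...

    @abstractmethod
    def latest_sequence_number(self, public_key: bytes) -> Optional[int]:
        ...


class SqliteBlockStore(BlockStore):
    """SQLite-backed implementation; the schema is an assumption for this sketch."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS blocks ("
            " public_key BLOB NOT NULL,"
            " sequence_number INTEGER NOT NULL,"
            " payload BLOB NOT NULL,"
            " PRIMARY KEY (public_key, sequence_number))"
        )

    def put(self, public_key: bytes, sequence_number: int, payload: bytes) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO blocks VALUES (?, ?, ?)",
                (public_key, sequence_number, payload),
            )

    def get(self, public_key: bytes, sequence_number: int) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT payload FROM blocks"
            " WHERE public_key = ? AND sequence_number = ?",
            (public_key, sequence_number),
        ).fetchone()
        return row[0] if row else None

    def latest_sequence_number(self, public_key: bytes) -> Optional[int]:
        # MAX() over zero rows yields a single row containing NULL (None).
        row = self._conn.execute(
            "SELECT MAX(sequence_number) FROM blocks WHERE public_key = ?",
            (public_key,),
        ).fetchone()
        return row[0]
```

Callers would depend only on `BlockStore`, so a key/value or graph backend could later be swapped in by adding another subclass, without touching the query call sites.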

@ichorid
Contributor

ichorid commented Apr 25, 2019

I doubt it will be possible to abstract out Metadata Store access in the foreseeable future. PonyORM already provides a good level of abstraction. However, Trustchain can benefit greatly from switching to some NoSQL-based store (e.g. LevelDB). Building an abstraction level for storing blocks could become a nice step in that direction.

@qstokkink , @devos50, do any components of Tribler beside Trustchain use IPv8/Database.py?

@devos50
Contributor

devos50 commented Apr 25, 2019

@ichorid No, I don't think so.

@qstokkink
Contributor

The attestation community has its own Database + database schema.

@ichorid
Contributor

ichorid commented Apr 26, 2019

@qstokkink , would you consider moving the attestation community to use Pony?

@qstokkink
Contributor

@ichorid I just went through the code with @grimadas, doesn't seem like a good idea right now.

@grimadas
Contributor Author

I'll first do a quick and dirty check of whether it is worth migrating to a database other than SQLite, and proceed from there.

@grimadas grimadas changed the title Database abstraction for Tribler Better database for Tribler/ Prevent Trustchain exit nodes whipping out Apr 26, 2019
@grimadas grimadas changed the title Better database for Tribler/ Prevent Trustchain exit nodes whipping out Better database for Tribler/ Prevent Trustchain exit nodes wiping out Apr 26, 2019
@synctext
Member

We require DB embedding! As just discussed: our choice of database technology is significantly limited by our requirement that everything is bundled within our installer. We can't rely on big servers with dedicated database services; our DB runs locally and competes for resources.

@grimadas
Contributor Author

These are benchmarking results for the database behind the Tribler explorer.

[Figure 1: avg]
Figure 1 shows the average execution time per query. One query, get_block_creation, takes more than 40 sec.

[Figure 2: times]
Figure 2 shows the number of times each query was executed.

[Figure 3: total]
Figure 3 shows the total time for the experiment.
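
For readers who want to collect the same three numbers (average time, call count, total time) per query, a minimal sketch follows. This is not the instrumentation used for the figures; `QueryTimer` and its methods are hypothetical names, and the clock is injectable only so the helper can be tested deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class QueryTimer:
    """Hypothetical helper that accumulates per-query timing statistics."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    @contextmanager
    def measure(self, name):
        """Time the enclosed block and file it under the given query name."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the sample even if the query raised an exception.
            self._count[name] += 1
            self._total[name] += self._clock() - start

    def report(self):
        """Return {query name: {"count", "total", "avg"}} in seconds."""
        return {
            name: {
                "count": self._count[name],
                "total": self._total[name],
                "avg": self._total[name] / self._count[name],
            }
            for name in self._count
        }
```

Wrapping each database call in `with timer.measure("get_block_creation"): ...` and dumping `timer.report()` at the end of a session would give the raw data for charts like the ones above.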

@ichorid
Contributor

ichorid commented Apr 30, 2019

Block explorer is a special case. Could you try to capture a typical client Tribler workload session and build the same charts for it?

@qstokkink
Contributor

Maybe it would make sense then for the block explorer to use a different database type. As this is not used by our users anyway, we can change this without any major repercussions (we'll just have to sit through the database conversion).

I also agree with @ichorid, this tells us nothing about the Tribler user load.

@grimadas
Contributor Author

grimadas commented Apr 30, 2019

I'll do local experiments now, imitating the typical workload of a Tribler user: idle, downloading, etc.
Also, an exit node is another type of user, with a heavy workload.

@grimadas
Contributor Author

grimadas commented May 9, 2019

image
Some results from the exit node. It seems that database queries are not the bottleneck.

@grimadas
Contributor Author

grimadas commented May 9, 2019

image

@devos50
Contributor

devos50 commented Sep 8, 2019

I made some modifications to the TrustChain crawler. First, I increased the statistics maintenance interval to one hour, which means that the block creation statistics (the graph) are rebuilt every hour instead of every five minutes. Aggregation of statistics is a major bottleneck, as it executes a resource-intensive SQL query.
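
To illustrate the idea of rebuilding an expensive aggregate at most once per interval, here is a minimal sketch. It is not the crawler's actual code: the crawler rebuilds on a periodic maintenance task, whereas this hypothetical `CachedStatistic` class recomputes lazily on read once the cached value is older than the interval. The `compute` callable stands in for the resource-intensive SQL query.

```python
import time


class CachedStatistic:
    """Hypothetical cache that recomputes a value at most once per interval."""

    def __init__(self, compute, interval_seconds=3600.0, clock=time.monotonic):
        self._compute = compute          # stand-in for the expensive SQL query
        self._interval = interval_seconds
        self._clock = clock              # injectable for deterministic tests
        self._value = None
        self._computed_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._computed_at is None
            or now - self._computed_at >= self._interval
        )
        if stale:
            self._value = self._compute()
            self._computed_at = now
        return self._value
```

With `interval_seconds=3600.0`, repeated reads within the hour return the cached graph data, and the aggregate query runs again only on the first read after the hour has elapsed.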

Second, I started to explore whether a key-value store can improve the performance of the TrustChain crawler. Last week I was told that there was some initial work on rewriting the TrustChain persistence layer to use a key-value store (we have also received various questions about why we are not using one already). @grimadas do you have some initial code/results for this?

@grimadas
Contributor Author

grimadas commented Sep 8, 2019

There are several options with good Python bindings, but I guess one of the best matches for us is LMDB. It is pretty easy to use and supports async read/write with multiple threads.
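
One design question for any ordered key-value store is how to lay out the keys. The sketch below shows one possible layout and deliberately does not use the LMDB API: it assumes fixed-length public keys, and `block_key`, `sequence_of` and `OrderedMemoryStore` are hypothetical names, with a sorted in-memory list standing in for the store. Encoding the sequence number as a big-endian integer makes byte order match numeric order, so all blocks of one peer can be read with a single in-order prefix scan.

```python
import bisect
import struct


def block_key(public_key: bytes, sequence_number: int) -> bytes:
    """Build a sortable key: public key followed by a big-endian 64-bit sequence number."""
    return public_key + struct.pack(">Q", sequence_number)


def sequence_of(key: bytes) -> int:
    """Recover the sequence number from the last 8 bytes of a key."""
    return struct.unpack(">Q", key[-8:])[0]


class OrderedMemoryStore:
    """In-memory stand-in for an ordered key-value store (illustration only)."""

    def __init__(self):
        self._keys = []      # kept sorted in byte order
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1
```

A little-endian or decimal-string encoding would break the ordering (for example, 256 could sort before 2), which is why the big-endian packing matters here.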

@synctext
Member

synctext commented Sep 8, 2019

Is multi-core crawling possible in the near future?

@ichorid
Contributor

ichorid commented Sep 9, 2019

This is a good target for the 7.5 release, not for the technical 7.4.

@drew2a drew2a modified the milestones: Next-next release, Backlog Nov 4, 2020
@qstokkink qstokkink removed this from the Backlog milestone Aug 23, 2024
@qstokkink
Contributor

Looking at this issue, I think it is no longer applicable for the current state of affairs: exit nodes no longer run any database.
