Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate datastores to real database #29

Closed
7 tasks
freimair opened this issue Apr 7, 2020 · 14 comments
Closed
7 tasks

Migrate datastores to real database #29

freimair opened this issue Apr 7, 2020 · 14 comments
Assignees
Labels
a:proposal bisq.wiki/Project_management#Proposal needs:triage bisq.wiki/Project_management#Triage was:rejected bisq.wiki/Project_management#Approval

Comments

@freimair
Copy link

freimair commented Apr 7, 2020

This is a Bisq Network project. Please familiarize yourself with the project management process.

Description

This project is about replacing the storage infrastructure with a real database we can query using SQL.

Rationale

Why is it important?

Until now, data stores have been persisted by serializing data to Protobuffer and then writing that to disk. Because of that and as the data stores grow, we see

  • massive disk I/O
  • high CPU loads
  • higher then necessary RAM requirements.

This project is about replacing the storage infrastructure with a real database we can query using SQL.

  • does eliminate the need for serializing all data every time (ie. reduce CPU load significantly)
  • does eliminate the need for replacing the whole databases every time (ie. reduce disk I/O)
  • lets us lazy-load data to RAM on demand (although not included in this project)

Why should it be done now?

Needs to be streamlined and fixed before growth is hugely successful.

Aside from that it will address the following issues in one way or the other

probably

probably more

What will happen if we don't do it or delay doing it?

if we do not do it, CPU, RAM and IO issues will grow until Bisq no longer works. If we delay it, the user experience is affected even more.

Criteria for delivery

  • refactored storage infrastructure
  • test suit

Measures of success

  • reduce CPU load caused by serializing the datablobs
  • reduce IO load caused by rewriting the whole database on every change

Risks

As always with changes in the P2P part, the risks are substantial.

Tasks

  • create test suit
  • settle on a database technology
  • proof-of-concept implementation
  • make it production ready
  • test

Estimates

Task Amount [USD]
create test suit 1900,00
select db technology 200,00
proof-of-concept implementation 900,00 3400,00
make production ready 2300,00 3800,00
testing 700,00 5400,00
other 500,00
total 6500,00 15200,00

EDIT: adjusted the numbers by factoring in recent developments and difficulties.

Notes

followup to #25

@freimair freimair added a:proposal bisq.wiki/Project_management#Proposal needs:triage bisq.wiki/Project_management#Triage labels Apr 7, 2020
@freimair freimair changed the title [WIP] Migrate datastores to real database Migrate datastores to real database Apr 7, 2020
@wiz
Copy link

wiz commented Apr 7, 2020

As some background information, in case the reason why this is a critical bugfix is not directly obvious, is because recently we are seeing support cases from users with either invalid / corrupted payout TX, or more recently, the payout TX being deleted or lost by bisq internally somehow. We believe it is due to the datastore issues

@freimair
Copy link
Author

freimair commented Apr 8, 2020

well, new to me, but bring it on. the more bugs we can fight off with this the better.

@freimair
Copy link
Author

@cbeams, @ripcurlx: Its been 3 weeks and we have 3 👍 and I believe a decision is due. What is there to wait for?

Additionally, just to make it clearer, I would aim to allow for different SQL-compatible backends. Especially,

  • file-based, low performance but easy-to-deploy backend for Bisq clients (H2 for example)
  • full-fledged, high performance but harder-to-setup database services for our servers (mariadb for example)

@ripcurlx
Copy link

@freimair Do you see any additional security risk if we are using a DB based storage?

@freimair
Copy link
Author

@freimair Do you see any additional security risk if we are using a DB based storage?

  • for the client Bisq app: absolutely not, there is no change in the security model. It is files now, it will be files then. Protobuf is no encryption, neither is H2.
  • for server environments: no, again, it is files now, it will be files then. If done right, it might be more secure even - given a proper setup (access restrictions at user level, authentication, ...). But even if the database is wide open to everyone, the data is shared throughout the network anyways.

@chimp1984
Copy link

@freimair Do you see any additional security risk if we are using a DB based storage?

Any DB solution will add a huge dependency tree to bisq. So that is definitely a security risk. We should look which DB solutions are used in other Bitcoin projects (like Bitcoin Core, LN,....) - though they might have very different requirements.

@ghost
Copy link

ghost commented Aug 6, 2020

It is being claimed that using a database will fix 6 critical bugs to do with high memory usage/exhaustion. From the little experimentation I and others have done it seems that JavaFX is the culprit when it comes to the memory use/leak issues. I don't see how using a database will solve those issues.

@freimair
Copy link
Author

freimair commented Aug 6, 2020

It is being claimed that using a database will fix 6 critical bugs to do with high memory usage/exhaustion.

That is not entirely correct. The claim is "will address the following issues in one way or the other". It will not fix these issues once and for all, however, it will contribute to lower resource requirements.

And yes, JavaFX is a major player when it comes to memory - needs fixing as well.

@chimp1984 agreed. However, if we do not use Hibernate or alike, we might just keep the dependencies to a minimum. Needs to be thoroughly evaluated of course.

@wiz
Copy link

wiz commented Aug 6, 2020

I agree that we should avoid new deps for the security risk. Bitcoin uses BDB and it would be perfect for us too if there is a Java equivalent.

@chimp1984
Copy link

Migrating to any DB solution will be a pretty large task. I recommend to take the more low hanging fruits first, specially as those might remove many of the main reasons for the need of a DB.
E.g. We don't need to keep 15 MB of trade statistics (goes back to 4 years old data). Loading historical data only on demand would reduce that issue by 95%.

DB will not solve magically all problems. If you do not keep tyhe data in RAM you will move to a async model which can become very complex when it comes to the DAO handling of the DaoState, which will become a major issue as it is alreayd 75 MB. But solving that is a pretty difficult and high risk task. Maybe a DB will help but maybe it will require a asynchronous (multithreaded) setup which adds a lot of risks and complexity....

Just rough thoughts, I am not against it, just want to keep expectations in reasonable limits and point out possible problems.

Also what was discussed above regarding lost payout tx data: That is very likely not related at all, as pending trades data are small. It might be that we do not update/write the data at the moment when the payout tx is set, and if the user shut down quickly or has a crash the data might be lost. Is likely a one-liner to fix if one investigate (maybe I will do if I find time).

@chimp1984
Copy link

Just looked at trade persistence: it is persisted with a 50 ms delay. Does not sound like the issue but as the write is happing in a new thread the real delay can be significantly longer if high CPU load happens.

@freimair
Copy link
Author

freimair commented Sep 1, 2020

@chimp1984 please keep in mind that not everyone has a powerful dev machine to use and trade on. The current storage solution reaches is limits now and our requirements grow faster as technology can keep up.

This project is a massive one, that I agree on. High risk and everything. However, it also addresses a lot of issue we see regularly nowadays. We should at least get started. One step at a time.

@chimp1984
Copy link

@ripcurlx @cbeams Can we close that project? We have rewritten and improved persistence and as we only read once at startup there is no use for a database solution. Also it would add a huge dependeny tail and with it increase the security attack surface.

@cbeams
Copy link
Contributor

cbeams commented Feb 8, 2021

I did not track this project, but given the inactivity and your comment above, I'll err on the side of closing it as rejected.

@cbeams cbeams closed this as completed Feb 8, 2021
@cbeams cbeams added the was:rejected bisq.wiki/Project_management#Approval label Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:proposal bisq.wiki/Project_management#Proposal needs:triage bisq.wiki/Project_management#Triage was:rejected bisq.wiki/Project_management#Approval
Projects
None yet
Development

No branches or pull requests

5 participants