Migrate datastores to real database #29

freimair · 2020-04-07T09:21:06Z

This is a Bisq Network project. Please familiarize yourself with the project management process.

Description

This project is about replacing the storage infrastructure with a real database we can query using SQL.

Rationale

Why is it important?

Until now, data stores have been persisted by serializing data to Protobuffer and then writing that to disk. Because of that and as the data stores grow, we see

massive disk I/O
high CPU loads
higher then necessary RAM requirements.

This project is about replacing the storage infrastructure with a real database we can query using SQL.

does eliminate the need for serializing all data every time (ie. reduce CPU load significantly)
does eliminate the need for replacing the whole databases every time (ie. reduce disk I/O)
lets us lazy-load data to RAM on demand (although not included in this project)

Why should it be done now?

Needs to be streamlined and fixed before growth is hugely successful.

Aside from that it will address the following issues in one way or the other

probably

Bisq creates an invalid timelock payout transaction in certain circumstances bisq#4071 (thanks @wiz)

probably more

What will happen if we don't do it or delay doing it?

if we do not do it, CPU, RAM and IO issues will grow until Bisq no longer works. If we delay it, the user experience is affected even more.

Criteria for delivery

refactored storage infrastructure
test suit

Measures of success

reduce CPU load caused by serializing the datablobs
reduce IO load caused by rewriting the whole database on every change

Risks

As always with changes in the P2P part, the risks are substantial.

Tasks

Estimates

Task	Amount [USD]
create test suit	1900,00
select db technology	200,00
proof-of-concept implementation	~~900,00~~ 3400,00
make production ready	~~2300,00~~ 3800,00
testing	~~700,00~~ 5400,00
other	500,00
total	~~6500,00~~ 15200,00

EDIT: adjusted the numbers by factoring in recent developments and difficulties.

Notes

followup to #25

wiz · 2020-04-07T16:41:55Z

As some background information, in case the reason why this is a critical bugfix is not directly obvious, is because recently we are seeing support cases from users with either invalid / corrupted payout TX, or more recently, the payout TX being deleted or lost by bisq internally somehow. We believe it is due to the datastore issues

freimair · 2020-04-08T08:44:04Z

well, new to me, but bring it on. the more bugs we can fight off with this the better.

freimair · 2020-04-27T11:43:16Z

@cbeams, @ripcurlx: Its been 3 weeks and we have 3 👍 and I believe a decision is due. What is there to wait for?

Additionally, just to make it clearer, I would aim to allow for different SQL-compatible backends. Especially,

file-based, low performance but easy-to-deploy backend for Bisq clients (H2 for example)
full-fledged, high performance but harder-to-setup database services for our servers (mariadb for example)

ripcurlx · 2020-04-27T12:58:33Z

@freimair Do you see any additional security risk if we are using a DB based storage?

freimair · 2020-04-27T15:31:19Z

@freimair Do you see any additional security risk if we are using a DB based storage?

for the client Bisq app: absolutely not, there is no change in the security model. It is files now, it will be files then. Protobuf is no encryption, neither is H2.
for server environments: no, again, it is files now, it will be files then. If done right, it might be more secure even - given a proper setup (access restrictions at user level, authentication, ...). But even if the database is wide open to everyone, the data is shared throughout the network anyways.

chimp1984 · 2020-08-06T05:38:33Z

@freimair Do you see any additional security risk if we are using a DB based storage?

Any DB solution will add a huge dependency tree to bisq. So that is definitely a security risk. We should look which DB solutions are used in other Bitcoin projects (like Bitcoin Core, LN,....) - though they might have very different requirements.

ghost · 2020-08-06T12:40:51Z

It is being claimed that using a database will fix 6 critical bugs to do with high memory usage/exhaustion. From the little experimentation I and others have done it seems that JavaFX is the culprit when it comes to the memory use/leak issues. I don't see how using a database will solve those issues.

freimair · 2020-08-06T12:48:32Z

It is being claimed that using a database will fix 6 critical bugs to do with high memory usage/exhaustion.

That is not entirely correct. The claim is "will address the following issues in one way or the other". It will not fix these issues once and for all, however, it will contribute to lower resource requirements.

And yes, JavaFX is a major player when it comes to memory - needs fixing as well.

@chimp1984 agreed. However, if we do not use Hibernate or alike, we might just keep the dependencies to a minimum. Needs to be thoroughly evaluated of course.

wiz · 2020-08-06T13:04:56Z

I agree that we should avoid new deps for the security risk. Bitcoin uses BDB and it would be perfect for us too if there is a Java equivalent.

chimp1984 · 2020-08-06T15:47:05Z

Migrating to any DB solution will be a pretty large task. I recommend to take the more low hanging fruits first, specially as those might remove many of the main reasons for the need of a DB.
E.g. We don't need to keep 15 MB of trade statistics (goes back to 4 years old data). Loading historical data only on demand would reduce that issue by 95%.

DB will not solve magically all problems. If you do not keep tyhe data in RAM you will move to a async model which can become very complex when it comes to the DAO handling of the DaoState, which will become a major issue as it is alreayd 75 MB. But solving that is a pretty difficult and high risk task. Maybe a DB will help but maybe it will require a asynchronous (multithreaded) setup which adds a lot of risks and complexity....

Just rough thoughts, I am not against it, just want to keep expectations in reasonable limits and point out possible problems.

Also what was discussed above regarding lost payout tx data: That is very likely not related at all, as pending trades data are small. It might be that we do not update/write the data at the moment when the payout tx is set, and if the user shut down quickly or has a crash the data might be lost. Is likely a one-liner to fix if one investigate (maybe I will do if I find time).

chimp1984 · 2020-08-06T15:52:38Z

Just looked at trade persistence: it is persisted with a 50 ms delay. Does not sound like the issue but as the write is happing in a new thread the real delay can be significantly longer if high CPU load happens.

freimair · 2020-09-01T13:19:05Z

@chimp1984 please keep in mind that not everyone has a powerful dev machine to use and trade on. The current storage solution reaches is limits now and our requirements grow faster as technology can keep up.

This project is a massive one, that I agree on. High risk and everything. However, it also addresses a lot of issue we see regularly nowadays. We should at least get started. One step at a time.

chimp1984 · 2021-02-07T15:31:42Z

@ripcurlx @cbeams Can we close that project? We have rewritten and improved persistence and as we only read once at startup there is no use for a database solution. Also it would add a huge dependeny tail and with it increase the security attack surface.

cbeams · 2021-02-08T06:35:31Z

I did not track this project, but given the inactivity and your comment above, I'll err on the side of closing it as rejected.

freimair added a:proposal bisq.wiki/Project_management#Proposal needs:triage bisq.wiki/Project_management#Triage labels Apr 7, 2020

freimair changed the title ~~[WIP] Migrate datastores to real database~~ Migrate datastores to real database Apr 7, 2020

This was referenced Apr 30, 2020

Refactor send-message business logic in Connections #28

Closed

Reduce initial request size bisq-network/bisq#4233

Closed

freimair mentioned this issue May 15, 2020

Limit system ram to 4GB bisq-network/bisq#4048

Merged

cbeams assigned freimair Jun 12, 2020

cbeams closed this as completed Feb 8, 2021

cbeams added the was:rejected bisq.wiki/Project_management#Approval label Feb 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate datastores to real database #29

Migrate datastores to real database #29

freimair commented Apr 7, 2020 •

edited

Loading

wiz commented Apr 7, 2020 •

edited

Loading

freimair commented Apr 8, 2020

freimair commented Apr 27, 2020

ripcurlx commented Apr 27, 2020

freimair commented Apr 27, 2020

chimp1984 commented Aug 6, 2020

ghost commented Aug 6, 2020

freimair commented Aug 6, 2020

wiz commented Aug 6, 2020

chimp1984 commented Aug 6, 2020

chimp1984 commented Aug 6, 2020

freimair commented Sep 1, 2020

chimp1984 commented Feb 7, 2021

cbeams commented Feb 8, 2021

Migrate datastores to real database #29

Migrate datastores to real database #29

Comments

freimair commented Apr 7, 2020 • edited Loading

Description

Rationale

Why is it important?

Why should it be done now?

What will happen if we don't do it or delay doing it?

Criteria for delivery

Measures of success

Risks

Tasks

Estimates

Notes

wiz commented Apr 7, 2020 • edited Loading

freimair commented Apr 8, 2020

freimair commented Apr 27, 2020

ripcurlx commented Apr 27, 2020

freimair commented Apr 27, 2020

chimp1984 commented Aug 6, 2020

ghost commented Aug 6, 2020

freimair commented Aug 6, 2020

wiz commented Aug 6, 2020

chimp1984 commented Aug 6, 2020

chimp1984 commented Aug 6, 2020

freimair commented Sep 1, 2020

chimp1984 commented Feb 7, 2021

cbeams commented Feb 8, 2021

freimair commented Apr 7, 2020 •

edited

Loading

wiz commented Apr 7, 2020 •

edited

Loading