Volatile DB: non-locking reads #1816

Closed
mrBliss opened this issue Mar 17, 2020 · 3 comments · Fixed by #1866
Labels: consensus (issues related to ouroboros-consensus), optimisation (Performance optimisation), volatile db

Comments

mrBliss (Contributor) commented Mar 17, 2020

In the VolatileDB, not only writes but also reads take an exclusive lock (takeMVar) on the state of the VolatileDB, instead of reads using readMVar. The reason is that when a garbage collection (a write) happens concurrently with a read, e.g., getting a block from a file, the file to read from might be deleted before we actually get to read it. This results in an FsError (FsResourceDoesNotExist?) when trying to read from the no-longer-existing file, which will close the VolatileDB, shutting down the node. The window for this is small (getBlockComponent reads the state, then garbageCollect locks the state and deletes the file before getBlockComponent does the actual read), but it is nonetheless problematic.

Also see this thread: #1165 (comment).

A potential solution is to use something like https://hackage.haskell.org/package/concurrent-extra-0.7.0.12/docs/Control-Concurrent-ReadWriteLock.html or https://hackage.haskell.org/package/concurrent-extra-0.7.0.12/docs/Control-Concurrent-ReadWriteVar.html. Note that we can't reuse these directly, as we're using IOLike. We should allow concurrent reads, even reads concurrent with putBlock, but not reads concurrent with garbageCollect. To make things more complex, we could even allow a putBlock call at the same time as garbageCollect, because we never garbage collect the file we append to.
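A minimal sketch of what such a lock could look like, assuming three roles (read, append, write) and written against plain `stm`/`IO` purely for illustration; the real implementation would have to go through `IOLike`, deal with fairness/starvation, and need not resemble this. All names below are hypothetical.

```haskell
module RAWLock (RAWLock, new, withReadAccess, withAppendAccess, withWriteAccess) where

import Control.Concurrent.STM
import Control.Exception (bracket_)

-- | A read/append/write lock: any number of readers and at most one appender
-- may be active concurrently; a writer (garbage collection) excludes everyone.
data RAWLock = RAWLock
  { readers  :: TVar Int   -- ^ number of active readers
  , appender :: TVar Bool  -- ^ whether an appender is active
  , writer   :: TVar Bool  -- ^ whether a writer is active
  }

new :: IO RAWLock
new = RAWLock <$> newTVarIO 0 <*> newTVarIO False <*> newTVarIO False

-- | Reads (e.g. getBlockComponent) only have to wait for an active writer.
withReadAccess :: RAWLock -> IO a -> IO a
withReadAccess l = bracket_
  (atomically $ do
     readTVar (writer l) >>= check . not
     modifyTVar' (readers l) (+ 1))
  (atomically $ modifyTVar' (readers l) (subtract 1))

-- | Appends (e.g. putBlock) exclude other appenders and writers, not readers.
withAppendAccess :: RAWLock -> IO a -> IO a
withAppendAccess l = bracket_
  (atomically $ do
     readTVar (appender l) >>= check . not
     readTVar (writer l)   >>= check . not
     writeTVar (appender l) True)
  (atomically $ writeTVar (appender l) False)

-- | Writes (e.g. garbageCollect) wait until nobody else is active.
withWriteAccess :: RAWLock -> IO a -> IO a
withWriteAccess l = bracket_
  (atomically $ do
     readTVar (readers l)  >>= check . (== 0)
     readTVar (appender l) >>= check . not
     readTVar (writer l)   >>= check . not
     writeTVar (writer l) True)
  (atomically $ writeTVar (writer l) False)
```

In this sketch appends and writes are kept mutually exclusive for simplicity, even though, as noted above, putBlock could in principle overlap with garbageCollect because we never garbage collect the file we append to.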

Related: #1165.

Low priority, because we don't know how much time we lose because of the exclusive locks (we have always had them), and implementing this is complex (concurrency stuff always is) and harder to test.

mrBliss added the consensus (issues related to ouroboros-consensus), optimisation (Performance optimisation), volatile db, and priority low labels on Mar 17, 2020
mrBliss added a commit that referenced this issue on Mar 17, 2020: "See the TODO and #1816 for the motivation."
dcoutts (Contributor) commented Mar 17, 2020

I think we have an STM based reader/writer lock we could port over easily, if that's what we need.

dcoutts (Contributor) commented Mar 17, 2020

If it helps, it's worth noting that Unix file systems allow one to delete files that are already open. So if existing readers have a handle to the file and keep it open, then there's nothing to stop GC from deleting the file. The file will actually be deleted automatically by the OS when the last open file handle is closed.

This behaviour is not supported on Windows however.
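As a small standalone illustration of that POSIX behaviour (the file name here is made up, and as noted it does not work on Windows):

```haskell
import System.Directory (removeFile)
import System.IO

main :: IO ()
main = do
  writeFile "block-00001.dat" "block bytes"  -- hypothetical file name
  h <- openFile "block-00001.dat" ReadMode   -- a "reader" opens the file
  removeFile "block-00001.dat"               -- "garbage collection" unlinks it
  contents <- hGetContents h                 -- the open handle still reads fine
  putStrLn contents
  hClose h                                   -- the OS reclaims the space here
```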

mrBliss (Contributor, Author) commented Mar 17, 2020

> If it helps, it's worth noting that Unix file systems allow one to delete files that are already open. So if existing readers have a handle to the file and keep it open, then there's nothing to stop GC from deleting the file. The file will actually be deleted automatically by the OS when the last open file handle is closed.
>
> This behaviour is not supported on Windows however.

We can't rely on this behaviour, because our pure/mock HasFS implementation does not implement it, as it's very hard to imitate it in a pure way. And if Windows doesn't support it either, then I think we should not pursue this option.

Furthermore, the file may have been deleted before we have opened it (unless we keep handles of all files open, as in #1165).

mrBliss added a commit that referenced this issue Mar 27, 2020
mrBliss added a commit that referenced this issue Apr 3, 2020
iohk-bors bot added a commit that referenced this issue Apr 3, 2020
1866: VolatileDB: use Read-Append-Write lock for storing the state r=mrBliss a=mrBliss

Fixes #1816.

1895: Improve tx submission tracing r=intricate a=intricate

Closes #1861 

1899: Update ledger one commit and resolve golden test failure r=dcoutts a=dcoutts

The ledger patch 512c26a66a6a63278846646b81bf8eadcd4ae99c introduced a change in the representation of one field in TxSizeLinear, to resolve an important bug with fee calculations.

It did not change the CBOR representation of the TxSizeLinear, but it did change an example lovelace-per-byte fee value from (44 :: Lovelace) to (43.946 :: Rational).

This example value is used in the Byron consensus golden tests, so this of course changes the serialised reference values in the tests.

It is slightly hard to see the change clearly because the TxSizeLinear gets double-serialised, so the change is
```
-"\130\ESC\NUL\NUL\141QuOR\NUL\ESC\NUL\NUL\NUL\n>\154\184\NUL"
+"\130\ESC\NUL\NUL\141QuOR\NUL\ESC\NUL\NUL\NUL\n;b\190\128"
```
The difference without the double-serialisation is more instructive:
```
-[TkListLen 2,TkInt 155381000000000,TkInt 44000000000]
+[TkListLen 2,TkInt 155381000000000,TkInt 43946000000]
```
This is then completely obvious given the change in the example from 44 to 43.946.

Also added a specific golden test for the TxSizeLinear to make this all a bit clearer, to explain the magic bytes.
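For reference, the token values line up with the example change if one assumes the coefficients are serialised as fixed-point integers scaled by 10^9; this scale is inferred from the numbers above, not taken from the ledger code, so treat it as an illustration only.

```haskell
import Data.Ratio ((%))

-- Assumed fixed-point scale of 10^9 (inferred from the values above):
scale :: Rational -> Integer
scale x = round (x * 10 ^ (9 :: Integer))

main :: IO ()
main = do
  print (scale 155381)          -- 155381000000000  (constant term)
  print (scale 44)              -- 44000000000      (old slope: 44 lovelace/byte)
  print (scale (43946 % 1000))  -- 43946000000      (new slope: 43.946)
```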

1901: Pass the ticked ledger to (re)applyLedgerBlock r=mrBliss a=mrBliss

For the Byron and Shelley ledgers, the first thing `(re)applyLedgerBlock` does is tick the ledger. Since we already tick the ledger in `applyExtLedgerState` (before obtaining a `LedgerView` from it to pass to `validateHeader`), we can simply pass the ticked ledger to `(re)applyLedgerBlock`, avoiding the repeated tick.

Co-authored-by: Thomas Winant <[email protected]>
Co-authored-by: Luke Nadur <[email protected]>
Co-authored-by: Duncan Coutts <[email protected]>
iohk-bors bot closed this as completed in f9b64e6 on Apr 3, 2020