Skip to content
This repository has been archived by the owner on Jan 19, 2021. It is now read-only.

Extract db-related methods from baseTrie #74

Merged
merged 13 commits into from
Jan 22, 2019
Merged

Extract db-related methods from baseTrie #74

merged 13 commits into from
Jan 22, 2019

Conversation

s1na
Copy link
Contributor

@s1na s1na commented Jan 16, 2019

This PR extracts db-related methods such as getRaw, putRaw from BaseTrie and moves them to a new class DB. It also modifies how checkpointing is done in CheckpointTrie.

The goal was to do these two modifications according to this comment, but after implementation I figured out some problem with that design. I experimented with a few other designs, but they all seemed hacky like the current checkpointing method. This PR is one of those methods which I liked a bit more than others. But if you think it doesn't improve on the status quo, we can close this PR.

To achieve checkpointing we somehow have to make the original trie read-only, and perform all subsequent writes to a temporary (scratch) db. On revert the scratch db is dropped and trie root is set to the checkpointed root. On commit nodes in the scratch db which correspond to the current trie (with current root) have to be written to the main persistent db.

For this purpose, this PR introduces a ScratchDB, which is an in-memory DB with a backend (pointer to main persistent db). On checkpoint, CheckpointTrie replaces the db that BaseTrie uses with an instance of ScratchDB, therefore all writes are written to the scratch. If a key is not found in the ScratchDB, it tries to fetch it from its backend.

I also have some questions regarding getRaw and putRaw. My impression from reading the existing code is that putRaw will write to the persistent DB even if trie is during a checkpoint. Is that intentional? If so, should getRaw and delRaw also be done on persistent db regardless of checkpointing? We should probably highlight this in documentation, or rename them to make this side-effect clear.

Signed-off-by: Sina Mahmoodi <[email protected]>
Signed-off-by: Sina Mahmoodi <[email protected]>
Signed-off-by: Sina Mahmoodi <[email protected]>
Signed-off-by: Sina Mahmoodi <[email protected]>
@coveralls
Copy link

coveralls commented Jan 16, 2019

Coverage Status

Coverage increased (+0.3%) to 93.613% when pulling 4ea8d58 on refactor/db into 029b5fe on master.

@holgerd77
Copy link
Member

Whew, at least tests are already passing, great! 👍 😄

@holgerd77
Copy link
Member

holgerd77 commented Jan 17, 2019

Think I like this approach, this get's definitely more readable and distinctly sorted out regarding the functionality respectively responsibility.

I wonder if we should generally switch to a more directory-structured file structure (and also take on some breaking import changes as a consequence), suggestion:

  • main directory
    • base.js
    • checkpoint.js
    • index.js
    • secure.js
    • node.js
    • proof.js
  • util directory
    • async.js
    • hex.js
    • nibbles.js
    • tasks.js (or eventually merge with async.js)
  • db directory
    • index.js
    • scratch.js
  • readStream directory
    • index.js
    • scratch.js

Update: Updated the above with latest reflections/ideas.

@holgerd77
Copy link
Member

Find this file and naming mix generally still too messy...

@holgerd77
Copy link
Member

Updated the directory structure above with latest reflections/ideas on this.

Side node: we can also address this in a separate PR.

@holgerd77
Copy link
Member

Will do some more proper review now. 😄 Let's probably address these file structure changes in a separate PR, will eventually also open a new issue on this. You can nevertheless give some first opinion if you want.

@holgerd77
Copy link
Member

Tested the basic usage example, worked.

Copy link
Member

@holgerd77 holgerd77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks more than it is (my review comments), generally this looks really good and very happy with the changes. This brings a lot more transparency in the behavior of the library, also eventually allows in the future to do easier switches of the DB backend store (respectively allows to easier add this functionality to the API + code base).

src/baseTrie.js Outdated
this.db = db
} else {
this.db = new DB(db)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit undecided here, since this adds complexity to the public API. Do we (already) want to expose the DB class through this and allow the additional instantiation with DB? Would you say this should now be the recommended way of using the API? Or should we stick for now to just keep the API as is and just forward the level instance passed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, we should probably keep DB internal and have the public interface work with level instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have actually a tendency to the other way around, to enforce instantiation by using the DB wrapper class. Then we already have some good preparation if we want to add new storage options beside from level DB in the future. What do you think? I am in the breaking-changes mood! 😄

src/baseTrie.js Outdated Show resolved Hide resolved
src/baseTrie.js Show resolved Hide resolved
src/baseTrie.js Show resolved Hide resolved
src/baseTrie.js Show resolved Hide resolved
src/checkpointTrie.js Outdated Show resolved Hide resolved
src/checkpointTrie.js Outdated Show resolved Hide resolved
src/checkpointTrie.js Show resolved Hide resolved
src/checkpointTrie.js Show resolved Hide resolved
src/scratch.js Show resolved Hide resolved
@s1na
Copy link
Contributor Author

s1na commented Jan 21, 2019

Thanks for the extensive review! Will go through your comments now.

Re. the file structure, agree and what you proposed looks good for the current state. Only maybe we can wait a bit more before solidifying the structure, as there's another PR incoming, and then there's transition to typescript which could offer new opportunities for structuring.

@holgerd77
Copy link
Member

Makes very much sense to let the file structure changes sink in a bit more and do this along TypeScript transition. Have opened a separate issue #75 where we can continue discussion respectively track the latest state of proposals on that.

@holgerd77
Copy link
Member

Ok, have answered everything. Let me know if you have questions or would rather tend to choose other paths than the ones I have suggested.

@s1na
Copy link
Contributor Author

s1na commented Jan 21, 2019

Tried to apply your suggestions. Hope I didn't miss anything. Raw methods are dropped (and their test file). Also made createScratchReadStream private. DB uses this._leveldb internally to refer to the leveldb instance.

@holgerd77
Copy link
Member

Hmm, can't we use the tests on the DB class?

There is still a linting error preventing the CI run to pass.

} else {
this.db = new DB(db)
}
this.db = db || new DB()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the API docs here as well?

@holgerd77
Copy link
Member

Ready for re-review? 😄

@s1na
Copy link
Contributor Author

s1na commented Jan 22, 2019

Yep 😄 I was just waiting to see if CI passes.

I've replaced the rawOps tests with unit tests for db.js, scratch.js and added some additional cases to checkpointing tests (moved them to a separate file checkpoint.js).

@holgerd77
Copy link
Member

Cool, will directly go for it. 😄

(already a +0.4% test coverage increase, great! 👍)

@holgerd77
Copy link
Member

Update: ah, strange coverage behavior now the final report is -1.5%, I had a +0.4% shown while CI was still running. Hmm.

@holgerd77
Copy link
Member

Just as some note, no need to change in this round: maybe we should name DB even more generic to Storage or something, then this is already better named/prepared for some future storage backend exchange? The exposed DB API is pretty limited in scope (get, put, batch, copy), so it should be not too far-fetched to easily allow switching here in the future with other-backend-targeting wrapper classes?

@holgerd77
Copy link
Member

Also just as some note and not a PR blocker: these re-generated docs are better than nothing, but after all these changes we have to significantly restructure the docs to reflect the directory/file structure + API exposed.

@holgerd77
Copy link
Member

Maybe if we just release after TypeScript transition, that we also be the point for the docs update.

@holgerd77
Copy link
Member

Also some note: we had some extensive discussion with Alex when he was doing the changes around changing or not changing the API by the decision where to put the distribution files (so that require('secure.js') would not have to be changed to require('dist/secure.js'). We ended up to put these all in the root directory and exclude in .gitignore.

However after some time dealing with this I have to say this is a total mess. On TypeScript transition we should also switch to just the standard way we are doing on other libraries and put everything in the dist folder. There will be several breaking changes anyhow so this will be a good occasion to update.

@s1na
Copy link
Contributor Author

s1na commented Jan 22, 2019

Weird that coverage changed at the end! but luckily it helped us find a bug! A few points:

  1. Added a simple put/get test for secure, as well as a copy test.
  2. Added copy tests for checkpointTrie, both before any checkpointing and after. The after case didn't work. I've now fixed the problem.
  3. The copy method of all tries doesn't actually copy the leveldb. This means if they copy trie and use both instances in parallel, they're using the same leveldb. Tried to add this to docs. I don't know however if this is desired (the alternative of copying whole leveldb is also not elegant).
  4. I had forgotten to mention that I've changed the DB api. Its methods only accept Buffer type now and throw an error if input is not a Buffer. I thought this is a good opportunity to do this, as we're breaking API anyway :)

@holgerd77
Copy link
Member

  1. 👍
  2. 👍
  3. If the behavior before was the same we can leave this for now, already a larger improvement if this is clearly stated in the docs now.
  4. That's good/desired, hope we manage to get all breaking changes collected together at the end on release notes though! 😄

@s1na
Copy link
Contributor Author

s1na commented Jan 22, 2019

Regarding your comments:

  1. Storage also sounds good. Yeah, the API is really simple, so in principle it should be possible to use various storage backends. We can also probably do experiments and benchmarks easily.
  2. Yeah I think during transition to TS it would make sense to update docs.
  3. Definitely agree, I wanted to bring this up, glad you mentioned it. Yeah, it'd make sense to put all in dist and ask them to be imported via something like const { SecureTrie } = require('merkle-patricia-tree') (so that import is not dependent on file path). Also, maybe make exports much more explicit.

Copy link
Member

@holgerd77 holgerd77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, had some last overall look, would give this a cautious GO now. Feel free to merge if you feel confident with it.

@holgerd77
Copy link
Member

On 3.: Think you have some better oversight then me on ways to organize this, feel free to realize the one you think is most appropriate. Just this throwing everything into root dir makes no sense.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants