Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/abci #188

Merged
merged 12 commits into from
Aug 21, 2023
Merged

Fix/abci #188

merged 12 commits into from
Aug 21, 2023

Conversation

brennanjl
Copy link
Collaborator

@brennanjl brennanjl commented Aug 16, 2023

I've implemented various changes here, all necessary for getting ABCI working. These include:

  • A KV store that implements sessions.Committable
  • Changes to AtomicCommitter, to allow for generation of the app-hash before the end of the first commit phase (this is necessary since apphash needs to be persisted atomically).
  • A few bug fixes on the engine's way of generating apphashes
  • A very basic WAL that can be used for the 2pc protocol
  • Various ABCI method implementations.

There's really only two things to flag here:

  1. Transaction.UnmarshalBinary: We deserialize binary into the transactions.Transaction struct, but it contains a public key interface. This will certainly fail, since the deserializer does not know what implementation of the Public Key interface to use. I don't think this is rocket science to fix, I just put a TODO there to flag it.
  2. abci: Various TODOs for @jchappelow are sprinkled through abci. They will take a few minutes to implement; mostly just deserializing transaction payloads, or returning validator updates to ABCI.

This knocks out quite a few blockers for getting integration tests working; the biggest blocker is still the crypto package.

pkg/abci/abci.go Outdated
}

func (a *AbciApp) InitChain(p0 abciTypes.RequestInitChain) abciTypes.ResponseInitChain {
panic("TODO")
ctx := context.Background()
// TODO: the following was contained in the old ABCI
Copy link
Contributor

@charithabandi charithabandi Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to initialize our app with the genesis validators, code will change a bit based on the validator store, but this initialization is needed. Think Jon is already handling this in his PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should get #182 very soon . Wrapping it up now.

Copy link
Collaborator Author

@brennanjl brennanjl Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a case where there is more than one genesis validator? I sort of assumed that I would start up a Kwil node for a new network, then if there was another node, it would request, I would approve, and it would then be the second validator, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there would be a set at genesis. @charithabandi's docker-compose has three in the initial set for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If u want to start the network with more than 1 validators, you can do that by specifying it in the genesis file and bootup the network. or the other way is to do the whole join process. So yeah, there can be more than one at the app initialization

pkg/abci/abci.go Outdated
func (a *AbciApp) Commit() abciTypes.ResponseCommit {
ctx := context.Background()

// TODO: how should we handle errors here? We probably want the node to stop, right?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying something for this here: 61ddf18

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, except we would want to make sure that a panic in ABCI still allows the modules to gracefully shutdown. I'm not quite sure how to do this (if it isn't the case already).

Copy link
Member

@jchappelow jchappelow Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some ideas about lifetime management of subsystems.
Presently, even if an operator attempts to cleanly shutdown the node (e.g. CTRL+C or SIGINT), the only things that gracefully shutdown afaict are the grpc server and gateway. The other systems consumed by these servers have no lifetime management at all that I can discern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, except the Node via the startStopper, but I don't see any of the shutdown signals propagating further down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it's not something we have yet, but it is something we need to add. My thinking is that we can handle closing of the data stores in the same place that we handle stopping of servers, so that we can do it gracefully. I can try to put together an example of what this would look like.

Copy link
Member

@jchappelow jchappelow Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closing of the data stores in the same place that we handle stopping of servers

This is relatively straightforward with the nodeWrapper I've added in the validators PR:

type nodeWrapper struct {
n *nm.Node
}
func (nw *nodeWrapper) Start() error {
return nw.n.Start()
}
func (nw *nodeWrapper) Stop() error {
if err := nw.n.Stop(); err != nil {
return err
}
nw.n.Wait()
return nil
}
var _ (startStopper) = (*nodeWrapper)(nil)

I've added that mainly because when Server calls Stop , the icky Node API requires you to Wait for shutdown. So we can also stop the other systems in this type. This nodeWrapper is constructed in buildServer where the other modules and services are brought up, so it's easy enough to wrap them here too.

I think. I'm not seeing all the Close method uses clearly yet.


// Truncate deletes the entire WAL
// and opens a new one.
func (w *Wal) Truncate(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its good you are creating a new wal for truncate, as the tidwall's truncate functions are not great and messes up with the indexes a lot. This makes it whole lot easier to handle

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yeah, it's the one thing in the tidwall/wal package that really irked me.

charithabandi
charithabandi previously approved these changes Aug 16, 2023
@charithabandi
Copy link
Contributor

LGTM!

Comment on lines +106 to +117
// TODO: I am not sure if this will actually work, since it is unserializing into an interface
// I am quite sure it wont; an alternative is to decode into a struct where public key is bytes, and
// create the public key from there
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, []byte sounds nicer here. However, it looks like we can make anything RLP decodable by implementing the rlp.Decoder interface:

type Decoder interface {
	DecodeRLP(*Stream) error
}

I think they must have defined something similar internally to support big.Int fields.

However, I think the tests in pkg/serialize/encode_test.go aren't working. You can change the expected output to anything and the tests pass.

			if *decoded == tc.output {
				assert.Equal(t, tc.output, *decoded)
			}

I don't follow the test above^. I don't think the condition is ever true, so assert.Equal is never called.


// OrderMapLexicographically orders a map lexicographically by its keys.
// It permits any map with keys that are generically orderable.
// TODO: once upgraded to go 1.21, an equivalent is in the standard library
Copy link
Member

@jchappelow jchappelow Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the Ordered constraint in the new cmp package, or this map k/v extraction function? https://pkg.go.dev/cmp@master#Ordered
Ordered started life in the x/exp/constraints package if we wanna use that until we force the 1.21 upgrade: https://pkg.go.dev/golang.org/x/exp/constraints#Ordered

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to Ordered. Sorry, that was totally not clear. I pulled this from a blog post, but I am now seeing that it is quite literally the exact same, so I am assuming they got it from experimental. I'll just use the experimental package for now.

a.mu.Lock()
defer a.mu.Unlock()

defer a.handleErr(context.Background(), &err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still in draft, and that handleErr was previously added, but the indirection for the error looks unnecessary. error is an interface, not a concrete type, so almost certainly no need to point to it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually does not function properly without the indirection. This was something I saw a while ago in another package for a very similar use case; deferred errors don't quite work as expected. I just tried switching it to just the interface, and the tests for this fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh. This has to do with when Go evaluates the arguments for deferred functions (immediately, rather than when it calls the function), so it's a little broader than errors. Typically it's solved using a closure to capture the variable:

defer func() { a.handleErr(context.Background(), err) }()

But that's almost as ugly as *error in handleErr. Hmm, I guess it's alright with a pointer, but both are kinda ick.


defer a.handleErr(context.Background(), &err)

if !a.inProgress {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure out the need the mutex field, and which fields it guards. The field comments suggest it guards just the inProgress field. But one thing I have been seeing frequently in other parts of kwild is a mutex added to a type without really explaining why it is required. That is, if the consumer is well-behaved and calls the methods in a sensible order there's potentially no need to make a type thread-safe. AFAICT, the caller (AbciApp in this case), will be using Begin/ID/Commit synchronously.

In general I think a mutex needs justification so that it the intended access patterns and responsibility for synchronizing access is established. Every type does not and should not necessarily be made thread safe for any conceivable (mis)use of the methods. Sometimes it's the callers responsibility, and sometimes it's the type's job.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an area that I personally am just not great at yet. It's always a little confusing to me what should "own" concurrency concerns.

I put it here for a few reasons:

  1. I'm not intimately aware of CometBFT's concurrency (I am assuming they don't handle it, but I was not sure).
  2. This isn't really tied to CometBFT; it can kind of be used generally.
  3. Synchronous calls of atomic committer could be very bad. In particular, it could potentially cause data loss / nondeterminism.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, just being defensive, for the possibility of arbitrary and unexpected use patterns. Just the need for the inProgress bool (not even the mutex) itself had me wondering.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inProgress is even more necessary with what I mentioned on the way to lunch today. I'll push it up right now so you can see.

go.mod Outdated
@@ -5,8 +5,11 @@ go 1.20
require (
github.com/alexliesenfeld/health v0.6.0
github.com/antlr/antlr4/runtime/Go/antlr/v4 v4.0.0-20230512164433-5d1fd1a340c9
github.com/aws/smithy-go v1.13.5
Copy link
Member

@jchappelow jchappelow Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for all the nits, but I think we should be more conservative when it comes to dependencies.
In this case you're importing this to get a time.Time from a unix/epoch time stamp, which is usually just time.Unix(epochStamp, 0), but since you've defined testVoteOptions.timestamp float64 you'd have to convert. But I don't see any sub-second precision time stamps in use, just whole numbers, and even if we want subsecond precision in tests, the testVoteOptions.timestamp field can just be a time.Time.

Can we toast this require?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh jeez I totally thought I was using the stdlib time package. Yeah I'll go change that

@brennanjl brennanjl marked this pull request as ready for review August 21, 2023 04:10
@brennanjl brennanjl force-pushed the fix/abci branch 2 times, most recently from 1e8fc03 to accea1f Compare August 21, 2023 15:40
Copy link
Member

@jchappelow jchappelow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great. Nice to see kwild on it's feet again!

Just some nits and questions. I also didn't look to hard at the privval code that was borrowed from comet

go.mod Show resolved Hide resolved
internal/app/kwild/config/config_test.go Outdated Show resolved Hide resolved
internal/app/kwild/config/variables.go Show resolved Hide resolved
internal/app/kwild/server/root.go Outdated Show resolved Hide resolved
internal/app/kwild/server/root.go Show resolved Hide resolved
internal/app/kwild/server/utils.go Outdated Show resolved Hide resolved
pkg/abci/cometbft/node.go Outdated Show resolved Hide resolved
…reation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface
…istence of app-hashes, made slight bug fixes, implemented wal for 2pc
@brennanjl
Copy link
Collaborator Author

@jchappelow I have addressed the changes you mentioned.

jchappelow
jchappelow previously approved these changes Aug 21, 2023
Copy link
Member

@jchappelow jchappelow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except I'm a little uncertain about the closers.

internal/app/kwild/server/root.go Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no clue where that came from. deleted

@sonarcloud
Copy link

sonarcloud bot commented Aug 21, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 16 Code Smells

No Coverage information No Coverage information
2.2% 2.2% Duplication

Copy link
Member

@jchappelow jchappelow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reapprove. Perhaps we can tweak github so approvals don't get stale so easily.

@brennanjl brennanjl merged commit 86c0046 into main Aug 21, 2023
2 of 3 checks passed
@brennanjl brennanjl deleted the fix/abci branch August 21, 2023 21:34
@jchappelow jchappelow added this to the v0.6.0 milestone Sep 28, 2023
brennanjl added a commit that referenced this pull request Feb 26, 2024
* created some basic initializations to better abstract cometBFT node creation.  Still a bit more to think through.  Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface

* added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc

* added privval to avoid using cometbfts type

* aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables

* got cometbft running

* bug fixes for atomic committer, added garbage collection for KV store, workable dependency building

* typo fix

* fixed bug with closers not closing gracefully

* made resource closure synchronous to prevent bugs that can occur from out of order closing

* merged with snapshots, made final changes

* made closers the outermost thing to close, even after server panic

* removed debug_bin
brennanjl added a commit that referenced this pull request Feb 26, 2024
* created some basic initializations to better abstract cometBFT node creation.  Still a bit more to think through.  Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface

* added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc

* added privval to avoid using cometbfts type

* aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables

* got cometbft running

* bug fixes for atomic committer, added garbage collection for KV store, workable dependency building

* typo fix

* fixed bug with closers not closing gracefully

* made resource closure synchronous to prevent bugs that can occur from out of order closing

* merged with snapshots, made final changes

* made closers the outermost thing to close, even after server panic

* removed debug_bin
jchappelow pushed a commit that referenced this pull request Feb 26, 2024
* created some basic initializations to better abstract cometBFT node creation.  Still a bit more to think through.  Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface

* added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc

* added privval to avoid using cometbfts type

* aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables

* got cometbft running

* bug fixes for atomic committer, added garbage collection for KV store, workable dependency building

* typo fix

* fixed bug with closers not closing gracefully

* made resource closure synchronous to prevent bugs that can occur from out of order closing

* merged with snapshots, made final changes

* made closers the outermost thing to close, even after server panic

* removed debug_bin
brennanjl added a commit that referenced this pull request Feb 26, 2024
* created some basic initializations to better abstract cometBFT node creation.  Still a bit more to think through.  Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface

* added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc

* added privval to avoid using cometbfts type

* aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables

* got cometbft running

* bug fixes for atomic committer, added garbage collection for KV store, workable dependency building

* typo fix

* fixed bug with closers not closing gracefully

* made resource closure synchronous to prevent bugs that can occur from out of order closing

* merged with snapshots, made final changes

* made closers the outermost thing to close, even after server panic

* removed debug_bin
brennanjl added a commit that referenced this pull request Feb 26, 2024
* created some basic initializations to better abstract cometBFT node creation.  Still a bit more to think through.  Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface

* added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc

* added privval to avoid using cometbfts type

* aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables

* got cometbft running

* bug fixes for atomic committer, added garbage collection for KV store, workable dependency building

* typo fix

* fixed bug with closers not closing gracefully

* made resource closure synchronous to prevent bugs that can occur from out of order closing

* merged with snapshots, made final changes

* made closers the outermost thing to close, even after server panic

* removed debug_bin
brennanjl added a commit that referenced this pull request Feb 26, 2024
* created some basic initializations to better abstract cometBFT node creation.  Still a bit more to think through.  Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface

* added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc

* added privval to avoid using cometbfts type

* aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables

* got cometbft running

* bug fixes for atomic committer, added garbage collection for KV store, workable dependency building

* typo fix

* fixed bug with closers not closing gracefully

* made resource closure synchronous to prevent bugs that can occur from out of order closing

* merged with snapshots, made final changes

* made closers the outermost thing to close, even after server panic

* removed debug_bin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants