-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/abci #188
Conversation
pkg/abci/abci.go
Outdated
} | ||
|
||
func (a *AbciApp) InitChain(p0 abciTypes.RequestInitChain) abciTypes.ResponseInitChain { | ||
panic("TODO") | ||
ctx := context.Background() | ||
// TODO: the following was contained in the old ABCI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to initialize our app with the genesis validators, code will change a bit based on the validator store, but this initialization is needed. Think Jon is already handling this in his PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should get #182 very soon . Wrapping it up now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a case where there is more than one genesis validator? I sort of assumed that I would start up a Kwil node for a new network, then if there was another node, it would request, I would approve, and it would then be the second validator, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there would be a set at genesis. @charithabandi's docker-compose has three in the initial set for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If u want to start the network with more than 1 validators, you can do that by specifying it in the genesis file and bootup the network. or the other way is to do the whole join process. So yeah, there can be more than one at the app initialization
pkg/abci/abci.go
Outdated
func (a *AbciApp) Commit() abciTypes.ResponseCommit { | ||
ctx := context.Background() | ||
|
||
// TODO: how should we handle errors here? We probably want the node to stop, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying something for this here: 61ddf18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, except we would want to make sure that a panic in ABCI still allows the modules to gracefully shutdown. I'm not quite sure how to do this (if it isn't the case already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some ideas about lifetime management of subsystems.
Presently, even if an operator attempts to cleanly shutdown the node (e.g. CTRL+C or SIGINT), the only things that gracefully shutdown afaict are the grpc server and gateway. The other systems consumed by these servers have no lifetime management at all that I can discern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, except the Node via the startStopper
, but I don't see any of the shutdown signals propagating further down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it's not something we have yet, but it is something we need to add. My thinking is that we can handle closing of the data stores in the same place that we handle stopping of servers, so that we can do it gracefully. I can try to put together an example of what this would look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closing of the data stores in the same place that we handle stopping of servers
This is relatively straightforward with the nodeWrapper
I've added in the validators PR:
kwil-db/internal/app/kwild/server/root.go
Lines 69 to 85 in 61ddf18
type nodeWrapper struct { | |
n *nm.Node | |
} | |
func (nw *nodeWrapper) Start() error { | |
return nw.n.Start() | |
} | |
func (nw *nodeWrapper) Stop() error { | |
if err := nw.n.Stop(); err != nil { | |
return err | |
} | |
nw.n.Wait() | |
return nil | |
} | |
var _ (startStopper) = (*nodeWrapper)(nil) |
I've added that mainly because when Server
calls Stop
, the icky Node
API requires you to Wait
for shutdown. So we can also stop the other systems in this type. This nodeWrapper
is constructed in buildServer
where the other modules and services are brought up, so it's easy enough to wrap them here too.
I think. I'm not seeing all the Close method uses clearly yet.
|
||
// Truncate deletes the entire WAL | ||
// and opens a new one. | ||
func (w *Wal) Truncate(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its good you are creating a new wal for truncate, as the tidwall's truncate functions are not great and messes up with the indexes a lot. This makes it whole lot easier to handle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha yeah, it's the one thing in the tidwall/wal
package that really irked me.
LGTM! |
// TODO: I am not sure if this will actually work, since it is unserializing into an interface | ||
// I am quite sure it wont; an alternative is to decode into a struct where public key is bytes, and | ||
// create the public key from there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, []byte
sounds nicer here. However, it looks like we can make anything RLP decodable by implementing the rlp.Decoder
interface:
type Decoder interface {
DecodeRLP(*Stream) error
}
I think they must have defined something similar internally to support big.Int
fields.
However, I think the tests in pkg/serialize/encode_test.go aren't working. You can change the expected output
to anything and the tests pass.
if *decoded == tc.output {
assert.Equal(t, tc.output, *decoded)
}
I don't follow the test above^. I don't think the condition is ever true, so assert.Equal is never called.
|
||
// OrderMapLexicographically orders a map lexicographically by its keys. | ||
// It permits any map with keys that are generically orderable. | ||
// TODO: once upgraded to go 1.21, an equivalent is in the standard library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the Ordered
constraint in the new cmp
package, or this map k/v extraction function? https://pkg.go.dev/cmp@master#Ordered
Ordered
started life in the x/exp/constraints package if we wanna use that until we force the 1.21 upgrade: https://pkg.go.dev/golang.org/x/exp/constraints#Ordered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to Ordered
. Sorry, that was totally not clear. I pulled this from a blog post, but I am now seeing that it is quite literally the exact same, so I am assuming they got it from experimental. I'll just use the experimental package for now.
pkg/sessions/session.go
Outdated
a.mu.Lock() | ||
defer a.mu.Unlock() | ||
|
||
defer a.handleErr(context.Background(), &err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is still in draft, and that handleErr
was previously added, but the indirection for the error looks unnecessary. error
is an interface, not a concrete type, so almost certainly no need to point to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually does not function properly without the indirection. This was something I saw a while ago in another package for a very similar use case; deferred errors don't quite work as expected. I just tried switching it to just the interface, and the tests for this fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh. This has to do with when Go evaluates the arguments for deferred functions (immediately, rather than when it calls the function), so it's a little broader than errors. Typically it's solved using a closure to capture the variable:
defer func() { a.handleErr(context.Background(), err) }()
But that's almost as ugly as *error
in handleErr
. Hmm, I guess it's alright with a pointer, but both are kinda ick.
|
||
defer a.handleErr(context.Background(), &err) | ||
|
||
if !a.inProgress { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out the need the mutex field, and which fields it guards. The field comments suggest it guards just the inProgress
field. But one thing I have been seeing frequently in other parts of kwild is a mutex added to a type without really explaining why it is required. That is, if the consumer is well-behaved and calls the methods in a sensible order there's potentially no need to make a type thread-safe. AFAICT, the caller (AbciApp
in this case), will be using Begin
/ID
/Commit
synchronously.
In general I think a mutex needs justification so that it the intended access patterns and responsibility for synchronizing access is established. Every type does not and should not necessarily be made thread safe for any conceivable (mis)use of the methods. Sometimes it's the callers responsibility, and sometimes it's the type's job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an area that I personally am just not great at yet. It's always a little confusing to me what should "own" concurrency concerns.
I put it here for a few reasons:
- I'm not intimately aware of CometBFT's concurrency (I am assuming they don't handle it, but I was not sure).
- This isn't really tied to CometBFT; it can kind of be used generally.
- Synchronous calls of atomic committer could be very bad. In particular, it could potentially cause data loss / nondeterminism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, just being defensive, for the possibility of arbitrary and unexpected use patterns. Just the need for the inProgress bool (not even the mutex) itself had me wondering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inProgress
is even more necessary with what I mentioned on the way to lunch today. I'll push it up right now so you can see.
go.mod
Outdated
@@ -5,8 +5,11 @@ go 1.20 | |||
require ( | |||
github.com/alexliesenfeld/health v0.6.0 | |||
github.com/antlr/antlr4/runtime/Go/antlr/v4 v4.0.0-20230512164433-5d1fd1a340c9 | |||
github.com/aws/smithy-go v1.13.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the nits, but I think we should be more conservative when it comes to dependencies.
In this case you're importing this to get a time.Time
from a unix/epoch time stamp, which is usually just time.Unix(epochStamp, 0)
, but since you've defined testVoteOptions.timestamp float64
you'd have to convert. But I don't see any sub-second precision time stamps in use, just whole numbers, and even if we want subsecond precision in tests, the testVoteOptions.timestamp
field can just be a time.Time
.
Can we toast this require?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh jeez I totally thought I was using the stdlib time package. Yeah I'll go change that
1e8fc03
to
accea1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great. Nice to see kwild on it's feet again!
Just some nits and questions. I also didn't look to hard at the privval code that was borrowed from comet
…reation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface
…istence of app-hashes, made slight bug fixes, implemented wal for 2pc
…tion of Committables
…, workable dependency building
… out of order closing
@jchappelow I have addressed the changes you mentioned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except I'm a little uncertain about the closers.
cmd/kwild/__debug_bin3948100767
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no clue where that came from. deleted
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reapprove. Perhaps we can tweak github so approvals don't get stale so easily.
* created some basic initializations to better abstract cometBFT node creation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface * added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc * added privval to avoid using cometbfts type * aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables * got cometbft running * bug fixes for atomic committer, added garbage collection for KV store, workable dependency building * typo fix * fixed bug with closers not closing gracefully * made resource closure synchronous to prevent bugs that can occur from out of order closing * merged with snapshots, made final changes * made closers the outermost thing to close, even after server panic * removed debug_bin
* created some basic initializations to better abstract cometBFT node creation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface * added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc * added privval to avoid using cometbfts type * aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables * got cometbft running * bug fixes for atomic committer, added garbage collection for KV store, workable dependency building * typo fix * fixed bug with closers not closing gracefully * made resource closure synchronous to prevent bugs that can occur from out of order closing * merged with snapshots, made final changes * made closers the outermost thing to close, even after server panic * removed debug_bin
* created some basic initializations to better abstract cometBFT node creation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface * added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc * added privval to avoid using cometbfts type * aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables * got cometbft running * bug fixes for atomic committer, added garbage collection for KV store, workable dependency building * typo fix * fixed bug with closers not closing gracefully * made resource closure synchronous to prevent bugs that can occur from out of order closing * merged with snapshots, made final changes * made closers the outermost thing to close, even after server panic * removed debug_bin
* created some basic initializations to better abstract cometBFT node creation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface * added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc * added privval to avoid using cometbfts type * aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables * got cometbft running * bug fixes for atomic committer, added garbage collection for KV store, workable dependency building * typo fix * fixed bug with closers not closing gracefully * made resource closure synchronous to prevent bugs that can occur from out of order closing * merged with snapshots, made final changes * made closers the outermost thing to close, even after server panic * removed debug_bin
* created some basic initializations to better abstract cometBFT node creation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface * added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc * added privval to avoid using cometbfts type * aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables * got cometbft running * bug fixes for atomic committer, added garbage collection for KV store, workable dependency building * typo fix * fixed bug with closers not closing gracefully * made resource closure synchronous to prevent bugs that can occur from out of order closing * merged with snapshots, made final changes * made closers the outermost thing to close, even after server panic * removed debug_bin
* created some basic initializations to better abstract cometBFT node creation. Still a bit more to think through. Spent some time trying to figure out the proper way to do validator keys, only to find out that it does in fact only support ED25519 and will fail at runtime if anything else implements the interface * added atomic kv store, altered atomic committer to suport atomic persistence of app-hashes, made slight bug fixes, implemented wal for 2pc * added privval to avoid using cometbfts type * aqdded register/unregister to AtomicCommitter for on-the-fly registration of Committables * got cometbft running * bug fixes for atomic committer, added garbage collection for KV store, workable dependency building * typo fix * fixed bug with closers not closing gracefully * made resource closure synchronous to prevent bugs that can occur from out of order closing * merged with snapshots, made final changes * made closers the outermost thing to close, even after server panic * removed debug_bin
I've implemented various changes here, all necessary for getting ABCI working. These include:
sessions.Committable
AtomicCommitter
, to allow for generation of the app-hash before the end of the first commit phase (this is necessary since apphash needs to be persisted atomically).There's really only two things to flag here:
Transaction.UnmarshalBinary
: We deserialize binary into thetransactions.Transaction
struct, but it contains a public key interface. This will certainly fail, since the deserializer does not know what implementation of the Public Key interface to use. I don't think this is rocket science to fix, I just put a TODO there to flag it.abci
: Various TODOs for @jchappelow are sprinkled through abci. They will take a few minutes to implement; mostly just deserializing transaction payloads, or returning validator updates to ABCI.This knocks out quite a few blockers for getting integration tests working; the biggest blocker is still the
crypto
package.