Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime message results #4443

Merged
merged 1 commit into from
Feb 4, 2022
Merged

Conversation

ptrus
Copy link
Member

@ptrus ptrus commented Jan 21, 2022

Closes: #4402

Implementation of runtime message results as described in ADR 0012.

TODO:

  • finish the remaining todos
  • implement rust types for results
  • add checks to e2e test runtime
  • update the result types in the ADR

@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch 4 times, most recently from e71d44e to b69946e Compare January 24, 2022 10:33
@ptrus ptrus added the s:ready-ci Status: ready for CI label Jan 24, 2022
@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch from b69946e to 6d6832a Compare January 24, 2022 10:35
@codecov
Copy link

codecov bot commented Jan 24, 2022

Codecov Report

Merging #4443 (63f9a4e) into master (795eb65) will increase coverage by 0.13%.
The diff coverage is 69.97%.

❗ Current head 63f9a4e differs from pull request most recent head 0e1bdcb. Consider uploading reports for the commit 0e1bdcb to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4443      +/-   ##
==========================================
+ Coverage   68.66%   68.79%   +0.13%     
==========================================
  Files         415      415              
  Lines       46571    46798     +227     
==========================================
+ Hits        31976    32194     +218     
- Misses      10635    10638       +3     
- Partials     3960     3966       +6     
Impacted Files Coverage Δ
go/consensus/tendermint/abci/mux.go 57.12% <0.00%> (+1.25%) ⬆️
go/consensus/tendermint/api/app.go 0.00% <0.00%> (ø)
go/consensus/tendermint/apps/beacon/beacon.go 73.77% <0.00%> (ø)
...consensus/tendermint/apps/keymanager/keymanager.go 62.63% <0.00%> (ø)
go/consensus/tendermint/apps/registry/genesis.go 47.32% <0.00%> (ø)
...o/consensus/tendermint/apps/scheduler/scheduler.go 71.14% <0.00%> (ø)
...nt/apps/supplementarysanity/supplementarysanity.go 77.61% <0.00%> (ø)
go/roothash/api/api.go 80.00% <ø> (ø)
go/runtime/registry/host.go 71.88% <0.00%> (-4.81%) ⬇️
go/staking/api/api.go 62.33% <ø> (ø)
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d51556...0e1bdcb. Read the comment docs.

@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch 8 times, most recently from edcd45f to ceaf072 Compare January 27, 2022 14:55
@ptrus ptrus marked this pull request as ready for review January 27, 2022 14:55
func (md *messageDispatcher) Publish(ctx *api.Context, kind, msg interface{}) error {
if len(md.subscriptions[kind]) == 0 {
return api.ErrNoSubscribers
func (md *messageDispatcher) Publish(ctx *api.Context, kind, msg interface{}) ([]interface{}, error) {
Copy link
Contributor

@pro-wh pro-wh Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that I actually see the tests, the interface for this feels so idiosyncratic. for one, it's one of those rare functions where the first return value is meaningful when the error is non-nil.

from the code in this PR, the behavior is like this:

  • the results slice always has one element per subscriber
  • the error is a multierror with an element for each subscriber that errored, thus varying length
  • the subscriber's interface treats its response as undefined when the error is non-nil
  • for a subscriber that errors, its corresponding slot in the results is left as nil

I just wonder if we will really be able to write rigorous multi-subscriber programs with this. my concerns are:

  1. it's not clear to what extent the code calling Publish is expected to know the composition and ordering of the subscribers. the previous code where the errors are piled together suggest the calling code doesn't know/doesn't care. but surely that knowledge must be increased now that we have responses that the calling code will be looking for.
  2. it's not clear how the calling code should find the result it's looking for. by known index? or should it maybe loop through and find it by type? the ADR suggests that the zeroth result is the important one that the runtime will want to know about.
  3. some calling code that's less knowledgeable about the composition of subscribers won't be able to differentiate subscribers that succeed and return nil from subscribers that errored. at least not from looking at the results slice. not sure if this will come up.
  4. for calling code that's more knowledgeable about the composition of subscribers, I think it will be hard to match up errors with what subscriber it came from. maybe those use cases will have to include more metadata in the errors to work around that.

do we have cases where there are multiple subscribers for a message? or if not, what are the intended use cases? how do/will we handle the situation where some subscribers fail and some succeed? is the zeroth result always the one that a runtime will want to see?

Copy link
Member Author

@ptrus ptrus Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error is a multierror with an element for each subscriber that errored, thus varying length

The error return type should probably be changed to be a list of errors, so then the caller knows which responses are non failed.

do we have cases where there are multiple subscribers for a message? or if not, what are the intended use cases?

No, no such cases. Not sure about intended use-cases.

how do/will we handle the situation where some subscribers fail and some succeed

I guess it's up to the caller to decide.

is the zeroth result always the one that a runtime will want to see?

For all existing cases, yes, as there's always a single subscriber. Maybe the code should propagate all results to the caller, and not just the first one (although this makes no difference at this point in time).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's at least a use case for zero subscribers, so that we can support non-lock-step changes to paratimes and core. Maybe now is the time to consider switching to a zero-or-one subscriber model of message handlers?

Copy link
Contributor

@pro-wh pro-wh Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error slice would be unambiguous I guess. potentially verbose retrofitting work though.

I see some places in this PR's callsite changes like this

// Notify other interested applications about the resumed runtime.
if _, err = app.md.Publish(ctx, registryApi.MessageRuntimeResumed, rt); err != nil {

I guess we've been using this message dispatcher class for things other than runtime-sent messages too. as I understand it, the requirements there are "if any handler fails, abort tx or halt consensus"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we've been using this message dispatcher class for things other than runtime-sent messages too.

Yes, this is a general pubsub mechanism used by the consensus layer services.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simplify this by doing something like:

  • If any subscribe handler failed, only an error is returned (with nil result).
  • Only the last non-nil result is returned, possibly panic in case multiple handlers return a non-nil result.

This should be enough for the current use cases.

Copy link
Member Author

@ptrus ptrus Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok agreed, i'll make Publish return a single result then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Please take a look, I'll update the ADR once this new approach is finalized.

@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch 3 times, most recently from bba46ac to 63f9a4e Compare February 1, 2022 15:13
go/consensus/tendermint/abci/messages.go Outdated Show resolved Hide resolved
func (md *messageDispatcher) Publish(ctx *api.Context, kind, msg interface{}) error {
if len(md.subscriptions[kind]) == 0 {
return api.ErrNoSubscribers
func (md *messageDispatcher) Publish(ctx *api.Context, kind, msg interface{}) ([]interface{}, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simplify this by doing something like:

  • If any subscribe handler failed, only an error is returned (with nil result).
  • Only the last non-nil result is returned, possibly panic in case multiple handlers return a non-nil result.

This should be enough for the current use cases.

go/roothash/api/api.go Outdated Show resolved Hide resolved
go/roothash/api/api.go Outdated Show resolved Hide resolved
runtime/src/consensus/roothash.rs Outdated Show resolved Hide resolved
@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch 5 times, most recently from 3335210 to 20553c3 Compare February 3, 2022 18:47
@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch from 20553c3 to fc9b76c Compare February 3, 2022 18:53
Copy link
Contributor

@pro-wh pro-wh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new dispatcher interface looks good, thanks!

result = resp
case resp != nil && result != nil:
// Multiple non-nil results, this is unexpected and unsupported by the pub-sub interface at this time.
panic(fmt.Sprintf("unexpected result: got: %d, previous result: %d", resp, result))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be sufficient to set a flag and return an error (and possibly nil result), I think. Callers will either abort the tx or halt consensus as appropriate.

Maybe in the future we can split up the dispatcher to have SubscribeRespondingly and SubscribeQuietly, so that we can detect problems at setup time instead of at dispatch time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I think that panic is probably more suitable, as this is considered a programmer error at the moment.

go/consensus/tendermint/abci/messages_test.go Outdated Show resolved Hide resolved
go/consensus/tendermint/abci/messages_test.go Outdated Show resolved Hide resolved
go/consensus/tendermint/abci/messages_test.go Outdated Show resolved Hide resolved
go/consensus/tendermint/apps/roothash/messages.go Outdated Show resolved Hide resolved
go/consensus/tendermint/apps/staking/transactions_test.go Outdated Show resolved Hide resolved
@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch 3 times, most recently from 3e3669c to 3ac4212 Compare February 4, 2022 15:10
@ptrus ptrus enabled auto-merge February 4, 2022 15:10
@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch 2 times, most recently from 26651f9 to 3390ff9 Compare February 4, 2022 15:33
Implementation of runtime message results as described in [ADR 0012].

[ADR 0012]: docs/adr/0012-runtime-message-results.md
@ptrus ptrus force-pushed the ptrus/feature/runtime-message-results branch from 3390ff9 to 0e1bdcb Compare February 4, 2022 15:41
@ptrus ptrus merged commit 44b27c2 into master Feb 4, 2022
@ptrus ptrus deleted the ptrus/feature/runtime-message-results branch February 4, 2022 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s:ready-ci Status: ready for CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Runtime Message Results
3 participants