Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Celostats reliability #1487

Merged
merged 4 commits into from
Apr 7, 2021
Merged

Celostats reliability #1487

merged 4 commits into from
Apr 7, 2021

Conversation

gastonponti
Copy link
Contributor

@gastonponti gastonponti commented Apr 7, 2021

Description

Improve reliability from the ethstats module

The connectionWrapper was cherrypicked from go-ethereum upstream:
ethereum/go-ethereum@82a9e11 (ethereum/go-ethereum#21404)

Other changes

Validator injecting its version to the proxy stats chunk

Tested

Manually in a local testnet

Related issues

Backwards compatibility

Yes

* concurrent writing in sockets
* change stats polling using timers
* better handling of the messages from the proxy <-> validator
communication
@gastonponti gastonponti requested a review from a team as a code owner April 7, 2021 15:10
@gastonponti gastonponti requested review from mcortesi and trianglesphere and removed request for a team April 7, 2021 15:10
@gastonponti gastonponti changed the title Gastonponti/celostats reliable Celostats reliability Apr 7, 2021
Copy link
Contributor

@trianglesphere trianglesphere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good.

I'd be curious if upstream has done something similar with the connWrapper (it seems like they'd have similar issues w/ concurrency).

Otherwise I had a couple comments on some potential future work. I think this is better than what was, but I wonder if it might be possible to make the sendCh less racey.

ethstats/ethstats.go Outdated Show resolved Hide resolved
dialer := websocket.Dialer{HandshakeTimeout: 5 * time.Second}
header := make(http.Header)
header.Set("origin", "http://localhost")
for _, url := range urls {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new, but it is it correct that we only use one connection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the celostats server? I think that as is not critical data, handling only one connection keeps the other code (and even the celostats server) simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea. I agree it makes it simpler, but it's weird to have multiple urls but only use 1

}
case signedMessage := <-sendCh:
// if it is a ping or hello message, it shouldn't be handled here
if signedMessage.Action != actionNodePing && signedMessage.Action != actionHello {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to split this out into several channels (1 for each message type)? It seems like we're selecting from it in multiple places, but are only looking for a specific message.

I think with the retry logic we'll be fine, but I think it may be possible to rely on it less (but you'll probably still need it b/c of the network).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a check that shouldn't happened because those are messages that should be consumed before, or fail due to a timeout, and that timeout should trigger a disconnect -> reconnect. But I wanted to differentiate that case for future logs if for some reason there was something under my radar.

Regardless of handling in different channels, it was something I thought about, and it's probably a thing that we could add when we change the way we send stats (to be defined in these days), because there won't be necessary for the proxy to know which message it is (if we start using it as a transport layer, instead of generator+transport as it is today)

@gastonponti
Copy link
Contributor Author

I'd be curious if upstream has done something similar with the connWrapper (it seems like they'd have similar issues w/ concurrency).

Yes, that change is from upstream: https://github.com/ethereum/go-ethereum/pull/21404/files

@trianglesphere
Copy link
Contributor

Yes, that change is from upstream: https://github.com/ethereum/go-ethereum/pull/21404/files

Oh nice. Could you tag it in the PR as a cherry-pick for when we eventually get around to merging that from upstream.

Copy link
Contributor

@trianglesphere trianglesphere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gastonponti gastonponti merged commit 51da97f into master Apr 7, 2021
@gastonponti gastonponti deleted the gastonponti/celostats-reliable branch April 7, 2021 21:02
oneeman pushed a commit that referenced this pull request Apr 8, 2021
Description
Improve reliability from the ethstats module

The connectionWrapper was cherrypicked from go-ethereum upstream:
ethereum/go-ethereum@82a9e11 (ethereum/go-ethereum#21404)

Other changes
Validator injecting its version to the proxy stats chunk

Tested
Manually in a local testnet

Related issues
Fixes #1395
Fixes #1397
Backwards compatibility
Yes
oneeman pushed a commit that referenced this pull request Apr 8, 2021
* Celostats reliability (#1487)

Description
Improve reliability from the ethstats module

The connectionWrapper was cherrypicked from go-ethereum upstream:
ethereum/go-ethereum@82a9e11 (ethereum/go-ethereum#21404)

Other changes
Validator injecting its version to the proxy stats chunk

Tested
Manually in a local testnet

Related issues
Fixes #1395
Fixes #1397
Backwards compatibility
Yes

* Skip GPM and other checks for transactions whose gas limit exceeds what's left in the block

* Change default tx pool GlobalSlots back to 4096.

The decrease to 2048 was due to inefficiencies in the tx pool which have now been fixed, so this sets it back to its original value.

* Release v1.3.0-beta.3

Co-authored-by: Gaston Ponti <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

celostats validator true client version celostats reliability
2 participants