-
Notifications
You must be signed in to change notification settings - Fork 700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tmpnet
: Separate node into orchestration, config and process
#2460
Conversation
f9aada9
to
3e42f16
Compare
7761c0c
to
96f8d79
Compare
96f8d79
to
05a7ca2
Compare
e446b6a
to
0a2ab0a
Compare
05a7ca2
to
29548f2
Compare
7d3474d
to
282f77d
Compare
tmpnet
: Separate node into orchestration, config and process
0a2ab0a
to
2597b74
Compare
282f77d
to
6b22969
Compare
5c35f62
to
dab364e
Compare
tests/fixture/tmpnet/defaults.go
Outdated
RootDirEnvName = "TMPNET_ROOT_DIR" | ||
|
||
DefaultNetworkTimeout = 2 * time.Minute | ||
DefaultNodeInitTimeout = 10 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/fixture/tmpnet/defaults.go
Outdated
|
||
DefaultNetworkTimeout = 2 * time.Minute | ||
DefaultNodeInitTimeout = 10 * time.Second | ||
DefaultNodeStopTimeout = 5 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/fixture/tmpnet/flags.go
Outdated
} | ||
|
||
// Return a deep copy of the flags map. | ||
func (f FlagsMap) Copy() FlagsMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use maps.Copy
https://pkg.go.dev/golang.org/x/exp/maps#Copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused - deleted.
switch t := err.(type) { | ||
case *net.OpError: | ||
if t.Op == "read" { | ||
// Connection refused - potentially recoverable | ||
return false, nil | ||
} | ||
case syscall.Errno: | ||
if t == syscall.ECONNREFUSED { | ||
// Connection refused - potentially recoverable | ||
return false, nil | ||
} | ||
} | ||
// Assume all other errors are not recoverable | ||
return false, fmt.Errorf("failed to query node health: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general in avalanchego we don't check whether network errors are potentially temporary/recoverable, and just assume they're not. I'd be ok doing the same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm doing my best here to avoid failing the start of a network due to one or more nodes being in an intermediate network state before becoming healthy. Maybe it makes sense in the context of a running network to be more strict, but I think this fixture needs to be more forgiving (i.e. to avoid flakes in CI).
// Retrieve the node process if it is running. As part of determining | ||
// process liveness, the node's process context will be refreshed if | ||
// live or cleared if not running. | ||
func (p *NodeProcess) getProcess() (*os.Process, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we do exec.Command
to start the binary, we have a reference to the Process
. I think we can use that instead, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense in some cases to maintain a reference to the process - e.g. when the network is started by the test suite - but this fixture also supports temporary networks whose nodes were started by the tmpnetctl command. This has the potential to speed up test development since the cost of network setup only needs to be incurred once for many invocations of the test suite or a subset thereof. I made good use of this capability during migration of the kurtosis tests.
This method also enables shutdown of an existing network via the tmpnetctl command.
And while it would be possible to hold a reference to the process in the case where the network was started in-process, I don't think the complexity would be justified by the minimal overhead imposed by on-demand process lookup in the context of test execution.
dab364e
to
e338c6e
Compare
6b22969
to
d4e2f67
Compare
e338c6e
to
fe08729
Compare
fe08729
to
205f6e8
Compare
c6f2252
to
969b064
Compare
969b064
to
5b9bf79
Compare
5b9bf79
to
7c8a744
Compare
@@ -31,6 +31,8 @@ the following non-test files: | |||
| genesis.go | | Creates test genesis | | |||
| network.go | Network | Orchestrates and configures temporary networks | | |||
| node.go | Node | Orchestrates and configures nodes | | |||
| node_config.go | Node | Reads and writes node configuration | | |||
| node_process.go | NodeProcess | Orchestrates node processes | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 107: DefaultRuntime:
should be renamed to NodeRuntimeConfig
I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/fixture/tmpnet/node_process.go
Outdated
// Avoid attempting to start an already running node. | ||
proc, err := p.getProcess() | ||
if err != nil { | ||
return fmt.Errorf("failed to start node process: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this error message seems a bit off. It's not really starting a process here, just checking if it's running already right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
func (p *NodeProcess) InitiateStop() error { | ||
proc, err := p.getProcess() | ||
if err != nil { | ||
return fmt.Errorf("failed to retrieve process to stop: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a better error message. Consider adapting it for p.Start as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found minor nits but LGTM overall
This refactor is intended to improve maintainability by separating node into coherent constituent parts and minimizing the exported API.
6beb808
to
ef41e2e
Compare
Previously tmpnet/node.go contained methods to orchestrate nodes, manage their processes, and read and write their configuration. This PR moves reading/writing of configuration to node_config.go and process management to node_process.go. The separation is intended to improve maintainability.
A new
NodeRuntime
interface is added to support future pod-based implementation.