tmpnet: Separate node into orchestration, config and process #2460

Merged
merged 5 commits into from
Dec 19, 2023

Conversation

marun
Contributor

@marun marun commented Dec 10, 2023

  • Previously, tmpnet/node.go contained methods to orchestrate nodes, manage their processes, and read and write their configuration. This PR moves the reading and writing of configuration to node_config.go and process management to node_process.go. The separation is intended to improve maintainability.

  • A new NodeRuntime interface is added to support a future pod-based implementation.
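
To illustrate the kind of separation described above, here is a minimal sketch of a node that handles orchestration and delegates lifecycle management to a runtime interface. The method set (Start, InitiateStop, IsRunning), the Restart helper, and fakeRuntime are assumptions made for illustration; they are not the interface defined in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeRuntime is a hypothetical interface. The method set is an
// assumption for illustration, not the one defined in the PR.
type NodeRuntime interface {
	Start() error
	InitiateStop() error
	IsRunning() bool
}

// Node handles orchestration and delegates lifecycle management to
// whichever runtime it was given (process-based, pod-based, ...).
type Node struct {
	Name    string
	runtime NodeRuntime
}

// Restart stops the node if it is running and then starts it again.
func (n *Node) Restart() error {
	if n.runtime.IsRunning() {
		if err := n.runtime.InitiateStop(); err != nil {
			return fmt.Errorf("failed to stop node %s: %w", n.Name, err)
		}
	}
	if err := n.runtime.Start(); err != nil {
		return fmt.Errorf("failed to start node %s: %w", n.Name, err)
	}
	return nil
}

// fakeRuntime is an in-memory stand-in for a real runtime.
type fakeRuntime struct{ running bool }

func (r *fakeRuntime) Start() error {
	if r.running {
		return errors.New("already running")
	}
	r.running = true
	return nil
}

func (r *fakeRuntime) InitiateStop() error {
	r.running = false
	return nil
}

func (r *fakeRuntime) IsRunning() bool { return r.running }

func main() {
	node := &Node{Name: "node-1", runtime: &fakeRuntime{}}
	if err := node.Restart(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("running:", node.runtime.IsRunning())
}
```

Because Node depends only on the interface, a different runtime could be substituted without touching the orchestration code.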

@marun marun self-assigned this Dec 10, 2023
@marun marun force-pushed the tmpnet-move-local branch from f9aada9 to 3e42f16 Compare December 10, 2023 21:14
@marun marun force-pushed the tmpnet-refactor branch 9 times, most recently from 7761c0c to 96f8d79 Compare December 10, 2023 23:31
@marun marun added the testing This primarily focuses on testing label Dec 11, 2023
@marun marun changed the base branch from tmpnet-move-local to tmpnet-refactor-config December 11, 2023 01:54
@marun marun force-pushed the tmpnet-refactor-config branch from e446b6a to 0a2ab0a Compare December 11, 2023 03:21
@marun marun changed the title tmpnet: Refactor for maintainability tmpnet: Separate node into orchestration, config and process Dec 11, 2023
@marun marun force-pushed the tmpnet-refactor branch 3 times, most recently from 7d3474d to 282f77d Compare December 11, 2023 05:01
@marun marun changed the title tmpnet: Separate node into orchestration, config and process tmpnet: Separate node into orchestration, config and process Dec 11, 2023
@marun marun force-pushed the tmpnet-refactor-config branch from 0a2ab0a to 2597b74 Compare December 11, 2023 19:01
@marun marun marked this pull request as ready for review December 13, 2023 20:36
@marun marun requested review from abi87 and gyuho as code owners December 13, 2023 20:36
@marun marun force-pushed the tmpnet-refactor-config branch from 5c35f62 to dab364e Compare December 14, 2023 00:33
tests/fixture/tmpnet/README.md
RootDirEnvName = "TMPNET_ROOT_DIR"

DefaultNetworkTimeout = 2 * time.Minute
DefaultNodeInitTimeout = 10 * time.Second

This is unused

Contributor Author

Done


DefaultNetworkTimeout = 2 * time.Minute
DefaultNodeInitTimeout = 10 * time.Second
DefaultNodeStopTimeout = 5 * time.Second

This is unused

Contributor Author

Done

}

// Return a deep copy of the flags map.
func (f FlagsMap) Copy() FlagsMap {

I think we can use maps.Copy https://pkg.go.dev/golang.org/x/exp/maps#Copy

Contributor Author

Unused - deleted.

Comment on lines +40 to +53
	switch t := err.(type) {
	case *net.OpError:
		if t.Op == "read" {
			// Connection refused - potentially recoverable
			return false, nil
		}
	case syscall.Errno:
		if t == syscall.ECONNREFUSED {
			// Connection refused - potentially recoverable
			return false, nil
		}
	}
	// Assume all other errors are not recoverable
	return false, fmt.Errorf("failed to query node health: %w", err)

In general, in avalanchego we don't check whether network errors are potentially temporary/recoverable; we just assume they're not. I'd be OK with doing the same here.

Contributor Author

@marun marun Dec 15, 2023

I'm doing my best here to avoid failing the start of a network due to one or more nodes being in an intermediate network state before becoming healthy. Maybe it makes sense in the context of a running network to be more strict, but I think this fixture needs to be more forgiving (i.e. to avoid flakes in CI).

// Retrieve the node process if it is running. As part of determining
// process liveness, the node's process context will be refreshed if
// live or cleared if not running.
func (p *NodeProcess) getProcess() (*os.Process, error) {

When we do exec.Command to start the binary, we have a reference to the Process. I think we can use that instead, right?

Contributor Author

It might make sense in some cases to maintain a reference to the process - e.g. when the network is started by the test suite - but this fixture also supports temporary networks whose nodes were started by the tmpnetctl command. This has the potential to speed up test development since the cost of network setup only needs to be incurred once for many invocations of the test suite or a subset thereof. I made good use of this capability during migration of the kurtosis tests.

This method also enables shutdown of an existing network via the tmpnetctl command.

And while it would be possible to hold a reference to the process in the case where the network was started in-process, I don't think the complexity would be justified by the minimal overhead imposed by on-demand process lookup in the context of test execution.

@marun marun force-pushed the tmpnet-refactor-config branch from dab364e to e338c6e Compare December 15, 2023 02:50
@marun marun force-pushed the tmpnet-refactor-config branch from e338c6e to fe08729 Compare December 15, 2023 04:38
@marun marun force-pushed the tmpnet-refactor-config branch from fe08729 to 205f6e8 Compare December 15, 2023 04:42
Base automatically changed from tmpnet-refactor-config to dev December 15, 2023 17:52
@marun marun linked an issue Dec 18, 2023 that may be closed by this pull request
6 tasks
@@ -31,6 +31,8 @@ the following non-test files:
| genesis.go | | Creates test genesis |
| network.go | Network | Orchestrates and configures temporary networks |
| node.go | Node | Orchestrates and configures nodes |
| node_config.go | Node | Reads and writes node configuration |
| node_process.go | NodeProcess | Orchestrates node processes |
Contributor

line 107: DefaultRuntime: should be renamed to NodeRuntimeConfig I believe

Contributor Author

Done

	// Avoid attempting to start an already running node.
	proc, err := p.getProcess()
	if err != nil {
		return fmt.Errorf("failed to start node process: %w", err)
Contributor

nit: this error message seems a bit off. It's not really starting a process here, just checking if it's running already right?

Contributor Author

Done

func (p *NodeProcess) InitiateStop() error {
	proc, err := p.getProcess()
	if err != nil {
		return fmt.Errorf("failed to retrieve process to stop: %w", err)
Contributor

this is a better error message. Consider adapting it for p.Start as well

Contributor

@abi87 abi87 left a comment

found minor nits but LGTM overall

This refactor is intended to improve maintainability by separating
node into coherent constituent parts and minimizing the exported API.
@StephenButtolph StephenButtolph added this to the v1.10.18 milestone Dec 19, 2023
@StephenButtolph StephenButtolph added this pull request to the merge queue Dec 19, 2023
Merged via the queue into dev with commit fa21d78 Dec 19, 2023
17 checks passed
@StephenButtolph StephenButtolph deleted the tmpnet-refactor branch December 19, 2023 22:14
Successfully merging this pull request may close these issues.

Add subnet support to the tmpnet fixture
4 participants