-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: split Monitor functionality in two components #118214
Labels
A-testing
Testing tools and infrastructure
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-testeng
TestEng Team
Comments
renatolabs
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-testing
Testing tools and infrastructure
T-testeng
TestEng Team
labels
Jan 23, 2024
cc @cockroachdb/test-eng |
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
Previously, roachtests would use bare goroutines `go func...` to start tasks during a test to perform various background functions. This is problematic due to the nature of roachtest, which runs multiple tests in parallel using workers. If one such bare goroutine panics the whole test suite is interrupted and the process exits. This change is the first step and introduces an interface that both the traditional and mixedversion test frameworks can utilise. It intends to target both regular and MVT tests by supplying the test with the interface either through a helper or the test interface. In addition to providing the `Go` function, there are additional options that can be passed. The test framework will generally supply the panic and error handler, but users can supply additional options such as naming the task, or passing a custom logger. This makes logging more useful to determine what task panic or encountered an error. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
This change provides a task manager implementation that will drive the logic running behind the Tasker interface, as well as provide the implementation of it. The manager is meant to be used by the test framework(s), while the Tasker interface is supplied to tests. The framework will do additional logic such as terminating tasks at the end of a test, logging uncaught errors or issues and providing the appropriate panic and error handlers. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
Previously, the mixedversion framework had its own implementation for managing background tasks. This implementation performed well, but in its current state it is difficult to share with tests that do not require the mixedversion framework. This change replace the mixedversion framework background runner with the newly implemented task manager. The implementation details are relatively similar making it close to a drop-in replacement. Additionally, some utilities relating to tasks have been replaced by the task package. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
Previously, roachtests would use bare goroutines `go func...` to start tasks during a test to perform various background functions. This is problematic due to the nature of roachtest, which runs multiple tests in parallel using workers. If one such bare goroutine panics the whole test suite is interrupted and the process exits. This change is the first step and introduces an interface that both the traditional and mixedversion test frameworks can utilise. It intends to target both regular and MVT tests by supplying the test with the interface either through a helper or the test interface. In addition to providing the `Go` function, there are additional options that can be passed. The test framework will generally supply the panic and error handler, but users can supply additional options such as naming the task, or passing a custom logger. This makes logging more useful to determine what task panic or encountered an error. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
This change provides a task manager implementation that will drive the logic running behind the Tasker interface, as well as provide the implementation of it. The manager is meant to be used by the test framework(s), while the Tasker interface is supplied to tests. The framework will do additional logic such as terminating tasks at the end of a test, logging uncaught errors or issues and providing the appropriate panic and error handlers. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
Previously, the mixedversion framework had its own implementation for managing background tasks. This implementation performed well, but in its current state it is difficult to share with tests that do not require the mixedversion framework. This change replace the mixedversion framework background runner with the newly implemented task manager. The implementation details are relatively similar making it close to a drop-in replacement. Additionally, some utilities relating to tasks have been replaced by the task package. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 23, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
Previously, roachtests would use bare goroutines `go func...` to start tasks during a test to perform various background functions. This is problematic due to the nature of roachtest, which runs multiple tests in parallel using workers. If one such bare goroutine panics the whole test suite is interrupted and the process exits. This change is the first step and introduces an interface that both the traditional and mixedversion test frameworks can utilise. It intends to target both regular and MVT tests by supplying the test with the interface either through a helper or the test interface. In addition to providing the `Go` function, there are additional options that can be passed. The test framework will generally supply the panic and error handler, but users can supply additional options such as naming the task, or passing a custom logger. This makes logging more useful to determine what task panic or encountered an error. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
This change provides a task manager implementation that will drive the logic running behind the Tasker interface, as well as provide the implementation of it. The manager is meant to be used by the test framework(s), while the Tasker interface is supplied to tests. The framework will do additional logic such as terminating tasks at the end of a test, logging uncaught errors or issues and providing the appropriate panic and error handlers. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
Previously, the mixedversion framework had its own implementation for managing background tasks. This implementation performed well, but in its current state it is difficult to share with tests that do not require the mixedversion framework. This change replace the mixedversion framework background runner with the newly implemented task manager. The implementation details are relatively similar making it close to a drop-in replacement. Additionally, some utilities relating to tasks have been replaced by the task package. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
Previously, roachtests would use bare goroutines `go func...` to start tasks during a test to perform various background functions. This is problematic due to the nature of roachtest, which runs multiple tests in parallel using workers. If one such bare goroutine panics the whole test suite is interrupted and the process exits. This change is the first step and introduces an interface that both the traditional and mixedversion test frameworks can utilise. It intends to target both regular and MVT tests by supplying the test with the interface either through a helper or the test interface. In addition to providing the `Go` function, there are additional options that can be passed. The test framework will generally supply the panic and error handler, but users can supply additional options such as naming the task, or passing a custom logger. This makes logging more useful to determine what task panic or encountered an error. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
This change provides a task manager implementation that will drive the logic running behind the Tasker interface, as well as provide the implementation of it. The manager is meant to be used by the test framework(s), while the Tasker interface is supplied to tests. The framework will do additional logic such as terminating tasks at the end of a test, logging uncaught errors or issues and providing the appropriate panic and error handlers. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
Previously, the mixedversion framework had its own implementation for managing background tasks. This implementation performed well, but in its current state it is difficult to share with tests that do not require the mixedversion framework. This change replace the mixedversion framework background runner with the newly implemented task manager. The implementation details are relatively similar making it close to a drop-in replacement. Additionally, some utilities relating to tasks have been replaced by the task package. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 24, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 25, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 25, 2024
This change now utilises the Tasker interface supplied by the mixedversion helper to start a task. In essence a background function can also be utilised, but for low impact go routines the utility provides a quick way of starting a task. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Oct 25, 2024
Previously, roachtests would use bare goroutines `go func...` to start tasks during a test to perform various background functions. This is problematic due to the nature of roachtest, which runs multiple tests in parallel using workers. If one such bare goroutine panics the whole test suite is interrupted and the process exits. This change is the first step and introduces an interface that both the traditional and mixedversion test frameworks can utilise. It intends to target both regular and MVT tests by supplying the test with the interface either through a helper or the test interface. In addition to providing the `Go` function, there are additional options that can be passed. The test framework will generally supply the panic and error handler, but users can supply additional options such as naming the task, or passing a custom logger. This makes logging more useful to determine what task panic or encountered an error. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, Monitor was only able to monitor processes already started. It could not detect a new processes if it was started after monitor started monitoring. This change uses the new monitor scripts `monitor_local.sh` & `monitor_remote.sh` which produces a frame of processes when any cockroach process changes occur including when a new cockroach process is started. The logic for detecting changes in processes has been moved to the "client" side. The scripts are now less complicated and only have the responsibility of sending through a frame of processes. A frame consists of the cluster label, process id, and processes status of all cockroach processes. In addition, Monitor has been moved into its own source file for better separation of logic. The `ignore empty nodes` functionality has also been removed, as it is an unreliable way of detecting "empty nodes", and does not serve any real purpose anymore. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, Monitor was mostly untested. This change aims to add a few data driven tests using the new mock session functionality. The Monitor logic for checking for process changes have moved to the client, and can now be tested as well. Although we can't test that the shell scripts are correct with these tests, we can test that Monitor interprets the output correctly, and emits the correct events. We can also test that it correctly handles IO errors. The data driven tests specify what the scripts output, and we can then verify the number of events we expect and what those events are. Since events can be out of order, to ensure the tests do not flake we sort the events. Event timing can be tested by chaining multiple output write blocks and event checks. Additionally, a few smalls separate tests are added to check different IO scenarios. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Add a new monitor script for monitoring remote processes. `systemctl` is used to monitor cockroach processes, on remote nodes, and determine the exit status if a process has died. It emits the same frame format as the local script version, and depends on Monitor to implement the logic for detecting changes in the process list frame. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, it was difficult to test code that creates a remote session. This change adds a mock implementation of session, and the hooks in SyncedCluster to use it during tests. The mock session provides several hooks for controlling the IO of a session. The inputs and outputs are backed by a channel to allow finer control of the flow of data. Currently, it only implements what was required to test the Monitor functionality, but can be extended in the future to cover more use cases. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, Monitor was only able to monitor processes already started. It could not detect a new processes if it was started after monitor started monitoring. This change uses the new monitor scripts `monitor_local.sh` & `monitor_remote.sh` which produces a frame of processes when any cockroach process changes occur including when a new cockroach process is started. The logic for detecting changes in processes has been moved to the "client" side. The scripts are now less complicated and only have the responsibility of sending through a frame of processes. A frame consists of the cluster label, process id, and processes status of all cockroach processes. In addition, Monitor has been moved into its own source file for better separation of logic. The `ignore empty nodes` functionality has also been removed, as it is an unreliable way of detecting "empty nodes", and does not serve any real purpose anymore. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, Monitor was mostly untested. This change aims to add a few data driven tests using the new mock session functionality. The Monitor logic for checking for process changes have moved to the client, and can now be tested as well. Although we can't test that the shell scripts are correct with these tests, we can test that Monitor interprets the output correctly, and emits the correct events. We can also test that it correctly handles IO errors. The data driven tests specify what the scripts output, and we can then verify the number of events we expect and what those events are. Since events can be out of order, to ensure the tests do not flake we sort the events. Event timing can be tested by chaining multiple output write blocks and event checks. Additionally, a few smalls separate tests are added to check different IO scenarios. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 9, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Add a new monitor script for monitoring remote processes. `systemctl` is used to monitor cockroach processes, on remote nodes, and determine the exit status if a process has died. It emits the same frame format as the local script version, and depends on Monitor to implement the logic for detecting changes in the process list frame. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Previously, it was difficult to test code that creates a remote session. This change adds a mock implementation of session, and the hooks in SyncedCluster to use it during tests. The mock session provides several hooks for controlling the IO of a session. The inputs and outputs are backed by a channel to allow finer control of the flow of data. Currently, it only implements what was required to test the Monitor functionality, but can be extended in the future to cover more use cases. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Previously, Monitor was only able to monitor processes already started. It could not detect a new processes if it was started after monitor started monitoring. This change uses the new monitor scripts `monitor_local.sh` & `monitor_remote.sh` which produces a frame of processes when any cockroach process changes occur including when a new cockroach process is started. The logic for detecting changes in processes has been moved to the "client" side. The scripts are now less complicated and only have the responsibility of sending through a frame of processes. A frame consists of the cluster label, process id, and processes status of all cockroach processes. In addition, Monitor has been moved into its own source file for better separation of logic. The `ignore empty nodes` functionality has also been removed, as it is an unreliable way of detecting "empty nodes", and does not serve any real purpose anymore. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Previously, Monitor was mostly untested. This change aims to add a few data driven tests using the new mock session functionality. The Monitor logic for checking for process changes have moved to the client, and can now be tested as well. Although we can't test that the shell scripts are correct with these tests, we can test that Monitor interprets the output correctly, and emits the correct events. We can also test that it correctly handles IO errors. The data driven tests specify what the scripts output, and we can then verify the number of events we expect and what those events are. Since events can be out of order, to ensure the tests do not flake we sort the events. Event timing can be tested by chaining multiple output write blocks and event checks. Additionally, a few smalls separate tests are added to check different IO scenarios. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 10, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, a new API for managing tasks were introduced (see cockroachdb#133263). However, this did not address roachtests that want to manage groups. In an effort to replace `errgroup`, and possibly using `monitor.Go` for task management this change introduces a group provider in the task manager. A group adds the ability to wait on a subset of tasks to finish before proceeding. The task handler will still report returned errors or panics to the test framework. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
This change groups the default options used by tasks and groups into a function that can be shared when creating new groups or tasks via the mixedversion test helper. The mixedversion helper now also implements `GroupProvider` to enable grouping tasks. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
This change groups the default options used by tasks and groups into a function that can be shared when creating new groups or tasks via the test framework. The `Test` interface now implements `GroupProvider` as well to enable grouping tasks. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, a roachtest would need to use wait groups in order to wait on tasks. The new `GroupProvider` interface now allows tests to manage groups of tasks via the framework instead of manually. Informs: cockroachdb#118214 Epic: None Release note: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, Monitor was only able to monitor processes already started. It could not detect a new processes if it was started after monitor started monitoring. This change uses the new monitor scripts `monitor_local.sh` & `monitor_remote.sh` which produces a frame of processes when any cockroach process changes occur including when a new cockroach process is started. The logic for detecting changes in processes has been moved to the "client" side. The scripts are now less complicated and only have the responsibility of sending through a frame of processes. A frame consists of the cluster label, process id, and processes status of all cockroach processes. In addition, Monitor has been moved into its own source file for better separation of logic. The `ignore empty nodes` functionality has also been removed, as it is an unreliable way of detecting "empty nodes", and does not serve any real purpose anymore. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, Monitor was mostly untested. This change aims to add a few data driven tests using the new mock session functionality. The Monitor logic for checking for process changes have moved to the client, and can now be tested as well. Although we can't test that the shell scripts are correct with these tests, we can test that Monitor interprets the output correctly, and emits the correct events. We can also test that it correctly handles IO errors. The data driven tests specify what the scripts output, and we can then verify the number of events we expect and what those events are. Since events can be out of order, to ensure the tests do not flake we sort the events. Event timing can be tested by chaining multiple output write blocks and event checks. Additionally, a few smalls separate tests are added to check different IO scenarios. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 11, 2024
Previously, the post assertions would check for dead nodes by using roachprod Monitor. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. This change refactors the code to rather use the HTTP health checks to determine if there are dead nodes. Since we now know which node is used for the workload, this check only needs to determine if the cockroach nodes are all still live. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 12, 2024
Previously, the post assertions would check for dead nodes by using `roachprod.Monitor`. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. It also no longer functions, since `roachprod.Monitor` only watches already live processes. This change removes this functionality as it no longer works. An issue (See: cockroachdb#137329) has been created to address this should it still be a requirement. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 12, 2024
Previously, the post assertions would check for dead nodes by using `roachprod.Monitor`. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. It also no longer functions, since `roachprod.Monitor` only watches already live processes. This change removes this functionality as it no longer works. An issue (See: cockroachdb#137329) has been created to address this should it still be a requirement. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 12, 2024
Previously, the post assertions would check for dead nodes by using `roachprod.Monitor`. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. It also no longer functions, since `roachprod.Monitor` only watches already live processes. This change removes this functionality as it no longer works. An issue (See: cockroachdb#137329) has been created to address this should it still be a requirement. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 12, 2024
Previously, the post assertions would check for dead nodes by using `roachprod.Monitor`. It checks if the data directory of a cockroach node is non-trivial. This is not a reliable way to exclude empty nodes, or to check for dead nodes. It also complicates the Monitor functionality unnecessarily. It also no longer functions, since `roachprod.Monitor` only watches already live processes. This change removes this functionality as it no longer works. An issue (See: cockroachdb#137329) has been created to address this should it still be a requirement. Informs: cockroachdb#118214 Epic: None
herkolategan
added a commit
to herkolategan/cockroach
that referenced
this issue
Dec 12, 2024
Previously, `cluster.HealthStatus` seemed to check all nodes, but rely on the count of a nodes list passed in as a parameter. This change fixes the function to use the passed nodes during the request phase as well. It also adds a small retry mechanism during the 15 seconds it checks the health of the nodes. The issue was found when updating the health check to only look at `c.CRDBNodes()` and exclude the workload node if one is part of the cluster. Informs: cockroachdb#118214 Epic: None
craig bot
pushed a commit
that referenced
this issue
Dec 13, 2024
135831: roachtest: add groups to task manager r=DarrylWong a=herkolategan Previously, a new API for managing tasks were introduced (see #133263). However, this did not address roachtests that want to manage groups. In an effort to replace `errgroup`, and `monitor.Go` for task management, this change introduces a group provider in the task manager. A group adds the ability to wait on a subset of tasks to finish before proceeding. The task handler will still report returned errors or panics to the test framework. Informs: #118214 Epic: None Release note: None Co-authored-by: Herko Lategan <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-testing
Testing tools and infrastructure
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-testeng
TestEng Team
Typical uses of the roachtest monitor are as follows:
This is a summary of the two main functions provided by the monitor:
Go
: starts a goroutine. This is useful because we never want barego
calls in roachtests (or anywhere, really -- see *: ban the go keyword in non-test code #58164); having automated panic recovery is nice to have for all tests. Note that callingGo
will start the goroutine immediately.Wait
: serves two purposes. First, it waits for all goroutines started withGo
to return (canceling on error likeerrgroup
). In addition, it also starts a "processs monitor" (hence the name) that will watch for unexpected cockroach process deaths in the nodes it monitors.This API is less than ideal for the following reasons:
Monitoring processes and managing goroutines are two completely separate functionalities. It is unexpected that we will only monitor node deaths if we also have some goroutine to run. We recently had to patch this module in order to allow callers to just monitor nodes, without having to run a function to achieve that goal (roachtest: allow callers to listen for node events in monitor #108594).
We will only actually monitor nodes if the test calls
Wait()
on the monitor. While this is good practice and should happen every time, this is not enforced and tests can continue to pass even if they don't. In addition, node deaths that happen before the test callsWait()
go completely unnoticed.Because goroutine management and process monitoring are coupled, it makes composition really difficult. Suppose a function
A
creates a new monitor in order to run a background task. Suppose also that we're writing a new test that wants to callA
to reuse its functionality. If our test B wishes to restart nodes, we now have to find a way to pass that information down toA
so that A's monitor can properly handle that node death event. In the general case, if our test involves creating N monitors (directly or indirectly), and we want to perform a node restart, we need to find a way to pass that information to all N monitors.Proposal
A proposed way to deal with this is to promote node monitoring to the test runner as a default. Every test should have an active "node monitor" that checks for unexpected node deaths. We could have a single
t.ExpectNodeDeath(n1)
to indicate that this test expects n1 to die, and that can happen only once. What's currently theMonitor
will be deprecated in tests in favour of a simple goroutine runner with builtin panic handling.Jira issue: CRDB-35614
The text was updated successfully, but these errors were encountered: