Monitoring the machines involved

Julian Harty edited this page Mar 11, 2018 · 2 revisions

Context

Each computer, or machine, has a finite capacity for work. Its ability to cope with more work tails off when resources are close to, or at, capacity. Resources include CPU, RAM, network IO, storage IO and storage space. The choice of storage can affect the performance characteristics: some options are IO bound, others throughput bound.

It's useful to monitor each computer while it is working. We may also want to check its available capacity both before and after a period of work (such as a test). This particularly applies to storage volumes (which may be logical and/or physical disks). Kafka nodes in particular fail, and struggle to recover, when they run out of storage.
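As a minimal sketch of the before-and-after capacity check, the function below appends a labelled, timestamped `df` report to a log file. The function name `record_capacity`, the labels and the default log file name are assumptions made for illustration; they are not part of any existing tooling.

```shell
#!/bin/sh
# Sketch only: record_capacity and the default log name are illustrative assumptions.
# Appends a labelled, timestamped report of filesystem usage to a log file.
record_capacity() {
    cap_label="$1"
    # Default log name mirrors the `hostname`-based naming used for the iostat log.
    cap_log="${2:-$(hostname).capacity.log}"
    {
        echo "== $cap_label $(date -u +%Y-%m-%dT%H:%M:%SZ) =="
        # -P gives the portable POSIX layout; -k reports sizes in 1024-byte blocks.
        df -P -k
    } >> "$cap_log"
}
```

Calling `record_capacity before` ahead of a test and `record_capacity after` once it finishes leaves both reports in one file, so the change in the Available column for each volume can be compared directly.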

Monitoring the computers while they are working

The monitoring needs to have a low overhead and be consistent, repeatable and useful. Linux has many relevant utility programs, such as top. Another is iostat. Here is an example of using iostat to record IO and CPU statistics once a second for 400 seconds, enough to cover a machine before, during and immediately after a 5 minute (300 second) test.

iostat -m -t 1 400 > "$(hostname).iostat.log"
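One way to line the recording up with a test is to start iostat in the background, wait briefly for a baseline, run the test, and then let iostat finish its remaining samples. The sketch below assumes this approach. The function name `run_with_iostat`, its argument order and the idea of a lead-in delay are illustrative assumptions rather than an established procedure.

```shell
#!/bin/sh
# Sketch only: run_with_iostat and its arguments are illustrative assumptions.
# Usage: run_with_iostat LOGFILE SAMPLES LEAD_IN_SECONDS TEST_COMMAND [ARGS...]
run_with_iostat() {
    io_log="$1"
    io_samples="$2"
    io_lead_in="$3"
    shift 3
    # Sample once per second in the background (-m: megabytes, -t: timestamps).
    iostat -m -t 1 "$io_samples" > "$io_log" &
    io_pid=$!
    # Give iostat time to record a baseline before the test starts.
    sleep "$io_lead_in"
    # Run the test command and remember its exit status.
    test_status=0
    "$@" || test_status=$?
    # Let iostat finish its remaining samples, which cover the period after the test.
    wait "$io_pid"
    return "$test_status"
}
```

For a 300 second test, something like `run_with_iostat "$(hostname).iostat.log" 400 50 ./run_test.sh` would leave roughly 50 seconds of recording on either side of the test. Here `./run_test.sh` is a hypothetical placeholder for whatever starts the test.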