Skip to content

Latest commit

 

History

History
62 lines (40 loc) · 3.8 KB

metrics.md

File metadata and controls

62 lines (40 loc) · 3.8 KB

Metrics

Agents periodically collect and report various metrics, described below.

System/process CPU/Heap

All agents (excluding JavaScript RUM) should record the following basic system/process metrics:

  • system.cpu.total.norm.pct: system CPU usage since the last report, in the range [0,1] (0-100%)
  • system.process.cpu.total.norm.pct: process CPU usage since the last report, in the range [0,1] (0-100%)
  • system.memory.total: total usable (but not necessarily available) memory on the system, in bytes
  • system.memory.actual.free: total available memory on the system, in bytes
  • system.process.memory.size: process virtual memory size, in bytes
  • system.process.memory.rss.bytes: process resident set size, in bytes

cgroup metrics

Where applicable, all agents (excluding JavaScript RUM) should record the following cgroup metrics:

  • system.process.cgroup.memory.mem.limit.bytes
  • system.process.cgroup.memory.mem.usage.bytes

Metrics source

  • system.process.cgroup.memory.mem.limit.bytes - based on the memory.limit_in_bytes file
  • system.process.cgroup.memory.mem.usage.bytes - based on the memory.usage_in_bytes file
  • system.process.cgroup.memory.mem.limit.bytes - based on the memory.max file
  • system.process.cgroup.memory.mem.usage.bytes - based on the memory.current file

Discovery of the memory files

All files mentioned above are located at the same directory. Ideally, we can discover this dir by parsing the /proc/self/mountinfo file, looking for the memory mount line and extracting the path from within it. An example of such line is:

436 431 0:33 /docker/5042cfbb4ab36fcef9ca5f1eda54f40265c6ef3fe0694dfe34b9b474e70f8df5 /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:22 - cgroup memory rw,memory

The regex ^\d+? \d+? .+? .+? (.*?) .*cgroup.*memory.* works in the cgroup-v1 systems tested so far, where the first and only group should be the directory path. However, it will probably take a few iterations and tests on different container runtimes and OSs to get it right. There is no regex currently suggested for cgroup-v2. Look in other agent PRs to get ideas.

Whenever agents fail to discover the memory mount path, they should default to /sys/fs/cgroup/memory.

Special values for unlimited memory quota

Special values are used to indicate that the cgroup is not configured with a memory limit. In cgroup v1, this value is numeric - 0x7ffffffffffff000 and in cgroup v2 it is represented by the string max. Agents should not send the system.process.cgroup.memory.mem.limit.bytes metric whenever these special values are set.

Runtime

Agent should record runtime-specific metrics, such as garbage collection pauses. Due to their runtime-specific nature, these will differ for each agent.

When capturing runtime metrics, keep in mind the end use-case: how will they be used? Is the format in which they are recorded appropriate for visualisation in Kibana? Do not record metrics just because it is easy; record them because they are useful.

Transaction and span breakdown

Agents should record "breakdown metrics", which is a summarisation of how much time is spent per span type/subtype in each transaction group. This is described in detail in the Breakdown Graphs document, so we do not repeat it here.

Shutdown behavior

Agents should make an effort to flush any metrics before shutting down. If this cannot be achieved with shutdown hooks provided by the language/runtime, the agent should provide a public API that the user can call to flush any remaining data.