Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/hostmetrics] Add available memory state (optionally) #19149

16 changes: 16 additions & 0 deletions .chloggen/hostmetrics-add-mem-available.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
dmitryax marked this conversation as resolved.
Show resolved Hide resolved

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: hostmetricsreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add available memory metric.

# One or more tracking issues related to the change
issues: [7417]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
20 changes: 20 additions & 0 deletions .chloggen/hostmetrics-some-memory-metrics-optional.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: hostmetricsreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Data points for slab_reclaimable and slab_unreclaimable memory states are now disabled by default.

# One or more tracking issues related to the change
issues: [7417]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
The old behavior can be preserved by setting the enable_additional_memory_states config option to true.
NOTE: When either of slab_reclaimable, slab_unreclaimable or available memory states is enabled,
all the memory states don't sum up well (the result exceeds `total` memory).
For more details, see the discussion https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417.
9 changes: 9 additions & 0 deletions receiver/hostmetricsreceiver/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,15 @@ load:
cpu_average: <false|true>
```

### Memory

`enable_additional_memory_states` specifies whether to enable reporting of a couple of additional memory state metrics: `slab_reclaimable`, `slab_unreclaimable` and `available`. When those are enabled, all the memory states don't sum up well (the result exceeds `total` memory) (default: `false`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a hill I'm going to die on, but in my opinion we should be looking to replace this metric with one or more metrics that use attributes appropriately to differentiate portions of a meaningful total.

I'm not an expert at memory management but it seems someone should be able to clearly state the relationships between these values. As a (probably wrong) starting point:

  • total = buffered + cached + inactive + free + used
  • slab = slab_reclaimable + slab_unreclaimable
  • available = free + slab_reclaimable?

If we had a list of these relationships, it should be fairly easy to identify a correct data model. Without it, I don't know how to evaluate such a solution except to say that overloading a metric has never been necessary elsewhere, as far as I am aware. Again, this isn't intended to be blocking - just trying to articulate why in my opinion we are having to invent novel ways to tweak a metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, we're dealing with a complex issue that doesn't have a simple and universal solution. However, I believe that including the ability to report available memory is a valuable feature for users, despite its potential drawbacks.


```yaml
memory:
enable_additional_memory_states: <false|true>
```

### Network

```yaml
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,4 +23,10 @@ import (
type Config struct {
metadata.MetricsBuilderConfig `mapstructure:",squash"`
internal.ScraperConfig

// EnableAdditionalMemoryStates (false by default) controls reporting of data points with additional memory states:
// slab_reclaimable, slab_unreclaimable and available.
// When those are enabled, all the memory states don't sum up well (the result exceeds `total` memory).
// For more details, see the discussion https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417.
EnableAdditionalMemoryStates bool `mapstructure:"enable_additional_memory_states"`
}
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ Bytes of memory in use.

| Name | Description | Values |
| ---- | ----------- | ------ |
| state | Breakdown of memory usage by type. | Str: ``buffered``, ``cached``, ``inactive``, ``free``, ``slab_reclaimable``, ``slab_unreclaimable``, ``used`` |
| state | Breakdown of memory usage by type. | Str: ``buffered``, ``cached``, ``inactive``, ``free``, ``slab_reclaimable``, ``slab_unreclaimable``, ``used``, ``available`` |

## Optional Metrics

Expand All @@ -48,4 +48,4 @@ Percentage of memory bytes in use.

| Name | Description | Values |
| ---- | ----------- | ------ |
| state | Breakdown of memory usage by type. | Str: ``buffered``, ``cached``, ``inactive``, ``free``, ``slab_reclaimable``, ``slab_unreclaimable``, ``used`` |
| state | Breakdown of memory usage by type. | Str: ``buffered``, ``cached``, ``inactive``, ``free``, ``slab_reclaimable``, ``slab_unreclaimable``, ``used``, ``available`` |

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,11 @@ func (s *scraper) scrape(_ context.Context) (pmetric.Metrics, error) {
memInfo.Total), metricsLen)
}
s.recordMemoryUtilizationMetric(now, memInfo)

if s.config.EnableAdditionalMemoryStates {
s.recordMemoryUsageMetricAdditionalStates(now, memInfo)
s.recordMemoryUtilizationMetricAdditionalStates(now, memInfo)
}
}

return s.mb.Emit(), nil
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,15 +29,23 @@ func (s *scraper) recordMemoryUsageMetric(now pcommon.Timestamp, memInfo *mem.Vi
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Free), metadata.AttributeStateFree)
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Buffers), metadata.AttributeStateBuffered)
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Cached), metadata.AttributeStateCached)
}

func (s *scraper) recordMemoryUsageMetricAdditionalStates(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Sreclaimable), metadata.AttributeStateSlabReclaimable)
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Sunreclaim), metadata.AttributeStateSlabUnreclaimable)
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Available), metadata.AttributeStateAvailable)
}

func (s *scraper) recordMemoryUtilizationMetric(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Used)/float64(memInfo.Total), metadata.AttributeStateUsed)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Free)/float64(memInfo.Total), metadata.AttributeStateFree)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Buffers)/float64(memInfo.Total), metadata.AttributeStateBuffered)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Cached)/float64(memInfo.Total), metadata.AttributeStateCached)
}

func (s *scraper) recordMemoryUtilizationMetricAdditionalStates(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Sreclaimable)/float64(memInfo.Total), metadata.AttributeStateSlabReclaimable)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Sunreclaim)/float64(memInfo.Total), metadata.AttributeStateSlabUnreclaimable)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Available)/float64(memInfo.Total), metadata.AttributeStateAvailable)
}
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,16 @@ func (s *scraper) recordMemoryUsageMetric(now pcommon.Timestamp, memInfo *mem.Vi
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Inactive), metadata.AttributeStateInactive)
}

func (s *scraper) recordMemoryUsageMetricAdditionalStates(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Available), metadata.AttributeStateAvailable)
}

func (s *scraper) recordMemoryUtilizationMetric(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Used)/float64(memInfo.Total), metadata.AttributeStateUsed)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Free)/float64(memInfo.Total), metadata.AttributeStateFree)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Inactive)/float64(memInfo.Total), metadata.AttributeStateInactive)
}

func (s *scraper) recordMemoryUtilizationMetricAdditionalStates(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Available)/float64(memInfo.Total), metadata.AttributeStateAvailable)
}
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,21 @@ func TestScrape(t *testing.T) {
},
expectedMetricCount: 2,
},
{
name: "Additional memory state metrics enabled",
config: &Config{
MetricsBuilderConfig: metadata.NewMetricsBuilderConfig(metadata.MetricsSettings{
SystemMemoryUtilization: metadata.MetricSettings{
Enabled: true,
},
SystemMemoryUsage: metadata.MetricSettings{
Enabled: true,
},
}, metadata.DefaultResourceAttributesSettings()),
EnableAdditionalMemoryStates: true,
},
expectedMetricCount: 2,
},
{
name: "Error",
virtualMemoryFunc: func() (*mem.VirtualMemoryStat, error) { return nil, errors.New("err1") },
Expand Down Expand Up @@ -125,6 +140,10 @@ func TestScrape(t *testing.T) {

if runtime.GOOS == "linux" {
assertMemoryUsageMetricHasLinuxSpecificStateLabels(t, metrics.At(0))

if test.config.EnableAdditionalMemoryStates {
assertMemoryUsageMetricHasLinuxSpecificAdditionalStateLabels(t, metrics.At(0))
}
} else if runtime.GOOS != "windows" {
internal.AssertSumMetricHasAttributeValue(t, metrics.At(0), 2, "state",
pcommon.NewValueStr(metadata.AttributeStateInactive.String()))
Expand All @@ -137,14 +156,19 @@ func TestScrape(t *testing.T) {

func TestScrape_MemoryUtilization(t *testing.T) {
type testCase struct {
name string
virtualMemoryFunc func() (*mem.VirtualMemoryStat, error)
expectedErr error
name string
virtualMemoryFunc func() (*mem.VirtualMemoryStat, error)
enableAdditionalMemoryStates bool
expectedErr error
}
testCases := []testCase{
{
name: "Standard",
},
{
name: "Standard with additional memory states",
enableAdditionalMemoryStates: true,
},
{
name: "Invalid total memory",
virtualMemoryFunc: func() (*mem.VirtualMemoryStat, error) { return &mem.VirtualMemoryStat{Total: 0}, nil },
Expand All @@ -157,7 +181,8 @@ func TestScrape_MemoryUtilization(t *testing.T) {
mbc.Metrics.SystemMemoryUtilization.Enabled = true
mbc.Metrics.SystemMemoryUsage.Enabled = false
scraperConfig := Config{
MetricsBuilderConfig: mbc,
MetricsBuilderConfig: mbc,
EnableAdditionalMemoryStates: test.enableAdditionalMemoryStates,
}
scraper := newMemoryScraper(context.Background(), receivertest.NewNopCreateSettings(), &scraperConfig)
if test.virtualMemoryFunc != nil {
Expand All @@ -180,6 +205,10 @@ func TestScrape_MemoryUtilization(t *testing.T) {

if runtime.GOOS == "linux" {
assertMemoryUtilizationMetricHasLinuxSpecificStateLabels(t, metrics.At(0))

if test.enableAdditionalMemoryStates {
assertMemoryUtilizationMetricHasLinuxSpecificAdditionalStateLabels(t, metrics.At(0))
}
} else if runtime.GOOS != "windows" {
internal.AssertGaugeMetricHasAttributeValue(t, metrics.At(0), 2, "state",
pcommon.NewValueStr(metadata.AttributeStateInactive.String()))
Expand All @@ -192,7 +221,7 @@ func TestScrape_MemoryUtilization(t *testing.T) {

func assertMemoryUsageMetricValid(t *testing.T, metric pmetric.Metric, expectedName string) {
assert.Equal(t, expectedName, metric.Name())
assert.GreaterOrEqual(t, metric.Sum().DataPoints().Len(), 2)
assert.GreaterOrEqual(t, metric.Sum().DataPoints().Len(), 3)
internal.AssertSumMetricHasAttributeValue(t, metric, 0, "state",
pcommon.NewValueStr(metadata.AttributeStateUsed.String()))
internal.AssertSumMetricHasAttributeValue(t, metric, 1, "state",
Expand All @@ -201,7 +230,7 @@ func assertMemoryUsageMetricValid(t *testing.T, metric pmetric.Metric, expectedN

func assertMemoryUtilizationMetricValid(t *testing.T, metric pmetric.Metric, expectedName string) {
assert.Equal(t, expectedName, metric.Name())
assert.GreaterOrEqual(t, metric.Gauge().DataPoints().Len(), 2)
assert.GreaterOrEqual(t, metric.Gauge().DataPoints().Len(), 3)
internal.AssertGaugeMetricHasAttributeValue(t, metric, 0, "state",
pcommon.NewValueStr(metadata.AttributeStateUsed.String()))
internal.AssertGaugeMetricHasAttributeValue(t, metric, 1, "state",
Expand All @@ -213,19 +242,29 @@ func assertMemoryUsageMetricHasLinuxSpecificStateLabels(t *testing.T, metric pme
pcommon.NewValueStr(metadata.AttributeStateBuffered.String()))
internal.AssertSumMetricHasAttributeValue(t, metric, 3, "state",
pcommon.NewValueStr(metadata.AttributeStateCached.String()))
}

func assertMemoryUsageMetricHasLinuxSpecificAdditionalStateLabels(t *testing.T, metric pmetric.Metric) {
internal.AssertSumMetricHasAttributeValue(t, metric, 4, "state",
pcommon.NewValueStr(metadata.AttributeStateSlabReclaimable.String()))
internal.AssertSumMetricHasAttributeValue(t, metric, 5, "state",
pcommon.NewValueStr(metadata.AttributeStateSlabUnreclaimable.String()))
internal.AssertSumMetricHasAttributeValue(t, metric, 6, "state",
pcommon.NewValueStr(metadata.AttributeStateAvailable.String()))
}

func assertMemoryUtilizationMetricHasLinuxSpecificStateLabels(t *testing.T, metric pmetric.Metric) {
internal.AssertGaugeMetricHasAttributeValue(t, metric, 2, "state",
pcommon.NewValueStr(metadata.AttributeStateBuffered.String()))
internal.AssertGaugeMetricHasAttributeValue(t, metric, 3, "state",
pcommon.NewValueStr(metadata.AttributeStateCached.String()))
}

func assertMemoryUtilizationMetricHasLinuxSpecificAdditionalStateLabels(t *testing.T, metric pmetric.Metric) {
internal.AssertGaugeMetricHasAttributeValue(t, metric, 4, "state",
pcommon.NewValueStr(metadata.AttributeStateSlabReclaimable.String()))
internal.AssertGaugeMetricHasAttributeValue(t, metric, 5, "state",
pcommon.NewValueStr(metadata.AttributeStateSlabUnreclaimable.String()))
internal.AssertGaugeMetricHasAttributeValue(t, metric, 6, "state",
pcommon.NewValueStr(metadata.AttributeStateAvailable.String()))
}
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,15 @@ func (s *scraper) recordMemoryUsageMetric(now pcommon.Timestamp, memInfo *mem.Vi
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Free), metadata.AttributeStateFree)
}

func (s *scraper) recordMemoryUsageMetricAdditionalStates(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUsageDataPoint(now, int64(memInfo.Available), metadata.AttributeStateAvailable)
}

func (s *scraper) recordMemoryUtilizationMetric(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Used)/float64(memInfo.Total), metadata.AttributeStateUsed)
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Free)/float64(memInfo.Total), metadata.AttributeStateFree)
}

func (s *scraper) recordMemoryUtilizationMetricAdditionalStates(now pcommon.Timestamp, memInfo *mem.VirtualMemoryStat) {
s.mb.RecordSystemMemoryUtilizationDataPoint(now, float64(memInfo.Available)/float64(memInfo.Total), metadata.AttributeStateAvailable)
}
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,16 @@ attributes:
state:
description: Breakdown of memory usage by type.
type: string
enum: [buffered, cached, inactive, free, slab_reclaimable, slab_unreclaimable, used]
enum: [
available, # disabled by default
buffered,
cached,
free,
inactive,
slab_reclaimable, # disabled by default
slab_unreclaimable, # disabled by default
used,
]

metrics:
system.memory.usage:
Expand Down