Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Statuscode Handler to Agent Health Extension #1423

Open
wants to merge 43 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
2348639
fixing issue
Paramadon Nov 12, 2024
b7fac5e
trying to fix logs
Paramadon Nov 12, 2024
45536b8
fixing issue
Paramadon Nov 12, 2024
75b024a
adding debug statements (swapping logger with log.print)
Paramadon Nov 13, 2024
b41de09
adding more log statments
Paramadon Nov 13, 2024
90a22a3
fixing tags
Paramadon Nov 13, 2024
8c6d037
fixing logs
Paramadon Nov 13, 2024
6ff99f8
fixing logs
Paramadon Nov 13, 2024
667f608
adding a log
Paramadon Nov 13, 2024
9de9448
fixing stats function for statuscodehandler
Paramadon Nov 13, 2024
1e690e3
fixing issue
Paramadon Nov 13, 2024
4c6adcf
making operation names shorter
Paramadon Nov 13, 2024
dc129c9
Last commit works to reduce name- this one is for filtering operation…
Paramadon Nov 13, 2024
07a06fa
adding status code only filter
Paramadon Nov 13, 2024
0f76726
removing status code from otel config
Paramadon Nov 13, 2024
965ccdb
had to add return statments to filter
Paramadon Nov 13, 2024
a8268fd
commenting out try configure for ec2tagger
Paramadon Nov 13, 2024
8ddb463
adding ecw client for ec2 tagger
Paramadon Nov 13, 2024
3c15e81
changing the middleware to metricID
Paramadon Nov 13, 2024
947b8f3
trying to get stat
Paramadon Nov 14, 2024
a76ef49
everything works just removing some debug statement and filter from h…
Paramadon Nov 15, 2024
abf484b
fixing filter issue
Paramadon Nov 15, 2024
c849f61
previous commit works, just cleaning up debug statemtns
Paramadon Nov 15, 2024
bd33dbc
adding unit tests
Paramadon Nov 15, 2024
822680f
fixing test
Paramadon Nov 19, 2024
d547e1c
fixing unit tests
Paramadon Nov 19, 2024
5dc40f4
running make fmt
Paramadon Nov 19, 2024
e5bf295
fixing unit tests
Paramadon Nov 19, 2024
e07463a
fixing tests
Paramadon Nov 19, 2024
a5ba52e
fixing yamls
Paramadon Nov 19, 2024
f3c8d46
fixing yamls
Paramadon Nov 19, 2024
f64b332
removing internal changes
Paramadon Nov 19, 2024
fa4c163
restoring files
Paramadon Nov 19, 2024
457e48c
removing unnecessary file changes and passing unit test
Paramadon Nov 19, 2024
2705e87
cleaning up code
Paramadon Nov 19, 2024
3083626
fixing up pointer issue and passing in operations
Paramadon Nov 20, 2024
9e40bd8
adding agent health to prometheus
Paramadon Nov 20, 2024
5745ee6
resolving comments
Paramadon Nov 22, 2024
6e0fdd3
resolving comments
Paramadon Nov 22, 2024
8aee0d9
adding tests
Paramadon Nov 22, 2024
f344518
resolving comments
Paramadon Nov 22, 2024
a7e0415
resolving comments
Paramadon Nov 22, 2024
d91706d
adding tests
Paramadon Nov 22, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions extension/agenthealth/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,9 @@ import (
)

type Config struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO status code stats should have to be enabled. agenthealth/metrics and agenthealth/traces won't have any allowlisted operations, but are still going to have the handlers created and attached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this and I changed it so that we would have to enable statuscodes

IsUsageDataEnabled bool `mapstructure:"is_usage_data_enabled"`
Stats agent.StatsConfig `mapstructure:"stats"`
IsUsageDataEnabled bool `mapstructure:"is_usage_data_enabled"`
Stats agent.StatsConfig `mapstructure:"stats"`
IsOnlyStatusCodeEnabled bool `mapstructure:"is_status_code_enabled,omitempty"`
}

var _ component.Config = (*Config)(nil)
22 changes: 18 additions & 4 deletions extension/agenthealth/extension.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import (
"github.com/amazon-contributing/opentelemetry-collector-contrib/extension/awsmiddleware"
"go.opentelemetry.io/collector/component"
"go.uber.org/zap"
"golang.org/x/exp/slices"

"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats"
"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/useragent"
Expand All @@ -24,11 +25,24 @@ var _ awsmiddleware.Extension = (*agentHealth)(nil)
func (ah *agentHealth) Handlers() ([]awsmiddleware.RequestHandler, []awsmiddleware.ResponseHandler) {
var responseHandlers []awsmiddleware.ResponseHandler
requestHandlers := []awsmiddleware.RequestHandler{useragent.NewHandler(ah.cfg.IsUsageDataEnabled)}
if ah.cfg.IsUsageDataEnabled {
req, res := stats.NewHandlers(ah.logger, ah.cfg.Stats)
requestHandlers = append(requestHandlers, req...)
responseHandlers = append(responseHandlers, res...)

if !ah.cfg.IsUsageDataEnabled {
ah.logger.Debug("Usage data is disabled, skipping stats handlers")
return requestHandlers, responseHandlers
}

statusCodeEnabled := false
statusCodeEnabled = ah.cfg.IsOnlyStatusCodeEnabled

agentStatsEnabled :=
slices.Contains(ah.cfg.Stats.Operations, "PutMetricData") ||
slices.Contains(ah.cfg.Stats.Operations, "PutLogEvents") ||
slices.Contains(ah.cfg.Stats.Operations, "PutTraceSegment")

statsRequestHandlers, statsResponseHandlers := stats.NewHandlers(ah.logger, ah.cfg.Stats, statusCodeEnabled, agentStatsEnabled)
requestHandlers = append(requestHandlers, statsRequestHandlers...)
responseHandlers = append(responseHandlers, statsResponseHandlers...)

return requestHandlers, responseHandlers
}

Expand Down
4 changes: 3 additions & 1 deletion extension/agenthealth/extension_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,13 @@ import (
"github.com/stretchr/testify/assert"
"go.opentelemetry.io/collector/component/componenttest"
"go.uber.org/zap"

"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats/agent"
)

func TestExtension(t *testing.T) {
ctx := context.Background()
cfg := &Config{IsUsageDataEnabled: true}
cfg := &Config{IsUsageDataEnabled: true, Stats: agent.StatsConfig{Operations: []string{"PutLogEvents"}}}
extension := NewAgentHealth(zap.NewNop(), cfg)
assert.NotNil(t, extension)
assert.NoError(t, extension.Start(ctx, componenttest.NewNopHost()))
Expand Down
95 changes: 75 additions & 20 deletions extension/agenthealth/handler/stats/agent/agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,21 +15,22 @@ const (
)

type Stats struct {
CpuPercent *float64 `json:"cpu,omitempty"`
MemoryBytes *uint64 `json:"mem,omitempty"`
FileDescriptorCount *int32 `json:"fd,omitempty"`
ThreadCount *int32 `json:"th,omitempty"`
LatencyMillis *int64 `json:"lat,omitempty"`
PayloadBytes *int `json:"load,omitempty"`
StatusCode *int `json:"code,omitempty"`
SharedConfigFallback *int `json:"scfb,omitempty"`
ImdsFallbackSucceed *int `json:"ifs,omitempty"`
AppSignals *int `json:"as,omitempty"`
EnhancedContainerInsights *int `json:"eci,omitempty"`
RunningInContainer *int `json:"ric,omitempty"`
RegionType *string `json:"rt,omitempty"`
Mode *string `json:"m,omitempty"`
EntityRejected *int `json:"ent,omitempty"`
CpuPercent *float64 `json:"cpu,omitempty"`
MemoryBytes *uint64 `json:"mem,omitempty"`
FileDescriptorCount *int32 `json:"fd,omitempty"`
ThreadCount *int32 `json:"th,omitempty"`
LatencyMillis *int64 `json:"lat,omitempty"`
PayloadBytes *int `json:"load,omitempty"`
StatusCode *int `json:"code,omitempty"`
SharedConfigFallback *int `json:"scfb,omitempty"`
ImdsFallbackSucceed *int `json:"ifs,omitempty"`
AppSignals *int `json:"as,omitempty"`
EnhancedContainerInsights *int `json:"eci,omitempty"`
RunningInContainer *int `json:"ric,omitempty"`
RegionType *string `json:"rt,omitempty"`
Mode *string `json:"m,omitempty"`
EntityRejected *int `json:"ent,omitempty"`
StatusCodes map[string][5]int `json:"codes,omitempty"` //represents status codes 200,400,408,413,429,
}

// Merge the other Stats into the current. If the field is not nil,
Expand Down Expand Up @@ -80,6 +81,28 @@ func (s *Stats) Merge(other Stats) {
if other.EntityRejected != nil {
s.EntityRejected = other.EntityRejected
}
if other.StatusCodes == nil {
return
}

if s.StatusCodes == nil {
s.StatusCodes = make(map[string][5]int)
}

for key, value := range other.StatusCodes {
if existing, ok := s.StatusCodes[key]; ok {
s.StatusCodes[key] = [5]int{
existing[0] + value[0], // 200
existing[1] + value[1], // 400
existing[2] + value[2], // 408
existing[3] + value[3], // 413
existing[4] + value[4], // 429
}
} else {
s.StatusCodes[key] = value
}
}

}

func (s *Stats) Marshal() (string, error) {
Expand All @@ -104,6 +127,26 @@ func (of OperationsFilter) IsAllowed(operationName string) bool {
return of.allowAll || of.operations.Contains(operationName)
}

var StatusCodeOperations = []string{ // all the operations that are allowed
"DescribeInstances",
"DescribeTags",
"DescribeVolumes",
"DescribeContainerInstances",
"DescribeServices",
"DescribeTaskDefinition",
"ListServices",
"ListTasks",
"CreateLogGroup",
"CreateLogStream",
}

type StatsConfig struct {
// Operations are the allowed operation names to gather stats for.
Operations []string `mapstructure:"operations,omitempty"`
// UsageFlags are the usage flags to set on start up.
UsageFlags map[Flag]any `mapstructure:"usage_flags,omitempty"`
}

func NewOperationsFilter(operations ...string) OperationsFilter {
allowed := collections.NewSet[string](operations...)
return OperationsFilter{
Expand All @@ -112,9 +155,21 @@ func NewOperationsFilter(operations ...string) OperationsFilter {
}
}

type StatsConfig struct {
// Operations are the allowed operation names to gather stats for.
Operations []string `mapstructure:"operations,omitempty"`
// UsageFlags are the usage flags to set on start up.
UsageFlags map[Flag]any `mapstructure:"usage_flags,omitempty"`
// NewStatusCodeOperationsFilter creates a new filter for allowed operations and status codes.
func NewStatusCodeOperationsFilter() OperationsFilter {
Paramadon marked this conversation as resolved.
Show resolved Hide resolved
allowed := make(map[string]struct{}, len(StatusCodeOperations))

return OperationsFilter{
operations: allowed,
allowAll: false,
}
}

func NewStatusCodeAndOtherOperationsFilter(operations []string) OperationsFilter {
Paramadon marked this conversation as resolved.
Show resolved Hide resolved
filter := NewStatusCodeOperationsFilter()
for _, operation := range operations {
filter.operations.Add(operation)
}

return filter
}
70 changes: 70 additions & 0 deletions extension/agenthealth/handler/stats/agent/agent_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,76 @@ func TestMerge(t *testing.T) {
assert.EqualValues(t, "Mode", *stats.Mode)
}

func TestMergeWithStatusCodes(t *testing.T) {
stats := &Stats{
StatusCodes: map[string][5]int{
"operation1": {1, 2, 3, 4, 5},
},
}

stats.Merge(Stats{
StatusCodes: map[string][5]int{
"operation1": {2, 3, 4, 5, 6}, // Existing operation with new values
"operation2": {0, 1, 2, 3, 4}, // New operation
},
})

assert.Equal(t, [5]int{3, 5, 7, 9, 11}, stats.StatusCodes["operation1"]) // Values should sum
assert.Equal(t, [5]int{0, 1, 2, 3, 4}, stats.StatusCodes["operation2"]) // New operation added

stats.Merge(Stats{
StatusCodes: nil,
})

assert.Equal(t, [5]int{3, 5, 7, 9, 11}, stats.StatusCodes["operation1"])
assert.Equal(t, [5]int{0, 1, 2, 3, 4}, stats.StatusCodes["operation2"])
}

func TestMarshalWithStatusCodes(t *testing.T) {
testCases := map[string]struct {
stats *Stats
want string
}{
"WithEmptyStatusCodes": {
stats: &Stats{
StatusCodes: map[string][5]int{},
},
want: "",
},
"WithStatusCodes": {
stats: &Stats{
StatusCodes: map[string][5]int{
"operation1": {1, 2, 3, 4, 5},
"operation2": {0, 1, 2, 3, 4},
},
},
want: `"codes":{"operation1":[1,2,3,4,5],"operation2":[0,1,2,3,4]}`,
},
}
for name, testCase := range testCases {
t.Run(name, func(t *testing.T) {
got, err := testCase.stats.Marshal()
assert.NoError(t, err)
assert.Contains(t, got, testCase.want)
})
}
}

func TestMergeFullWithStatusCodes(t *testing.T) {
stats := &Stats{
CpuPercent: aws.Float64(1.0),
StatusCodes: map[string][5]int{"operation1": {1, 0, 0, 0, 0}},
}
stats.Merge(Stats{
CpuPercent: aws.Float64(2.0),
StatusCodes: map[string][5]int{"operation1": {0, 1, 0, 0, 0}, "operation2": {1, 1, 1, 1, 1}},
})

assert.Equal(t, 2.0, *stats.CpuPercent)
assert.Equal(t, [5]int{1, 1, 0, 0, 0}, stats.StatusCodes["operation1"])
assert.Equal(t, [5]int{1, 1, 1, 1, 1}, stats.StatusCodes["operation2"])
}

func TestMarshal(t *testing.T) {
Paramadon marked this conversation as resolved.
Show resolved Hide resolved
testCases := map[string]struct {
stats *Stats
Expand Down
34 changes: 29 additions & 5 deletions extension/agenthealth/handler/stats/handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,36 @@ const (
headerKeyAgentStats = "X-Amz-Agent-Stats"
)

func NewHandlers(logger *zap.Logger, cfg agent.StatsConfig) ([]awsmiddleware.RequestHandler, []awsmiddleware.ResponseHandler) {
filter := agent.NewOperationsFilter(cfg.Operations...)
clientStats := client.NewHandler(filter)
stats := newStatsHandler(logger, filter, []agent.StatsProvider{clientStats, provider.GetProcessStats(), provider.GetFlagsStats()})
func NewHandlers(logger *zap.Logger, cfg agent.StatsConfig, statusCodeEnabled bool, agentStatsEnabled bool) ([]awsmiddleware.RequestHandler, []awsmiddleware.ResponseHandler) {
var requestHandlers []awsmiddleware.RequestHandler
var responseHandlers []awsmiddleware.ResponseHandler
var statsProviders []agent.StatsProvider

if !statusCodeEnabled && !agentStatsEnabled {
return nil, nil
}

statusCodeFilter := agent.NewStatusCodeOperationsFilter()
statusCodeStats := provider.GetStatusCodeStats(statusCodeFilter)
if statusCodeEnabled {
requestHandlers = append(requestHandlers, statusCodeStats)
responseHandlers = append(responseHandlers, statusCodeStats)
statsProviders = append(statsProviders, statusCodeStats)
}

if agentStatsEnabled {
clientStats := client.NewHandler(agent.NewOperationsFilter())
statsProviders = append(statsProviders, clientStats, provider.GetProcessStats(), provider.GetFlagsStats())
responseHandlers = append(responseHandlers, clientStats)
requestHandlers = append(requestHandlers, clientStats)

}
filter := agent.NewStatusCodeAndOtherOperationsFilter(cfg.Operations)
stats := newStatsHandler(logger, filter, statsProviders)
requestHandlers = append(requestHandlers, stats)

agent.UsageFlags().SetValues(cfg.UsageFlags)
return []awsmiddleware.RequestHandler{stats, clientStats}, []awsmiddleware.ResponseHandler{clientStats}
return requestHandlers, responseHandlers
}

type statsHandler struct {
Expand Down
28 changes: 24 additions & 4 deletions extension/agenthealth/handler/stats/handler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,10 @@ func TestStatsHandler(t *testing.T) {
StatusCode: aws.Int(200),
Paramadon marked this conversation as resolved.
Show resolved Hide resolved
ImdsFallbackSucceed: aws.Int(1),
SharedConfigFallback: aws.Int(1),
StatusCodes: map[string][5]int{
"pmd": {1, 0, 0, 0, 0},
"di": {0, 1, 0, 0, 0},
},
}
handler := newStatsHandler(
zap.NewNop(),
Expand All @@ -59,15 +63,31 @@ func TestStatsHandler(t *testing.T) {
assert.Equal(t, "", req.Header.Get(headerKeyAgentStats))
handler.filter = agent.NewOperationsFilter(agent.AllowAllOperations)
handler.HandleRequest(ctx, req)
assert.Equal(t, `"cpu":1.2,"mem":123,"fd":456,"th":789,"lat":1234,"load":5678,"code":200,"scfb":1,"ifs":1`, req.Header.Get(headerKeyAgentStats))
assert.Equal(t, `"cpu":1.2,"mem":123,"fd":456,"th":789,"lat":1234,"load":5678,"code":200,"scfb":1,"ifs":1,"codes":{"di":[0,1,0,0,0],"pmd":[1,0,0,0,0]}`, req.Header.Get(headerKeyAgentStats))
stats.StatusCode = aws.Int(404)
stats.LatencyMillis = nil
handler.HandleRequest(ctx, req)
assert.Equal(t, `"cpu":1.2,"mem":123,"fd":456,"th":789,"load":5678,"code":404,"scfb":1,"ifs":1`, req.Header.Get(headerKeyAgentStats))
assert.Equal(t, `"cpu":1.2,"mem":123,"fd":456,"th":789,"load":5678,"code":404,"scfb":1,"ifs":1,"codes":{"di":[0,1,0,0,0],"pmd":[1,0,0,0,0]}`, req.Header.Get(headerKeyAgentStats))
}

func TestNewHandlers(t *testing.T) {
Paramadon marked this conversation as resolved.
Show resolved Hide resolved
requestHandlers, responseHandlers := NewHandlers(zap.NewNop(), agent.StatsConfig{})
func TestNewHandlersWithStatusCodeOnly(t *testing.T) {
requestHandlers, responseHandlers := NewHandlers(zap.NewNop(), agent.StatsConfig{}, true, false)
assert.Len(t, requestHandlers, 2)
assert.Len(t, responseHandlers, 1)
}
func TestNewHandlersWithAgentStatsOnly(t *testing.T) {
requestHandlers, responseHandlers := NewHandlers(zap.NewNop(), agent.StatsConfig{}, false, true)
assert.Len(t, requestHandlers, 2)
assert.Len(t, responseHandlers, 1)
}

func TestNewHandlersWithStatusCodeAndAgenthStats(t *testing.T) {
requestHandlers, responseHandlers := NewHandlers(zap.NewNop(), agent.StatsConfig{}, true, true)
assert.Len(t, requestHandlers, 3)
assert.Len(t, responseHandlers, 2)
}
func TestNewHandlersWithoutStatusCodeAndAgenthStats(t *testing.T) {
requestHandlers, responseHandlers := NewHandlers(zap.NewNop(), agent.StatsConfig{}, false, false)
assert.Len(t, requestHandlers, 0)
assert.Len(t, responseHandlers, 0)
}
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,6 @@ import (

"github.com/shirou/gopsutil/v3/process"
"github.com/stretchr/testify/assert"

"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats/agent"
)

type mockProcessMetrics struct {
Expand Down Expand Up @@ -77,6 +75,6 @@ func TestProcessStats(t *testing.T) {
mock.mu.Unlock()
provider.refresh()
assert.Eventually(t, func() bool {
return provider.getStats() == agent.Stats{}
return len(provider.getStats().StatusCodes) == 0 // map isn't comparable, so we check the length
}, 5*time.Millisecond, time.Millisecond)
}
Loading
Loading