Fetch Region and CLUSTER_ID information from cni-metrics-helper env (#1715)

* merge conflicts with cni-metrics-helper chart

Fix compilation errors (#1751)

add support for running canary script in different regions (#1752)

Regenerate pod eni values for new instance types (#1754)

* Regenerate pod eni values for new instance types

Co-authored-by: Senthil Kumaran <[email protected]>

Closed issue message (#1761)

* closed issue message

* update message

fix typo in upload script (#1763)

Update calico file path

Use an unique s3 bucket name (#1760)

Update region

Workflow to build arm and x86 images (#1764)

DataStore.GetStats() refactoring to simplify adding new fields (#1704)

* DataStore.GetStats() refactoring to simplify adding new fields

* cleanup

* cleanup

* cleanup

* goimports

* rename test to TestGetStatsV4

* address comments

* fix typo

* update

* update "IP pool is too low" logging

* GetStats() -> GetIpStats()

* GetStats() -> GetIpStats() in tests and comments

* update test

* cleanup test

* add logPoolStats comment

Fix KOPS_STATE_STORE (#1770)

Automation script for running IT  (#1759)

Update issue template

Update issue template with email address

Update issue template

Update go.mod for integration folder (#1741)

* Update go.mod for integration folder

- Update go.mod for integration folder

* Change integration test to use new K8s test framework

* Modify server pod image

* Switch to Nginx port 80 for server pod

* Switch server port in client test

* Remove custom command directive for Nginx pod

* Added ping command for host checks

README: mention arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy (#1768)

Co-authored-by: Shreya027 <[email protected]>

Add dl1.24xlarge to ENILimits override list (#1777)

Chart and Manifest updates (#1771)

* Chart and Manifest updates

* Update probe timeout values

Change workflow to use git install (#1785)

- Change workflow to use git install as the go get command was
  altering go.mod file without updating go.sum file

Add HostNetworking Test for PPSG in test agent  (#1720)

* Add HostNetworking Test for PPSG in test agent

* Updated PPSG test to validate vlan.eth link

* Minor change to logging CLUSTER_ID and Region values
Fix for cni-metrics-helper failing integration test

* Fixed merge conflicts

* Readme update

* Updated Readme with more description of AWS_CLUSTER_ID

* minor change
cgchinmay authored Jan 27, 2022
1 parent 25f0daf commit 852d811
Showing 13 changed files with 200 additions and 196 deletions.
30 changes: 1 addition & 29 deletions charts/cni-metrics-helper/templates/clusterrole.yaml
@@ -5,34 +5,6 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
1 change: 1 addition & 0 deletions charts/cni-metrics-helper/values.yaml
@@ -12,6 +12,7 @@ image:

env:
USE_CLOUDWATCH: "true"
AWS_CLUSTER_ID: ""

fullnameOverride: "cni-metrics-helper"

96 changes: 96 additions & 0 deletions cmd/cni-metrics-helper/README.md
@@ -13,6 +13,102 @@ The following diagram shows how `cni-metrics-helper` works in a cluster:

![](../../docs/images/cni-metrics-helper.png)

### Using IRSA
As per [AWS EKS Security Best Practice](https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html), if you are using IRSA for pods, the following requirements must be satisfied to successfully publish metrics to CloudWatch:

1. The IAM Role for your SA [(IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) must have the following policy attached:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        }
    ]
}
```

2. You need a ClusterRole and ClusterRoleBinding similar to the following for the IRSA service account:

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cni-metrics-helper
rules:
  - apiGroups: [""]
    resources:
      - pods
      - pods/proxy
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cni-metrics-helper
  labels:
    app.kubernetes.io/name: cni-metrics-helper
    app.kubernetes.io/instance: cni-metrics-helper
    app.kubernetes.io/version: "v1.10.2"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cni-metrics-helper
subjects:
  - kind: ServiceAccount
    name: <IRSA name>
    namespace: kube-system
```

3. Specify the IRSA name in the cni-metrics-helper deployment spec along with `AWS_CLUSTER_ID` (described below). The value you specify appears under the `CLUSTER_ID` dimension of your published metrics. Specifying a value for this field is mandatory only if you are blocking IMDS access.

#### `AWS_CLUSTER_ID`

Type: String

Default: `""`

An identifier for your cluster, used as a dimension for published metrics. Ideally it should be the ClusterName or ClusterID.

```
kind: Deployment
apiVersion: apps/v1
metadata:
  name: cni-metrics-helper
  namespace: kube-system
  labels:
    k8s-app: cni-metrics-helper
spec:
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  template:
    metadata:
      labels:
        k8s-app: cni-metrics-helper
    spec:
      containers:
        - env:
            - name: USE_CLOUDWATCH
              value: "true"
            - name: AWS_CLUSTER_ID
              value: ""
          name: cni-metrics-helper
          image: <image>
      serviceAccountName: <IRSA name>
```
With IRSA, the `AWS_REGION` environment variable is injected into the above deployment spec automatically, and it is used as the Region when metrics are published.

Possible scenarios for the above configuration:
1. If you are not using IRSA, the Region and CLUSTER_ID are fetched from IMDS, so the pod must have IMDS access.
2. If you are using IRSA but have not specified `AWS_CLUSTER_ID`, the CLUSTER_ID is fetched from IMDS, provided IMDS access is not blocked.
3. If you have blocked IMDS access, you must specify a value for `AWS_CLUSTER_ID` in the deployment spec.
4. If you have not blocked IMDS access but have specified `AWS_CLUSTER_ID`, the specified value is used.

### Installing the cni-metrics-helper
```
kubectl apply -f v1.6/cni-metrics-helper.yaml
14 changes: 13 additions & 1 deletion cmd/cni-metrics-helper/main.go
@@ -80,9 +80,21 @@ func main() {
}
}

// Fetch the region. With IRSA it is auto-injected into the pod spec as an env variable.
// If it is not found, region stays empty and we fall back to fetching it from IMDS (the existing approach).
// An empty value can also mean the customer is not using IRSA, so we should not enforce an IRSA requirement.
region, _ := os.LookupEnv("AWS_REGION")

// should be name/identifier for the cluster if specified
clusterID, _ := os.LookupEnv("AWS_CLUSTER_ID")

log.Infof("Starting CNIMetricsHelper. Sending metrics to CloudWatch: %v, LogLevel %s", options.submitCW, logConfig.LogLevel)

clientSet, err := k8sapi.GetKubeClientSet()
if err != nil {
log.Fatalf("Error Fetching Kubernetes Client: %s", err)
os.Exit(1)
}

rawK8SClient, err := k8sapi.CreateKubeClient()
if err != nil {
@@ -98,7 +110,7 @@ func main() {
var cw publisher.Publisher

if options.submitCW {
-cw, err = publisher.New(ctx)
+cw, err = publisher.New(ctx, region, clusterID, log)
if err != nil {
log.Fatalf("Failed to create publisher: %v", err)
}
10 changes: 5 additions & 5 deletions cmd/cni-metrics-helper/metrics/metrics.go
@@ -238,11 +238,11 @@ func postProcessingHistogram(convert metricsConvert, log logger.Logger) bool {
func processMetric(family *dto.MetricFamily, convert metricsConvert, log logger.Logger) (bool, error) {
resetDetected := false

-mType := family.GetType()
+metricType := family.GetType()
for _, metric := range family.GetMetric() {
for _, act := range convert.actions {
if act.matchFunc(metric) {
-switch mType {
+switch metricType {
case dto.MetricType_GAUGE:
processGauge(metric, &act)
case dto.MetricType_HISTOGRAM:
@@ -256,7 +256,7 @@ func processMetric(family *dto.MetricFamily, convert metricsConvert, log logger.
}
}

-switch mType {
+switch metricType {
case dto.MetricType_COUNTER:
curResetDetected := postProcessingCounter(convert, log)
if curResetDetected {
@@ -316,9 +316,9 @@ func filterMetrics(originalMetrics map[string]*dto.MetricFamily,
func produceCloudWatchMetrics(t metricsTarget, families map[string]*dto.MetricFamily, convertDef map[string]metricsConvert, cw publisher.Publisher) {
for key, family := range families {
convertMetrics := convertDef[key]
-mType := family.GetType()
+metricType := family.GetType()
for _, action := range convertMetrics.actions {
-switch mType {
+switch metricType {
case dto.MetricType_COUNTER:
if t.submitCloudWatch() {
dataPoint := &cloudwatch.MetricDatum{
30 changes: 1 addition & 29 deletions config/master/cni-metrics-helper-cn.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
30 changes: 1 addition & 29 deletions config/master/cni-metrics-helper-us-gov-east-1.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
30 changes: 1 addition & 29 deletions config/master/cni-metrics-helper-us-gov-west-1.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
33 changes: 4 additions & 29 deletions config/master/cni-metrics-helper.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
@@ -89,6 +61,9 @@ spec:
- env:
- name: USE_CLOUDWATCH
value: "true"
# Optional: Should be ClusterName/ClusterIdentifier used as the metric dimension
- name: AWS_CLUSTER_ID
value: ""
name: cni-metrics-helper
image: "602401143452.dkr.ecr.us-west-2.amazonaws.com/cni-metrics-helper:v1.10.1"
serviceAccountName: cni-metrics-helper