Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/awsemfexporter] fix awsemfexporter Namespace dimension #17030

Conversation

crigertg
Copy link

@crigertg crigertg commented Dec 14, 2022

Description:

  • see related issue
  • Namespace of Cloudwatch was used as dimension
  • this conflicted when the metric has namespace metadata (for e.g. k8s metrics)
  • this is caused because AWS Embedded metric format depends on the
    ordering of the keys in CloudWatchMetrics, if the Namespace of
    cloudwatch comes first it is used in the dimensions too

Link to tracking Issue: fixes #17024

Testing: Built a fixed version and deployed it to an EKS cluster. I was able to see that the EMF key order changed and could see that metrics arrived with the correct namespace:

"_aws": {
        "CloudWatchMetrics": [
            {
                "Dimensions": [
                    [
                        "ClusterName",
                        "Namespace",
                        "PodName"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "pod_cpu_utilization_over_pod_limit",
                        "Unit": "Percent"
                    },
                    {
                        "Name": "pod_network_rx_bytes",
                        "Unit": "Bytes/Second"
                    },
                    {
                        "Name": "pod_cpu_utilization",
                        "Unit": "Percent"
                    },
                    {
                        "Name": "pod_number_of_container_restarts",
                        "Unit": "Count"
                    },
                    {
                        "Name": "pod_network_tx_bytes",
                        "Unit": "Bytes/Second"
                    },
                    {
                        "Name": "pod_memory_utilization",
                        "Unit": "Percent"
                    },
                    {
                        "Name": "pod_memory_utilization_over_pod_limit",
                        "Unit": "Percent"
                    }
                ],
                "Namespace": "ContainerInsights"
            }
        ],
        "Timestamp": 1671033826175

Documentation: I've added a comment above the cWMeasurement type. But I'm not sure if that's the best approach. Suggestions welcome.

@crigertg crigertg requested a review from a team December 14, 2022 16:17
@crigertg crigertg requested a review from Aneurysm9 as a code owner December 14, 2022 16:17
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Dec 14, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: crigertg / name: Claus-Theodor Riegg (3c6160d, b4310a4, b02dcfa, 49a1274b2a8a9626277c32fed10161384cb9a93d)

@crigertg
Copy link
Author

Hello @bogdandrutu @Aneurysm9
Sorry for pinging you but I think this bugfix is important. Can you please take a look at it? If something is wrong with this PR I'm happy to improve it. Please just tell me what to do.

@crigertg crigertg changed the title 17024 fix cwmetrics namespace dimension [exporter/awsemfexporter] fix awsemfexporter Namespace dimension Dec 21, 2022
…n-telemetry#17024)

* see related issue
* Namespace of Cloudwatch was used as dimension
* this conflicted when the metric has namespace metadata (for e.g. k8s metrics)
* this is caused because AWs Embedded metric format depends on the
ordering of the keys in `CloudWatchMetrics`, if the Namespace of
cloudwatch comes first it is used in the dimensions too
@crigertg crigertg force-pushed the 17024_fix_cwmetrics_namespace_dimension branch from a7aff96 to 9cf5e18 Compare December 21, 2022 15:22
@mookie-
Copy link

mookie- commented Dec 23, 2022

👍
Would like to see this getting merged.

@Aneurysm9
Copy link
Member

Thanks for taking a pass at this. I'm a bit perplexed by the mechanism by which this change operates. Can you explain it further? I do not understand how changing the position of a field in a struct declaration affects the generated metrics in this way. Or is it the change in the struct literal construction? Or are both required? Why? The two Namespace values are in different objects in the generated JSON.

Also, some tests are needed to ensure that there is no regression in the future.

@crigertg
Copy link
Author

crigertg commented Jan 4, 2023

Thanks for taking a pass at this. I'm a bit perplexed by the mechanism by which this change operates. Can you explain it further? I do not understand how changing the position of a field in a struct declaration affects the generated metrics in this way. Or is it the change in the struct literal construction? Or are both required? Why? The two Namespace values are in different objects in the generated JSON.

Hello! I'll try to explain. If you've not already have checked the related issue #17024 please do so.

Cloudwatch Metrics can contain an arbitrary number of dimensions to match the metrics to the affected systems/components/etc. So for example the dimensions for a Kubernetes Pod CPU utilizations could be:

  • Podname (name of the Kubernetes Pod)
  • ClusterName (the name of the kubernetes cluster the pod is running in)
  • Namespace (the name of the kubernetes namespace the pod is running in)

There is a field named Namespace for the EMF format too. This field configures the Cloudwatch Namespace (a layer to put your metrics in Cloudwatch). It's mandatory to configure this, otherwise cloudwatch would not know where the metrics should be stored.

Back to our metrics for the CPU utilizations for our kubernetes Pod. Instead that the Namespace of the kubernetes Pod is written to the cloudwatch dimension, the value of the Cloudwatch Namespace gets written to the dimension. This is wrong.

I've continued to debug this and was able to find out, that this only happens, if the Cloudwatch Namespace appears before the Dimensions field in the EMF JSON object (see #17024 (comment) ).

In https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/17030/files#diff-af3bf0e3a5a889da31842d592ffeecc4ef5d3e5b058af7ca500ed44e321bca0cR386 the fieldMap which contains multiple cWMeasurements is unmarshalled. Here the order of the changed CwMeasurement struct comes into play. By changing the order of the struct members the EMF message JSON gets the order in which the namespace of the dimensions is applied correctly.

If you look at the EMF message for the cloudwatch agent (#17024 (comment)) you can see that it's sent in this order too.

The changes are only required for the struct. But I thought it would be good to change the order of the members in other appearances too so that it's consistent.

Also, some tests are needed to ensure that there is no regression in the future.

I will look into that as soon as possible.

@crigertg
Copy link
Author

crigertg commented Jan 4, 2023

I've updated the test for translateCWMetricToEMF. The test already contains a json with the expected format and will fail if the namespace is not at the expected position.
But maybe it would be better to have a seperate assertion which ensures that the cloudwatch Namespace is always at the end of the _aws map.

I've tried to to this simple assertion but to be honest I'm missing a simple solution for checking this. It seems that we would have to introduce new structs/interfaces for just parsing the build json again to check if the field is at the end.

My experience with go is limited and parsing and handling a nested json seems to be pretty complicated. Maybe you can point me in the right direction here.

@Aneurysm9
Copy link
Member

I'm still perplexed. Looking at your sample data from #17024 it seems that there is a Namespace key at the top level of the JSON object that is a user-provided field that may be used to define a dimension in the metrics parsed from this log entry. Here that has the value kube-system and refers to a Kubernetes namespace. There is also a Namespace key in the Metadata Object which should indicate the Cloudwatch metrics namespace into which metrics derived from this log entry should be inserted. (As an aside, it seems the sample data is incorrectly structured as it does not have the Metadata Object underneath the _aws key in the top-level JSON object.) These values are in different objects and thus their keys do not collide.

{
    "CloudWatchMetrics": [
        {
            "Metrics": [
                {
                    "Unit": "Count",
                    "Name": "service_number_of_running_pods"
                }
            ],
            "Dimensions": [
                [
                    "Service",
                    "Namespace",
                    "ClusterName"
                ],
                [
                    "ClusterName"
                ]
            ],
            "Namespace": "ContainerInsights"
        }
    ],
    "ClusterName": "eks-dev",
    "Namespace": "kube-system",
    "Service": "aws-load-balancer-webhook-service",
    "Sources": [
        "apiserver"
    ],
    "Timestamp": "1671023034918",
    "Type": "ClusterService",
    "Version": "0",
    "kubernetes": {
        "namespace_name": "kube-system",
        "service_name": "aws-load-balancer-webhook-service"
    },
    "service_number_of_running_pods": 2
}

It seems like the issue here is on the receiving/parsing side with a streaming parser making incorrect decisions based on observed keys without regard to object containment. I'm not sure that a "fix" that depends on behavior that is called out in the JSON RFC as impairing interoperability is the correct path to take.

When the names within an object are not unique, the behavior of software that receives such an object is unpredictable.
...
JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

I'm also not sure it is safe to assume that the order of declaring fields in the Go struct definition or struct literal initialization is the same order that the fields will be marshalled into a JSON representation of that struct. The Go JSON Marshal function documents that it will sort map keys, but makes no such statement regarding struct keys. Even if it did, that's an implementation detail and another JSON serialization library may be used that has different behavior.

I'm going to make some inquiries with the CW Metrics team to see if they can shed some light on this situation.

@crigertg
Copy link
Author

crigertg commented Jan 5, 2023

@Aneurysm9 Thank you for looking into this. I agree with all you say and I'm perplexed too. Imagine debugging this with no idea why the namespace is wrong all the time 😅

I already reached out to AWS support but had no luck there. That's why I wanted to fix this on the client side and thought it would not do any harm.

As I can see you're working at AWS and I think you will have more luck reaching out to the Cloudwatch Team directly. It would be best if this can be fixed on their side. Will you update this MR if there are any news?

@Aneurysm9
Copy link
Member

A quick update here: I've got the right eyes on this and it is being investigated as a defect in the CW EMF processing. I don't have any ETA for a fix at this time.

@crigertg
Copy link
Author

It seems that this is fixed. I'm not able to reproduce it anymore. See #17024 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exporter/awsemf awsemf exporter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[exporter/awsemfexporter] namespace is always set to cloudwatch namespace
4 participants