[exporter/awsemfexporter] fix awsemfexporter Namespace dimension #17030
Hello @bogdandrutu @Aneurysm9 |
…n-telemetry#17024)
* see related issue
* Namespace of CloudWatch was used as dimension
* this conflicted when the metric has namespace metadata (e.g. k8s metrics)
* this is caused because the AWS Embedded Metric Format depends on the ordering of the keys in `CloudWatchMetrics`; if the CloudWatch Namespace comes first, it is used in the dimensions too
👍 |
Thanks for taking a pass at this. I'm a bit perplexed by the mechanism by which this change operates. Can you explain it further? I do not understand how changing the position of a field in a struct declaration affects the generated metrics in this way. Or is it the change in the struct literal construction? Or are both required? Why? The two

Also, some tests are needed to ensure that there is no regression in the future. |
Hello! I'll try to explain. If you have not already checked the related issue #17024, please do so. CloudWatch metrics can contain an arbitrary number of dimensions to match the metrics to the affected systems, components, etc. For example, the dimensions for a Kubernetes Pod's CPU utilization could be:
There is a field named

Back to our metrics for the CPU utilization of our Kubernetes Pod. Instead of the Kubernetes Pod's Namespace being written to the CloudWatch dimension, the value of the CloudWatch Namespace is written to the dimension. This is wrong. I continued debugging and found that this only happens if the CloudWatch Namespace appears before the

In https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/17030/files#diff-af3bf0e3a5a889da31842d592ffeecc4ef5d3e5b058af7ca500ed44e321bca0cR386 the

If you look at the EMF message from the CloudWatch agent (#17024 (comment)), you can see that it is sent in this order too. The change is only required for the struct, but I thought it would be good to change the order of the members in the other places too, for consistency.
I will look into that as soon as possible. |
I've updated the test for

I've tried to do this with a simple assertion, but to be honest I'm missing a simple way of checking this. It seems that we would have to introduce new structs/interfaces just to parse the built JSON again and check whether the field is at the end. My experience with Go is limited, and parsing and handling nested JSON seems pretty complicated. Maybe you can point me in the right direction here. |
I'm still perplexed. Looking at your sample data from #17024 it seems that there is a

    {
      "CloudWatchMetrics": [
        {
          "Metrics": [
            {
              "Unit": "Count",
              "Name": "service_number_of_running_pods"
            }
          ],
          "Dimensions": [
            [
              "Service",
              "Namespace",
              "ClusterName"
            ],
            [
              "ClusterName"
            ]
          ],
          "Namespace": "ContainerInsights"
        }
      ],
      "ClusterName": "eks-dev",
      "Namespace": "kube-system",
      "Service": "aws-load-balancer-webhook-service",
      "Sources": [
        "apiserver"
      ],
      "Timestamp": "1671023034918",
      "Type": "ClusterService",
      "Version": "0",
      "kubernetes": {
        "namespace_name": "kube-system",
        "service_name": "aws-load-balancer-webhook-service"
      },
      "service_number_of_running_pods": 2
    }

It seems like the issue here is on the receiving/parsing side with a streaming parser making incorrect decisions based on observed keys without regard to object containment. I'm not sure that a "fix" that depends on behavior that is called out in the JSON RFC as impairing interoperability is the correct path to take.
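To illustrate what "regard to object containment" would mean on the consuming side, here is a minimal sketch of a reader that resolves the two `Namespace` values by where they sit in the document rather than by the order in which the keys are seen. It is not how CloudWatch processes EMF; `emfDirective` and `resolve` are hypothetical names, and the document layout is assumed to match the sample above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// emfDirective models one entry of the "CloudWatchMetrics" array.
// Only the fields needed for this illustration are declared.
type emfDirective struct {
	Namespace  string     `json:"Namespace"`
	Dimensions [][]string `json:"Dimensions"`
}

// resolve returns the CloudWatch namespace (nested inside the first
// "CloudWatchMetrics" entry) and the value of the "Namespace" dimension
// (a top-level member), independent of key order in the document.
func resolve(doc []byte) (cwNamespace, dimNamespace string, err error) {
	var root map[string]json.RawMessage
	if err = json.Unmarshal(doc, &root); err != nil {
		return "", "", err
	}
	var directives []emfDirective
	if err = json.Unmarshal(root["CloudWatchMetrics"], &directives); err != nil {
		return "", "", err
	}
	if len(directives) == 0 {
		return "", "", errors.New("no CloudWatchMetrics entries")
	}
	if err = json.Unmarshal(root["Namespace"], &dimNamespace); err != nil {
		return "", "", err
	}
	return directives[0].Namespace, dimNamespace, nil
}

func main() {
	// Same content, with the nested Namespace key first and last respectively.
	first := `{"CloudWatchMetrics":[{"Namespace":"ContainerInsights","Dimensions":[["Namespace"]]}],"Namespace":"kube-system"}`
	last := `{"Namespace":"kube-system","CloudWatchMetrics":[{"Dimensions":[["Namespace"]],"Namespace":"ContainerInsights"}]}`
	for _, doc := range []string{first, last} {
		cw, dim, err := resolve([]byte(doc))
		if err != nil {
			panic(err)
		}
		fmt.Println(cw, dim) // ContainerInsights kube-system (for both documents)
	}
}
```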
I'm also not sure it is safe to assume that the order of declaring fields in the Go struct definition or struct literal initialization is the same order that the fields will be marshalled into a JSON representation of that struct. The Go JSON Marshal function documents that it will sort map keys, but makes no such statement regarding struct keys. Even if it did, that's an implementation detail and another JSON serialization library may be used that has different behavior. I'm going to make some inquiries with the CW Metrics team to see if they can shed some light on this situation. |
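For reference, the behavior under discussion can be observed with a small experiment. This sketch only shows what the standard library's `encoding/json` emits today, where struct fields come out in declaration order and map keys come out sorted; it should not be read as a guarantee for struct fields, and other serialization libraries may behave differently. The type and helper names are made up for the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Two hypothetical structs that differ only in field declaration order.
type namespaceFirst struct {
	Namespace  string
	Dimensions []string
}

type namespaceLast struct {
	Dimensions []string
	Namespace  string
}

// mustMarshal serializes v with encoding/json and panics on failure.
func mustMarshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	dims := []string{"ClusterName"}

	// Struct fields appear in declaration order (observed behavior).
	fmt.Println(mustMarshal(namespaceFirst{Namespace: "ContainerInsights", Dimensions: dims}))
	// {"Namespace":"ContainerInsights","Dimensions":["ClusterName"]}
	fmt.Println(mustMarshal(namespaceLast{Dimensions: dims, Namespace: "ContainerInsights"}))
	// {"Dimensions":["ClusterName"],"Namespace":"ContainerInsights"}

	// Map keys are sorted, which is documented behavior.
	fmt.Println(mustMarshal(map[string]interface{}{"Namespace": "ContainerInsights", "Dimensions": dims}))
	// {"Dimensions":["ClusterName"],"Namespace":"ContainerInsights"}
}
```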
@Aneurysm9 Thank you for looking into this. I agree with everything you say, and I'm perplexed too. Imagine debugging this with no idea why the namespace is wrong all the time 😅 I already reached out to AWS support but had no luck there. That's why I wanted to fix this on the client side and thought it would not do any harm. As I can see, you work at AWS, and I think you will have more luck reaching out to the CloudWatch team directly. It would be best if this could be fixed on their side. Will you update this MR if there is any news? |
A quick update here: I've got the right eyes on this and it is being investigated as a defect in the CW EMF processing. I don't have any ETA for a fix at this time. |
It seems that this is fixed. I'm not able to reproduce it anymore. See #17024 (comment) |
Description:
This is caused because the AWS Embedded Metric Format depends on the ordering of the keys in `CloudWatchMetrics`; if the CloudWatch Namespace comes first, it is used in the dimensions too.
Link to tracking Issue: fixes #17024
Testing: Built a fixed version and deployed it to an EKS cluster. I was able to see that the EMF key order changed and could see that metrics arrived with the correct namespace:
Documentation: I've added a comment above the `cWMeasurement` type, but I'm not sure if that's the best approach. Suggestions welcome.