Metrics: Export to Stackdriver is not working #1330

Closed · 3 of 4 tasks

aLekSer opened this issue Feb 10, 2020 · 11 comments · Fixed by #1479
Labels: area/operations (Installation, updating, metrics etc), kind/bug (These are bugs)
Comments

@aLekSer
Collaborator

aLekSer commented Feb 10, 2020

There are no Stackdriver metrics due to an error in the resource labels:

Failed to export to Stackdriver: rpc error: code = InvalidArgument
 desc = One or more TimeSeries could not be written:
 Unrecognized resource label: instance_id: timeSeries[2,3,12];
 Unrecognized resource label: namespace_id: timeSeries[1,11]; 
Unrecognized resource label: pod_id: timeSeries[0,5,7]; 
Unrecognized resource label: zone: timeSeries[4,6,8-10]

What happened:

No Stackdriver metrics on the dashboard, which was working several months ago.
New errors appear in the Agones controller logs.

What you expected to happen:
Stackdriver metrics are correctly visualised.

How to reproduce it (as minimally and precisely as possible):
https://agones.dev/site/docs/guides/metrics/#stackdriver-installation

What should be done to fix the issue:

Anything else we need to know?:
There are two pull requests that address the ticket mentioned above.
They contain fixes for:

func getMonitoredResource(projectID string) (*monitoredres.MonitoredResource, error) {
...
}
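
For context, here is a minimal sketch of what a fixed getMonitoredResource() could look like. This is not the actual PR code; it assumes the fix is to emit the newer k8s_container monitored-resource type instead of the legacy gke_container one, whose labels (instance_id, namespace_id, pod_id, zone) the Monitoring API rejects in the error above. POD_NAMESPACE and POD_NAME are hypothetical environment variables injected via the Kubernetes downward API.

package metrics

import (
	"os"

	"cloud.google.com/go/compute/metadata"
	monitoredres "google.golang.org/genproto/googleapis/api/monitoredres"
)

// getMonitoredResource builds a k8s_container monitored resource from the
// GCE metadata server and downward-API environment variables (sketch only).
func getMonitoredResource(projectID string) (*monitoredres.MonitoredResource, error) {
	clusterName, err := metadata.InstanceAttributeValue("cluster-name")
	if err != nil {
		return nil, err
	}
	// "cluster-location" is the GKE instance attribute holding the cluster's zone/region.
	location, err := metadata.InstanceAttributeValue("cluster-location")
	if err != nil {
		return nil, err
	}
	return &monitoredres.MonitoredResource{
		Type: "k8s_container",
		Labels: map[string]string{
			"project_id":     projectID,
			"location":       location,
			"cluster_name":   clusterName,
			"namespace_name": os.Getenv("POD_NAMESPACE"), // hypothetical downward API env var
			"pod_name":       os.Getenv("POD_NAME"),      // hypothetical downward API env var
			"container_name": "agones-controller",
		},
	}, nil
}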

Environment:

  • Agones version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Install method (yaml/helm):
  • Troubleshooting guide log(s):
  • Others:
aLekSer added the kind/bug label on Feb 10, 2020
aLekSer changed the title from "Export to Stackdriver is not working" to "Metrics: Export to Stackdriver is not working" on Feb 10, 2020
@aLekSer
Collaborator Author

aLekSer commented Feb 10, 2020

We can switch from the getMonitoredResource() function to monitoredresource.Autodetect() after updating to OpenCensus 0.22 and contrib.go.opencensus.io/exporter/stackdriver v0.12.0, as is done in the latest example here:
https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/6ee7f9652d2a9e707fea22c56d06235db6289426/examples/stats/main.go#L51
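
A minimal sketch of that wiring, assuming the exporter's Options.MonitoredResource field is used; the project ID is a placeholder:

package main

import (
	"log"
	"time"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"contrib.go.opencensus.io/exporter/stackdriver/monitoredresource"
	"go.opencensus.io/stats/view"
)

func main() {
	// Autodetect() queries the GCE/GKE metadata server and returns the
	// monitored resource to attach to every exported time series.
	sd, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID:         "my-gcp-project", // placeholder
		MonitoredResource: monitoredresource.Autodetect(),
		ReportingInterval: 60 * time.Second,
	})
	if err != nil {
		log.Fatalf("failed to create Stackdriver exporter: %v", err)
	}
	defer sd.Flush()

	view.RegisterExporter(sd)
}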

markmandel added the area/operations label on Feb 10, 2020
@markmandel
Member

@bbf Have you seen this on recent releases?

@bbf
Contributor

bbf commented Feb 11, 2020

While I have not tested any recent releases, I can imagine why a few things stopped working. I'm very interested in overhauling the Stackdriver integration of Agones, so if possible give me some time to look into it.

It was already in my plans to propose some changes to have a better alignment between Agones and the new monitoring agent used by GKE on Stackdriver, so addressing that while fixing this bug might be ideal.

@markmandel / @aLekSer WDYT?

@aLekSer
Collaborator Author

aLekSer commented Feb 11, 2020

Hello, I managed to get it working yesterday by updating the exporter's monitored resource. I will send a PR; it involves updating OpenCensus to 0.22, and that update has slowed me down a bit.
@bbf I will send a draft PR soon so you can review it.

@aLekSer
Collaborator Author

aLekSer commented Feb 11, 2020

Well, switching to Autodetect() did not work with the recent OpenCensus and stackdriver-exporter either:
https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/master/monitoredresource/gcp_metadata_config.go#L100
I will rewrite getMonitoredResource() as a quick fix.
Then we need to understand why Autodetect():

	resT, lab := monitoredresource.Autodetect().MonitoredResource()
	logger.Info("Monitored Resource: ", resT, " ", lab)

returns the following on a GKE test cluster:

Monitored Resource: gke_container map[cluster_name:test-cluster container_name:agones-controller instance_id:1205178163407041488 namespace_id: pod_id:agones-controller-59bd95c448-dwp88 project_id:agones-alexander zone:us-west1-c]

The working scenario is k8s_container, as in the upcoming PR. See the sketch below.
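
One way to guard against the gke_container result, sketched loosely: opts is a hypothetical stackdriver.Options value being assembled by the caller, and the fallback is the hand-built k8s_container resource from getMonitoredResource().

	// Only hand the auto-detected resource to the exporter when it is the
	// k8s_container type that the Monitoring API accepts for these labels.
	if detected := monitoredresource.Autodetect(); detected != nil {
		if resType, labels := detected.MonitoredResource(); resType == "k8s_container" {
			logger.Info("using auto-detected resource: ", resType, " ", labels)
			opts.MonitoredResource = detected
		}
	}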

@aLekSer
Collaborator Author

aLekSer commented Feb 11, 2020

We also receive errors from the Prometheus exporter:

textPayload: "2020/02/07 15:14:14 Failed to export to Prometheus: inconsistent label cardinality: expected 1 label values but got 0 in []string(nil)

This seems to be census-instrumentation/opencensus-go#659,
with a fix in census-instrumentation/opencensus-go#989.
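
For readers hitting the same message, a minimal, self-contained illustration (all names are made up) of how a view's TagKeys and the tags recorded on the context have to line up; the linked bug is about the exporter mishandling the missing-tag case:

package main

import (
	"context"
	"log"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

func main() {
	// Hypothetical measure and tag key, for illustration only.
	keyFleet, err := tag.NewKey("fleet_name")
	if err != nil {
		log.Fatal(err)
	}
	mCount := stats.Int64("example/gameservers_count", "example counter", stats.UnitDimensionless)

	v := &view.View{
		Name:        "example/gameservers_count",
		Measure:     mCount,
		Aggregation: view.Count(),
		TagKeys:     []tag.Key{keyFleet}, // one label value expected per exported series
	}
	if err := view.Register(v); err != nil {
		log.Fatal(err)
	}

	// Recording with the tag present keeps label cardinality consistent.
	// Recording on a context without the tag is the situation the
	// "expected 1 label values but got 0" message complains about.
	ctx, err := tag.New(context.Background(), tag.Upsert(keyFleet, "simple-udp"))
	if err != nil {
		log.Fatal(err)
	}
	stats.Record(ctx, mCount.M(1))
}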

@markmandel
Member

I defer these things to you two 😄 my knowledge of metrics is very low.

I definitely advocate for a working solution 😁

@markmandel
Member

@cyriltovena have you got any feedback here?

@aLekSer
Collaborator Author

aLekSer commented Feb 11, 2020

Currently on master, Prometheus is working, but the Agones controller logs contain the error message above.
PR #1335 adds a working Stackdriver exporter. The update to OpenCensus 0.22 can be done after this fix, to split up the process. I thought about doing the update in a single PR, but as in #893 all tests would have to be updated.

@markmandel
Member

Is this fixed now?

@aLekSer
Collaborator Author

aLekSer commented Apr 22, 2020

Stackdriver will be fixed after the PR. I am now grabbing screenshots from Grafana to compare with a previous one made by @cyriltovena as part of #1479.
