Simple setup, healthcheck failure #75

Closed
zachdaniel opened this issue Apr 10, 2018 · 25 comments

Comments

@zachdaniel

zachdaniel commented Apr 10, 2018

I followed the guide here: https://cloud.google.com/trace/docs/zipkin and used the startup script option for GCP. I've now SSH'd in and am port forwarding locally for testing. However, when I check the health check endpoint I get this:

{"status":"UP","zipkin":{"status":"UNKNOWN"},"diskSpace":{"status":"UP","total":10499674112,"free":8691965952,"threshold":10485760}}
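
For anyone reading along, the response above can be inspected programmatically to spot which component is not reporting UP. This is only a minimal sketch: the function names `component_statuses` and `not_up` are hypothetical helpers, not part of Zipkin, and the body is pasted in rather than fetched.

```python
import json

# Health response copied verbatim from the comment above.
HEALTH_BODY = (
    '{"status":"UP","zipkin":{"status":"UNKNOWN"},'
    '"diskSpace":{"status":"UP","total":10499674112,'
    '"free":8691965952,"threshold":10485760}}'
)


def component_statuses(body: str) -> dict:
    """Map each nested component name to its reported status."""
    doc = json.loads(body)
    return {
        name: value["status"]
        for name, value in doc.items()
        if isinstance(value, dict) and "status" in value
    }


def not_up(body: str) -> list:
    """Return the sorted names of components whose status is not UP."""
    return sorted(
        name for name, status in component_statuses(body).items()
        if status != "UP"
    )


print(not_up(HEALTH_BODY))  # ['zipkin']
```

Note that the top-level status is UP even though the zipkin component is UNKNOWN, so checking only the top-level field would miss the problem.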

Any ideas?
Thanks :D

@zachdaniel
Author

I see the same result if I run it as a DaemonSet in my Kubernetes cluster.

@zachdaniel
Author

I'm also using an Erlang OpenTracing client that sends things to a Zipkin instance. The documentation on Google is really just horrendous around all of this, and it looks like perhaps I'm missing an important piece here?

@zachdaniel
Author

I should also add that when I hook up to a normal instance of a Zipkin server, I do see trace data. It's only when using this that I don't see trace data in Stackdriver, and I'm not sure of the best way to go about debugging it.

@codefromthecrypt
Member

The old image is replaced by the content here. Does the advice in #74 help? Basically, you should be running the zipkin-gcp image.

@zachdaniel
Author

@adriancole trying it out now. I misread that issue originally, but I see now how it could be helpful!

@zachdaniel
Author

@adriancole I'm a bit confused. I'm not supposed to run both, am I?

@codefromthecrypt
Member

codefromthecrypt commented Apr 10, 2018 via email

@zachdaniel
Author

zachdaniel commented Apr 10, 2018

Awesome, thanks :D Now I see that the health check is normal, but I don't see any trace data in Stackdriver. Any recommendations on debugging steps?

@zachdaniel
Author

I've got traces going into a local Zipkin server. Are there special rules about the tags I need, or about what values certain tags can have?

@codefromthecrypt
Member

I think I need to merge #73. What do you think?

@zachdaniel
Author

I can't say, given my limited experience, but merge away and I'll try it out!

@jsw

jsw commented Apr 10, 2018

@zachdaniel I found /metrics helpful in verifying that it's sending traces to Stackdriver. Look for counter.zipkin_storage.stackdriver.sent. Also make sure your cluster has the trace.append OAuth scope, which you can check via gcloud container clusters describe ... | grep trace.append
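
The /metrics check suggested above could be scripted roughly as follows. This is a sketch under assumptions: `fetch_metrics` and `stackdriver_sent` are hypothetical helper names, the `base_url` you pass in depends on your own port forwarding, and the endpoint is assumed to return a flat JSON object of metric names to values.

```python
import json
import urllib.request
from typing import Optional

# Metric key named in the comment above.
SENT_KEY = "counter.zipkin_storage.stackdriver.sent"


def fetch_metrics(base_url: str) -> dict:
    """Fetch the /metrics document from a running server.

    base_url is whatever address your port forward exposes (an assumption
    here); this function is defined but not called in this sketch.
    """
    with urllib.request.urlopen(base_url.rstrip("/") + "/metrics") as resp:
        return json.load(resp)


def stackdriver_sent(metrics: dict) -> Optional[int]:
    """Return the sent counter, or None if the key is absent."""
    return metrics.get(SENT_KEY)


# Small hand-written sample, not real server output.
sample = {"counter.zipkin_collector.spans.http": 5}
print(stackdriver_sent(sample))  # None
```

A result of None means the counter is not being reported at all, which is different from a counter that is present but zero.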

@zachdaniel
Author

@jsw interesting, I don't see that key at all. I do see a bunch of spans as "dropped", so perhaps there is a problem with my spans?

@codefromthecrypt
Member

codefromthecrypt commented Apr 10, 2018 via email

@jsw

jsw commented Apr 10, 2018

@adriancole It looks like the counter.zipkin_storage.stackdriver.sent metric is only present in the gcr image? Wondering if it got lost in the zipkin-gcp image.

@codefromthecrypt
Member

codefromthecrypt commented Apr 10, 2018 via email

@zachdaniel
Author

zachdaniel commented Apr 10, 2018

{  
   "mem":611627,
   "mem.free":300139,
   "processors":2,
   "instance.uptime":3305466,
   "uptime":3317785,
   "systemload.average":0.19,
   "heap.committed":540160,
   "heap.init":120832,
   "heap.used":240020,
   "heap":1703936,
   "nonheap.committed":73696,
   "nonheap.init":2496,
   "nonheap.used":71468,
   "nonheap":0,
   "threads.peak":19,
   "threads.daemon":12,
   "threads.totalStarted":29,
   "threads":19,
   "classes":9687,
   "classes.loaded":9687,
   "classes.unloaded":0,
   "gc.ps_scavenge.count":13,
   "gc.ps_scavenge.time":244,
   "gc.ps_marksweep.count":2,
   "gc.ps_marksweep.time":151,
   "gauge.zipkin_collector.message_spans.http":5.0,
   "gauge.zipkin_collector.message_bytes.http":4500.0,
   "counter.zipkin_collector.messages.http":7,
   "counter.zipkin_collector.spans_dropped.http":165,
   "counter.zipkin_collector.bytes.http":152759,
   "counter.zipkin_collector.spans.http":165,
   "counter.status.200.health":1,
   "gauge.response.star-star":21.0,
   "gauge.response.health":59.0,
   "counter.status.404.star-star":1
}

These are my metrics.
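
To make the collector counters above easier to read, here is a small sketch that compares received and dropped spans. The helper name `dropped_fraction` is hypothetical, and only the relevant counters from the posted metrics are copied in.

```python
# Collector counters copied from the metrics posted above.
METRICS = {
    "counter.zipkin_collector.messages.http": 7,
    "counter.zipkin_collector.spans_dropped.http": 165,
    "counter.zipkin_collector.bytes.http": 152759,
    "counter.zipkin_collector.spans.http": 165,
}


def dropped_fraction(metrics: dict, transport: str = "http") -> float:
    """Fraction of received spans that the collector reports as dropped."""
    received = metrics.get(f"counter.zipkin_collector.spans.{transport}", 0)
    dropped = metrics.get(
        f"counter.zipkin_collector.spans_dropped.{transport}", 0
    )
    # Avoid dividing by zero when nothing has been received yet.
    return dropped / received if received else 0.0


print(dropped_fraction(METRICS))  # 1.0
```

With these numbers every one of the 165 received spans was dropped, which points at the storage side rather than at the client sending the spans.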

@codefromthecrypt
Member

codefromthecrypt commented Apr 10, 2018 via email

@codefromthecrypt
Member

codefromthecrypt commented Apr 10, 2018 via email

@codefromthecrypt
Member

codefromthecrypt commented Apr 10, 2018 via email

@zachdaniel
Author

@adriancole yeah, that makes sense! Thanks I'll try it out!

@zachdaniel
Author

🤦‍♂️ The trace API was not enabled.

@zachdaniel
Author

Now I'm getting PERMISSION_DENIED: The caller does not have permission, but I checked and those OAuth scopes are granted on the cluster.

@zachdaniel
Author

I got everything worked out :D It was an issue with the service account's permissions. The OAuth scope on the cluster was not enough; I also needed to properly configure the service account to have write access to the Trace API.
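
For future readers, the fixes that came up in this thread can be kept as an ordered checklist. The sketch below is only a summary of the thread, not an official troubleshooting procedure; the keys and the `next_step` helper are hypothetical names.

```python
from typing import Optional, Set

# Ordered checklist distilled from this thread; the keys are made-up labels.
CHECKLIST = [
    ("zipkin_gcp_image", "Run the zipkin-gcp image, as advised in #74."),
    ("health_up", "Confirm the health check reports a normal status."),
    ("spans_not_dropped", "Check /metrics for dropped spans."),
    ("trace_api_enabled", "Enable the Trace API for the project."),
    (
        "service_account_write",
        "Give the service account write access to the Trace API; "
        "the cluster OAuth scope alone was not enough.",
    ),
]


def next_step(confirmed: Set[str]) -> Optional[str]:
    """Return the advice for the first item not yet confirmed, if any."""
    for key, advice in CHECKLIST:
        if key not in confirmed:
            return advice
    return None


print(next_step({"zipkin_gcp_image", "health_up"}))
# Check /metrics for dropped spans.
```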
