Simple setup, healthcheck failure #75

zachdaniel · 2018-04-10T05:34:24Z

I followed the guide here: https://cloud.google.com/trace/docs/zipkin and used the startup script option for GCP. I've now SSH'd in and I'm port forwarding locally for testing. However, when I check the healthcheck I get this:

{"status":"UP","zipkin":{"status":"UNKNOWN"},"diskSpace":{"status":"UP","total":10499674112,"free":8691965952,"threshold":10485760}}

Any ideas?
Thanks :D
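For anyone reading along, here is a minimal sketch of how the health response above could be checked programmatically. It assumes the body has the shape shown in this comment (a top-level `status` plus one nested object per component); the helper name `components_not_up` is made up for illustration and is not part of Zipkin.

```python
import json

# Health body copied from the comment above.
HEALTH_BODY = (
    '{"status":"UP","zipkin":{"status":"UNKNOWN"},'
    '"diskSpace":{"status":"UP","total":10499674112,'
    '"free":8691965952,"threshold":10485760}}'
)


def components_not_up(health_json: str) -> dict:
    """Return {component: status} for every nested component whose status is not UP."""
    health = json.loads(health_json)
    return {
        name: detail.get("status")
        for name, detail in health.items()
        # Only nested objects are components; the top-level "status" is a plain string.
        if isinstance(detail, dict) and detail.get("status") != "UP"
    }


print(components_not_up(HEALTH_BODY))  # {'zipkin': 'UNKNOWN'}
```

The overall status can read UP while an individual component does not, which is why the sketch looks at the nested entries instead of the top-level field.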

zachdaniel · 2018-04-10T05:48:03Z

I see the same result if I run it as a DaemonSet in my Kubernetes cluster.

zachdaniel · 2018-04-10T05:50:24Z

I'm also using an Erlang OpenTracing client that sends things to a Zipkin instance, and (the documentation on Google is really just horrendous around all of this) it looks like perhaps I'm missing an important piece here?

zachdaniel · 2018-04-10T05:55:42Z

I should also add that when I hook up to a normal instance of a Zipkin server, I see trace data. It's only when using this that I don't see trace data in Stackdriver, and I'm not sure of the best way to go about debugging it.

codefromthecrypt · 2018-04-10T06:01:46Z

The old image is replaced by the content here. Does the advice in #74 help? Basically, you should be running the zipkin-gcp image.

zachdaniel · 2018-04-10T06:10:00Z

@adriancole trying it out now, I misread that issue originally, but I see now how it could be helpful!

zachdaniel · 2018-04-10T06:14:37Z

@adriancole I'm a bit confused. I'm not supposed to run both am I?

codefromthecrypt · 2018-04-10T06:16:15Z

I reworded that issue. Right, zipkin-gcp replaces the old image. You don't need to, and shouldn't, run both.

zachdaniel · 2018-04-10T06:17:48Z

Awesome, thanks :D Now I see that the health check is normal, but I don't see any trace data in stackdriver. Any recommendations on debugging steps?

zachdaniel · 2018-04-10T06:24:56Z

I've got traces going into a local Zipkin server. Are there special rules about the tags I need, or about what values certain tags can have?

codefromthecrypt · 2018-04-10T06:37:01Z

I think I need to merge #73. What do you think?

zachdaniel · 2018-04-10T06:47:32Z

I can't say with little experience, but merge away and I'll try it out!

jsw · 2018-04-10T06:48:57Z

@zachdaniel I found /metrics helpful in verifying that it's sending traces to Stackdriver. Look for counter.zipkin_storage.stackdriver.sent. Also make sure your cluster has the trace.append oauth scope via gcloud container clusters describe ... | grep trace.append

zachdaniel · 2018-04-10T07:08:33Z

@jsw interesting, I don't see that key at all. I do see a bunch of spans as "dropped", so perhaps there is a problem with my spans?

codefromthecrypt · 2018-04-10T07:13:23Z

I'm releasing 0.3.2 which hopefully will become a docker image within an hour. meanwhile feel free to use https://gitter.im/openzipkin/zipkin for exploratory troubleshooting

jsw · 2018-04-10T07:33:16Z

@adriancole It looks like the counter.zipkin_storage.stackdriver.sent metric is only present in the gcr image? Wondering if it got lost in the zipkin-gcp image.

codefromthecrypt · 2018-04-10T07:37:37Z

That is not a normal metric. The drop metric covers spans lost for any reason. In other words, if you are not seeing drop metrics, the server accepted the spans. That doesn't necessarily mean they were processed properly after getting to Google, though.

zachdaniel · 2018-04-10T07:39:22Z

{  
   "mem":611627,
   "mem.free":300139,
   "processors":2,
   "instance.uptime":3305466,
   "uptime":3317785,
   "systemload.average":0.19,
   "heap.committed":540160,
   "heap.init":120832,
   "heap.used":240020,
   "heap":1703936,
   "nonheap.committed":73696,
   "nonheap.init":2496,
   "nonheap.used":71468,
   "nonheap":0,
   "threads.peak":19,
   "threads.daemon":12,
   "threads.totalStarted":29,
   "threads":19,
   "classes":9687,
   "classes.loaded":9687,
   "classes.unloaded":0,
   "gc.ps_scavenge.count":13,
   "gc.ps_scavenge.time":244,
   "gc.ps_marksweep.count":2,
   "gc.ps_marksweep.time":151,
   "gauge.zipkin_collector.message_spans.http":5.0,
   "gauge.zipkin_collector.message_bytes.http":4500.0,
   "counter.zipkin_collector.messages.http":7,
   "counter.zipkin_collector.spans_dropped.http":165,
   "counter.zipkin_collector.bytes.http":152759,
   "counter.zipkin_collector.spans.http":165,
   "counter.status.200.health":1,
   "gauge.response.star-star":21.0,
   "gauge.response.health":59.0,
   "counter.status.404.star-star":1
}

These are my metrics.
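A minimal sketch of how the collector counters in the dump above can be compared. It assumes `counter.zipkin_collector.spans.http` counts spans received over HTTP and `counter.zipkin_collector.spans_dropped.http` counts those that were dropped; the helper name `collector_summary` is hypothetical, and only the relevant keys are copied from the dump.

```python
# Collector counters copied from the metrics dump above.
METRICS = {
    "counter.zipkin_collector.messages.http": 7,
    "counter.zipkin_collector.spans_dropped.http": 165,
    "counter.zipkin_collector.bytes.http": 152759,
    "counter.zipkin_collector.spans.http": 165,
}


def collector_summary(metrics: dict, transport: str = "http") -> dict:
    """Summarize received vs. dropped spans for one collector transport."""
    received = metrics.get(f"counter.zipkin_collector.spans.{transport}", 0)
    dropped = metrics.get(f"counter.zipkin_collector.spans_dropped.{transport}", 0)
    # Guard against division by zero when nothing has been received yet.
    ratio = dropped / received if received else 0.0
    return {"received": received, "dropped": dropped, "drop_ratio": ratio}


print(collector_summary(METRICS))
# {'received': 165, 'dropped': 165, 'drop_ratio': 1.0}
```

With these numbers every received span was dropped, which points at a storage-side failure and not at a problem with a few individual spans.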

codefromthecrypt · 2018-04-10T07:40:37Z

Yep, so if you set debug logging you may be able to see the reason. Look at the README for the args.

codefromthecrypt · 2018-04-10T07:42:50Z

To get everything, set JAVA_OPTS="-Dlogging.level.zipkin=DEBUG -Dlogging.level.zipkin2=DEBUG"

codefromthecrypt · 2018-04-10T07:44:40Z

The reason it isn't logging failures by default is that if something in your network has a bad implementation, it would fill up disks. The counters tell you something is failing.

zachdaniel · 2018-04-10T07:45:55Z

@adriancole yeah, that makes sense! Thanks I'll try it out!

zachdaniel · 2018-04-10T07:48:49Z

🤦‍♂️ The trace API was not enabled.

zachdaniel · 2018-04-10T07:50:52Z

Now I'm getting PERMISSION_DENIED: The caller does not have permission, but I checked and those oauth scopes are granted on the cluster.

zachdaniel · 2018-04-10T15:10:27Z

I got everything worked out :D It was an issue with the service account's permissions. The OAuth scope on the cluster was not enough; I also needed to properly configure the service account to have write access to the Trace API.
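For readers hitting the same PERMISSION_DENIED error, here is a sketch of one way to grant a service account write access to the Trace API. The thread does not say which role was actually used; `roles/cloudtrace.agent` is an assumption here, and the project ID and service account email are placeholders. The helper only builds the `gcloud` command string and does not run it.

```python
import shlex


def trace_writer_binding_command(project_id: str, service_account_email: str) -> str:
    """Build a gcloud command that binds the Cloud Trace Agent role to a service account."""
    argv = [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        # Assumed role; the thread does not name the one that was actually granted.
        "--role=roles/cloudtrace.agent",
    ]
    # shlex.join quotes arguments where needed so the result is safe to paste into a shell.
    return shlex.join(argv)


# Placeholder project and account names, for illustration only.
print(trace_writer_binding_command(
    "my-project", "zipkin@my-project.iam.gserviceaccount.com"
))
```

Review the printed command before running it, since IAM changes apply to the whole project.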

zachdaniel · 2018-04-10T15:10:34Z

I got everything worked out :D It was an issue w/ the service account's permissions, the oauth scope on the cluster was not enough, I also needed properly configure the service account to have write access to the Trace API

zachdaniel closed this as completed Apr 10, 2018
