
Save trace to permanent store #1747

Closed · wants to merge 24 commits

Conversation

@naoman (Contributor) commented Sep 27, 2017

Ref: #1093
Quick (and dirty) implementation of save trace. Sending this out for review to get some feedback.

Following @adriancole's suggestion of deploying Zipkin Server twice: one backed by the regular datastore and one by a permanent datastore.

This adds a Save Trace button to the UI. Clicking this button makes an AJAX request to the backend API /api/v1/save/trace/<trace_id>; on receiving this request, the Zipkin server pulls the spans for that trace and POSTs them to the permanent store using the /api/v1/spans API.

The address of the permanent datastore server can be configured via the property zipkin.permanent.store.
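A rough client-side sketch of the flow described above (the save endpoint shape comes from this description; the commented-out handler wiring is a hypothetical illustration, not the PR's actual code):

```javascript
// Builds the proposed save endpoint for a trace id.
// encodeURIComponent guards against unexpected characters in the id.
function saveTraceUrl(traceId) {
  return `/api/v1/save/trace/${encodeURIComponent(traceId)}`;
}

// Hypothetical wiring inside the trace-view component:
// this.on('click', '.save-trace', e => {
//   e.preventDefault();
//   $.ajax(saveTraceUrl(this.attr.traceId))
//     .done(() => alert('Trace saved to permanent store'))
//     .fail(() => alert('Failed to save trace'));
// });
```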

@codefromthecrypt (Member):

Add a screenshot?

PS: use the v2 API, as this is a new feature? Easier than having to migrate this to v2 later.

@codefromthecrypt (Member) left a comment:

interesting.. I was thinking this was going to happen in javascript. maybe it should? that way it doesn't need to expose a new server-side endpoint.

@naoman (Contributor, Author) commented Sep 28, 2017

[screenshot: screen shot 2017-09-28 at 2 57 55 pm]
[screenshot: screen shot 2017-09-28 at 2 58 13 pm]
[screenshot: screen shot 2017-09-28 at 3 01 58 pm]

> use v2 api as this is a new feature? Easier than having to do a migration of this to v2 later

We have not migrated to V2 yet. Our datastore is in V1 format, so we have to read via the V1 API. For writing to the permanent store, either V1 or V2 is fine. It looks like the Zipkin server writes in V2 format even when we call the V1 API, which seems a bit strange.
Ideally we should be able to run the server in V1 mode or V2 mode, with all APIs working accordingly. Do we have a config property that can do this?

> interesting.. I was thinking this was going to happen in javascript. maybe it should? that way it doesn't need to expose a new server-side endpoint.

I wanted to keep the client light. Also, if it's on the server side, other clients/scripts can use it too. We can move it to JavaScript if needed.

@codefromthecrypt (Member):

I don't want to make an API like this part of the server API, especially not under the v1 namespace. It doesn't impact the UI to do the blocking here or there.

In order to deploy this UI change, you will inherit the v2 API anyway, unless you are somehow hosting static assets independently... is that the case?

@naoman (Contributor, Author) commented Sep 29, 2017

> I don't want to make an api like this as a part of the server api, especially not under v1 namespace. it doesn't impact the UI to do the blocking here or there.

So you want to go with the JavaScript solution, where the client (UI) reads from one zipkin-server and saves to the permanent zipkin-server? I'll make the change.

> In order to deploy this UI change, you will inherit the v2 api anyway, unless you are somehow hosting static assets independently.. is that the case?

Not sure what you mean. What I'm thinking is that the UI/JavaScript will read raw spans for a trace using /api/v1/trace/<trace_id> and POST them to http://<permanent_zipkin_address>/api/v1/spans

```javascript
e.preventDefault();
const traceId = this.attr.traceId;

$.ajax(`/api/v1/save/trace/${traceId}`).done(result => {
```
(Member) commented on the diff:

so basically I'd suggest having an alternative that reads from config.json and, if a key like "permanentZipkinUrl" is present, shows the link; on clicking the link, copy the result of /api/v2/trace/ID to a POST to ${permanentZipkinEndpoint} (ex http://foo/bar/api/v2/spans)

you don't need to process the body in any way or even know about the v2 model. The server hosting the permanent repo doesn't need to be using v2 native storage, though obviously in the case of Elasticsearch it should, as v2 is a better investment for a concept like permanent storage.

For the handling, just copy the bytes from the first response to the request body of the second. The code will be easy and the results easy to reason about. The URL associated with this task could be defined in crossroads.

If this isn't clear etc, lemme know!
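The byte-copy suggestion above could be sketched like this (the fetch implementation is injected so the flow is easy to unit-test; the endpoint paths follow the comment, while the function name and error handling are assumptions):

```javascript
// Reads a trace from the local server and forwards the body verbatim
// to the archive server's span endpoint. No parsing of the v2 model
// is needed -- the bytes are copied as-is.
async function archiveTrace(traceId, archiveEndpoint, fetchImpl) {
  // 1. Read the trace in v2 format from the local server.
  const res = await fetchImpl(`/api/v2/trace/${encodeURIComponent(traceId)}`);
  if (!res.ok) throw new Error(`could not read trace ${traceId}`);

  // 2. Copy the response bytes without interpreting them.
  const body = await res.text();

  // 3. POST the same bytes to the archive server.
  const post = await fetchImpl(archiveEndpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body,
  });
  return post.ok;
}
```

In the browser, `fetchImpl` would simply be `window.fetch` and `archiveEndpoint` the configured `permanentZipkinUrl` plus `/api/v2/spans`.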

(Member) commented:

one other consideration is to add a tag to the first span (easy in the v2 model): just insert one into the tags dict.
ex tags.outage='disaster happened in JIRA FOO-1234', then in your permanent store you can search for "outage" and get this trace. Such a feature could be worth a little JavaScript to make it work.

This will allow you to label the trace so that you can look it up somehow in your long-term storage. Otherwise, eventually, these favorite traces will become annoying to browse.
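The tag insertion could look like this: a sketch over the Zipkin v2 JSON list-of-spans shape, where the function name is an assumption and the `outage` key/value are the example from the comment above.

```javascript
// Inserts a tag into the first span of a trace before archiving it.
// Spans follow the Zipkin v2 JSON model: a list of objects with an
// optional "tags" dict. The input is not mutated.
function tagFirstSpan(spans, key, value) {
  if (!Array.isArray(spans) || spans.length === 0) return spans;
  const [first, ...rest] = spans;
  // Merge the new tag into any existing tags on the first span.
  const tags = Object.assign({}, first.tags, { [key]: value });
  return [Object.assign({}, first, { tags }), ...rest];
}
```

Applied just before the POST, e.g. `JSON.stringify(tagFirstSpan(JSON.parse(body), 'outage', 'disaster happened in JIRA FOO-1234'))`.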

@codefromthecrypt (Member) commented Sep 29, 2017 via email

@codefromthecrypt (Member) commented Sep 29, 2017 via email

@naoman (Contributor, Author) commented Sep 29, 2017

> Ex we have a saved trace, but this UI wont be able to find saved traces.. Is that ok? If so, why? If not, why not?

Ideally the same UI should provide access to saved traces, but I think it's okay to have a separate UI if it helps keep the system simple and maintainable. It could be considered an archiving feature, and in most systems retrieving archived data is a separate process from accessing active data.

Also, if a user is trying to access a trace that's not available in the regular datastore, the UI can share a link to the archive/permanent store, or even pull the trace from there. Search will be a bit trickier, but we can handle that too by picking the correct datastore based on the TTL of the regular store.

@codefromthecrypt (Member) commented Sep 30, 2017 via email

@naoman (Contributor, Author) commented Oct 3, 2017

> ... and so far I think archive is it. do you agree?

Yes.

[screenshot: screen shot 2017-10-03 at 4 10 25 pm]

Error message:
[screenshot: screen shot 2017-10-03 at 4 10 38 pm]

Success message, when zipkin.ui.archive-read-endpoint is not specified:
[screenshot: screen shot 2017-10-03 at 4 11 20 pm]

Success message, when zipkin.ui.archive-read-endpoint is specified:
[screenshot: screen shot 2017-10-03 at 4 12 04 pm]

@codefromthecrypt (Member) left a comment:

I like it. Great docs, too!

Can you poke around the zipkin-js tests section to see if there's any low-hanging fruit (ex: a test case that already exists that you can pop a related unit test into)?

@@ -161,6 +161,11 @@
</dependency>

<dependency>
<groupId>org.springframework</groupId>
(Member) commented on the diff:

not needed right?

(Contributor, Author) replied:

I was getting this exception while posting spans to /api/v2/spans

2017-10-03 11:08:28.605 ERROR 46979 --- [nio-9412-exec-2] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed; nested exception is java.lang.NoClassDefFoundError: org/springframework/util/concurrent/SettableListenableFuture] with root cause

java.lang.NoClassDefFoundError: org/springframework/util/concurrent/SettableListenableFuture
	at zipkin.server.ZipkinHttpCollector.validateAndStoreSpans(ZipkinHttpCollector.java:88) ~[classes!/:na]
	at zipkin.server.ZipkinHttpCollector.uploadSpansJson2(ZipkinHttpCollector.java:67) ~[classes!/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_101]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_101]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_101]

This Stack Overflow answer suggested this solution. I couldn't get it working without adding this dependency.

@codefromthecrypt (Member):

you'll want to rebase this on master, as the thing that allows proxies to work was merged

shakuzen and others added 9 commits October 3, 2017 19:42
This adds a RabbitMQ collector module along with its corresponding auto-configuration.
…openzipkin#1752)

For safety, create the stress schema in a different keyspace.
…l_dc has been defined, otherwise use ONE (openzipkin#1753)

Mark all writes as idempotent.
Update the retry policy, honouring idempotence, consistency level, and applying appropriate back-pressure.
* Support for running Zipkin behind a reverse proxy
 - subfolder 'zipkin' is enforced
 - rewrite of the 'base' tag in the 'index.html' file is required

* Fixed 'redirectedHeaderUsesOriginalHostAndPort' test to ensure that a relative URI is returned for a redirect to 'zipkin/' subpath.

* Using 'contextRoot' as a code style conforming reference to the '__webpack_public_path__' global variable.

* Using relative URIs for AJAX calls (which works thanks to the 'head>base' html).
Appending '/' to the 'contextRoot' if not present.

* Added readme regarding running behind a reverse proxy.
Fixed running in a 'devServer' mode. Assuming the 'zipkin-server' runs at 'localhost:9411', execute:

    proxy=http://localhost:9411/zipkin/ ./npm.sh run dev
@codefromthecrypt (Member):

@jcarres-mdsol so we have had a number of requests for how to achieve permanent storage without complicating other things. I've not found a simpler way than this..

The notes would be that if you want permanent storage, set up Zipkin somewhere (or a Zipkin proxy like Stackdriver), then point the env variables at it. If using a normal Zipkin server, the archival one should probably set the environment name to "archive" (something you can configure in the UI).

Does this clarify the setup? Indeed it would be new for a lot of folks, but fairly easy to provision (add a permanent namespace to storage, and keep a server running which needn't deal with high volume as it is only archival).

@jorgheymans (Contributor):

will the UI have an href somewhere then to the archive endpoint?

@naoman (Contributor, Author) commented Oct 5, 2017

@jorgheymans when a trace is archived, the UI will show a link to the archived trace in the confirmation dialog. Screenshot attached above.

> @naoman just curious how many traces do you save to the permanent store? Is it a regularly occurring thing?

Not many, maybe just a few traces per day. Because of our high trace volume, the regular datastore has a TTL of 10 days. This feature will let users save "interesting" traces, especially ones related to a bug, for a longer duration.

@jorgheymans (Contributor):

@naoman what I meant was just a top-level link on the UI somewhere that points to the archiving endpoint if it is configured. That way people only have to remember (or bookmark) the "main" Zipkin instance.

@naoman (Contributor, Author) commented Oct 6, 2017

@jorgheymans Yes, this can be useful. But at this point I want to keep the code simple and clean, since this feature will not be used by many users. In my experience, when archiving a trace, users save the link to the trace itself, since they are archiving it to keep or share it.

I would say we should go ahead with the current code and revisit based on user feedback.

@codefromthecrypt (Member) commented Oct 7, 2017 via email

@naoman (Contributor, Author) commented Oct 7, 2017

Like I mentioned in my last message, this change is ready and fully functional. Adding a link would be an improvement on top of this change that can be made in a separate pull request. Let's merge this change, get the feature out there so people can start using it, get user feedback, and then improve iteratively.

@codefromthecrypt (Member) commented Oct 7, 2017 via email

@naoman (Contributor, Author) commented Oct 7, 2017

> We don't merge single user stuff, so consider it lucky someone else is interested!

Not sure what you mean. Do you think I'm trying to push something that's Pinterest-specific and not needed by other users?

Let me try to explain one more time. While the request to add an additional URL link is valid, it's an add-on feature for this change and does not block it.

Can you please explain why these two changes cannot be made in two separate pull requests?

@jorgheymans (Contributor) commented Oct 7, 2017 via email

@codefromthecrypt (Member) commented Oct 7, 2017 via email

@codefromthecrypt (Member):

@openzipkin/core I personally don't want to go around in explanation circles on this. I actually am ok with the change in general, but not motivated to continue this line of inquiry. If someone else is ok driving this, please do. I'm going to unsubscribe for the time being

@naoman (Contributor, Author) commented Oct 7, 2017

@jorgheymans I've added a link to the archive server on the menu bar. Let me know what you think. Below is the screenshot.

[screenshot: screen shot 2017-10-07 at 1 03 43 pm]

> Seems a hacky solution, most people will not have this setup.

@jcarres-mdsol can you please elaborate on why you feel this solution is hacky and how it can be improved?

> Would the button appear in the UI by default? It may set expectations it would work when in most cases would not

@jcarres-mdsol No, it would not. I've added this information to the README as well. Let me know if you think it needs more explanation there.

@jorgheymans (Contributor) commented Oct 7, 2017 via email

@eirslett (Contributor) commented Oct 7, 2017

Zipkin already has a feature to download a trace as a JSON file (and I think also one to upload a JSON file and render it in the UI). Wouldn't that cover the use case? You could just store the JSON file wherever you want: a hard drive, Dropbox, etc.

I do agree with @adriancole's comments in general.

@eirslett (Contributor) commented Oct 7, 2017

Another solution is to run a caching proxy like Varnish in front of Zipkin and make it cache results for 30 days. (Assuming that the amount of Zipkin data developers look at is very small relative to the total amount of Zipkin data, so it could comfortably fit on a cache instance.)

@naoman (Contributor, Author) commented Oct 7, 2017

@eirslett Yes, that's doable, but it won't be very scalable or convenient. Consider the use case where we want to save a trace related to a bug. Without this feature, we'd first have to download the trace, attach it to the bug ticket, and upload it back to a Zipkin server to look at the trace. This feature provides a convenient way to archive traces with a single click.

@eirslett (Contributor) commented Oct 7, 2017

> Without this feature, we'll first have to download the trace, attach it to the bug ticket, and upload it back to the Zipkin server to look at the trace. This feature provides a convenient way to archive traces with a single click.

With the Varnish solution, I guess you could just copy/paste a link to the trace into the bug report?

@naoman (Contributor, Author) commented Oct 7, 2017

@eirslett a Varnish-proxy-based cache will have its own limitations. Below are a few:

  • What if we want to keep data longer than the cache TTL?
  • How do we enable data replication?
  • What if other tools (service dependency graph, trace analyzer scripts, etc.) want to pull this data?

@jcarres-mdsol (Contributor):

@naoman I think it is hacky because to add this new functionality, I need to double my infrastructure.

@naoman (Contributor, Author) commented Oct 16, 2017

> I think it is hacky because to add this new functionality, I need to double my infrastructure.

@jcarres-mdsol Can you please elaborate on this? This solution does require some additional infrastructure setup, but by no means would it double the infrastructure. For example, here is what an Elasticsearch-based solution would look like:

  • A new index for archived traces in the same Elasticsearch cluster.
  • A new Zipkin server instance for archived traces. This instance can run on the same hosts as the regular Zipkin server.
  • A load balancer in front of the Zipkin archive server. This is the only additional infrastructure component needed for this solution.

If you still feel that this is too much hassle for archiving traces, can you propose a better solution?

@openzipkin openzipkin deleted a comment Nov 15, 2017
abesto pushed a commit to abesto/zipkin that referenced this pull request Sep 10, 2019
This is an alternative to saving on the backend or via a proxy

Closes openzipkin#1747

9 participants