Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Archive Trace button #3018

Merged
merged 2 commits into from
Mar 23, 2020
Merged

Conversation

drolando
Copy link
Contributor

This button lets you easily reupload the current trace to a different
server.

The main motivation for having this is that you can have the
archival server have a very long retention period and use it as very
long term storage for traces that you care about. For example when
sharing a trace in a jira ticket since otherwise the link would expire
after 1 week.

Design doc explaining the reason behind this in more details and why we
went with this implementation: https://github.com/openzipkin/openzipkin.github.io/wiki/Favorite-trace

@drolando
Copy link
Contributor Author

I still need to figure out how translations work, but at least this is now out for people to comment on.

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some bike shed a little..

archiveWriteUrl -> archivePostUrl (as it is a POST)
archiveReadUrl -> archiveUrl (shorter.. emphasizing it is a web link not a raw json url)

zipkin-server/README.md Outdated Show resolved Hide resolved
zipkin-server/README.md Outdated Show resolved Hide resolved
@mchandramouli
Copy link

@drolando @adriancole - this might be a bit late. I read the need @ https://github.com/openzipkin/openzipkin.github.io/wiki/Favorite-trace. Instead of a second cluster, one can have a second storage with higher retention and do scatter/gather on search. When someone does an explicit search for a traceid, move the trace to the second storage in the background. This allows the trace to still show up in the cluster and be available longer. Not sure if zipkin allows multiple storage DBs and scatter / gather on search - if it does, this might be easier than a second archive cluster from user perspective as saved links in jira or issues will continue to work as is. This is what was being done when I last worked with Haystack team

@codefromthecrypt
Copy link
Member

@mchandramouli thanks for the context. I think for some setups an internal tiering could work.

There is additional configuration complexity to achieve that.. we don't have anything that could do that internally, especially as TTL is implemented differently (if defined at all). For example, in cassandra it is literally copying the rows again (to update the TTL). Also, much of the configuration would double to achieve the goal.. I think we've talked about this tension in the past. It gets more complex when storage is different schema or tech per tier. These decisions are mostly backend I think anyway, and this change is all frontend.

It is a very good point on readback/permalink, we do have a change here to allow a different URL. However, there's no requirement to use one. Ex yelp did what you said with scatter gather in the past https://github.com/drolando/zipkin-mux. We don't have to use a separate URL space for things explicitly saved, or otherwise tiered to a longer TTL. If we mount the UI assets so its /zipkin path is mounted against a proxy api, it will never know the difference.

Stepping back, this change is focused on is users being explicitly able to choose traces, which is possibly higher signal of interest than queries, albeit manually trained. Our UI saves data from the list screen to the trace detail. A trace id-specific api call never occurs in this flow, unless the user hit the trace link directly. In other words, a user who is discovering a trace from a list will show no side-effects to the backend that they are looking at one trace. Basically you'd have to assume that any queried data with no parameters except time imply interest, then push all those a tier down. I think this is something that could require tuning to avoid chattiness, because I would suspect many just hit refresh a lot. However, I agree even the wider net would be significantly smaller than all traces: time-based is better than pushing all traces a tier back.

Automatic tiering is nevertheless an interesting idea sites can consider... maybe this feature can help build tension needed to explore that. Meanwhile, they have a lot of options that are easy as pie.

In simplest case you can imagine someone who is already using our in-memory server, and wants to save off traces to ES or even a cloud provider (many accept zipkin format now). In the case of cassandra, re-posting to another cluster is similar impact to overwriting the same rows with a new TTL. TL;DR; In this design there's no constraint in which technologies are in use, hence very big bank for the buck.

I'm very happy to hear about the story you mentioned though.. it is no doubt helpful for people to know existing practice! thanks!

@jcchavezs
Copy link
Contributor

jcchavezs commented Mar 11, 2020 via email

@codefromthecrypt
Copy link
Member

@jcchavezs I think we did discuss having a reason at some point.. ex someone can put a JIRA ticket in the tag value. Good idea. basically someone can enter a choice of that tag value or nothing to just save empty.

On the other topic, I wouldn't assume it is definitely a zipkin server that's being archived to. It could be any APM, or even haystack. We can't really assume we know the link syntax, but that's what the POST-back is there for if it can work. Same as logs-url, if someone can express a query, they can populate the URL.

I'd recommend stopping there, short of bookmark service though. We already overloaded screen issues (ex #3014) so suggest we don't do too much adding of more links until there's more practice to show they are helpful. my 2p

@jorgheymans
Copy link
Contributor

Mentioning #1747 and #1093 as prior discussions of this feature, there may be more.

@codefromthecrypt
Copy link
Member

a shower helped me recognize why we can easily branch discussion into automated escalation of traces to another tier, when looking at this. The reason I think is that this change can feel very much like user-assisted late (after-the-fact or tail) sampling.

Ex for the same reasons @mchandramouli mentioned. @jeqo and others have discussed how automated late sampling can occur, ex by storing queries to the api. This led to a design idea that a collector tier component would actually need to consider the entire trace. We had a couple initiatives to play with this including the VoltDB experiment and a later more fleshed out Kafka Streams approach.

What I'd recommend in this change is to be conscious that something like this could help train automation, the automation itself could be the POST endpoint here! That said, we can probably develop things around tail sampling separately, mentioning this for context and visa versa. I'm really happy that folks helped participate here.. I feel clearer about the boundaries and am glad we can have a tie-in for those who can setup a late sampling arch.

We do owe ourselves a combined wiki on late sampling though, as we do on other things. It is a bit difficult to hold the context of the various efforts together, so a volunteer to harvest some of this thread and others about late sampling into a design doc would be welcome https://github.com/openzipkin/openzipkin.github.io/wiki/Designs

@mchandramouli
Copy link

mchandramouli commented Mar 11, 2020

Automatic tiering is nevertheless an interesting idea sites can consider... maybe this feature can help build tension needed to explore that. Meanwhile, they have a lot of options that are easy as pie.
What I'd recommend in this change is to be conscious that something like this could help train automation, the automation itself could be the POST endpoint here!

+1

I think this can lead to an "adaptive tiering/sampling" - agree on taking that to a different discussion and limit this to the change @drolando has started

On the perma link - one other thought (again could be another discussion) is that have Zipkin lens search both live and archive cluster (beyond live cluster time range) so archived traces can appear in the same UI

@codefromthecrypt
Copy link
Member

thx @mchandramouli filed #3023 for follow-up on client-side proxying (scatter gather)

It occurs to me it is important to do the tagging here on POST. We should also make the archive url conditional on no root span tag of the same. Ex if we use the tag zipkin.archive, do not offer to archive it again :)

@drolando drolando force-pushed the add_archive_trace_button branch 4 times, most recently from 6299661 to f0b5155 Compare March 15, 2020 18:16
@drolando
Copy link
Contributor Author

Tests are now passing! I've also added a zipkin.archived=true tag

@drolando drolando force-pushed the add_archive_trace_button branch from fe60f14 to 19a2697 Compare March 15, 2020 23:16
@drolando
Copy link
Contributor Author

@tacigar @anuraaga I've tried using react-alerts to display nicer popups than just bare alerts with little effort. I saw that material-ui also has an Alert component but it seems much more involved to use. Let me know if you're fine with adding an extra JS dependency or not.

Screenshot 2020-03-15 15 52 15

Screenshot 2020-03-15 15 51 56

Unfortunately I couldn't convince the alert to properly show the link. Adding an <a href="..."> ... </a> just resulted in the <a> tag being printed as well as it's not recognized as html...

@tacigar
Copy link
Member

tacigar commented Mar 16, 2020

@drolando
I know material-ui's snackbar is difficult to use by itself, but it is relatively easy when used with notistack (you no longer need to write Snackbar component directly! notistack uses material-ui's Snackbar internally).
I'm not familiar with react-alert, but as far as I see a diff in your code, the usage of react-alert and notistack seems to be similar.
https://material-ui.com/components/snackbars/#notistack

IMHO, basically, if the css framework provides components that provide the same function, I think that using it will enhance the unity of the design.
But if you choose react-alert for reasons other than complexity of usage, that's fine for me :)

@drolando
Copy link
Contributor Author

@tacigar updated PR to use notistack. It even works better then react-alerts because it properly resizes to fit the text without me having to do any hack.

The only thing that confuses me a bit is that alert function. If I define it outside of archiveClick and pass it as "argument" in the [] at the end of the callback, npm complains with

The 'alert' function makes the dependencies of useCallback Hook (at line 188) change on every render. Move it inside the useCallback callback. Alternatively, wrap the 'alert' definition into its own useCallback() Hook

so I just moved it inside...

This button lets you easily reupload the current trace to a different
server.

The main motivation for having this is that you can have the
archival server have a very long retention period and use it as very
long term storage for traces that you care about. For example when
sharing a trace in a jira ticket since otherwise the link would expire
after 1 week.

Design doc explaining the reason behind this in more details and why we
went with this implementation: https://github.com/openzipkin/openzipkin.github.io/wiki/Favorite-trace
@drolando drolando force-pushed the add_archive_trace_button branch from 420d2fb to 5ccbcb4 Compare March 19, 2020 04:31
@drolando
Copy link
Contributor Author

@anuraaga Rebased on master and moved the notistack provider to App.jsx

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just one small point

@drolando drolando force-pushed the add_archive_trace_button branch from 5ccbcb4 to 59b8eb6 Compare March 19, 2020 04:51
@drolando
Copy link
Contributor Author

New popup UI using notistack:

image

image

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small things, looks legit!

zipkin-lens/src/components/App/App.jsx Show resolved Hide resolved
zipkin-lens/src/translations/es/messages.json Outdated Show resolved Hide resolved
@@ -36,6 +36,7 @@
"Trace ID": "",
"Upload JSON": "",
"View Logs": "",
"Archive Trace": "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uckyk do you mind offering some chinese translation text here?

dependency.lowErrorRate | zipkin.ui.dependency.low-error-rate | The rate of error calls on a dependency link that turns it yellow. Defaults to 0.5 (50%) set to >1 to disable.
dependency.highErrorRate | zipkin.ui.dependency.high-error-rate | The rate of error calls on a dependency link that turns it red. Defaults to 0.75 (75%) set to >1 to disable.
basePath | zipkin.ui.basepath | path prefix placed into the <base> tag in the UI HTML; useful when running behind a reverse proxy. Default "/zipkin"

To map properties to environment variables, change them to upper-underscore case format. For
example, if using docker you can set `ZIPKIN_UI_QUERY_LIMIT=100` to affect `$.queryLimit` in `/config.json`.

### Trace archival
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

@drolando drolando force-pushed the add_archive_trace_button branch from 59b8eb6 to b68c514 Compare March 19, 2020 05:36
</Provider>
)}
</UiConfigConsumer>
// Snackbar is used to provide popup alerts to the user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet

@drolando drolando force-pushed the add_archive_trace_button branch from b68c514 to 902f926 Compare March 23, 2020 04:49
@anuraaga anuraaga merged commit 1a8e40a into openzipkin:master Mar 23, 2020
@anuraaga
Copy link
Contributor

Thanks a lot @drolando!

@tacigar
Copy link
Member

tacigar commented Mar 23, 2020

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants