Add Archive Trace button #3018

drolando · 2020-03-11T03:08:13Z

This button lets you easily reupload the current trace to a different
server.

The main motivation is that the archival server can have a very long
retention period and serve as long-term storage for traces that you care
about, for example when sharing a trace in a Jira ticket, since otherwise
the link would expire after 1 week.

Design doc explaining the reasoning in more detail and why we went with
this implementation: https://github.com/openzipkin/openzipkin.github.io/wiki/Favorite-trace
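
The re-upload described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `archiveTrace`, `traceApiUrl` and the injected `fetchImpl` are hypothetical names chosen so the flow can be exercised without a browser, and `archivePostUrl` borrows the setting name proposed later in this thread.

```javascript
// Sketch only: archiveTrace, traceApiUrl and fetchImpl are illustrative names,
// not identifiers from zipkin-lens.
async function archiveTrace(fetchImpl, traceApiUrl, archivePostUrl, traceId) {
  // Read the raw JSON for the trace back from the live server.
  const readResponse = await fetchImpl(`${traceApiUrl}/${traceId}`);
  if (!readResponse.ok) {
    throw new Error('Failed to fetch trace from backend');
  }
  const spans = await readResponse.json();

  // Re-upload the same span list, unchanged, to the archive server.
  const writeResponse = await fetchImpl(archivePostUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(spans),
  });
  if (!writeResponse.ok) {
    throw new Error('Failed to archive the trace');
  }
  // Return the number of spans that were re-posted.
  return spans.length;
}
```

In a browser, `fetchImpl` would simply be the global `fetch`; passing it in keeps the sketch testable with a stub.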

drolando · 2020-03-11T03:08:58Z

I still need to figure out how translations work, but at least this is now out for people to comment on.

codefromthecrypt

A little bikeshedding..

archiveWriteUrl -> archivePostUrl (as it is a POST)
archiveReadUrl -> archiveUrl (shorter.. emphasizing it is a web link not a raw json url)

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx

zipkin-server/README.md

mchandramouli · 2020-03-11T04:19:58Z

@drolando @adriancole - this might be a bit late. I read the need in the design doc (https://github.com/openzipkin/openzipkin.github.io/wiki/Favorite-trace). Instead of a second cluster, one can have a second storage with higher retention and do scatter/gather on search. When someone does an explicit search for a trace ID, move the trace to the second storage in the background. This allows the trace to still show up in the cluster and be available longer. Not sure if zipkin allows multiple storage DBs and scatter/gather on search. If it does, this might be easier than a second archive cluster from a user's perspective, as saved links in Jira or issues will continue to work as is. This is what was being done when I last worked with the Haystack team.

codefromthecrypt · 2020-03-11T06:28:14Z

@mchandramouli thanks for the context. I think for some setups an internal tiering could work.

There is additional configuration complexity to achieve that.. we don't have anything that could do that internally, especially as TTL is implemented differently (if defined at all). For example, in cassandra it is literally copying the rows again (to update the TTL). Also, much of the configuration would double to achieve the goal.. I think we've talked about this tension in the past. It gets more complex when storage uses a different schema or technology per tier. These decisions are mostly backend I think anyway, and this change is all frontend.

It is a very good point on readback/permalink. We do have a change here to allow a different URL, but there's no requirement to use one. For example, Yelp did what you said with scatter/gather in the past: https://github.com/drolando/zipkin-mux. We don't have to use a separate URL space for things explicitly saved, or otherwise tiered to a longer TTL. If we mount the UI assets so that the /zipkin path is served against a proxy api, the UI will never know the difference.

Stepping back, this change is focused on users being able to explicitly choose traces, which is possibly a higher signal of interest than queries, albeit manually trained. Our UI carries data from the list screen to the trace detail. A trace ID-specific api call never occurs in this flow, unless the user hits the trace link directly. In other words, a user who discovers a trace from a list causes no side-effects on the backend that show they are looking at one trace. Basically you'd have to assume that any queried data with no parameters except time implies interest, then push all of those a tier down. I think this is something that could require tuning to avoid chattiness, because I would suspect many just hit refresh a lot. However, I agree even the wider net would be significantly smaller than all traces: time-based is better than pushing all traces a tier back.

Automatic tiering is nevertheless an interesting idea sites can consider... maybe this feature can help build tension needed to explore that. Meanwhile, they have a lot of options that are easy as pie.

In the simplest case you can imagine someone who is already using our in-memory server and wants to save off traces to ES or even a cloud provider (many accept zipkin format now). In the case of cassandra, re-posting to another cluster has a similar impact to overwriting the same rows with a new TTL. TL;DR: in this design there's no constraint on which technologies are in use, hence very big bang for the buck.

I'm very happy to hear about the story you mentioned though.. it is no doubt helpful for people to know existing practice! thanks!

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx

jcchavezs · 2020-03-11T07:05:52Z

My 2p: if I am not wrong, these traces will be uploaded to a different instance of zipkin (that is always the case, otherwise there is no point), hence there is no way to mix normal spans with the archived ones, is there? That said, is the tag still needed? If the tag is needed for visibility (e.g. lens is going to show an archived label in the trace view so you don't accidentally confuse it with a normal trace) then this could make sense. I would still go for a `zipkin.archived` tag to make sure we don't mess with other traces. Letting the imagination fly, it could be interesting to have a `zipkin.archive.reason` tag to allow the user to specify why this trace is interesting. Finally, thinking out loud: since the archive is just another zipkin instance, would it make sense to add a link to the archive's zipkin UI? Or even beyond that, maybe we can just use the normal instance's lens to call the archive API? Something like myzipkin:9411/zipkin/search/archived/?blah=blah. In any case I am in for giving visibility to archived traces in the day-to-day zipkin UI.

On Wed, 11 Mar 2020, 07:32 Tommy Ludwig wrote:

> In zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx:
>
> +    // We don't store the raw json in the browser yet, so we need to make an
> +    // HTTP call to retrieve it again.
> +    fetch(`${api.TRACE}/${traceSummary.traceId}`)
> +      .then((response) => {
> +        if (!response.ok) {
> +          throw new Error('Failed to fetch trace from backend');
> +        }
> +        return response.json();
> +      })
> +      .then((json) => {
> +        fetch(archiveWriteUrl, {
> +          method: 'POST',
> +          headers: {
> +            'Content-Type': 'application/json',
> +          },
> +          body: JSON.stringify(json),
>
> Sounds reasonable to me. I wonder if a user might have an archived tag for their own purposes separate from this use case. If so, perhaps we should prefix the tag with zipkin or such so it is clear this is a sort of internal tag not part of user data.

codefromthecrypt · 2020-03-11T07:27:06Z

@jcchavezs I think we did discuss having a reason at some point.. ex someone can put a JIRA ticket in the tag value. Good idea. basically someone can enter a choice of that tag value or nothing to just save empty.

On the other topic, I wouldn't assume it is definitely a zipkin server that's being archived to. It could be any APM, or even haystack. We can't really assume we know the link syntax, but that's what the POST-back is there for if it can work. Same as logs-url, if someone can express a query, they can populate the URL.
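
One way to read "if someone can express a query, they can populate the URL" is as a plain template substitution. The sketch below assumes a `{traceId}` placeholder in the configured read-back URL; that placeholder syntax, the function name `buildArchiveLink` and the example host are all assumptions for illustration and are not confirmed anywhere in this thread.

```javascript
// Sketch only: the {traceId} placeholder and buildArchiveLink are hypothetical.
function buildArchiveLink(urlTemplate, traceId) {
  const placeholder = '{traceId}';
  if (!urlTemplate.includes(placeholder)) {
    // Without a placeholder we cannot build a per-trace link.
    return null;
  }
  // Encode the ID so it is safe inside a path or query string.
  return urlTemplate.split(placeholder).join(encodeURIComponent(traceId));
}

console.log(buildArchiveLink('https://archive.example/zipkin/traces/{traceId}', 'abc123'));
// prints "https://archive.example/zipkin/traces/abc123"
```

Because the target only has to accept a URL, the same substitution would work whether the archive is another zipkin, a different APM, or a search page.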

I'd recommend stopping there, short of bookmark service though. We already overloaded screen issues (ex #3014) so suggest we don't do too much adding of more links until there's more practice to show they are helpful. my 2p

jorgheymans · 2020-03-11T07:35:42Z

Mentioning #1747 and #1093 as prior discussions of this feature, there may be more.

codefromthecrypt · 2020-03-11T14:33:09Z

a shower helped me recognize why we can easily branch discussion into automated escalation of traces to another tier, when looking at this. The reason I think is that this change can feel very much like user-assisted late (after-the-fact or tail) sampling.

Ex for the same reasons @mchandramouli mentioned. @jeqo and others have discussed how automated late sampling can occur, ex by storing queries to the api. This led to a design idea that a collector tier component would actually need to consider the entire trace. We had a couple initiatives to play with this including the VoltDB experiment and a later more fleshed out Kafka Streams approach.

What I'd recommend in this change is to be conscious that something like this could help train automation: the automation itself could be the POST endpoint here! That said, we can probably develop things around tail sampling separately; I'm mentioning this for context, and vice versa. I'm really happy that folks helped participate here.. I feel clearer about the boundaries and am glad we can have a tie-in for those who can set up a late sampling arch.

We do owe ourselves a combined wiki on late sampling though, as we do on other things. It is a bit difficult to hold the context of the various efforts together, so a volunteer to harvest some of this thread and others about late sampling into a design doc would be welcome https://github.com/openzipkin/openzipkin.github.io/wiki/Designs

mchandramouli · 2020-03-11T16:26:46Z

> Automatic tiering is nevertheless an interesting idea sites can consider... maybe this feature can help build tension needed to explore that. Meanwhile, they have a lot of options that are easy as pie.

> What I'd recommend in this change is to be conscious that something like this could help train automation, the automation itself could be the POST endpoint here!

+1

I think this can lead to an "adaptive tiering/sampling" - agree on taking that to a different discussion and limit this to the change @drolando has started

On the permalink: one other thought (again, could be another discussion) is to have Zipkin Lens search both the live and the archive cluster (beyond the live cluster's time range) so archived traces can appear in the same UI.
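
As a rough illustration of that idea for a single trace ID, the sketch below tries a list of API base URLs in order and returns the first hit. It is a sequential fallback rather than a true parallel scatter/gather, and it assumes each backend answers `GET {base}/trace/{traceId}` with a non-OK status when the trace is missing. `findTrace` and the injected `fetchImpl` are hypothetical names.

```javascript
// Sketch only: findTrace and fetchImpl are illustrative, not part of zipkin-lens.
async function findTrace(fetchImpl, apiBaseUrls, traceId) {
  for (const baseUrl of apiBaseUrls) {
    // Ask each backend in turn, e.g. the live cluster first, then the archive.
    const response = await fetchImpl(`${baseUrl}/trace/${traceId}`);
    if (response.ok) {
      return { source: baseUrl, spans: await response.json() };
    }
  }
  // No backend had the trace.
  return null;
}
```

Listing the live cluster first keeps the common case to one request; the archive is only consulted once the live copy has expired.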

codefromthecrypt · 2020-03-12T00:18:08Z

thx @mchandramouli filed #3023 for follow-up on client-side proxying (scatter gather)

It occurs to me it is important to do the tagging here on POST. We should also make the archive url conditional on no root span tag of the same. Ex if we use the tag zipkin.archive, do not offer to archive it again :)
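
A minimal sketch of both halves of that suggestion, assuming Zipkin v2 JSON spans where the root span has no `parentId` and `tags` is a string-to-string map. The tag name `zipkin.archived` is the one reported later in this thread; the helper names are hypothetical.

```javascript
// Sketch only: helper names are illustrative, not taken from the PR.
const ARCHIVED_TAG = 'zipkin.archived';

// The root span is the one without a parent.
function findRootSpan(spans) {
  return spans.find((span) => !span.parentId);
}

// True when the root span already carries the archive tag, in which case the
// UI should not offer to archive the trace again.
function isArchived(spans) {
  const root = findRootSpan(spans);
  return Boolean(root && root.tags && ARCHIVED_TAG in root.tags);
}

// Returns a copy of the trace with the tag added to the root span, leaving
// the input untouched so the on-screen trace is not modified.
function tagAsArchived(spans) {
  const root = findRootSpan(spans);
  return spans.map((span) => (
    span === root
      ? { ...span, tags: { ...span.tags, [ARCHIVED_TAG]: 'true' } }
      : span
  ));
}
```

The button would then be shown only when `isArchived(spans)` is false, and the POST body would be `tagAsArchived(spans)`.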

drolando · 2020-03-15T19:30:54Z

Tests are now passing! I've also added a zipkin.archived=true tag

drolando · 2020-03-15T23:18:08Z

@tacigar @anuraaga I've tried using react-alert to display nicer popups than bare alerts with little effort. I saw that material-ui also has an Alert component but it seems much more involved to use. Let me know if you're fine with adding an extra JS dependency or not.

Unfortunately I couldn't convince the alert to properly show the link. Adding an <a href="..."> ... </a> just resulted in the <a> tag being printed as well, since it's not recognized as HTML...

tacigar · 2020-03-16T00:59:21Z

@drolando
I know material-ui's Snackbar is difficult to use by itself, but it is relatively easy when used with notistack (you no longer need to write the Snackbar component directly! notistack uses material-ui's Snackbar internally).
I'm not familiar with react-alert, but as far as I can see from the diff in your code, the usage of react-alert and notistack seems to be similar.
https://material-ui.com/components/snackbars/#notistack

IMHO, basically, if the css framework provides components that provide the same function, I think that using it will enhance the unity of the design.
But if you choose react-alert for reasons other than complexity of usage, that's fine for me :)

drolando · 2020-03-18T04:36:20Z

@tacigar updated the PR to use notistack. It even works better than react-alert because it properly resizes to fit the text without me having to do any hack.

The only thing that confuses me a bit is that alert function. If I define it outside of archiveClick and pass it as "argument" in the [] at the end of the callback, npm complains with

The 'alert' function makes the dependencies of useCallback Hook (at line 188) change on every render. Move it inside the useCallback callback. Alternatively, wrap the 'alert' definition into its own useCallback() Hook

so I just moved it inside...
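
For context on that lint message: a function declared in a component body is a new object on every render, so listing it as a dependency makes the memoized callback change every time. The sketch below shows the effect with a toy stand-in for `useCallback` (not React itself); every name in it is hypothetical.

```javascript
// Toy stand-in for useCallback: returns the cached callback while every
// dependency is identical (Object.is) to the one from the previous call.
function createCallbackCache() {
  let cached = null;
  return (callback, deps) => {
    const unchanged = cached !== null
      && cached.deps.length === deps.length
      && deps.every((dep, i) => Object.is(dep, cached.deps[i]));
    if (!unchanged) {
      cached = { callback, deps };
    }
    return cached.callback;
  };
}

// "alert" is defined outside the callback: it is a new function on each
// render, so the dependency list never matches the previous one.
function renderWithOuterAlert(useCallbackLike) {
  const alert = (message) => `alert: ${message}`;
  return useCallbackLike(() => alert('archived'), [alert]);
}

// "alert" is defined inside the callback: nothing from the render scope is
// captured, so an empty dependency list is correct and the callback is stable.
function renderWithInnerAlert(useCallbackLike) {
  return useCallbackLike(() => {
    const alert = (message) => `alert: ${message}`;
    return alert('archived');
  }, []);
}

const outer = createCallbackCache();
console.log(renderWithOuterAlert(outer) === renderWithOuterAlert(outer)); // prints false
const inner = createCallbackCache();
console.log(renderWithInnerAlert(inner) === renderWithInnerAlert(inner)); // prints true
```

Moving the helper inside the callback, as done here, is one of the two fixes the message suggests; wrapping the helper in its own `useCallback` is the other.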

drolando · 2020-03-19T04:32:05Z

@anuraaga Rebased on master and moved the notistack provider to App.jsx

anuraaga

Thanks! Just one small point

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx

drolando · 2020-03-19T04:54:25Z

New popup UI using notistack:

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx

codefromthecrypt

small things, looks legit!

zipkin-lens/src/components/App/App.jsx

zipkin-lens/src/translations/es/messages.json

codefromthecrypt · 2020-03-19T05:04:35Z

zipkin-lens/src/translations/zh-cn/messages.json

@@ -36,6 +36,7 @@
  "Trace ID": "",
  "Upload JSON": "",
  "View Logs": "",
+  "Archive Trace": "",


@uckyk do you mind offering some Chinese translation text here?

codefromthecrypt · 2020-03-19T05:05:10Z

zipkin-server/README.md

 dependency.lowErrorRate | zipkin.ui.dependency.low-error-rate | The rate of error calls on a dependency link that turns it yellow. Defaults to 0.5 (50%) set to >1 to disable.
 dependency.highErrorRate | zipkin.ui.dependency.high-error-rate | The rate of error calls on a dependency link that turns it red. Defaults to 0.75 (75%) set to >1 to disable.
 basePath | zipkin.ui.basepath | path prefix placed into the <base> tag in the UI HTML; useful when running behind a reverse proxy. Default "/zipkin"

 To map properties to environment variables, change them to upper-underscore case format. For
 example, if using docker you can set `ZIPKIN_UI_QUERY_LIMIT=100` to affect `$.queryLimit` in `/config.json`.

+### Trace archival


looks great

zipkin-lens/src/zipkin/trace.js

codefromthecrypt · 2020-03-19T05:40:54Z

zipkin-lens/src/components/App/App.jsx

-                </Provider>
-              )}
-            </UiConfigConsumer>
+            // Snackbar is used to provide popup alerts to the user


zipkin-lens/src/components/App/App.jsx

zipkin-lens/src/translations/es/messages.json

anuraaga · 2020-03-23T05:26:03Z

Thanks a lot @drolando!

tacigar · 2020-03-23T09:04:58Z

Thank you!

codefromthecrypt reviewed Mar 11, 2020

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx (outdated)

zipkin-server/README.md (outdated)

zipkin-server/README.md (outdated)

codefromthecrypt reviewed Mar 11, 2020

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx

codefromthecrypt mentioned this pull request Mar 12, 2020

Should lens fan-out queries to the "archive" zipkin #3023

Closed

drolando force-pushed the add_archive_trace_button branch 4 times, most recently from 6299661 to f0b5155 Compare March 15, 2020 18:16

drolando force-pushed the add_archive_trace_button branch from fe60f14 to 19a2697 Compare March 15, 2020 23:16

drolando force-pushed the add_archive_trace_button branch from 420d2fb to 5ccbcb4 Compare March 19, 2020 04:31

anuraaga reviewed Mar 19, 2020

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx (outdated)

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx (outdated)

tacigar reviewed Mar 19, 2020

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx (outdated)

drolando force-pushed the add_archive_trace_button branch from 5ccbcb4 to 59b8eb6 Compare March 19, 2020 04:51

anuraaga approved these changes Mar 19, 2020

zipkin-lens/src/components/TracePage/TraceSummaryHeader.jsx

codefromthecrypt approved these changes Mar 19, 2020

tacigar reviewed Mar 19, 2020

zipkin-lens/src/zipkin/trace.js (outdated)

drolando force-pushed the add_archive_trace_button branch from 59b8eb6 to b68c514 Compare March 19, 2020 05:36

codefromthecrypt reviewed Mar 19, 2020

tacigar reviewed Mar 19, 2020

zipkin-lens/src/components/App/App.jsx (outdated)

jeqo reviewed Mar 19, 2020

zipkin-lens/src/translations/es/messages.json (outdated)

Use notistack to show alerts

902f926

drolando force-pushed the add_archive_trace_button branch from b68c514 to 902f926 Compare March 23, 2020 04:49

anuraaga mentioned this pull request Mar 23, 2020

Change Download JSON button to be a normal link with href. #3040

Merged

anuraaga merged commit 1a8e40a into openzipkin:master Mar 23, 2020
