Site-specific Tags

Background

Zipkin currently lets users search for traces by several parameters, including service name, span name, and an annotation query. Site-specific tags can already be filtered through the free-text annotation query, but this approach has disadvantages:

Users have to remember every key and value combination
Users can easily enter a wrong query

References

https://github.com/openzipkin/zipkin/issues/2236
https://github.com/openzipkin/zipkin/pull/2309
Netflix uses site-specific tags to query and filter traces, and has developed an internal UI to support that.
Infostellar has tags like antenna-id, satellite-id, etc. to query and filter traces.
LINE uses a site-specific tag called "Phase" to filter traces from different environments.
SoundCloud illustrates how they enrich spans with environment-specific tags.
IMON, the predecessor of Lens built by LINE, used a site-specific tag to guide users toward searching correctly. One example was 'phase', which steered people away from searching through non-prod traces.

Goal

Zipkin should allow users to filter traces by tags dynamically.
The UI should have an option to select a tag key and its corresponding values.
The selected tag {key, value} pair should be applied to filter traces.

Design

One option is to return the list of key/value pairs broken down by service.

    GET /api/v2/tags
    ["service": { "environment": "beta"}, "service2"  { "environment": "omicron"}]

As Adrian noted:

Note that breaking down by service has pros and cons. Pro is that a very chatty service who abuses service names can be disabled (though that could also be done on the back end). Con is that there will be a lot of repetition, also a con is that pre-defined tag name/values will be a little more work to break down by service. Ex some sites may choose to limit the predefined key/value obviating a special api for it.

Another option is to follow the design of our existing {service, span_name} pair: maintain a separate {key, value} pair, which is relatively simple and can be kept in a separate table or index depending on the storage.

    GET /api/v2/tagKeys

    {"environment": "Environment", "threat.level": "Maturity"}

    GET /api/v2/tagValues?key=environment
    ["alpha", "beta", "omnicron"]

To avoid duplicate documents in Elasticsearch, the document id will be a combination of the tag key and value (key+value). In Cassandra, it will be PRIMARY KEY (key, value).
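
The deduplication idea can be sketched with an in-memory stand-in for the storage: writing the same pair twice lands on the same id, so only one document exists. The "|" separator and the TagIndex class are assumptions made for illustration; the text above only says the id combines key and value.

```python
from typing import Dict, List, Tuple


def tag_document_id(key: str, value: str) -> str:
    # Separator is an assumption; the design only states "key+value".
    return f"{key}|{value}"


class TagIndex:
    """In-memory stand-in for the tag table or index (hypothetical)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[str, str]] = {}

    def index(self, key: str, value: str) -> bool:
        """Upsert one pair; returns True only if the pair was not stored yet."""
        doc_id = tag_document_id(key, value)
        is_new = doc_id not in self._docs
        self._docs[doc_id] = (key, value)  # same id overwrites, never duplicates
        return is_new

    def keys(self) -> List[str]:
        return sorted({k for k, _ in self._docs.values()})

    def values(self, key: str) -> List[str]:
        return sorted(v for k, v in self._docs.values() if k == key)
```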

Cardinality

One thought that came up repeatedly during the discussions was the cardinality of tags. For some sites, tag cardinality will be high, unlike fixed tags or environment tags; antenna-id is an example. This is a known issue, but we want to keep the design simple to understand and get feedback from actual site users.

Whitelisting

Storage should provide a way to whitelist known static tags so that they are excluded from indexing, potentially through the environment variable zipkin.storage.tags. For example, {environment: production} is redundant information that would be costly to process for every span.
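
A minimal sketch of that filtering step, assuming the static tags have already been parsed into a mapping (the format of the setting is not defined here) and that a tag is skipped only when both key and value match a static entry. The function name is hypothetical.

```python
from typing import Dict, Mapping


def tags_to_index(span_tags: Mapping[str, str],
                  static_tags: Mapping[str, str]) -> Dict[str, str]:
    """Return only the span tags that still need to be written to the tag index."""
    return {
        key: value
        for key, value in span_tags.items()
        # Skip pairs that exactly match a known static tag (assumed rule).
        if static_tags.get(key) != value
    }
```

Matching on the full pair rather than the key alone means an unexpected value, such as environment=beta on a site that declared only production, would still be indexed and remain searchable.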

The UI should provide a way to configure these static tags so that they appear in the filter, preferably in config.json, so that no network request is needed for these known tags.

Caching

The service will cache tag keys, similar to service names, to prevent frequent requests to the tags API.
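
One way to picture this is a small time-based cache in front of the tag keys lookup. The sketch below is an assumption about the mechanism, not the actual implementation: the class name, the five-minute default, and the injectable clock are all illustrative.

```python
import time
from typing import Callable, List, Optional


class CachedTagKeys:
    """Caches the result of a tag keys lookup for a fixed time (hypothetical)."""

    def __init__(self,
                 fetch: Callable[[], List[str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a call to the tag keys endpoint
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._keys: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get(self) -> List[str]:
        now = self._clock()
        expired = now - self._fetched_at >= self._ttl
        if self._keys is None or expired:
            # Only hit the backing API on first use or after the TTL elapses.
            self._keys = list(self._fetch())
            self._fetched_at = now
        return self._keys
```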
