Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsdb: Document that labels.Labels should always be sorted #5880

Closed
gouthamve opened this issue Apr 5, 2017 · 4 comments
Closed

tsdb: Document that labels.Labels should always be sorted #5880

gouthamve opened this issue Apr 5, 2017 · 4 comments

Comments

@gouthamve
Copy link
Member

Currently we {1="2",2="1"} and {2="1",1="2"} are being treated as different series. This is true for headBlock and I could not find if we are sorting and merging series while persisting.

If this is true, it spreads the same series across different locations. Now while this may look like we are increasing our write throughput, we are never appending to the same series concurrently. This is a definite hit to the query performance as we need to hit different places for the same series.

I am pretty sure this is intentional but not able to understand the rationale behind this.

@fabxc
Copy link
Contributor

fabxc commented Apr 6, 2017

Comparing label sets is rather expensive considering how often we are doing it. Generally it either requires a nested loop which is O(n^2) or building a hashmap of the first label set and then comparing the second against it (allocations, which are even worse).
If they are sorted, we can linearly walk through them. It's the same pattern as we use for intersecting postings, merging SeriesSets, etc.

The system invariant is that label sets are always sorted. The must always be created through labels.New(). Thus, the same label set with different ordering simply does not exist in the system. If it does, it's a bug.

Lists of labels (or lists of series) are also always sorted when retrieved from the Querier. That's the invariant needed for merging of query results from multiple blocks to work.
When blocks are persisted, all series are sorted first. In the head block, where they come in in random order, we sort them after intersecting postings.

Does that make sense?

@gouthamve
Copy link
Member Author

Hmm, it would be good if that invariant was specified somewhere. Will add a comment in my next PR.

As for if Prometheus is doing it, it is actually sorting and returning: ingest and then here we are actually sorting them: textparse

Interesting why we are sorting from [1:] though.

@fabxc
Copy link
Contributor

fabxc commented Apr 9, 2017

Yea, should be documented at the outer interfaces.

The [1:] happens because the first label is the metric name (__name__) and because of the valid characters for label names in Prometheus, we know that it will remain the first one even after sorting. Haven't checked whether it's a relevant optimization but the scope seems small enough that it's not too confusing.

@gouthamve gouthamve changed the title [Q] Why is the same set of labels in different order considered as different series? Document that labels.Labels should always be sorted Apr 9, 2017
@fabxc fabxc closed this as completed Jul 5, 2017
@fabxc fabxc reopened this Jul 5, 2017
@bwplotka bwplotka changed the title Document that labels.Labels should always be sorted tsdb: Document that labels.Labels should always be sorted Aug 13, 2019
@bwplotka bwplotka transferred this issue from prometheus-junkyard/tsdb Aug 13, 2019
@cstyan
Copy link
Member

cstyan commented Sep 20, 2019

@gouthamve is this still relevant? did you want to document it somewhere in a markdown file or this enough: https://github.com/prometheus/prometheus/blob/master/tsdb/labels/labels.go#L35-L37

@lock lock bot locked and limited conversation to collaborators Mar 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants