Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LabelNames API with matchers #9083

Merged
merged 20 commits into from
Jul 20, 2021
Merged

LabelNames API with matchers #9083

merged 20 commits into from
Jul 20, 2021

Conversation

colega
Copy link
Contributor

@colega colega commented Jul 13, 2021

This modifies the LabelQuerier.LabelNames() method adding the matchers to it, similar to it's LabelValues counterpart.

Previously, the matchers for LabelNames were supported only on the HTTP API surface, then the HTTP API implementation made use of the Select() call with special hints to tell TSDB to skip the samples.

This however is hacky and the implementation divergence causes issues in other projects depending on this, like cortex: cortexproject/cortex#3658

This new approach does what one would expect to: call the LabelMatchers providing the matchers, and let the TSDB internals decide how to get that data. As a side effect, we're saving lots of resources, especially on deployments where there are tons of series with different label values but same label names (I'd say this is a common scenario in a cloud-native world, isn't it?)

The work was initiated by @tomwilkie and I finished some details.

See more comments on the PR for thoughts.


func (r blockIndexReader) LabelNames(matchers ...*labels.Matcher) ([]string, error) {
if len(matchers) == 0 {
return r.b.LabelNames()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calls Block.LabelNames which calls the index.Reader.LabelNames and it's the only call there and it has no matchers. You'll find a TODO later to implement matchers there, but actually we should just move the matchers evaluation from here to index.Reader.LabelNames.

I would love to do that by moving labelNamesWithMatchers to index.LabelNamesWithMatchers, however, that's a bigger refactor since we'd need to move the PostingsForMatchers and even the IndexReader interface there too. I also think that it makes sense, but it's just out of the scope of this PR.

@@ -479,6 +487,12 @@ func (r blockIndexReader) LabelValueFor(id uint64, label string) (string, error)
return r.ir.LabelValueFor(id, label)
}

// LabelNamesFor returns all the label names for the series referred to by IDs.
// The names returned are sorted.
func (r blockIndexReader) LabelNamesFor(ids ...uint64) ([]string, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to implement this here feels cumbersome to me, I feel that the interface we're implementing it shouldn't force us to do that, i.e. it should be split into two. But I don't have enough knowledge of the packet yet to have a better picture.

labelNames := h.head.postings.LabelNames()
h.head.symMtx.RUnlock()
if len(matchers) == 0 {
h.head.symMtx.RLock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this but I can't see why it's needed: h.head.postings is a MemPostings, and MemPostings.LabelNames locks itself, same happens for LabelValues above. the symMtx seems to be a mutex to be used to acces h.head.symbols but we're not accessing that in any way here so...

🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was some race somewhere, but cant remember. Might as well have it like before :)

if err != nil {
return nil, storage.ErrNotFound
}
for _, off := range offsets {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking first to have an [][]uint32 allOffsets with all offsets slices here and then make an k-way merge using a heap. However, that would mean that allOffsets could potentially be the entire copy of the all the series with their symbols offsets, which seemed unnecessary so I decided to go with a map and keep deduplicating them as we get them.

return nil, err
}

var postings []uint64
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could potentially be the list of all series in the block, like if the only matcher we pass in matches all the series. But I think we'll save important roundtrips later in r.LabelNamesFor by grabbing them all here.

@@ -1168,6 +1168,7 @@ func (m mockIndex) LabelValues(name string, matchers ...*labels.Matcher) ([]stri
for _, series := range m.series {
for _, matcher := range matchers {
if matcher.Matches(series.l.Get(matcher.Name)) {
// TODO(colega): shouldn't we check all the matchers before adding this to the values?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have N matchers here and we saw that series matches one of them, so we're putting it into values, that's wrong, right?

This is the mock index so 🤷 , but if you confirm, I'll fix it.

Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through all the files and the high level approach LGTM, it is similar to what we do for LabelValues. I will take a detailed look next week. Thanks for working on this!

tsdb/querier.go Outdated Show resolved Hide resolved
tomwilkie and others added 12 commits July 16, 2021 08:26
NB This doesn't actually implement it in the index, just plumbs it through for now...

Signed-off-by: Tom Wilkie <[email protected]>
Can't see why do we need to hold a mutex on symbols, and the purpose of
the LabelNamesFor method.

Maybe I'll need to re-add this later.

Signed-off-by: Oleg Zaytsev <[email protected]>
This method provides the label names that appear in the postings
provided. We do that deeper than the label values because we know
beforehand that most of the label names we'll be the same across
different postings, and we don't want to go down an up looking up the
same symbols for all different series.

Signed-off-by: Oleg Zaytsev <[email protected]>
However, I still don't understand why do we need a mutex here.

Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
I still don't see why we need to grab that unrelated mutex, but at least
now we're grabbing it consistently

Signed-off-by: Oleg Zaytsev <[email protected]>
@colega
Copy link
Contributor Author

colega commented Jul 16, 2021

I have force-pushed the branch rebased from the latest master.

@cstyan cstyan removed their request for review July 16, 2021 18:05
Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I only have comments about comments and error handling.

Additionally, I don't see any tests cases in api_test.go which is having >1 matcher groups. Since we modified that part, could you please add 1-2 test cases to cover that?

storage/interface.go Outdated Show resolved Hide resolved
labelNames := h.head.postings.LabelNames()
h.head.symMtx.RUnlock()
if len(matchers) == 0 {
h.head.symMtx.RLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was some race somewhere, but cant remember. Might as well have it like before :)

tsdb/index/index.go Outdated Show resolved Hide resolved
tsdb/index/index.go Outdated Show resolved Hide resolved
tsdb/index/index.go Outdated Show resolved Hide resolved
tsdb/index/index.go Outdated Show resolved Hide resolved
tsdb/index/index.go Outdated Show resolved Hide resolved
tsdb/block_test.go Outdated Show resolved Hide resolved
tsdb/head_test.go Show resolved Hide resolved
colega and others added 7 commits July 20, 2021 13:25
Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Ganesh Vernekar <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Ganesh Vernekar <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Ganesh Vernekar <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Ganesh Vernekar <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Ganesh Vernekar <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
@colega
Copy link
Contributor Author

colega commented Jul 20, 2021

Added the testcases: 17c4594

@codesome codesome merged commit b1ed4a0 into prometheus:main Jul 20, 2021
@colega colega deleted the label-name-api-matchers branch July 20, 2021 14:55
colega added a commit to grafana/mimir that referenced this pull request Jul 20, 2021
Updated Prometheus to latest commit that includes changes in the
LabelQuerier.LabelNames method signature:

prometheus/prometheus#9083

Signed-off-by: Oleg Zaytsev <[email protected]>
pstibrany pushed a commit to grafana/mimir that referenced this pull request Jul 23, 2021
* Upgrade prometheus to b1ed4a0a663d0c62526312311c7529471abbc565

Updated Prometheus to latest commit that includes changes in the
LabelQuerier.LabelNames method signature:

prometheus/prometheus#9083

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add matchers to all LabelNames implementations

Signed-off-by: Oleg Zaytsev <[email protected]>

* Pass matchers from distributor to ingester's block tsdb

Signed-off-by: Oleg Zaytsev <[email protected]>

* Ingester.LabelNames(w/matchers) for chunks too

Signed-off-by: Oleg Zaytsev <[email protected]>

* Use the old distributor query path when cfg is disabled

Signed-off-by: Oleg Zaytsev <[email protected]>

* Sort label names in old implementations

Signed-off-by: Oleg Zaytsev <[email protected]>

* BlockStoreQueryable querying both paths

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix pkg/querier/querier_test.go

Signed-off-by: Oleg Zaytsev <[email protected]>

* Use the provided ctx

Signed-off-by: Oleg Zaytsev <[email protected]>

* Make doc

* Test distributorQueryable.LabelNames

And fix missing matchers in the call to Distributor

Signed-off-by: Oleg Zaytsev <[email protected]>

* Test Distributor.LabelNames

Signed-off-by: Oleg Zaytsev <[email protected]>

* Test Ingester.LabelNames with matchers

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add matchers to querier test

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Oleg Zaytsev <[email protected]>

* Move changes to mimir/unreleased changelog section

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add another label to testcase to improve test

If multiple matchers didn't work, we wouldn't realize as the result set
was the same as for a single matcher.

Now the multiple matcher doesn't match the series with status=500 which
has a reason=broken label

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update docs, add `enabled` suffix to the config.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Remove fixme comment and feature flag from blocks_store_queryable

All the StoreGateways (actually, one) are already supporting matchers in
the call so we don't need to use a feature flag on this.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update span name

Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Remove feature flag from blocks_store_queryable

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fixup changelog entry

Signed-off-by: Oleg Zaytsev <[email protected]>

* Remove fingerprint collision testcase

At least until we find a collision between two different label names.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Handle matchers in tenantfederation/mergeQuerier.LabelNames

Signed-off-by: Oleg Zaytsev <[email protected]>

* Opinionated refactor of filterValuesByMatchers

Since there are three options, IMO, a switch is clearer than
if/continue/if/elseif structure.

Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>
colega added a commit to colega/cortex that referenced this pull request Jul 26, 2021
Since prometheus/prometheus#9083 prometheus now
provides matchers to the LabelNames method on the LabelQuerier
interface, so in order to upgrade Prometheus we need to support that.

This partially solves
cortexproject#3658 as now the block
store queryable uses the LabelNames method with matchers.

However, we're still using the ingesters' MetricsForLabelMatchers method
to perform the LabelNames w/matchers call on the distributor. That
change will be tackled separately as it breaks ingester's contracts and
needs to be released with a feature flag to perform a backwards
compatible release.

Signed-off-by: Oleg Zaytsev <[email protected]>
pracucci pushed a commit to cortexproject/cortex that referenced this pull request Jul 26, 2021
* Upgrade Prometheus to LabelNames with matchers

Since prometheus/prometheus#9083 prometheus now
provides matchers to the LabelNames method on the LabelQuerier
interface, so in order to upgrade Prometheus we need to support that.

This partially solves
#3658 as now the block
store queryable uses the LabelNames method with matchers.

However, we're still using the ingesters' MetricsForLabelMatchers method
to perform the LabelNames w/matchers call on the distributor. That
change will be tackled separately as it breaks ingester's contracts and
needs to be released with a feature flag to perform a backwards
compatible release.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Updated CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>
colega added a commit to grafana/loki that referenced this pull request Jul 27, 2021
This bumps prometheus up to the changes from
prometheus/prometheus#9083 and Cortex to
cortexproject/cortex#4380

Signed-off-by: Oleg Zaytsev <[email protected]>
cyriltovena pushed a commit to grafana/loki that referenced this pull request Jul 27, 2021
This bumps prometheus up to the changes from
prometheus/prometheus#9083 and Cortex to
cortexproject/cortex#4380

Signed-off-by: Oleg Zaytsev <[email protected]>
@juliusv juliusv mentioned this pull request Jul 29, 2021
alvinlin123 pushed a commit to ac1214/cortex that referenced this pull request Jan 14, 2022
* Upgrade Prometheus to LabelNames with matchers

Since prometheus/prometheus#9083 prometheus now
provides matchers to the LabelNames method on the LabelQuerier
interface, so in order to upgrade Prometheus we need to support that.

This partially solves
cortexproject#3658 as now the block
store queryable uses the LabelNames method with matchers.

However, we're still using the ingesters' MetricsForLabelMatchers method
to perform the LabelNames w/matchers call on the distributor. That
change will be tackled separately as it breaks ingester's contracts and
needs to be released with a feature flag to perform a backwards
compatible release.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Updated CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Alvin Lin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants