Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tombstone cli and mask store series using tombstones #4790

Closed
wants to merge 14 commits into from

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented Oct 18, 2021

Signed-off-by: yeya24 [email protected]

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

This pr continues the previous tombstones related work. It includes two prs:

Verification

TODO:
Unit tests and E2E tests TBD.

@yeya24 yeya24 force-pushed the tombstones branch 3 times, most recently from d8a7b73 to 689ceb1 Compare October 18, 2021 07:26
if err := bkt.Iter(ctx, TombstoneDir, func(name string) error {
tombstoneFilename := path.Join("", name)
tombstoneFile, err := bkt.Get(ctx, tombstoneFilename)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check whether it is 404 or something else?

pkg/tombstone/tombstone.go Outdated Show resolved Hide resolved
pkg/tombstone/tombstone.go Outdated Show resolved Hide resolved
Matchers: matchers,
MinTime: minTime,
MaxTime: maxTime,
CreationTime: timestamp.FromTime(time.Now()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mocking in tests you'll probably want to pass this as a parameter instead so that you could mock it

@@ -1443,6 +1475,13 @@ func (s *BucketStore) LabelValues(ctx context.Context, req *storepb.LabelValuesR
}, nil
}

func (s *BucketStore) SyncTombstones(ctx context.Context) (err error) {
s.tombstonesMtx.Lock()
s.tombstones, err = tombstone.ReadTombstones(ctx, s.bkt, s.logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could hold the lock for a long time. I suggest retrieving first and then locking -> switching -> unlocking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be better. I will add ReadTombstones as part of the BucketStore method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I misunderstood your suggestion when I first commented on this 🤣 I think I just applied what you suggested. Thanks

pkg/store/bucket.go Outdated Show resolved Hide resolved
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are still missing https://github.com/thanos-io/thanos/pull/3075/files#r477281247 to avoid unnecessary postings lookups? ;P

@yeya24
Copy link
Contributor Author

yeya24 commented Oct 18, 2021

I think we are still missing https://github.com/thanos-io/thanos/pull/3075/files#r477281247 to avoid unnecessary postings lookups? ;P

I am not sure but I guess what I am doing now is what Bartek meant in https://github.com/thanos-io/thanos/pull/3075/files#r477281247?

Now for tombstones matchers, we don't expand postings. Instead we just match tombstone matchers with each series labels we get.

if val := lset.Get(matcher.Name); val != "" {
if matcher.Matches(val) {
if skipChunks {
continue PostingsLoop
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about the case of SkipChunks. If the series matches tombstones, do we just skip that series labels? What about cases of LabelNames and LabelValues?
I think Prometheus doesn't care about these and tombstones only affect non-metadata queries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add an e2e test for this case? 🤔

CreationTime int64 `json:"creationTime"`
Author string `json:"author"`
Reason string `json:"reason"`
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also question about the tombstone file. How can we support multi-tenancy with the current format?
I guess we want to load some tombstones only for some blocks or tenants. Then does it make sense to add external labels here or should we add external labels to matchers here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for the 1st option 👍 if none are specified then the tombstone could apply to all blocks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use labels to represent the external labels (tenants) for each tombstone then it doesn't support matching two tenants. We can either match 1 tenant or all tenants (specify no labels).

@yeya24
Copy link
Contributor Author

yeya24 commented Nov 4, 2021

I am still thinking about per block tombstones.
For global tombstones, when should we clean up the data in each block and delete the tombstones?

After #4620 is done, there is a block API to do action per block level. In this case, users can easily add & modify tombstones directly in the UI per block level. WDYT?

@stale
Copy link

stale bot commented Jan 9, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jan 9, 2022
@stale stale bot closed this Mar 2, 2022
@yeya24
Copy link
Contributor Author

yeya24 commented Jun 11, 2022

Think about the tombstone approach a little bit more.
Initially, I think it would be super expensive for the compactor to check series for every blocks using global tombstone files. As the compactor doesn't download blocks indexes at start, it has to download all the block index locally to check if the block has series to be rewritten, even if the block doesn't match the series marchers at all. That's why I think per block tombstone makes more sense cuz it is clear what blocks to rewrite.

But per block tombstone has its issues as well like UX, scalability, etc. To overcome the issue from the global tombstone approach, I'm wondering if we can have another gRPC API on store gateway to get the block IDs that matches specific matchers. Then the compactor can leverage this API to get the index information and perform series deletion.

@yeya24 yeya24 reopened this Jun 11, 2022
@stale stale bot removed the stale label Jun 11, 2022
@yeya24 yeya24 marked this pull request as ready for review June 16, 2022 00:44
cmd/thanos/tools_bucket.go Outdated Show resolved Hide resolved
pkg/compact/compact.go Outdated Show resolved Hide resolved
pkg/compact/compact.go Outdated Show resolved Hide resolved
pkg/compact/compact.go Outdated Show resolved Hide resolved
@yeya24 yeya24 marked this pull request as draft June 18, 2022 09:19
syncs prometheus.Counter
g singleflight.Group

mtx sync.Mutex
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

U1000: field mtx is unused

Reply with "@sonatype-lift help" for more info.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

pkg/tombstone/fetcher.go Outdated Show resolved Hide resolved
import (
"context"
"encoding/json"
"github.com/golang/groupcache/singleflight"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goimports: File is not goimports-ed

Reply with "@sonatype-lift help" for more info.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

cmd/thanos/store.go Outdated Show resolved Hide resolved
}

type Fetcher struct {
bkt objstore.InstrumentedBucketReader
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structcheck: bkt is unused

Reply with "@sonatype-lift help" for more info.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

pkg/tombstone/fetcher.go Outdated Show resolved Hide resolved
@@ -153,6 +155,13 @@ type bucketMarkBlockConfig struct {
blockIDs []string
}

type bucketDeleteConfig struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we need to rename it because delete command might be ambiguous, whether it deletes series or deletes the whole block.
Maybe we can rename it to delete-series or create-tombstone, as we may need undelete-series or remove-tombstone in the future...

@@ -123,6 +124,12 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
Default("32").IntVar(&sc.blockMetaFetchConcurrency)

cmd.Flag("sync-tombstone-duration", "Repeat interval for syncing the tombstones between local and remote view.").
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we need another flag to enable/disable tombstone masking.

pkg/tombstone/fetcher.go Outdated Show resolved Hide resolved
pkg/tombstone/fetcher.go Outdated Show resolved Hide resolved
pkg/tombstone/fetcher.go Outdated Show resolved Hide resolved
pkg/tombstone/fetcher.go Outdated Show resolved Hide resolved
Ben Ye added 2 commits June 19, 2022 14:01
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
tombstoneIntervals = tombstoneIntervals.Add(promtombstones.Interval{Mint: ts.MinTime, Maxt: ts.MaxTime})
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering which way is better. Now we are matching tombstones at the query time, which is online processing. Another way is to build an in-memory Prometheus tombstone for each matched block when we load tombstones each time. Since block contents are immutable, we can cache the in-mem tombstone results each time based on the tombstone ids.
Ideally, the second option is better in performance. But happy to hear more ideas.

pkg/store/bucket_test.go Outdated Show resolved Hide resolved
Signed-off-by: Ben Ye <[email protected]>
@yeya24 yeya24 marked this pull request as ready for review June 20, 2022 23:09
Signed-off-by: Ben Ye <[email protected]>
return bkt.Upload(ctx, tsPath, bytes.NewBuffer(b))
}

// RemoveTombstone removes the tombstone from object storage
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godot: Comment should end in a period

Reply with "@sonatype-lift help" for more info.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

}
}
for _, ts := range m.tombstones {
ts.Iter(func(id storage.SeriesRef, ivs tombstones.Intervals) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errcheck: Error return value of ts.Iter is not checked

Reply with "@sonatype-lift help" for more info.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@stale
Copy link

stale bot commented Sep 21, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Sep 21, 2022
@stale stale bot closed this Oct 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants