This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Implement vertical query merging and compaction #90

Closed

fabxc opened this issue Jun 6, 2017 · 12 comments

Comments

@fabxc
Contributor

fabxc commented Jun 6, 2017

We should add support to query and compact time-overlapping blocks. Probably no need for sophisticated handling of overlapping chunks.

This is a rather complex one and we have to discuss specifics. But once done, it will make our lives a lot easier for restoring from backups, etc.

@gouthamve
Collaborator

This should be relatively straightforward from what I can see. For us, BlockReader is an interface, and we need to make sure the overlapping blocks turn out to be a single block behind that interface.

What we essentially want is, I hope, clear from the following pictures:
[Screenshots: overlapping blocks being merged into a single block.]

Now if the same series exists in overlapping blocks, we want to merge the data; but if there are samples with the same timestamp and different values, we just drop the "second" one (arbitrarily choosing one). This is okay, because such a conflict already violates the TSDB invariants.

/cc @krasi-georgiev @fabxc

PS: This might take some extra memory, but that is okay given that we don't want to recommend this as a regular workflow.
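
A minimal sketch of the per-series merge described above, using illustrative types rather than the actual tsdb interfaces (a real implementation would operate on chunks behind BlockReader): samples from two overlapping blocks are assumed sorted by timestamp, and on a timestamp collision the sample from the "first" block arbitrarily wins.

```go
package main

import "fmt"

// sample is an illustrative stand-in for a (timestamp, value) pair.
type sample struct {
	t int64
	v float64
}

// mergeSamples merges two timestamp-sorted sample slices from overlapping
// blocks. On a timestamp collision the sample from `a` (the "first" block)
// wins and the conflicting sample from `b` is dropped, as proposed above.
func mergeSamples(a, b []sample) []sample {
	out := make([]sample, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].t < b[j].t:
			out = append(out, a[i])
			i++
		case a[i].t > b[j].t:
			out = append(out, b[j])
			j++
		default: // same timestamp: keep a's sample, drop b's.
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func main() {
	blockA := []sample{{t: 10, v: 1}, {t: 20, v: 2}, {t: 30, v: 3}}
	blockB := []sample{{t: 20, v: 9}, {t: 25, v: 4}, {t: 40, v: 5}}
	fmt.Println(mergeSamples(blockA, blockB))
	// [{10 1} {20 2} {25 4} {30 3} {40 5}]
}
```

The same two-pointer walk generalizes to more than two overlapping blocks by merging pairwise or with a small heap.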

@gouthamve
Collaborator

From IRC, @brian-brazil on the semantics:

bulk inserted data wins or the insert fails.

We can do the same thing mentioned above, but when inserting data using promtool, we delete the data in the existing blocks that overlaps the inserted range. That is a little more work, but we can start with the proposal above and then implement the deletes when integrating with promtool.
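
A rough sketch of the bookkeeping such a promtool flow would need, assuming the bulk-inserted block covers one contiguous [mint, maxt) range; the types here are illustrative, not the tsdb API. It computes, per existing block, the sub-range that would have to be deleted before the insert so that the bulk-inserted data wins:

```go
package main

import "fmt"

// timeRange is an illustrative [mint, maxt) block time range in milliseconds.
type timeRange struct {
	mint, maxt int64
}

// overlapsToDelete returns, for each existing block range that overlaps the
// incoming bulk-insert range, the sub-range that would have to be deleted
// from that block before the insert, so the bulk-inserted data wins.
func overlapsToDelete(existing []timeRange, incoming timeRange) []timeRange {
	var dels []timeRange
	for _, b := range existing {
		if b.maxt <= incoming.mint || b.mint >= incoming.maxt {
			continue // this block does not overlap the insert
		}
		dels = append(dels, timeRange{
			mint: max64(b.mint, incoming.mint),
			maxt: min64(b.maxt, incoming.maxt),
		})
	}
	return dels
}

func max64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

func main() {
	existing := []timeRange{{0, 100}, {100, 200}, {200, 300}}
	incoming := timeRange{150, 250}
	fmt.Println(overlapsToDelete(existing, incoming))
	// [{150 200} {200 250}]
}
```

The actual deletes would presumably go through tsdb's existing deletion path (tombstones), which is why this is a bit more work than the merge-and-drop proposal above.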

@bwplotka
Contributor

bwplotka commented May 8, 2018

I can see on IRC that the main question about this issue is "why do we need that"; can we enumerate the use cases before starting to design this?

For me this issue is also about the assumptions the compaction makes. Basically, I would love to have the compactor be resilient to eventual storage write-read consistency. Why? Because so far the lack of support for it produced a major release issue for 2.2.0, as well as major compaction issues in Thanos a month ago.

Basically:

  • if the compactor sees a block for T and one for T-2, it automatically assumes that T-1 is missing for sure.
  • it compacts T and T-2 together. Now when T-1 comes back we have a broken state.
    As a result, querying is utterly broken (sometimes showing <T, T-2> or things like that) and further compactions drop data (most likely that T-1 block).

Making the compactor aware that overlapping blocks are a thing will basically reduce the damage radius if any of these bugs happen again.
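
A sketch of what such an overlap awareness check could look like: sort the block metadata by minimum time and flag any block that starts before the furthest end time seen so far, rather than assuming a gap means a block never existed. The blockMeta type and function here are illustrative, not the tsdb API.

```go
package main

import (
	"fmt"
	"sort"
)

// blockMeta is an illustrative stand-in for a block's time-range metadata.
type blockMeta struct {
	name       string
	mint, maxt int64 // [mint, maxt)
}

// overlappingBlocks reports pairs of blocks whose time ranges overlap.
// A compactor running this before planning can refuse or specially handle
// vertical compaction instead of silently producing broken blocks.
func overlappingBlocks(metas []blockMeta) [][2]blockMeta {
	if len(metas) < 2 {
		return nil
	}
	sorted := append([]blockMeta(nil), metas...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].mint < sorted[j].mint })

	var overlaps [][2]blockMeta
	// Track the block reaching furthest to the right so far, so a long block
	// is compared against every later block whose range it covers.
	widest := sorted[0]
	for _, m := range sorted[1:] {
		if m.mint < widest.maxt {
			overlaps = append(overlaps, [2]blockMeta{widest, m})
		}
		if m.maxt > widest.maxt {
			widest = m
		}
	}
	return overlaps
}

func main() {
	// The broken state described above: T-2 and T were compacted together
	// while T-1 was temporarily invisible, and then T-1 reappeared.
	metas := []blockMeta{
		{name: "compacted(T-2,T)", mint: 0, maxt: 300},
		{name: "T-1", mint: 100, maxt: 200},
	}
	for _, o := range overlappingBlocks(metas) {
		fmt.Printf("%s overlaps %s\n", o[0].name, o[1].name)
	}
	// Output: compacted(T-2,T) overlaps T-1
}
```

With such a check in place, the planner could defer or isolate compaction across the flagged ranges instead of silently producing a block that hides T-1.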

@gouthamve can you explain more how bulk insert can result in overlapping (same) data? (I think I am missing the context on how exactly bulk insert is planned to be done.)

I would also keep in mind that TSDB is a library, and while it is mainly used by Prometheus, others use it as well. For Thanos this feature would be extremely valuable, because:

So yeah, it is needed for Thanos for sure; any other use cases? The issue was created in Jun 2017, so @fabxc had certain use cases in mind for sure (:

@krasi-georgiev
Contributor

I can work on this once we define the why/what/how.
@gouthamve, waiting for your input 🙏

@krasi-georgiev
Contributor

krasi-georgiev commented May 16, 2018

just watched @gouthamve's video, which adds another use case - back-filling:
https://youtu.be/0UvKEHFNu4Q?t=1219

@fabxc
Contributor Author

fabxc commented May 16, 2018 via email

I'd think in general handling this at query time first will be by far sufficient. Compaction adds lots of attack surface for data loss and regressions, and we don't have a use case where we'd be dealing with high fragmentation.

@bwplotka
Contributor

bwplotka commented May 16, 2018

Yeah, but we need to make sure compaction does not add regressions when you put in overlapping blocks. Currently, with the overlap check we added, we just block the compaction flow. It is not safe to simply unblock compaction in its current form in the case of overlapping blocks.

@bwplotka
Contributor

Any ideas for this? It is blocking: thanos-io/thanos#348

@codesome
Contributor

I would like to work on this and will come up with ideas soon.
This would also help implement bulk loading (#24, prometheus/prometheus#535).

codesome mentioned this issue Aug 31, 2018
@krasi-georgiev
Contributor

great, I will find time to review it.

don't forget to sync with @gouthamve and @fabxc, the tsdb gurus, for the best design approach 😜

@codesome
Contributor

@fabxc @gouthamve
Are we looking at only vertical query merging? With bulk import I feel vertical compaction would be helpful.

@bwplotka
Contributor

Again, IMO vertical compaction is a must-have (: It is needed for bulk import if you want to import data further in the past, but also for more resilient compaction overall.

Plus thanos-io/thanos#348
