
M-map full chunks of Head from disk #6679

Merged: 44 commits merged into prometheus:master on May 6, 2020

Conversation

codesome
Member

@codesome codesome commented Jan 22, 2020

TL;DR description for the PR from @krasi-georgiev:


When appending to the head and a chunk is full, it is flushed to disk and m-mapped (memory-mapped) to free up memory.

Prometheus startup now happens in these stages:

  • Iterate the m-mapped chunks from disk and build a map from series reference to its slice of m-mapped chunks.
  • Iterate the WAL as usual. Whenever we create a new series, look up its m-mapped chunks in the map built before and attach them to that series.
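The two startup stages above can be sketched as follows. This is a minimal illustration, not the actual Prometheus TSDB code: all identifiers here (`mmappedChunk`, `diskChunkRecord`, `memSeries`, `buildSeriesMap`, `newSeries`) are hypothetical.

```go
package main

import "fmt"

type mmappedChunk struct {
	ref              uint64 // reference encoding the chunk's file and offset
	minTime, maxTime int64
}

type diskChunkRecord struct {
	seriesRef uint64
	chunk     mmappedChunk
}

type memSeries struct {
	ref           uint64
	mmappedChunks []mmappedChunk
}

// buildSeriesMap is stage 1: iterate the m-mapped chunks found on disk and
// group them by the series reference stored in each chunk's header.
func buildSeriesMap(onDisk []diskChunkRecord) map[uint64][]mmappedChunk {
	bySeries := map[uint64][]mmappedChunk{}
	for _, rec := range onDisk {
		bySeries[rec.seriesRef] = append(bySeries[rec.seriesRef], rec.chunk)
	}
	return bySeries
}

// newSeries is the stage-2 hook: when WAL replay creates a series, attach its
// m-mapped chunks from the map instead of rebuilding them from samples.
func newSeries(bySeries map[uint64][]mmappedChunk, ref uint64) *memSeries {
	return &memSeries{ref: ref, mmappedChunks: bySeries[ref]}
}

func main() {
	m := buildSeriesMap([]diskChunkRecord{
		{1, mmappedChunk{ref: 8, minTime: 0, maxTime: 999}},
		{1, mmappedChunk{ref: 164, minTime: 1000, maxTime: 1999}},
		{2, mmappedChunk{ref: 320, minTime: 0, maxTime: 999}},
	})
	fmt.Println(len(newSeries(m, 1).mmappedChunks), len(newSeries(m, 2).mmappedChunks)) // 2 1
}
```

The point of the map is that replay only needs the chunk headers and offsets, not the samples inside the chunks, which is where the replay-time savings below come from.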

If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption point is recovered from the existing WAL. This means that a corruption in the m-mapped files results in NO data loss.

M-mapped chunks format - the main difference is that a chunk written for m-mapping now also includes the series reference, because there is no index mapping series to chunks.
Block chunks are accessed via the index, which stores the offsets of the chunks in the chunks file - for example, the chunks of a series have offsets 200, 500, etc. in the chunk files.
For m-mapped chunks, the offsets are stored in memory and accessed from there. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above, matching the series ID present in the chunk header with the offset of that chunk in its file.

Prombench results

WAL Replay

1h WAL replay time
30% less WAL replay time - 4m31s vs 3m36s
2h WAL replay time
20% less WAL replay time - 8m16s vs 7m

Memory During WAL Replay

High churn
10-15% less RAM - 32GB vs 28GB
20% less RAM after compaction - 34GB vs 27GB
No churn
20-30% less RAM - 23GB vs 18GB
40% less RAM after compaction - 32.5GB vs 20GB

Screenshots are in this comment


Prerequisite: #6830 (Merged)

Closes #6377. More info in the linked issue and the doc in that issue and the doc inside that doc inside that issue :)

@codesome codesome changed the title M-map full chunks of Head from disk [WIP] M-map full chunks of Head from disk Jan 22, 2020
@codesome
Member Author

Forgot to open as a draft PR, changed the title to WIP.

@codesome codesome force-pushed the mmap-chunks branch 4 times, most recently from e4d052a to caca00a Compare January 23, 2020 11:42
@codesome codesome force-pushed the mmap-chunks branch 3 times, most recently from 42f798b to 2746f72 Compare January 27, 2020 12:32
Signed-off-by: Ganesh Vernekar <[email protected]>
@codesome codesome changed the title [WIP] M-map full chunks of Head from disk M-map full chunks of Head from disk Jan 27, 2020
@codesome
Member Author

I have removed the WIP tag now and it is ready for review. The Windows tests are failing, and I have not yet found a way to get rid of the failure where files are removed in a defer in the unit tests.

I will run prombench for a few hours tomorrow on this to see the benefit and also stress test this PR.

@bwplotka bwplotka self-requested a review January 27, 2020 17:25
@codesome
Member Author

/prombench master

@prombot
Contributor

prombot commented Jan 28, 2020

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-6679 and master

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart v2.12.0

@krasi-georgiev krasi-georgiev self-assigned this Jan 28, 2020
Member

@bwplotka bwplotka left a comment

Nice! Amazing work, looking good in general. Benchmarks are solid as well 💪

Some comments, but mostly minor nits.

Also, maybe it's my own preference, but this PR is too long to review properly :( I think we will have better-quality reviews if we try to ship smaller functionalities one by one. I definitely got tired and dragged off to other duties when I reached about 70% of this PR. But again, this is my side note/suggestion, maybe for future features. (:

Thanks!

Review comments on tsdb/chunks/chunks.go, tsdb/chunks/head_chunks.go, and tsdb/head.go (outdated, resolved).
tsdb/head.go Outdated
if s.headChunk != nil {
	chunkRef, err := chunkReadWriter.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk.chunk)
	if err != nil {
		panic(err)
Member

panic? Can we do better?

Member Author

I forgot to put a TODO here. We should be able to get rid of the panic, I will look into it.

Member Author

There is a panic below too, in the same function, for the appender. And looking through the flow where this is called, a hard fail seems to be the pattern on the ingest path if it fails. I will keep it as a panic for now.

}
if err = w.dirFile.Sync(); err != nil {
return err
if err = dirFile.Sync(); err != nil {
Contributor

We don't currently have an fsync on any critical path, this could stall things for many many seconds.

Member Author

What would you suggest doing here?

Further review comments on tsdb/chunks/head_chunks.go and tsdb/head.go (outdated, resolved).
@codesome
Member Author

@geekodour helped me run another benchmark on this with a little less churn than default prombench; you can find links to the dashboards in prometheus-community#50. This gives an idea of the improvement for a saner Prometheus setup (note that prombench uses a 5s scrape interval, so the memory difference is visible earlier in both prombench instances than it would be on a typical setup with a 15s scrape interval).

@codesome
Member Author

codesome commented Jan 29, 2020

Benchmark results are here

  1. With high churn (which was run on this PR), with series ranging between 4.3M and 9.8M because of the churn, the RSS savings ranged between 10-22%.
    more_churn

  2. With slightly lower churn (which was run on another PR but with the same code), with series ranging between 2.2M and 3.2M because of the churn, the RSS savings ranged between 30-45%.
    less_churn

Some conclusions

  1. Memory reduction is as expected. With higher churn, the gain appears slightly lower because there are more incomplete chunks, and incomplete chunks stay in the head (incomplete = fewer than 120 samples). With lower churn, the gain was higher. On a Prometheus setup with very little churn, the gain would be even higher.
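The "incomplete chunk" notion above can be sketched as a simple cut condition: a head chunk is treated as complete around 120 samples, at which point it can be flushed to disk and m-mapped, while chunks with fewer samples stay in memory. The constant and function names here are illustrative, not the actual TSDB identifiers.

```go
package main

import "fmt"

// samplesPerChunk is the target number of samples per head chunk; a chunk
// reaching this count is eligible to be flushed and m-mapped.
const samplesPerChunk = 120

// isComplete reports whether a head chunk has enough samples to be cut,
// flushed, and m-mapped. Series that churn away before reaching this count
// leave their last chunk in memory, which is why high churn reduces the gain.
func isComplete(numSamples int) bool {
	return numSamples >= samplesPerChunk
}

func main() {
	fmt.Println(isComplete(119), isComplete(120)) // false true
}
```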

Some observations (graphs are from high churn benchmark):

  1. The number of Head chunks looks weird for this PR: it slightly grew compared to master at a constant rate. When it came down suddenly, there was a spike in the number of active series for a short while before it became normal (it shot up to 27.7M, then came back to the normal 5.6M). This needs to be investigated. (Update: this is fixed.)
    Screenshot from 2020-01-29 10-59-29

@codesome
Member Author

/prombench cancel

@prombot
Contributor

prombot commented Jan 29, 2020

Benchmark cancel is in progress.

Review comments on tsdb/head.go (outdated, resolved).
@krasi-georgiev
Contributor

lgtm

Member

@bwplotka bwplotka left a comment

Nice, thanks!

Generally looks good; some style and readability comments only. (:

Review comments on tsdb/block_test.go, tsdb/db_test.go, and tsdb/head.go (resolved).
Member

@bwplotka bwplotka left a comment

Cool, some style suggestions, and I responded to your questions.

I think it's slowly good to go (:

Review comments on tsdb/head.go (outdated, resolved).
Signed-off-by: Ganesh Vernekar <[email protected]>
@codesome codesome force-pushed the mmap-chunks branch 3 times, most recently from 0e9712e to c8e059b Compare May 6, 2020 10:31
Signed-off-by: Ganesh Vernekar <[email protected]>
@codesome
Member Author

codesome commented May 6, 2020

Windows test is finally fixed in eb2e6c1

Member

@bwplotka bwplotka left a comment

Let's try it out 💪

@codesome codesome merged commit d4b9fe8 into prometheus:master May 6, 2020
@bwplotka
Member

bwplotka commented May 6, 2020

🎉

@brancz
Member

brancz commented May 6, 2020

Incredible work! 🎉🎊


Successfully merging this pull request may close these issues.

Flush Head chunks to disk
8 participants