
db: improve ingested flushable flushing logic #2266

Open

bananabrick opened this issue Jan 23, 2023 · 6 comments

Comments

bananabrick (Contributor) commented Jan 23, 2023

Presently, in #2212, we split a flush into chunks so that ingested flushables have an updated view of the LSM while flushing.

This is unnecessary. Given a queue of flushables where f_i is the flushable containing sstables at position i, we can determine the target level for the files associated with f_i once we've determined the target levels of all flushables up to f_{i-1}. f_0 is the base case: it can use the current version to determine its target level.

This will prevent a potential slowdown of flushes when ingested flushables are present in the flushable queue.
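A minimal sketch of the idea in Go (not Pebble's actual code; the types, the seven-level constant, and the overlap/target-level helpers are simplified placeholders): the overlay is seeded with the current version's files, and a single pass over the flushable queue assigns target levels, so f_i sees both the current version and the files already placed for f_0 through f_{i-1}.

```go
package flushsketch

// Simplified stand-ins for Pebble's real types; hypothetical, for illustration only.
type fileMetadata struct {
	smallest, largest string // user-key bounds of the sstable
}

type flushable struct {
	files []fileMetadata // sstables carried by an ingested flushable
}

// overlay starts from the files in the current version and is extended with
// every file we place, so later flushables see earlier assignments.
type overlay struct {
	levels [7][]fileMetadata // L0..L6
}

func (o *overlay) overlaps(level int, f fileMetadata) bool {
	for _, ex := range o.levels[level] {
		if f.smallest <= ex.largest && ex.smallest <= f.largest {
			return true
		}
	}
	return false
}

// pickTargetLevel descends from L1 and stops above the first level that
// contains overlapping (older) data, mirroring the usual ingest rule in
// simplified form. Overlap with L0 keeps the file in L0.
func (o *overlay) pickTargetLevel(f fileMetadata) int {
	if o.overlaps(0, f) {
		return 0
	}
	target := 0
	for level := 1; level < len(o.levels); level++ {
		if o.overlaps(level, f) {
			break
		}
		target = level
	}
	return target
}

// assignTargetLevels makes one pass over the flushable queue. f_0 is decided
// against the current version (already seeded into the overlay); each later
// flushable also sees the files placed for f_0..f_{i-1}.
func assignTargetLevels(o *overlay, queue []flushable) [][]int {
	levels := make([][]int, len(queue))
	for i, fl := range queue {
		for _, f := range fl.files {
			target := o.pickTargetLevel(f)
			levels[i] = append(levels[i], target)
			o.levels[target] = append(o.levels[target], f)
		}
	}
	return levels
}
```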

Jira issue: PEBBLE-145

jbowens (Collaborator) commented Feb 15, 2023

@bananabrick can this slip to 23.2? I know you've got a lot on your plate for 23.1.

bananabrick (Contributor, Author) commented

@jbowens No, this one can't. I want to do this before enabling flushable ingests.

jbowens (Collaborator) commented Feb 15, 2023

@bananabrick can you explain why it can't wait? My understanding is that this is purely an optimization and is not a regression over 22.2.

bananabrick (Contributor, Author) commented Feb 16, 2023

@jbowens Let's say we have memtables m1, m2 in the queue, followed by ingested flushable i1, followed by memtable m3, followed by ingested flushable i2, followed by memtable m4. So, the queue is m1,m2,i1,m3,i2,m4. If we try to flush this right now, we're going to incur 4 manifest writes + syncs. In 22.2, we won't have i1 and i2, just m1,m2,m3,m4, and flushing m1, m2, and m3 will incur exactly one manifest write + sync.

Maybe writing the memtables to disk as sstables and syncing them will dominate the cost of a few manifest writes + syncs, in which case this is okay. Not 100% sure.
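To make the arithmetic concrete, here's a hypothetical helper (not Pebble's flush code) that splits a flushable queue into the chunks that would each need their own version edit, i.e. manifest write + sync, when ingested flushables are flushed separately from memtables. Assuming m4 is the still-mutable memtable, the flush covers m1,m2,i1,m3,i2 and splits into the four chunks counted above.

```go
package flushsketch

import "strings"

// flushChunks splits the flushable queue into chunks, each of which would
// require its own version edit (manifest write + sync): runs of memtables
// coalesce into one chunk, while every ingested flushable is its own chunk.
// Names prefixed with "i" stand in for ingested flushables; this is an
// illustration, not Pebble's actual flush logic.
func flushChunks(queue []string) [][]string {
	var chunks [][]string
	var memtables []string
	for _, f := range queue {
		if strings.HasPrefix(f, "i") {
			if len(memtables) > 0 {
				chunks = append(chunks, memtables) // run of memtables: one edit
				memtables = nil
			}
			chunks = append(chunks, []string{f}) // each ingest: its own edit
		} else {
			memtables = append(memtables, f)
		}
	}
	if len(memtables) > 0 {
		chunks = append(chunks, memtables)
	}
	return chunks
}

// flushChunks([]string{"m1", "m2", "i1", "m3", "i2"})
// => [[m1 m2] [i1] [m3] [i2]] — four chunks, hence four manifest writes + syncs.
```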

bananabrick (Contributor, Author) commented

Maybe I could just enable it and only do this issue if we see a regression in any benchmarks that perform Pebble ingestions.

jbowens (Collaborator) commented Feb 27, 2023

> So, the queue is m1,m2,i1,m3,i2,m4. If we try to flush this right now, we're going to incur 4 manifest writes + syncs. In 22.2, we won't have i1 and i2, just m1,m2,m3,m4.

In 22.2, wouldn't the same sequence of ingests result in:

  1. flush of m1+m2
  2. ingest of i1
  3. flush of m3
  4. ingest of i2
  5. flush of m4

with corresponding manifest writes and syncs for each?
