perf: investigate wal performance degradation #183

Closed

Ryanfsdf opened this issue Jul 16, 2019 · 1 comment

@Ryanfsdf (Contributor)

Enabling the WAL severely impacts the speed at which flushes and compactions occur (from my experimentation, roughly 4-8x). I would have expected some performance degradation, but not to the extent that I saw in my experiments.

This initially came up during my implementation of #179. When writes were stalled (which meant the WAL wasn't being written to), background compactions temporarily sped up by ~4-8x.

I further tested this by adding a few lines to compaction.go in runCompaction() after the iteration:

if c.flushing == nil {
	fmt.Printf("bytes compacted: %d\n", c.bytesIterated)
} else {
	fmt.Printf("bytes flushed: %d\n", c.bytesIterated)
}

and updating the sync benchmark to stop writing for a few seconds every couple of minutes, by adding the below code at the front of the for loop in sync.go:

if time.Now().Minute()%2 == 0 && time.Now().Second() < 5 {
	time.Sleep(1 * time.Second)
}

and running:

pebble sync -c 1000 -d 10m -w bench -m 20000

What I observed was that the bytes compacted: x output reported ~4-8x more bytes compacted/flushed per second during the periods when writes were stopped than when writes were running.

Another experiment I ran was to simply run the sync benchmark with the WAL disabled. The throughput difference was also ~4-8x. I found this by turning up the concurrency level until turning it up no longer affected the throughput. The benchmark with the WAL enabled would cap at ~7MB/s and the benchmark with the WAL disabled would cap at ~35MB/s (c=1000 for WAL enabled, c=10 for WAL disabled). Once again, I observed that flushes and compactions occurred much more quickly with the WAL disabled.
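For reference (not from the original experiment), here is roughly what the two configurations look like when driving Pebble directly. This is a minimal sketch assuming the current cockroachdb/pebble import path and option names (Options.DisableWAL, pebble.Sync, pebble.NoSync); the benchmark's actual flags may map onto these differently:

package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// WAL enabled (the default): each Set with pebble.Sync waits for the WAL
	// to be synced to disk, so flush/compaction I/O competes with frequent syncs.
	db, err := pebble.Open("bench-wal", &pebble.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := db.Set([]byte("key"), []byte("value"), pebble.Sync); err != nil {
		log.Fatal(err)
	}

	// WAL disabled: writes go straight to the memtable with no per-write sync,
	// which corresponds to the configuration that capped at ~35MB/s above.
	db2, err := pebble.Open("bench-nowal", &pebble.Options{DisableWAL: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db2.Close()
	if err := db2.Set([]byte("key"), []byte("value"), pebble.NoSync); err != nil {
		log.Fatal(err)
	}
}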

I have yet to try this on anything but my MacBook; I'll be testing this on other machines as well.

I suspect that the results may be due to:

  1. How my MacBook SSD behaves
  2. An implementation bottleneck somewhere in Pebble
  3. A limitation of writing the WAL to the same disk as the entries
  4. A bad experimental setup

Further investigation will be required.
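One way to separate (1) and (3) from a Pebble-level bottleneck (2) would be a standalone sync micro-benchmark on the same disk. A minimal sketch (hypothetical, not code from the benchmark) that appends small records and syncs after each write, roughly the WAL's pattern:

package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

func main() {
	// Append ~1KB records and sync after each one, approximating the WAL's
	// write pattern without any Pebble machinery in the way.
	f, err := os.OpenFile("synctest.dat", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove("synctest.dat")
	defer f.Close()

	buf := make([]byte, 1024)
	const n = 1000
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := f.Write(buf); err != nil {
			log.Fatal(err)
		}
		// Note: recent Go releases issue F_FULLFSYNC for Sync on Darwin
		// (see golang/go#26650), so this measures a true sync to media.
		if err := f.Sync(); err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%d synced appends in %s (%.2f ms/sync)\n",
		n, elapsed, elapsed.Seconds()*1000/float64(n))
}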

Ryanfsdf self-assigned this Jul 16, 2019
@petermattis (Collaborator)

I believe what is happening here is due to the MacBook SSD, which has extremely poor performance when syncing frequently. On a c5d.4xlarge AWS instance I see almost no difference in compaction throughput between the WAL enabled and disabled. In both cases it is 150-200 MB/s, which is what I'd expect from an SSD. On my MacBook, compaction throughput is 50-100 MB/s with the WAL disabled, but only 10 MB/s with the WAL enabled. But if I leave the WAL enabled and disable syncing, throughput rises to 50-100 MB/s again. I don't think there is anything to be done here.

Note that RocksDB does not do syncing properly on Darwin. A special fcntl (F_FULLFSYNC) is needed to actually sync data to the media (fsync does the wrong thing). See golang/go#26650 and https://github.com/golang/go/blob/a3b01440fef3d2833909f6651455924a1c86d192/src/internal/poll/fd_fsync_darwin.go#L12-L23.
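For anyone reproducing this outside of Go's os.File.Sync, a media-level sync on Darwin looks roughly like the following sketch using golang.org/x/sys/unix (not code from Pebble or the Go runtime):

//go:build darwin

package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

// fullSync issues F_FULLFSYNC, which asks the drive to flush its cache to
// stable media. Plain fsync(2) on Darwin only pushes data to the drive's cache.
func fullSync(f *os.File) error {
	_, err := unix.FcntlInt(f.Fd(), unix.F_FULLFSYNC, 0)
	return err
}

func main() {
	f, err := os.Create("data.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if _, err := f.WriteString("hello"); err != nil {
		log.Fatal(err)
	}
	if err := fullSync(f); err != nil {
		log.Fatal(err)
	}
}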

One takeaway is that we shouldn't be using MacBooks for write performance testing.
