internal/record: use direct I/O for the WAL and MANIFEST #1159

Open
jbowens opened this issue Jun 4, 2021 · 5 comments

Comments

@jbowens
Collaborator

jbowens commented Jun 4, 2021

WAL and MANIFEST might be good candidates for using direct I/O. The LogWriter already handles organizing writes into contiguous blocks. I'm not sure what impact, if any, on performance direct I/O would have.
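Direct I/O generally requires that the buffer address, the file offset, and the write length all be multiples of the device's logical block size, which is why a writer that already emits whole blocks is a natural fit. A minimal sketch of obtaining an aligned buffer in Go follows. The names `alignedBlock`, `roundUp`, `isAligned`, and the 4096-byte `blockSize` are assumptions for illustration and are not taken from Pebble; the sketch also relies on the current Go runtime not moving heap allocations.

```go
package main

import (
	"fmt"
	"unsafe"
)

// blockSize is an assumed alignment. Real devices report their own
// logical block size, which may differ.
const blockSize = 4096

// alignedBlock returns a slice of length size whose first byte sits at an
// address that is a multiple of align. align must be a power of two.
func alignedBlock(size, align int) []byte {
	// Over-allocate by align bytes so an aligned start always exists.
	buf := make([]byte, size+align)
	addr := uintptr(unsafe.Pointer(&buf[0]))
	mask := uintptr(align - 1)
	off := int((uintptr(align) - addr&mask) & mask)
	return buf[off : off+size : off+size]
}

// roundUp rounds n up to the next multiple of align (a power of two), as
// would be needed to pad a partial final block before a direct write.
func roundUp(n, align int) int {
	return (n + align - 1) &^ (align - 1)
}

// isAligned reports whether b starts at an address that is a multiple of align.
func isAligned(b []byte, align int) bool {
	return len(b) > 0 && uintptr(unsafe.Pointer(&b[0]))&uintptr(align-1) == 0
}

func main() {
	b := alignedBlock(roundUp(5000, blockSize), blockSize)
	fmt.Println(len(b), isAligned(b, blockSize))
}
```

On Linux, a buffer like this would then be written to a file opened with `O_DIRECT`; that step is omitted here because it is platform- and filesystem-specific.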

I do think it would allow us to retry failed syncs during WAL and MANIFEST writes (see the Fatalf in DB.Apply and the Fatalfs in logAndApply).

Currently, errors in these codepaths are fatal because fsyncs of OS-buffered files cannot be retried. The OS marks errored buffers as clean, meaning a retried fsync will not sync the buffer and the file's contents remain unchanged regardless of a retry. https://wiki.postgresql.org/wiki/Fsync_Errors
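To illustrate why retries become possible when the page cache is bypassed: the application still holds the only copy of the block, so it can re-issue the same write at the same offset. The following is a rough sketch under that assumption, not Pebble's code. `writeBlockWithRetry` and `flakyFile` are hypothetical names, and `flakyFile` merely simulates a file whose first few writes fail.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// writeBlockWithRetry re-issues the same block at the same offset until a
// write succeeds or the attempts are exhausted. The caller keeps ownership
// of block, so nothing is lost between attempts.
func writeBlockWithRetry(w io.WriterAt, block []byte, off int64, attempts int) error {
	err := errors.New("no write attempted")
	for i := 0; i < attempts; i++ {
		if _, err = w.WriteAt(block, off); err == nil {
			return nil
		}
	}
	return fmt.Errorf("write at offset %d failed after %d attempts: %w", off, attempts, err)
}

// flakyFile is a hypothetical in-memory stand-in for a file opened with
// direct I/O. It fails the first `failures` writes, then succeeds.
type flakyFile struct {
	failures int
	data     []byte
}

func (f *flakyFile) WriteAt(p []byte, off int64) (int, error) {
	if f.failures > 0 {
		f.failures--
		return 0, errors.New("simulated ENOSPC")
	}
	if need := int(off) + len(p); need > len(f.data) {
		f.data = append(f.data, make([]byte, need-len(f.data))...)
	}
	copy(f.data[off:], p)
	return len(p), nil
}

func main() {
	f := &flakyFile{failures: 2}
	err := writeBlockWithRetry(f, []byte("record"), 0, 3)
	fmt.Println(err, string(f.data))
}
```

In practice a retry would only make sense after the cause is addressed, for example after freeing space, and whether a given device error is safe to retry is a separate question this sketch does not answer.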

This was motivated by thinking about @sumeerbhola's automated ballast file suggestion for detecting out-of-disk conditions. It's a really nice solution. The one sticking point is that an ENOSPC may occur during fsync. I'm not sure under what conditions ENOSPC may surface from fsync rather than the preceding write, but I suspect it may happen when the filesystem needs to allocate new metadata blocks. I'm not sure, but maybe on copy-on-write filesystems all block allocations happen during fsync?

Jira issue: PEBBLE-211

@petermattis
Collaborator

> WAL and MANIFEST might be good candidates for using direct I/O. The LogWriter already handles organizing writes into contiguous blocks. I'm not sure what impact, if any, on performance direct I/O would have.

I did some testing a while ago which showed that direct I/O writes had similar sync latency to recycled log fsyncs. The latency was actually a few percent better for direct I/O, so there would be a perf win, but not a dramatic one. Direct I/O also uses less CPU. "Direct I/O writes: the best way to improve your credit score" is an interesting recent blog post on this topic.

@petermattis
Collaborator

#41 (comment) is a useful comment on the intricacies of direct I/O.

@jbowens
Collaborator Author

jbowens commented Sep 28, 2022

I've seen it mentioned a few times that in some systems an fsync on a file waits for all dirty pages to be flushed, not just those of the file that the fsync was requested on. There's some reference to it here. Direct I/O for the WAL seems like it would insulate WAL commits from latency spikes due to a glut of dirty pages elsewhere.

@sumeerbhola
Collaborator

Also see the discussion, and the links to related discussions, in cockroachdb/cockroach#88442 (comment).

@jbowens
Collaborator Author

jbowens commented Mar 23, 2023

There's an interaction with disk stall detection that may have been obvious to others but eluded me. When we write through the page cache, I/O may occur outside the context of a Cockroach syscall. If a background kernel thread is performing the writeback and stalls, Cockroach is oblivious. Direct I/O would ensure all of Cockroach's I/O is directly timed.

Maybe this distinction is insignificant, because eventually Cockroach should always issue a timed fsync, which should block on the in-progress writeback.
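As a rough illustration of what "directly timed" I/O means here, the sketch below wraps an operation with a timer that fires if the call has not returned within a threshold. It is a hypothetical example, not Cockroach's or Pebble's actual stall detector. `timedOp` and `simulate` are made-up names, and the sleep stands in for a slow write or fsync.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// timedOp runs op and invokes onStall (on another goroutine) if op has not
// returned within threshold. It returns op's duration and error. Only work
// performed inside op is covered; writeback done later by a kernel thread
// would not be observed.
func timedOp(threshold time.Duration, op func() error, onStall func()) (time.Duration, error) {
	start := time.Now()
	timer := time.AfterFunc(threshold, onStall)
	defer timer.Stop()
	err := op()
	return time.Since(start), err
}

// simulate runs a fake I/O operation lasting opMillis and reports whether
// it was flagged as stalled under a threshold of thresholdMillis.
func simulate(opMillis, thresholdMillis int) bool {
	var stalled int32
	_, _ = timedOp(
		time.Duration(thresholdMillis)*time.Millisecond,
		func() error {
			// Stands in for a slow write or fsync.
			time.Sleep(time.Duration(opMillis) * time.Millisecond)
			return nil
		},
		func() { atomic.StoreInt32(&stalled, 1) },
	)
	return atomic.LoadInt32(&stalled) == 1
}

func main() {
	fmt.Println("slow op flagged:", simulate(50, 10))
	fmt.Println("fast op flagged:", simulate(1, 200))
}
```

With buffered writes, the `op` passed to such a wrapper can return quickly while the real device I/O happens later, which is the blind spot described above; a subsequent timed fsync would still catch it.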

Status: Backlog