Switch to batched fsync by default #3492
Conversation
Are the test failures new? Would you like me to investigate?
I'm investigating, please expect an update shortly.
Yes, those test failures are new. Please note that I turned on the default for all platforms, kind of out of curiosity. (The commit message of 64204c8 is a bit misleading there, it was a spur of the moment kind of idea to simply turn it on in
That would be fantastic!
Thank you!
This change is quite significant. Can we plan to switch the default only after the 2.34 release, so we have time to do a thorough performance analysis across lots of environments? I think it would be good to take @neerajsi-msft's changes for 2.34, but off by default for a release cycle.
I agree.
I guess that's a safer approach, but let's see whether we can identify the bugs that prevent us from enabling this globally, anyway. It would be good to flesh this out before v2.34.
I submitted a fix to the git mailing list over here: https://lore.kernel.org/git/[email protected]/ I tested on Ubuntu locally and it passed. It would be great if you can merge the fix and see if the CI passes here. Thanks!
Thank you for this quick turnaround! I applied the patches and pushed the branch. Let's see what CI thinks. About the patch series, I fear that you will need to rephrase the commit messages because the original patch made it into An alternative would be to ask Junio to kick the topic out of I'll reply on the Git mailing list with the same content.
Hmm. @neerajsi-msft could you have a look? I saw a potential crash in |
@dscho: You might need to kill the CI jobs. There is some problem where we're looping and creating tons of files inside the git directory. FYI, the problem doesn't repro in the upstream git when built with the gfw sdk. So there's probably something specific to an interaction with downstream patches.
Thank you!
BTW if you cherry-pick the commit making
Yeah, I did enable batch mode by default. I've narrowed it down to a problem in tmp_objdir_migrate. For some reason we're continuously trying to create the top-level object file path at increasing levels of nesting. I'm pretty close to a root cause.
I have a root cause! In tmp_objdir_create, we do:
This calls
This breaks strbuf_addf as called by migrate_paths. |
Looks like my PR passed. |
Excellent work!
Excellent! Thank you for your hard work, it is much appreciated! I fast-forwarded to your fix. Now, I am considering the following options:
I am currently leaning toward option 2, without the experimental option (essentially because it would be a lot of work for me)... Thoughts? Did I forget any valid option?
I don't like the concept of planned changes to the default value of a config option between RCs. If it's been changed before RC0 and feedback from RC0 shows it's causing trouble, changing it back is fine, but releasing an RC with something we intend not to be in the final release seems to go against the concept of release candidates.
The tmp_objdir API provides the ability to create temporary object directories, but was designed with the goal of having subprocesses access these object stores, followed by the main process migrating objects from it to the main object store or just deleting it. The subprocesses would view it as their primary datastore and write to it.

Here we add the tmp_objdir_replace_primary_odb function that replaces the current process's writable "main" object directory with the specified one. The previous main object directory is restored in either tmp_objdir_migrate or tmp_objdir_destroy.

For the --remerge-diff use case, add a new `will_destroy` flag in `struct object_database` to mark ephemeral object databases that do not require fsync durability.

Add 'git prune' support for removing temporary object databases, and make sure that they have a name starting with tmp_ and containing an operation-specific name.

Based-on-patch-by: Elijah Newren <[email protected]>
Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
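The commit message above describes a swap-and-restore pattern: the writable primary object directory is temporarily replaced, then either migrated back or thrown away. As a rough illustration only (a hypothetical Python analogue, not Git's C implementation; all class and method names here are invented), the lifecycle could be sketched like this:

```python
import os
import shutil
import tempfile

class ObjectStore:
    """Toy stand-in for Git's object database: the first entry in
    `dirs` is the writable "primary" directory."""

    def __init__(self, primary):
        os.makedirs(primary, exist_ok=True)
        self.dirs = [primary]

    def write_object(self, name, data):
        # New objects always go to the current primary directory.
        with open(os.path.join(self.dirs[0], name), "w") as f:
            f.write(data)

class TmpObjdir:
    """Sketch of the replace-primary / migrate / destroy lifecycle."""

    def __init__(self, store):
        self.store = store
        self.path = tempfile.mkdtemp(prefix="tmp_objdir-")
        self.prev_primary = None

    def replace_primary(self):
        # Swap the writable primary for the temporary directory.
        self.prev_primary = self.store.dirs[0]
        self.store.dirs[0] = self.path

    def migrate(self):
        # Move objects into the old primary, then restore it.
        for name in os.listdir(self.path):
            shutil.move(os.path.join(self.path, name),
                        os.path.join(self.prev_primary, name))
        self.store.dirs[0] = self.prev_primary
        os.rmdir(self.path)

    def destroy(self):
        # Ephemeral use (e.g. --remerge-diff): restore and delete.
        self.store.dirs[0] = self.prev_primary
        shutil.rmtree(self.path)
```

The key property is that writes issued while the replacement is active land in the temporary directory, and the previous primary comes back in either the migrate or the destroy path.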
When creating a subprocess with a temporary ODB, we set the GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not to update refs, since the tmp-objdir may go away. Introduce a similar mechanism for in-process temporary ODBs when we call tmp_objdir_replace_primary_odb. Now both mechanisms set the disable_ref_updates flag on the odb, which is queried by the ref_transaction_prepare function.

Note: This change adds an assumption that the state of the_repository is relevant for any ref transaction that might be initiated. Unwinding this assumption should be straightforward by saving the relevant repository to query in the transaction or the ref_store.

Peff's test case was invoking ref updates via the cachetextconv setting. That particular code silently does nothing when a ref update is forbidden. See the call to notes_cache_put in fill_textconv, where errors are ignored.

Reported-by: Jeff King <[email protected]>
Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
* ns/tmp-objdir:
  tmp-objdir: disable ref updates when replacing the primary odb
  tmp-objdir: new API for creating temporary writable databases
Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename the 'state' variable to 'bulk_checkin_state', since we will later be adding 'bulk_fsync_state'. This also makes the variable easier to find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate static variable. Doing this avoids resetting the variable in finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we seem to unintentionally disable the plugging functionality the first time a new packfile must be created due to packfile size limits. While disabling the plugging state only results in suboptimal behavior for the current code, it would be fatal for the bulk-fsync functionality later in this patch series.

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
When adding many objects to a repo with core.fsyncObjectFiles set to true, the cost of fsync'ing each object file can become prohibitive. One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive. Fortunately, Windows and macOS offer mechanisms to write data from the filesystem page cache without initiating a hardware flush. Linux has the sync_file_range API, which issues a pagecache writeback request reliably after version 5.2.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that batches up hardware flushes. It hooks into the bulk-checkin plugging and unplugging functionality and takes advantage of tmp-objdir.

When the new mode is enabled, do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction, when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the case today, but may be a good extension to those components.

On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc.), such as NTFS, HFS+, or XFS, we would expect the fsync to trigger a journal writeout, so that this sequence is enough to ensure that the user's data is durable by the time the git command returns.

This change also updates the macOS code to trigger a real hardware flush via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on macOS there was no guarantee of durability, since a simple fsync(2) call does not flush any hardware caches.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11. This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
false                 | 0.06  | 0.35  | 0.61
true                  | 1.88  | 11.18 | 2.47
batch                 | 0.15  | 0.41  | 1.53

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
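The two numbered sequences above (write each object with only a writeback request, then one real flush plus renames at unplug time) can be sketched end to end. This is an illustration of the batching idea only, not Git's C implementation: the function name is invented, and a plain `os.fsync` stands in for the platform-specific pagecache-writeback and hardware-flush primitives.

```python
import os
import tempfile

def write_loose_objects_batched(objdir, objects):
    """Sketch of the 'batch' sequence: stage all objects in a
    tmp-objdir, flush the hardware cache once via a dummy file,
    then rename everything to its final name."""
    tmpdir = tempfile.mkdtemp(prefix="tmp_objdir-", dir=objdir)
    tmp_paths = []
    for name, data in objects.items():
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as f:
            f.write(data)
            # In the real mode this would be a pagecache writeback
            # request (e.g. sync_file_range on Linux), not an fsync.
        tmp_paths.append((path, os.path.join(objdir, name)))

    # One full flush for the whole batch, issued against a dummy file.
    dummy = os.path.join(tmpdir, "flush-dummy")
    fd = os.open(dummy, os.O_WRONLY | os.O_CREAT)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    os.unlink(dummy)

    # Make the objects visible under their final names.
    for tmp, final in tmp_paths:
        os.rename(tmp, final)
    os.rmdir(tmpdir)
```

The durability argument is that on journaling filesystems the renames reach the journal, and the single fsync has already pushed the staged data through the hardware cache, so per-object flushes are unnecessary.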
This commit adds a win32 implementation for fsync_no_flush that is called git_fsync. The 'NtFlushBuffersFileEx' function being called is available since Windows 8. If the function is not available, we return -1 and Git falls back to doing a full fsync.

The operating system is told to flush data only, without a hardware flush primitive. A later full fsync will cause the metadata log to be flushed and then the disk cache to be flushed on NTFS and ReFS. Other filesystems will treat this as a full flush operation.

I added a new file here for this system call so as not to conflict with downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The update-index functionality is used internally by 'git stash push' to set up the internal stashed commit.

This change enables bulk-checkin for the update-index infrastructure to speed up adding new objects to the object database by leveraging the pack functionality and the new bulk-fsync functionality.

There is some risk with this change, since under batch fsync, the object files will not be available until the update-index is entirely complete. This usage is unlikely, since any tool invoking update-index and expecting to see objects would have to synchronize with the update-index process after passing it a file path.

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The unpack-objects functionality is used by fetch, push, and fast-import to turn the transferred data into object database entries when there are fewer objects than the 'unpacklimit' setting. By enabling bulk-checkin when unpacking objects, we can take advantage of batched fsyncs.

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Add test cases to exercise batch mode for:
* 'git add'
* 'git stash'
* 'git update-index'
* 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper lib-unique-files.sh. The goal of this library is to create a tree of files that have different oids from any other files that may have been created in the current test repo. This helps us avoid missing validation of an object being added due to it already being in the repo.

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
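The idea behind lib-unique-files.sh is that each generated file's content must be globally unique, so its blob oid cannot already exist in the test repo. A small sketch of that idea (a hypothetical Python analogue; the actual helper is a shell library, and this function name is invented):

```python
import os
import uuid

def write_unique_files(root, ndirs, nfiles_per_dir):
    """Create a tree of files whose contents each embed a fresh UUID,
    so every file hashes to a blob oid that cannot collide with any
    object already present in the repository under test."""
    paths = []
    for d in range(ndirs):
        dirpath = os.path.join(root, f"dir{d}")
        os.makedirs(dirpath, exist_ok=True)
        for f in range(nfiles_per_dir):
            path = os.path.join(dirpath, f"file{f}")
            with open(path, "w") as fh:
                # Unique content => unique blob oid, so a later check
                # that the object was added cannot be a false positive.
                fh.write(f"{uuid.uuid4()}\n")
            paths.append(path)
    return paths
```

Without this uniqueness guarantee, a test could "pass" merely because an identical blob was created by an earlier test in the same repo.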
For years now, Git for Windows has turned on the mode where loose object files are synchronized to disk immediately. This was done because several users reported problems after Windows crashed: there were object files on disk, in the correct location and with the correct file size, but instead of the zlib-compressed Git objects, the files consisted of NUL bytes.
However, contrary to my initial findings, the performance impact turned out to be quite measurable once users started working with larger monorepos.
Happily, @neerajsi-msft implemented a batched fsync mode. Instead of synchronizing immediately after writing the file, we can now choose to synchronize once, after all the loose object files are written by the current Git process.
This strikes a much better balance between safety and speed than the mode that Git for Windows uses.
So let's integrate the patches implementing that mode, and switch to that mode as the new default.