Commit
diff: track total cost of search and bail if high
This is the last piece of the puzzle to get somewhat comparable to GNU diff performance without implementing all of its tricks - although this one is also used by GNU diff, in its own way. It brings down a diff which still takes over a minute with the previous commit to under a second.

Before

> hyperfine -N -i --output=pipe --warmup 2 \
  './target/release/diffutils diff /home/kov/Projects/CompilingProfiling/b.cpp /home/kov/Projects/CompilingProfiling/c.cpp'
Benchmark 1: ./target/release/diffutils diff /home/kov/Projects/CompilingProfiling/b.cpp /home/kov/Projects/CompilingProfiling/c.cpp
  Time (mean ± σ):     67.717 s ±  0.773 s    [User: 67.037 s, System: 0.079 s]
  Range (min … max):   67.371 s … 69.903 s    10 runs

After

> hyperfine -N -i --output=pipe --warmup 2 \
  './target/release/diffutils diff /home/kov/Projects/CompilingProfiling/b.cpp /home/kov/Projects/CompilingProfiling/c.cpp'
Benchmark 1: ./target/release/diffutils diff /home/kov/Projects/CompilingProfiling/b.cpp /home/kov/Projects/CompilingProfiling/c.cpp
  Time (mean ± σ):     595.0 ms ±   2.1 ms    [User: 531.8 ms, System: 43.0 ms]
  Range (min … max):   591.8 ms … 598.1 ms    10 runs

It basically keeps track of how much work we have done overall for a diff job and enables giving up completely on trying to find ideal split points if the cost implies we had to trigger the "too expensive" heuristic too often. From that point forward it only does naive splitting of the work.

This should not generate diffs which are much worse than doing the diagonal search, as it should only trigger in cases in which the files are so different it won't find good split points anyway.
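The budget idea described above can be illustrated with a minimal sketch. It is not the code from this commit: `SearchBudget`, `Split`, `choose_split`, the cost unit and the limit are all hypothetical names and values chosen for illustration, and the diagonal search itself is abstracted as a closure that reports how much work it did.

```rust
/// Hypothetical result type: where to split the two inputs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Split {
    /// Split point produced by the (possibly truncated) diagonal search.
    Searched(usize, usize),
    /// Midpoint split chosen without doing any search.
    Naive(usize, usize),
}

/// Hypothetical accumulator for the total search cost of one diff job.
struct SearchBudget {
    total_cost: usize,
    limit: usize,
}

impl SearchBudget {
    fn new(limit: usize) -> Self {
        Self { total_cost: 0, limit }
    }

    /// Record the work done by one search.
    fn charge(&mut self, cost: usize) {
        self.total_cost = self.total_cost.saturating_add(cost);
    }

    /// True once the accumulated cost has gone past the limit.
    fn exhausted(&self) -> bool {
        self.total_cost > self.limit
    }
}

/// Pick a split point for a sub-problem of the given lengths.
/// `search` stands in for the diagonal search and returns `(x, y, cost)`.
/// Once the budget is exhausted the search is never called again.
fn choose_split<F>(
    budget: &mut SearchBudget,
    left_len: usize,
    right_len: usize,
    search: F,
) -> Split
where
    F: FnOnce() -> (usize, usize, usize),
{
    if budget.exhausted() {
        // Assumption: "naive splitting" is modelled here as halving both sides.
        return Split::Naive(left_len / 2, right_len / 2);
    }
    let (x, y, cost) = search();
    budget.charge(cost);
    Split::Searched(x, y)
}
```

Because the budget is shared across the whole diff job rather than reset per sub-problem, a pair of very different files stops paying for searches after a bounded amount of total work, which is the effect the timings above reflect.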
This is another case in which GNU diff's additional work (hashing, splitting large chunks of inclusion / deletion off from the diff work, and trying harder to find ideal splits) seems to make it perform slightly worse:

> hyperfine -N -i --output=pipe --warmup 2 \
  'diff /home/kov/Projects/CompilingProfiling/b.cpp /home/kov/Projects/CompilingProfiling/c.cpp'
Benchmark 1: diff /home/kov/Projects/CompilingProfiling/b.cpp /home/kov/Projects/CompilingProfiling/c.cpp
  Time (mean ± σ):      2.412 s ±  0.009 s    [User: 2.361 s, System: 0.032 s]
  Range (min … max):    2.402 s …  2.428 s    10 runs

That said, GNU diff probably still generates better diffs, not because of this but because of its post-processing of the results, which tries to create more hunks with nearby changes staying close to each other. We do not do that (and did not do it before this change either).