Skip to content
/ patch Public

line based patch, input is a unified diff

License

Notifications You must be signed in to change notification settings

hannesm/patch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

59 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Patch - apply your unified diffs in pure OCaml

The loosely specified diff file format is widely used for transmitting differences of line-based information. The motivating example is opam, which is able to validate updates being cryptographically signed (e.g. conex) by providing a unified diff.

The test-based infered specification implemented in this library is the following grammar.

decimal := [0-9]+
any := any character except newline

filename := "/dev/null" | any except tab character
file := filename "\t" any "\n"
mine := "--- " file
theirs := "+++ " file

no_newline = "\ No newline at end of file"
hunk_line_prefix := " " | "-" | "+"
hunk_line := hunk_line_prefix any | no_newline
range := decimal "," decimal | decimal
hunk_hdr := "@@ -" range " + " range " @@\n"
hunk := hunk_hdr line+

diff := mine theirs hunk+

In addition, some support for the git diff format is available, which contains diff --git a/nn b/nn as separator, prefixes filenames with a/ and b/, and may contain extra headers, especially for pure renaming: rename from <path> followed by rename to <path>. The git diff documentation also mentions that a diff file itself should be an atomic operation, thus all - files corrspond to the files before applying the diff (since patch only does single diff operations, and requires the old content as input). You have to ensure to provide the correct data yourself.

A diff consists of a two-line header containing the filenames (or "/dev/null" for creation and deletion) followed by the actual changes in hunks. A complete diff file is represented by a list of diff elements. The OCaml types below, provided by this library, represent mine and theirs as operation (edit, delete, create). Since a diff is line-based, if the file does not end with a newline character, the line in the diff always contains a newline, but the special marker no_newline is added to the diff. The range information carries start line and chunk size in the respective file, with two side conditions: if the chunk size is 0, the start line refers to after which the chunk should be added or deleted, and if the chunk size is omitted (including the comma), it is set to 1. NB from practical experiments, only "+1" and "-1" are supported.

type operation =
  | Edit of string * string
  | Delete of string
  | Create of string
  | Rename_only of string * string

type hunk (* positions and contents *)

type t = {
  operation : operation ;
  hunks : hunk list ;
  mine_no_nl : bool ;
  their_no_nl : bool ;
}

In addition to parsing a diff and applying it, support for generating a diff from old and new file contents is also provided.

Shortcomings

The function patch assumes that the patch applies cleanly, and does not check this assumption. Exceptions may be raised if this assumption is violated. The git diff format allows further features, such as file permissions, and also a "copy from / to" header, which I was unable to spot in the wild.

Installation

opam install patch

Documentation

The API documentation can be browsed online.