add differential bcmap compression code, cleanup difflib
fkaelberer committed May 19, 2014
1 parent 59373ce commit f7d3330
Showing 6 changed files with 1,194 additions and 468 deletions.
51 changes: 50 additions & 1 deletion external/cmapscompress/README.md
@@ -1,6 +1,6 @@
# Quick notes about binary CMap format (bcmap)

The format is designed to package some information from the CMap files located at external/cmap. Please notice for size optimization reasons, the original information blocks can be changed (split or joined) and items in the blocks can be swaped.
The format is designed to package some information from the CMap files located at external/cmap. Please notice for size optimization reasons, the original information blocks can be changed (split or joined) and items in the blocks can be swapped.

The data is stored in binary format in network byte order (big-endian).

@@ -15,6 +15,55 @@ The following primitives used during encoding of the file:
- signed fixed number (SB[n]) – similar to the SN, but it represents a signed number that is stored in B[n]
- string (S) – the string is encoded as a sequence of bytes. First comes the length in characters encoded as UN, then the UTF-16 characters, each encoded as UN.

# Differential compression

The contents of each CMap file are stored either normally or differentially. In the latter case, a second CMap file (the 'base file') is needed to decode the file.

The first record in each file indicates if the file is stored normally or differentially.
It is a *baseFileName* string (S) which
- is empty ('') if the file is stored normally, or
- contains the file name of the base file (without path or extension) if it is stored differentially.

The second record, *contentSize*, is an unsigned number (UN) which contains
- the number of bytes to follow if the file is stored normally ('the file contents'), or
- the number of bytes to reconstruct via differential compression (see below).

In either case, the (possibly decoded) file contents are then structured as described in the [file structure](#file-structure) section.
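
As a rough illustration of the two leading records, the sketch below reads *baseFileName* and *contentSize* from a byte array. The `Reader` type, `readHeader`, and the result fields are hypothetical names, not part of the format. Note in particular that `readUN` here is a one-byte stand-in that only handles values below 128; the real UN encoding is the one defined in the primitives section.

```javascript
// Hypothetical minimal reader over a byte array; names are illustrative only.
function Reader(bytes) {
  this.bytes = bytes;
  this.pos = 0;
}

// Stand-in for UN: reads ONE byte, so it only handles values below 128.
// Replace with the real UN decoding from the primitives section.
Reader.prototype.readUN = function () {
  if (this.pos >= this.bytes.length) {
    throw new Error('unexpected end of data');
  }
  return this.bytes[this.pos++];
};

// String (S): length in characters as UN, then each UTF-16 character as UN.
Reader.prototype.readString = function () {
  var length = this.readUN();
  var s = '';
  for (var i = 0; i < length; i++) {
    s += String.fromCharCode(this.readUN());
  }
  return s;
};

// Reads the two leading records: baseFileName (S) and contentSize (UN).
function readHeader(reader) {
  var baseFileName = reader.readString();
  var contentSize = reader.readUN();
  return {
    baseFileName: baseFileName,
    contentSize: contentSize,
    isDifferential: baseFileName !== '',
    dataStart: reader.pos // first byte after both records
  };
}

// Normal file: empty name, contentSize = 3, then three content bytes.
var normal = readHeader(new Reader([0, 3, 10, 20, 30]));
// Differential file: name "H" (length 1, code 72), contentSize = 5.
var diff = readHeader(new Reader([1, 72, 5]));
console.log(normal, diff);
```

An empty *baseFileName* is the only signal that distinguishes the two storage modes, so the sketch derives `isDifferential` from it rather than from a separate flag.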

### Decoding differential data

If a CMap file (let's call it *A*) is stored differentially, its contents are reconstructed from the data in *A* and from the contents of the base file (which we shall call *B*).
The records that follow alternate between the two instruction types below, starting with *copy*.

A **copy**-type instruction is specified by
- startDelta as UN
- length as UN

and instructs the decoder to append *length* bytes from *B* to the contents, where *startDelta* gives the start position as an offset from the end of the previously copied range. (The previous end is initialized to the start of the content in *B*, i.e., the position after *baseFileName* and *contentSize*.)

An **insert**-type instruction is specified by
- length as UN

and instructs the decoder to read the following *length* bytes from *A* and append them to the contents.
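
To make the relative addressing of *startDelta* concrete, here is a small sketch that applies already-parsed instructions to the contents of a base file. The in-memory instruction objects (`type`, `startDelta`, `length`, `bytes`) and the function name are assumptions made for illustration; the file itself stores these values as UN records and raw bytes.

```javascript
// Applies parsed copy/insert instructions to the contents of a base file.
// The instruction objects are a hypothetical in-memory form, not the file format.
function applyInstructions(base, ops) {
  var contents = [];
  var previousEnd = 0; // start of content in the base file
  ops.forEach(function (op) {
    if (op.type === 'copy') {
      // startDelta is relative to the end of the previously copied range.
      var start = previousEnd + op.startDelta;
      var end = start + op.length;
      if (end > base.length) {
        throw new Error('copy range exceeds base contents');
      }
      contents = contents.concat(base.slice(start, end));
      previousEnd = end;
    } else {
      // insert: literal bytes taken from the differential file itself
      contents = contents.concat(op.bytes);
    }
  });
  return contents;
}

var base = [10, 11, 12, 13, 14, 15, 16, 17];
var result = applyInstructions(base, [
  { type: 'copy', startDelta: 1, length: 2 }, // base[1..3): 11, 12; previousEnd = 3
  { type: 'insert', bytes: [99] },
  { type: 'copy', startDelta: 2, length: 3 }  // start = 3 + 2 = 5: 15, 16, 17
]);
console.log(result); // the values 11, 12, 99, 15, 16, 17
```

Because *startDelta* is unsigned and relative to the previous end, copies in this sketch can only move forward through the base contents.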

It may happen that file *B* itself is stored differentially and depends on a further file. In this case, *B* has to be restored before restoring *A*. The following pseudocode performs the decoding of *A* once the contents of *B* are available:
```
var contents = '';   // reconstructed file contents
var previousEnd = 0; // position after *baseFileName* and *contentSize* in baseFile
for (var copy = true; contents.length < contentSize; copy = !copy) {
  if (copy) {
    // copy: append `length` bytes from B, starting startDelta past the previous end
    var start = previousEnd + A.readUN();
    var length = A.readUN();
    contents.append(B.subarray(start, start + length));
    previousEnd = start + length;
  } else {
    // insert: append the next `length` bytes read directly from A
    var length = A.readUN();
    contents.append(A.readBytes(length));
  }
}
```
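
The pseudocode assumes that the contents of *B* are already available. One possible way to handle chains of base files is to resolve them recursively and cache each result, as sketched below. The `files` map, its `contents` and `decode` fields, and the cycle check are all assumptions made for this example; `decode` stands in for the copy/insert loop above.

```javascript
// `files` maps a file name to a hypothetical parsed form:
//   { baseFileName: '',  contents: [...] }              stored normally
//   { baseFileName: 'X', decode: function (base) {} }   stored differentially
function resolveContents(name, files, cache, inProgress) {
  cache = cache || {};
  inProgress = inProgress || {};
  if (Object.prototype.hasOwnProperty.call(cache, name)) {
    return cache[name];
  }
  if (inProgress[name]) {
    throw new Error('cyclic base file reference: ' + name);
  }
  var file = files[name];
  if (!file) {
    throw new Error('missing CMap file: ' + name);
  }
  var contents;
  if (file.baseFileName === '') {
    contents = file.contents;
  } else {
    // Restore the base file first, then decode this file against it.
    inProgress[name] = true;
    var base = resolveContents(file.baseFileName, files, cache, inProgress);
    contents = file.decode(base);
    inProgress[name] = false;
  }
  cache[name] = contents;
  return contents;
}

// A depends on B, which depends on C; the decode functions are toy stand-ins.
var files = {
  C: { baseFileName: '', contents: [1, 2, 3, 4] },
  B: { baseFileName: 'C', decode: function (base) { return base.slice(1, 3).concat([7]); } },
  A: { baseFileName: 'B', decode: function (base) { return [0].concat(base.slice(2, 3)); } }
};
var restoredA = resolveContents('A', files);
console.log(restoredA); // the values 0, 7
```

Caching means a base file shared by several differential files is restored only once when the same `cache` object is passed to each call.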

<a name="file-structure"></a>
# File structure

The first byte is a header: