Version 1.0 (Draft)
- Table of Contents
- 0. Foreword
- 1. Purpose
- 2. Definitions
- 3. Versioning
- 4. File Structure
- 5. Magic Values
- 6. Media Types
- 7. Referencing Resources
- 8. External Documentation Referenced
This is a DRAFT specification and is subject to change without notice and in ways which are not backwards-compatible with previous draft revisions. Once the initial specification is finalized with version 1.0, further revisions will be properly versioned.
The Argus Resource Package (ARP) format is designed to provide a simple, structured format for packaging static assets in a way which is easily and efficiently parsed.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Strictly speaking, the phrase "ARP package" is tautological. However, for the sake of ease of understanding, this SHALL be considered an acceptable way of describing a binary structure encoding data in the ARP format, and will be used to refer to such throughout this specification.
Unless otherwise noted, all STRING values referred to in the format SHALL be stored and interpreted as null-terminated UTF-8-encoded strings.
Unless otherwise noted, all INTEGER values referred to in the format SHALL be encoded and interpreted as unsigned and little-Endian.
Each version of the ARP specification is identified by a major version and an incremental version). The incremental version SHALL be exclusively used for clarifying, corrective, or otherwise backwards-compatible changes. The format version specified as a single INTEGER in the header of ARP files SHALL correspond to the major version of the specification that the file conforms to.
ARP packages SHOULD be contained by files with the extension .arp
. ARP packages SHALL have a maximum size of 2^64-1
bytes.
ARP packages MAY be split into multiple files (parts) if desired. However, this MUST be explicitly declared in the header. A package MAY have up to 999 parts. The first part MUST have index 1.
Each part file MUST be named as follows, where #
represents a digit of the part index: <package name>.part###.arp
.
For example, part 2 of package foo SHALL be named foo.part002.arp
. This naming convention is OPTIONAL for the first
part, which in this case MAY be named simply foo.arp
.
Each part MUST begin with a 16-byte Part Header (described below). The body section corresponding to the part MUST immediately follow this header.
The first part of an ARP package SHALL contain three primary sections: the package header, the directory, and the body. These sections are described below. Note that the body section of the first part, while technically REQUIRED, MAY have a length of 0 bytes and is thus effectively OPTIONAL.
Subsequent parts SHALL NOT include the package header or directory structures, Instead, they SHALL include a shorter Part Header followed immediately by the body structure.
The package header describes the meta-attributes of the ARP package. The structure is described below.
Offset | Length | Type | Name | Description |
---|---|---|---|---|
0x0 |
0x8 |
byte sequence | Magic | MUST be hex sequence 1B 41 52 47 55 53 52 50 (0x1B ARGUSRP ). |
0x8 |
0x2 |
INTEGER | Version | The format major version the package conforms to. Parsers MUST refuse to parse further if they do not provide explicit support for this major version of the format. |
0xA |
0x2 |
ASCII string | Compression | The type of compression applied to individual resources in the package as a magic ASCII string. The standard possible values are described in the Magic Values section of this document. |
0xC |
0x30 |
STRING | Namespace | The package namespace as a STRING. |
0x3C |
0x2 |
INTEGER | Parts | The number of files comprising this package. This value MUST be between 1 and 999, inclusive. |
0x3E |
0x8 |
INTEGER | Catalogue Offset | The offset in bytes of the catalogue section, starting from the beginning of the package header. |
0x46 |
0x8 |
INTEGER | Catalogue Size | The length in bytes of the catalogue section. |
0x4E |
0x4 |
INTEGER | Node Count | The number of node descriptors contained by the catalogue. |
0x52 |
0x4 |
INTEGER | Directory Count | The number of directories contained by the catalogue. |
0x56 |
0x4 |
INTEGER | Resource Count | The number of resources contained by the catalogue. |
0x5A |
0x8 |
INTEGER | Body Offset | The offset in bytes of the body section of the first part, starting from the beginning of the package header. |
0x62 |
0x8 |
INTEGER | Body Size | The length in bytes of the body section of the first part. |
0x6A |
0x96 |
(none) | Reserved | Reserved for future use. |
The package namespace is constrained as described in Section 7.
Offset | Length | Type | Name | Description |
---|---|---|---|---|
0x0 |
0x8 |
byte sequence | Magic | MUST be hex sequence 1B 41 52 47 55 53 50 54 (0x1B ARGUSPT ). This is different from the magic in the primary header. |
0x8 |
0x2 |
INTEGER | Part Number | The index of this part. This MUST be between 2 and 999, inclusive. |
0xA |
0x6 |
(none) | Reserved | Reserved for future use. |
The catalogue structure is comprised of sequential node descriptors which point to directories and resources in the package. The structure of a node descriptor is described below.
The first node descriptor MUST describe the root directory of the package. This node has the magic name "" (empty string).
A node descriptor describes and points to either a resource or a directory listing within a body section.
Nodes descriptors MUST contain the CRC-32C checksum of the corresponding data. This MAY be ignored by the unpacker if
the package specifies a compression scheme which already includes a CRC, such as bzip2
.
The CRC-32C checksum SHALL be computed per the IEEE 802.3 standard, with a polynomial of 0x1EDC6F41
.
Nodes descriptors pointing to resources MAY specify a media type describing the type of data contained
by the resource. This MAY be left empty, in which case parsers SHALL assume a default type of
application/octet-stream
.
Apart from validating the checksum, any further validation by parsers against the node data is strictly OPTIONAL.
Node name nor media types SHALL NOT exceed 255 bytes in length.
Node names MUST NOT contain any reserved characters as defined by Section 7.
Offset | Length | Type | Name | Description |
---|---|---|---|---|
0x0 |
0x2 |
INTEGER | Descriptor length | The length of the node descriptor, including this length field. |
0x2 |
0x1 |
INTEGER | Type | The type of the node. 0 for resource, 1 for directory. |
0x3 |
0x2 |
INTEGER | Part index | The index of the package part containing the resource data. For directory-type nodes, this MUST be 1 . |
0x5 |
0x8 |
INTEGER | Data offset | The offset of this node's data in the body section of the corresponding package part. |
0xD |
0x8 |
INTEGER | Packed data length | The length of the packed node data in bytes. If this node is a directory, this MUST be a multiple of 4. |
0x15 |
0x8 |
INTEGER | Unpacked data length | The length of the unpacked node data in bytes. If the package uses compression, this MAY be different from the packed length. |
0x1D |
0x4 |
INTEGER | CRC | The CRC-32C checksum of the node data. |
0x21 |
0x1 |
INTEGER | Name length | The length of the node name in bytes, not including a null terminator. |
0x22 |
0x1 |
INTEGER | File extension length | The length of the node file extension, if applicable, not including a null terminator. |
0x23 |
0x1 |
INTEGER | Media type length | The length of the node media type in bytes, not including a null terminator. |
0x24 |
variable | STRING | Name | The name of this node as a STRING, not including a null terminator. |
variable | variable | STRING | File extension | The extension of the file this node was generated from, if applicable, not including a null terminator. |
variable | variable | ASCII string | Media type | The media type of this node as an ASCII string, not including a null terminator. |
A body section is comprised of raw resource data. There is no explicit structure in this section. It SHALL be organized according to the directory section.
In the first package part, the corresponding body section SHALL begin at the offset defined in the package header. In subsequent parts, the body MUST immediately follow the header.
A directory listing describes the contents of a directory. The structure is extremely simple, containing only a tightly-packed array of 4-byte zero-indexed node descriptor indices.
For example: the data [0x01
0x00
0x00
0x00
0x02
0x00
0x00
0x00
] specifies that the directory contains
the nodes with descriptor indices 1 and 2 in the catalogue.
All directory listings MUST be defined in the first part of the package.
A directory MUST NOT contain the root directory (index 0) as a child.
A package SHOULD NOT contain any set of directories which include one another in a cyclical manner, including the case of a directory containing itself.
The ARP format makes use of magic values to specify data types in various places. The table below defines the semantic meaning of each magic byte.
ARP packages MUST begin with the magic hex sequence 1B
41
52
47
55
53
52
50
(0x1B
ARGUSRP
). Files
not beginning with this magic are MUST NOT be considered valid ARP packages.
ARP parts following the first MUST begin with the magic hex sequence 1B
41
52
47
55
53
50
54
(0x1B
ARGUSPT
). Parts not beginning with this magic SHALL be rejected.
Note that this is not the same as the format magic.
Resources in the archive MAY be compressed with a number of different schemes. The available formats as well as their magic values are described in the table below.
As of version 1.0, compliant implementations MUST provide support for the DEFLATE algorithm. DEFLATE is chosen as the standard compression algorithm for its high compression ratio and decompression speed.
Generators MAY use other compression schemes, but decompression support is not guaranteed by the specification.
Only resources MAY be compressed. Directory listings SHALL always be stored uncompressed, irrespective of the package's compression field.
Compression magic MUST NOT contain ASCII control characters.
Magic | Compression type |
---|---|
df |
DEFLATE |
ARP defines a standard format for media type strings as a variant of the syntax defined in section 5.1 of RFC 2045.
This format differs from the RFC in that the preceding "Content-Type:" is excluded, and the optional parameter is
excluded. As in the standard, ARP media type strings MUST be present in the IANA media types registry unless the
format component is prefixed with x-
.
ARP media types strings SHALL be encoded and interpreted as ASCII strings.
The ARP specification references the mime.types
file present in Apache's httpd
project as it appears in
revision 1884192
for media type mapping. Additionally, the ARP specification MAY define other mappings which
SHALL take precedence over the mime.types
file. These supplementary mappings are listed below.
Packers SHOULD provide a mechanism for user-defined mappings to be used. These SHALL take precedence over all other mappings, if applicable, but SHOULD typically be used to supplementally map extensions not included in the mappings specified by the ARP standard.
Extension | Media type |
---|---|
(no extension) | application/octet-stream |
ARP defines an idiomatic way of referencing resources contained by an ARP package via the ARP identifier specification.
Note that while similar in appearance, ARP identifiers are not URIs. The URI specification requires a scheme, which is not provided for by the ARP identifier specification. ARP identifiers instead begin with a namespace, which is not semantically interchangable.
For the purposes of ARP identifiers and their componenents, the characters /
, \
, and :
SHALL be considered
reserved. The set of characters including all reserved characters along with control characters (including
U+0000
–U+001F
and U+007F
–U+009F
) SHALL be considered illegal. All characters not in the set of
illegal characters SHALL be considered legal.
An ARP identifier is defined by the following ABNF rules (as specified by RFC 5234):
ARP-identifier = namespace ":" path
namespace = 1*( idchar )
path = path-part *( "/" path-part )
path-part = 1*( idchar )
idchar = ALPHA / DIGIT / symchar / cschar
symchar = %x20-2E / %x3C-40 / %x5B / %x5D-60 / %x7B-7E
cschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD
Additionally, the first character of a namespace SHOULD be a letter as defined by the Unicode specification. The ARP specification does not restrict usage to any given version of the Unicode specification.
As an example, a package has a namespace of foo
, a resource in the root called bar
, and a directory in the root
called baz
. The baz
directory contains a resource called qux
. The path referencing the resource bar
is
foo:bar
, and the path referencing the resource qux
is foo:baz/qux
.