BINARY-TYPES

A system for declarative specification of binary file readers and writers

Report Bugs · Request Feature · Reference Manual

Table of Contents

  1. About the Project
  2. Installation
  3. Using
  4. Performance
  5. Roadmap
  6. Contributing
  7. License
  8. Contact

About the Project

BINARY-TYPES is a Common Lisp system for reading and writing binary files. It provides macros that declare the mapping between Lisp objects and most binary (i.e. octet-based) representations. It is not helpful for reading files with variable bit-length code-words, such as most compressed file formats; it only works with file formats based on 8-bit bytes (octets).

Objectives

Support most kinds of binary types including:

  • Signed and unsigned integers of any octet-size, big-endian or little-endian. Maps to lisp integers.

  • Enumerated types based on any integer type. Maps to lisp symbols.

  • Complex bit-field types based on any integer type. Sub-fields can be numeric, enumerated, or bit-flags. Maps to lisp lists of symbols and integers.

  • Fixed-length and null-terminated strings. Maps to lisp strings.

  • Compound records of other binary types. Maps to lisp DEFCLASS classes or, when you prefer, DEFSTRUCT structs.

  • Vectors and arrays

  • 32 and 64 bit IEEE-754 floats map to lisp single-float and double-float.

  • NaN and infinities
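
As an illustration of the float mapping listed above, here is a minimal sketch in plain Common Lisp of how the 32 bits of an IEEE-754 single decompose into sign, exponent and fraction. It is not taken from the library and does not call it; the function name DECODE-IEEE-SINGLE and the keywords standing in for NaN and the infinities are assumptions made for this example, and subnormal values are deliberately left out.

```lisp
;; Sketch only: plain Common Lisp, no BINARY-TYPES calls.
;; DECODE-IEEE-SINGLE and the :NAN / :POSITIVE-INFINITY / :NEGATIVE-INFINITY
;; keywords are hypothetical names chosen for this illustration.
(defun decode-ieee-single (bits)
  "Decode BITS, an unsigned 32-bit integer, as an IEEE-754 binary32 value."
  (let ((sign     (ldb (byte 1 31) bits))    ; 1 sign bit
        (exponent (ldb (byte 8 23) bits))    ; 8 exponent bits, biased by 127
        (fraction (ldb (byte 23 0) bits)))   ; 23 fraction bits
    (cond ((= exponent 255)
           ;; All-ones exponent encodes NaN (non-zero fraction) or infinity.
           (cond ((plusp fraction) :nan)
                 ((zerop sign) :positive-infinity)
                 (t :negative-infinity)))
          ((and (zerop exponent) (zerop fraction))
           (if (zerop sign) 0.0f0 -0.0f0))
          ((zerop exponent)
           (error "Subnormal values are not handled in this sketch."))
          (t
           ;; Restore the implicit leading 1, then scale by 2^(exponent-127-23).
           (let ((magnitude (scale-float
                             (float (logior fraction (ash 1 23)) 1.0f0)
                             (- exponent 150))))
             (if (zerop sign) magnitude (- magnitude)))))))
```

For example, the bit pattern #x3F800000 decodes to 1.0 and #xC0000000 to -2.0.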

History

BINARY-TYPES was developed over the years 1999-2003 by Frode Vatvedt Fjeld whilst working at the Department of Computer Science, University of Tromsø, Norway. It later served as the basis for Chapter 24: Parsing Binary Files of the book Practical Common Lisp by Peter Seibel. That chapter makes a good technical reference for the system, and you should read it if you want to extend BINARY-TYPES.

Frode's version was sufficiently well done that the system has gone largely unchanged since. The exceptions are some local additions made in 2003 for slitch, a low-level networking library, and a fork by Olof-Joachim Frahm in 2013 that added 128 and 256 bit integers.

This repository began in 2024. It adds support for 32/64 bit IEEE-754 floats, binary arrays, a test framework and improved documentation, and refactors the repository/ASDF system.

Installation

This version of BINARY-TYPES is not the official Quicklisp version, so to install it you'll need to clone the source code.

To make the system accessible to ASDF (a build facility, similar to make in the C world), clone the repository in a directory ASDF knows about. By default the common-lisp directory in your home directory is known. Create this if it doesn't already exist and then:

  1. Clone the repository
cd ~/common-lisp && \
git clone https://github.com/snunez1/binary-types.git
  2. Reset the ASDF source-registry to find the new system (from the REPL)
    (asdf:clear-source-registry)
  3. Load the system
    (asdf:load-system :binary-types)

If you have installed the slime ASDF extensions, you can invoke this with a comma (',') from the slime REPL.

Who uses?

binary-types is used by several systems, including:

Using

Typically, a complete binary record format/type can be specified in a single (nested) declaration statement. Such compound records may then be read and written with READ-BINARY and WRITE-BINARY. So start with the specification for the binary file or stream and map each element. Here's a simple example that takes the first two unsigned 32-bit integers (eight bytes) of a file:

(define-binary-struct llama-config ()
  (dim        nil :binary-type u32)
  (hidden-dim nil :binary-type u32))

and, with that, we can read and print from the binary file with:

(let ((binary-types:*endian* :little-endian))
  (with-binary-file (stream #P"stories15M.bin" :direction :input)
    (let ((config (read-binary 'llama-config stream)))
      (format t "~A~%~A"
	      (slot-value config 'dim)
	      (slot-value config 'hidden-dim)))))

(Note: this isn't really the on-disk format for a llama LLM checkpoint; it's just an example for demonstration purposes.)

Also see Chapter 24: Parsing Binary Files for an extended example.

Declaring classes and structures

Binary types may be declared with the DEFINE-BINARY-CLASS macro, which has the same syntax and semantics as DEFCLASS, only there is an additional slot-option (named :BINARY-TYPE) that declares that slot's binary type. Note that the binary aspects of slots are not inherited (the semantics of inheriting binary slots is unspecified).

Another slot-option added by BINARY-TYPES is :MAP-BINARY-WRITE, which names a function (of two arguments) that is applied to the slot's value and the name of the slot's binary-type in order to obtain the value that is actually passed to WRITE-BINARY. Similarly, :MAP-BINARY-READ takes a function that is to be applied to the binary data and type-name when a record of that type is being read. A slightly modified version of :map-binary-read is :MAP-BINARY-READ-DELAYED, which will do essentially the same thing as :MAP-BINARY-READ, only the mapping will be "on-demand": A slot-unbound method will be created for this purpose.

A variation of the :BINARY-TYPE slot-option is :BINARY-LISP-TYPE, which does everything :BINARY-TYPE does, but also passes on a :TYPE slot-option to DEFCLASS (or DEFSTRUCT). The type-spec is inferred from the binary-type declaration. When using this mechanism, you should be careful to always provide a legal value in the slot (as you must always do when declaring slots' types). If you find this confusing, just use :BINARY-TYPE.

[Figure: type hierarchy]

Bitfields

DEFINE-BITFIELD is the one part of BINARY-TYPES that is not as intuitive and easy to use as the rest.

Because it isn't an oft-seen paradigm, DEFINE-BITFIELD can be confusing. It is a bit complex, and it will take some more use to make certain it is the way it should be.

Basically DEFINE-BITFIELD divides a numeric base-type (typically an unsigned integer) into a number of fields, where each field is one of :BITS for bitmaps, :ENUM for an enumerated field (takes an optional :BYTE <bytespec>), and finally :NUMERIC <name> <byte-size> <byte-pos> for a subfield that is a number.

Here are a couple of examples:

(define-bitfield r-info (u32)
              (((:enum :byte (8 0))
                 r-386-none     0
                 r-386-32       1
                 r-386-pc32     2
                 r-386-got32    3
                 r-386-plt32    4
                 r-386-copy     5
                 r-386-glob-dat 6
                 r-386-jmp-slot 7
                 r-386-relative 8
                 r-386-gotoff   9
                 r-386-gotpc    10)
                ((:numeric r-sym 24 8))))

This declares R-INFO to be an unsigned 32-bit number, divided into two fields. The first field resides in bits 0-7, and is one of the values r-386-xx. The second field is a numeric value that is 24 bits wide and starts at bit 8, so it resides in bits 8-31. So this type R-INFO may for example have symbolic value (r-386-pc32 (r-sym . 1)), which translates to a numeric value of (logior 2 (ash 1 8)) = 258.
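
The arithmetic behind that value can be sketched in plain Common Lisp with DPB and LDB. This is only an illustration of the bit layout declared by (:enum :byte (8 0)) and (:numeric r-sym 24 8); it does not call BINARY-TYPES, and the function names PACK-R-INFO and UNPACK-R-INFO are hypothetical.

```lisp
;; Sketch only: plain Common Lisp, no BINARY-TYPES calls.
;; PACK-R-INFO and UNPACK-R-INFO are hypothetical helper names.
(defun pack-r-info (type-code sym-index)
  "Place TYPE-CODE in the low 8 bits and SYM-INDEX in the 24 bits above them."
  (dpb sym-index (byte 24 8)
       (dpb type-code (byte 8 0) 0)))

(defun unpack-r-info (word)
  "Return the enum code and the numeric subfield of WORD as two values."
  (values (ldb (byte 8 0) word)
          (ldb (byte 24 8) word)))

(pack-r-info 2 1)    ; => 258, i.e. r-386-pc32 (2) with r-sym = 1
(unpack-r-info 258)  ; => 2, 1
```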

Another example:

(define-bitfield p-flags (u8)
                (((:bits)
                  pf-x 0
                  pf-w 1
                  pf-r 2)))

Here P-FLAGS has just one bit-field, where bit 0 is named PF-X, bit 1 is named PF-W etc. So the value (PF-X PF-R) maps to 5.
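
To make the flag-to-integer mapping concrete, here is a minimal sketch in plain Common Lisp. It mirrors the P-FLAGS declaration with a hand-written table and is not how the library itself stores or converts flags; *P-FLAG-BITS*, FLAGS-TO-INTEGER and INTEGER-TO-FLAGS are hypothetical names.

```lisp
;; Sketch only: plain Common Lisp, no BINARY-TYPES calls.
;; The table and both function names are hypothetical.
(defparameter *p-flag-bits* '((pf-x . 0) (pf-w . 1) (pf-r . 2))
  "Flag name -> bit number, mirroring the P-FLAGS declaration.")

(defun flags-to-integer (flags)
  "Set one bit per flag name in FLAGS and return the resulting integer."
  (reduce #'logior flags
          :key (lambda (flag) (ash 1 (cdr (assoc flag *p-flag-bits*))))
          :initial-value 0))

(defun integer-to-flags (n)
  "Return the names of the flags whose bits are set in N."
  (loop for (name . bit) in *p-flag-bits*
        when (logbitp bit n) collect name))

(flags-to-integer '(pf-x pf-r))  ; => 5
(integer-to-flags 5)             ; => (PF-X PF-R)
```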

Examples

The included file "example.lisp" demonstrates how to use this package. To give you a taste of what it looks like, the following declarations are enough to read the header of an ELF executable file with the form

(let ((*endian* :big-endian))
  (read-binary 'elf-header stream))

;;; ELF basic type declarations
(define-unsigned word 4)
(define-signed sword  4)
(define-unsigned addr 4)
(define-unsigned off  4)
(define-unsigned half 2)

;;; ELF file header structure
(define-binary-class elf-header ()
  ((e-ident
    :binary-type (define-binary-struct e-ident ()
           (ei-magic nil :binary-type
                 (define-binary-struct ei-magic ()
                   (ei-mag0 0 :binary-type u8)
                   (ei-mag1 #\null :binary-type char8)
                   (ei-mag2 #\null :binary-type char8)
                   (ei-mag3 #\null :binary-type char8)))
           (ei-class nil :binary-type
                 (define-enum ei-class (u8)
                   elf-class-none 0
                   elf-class-32   1
                   elf-class-64   2))
           (ei-data nil :binary-type
                (define-enum ei-data (u8)
                  elf-data-none 0
                  elf-data-2lsb 1
                  elf-data-2msb 2))
           (ei-version 0 :binary-type u8)
           (padding nil :binary-type 1)
           (ei-name "" :binary-type
                (define-null-terminated-string ei-name 8))))
   (e-type
    :binary-type (define-enum e-type (half)
           et-none 0
           et-rel  1
           et-exec 2
           et-dyn  3
           et-core 4
           et-loproc #xff00
           et-hiproc #xffff))
   (e-machine
    :binary-type (define-enum e-machine (half)
           em-none  0
           em-m32   1
           em-sparc 2
           em-386   3
           em-68k   4
           em-88k   5
           em-860   7
           em-mips  8))
   (e-version   :binary-type word)
   (e-entry     :binary-type addr)
   (e-phoff     :binary-type off)
   (e-shoff     :binary-type off)
   (e-flags     :binary-type word)
   (e-ehsize    :binary-type half)
   (e-phentsize :binary-type half)
   (e-phnum     :binary-type half)
   (e-shentsize :binary-type half)
   (e-shnum     :binary-type half)
   (e-shstrndx  :binary-type half)))

Performance

Performance has not really been a concern while designing this system. There are no obvious performance bottlenecks that we are aware of, but keep in mind that all "binary" reads and writes are reduced to individual 8-bit READ-BYTEs and WRITE-BYTEs. If you do identify particular performance bottlenecks, please raise an issue.
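
To show what reducing a multi-octet read to single octets involves, the following sketch combines a list of octets into one unsigned integer in either byte order. It is an illustration in plain Common Lisp rather than the library's own code: each list element stands in for the result of one READ-BYTE, and the function name OCTETS-TO-UNSIGNED is hypothetical.

```lisp
;; Sketch only: plain Common Lisp, no BINARY-TYPES calls.
;; OCTETS-TO-UNSIGNED is a hypothetical name; each octet in the list
;; stands in for one READ-BYTE from an octet stream.
(defun octets-to-unsigned (octets endian)
  "Combine OCTETS (a list in file order) into one unsigned integer.
ENDIAN is :BIG-ENDIAN or :LITTLE-ENDIAN."
  (let ((ordered (ecase endian
                   (:big-endian octets)
                   (:little-endian (reverse octets)))))
    ;; ORDERED is most significant octet first: shift the accumulator
    ;; left by 8 bits and merge in the next octet at each step.
    (reduce (lambda (acc octet) (logior (ash acc 8) octet))
            ordered
            :initial-value 0)))

(octets-to-unsigned '(#x78 #x56 #x34 #x12) :little-endian)  ; => #x12345678
(octets-to-unsigned '(#x12 #x34) :big-endian)               ; => #x1234
```

A 4-octet integer therefore costs four octet reads plus a few shifts, which is the per-field overhead to keep in mind for large files.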

Roadmap

BINARY-TYPES is more or less feature complete. The only feature I have encountered, once, that wasn't handled is serialized hash-maps.

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. Please see CONTRIBUTING for details on the code of conduct, and the process for submitting pull requests.

License

Distributed under the BSD-3-Clause License. See LICENSE for more information.

Contact

Project Link: https://github.com/snunez1/binary-types
