Start writing formal specification for APE
jart committed Jul 20, 2024
1 parent 7996bf6 commit 29ce25c
Showing 2 changed files with 272 additions and 1 deletion.
271 changes: 271 additions & 0 deletions ape/specification.md
@@ -0,0 +1,271 @@
# Actually Portable Executable Specification v0.1

Actually Portable Executable (APE) is an executable file format that
polyglots the Windows Portable Executable (PE) format with a UNIX Sixth
Edition style shell script that doesn't have a shebang. This makes it
possible to produce a single-file binary that executes on stock
installations of many OSes and architectures.

## Supported OSes and Architectures

- AMD64
  - Linux
  - MacOS
  - Windows
  - FreeBSD
  - OpenBSD
  - NetBSD
  - BIOS

- ARM64
  - Linux
  - MacOS
  - FreeBSD
  - Windows (non-native)

## File Header

APE defines three separate file magics, all of which are 8 characters
long. Any file that starts with one of these magic values can be
considered an APE program.

### (1) APE MZ Magic

- ASCII: `MZqFpD='`
- Hex: 4d 5a 71 46 70 44 3d 27

This is the canonical magic used by almost all APE programs. It enables
maximum portability between OSes. When interpreted as a shell script, it
assigns a single-quoted string to an unused variable. The shell will
then ignore subsequent binary content that's placed inside the string.

It is strongly recommended that this magic value be immediately followed
by a newline (\n or hex 0a) character. Some shells, e.g. FreeBSD SH and
Zsh, impose a binary safety check before handing off files that don't
have a shebang to `/bin/sh`. That check applies to the first line, which
can't contain NUL characters.
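
A producer can check this first-line rule mechanically. The following is a minimal sketch of such a check, not the code any particular shell runs; `ape_first_line_is_text` is a hypothetical helper name introduced here for illustration.

```c
#include <stddef.h>

// Returns 1 if the first line of `buf` (everything before the first '\n')
// contains no NUL byte, and 0 otherwise. If no newline occurs within `n`
// bytes, the whole buffer is treated as the first line.
// NOTE: hypothetical helper; real shells may apply a different heuristic.
static int ape_first_line_is_text(const unsigned char *buf, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (buf[i] == '\n') return 1;  // reached end of first line
    if (buf[i] == 0) return 0;     // NUL before the newline
  }
  return 1;
}
```

A file whose magic is immediately followed by `\n` passes regardless of the binary content that comes after it.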

The letters were carefully chosen so as to be valid x86 instructions in
all operating modes. This makes it possible to store a BIOS bootloader
disk image inside an APE binary. For example, simple CLI programs built
with Cosmopolitan Libc will boot from BIOS into long mode if they're
treated as a floppy disk image.

The letters also allow for the possibility of being treated on x86-64 as
a flat executable, where the PE / ELF / Mach-O executable structures are
ignored, and execution simply begins at the beginning of the file,
similar to how MS-DOS .COM binaries work.

Both conditional jumps in the magic (`jno` followed by `jo`, one of
which is always taken) target offset 0x4a from the start of the image,
which lands execution in the MS-DOS stub defined by Portable
Executable. APE binaries built by Cosmo Libc use tricks in the MS-DOS
stub to check the operating mode and then jump to the appropriate
entrypoint, e.g. `_start()`.

#### Decoded as i8086

```asm
dec %bp
pop %dx
jno 0x4a
jo 0x4a
```

#### Decoded as i386

```asm
dec %ebp
pop %edx
jno 0x4a
jo 0x4a
```

#### Decoded as x86-64

```asm
rex.WRB
pop %r10
jno 0x4a
jo 0x4a
```
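
The 0x4a target in these decodings follows from how short conditional jumps are encoded: the instruction is two bytes long and its displacement byte is counted from the end of the instruction. Here is a minimal sketch of that arithmetic; `ape_short_jump_target` is a hypothetical helper, not something defined by Cosmopolitan.

```c
#include <stdint.h>

// Computes the target of an x86 short conditional jump (one opcode byte
// followed by a signed 8-bit displacement). `at` is the offset of the
// opcode byte within `image`. Hypothetical helper for illustration.
static unsigned ape_short_jump_target(const unsigned char *image, unsigned at) {
  int8_t rel8 = (int8_t)image[at + 1];
  return at + 2 + rel8;  // displacement is relative to the next instruction
}
```

For the MZ magic, the `jno` at offset 2 carries displacement 0x46 (`F`) and the `jo` at offset 4 carries displacement 0x44 (`D`), so both resolve to 2 + 2 + 0x46 = 4 + 2 + 0x44 = 0x4a.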

### (2) APE UNIX-Only Magic

- ASCII: `jartsr='`
- Hex: 6a 61 72 74 73 72 3d 27

Being a novel executable format that was first published in 2020, the
APE file format is less understood by industry tools compared to the PE,
ELF, and Mach-O executable file formats, which have been around for
decades. For this reason, APE programs that use the MZ magic above can
attract attention from Windows AV software, which may be unwanted by
developers who aren't interested in targeting the Windows platform.
Therefore the `jartsr='` magic is defined which enables the creation of
APE binaries that can safely target all non-Windows platforms. Even
though this magic is less common, APE interpreters and binfmt-misc
installations MUST support this.

It is strongly recommended that this magic value be immediately followed
by a newline (\n or hex 0a) character. Some shells, e.g. FreeBSD SH and
Zsh, impose a binary safety check before handing off files that don't
have a shebang to `/bin/sh`. That check applies to the first line, which
can't contain NUL characters.

The letters were carefully chosen so as to be valid x86 instructions in
all operating modes. This makes it possible to store a BIOS bootloader
disk image inside an APE binary. For example, simple CLI programs built
with Cosmopolitan Libc will boot from BIOS into long mode if they're
treated as a floppy disk image.

The letters also allow for the possibility of being treated on x86-64 as
a flat executable, where the PE / ELF / Mach-O executable structures are
ignored, and execution simply begins at the beginning of the file,
similar to how MS-DOS .COM binaries work.

Both conditional jumps in the magic (`jb` followed by `jae`, one of
which is always taken) target offset 0x78 from the start of the image,
which lands execution in the MS-DOS stub defined by Portable
Executable. APE binaries built by Cosmo Libc use tricks in the MS-DOS
stub to check the operating mode and then jump to the appropriate
entrypoint, e.g. `_start()`.

#### Decoded as i8086 / i386 / x86-64

```asm
push $0x61
jb 0x78
jae 0x78
```

### (3) APE Debug Magic

- ASCII: `APEDBG='`
- Hex: 41 50 45 44 42 47 3d 27

While APE files must be valid shell scripts, in practice, UNIX systems
will oftentimes be configured to provide a faster, safer alternative to
loading an APE binary through `/bin/sh`. The Linux Kernel can be patched
to have execve() recognize the APE format and directly load its embedded
ELF header. Linux systems can also use binfmt-misc to recognize APE's MZ
and jartsr magic, and pass them to a userspace program named `ape` that
acts as an interpreter. In such environments, the need sometimes arises
to test that the `/bin/sh` code path is working correctly, in which
case the `APEDBG='` magic is RECOMMENDED.

APE interpreters, execve() implementations, and binfmt-misc installs
MUST ignore this magic. If necessary, steps can be taken to help files
with this magic be passed to `/bin/sh` like a normal shebang-less shell
script for execution.
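
The three magics and the accept/ignore rule can be summarized in code. This is a minimal sketch under the assumption that the caller has already read the first bytes of the file; the `ApeMagic` enum and both function names are hypothetical and do not come from the `ape` loader.

```c
#include <stddef.h>
#include <string.h>

enum ApeMagic { APE_NONE, APE_MZ, APE_UNIX_ONLY, APE_DEBUG };

// Classifies a file by its first 8 bytes. Hypothetical helper.
static enum ApeMagic ape_classify_magic(const unsigned char *buf, size_t n) {
  if (n < 8) return APE_NONE;
  if (!memcmp(buf, "MZqFpD='", 8)) return APE_MZ;
  if (!memcmp(buf, "jartsr='", 8)) return APE_UNIX_ONLY;
  if (!memcmp(buf, "APEDBG='", 8)) return APE_DEBUG;
  return APE_NONE;
}

// Interpreters and binfmt-misc style handlers accept the MZ and jartsr
// magics, and leave APEDBG files to be run by /bin/sh.
static int ape_interpreter_should_load(enum ApeMagic m) {
  return m == APE_MZ || m == APE_UNIX_ONLY;
}
```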

## Embedded ELF Header

APE binaries MAY embed an ELF header inside them. Unlike conventional
executable file formats, this header is not stored at a fixed offset.
It's instead encoded as octal escape codes in a shell script `printf`
statement. For example:

```
printf '\177ELF\2\1\1\011\0\0\0\0\0\0\0\0\2\0\076\0\1\0\0\0\166\105\100\000\000\000\000\000\060\013\000\000\000\000\000\000\000\000\000\000\000\000\000\000\165\312\1\1\100\0\070\0\005\000\0\0\000\000\000\000'
```

This `printf` statement MUST appear in the first 8192 bytes of the APE
executable, so as to limit how much of the initial portion of a file an
interpreter must load.

Multiple such `printf` statements MAY appear in the first 8192 bytes, in
order to specify multiple architectures. For example, fat binaries built
by the `apelink` program (provided by Cosmo Libc) will have two encoded
ELF headers, for amd64 and arm64, each of which points into the proper
file offsets for its respective native code. Therefore, kernels and
interpreters which load the APE format directly MUST check the
`e_machine` field of the `Elf64_Ehdr` that's decoded from the octal
codes, before accepting a `printf` shell statement as valid.

These `printf` statements MUST use only unescaped ASCII characters or
octal escape codes. They MUST NOT use space-saving escape codes such as
`\n`. For example, rather than saying `\n`, it is valid to say `\012`
instead. It's also valid to say `\12`, but only if the encoded
character that follows isn't an octal digit.
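
The short-versus-padded rule matters most on the producing side. Below is a minimal sketch of an encoder that follows it. It assumes, as a conservative choice of this sketch rather than a requirement of the format, that only ASCII letters and digits are emitted unescaped; `ape_is_raw` and `ape_encode_printf` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

// Assumption of this sketch: only alphanumerics are written unescaped.
static int ape_is_raw(unsigned char c) {
  return ('0' <= c && c <= '9') || ('A' <= c && c <= 'Z') ||
         ('a' <= c && c <= 'z');
}

// Encodes `n` bytes as printf-style text into `out` (NUL terminated).
// Returns the text length, or -1 if `cap` is too small.
static int ape_encode_printf(const unsigned char *p, size_t n, char *out,
                             size_t cap) {
  size_t o = 0;
  if (!cap) return -1;
  out[0] = 0;
  for (size_t i = 0; i < n; ++i) {
    if (cap - o < 5) return -1;  // worst case is "\ooo" plus the NUL
    if (ape_is_raw(p[i])) {
      out[o++] = (char)p[i];
    } else if (i + 1 < n && '0' <= p[i + 1] && p[i + 1] <= '7') {
      // next output character is a literal octal digit: pad to 3 digits
      o += (size_t)snprintf(out + o, cap - o, "\\%03o", (unsigned)p[i]);
    } else {
      o += (size_t)snprintf(out + o, cap - o, "\\%o", (unsigned)p[i]);
    }
    out[o] = 0;
  }
  return (int)o;
}
```

With this sketch, a newline byte followed by the character `1` is written as `\0121`, whereas a newline followed by `a` is written as `\12a`.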

For example, the following algorithm may be used for parsing octal:

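```c
// Parses up to three octal digits starting at page[i]. On success, the
// decoded value is stored in *pc and the index just past the digits is
// returned. If page[i] isn't an octal digit, i is returned unchanged
// and *pc is left untouched.
static int ape_parse_octal(const unsigned char page[8192], int i, int *pc)
{
  int c;
  if ('0' <= page[i] && page[i] <= '7') {
    c = page[i++] - '0';
    if ('0' <= page[i] && page[i] <= '7') {
      c *= 8;
      c += page[i++] - '0';
      if ('0' <= page[i] && page[i] <= '7') {
        c *= 8;
        c += page[i++] - '0';
      }
    }
    *pc = c;
  }
  return i;
}
```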
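
Building on the same digit-parsing idea, an entire `printf` argument can be decoded into raw bytes. The following is a minimal, self-contained sketch that assumes the caller has already located the text between the single quotes; `ape_decode_printf` is a hypothetical name and this is not the code used by the `ape` loader.

```c
#include <stddef.h>

// Decodes the text between the single quotes of an APE `printf`
// statement into raw bytes. Returns the number of bytes written, or -1
// if a non-octal escape is seen or `out` is too small.
static long ape_decode_printf(const char *s, size_t len, unsigned char *out,
                              size_t cap) {
  size_t i = 0, o = 0;
  while (i < len) {
    unsigned c = (unsigned char)s[i++];
    if (c == '\\') {
      int digits = 0;
      c = 0;
      while (digits < 3 && i < len && '0' <= s[i] && s[i] <= '7') {
        c = c * 8 + (unsigned)(s[i++] - '0');
        ++digits;
      }
      if (!digits) return -1;  // escapes like `\n` are not permitted
    }
    if (o == cap) return -1;
    out[o++] = (unsigned char)c;
  }
  return (long)o;
}
```

Decoding the start of the example header, `\177ELF\2\1\1\011`, yields the eight bytes 7f 45 4c 46 02 01 01 09.
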

APE-aware interpreters SHOULD only take `e_machine` into consideration.
It is the responsibility of the `_start()` function to detect the OS.
Therefore, multiple `printf` statements are only embedded in the shell
script for different CPU architectures.

The OS ABI field of an APE embedded `Elf64_Ehdr` SHOULD be set to
`ELFOSABI_FREEBSD`, since it's the only UNIX OS APE supports that
actually checks the field. However, different values MAY be chosen for
binaries that don't intend to have FreeBSD in their support vector.

Counter-intuitively, the ARM64 ELF header is used on the MacOS ARM64
platform when loading from fat binaries.
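
Once a header has been decoded to bytes, the architecture and OS ABI checks reduce to reading fixed offsets in `Elf64_Ehdr`. Here is a minimal sketch that assumes a 64-byte little-endian header. The `APE_`-prefixed constant names and the function names are hypothetical, while the numeric values are those the ELF specification assigns to `EM_X86_64`, `EM_AARCH64`, and `ELFOSABI_FREEBSD`.

```c
#include <string.h>

enum {
  APE_EM_X86_64 = 62,        // EM_X86_64
  APE_EM_AARCH64 = 183,      // EM_AARCH64
  APE_ELFOSABI_FREEBSD = 9,  // ELFOSABI_FREEBSD
};

// Returns 1 if `h` holds a little-endian ELF64 header whose e_machine
// equals `want_machine`, otherwise 0. Hypothetical helper.
static int ape_ehdr_matches(const unsigned char h[64], unsigned want_machine) {
  if (memcmp(h, "\177ELF", 4)) return 0;
  if (h[4] != 2 /* ELFCLASS64 */ || h[5] != 1 /* ELFDATA2LSB */) return 0;
  unsigned e_machine = h[18] | (unsigned)h[19] << 8;  // offset 18, 2 bytes
  return e_machine == want_machine;
}

// Returns the EI_OSABI byte, which sits at index 7 of e_ident.
static int ape_ehdr_osabi(const unsigned char h[64]) {
  return h[7];
}
```

A loader running on arm64 would call `ape_ehdr_matches(h, APE_EM_AARCH64)` on each decoded header and skip those that return 0.
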

## Embedded Mach-O Header (x86-64 only)

APE shell scripts that support MacOS on AMD64 must use the `dd` command
in a very specific way to specify how the embedded binary Mach-O header
is copied backward to the start of the file. For example:

```
dd if="$o" of="$o" bs=8 skip=433 count=66 conv=notrunc
```
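
In the example above, `bs=8 skip=433 count=66` means that 8 × 66 = 528 bytes are read starting at file offset 8 × 433 = 3464 and written at offset 0, and `conv=notrunc` leaves the rest of the file in place. A minimal in-memory sketch of the same operation follows; `ape_apply_dd` is a hypothetical helper, and unlike `dd` it rejects a range that runs past the end of the image instead of copying a short read.

```c
#include <stddef.h>
#include <string.h>

// In-memory equivalent of
//   dd if=F of=F bs=BS skip=SKIP count=COUNT conv=notrunc
// Returns 0 on success, or -1 if the source range is outside the image.
static int ape_apply_dd(unsigned char *image, size_t size, size_t bs,
                        size_t skip, size_t count) {
  if (bs && (skip > size / bs || count > size / bs)) return -1;
  size_t src = bs * skip;   // byte offset of the embedded header
  size_t len = bs * count;  // number of bytes to copy
  if (src > size || len > size - src) return -1;
  memmove(image, image + src, len);  // regions may overlap
  return 0;
}
```
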

These `dd` statements have traditionally been generated by the GNU as
and ld.bfd programs by encoding ASCII into 64-bit linker relocations,
which necessitated a fixed width for integer values. It took several
iterations over APE's history before we eventually got it right:

- `arg=" 9293"` is how we originally had ape do it
- `arg=$(( 9293))` because busybox sh disliked the quoted space
- `arg=9293 ` is generated by the modern apelink program

Software that parses the APE file format and needs to extract the
Mach-O x86-64 header SHOULD support the old binaries that use the
previous encodings. To keep backwards compatibility simple, the
following regular expression may be used, which generalizes to all
defined formats:

```c
regcomp(&rx,
        "bs="              // dd block size arg
        "(['\"] *)?"       // #1 optional quote w/ space
        "(\\$\\(\\( *)?"   // #2 optional math w/ space
        "([[:digit:]]+)"   // #3
        "( *\\)\\))?"      // #4 optional math w/ space
        "( *['\"])?"       // #5 optional quote w/ space
        " +"               //
        "skip="            // dd skip arg
        "(['\"] *)?"       // #6 optional quote w/ space
        "(\\$\\(\\( *)?"   // #7 optional math w/ space
        "([[:digit:]]+)"   // #8
        "( *\\)\\))?"      // #9 optional math w/ space
        "( *['\"])?"       // #10 optional quote w/ space
        " +"               //
        "count="           // dd count arg
        "(['\"] *)?"       // #11 optional quote w/ space
        "(\\$\\(\\( *)?"   // #12 optional math w/ space
        "([[:digit:]]+)",  // #13
        REG_EXTENDED);
```
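
The numeric values land in capture groups #3, #8, and #13. The sketch below shows one way to drive the expression with POSIX `regexec()` and pull those groups out. The `ApeDdArgs` struct, the two macros, and `ape_parse_dd` are hypothetical names introduced for illustration; the pattern is the same one shown above, split into reusable pieces.

```c
#include <regex.h>
#include <stdlib.h>

struct ApeDdArgs {
  long bs, skip, count;
};

// Same pattern as above, factored into its repeating parts.
#define APE_DD_NUM "(['\"] *)?(\\$\\(\\( *)?([[:digit:]]+)"
#define APE_DD_END "( *\\)\\))?( *['\"])? +"

// Extracts bs, skip, and count from the first matching `dd` statement
// in `script`. Returns 0 on success, or -1 if nothing matches.
static int ape_parse_dd(const char *script, struct ApeDdArgs *out) {
  regex_t rx;
  regmatch_t m[14];  // whole match plus 13 groups
  if (regcomp(&rx,
              "bs=" APE_DD_NUM APE_DD_END
              "skip=" APE_DD_NUM APE_DD_END
              "count=" APE_DD_NUM,
              REG_EXTENDED))
    return -1;
  int rc = regexec(&rx, script, 14, m, 0);
  regfree(&rx);
  if (rc) return -1;
  // strtol stops at the first non-digit, i.e. the end of each group
  out->bs = strtol(script + m[3].rm_so, NULL, 10);
  out->skip = strtol(script + m[8].rm_so, NULL, 10);
  out->count = strtol(script + m[13].rm_so, NULL, 10);
  return 0;
}
```

The same call accepts all three historical encodings, so a parser doesn't need to know which generation of linker produced the binary.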

For further details, see the canonical implementation in
`cosmopolitan/tool/build/assimilate.c`.
2 changes: 1 addition & 1 deletion libc/thread/pthread_cond_signal.c
@@ -26,7 +26,7 @@
  * pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  * // ...
  * pthread_mutex_lock(&lock);
- * pthread_cond_signal(&cond, &lock);
+ * pthread_cond_signal(&cond);
  * pthread_mutex_unlock(&lock);
  *
  * This function has no effect if there aren't any threads currently