A new, improved, port of the Qed editor for Unix, with UTF-8 processing

phonologus/qed-new

The qed editor for Unix

This is a fresh port of the venerable qed editor for Unix, first written by Tom Duff, Rob Pike, Hugh Redelmeier and David Tilbrook at the University of Toronto in the late 1970s.

Qed is described completely in the qed(1) manpage. It is a line editor, in the tradition of ed, with multi-file editing, and a powerful, if somewhat cryptic, programming language.

The version here has been updated to compile on a modern ANSI/POSIX system, and can now work with UTF-8 encoded text.

Many of the size limits imposed by the original memory model have been removed by allocating most in-memory containers dynamically on the heap and growing them as required.
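As an illustration of the general technique only, here is a minimal sketch of a heap-allocated container that grows on demand. The names gbuf, gbuf_append and gbuf_free are hypothetical and do not come from the qed sources; the doubling policy is an assumption, and overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; not a type from the qed sources. */
struct gbuf {
    char  *data;  /* heap storage, NULL until the first append */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
};

/* Append n bytes, doubling the capacity until the data fits.
 * Returns 0 on success, -1 if allocation fails (buffer left unchanged). */
int gbuf_append(struct gbuf *b, const char *src, size_t n)
{
    assert(b != NULL && (src != NULL || n == 0));
    if (n > b->cap - b->len) {
        size_t need = b->len + n;
        size_t cap = b->cap ? b->cap : 16;
        while (cap < need)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
void gbuf_free(struct gbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

A zero-initialised struct gbuf is a valid empty buffer, so a caller can start with struct gbuf b = {0}; and append without any fixed upper bound.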

This port supersedes, and renders obsolete, my previous port here.

Building

Building and installing qed should be as easy as:

make clean && make && make install

Installation is into a self-contained directory, $HOME/.qed. To uninstall, run make uninstall or rm -rf $HOME/.qed.

The line . $HOME/.qed/env should be added to your .profile or .bash_profile.

The qed binary is installed into .qed/bin, and the manpage into .qed/man/man1. The startup file and the library programs are in .qed/lib. The script .qed/env ensures .qed/bin is in PATH, and it sets up the QEDFILE and QEDLIB environment variables. Edit the env script if you need QEDFILE and QEDLIB to point elsewhere.

New Features

UTF-8

This version of qed has two notions of what a "character" is, controlled by a new option a. If a is unset (which is the default), then qed assumes that a character is a UTF-8 encoded Unicode codepoint, and that the text being processed is valid UTF-8 encoded text. In this mode, qed will throw a ?U error if it is asked to perform a character-oriented action on invalid UTF-8. It will still save and load invalid UTF-8, but will issue a warning notification !U.

If a is set, then qed assumes that a character is an 8-bit byte. In this case, qed can process any single-byte character encoding, such as Latin-1. Since qed can edit any single byte in this mode, it can be used to fix broken UTF-8 on a byte-by-byte basis.
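The difference between the two notions of "character" can be sketched in C. This is not qed's implementation: count_chars and utf8_seq_len are hypothetical names, and the byte_mode flag merely stands in for the a option. In byte mode every byte counts as one character; otherwise the text must be well-formed UTF-8, and the function reports failure where qed would raise ?U.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the well-formed UTF-8 sequence starting at s
 * (n bytes available), or 0 if the bytes are not valid UTF-8. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned long cp, min;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (len > n)
        return 0;                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/* Count the characters in text[0..n).
 * byte_mode != 0: every byte is one character.
 * byte_mode == 0: characters are UTF-8 codepoints; returns -1 on invalid input. */
long count_chars(const char *text, size_t n, int byte_mode)
{
    const unsigned char *s = (const unsigned char *)text;
    long count = 0;
    size_t i = 0;

    assert(text != NULL || n == 0);
    if (byte_mode)
        return (long)n;
    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0)
            return -1;
        i += len;
        count++;
    }
    return count;
}
```

With the six-byte UTF-8 encoding of "naïve", the codepoint count is 5 and the byte count is 6. The five-byte Latin-1 encoding of the same word is rejected as UTF-8 but is countable in byte mode, which is the situation the a option is meant to handle.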

The a option can be set from the command line with option -a, and it can be set or reset at any time by issuing the oas or oar commands, as with qed's other options.

BEWARE! Since the last used regular expression is saved in its compiled form for possible re-use, unexpected results can occur if the regular expression was first compiled in UTF-8 mode, and is reinvoked when in 8-bit mode, and vice versa.

BEWARE! The rendering of UTF-8 encoded text on-screen is entirely under the control of the terminal running qed. Qed itself has no understanding of Unicode semantics, such as character width or directionality. This means that some of the visually-orientated features of qed (mainly the x command's visual editing mode, and several of the programs in the library, such as paren.q) are likely to have unexpected outputs with text that is anything more exotic than single-width, non-combining characters.

Other new features

This qed has a new special character \0, followed by up to three octal digits, which allows the insertion of any byte value. The motivation for this is largely to be able to fix broken UTF-8 on a byte-by-byte basis in conjunction with setting the a option, although other more exotic transformations of UTF-8 text are certainly possible.
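To make the escape syntax concrete, here is a minimal C sketch of how a \0 escape with up to three octal digits could be decoded into a single byte. It is not taken from the qed sources: parse_byte_escape is a hypothetical name, and rejecting values above octal 377 is an assumption of this sketch, since the behaviour for such values is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Decode an escape of the form \0 followed by up to three octal digits.
 * On success, store the byte in *out and return the number of input
 * characters consumed. Return 0 if s does not start with such an escape,
 * or if the value does not fit in one byte (an assumption of this sketch). */
size_t parse_byte_escape(const char *s, unsigned char *out)
{
    unsigned value = 0;
    size_t i = 2;

    assert(s != NULL && out != NULL);
    if (s[0] != '\\' || s[1] != '0')
        return 0;
    /* Consume at most three octal digits after the leading \0. */
    while (i < 5 && s[i] >= '0' && s[i] <= '7') {
        value = value * 8 + (unsigned)(s[i] - '0');
        i++;
    }
    if (value > 0xFF)
        return 0;
    *out = (unsigned char)value;
    return i;
}
```

Under these assumptions, \0303 followed by \0257 yields the bytes 0xC3 and 0xAF, which together form the UTF-8 encoding of U+00EF.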

This qed has a new register command ] which complements the existing command [. Whereas [ puts the index of the first matched character into the Count register, ] puts the index of the last matched character into the Count register. For example:

za:kungfoo
za[/foo/
zCp
    4    "match starts at index 4
za]/foo/
zCp
    6    "match ends at index 6
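The same first and last indices can be reproduced outside qed with the POSIX regex API, as a rough analogy only. match_span is a hypothetical helper, not part of qed, and it reports 0-based byte offsets; how qed counts indices in UTF-8 mode is not covered by this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Find the first match of a POSIX basic regular expression in text.
 * On success, store the 0-based byte index of the first and last matched
 * characters and return 0. Return -1 on a bad pattern, no match, or an
 * empty match. */
int match_span(const char *text, const char *pattern, long *first, long *last)
{
    regex_t re;
    regmatch_t m;
    int rc;

    assert(text != NULL && pattern != NULL && first != NULL && last != NULL);
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    if (rc != 0 || m.rm_eo == m.rm_so)
        return -1;
    *first = (long)m.rm_so;     /* analogous to what [ reports */
    *last = (long)m.rm_eo - 1;  /* analogous to what ] reports; rm_eo is one past the end */
    return 0;
}
```

For the text kungfoo and the pattern foo, this gives first = 4 and last = 6, matching the values shown in the qed session above.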

Documentation

Qed is fully described in the included manpage qed(1). However, gaining a working knowledge of Qed is not trivial. For a tutorial on Qed's advanced features, see my edited and updated version of Rob Pike's original Qed Tutorial here.

Sources

This version of qed is derived from the Research Unix Version 8 sources available from the Unix Archive here, and located in /usr/src/cmd/qed.

The manpage is derived from the Research Unix Version 10 manpage, also available from the Unix Archive here, and located in /man/manb/qed.1.

The library of qed programs is derived from Rob Pike's q directory, found in Arnold Robbins' qed-archive here. The library seems to be an updated version of a similar library that was released with an earlier version of the qed sourcecode at the 1980 Usenix Delaware conference, and available at the Unix Archive here, in directory boulder/caltech in the archive usenix_80_delaware.tar.gz. It is also available in Arnold Robbins' qed-archive here.

Authors

The original sourcecode for qed was written by Tom Duff, Rob Pike, Hugh Redelmeier and David Tilbrook at the University of Toronto in the late 1970s, based on the U. of T.'s version of the Unix Version 6 editor ed. The original sourcecode and documentation for this U. of T. ed was distributed along with the qed sourcecode in the 1980 Usenix bundle mentioned above, and is available here, in directory ./ed.

In 2024, Sean Jensen reformatted the authors' original sourcecode to be ANSI-compliant C, and made changes to (i) make it compile on an up-to-date ANSI/POSIX system; and (ii) add new capabilities for processing UTF-8 encoded text.

He also added some text to the manpage describing these new capabilities.

The source files alloc.c, bytes.[ch], u.[ch] and utf.[ch] were written entirely by Sean Jensen.
