Merge immediates into main working line.

Although the code size is an issue for the ATmega328P, I believe that can best be worked on through further structural changes, and it's easier to do those without trying to keep a large changeset in sync.
anarchodin · Mar 29, 2021 · 77fb98a
2 parents ccff492 + ec73b9b
commit 77fb98a
Showing 46 changed files with 425 additions and 199 deletions.
diff --git a/doc/immediates.md b/doc/immediates.md
@@ -0,0 +1,108 @@
+# Immediate values
+
+Like other Lisps, but unlike languages like C, uLisp attaches type information
+to all of its runtime objects. Originally, this was done in a fairly
+straightforward and uniform fashion: All objects were represented by two values
+of the same size as the underlying machine’s memory pointers. Cons cells, which
+always point to other values, had actual pointers in both cells, while other
+kinds of objects had a type tag in the `car` cell and a representation of the
+object in the `cdr` cell. This representation is simple to understand, but it is
+wasteful: To represent an n-bit number we use two n-bit values! There is,
+however, a way to increase the efficiency of this scheme for cases that warrant
+the effort while retaining most of the simplicity for those that don’t.
+
+Notice that even in the simple version we need to have an infallible way to
+distinguish between type tags and memory pointers. This turns out to be possible
+in very common cases due to alignment constraints. To take an example, if we
+ensure that our table of uLisp objects _starts_ at an even memory address, no
+valid pointer to a uLisp object will ever be an odd number – even on 8-bit
+machines, pointers are always at least two bytes. As it happens, uLisp _already_
+uses the lowest bit – the one controlling whether a number is even or odd – for
+[garbage collection](http://www.ulisp.com/show?1BD3). That’s fine by us:
+Instead, we ensure that the table starts at a multiple of four. You see, a uLisp
+object is two pointers, so if the first address is a multiple of four, all of
+them are. This enables us to use the other bits to represent objects in a way
+that doesn’t require memory to be allocated. Instead of representing n-bit
+numbers with 2n bits we can now represent numbers with n-2 bits in n
+bits. That’s considerably less wasteful, and all we have to do to know that it
+_is_ a number is to check if the bit signifying 2 is set. Great!
+
+There is one catch: Remember those type tags from earlier? We still need those
+for our remaining boxed values. Therefore, in order to properly gain, we can’t
+just stuff numbers into the n-2 bits. We have to be able, again, to tell type
+tags and numbers apart with certainty. We’ll have to sacrifice an additional
+bit. Having done that, to figure out whether the value is a number or a type
+tag, we need to look at two bits, those selected by the mask 6. Since it’s two bits,
+there are four possible combinations, but two of those – 0 and 4 – represent a
+pointer, not an immediate value. Let’s say that if the value is 2, we have a
+number, and if it’s 6 we have a type tag. Then we have n-3 (13, 29, 61) bit
+numbers, which seems reasonable. But now a question arises - do we need that
+many bits to distinguish between types?
+
+Well, no. And we might want to stick other kinds of values into immediates –
+like, say, character values. If using two 16-bit values to represent a 16-bit
+number felt wasteful, using those same 32 bits to represent an 8-bit value is
+even worse. So we extend this system: Ignoring the lowest-order bit, an
+immediate object’s type is identified by the first unset bit. This means we can
+check types with fairly simple bitmasks – all the values used are powers of two,
+minus two.
+
+## Implemented immediate types
+
+uLisp has implementations that use 16-bit, 32-bit and 64-bit pointers. The
+differences in size mean that some aspects of the implementation differ between
+platforms. In particular, it is possible to encode user-defined symbols in an
+immediate value on the larger machines, which is not reasonable for 16-bit
+pointers. For this reason, the immediate types do vary by bit-size. They do not
+vary by anything else, however. Certain fundamentals are shared - fixnums always
+have the same tag, for example.
+
+### 16-bit
+
+- Fixnums are thirteen-bit signed integers. `(eql (logand fixnum 6) 2)`
+- Built-in symbols are eleven-bit values. `(eql (logand symbol 30) 14)`
+- Characters are eight-bit unsigned integers. `(eql (logand byte 254) 126)`
+
+### 32-bit
+
+- Fixnums are 29-bit signed integers. `(eql (logand fixnum 6) 2)`
+- Symbols are packed into 27 bits. `(eql (logand symbol 30) 14)`
+- Characters are 21-bit unsigned integers. `(eql (logand unicode 2046) 1022)`
+
+### 64-bit
+
+For now, 64-bit platforms use the same tags as 32-bit systems.
+
+## Fixnums
+
+Fixnums are the original impetus for immediates. They get three fewer bits than
+platform pointers, and are two’s complement signed integers that do not need to
+be allocated from the workspace. This can save significant memory, particularly
+when combined with arrays. Aside from the size differences, there are no real
+platform specifics.
+
+## Symbols
+
+The implementation of symbols in uLisp is somewhat unorthodox – there is a
+strong distinction between symbols that are built in to uLisp and two types of
+user symbols. On 32-bit and 64-bit platforms, all symbols are immediate
+values. On smaller platforms, only built-in symbols are immediate values, with
+user symbols being boxed values.
+
+Using immediate values for built-in symbols on all platforms ensures that their
+exact representation is known at compile-time, which is useful in various parts
+of the uLisp internals.
+
+## Possible extensions
+
+### Parametric types
+
+The type tags, especially on the larger platforms, are _much_ larger than they
+need to be. It is possible to specify a fixed number of bits to be used for
+the type tag itself, and allocate the rest of the bits to some kind of
+parameters for the type. A potential use for the platforms that could feasibly
+run their own compiler would be to stash calling convention information about
+functions in there.
+
+It might also, by complicating the memory allocation mechanism a little, be used
+to carry size information for contiguous allocations larger than a single cell.
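
Reviewer's sketch (not part of the commit): the tagging scheme that `doc/immediates.md` describes can be illustrated in C roughly as follows. The masks and tags are the 32-bit values quoted in the document; every identifier (`word_t`, `is_fixnum`, `make_fixnum` and so on) is hypothetical and is not taken from the uLisp sources, and the placement of the payload directly above the tag bits is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. Masks and tags follow the 32-bit table in
   doc/immediates.md; bit 0 is left alone because the GC uses it. */
#define FIXNUM_MASK 6u
#define FIXNUM_TAG  2u
#define SYMBOL_MASK 30u
#define SYMBOL_TAG  14u
#define CHAR_MASK   2046u
#define CHAR_TAG    1022u

typedef uint32_t word_t;

/* Cells are four-byte aligned, so a real pointer never has bit 1 set. */
static bool is_pointer(word_t w) { return (w & 2u) == 0; }
static bool is_fixnum(word_t w)  { return (w & FIXNUM_MASK) == FIXNUM_TAG; }
static bool is_symbol(word_t w)  { return (w & SYMBOL_MASK) == SYMBOL_TAG; }
static bool is_char(word_t w)    { return (w & CHAR_MASK) == CHAR_TAG; }

/* Assumption: the 29-bit payload sits in the bits above the tag. */
static word_t make_fixnum(int32_t n) {
  return ((word_t)n << 3) | FIXNUM_TAG;
}

/* Clear the three low bits, then divide instead of shifting so that
   negative values decode without relying on arithmetic right shift.
   The cast assumes a two's complement target. */
static int32_t fixnum_value(word_t w) {
  return (int32_t)(w & ~(word_t)7) / 8;
}

/* Assumption: the 21-bit character code sits above the 11 tag bits. */
static word_t make_char(uint32_t c) { return (c << 11) | CHAR_TAG; }
static uint32_t char_value(word_t w) { return w >> 11; }
```

Because each tag is a run of set bits ended by the first clear bit, the predicates are mutually exclusive: a character word fails both `is_fixnum` and `is_symbol` without any extra checks.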
diff --git a/functions/arm/restarti2c.c b/functions/arm/restarti2c.c
@@ -8,7 +8,7 @@ object *fn_restarti2c (object *args, object *env) {
   I2CCount = 0;
   if (args != NULL) {
     object *rw = first(args);
-    if (integerp(rw)) I2CCount = rw->integer;
+    if (intp(rw)) I2CCount = getint(rw);
     read = (rw != NULL);
   }
   int address = stream & 0xFF;

diff --git a/functions/arm/withi2c.c b/functions/arm/withi2c.c
@@ -14,7 +14,7 @@ object *sp_withi2c (object *args, object *env) {
   I2CCount = 0;
   if (params != NULL) {
     object *rw = eval(first(params), env);
-    if (integerp(rw)) I2CCount = rw->integer;
+    if (intp(rw)) I2CCount = getint(rw);
     read = (rw != NULL);
   }
   // Top bit of address is I2C port

diff --git a/functions/restarti2c.c b/functions/restarti2c.c
@@ -8,7 +8,7 @@ object *fn_restarti2c (object *args, object *env) {
   I2CCount = 0;
   if (args != NULL) {
     object *rw = first(args);
-    if (integerp(rw)) I2CCount = rw->integer;
+    if (intp(rw)) I2CCount = getint(rw);
     read = (rw != NULL);
   }
   int address = stream & 0xFF;

diff --git a/functions/withi2c.c b/functions/withi2c.c
@@ -10,7 +10,7 @@ object *sp_withi2c (object *args, object *env) {
   I2CCount = 0;
   if (params != NULL) {
     object *rw = eval(first(params), env);
-    if (integerp(rw)) I2CCount = rw->integer;
+    if (intp(rw)) I2CCount = getint(rw);
     read = (rw != NULL);
   }
   I2Cinit(1); // Pullups

diff --git a/platforms.lisp b/platforms.lisp
@@ -9,7 +9,7 @@
 
 (defparameter *platforms*
   '((:avr
-     (:types zzero symbol number stream character string pair)
+     (:types zzero symbol number stream string pair)
      (:streams serial i2c spi sd)
      (:keywords
       ("CPU_ATmega328P"
@@ -33,7 +33,7 @@
        (ANALOGREAD ADC_DAC0 ADC_TEMPERATURE)))
      (:features :dacreference))
     (:arm
-     (:types zzero symbol code number stream character float array string pair)
+     (:types zzero code number stream float array string pair)
      (:streams serial i2c spi sd string gfx)
      (:keywords
       ("CPU_ATSAMD21"
@@ -71,7 +71,7 @@
        (ANALOGREFERENCE DEFAULT EXTERNAL)))
      (:features :float :gfx :code :array :stringstream :write-resolution))
     (:esp
-     (:types zzero symbol number stream character float array string pair)
+     (:types zzero number stream float array string pair)
      (:streams serial i2c spi sd wifi string gfx)
      (:keywords 
       ("ESP8266"
@@ -82,7 +82,7 @@
        (PINMODE INPUT INPUT_PULLUP INPUT_PULLDOWN OUTPUT)))
      (:features :float :gfx :code :array :stringstream :ethernet))
     (:riscv
-     (:types zzero symbol code number stream character float array string pair)
+     (:types zzero code number stream float array string pair)
      (:streams serial i2c spi sd string gfx)
      (:keywords
       (nil

diff --git a/preface.lisp b/preface.lisp
@@ -5,13 +5,28 @@
 ;; FIXME: This belongs elsewhere.
 (defvar *maximum-trace-count* 3 "The number of functions that can be traced at one time.")
 
-;; NOTE: Done as CPP macros rather than an enum in preparation for further changes.
+(defun print-tokens (platform &optional (stream *standard-output*))
+  "Output token definitions for a given platform."
+  (let* ((byte-size (if (eq platform :avr) 16 32))
+         (shift-size (if (= byte-size 16) 12 28))
+         (base-num (if (= byte-size 16) #x7FE #x7FFFFFE))
+         (num-length (if (= byte-size 16) 4 8)))
+    (loop for token in '(:BRA :KET :QUO :DOT)
+          for i from 0
+          do (format stream "#define ~a 0x~v,'0x~%"
+                     token num-length
+                     (logior (ash i shift-size)
+                             base-num)))))
+
 (defun print-types (typelist &optional (stream *standard-output*))
   "Output type definitions for the given types."
-  (format stream "~&~%// Types~%")
+  (format stream "~&~%// Type identifiers. Four last bits fixed at 6.~%")
   (let ((value -1))
     (dolist (type typelist)
-      (format stream "#define ~a ~d~%" type (ash (incf value) 1))))
+      (format stream "#define ~a ~d // (~d << 4 | 6)~%"
+              type
+              (logior (ash (incf value) 4) 6)
+              value)))
   (terpri stream))
 
 (defun print-streams (streamlist &optional (stream *standard-output*) (margin 90))
@@ -29,6 +44,6 @@
   (format stream "~&~%// Constants~%~%")
   (format stream "const int TRACEMAX = ~d; // Number of traced functions~%" *maximum-trace-count*)
   (print-types (get-types platform) stream)
-  (write-line "enum token { UNUSED, BRA, KET, QUO, DOT };" stream)
+  (print-tokens platform stream)
   (print-streams (get-streams platform) stream)
   (terpri stream))
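
Reviewer's sketch (not part of the commit): the arithmetic that `print-tokens` and `print-types` perform can be restated in C to see which constants end up in the generated header. The function names below are hypothetical; the shifts and base values are the ones in the Lisp code above.

```c
#include <stdint.h>

/* Hypothetical helper mirroring (logior (ash i shift-size) base-num). */
static uint32_t token_value(unsigned index, unsigned shift, uint32_t base) {
  return ((uint32_t)index << shift) | base;
}

/* Hypothetical helper mirroring (logior (ash value 4) 6). */
static unsigned type_id(unsigned index) {
  return (index << 4) | 6u;
}
```

With the 16-bit parameters (shift 12, base `0x7FE`) the tokens BRA, KET, QUO and DOT come out as `0x07FE`, `0x17FE`, `0x27FE` and `0x37FE`; with the 32-bit parameters (shift 28, base `0x7FFFFFE`) they come out as `0x07FFFFFE` through `0x37FFFFFE`. Type identifiers run 6, 22, 38 and so on, so their low four bits are always `0110`.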
diff --git a/sections/arm/setup.c b/sections/arm/setup.c
@@ -12,7 +12,6 @@ void initgfx () {
 
 void initenv () {
   GlobalEnv = NULL;
-  tee = symbol(TEE);
 }
 
 void setup () {

diff --git a/sections/array.c b/sections/array.c
@@ -28,13 +28,13 @@ object *makearray (symbol_t name, object *dims, object *def, bool bitp) {
   int size = 1;
   object *dimensions = dims;
   while (dims != NULL) {
-    int d = car(dims)->integer;
+    int d = getint(car(dims));
     if (d < 0) error2(MAKEARRAY, PSTR("dimension can't be negative"));
     size = size * d;
     dims = cdr(dims);
   }
   // Bit array identified by making first dimension negative
-  if (bitp) { size = (size + 31)/32; car(dimensions) = number(-(car(dimensions)->integer)); }
+  if (bitp) { size = (size + 31)/32; car(dimensions) = number(-getint(car(dimensions))); }
   object *ptr = myalloc();
   ptr->type = ARRAY;
   object *tree = nil;
@@ -66,7 +66,7 @@ object **getarray (symbol_t name, object *array, object *subs, object *env, int
   bool bitp = false;
   object *dims = cddr(array);
   while (dims != NULL && subs != NULL) {
-    int d = car(dims)->integer;
+    int d = getint(car(dims));
     if (d < 0) { d = -d; bitp = true; }
     if (env) s = checkinteger(name, eval(car(subs), env)); else s = checkinteger(name, car(subs));
     if (s < 0 || s >= d) error(name, PSTR("subscript out of range"), car(subs));
@@ -87,7 +87,7 @@ object **getarray (symbol_t name, object *array, object *subs, object *env, int
   rslice - reads a slice of an array recursively
 */
 void rslice (object *array, int size, int slice, object *dims, object *args) {
-  int d = first(dims)->integer;
+  int d = getint(first(dims));
   for (int i = 0; i < d; i++) {
     int index = slice * d + i;
     if (!consp(args)) error2(0, PSTR("initial contents don't match array type"));
@@ -144,7 +144,7 @@ object *readbitarray (gfun_t gfun) {
   while (head != NULL) {
     object **loc = arrayref(array, index>>5, size);
     int bit = index & 0x1F;
-    *loc = number((((*loc)->integer) & ~(1<<bit)) | (car(head)->integer)<<bit);
+    *loc = number(((getint(*loc)) & ~(1<<bit)) | (getint(car(head)))<<bit);
     index++;
     head = cdr(head);
   }
@@ -157,13 +157,13 @@ object *readbitarray (gfun_t gfun) {
 void pslice (object *array, int size, int slice, object *dims, pfun_t pfun, bool bitp) {
   bool spaces = true;
   if (slice == -1) { spaces = false; slice = 0; }
-  int d = first(dims)->integer;
+  int d = getint(first(dims));
   if (d < 0) d = -d;
   for (int i = 0; i < d; i++) {
     if (i && spaces) pfun(' ');
     int index = slice * d + i;
     if (cdr(dims) == NULL) {
-      if (bitp) pint(((*arrayref(array, index>>5, size))->integer)>>(index & 0x1f) & 1, pfun);
+      if (bitp) pint((getint(*arrayref(array, index>>5, size)))>>(index & 0x1f) & 1, pfun);
       else printobject(*arrayref(array, index, size), pfun);
     } else { pfun('('); pslice(array, size, index, cdr(dims), pfun, bitp); pfun(')'); }
   }
@@ -178,7 +178,7 @@ void printarray (object *array, pfun_t pfun) {
   bool bitp = false;
   int size = 1, n = 0;
   while (dims != NULL) {
-    int d = car(dims)->integer;
+    int d = getint(car(dims));
     if (d < 0) { bitp = true; d = -d; }
     size = size * d;
     dims = cdr(dims); n++;

diff --git a/sections/avr/setup.c b/sections/avr/setup.c
@@ -2,7 +2,6 @@
 
 void initenv () {
   GlobalEnv = NULL;
-  tee = symbol(TEE);
 }
 
 void setup () {

diff --git a/sections/closure.c b/sections/closure.c
@@ -3,21 +3,21 @@
 object *value (symbol_t n, object *env) {
   while (env != NULL) {
     object *pair = car(env);
-    if (pair != NULL && car(pair)->name == n) return pair;
+    if (pair != NULL && getname(car(pair)) == n) return pair;
     env = cdr(env);
   }
   return nil;
 }
 
 bool boundp (object *var, object *env) {
-  symbol_t varname = var->name;
+  symbol_t varname = getname(var);
   if (value(varname, env) != NULL) return true;
   if (value(varname, GlobalEnv) != NULL) return true;
   return false;
 }
 
 object *findvalue (object *var, object *env) {
-  symbol_t varname = var->name;
+  symbol_t varname = getname(var);
   object *pair = value(varname, env);
   if (pair == NULL) pair = value(varname, GlobalEnv);
   if (pair == NULL) error(0, PSTR("unknown variable"), var);
@@ -55,7 +55,7 @@ object *closure (int tc, symbol_t name, object *state, object *function, object
   while (params != NULL) {
     object *value;
     object *var = first(params);
-    if (symbolp(var) && var->name == OPTIONAL) optional = true;
+    if (symbolp(var) && getname(var) == OPTIONAL) optional = true;
     else {
       if (consp(var)) {
         if (!optional) error(name, PSTR("invalid default value"), var);
@@ -65,7 +65,7 @@ object *closure (int tc, symbol_t name, object *state, object *function, object
         if (!symbolp(var)) error(name, PSTR("illegal optional parameter"), var);
       } else if (!symbolp(var)) {
         error2(name, PSTR("illegal function parameter"));
-      } else if (var->name == AMPREST) {
+      } else if (getname(var) == AMPREST) {
         params = cdr(params);
         var = first(params);
         value = args;
@@ -90,7 +90,7 @@ object *closure (int tc, symbol_t name, object *state, object *function, object
 
 object *apply (symbol_t name, object *function, object *args, object *env) {
   if (symbolp(function)) {
-    symbol_t fname = function->name;
+    symbol_t fname = getname(function);
     if (fname < ENDKEYWORDS) {
       uint8_t callc = getcallc(fname);
       if (callc < 0x80) { // High bit not set, so normal function.

diff --git a/sections/compactimage.c b/sections/compactimage.c
@@ -28,7 +28,6 @@ void movepointer (object *from, object *to) {
 }
 
 uintptr_t compactimage (object **arg) {
-  markobject(tee);
   markobject(GlobalEnv);
   markobject(GCStack);
   object *firstfree = Workspace;

diff --git a/sections/esp/setup.c b/sections/esp/setup.c
@@ -11,7 +11,6 @@ void initgfx () {
 
 void initenv () {
   GlobalEnv = NULL;
-  tee = symbol(TEE);
 }
 
 void setup () {