Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unword to typeset #104

Merged
merged 6 commits into from
Sep 22, 2015
Merged

Unword to typeset #104

merged 6 commits into from
Sep 22, 2015

Conversation

hostilefork
Copy link
Member

These changes should ideally have no user visible effect besides fixing some crashes. They take what was formerly a special kind of WORD! (known as an UNWORD) which contained information for a typeset, and turns it around so that those are actually real TYPESET!s which can optionally contain a word symbol number.

It also initiates some renaming by way these "special" values (once words, now typeset) are referred to not as "words" but as "keys" and "params". Most definitions and callsites are simplified and easier to check. The bits for protecting and hiding also move off of being a "word thing" to being a "typeset thing", so something could be considered const as being part of its datatype.

While the change has general benefits, it was motivated by a need for typesets to be able to expand to more than 64 types if they need to...and to try and make all typeset-related things code operate on values that were actually focused on being TYPESET!.

This is essentially just a renaming so that the "_WORDS" list for
objects becomes the "_KEYS" list, and the "_WORDS" list for
functions becomes the "_PARAMS" list.  The series themselves
are called KEYLIST and PARAMLIST respectively.

Motivation for this change is that the special kind of word used in
object frames doesn't have very much in common with WORD!
besides having a symbol number.  It's a lot more like a TYPESET!.
Similarly for params in functions, though they are slightly more
ANY-WORD! like because they store the word subtype to know
how to treat the parameter (get words and lit-words for quoting, for
example.)  In particular, if these words ever leak out to the system
they will crash, e.g. the "x" that escapes and makes it into this error:

     >> foo: func [x [series! any-object!]] []
     >> bind? select to-object try [foo 1] 'arg2
     == Segmentation fault: 11

The nature of the change is to switch these words to typesets which
happen to have an extra symbol.  Hence instead of an OPTS_UNWORD
the cue that it holds a typeset will be that it is actually a REB_TYPESET.
It also goes ahead and commits corrections to when VAL_WORD_SYM
was used when VAL_BIND_SYM was intended, the latter being what
will eventually be VAL_TYPESET_SYM.

It would have been possible to name all the variables "typeset", but as
with the unword it's a little different...so key and param likely are better.
Static typing in the C++ build will be able to smooth over any confusion
about what type the key is.
An object or function frame contains a symbol, and lives at a certain
index.  This index gets put into bound words along with the frame
pointer when a binding happens.

There wasn't any assertion in the system that when you went to look
up the frame key at an index that the symbol matches.  It seemed a
prudent one to add, when shuffling the keys from being an "UNWORD"
to being typesets...though it may be worthwhile to keep around for
a while even after that settles.
Get_Action_Word used to return an UNWORD out of the frame
keys directly.  When leaked into the system they could cause crashes
if the encoded type information were misinterpreted as binding info.

Also, in the pending change where TYPESET! takes the place of an
UNWORD, if it didn't create a new REBVAL you'd get a typeset...
This is a similar-in-spirit change to the one for Get_Action_Word with
a similar reason: the routine was giving back UNWORDs directly from
the frame keys, with potential crash if used.
This commit does the main work after the "pre-renaming"...for ease of
inspection.  It still has another "post-renaming" for those renames
that only make sense after the change.  Summarizing:

Historically there has been a special variation of WORD! used to store
the keys of object frames and the parameters+locals of a function.  It
had in common that in the structure it had the same symbol number
field that regular words did, but instead of binding information it held
a typeset.  It held this 64-bit typeset struct field at the same offset that
actual TYPESET!s did, so functions could run in common.

Experience bore out that this was both confusing and dangerous.  The
unword would leak easily to code that wasn't sensitive to its special
status, and crash using type bits as binding.  People reading code and
trying to follow it would not be cued very well to knowing that what they
were looking at was not "a word as they knew it".

The pressing reason for the change was contemplating enhancements
to typesets so they could expand out to more than just 64 flag bits of
information on types (which is a nice optimization for common cases
but not forward-looking in a future with user-defined types).  It makes
sense to think of the typeset vocabulary and mechanics growing...and
in that point of view it makes more sense to see these keys as "typesets
with a word symbol ID in them" as opposed to "words that can't bind and
have typesets in them".

(Also: some things, like a <protected> status, makes sense being in the
representation of the typeset more than being a flag inside of a word.
For instance there's how `const` is part of the datatype in C.)

One issue from having it be switched to a "typeset plus symbol" is that
the frame keys lose the SET, GET, LIT, REFINEMENT distinction.  So
the properties those distinctions made are part of the typeset EXT bits.
This renames VAL_BIND* routines to the more informative and
pattern-following VAL_TYPESET* names.  It also makes the bits
access of typeset more obvious with VAL_TYPESET_BITS instead
of just using VAL_TYPESET, which will also generalize better for
a typeset that may be capable of storing its list another way.

The BUF_WORDS buffer was used both for collecting keys of objects
(once unwords, now typesets) as well as in the Collect_Words routines
where they are still collecting actual REB_WORDs.  It doesn't make
sense to have two such buffers just because of the type difference
when they don't overlap, hence this is changed to BUF_COLLECT to
indicate appropiate type-neutrality.

Also, one user-visible error that should have been an assert was
converted to an assert.
@hostilefork hostilefork merged commit 9ad1f07 into metaeducation:master Sep 22, 2015
@hostilefork hostilefork deleted the unword-to-typeset branch September 28, 2015 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants