Unword to typeset #104

hostilefork · 2015-09-22T18:05:04Z

These changes should ideally have no user visible effect besides fixing some crashes. They take what was formerly a special kind of WORD! (known as an UNWORD) which contained information for a typeset, and turns it around so that those are actually real TYPESET!s which can optionally contain a word symbol number.

It also initiates some renaming by way these "special" values (once words, now typeset) are referred to not as "words" but as "keys" and "params". Most definitions and callsites are simplified and easier to check. The bits for protecting and hiding also move off of being a "word thing" to being a "typeset thing", so something could be considered const as being part of its datatype.

While the change has general benefits, it was motivated by a need for typesets to be able to expand to more than 64 types if they need to...and to try and make all typeset-related things code operate on values that were actually focused on being TYPESET!.

This is essentially just a renaming so that the "_WORDS" list for objects becomes the "_KEYS" list, and the "_WORDS" list for functions becomes the "_PARAMS" list. The series themselves are called KEYLIST and PARAMLIST respectively. Motivation for this change is that the special kind of word used in object frames doesn't have very much in common with WORD! besides having a symbol number. It's a lot more like a TYPESET!. Similarly for params in functions, though they are slightly more ANY-WORD! like because they store the word subtype to know how to treat the parameter (get words and lit-words for quoting, for example.) In particular, if these words ever leak out to the system they will crash, e.g. the "x" that escapes and makes it into this error: >> foo: func [x [series! any-object!]] [] >> bind? select to-object try [foo 1] 'arg2 == Segmentation fault: 11 The nature of the change is to switch these words to typesets which happen to have an extra symbol. Hence instead of an OPTS_UNWORD the cue that it holds a typeset will be that it is actually a REB_TYPESET. It also goes ahead and commits corrections to when VAL_WORD_SYM was used when VAL_BIND_SYM was intended, the latter being what will eventually be VAL_TYPESET_SYM. It would have been possible to name all the variables "typeset", but as with the unword it's a little different...so key and param likely are better. Static typing in the C++ build will be able to smooth over any confusion about what type the key is.

An object or function frame contains a symbol, and lives at a certain index. This index gets put into bound words along with the frame pointer when a binding happens. There wasn't any assertion in the system that when you went to look up the frame key at an index that the symbol matches. It seemed a prudent one to add, when shuffling the keys from being an "UNWORD" to being typesets...though it may be worthwhile to keep around for a while even after that settles.

Get_Action_Word used to return an UNWORD out of the frame keys directly. When leaked into the system they could cause crashes if the encoded type information were misinterpreted as binding info. Also, in the pending change where TYPESET! takes the place of an UNWORD, if it didn't create a new REBVAL you'd get a typeset...

This is a similar-in-spirit change to the one for Get_Action_Word with a similar reason: the routine was giving back UNWORDs directly from the frame keys, with potential crash if used.

This commit does the main work after the "pre-renaming"...for ease of inspection. It still has another "post-renaming" for those renames that only make sense after the change. Summarizing: Historically there has been a special variation of WORD! used to store the keys of object frames and the parameters+locals of a function. It had in common that in the structure it had the same symbol number field that regular words did, but instead of binding information it held a typeset. It held this 64-bit typeset struct field at the same offset that actual TYPESET!s did, so functions could run in common. Experience bore out that this was both confusing and dangerous. The unword would leak easily to code that wasn't sensitive to its special status, and crash using type bits as binding. People reading code and trying to follow it would not be cued very well to knowing that what they were looking at was not "a word as they knew it". The pressing reason for the change was contemplating enhancements to typesets so they could expand out to more than just 64 flag bits of information on types (which is a nice optimization for common cases but not forward-looking in a future with user-defined types). It makes sense to think of the typeset vocabulary and mechanics growing...and in that point of view it makes more sense to see these keys as "typesets with a word symbol ID in them" as opposed to "words that can't bind and have typesets in them". (Also: some things, like a <protected> status, makes sense being in the representation of the typeset more than being a flag inside of a word. For instance there's how `const` is part of the datatype in C.) One issue from having it be switched to a "typeset plus symbol" is that the frame keys lose the SET, GET, LIT, REFINEMENT distinction. So the properties those distinctions made are part of the typeset EXT bits.

This renames VAL_BIND* routines to the more informative and pattern-following VAL_TYPESET* names. It also makes the bits access of typeset more obvious with VAL_TYPESET_BITS instead of just using VAL_TYPESET, which will also generalize better for a typeset that may be capable of storing its list another way. The BUF_WORDS buffer was used both for collecting keys of objects (once unwords, now typesets) as well as in the Collect_Words routines where they are still collecting actual REB_WORDs. It doesn't make sense to have two such buffers just because of the type difference when they don't overlap, hence this is changed to BUF_COLLECT to indicate appropiate type-neutrality. Also, one user-visible error that should have been an assert was converted to an assert.

hostilefork added 6 commits September 18, 2015 02:23

Eliminate Get_Type_Word, use SYM_FROM_TYPE instead

47ed5b2

This is a similar-in-spirit change to the one for Get_Action_Word with a similar reason: the routine was giving back UNWORDs directly from the frame keys, with potential crash if used.

hostilefork merged commit 9ad1f07 into metaeducation:master Sep 22, 2015

hostilefork deleted the unword-to-typeset branch September 28, 2015 14:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unword to typeset #104

Unword to typeset #104

hostilefork commented Sep 22, 2015

Unword to typeset #104

Unword to typeset #104

Conversation

hostilefork commented Sep 22, 2015