-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the way VT control sequences are identified and dispatched #7276
Comments
I'm in favor. @zadjii-msft / @miniksa if you're in favor, one of you yank Triage 😄 |
It's clever, and I like it. It also sets us up for tabled dispatch (if we ever want something like that). Are there only ever 0-2 intermediates? I noticed that we support |
According to the DEC STD 070 specs, "conforming implementations shall recognize at least 3 intermediates", although the most I think I've seen in practice is 2 (e.g. designating the Portuguese character set is |
I'd rather not get bitten by the difference in |
Also, I'm fine with this proposal. When I originally put the state machine together this way, I didn't necessarily think quite this far ahead nor imagine that the parser would become quite this feature filled. It is likely about time that we have a better mechanism for sorting these things out. I like the proposal with the |
This PR changes the way VT control sequences are identified and dispatched, to be more efficient and easier to extend. Instead of parsing the intermediate characters into a vector, and then having to identify a sequence using both that vector and the final char, we now use just a single `uint64_t` value as the identifier. The way the identifier is constructed is by taking the private parameter prefix, each of the intermediate characters, and then the final character, and shifting them into a 64-bit integer one byte at a time, in reverse order. For example, the `DECTLTC` control has a private parameter prefix of `?`, one intermediate of `'`, and a final character of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and `0x73` respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control. The reason for storing them in reverse order, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). When in reverse order, this can easily be achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence. With 64 bits we have enough space for a private prefix, six intermediates, and the final char, which is way more than we should ever need (the _DEC STD 070_ specification recommends supporting at least three intermediates, but in practice we're unlikely to see more than two). With this new way of identifying controls, it should now be possible for every action code to be unique (for the most part). So I've also used this PR to clean up the action codes a bit, splitting the codes for the escape sequences from the control sequences, and sorting them into alphabetical order (which also does a reasonable job of clustering associated controls). ## Validation Steps Performed I think the existing unit tests should be good enough to confirm that all sequences are still being dispatched correctly. However, I've also manually tested a number of sequences to make sure they were still working as expected, in particular those that used intermediates, since they were the most affected by the dispatch code refactoring. Since these changes also affected the input state machine, I've done some manual testing of the conpty keyboard handling (both with and without the new Win32 input mode enabled) to make sure the keyboard VT sequences were processed correctly. I've also manually tested the various VT mouse modes in Vttest to confirm that they were still working correctly too. Closes #7276
🎉This issue was addressed in #7304, which has now been successfully released as Handy links: |
Description of the new feature/enhancement
The way VT sequences are currently processed, we build up a list of intermediate characters (and private parameter prefixes) in a wchar_t vector, and then pass both that vector and the final character through to the state machine engine to dispatch. The engine goes through a rather complicated process where it has to check for intermediates, and branch off into separate methods, just to figure out which command has actually been requested.
So to make things simpler, I want to propose we identify each control sequence with a single integer value, which is a combination of the private parameter prefix, the intermediates, and the final char. This identifier can then be used in a simple switch statement when dispatching commands, without having to worry about special case handling for intermediates.
Right now we only have a few commands that depend on intermediates and private prefixes, and it's already a bit of a pain to work with. Once we start adding more advanced commands, though, it's going to get a lot worse. So before we get to that point, I thought it would be worth refactoring the code to make it a little easier to extend.
Proposed technical implementation details (optional)
I've been investigating different ways of implementing this, and the option I found best was a
VTID
class with a singlesize_t
field holding the sequence value, and asize_t
cast operator that will return that value. It would also have a constexpr constructor that can build the value from a string, so the action code enums can easily be defined with something likeDECSTR_SoftReset = VTID("!p")
. Currently we're pretending thatDECSTR
is identified byp
, which is actually something else entirely.The actual structure of the sequence value is simply made up from the intermediate and final characters stored one byte at a time in reverse order. For example, the
DECTLTC
control has a private parameter prefix of?
, one intermediate of'
, and a final character ofs
. The ASCII values of those characters are 0x3F, 0x27, and 0x73 respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control.The reason for storing them in reverse, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). This is easily achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence.
The downside of the reverse order is it makes constructing the sequence in the state machine a little more complicated, because you need to keep track of how many characters have already been stored in the identifier (each new character needs to be shifted 8-bit further left). The way I've approached that was with a little
VTIDBuilder
class, that has both an accumulator for the value being built up, and a shift value that determine the offset at which each new character is added.And note that we don't need to worry too much about overflowing, because any sequence that wouldn't fit in a
size_t
would not be a valid sequence anyway. If it looks like it's going to overflow, we simply set the accumulator to zero, and stop accumulating further characters. When it's eventually dispatched, it'll just be ignored as an unrecognised identifier.This is quite a big change to the code base, but I do think it makes things simpler, and should make the process of adding new multicharacter control sequences a lot less hassle.
The text was updated successfully, but these errors were encountered: