Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the way VT control sequences are identified and dispatched #7276

Closed
j4james opened this issue Aug 13, 2020 · 6 comments · Fixed by #7304
Closed

Refactor the way VT control sequences are identified and dispatched #7276

j4james opened this issue Aug 13, 2020 · 6 comments · Fixed by #7304
Labels
Area-VT Virtual Terminal sequence support Help Wanted We encourage anyone to jump in on these. Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.

Comments

@j4james
Copy link
Collaborator

j4james commented Aug 13, 2020

Description of the new feature/enhancement

The way VT sequences are currently processed, we build up a list of intermediate characters (and private parameter prefixes) in a wchar_t vector, and then pass both that vector and the final character through to the state machine engine to dispatch. The engine goes through a rather complicated process where it has to check for intermediates, and branch off into separate methods, just to figure out which command has actually been requested.

So to make things simpler, I want to propose we identify each control sequence with a single integer value, which is a combination of the private parameter prefix, the intermediates, and the final char. This identifier can then be used in a simple switch statement when dispatching commands, without having to worry about special case handling for intermediates.

Right now we only have a few commands that depend on intermediates and private prefixes, and it's already a bit of a pain to work with. Once we start adding more advanced commands, though, it's going to get a lot worse. So before we get to that point, I thought it would be worth refactoring the code to make it a little easier to extend.

Proposed technical implementation details (optional)

I've been investigating different ways of implementing this, and the option I found best was a VTID class with a single size_t field holding the sequence value, and a size_t cast operator that will return that value. It would also have a constexpr constructor that can build the value from a string, so the action code enums can easily be defined with something like DECSTR_SoftReset = VTID("!p"). Currently we're pretending that DECSTR is identified by p, which is actually something else entirely.

The actual structure of the sequence value is simply made up from the intermediate and final characters stored one byte at a time in reverse order. For example, the DECTLTC control has a private parameter prefix of ?, one intermediate of ', and a final character of s. The ASCII values of those characters are 0x3F, 0x27, and 0x73 respectively, and reversing them gets you 0x73273F, so that would then be the identifier for the control.

The reason for storing them in reverse, is because sometimes we need to look at the first intermediate to determine the operation, and treat the rest of the sequence as a kind of sub-identifier (the character set designation sequences are one example of this). This is easily achieved by masking off the low byte to get the first intermediate, and then shifting the value right by 8 bits to get a new identifier with the rest of the sequence.

The downside of the reverse order is it makes constructing the sequence in the state machine a little more complicated, because you need to keep track of how many characters have already been stored in the identifier (each new character needs to be shifted 8-bit further left). The way I've approached that was with a little VTIDBuilder class, that has both an accumulator for the value being built up, and a shift value that determine the offset at which each new character is added.

And note that we don't need to worry too much about overflowing, because any sequence that wouldn't fit in a size_t would not be a valid sequence anyway. If it looks like it's going to overflow, we simply set the accumulator to zero, and stop accumulating further characters. When it's eventually dispatched, it'll just be ignored as an unrecognised identifier.

This is quite a big change to the code base, but I do think it makes things simpler, and should make the process of adding new multicharacter control sequences a lot less hassle.

@j4james j4james added the Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. label Aug 13, 2020
@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Aug 13, 2020
@DHowett DHowett added Area-VT Virtual Terminal sequence support Help Wanted We encourage anyone to jump in on these. Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase and removed Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. labels Aug 13, 2020
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Aug 13, 2020
@DHowett DHowett added this to the Console Backlog milestone Aug 13, 2020
@DHowett
Copy link
Member

DHowett commented Aug 13, 2020

I'm in favor. @zadjii-msft / @miniksa if you're in favor, one of you yank Triage 😄

@DHowett
Copy link
Member

DHowett commented Aug 13, 2020

It's clever, and I like it. It also sets us up for tabled dispatch (if we ever want something like that).

Are there only ever 0-2 intermediates? I noticed that we support N where N fits in a size_t apparently o_O

@j4james
Copy link
Collaborator Author

j4james commented Aug 13, 2020

According to the DEC STD 070 specs, "conforming implementations shall recognize at least 3 intermediates", although the most I think I've seen in practice is 2 (e.g. designating the Portuguese character set is ESC ( % 6). So worst case theoretically would be 5 bytes (a private prefix, 3 intermediates, and a final), so maybe uint64_t would be better than size_t to be safe in a 32-bit build. In practice we'd probably never need that much though.

@miniksa
Copy link
Member

miniksa commented Aug 13, 2020

I'd rather not get bitten by the difference in size_t on 32 vs 64, no matter how unlikely. I would feel absolutely awful discovering that was the issue at the bottom of the bug. Rather you do uint64_t in that case.

@miniksa
Copy link
Member

miniksa commented Aug 13, 2020

Also, I'm fine with this proposal. When I originally put the state machine together this way, I didn't necessarily think quite this far ahead nor imagine that the parser would become quite this feature filled. It is likely about time that we have a better mechanism for sorting these things out.

I like the proposal with the VTID() and the rationale behind the storage. I knew the single letter thing wouldn't last forever the moment I saw the first collision. Let's do it.

@miniksa miniksa removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Aug 13, 2020
@DHowett DHowett added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Aug 13, 2020
@ghost ghost added the In-PR This issue has a related PR label Aug 15, 2020
@ghost ghost closed this as completed in #7304 Aug 18, 2020
@ghost ghost added Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed In-PR This issue has a related PR labels Aug 18, 2020
ghost pushed a commit that referenced this issue Aug 18, 2020
This PR changes the way VT control sequences are identified and
dispatched, to be more efficient and easier to extend. Instead of
parsing the intermediate characters into a vector, and then having to
identify a sequence using both that vector and the final char, we now
use just a single `uint64_t` value as the identifier.

The way the identifier is constructed is by taking the private parameter
prefix, each of the intermediate characters, and then the final
character, and shifting them into a 64-bit integer one byte at a time,
in reverse order. For example, the `DECTLTC` control has a private
parameter prefix of `?`, one intermediate of `'`, and a final character
of `s`. The ASCII values of those characters are `0x3F`, `0x27`, and
`0x73` respectively, and reversing them gets you 0x73273F, so that would
then be the identifier for the control.

The reason for storing them in reverse order, is because sometimes we
need to look at the first intermediate to determine the operation, and
treat the rest of the sequence as a kind of sub-identifier (the
character set designation sequences are one example of this). When in
reverse order, this can easily be achieved by masking off the low byte
to get the first intermediate, and then shifting the value right by 8
bits to get a new identifier with the rest of the sequence.

With 64 bits we have enough space for a private prefix, six
intermediates, and the final char, which is way more than we should ever
need (the _DEC STD 070_ specification recommends supporting at least
three intermediates, but in practice we're unlikely to see more than
two).

With this new way of identifying controls, it should now be possible for
every action code to be unique (for the most part). So I've also used
this PR to clean up the action codes a bit, splitting the codes for the
escape sequences from the control sequences, and sorting them into
alphabetical order (which also does a reasonable job of clustering
associated controls).

## Validation Steps Performed

I think the existing unit tests should be good enough to confirm that
all sequences are still being dispatched correctly. However, I've also
manually tested a number of sequences to make sure they were still
working as expected, in particular those that used intermediates, since
they were the most affected by the dispatch code refactoring.

Since these changes also affected the input state machine, I've done
some manual testing of the conpty keyboard handling (both with and
without the new Win32 input mode enabled) to make sure the keyboard VT
sequences were processed correctly. I've also manually tested the
various VT mouse modes in Vttest to confirm that they were still working
correctly too.

Closes #7276
@ghost
Copy link

ghost commented Aug 26, 2020

🎉This issue was addressed in #7304, which has now been successfully released as Windows Terminal Preview v1.3.2382.0.:tada:

Handy links:

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-VT Virtual Terminal sequence support Help Wanted We encourage anyone to jump in on these. Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants