Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XTerm parsing improvements #570

Merged
merged 12 commits into from
Jun 17, 2022
Merged

XTerm parsing improvements #570

merged 12 commits into from
Jun 17, 2022

Conversation

darrenburns
Copy link
Member

  • Added a unit test suite covering the XTermParser code
  • Fixes freeze/hanging issue: When a candidate escape sequence grows in length to exceed a threshold, or when another \x1b is found, it'll backtrack and treat the characters until that point as keypresses (and escape codes that follow the non-matched sequence won't be lost).

# escape sequence, at which length should we give up and consider our search
# to be unsuccessful?
_MAX_SEQUENCE_SEARCH_THRESHOLD = 20

_re_mouse_event = re.compile("^" + re.escape("\x1b[") + r"(<?[\d;]+[mM]|M...)\Z")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two regexes for mouse events that run.

This one suggests a mouse event can have any number of parameters, but when we parse the mouse event it assumes there are a fixed number.

Is this one definitely required? Could we possibly get away with running just the 1 regex for mouse events rather than 2?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite possibly. IIRC there are at least two mouse codes that can be generated. At one point I thought I would have to support both, but everything supports the newer method. That might explain the dual regexes.

@darrenburns darrenburns marked this pull request as ready for review June 10, 2022 08:59
@darrenburns darrenburns requested a review from willmcgugan June 10, 2022 09:00
Copy link
Collaborator

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly questions

# When trying to determine whether the current sequence is a supported/valid
# escape sequence, at which length should we give up and consider our search
# to be unsuccessful?
_MAX_SEQUENCE_SEARCH_THRESHOLD = 20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was 20 chosen?

Copy link
Member Author

@darrenburns darrenburns Jun 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The largest that I think we could expect to receive (that I'm aware of) is length 14 (mouse events in large terminal windows). The existence of the re_sgr_mouse regex suggested to me that longer sequences could exist, so I added on some leeway for defensiveness.

# escape sequence, at which length should we give up and consider our search
# to be unsuccessful?
_MAX_SEQUENCE_SEARCH_THRESHOLD = 20

_re_mouse_event = re.compile("^" + re.escape("\x1b[") + r"(<?[\d;]+[mM]|M...)\Z")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite possibly. IIRC there are at least two mouse codes that can be generated. At one point I thought I would have to support both, but everything supports the newer method. That might explain the dual regexes.

@@ -34,7 +39,7 @@ def __init__(

super().__init__()

def debug_log(self, *args: Any) -> None:
def debug_log(self, *args: Any) -> None: # pragma: no cover
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the typing error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no typing error. I was trying to ensure we had full test coverage of this module, and thought this code was temporary "for us" code so didn't think coverage.py should measure it. I can remove this and add a test case around it if you'd prefer.

# If we run into another ESC at this point, then we've failed
# to find a match, and should issue everything we've seen within
# the suspected sequence as Key events instead.
buffer = yield self.peek_buffer()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right. peek_buffer may return empty string if there is nothing in the buffer, but it could still read an ESC on L155

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about how I can get a test to cover this scenario and fix it on Tuesday.

or len(sequence) > _MAX_SEQUENCE_SEARCH_THRESHOLD
):
for character in sequence:
keys = get_key_ansi_sequence(character, None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that if we have an ESC to be sent from an unrecognised sequence, we should send a ^ since other wise it would be seen as the user hitting the literal escape key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you might be right. I was thinking that we might not know if it was literally the user pressing an escape key followed by some data that doesn't match an ANSI sequence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have made it such that when we perform the backtracking, \x1b gets converted to ^ rather than escape.

and buffer[0] == ESC
or len(sequence) > _MAX_SEQUENCE_SEARCH_THRESHOLD
):
for character in sequence:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some very similar code further down, I wonder if it could be hoisted in to a closure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this, but will do it now that I know I'm not the only one with that thought :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you said closure...

I've factored it out into a separate method

@darrenburns
Copy link
Member Author

The XTermParser makes the assumption that escape sequences will always be delivered fully, and not split into chunks.

Right now, if you split a sequence into chunks such that the final character of a chunk is \x1b, then peek_buffer returns an empty string and the \x1b is treated as an Esc key press. However, this \x1b could be the beginning of an escape sequence which is to be delivered in the next chunk.

The problem is at the point we read this \x1b at the end of the chunk, we don't know if there's another chunk to follow, and so we don't know right now what to do with the \x1b. The only way I can see us supporting this is if we have some kind of timeout that can kick in when \x1b is received at the end of a chunk.

@willmcgugan I'd like to see if we can come up with a solution for this, but I don't want to block this PR on it as it's getting way beyond the original scope.

Copy link

@olivierphi olivierphi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice to see this bunch of new tests for the XTerm parser! 💯 😊

Copy link
Collaborator

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@willmcgugan willmcgugan merged commit 349b532 into css Jun 17, 2022
@willmcgugan willmcgugan deleted the handle-unknown-sequences branch June 17, 2022 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unknown escape sequences shouldn't cause freeze (e.g. ^[d, ^[f, ^[b)
3 participants