XTerm parsing improvements #570
darrenburns commented Jun 9, 2022
- Added a unit test suite covering the XTermParser code
- Fixes freeze/hanging issue: When a candidate escape sequence grows in length to exceed a threshold, or when another \x1b is found, it'll backtrack and treat the characters until that point as keypresses (and escape codes that follow the non-matched sequence won't be lost).
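The backtracking behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `parse`, `KNOWN_SEQUENCES` and `MAX_SEQUENCE_LENGTH` are hypothetical names, the table of recognised sequences is a tiny stand-in, and plain strings are yielded where the real parser emits events.

```python
from typing import Iterator, Set

ESC = "\x1b"
MAX_SEQUENCE_LENGTH = 20  # plays the role of _MAX_SEQUENCE_SEARCH_THRESHOLD

# Hypothetical table of recognised sequences; the real parser knows many more.
KNOWN_SEQUENCES: Set[str] = {"\x1b[A", "\x1b[B", "\x1b[1;5C"}


def parse(data: str) -> Iterator[str]:
    """Yield recognised escape sequences whole and everything else per character."""
    position = 0
    while position < len(data):
        character = data[position]
        if character != ESC:
            yield character
            position += 1
            continue

        # Grow a candidate sequence one character at a time.
        sequence = ESC
        end = position + 1
        matched = False
        while end < len(data):
            if data[end] == ESC or len(sequence) > MAX_SEQUENCE_LENGTH:
                break  # give up: another ESC was found, or the candidate is too long
            sequence += data[end]
            end += 1
            if sequence in KNOWN_SEQUENCES:
                matched = True
                break

        if matched:
            yield sequence
        else:
            # Backtrack: reissue the failed candidate as individual key presses.
            yield from sequence
        # Resume at the character that ended the candidate, so a following
        # escape sequence is still parsed rather than lost.
        position = end
```

With this sketch, `list(parse("\x1b[Zab\x1b[A"))` reissues the unrecognised `\x1b[Zab` one character at a time and still recognises the trailing `\x1b[A` as a single sequence.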
# escape sequence, at which length should we give up and consider our search
# to be unsuccessful?
_MAX_SEQUENCE_SEARCH_THRESHOLD = 20

_re_mouse_event = re.compile("^" + re.escape("\x1b[") + r"(<?[\d;]+[mM]|M...)\Z")
There are two regexes for mouse events that run.
This one suggests a mouse event can have any number of parameters, but when we parse the mouse event it assumes there are a fixed number.
Is this one definitely required? Could we possibly get away with running just the 1 regex for mouse events rather than 2?
Quite possibly. IIRC there are at least two mouse codes that can be generated. At one point I thought I would have to support both, but everything supports the newer method. That might explain the dual regexes.
Mostly questions
# When trying to determine whether the current sequence is a supported/valid
# escape sequence, at which length should we give up and consider our search
# to be unsuccessful?
_MAX_SEQUENCE_SEARCH_THRESHOLD = 20
Why was 20 chosen?
The largest that I think we could expect to receive (that I'm aware of) is length 14 (mouse events in large terminal windows). The existence of the re_sgr_mouse regex suggested to me that longer sequences could exist, so I added on some leeway for defensiveness.
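One way to arrive at a length of 14 is sketched below. It assumes SGR-encoded mouse reports with a two-digit button code and three-digit coordinates. `sgr_mouse_sequence` is a hypothetical helper written for this illustration and is not part of the module.

```python
def sgr_mouse_sequence(button: int, x: int, y: int, pressed: bool = True) -> str:
    """Build an SGR-encoded mouse report: ESC [ < button ; x ; y then M or m."""
    return f"\x1b[<{button};{x};{y}{'M' if pressed else 'm'}"


# A click near the corner of an ordinary 80x24 terminal.
typical = sgr_mouse_sequence(0, 80, 24)

# A scroll event (two-digit button code) at three-digit coordinates.
large = sgr_mouse_sequence(64, 999, 999)

typical_length = len(typical)  # 3 prefix characters + 1 + 1 + 2 + 1 + 2 + 1
large_length = len(large)      # 3 prefix characters + 2 + 1 + 3 + 1 + 3 + 1
```

Under these assumptions the longer report is 14 characters, which leaves 6 characters of headroom below the threshold of 20.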
@@ -34,7 +39,7 @@ def __init__(

         super().__init__()

-    def debug_log(self, *args: Any) -> None:
+    def debug_log(self, *args: Any) -> None:  # pragma: no cover
What was the typing error?
There was no typing error. I was trying to ensure we had full test coverage of this module, and thought this code was temporary "for us" code so didn't think coverage.py should measure it. I can remove this and add a test case around it if you'd prefer.
src/textual/_xterm_parser.py
Outdated
# If we run into another ESC at this point, then we've failed
# to find a match, and should issue everything we've seen within
# the suspected sequence as Key events instead.
buffer = yield self.peek_buffer()
This doesn't look right. peek_buffer may return an empty string if there is nothing in the buffer, but it could still read an ESC on L155.
I'll think about how I can get a test to cover this scenario and fix it on Tuesday.
src/textual/_xterm_parser.py
Outdated
    or len(sequence) > _MAX_SEQUENCE_SEARCH_THRESHOLD
):
    for character in sequence:
        keys = get_key_ansi_sequence(character, None)
I suspect that if we have an ESC to be sent from an unrecognised sequence, we should send a ^ since otherwise it would be seen as the user hitting the literal escape key.
Yeah, you might be right. I was thinking that we might not know if it was literally the user pressing an escape key followed by some data that doesn't match an ANSI sequence.
Have made it such that when we perform the backtracking, \x1b gets converted to ^ rather than escape.
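A minimal sketch of that conversion, assuming each reissued character becomes one key name. `reissue_as_keys` is a hypothetical helper for illustration only; the code in the PR goes through get_key_ansi_sequence and emits Key events rather than returning strings.

```python
from typing import List

ESC = "\x1b"


def reissue_as_keys(sequence: str) -> List[str]:
    """Return one key name per character, replacing ESC with "^".

    This keeps a failed escape sequence from being reported as the user
    pressing the literal escape key.
    """
    return ["^" if character == ESC else character for character in sequence]
```

For example, `reissue_as_keys("\x1b[Z")` gives `["^", "[", "Z"]`, so no escape key press is reported for the unrecognised sequence.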
src/textual/_xterm_parser.py
Outdated
    and buffer[0] == ESC
    or len(sequence) > _MAX_SEQUENCE_SEARCH_THRESHOLD
):
    for character in sequence:
There's some very similar code further down, I wonder if it could be hoisted in to a closure?
I considered this, but will do it now that I know I'm not the only one with that thought :)
Oh, you said closure...
I've factored it out into a separate method
The XTermParser makes the assumption that escape sequences will always be delivered fully, and not split into chunks. Right now, if you split a sequence into chunks such that the final character of a chunk is [...]
The problem is at the point we read this [...]
@willmcgugan I'd like to see if we can come up with a solution for this, but I don't want to block this PR on it as it's getting way beyond the original scope.
Really nice to see this bunch of new tests for the XTerm parser! 💯 😊
LGTM