XTerm parsing improvements #570

darrenburns · 2022-06-09T16:37:52Z

Added a unit test suite covering the XTermParser code.
Fixes the freeze/hang issue: when a candidate escape sequence grows beyond a length threshold, or when another \x1b is encountered, the parser backtracks and treats the characters seen up to that point as key presses. Escape codes that follow the unmatched sequence are not lost.
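For readers skimming the description, here is a minimal sketch of the backtracking idea. It is not the code from this PR: `tokenize` and `KNOWN_SEQUENCES` are hypothetical names, the lookup table is a two-entry stand-in, and unmatched characters are re-issued unchanged.

```python
from typing import Iterator

ESC = "\x1b"
MAX_SEQUENCE_LENGTH = 20  # mirrors _MAX_SEQUENCE_SEARCH_THRESHOLD in the diff

# Hypothetical lookup table; a real parser would know many more sequences.
KNOWN_SEQUENCES = {"\x1b[A": "up", "\x1b[B": "down"}


def tokenize(data: str) -> Iterator[str]:
    """Yield key names, backtracking when an escape sequence cannot be matched."""
    position = 0
    while position < len(data):
        character = data[position]
        position += 1
        if character != ESC:
            yield character
            continue
        sequence = character
        while True:
            out_of_data = position >= len(data)
            next_is_esc = not out_of_data and data[position] == ESC
            if out_of_data or next_is_esc or len(sequence) > MAX_SEQUENCE_LENGTH:
                # Give up: re-issue the candidate one character at a time.
                # Anything after it is still parsed by the outer loop.
                yield from sequence
                break
            sequence += data[position]
            position += 1
            if sequence in KNOWN_SEQUENCES:
                yield KNOWN_SEQUENCES[sequence]
                break
```

In this sketch, `list(tokenize("\x1bd\x1b[Ax"))` re-issues the unmatched `\x1b` and `d` as individual keys and still recognises the `\x1b[A` that follows. Running out of input is treated the same as a failed match here, which is a simplification.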

src/textual/_xterm_parser.py

darrenburns · 2022-06-10T08:55:02Z

src/textual/_xterm_parser.py

+# escape sequence, at which length should we give up and consider our search
+# to be unsuccessful?
+_MAX_SEQUENCE_SEARCH_THRESHOLD = 20
+
 _re_mouse_event = re.compile("^" + re.escape("\x1b[") + r"(<?[\d;]+[mM]|M...)\Z")


Two regexes for mouse events are run.

This one suggests a mouse event can have any number of parameters, but the code that parses the mouse event assumes a fixed number.

Is this one definitely required? Could we get away with running a single regex for mouse events rather than two?

willmcgugan

Mostly questions

willmcgugan · 2022-06-10T20:22:02Z

src/textual/_xterm_parser.py

+# When trying to determine whether the current sequence is a supported/valid
+# escape sequence, at which length should we give up and consider our search
+# to be unsuccessful?
+_MAX_SEQUENCE_SEARCH_THRESHOLD = 20


Why was 20 chosen?

The longest sequence I'm aware of that we could expect to receive is 14 characters (mouse events in large terminal windows). The existence of the re_sgr_mouse regex suggested to me that longer sequences could exist, so I added some leeway to be defensive.
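The arithmetic behind that estimate can be checked with a small sketch. It assumes SGR-style mouse reports of the form `ESC [ < button ; column ; row M`; the helper name `sgr_mouse_sequence` and the sample coordinates are illustrative, not taken from the PR.

```python
ESC = "\x1b"
MAX_SEQUENCE_LENGTH = 20  # value of _MAX_SEQUENCE_SEARCH_THRESHOLD in the diff


def sgr_mouse_sequence(button: int, column: int, row: int, pressed: bool = True) -> str:
    """Build an SGR-style mouse report: ESC [ < button ; column ; row M|m."""
    return f"{ESC}[<{button};{column};{row}{'M' if pressed else 'm'}"


# A click near the top-left corner of the terminal.
small = sgr_mouse_sequence(0, 10, 5)

# A two-digit button code with three-digit coordinates (a very large window).
large = sgr_mouse_sequence(35, 999, 999)

print(len(small), len(large))  # prints "10 14"
```

Under these assumptions, a report with three-digit coordinates is 14 characters long, which leaves six characters of headroom below the threshold of 20.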

willmcgugan · 2022-06-10T20:28:25Z

src/textual/_xterm_parser.py

+# escape sequence, at which length should we give up and consider our search
+# to be unsuccessful?
+_MAX_SEQUENCE_SEARCH_THRESHOLD = 20
+
 _re_mouse_event = re.compile("^" + re.escape("\x1b[") + r"(<?[\d;]+[mM]|M...)\Z")


Quite possibly. IIRC there are at least two mouse codes that can be generated. At one point I thought I would have to support both, but everything supports the newer method. That might explain the dual regexes.

willmcgugan · 2022-06-10T20:28:41Z

src/textual/_xterm_parser.py

@@ -34,7 +39,7 @@ def __init__(

        super().__init__()

-    def debug_log(self, *args: Any) -> None:
+    def debug_log(self, *args: Any) -> None:  # pragma: no cover


What was the typing error?

There was no typing error. I was trying to ensure we had full test coverage of this module, and thought this code was temporary "for us" code so didn't think coverage.py should measure it. I can remove this and add a test case around it if you'd prefer.

willmcgugan · 2022-06-10T20:38:13Z

src/textual/_xterm_parser.py

+                    # If we run into another ESC at this point, then we've failed
+                    # to find a match, and should issue everything we've seen within
+                    # the suspected sequence as Key events instead.
+                    buffer = yield self.peek_buffer()


This doesn't look right. peek_buffer may return an empty string if there is nothing in the buffer, but it could still read an ESC on L155.

I'll think about how I can get a test to cover this scenario and fix it on Tuesday.

willmcgugan · 2022-06-10T20:40:31Z

src/textual/_xterm_parser.py

+                        or len(sequence) > _MAX_SEQUENCE_SEARCH_THRESHOLD
+                    ):
+                        for character in sequence:
+                            keys = get_key_ansi_sequence(character, None)


I suspect that if we have an ESC to send from an unrecognised sequence, we should send a ^, since otherwise it would be seen as the user hitting the literal escape key.

Yeah, you might be right. I was thinking that we might not know if it was literally the user pressing an escape key followed by some data that doesn't match an ANSI sequence.

I've made it so that, when we backtrack, \x1b is converted to ^ rather than escape.
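A minimal sketch of that translation, assuming the unmatched sequence is re-issued one character at a time. The helper name `reissue_as_keys` is hypothetical and the real method in the PR may look different.

```python
from typing import Iterable, List

ESC = "\x1b"


def reissue_as_keys(sequence: Iterable[str]) -> List[str]:
    """Turn an unmatched escape sequence into individual key names.

    ESC is reported as "^" so it is not mistaken for the user pressing
    the literal escape key.
    """
    return ["^" if character == ESC else character for character in sequence]
```

For example, `reissue_as_keys("\x1b[9")` gives `["^", "[", "9"]`.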

willmcgugan · 2022-06-10T20:43:06Z

src/textual/_xterm_parser.py

+                        and buffer[0] == ESC
+                        or len(sequence) > _MAX_SEQUENCE_SEARCH_THRESHOLD
+                    ):
+                        for character in sequence:


There's some very similar code further down. I wonder if it could be hoisted into a closure?

I considered this, but will do it now that I know I'm not the only one with that thought :)

Oh, you said closure...

I've factored it out into a separate method

darrenburns · 2022-06-14T14:19:46Z

The XTermParser makes the assumption that escape sequences will always be delivered fully, and not split into chunks.

Right now, if you split a sequence into chunks such that the final character of a chunk is \x1b, then peek_buffer returns an empty string and the \x1b is treated as an Esc key press. However, this \x1b could be the beginning of an escape sequence which is to be delivered in the next chunk.

The problem is that, at the point we read this \x1b at the end of the chunk, we don't know whether another chunk will follow, so we don't know what to do with the \x1b. The only way I can see us supporting this is with some kind of timeout that kicks in when \x1b is received at the end of a chunk.

@willmcgugan I'd like to see if we can come up with a solution for this, but I don't want to block this PR on it as it's getting way beyond the original scope.
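One possible shape for the timeout idea, offered only as a sketch and not something this PR implements. The class `PendingEscape`, its methods, and the 50 ms value are all assumptions made for illustration; timestamps are passed in explicitly so the behaviour is easy to test.

```python
from typing import Optional

ESC = "\x1b"
ESCAPE_TIMEOUT = 0.05  # seconds; an arbitrary illustrative value


class PendingEscape:
    """Hold back a chunk-final ESC until more data arrives or a timeout expires."""

    def __init__(self) -> None:
        self._held_since: Optional[float] = None

    def feed(self, chunk: str, now: float) -> str:
        """Return the text that is safe to parse after receiving `chunk`."""
        if self._held_since is not None:
            # The ESC held from the previous chunk starts this one.
            chunk = ESC + chunk
            self._held_since = None
        if chunk.endswith(ESC):
            # Could be a lone Esc key or the start of a split sequence.
            self._held_since = now
            chunk = chunk[:-1]
        return chunk

    def tick(self, now: float) -> str:
        """Release the held ESC as a lone key press once the timeout passes."""
        if self._held_since is not None and now - self._held_since >= ESCAPE_TIMEOUT:
            self._held_since = None
            return ESC
        return ""
```

A caller would pass the result of `feed` to the parser and call `tick` periodically. If the next chunk arrives before the timeout, the held \x1b is rejoined with it; otherwise it is released on its own and can be treated as an Esc key press.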

olivierphi

Really nice to see this bunch of new tests for the XTerm parser! 💯 😊

willmcgugan

LGTM

darrenburns added 3 commits June 9, 2022 16:27

Backtracking unknown escape sequences, various tests for XTermParser

bfb962b

Add various additional tests around XTermParser

763c0d0

Add test around non-escape code input mapping to keys

0125fbd

darrenburns commented Jun 9, 2022

Formatting, tidying up, add extra mouse event parsing test

30b6a0b

darrenburns commented Jun 10, 2022

Some comment improvements and tidying up

2f2d064

darrenburns marked this pull request as ready for review June 10, 2022 08:59

darrenburns requested a review from willmcgugan June 10, 2022 09:00

willmcgugan reviewed Jun 10, 2022

darrenburns added 7 commits June 11, 2022 13:38

Translate "escape" to "^" when XTermParser has to backtrack

1510739

Small tidy up

13925b9

Ensure we read buffer correctly in XTermParser

1b8781f

XTermHandler refactor

23855f1

Variable rename in XTermParser

f31a9d4

Variable rename in XTermParser

101558b

Add tests for XTermParser chunking

e1c8598

darrenburns requested review from willmcgugan and olivierphi June 16, 2022 12:07

darrenburns linked an issue Jun 16, 2022 that may be closed by this pull request

Unknown escape sequences shouldn't cause freeze (e.g. ^[d, ^[f, ^[b) #541

Closed

olivierphi approved these changes Jun 16, 2022

willmcgugan approved these changes Jun 17, 2022

willmcgugan merged commit 349b532 into css Jun 17, 2022

willmcgugan deleted the handle-unknown-sequences branch June 17, 2022 13:23
