refactor: more explicit terminal sequence parsing #39

Open
wants to merge 4 commits into base: main

Conversation

@sethp sethp (Contributor) commented Apr 28, 2023

Introduces parse_new, which takes advantage of a generalized sequence
parser and slice patterns to (hopefully) make the mapping between a byte
pattern and the corresponding terminal behavior more explicit.

This initial implementation is running in "dark launch" mode: we run
both parse_new and parse_classic (always taking the latter's output as
canonical), and print a warning when the two don't match.
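
A minimal sketch of that wiring, purely for illustration (the closure
arguments stand in for parse_classic / parse_new, whose real signatures live
in the diff; the message mirrors the mismatch warning quoted later in the
thread):

```rust
use std::fmt::Debug;

// Run both parsers, warn when they disagree, and keep the classic
// result as canonical. Kept generic so it doesn't guess at the PR's
// actual return type.
fn dark_launch<T: PartialEq + Debug>(
    input: &str,
    parse_classic: impl Fn(&str) -> T,
    parse_new: impl Fn(&str) -> T,
) -> T {
    let classic = parse_classic(input);
    let new = parse_new(input);
    if new != classic {
        eprintln!(
            "whuh oh! wanted: {:?}\n            got: {:?}\nfor input: {:?}",
            classic, new, input
        );
    }
    classic
}
```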

This has resulted in a number of "bug-compatible" changes to the parser
to match the old behavior precisely, both in an effort to ensure that
we're faithfully reproducing the semantics of the old parser and to give
us the opportunity to directly compare the relative merits of the old &
new approaches.

In terms of completeness, this current implementation has a few
limitations:

  • Most notably, we don't handle the "redraw all" request given
    (currently) by the sequence "\u{1b}[VxD": per the standard, that
    parses as [CSI, .., "V"] followed by the unrelated bytes xD. We
    can either extend the parser to recognize this sequence, or, as I
    would prefer, change the sequence to "fit" within the standard.

  • The sequence enumeration and handling feels pretty good to me, both in
    terms of how the existing sequences are handled and the ease of adding
    new ones (including, as set_text_mode demonstrates, the flexibility to
    integrate external combinatorial parsers if necessary), especially with
    respect to the parameter parsing; a rough sketch of that mapping appears
    after this list. However, the parser itself is a mess: the standard
    proved less helpful in recognizing the set of sequences we've
    encountered "in the wild" than I'd hoped, so I think we could definitely
    do better if we revisit it with fresh eyes.

  • Not all of the error cases are exactly the same when given "weird"
    sequences like "\u{1b}[?m"; they both produce an Err::Failure, but
    marking slightly different portions of the input. I believe this to be
    roughly acceptable (we'd still make progress towards parsing the entire
    input, just while printing out slightly different results for the
    unrecognized sequences).

  • I based the current general parser on the ECMA standard rather than
    ANSI's, despite being in ansi.rs. So that's potential for some
    comedy maybe.
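
As a rough sketch of that slice-pattern mapping (the Op variants here are
simplified stand-ins for the enum that appears further down in the diff; the
shape of the parts array and the sign conventions are assumptions, not the
PR's exact code):

```rust
// Simplified stand-ins for a few of the PR's Op variants.
#[derive(Debug, PartialEq)]
enum Op {
    MoveCursorDelta { dx: isize, dy: isize },
    Scroll { delta: isize },
    RequestCursorPos,
}

// Map an already-split sequence (opener, parameter, final byte) onto an Op.
fn to_op(parts: &[&str]) -> Option<Op> {
    match parts {
        // ESC [ <n> A => move the cursor up n lines
        ["\u{1b}[", n, "A"] => Some(Op::MoveCursorDelta {
            dx: 0,
            dy: -n.parse::<isize>().unwrap_or(1),
        }),
        // ESC [ <n> S => scroll up n lines
        ["\u{1b}[", n, "S"] => Some(Op::Scroll {
            delta: -n.parse::<isize>().unwrap_or(1),
        }),
        // ESC [ 6 n => request the cursor position
        ["\u{1b}[", "6", "n"] => Some(Op::RequestCursorPos),
        _ => None,
    }
}
```

With those assumptions, to_op(&["\u{1b}[", "", "S"]) yields
Some(Scroll { delta: -1 }), which is the shape of the output quoted in the
warning later in the conversation.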

And some work that remains entirely untouched is:

  • Integrating a UTF-8 parser so we can correctly handle multi-byte
    sequences.
  • Collapsing the Text/Op hierarchy so the parser can more
    directly split incoming inputs rather than "leaking" those details
    into parse_str_tail.
  • Critically evaluating the efficiency and throughput of the parser,
    especially with an eye towards reducing the number of times we scan
    over the whole input.
  • Replacing the allocating branches (TextOp, DecPrivate*) with
    iterators.

@sethp sethp requested a review from dougli1sqrd April 28, 2023 15:50
A combination of:

wip: doesn't even build lol
wip: everything compiles now, so it must work
Comment on lines 14 to +16
//! ESC [ <n> S => Scroll up n lines
//! ESC [ <n> T => SCroll down n lines
//! ESC [ 6 n => Request cursor postion, as `ESC [ <r> ; <c> R` at row r and column c
//! ESC [ <n> T => Scroll down n lines
//! ESC [ 6 n => Request cursor position, as `ESC [ <r> ; <c> R` at row r and column c
Contributor Author

Whoops, I definitely lost the scrolling changes in this:

whuh oh! wanted: Ok(("", Scroll { delta: -1 }))
            got: Err(Failure(Error { input: "S", code: Fail }))
for input: "\u{1b}[S"

@dougli1sqrd (Contributor)

is ECMA like different than ANSI? Like, are they just different standards?

@dougli1sqrd dougli1sqrd (Contributor) left a comment

I found myself a little lost at points, but I'm also only looking at this on GitHub. Overall, though, I think it's great that we parse into an array of parts and then we're able to make a nice set of sequence-to-Op results at the end there.

@@ -65,7 +65,10 @@ unroll = "0.1.5"
[dev-dependencies]

[features]
default = ["perf_log"]
default = ["background"]
background = []
Contributor

feature!

pub enum Op {
MoveCursorDelta { dx: isize, dy: isize },
MoveCursorAbs { x: usize, y: usize },
MoveCursorAbsCol { x: usize },
MoveCursorBeginningAndLine { dy: isize },
Scroll { delta: isize },
RequstCursorPos,
Contributor

Recust

@@ -301,9 +293,9 @@ fn scroll_down(input: &str) -> OpResult {

// Request Cursor Position
// ESC [ 6 n
fn request_cursor_postion(input: &str) -> OpResult {
Contributor

i was just trying to spell "positive ion" I guess 😄

q: &'a mut [&'str str; 4],
) -> IResult<&'str str, &'a [&'str str]> {
let (input, start) =
nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}"))
Contributor

it's wild to me that 0x1b or 0x9b is officially a start sequence.

Also why use recognize here instead of just the streaming one_of directly?
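
For comparison (a minimal sketch, assuming nom 7): one_of yields the matched
char, while wrapping it in recognize returns the &str slice it consumed.
Since the surrounding function collects the sequence pieces into the
`[&str; 4]` that q points at, keeping the result as a &str is presumably the
point.

```rust
use nom::IResult;

// Yields the matched character: Ok(("[2J", '\u{1b}')) for "\u{1b}[2J".
fn start_as_char(input: &str) -> IResult<&str, char> {
    nom::character::streaming::one_of("\u{1b}\u{9b}")(input)
}

// Yields the consumed slice of the input: Ok(("[2J", "\u{1b}")) for "\u{1b}[2J".
fn start_as_str(input: &str) -> IResult<&str, &str> {
    nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}"))(input)
}
```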

nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}"))
.parse(input)?;

match context(
Contributor

maybe just next time we chat you can go into how you decided to use context and the "c0" context string etc. I don't disagree or anything, I'm just curious about your thought process

nom::combinator::recognize(nom::character::complete::char('\x40')),
))),
nom::combinator::recognize(nom::character::streaming::satisfy(|ch| {
'\x00' < ch && ch < '\x1f'
Contributor

is all this from the ecma parsing?
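
For context, ECMA-48 places the C0 controls at 0x00–0x1F (ESC is 0x1B) and
the C1 controls at 0x80–0x9F (the 8-bit CSI is 0x9B, which is why both bytes
show up as sequence openers earlier). As predicates, roughly (a sketch, not
the PR's code; note the satisfy condition above is strict on both ends, so it
excludes 0x00 and 0x1F themselves):

```rust
/// C0 control range per ECMA-48.
fn is_c0(ch: char) -> bool {
    ('\u{00}'..='\u{1f}').contains(&ch)
}

/// C1 control range per ECMA-48.
fn is_c1(ch: char) -> bool {
    ('\u{80}'..='\u{9f}').contains(&ch)
}
```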

}

/// kind of like [nom::multi::fill], but for up to N repeats rather than exactly N
// TODO: can this be a parser? we could use .parse_all then
Contributor

this could certainly make a parser
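
A minimal sketch of such a helper (assuming nom 7; this is not the PR's
implementation, and it's written as a plain function rather than a returned
parser, which is exactly what the TODO above is asking about):

```rust
use nom::{IResult, Parser};

/// Like `nom::multi::fill`, but fills at most `buf.len()` slots instead of
/// erroring when fewer repeats are present. Returns the remaining input and
/// how many slots were filled.
fn fill_up_to<I, O, E, F>(mut parser: F, buf: &mut [O], mut input: I) -> IResult<I, usize, E>
where
    I: Clone,
    F: Parser<I, O, E>,
{
    let mut filled = 0;
    for slot in buf.iter_mut() {
        match parser.parse(input.clone()) {
            Ok((rest, value)) => {
                *slot = value;
                input = rest;
                filled += 1;
            }
            // The sub-parser simply didn't match: stop filling, keep what we have.
            Err(nom::Err::Error(_)) => break,
            // Incomplete / Failure propagate unchanged.
            Err(e) => return Err(e),
        }
    }
    Ok((input, filled))
}
```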

}
// We didn't match a non-standard VT100 sequence, nothing to return yet
Err(nom::Err::Error(_)) | Ok((_, None)) => {}
}
Contributor

tbh I'm having trouble sorting through this series of match context().parse(); it's so long

);
r
}
}
Contributor

maybe it can't be helped bc that's just what the formatter does, but I'm finding this match also long and kinda hard to follow

@@ -273,7 +273,36 @@ impl TextField {

fn handle_op(&mut self, op: Op) {
use Op::*;
// println!("{:?}", op);
#[cfg(feature = "op_log")]
Contributor

oh interesting, this makes sense!
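
For context, the cfg attribute compiles the annotated statement only when the
named Cargo feature is enabled, so the debug print costs nothing in a default
build. A standalone sketch of the pattern (illustrative only; the actual
gated statement isn't shown in this hunk):

```rust
#[derive(Debug)]
enum Op {
    Scroll { delta: isize },
}

fn handle_op(op: Op) {
    // Only compiled when built with `--features op_log`.
    #[cfg(feature = "op_log")]
    println!("{:?}", op);

    // ... actual handling would go here ...
    let _ = op;
}

fn main() {
    handle_op(Op::Scroll { delta: -1 });
}
```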
