refactor: more explicit terminal sequence parsing #39

Open
wants to merge 4 commits into base: main

Conversation

@sethp sethp (Contributor) commented Apr 28, 2023

Introduces parse_new, which takes advantage of a generalized sequence
parser and slice patterns to (hopefully) make the mapping between a byte
pattern and the corresponding terminal behavior more explicit.

This initial implementation is running in "dark launch" mode: we run
both parse_new and parse_classic (always taking the latter's output as
canonical), and print a warning when the two don't match.
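
A minimal sketch of that wiring, purely for illustration (the closure
arguments stand in for parse_classic / parse_new, whose real signatures live
in the diff; the message mirrors the mismatch warning quoted later in the
thread):

```rust
use std::fmt::Debug;

// Run both parsers, warn when they disagree, and keep the classic
// result as canonical. Kept generic so it doesn't guess at the PR's
// actual return type.
fn dark_launch<T: PartialEq + Debug>(
    input: &str,
    parse_classic: impl Fn(&str) -> T,
    parse_new: impl Fn(&str) -> T,
) -> T {
    let classic = parse_classic(input);
    let new = parse_new(input);
    if new != classic {
        eprintln!(
            "whuh oh! wanted: {:?}\n            got: {:?}\nfor input: {:?}",
            classic, new, input
        );
    }
    classic
}
```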

This has resulted in a number of "bug-compatible" changes to the parser
to match the old behavior precisely, both in an effort to ensure that
we're faithfully reproducing the semantics of the old parser and to give
us the opportunity to directly compare the relative merits of the old &
new approaches.

In terms of completeness, this current implementation has a few
limitations:

  • Most notably, we don't handle the "redraw all" request given
    (currently) by the sequence "\u{1b}[VxD": per the standard, that
    parses as [CSI, .., "V"] followed by the unrelated bytes xD. We
    can either extend the parser to recognize this sequence, or, as I
    would prefer, change the sequence to "fit" within the standard.

  • The sequence enumeration and handling feels pretty good to me, both in
    terms of how the existing sequences are handled and the ease of adding
    new ones (including, as set_text_mode demonstrates, the flexibility to
    integrate external combinatorial parsers if necessary), especially with
    respect to the parameter parsing; a rough sketch of that mapping appears
    after this list. However, the parser itself is a mess: the standard
    proved less helpful in recognizing the set of sequences we've
    encountered "in the wild" than I'd hoped, so I think we could definitely
    do better if we revisit it with fresh eyes.

  • Not all of the error cases are exactly the same when given "weird"
    sequences like "\u{1b}[?m"; they both produce an Err::Failure, but
    marking slightly different portions of the input. I believe this to be
    roughly acceptable (we'd still make progress towards parsing the entire
    input, just while printing out slightly different results for the
    unrecognized sequences).

  • I based the current general parser on the ECMA standard rather than
    ANSI's, despite being in ansi.rs. So that's potential for some
    comedy maybe.
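
As a rough sketch of that slice-pattern mapping (the Op variants here are
simplified stand-ins for the enum that appears further down in the diff; the
shape of the parts array and the sign conventions are assumptions, not the
PR's exact code):

```rust
// Simplified stand-ins for a few of the PR's Op variants.
#[derive(Debug, PartialEq)]
enum Op {
    MoveCursorDelta { dx: isize, dy: isize },
    Scroll { delta: isize },
    RequestCursorPos,
}

// Map an already-split sequence (opener, parameter, final byte) onto an Op.
fn to_op(parts: &[&str]) -> Option<Op> {
    match parts {
        // ESC [ <n> A => move the cursor up n lines
        ["\u{1b}[", n, "A"] => Some(Op::MoveCursorDelta {
            dx: 0,
            dy: -n.parse::<isize>().unwrap_or(1),
        }),
        // ESC [ <n> S => scroll up n lines
        ["\u{1b}[", n, "S"] => Some(Op::Scroll {
            delta: -n.parse::<isize>().unwrap_or(1),
        }),
        // ESC [ 6 n => request the cursor position
        ["\u{1b}[", "6", "n"] => Some(Op::RequestCursorPos),
        _ => None,
    }
}
```

With those assumptions, to_op(&["\u{1b}[", "", "S"]) yields
Some(Scroll { delta: -1 }), which is the shape of the output quoted in the
warning later in the conversation.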

And some work that remains entirely untouched is:

  • Integrating a UTF-8 parser so we can correctly handle multi-byte
    sequences.
  • Collapsing the Text/Op hierarchy so the parser can more
    directly split incoming inputs rather than "leaking" those details
    into parse_str_tail.
  • Critically evaluating the efficiency and throughput of the parser,
    especially with an eye towards reducing the number of times we scan
    over the whole input.
  • Replacing the allocating branches (TextOp, DecPrivate*) with
    iterators.

@sethp sethp requested a review from dougli1sqrd April 28, 2023 15:50
A combination of:

wip: doesn't even build lol
wip: everything compiles now, so it must work
Comment on lines 14 to +16
//! ESC [ <n> S => Scroll up n lines
//! ESC [ <n> T => SCroll down n lines
//! ESC [ 6 n => Request cursor postion, as `ESC [ <r> ; <c> R` at row r and column c
//! ESC [ <n> T => Scroll down n lines
//! ESC [ 6 n => Request cursor position, as `ESC [ <r> ; <c> R` at row r and column c
Contributor Author

Whoops, I definitely lost the scrolling changes in this:

whuh oh! wanted: Ok(("", Scroll { delta: -1 }))
            got: Err(Failure(Error { input: "S", code: Fail }))
for input: "\u{1b}[S"

@dougli1sqrd (Contributor)

is ECMA like different than ANSI? Like, are they just different standards?

@dougli1sqrd dougli1sqrd (Contributor) left a comment

I found myself a little lost at points, but I'm also only looking at this on GitHub. Overall, though, I think it's great that we parse into an array of parts and then we're able to make a nice set of sequence-to-Op results at the end there.

@@ -65,7 +65,10 @@ unroll = "0.1.5"
[dev-dependencies]

[features]
default = ["perf_log"]
default = ["background"]
background = []
Contributor

feature!

pub enum Op {
MoveCursorDelta { dx: isize, dy: isize },
MoveCursorAbs { x: usize, y: usize },
MoveCursorAbsCol { x: usize },
MoveCursorBeginningAndLine { dy: isize },
Scroll { delta: isize },
RequstCursorPos,
Contributor

Recust

@@ -301,9 +293,9 @@ fn scroll_down(input: &str) -> OpResult {

// Request Cursor Position
// ESC [ 6 n
fn request_cursor_postion(input: &str) -> OpResult {
Contributor

i was just trying to spell "positive ion" I guess 😄

q: &'a mut [&'str str; 4],
) -> IResult<&'str str, &'a [&'str str]> {
let (input, start) =
nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}"))
Contributor

it's wild to me that 0x1b or 0x9b is officially a start sequence.

Also why use recognize here instead of just the streaming one_of directly?
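
For comparison (a minimal sketch, assuming nom 7): one_of yields the matched
char, while wrapping it in recognize returns the &str slice it consumed.
Since the surrounding function collects the sequence pieces into the
`[&str; 4]` that q points at, keeping the result as a &str is presumably the
point.

```rust
use nom::IResult;

// Yields the matched character: Ok(("[2J", '\u{1b}')) for "\u{1b}[2J".
fn start_as_char(input: &str) -> IResult<&str, char> {
    nom::character::streaming::one_of("\u{1b}\u{9b}")(input)
}

// Yields the consumed slice of the input: Ok(("[2J", "\u{1b}")) for "\u{1b}[2J".
fn start_as_str(input: &str) -> IResult<&str, &str> {
    nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}"))(input)
}
```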

nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}"))
.parse(input)?;

match context(
Contributor

maybe just next time we chat you can go into how you decided to use context and the "c0" context string etc. I don't disagree or anything, I'm just curious about your thought process

nom::combinator::recognize(nom::character::complete::char('\x40')),
))),
nom::combinator::recognize(nom::character::streaming::satisfy(|ch| {
'\x00' < ch && ch < '\x1f'
Contributor

is all this from the ecma parsing?
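
For context, ECMA-48 places the C0 controls at 0x00–0x1F (ESC is 0x1B) and
the C1 controls at 0x80–0x9F (the 8-bit CSI is 0x9B, which is why both bytes
show up as sequence openers earlier). As predicates, roughly (a sketch, not
the PR's code; note the satisfy condition above is strict on both ends, so it
excludes 0x00 and 0x1F themselves):

```rust
/// C0 control range per ECMA-48.
fn is_c0(ch: char) -> bool {
    ('\u{00}'..='\u{1f}').contains(&ch)
}

/// C1 control range per ECMA-48.
fn is_c1(ch: char) -> bool {
    ('\u{80}'..='\u{9f}').contains(&ch)
}
```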

}

/// kind of like [nom::multi::fill], but for up to N repeats rather than exactly N
// TODO: can this be a parser? we could use .parse_all then
Contributor

this could certainly make a parser
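
A minimal sketch of such a helper (assuming nom 7; this is not the PR's
implementation, and it's written as a plain function rather than a returned
parser, which is exactly what the TODO above is asking about):

```rust
use nom::{IResult, Parser};

/// Like `nom::multi::fill`, but fills at most `buf.len()` slots instead of
/// erroring when fewer repeats are present. Returns the remaining input and
/// how many slots were filled.
fn fill_up_to<I, O, E, F>(mut parser: F, buf: &mut [O], mut input: I) -> IResult<I, usize, E>
where
    I: Clone,
    F: Parser<I, O, E>,
{
    let mut filled = 0;
    for slot in buf.iter_mut() {
        match parser.parse(input.clone()) {
            Ok((rest, value)) => {
                *slot = value;
                input = rest;
                filled += 1;
            }
            // The sub-parser simply didn't match: stop filling, keep what we have.
            Err(nom::Err::Error(_)) => break,
            // Incomplete / Failure propagate unchanged.
            Err(e) => return Err(e),
        }
    }
    Ok((input, filled))
}
```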

}
// We didn't match a non-standard VT100 sequence, nothing to return yet
Err(nom::Err::Error(_)) | Ok((_, None)) => {}
}
Contributor

tbh I'm having trouble sorting through this series of match context().parse(); it's so long

);
r
}
}
Contributor

maybe it can't be helped bc that's just what the formatter does, but I'm finding this match also long and kinda hard to follow

@@ -273,7 +273,36 @@ impl TextField {

fn handle_op(&mut self, op: Op) {
use Op::*;
// println!("{:?}", op);
#[cfg(feature = "op_log")]
Contributor

oh interesting, this makes sense!
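
For context, the cfg attribute compiles the annotated statement only when the
named Cargo feature is enabled, so the debug print costs nothing in a default
build. A standalone sketch of the pattern (illustrative only; the actual
gated statement isn't shown in this hunk):

```rust
#[derive(Debug)]
enum Op {
    Scroll { delta: isize },
}

fn handle_op(op: Op) {
    // Only compiled when built with `--features op_log`.
    #[cfg(feature = "op_log")]
    println!("{:?}", op);

    // ... actual handling would go here ...
    let _ = op;
}

fn main() {
    handle_op(Op::Scroll { delta: -1 });
}
```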
