-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: more explicit terminal sequence parsing #39
base: main
Are you sure you want to change the base?
Conversation
A combination of: wip: doesn't even build lol wip: everything compiles now, so it must work
2ef3e63
to
5f4a4ca
Compare
Introduces parse_new, which takes advantage of a generalized sequence parser and slice patterns to (hopefully) make the mapping between a byte pattern and the corresponding terminal behavior more explicit. This initial implementation is running in "dark launch" mode: we run both parse_new and parse_classic (always taking the latter's output as canonical), and print a warning when the two don't match. This has resulted in a number of "bug-compatible" changes to the parser to match the old behavior precisely, both in an effort to ensure that we're faithfully reproducing the semantics of the old parser and to give us the opportunity to directly compare the relative merits of the old & new approaches. In terms of completeness, this current implementation has a few limitations: * Most notably, we don't handle the "redraw all" request given (currently) by the sequence "\u{1b}[VxD": per the standard, that parses as `[CSI, .., "V"]` followed by the unrelated bytes `xD`. We can either extend the parser to recognize this sequence, or, as I would prefer, change the sequence to "fit" within the standard. * The sequence enumeration and handling feels pretty good to me, both in terms of how the existing sequences are handled and ease of adding new ones (including, as `set_text_mode` demonstrates, the flexibility to integrate external combinatorial parsers if necessary), especially with respect to the parameter parsing. However, the parser itself is a mess: the standard proved less helpful in recognizing the set of sequences we've encountered in the wild" than I'd hoped, so I think we could definitely do better if we revisit it with fresh eyes. * Not all of the error cases are exactly the same when given "weird" sequences like "\u{1b}[?m"; they both produce an Err::Failure, but marking slightly different portions of the input. I believe this to be roughly acceptable (we'd still make progress towards parsing the entire input, just while printing out slightly different results for the unrecognized sequences). * I based the current general parser on the ECMA standard rather than ANSI's, despite being in `ansi.rs`. So that's potential for some comedy maybe. And some work that remains entirely untouched is: * Integrating a utf-8 parser so we can correctly handle multi-byte sequences. * Collapsing the Text/Op hierarchy so the parser can more directly split incoming inputs rather than "leaking" those details into parse_str_tail. * Critically evaluating the efficiency and throughpout of the parser, especially with an eye towards reducing the number of times we scan over the whole input. * Replacing the allocating branches (TextOp, DecPrivate*) with iterators.
5f4a4ca
to
a5d231f
Compare
//! ESC [ <n> S => Scroll up n lines | ||
//! ESC [ <n> T => SCroll down n lines | ||
//! ESC [ 6 n => Request cursor postion, as `ESC [ <r> ; <c> R` at row r and column c | ||
//! ESC [ <n> T => Scroll down n lines | ||
//! ESC [ 6 n => Request cursor position, as `ESC [ <r> ; <c> R` at row r and column c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I definitely lost the scrolling changes in this:
whuh oh! wanted: Ok(("", Scroll { delta: -1 }))
got: Err(Failure(Error { input: "S", code: Fail }))
for input: "\u{1b}[S"
is ECMA like Different than ANSI? Like are they just different standards? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found myself a little lost at points, but I'm also only looking at this on github. But overall I think it's great that we parse into an array of parts and then we're able to make a nice set of sequence to Op results at the end there.
@@ -65,7 +65,10 @@ unroll = "0.1.5" | |||
[dev-dependencies] | |||
|
|||
[features] | |||
default = ["perf_log"] | |||
default = ["background"] | |||
background = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feature!
pub enum Op { | ||
MoveCursorDelta { dx: isize, dy: isize }, | ||
MoveCursorAbs { x: usize, y: usize }, | ||
MoveCursorAbsCol { x: usize }, | ||
MoveCursorBeginningAndLine { dy: isize }, | ||
Scroll { delta: isize }, | ||
RequstCursorPos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recust
@@ -301,9 +293,9 @@ fn scroll_down(input: &str) -> OpResult { | |||
|
|||
// Request Cursor Position | |||
// ESC [ 6 n | |||
fn request_cursor_postion(input: &str) -> OpResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was just trying to spell "positive ion" I guess 😄
q: &'a mut [&'str str; 4], | ||
) -> IResult<&'str str, &'a [&'str str]> { | ||
let (input, start) = | ||
nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's wild to me that 0x1b or 1x9b is officially a start sequence.
Also why use recognize
here instead of just the streaming one_of
directly?
nom::combinator::recognize(nom::character::streaming::one_of("\u{1b}\u{9b}")) | ||
.parse(input)?; | ||
|
||
match context( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just next time we chat you can go into how you decided to use context
and the "c0" context string etc. I don't disagree or anything, I'm just curious about your thought process
nom::combinator::recognize(nom::character::complete::char('\x40')), | ||
))), | ||
nom::combinator::recognize(nom::character::streaming::satisfy(|ch| { | ||
'\x00' < ch && ch < '\x1f' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is all this from the ecma parsing?
} | ||
|
||
/// kind of like [nom::multi::fill], but for up to N repeats rather than exactly N | ||
// TODO: can this be a parser? we could use .parse_all then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could certainly make a parser
} | ||
// We didn't match a non-standard VT100 sequence, nothing to return yet | ||
Err(nom::Err::Error(_)) | Ok((_, None)) => {} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh I'm having trouble sorting through this series of match context().parse()
it's so lomg
); | ||
r | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it can't be helped bc that's just what the formatter does, but I'm finding this match also lomg and kinda hard to follow
@@ -273,7 +273,36 @@ impl TextField { | |||
|
|||
fn handle_op(&mut self, op: Op) { | |||
use Op::*; | |||
// println!("{:?}", op); | |||
#[cfg(feature = "op_log")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh interesting, this make sense!
Introduces parse_new, which takes advantage of a generalized sequence
parser and slice patterns to (hopefully) make the mapping between a byte
pattern and the corresponding terminal behavior more explicit.
This initial implementation is running in "dark launch" mode: we run
both parse_new and parse_classic (always taking the latter's output as
canonical), and print a warning when the two don't match.
This has resulted in a number of "bug-compatible" changes to the parser
to match the old behavior precisely, both in an effort to ensure that
we're faithfully reproducing the semantics of the old parser and to give
us the opportunity to directly compare the relative merits of the old &
new approaches.
In terms of completeness, this current implementation has a few
limitations:
Most notably, we don't handle the "redraw all" request given
(currently) by the sequence "\u{1b}[VxD": per the standard, that
parses as
[CSI, .., "V"]
followed by the unrelated bytesxD
. Wecan either extend the parser to recognize this sequence, or, as I
would prefer, change the sequence to "fit" within the standard.
The sequence enumeration and handling feels pretty good to me, both in
terms of how the existing sequences are handled and ease of adding new
ones (including, as
set_text_mode
demonstrates, the flexibility tointegrate external combinatorial parsers if necessary), especially
with respect to the parameter parsing. However, the parser itself is a
mess: the standard proved less helpful in recognizing the set of
sequences we've encountered in the wild" than I'd hoped, so I think we
could definitely do better if we revisit it with fresh eyes.
Not all of the error cases are exactly the same when given "weird"
sequences like "\u{1b}[?m"; they both produce an Err::Failure, but
marking slightly different portions of the input. I believe this to be
roughly acceptable (we'd still make progress towards parsing the entire
input, just while printing out slightly different results for the
unrecognized sequences).
I based the current general parser on the ECMA standard rather than
ANSI's, despite being in
ansi.rs
. So that's potential for somecomedy maybe.
And some work that remains entirely untouched is:
sequences.
directly split incoming inputs rather than "leaking" those details
into parse_str_tail.
especially with an eye towards reducing the number of times we scan
over the whole input.
iterators.