fix: last number not identified in an isolated value pair
I am using the Lexer to parse JSON chunks that I read from stdin one line at a time.

I appreciate that my use case isn't the intended one for this lib: all the tests deal with fully formed JSON objects, and nowhere does it state that the lib will work with chunks. But since it works perfectly for my use case, I thought others might benefit from this fix; feel free to ignore it otherwise.

This PR fixes a problem where the last number value isn't identified as a number because numbers, unlike strings, have no terminator character. Normally the next token (a comma, a closing brace, etc.) serves as the terminator, but if the number is on the last line of a chunk, the input simply ends and the object's closing brace arrives on the next line, so the number is ignored.

This fix checks whether the lexer was still scanning a number when the input ended and emits the Number token accordingly.
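The terminator issue can be illustrated with a minimal, self-contained sketch (a hypothetical `last_number` helper, not the crate's actual lexer): a number literal is only known to have ended when the byte after it is seen, so at end of input the pending literal must be flushed explicitly.

```rust
// Standalone sketch of the bug and the fix. Without the flush at the
// end, a trailing number like the one in `"v":12` would be dropped.
fn last_number(chunk: &str) -> Option<&str> {
    let mut start = None; // byte index where the current literal began
    for (i, &b) in chunk.as_bytes().iter().enumerate() {
        match (start, b) {
            // a '-' or a digit begins a number literal
            (None, b'-' | b'0'..=b'9') => start = Some(i),
            // any other byte terminates the literal -- the "next token
            // as terminator" behavior described above
            (Some(_), b) if !b.is_ascii_digit() => start = None,
            _ => {}
        }
    }
    // end of input: flush the literal that is still being scanned
    start.map(|s| &chunk[s..])
}

fn main() {
    assert_eq!(last_number(r#""v":12"#), Some("12")); // flushed at end of input
    assert_eq!(last_number(r#""v":12,"#), None);      // terminated by the comma
}
```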

Happy to add any further improvements or additional tests that would be beneficial.
Byron authored Jul 25, 2024
2 parents 3fe2df2 + 2b47444 commit 927c987
Showing 2 changed files with 47 additions and 1 deletion.
12 changes: 11 additions & 1 deletion src/lexer.rs
@@ -374,7 +374,17 @@ where
         } // end for each byte
 
         match t {
-            None => None,
+            None => match (buf, state) {
+                (Some(b), Mode::Number) => Some(Token {
+                    kind: TokenType::Number,
+                    buf: Buffer::MultiByte(b),
+                }),
+                (None, Mode::Number) => Some(Token {
+                    kind: TokenType::Number,
+                    buf: Buffer::Span(Span { first, end: self.cursor }),
+                }),
+                _ => None,
+            },
             Some(t) => {
                 if self.cursor == last_cursor {
                     None
36 changes: 36 additions & 0 deletions tests/lexer.rs
@@ -169,6 +169,42 @@ fn other_backslash_escapes_in_string_value() {
     );
 }
 
+#[test]
+fn isolated_value_pairs() {
+    for &(src, ref kind, first, end, buf) in &[
+        (r#""v":12"#, TokenType::Number, 4, 6, "12"),
+        (r#""v":-12"#, TokenType::Number, 4, 7, "-12"),
+        (r#""v":"12""#, TokenType::String, 4, 8, r#""12""#),
+        (r#""v":true"#, TokenType::BooleanTrue, 4, 8, ""),
+        (r#""v":false"#, TokenType::BooleanFalse, 4, 9, ""),
+        (r#""v":null"#, TokenType::Null, 4, 8, ""),
+    ] {
+        let mut it = Lexer::new(src.bytes(), BufferType::Bytes(0));
+
+        assert_eq!(
+            it.by_ref().skip(2).next(),
+            Some(Token {
+                kind: kind.clone(),
+                buf: match &kind {
+                    TokenType::Number => Buffer::MultiByte(buf.as_bytes().to_vec()),
+                    TokenType::String => Buffer::MultiByte(buf.as_bytes().to_vec()),
+                    _ => Buffer::Span(Span { first, end }),
+                }
+            })
+        );
+
+        let mut it = Lexer::new(src.bytes(), BufferType::Span);
+
+        assert_eq!(
+            it.by_ref().skip(2).next(),
+            Some(Token {
+                kind: kind.clone(),
+                buf: Buffer::Span(Span { first, end }),
+            })
+        );
+    }
+}
+
 #[test]
 fn special_values_closed_and_unclosed() {
     for &(src, ref kind, first, end) in &[
