
Make lexer produce Result<Token, Error> instead of Token. #273

Merged
maciejhirsz merged 3 commits into maciejhirsz:master on Feb 26, 2023

Conversation

agluszak
Contributor

Remove the requirement of designating an error token. Remove the #[error] attribute.
Using Result instead allows passing the error details downstream.

Closes #158. Closes #104.

@agluszak
Contributor Author

Hi! I implemented this change for myself, but I think it might be useful to upstream it, because the previous discussion/implementation effort seems to have stalled.

@Skarlett

Perhaps this would be better suited as an additional function on Lexer<T>.

Maybe Lexer<T>::try_lexer(&str), which returns an iterator over Result types.

Rewriting the lexer(&str) function and upstreaming that change would break the build of anyone who relies on the library's core behavior.

@agluszak
Contributor Author

maybe Lexer::try_lexer(&str) which returns an iterator that returns result types.

Hm, I think that would require introducing a second trait (TryLogos?) and changing the derive macro to require either an #[error] annotation on some enum variant OR #[logos(error = ErrorType)].

If you want to keep the previous behavior you could map the Result<Token, ErrorType> iterator back to Token by replacing any Err with some token.
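To illustrate the point above, restoring the old token-stream shape could look like this (a sketch only: the `Token` enum is a hand-written stand-in for a logos-derived one, and `()` stands in for whatever error type the lexer uses):

```rust
// Sketch only: a hand-written stand-in for a logos-derived token enum.
#[derive(Debug, PartialEq)]
enum Token {
    Ident,
    Number,
    Error, // plays the role of the previously designated error token
}

fn main() {
    // Pretend this iterator came from Token::lexer(source) under the new API.
    let results = vec![Ok(Token::Ident), Err(()), Ok(Token::Number)];

    // Map the Result<Token, ErrorType> stream back to plain tokens,
    // replacing any Err with the chosen error token.
    let tokens: Vec<Token> = results
        .into_iter()
        .map(|r| r.unwrap_or(Token::Error))
        .collect();

    assert_eq!(tokens, vec![Token::Ident, Token::Error, Token::Number]);
}
```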

rewriting of the lexer(&str) function and adding to the upstream, would break anyone's build who relies on the library's core behavior.

Sure, that's a breaking change - it would have to be included in 0.13, not 0.12.2. And there will have to be a migration guide.

@Skarlett

Skarlett commented Dec 25, 2022

You know, there's an alternative for this behavior without this PR:

T::lexer(&src)
    .map(|t| match t {
        T::Error => Err(ctor_error()),
        tok => Ok(tok),
    });

I find the changes added to be unergonomic, and badly composed. Request for close.

@agluszak
Contributor Author

You know, an alternative for this behavior without this PR.

T::lexer(&src)
    .map(|t| match t {
        T::Error => Err(ctor_error()),
        tok => Ok(tok),
    });

This is not at all an alternative. Once all information about the error has been discarded, it is impossible to recover it. There is no information associated with T::Error. See the linked issues (#158, #104).

@maciejhirsz
Owner

I'll do a proper review in a bit, really good job with this one. I was tempted to do an implementation that's backwards compatible somehow, but I also reckon that the "There should be one-- and preferably only one --obvious way to do it" rule should apply here.

The only thing I'd change, aside from fixing conflicts (but that's my fault), is the default error type: instead of () it should be some concrete zero-sized struct, so that the Error trait can be implemented on it.
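Such a zero-sized error type might look like the following sketch (the name `LexingError` is an assumption here, not necessarily what logos will ship):

```rust
use std::{error::Error, fmt};

// Sketch of a concrete zero-sized default error type; the name
// `LexingError` is hypothetical.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct LexingError;

impl fmt::Display for LexingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "lexing error")
    }
}

// Unlike `()`, a local struct can implement std::error::Error,
// so it composes with Box<dyn Error>, anyhow, etc.
impl Error for LexingError {}

fn main() {
    let e: Box<dyn Error> = Box::new(LexingError);
    assert_eq!(e.to_string(), "lexing error");
}
```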

With that in mind:

I find the changes added to be unergonomic, and badly composed.

Per my comment from almost 3 years ago, when I was pondering this (<insert existential dread of getting old>), this should incur no performance cost at all and actually improve ergonomics. Using the Iterator interface, it's actually quite nice to collect the tokens into Result<Vec<Token>, _>, stopping at the first error, with just, well, the collect method. Do you have any examples of user code where this is significantly worse to work with?
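The collect pattern described above can be sketched like this (stand-in `Token` enum and a String error type purely for illustration, not the logos-derived types):

```rust
// Sketch only: stand-in token enum instead of a logos-derived one.
#[derive(Debug, PartialEq)]
enum Token { Ident, Number }

fn main() {
    // Pretend these streams came from Token::lexer(...) under the new API;
    // the String error type is just an illustration.
    let clean: Vec<Result<Token, String>> =
        vec![Ok(Token::Ident), Ok(Token::Number)];
    let broken: Vec<Result<Token, String>> =
        vec![Ok(Token::Ident), Err("unexpected character".into())];

    // Collecting an iterator of Results into Result<Vec<_>, _>
    // short-circuits on the first Err.
    let all: Result<Vec<Token>, String> = clean.into_iter().collect();
    let failed: Result<Vec<Token>, String> = broken.into_iter().collect();

    assert_eq!(all, Ok(vec![Token::Ident, Token::Number]));
    assert_eq!(failed, Err("unexpected character".to_string()));
}
```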

@maciejhirsz
Owner

I'll tackle the error type in a subsequent PR. Also, looking at the tests, I reckon we need a better way of setting up skips for whitespace. A good place to start would be an attribute on the token enum, so that instead of:

enum Token {
    #[regex(r"[ \t\n\f]+", logos::skip)]
    Ignored,
    // ...
}

We could just do:

#[logos(skip r"[ \t\n\f]+")]
enum Token {
    // ...
}

Or some such. Thanks again for the PR @agluszak, and apologies for getting to it so late.

@maciejhirsz maciejhirsz merged commit 8ea5cac into maciejhirsz:master Feb 26, 2023
@maciejhirsz
Owner

Released in 0.13

Successfully merging this pull request may close these issues: "Error kinds", "Allow lexers to provide more user-friendly errors".