
Make lexer produce Result<Token, Error> instead of Token. #273

Merged
maciejhirsz merged 3 commits into maciejhirsz:master on Feb 26, 2023

Conversation

agluszak
Contributor

Remove the requirement of designating an error token. Remove the #[error] attribute.
Using Result instead allows passing the error details downstream.

Closes #158. Closes #104.

@agluszak
Contributor Author

Hi! I implemented this change for myself, but I think it might be useful to upstream it, because the previous discussion/implementation effort seems to have stalled.

@Skarlett

Perhaps this would be better suited as an additional function on Lexer<T>.

Maybe Lexer<T>::try_lexer(&str), which returns an iterator over Result types.

Rewriting the lexer(&str) function and upstreaming that change would break the build of anyone who relies on the library's core behavior.

@agluszak
Contributor Author

maybe Lexer::try_lexer(&str) which returns an iterator that returns result types.

Hm, I think that would require introducing a second trait (TryLogos?) and changing the derive macro to require either an #[error] annotation on some enum variant OR #[logos(error = ErrorType)].

If you want to keep the previous behavior you could map the Result<Token, ErrorType> iterator back to Token by replacing any Err with some token.
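To illustrate the point above, restoring the old token-stream shape could look like this (a sketch only: the `Token` enum is a hand-written stand-in for a logos-derived one, and `()` stands in for whatever error type the lexer uses):

```rust
// Sketch only: a hand-written stand-in for a logos-derived token enum.
#[derive(Debug, PartialEq)]
enum Token {
    Ident,
    Number,
    Error, // plays the role of the previously designated error token
}

fn main() {
    // Pretend this iterator came from Token::lexer(source) under the new API.
    let results = vec![Ok(Token::Ident), Err(()), Ok(Token::Number)];

    // Map the Result<Token, ErrorType> stream back to plain tokens,
    // replacing any Err with the chosen error token.
    let tokens: Vec<Token> = results
        .into_iter()
        .map(|r| r.unwrap_or(Token::Error))
        .collect();

    assert_eq!(tokens, vec![Token::Ident, Token::Error, Token::Number]);
}
```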

rewriting of the lexer(&str) function and adding to the upstream, would break anyone's build who relies on the library's core behavior.

Sure, that's a breaking change - it would have to be included in 0.13, not 0.12.2. And there will have to be a migration guide.

@Skarlett

Skarlett commented Dec 25, 2022

You know, there's an alternative for this behavior without this PR:

T::lexer(&src)
    .map(|t| match t {
        T::Error => Err(ctor_error()),
        tok => Ok(tok),
    });

I find the changes added to be unergonomic, and badly composed. Request for close.

@agluszak
Contributor Author

You know, an alternative for this behavior without this PR.

T::lexer(&src)
    .map(|t| match t {
        T::Error => Err(ctor_error()),
        tok => Ok(tok),
    });

This is not at all an alternative. Once all information about the error has been discarded, it is impossible to recover it. There is no information associated with T::Error. See the linked issues (#158, #104).

@maciejhirsz
Owner

I'll do a proper review in a bit, really good job with this one. I was tempted to do an implementation that's backwards compatible somehow, but I also reckon that the "There should be one-- and preferably only one --obvious way to do it" rule should apply here.

The only thing I'd change, aside from fixing conflicts (but that's my fault), is the default error type: instead of () it should be some concrete zero-sized struct, so that the Error trait can be implemented on it.
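Such a zero-sized error type might look like the following sketch (the name `LexingError` is an assumption here, not necessarily what logos will ship):

```rust
use std::{error::Error, fmt};

// Sketch of a concrete zero-sized default error type; the name
// `LexingError` is hypothetical.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct LexingError;

impl fmt::Display for LexingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "lexing error")
    }
}

// Unlike `()`, a local struct can implement std::error::Error,
// so it composes with Box<dyn Error>, anyhow, etc.
impl Error for LexingError {}

fn main() {
    let e: Box<dyn Error> = Box::new(LexingError);
    assert_eq!(e.to_string(), "lexing error");
}
```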

With that in mind:

I find the changes added to be unergonomic, and badly composed.

Per my comment from almost 3 years ago, when I was pondering this (<insert existential dread of getting old>), this should incur no performance cost at all and actually improve ergonomics. Using the Iterator interface, it's actually quite nice to collect the tokens into Result<Vec<Token>, _>, stopping at the first error, with just, well, the collect method. Do you have any examples of user code where this is significantly worse to work with?
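The collect pattern described above can be sketched like this (stand-in `Token` enum and a String error type purely for illustration, not the logos-derived types):

```rust
// Sketch only: stand-in token enum instead of a logos-derived one.
#[derive(Debug, PartialEq)]
enum Token { Ident, Number }

fn main() {
    // Pretend these streams came from Token::lexer(...) under the new API;
    // the String error type is just an illustration.
    let clean: Vec<Result<Token, String>> =
        vec![Ok(Token::Ident), Ok(Token::Number)];
    let broken: Vec<Result<Token, String>> =
        vec![Ok(Token::Ident), Err("unexpected character".into())];

    // Collecting an iterator of Results into Result<Vec<_>, _>
    // short-circuits on the first Err.
    let all: Result<Vec<Token>, String> = clean.into_iter().collect();
    let failed: Result<Vec<Token>, String> = broken.into_iter().collect();

    assert_eq!(all, Ok(vec![Token::Ident, Token::Number]));
    assert_eq!(failed, Err("unexpected character".to_string()));
}
```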

@maciejhirsz
Owner

I'll tackle the error type in a subsequent PR. Also, looking at the tests, I reckon we need a better way of setting up skips for whitespace. A good place to start would be an attribute on the token enum, so that instead of:

enum Token {
    #[regex(r"[ \t\n\f]+", logos::skip)]
    Ignored,
    // ...
}

We could just do:

#[logos(skip r"[ \t\n\f]+")]
enum Token {
    // ...
}

Or some such. Thanks again for the PR @agluszak, and apologies for getting to it so late.

@maciejhirsz maciejhirsz merged commit 8ea5cac into maciejhirsz:master Feb 26, 2023
@maciejhirsz
Owner

Released in 0.13

Successfully merging this pull request may close these issues: "Error kinds", "Allow lexers to provide more user-friendly errors".