Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tuple struct field is parsed as float #1109

Open
sigoden opened this issue Apr 4, 2019 · 19 comments · Fixed by #12149
Open

Tuple struct field is parsed as float #1109

sigoden opened this issue Apr 4, 2019 · 19 comments · Fixed by #12149
Labels
A-parser parser issues C-bug Category: bug E-hard S-actionable Someone could pick this issue up and work on it right now

Comments

@sigoden
Copy link

sigoden commented Apr 4, 2019

use std::collections::HashMap;

pub struct Graph(HashMap<i32, Vec<i32>>);

impl Graph {
    pub fn new() -> Self {
        Graph(HashMap::new())
    }
    pub fn add_edge(&mut self, u: i32, v: i32) {
        self.0.<|>
    }
}
@kjeremy
Copy link
Contributor

kjeremy commented Apr 5, 2019

In the case of

pub struct S;
impl S {
    pub fn blah(&self) {}
}

struct T(S);

impl T {
    fn foo(&self) {
        self.0.<|>
    }
}

It looks like "0." is treated as a LITERAL:

L_CURLY@[5619; 5620) "{"
    WHITESPACE@[5620; 5630) "\r\n        "
    EXPR_STMT@[5630; 5635)
      FIELD_EXPR@[5630; 5635)
        PATH_EXPR@[5630; 5634)
          PATH@[5630; 5634)
            PATH_SEGMENT@[5630; 5634)
              SELF_KW@[5630; 5634) "self"
        DOT@[5634; 5635) "."
        err: `expected field name or number`
        err: `expected SEMI`
    LITERAL@[5635; 5637)
      FLOAT_NUMBER@[5635; 5637) "0."
    WHITESPACE@[5637; 5643) "\r\n    "
    R_CURLY@[5643; 5644) "}"

@kjeremy
Copy link
Contributor

kjeremy commented Apr 5, 2019

Shouldn't struct T(S); be a TupleStruct? I don't see much about them in our code.

@matklad
Copy link
Member

matklad commented Apr 5, 2019

That is expected, b/c 0. is a float literal. Note, however, that completion inserts a dummy identifier (see how CompletionCtx is constructed), so that shouldn't be a problem.

Shouldn't struct T(S); be a TupleStruct? I don't see much about them in our code.

I haven't looked at this at all, it might be the case that "nothing works b/c nothing is supported", but it shouldb't be too hard to fix. I suggest just tracing back from the completion to see where we get wrong results.

@flodiebold
Copy link
Member

In principle, this should work, but there are no tests for typing tuple_struct.0 correctly, so it might actually be broken for some reason (I haven't investigated further so far)...

@kjeremy
Copy link
Contributor

kjeremy commented Apr 5, 2019

The CompletionContext.dot_receiver isn't being filled. It looks like we aren't passing the test at https://github.com/rust-analyzer/rust-analyzer/blob/42a883f06c28ddeab22e5703a578f19110dde7f3/crates/ra_ide_api/src/completion/completion_context.rs#L214

As an aside I couldn't debug this at all under windows and my vm likes to churn at 100% disk usage :D

@robojumper
Copy link
Contributor

I have opened #1117 to add inference support to tuple struct indexing. However, this doesn't fix the issue as the lexer still finds a float literal in the autocompletion case and I don't know how to recover and parse it as int + dot.

bors bot added a commit that referenced this issue Apr 5, 2019
1117: [WIP] Tuple struct index inference r=matklad a=robojumper

The first commit adds a helper struct `ast::FieldKind` to facilitate inference.

The second commit adds a slightly modified test from #1109 while mentioning that there is a problem with how we're handling tuple indexing / floats.

cc #1109 

Co-authored-by: robojumper <[email protected]>
@matklad
Copy link
Member

matklad commented Apr 18, 2019

So the problem here is that we can't properly analyze incomplete file due to stupid lexer :) The tree for

{ self.0.<|> }

is

BLOCK@[198; 221)
  L_CURLY@[198; 199) "{"
  WHITESPACE@[199; 208) "\n        "
  FIELD_EXPR@[208; 215)
    PATH_EXPR@[208; 212)
      PATH@[208; 212)
        PATH_SEGMENT@[208; 212)
          SELF_KW@[208; 212) "self"
    DOT@[212; 213) "."
    err: `Tuple (struct) field access is only allowed through decimal integers with no underscores or suffix`
    FLOAT_NUMBER@[213; 215) "0."
  WHITESPACE@[215; 220) "\n    "
  R_CURLY@[220; 221) "}"

Note that we have FIELD_EXPR for self., but no expr for self.0. I think the cleanest fix here is to just special-case HIR construction: if field expr is missing a field, and the immediate next token is a float literal which ends in ., we should fix this up to field access expression. The relevant code is here:

https://github.com/rust-analyzer/rust-analyzer/blob/d55f1136d6444b1f50b9092c36a976d0e1c26202/crates/ra_hir/src/expr.rs#L690

We probably will need to patch completion context as well:

https://github.com/rust-analyzer/rust-analyzer/blob/d55f1136d6444b1f50b9092c36a976d0e1c26202/crates/ra_ide_api/src/completion/completion_context.rs#L203

Question: why don't we fix the parser/lexer instead?

It seems like this require the parser to know the text of the tokens, or for the lexer to know the preceding token, both of which seem less than ideal. But maybe I am missing some clean way to fix this?

@matklad matklad added E-hard and removed E-medium labels Apr 18, 2019
@edwin0cheng
Copy link
Member

edwin0cheng commented Apr 18, 2019

for the lexer to know the preceding token

We do not need to know a whole token, we only need to know prev char is a "DOT", such that we only need to add a prev function in lex::Ptr ? And change scan_number to handle that case ?

edit: grammer and typo

@matklad
Copy link
Member

matklad commented Apr 18, 2019

a single char wouldn't be enough: 0.0..0.9 should parse as range. In general, I'd avoid adding lookahead/lookbehind to the lexer

@flodiebold flodiebold added the A-completion autocompletion label Dec 26, 2019
@lnicola
Copy link
Member

lnicola commented Jul 15, 2020

Triage: completion works here, but the expression is still parsed as a float.

image

@lnicola lnicola added A-parser parser issues S-actionable Someone could pick this issue up and work on it right now and removed A-completion autocompletion labels Jan 25, 2021
@lnicola lnicola changed the title No completion for new_type struct Tuple stuct field is parsed as float Jan 25, 2021
@jonas-schievink
Copy link
Contributor

Here's a simple reproduction. This infers fld as (i32,), but it should be i32:

fn main() {
    let fld = ((0,),).0.0;
}

@alexfertel
Copy link
Contributor

Can I try to fix this one? 👀

@lnicola
Copy link
Member

lnicola commented Jul 30, 2021

Sure :-).

@jonas-schievink jonas-schievink added Broken Window Bugs / technical debt to be addressed immediately C-bug Category: bug labels Apr 6, 2022
@jonas-schievink
Copy link
Contributor

I don't think this is going to be properly fixable without fixing it at the parser/lexer stage. Consider the way tup.0.0 is currently lexed:

Because 0.0 is its own token, this leaves no possibility of referring to tup.0 as a syntax node by itself (even if the parse tree looked different), which means that it would be an expression with no corresponding source node, which would mean that IDE features wouldn't work on it (extend selection and hover selection come to mind, but others would probably break too).

I think that leaves 2 options:

Option A – change the lexer: I've played with the idea of changing how floats are lexed at the rustc_lexer stage, they could be split up into a "pre-decimal" part, the ., and a "post-decimal" part, and then the parser could reassemble them when parsing a literal. This might cause problems with proc macros though, AFAIK they get direct access to the lexer output. Perhaps we could split up float tokens coming from rustc_lexer when converting them to our input token type though.

This makes the lexer output look somewhat "ugly" – why would floats be multiple tokens? –, but from a compiler engineering perspective seems like the more "correct" way of doing things as it avoids lexer hacks.

Option B – Put a 'lexer hack' in the parser: Another alternative would be to allow the parser to modify its input tokens while it's parsing them. Presumably we could tuck away the code that does this somewhere, to avoid having the parser "know" token contents, and just have it call try_split_float_into_dot_separated_ints. Sadly this doesn't seem to be immediately doable either, because the parser just reports events to the caller, such as "consume N raw tokens and build a node of type X" – where the whole float literal corresponds to a single "raw token" that was provided by the caller, so we can't tell it "consume half of this token". Also, this would need changes to Input to make all users of the parser provide the text in each token.

rustc and syn both implement option B, because they are both lossy one-way parsers and that's simplest for them. I'm not quite sure which solution would be architecturally simpler for us but would appreciate some input, because I don't like this bug and would like to get it fixed :)

@jonas-schievink
Copy link
Contributor

Sadly the fix caused a lot of downstream bugs, so I've reverted it in #12241 for now

@Veykril
Copy link
Member

Veykril commented May 25, 2022

Another old related issue #5429

@cynecx
Copy link
Contributor

cynecx commented Sep 25, 2022

I've implemented an even more hacky "Option B" from above here: cynecx@cc7ba82. Fair warning: It is far from being upstreamable or PR worthy, but works if you really need it.

@Veykril
Copy link
Member

Veykril commented Sep 26, 2022

Hmm, looking at this, I wonder if we can just encode the split as an extra event with option B? The main question is how to encode the split position (or if we have to just split on demand later)

@Veykril
Copy link
Member

Veykril commented Feb 8, 2023

#14084 fixes this mostly. We still incorrectly parse it if it's passed as a macro input where an expression fragment is expected.

@Veykril Veykril removed their assignment Feb 8, 2023
@Veykril Veykril removed the Broken Window Bugs / technical debt to be addressed immediately label Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser parser issues C-bug Category: bug E-hard S-actionable Someone could pick this issue up and work on it right now
Projects
None yet
Development

Successfully merging a pull request may close this issue.