Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name nodes for property names etc #322

Closed
felixfbecker opened this issue Dec 1, 2016 · 10 comments
Closed

Name nodes for property names etc #322

felixfbecker opened this issue Dec 1, 2016 · 10 comments
Milestone

Comments

@felixfbecker
Copy link
Contributor

felixfbecker commented Dec 1, 2016

Currently there is no way to get the offset information of the name of a PropertyFetch or MethodCall. Assume having code like this:

echo $obj->whatever();

$obj has two methods, whatever and whatever2.
The user's cursor is behind what, I will suggest both methods.

First issue is the node under the cursor will resolve to the whole MethodCall node. Therefor I cannot find out what prefix the user has already typed (what) and cannot filter by that. If the method name was a node, I could just get the substring of the method name between the start offset and the cursor position offset.

Second issue is when the user selects whatever2, I need to tell the editor to replace the method name (and only the method name) with whatever2. But since it is not a node, it does not have offset information, and all I can do is insert it directly like

echo $obj->whatwhatever2ever();

or solve it with a smart algorithm that "guesses" by the source code string how to insert it without overlapping parts.

Other places where this issue occurs is in find-all-references, where a method reference will always highlight the whole MethodCall node instead of just the method node.
Also the other way around, clicking anywhere except the variable node inside the MethodCall (even the -> operator) will resolve to the definition of the method, not just the name.
The same problem is also true for classes. extends and implements contain proper Name nodes, but the class name itself is just a string. Invoking find-all-references should only work on the class name, but in reality you can click anywhere in the class, even whitespace, to get the references of the class.

I would propose to make all string "names" a Name node so we can always get offset information.
The node should probably not be a Node\Name\FullyQualified node with parts, but just a simple Node\Name node that wraps the string and exposes the offset information like any other node.

This is also an argument in favor of having a parts sub-node array for FullyQualifiedName:

use SomeNamespace\OtherNamespace\Whatever;

new \SomeNamespace\OtherNamespace\Whatever();

would would ideally be

Node\Name\FullyQualified {
  parts => array(
    Node\Name {
      name => 'SomeNamespace'
    },
    Node\Name {
      name => 'OtherNamespace'
    },
    Node\Name {
      name => 'Whatever'
    }
  )
}

Where each part has its own offset information, but the whole Name\FullyQualified node still just joins the parts with a backslash when calling __toString(). That would make it much easier to provide completion for namespaces, like when you do

use SomeNamespace\Other|Namespace\Whatever;

where the cursor is behind Other, the node would resolve to the Node\Name node with OtherNamespace value. I can then do suggestions filtered by the Other prefix and when the user selects one, I can tell the editor to replace only the OtherNamespace node. Currently I can only replace the whole Node\Name\FullyQualified name node, or I have to do my own custom string parsing (searching for the next backslash) to find out what needs to be replaced.

The last proposal is kind of optional, because as I just said it is possible to accomplish this, but I just wanted for you to consider this because it is a counter-argument to removing the parts array.

@nikic
Copy link
Owner

nikic commented Dec 22, 2016

Preliminary implementation: https://github.com/nikic/PHP-Parser/compare/identifiersAsNodes

If the useIdentifierNodes parser option is enabled, the parser will create Identifier nodes for non-namespaced names.

I'm using a new node type here, because I'd like to keep a distinction between namespaced and non-namespaced names. Many of the methods on Name (like isFullyQualified) simply don't make sense in this context. An alternative to Identifier might be Label, though I'm afraid this might cause confusion with the Label statement.

I've kept the contents of Name a simple string array. I've also kept Variable->name a simple string. One could argue that this could also use a wrapper, but in this case the actual name has no dedicated token, so we run into some attribute mapping issues.

One case that still uses strings are built-in type annotations like foo(array $x). This should probably also use Identifier.

@felixfbecker
Copy link
Contributor Author

felixfbecker commented Dec 22, 2016

I'm using a new node type here, because I'd like to keep a distinction between namespaced and non-namespaced names. Many of the methods on Name (like isFullyQualified) simply don't make sense in this context.

I agree, good point. What do you think about a NamePart class? Or NameName, like UseUse / PropertyProperty?

What do you think about making the parts array of Name an array of NameParts/Identifiers/NameNames? So that we have distinct offset information for every part of a qualified name.

@nikic
Copy link
Owner

nikic commented Dec 22, 2016

I agree, good point. What do you think about a NamePart class? Or NameName, like UseUse / PropertyProperty?

In most cases it is the complete name though, so something like NamePart does not seem appropriate. I'm also not a big fan of the UseUse names (even though once upon a time I came up with them for some reason), they really should have been something like UseItem or UsePart.

What do you think about making the parts array of Name an array of NameParts/Identifiers/NameNames? So that we have distinct offset information for every part of a qualified name.

Not totally sold on this yet, though not sure why... I suspect this may have weird interactions with name creation and modification. E.g. if you run the name resolver, names will have some parts "located" in the namespace-declaration (or a use-clause) and some parts "located" at the actual location of the name. That seems weird to me, but maybe it's okay. Relatedly, if you do something like Name::concat($n, 'foo') where $n is a name with Identifier parts, what do you get? Will the last part be a simple string, or will it be normalized to an (attribute-less) Identifier? Presumably methods like getFirst() will also return Identifiers instead of plain strings?

@nikic
Copy link
Owner

nikic commented Dec 22, 2016

I've now changed callable, array etc. types to be represented as Identifiers as well.

I think that while we're at it, we should also take a look at other cases where we currently represent something as plain strings without a wrapping node. In particular, we have a number of places where variable names are presented as simple strings without a wrapping Variable node. I would propose that if this mode is enabled, we should wrap the following in Variables:

  • Catch clause variable names. catch (Foo $var)
  • Parameter names. function foo($var) {}
  • Static variable names. static $foo;
  • Lexical variable names. function() use ($var)

The following two syntactically appear as variables, but really aren't. I'm not totally sure how these should be presented. Would it be inappropriate to use Identifier for this?

  • Property names (in declarations). public $prop
  • Static variable names (in lookups). self::$prop

@felixfbecker
Copy link
Contributor Author

In most cases it is the complete name though, so something like NamePart does not seem appropriate. I'm also not a big fan of the UseUse names

UseUse was confusing for me too. Why do you think is NamePart not a good idea?

if you run the name resolver, names will have some parts "located" in the namespace-declaration (or a use-clause) and some parts "located" at the actual location of the name

It took a while, but I get what you mean. But even at the moment, the Name node offsets are kind of weird after you ran the NameResolver, right? A Name node like \Very\Deep\Nested\ClassName can have an offset of 'startFilePos' => 50, 'endFilePos' => 59 because it used to be just ClassName. It is simply impossible to let the offset represent the exact offsets in the document after you've run NameResolver... the only way to go would be to let the NameResolver add a resolvedName attribute and not replace the node, like you suggested in the other issue.

Relatedly, if you do something like Name::concat($n, 'foo') where $n is a name with Identifier parts, what do you get? Will the last part be a simple string, or will it be normalized to an (attribute-less) Identifier?

I would assume Name::concat() to implicitely convert strings.

Presumably methods like getFirst() will also return Identifiers instead of plain strings?

getFirst() would return a node too, yes, but it could be easily casted to a string.

I've now changed callable, array etc. types to be represented as Identifiers as well.

What about self, parenttruefalse? Although I'm not sure if I'm a fan of mixing class/namespace referencing names with keywords/types. I have found a bug in my code a couple of times now because I didn't handle those somewhere, and ended up inserting a definition for the literal FQSEN self into the index. Or couldn't infer the type of a boolean variable, because it couldn't find a constant named true in the index.

I would propose that if this mode is enabled, we should wrap the following in Variables:
Catch clause variable 
names. catch (Foo $var)
Parameter names. function foo($var) {}
Static variable names. static $foo;
Lexical variable names. function() use ($var)

Definitely! In fact, I actually need these all to do proper go-to-definition. Would you then do an Indentifier for $var, or a Variable for $var, that has the name set to Identifier var? There is no real value in separating the dollar

The following two syntactically appear as variables, but really aren't. I'm not totally sure how these should be presented. Would it be

inappropriate to use Identifier for this?
Property names (in declarations). public $prop

I would say ket's do it in every case where the offset information would be different. If the Identifier node would have the exact same offsets as the PropertyProperty, it doesn't make sense. I assume the Identifier would not include the $?
Let's definitely do it for the names of method declarations though. Maybe you could bring the argument for consistency then.

Static variable names (in lookups). self::$prop

Yes, that will finally enable differentiating the class part and the variable part. But I guess $prop should just be a Variable node (I know it currently is only for VariableVariables).

@nikic
Copy link
Owner

nikic commented Dec 22, 2016

UseUse was confusing for me too. Why do you think is NamePart not a good idea?

Because in most cases it's not going to be a part of a name. I can see the argument that for class Foo {} the Foo is a name part, because it could meaningfully concatenate it with a namespaced name. However, for something like $foo->bar (the bar part) this is not the case.

I've now changed callable, array etc. types to be represented as Identifiers as well.

What about self, parent, true, false? Although I'm not sure if I'm a fan of mixing class/namespace referencing names with keywords/types. I have found a bug in my code a couple of times now because I didn't handle those somewhere, and ended up inserting a definition for the literal FQSEN self into the index. Or couldn't infer the type of a boolean variable, because it couldn't find a constant named true in the index.

self and parent are considered (as previously) namespaced Names. true and false are still constant lookups (where the name is again a namespaced Name). This is consistent with how PHP deals with them and in some cases this is important (e.g. on PHP 5 the value of true in a namespace is not necessarily \true...)

Definitely! In fact, I actually need these all to do proper go-to-definition. Would you then do an Indentifier for $var, or a Variable for $var, that has the name set to Identifier var? There is no real value in separating the dollar

In these cases I would use a Variable with the name set to a string, i.e. make it consistent with how variables are referenced normally. We could also use Variable[Identifier[name]] everywhere, but I don't think this is particularly useful and it will also cause issues with attribute assignment (the $var name is a single-token, so it's unclear what token offsets we should assign for the Identifier).

I would say ket's do it in every case where the offset information would be different. If the Identifier node would have the exact same offsets as the PropertyProperty, it doesn't make sense. I assume the Identifier would not include the $?

PropertyProperty can also contain an initializer expression, in which case the Identifier would encompass a smaller range. As to what the Identifier would contain here: It should not include the $ in the string, but I think it would still have to include in the position information, due to the aforementioned issue with the token offset. (Though this does seem somewhat weird -- on the other hand the same happens for Variables.)

Yes, that will finally enable differentiating the class part and the variable part. But I guess $prop should just be a Variable node (I know it currently is only for VariableVariables).

Making this a Variable would require special-casing this in code that, for example, looks for all uses of a variable. It would have to explicitly check whether the Variable is in the property portion of a static property lookup, in which case it's not actually a variable access. Additionally, it would make something like StaticPropertyFetch[Name[...], Variable[...]] mean different things depending on which parsing mode is used (in one case it would be a simple static property reference, in the other it would be an indirect static property reference).

@nikic
Copy link
Owner

nikic commented Dec 22, 2016

I've pushed a few more changes:

  • Added useConsistentVariableNodes option. This will use Variable for catch vars, parameter names, static variable names and use variables. (Both options might be merged, keeping this separate for now.)
  • Made property declarations and static property fetches use Identifiers for the property names.

I'm not really happy with point two. I don't want to use Variable here, as these are not variables semantically. I also don't like Identifier, because everywhere else this is a plain identifier, while here it has an implicit $ prefix. I'm thinking that it might be worthwhile to add an extra node kind for this. Something like VarLikeName or VarLikeIdentifier.

@felixfbecker
Copy link
Contributor Author

I agree, it is more of a variable declaration and therefor a Stmt, not an Expr. But you could bring that argument for parameters, use etc. too. How about adding Stmt\VariableDeclaration?

@nikic
Copy link
Owner

nikic commented Jan 18, 2017

Probably time to move forward with this. In combination with #41, which based on the work here implements some major changes to make format-preserving code transformations possible, I will probably just fork version 4.0 now, so I have more freedom in changing the AST without adding options for all the things.

@nikic nikic added this to the 4.0 milestone Jan 23, 2017
@nikic
Copy link
Owner

nikic commented Apr 29, 2017

In version 4.0 Identifier nodes are now always used (no longer behind an option), so I'm considering this to be resolved.

@nikic nikic closed this as completed Apr 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants