Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust & Typescript Cursors for the CST #540

Merged

Conversation

AntonyBlakey
Copy link
Contributor

@AntonyBlakey AntonyBlakey commented Jul 12, 2023

Does not include TS Visitor - waiting for Omar's TS/Rust codegen unification PR to land before completing this task.

@changeset-bot
Copy link

changeset-bot bot commented Jul 12, 2023

🦋 Changeset detected

Latest commit: 22128d5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
changelog Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@AntonyBlakey AntonyBlakey self-assigned this Jul 12, 2023
@AntonyBlakey AntonyBlakey force-pushed the AntonyBlakey/node-cursors branch 11 times, most recently from bbb01ad to 1a3f8de Compare July 15, 2023 06:04
@AntonyBlakey AntonyBlakey marked this pull request as ready for review July 15, 2023 06:04
@AntonyBlakey AntonyBlakey requested a review from a team as a code owner July 15, 2023 06:04
@AntonyBlakey AntonyBlakey requested review from OmarTawfik and removed request for a team July 15, 2023 06:04
Cursor::new(self.clone())
}

pub fn is_token_kind(&self, kinds: &[TokenKind]) -> Option<&Rc<TokenNode>> {
Copy link
Collaborator

@OmarTawfik OmarTawfik Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For is_token_kind() and is_rule_kind(), I wonder why do they take multiple kinds, rather than only one? is it an optimization for NAPI? does that mean that we won't do that if it was a native TS API?

Copy link
Contributor Author

@AntonyBlakey AntonyBlakey Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is for alignment with the Cursor API, where you can search for more than one at a time without having to do multiple searches over multiple Cursors. In TS it will be is_token_kind(TokenKind, ...).

Note that it also returns the matched TokenNode, meaning no match is required, and it functions therefore as a typeguard.

Note also the following is_token_matching which is also for alignment with the cursor API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is_XX() usually indicates returning booleans, a cheap check that can be chained in a boolean condition with other conditions. Thoughts about something like as_token()? which indicates a "cast" that can succeed/fail (thus returning an option).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, done

self.leaf.is_none()
}

pub fn node_or_none(&self) -> Option<Node> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding node() and node_or_none(), I might be missing something, but I would have expected the rust idiomatic way is to return Option<Node> only/directly for things like this, and let the user expect()/unwrap() if needed? why the extra API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there are many cases where the client know that node isn't none e.g. after a successful go_to_*, and I wanted to not require .unwrap() there, and instead catch it in one place. Happy to change the method names if you have a suggestion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is idiomatic to let the user make that decision, and to make it explicit at the call site (a strong direction in rust). I think similar to how a vector/map/set would offer get() -> Option<T> without (the often convenient) get_and_unwrap() -> T

Copy link
Contributor Author

@AntonyBlakey AntonyBlakey Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trait Index provides [] for this, and it is implemented on Vec, HashMap, and BTreeMap for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in any case, solved with an explicit completed state. Also allows the cursor to be reset

.text_range()
}

pub fn path_rule_nodes(&self) -> Vec<Rc<RuleNode>> {
Copy link
Collaborator

@OmarTawfik OmarTawfik Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the use case for path_rule_nodes(). I see it is currently unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that you can access the parent RuleNodes when using a cursor to navigate the tree.

}

#[napi(getter, ts_return_type = "RuleNode | TokenNode | null")]
pub fn node(&self, env: Env) -> Option<JsObject> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing something. This would convert it to a plain JS object, however, us typing it with RuleNode | TokenNode means that it will have the usual APIs on it, like the ones that need round tripping to Rust. How does that work?

this is related to other to_js() uses as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we have to return JsObject because the result is polymorphic. And fundamentally, JsObject is what is passed even if the Rust type were explicit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on the receiver side, working in NodeJS land, they would get a plain JS object.
It wouldn't have the native Rust APIs? not sure how would calling its methods would round trip.

... I think I will wait for this to be ready and play with it for a bit to understand more.

fn rule_exit(
&mut self,
node: &Rc<RuleNode>,
cursor: &Cursor,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOL, I wonder if we should be exposing the cursor to the visitor API, even if it cannot be used/mutated by the caller? what is the use case here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For two reasons

  1. You can start a Visitor at any Cursor position. It is nice in the Visitor to be able to get to the Cursor and do some more internal iteration. Apart from being convenient, it makes internal and external iterators nicely symmetric.
  2. Regardless of the symmetry, it is convenient in the visitor to be able to access things like the text_range from the Cursor, and also the path_rule_nodes. As the Cursor API is extended, the Visitor implementations can use those facilities.

},
};

#[test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest splitting into three tests for readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but because the 'test' nature of this is not actually for testing, but just to validate that the example is ok, I decided that the clarity I could gain by simply repeating a section of code with exactly the same result but using different API patterns was worth having the repeated functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is also for "testing", since the Rust test infra is the same in all projects, this is a quick way for someone to just copy a test locally, and edit/run it for experimenting.

It also indicates to readers that these are separate, instead of having side effects on later statements.

Copy link
Collaborator

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions/concerns.

return Ok(VisitorEntryResponse::StepIn);
if node.kind == RuleKind::ContractDefinition {
if let Node::Token(token) = &node.children[2] {
assert_eq!(token.kind, TokenKind::Identifier);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ensure!() to be consistent with bail!() instead of panicking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

) -> Result<VisitorEntryResponse> {
if node.kind != RuleKind::ContractDefinition {
return Ok(VisitorEntryResponse::StepIn);
if node.kind == RuleKind::ContractDefinition {
Copy link
Collaborator

@OmarTawfik OmarTawfik Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I suggest quitting early when possible (like the existing visitor) for readability, and to decrease nesting. But not a blocker at all.

if !interesting {
  return;
}
do_something();

rather than:

if interesting {
  do_something();
}
return;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a functional programming fan, I prefer to have a single control flow, and therefore not use guards as often.

goToNthChild(childNumber: number): boolean;
goToNextSibling(): boolean;
goToPreviousSibling(): boolean;
findRuleKind(kinds: Array<RuleKind>): RuleNode | null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOL: thoughts about unifying findXXXX() and findXXXXWithoutRecursing() to a default argument findXXXX(recursive = true)?

Copy link
Contributor Author

@AntonyBlakey AntonyBlakey Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't want to add a recurse param to find* if the use cases were not equaly distributed, which I suspect they wouldn't be.

But in any case I've removed the *WithoutRecursing forms because they cannot work - the movement to the next has to happen between calls to those methods.

@AntonyBlakey AntonyBlakey added this pull request to the merge queue Jul 19, 2023
Copy link
Collaborator

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a few open questions, but will wait until the docs are ready to have a better understanding of the proposed API before continuing.

@OmarTawfik OmarTawfik removed this pull request from the merge queue due to a manual request Jul 19, 2023
@AntonyBlakey AntonyBlakey added this pull request to the merge queue Jul 20, 2023
Merged via the queue into NomicFoundation:main with commit 0d04f95 Jul 20, 2023
@AntonyBlakey AntonyBlakey deleted the AntonyBlakey/node-cursors branch July 20, 2023 07:58
@github-actions github-actions bot mentioned this pull request Jul 20, 2023
@OmarTawfik OmarTawfik linked an issue Jul 20, 2023 that may be closed by this pull request
github-merge-queue bot pushed a commit that referenced this pull request Aug 2, 2023
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and publish to npm
yourself or [setup this action to publish
automatically](https://github.com/changesets/action#with-publishing). If
you're not ready to do a release yet, that's fine, whenever you add more
changesets to main, this PR will be updated.


# Releases
## @nomicfoundation/[email protected]

### Minor Changes

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add a Rust
Cursor API and refactor the Rust Visitor API to run on top of it.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Move Visitor
et al to node:: namespace, which is where Cursor is.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Rename
`range` functions that return a TextRange to `text_range`

### Patch Changes

- [#543](#543)
[`7a34599`](7a34599)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - Move
`syntax::parser::ProductionKind` to `syntax::nodes` namespace.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add
TokenNode.text to the TS API.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add first
pass of Typescript binding to the Cursor API, but no TS Visitor yet.

- [#545](#545)
[`e73658a`](e73658a)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - render EBNF
grammar on top of each `ProductionKind`, `RuleKind`, and `TokenKind`.

- [#558](#558)
[`95bbc50`](95bbc50)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - Correct
versioning for `SourceUnitMember` and `ContractMember` children.

## [email protected]

### Minor Changes

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add a Rust
Cursor API and refactor the Rust Visitor API to run on top of it.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Move Visitor
et al to node:: namespace, which is where Cursor is.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Rename
`range` functions that return a TextRange to `text_range`

### Patch Changes

- [#543](#543)
[`7a34599`](7a34599)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - Move
`syntax::parser::ProductionKind` to `syntax::nodes` namespace.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add
TokenNode.text to the TS API.

- [#540](#540)
[`0d04f95`](0d04f95)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add first
pass of Typescript binding to the Cursor API, but no TS Visitor yet.

- [#545](#545)
[`e73658a`](e73658a)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - render EBNF
grammar on top of each `ProductionKind`, `RuleKind`, and `TokenKind`.

- [#558](#558)
[`95bbc50`](95bbc50)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - Correct
versioning for `SourceUnitMember` and `ContractMember` children.

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide How Tos for the Cursor API
2 participants