Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic tokenization tests #5381

Merged
merged 4 commits into from
Jun 4, 2024
Merged

Conversation

OceanOak
Copy link
Collaborator

@OceanOak OceanOak commented May 31, 2024

This PR fixes semantic tokenization, and adds tests for it

@OceanOak OceanOak force-pushed the smtz-tests branch 6 times, most recently from 3b943ea to 5a742d1 Compare June 4, 2024 11:26
@OceanOak
Copy link
Collaborator Author

OceanOak commented Jun 4, 2024

Todos and things to consider for a follow-up PR:

  • Maybe in List <'a>, DB<'a>, and Dict<'a>, the <'a> should be tokenized as a typeParam instead of tokenizing <, typeName, > separately.

  • in Dict maybe key should be tokenized as a property instead of a string?

  • in EPipeEnum, all of this is Stdlib.Result.Result. is tokenized as a TypeName instead of ModuleName.ModuleName.TypeName. Same for EPipeFnCall and MPEnum

  • maybe true and false shouldn't be tokenized as Keyword

    Reason:
    lets take this example: if true then true else false
    the result of the semantic tokenization will be: Keyword Keyword Keyword Keyword Keyword Keyword
    and it will be highlighted in the same color, which doesn't look right
    (Maybe no one will write such code; it's just an example of an extreme case where everything would be
    tokenized as a keyword.)

Notes in case we decide to capture all symbols some day:

currently, we are not tokenizing the following:

  • the .s in fully qualified type name, function name, and constant name
  • the {} and ; in Type Declaration Record
  • the ; in List
  • both the = and ; in Dict
  • the () in PipeInfix and PipeLambda

@OceanOak OceanOak marked this pull request as ready for review June 4, 2024 13:45
@OceanOak OceanOak requested a review from StachuDotNet June 4, 2024 13:45
@OceanOak OceanOak changed the title WIP: add semantic tokenization tests Add semantic tokenization tests Jun 4, 2024
@StachuDotNet
Copy link
Member

Those are great comments - I added a link in #5259 to that comment so we don't lose track.
Quick thoughts:

  • typeParam point: yeah, probably good to reuse some functionality there
  • Dict key thing: yeah I'd say so
  • "maybe true and false shouldn't be tokenized as Keyword" - agreed. Just took a look and seems F# tokenizes true and false as other or constant (depending on which tokenization system). adding a new token type for constant might work well for us here. no 'default' available tokens fit super well, besides keyword, and the problem you outline is real

Copy link
Member

@StachuDotNet StachuDotNet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here - shame the auto-formatter does at it does, but those tests will look tidier soon. We'll be thankful for these tests.

@StachuDotNet StachuDotNet merged commit fc4b62f into darklang:main Jun 4, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants