Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Julia: generate reference tags for external entities #2727

Merged
merged 8 commits into from
Jan 16, 2021

Conversation

AmaiKinono
Copy link
Member

@AmaiKinono AmaiKinono commented Nov 24, 2020

This is an unsuccessful try, but at least you can use it as a reference. I put all the julia using/imported forms in the comment, and created a test case (with the expected output).

I haven't write C code for 7 or 8 years, so maybe my code is too bad to fix, but if you think this can still be fixed and merged, here are the problems:

(This part is outdated. See the added part below).

  1. I can't put initRefTagEntry to work. It just don't generate a tag. Fixed.
  2. Now the library name is recorded with the : after it, and those without a : is recorded as unknown kinds. I guess I have to change advanceToken somehow to fix this. I don't know how to do it. This part is mostly broken, as you can see when manually run the test case.
  3. The roles: kind are all def now, but I guess this could be fixed using initRefTagEntry.

Here are some minor considerations:

  1. The file uses spaces to indent, which is not the coding style of u-ctags. Should we change this?
  2. "Libraries" are actually called "modules" in Julia, but it's not bound to files, and you can define a module just like you define a function. So I choose "library" to not intefere with module definitions.

Update:

The patch is now partially fixed, but still has some problems. Run

$ ctags --options=args.ctags -f- input.jl

in Units/parser-julia.r/library.d/ to see them.

  1. The lexer doesn't jump accross dots. From other unit tests I guess this is the desired behavior, but then for Library4, it captures Library4.func1 as a whole, which is unwanted.

  2. Some pattern fields are generated for the wrong lines. I wonder why this happens...

@masatake
Copy link
Member

Please, run "make tmain" locally. You must update "stdout-expected.txt" file(s).
About tmain, see https://docs.ctags.io/en/latest/tmain.html.
misc/review script may help you.

@masatake
Copy link
Member

What you are missing is --extras=+r. This makes ctags gather reference tags.
See https://github.com/universal-ctags/ctags/blob/master/Units/parser-python.r/python-import.d/args.ctags .

I will review this pull request more. You can do "push --force" anytime you want.

@masatake
Copy link
Member

I haven't write C code for 7 or 8 years

If the language is not Sexp based, all the languages are the same for us, aren't they?

parsers/julia.c Outdated Show resolved Hide resolved
parsers/julia.c Outdated Show resolved Hide resolved
@AmaiKinono
Copy link
Member Author

If the language is not Sexp based, all the languages are the same for us, aren't they?

Haha ;) Mostly true, but the &/* based one is different.

The patch is now partially fixed, but still has some problems. Run

$ ctags --options=args.ctags -f- input.jl

in Units/parser-julia.r/library.d/ to see them.

  1. The lexer doesn't jump accross dots. From other unit tests I guess this is the desired behavior, but then for Library4, it captures Library4.func1 as a whole, which is unwanted.

  2. Some pattern fields are generated for the wrong lines. I wonder why this happens...

@masatake
Copy link
Member

  1. I can't put initRefTagEntry to work. It just don't generate a tag.
  2. Now the library name is recorded with the : after it, and those without a : is recorded as unknown kinds. I guess I have to change advanceToken somehow to fix this. I don't know how to do it. This part is mostly broken, as you can see when manually run the test case.
  3. The roles: kind are all def now, but I guess this could be fixed using initRefTagEntry.

About 1. and 3. , see my review comments.
For fixing 2. We have to inspect advanceToken() but asking the original author is the way of GitHub.
@getzze could you look at this pull request?
@AmaiKinono, let's wait for a while. In the case of the original author being busy, I will take a look.

  1. The file uses spaces to indent, which is not the coding style of u-ctags. Should we change this?

I decided not to be too strict about the way of style because e-ctags was not so strict.
Doing C-M-\ on all files was one of the approaches. However, it makes reading the output of git blame harder.
So, in most of all files, I have kept the style as is. Especially if a file has a maintainer.
When making a new .c file, apply .editorconfig to your emacs.

If you want to edit an existing .C file, you have to do (1) \C-x \C-f the file, (2) M-x c-guess.
c-guess, that I contributed to cc-mode years ago, studies the style from the current buffer and update the style in c-mode.
The command has helped me contribute to many projects.

@AmaiKinono
Copy link
Member Author

M-x c-guess

Oh yes! This is really cool. Thank you!

@masatake
Copy link
Member

The following change fixes the pattern.

diff --git a/parsers/julia.c b/parsers/julia.c
index 1016cf72..58408e21 100644
--- a/parsers/julia.c
+++ b/parsers/julia.c
@@ -885,14 +885,16 @@ static void addTag (vString* ident, const char* type, const char* arg_list, int
     makeTagEntry(&tag);
 }
 
-static void addReferenceTag (vString* ident, int kind, int role, vString* scope)
+static void addReferenceTag (vString* ident, int kind, int role, unsigned long line, MIOPos pos, vString* scope)
 {
     if (kind == K_NONE)
     {
         return;
     }
     tagEntryInfo tag;
-    initRefTagEntry(&tag, vStringValue(ident), kind, role);
+    initRefTagEntry(&tag, vStringValue (ident), kind, role);
+    tag.lineNumber = line;
+    tag.filePosition = pos;
     if (scope != NULL)
     {
         tag.extensionFields.scopeKindIndex = K_LIBRARY;
@@ -1221,14 +1223,14 @@ static void parseImport (lexerState *lexer, vString *scope, int parent_kind, int
     /* capture the imported name */
     advanceToken(lexer, true);
     vStringCopy(name, lexer->token_str);
-    addReferenceTag(name, K_LIBRARY, library_role, NULL);
+    addReferenceTag(name, K_LIBRARY, library_role, lexer->line, lexer->pos, NULL);
     if (lexer->cur_c == ':')
     {
         advanceChar(lexer);
         advanceToken(lexer, true);
         while (lexer->cur_token == TOKEN_IDENTIFIER || lexer->cur_token == TOKEN_MACROCALL)
         {
-            addReferenceTag(lexer->token_str, K_UNKNOWN, unknown_role, name);
+            addReferenceTag(lexer->token_str, K_UNKNOWN, unknown_role, lexer->line, lexer->pos, name);
             skipWhitespace(lexer, false);
             if (lexer->cur_c == ',')
             {

@AmaiKinono
Copy link
Member Author

The following change fixes the pattern.

Thanks! I think I know why. Although I don't totallly understand advanceToken and how files are handled in ctags, seems in advanceToken, the lexer->line and lexer->pos is set before it jump through a newline char, and we record a tag after calling it, so getInputLineNumber may give us the line after the token.

@masatake masatake mentioned this pull request Nov 25, 2020
8 tasks

skipWhitespace(lexer, false);
if (lexer->cur_c == ',')
while (lexer->cur_token == TOKEN_IDENTIFIER || lexer->cur_token == TOKEN_MACROCALL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a tab char may be used here. M-x untabify may be needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Didn't know untabify before. Actually I ran c-guess on this file and seems it didn't set up the correct style for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can remember, it doesn't distinguish tab and space.

parsers/julia.c Outdated Show resolved Hide resolved
parsers/julia.c Outdated Show resolved Hide resolved
@masatake
Copy link
Member

  1. "Libraries" are actually called "modules" in Julia, but it's not bound to files, and you can define a module just like you define a function. So I choose "library" to not intefere with module definitions.

This is the most important topic.
Does using Library1 mean using something defined with "module Library1"?
If yes, "Library1" in using Library1 is a module. More precisely, it is a reference for the module.

They do not interfere with each other.
Library1 in using Library1 is a reference tag having the module kind with the using role.
Library1 in using Library is a definition tag having the module kind.

@AmaiKinono
Copy link
Member Author

Most of the things work well now. The only problem left is for something like:

import Lib1.func1, Lib1.func2

It will create 2 tags for Lib1 because it appears 2 times. This is probably acceptable, because semantically it's the same as

import Lib1.func1
import Lib1.func2

and generate Lib1 for 2 times for the latter one looks acceptable for me.

Does using Library1 mean using something defined with "module Library1"?

Exactly.

They do not interfere with each other.
Library1 in using Library1 is a reference tag having the module kind with the using role.
Library1 in module Library1 is a definition tag having the module kind.

Ok. I'll change it.

@masatake
Copy link
Member

It will create 2 tags for Lib1 because it appears 2 times. This is probably acceptable,

Yes. 'Lib1' is referenced twice. Therefore two reference tags are needed.

@masatake
Copy link
Member

You must update Tmain/list-roles.d/stdout-expected.txt.

@coveralls
Copy link

coveralls commented Nov 25, 2020

Coverage Status

Coverage increased (+0.02%) to 86.948% when pulling 4df8c75 on AmaiKinono:julia-import into 4f69a72 on universal-ctags:master.

@codecov
Copy link

codecov bot commented Nov 25, 2020

Codecov Report

Merging #2727 (f1bfe7b) into master (c026132) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2727      +/-   ##
==========================================
+ Coverage   86.96%   86.97%   +0.01%     
==========================================
  Files         193      193              
  Lines       44000    44048      +48     
==========================================
+ Hits        38263    38310      +47     
- Misses       5737     5738       +1     
Impacted Files Coverage Δ
parsers/julia.c 96.54% <100.00%> (+0.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c026132...f1bfe7b. Read the comment docs.

parsers/julia.c Outdated Show resolved Hide resolved
@masatake
Copy link
Member

Could you squash the commits into one commit, and do push --force ?

@AmaiKinono
Copy link
Member Author

Done.

@getzze
Copy link
Contributor

getzze commented Nov 26, 2020

Hi,
sorry I am not following everything you are doing but I would like to suggest to put in the input.jl file imports with newline:

using Module6,
      Module7,
      Module8

using Module9: func1,
               func2,
               func3

To make sure these are parsed correctly. They are well parsed now but just in case maybe.

Also, ideally, when code is written inside a module:

module Module1
function func1() end
struct Struct1 end
end

the definitions should be in Module1 scope, like Module1.func1, but I commented this functionality in parseModule(last three commented lines of the function) because it was not looking good in Geany. But I guess this should be uncommented in ctags.

Please let me know if you need my help.

@getzze
Copy link
Contributor

getzze commented Nov 26, 2020

Just to let you know of a possibility in Julia:

using Module1
import Module1: func1

the whole Module1 is imported with using and the specific function func1 is imported with import.
import is used if you need to overload a function so it makes sense to have this sometimes.

@AmaiKinono
Copy link
Member Author

AmaiKinono commented Nov 26, 2020

Thanks!

using Module6,
      Module7,
      Module8

This isn't handled well in my experiment. I must fix it.

ideally, when code is written inside a module...the definition should be in the module scope.

I haven't tried the commented code you mentioned, but let's do it in a separate patch.

However I do want to point out a problem with file-based tagging: If something is directly defined in a file (with no module wrapping it), and when that file is executed, it's in the module Main.

Now let's support we have module-file1.jl and module-file2.jl. Then another file may include those two and wrap them in a module:

module MyModule
include("module-file1.jl")
include("module-file2.jl")
end

Then when we using MyModule, we expect the symbol defined in these files are under the MyModule scope. I suspect this can't be done with ctags at least for now. The best thing we can do is attach a Main scope, or don't record a scope with them. I guess this is also a problem for Ruby? It doesn't have a file-level scope too, as far as I know.

the whole Module1 is imported with using and the specific function func1 is imported with import.
import is used if you need to overload a function so it makes sense to have this sometimes.

I guess you could just using Module1 and implement Module1.func1. Does that work? It seems to work in my experiment.

Edit: Though out of topic... I think the behavior is: with import Mod: func you can extend func, and with using Mod: func you can't. But both import Mod and using Mod let you extend Mod.func.

@masatake
Copy link
Member

module MyModule
include("module-file1.jl")
include("module-file2.jl")
end

Then when we using MyModule, we expect the symbol defined in these files are under the MyModule scope.

This is similar to "#include <stdio.h>" in C language.

What we have to do as the first is:

  • defining "script" kind (or something like that having a better name)
  • defining "included" kind
  • make reference tags for the file names with the above kind and role.

You can add more tags in the second pass if you have "multipass parsing".
Even so, tagging the files are needed in the first pass.

@masatake
Copy link
Member

I could track only some of the comments but...you don't need to put all the ideas into one pull request.

@AmaiKinono
Copy link
Member Author

I think our CI has some problems on macOS...

@masatake
Copy link
Member

masatake commented Jan 4, 2021

I think our CI has some problems on macOS...

In such a case, trigger "Re-run all jobs". With my account, I can do it. I wonder whether you can do it with your account or not.

@AmaiKinono
Copy link
Member Author

I wonder whether you can do it with your account or not.

I can. In fact, I already tried it several times, but that didn't help. Also, the CI on the recent merge commit also ran into problem on macOS:

GitHub Actions has encountered an internal error when running your job.

So maybe it's a problem on GitHub's side.

@masatake
Copy link
Member

masatake commented Jan 6, 2021

I disabled macos-11.0 at master. Could you rebase your change on master and push --force here ?

@AmaiKinono
Copy link
Member Author

Done.

@masatake
Copy link
Member

I'm very sorry to be late.

I rethink the kinds and roles. How about these ones?

# your proposal:
#   Lib: kind:module,roles:using
# my idea:
#   Lib: kind:module,roles:used
using Lib

# your proposal:
#   BigLib: kind:module,roles:using
#   thing1: kind:unknown,roles:using
#   thing2: kind:unknown,roles:using
# my idea:
#   BigLib: kind:module, roles:namespace 
#   thing1: kind:unknown,roles:used
#   thing2:> kind:unknown,roles:used
using BigLib: thing1, thing2

# your proposal:
#   Base: kind:module,roles:imported
#   thing1: kind:unknown,roles:imported
# my idea:
#   Base: kind:module, roles:namespace
#   thing1: kind:unknown,roles:imported
import Base.show

I think used is better than using. It is similar to imported.
I introduced a role not having much meaning: namespace.
Let's pick one as an example.
See Base.
For import Base.show, you assign imported as a role to Base. However, I think Base is not imported.

[jet@living]~/var/ctags-new% julia
julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |  Fedora 33 build
|__/                   |

┌ Warning: Terminal not fully functional
└ @ Base client.jl:390
julia> import Test.Pass
import Test.Pass

julia> Test
Test
ERROR: UndefVarError: Test not defined
Stacktrace:
 [1] top-level scope
 [2] run_repl(::REPL.AbstractREPL, ::Any) at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288

julia> Pass
Pass
Pass

Test is not imported. Pass is imported.
This is applicable to Base.show. Base is not imported.
Base is just used for specifying show. So I introduced a meaningless role namespace and assigned it to Base.
I took this idea from python parser.

Acceptable?

If yes, it is better to have one page like man/ctags-lang-python.1.rst.in.

@AmaiKinono
Copy link
Member Author

Totally agree. Using a namespace role is also consistent with the Python parser (in the from X import Y case).

@AmaiKinono
Copy link
Member Author

@masatake I've finished it. I still want to do some refactor because the code parsing import/using expressions becomes a bit messy. You can read the added manpage first.

@masatake
Copy link
Member

@AmaiKinono, the man looks good to me.

@masatake
Copy link
Member

The unknown kind will be resolved in the future with #2741 :-P.

Copy link
Member

@masatake masatake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please, merge the commits with "Squash and merge" button.

@AmaiKinono AmaiKinono merged commit 3cc79e5 into universal-ctags:master Jan 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants