Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/gazelle: add resolvers for activated languages only #787

Merged
merged 1 commit into from
Aug 14, 2020

Conversation

robfig
Copy link
Contributor

@robfig robfig commented May 19, 2020

This disables processing of rules for languages which are disabled via the -lang
flag. Concretely, this halves the time it takes for Gazelle to update our Go
rules by skipping processing of all closure_js rules.

That's partially because processing JS is sort of slow, but I think the basic idea here is sound.

Updates #687

@robfig robfig requested a review from jayconrod as a code owner May 19, 2020 18:44
@jayconrod
Copy link
Contributor

Looks good.

Could you update this to the latest master? It looks like filterLanguages isn't present yet on the base commit.

@robfig robfig force-pushed the pr_resolve_activated_langs branch from 365632a to 993ee32 Compare May 19, 2020 20:11
@robfig
Copy link
Contributor Author

robfig commented May 19, 2020

Whoops, thanks. (I used an earlier release to avoid having to download the newer bazel required by rules_go.)

@robfig robfig force-pushed the pr_resolve_activated_langs branch from 993ee32 to 5b77c67 Compare May 19, 2020 20:14
@robfig
Copy link
Contributor Author

robfig commented May 19, 2020

Apologies, I must have not run the tests. TestConfigLang fails with a NPE. Investigating.

@robfig
Copy link
Contributor Author

robfig commented May 19, 2020

Oh, this change falls afoul of subsequent directives overriding the default lang restriction to be unrestricted. For example, gazelle -lang=proto then recurses into a directory with # gazelle:lang, disabling the filter.

I can think of a couple ways forward:

  1. The flag could act as a minimal filter that applies regardless of any directives.
  2. We could change the meaning of the # gazelle:lang directive to be a strict filters, without any ability to unset higher-up filters.
  3. Abort this change, and always process all language rules.

Thoughts on those:

  1. Minimal filter is a more complicated configuration model, but it also seems useful, and not overly surprising. It's not without precedent, since other configuration bits are combined in some ways with flags, not just strictly overridden.
  2. Strict filters seem ok. It seems like that model might prevent configuration of some scenario, but I'm struggling to think of anything that doesn't have an easy workaround.
  3. Aborting I don't care for, because that doesn't solve my problem, but I realize that you may not want to add more stuff around this flag.

What do you think?

@jayconrod
Copy link
Contributor

Ah, this is more complicated than I thought it was at first.

I wonder if there's another way to address the performance problem you were seeing? What exactly is taking a lot of time? Before this PR, if a language is disabled, we'd be indexing library rules for that language, but we shouldn't be generating any rules for it. And we should only be resolving dependencies for generated rules.

I think disabling indexing wouldn't be too difficult, but I wouldn't expect indexing to be too expensive.

@robfig
Copy link
Contributor Author

robfig commented May 20, 2020

Yeah, good call. I looked into it a bit more, and the underlying reason is that the closure_js implementation of Imports() reads the file to get the list of identifiers provided (which may be imported by other files). That is time consuming.
https://github.com/robfig/rules_closure/blob/2020-05-18/gazelle/closure_js/resolve.go#L18
https://github.com/robfig/rules_closure/blob/2020-05-18/gazelle/closure_js/fileinfo.go#L120

If I update the closure_js rules to have a provides = [] attribute, similar to how go_library has an importpath, then Imports could become super fast and this probably wouldn't be a problem any more.

@jayconrod
Copy link
Contributor

Would you rather add a provides = [] attribute or skip the indexing for those rules? If the rules themselves don't need a provides = [] attribute, I'd say let's skip the indexing.

That might make sense semantically, too: if a language is disabled for a large tree in a repository and that tree contains library rules for that language, it might be surprising if those libraries are indexed and dependencies are resolved to them.

@robfig
Copy link
Contributor Author

robfig commented May 20, 2020

I think doing both makes sense.

  • One use case is to run gazelle to update one app's worth of JS rules, in which case it would benefit from provides (only read that app's files, not all JS files in the repo) but not from the lang filter.

  • The lang filter is quick/easy to implement given a decision on functionality and it makes sense as standalone functionality for large repos, as you point out.

@jayconrod
Copy link
Contributor

Ok let's try skipping the indexing then.

One thing to point out: you can update specific directories in a repository by passing positional arguments on the command line. The -r flag controls recursion, and the -index flag toggles indexing (the whole repository will be indexed by default, even when only specific directories are updated).

@robfig
Copy link
Contributor Author

robfig commented May 26, 2020

Sounds good -- did you want the filter to work like one of the mechanisms described above ("minimal" or "strict" filter), or something else?

I have never tried disabling the -index flag.. I'll give that a shot, but we do have some non-standard rules so I'm not expecting that to work in general. Right now one Gazelle rule covers our whole Go monorepo, rather than app by app.

@jayconrod
Copy link
Contributor

I think the "minimal" behavior is better. It's a little more complicated, but not unmanageably so.

It seems like we'd need to initialize the index and the meta-resolver using all the languages (regardless of flag or directive), then use the current set of languages to filter calls to ruleIndex.AddRule.

(I may be missing something of course; context-switching a lot this morning)

@robfig
Copy link
Contributor Author

robfig commented Aug 8, 2020

Apologies for the long hiatus. Looking at this again, I wanted to confirm the functionality you would like.

The existing PR implements the "minimal" filter behavior of having a language filter specified on the command line always activated, although additional filters may be set and overridden as directives. It implements that by omitting the resolvers for deactivated languages, which causes rules of that language to be ignored. I think the only change needed for that PR to meet the functional requirement is to update the tests and docs to reflect that behavior.

In your comment where you say it should instead do the filtering in (*RuleIndex).AddRule, it would keep all of the resolvers and do the filtering at that later stage per rule, but it would keep the same behavior.

Just wanted to confirm that you indeed wanted the "minimal" filter behavior -- asking because the updated implementation you describe seems like would enable the implementation of a flexible filter where directives and flags are treated equivalently and it would be possible to unset the -lang filter using a directive.

I would be happy to implement that if so, just want to be sure that I'm doing the right thing.

Thanks for the consideration!

@jayconrod
Copy link
Contributor

Ah, paging this back in after a while.

I don't think we're referring to the same thing as minimal. It looks like the problem with the current implementation is that there's no way to turn a language back on in a subdirectory if it was disabled in the root directory. If there's a directive to change the set of active languages, I think it should be able to do that.

So I think the metaresolver should be configured with all available languages, and we should just after calls to RuleIndex.AddRule based on the configured set of languages. So then, a better title for this PR might be "index targets for activated languages only".

@robfig
Copy link
Contributor Author

robfig commented Aug 11, 2020

I didn't think of this before, but this approach changes the meaning of the filter. It's fine for my use case, but I wanted to be sure that the change was clear.

The original problem that the language filter solved for us was being able to run Gazelle on our Go sub-tree without generating rules for our JS files. So the original meaning of the -lang filter applied to rule generation.

The proposed change would expand the meaning to also not index languages. If someone is using directives to control Gazelle's rule generation in different sub-trees, then potentially they would still want Gazelle to index a language's rules in a sub-tree even if it's not generating them there. I'm not sure if that's an important factor, or whether that would be desirable or undesirable.

My use case only uses the -lang flag and invokes Gazelle separately for JS vs Go, so this case doesn't come up.

Still SGTY?

@jayconrod
Copy link
Contributor

The proposed change would expand the meaning to also not index languages. If someone is using directives to control Gazelle's rule generation in different sub-trees, then potentially they would still want Gazelle to index a language's rules in a sub-tree even if it's not generating them there. I'm not sure if that's an important factor, or whether that would be desirable or undesirable.

My use case only uses the -lang flag and invokes Gazelle separately for JS vs Go, so this case doesn't come up.

Still SGTY?

I think this makes sense. In this case, the Imports call was expensive due to I/O, and that may be true for other languages as well. For example, if we had a Java extension, we might need to read the files to come up with a list of packages or classes.

The code looks good, but I'd like to request two additional changes:

  1. Add a test or expand the existing test for # gazelle:lang in cmd/gazelle/integration_test.go.
  2. Document the indexing behavior for the # gazelle:lang directive in README.rst.

If a language is disabled via the `-lang` flag or `# gazelle:lang` directive,
skip indexing it.
@robfig
Copy link
Contributor Author

robfig commented Aug 12, 2020

SGTM, updated.

Any idea if someone is working on a Java extension? That seems doable-ish and pretty valuable.

Copy link
Contributor

@jayconrod jayconrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for fixing this!

@jayconrod jayconrod merged commit 49a5b63 into bazel-contrib:master Aug 14, 2020
@jayconrod
Copy link
Contributor

Any idea if someone is working on a Java extension? That seems doable-ish and pretty valuable.

Not AFAIK. Agreed that it would be useful and relatively doable. The challenge would be dealing with packages with cyclic imports (not allowed by Bazel) and of course, external repos (you might want a java_repository, like go_repository, or maybe a generalized gazelle_repository).

There's some discussion on the future home of language extensions in #857 if you're interested. Needs more thought.

@robfig robfig deleted the pr_resolve_activated_langs branch August 24, 2020 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants