Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict scala deps and unused_deps #235

Open
ittaiz opened this issue Jun 28, 2017 · 31 comments
Open

Strict scala deps and unused_deps #235

ittaiz opened this issue Jun 28, 2017 · 31 comments
Labels
dep-tracking Strict/unused deps, label collections related issues

Comments

@ittaiz
Copy link
Member

ittaiz commented Jun 28, 2017

So,
I met with @cgrushko a few days ago and apparently the way we (rules_scala) currently do strict deps (allowing a target to only use what it directly depends on) is very different from the way java rules do it and IMHO not as good.

Problem:
We currently pass to scalac only the direct jars and let scalac emit an error to the user.
These errors are quite often cryptic and unuseful for developers to be able to easily fix the problem.
Moreover because of how the scala compiler works there are many cases where it needs transitive dependencies even if they aren't mentioned in the source code of the unit we're compiling[1]. This leads to having to add many transitive targets without any reasoning just to appease the compiler or worse to start using exports loosely around the codebase.

We suffered from the above transitive dependencies so much that we're just adding the whole transitive closure to the deps of a target. This is in the onboarding of a project so we're ok with it but this solution can help greatly minimize it.

Proposed solution:
Pass direct and indirect jars to scalac's classpath (like java compilation does).
Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

We will also support a lenient mode where this error is outputted as a warning.
add_dep is a command which buildozer can take and apply it to your BUILD file.
I think (not sure) that ibazel can pick up these warnings and automatically apply them to your BUILD file. Probably rebazel will be able to do this as well if they'll want to.

Value:
Separation of concerns (scalac won't enforce graph concerns)
Allows higher level messages and so call to actions
Might allow finer grain modeling of when indirect dependencies are ok (for an example where scalac "messes" up but we might be able to improve see this repo)
Better cycle for developers since they can run with lenient mode and iterate quickly and fix the deps before pushing (or use ibazel/rebazel). The caveat is obviously the fact people can have a local green build which fails on CI.

Open questions:
It's unclear how smart the compiler plugin can be given scalac's limitations and so unclear on how much noise will still be added to a target's deps.

Small implementation detail:
To do this we'll need to pass to the compiler plugin a list of direct dependencies and a list of indirect dependencies where at least the latter needs to have both jars and the targets so that the error/warning message can be useful (add //src/main/foo:baz to your deps)

[1]Improve user experience for builds that use only direct dependencies

@ittaiz
Copy link
Member Author

ittaiz commented Jun 28, 2017

cc some people who are/were involved with thinking about this at Wix (@ors @natans @nadavwe @shvar @dkomanov)

@johnynek
Copy link
Member

this seems really nice.

note related post on bazel blog for java
https://blog.bazel.build/2017/06/28/sjd-unused_deps.html

The strategy here looks good to me. Some comments:

  1. can we get the reliability so this is the default, or will we need to be opt in for a while (I imagine the latter).
  2. should we have a version that just warns on unused dependencies first (kind of the reverse problem as this).
  3. I'm surprised the strict dependencies became a show-stopper for you. Our dependencies are probably over-broad, but it hasn't been debilitating.

@ittaiz
Copy link
Member Author

ittaiz commented Jun 29, 2017 via email

@ittaiz
Copy link
Member Author

ittaiz commented Jul 14, 2017

@gkossakowski Since Oscar pulled you to the PR I thought I'd pull you to the issue and ask you if you have an intiuition or an idea on how "smart" will the plugin be able to be? What we really want to enable is to warn/error the user when their code uses a transitive dependency. This is to allow for upstream targets freedom in changing implementation decisions. I don't know how feasible this distinction (I use vs compiler needs for transitive reasons) is in the compiler/compiler-plugins.

Thanks!

ittaiz pushed a commit that referenced this issue Jul 25, 2017
)

First step (experimental) in the introduction of strict scala deps feature (see issue #235).

Currently this feature is turned off (due to noticeable impact on performance)

switching ‘dependency_analyzer_mode_soon_to_be_removed’ to “on” means that ANY API change in a target’s transitive dependencies will trigger a recompilation of that target, on the other hand any internal change (i.e. on code that ijar omits) WON’T trigger recompilation by transitive dependencies

A few implementation  details:
A new scalac plugin informs the user on every target that has missing transitive dependencies that should be added as direct dependencies. For convenience a buildozer command is provided to make the change in the BUILD file easily.

before the compilation, transitive compile time jars are collected along with a mapping to the originating target label.
@ittaiz
Copy link
Member Author

ittaiz commented Oct 3, 2017

@gkossakowski Do you have an intuition or an idea on how we can advance the "smart" plugin mode? What we really want to enable is to warn/error the user when their code uses a transitive dependency as opposed to warning the user for cases where the compiler needs this symbol lookup just for the its own needs. This is to allow for upstream targets freedom in changing implementation decisions.

Thanks!

@gkossakowski
Copy link

Yes, I mulled over this problem in the context of zinc's incremental compilation decision. In zinc, you want to approximate the dependencies scalac would ever need.

The short answer is: the dependencies introduced by reference do not require a transitive closure but the one introduced by inheritance does require a transitive closure. When a program mentions a symbol, e.g. val x: Foo then it introduces a reference dependency. When program mentions a dependency in the context of declaring a parent type for a class, e.g class X extends Y[Foo], then it introduces a dependency by inheritance.

You can capture both kinds of dependencies by traversing the trees and collecting the information in the right context. Zinc implements this: https://github.com/sbt/zinc/blob/1.x/internal/compiler-bridge/src/main/scala/xsbt/Dependency.scala#L326

@ittaiz
Copy link
Member Author

ittaiz commented Oct 3, 2017

Are you sure?
I have an example which has a class A with a member B and B has a member C (which is a java interface) and you can't compile A without C. Tested with 2.11.7 and 2.12.1 at the time

@ittaiz
Copy link
Member Author

ittaiz commented Oct 3, 2017

Actually I think that my example above shows that there are more areas the plugin can improve in but what you wrote IIUC is that at least in that aspect (of inheritance) the plugin can be smarter and not warn/error. Did I understand you correctly?

@gkossakowski
Copy link

gkossakowski commented Oct 4, 2017

My comment is what should happen given Scala's typesystem design to shed light what directionally to expect. From there you can derive whether the behavior you're observing is a bug or a design choice you have to work with.

I just checked that what I said about dependencies by reference not forcing transitive dependencies is true for pure Scala:

[gkk@mbp ~/tmp/scala-tmp]$ echo "class A" > A.scala
[gkk@mbp ~/tmp/scala-tmp]$ echo "class B {
>     def a: A = new A
> }
> " > B.scala
[gkk@mbp ~/tmp/scala-tmp]$ echo "class C {
>     def b: B = new B
> }
> " > C.scala
[gkk@mbp ~/tmp/scala-tmp]$ mkdir outA outB outC
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outA A.scala
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outB -cp outA B.scala
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outC C.scala # skip -cp to show that classpaths are isolated
C.scala:2: error: not found: type B
    def b: B = new B
           ^
C.scala:2: error: not found: type B
    def b: B = new B
                   ^
two errors found
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outC -cp outB C.scala # the A class is not on the classpath

I'm not sure what's different about your example but I think it should work. Perhaps you can check whether you can reproduce the problem in pure Scala setting? I suspect that loading Java-originating symbols from a class file is overeager. It's a different code path than loading Scala symbols in the compiler.

@promiseofcake
Copy link
Contributor

Would the use-case where a scala_test target is required to redeclare the external dependencies of it's scala_library fall into this issue -- or would that be something else?

I have built a contrived example here: promiseofcake/bazel-scala-test#2 that runs with Travis to show the current issue.

I think it belongs here because for our java targets we have not had to perform similar adjustments.

@ittaiz
Copy link
Member Author

ittaiz commented Oct 18, 2017

Yes. This is under the purview of this issue.

@ittaiz ittaiz mentioned this issue Nov 6, 2017
@promiseofcake
Copy link
Contributor

Regarding the newly documented strict-deps,

Note: Currently strict-deps is protected by a feature toggle but we're strongly considering making it the default behavior as java_* rules do.

If it were to become the default behavior, would we want to follow the proposed solution before making it mandated?

@ittaiz
Copy link
Member Author

ittaiz commented Nov 9, 2017 via email

@promiseofcake
Copy link
Contributor

No worries @ittaiz, the proposed solution I was referring to was:

Pass direct and indirect jars to scalac's classpath (like java compilation does).
Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

With the gist of it being that dependency resolution could perhaps mimic the way it works for java targets? Here is another example: promiseofcake/pubref-java-proto-scala#1 which shows that exports from java_ targets aren't pulled in for scala_ rules, and I am fairly certain those targets are exported because the --strict_java_deps=ERROR does not error out for the java_ targets.

@ittaiz
Copy link
Member Author

ittaiz commented Nov 9, 2017 via email

@kamalmarhubi
Copy link

Pass direct and indirect jars to scalac's classpath (like java compilation does).

Is this implemented? I'm running at current tip, and am getting errors that seem to indicate that it isn't: I get a term not found error that I can resolve by manually adding transitive dependencies to the target.

@ittaiz
Copy link
Member Author

ittaiz commented Dec 10, 2017

This is implemented. Are you running the build with --strict_java_deps flag? You can use error or warn as the relevant flags. This isn't the default behavior yet since we're still ironing out some kinks (like warning on scala-library).
If you encounter issues I'd really appreciate opening a new issue and linking it to this one so we can have more scoped discussions.

@kamalmarhubi
Copy link

Ah, I didn't realize I had to pass that flag. That does indeed fix it, and allows me to remove the extra dependencies (or equivalently the exports on the direct dependencies). Thanks!

(There's some weirdness with the error messages mentioning unknown labels, but I'm pretty sure I saw that in another issue.)

@ittaiz
Copy link
Member Author

ittaiz commented Dec 10, 2017 via email

@kamalmarhubi
Copy link

kamalmarhubi commented Dec 10, 2017

@ittaiz

Are you using scala_import for your dependencies?

I'm experimenting with java_import_external from bazel_tools, so no. However it's possible to decipher the message in the meantime.

Thinking on how to solve this: it seems like it should be possible to apply an aspect to the deps of scala_* to collect this info. Is that the approach you were considering?

@ittaiz
Copy link
Member Author

ittaiz commented Dec 10, 2017

No actually.
The current direction (which I Haven't gotten around to but you're welcome to) is to generalize java_import_external to a jvm_import_external which will accept either scala_import (and we will expose here as scala_import_external) and also java_import_external as today.
See this discussion for agreement with @dslomov on this direction

@jjudd
Copy link
Contributor

jjudd commented Mar 7, 2018

As far as I know, strict mode for rules_scala does not yet catch extra dependencies. @ittaiz discussed writing a compiler plugin to deal with this part.

I ran across this compiler plugin from the Scala Center https://github.com/scalacenter/classpath-shrinker which seems to do what we are interested in doing.

I played around with using the plugin to clean up some BUILD files that were auto generated using sbt -> pom -> Bazel: https://docs.bazel.build/versions/master/migrate-maven.html

I ran into an issue where all of the transitive dependencies are on the classpath at compile time, so if you have scala targets that depend on other scala targets, then the compiler plugin prints a bunch of warnings.

I'm curious if this plugin has been explored before or if any work has been done on detecting extra dependencies.

@jjudd
Copy link
Contributor

jjudd commented Mar 7, 2018

Well, I feel silly. I didn't read closely enough. Looks like that plugin is already in rules_scala :D https://github.com/bazelbuild/rules_scala/blob/master/third_party/plugin/src/main/io/bazel/rulesscala/dependencyanalyzer/DependencyAnalyzer.scala

I'm still curious if the original plugin detected extra jars if rules_scala can do that. I'll dive in a bit more. Sorry to waste anyone's time.

@jjudd
Copy link
Contributor

jjudd commented Mar 7, 2018

Okay, I quickly hacked together a really rough implementation of unused_deps in #438 that is bundled with strict_java_deps. I would like to get people's feedback before going any further.

@Jamie5
Copy link
Contributor

Jamie5 commented Oct 3, 2018

Will there be a way in the future to use both unused_deps and strict_deps at once? Currently readme suggests not

@johnynek
Copy link
Member

johnynek commented Oct 3, 2018

It would be nice... that said, they work across purposes a bit.

At Stripe, we minimize the compiler classpath to reduce cache invalidation. So, we don't use strict_deps (which actually puts all your transitive deps on the classpath, but then errors if you use them). Instead, we just don't have them, and we get errors (although harder to diagnose) if you use them.

Really, I start to wonder if compiler plugins are the right path here. As long as scalac is not trying to do separate compilation, the experience will always be a bit second rate. I'd like to see separate compilation with minimal classpaths be a supported feature of scalac, but right now it is kind of tolerated, but not really strongly supported.

@jjudd
Copy link
Contributor

jjudd commented Oct 3, 2018

We've been using both together at Lucid for a few months with Annex and it has worked quite well. Zinc's analysis is used to determine unneeded and unused deps and outputs nothing, warnings, or errors with buildozer commands depending on the configuration. Combined with ibazel's run_output and run_output_interactive, you can get an interactive N/y prompt to execute a buildozer command to remove or add a dependency when it violates strict/unused deps. This has made it pretty easy to maintain minimal classpaths.

@ittaiz
Copy link
Member Author

ittaiz commented Oct 3, 2018 via email

@jjudd
Copy link
Contributor

jjudd commented Oct 3, 2018

There would need to be one index per scala_ target, right? The index would contain the mapping of class to label for the transitive closure of the current target? If it is global, it would be constantly churning and class name collisions could cause problems.

@ittaiz
Copy link
Member Author

ittaiz commented Oct 3, 2018 via email

@Jamie5
Copy link
Contributor

Jamie5 commented Oct 24, 2018

So while unused_deps and strict_deps do work a bit at cross purposes it still seems useful to have both. Because without strict_deps, you have to have dependencies in your classpath that don't really make sense (eg types you never use in any way yet a base package class uses). so you may well decide that the tradeoff in increased compile time in some cases is worthwhile. And unused_deps of course, because you don't want to have unnecessary dependencies regardless of what dependency model you are using.

@liucijus liucijus added the dep-tracking Strict/unused deps, label collections related issues label Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dep-tracking Strict/unused deps, label collections related issues
Projects
None yet
Development

No branches or pull requests

8 participants