Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FileSignature machine-independent. #566

Closed
nedtwigg opened this issue May 4, 2020 · 9 comments
Closed

Make FileSignature machine-independent. #566

nedtwigg opened this issue May 4, 2020 · 9 comments

Comments

@nedtwigg
Copy link
Member

nedtwigg commented May 4, 2020

This is a prerequisite to making the gradle build cache useful across machines (#280). Here is why:

  • spotless formats by applying a series of steps - every step is capable of serializing its entire state, including config files, for the purpose of up-to-date checks

  • @Input
    public List<FormatterStep> getSteps() {
    return Collections.unmodifiableList(steps);
    }

  • /**
    * Implements a FormatterStep in a strict way which guarantees correct and lazy implementation
    * of up-to-date checks. This maximizes performance for cases where the FormatterStep is not
    * actually needed (e.g. don't load eclipse setting file unless this step is actually running)
    * while also ensuring that gradle can detect changes in a step's settings to determine that
    * it needs to rerun a format.
    */
    abstract class Strict<State extends Serializable> extends LazyForwardingEquality<State> implements FormatterStep {
    private static final long serialVersionUID = 1L;
    /**
    * Implements the formatting function strictly in terms
    * of the input data and the result of {@link #calculateState()}.
    */
    protected abstract String format(State state, String rawUnix, File file) throws Exception;

  • /**
    * This function is guaranteed to be called at most once.
    * If the state is never required, then it will never be called at all.
    *
    * Throws exception because it's likely that there will be some IO going on.
    */
    protected abstract T calculateState() throws Exception;
    /** Returns the underlying state, possibly triggering a call to {{@link #calculateState()}. */
    protected final T state() {
    // double-checked locking for lazy evaluation of calculateState
    if (state == null) {
    synchronized (this) {
    if (state == null) {
    try {
    state = calculateState();
    } catch (Exception e) {
    throw ThrowingEx.asRuntime(e);
    }
    }
    }
    }
    return state; // will always be nonnull at this point
    }
    // override serialize output
    private void writeObject(ObjectOutputStream out) throws IOException {
    out.writeObject(state());
    }
    // override serialize input
    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    state = (T) Objects.requireNonNull(in.readObject());
    }

  • so far so good, this means that most steps will work with the gradle build cache. The problem is for some formatters that have config files.

    • Some formatters capture the state of these config files by loading their content into a String, these will relocate okay
    • e.g. LicenseHeaderStep
      /** Reads the license file from the given file. */
      private LicenseHeaderStep(File licenseFile, Charset encoding, String delimiter, String yearSeparator) throws IOException {
      this(new String(Files.readAllBytes(licenseFile.toPath()), encoding), delimiter, yearSeparator);
      }
    • But most of them use FileSignature, which will not relocate.
    • e.g. ScalaFmtStep
      static final class State implements Serializable {
      private static final long serialVersionUID = 1L;
      final JarState jarState;
      final FileSignature configSignature;
      State(String version, Provisioner provisioner, @Nullable File configFile) throws IOException {
      String mavenCoordinate;
      Matcher versionMatcher = VERSION_PRE_2_0.matcher(version);
      if (versionMatcher.matches()) {
      mavenCoordinate = MAVEN_COORDINATE_PRE_2_0;
      } else {
      mavenCoordinate = MAVEN_COORDINATE;
      }
      this.jarState = JarState.from(mavenCoordinate + version, provisioner);
      this.configSignature = FileSignature.signAsList(configFile == null ? Collections.emptySet() : Collections.singleton(configFile));
      }
  • FileSignature uses filesystem absolute paths and lastModified timestampts

  • Worst of all, JarState is used by almost every single FormatterStep, and it has a FileSignature inside it

@nedtwigg
Copy link
Member Author

nedtwigg commented May 4, 2020

One possible fix, change this:

private FileSignature(final List<File> files) throws IOException {
this.files = files;
filenames = new String[this.files.size()];
filesizes = new long[this.files.size()];
lastModified = new long[this.files.size()];
int i = 0;
for (File file : this.files) {
filenames[i] = file.getCanonicalPath();
filesizes[i] = file.length();
lastModified[i] = file.lastModified();

into

filenames[i] = file.getName()
filesizes[i] = file.length();
hash[i] = someFastHash(file)

This performs a good bit worse, because it requires reading every single config file, and we have to do it for every single step. It could be improved by only doing the hashing on serialization, and continuing to rely on filesystem timestamps for non-serialized in-memory FileSignature.

A faster, but perhaps harder-to-implement, alternative would be to pass around a CacheApi object, something like this:

private FileSignature(CacheApi api, List<File> files). For platforms that don't support caching, like maven, we just pass in a no-op. And for platforms that do, like gradle, we pass in an adapter on TaskInputs. The wiring for this latter case is quite a bit harder.

Could start off with the easy low-performance one, and then do the harder one later...

@nedtwigg
Copy link
Member Author

@lutovich implemented the content-hashing approach in #571. I implemented a content-hash-when-appropriate mode in #573. I have a lot of confidence that 571 will work. I think 573 can probably work, and is faster, but it's definitely more treacherous. It's possible that the speedup of 573 is immeasurably small, in which case 571 is the slam-dunk answer. Even if 573 is measurably faster, it needs to be a lot faster to justify the testing effort and complexity. I'm gonna stew on this for a while. Just wanted to get both ideas out there in case it sparks any ideas from other stakeholders.

@bigdaz
Copy link
Contributor

bigdaz commented Jun 22, 2020

@nedtwigg I'm wondering if it's worth reworking how and when FormatterStep instances are created and configured. You've done some work to make this lazy, but as far as I can tell most of the work is still done in the extension and the instantiated steps are then provided directly to the SpotlessTaskModern.

This runs somewhat counter to idiomatic Gradle, where we generally expect the inputs to a task to be value objects that can be easily tracked for changes, with the task itself responsible for translating these configuration values into executable logic. For example, instead of the task input being a complete LicenseHeaderStep, you'd have each of the input properties (delimiter, yearSeparator, licenseHeaderFile) registered as task inputs. This way, you would automatically benefit from the file snapshotting that Gradle does, and you could mark the input file with PathSensitivity.NONE so that we ignore all but the contents of the file.

I understand this would be a big change, and I'm sure I'm glossing over some major difficulties in the migration. But I believe it's worth considering, and would have the following benefits:

  • FormatterStep instances would not need to be constructed or configured in order to determine if the SpotlessTask is up to date. So more work could be deferred until task execution.
  • Keeping track of input file changes (and creating appropriate cache keys) is challenging, and with the current approach we're required to re-implement a lot of functionality that is core to Gradle.
  • When the Spotless task is required to re-execute due to configuration changes, Gradle will be able to report exactly what input property has changed.

@nedtwigg
Copy link
Member Author

I definitely agree that it is no longer necessary, in the gradle plugin, for FormatterStep to be lazy. I also think it would be great if Spotless' inputs were legible to gradle. Here are the challenges I see:

The fundamental value proposition of Spotless is that you can compose formatters, rather than use just one specific formatter. If someone makes a good license header gizmo, you can use it with every formatter, not just foo. Correct me if I'm wrong, but:

public class SpotlessTask { 
  @Nested
  public List<FormatterStep> steps;
...
}

public class FooStep implements FormatterStep {
  @Input
  String someProp
  @InputFile
  @PathSensitive(PathSensitivity.RELATIVE)
  File configFile
  ...
}
  • the List steps will not be decomposed into its elements in Gradle, correct? It's a minor point, because of course it could be, and that would be a nice feature to have someday, but for now it isn't, yeah?

If Gradle provides the ability to inspect a generic @Nested List<>, it could also decompose the serialized fields in the elements. And if Gradle could do that, why spend the time ripping out Spotless' guts, and making it less flexible in non-gradle contexts (e.g. the maven plugin is still eager, and benefits from the internal laziness).

The sticking point, as I see it.

If we merge #571, I think we would be most of the way there already. It would be ever-so-slightly less performant than if we made our inputs available to Gradle in the canonical way, but there's still a possibility to do that someday by making FileSignature into a service, which in Gradle could be injected with a Gradle-specific implementation. So I think we already have a good and easy path to make the build cache machine independent, except...:

The big sticking point, is GitAttributesLineEndings. The problem is, people have weird git setups that they don't understand. And that means any autoformatter is going to end up fighting git and seem "broken", unless your autoformatter understands git, and silently does whatever whackadoodle thing git is configured to do. We put a lot of work into making that as efficient as possible, by lazily finding the .gitattributes files underneath their target specifications. People had lots of complaints about Spotless screwing up their line endings (because they didn't understand what git was doing to their line endings) until we added that feature.

It's tempting to replace GIT_ATTRIBUTES with just PRESERVE_CURRENT, but one of the most useful things we do for maintainers is get rid of PRs that accidentally change all the line endings, and PRESERVE_CURRENT can't help with that. We could deprecate GIT_ATTRIBUTES, and encourage users to switch to either UNIX or WINDOWS, but that screws up the "ability" of users to set their own personal line-endings preferences, and git will helpfully mangle their local WC to suit, and then Spotless and git will fight.

A way forward

If we fix GitAttributesLineEndings, we can merge #571 and be done. GitAttributesLineEndings doesn't have as many constraints as a FormatterStep does, there's a lot of flexibility. I have an idea, lemme sketch it out and push it up. If the idea works, then FileSignature will only be used for random config files and JarState, where #571 works great.

@nedtwigg
Copy link
Member Author

Okay, #621 is up, and I feel pretty good about it. Between that and the existing #571, I think we have a complete fix. Still curious about your thoughts if there are ways we can be more gradle-friendly, @bigdaz

@bigdaz
Copy link
Contributor

bigdaz commented Jun 23, 2020

I'm happy that we're close to having cacheability working for Spotless, which was my primary goal setting out. So I don't see any great pressure to refactor things.

However, the @Nested annotation has supported Iterable properties since Gradle 4.5 (https://docs.gradle.org/4.5/release-notes.html). I wanted to confirm this for myself so I put together a little test project: https://github.com/bigdaz/nested-inputs.

In this example the task input is a list of beans; each implementation has it's own input file property and one of the implementations delegates to another @Nested instance.

The @Nested annotations work to ensure that all of the 4 input files are tracked for changes.

@nedtwigg
Copy link
Member Author

Awesome, thanks for the info!

@nedtwigg
Copy link
Member Author

Implemented through a combination of #571 and #621.

@nedtwigg nedtwigg unpinned this issue Jun 28, 2020
@nedtwigg
Copy link
Member Author

nedtwigg commented Jul 2, 2020

Released in plugin-maven 2.0.0 and plugin-gradle 4.5.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants