Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Up-to-date index for Maven plugin #935

Merged
merged 22 commits into from
Dec 23, 2021

Conversation

lutovich
Copy link
Contributor

POC of a component that allows the Maven plugin to skip already formatted and unchanged files. The code is not production-ready and probably does not work. It is only to illustrate the approach and gather feedback.

A new UpToDateChecker maintains an index file in the build directory (<project>/target). The index file has the following structure:

<source file 1> <last modified time 1>
<source file 2> <last modified time 2>
...
<source file n> <last modified time n>

UpToDateChecker loads or creates an index file at creation time into an in-memory map.

Method #isUpToDate(File) checks the file's current last modified time against the last modified time from the in-memory map.

Method #setUpToDate(File) updates or adds the file and its last modified time to the in-memory map.

Method #close() writes the complete in-memory map into the index file.

Spotless apply goal uses UpToDateChecker to determine if a file needs to be formatted. It also notifies the checker if the file has was formatted.

@lutovich
Copy link
Contributor Author

Hi @nedtwigg, this PR demonstrates an approach that could make the Maven plugin faster. It makes the Maven plugin maintain an index file->lastModifiedTime and use it to determine if a file needs formatting. Could you please take a look and let me know if this approach makes sense. The code is not good, it is just to demonstrate the idea.

@lutovich lutovich requested a review from nedtwigg September 16, 2021 22:37
@nedtwigg
Copy link
Member

nedtwigg commented Sep 16, 2021

I think that the UpToDateChecker interface is great. I would implement with:

  • normalize paths relative to the project dir, and store in a Trie map
  • my understanding is that maven doesn't have a daemon of any kind, which means you need to be able to serialize that map to/from disk

You could do just a Map<File, Long> rather than TrieMap<String, Long>, but I suspect that will become a problem for any build big enough to care about the incremental build.

@tisoft
Copy link
Contributor

tisoft commented Sep 30, 2021

I really want this. 😆

But I think the index also needs to be invalidated, if the configuration of the spotless plugin (or the plugin version) changes. So the index would need a hash or something of the current configuration settings. I don't know if they can be read out as a whole.

@lutovich
Copy link
Contributor Author

@tisoft good point, I agree! I will investigate how to include configuration settings into the index.

@nedtwigg
Copy link
Member

For this, you can copy the properties marked @Input in SpotlessTask. They are all serializable and they support .equals(). If they are all the same, then the formatter configuration has not changed.

@lutovich lutovich force-pushed the maven-up-to-date-index branch from ff06fe8 to 792a9ef Compare October 19, 2021 21:05
@lutovich
Copy link
Contributor Author

Force-pushed a few refinements to this PR. The code structure has improved, but tests are still largely missing. Introduced a PluginFingerprint to handle plugin version/configuration changes. It uses a serialized Plugin value object from the Maven project that captures the complete XML definition of the plugin.

This PR is still a draft. I'll keep working on unit and integration tests.

Copy link
Member

@nedtwigg nedtwigg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress!

@tisoft
Copy link
Contributor

tisoft commented Oct 20, 2021

And another feature request from me:

Can you make the index file location/name configurable? I would like to have it outside of the target directory, so a mvn clean doesn't remove it.

@lutovich
Copy link
Contributor Author

@tisoft sounds reasonable. The first version will probably not support this but I will adjust the configuration to be more extensible, something like:

<configuration>
  <upToDateCheck>
    <enabled>true</enabled>
    <indexLocation>/tmp</indexLocation> <!--  easier to add this when config is an object  -->
  </upToDateCheck>
</configuration>

@nedtwigg
Copy link
Member

This looks mergeable to me. I resolved the convos that have been solved. The remaining open convos are:

  • GitRatchet (I think we can release and integration test by hand)
  • style nits which I'm happy to drop if you prefer the current style

Anything else you think needs to happen before we merge and release this?

@lutovich
Copy link
Contributor Author

Hi @nedtwigg, I would like to address the following few items before merging:

  • Try your suggestion about making FileIndex use File instead of Path and remove Optionals
  • Add a few more unit tests for FileIndex
  • Add a few integration tests for spotless:apply and spotless:check
  • Fix the remaining TODOs

This PR is already pretty big, sorry about this. I couldn't think of a good way to split it into meaningful parts. I will address the remaining items in separate commits to simplify the review.

@nedtwigg
Copy link
Member

Sounds great! PR size not a problem :)

@lutovich
Copy link
Contributor Author

Hi @nedtwigg,

I refactored the code to avoid Optional and added more unit and integration tests. All new classes now consistently use Path instead of both Path and File. This makes the implementation cleaner because Path exposes a friendlier API.

I think this PR is now ready for review. I can squash all commits and rebase before merging.

@lutovich lutovich marked this pull request as ready for review December 14, 2021 16:41
@lutovich lutovich requested a review from nedtwigg December 14, 2021 16:41
Copy link
Member

@nedtwigg nedtwigg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I'd prefer no squash, the history is helpful. I think changelog, docs, and we can merge and release!

@lutovich lutovich force-pushed the maven-up-to-date-index branch 2 times, most recently from e5efef3 to 590553d Compare December 16, 2021 10:08
POC of a component that allows the Maven plugin to skip already formatted and
unchanged files. The code is not production-ready and probably does not work.
It is only to illustrate the approach and gather feedback.

A new `UpToDateChecker` maintains an index file in the build directory
(`<project>/target`). The index file has the following structure:

```
<source file 1> <last modified time 1>
<source file 2> <last modified time 2>
...
<source file n> <last modified time n>
```

`UpToDateChecker` loads or creates an index file at creation time into an
in-memory map.

Method `#isUpToDate(File)` checks the file's current last modified time
against the last modified time from the in-memory map.

Method `#setUpToDate(File)` updates or adds the file and its last modified
time to the in-memory map.

Method `#close()` writes the complete in-memory map into the index file.

Spotless apply goal uses `UpToDateChecker` to determine if a file needs to be
formatted. It also notifies the checker if the file has was formatted.
Instead of creating/opening an index for every source file.
Include the list of serialized formatters in PluginFingerprint. It makes the
file index sensitive to changes in configuration files referenced by the POM
(e.g. `eclipseformat.xml` and `.gitattributes`).

To achieve this, Maven plugin has to instantiate all formatters upfront.
It builds a map of formatters and files that need formatting. The formatters
are then used to calculate a PluginFingerprint.
It is useful for local development only.
Add a dedicated POM XML element for up-to-date checking:

```
<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>${spotless.version}</version>
  <configuration>
    <upToDateChecking>
      <enabled>true</enabled>
    </upToDateChecking>
    <java>
      <googleJavaFormat/>
    </java>
  </configuration>
<plugin>
```

It makes the configuration more flexible and allows adding more options
related to up-to-date checking in the future.
Introduce `FileIndexConfig` that abstracts away some directory manipulations
and file naming.

Fix two issues:
 * ensure the parent dir for the index file exists before writing the file itself
 * relativize files against the project dir, not against the target dir
Instead, use `@Nullable` to simplify the API. This requires adding a new
dependency to the POM when building the Maven plugin.
Instead of using a mix of File and Path. Path has more suitable API for:
 * Relativizing file paths
 * Getting last modified time. `File#lastModified()` may return zero. It is
   also not very clear about overflows
This conversion is needed because FileIndex stores human-readable timestamps in
the index file. There's a way to convert FileTime to String but no way to
convert String to FileTime when reading an index from a file. Thus, we convert
to Instant that has well-defined `#toString()` and `#parse()` methods.

Conversion from FileTime to Instant can be lossy because FileTime can represent
timestamps further in the past/future than Instant. Such timestamps are saturated
to Instant.MIN and Instant.MAX. It is not safe to record such timestamps in the
index file because they are lossy.

IndexBasedChecker will not record overflown Instants in the index. It will also
log a warning for every file with a small/large last modified time that overflows
an Instant.
The added tests assert on debug logging output of the apply and check goals.
Path#getParent() can return null. This should not happen for the maven plugin
up-to-date index. Thus, the fix is to check for null and throw an
IllegalStateException.
@lutovich lutovich force-pushed the maven-up-to-date-index branch from 590553d to 25f70e0 Compare December 22, 2021 17:56
@lutovich
Copy link
Contributor Author

lutovich commented Dec 22, 2021

Added documentation and a changelog entry. Hope both make sense :)

@lutovich lutovich requested a review from nedtwigg December 22, 2021 17:58
@lutovich lutovich force-pushed the maven-up-to-date-index branch from 25f70e0 to fc1f9cc Compare December 22, 2021 18:08
@lutovich lutovich force-pushed the maven-up-to-date-index branch from fc1f9cc to a50a58f Compare December 22, 2021 18:10
Copy link
Member

@nedtwigg nedtwigg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, huzzah!

@nedtwigg nedtwigg merged commit bebaa59 into diffplug:main Dec 23, 2021
@nedtwigg
Copy link
Member

Published in plugin-maven 2.18.0.

@lutovich lutovich deleted the maven-up-to-date-index branch December 25, 2021 17:42
@lutovich
Copy link
Contributor Author

Thanks for releasing this change!
I did a quick & dirty test on neo4j codebase with the following simple config:

<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>2.18.0</version>
  <configuration>
    <upToDateChecking>
      <enabled>true</enabled>
    </upToDateChecking>
    <java>
      <googleJavaFormat/>
    </java>
  </configuration>
</plugin>

On my machine with 8 cores:

Command Up-to-date checking OFF Up-to-date checking ON
Sequential mvn spotless:apply ~85 sec ~4 sec
Parallel mvn spotless:apply -T 1C ~50 sec ~3 sec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants