Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/parallelize file parsing #28

Merged
merged 3 commits into from
Mar 19, 2018

Conversation

AndyCoveo
Copy link
Contributor

Parallelize the file formatting, shows some interesting performance improvement on my setup:

Benchmark Mode Cnt Score Error Units
BenchmarkRunner.newFormatSourceFilesInDirectory thrpt 20 0,626 ± 0,093 ops/s
BenchmarkRunner.oldFormatSourceFilesInDirectory thrpt 20 0,440 ± 0,084 ops/s
BenchmarkRunner.newFormatSourceFilesInDirectory avgt 20 1,489 ± 0,206 s/op
BenchmarkRunner.oldFormatSourceFilesInDirectory avgt 20 3,321 ± 0,545 s/op

These numbers were optained using jmh, on a medium sized projet that I own.

@@ -87,8 +90,10 @@ public void execute() throws MojoExecutionException, MojoFailureException {
}
}

Formatter formatter = getFormatter();

for (File directoryToFormat : directoriesToFormat) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not flatten the files from multiple root directories in order to process all of them in parallel?

Copy link
Contributor

@GuiSim GuiSim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments, no showstoppers 👋

<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the project purposefully tried to use an earlier version of Java. I seem to remember avoiding lambdas.

Is this an okay change? Might require a major bump. Not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The File.walk is only available since java 8. If bumping is out of option then we might want to close/rework this PR.

Is bumping to a major version an ok option here @malaporte ?

formatter = new Formatter(JavaFormatterOptions.builder().style(style()).build());
}
return formatter;
private Formatter getFormatter() throws MojoFailureException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a change of behavior here, not sure how relevant it is:

It used to be that the formatter wouldn't be initialized if there was no file to format.
If we don't care, we should move the init of the formatter in the constructor.

Copy link
Contributor

@GuiSim GuiSim Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something tells me that it was lazy initialized for a reason 🤔
I'd look into the history of that line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, it was done lazily to allow the formatter to take into account the JavaFormatterOption.Style that is now taken as a parameter (can be "oasp" or "google").

By providing the formatter as a parameter, I keep the same functionnality (initialize the Formatter only once the style is set) while removing the need for a synchronized getter.

try (Stream<Path> paths = Files.walk(Paths.get(directory.getPath()))) {
paths
.collect(Collectors.toList())
.parallelStream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this approach you'll get a better performance gain if you have a single folder with many files vs multiple folders with few files.

I would suggest flatmap, as proposed by @malaporte

Also, I would recommend using an approach that allows specifying a thread pool / executor. Is the default one use by parallelStream sufficient here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the vanilla parallelStream might not be optimal, but it does provide some improvement. I'll leave it as future work to tweek the necessary parameters to get the best performances out of it.

@malaporte malaporte merged commit 97044ef into spotify:master Mar 19, 2018
@malaporte
Copy link
Contributor

I just released 2.4.0 with this change included. Enjoy!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants