
Stamp locking slows down large builds #1129

Closed
viveksjain opened this issue Sep 18, 2022 · 5 comments

Comments

@viveksjain

steps

  1. Pick a large scala repo, like https://github.com/apache/spark
  2. Do a full compilation
  3. Make a one-line change in some source file
  4. See how long a dependent test takes to recompile

problem

I was investigating incremental Scala compilation performance, which takes on the order of 30 seconds for a one-line change on a large internal repo, and found that quite a large fraction of that time is spent blocked on Stamp locks: about 16s on the global Stamper lock and 8s on the InitialStamps lock around

synchronized { sources.getOrElseUpdate(src, underlying.source(src)) }

Note that the lock time numbers do not correspond to wall-clock time, because zinc has internal parallelism and calls in from multiple threads.
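
For reference, the locking pattern described here looks roughly like the following. This is a minimal sketch rather than zinc's actual source; `StampLike` and `ReadStampsLike` are stand-ins for zinc's stamp and ReadStamps types.

```scala
import java.nio.file.Path
import scala.collection.mutable

// Hypothetical stand-ins for zinc's stamp and ReadStamps types.
trait StampLike
trait ReadStampsLike { def source(src: Path): StampLike }

// A stamp cache that guards a plain mutable Map with the object's own monitor,
// so every lookup from every compile thread funnels through one lock. When the
// underlying stamp computation (e.g. content hashing) is slow, threads queue up
// here, which matches the blocked time seen in the profile.
final class CachedStamps(underlying: ReadStampsLike) extends ReadStampsLike {
  private val sources = mutable.Map.empty[Path, StampLike]

  override def source(src: Path): StampLike =
    synchronized { sources.getOrElseUpdate(src, underlying.source(src)) }
}
```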

notes

We can improve the former by not using timeWrapBinaryStamps when calling zinc, but the latter seems to be used internally in zinc to wrap the ReadStamps that is passed in and cannot be removed. I am not very familiar with the code, but initial ideas:

  1. The InitialStamps lock seems to just be for getAll*Stamps, but I couldn't actually see where these methods are used. Is supporting them really necessary?
  2. The global Stamper lock seems to exist only to cache hash results. The shared data structure is the cache itself, so the lock should be on that rather than on the whole object. Using something like ConcurrentHashMap.computeIfAbsent would likely be even faster (see the sketch after this list).
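
A rough sketch of idea 2, assuming a cache keyed by file path; the class name, the `hashFile` parameter, and the `StampLike` trait are hypothetical stand-ins rather than zinc's real API.

```scala
import java.nio.file.Path
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for zinc's stamp type.
trait StampLike

// hashFile is whatever routine actually computes the content hash.
final class ConcurrentStamper(hashFile: Path => StampLike) {
  private val cache = new ConcurrentHashMap[Path, StampLike]()

  // computeIfAbsent only locks the bin holding this key, so threads stamping
  // different files proceed in parallel, while the same file is still hashed
  // at most once.
  def stamp(src: Path): StampLike =
    cache.computeIfAbsent(src, p => hashFile(p))
}
```

Contention is then limited to callers racing on the same key instead of every caller racing on one global monitor.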
@eed3si9n
Member

@viveksjain Thanks for the report. For the sake of reproducibility, could you suggest an exact change someone could make in the Spark code base that would reproduce this behavior, please?

@viveksjain
Author

viveksjain commented Sep 19, 2022

Good question: when I do Spark incremental builds with mvn or sbt, it seems I'm not actually able to reproduce this issue (sbt takes 15-20s, but profiling shows only ~300ms spent in Stamper). I don't know enough about how mvn/sbt call into zinc, and how that differs from how we do it in our internal repo (with custom Bazel rules), to properly understand why. Does sbt maintain a global timeWrapBinaryStamps instance so that pretty much everything remains cached?

I guess as it currently stands this issue isn't particularly actionable without a repro, so feel free to close it.

@eed3si9n
Member

Inside the target directory, sbt maintains an incremental state file also known as the Analysis. The Analysis contains hopefully-machine-independent content hashes. If I remember correctly, timeWrapBinaryStamps is a way of optimizing this by keeping the timestamp in memory so that, on a single machine, we don't have to run content hashing again if the timestamp of the input has not changed.

Note that Bazel resets timestamps to 2010-01-01, so that part of the logic may or may not work.
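
A conceptual sketch of that optimization, assuming a content hash keyed by path and guarded by the last-modified time; `TimeGuardedHasher` and `hashFile` are placeholder names, not the real timeWrapBinaryStamps implementation.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable

// Remember the last-modified time alongside the content hash, and only rehash
// when the timestamp changes.
final class TimeGuardedHasher(hashFile: Path => String) {
  private val cache = mutable.Map.empty[Path, (Long, String)] // path -> (mtime, hash)

  def hash(p: Path): String = synchronized {
    val mtime = Files.getLastModifiedTime(p).toMillis
    cache.get(p) match {
      case Some((cachedMtime, cachedHash)) if cachedMtime == mtime =>
        cachedHash // timestamp unchanged: reuse the cached hash, skip rehashing
      case _ =>
        val h = hashFile(p)
        cache.update(p, (mtime, h))
        h
    }
  }
}
```

If a build tool pins all timestamps to a constant date, the mtime check can no longer distinguish changed inputs, which is the concern behind the 2010-01-01 note above.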

@viveksjain
Author

That's my understanding from the code as well. In our Bazel logic we create a new Stamper per compilation step (i.e. there is no cache reuse across multiple targets). I was trying to clarify whether sbt does the same, or whether it keeps one Stamper globally that gets reused across multiple compilations.

@Friendseeker
Member

Friendseeker commented Dec 28, 2023

Without an exact reproduction, it is difficult to pinpoint the root cause of the issue.

I looked into the stamp logic and saw no obvious fault. (The only remotely problematic part I noticed is that, since a TimeWrapBinaryStamps can be nested inside an InitialStamps, a single lookup goes through two nested synchronized blocks; but since no deadlock has been reported, this probably does not contribute to the issue.)

If the issue is still occurring on your internal repo, I guess the best we can do for now is to replace the synchronized locks with a more granular ConcurrentHashMap, as you suggested, and hope the issue goes away...

@SethTisue closed this as not planned on Jan 8, 2024