
Stamp locking slows down large builds #1129

Closed
viveksjain opened this issue Sep 18, 2022 · 5 comments

Comments

@viveksjain

steps

  1. Pick a large scala repo, like https://github.com/apache/spark
  2. Do a full compilation
  3. Make a one-line change in some source file
  4. See how long a dependent test takes to recompile

problem

I was investigating incremental Scala compilation performance, which takes on the order of 30 seconds for a one-line change on a large internal repo, and found that quite a large fraction of that time is spent blocked on Stamp locks: about 16s on the global Stamper lock and 8s on the InitialStamps lock around

synchronized { sources.getOrElseUpdate(src, underlying.source(src)) }

Note that the lock time numbers do not correspond to wall-clock time, because zinc has internal parallelism and calls in from multiple threads.
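
For reference, the locking pattern described here looks roughly like the following. This is a minimal sketch rather than zinc's actual source; `StampLike` and `ReadStampsLike` are stand-ins for zinc's stamp and ReadStamps types.

```scala
import java.nio.file.Path
import scala.collection.mutable

// Hypothetical stand-ins for zinc's stamp and ReadStamps types.
trait StampLike
trait ReadStampsLike { def source(src: Path): StampLike }

// A stamp cache that guards a plain mutable Map with the object's own monitor,
// so every lookup from every compile thread funnels through one lock. When the
// underlying stamp computation (e.g. content hashing) is slow, threads queue up
// here, which matches the blocked time seen in the profile.
final class CachedStamps(underlying: ReadStampsLike) extends ReadStampsLike {
  private val sources = mutable.Map.empty[Path, StampLike]

  override def source(src: Path): StampLike =
    synchronized { sources.getOrElseUpdate(src, underlying.source(src)) }
}
```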

notes

We can improve the former by not using timeWrapBinaryStamps when calling zinc, but the latter seems to be used internally in zinc to wrap the ReadStamps that is passed in and cannot be removed. I am not very familiar with the code, but initial ideas:

  1. The InitialStamps lock seems to just be for getAll*Stamps, but I couldn't actually see where these methods are used. Is supporting them really necessary?
  2. The global Stamper lock seems to exist only to cache hash results. The shared data structure is the cache itself, so the lock should be on that rather than on the whole object. Using something like ConcurrentHashMap.computeIfAbsent would likely be even faster (see the sketch after this list).
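
A rough sketch of idea 2, assuming a cache keyed by file path; the class name, the `hashFile` parameter, and the `StampLike` trait are hypothetical stand-ins rather than zinc's real API.

```scala
import java.nio.file.Path
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for zinc's stamp type.
trait StampLike

// hashFile is whatever routine actually computes the content hash.
final class ConcurrentStamper(hashFile: Path => StampLike) {
  private val cache = new ConcurrentHashMap[Path, StampLike]()

  // computeIfAbsent only locks the bin holding this key, so threads stamping
  // different files proceed in parallel, while the same file is still hashed
  // at most once.
  def stamp(src: Path): StampLike =
    cache.computeIfAbsent(src, p => hashFile(p))
}
```

Contention is then limited to callers racing on the same key instead of every caller racing on one global monitor.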
@eed3si9n
Member

@viveksjain Thanks for the report. For the sake of reproducibility, could you suggest an exact change someone could make in the Spark code base that would reproduce this behavior, please?

@viveksjain
Author

viveksjain commented Sep 19, 2022

Good question: when I do Spark incremental builds with mvn or sbt, it seems I'm not actually able to reproduce this issue (sbt takes 15-20s, but profiling shows only ~300ms spent in Stamper). I don't know enough about how mvn/sbt call into zinc, and how that differs from how we do it in our internal repo (with custom Bazel rules), to properly understand why. Does sbt maintain a global timeWrapBinaryStamps instance so that pretty much everything remains cached?

I guess as it currently stands this issue isn't particularly actionable without a repro, so feel free to close it.

@eed3si9n
Member

Inside the target directory, sbt maintains an incremental state file also known as the Analysis. The Analysis contains hopefully-machine-independent content hashes. If I remember correctly, timeWrapBinaryStamps is a way of optimizing this by keeping the timestamp in memory so that, on a single machine, we don't have to run content hashing again if the timestamp of the input has not changed.

Note that Bazel resets timestamps to 2010-01-01, so that part of the logic may or may not work.
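
A conceptual sketch of that optimization, assuming a content hash keyed by path and guarded by the last-modified time; `TimeGuardedHasher` and `hashFile` are placeholder names, not the real timeWrapBinaryStamps implementation.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable

// Remember the last-modified time alongside the content hash, and only rehash
// when the timestamp changes.
final class TimeGuardedHasher(hashFile: Path => String) {
  private val cache = mutable.Map.empty[Path, (Long, String)] // path -> (mtime, hash)

  def hash(p: Path): String = synchronized {
    val mtime = Files.getLastModifiedTime(p).toMillis
    cache.get(p) match {
      case Some((cachedMtime, cachedHash)) if cachedMtime == mtime =>
        cachedHash // timestamp unchanged: reuse the cached hash, skip rehashing
      case _ =>
        val h = hashFile(p)
        cache.update(p, (mtime, h))
        h
    }
  }
}
```

If a build tool pins all timestamps to a constant date, the mtime check can no longer distinguish changed inputs, which is the concern behind the 2010-01-01 note above.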

@viveksjain
Author

That's my understanding from the code as well. In our Bazel logic we create a new Stamper per compilation step (i.e. there is no cache reuse across multiple targets). I was trying to clarify whether sbt does the same, or whether it keeps one Stamper globally that gets reused across multiple compilations.

@Friendseeker
Member

Friendseeker commented Dec 28, 2023

Without an exact reproduction, it is difficult to pinpoint the root cause of the issue.

I looked into the stamp logic and saw no obvious fault. (The only remotely problematic part I noticed is that, since a TimeWrapBinaryStamps can be nested inside an InitialStamps, a single lookup goes through two nested synchronized blocks; but since no deadlock has been reported, this probably does not contribute to the issue.)

If the issue is still occurring on your internal repo, I guess the best we can do for now is to replace the synchronized locks with a more granular ConcurrentHashMap, as you suggested, and hope the issue goes away...

@SethTisue closed this as not planned on Jan 8, 2024