-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/coordinator: prioritize gomote buildlets for pool quotas #48857
Comments
Change https://golang.org/cl/354644 mentions this issue: |
Change https://go.dev/cl/418951 mentions this issue: |
Change https://go.dev/cl/418952 mentions this issue: |
Change https://go.dev/cl/418953 mentions this issue: |
Change https://go.dev/cl/419034 mentions this issue: |
Change https://go.dev/cl/418950 mentions this issue: |
Change https://go.dev/cl/418949 mentions this issue: |
Change https://go.dev/cl/418948 mentions this issue: |
Change https://go.dev/cl/419035 mentions this issue: |
Change https://go.dev/cl/419036 mentions this issue: |
Change https://go.dev/cl/419429 mentions this issue: |
Change https://go.dev/cl/419428 mentions this issue: |
This refactors schedLess to use priorities and a sortable key for each SchedItem. The behavior of the scheduler remains identical with these changes. This is in preparation for using a consistent sortable key for scheduler items in order to prioritize buildlets across different quotas in buildlet pools. For golang/go#48857 Change-Id: Ie6a3570d9f5ac2fe27c9ede43672011b3b7ee8fa Reviewed-on: https://go-review.googlesource.com/c/build/+/354644 Auto-Submit: Jenny Rakoczy <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Reviewed-by: Carlos Amedee <[email protected]>
Adds a priority queue per-cpu-quota for buildlets. The scheduler prioritization mechanism was unaware of per-CPU-type quotas. This replaces it with a queue per-quota implemented instead on the pool. With this change, this should correctly prioritize gomotes that are waiting over other builds. Previously, when waiting on a gomote (such as for a releaselet), other smaller host types could win the quota race before a larger releaselet would be created. This was particularly problematic during minor releases with a number of security fixes, which causes a very large number of concurrent builds due to all of the backports being committed shortly before the release. reverse buildlets: This change replaces waiters and waitChan with a per-hostType queue for prioritizing remote buildlet requests. EC2 pool: This replaces quota management in the EC2 pool with QuotaQueue. Some minor changes were necessary to deal with bookkeeping across different instance types. A1 metal instances, for example, have a separate hard-limit that we want to enforce, even though they also consume the normal CPU quota. Moved internal/coordinator/log to avoid an import cycle. Moved SchedItem to internal/coordinator/pool/queue, where it can live next to the Quota code that relies on it heavily. For golang/go#48857 Change-Id: Ic4039cf425d47f46e2cc6678b4b578338c653143 Reviewed-on: https://go-review.googlesource.com/c/build/+/419036 Auto-Submit: Jenny Rakoczy <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Carlos Amedee <[email protected]>
This adds a UI for viewing what builds are in each queue. For golang/go#48857 Change-Id: I025d01510833cc9c0d23556d7f90a62119eb39fb Reviewed-on: https://go-review.googlesource.com/c/build/+/419429 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]> Auto-Submit: Jenny Rakoczy <[email protected]>
Change https://go.dev/cl/419433 mentions this issue: |
Remove stray queue length call, which wasn't even in the correct spot for Sprintf. For golang/go#48857 Change-Id: I98bb33f0b3e51d5abe25d1675554b96be893b1f9 Reviewed-on: https://go-review.googlesource.com/c/build/+/419433 Auto-Submit: Jenny Rakoczy <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Change https://go.dev/cl/420875 mentions this issue: |
These were accidentally swapped when changing return argument order. For golang/go#48857 Change-Id: I0f70d20c9e0cdb23b3d17cc51f9dfb7ce27cc108 Reviewed-on: https://go-review.googlesource.com/c/build/+/420875 Run-TryBot: Jenny Rakoczy <[email protected]> Auto-Submit: Jenny Rakoczy <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Change https://go.dev/cl/420797 mentions this issue: |
Attempt to reduce raciness when we are at our quota limit by subtracting instances we're not tracking from the available quota, and relying on our counts to be otherwise correct for GCE. Reduce the interval for pollQuota to one minute, and only update limits. pollQuota now queries all instances and accounts for their CPUs separately from the usage numbers. Our previous approach was problematic, as pollQuota would continually clobber our tracking, which may have been causing instances to contend for the same available quota. For golang/go#48857 Change-Id: I482c5900567a95e8ded968176dd8851a0aee1631 Reviewed-on: https://go-review.googlesource.com/c/build/+/420797 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Auto-Submit: Jenny Rakoczy <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]>
Change https://go.dev/cl/422596 mentions this issue: |
Updates the username matching in the coordinator to determine if a buildlet is for a release. Updates golang/go#48857 Change-Id: I0d7197ac2955baab2d8b3eb37bb265cf77a43876 Reviewed-on: https://go-review.googlesource.com/c/build/+/422596 Auto-Submit: Jenny Rakoczy <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]>
Change https://go.dev/cl/426794 mentions this issue: |
Handle QUOTA_EXCEEDED errors when trying to create GCE buildlets. Due to contention with other processes (GKE, etc.) in our builder project, we may not have a perfect representation of quota usage. Instead, retry as long as someone is waiting for a builder. Also, delete some dead code. Updates golang/go#48857 Change-Id: Iecf48378bdd46a4c817c09436fc45b4e27744ab6 Reviewed-on: https://go-review.googlesource.com/c/build/+/426794 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Auto-Submit: Jenny Rakoczy <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]>
Change https://go.dev/cl/426798 mentions this issue: |
This was intended to be part of CL 426794. Updates golang/go#48857 Change-Id: Ic317c7fb633910c969f27bd220e17dba148e8909 Reviewed-on: https://go-review.googlesource.com/c/build/+/426798 Run-TryBot: Jenny Rakoczy <[email protected]> Reviewed-by: Carlos Amedee <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]>
Change https://go.dev/cl/428114 mentions this issue: |
As implemented, it just logs a pointer address. Updates golang/go#48857 Change-Id: Iba5bc8dbbb7a8eb7b5c110cd34233731e4f534ef Reviewed-on: https://go-review.googlesource.com/c/build/+/428114 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]> Run-TryBot: Jenny Rakoczy <[email protected]>
Change https://go.dev/cl/435379 mentions this issue: |
Some of our logging in the coordinator includes pointers and repetition, like so: 2022/09/27 19:16:27 failed with errors: [0xc02b50e120] 2022/09/27 19:16:27 Failed to create VM: "failed with errors: [0xc02b50e120]". Retrying after 1 second (attempt: 2). The GCE error types only have MarshalJSON methods, so use that to show more information. (This builds on the previous attempt in CL 428114.) Remove the duplicate log, since the caller can log or otherwise handle the error. For golang/go#48857. Change-Id: I0d31cac833ea617b4d722751e3ffe9a6689aeb87 Reviewed-on: https://go-review.googlesource.com/c/build/+/435379 Auto-Submit: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Run-TryBot: Dmitri Shuralyov <[email protected]> Reviewed-by: Jenny Rakoczy <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
When working on the 1.16.9 and 1.17.2 releases (#48842, #48843), a buildlet was pending for the linux-s390x-cross builder for some time, despite seemingly having quotas.
The cause was threefold:
In the second case, it's a race for quota for which machine gets created across all host types. For unpopular (or release-only builders), it's the worst case scenario, as there will be very few pending s390x-cross builders to be able to prioritize the gomote properly in the scheduler.
We can make the following changes:
The text was updated successfully, but these errors were encountered: