-
Notifications
You must be signed in to change notification settings - Fork 21
[Revalidation] Improve throughput by batching revalidations #667
Conversation
// Split the list of revalidations by which ones have been completed. | ||
var completed = new List<PackageRevalidation>(); | ||
var uncompleted = revalidations.ToDictionary( | ||
r => Tuple.Create(r.PackageId, r.PackageNormalizedVersion), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create [](start = 27, length = 6)
Can we use PackageIdentity
or an anonymous type instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced the tuple with a string. This was necessary for the Entity Framework queries to work.
What are the measurements for this performance improvement? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
Update tests and config Improving logging Undo hack :'( Update connection strings
908e329
to
e7367ba
Compare
/// </summary> | ||
public int? MaximumPackageVersions { get; set; } | ||
public int MaxBatchSize { get; set; } = 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 64? I think every limit we've used so far has been a multiple of 100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 was chosen rather arbitrarily as this is the largest number that fits in a single catalog commit. This number directly correlates into how big queries into the Gallery DB are, so I think we should avoid any thing too big.
Side-note: at 64 revalidations per batch, each batch on DEV spend ~30 seconds deciding whether to enqueue the batch, and ~3 seconds actually enqueueing the batch.
src/NuGet.Services.Revalidate/Services/PackageRevalidationStateService.cs
Outdated
Show resolved
Hide resolved
@joelverhagen I configured the job to do 1500 revalidations per hour on DEV, and it was able to maintain ~1470 revalidations per hour for nine hours. Previously, the highest rate I was able to reach with the job was ~1300 revalidations per hour, and to reach that rate I had to configure the job to do 2000 revalidations per hour. |
Increase the throughput of the revalidation job by batching revalidations. The revalidation job does a bunch of work before enqueueing revalidations as it must: 1. Ensure the revalidation job hasn't been killswitched 2. Verify that the ingestion pipeline is healthy 3. Verify that the desired package event rate hasn't been reached This change amortizes that work by enqueueing revalidations in batches. Also, the job now takes into account how long it spent enqueueing revalidations when deciding how long to sleep for. Addresses https://github.com/NuGet/Engineering/issues/1877
Increase the throughput of the revalidation job by batching revalidations. The revalidation job does a bunch of work before enqueueing revalidations as it must:
This change amortizes that work by enqueueing revalidations in batches. Also, the job now takes into account how long it spent enqueueing revalidations when deciding how long to sleep for.
Addresses https://github.com/NuGet/Engineering/issues/1877