Performance regression: 6x slower array allocation on Alpine #41398
Tagging subscribers to this area: @dotnet/gc |
These tests are measuring raw GC allocation speed. This is a GC performance regression. @mangod9 Could you please find somebody to investigate this? |
Ok will have someone take a look. @adamsitnik , Is the regression specifically with Alpine on WSL2, or Alpine in general? |
@mangod9 The benchmarks were executed with Alpine on WSL2; I don't have "Alpine only" results, so I can't answer that. FWIW the Ubuntu on WSL2 results are fine, so it might be a general Alpine problem. |
It repros without WSL2 too. The problem is that the GC is running like 100x more often than it should. It is likely a problem in the budget computation. One of the places to check is |
ok thanks for the context. We will look if there is any Alpine specific behavior which is leading to this. |
Here is a simple repro:
using System.Diagnostics;
namespace System.Collections
{
class Program
{
static void Main(string[] args)
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
var foo = new int[512];
}
sw.Stop();
Console.WriteLine(System.GC.CollectionCount(0));
Console.WriteLine(sw.ElapsedTicks);
}
}
}
|
yeah this is clearly related to calculating the minimal gen0 budget because this test, which allocates only temporary objects, will only trigger gen0 GCs that will always have a min gen0 budget. The min gen0 budget is calculated in |
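A back-of-the-envelope sketch of why a too-small gen0 budget shows up as a huge collection count in the repro above. The per-array overhead and both budget values are assumptions picked for illustration, not numbers taken from the runtime; the only point is that the gen0 count scales inversely with the budget.

```cpp
#include <cstdint>

// Rough estimate of how many gen0 GCs a run of purely temporary allocations
// triggers: about one collection each time the allocated bytes use up the budget.
// Both inputs are illustrative; the real GC tunes its budget dynamically.
constexpr std::uint64_t estimate_gen0_collections(std::uint64_t total_allocated_bytes,
                                                  std::uint64_t gen0_budget_bytes)
{
    return gen0_budget_bytes == 0 ? 0 : total_allocated_bytes / gen0_budget_bytes;
}

// Repro workload: 1,000,000 arrays of 512 ints (4 bytes each).
// Assumption: roughly 24 bytes of per-array overhead on a 64-bit runtime.
constexpr std::uint64_t kArrayBytes = 512 * 4 + 24;
constexpr std::uint64_t kTotalBytes = 1000000 * kArrayBytes;

// Hypothetical budgets, chosen only to show the scaling.
constexpr std::uint64_t kSmallBudget = 256 * 1024;        // 256 KiB
constexpr std::uint64_t kLargeBudget = 25 * 1024 * 1024;  // 25 MiB
```

Under these assumptions, `estimate_gen0_collections(kTotalBytes, kSmallBudget)` is in the thousands while `estimate_gen0_collections(kTotalBytes, kLargeBudget)` is in the tens, which is the same order of gap as the "100x more often" observation.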
I investigated this more and it appears none of the Wonder if #34488 caused the regression, since I notice this case is missing when compared to 3.1 |
I think you are right. It means that we had a performance bug for Alpine here even before. The code that I have deleted in #34488 produced provably wrong results for AMD and the logic was questionable for Intel too. Would you like me to work on getting this fixed? |
I notice that you had an aborted PR: #34484 to fix |
#34484 fixed just one of the multiple different bugs that this code had. I think it needs to be written more or less from scratch using current editions of Intel and AMD manuals. |
We can also gather this information from the |
We have this code for ARM64, so we can just remove the ifdef around it and use it unconditionally. |
yeah just removing this seems like the best option for 5. Will create a PR, do we need to rethink for 6? |
Agree. We have a duplicate of this code in gcenv.unix.cpp for standalone GC, so please remove it there as well. I think it is fine as a permanent fix ... until we discover the next problem with this. |
imo, the better fix would be:
- #if defined(HOST_ARM64)
+ #if defined(TARGET_LINUX)
to keep the support for non-Linux Unix-like operating systems intact (macOS, FreeBSD, SunOS and so forth). |
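For illustration only, here is a minimal sketch of reading a cache size from a text file on Linux. It assumes the size is exposed as a short string such as `32K` or `8M`, as sysfs does under paths like `/sys/devices/system/cpu/cpu0/cache/index3/size`; whether that path exists depends on the machine and kernel. The function names are hypothetical and this is not the runtime's actual code.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <string>

// Parses a size string such as "32K" or "8M" into bytes.
// Returns 0 when the text is empty or does not start with a digit.
std::size_t parse_cache_size(const std::string& text)
{
    std::size_t i = 0;
    std::size_t value = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
    {
        value = value * 10 + static_cast<std::size_t>(text[i] - '0');
        ++i;
    }
    if (i == 0)
        return 0;
    if (i < text.size())
    {
        switch (text[i])
        {
            case 'K': case 'k': value *= 1024; break;
            case 'M': case 'm': value *= 1024 * 1024; break;
            default: break;  // no suffix or unknown suffix: treat as bytes
        }
    }
    return value;
}

// Reads the first token of the file and parses it.
// Returns 0 if the file is missing or unreadable (hypothetical helper).
std::size_t read_cache_size(const std::string& path)
{
    std::ifstream file(path);
    std::string text;
    if (!(file >> text))
        return 0;
    return parse_cache_size(text);
}
```

Note that a helper like this returns 0 whenever the file is absent, so a caller still needs to handle the "unknown" case rather than treating 0 as a real cache size.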
Reopening, so it can be ported to .NET 5 |
I have validated that the simplified repro shows low gen0 collection counts. I am unable to get the benchmarks working on WSL2 (will investigate), @adamsitnik if you could please validate that the benchmarks are showing comparable perf to 3.1 that would be great (guess we need to wait for a daily build anyways). Thx. |
@mangod9 what error are you getting? You should be able to run the python script and get 3.1 results:
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 --filter 'System.Collections.CtorGivenSize<Int32>.Array'
And then run the benchmarks as 5.0 using your local build of dotnet/runtime by specifying the path to
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.CtorGivenSize<Int32>.Array' \
--corerun './runtime/artifacts/bin/testhost/net5.0-Alpine-Release-x64/shared/Microsoft.NETCore.App/5.0.0/corerun'
You can read more about benchmarking local builds of dotnet/runtime here
I currently don't have access to an Alpine machine, but as soon as #41547 gets propagated to the SDK I can spawn a new Azure VM and run the benchmarks |
The container was missing libgdiplus on Alpine, but was able to resolve it by adding it from a I was able to validate that the fix gets the perf to be comparable with 3.1:
|
Ported to 5.0 with PR: #41547 |
Thanks @adamsitnik and @mangod9 for resolving, I'm very happy that my data turned out to be rather useful! |
@danmosemsft @mangod9 I wonder how we could prevent a similar problem in the future. Perhaps we should add some asserts to the method to ensure that it never returns 0 again? |
@danmosemsft Yeah it was very helpful in pinpointing where the issue might be. @adamsitnik yeah I will create a separate issue to track how we can add a test/asserts for this. |
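A minimal sketch of what such a guard could look like, assuming the value in question is a detected size where 0 means "detection failed". The function name and the fallback value are hypothetical and only illustrate the idea of never letting a 0 escape to callers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fallback used when detection fails; the value is illustrative.
constexpr std::size_t kFallbackCacheSize = 256 * 1024;

// Returns the detected size, or the fallback when detection reported 0.
// The assert documents the invariant that callers never see 0.
std::size_t cache_size_or_fallback(std::size_t detected_bytes)
{
    std::size_t result = detected_bytes != 0 ? detected_bytes : kFallbackCacheSize;
    assert(result != 0 && "cache size must never be reported as 0");
    return result;
}
```

An assert alone only fires in checked builds, so pairing it with a fallback keeps release builds from silently computing a tiny budget. A test along these lines would still be needed to catch the regression before shipping.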
@Lxiamail @billwert @DrewScoggins this is evidence I think that regular Alpine runs in the lab are important. This would have justified servicing, I think. I'm not sure where we left that conversation @billwert ? |
@danmosemsft Yes, we have an email discussion about adding additional OS coverage in perf lab. We will look into @adamsitnik's finalized report of exercises' data, and I'm trying to get .NET Core OS usage telemetry data. Hope we can identify the commonly used OSes and add them to perf lab. |
@Lxiamail sounds good, thanks! Telemetry aside, I think if we can only do 2 Linux flavors, Alpine probably needs to be one of those because of its unique characteristics. |
Just as a side note, https://stackshare.io/stackups/alpine-linux-vs-ubuntu shows Alpine has far fewer users than Ubuntu. If we only use OS telemetry to pick the Linux OSes for perf lab, Alpine may not rise to the top. |
Alpine is different since it uses the |
Right, anything related to libc can have different characteristics. Beyond musl, Alpine is the most "size optimized" distro we currently support, so it may have unique characteristics elsewhere because of those optimizations (eg possibly missing bits of /proc - I don't know, just an example). Plus, folks who deploy to Alpine are presumably particularly sensitive to perf (or at least size) so we should particularly care about the perf on Alpine. cc @richlander in case he wants to add anything. |
@mangod9 @danmosemsft thanks for the info! Looks like we definitely should add Alpine to perf lab. Are there any other OSes with unique characteristics that we should consider monitoring in perf lab? |
Others are better positioned to answer that one, off the top of my head I cannot remember Linux regressions that wouldn't show up in either Ubuntu or Alpine. |
Adding Alpine is a good start, we can always re-evaluate if we find any other distro specific issues. |
From the data that I got from @danmosemsft which was collected by running dotnet/performance microbenchmarks on alpine 3.11 via WSL2, it looks like allocating arrays of both value and reference types became 6 times slower compared to 3.1.
Initially, I thought that it was just an outlier, but I can see the same pattern for other collections that internally use arrays (queue, list, stack etc). The regression is specific to alpine. Ubuntu 18.04 (with and without WSL2) is fine.
@jkotas @janvorli who would be the best person to investigate that?
Repro
System.Collections.CtorGivenSize.Array(Size: 512)