Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TE startup/time-to-first-request regression #84923

Closed
EgorBo opened this issue Apr 17, 2023 · 13 comments · Fixed by #85370
Closed

TE startup/time-to-first-request regression #84923

EgorBo opened this issue Apr 17, 2023 · 13 comments · Fixed by #85370
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue

Comments

@EgorBo
Copy link
Member

EgorBo commented Apr 17, 2023

Startup and time to first requests regressions on all x64 targets, e.g.

image

image

(arm64 seems to be not affected)

cc @sebastienros

dotnet/runtime commits: 722e19b...5117e1f
dotnet/aspnetcore commits: dotnet/aspnetcore@7d0c273...f3387d6

@EgorBo EgorBo added the tenet-performance Performance related issue label Apr 17, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 17, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Apr 17, 2023

My blind guess is dotnet/aspnetcore#47664 cc @eerhardt

@eerhardt
Copy link
Member

My blind guess is dotnet/aspnetcore#47664

I'd be surprised if that caused this - TE doesn't use CreateSlimBuilder (AFAIK). Also because "(arm64 seems to be not affected)".

@jeffschwMSFT
Copy link
Member

this regression looks pretty significant. who is the right team/person to take an initial look?

cc @danmoseley

@jkotas
Copy link
Member

jkotas commented Apr 17, 2023

How much JIT time is there before / after? My blind guess would be that #84623 broke R2R and made a lot of methods fallback to JIT.

@EgorBo
Copy link
Member Author

EgorBo commented Apr 17, 2023

How much JIT time is there before / after? My blind guess would be that #84623 broke R2R and made a lot of methods fallback to JIT.

Good point! @sebastienros it'd be nice to also visualize "methods jitted" counter if possible, at least somewhere.
I'll check this theory via crank

@EgorBo
Copy link
Member Author

EgorBo commented Apr 17, 2023

| application                             | t1                         | t2                         |              |
| --------------------------------------- | -------------------------- | -------------------------- | ------------ |
| CPU Usage (%)                           |                         98 |                         97 |       -1.02% |
| Cores usage (%)                         |                      2,738 |                      2,729 |       -0.33% |
| Working Set (MB)                        |                        532 |                        525 |       -1.32% |
| Private Memory (MB)                     |                      1,278 |                      1,281 |       +0.23% |
| Build Time (ms)                         |                     14,671 |                     12,716 |      -13.33% |
| Start Time (ms)                         |                        307 |                        343 |      +11.73% |
| Published Size (KB)                     |                    113,323 |                    113,386 |       +0.06% |
| Symbols Size (KB)                       |                         51 |                         51 |        0.00% |
| .NET Core SDK Version                   | 8.0.100-preview.4.23211.45 | 8.0.100-preview.4.23212.21 |              |
| Max CPU Usage (%)                       |                         97 |                         97 |       +0.17% |
| Max Working Set (MB)                    |                        558 |                        550 |       -1.42% |
| Max GC Heap Size (MB)                   |                        341 |                        340 |       -0.28% |
| Size of committed memory by the GC (MB) |                        472 |                        459 |       -2.57% |
| Max Number of Gen 0 GCs / sec           |                       5.00 |                       4.00 |      -20.00% |
| Max Number of Gen 1 GCs / sec           |                       2.00 |                       1.00 |      -50.00% |
| Max Number of Gen 2 GCs / sec           |                       1.00 |                       1.00 |        0.00% |
| Max Time in GC (%)                      |                       0.00 |                       0.00 |              |
| Max Gen 0 Size (B)                      |                        584 |                  1,250,632 | +214,049.32% |
| Max Gen 1 Size (B)                      |                  4,114,520 |                  5,832,264 |      +41.75% |
| Max Gen 2 Size (B)                      |                  5,364,376 |                  5,410,840 |       +0.87% |
| Max LOH Size (B)                        |                    185,336 |                    185,336 |        0.00% |
| Max POH Size (B)                        |                  1,251,320 |                  1,251,320 |        0.00% |
| Max Allocation Rate (B/sec)             |              1,352,462,064 |              1,348,632,512 |       -0.28% |
| Max GC Heap Fragmentation               |                         15 |                         43 |     +190.00% |
| # of Assemblies Loaded                  |                        110 |                        110 |        0.00% |
| Max Exceptions (#/s)                    |                        450 |                        450 |        0.00% |
| Max Lock Contention (#/s)               |                         71 |                         93 |      +30.99% |
| Max ThreadPool Threads Count            |                         48 |                         48 |        0.00% |
| Max ThreadPool Queue Length             |                        180 |                        158 |      -12.22% |
| Max ThreadPool Items (#/s)              |                    569,133 |                    567,157 |       -0.35% |
| Max Active Timers                       |                          0 |                          0 |              |
| IL Jitted (B)                           |                    315,172 |                    416,050 |      +32.01% |
| Methods Jitted                          |                      3,350 |                      4,601 |      +37.34% |


| load                   | t1        | t2        |         |
| ---------------------- | --------- | --------- | ------- |
| CPU Usage (%)          |        41 |        41 |   0.00% |
| Cores usage (%)        |     1,150 |     1,148 |  -0.17% |
| Working Set (MB)       |        48 |        48 |   0.00% |
| Private Memory (MB)    |       358 |       358 |   0.00% |
| Start Time (ms)        |         0 |         0 |         |
| First Request (ms)     |       188 |       255 | +35.64% |
| Requests/sec           |   500,534 |   499,908 |  -0.13% |
| Requests               | 7,557,852 | 7,548,509 |  -0.12% |
| Mean latency (ms)      |      0.59 |      0.60 |  +0.68% |
| Max latency (ms)       |     44.44 |     52.43 | +17.98% |
| Bad responses          |         0 |         0 |         |
| Socket errors          |         0 |         0 |         |
| Read throughput (MB/s) |    103.58 |    103.45 |  -0.13% |
| Latency 50th (ms)      |      0.48 |      0.48 |  -0.21% |
| Latency 75th (ms)      |      0.56 |      0.56 |  -0.18% |
| Latency 90th (ms)      |      0.64 |      0.65 |  +0.62% |
| Latency 99th (ms)      |      3.63 |      3.56 |  -1.93% |

@jkotas you're right, the number of methods jitted is significantly higher now (in relative numbers)

No idea what happened with gen0 size here, probably some artifact, will run again. UPD: yeah gen0 size is random between runs

@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 17, 2023
@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 17, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Apr 18, 2023

@tannergooding can you take a look whether it's indeed #84623 could break it?

@kunalspathak
Copy link
Member

@tannergooding can you take a look whether it's indeed #84623 could break it?

@tannergooding - did you get a chance to check this out?

@tannergooding
Copy link
Member

Sorry, no. I missed the ping.

Why is the expectation that changing a JIT side promotion limit broke R2R? Was there some corresponding R2R side change that would've been required here or do we think its due to additional querying of compOpportunisticallyDependsOn(InstructionSet_AVX)?

I'm not seeing the connection here nor why this would significantly increase the number of failing methods compared to all the other places we use SIMD and have to query support.

@kunalspathak
Copy link
Member

or do we think its due to additional querying of compOpportunisticallyDependsOn(InstructionSet_AVX)?

I am guessing it could be because of this one? @EgorBo - when you measured the difference in #84923 (comment) , did you do it with and without Tanner's changes?

@EgorBo
Copy link
Member Author

EgorBo commented Apr 25, 2023

or do we think its due to additional querying of compOpportunisticallyDependsOn(InstructionSet_AVX)?

I am guessing it could be because of this one? @EgorBo - when you measured the difference in #84923 (comment) , did you do it with and without Tanner's changes?

I was only able to run crank for daily SDKs, when I tried to run it with custom bits for commits in-between it got stuck 🙁

@tannergooding
Copy link
Member

I am guessing it could be because of this one?

Hmmm.

We have opportunistic checks all throughout the JIT, it's used as the delimiter to indicate something "can" use the ISA for an optimization but that it won't semantically impact what the code does.

I didn't think the expectation was that this would negatively impact R2R: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/vectors-and-intrinsics.md?plain=1

However, comments in largestEnregisterableStructSize and structSizeMightRepresentSIMDType seems to indicate otherwise. @davidwrighton, is that correct?

If so, then I think we need to update basically all the callers of maxSIMDStructBytes (so CanPromoteStructType, roundUpSIMDSize, roundDownSIMDSize, and getUnrollThreshold).

Or we need to update maxSIMDStructBytes to not use that particular query. Forcing the size to be ZMM_REGSIZE_BYTES or YMM_REGSIZE_BYTES in R2R pessimizes downlevel hardware and the actual "target CPU" for R2R. While forcing XMM_REGSIZE_BYTES will inversely not do the "right thing" when users explicitly target a higher baseline.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 26, 2023
@tannergooding
Copy link
Member

Have a draft fix up for this here: #85370

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants