Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReJIT race: 1st time execution may not be instrumented - detected via StrongNamedTests #1242

Closed
pjanotti opened this issue Sep 16, 2022 · 7 comments · Fixed by #1279
Closed
Assignees
Milestone

Comments

@pjanotti
Copy link
Contributor

pjanotti commented Sep 16, 2022

[Edit 01: updated the description to reflect the actual underlying issue]

The bytecode instrumentations are added via requests to ReJIT the targeted methods. Those requests are made when a module finishes loading at the callback MethodLoadedFinished. The actual ReJIT request happens in a dedicated thread (per docs recommendation) and until such request is completed there is a time window in which the method can be JIT compiled and executed without the expected bytecode instrumentation. The instrumentation is going to happen only on the first method invocation after the ReJIT request is completed.

4 instances of test failures showing this issue:

@pjanotti
Copy link
Contributor Author

pjanotti commented Sep 16, 2022

Confirming that is actually flakyness: alpine, macos, and windows passed on retry on PR #1234

@pellared
Copy link
Member

Some span is captured but it does not have the expected attribute: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3097103461/jobs/5013412328

@pjanotti pjanotti modified the milestones: 0.5.0-beta, 0.4.0-beta Sep 21, 2022
@pjanotti
Copy link
Contributor Author

pjanotti commented Sep 22, 2022

It seems that the issue is related to the ReJIT happening asynchronous. In a failed pass of the test we have:

[2022-09-22T02:47:31.803708579Z] [3580|3580] [debug] Enqueue for ReJIT [ModuleId=140150729282504, MethodDef=06000001, AppDomainId=94699831388896, IsDomainNeutral=0, Assembly=TestLibrary.InstrumentationTarget, Type=TestLibrary.InstrumentationTarget.Command, Method=Execute, Signature=200001]
[2022-09-22T02:47:31.803725179Z] [3580|3580] [debug] ModuleLoadFinished stored metadata for 140150729282504 TestLibrary.InstrumentationTarget AppDomain 94699831388896 [clrhost]
[2022-09-22T02:47:31.803728679Z] [3580|3580] [debug] AssemblyLoadFinished: 94699845728400 0
[2022-09-22T02:47:31.803896979Z] [3580|3580] [debug] AssemblyLoadStarted: 94699845775072
[2022-09-22T02:47:31.803910079Z] [3580|3580] [debug] ModuleLoadStarted: 140150729284448
[2022-09-22T02:47:31.803947279Z] [3580|3580] [debug] ModuleLoadFinished: 140150729284448 TestApplication.Shared AppDomain 94699831388896 [clrhost]  | IsNGEN = false | IsDynamic = false | IsResource = false
[2022-09-22T02:47:31.804019179Z] [3580|3580] [debug] ModuleLoadFinished stored metadata for 140150729284448 TestApplication.Shared AppDomain 94699831388896 [clrhost]
[2022-09-22T02:47:31.804022379Z] [3580|3580] [debug] AssemblyLoadFinished: 94699845775072 0
[2022-09-22T02:47:31.804504578Z] [3580|3593] [info] Request ReJIT done for 1 methods

but there is no completion of the actul ReJIT, for example see an example of a successful run of the test:

[2022-09-22T03:03:24.426651900Z] [256760|195716] [debug] Enqueue for ReJIT [ModuleId=140704364384120, MethodDef=06000001, AppDomainId=2472867908960, IsDomainNeutral=0, Assembly=TestLibrary.InstrumentationTarget, Type=TestLibrary.InstrumentationTarget.Command, Method=Execute, Signature=200001]
[2022-09-22T03:03:24.426692800Z] [256760|195716] [debug] ModuleLoadFinished stored metadata for 140704364384120 TestLibrary.InstrumentationTarget AppDomain 2472867908960 [TestApplication.StrongNamed.exe]
[2022-09-22T03:03:24.426706600Z] [256760|195716] [debug] AssemblyLoadFinished: 2473312560592 0
[2022-09-22T03:03:24.427114000Z] [256760|259340] [info] Request ReJIT done for 1 methods
...
[2022-09-22T03:03:28.560054100Z] [256760|195716] [debug] GetReJITParameters: [moduleId: 140704364384120, methodId: 100663297]
[2022-09-22T03:03:28.560427700Z] [256760|195716] [debug] *** CallTarget_RewriterCallback() Start: TestLibrary.InstrumentationTarget.Command.Execute() [IsVoid=1, IsStatic=0, IntegrationType=OpenTelemetry.AutoInstrumentation.Instrumentations.Validations.StrongNamedValidation, Arguments=0]
[2022-09-22T03:03:28.560529800Z] [256760|195716] [debug] CallTargetTokens::EnsureBaseCalltargetTokens() Bytecode Instrumentation Assembly: OpenTelemetry.AutoInstrumentation, Version=0.3.1.0, Culture=neutral, PublicKeyToken=c0db600a13f60b51
[2022-09-22T03:03:28.560589800Z] [256760|195716] [debug] Caller Type.Id: 02000002
[2022-09-22T03:03:28.560598300Z] [256760|195716] [debug] Caller Type.IsGeneric: 0
[2022-09-22T03:03:28.560602600Z] [256760|195716] [debug] Caller Type.IsValid: 1
[2022-09-22T03:03:28.560607500Z] [256760|195716] [debug] Caller Type.Name: TestLibrary.InstrumentationTarget.Command
[2022-09-22T03:03:28.560612400Z] [256760|195716] [debug] Caller Type.TokenType: 33554432
[2022-09-22T03:03:28.560616200Z] [256760|195716] [debug] Caller Type.Spec: 0000001b
[2022-09-22T03:03:28.560620400Z] [256760|195716] [debug] Caller Type.ValueType: 0
[2022-09-22T03:03:28.560624500Z] [256760|195716] [debug] Caller Type Extend From.Id: 0c000001
[2022-09-22T03:03:28.560628900Z] [256760|195716] [debug] Caller Type Extend From.IsGeneric: 0
[2022-09-22T03:03:28.560633700Z] [256760|195716] [debug] Caller Type Extend From.IsValid: 1
[2022-09-22T03:03:28.560638000Z] [256760|195716] [debug] Caller Type Extend From.Name: System.Object
[2022-09-22T03:03:28.560643300Z] [256760|195716] [debug] Caller Type Extend From.TokenType: 16777216
[2022-09-22T03:03:28.560647600Z] [256760|195716] [debug] Caller Type Extend From.Spec: 0000001b
[2022-09-22T03:03:28.560652600Z] [256760|195716] [debug] Caller Type Extend From.ValueType: 0
[2022-09-22T03:03:28.560707000Z] [256760|195716] [info] *** CallTarget_RewriterCallback() Finished: TestLibrary.InstrumentationTarget.Command.Execute() [IsVoid=1, IsStatic=0, IntegrationType=OpenTelemetry.AutoInstrumentation.Instrumentations.Validations.StrongNamedValidation, Arguments=0]
[2022-09-22T03:03:28.560836300Z] [256760|195716] [debug] ReJITCompilationStarted: [functionId: 140704364385784, rejitId: 2, safeToBlock: 1]
[2022-09-22T03:03:28.588680300Z] [256760|195716] [debug] ReJITCompilationFinished: [functionId: 140704364385784, rejitId: 2, hrStatus: 0, safeToBlock: 1]

This seems to indicate that we need to ensure that the ReJIT completes before code to be instrumented is executed.

@RassK
Copy link
Contributor

RassK commented Sep 22, 2022

This seems to indicate that we need to ensure that the ReJIT completes before code to be instrumented is executed.

In other words, the startup time of a lightweight application might be too fast in time to time?

@Kielek Kielek changed the title Flaky test: IntegrationTests.StrongNamedTests.SubmitsTraces reJIT is not always working - dected by IntegrationTests.StrongNamedTests.SubmitsTraces Sep 22, 2022
@pjanotti pjanotti changed the title reJIT is not always working - dected by IntegrationTests.StrongNamedTests.SubmitsTraces ReJIT race: 1st time execution may not be instrumented - detected via StrongNamedTests Sep 23, 2022
@pjanotti
Copy link
Contributor Author

I simple modification to request the ReJIT in a synchronous fashion (directly on the thread handling ModuleLoadedFinsih) fixes the issue. However, I guess that this will have some perf impact, which, I didn't measure yet.

@pellared
Copy link
Member

If the impact will be noticeable we could introduce an env var to control it. However, I would rather not do it as it is just about longer initialization.

Probably this one failed for the same reason https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3128880953/jobs/5077270798

@pjanotti
Copy link
Contributor Author

Yeah, I don't want to give the option to "perhaps a little fast" but some non-deterministic observability results. It seems that we have to offer a single and consistent way of instrumenting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants