Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove interfaces used for creating mock gRPC clients #2831

Conversation

alanwest
Copy link
Member

My primary intent with this PR is to enable the move of the generated proto code to its own package. Something like OpenTelemetry.Proto. The OTLP exporter package would then depend on OpenTelemetry.Proto rather than generating the code itself.

The interfaces we've introduced for testing purposes cause a wrinkle in this effort. The issue is that you can't extend a partial class from another assembly. So, in this case we're extending [Metric|Log|Trace]ServiceClient in order to introduce an interface.

    internal static partial class TraceService
    {
        public interface ITraceServiceClient
        {
            ExportTraceServiceResponse Export(...);
        }

        public partial class TraceServiceClient : ITraceServiceClient
        {
        }
    }

As it turns out, we're not actually making very extensive use of these interfaces (I think we were more so before). Furthermore, they're internal. I think we should remove them.

Docs for gRPC .NET essentially just guide folks to use a mocking library https://docs.microsoft.com/en-us/aspnet/core/grpc/test-client?view=aspnetcore-6.0#mock-a-grpc-client.

The only place remaining where we need to mock the client is in the benchmark project, so I've demonstrated things there. Might be nice to centralize this "get me a mock gRPC client" code somewhere if we end up needing to do the same thing in other tests.

@alanwest alanwest requested a review from a team January 29, 2022 05:11
@codecov
Copy link

codecov bot commented Jan 29, 2022

Codecov Report

Merging #2831 (9205c0a) into main (1b8599b) will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2831      +/-   ##
==========================================
+ Coverage   83.72%   83.73%   +0.01%     
==========================================
  Files         251      251              
  Lines        8866     8866              
==========================================
+ Hits         7423     7424       +1     
+ Misses       1443     1442       -1     
Impacted Files Coverage Δ
...ementation/ExportClient/OtlpGrpcLogExportClient.cs 0.00% <ø> (ø)
...tation/ExportClient/OtlpGrpcMetricsExportClient.cs 0.00% <ø> (ø)
...entation/ExportClient/OtlpGrpcTraceExportClient.cs 35.71% <ø> (ø)
...Telemetry/Internal/SelfDiagnosticsEventListener.cs 97.65% <0.00%> (+0.78%) ⬆️

@cijothomas
Copy link
Member

@alanwest looks like the CI is failing. (I retried and still the same)

@alanwest
Copy link
Member Author

alanwest commented Feb 1, 2022

Hmm, I ran it once again and it worked... though we may have a bug hiding somewhere (not related to this PR)

Found this in the log of one of the failed runs

The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
   at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success)
   at Interop.Kernel32.ResetEvent(SafeWaitHandle handle)
   at System.Threading.EventWaitHandle.Reset()
   at OpenTelemetry.BatchExportProcessor`1.ExporterProc() in /home/runner/work/opentelemetry-dotnet/opentelemetry-dotnet/src/OpenTelemetry/BatchExportProcessor.cs:line 251
   at System.Threading.Thread.StartCallback()

@reyang
Copy link
Member

reyang commented Feb 2, 2022

@alanwest do you have the link to the failed run and test case? I can take a look. I guess it might be related to some race condition that the exporter got disposed before the exporting thread joined.

The exception callstack suggested that Export failed (which is running in it's dedicated worker thread) due to the invocation to a disposed safe handle.

The safe handle used by the events were owned by the exporter instance, and is disposed here

protected override void Dispose(bool disposing)
{
if (!this.disposed)
{
if (disposing)
{
this.exportTrigger.Dispose();
this.dataExportedNotification.Dispose();
this.shutdownTrigger.Dispose();
}
.

The exporter shutdown logic is making a time-bound invocation on worker thread's Join:

OpenTelemetrySdkEventSource.Log.DroppedExportProcessorItems(this.GetType().Name, this.exporter.GetType().Name, this.droppedCount);
if (timeoutMilliseconds == Timeout.Infinite)
{
this.exporterThread.Join();
return this.exporter.Shutdown();
}

var sw = Stopwatch.StartNew();
this.exporterThread.Join(timeoutMilliseconds);
var timeout = timeoutMilliseconds - sw.ElapsedMilliseconds;
return this.exporter.Shutdown((int)Math.Max(timeout, 0));

If the test case is not giving the exporter sufficient time to join the worker thread, the events could be disposed while the worker thread is still running. We could in theory swallow these exceptions and check if shutdown already happened though.

@alanwest
Copy link
Member Author

alanwest commented Feb 2, 2022

Failed run: https://github.com/open-telemetry/opentelemetry-dotnet/runs/5028395785?check_suite_focus=true

It's unclear exactly which test case it failed on but the surrounding passing test cases suggests it may be OpenTelemetry.Trace.Tests.BatchExportActivityProcessorTest.CheckForceFlushExport. Probably for one of the test cases where the timeout is very short - 0 or 1 ms.

  Passed OpenTelemetry.Trace.Tests.BatchExportActivityProcessorTest.CheckExportForRecordingButNotSampledActivity [4 ms]
  Passed OpenTelemetry.Trace.Tests.BatchExportActivityProcessorTest.CheckForceFlushExport(timeout: -1) [1 s]
The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.

...

@reyang
Copy link
Member

reyang commented Feb 2, 2022

Failed run: https://github.com/open-telemetry/opentelemetry-dotnet/runs/5028395785?check_suite_focus=true

It's unclear exactly which test case it failed on but the surrounding passing test cases suggests it may be OpenTelemetry.Trace.Tests.BatchExportActivityProcessorTest.CheckForceFlushExport. Probably for one of the test cases where the timeout is very short - 0 or 1 ms.

Thanks! That's sufficient for me to look into the problem 💪

@cijothomas cijothomas merged commit 840b24e into open-telemetry:main Feb 2, 2022
@alanwest alanwest deleted the alanwest/remove-otlp-grpc-client-interfaces branch February 2, 2022 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants