Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SqlClient] Add error.type and db.response.status_code attributes #2262

Merged
merged 6 commits into from
Oct 31, 2024

Conversation

alanwest
Copy link
Member

Fixes #2239

@github-actions github-actions bot added the comp:instrumentation.sqlclient Things related to OpenTelemetry.Instrumentation.SqlClient label Oct 28, 2024
Copy link
Member Author

@alanwest alanwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova @trask Can I get your thoughts on a couple things regarding error.type and db.response.status_code?

@@ -210,6 +212,13 @@ public override void OnEventWritten(string name, object? payload)
{
if (this.exceptionFetcher.TryFetch(payload, out Exception? exception) && exception != null)
{
activity.AddTag(SemanticConventions.AttributeErrorType, exception.GetType().FullName);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very useful because error.type will always equal Microsoft.Data.SqlClient.SqlException.

Per the conventions, this is not necessarily wrong, but it's not ideal:

The error.type SHOULD match the db.response.status_code returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. Instrumentations SHOULD document how error.type is populated.

Doing something like the following would be more useful and would align with the response status code.

activity.AddTag(SemanticConventions.AttributeErrorType, exception.Message);

However exception message can have variable text and could cause error.type to be high cardinality. For example:

SELECT ColumnThatDoesNotExist FROM MyTable

This query results in error code 207 and message:

Invalid column name 'ColumnThatDoesNotExist'

It's tempting to do something like the following to replace anything in single-quotes with something.

Regex.Replace(exception.Message, "'[^']*'", "'%'"));

But, glancing over the list of error codes, I'm not sure such a simple replacement will suffice.

Alternatively, we could maintain a dictionary of codes to messages... there are a lot.

Copy link
Contributor

@lmolkova lmolkova Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is a common practice to set exception message on the span description.

It's ok if for .NET it's uncommon to subclass exceptions and that error.type won't be super-interesting. We still have status code.


if (this.exceptionNumberFetcher.TryFetch(exception, out var exceptionNumber))
{
activity.AddTag(SemanticConventions.AttributeDbResponseStatusCode, exceptionNumber.ToString(CultureInfo.InvariantCulture));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error has a number, severity, and state.

The number is most descriptive of the problem that occurred, though I can see severity and state being useful.

From the docs

Severity:

The severity indicates how serious the error is. Errors that have a low severity, such as 1 or 2, are information messages or low-level warnings. Errors that have a high severity indicate problems that should be addressed as soon as possible. For more information about severities, see Database Engine Error Severities.

State:

Some error messages can be raised at multiple points in the code for the Database Engine. For example, an 1105 error can be raised for several different conditions. Each specific condition that raises an error assigns a unique state code.

When you are viewing databases that contain information about known issues, such as the Microsoft Knowledge Base, you can use the state number to determine whether the recorded issue is the same as the error you have encountered. For example, if a Knowledge Base Article describes an 1105 error that has a state of 2 and the 1105 error message you received had a state of 3, the error probably has a different cause than the one reported in the article.

A Microsoft support engineer can also use the state code from an error to find the location in the source code where that error code is being raised. This information might provide additional ideas on how to diagnose the problem.

Copy link
Contributor

@lmolkova lmolkova Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please check out https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/sql.md?plain=1#L102-L136 ?

I believe MS SQL server doesn't support SQL STATE, but if you ever see both - state and code, you can concat them together - see the link above.

If severity is interesting, we can always add it later, but it seems there is a 1:1 mapping from the error code to severity and it might be redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notion of state here is different from SQL STATE. From what I can tell you're correct that MSSQL does not support SQL STATE.

My read is that for a given MSSQL error code, that code may be raised from different locations in the DBMS and the state specifies exactly which location the code was raised. I think I'm going to just ignore state.

If severity is interesting, we can always add it later, but it seems there is a 1:1 mapping from the error code to severity and it might be redundant.

I agree it's probably redundant.

Copy link

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 36.36364% with 7 lines in your changes missing coverage. Please review.

Project coverage is 73.08%. Comparing base (71655ce) to head (43bc9c4).
Report is 572 commits behind head on main.

Files with missing lines Patch % Lines
...ent/Implementation/SqlEventSourceListener.netfx.cs 0.00% 7 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2262      +/-   ##
==========================================
- Coverage   73.91%   73.08%   -0.83%     
==========================================
  Files         267      359      +92     
  Lines        9615    13768    +4153     
==========================================
+ Hits         7107    10063    +2956     
- Misses       2508     3705    +1197     
Flag Coverage Δ
unittests-Contrib.Shared.Tests 77.13% <ø> (?)
unittests-Exporter.Geneva 62.52% <ø> (?)
unittests-Exporter.InfluxDB 95.14% <ø> (?)
unittests-Exporter.Instana 74.86% <ø> (?)
unittests-Exporter.OneCollector 94.34% <ø> (?)
unittests-Exporter.Stackdriver 79.26% <ø> (?)
unittests-Extensions 88.63% <ø> (?)
unittests-Extensions.Enrichment 100.00% <ø> (?)
unittests-Instrumentation.AWS 85.95% <ø> (?)
unittests-Instrumentation.AspNet 77.00% <ø> (?)
unittests-Instrumentation.AspNetCore 70.06% <ø> (?)
unittests-Instrumentation.ConfluentKafka 14.37% <ø> (?)
unittests-Instrumentation.ElasticsearchClient 79.87% <ø> (?)
unittests-Instrumentation.EntityFrameworkCore 57.06% <ø> (?)
unittests-Instrumentation.EventCounters 76.36% <ø> (?)
unittests-Instrumentation.GrpcNetClient 79.61% <ø> (?)
unittests-Instrumentation.Hangfire 93.58% <ø> (?)
unittests-Instrumentation.Http 73.57% <ø> (?)
unittests-Instrumentation.Owin 85.97% <ø> (?)
unittests-Instrumentation.Process 100.00% <ø> (?)
unittests-Instrumentation.Quartz 78.94% <ø> (?)
unittests-Instrumentation.Runtime 97.53% <ø> (?)
unittests-Instrumentation.SqlClient 89.65% <36.36%> (?)
unittests-Instrumentation.StackExchangeRedis 71.21% <ø> (?)
unittests-Instrumentation.Wcf 78.62% <ø> (?)
unittests-PersistentStorage 65.21% <ø> (?)
unittests-Resources.AWS 79.16% <ø> (?)
unittests-Resources.Azure 84.56% <ø> (?)
unittests-Resources.Container 67.34% <ø> (?)
unittests-Resources.Gcp 71.15% <ø> (?)
unittests-Resources.Host 73.91% <ø> (?)
unittests-Resources.OperatingSystem 76.98% <ø> (?)
unittests-Resources.Process 100.00% <ø> (?)
unittests-Resources.ProcessRuntime 78.26% <ø> (?)
unittests-Sampler.AWS 88.23% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ient/Implementation/SqlClientDiagnosticListener.cs 84.70% <100.00%> (ø)
...ent/Implementation/SqlEventSourceListener.netfx.cs 81.15% <0.00%> (ø)

... and 377 files with indirect coverage changes

@alanwest alanwest marked this pull request as ready for review October 30, 2024 23:13
@alanwest alanwest requested a review from a team as a code owner October 30, 2024 23:13
Copy link
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:instrumentation.sqlclient Things related to OpenTelemetry.Instrumentation.SqlClient
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[sql] Support error.type and db.response.status_code attributes
3 participants