-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mysql2 adapter #278
Add mysql2 adapter #278
Conversation
I am 👍 👍 👍 for use of containers for integration tests rather than mocking. |
response = super(sql, options) | ||
end | ||
|
||
response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The local response
doesn’t really buy us anything here. tracer.in_span(...) do ... end
will return the result of the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 makes sense, updated
# https://github.com/open-telemetry/opentelemetry-python/blob/39fa078312e6f41c403aa8cad1868264011f7546/ext/opentelemetry-ext-dbapi/tests/test_dbapi_integration.py#L53 | ||
# This would create span names like mysql.default, mysql.replica, postgresql.staging etc etc | ||
database_name ? "mysql.#{database_name}" : 'mysql' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic conventions for databases states:
Span
name
should be set to low cardinality value representing the statement executed on the database. It may be stored procedure name (without argument), sql statement without variable arguments, etc. When it's impossible to get any meaningful representation of the spanname
, it can be populated using the same value asdb.instance
.
Do we have any way to get a meaningful, low-cardinality representation of the statement at this point? E.g. “stored procedure name (without argument), sql statement without variable arguments”. I think the answer is “no”, in which case what you have seems fine, but I’d love to be proven wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, so i poked around the mysql2 codebase a good amount here, and also looked into how some similar open-telemetry-python instrumentations were doing things, and it seems like the answer is no. That being said I am very far from a SQL expert, so i'd be curious to know how others are handling this. Is it just that everyone is breaking spec? Additionally, how are other's handling the db.statement
sql obfuscation, is it all at the collector level? If there's an example of obfuscation of db.statement done at the tracer adapter level, perhaps we could leverage that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are other's handling the
db.statement
sql obfuscation
I don’t think anyone is. We (Shopify) dug into this quite a bit a few years ago and concluded we’d have to add a SQL parser to sanitize or obfuscate the query. We redact the db.statement
attribute from all spans in the collector — there is no obfuscation support specific to SQL queries there — but I don’t know what others are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that makes sense. From the perspective of where this instrumentation is being ported from (dd-trace-rb), the datadog-agent
handles the obfuscation, which makes low cardinality representations of sql possible without adding that work into the tracing client code. In this case, I don't believe it's possible to set span.name to a representation of sql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use the raw sql as a span name, but I join you all in not knowing the best way to deal with this. The spec says:
When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.
However, that doesn't make this instrumentation very useful it all it does is create a span with the database instance information. I'd like to see if I can find any other SIGs who have encountered this problem, but haven't yet. I'll look / ask around a bit more.
One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.
Coincidentally, we just made a very similar change to our Shopify-internal tracing gem. It seems like a reasonable path forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, i agree that MySQL <INSERT/SELECT/DELETE/etc>
seems reasonable enough. I can think of some very inefficient ways to do this but going to look to see if this pattern is being used in other otel repo's so that I don't have to re-invent the wheel here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our code for this identifies specific patterns:
QUERY_NAMES = [
"set names",
"select",
"insert",
"update",
"delete",
"begin",
"commit",
"rollback",
"savepoint",
"release savepoint",
"explain",
"drop database",
"drop table",
"create database",
"create table",
].freeze
QUERY_NAME_RE = Regexp.new("^(#{QUERY_NAMES.join('|')})", Regexp::IGNORECASE)
...
QUERY_NAME_RE.match(sql) { |match| match[1].downcase } unless sql.nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's extremely helpful, thank you. I've updated the PR and tests accordingly
To Update re: testing. I've added to the circleci config a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks really good here @ericmustin. We need to figure out what to do about the span name. Let's discuss this at the SIG meeting tomorrow.
# https://github.com/open-telemetry/opentelemetry-python/blob/39fa078312e6f41c403aa8cad1868264011f7546/ext/opentelemetry-ext-dbapi/tests/test_dbapi_integration.py#L53 | ||
# This would create span names like mysql.default, mysql.replica, postgresql.staging etc etc | ||
database_name ? "mysql.#{database_name}" : 'mysql' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use the raw sql as a span name, but I join you all in not knowing the best way to deal with this. The spec says:
When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.
However, that doesn't make this instrumentation very useful it all it does is create a span with the database instance information. I'd like to see if I can find any other SIGs who have encountered this problem, but haven't yet. I'll look / ask around a bit more.
One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks @ericmustin !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Thanks @ericmustin!
Summary
This PR addresses #243 and adds a Mysql2 adapter.
The instrumentation is largely a port from dd-trace-rb, found here, with a few modifications to account for opentelemetry span conventions.
Notes / Open Questions
I do have some more open ended questions on the test suite though:
webmock
orfakeredis
.docker-compose.yml
, as well as a minor modification to the base app's Dockerfile to add support for mysql.I tried to modify the circeci config as well but have almost surely botched it, so I assume the CI will probably fail, as
I wasn't able to test circleci version 2.1 locally, and got a bit bogged down trying to test with a comparable 2.0 format.(Update: Yes, i totally botched it, but after upgrading my circleci-cli version, I can at least test v2.1 configs now. reverted the circleci config changes, so the CI test suite is currently not testing the above referenced tests)Overall, switching over to using containers for integrations to test against vs writing mocks may not be worth the pain here, but think has some benefits as well (as mocking out some more obtuse instrumentations could become difficult and time consuming). Happy to move back to mocks here if that's preferred, just let me know