Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(plugin): OTEL exporting with translate #10332

Merged
merged 11 commits into from
Feb 23, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,8 @@
[#10160](https://github.com/Kong/kong/pull/10160)
- For `http.flavor`. It should be a string value, not a double.
[#10160](https://github.com/Kong/kong/pull/10160)
- **OpenTelemetry**: Fix a bug that when getting the trace of other formats, the trace ID reported and propagated could be of incorrect length.
[#10332](https://github.com/Kong/kong/pull/10332)
- **OAuth2**: `refresh_token_ttl` is now limited between `0` and `100000000` by schema validator. Previously numbers that are too large causes requests to fail.
[#10068](https://github.com/Kong/kong/pull/10068)

Expand Down
4 changes: 3 additions & 1 deletion kong/plugins/opentelemetry/handler.lua
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ local propagation_parse = propagation.parse
local propagation_set = propagation.set
local null = ngx.null
local encode_traces = otlp.encode_traces
local translate_span = otlp.translate_span
local transform_span = otlp.transform_span

local _log_prefix = "[otel] "
Expand Down Expand Up @@ -132,6 +133,7 @@ function OpenTelemetryHandler:access()
end

-- overwrite trace id
-- as we are in a chain of existing trace
if trace_id then
root_span.trace_id = trace_id
kong.ctx.plugin.trace_id = trace_id
Expand All @@ -145,7 +147,7 @@ function OpenTelemetryHandler:access()
root_span.parent_id = parent_id
end

propagation_set("preserve", header_type, root_span, "w3c")
propagation_set("preserve", header_type, translate_span(root_span), "w3c")
end

function OpenTelemetryHandler:log(conf)
Expand Down
36 changes: 36 additions & 0 deletions kong/plugins/opentelemetry/otlp.lua
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,13 @@ local insert = table.insert
local tablepool_fetch = tablepool.fetch
local tablepool_release = tablepool.release
local deep_copy = utils.deep_copy
local shallow_copy = utils.shallow_copy
local table_merge = utils.table_merge
local getmetatable = getmetatable
local setmetatable = setmetatable

local TRACE_ID_LEN = 16
local NULL = "\0"
local POOL_OTLP = "KONG_OTLP"
local EMPTY_TAB = {}

Expand Down Expand Up @@ -72,9 +77,39 @@ local function transform_events(events)
return pb_events
end

-- this function is to translate the span to otlp span, but of universal span format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what "but of universal span format" means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original design to support arbitrary types of tracing plugins has not made it clear:

  1. We have an internal representation for span, and it should preserve what it received from downstream requests at best effort;
  2. when exporting the data to collectors, it needs to transform the span to meet the requirement with its field (DD, OTEL, etc), and then translate it into a format used for the reporting API (gRPC, JSONRPC, etc). This function does the first step but not the second one.

local function translate_span(span)
local trace_id = span.trace_id
local len = #trace_id
local new_id = trace_id

StarlightIbuki marked this conversation as resolved.
Show resolved Hide resolved
if len == TRACE_ID_LEN then
return span
end

-- make sure the trace id is of 16 bytes
if len > TRACE_ID_LEN then
new_id = trace_id:sub(-TRACE_ID_LEN)

elseif len < TRACE_ID_LEN then
new_id = NULL:rep(TRACE_ID_LEN - len) .. trace_id
end

local translated = shallow_copy(span)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a shallow_copy needed?

I would have expected this function to do something like:

local function adjust_trace_id_for_oltp(trace_id)

And the usage would be, when creating the root span:

root_span.trace_id = adjust_trace_id_for_oltp(trace_id)

And inside transform_span:

local pb_span = {
    trace_id = adjust_trace_id_for_oltp(span.trace_id),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't recommend using abstract names (like transform_span or translate_span) when the function is only doing one modification on one field. The tradeoff ("making the code easier to modify in the future if we find that we need to do more modifications in the future") is not worth the upfront cost ("making the code more difficult to read now"). Future us can always group several modifications into one if the need arises.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a shallow_copy needed?

The span passed in may also be used by other plugins. We need to preserve as much original information as possible, and only do the transformation for exporting/propagating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree naming is a problem, but transform_span is used as some kind of interface for tracing plugins (we can see a function with the same name from datadog-tracing).
I'm adding comments and renaming the internal function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the function is in the file "otlp.lua" so I think we do not need to suffix it with "_otlp".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the usage would be, when creating the root span:

We cannot just modify the root_span given the reason: https://github.com/Kong/kong/pull/10332/files#r1115411177

setmetatable(translated, getmetatable(span))

translated.trace_id = new_id
span = translated

return span
Comment on lines +105 to +107
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, not needed if we don't do a shallow copy.

Suggested change
span = translated
return span
return translated

end

-- this function is to tranform universal span to otlp span that fits the protobuf
local function transform_span(span)
assert(type(span) == "table")

span = translate_span(span)

local pb_span = {
trace_id = span.trace_id,
span_id = span.span_id,
Expand Down Expand Up @@ -161,6 +196,7 @@ do
end

return {
translate_span = translate_span,
transform_span = transform_span,
encode_traces = encode_traces,
}
22 changes: 22 additions & 0 deletions spec/03-plugins/37-opentelemetry/03-propagation_spec.lua
Original file line number Diff line number Diff line change
Expand Up @@ -160,5 +160,27 @@ describe("propagation tests #" .. strategy, function()

assert.equals(trace_id, json.headers["ot-tracer-traceid"])
end)

it("propagates dd headers", function()
local trace_id = gen_trace_id()
local trace_id_truncated = trace_id:sub(1, 16)
local span_id = gen_span_id()
local r = proxy_client:get("/", {
headers = {
["ot-tracer-traceid"] = trace_id_truncated,
["ot-tracer-spanid"] = span_id,
["ot-tracer-sampled"] = "1",
host = "http-route",
},
})
local body = assert.response(r).has.status(200)
local json = cjson.decode(body)

assert.equals(#trace_id, #json.headers["ot-tracer-traceid"],
"trace ID was not padded correctly")

local expected = string.rep("0", 16) .. trace_id_truncated
assert.equals(expected, json.headers["ot-tracer-traceid"])
end)
end)
end