Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorten field names in the RUM V3 spec #3414

Merged
merged 2 commits into from
Mar 18, 2020
Merged

Conversation

jalvz
Copy link
Contributor

@jalvz jalvz commented Mar 3, 2020

Some pending decisions:
- What to do with fields not used by RUM currently (leave them / remove them / shorten them anyways). Ill come up with the list of fields, and upload also a file with the mappings.
- What to spec in marks.json, and weather to make a separate spec for RUM V3.

Fields removed in this PR (not used by RUM):

response.finished
response.headers_sent
request.body
request.socket
request.cookies
ephemeral_id
service.node
span.db
stacktrace.vars
stactrace.library_frame

Closes #3403

@jalvz
Copy link
Contributor Author

jalvz commented Mar 3, 2020

fyi @vigneshshanmugam

@jalvz jalvz force-pushed the shorten-rum-fields branch from be26e17 to 681a56f Compare March 3, 2020 15:18
}
}
},
"framework": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might start sending these field since we already have framework specific information. The reason why they are not sent right now its mainly bcoz of payload size. Can we also shorten these fields

"type": ["string", "null"],
"maxLength": 1024
},
"runtime": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these could be prefilled from user agent since that is the runtime for the RUM agent.

"type": ["boolean", "null"],
"description": "Indicates whether the span was executed synchronously or asynchronously."
}
},
"required": ["duration", "name", "type", "id","trace_id", "parent_id"]
"required": ["d", "n", "t", "id","trace_id", "parent_id"]
Copy link
Member

@vigneshshanmugam vigneshshanmugam Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace_id and parent_id must be optional right? . Or will that be part of a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that is job for another pr :)

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good @jalvz. Dont really belong here, Can we make this v3 spec backwards compatible? Newer versions of APM server would be able to understand the shortened vs unshortened fields (if users is using old versions of the agent)

@jalvz
Copy link
Contributor Author

jalvz commented Mar 9, 2020

Here are some fields not shortened so far:

r.finished

r.headers_sent

r.headers

q.body

q.headers

q.http_version

q.socket

q.url[raw|protocol|full|hostname|port|pathname|search|hash]

q.cookies

se.ephemeral_id

se.framework

se.environment

se.runtime

se.node

st.context_line

st.classname

st.library_frame

st.module

st.post_context

st.pre_context

st.vars

y.action

You mentioned framework, what about the rest? I am quite sure that some can be dropped, but wanted to hear from you

@vigneshshanmugam
Copy link
Member

vigneshshanmugam commented Mar 9, 2020

Fields that RUM would never use it in for the foresable future.

  • r.finished
  • r.headers_sent
  • q.body (Expensive to read stream in RUM, Might end up in memory leak if not done correctly)
  • q.socket
  • se.ephemeral_id
  • se.node
    // Hard to get these below fields unless browsers exposes these (not going to happen for now)
  • st.vars
  • st.library_frame
  • q.cookies - Possible but totally depends on Same site cookie spec. Also this could be large and might pollute our payload. so we should try to not send this at all.

Needs changes

  • se.environment - se.e
  • q.headers, r.headers - RUM agent currently sets server-timing info on response headers. so if we can have specific fields and also accept wildcards, we can set headers to q.h and r.h and r.h.st for server-timing info.
  • se.runtime - can be autofilled from User agent by server
  • q.url - RUM agent currently sends context.http.url which is same as the request URL. We can delegate the work to APM server and can do the job of parsing the URL if its required.
  • y.action - Optional right now and we don't set it for all spans at the moment. we can shorten it to y.a. We might send it as external.http.post or external.http.get in future.

// Might be used by Server when souremaps is present

  • st.post_context
  • st.pre_context

@jalvz
Copy link
Contributor Author

jalvz commented Mar 9, 2020

Awesome, thanks! I'll investigate the runtime bit, that would have to happen in a pipeline.

@jalvz jalvz force-pushed the shorten-rum-fields branch 3 times, most recently from 08ead1c to 4090f73 Compare March 11, 2020 12:11
@codecov-io
Copy link

codecov-io commented Mar 11, 2020

Codecov Report

Merging #3414 into master will increase coverage by 0.12%.
The diff coverage is 96.68%.

@@            Coverage Diff             @@
##           master    #3414      +/-   ##
==========================================
+ Coverage   79.29%   79.41%   +0.12%     
==========================================
  Files         109      109              
  Lines        5751     5772      +21     
==========================================
+ Hits         4560     4584      +24     
+ Misses       1191     1188       -3     
Impacted Files Coverage Δ
utility/data_fetcher.go 63.27% <0.00%> (ø)
model/metadata/metadata.go 92.00% <71.42%> (-3.75%) ⬇️
model/span/event.go 85.31% <96.87%> (+0.17%) ⬆️
beater/api/profile/handler.go 85.47% <100.00%> (ø)
model/context.go 60.64% <100.00%> (+0.97%) ⬆️
model/error/event.go 96.72% <100.00%> (+0.01%) ⬆️
model/message.go 93.10% <100.00%> (ø)
model/metadata/service.go 91.89% <100.00%> (+0.11%) ⬆️
model/metadata/user.go 100.00% <100.00%> (ø)
model/stacktrace.go 87.50% <100.00%> (ø)
... and 4 more

@jalvz jalvz force-pushed the shorten-rum-fields branch from 4090f73 to 7854c89 Compare March 11, 2020 15:00
},
"id": "ec2e280be8345240",
"marks": {
"a": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ill create a ticket to define these in a follow-up pr

schema: er.RUMV3Schema,
modelDecoder: er.DecodeRUMV3Event,
},
},
metadataSchema: metadata.RUMV3ModelSchema(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metricset model is missing here, I'll file a ticket for that to follow up in a separate PR

@jalvz jalvz changed the title [WIP] Shorten field names in the RUM V3 spec Shorten field names in the RUM V3 spec Mar 11, 2020
@jalvz jalvz requested a review from a team March 11, 2020 15:11
Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Intake JSON schema defines all of the fields that will be processed (with the exception of custom, which is not indexed). We explicitly allow additional fields to keep future versions of the agents compatible with older server versions. In case additional fields are sent, they are not processed and neither indexed. With these changes the Intake JSON schema for RUM only validates a part of the fields that are processed, as some fields were removed from the spec, but all Intake endpoints share the same model logic. Since the RUM endpoints are not protected, there is also no protection from someone else sending up fields processed but not JSON schema validated.
This might be a rather theoretical concern, so I am not asking for any related changes in this PR, but would like to get a discussion going about possible implications.

}
return s
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking shortFieldNames when creating the function rather than on every field name call? Something like:

func Mapper(shortFieldNames bool) func(string) string {
	if shortFieldNames {
		return func(s string) string {
			if shortField, ok := rumV3Mapping[s]; ok {
				return shortField
			}
			return s
		}
	}
	return func(s string) string {
		return s
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is more complex and the performance impact is negligible. I can benchmark it if you think is worth it, tho.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect it to only be a couple of nanoseconds difference per call, when shortFieldNames == false, but it is called for every decoded field, which is potentially a couple hundreds for larger events. This won't mage a big difference, especially as no allocations are involved, but the suggested solution is fairly straight forward, so I would rather go with that and avoid unnecessary access of the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all things equal ("equal" meaning +-200ns), I rather chose the simplest implementation if you don't mind (or avoid unnecessary complexity, put it another way).

docs/data/intake-api/generated/rum_v3_events.ndjson Outdated Show resolved Hide resolved
@@ -16,7 +16,7 @@
"type": ["string", "null"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you though about creating a dedicated docs/spec/rum_v3/ folder that contains all the specs? Not a requirement if you prefer to keep it as is, but I think it would be easier to track if there are still references to unshortened specs and generally easier to navigate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but mixing 2 taxonomies could be more confusing... eg. rum_v3_transactions could still fit in either transactions or rum_v3, not sure why one would make more sense over the other...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the first time the endpoint and version are part of the spec name, therefore I think the suggested grouping makes sense. Just a suggestion though, up to you how to finally organize it.

model/fields/rum_v3_mapping.go Outdated Show resolved Hide resolved
if input == nil || err != nil {
return nil, err
}
raw, ok := input.(map[string]interface{})
if !ok {
return nil, errors.New("invalid type for service")
}
field := fields.Mapper(hasShortFieldNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could initialize the decoder with the mapping information decoder(fields.Mapper(hasShortFieldNames)), allowing to abstract away the mapper inside the decoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, but there are a lot of small functions (decodeHTTP, decodeService, decodeStacktrace, etc) that would need to be adapted to be methods instead, meaning that they would not be reusable (eg. duplicated metadata.DecodeService, transaction.DecodeService, span.DecodeService, etc.).
At the end thought it was not worth it and this produced the smallest diff.

testdata/intake-v3/rum_events.ndjson Outdated Show resolved Hide resolved
@jalvz
Copy link
Contributor Author

jalvz commented Mar 12, 2020

This might be a rather theoretical concern, so I am not asking for any related changes in this PR, but would like to get a discussion going about possible implications.

So, for my understanding, what are those implications?

model/context.go Outdated
@@ -125,7 +125,7 @@ func DecodeContext(input interface{}, cfg Config, err error) (*Context, error) {
}

decoder := utility.ManualDecoder{}
field := fields.Mapper(cfg.HasShortFieldNames)
field := field.Mapper(cfg.HasShortFieldNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The var field is shadowing the package name now. How about changing the variable to mapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... amended

@simitt
Copy link
Contributor

simitt commented Mar 12, 2020

Created #3481 to bundle the discussion started above and not block this PR.

@jalvz jalvz force-pushed the shorten-rum-fields branch from f55c541 to 5480fb2 Compare March 18, 2020 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RUM v3] Shorten field names in the intake spec
4 participants