Shorten field names in the RUM V3 spec #3414

jalvz · 2020-03-03T15:17:01Z

~~Some pending decisions:~~
~~- What to do with fields not currently used by RUM (leave them / remove them / shorten them anyway). I'll come up with the list of fields and also upload a file with the mappings.~~
~~- What to spec in marks.json, and whether to make a separate spec for RUM V3.~~

Fields removed in this PR (not used by RUM):

response.finished
response.headers_sent
request.body
request.socket
request.cookies
ephemeral_id
service.node
span.db
stacktrace.vars
stacktrace.library_frame

Closes #3403

jalvz · 2020-03-03T15:17:18Z

fyi @vigneshshanmugam

vigneshshanmugam · 2020-03-03T16:25:14Z

docs/spec/rum_v3_context.json

+                        }
+                    }
+                },
+                "framework": {


We might start sending these fields, since we already have framework-specific information. The main reason they are not sent right now is payload size. Can we shorten these fields as well?

vigneshshanmugam · 2020-03-03T16:27:40Z

docs/spec/rum_v3_context.json

+                    "type": ["string", "null"],
+                    "maxLength": 1024
+                },
+                "runtime": {


These could be prefilled from the user agent, since that is the runtime for the RUM agent.

vigneshshanmugam · 2020-03-03T16:29:48Z

docs/spec/spans/rum_v3_span.json

                    "type": ["boolean", "null"],
                    "description": "Indicates whether the span was executed synchronously or asynchronously."
                }
            },
-            "required": ["duration", "name", "type", "id","trace_id", "parent_id"]
+            "required": ["d", "n", "t", "id","trace_id", "parent_id"]


trace_id and parent_id must be optional, right? Or will that be part of a separate PR?

Yeah, that is a job for another PR :)

vigneshshanmugam

Generally looks good @jalvz. This doesn't really belong here, but can we make this v3 spec backwards compatible? Newer versions of APM Server would then be able to understand both shortened and unshortened fields (if users are on old versions of the agent).
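
To make the backwards-compatibility question concrete, here is a minimal sketch of a lookup that accepts either spelling of a field. It is not what this PR implements; `lookup`, `decode` and `shortNames` are hypothetical names, and the mapping holds only the three renames visible in the span spec diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shortNames is an illustrative subset taken from the span spec diff in this PR.
var shortNames = map[string]string{"duration": "d", "name": "n", "type": "t"}

// lookup (hypothetical helper) returns the value stored under the short name
// of field when present, and otherwise falls back to the long name.
func lookup(obj map[string]interface{}, field string) (interface{}, bool) {
	if short, ok := shortNames[field]; ok {
		if v, found := obj[short]; found {
			return v, true
		}
	}
	v, found := obj[field]
	return v, found
}

// decode parses a JSON object and panics on malformed input (sketch only).
func decode(raw string) map[string]interface{} {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		panic(err)
	}
	return obj
}

func main() {
	newAgent := decode(`{"n": "GET /", "t": "external", "d": 12.5}`)
	oldAgent := decode(`{"name": "GET /", "type": "external", "duration": 12.5}`)
	for _, obj := range []map[string]interface{}{newAgent, oldAgent} {
		name, _ := lookup(obj, "name")
		dur, _ := lookup(obj, "duration")
		fmt.Println(name, dur)
	}
}
```

The trade-off in such a scheme is a second map access for every field sent by an older agent, and ambiguity if a payload carries both spellings; here the short name wins.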

jalvz · 2020-03-09T12:13:47Z

Here are some fields not shortened so far:

r.finished

r.headers_sent

r.headers

q.body

q.headers

q.http_version

q.socket

q.cookies

se.ephemeral_id

se.framework

se.environment

se.runtime

se.node

st.context_line

st.classname

st.library_frame

st.module

st.post_context

st.pre_context

st.vars

y.action

You mentioned framework; what about the rest? I am quite sure that some can be dropped, but I wanted to hear from you.

vigneshshanmugam · 2020-03-09T13:42:38Z

Fields that RUM will not use for the foreseeable future:

r.finished
r.headers_sent
q.body (expensive to read the stream in RUM; might end up in a memory leak if not done correctly)
q.socket
se.ephemeral_id
se.node
// Hard to get the fields below unless browsers expose them (not going to happen for now)
st.vars
st.library_frame
q.cookies - Possible, but depends entirely on the SameSite cookie spec. It could also be large and pollute our payload, so we should try not to send it at all.

Needs changes

se.environment - shorten to se.e
q.headers, r.headers - The RUM agent currently sets server-timing info on response headers. If we can have specific fields and also accept wildcards, we can shorten headers to q.h and r.h, with r.h.st for server-timing info.
se.runtime - can be autofilled from the user agent by the server
q.url - The RUM agent currently sends context.http.url, which is the same as the request URL. We can delegate this to APM Server, which can parse the URL if required.
y.action - Optional right now, and we don't set it for all spans. We can shorten it to y.a. We might send it as external.http.post or external.http.get in the future.

// Might be used by the server when sourcemaps are present

st.post_context
st.pre_context
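
As an illustration of the q.url point above, this is roughly what server-side URL parsing could look like using Go's standard `net/url` package. It is a sketch only: `requestURL` and `parseRequestURL` are hypothetical names, not APM Server's model, and the example URL is made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// requestURL is a hypothetical container for the parts a server could derive
// from the single full URL sent by the agent.
type requestURL struct {
	Full, Scheme, Hostname, Port, Path, Query string
}

// parseRequestURL splits a full URL into its components.
func parseRequestURL(raw string) (requestURL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return requestURL{}, err
	}
	return requestURL{
		Full:     raw,
		Scheme:   u.Scheme,
		Hostname: u.Hostname(),
		Port:     u.Port(),
		Path:     u.Path,
		Query:    u.RawQuery,
	}, nil
}

func main() {
	parts, err := parseRequestURL("https://example.com:8443/api/items?page=2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", parts)
}
```

With this split done on the server, the agent only needs to send one string, which fits the payload-size goal of this PR.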

jalvz · 2020-03-09T15:25:06Z

Awesome, thanks! I'll investigate the runtime bit, that would have to happen in a pipeline.

codecov-io · 2020-03-11T12:24:36Z

Codecov Report

Merging #3414 into master will increase coverage by 0.12%.
The diff coverage is 96.68%.

@@            Coverage Diff             @@
##           master    #3414      +/-   ##
==========================================
+ Coverage   79.29%   79.41%   +0.12%     
==========================================
  Files         109      109              
  Lines        5751     5772      +21     
==========================================
+ Hits         4560     4584      +24     
+ Misses       1191     1188       -3

Impacted Files	Coverage Δ
utility/data_fetcher.go	`63.27% <0.00%> (ø)`
model/metadata/metadata.go	`92.00% <71.42%> (-3.75%)`	⬇️
model/span/event.go	`85.31% <96.87%> (+0.17%)`	⬆️
beater/api/profile/handler.go	`85.47% <100.00%> (ø)`
model/context.go	`60.64% <100.00%> (+0.97%)`	⬆️
model/error/event.go	`96.72% <100.00%> (+0.01%)`	⬆️
model/message.go	`93.10% <100.00%> (ø)`
model/metadata/service.go	`91.89% <100.00%> (+0.11%)`	⬆️
model/metadata/user.go	`100.00% <100.00%> (ø)`
model/stacktrace.go	`87.50% <100.00%> (ø)`
... and 4 more

processor/stream/test_approved_es_documents/testIntakeRUMV3Errors.approved.json

jalvz · 2020-03-11T15:07:39Z

processor/stream/test_approved_es_documents/testIntakeRUMV3Events.approved.json

+                },
+                "id": "ec2e280be8345240",
+                "marks": {
+                    "a": {


I'll create a ticket to define these in a follow-up PR.

jalvz · 2020-03-11T15:10:22Z

processor/stream/processor.go

 				schema:       er.RUMV3Schema,
 				modelDecoder: er.DecodeRUMV3Event,
 			},
 		},
+		metadataSchema: metadata.RUMV3ModelSchema(),


The metricset model is missing here, I'll file a ticket for that to follow up in a separate PR

processor/stream/processor_test.go

simitt

The Intake JSON schema defines all of the fields that will be processed (with the exception of custom, which is not indexed). We explicitly allow additional fields to keep future versions of the agents compatible with older server versions. If additional fields are sent, they are neither processed nor indexed. With these changes, the Intake JSON schema for RUM validates only part of the fields that are processed: some fields were removed from the spec, but all Intake endpoints share the same model logic. Since the RUM endpoints are not protected, nothing prevents someone from sending fields that are processed but not validated against the JSON schema.
This might be a rather theoretical concern, so I am not asking for any related changes in this PR, but would like to get a discussion going about possible implications.
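
The concern can be illustrated with a toy example. This is a sketch under stated assumptions, not APM Server code: `validate`, `decodeService` and `rumV3ServiceSchema` are hypothetical, the `n` key is illustrative, and the validator only mimics a JSON schema that allows additional properties.

```go
package main

import "fmt"

func isString(v interface{}) bool { _, ok := v.(string); return ok }

// rumV3ServiceSchema lists the only properties this toy validator checks.
// Anything else is let through, mimicking a schema that allows additional
// properties. The "n" key is illustrative, not taken from the real spec.
var rumV3ServiceSchema = map[string]func(interface{}) bool{
	"n": isString,
}

// validate rejects declared properties of the wrong type and ignores the rest.
func validate(obj map[string]interface{}) error {
	for key, check := range rumV3ServiceSchema {
		if v, ok := obj[key]; ok && !check(v) {
			return fmt.Errorf("invalid type for %q", key)
		}
	}
	return nil
}

// decodeService stands in for model logic shared by all intake endpoints: it
// still reads ephemeral_id, which this PR removed from the RUM spec.
func decodeService(obj map[string]interface{}) map[string]interface{} {
	out := map[string]interface{}{}
	for _, key := range []string{"n", "ephemeral_id"} {
		if v, ok := obj[key]; ok {
			out[key] = v
		}
	}
	return out
}

func main() {
	// ephemeral_id is not declared in the toy schema, so its type is never checked.
	payload := map[string]interface{}{"n": "my-app", "ephemeral_id": 42}
	fmt.Println("validation error:", validate(payload))
	fmt.Println("decoded:", decodeService(payload))
}
```

In this toy setup the integer `ephemeral_id` passes validation and still reaches the decoded output, which is the gap described above.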

simitt · 2020-03-12T11:37:13Z

model/fields/rum_v3_mapping.go

+		}
+		return s
+	}
+}


How about checking shortFieldNames when creating the function rather than on every field name call? Something like:

	func Mapper(shortFieldNames bool) func(string) string {
		if shortFieldNames {
			return func(s string) string {
				if shortField, ok := rumV3Mapping[s]; ok {
					return shortField
				}
				return s
			}
		}
		return func(s string) string {
			return s
		}
	}

I think that is more complex, and the performance impact is negligible. I can benchmark it if you think it is worth it, though.

I expect only a couple of nanoseconds' difference per call when shortFieldNames == false, but it is called for every decoded field, which is potentially a couple of hundred for larger events. This won't make a big difference, especially as no allocations are involved, but the suggested solution is fairly straightforward, so I would rather go with that and avoid unnecessary access to the map.

All things being equal ("equal" meaning ±200ns), I'd rather choose the simplest implementation if you don't mind (or, put another way, avoid unnecessary complexity).
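
For readers following this thread, here is a self-contained sketch of the two variants being compared. The `rumV3Mapping` entries are only the three renames visible in the span spec diff (`duration`, `name`, `type`); the real table in `model/fields/rum_v3_mapping.go` is not reproduced here. `mapperPerCall` is a hypothetical name, and its body is one possible shape of the per-call check, not necessarily the code under review.

```go
package main

import "fmt"

// rumV3Mapping is an illustrative subset; only these three renames appear in
// the span spec diff in this PR.
var rumV3Mapping = map[string]string{
	"duration": "d",
	"name":     "n",
	"type":     "t",
}

// mapperPerCall (hypothetical name) evaluates the flag on every lookup.
func mapperPerCall(shortFieldNames bool) func(string) string {
	return func(s string) string {
		if shortFieldNames {
			if short, ok := rumV3Mapping[s]; ok {
				return short
			}
		}
		return s
	}
}

// Mapper evaluates the flag once and returns a specialised function, as
// suggested in the review comment.
func Mapper(shortFieldNames bool) func(string) string {
	if shortFieldNames {
		return func(s string) string {
			if short, ok := rumV3Mapping[s]; ok {
				return short
			}
			return s
		}
	}
	return func(s string) string { return s }
}

func main() {
	for _, name := range []string{"duration", "trace_id"} {
		fmt.Println(name, Mapper(true)(name), Mapper(false)(name), mapperPerCall(true)(name))
	}
}
```

Both variants return the same results; they differ only in where the flag is checked, which is the point of the disagreement above.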

docs/data/intake-api/generated/rum_v3_events.ndjson

simitt · 2020-03-12T11:43:46Z

docs/spec/errors/rum_v3_error.json

@@ -16,7 +16,7 @@
                    "type": ["string", "null"],


Have you thought about creating a dedicated docs/spec/rum_v3/ folder that contains all the specs? Not a requirement if you prefer to keep it as is, but I think it would be easier to track whether there are still references to unshortened specs, and generally easier to navigate.

I don't have a strong preference, but mixing two taxonomies could be more confusing... e.g. rum_v3_transactions could still fit in either transactions or rum_v3; I'm not sure why one would make more sense than the other...

It's the first time the endpoint and version are part of the spec name, therefore I think the suggested grouping makes sense. Just a suggestion though, up to you how to finally organize it.

model/fields/rum_v3_mapping.go

simitt · 2020-03-12T12:00:16Z

model/metadata/service.go

 	if input == nil || err != nil {
 		return nil, err
 	}
 	raw, ok := input.(map[string]interface{})
 	if !ok {
 		return nil, errors.New("invalid type for service")
 	}
+	field := fields.Mapper(hasShortFieldNames)


You could initialize the decoder with the mapping information decoder(fields.Mapper(hasShortFieldNames)), allowing to abstract away the mapper inside the decoding.

I tried that, but there are a lot of small functions (decodeHTTP, decodeService, decodeStacktrace, etc.) that would need to become methods instead, meaning they would not be reusable (e.g. duplicated metadata.DecodeService, transaction.DecodeService, span.DecodeService, etc.).
In the end I thought it was not worth it, and this produced the smallest diff.
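
The two shapes discussed here can be sketched side by side. Everything below is hypothetical and heavily simplified: `decodeServiceName`, `decoder` and `mapper` are illustrative names, and the single `name` to `n` entry is only an example mapping.

```go
package main

import "fmt"

var shortNames = map[string]string{"name": "n"} // illustrative entry only

// mapper returns a field-name translator for the given mode.
func mapper(short bool) func(string) string {
	if short {
		return func(s string) string {
			if v, ok := shortNames[s]; ok {
				return v
			}
			return s
		}
	}
	return func(s string) string { return s }
}

// Roughly the shape kept in the PR: a free function builds the mapper itself,
// so it stays reusable from any package that knows the flag.
func decodeServiceName(raw map[string]interface{}, hasShortFieldNames bool) string {
	field := mapper(hasShortFieldNames)
	name, _ := raw[field("name")].(string)
	return name
}

// Roughly the suggested alternative: the decoder carries the mapper, so
// decoding code only ever mentions long names.
type decoder struct{ field func(string) string }

func (d decoder) String(raw map[string]interface{}, key string) string {
	s, _ := raw[d.field(key)].(string)
	return s
}

func main() {
	raw := map[string]interface{}{"n": "my-app"}
	fmt.Println(decodeServiceName(raw, true))
	fmt.Println(decoder{mapper(true)}.String(raw, "name"))
}
```

The second shape hides the mapping, but every helper that reads a field has to become a method on the decoder or receive it as an argument, which is the refactoring cost described in the reply.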

processor/stream/test_approved_es_documents/testIntakeRUMV3Errors.approved.json

testdata/intake-v3/rum_events.ndjson

jalvz · 2020-03-12T14:45:44Z

This might be a rather theoretical concern, so I am not asking for any related changes in this PR, but would like to get a discussion going about possible implications.

So, for my understanding, what are those implications?

simitt · 2020-03-12T15:37:17Z

model/context.go

@@ -125,7 +125,7 @@ func DecodeContext(input interface{}, cfg Config, err error) (*Context, error) {
 	}

 	decoder := utility.ManualDecoder{}
-	field := fields.Mapper(cfg.HasShortFieldNames)
+	field := field.Mapper(cfg.HasShortFieldNames)


The var field is shadowing the package name now. How about changing the variable to mapper?

Yeah... amended
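
For context on why the rename matters: Go accepts a local variable that carries the name of an imported package, but the package then becomes unreachable for the rest of that scope. A small sketch using the standard `strings` package in place of the PR's own package; `shadowed` and `renamed` are hypothetical function names.

```go
package main

import (
	"fmt"
	"strings"
)

// shadowed mirrors the pattern flagged in the review: the variable takes the
// package's name. This compiles, because the new identifier's scope starts
// only after the declaration, so the right-hand side still means the package.
func shadowed(s string) string {
	strings := strings.ToUpper
	// From here on, `strings` is the function value; a call such as
	// strings.TrimSpace(s) would no longer compile in this scope.
	return strings(s)
}

// renamed keeps the package usable by picking a distinct variable name.
func renamed(s string) string {
	upper := strings.ToUpper
	return upper(strings.TrimSpace(s))
}

func main() {
	fmt.Println(shadowed("rum"), renamed(" v3 "))
}
```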

simitt · 2020-03-12T15:43:00Z

Created #3481 to bundle the discussion started above and not block this PR.

jalvz added in progress [zube]: In Progress labels Mar 3, 2020

jalvz force-pushed the shorten-rum-fields branch from be26e17 to 681a56f Compare March 3, 2020 15:18

vigneshshanmugam reviewed Mar 3, 2020


jalvz force-pushed the shorten-rum-fields branch 3 times, most recently from 08ead1c to 4090f73 Compare March 11, 2020 12:11

jalvz force-pushed the shorten-rum-fields branch from 4090f73 to 7854c89 Compare March 11, 2020 15:00

jalvz changed the title ~~[WIP] Shorten field names in the RUM V3 spec~~ Shorten field names in the RUM V3 spec Mar 11, 2020

jalvz added [zube]: In Review and removed [zube]: In Progress in progress labels Mar 11, 2020

jalvz requested a review from a team March 11, 2020 15:11

jalvz mentioned this pull request Mar 12, 2020

[RUM v3] Full golden test coverage #3476

Closed

simitt requested changes Mar 12, 2020


simitt approved these changes Mar 12, 2020


simitt mentioned this pull request Mar 12, 2020

Figure out implications of missing fields in JSON schema #3481

Closed

jalvz force-pushed the shorten-rum-fields branch from f55c541 to 5480fb2 Compare March 18, 2020 14:48

jalvz added 2 commits March 18, 2020 16:25

Update RUM V3 specs with short field names

ffb6a4c

Update changelog

efe8ca1

jalvz force-pushed the shorten-rum-fields branch from ec85678 to efe8ca1 Compare March 18, 2020 15:25

jalvz merged commit a40e214 into elastic:master Mar 18, 2020

zube bot added [zube]: Done and removed [zube]: In Review labels Mar 18, 2020

jalvz mentioned this pull request Mar 19, 2020

[7.x] Update RUM V3 specs with short field names (#3414) #3511

Merged

vigneshshanmugam mentioned this pull request Apr 27, 2020

Payload Optimisation phase I elastic/apm-agent-rum-js#768

Closed

jalvz mentioned this pull request May 11, 2020

Test (manually) the RUM V3 endpoint #3786

Closed
