-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add profiles proto #488
Add profiles proto #488
Conversation
@petethepig we have just discussed this in the TC meeting. We would like to give you a separate repo in Otel github org, a fork of this one, where you can experiment freely with the profile protos. We will give you approving/merging rights on this new repo, so that you can also iterate faster and don't need to wait for approvals/merging from OTLP maintainers. You can also fork Collector and easily change it to consume proto file files from your fork instead of this repo and do your experimentation in the Collector too. When the experiments are concluded you can do a cleanup on your fork and then we will merge back the finalized version of profile protos back into this repo. What do you think? cc @jsuereth |
} | ||
|
||
message Sample { | ||
repeated uint64 location_ids = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if those should be int32/uint32 I don't think we'll overflow this specially since they refer slice/array indexes.
This is not super important for the wire format since proto will varint encode it but for the collector this will allow it to buffer more as the memory footprint will be lower per profiles.
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January 1970. | ||
// | ||
// This field is semantically required and it is expected that end_time >= start_time. | ||
fixed64 end_time_unix_nano = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wouldn't be better to have a duration instead. I also wonder what happens for cumulative profile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 2 questions:
- Can you elaborate on why it would be better to have a duration?
- Why is this going to be different for cumulative profiles? I'm not following
Btw I copied this from trace proto, so there's value in keeping like this for consistency.
// fields from one of these will be embedded in Profile message: | ||
oneof alternative_profile { | ||
opentelemetry.proto.profiles.v1.alternatives.pprof.Profile pprof = 8; | ||
opentelemetry.proto.profiles.v1.alternatives.normalized.Profile normalized = 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm waiting to see some benchmarks, but I do think the normalized
is the way to go.
With a lot of samples (we can expect ~10k per profiles) it avoid duplications. It will most likely have a smaller memory footprint and faster marshalling/unmarshalling.
It's also closer to pprof which is quite popular for profiling data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyriltovena Explain the math behind that 10k, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should 'pprof' even be listed as an alternative here? It doesn't cleanly support features that we'd like and we can't modify it (it's no longer pprof if we do).
We can however have pprof be supported as a "pass-through" protocol that gets converted to the profiling format, together with JFR (e.g. original_payload
field).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhalliday picked couple of pprof in my dev cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christos68k Yeah, pprof was included as a reference so that we'd be able to easily compare it to other ones. Long term plan is to keep just one of these 3 formats, and based on the discussions so far it seems like it will be based on the normalized
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work @petethepig && @felixge 🙏
// The instrumentation scope information for the profiles in this message. | ||
// Semantically when InstrumentationScope isn't set, it is equivalent with | ||
// an empty instrumentation scope name (unknown). | ||
opentelemetry.proto.common.v1.InstrumentationScope scope = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstrumentationScope
is a logical unit for application code. Wouldn't this limit whole system profiles down to application profiles?
https://opentelemetry.io/docs/specs/otel/glossary/#instrumentation-scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this part from other signals, like traces or logs — so if this is truly a limitation then I think it should be discussed in the broader OTEL context.
I don't think it is a limitation though — instrumentation scope describes 2 things:
- instrumentation scope name and version
- attributes
I think it is intended that scope name and version would be the same for system-wide profile. RE attributes, the format in its current state is very flexible when it comes to assigning attributes, for example, clients can create multiple separate profiles within one ScopeProfiles
, each with it's own set of attributes, allowing for "per application" attributes within one instrumentation scope.
message Sample { | ||
// The ids recorded here correspond to a Profile.location.id. | ||
// The leaf is at location_id[0]. | ||
repeated uint64 location_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To uniquely identify a stack trace across a complete deployment, using a 128 bit value would be great. In one of the first meetings of the group, @thomasdullien presented our custom format and explained why we went with 128 bit identifiers instead of 64 bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this ID would be to reference locations inside this profile. Are you thinking about global addressing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes - this comment would also fit to Location.id
just like #488 (comment).
One of the motivations is really to avoid collisions in global addressing. Even if the same code is compiled, differences in the build systems/compilers are important and can make a difference.
int64 num_unit = 4; // Index into string table | ||
} | ||
|
||
message Mapping { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For postprocessing it might be great for the backend to have additional (optional) information. Additional information can be something like, if this mapping belongs to some interpreter.
Maybe something like:
repeated Label label = 11;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it to Location
as well. My logic here is that the same argument can be made for adding it to Location
, so I figured it makes more sense to include it there. I made it a reference to an AttributeSet
, same way the linking is done from Sample
.
@florianl what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My instinct is also for the normalized.
I'd like to make sure that we always use the suffix _index
for things that are array indices, and reserve _id
for things that are likely hashes or other pseudorandom values. I'd also strongly push for using 128-bit values for _id
s, always and everywhere, lest we create issues for everybody when collisions arise. 64-bit values collide randomly at data quantities that are absolutely the norm these days.
// that is most useful to humans. There should be enough | ||
// information present to determine the original sampled values. | ||
// | ||
// - On-disk, the serialized proto must be gzip-compressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason to use gzip, when pareto-improved compression methods are available? (e.g. algorithms that outperform gzip compression at less CPU usage at every level?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pprof proto and I only included pprof proto as an example.
Either way, from my perspective, compression is something that is broadly applicable to the whole OTEL / OTEL collector universe and therefore we should bring these kinds of things up to the broader OTEL collector maintainers group.
// The instruction address for this location, if available. It | ||
// should be within [Mapping.memory_start...Mapping.memory_limit] | ||
// for the corresponding mapping. A non-leaf address may be in the | ||
// middle of a call instruction. It is up to display tools to find |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to specify precisely what "in the middle" means? I'd prefer not to give implementers the choice between 3 different values...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment comes from pprof proto. My understanding is that this comment refers to the limitation of some profilers where they are not always able to get exact instruction address for a given location. So it's less about what the implementer should put into this field and more of a word of caution for implementers of display tools. I might be totally wrong though — this is getting a little outside of my area of expertise, so I will need help from someone.
// should be within [Mapping.memory_start...Mapping.memory_limit] | ||
// for the corresponding mapping. A non-leaf address may be in the | ||
// middle of a call instruction. It is up to display tools to find | ||
// the beginning of the instruction if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Display tools may not have the ability to find the beginning of the instruction unless the "in the middle" above is specified precisely.
// for the corresponding mapping. A non-leaf address may be in the | ||
// middle of a call instruction. It is up to display tools to find | ||
// the beginning of the instruction if necessary. | ||
uint64 address = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to specify exactly how deep into the call instruction the address is allowed to be, display tools do not have access to the assembly bytes and hence cannot do more than simple arithmetic to find the beginning of the instruction.
// The id of the corresponding profile.Function for this line. | ||
uint64 function_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an index into the other array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should rename _id
to _index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes a lot more sense — changed it to _index
and _indices
everywhere.
// The number of events between sampled occurrences. | ||
int64 period = 12; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the comment be clarified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is sample rate basically. I'm not going to edit this one because this is copied from pprof spec. I will make sure we document it well in the final otel proto spec though.
// Unique nonzero id for the location. A profile could use | ||
// instruction addresses or any integer sequence as ids. | ||
uint64 id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an argument to be made for a 128-bit id here, as follows:
- Instruction address cannot be used as an ID, as in JIT scenarios, the same address can be occupied by different pieces of code; likewise in a scenario where modules are loaded / unloaded.
- The way to encode things without keeping too much state is therefore hashing other parts of the code into the ID.
- 64-bit IDs are too short for hashing in the profiling case, as the odds of collision are uncomfortably high even for 100m items (0.02% collision probability even at 100m items, 2 in 10000).
// The id of the corresponding profile.Function for this line. | ||
uint64 function_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that an ID or an index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index in that profile.
// Unique nonzero id for the function. | ||
uint64 id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that an ID or an index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pprof it is an ID. And there is some confusion right now because usually IDs and indexes are identical, but not always. The plan for the new spec is that we're going to remove ids and rely on indexes. This way there's one way to address objects and so there's going to be:
- less confusion
- better performance (array lookups are faster than search for ID)
// Copyright 2019, OpenTelemetry Authors | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cut & paste issue about Copyright year.
|
||
message Sample { | ||
repeated Location locations = 1; | ||
repeated Link links = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be repeated?
|
||
message Sample { | ||
repeated uint64 location_ids = 1; | ||
repeated uint64 link_ids = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
embed these in attributes?
} | ||
|
||
message Sample { | ||
repeated uint64 location_ids = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experiment with storing these as a tree (encoded as 2d array of integers)
// * alloc_bytes | ||
// * inuse_objects | ||
// * inuse_bytes | ||
repeated fixed64 values = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte-seconds might overflow when aggregated, consider using double floats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not be fixed64
|
||
|
||
// borrowed from pprof proto | ||
message Mapping { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Florian: should stay for profiles that don't have symbols
// it could be the contents of the .note.gnu.build-id field. | ||
int64 build_id = 5; // Index into string table | ||
|
||
// The following fields indicate the resolution of symbolic info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change these to 1 enum or a bit-field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aalexand Can you provide the list of combinations that are valid? I remember you mentioned that not all 16 combinations are valid, but also 4-way enum won't cut it either, so maybe we can make this an enum with more than 4 but less than 16 combinations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the combinations encountered in practice are these - see below.
has_functions | has_filenames | has_line_numbers | has_inline_frames | Description |
---|---|---|---|---|
true | false | false | false | A C++ / native binary with just symbol table (.symtab), no DWARF. |
true | true | true | true | A C++ / native binary symbolized using profiler-friendly (i.e. at least -gmlt ) DWARF. |
true | true | true | false | Full symbol information but no distinction of inline frames - e.g. from Java. Can also correspond to an exotic case of using separate file DWARF, but not providing the *.dwp file. |
false | false | false | false | Unsymbolized mapping |
I would appreciate if someone also familiar with the subject could verify this from their perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aalexand thank you! I was trying to make an enum out of this, and here's what I have so far:
enum SymbolicInfo {
SYMBOLIC_INFO_UNSPECIFIED = 0;
SYMBOLIC_INFO_FULL = 1;
SYMBOLIC_INFO_FUNCTIONS_ONLY = 2;
SYMBOLIC_INFO_NO_INLINE_FRAMES = 3;
}
As I was doing this one thing I thought of is do we need these booleans / enums at all? My understanding is that these booleans are useful when combining a profile without symbols with symbols information (e.g in DWARF format). But I would assume that when you have symbols in some different format (let's say DWARF), you already know if it has functions / line numbers / etc. So it feels like maybe we don't need these booleans / enums. Unfortunately, my understanding of all of this is still pretty limited, so I would love your input on this.
Btw, I checked google/pprof repo, and it looks like out of these 4 booleans, the only one that's meaningfully used is HasFunctions one, and even then there's only one mention (symbolz.go
). The other fields are just copied back and forth but not actually used it seems like.
cc @korniltsev
TODO: add a design decision log? |
@tigrannajaryan I created https://github.com/open-telemetry/opentelemetry-proto-profile for the profiling group to own and experiment with. So far I only add @Rperry2174 as a maintainer. I would add @petethepig but he's not a member of the OpenTelemetry group yet (please follow this: https://github.com/open-telemetry/community/blob/main/community-membership.md#community-membership, I can vouch for you as one of your two) |
@petethepig please re-create this PR in https://github.com/open-telemetry/opentelemetry-proto-profile I am going to close this PR and you can continue discussing it in the new repo. |
This PR is moved to a dedicated opentelemetry-proto-profile repo. Please start new conversations in that new repo.