[receiver/windowseventlog] Decouple rendering logic from 'raw' #34131
Comments
Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself. |
@djaglowski my understanding is that this can also help with issues like #32952, correct? |
I suppose by delaying parsing logic until later it may be easier to offer alternative parsing logic. Is that what you're getting at or something else? |
Yes, my assumption is that by delaying parsing logic and ensuring that we can handle all data via OTTL there won't be the need to keep changing formats as suggested in #32952. |
I think that the windows event xml is complex enough that there's value in having a dedicated parser for it, kind of like the new container parser but if it can be replaced by granular parsing operations then I'm not necessarily opposed. |
Looking into this a bit more, it's not entirely clear to me that it is possible to parse the raw format. The problem is that both raw and formatted logs are created from syscalls which require an event handle, but the event handle is not available after we emit the log record. There might be some way to recreate the equivalent logic, but my interpretation at this point is that the events may need to be rendered in the receiver. This would mean that users needing both formats (e.g. for different backends) must set up two receivers to read the same data. Perhaps a better alternative here is to allow one payload to carry both formats (likely by replacing the @pjanotti, do you reach the same conclusion or am I missing a way to postpone parsing? |
No, @djaglowski, you are not missing a way to postpone that. I was working from the mental model of ETW events, but here we are dealing with Event Logs. For Event Logs, the API only gives the opaque handle and returns XML from it. The XML has the same schema for raw and formatted; the raw just doesn't have the `RenderingInfo` element. Same event, as body of raw and formatted:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='otelcorecol' />
<EventID Qualifiers='0'>1</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime='2024-08-08T22:20:04.7760478Z' />
<EventRecordID>3005055</EventRecordID>
<Correlation />
<Execution ProcessID='36124' ThreadID='0' />
<Channel>Application</Channel>
<Computer>MyComputer</Computer>
<Security UserID='S-1-5-21-1783686499-2158177463-2193993347-31799' />
</System>
<EventData>
<Data>Creating event provider for 'otelcorecol'</Data>
</EventData>
</Event>

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='otelcorecol' />
<EventID Qualifiers='0'>1</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime='2024-08-08T22:20:04.7760478Z' />
<EventRecordID>3005055</EventRecordID>
<Correlation />
<Execution ProcessID='36124' ThreadID='0' />
<Channel>Application</Channel>
<Computer>MyComputer</Computer>
<Security UserID='S-1-5-21-1783686499-2158177463-2193993347-31799' />
</System>
<EventData>
<Data>Creating event provider for 'otelcorecol'</Data>
</EventData>
<RenderingInfo Culture='en-US'>
<Message>Creating event provider for 'otelcorecol'</Message>
<Level>Information</Level>
<Opcode>Info</Opcode>
<Keywords>
<Keyword>Classic</Keyword>
</Keywords>
</RenderingInfo>
</Event>
As you can see, the formatted event carries all the info from the raw one. I don't see a scenario where someone needs both. |
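To make the comparison concrete, here is a minimal Go sketch that unmarshals a trimmed copy of both payloads with the standard `encoding/xml` package and reports whether `RenderingInfo` is present. The `event` and `renderingInfo` types and the `parse` helper are hypothetical names made up for this illustration; they are not the receiver's own types, and only a few of the fields from the sample are modeled.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// renderingInfo models the optional <RenderingInfo> element (hypothetical type).
type renderingInfo struct {
	Culture  string   `xml:"Culture,attr"`
	Message  string   `xml:"Message"`
	Level    string   `xml:"Level"`
	Keywords []string `xml:"Keywords>Keyword"`
}

// event is a trimmed-down, hypothetical model of the event XML shown above.
type event struct {
	Provider struct {
		Name string `xml:"Name,attr"`
	} `xml:"System>Provider"`
	EventID string   `xml:"System>EventID"`
	Level   string   `xml:"System>Level"`
	Data    []string `xml:"EventData>Data"`
	// RenderingInfo stays nil when the element is absent (the raw form).
	RenderingInfo *renderingInfo `xml:"RenderingInfo"`
}

// rawXML is a shortened copy of the raw sample from the thread.
const rawXML = `<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='otelcorecol' />
    <EventID Qualifiers='0'>1</EventID>
    <Level>4</Level>
  </System>
  <EventData>
    <Data>Creating event provider for 'otelcorecol'</Data>
  </EventData>
</Event>`

// renderingXML is the extra element that only the formatted form carries.
const renderingXML = `  <RenderingInfo Culture='en-US'>
    <Message>Creating event provider for 'otelcorecol'</Message>
    <Level>Information</Level>
    <Keywords><Keyword>Classic</Keyword></Keywords>
  </RenderingInfo>
`

// formattedXML is the raw document with RenderingInfo spliced in before the closing tag.
var formattedXML = strings.Replace(rawXML, "</Event>", renderingXML+"</Event>", 1)

// parse unmarshals one event document into the hypothetical event type.
func parse(doc string) (event, error) {
	var e event
	err := xml.Unmarshal([]byte(doc), &e)
	return e, err
}

func main() {
	for _, doc := range []string{rawXML, formattedXML} {
		e, err := parse(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("provider=%s level=%s rendered=%t\n",
			e.Provider.Name, e.Level, e.RenderingInfo != nil)
	}
}
```

Both documents unmarshal into the same type, which matches the point that the schema is shared and the formatted form is a superset of the raw one.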
Thanks for the detailed example @pjanotti.
There are indeed times when both are needed. In short, it is necessary because some backends require the raw xml bytes, while others prefer that it be parsed ahead of time. I've described such cases in more detail here when arguing for a dedicated original body field (which ended up being a semantic convention). The examples you gave show important distinctions in terms of the information conveyed, but I'm more focused on the format in which the information is represented. The difference between raw and formatted is quite substantial in this regard. A raw log body is a |
Ah, I think I got it now @djaglowski In this case, my first reaction is to treat the XML as the raw log body, no matter if it contains the With the unified formatted body we could also use your suggestion to, optionally, pass the original XML (with or without |
Great, that sounds just right to me. Circling back to the configuration then, do you think we should have a
This feels a little more complex but I suppose it gives users the option to emit a formatted body without having to make the syscall to retrieve the rendering info. |
The current option
This is a larger change to current settings, but, I think it is more comprehensible and better reflects the options. |
I like your design. I think we could migrate to it relatively painlessly as long as each of the current behaviors has an equivalent to some combination of the settings. Even better if defaults for the new settings achieve the current default behavior. |
Circling back to the original proposal here, I think it should be possible to move "parsing" into a separate stanza operator, and eventually a processor or OTTL function. The input to this parser is an xml string, which can be unmarshaled into the Incorporating this into the design in your most recent comment, this would mean |
My understanding of the stanza operator is superficial, but, it seems reasonable: the lower level is always the XML, a stanza operator could transform it into the Some Qs:
|
I am imagining that
I think there should be a bool that automatically moves the original xml to |
Sounds reasonable to me @djaglowski |
…simplify internal logic. (open-telemetry#34720)
**Description:** This PR contains several changes described in open-telemetry#34131. It does not go as far as breaking out a separate parsing component, but I think it is enough to satisfy the known use cases.
- Add `suppress_rendering_info` parameter, which acts orthogonally to `raw` flag.
- Remove `RemoteServer` field from `EventXML`. Instead, set `attributes["remote_server"]` if remote collection is used.
**Link to tracking Issue:** Resolves open-telemetry#34131
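As a rough illustration of what "orthogonally" means here, the sketch below enumerates the four combinations of the two flags. The `config` struct, its Go field names, and the `describe` helper are assumptions invented for this example; only the setting names `raw` and `suppress_rendering_info` come from the PR description, and the outcome strings reflect one reading of the thread rather than confirmed receiver behavior.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the two receiver settings.
// The Go field names are assumptions; only the setting names come from the thread.
type config struct {
	Raw                   bool // corresponds to `raw`
	SuppressRenderingInfo bool // corresponds to `suppress_rendering_info`
}

// describe summarizes, under the assumptions above, what one combination emits.
// Each flag affects exactly one aspect of the result, which is what makes them orthogonal.
func describe(c config) string {
	body := "structured body"
	if c.Raw {
		body = "XML string body"
	}
	info := "with RenderingInfo"
	if c.SuppressRenderingInfo {
		info = "without RenderingInfo"
	}
	return body + ", " + info
}

func main() {
	for _, raw := range []bool{false, true} {
		for _, suppress := range []bool{false, true} {
			c := config{Raw: raw, SuppressRenderingInfo: suppress}
			fmt.Printf("raw=%t suppress_rendering_info=%t -> %s\n", raw, suppress, describe(c))
		}
	}
}
```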
Component(s)
receiver/windowseventlog
Is your feature request related to a problem? Please describe.
The receiver currently forces users to choose between raw xml events or parsed. There are cases where users may need both. (See open-telemetry/semantic-conventions#1217 and open-telemetry/opentelemetry-specification#3932.)
Describe the solution you'd like
Instead of forcing users to make a choice between raw or parsed, I propose that we standardize on raw within the receiver and separate the parsing functionality. Parsing can be provided as both a stanza operator and an OTTL function.
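A minimal sketch of what such a separated parsing step could look like, assuming a simplified log record with a body and an attribute map. The `record` and `parsedEvent` types and the `parseBody` function are hypothetical and exist only for this illustration; they are not the stanza or OTTL API. The `log.record.original` attribute key is the one named later in this issue.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// record is a hypothetical, simplified log record: a body plus attributes.
type record struct {
	Body       any
	Attributes map[string]any
}

// parsedEvent is an assumed minimal shape for the parsed body.
type parsedEvent struct {
	Channel  string   `xml:"System>Channel"`
	Computer string   `xml:"System>Computer"`
	Data     []string `xml:"EventData>Data"`
}

// sample is a shortened raw event, as the receiver would emit it in the body.
const sample = `<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Channel>Application</Channel>
    <Computer>MyComputer</Computer>
  </System>
  <EventData>
    <Data>Creating event provider for 'otelcorecol'</Data>
  </EventData>
</Event>`

// parseBody replaces a raw XML string body with its parsed form.
// When preserveOriginal is true, the untouched XML is kept under
// attributes["log.record.original"] so that both forms travel together.
func parseBody(r *record, preserveOriginal bool) error {
	raw, ok := r.Body.(string)
	if !ok {
		return fmt.Errorf("body is %T, want string", r.Body)
	}
	var ev parsedEvent
	if err := xml.Unmarshal([]byte(raw), &ev); err != nil {
		return err
	}
	if preserveOriginal {
		if r.Attributes == nil {
			r.Attributes = map[string]any{}
		}
		r.Attributes["log.record.original"] = raw
	}
	r.Body = ev
	return nil
}

func main() {
	r := record{Body: sample}
	if err := parseBody(&r, true); err != nil {
		panic(err)
	}
	fmt.Printf("body: %+v\n", r.Body)
	_, kept := r.Attributes["log.record.original"]
	fmt.Println("original preserved:", kept)
}
```

Because the step only needs the XML string, it can run anywhere downstream of the receiver, which is the property the proposal relies on.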
Suggested migration path:
- […] `raw` flag on the input operator to control whether this parser is embedded and used within the input operator. At this point there has been no change to user-facing functionality.
- […] (`wel.alwaysRaw`) controlling whether the raw flag may be used at all. In alpha stage, the flag may still be used.
- […] `raw` parameter. It may still be used, but requires disabling the feature gate.
Later:
- […] `attributes["log.record.original"]` (once the semantic convention is released)
Describe alternatives you've considered
No response
Additional context
No response