Proposal for more straightforward network metrics fields. #179
Mathieu, I disagree with your contention that the use of inbound and outbound change in the context of the connection direction. In normal network usage, including when looking at a host, the direction of the connection setup is distinct from the direction of the traffic. What you are measuring here is traffic: from the POV of the measured device, did it send the packet or did it receive the packet? The direction of the connection (i.e. who initiated it, and who was listening for a connection) is irrelevant. I find the new definitions much more vague and confusing. |
@farrp The issue we had with the inbound/outbound terminology is that several people on the team interpreted them as "inbound/outbound to my network" or "inbound/outbound to my service". That is generally harder to establish, and we can't know in Packetbeat, for example. From what I understand, that wasn't actually the intention of the fields in ECS, but the chosen names made it seem that way. We considered several alternative options, for example
@farrp Your answer seems to focus more on the host-based monitoring point of view. The situation is different for a network device (e.g. router or firewall), where packets are received and passed along, so they are essentially both inbound and outbound :-) One could say that only the total should be stored in this case, but this approach loses information. Which side is generating the most traffic? (e.g. small request that triggers a big transfer, such as a download vs big request like an upload, that triggers a small response). The proposed approach supports storing the details of both sides trivially. I agree that the more low level metrics may make it harder to figure out the big picture. For example, when inbound/outbound can be accurately determined for a given organization's situation, it will be more straightforward to figure out total inbound/outbound traffic, for example. Nothing prevents someone from determining inbound/outbound however they like (host-based, network boundary-based), adding that to their events in addition to these raw fields. ECS is different from most other schemas, in that people can add fields around the "official" fields :-) |
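To illustrate the last point above, here is a minimal sketch of a network boundary-based classification that could be added alongside the raw `source.*` / `destination.*` metrics. The function name, the list of internal CIDR ranges, and the returned labels are assumptions made for this example; they are not defined by this PR.

```python
import ipaddress
from typing import Iterable


def boundary_direction(
    source_ip: str, destination_ip: str, internal_cidrs: Iterable[str]
) -> str:
    """Classify traffic relative to a set of networks considered internal.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in internal_cidrs]

    def is_internal(ip: str) -> bool:
        address = ipaddress.ip_address(ip)
        return any(address in network for network in networks)

    source_inside = is_internal(source_ip)
    destination_inside = is_internal(destination_ip)

    if source_inside and destination_inside:
        return "internal"   # e.g. two containers on the same host
    if source_inside:
        return "outbound"   # leaving the boundary
    if destination_inside:
        return "inbound"    # entering the boundary
    return "external"       # neither end is inside the boundary
```

The raw byte and packet counters stay untouched; the label is extra context that each deployment can compute with whatever definition of "boundary" suits it.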
@webmat - not at all. My focus is actually more network-centric than server-centric. From the perspective of a network interface on a router, switch, server, whatever - the concept of receive and send are very straightforward and never change context. This appears to be what you are measuring so I fail to see the source of the confusion. You seem to be conflating the concepts of session and traffic. A session connection has a direction (from initiator to listener) and a relationship (client-server or peer-peer), but bytes and packets flow either in or out regardless of the session characteristics. |
@tsg I do get the difficulty in the Packetbeat scenario since it is watching the middle of a session. In that case though the source and destination make even less sense, unless you put it in the context of a session... but then what about ICMP and UDP? I agree you have a problem in this case with send and receive, but don't throw away perfectly good terms for one specific use case. In all situations except Packetbeat the terminology works fine.
It's not just Packetbeat, it's the same for Suricata, Zeek, and any other tool based on capturing the traffic. Also, consider for example the traffic between two Docker containers on the same host. Is that incoming or outgoing? It's really neither, it's "internal" from the host PoV, so using incoming/outgoing is going to be a source of confusion. Depending on what you consider the network border, the same can be applied to the communication between internal hosts, switches, etc.
Not sure I understand this, the new fields are not tied to sessions/connections any more than the previous fields were. In case of uni-directional traffic, only
We're just making a rename of |
I like the new approach. I'm looking at this primarily from a network-centric (i.e. network sensor via tap) point of view. As has been discussed in other issues, Even on an endpoint, when you have the netstat table, there's still a It all comes out in the wash when using |
I went back into source and looked it all in context and I realize I misunderstood the original comments. My apologies. |
Thanks for your feedback, @dcode. Yes, we realize that even if this may create duplication, adding client/server besides source/destination will likely simplify the consumption of the data later on. So this is still under serious consideration. Let us know what comes out of your experiments in tagging the local side of the connection. This is also something we need to look into, and find a good way to represent. |
I like I'm not as keen on ditching the |
For tools dealing with bidirectional streams, but having no exact idea about originator/inbound/server, neither really makes sense and is always up for interpretation. E.g. Packetbeat flows use source/destination, but it has another field to identify the originator of the current stream (at least internally), because it doesn't really know the difference between any of the 2 addresses. Field names like source/destination/inbound/outbound/client/server also imply some kind of known 'direction' or 'order'. This is where the confusion in Packetbeat comes from. Packetbeat treats streams as bidirectional, so as to have both directional stats in one document (some kind of denormalization). It has no real idea about 'order', but rather tracks both addresses as endpoints (which form a 'session'). Unfortunately we didn't use There seems to be a preference in one or the other naming depending on actual use-case in mind (kind of device/network/software in use). Do we have a collection of these use-cases + proposed namings (+ documentation why)? Even if something sounds natural to you (for now), there will always be someone being confused if the scenario/deployment type is unknown. E.g. I really like the idea of inbound/outbound (or local/remote on a network level), but the order of these might be very well different from client/server or source/destination (on the connection level).
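The idea of tracking both endpoints of a stream and keeping both directional counters in one document can be sketched as follows. This is not Packetbeat's implementation: the `Flow` class, the `aggregate` function, and the "first sender seen becomes source" rule are assumptions chosen only to make the naming problem concrete.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class Flow:
    source: str        # endpoint seen sending first (a heuristic, not a fact)
    destination: str   # the other endpoint
    source_bytes: int = 0
    source_packets: int = 0
    destination_bytes: int = 0
    destination_packets: int = 0


def aggregate(packets: Iterable[Tuple[str, str, int]]) -> Dict[FrozenSet[str], Flow]:
    """Fold (sender, receiver, size) observations into bidirectional flows."""
    flows: Dict[FrozenSet[str], Flow] = {}
    for sender, receiver, size in packets:
        # The key ignores order, so both directions land in the same flow.
        key = frozenset((sender, receiver))
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = Flow(source=sender, destination=receiver)
        if sender == flow.source:
            flow.source_bytes += size
            flow.source_packets += 1
        else:
            flow.destination_bytes += size
            flow.destination_packets += 1
    return flows
```

If the capture starts mid-session, the first packet seen may come from the responder, so "source" here only names one end of the flow; it says nothing reliable about who initiated the connection.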
@dcode I'm having a look at mapping the Zeek conn log to ECS. Question: how do you populate the ECS fields:
LGTM. I think this should work well for unidirectional and bidirectional flow.
@MikePaquette Zeek/Bro actually uses source and destination in most of the detection types of logs and also unified2.log. It's used instead of (or in the case of notice.log to supplement) originator and responder for signature events in the signature.log, traceroute.log, notice.log, and notice_alarm.log in order to specify the direction of the connection that the triggered event took place. And I do add these to the |
@robgil @MikePaquette @ruflin I'm requesting each of your review, to make sure this is in line with everyone's expectation, based on recent discussions. Does this work for you? I'd like to merge this in this week. |
b8d2f86 to 524e7c4
@@ -5,6 +5,11 @@ All notable changes to this project will be documented in this file based on the

### Breaking changes

* Rename `network.total.bytes` to `network.bytes` and `network.total.packets`
I'm worried about this breaking change.
@andrewkroh How much will this affect packetbeat / auditbeat?
@webmat Did you check how much this affects Metricbeat / Filebeat?
It wasn’t previously used by Auditbeat or Packetbeat AFAIK. I have
elastic/beats#9121 to use the new network.bytes in packetbeat.
@MikePaquette @webmat @robgil With this PR we would break our previous promise not to introduce any further breaking changes. Are all of you aware of that? |
@ruflin The current state is pretty difficult to use, since it's using the terms inbound/outbound, which mean different things in different situations, and cannot be used in some other situations (e.g. networking device reporting on internal traffic). So in a sense yeah this is a breaking change vs the first Beta of ECS. However I would argue that this part of the spec was not really useable. I'd rather do this breaking change now, while we're still in Beta. |
524e7c4 to 60bfae7
Prior to this PR, the network metrics are defined like this:
network.inbound.bytes
network.inbound.packets
network.outbound.bytes
network.outbound.packets
network.total.bytes
network.total.packets
Discussions around ambiguity of inbound/outbound naming have come up multiple times,
more or less directly in #2, #51, #63, and at various other times internally.
The issues with inbound/outbound metrics:
has to change, depending on whether the host is receiving the connection inbound,
or initiating a connection outbound.
can usually be determined accurately, but the device has nowhere to store
metrics about internal only traffic.
to store metrics about the traffic.
The solution proposed here is to store metrics in fields that carry less ambiguous
meaning.
source.bytes = sent by source
destination.bytes = sent by destination
network.bytes = total
source.packets = sent by source
destination.packets = sent by destination
network.packets = total
The case where source and destination cannot be accurately determined is still
not fully addressed, other than setting network.direction:unknown. It may be
useful to eventually allow for storing the heuristic used to define which end
was assigned the name "source" and "destination".
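As a rough sketch of how the proposed fields relate to each other, the helper below builds the six metrics for one event. The function name is hypothetical, and representing unidirectional traffic by populating only the source side is an assumption of this example rather than something the PR specifies.

```python
from typing import Dict, Optional


def network_metric_fields(
    source_bytes: int,
    source_packets: int,
    destination_bytes: Optional[int] = None,
    destination_packets: Optional[int] = None,
) -> Dict[str, int]:
    """Build the flat metric fields for one event (illustrative only)."""
    fields = {
        "source.bytes": source_bytes,
        "source.packets": source_packets,
    }
    # Destination counters are optional (assumption: unidirectional
    # traffic only fills in the source side).
    if destination_bytes is not None:
        fields["destination.bytes"] = destination_bytes
    if destination_packets is not None:
        fields["destination.packets"] = destination_packets
    # network.* carries the total across both sides.
    fields["network.bytes"] = source_bytes + (destination_bytes or 0)
    fields["network.packets"] = source_packets + (destination_packets or 0)
    return fields
```

For example, a small request that triggers a large response would show a low `source.bytes`, a high `destination.bytes`, and their sum in `network.bytes`.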
This PR introduces breaking changes by removing the old fields.
network.total.bytes/packets to network.bytes/packets, to be as concise as the other new fields.
populated when the situation permits. If they can be populated reliably in a
given environment, they may help at visualizing inbound/outbound more directly.