Top level: "client" and "server" #63
Indeed, we've been pondering that already, after @robcowart's comment in this thread. I wonder, with this strategy, which side is being called "server" in cases where the infrastructure under management is calling out (e.g. calling an external API, triggering webhooks to arbitrary customer endpoints). In common parlance, my node generating the event would be the "client", and the remote (which I may or may not manage) would be the "server". Parallel to this, it may be worth mentioning that, for other reasons, we're starting to discuss doing classification of the IPs (local, private, public, multicast), which may help figure out which side is which.
Another note about this, more applicable to security, but also relevant when monitoring network gear: who's the client and who's the server if the flow event comes from an agent that's sitting in between? :-)
Basically, a server provides a service (a port or group of ports). Clients connect to those services. A server only responds to a client; it doesn't initiate conversations. Conversely, a client only listens for responses; it doesn't listen for arbitrary connection requests. The determination of client and server can be quite tricky if you don't have a record of the initial packet transmitted (such as the SYN packet sent to initiate the TCP handshake). 20 years ago you could be >90% accurate simply by assuming the lower port value is the server and the higher value is the client. However, with so many applications now listening on higher ports (e.g. ES 9200, LS 9600, Kafka 9092, etc.) you will get at best about 65% accuracy with this method. Many log sources are a bit more authoritative in this regard than flow records. Basically there isn't a single method that works; a combination of data-source-specific methods that arrive at a consensus is usually necessary. With the solution we provide to our paying customers, we find that we are about 95% accurate out-of-the-box. With some tuning (it can be customized) 98-99% is possible. @webmat can you provide a more specific example of what you are referring to regarding an "agent"?
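Purely as an illustration of the port heuristic described above, a minimal sketch in Python (the well-known-port set and the fallback rule are assumptions for the sketch, not anyone's production logic):

```python
# Illustrative port-based client/server guess. The port list and tie-breaking
# rules are assumptions for this sketch only.
WELL_KNOWN_SERVER_PORTS = {22, 25, 53, 80, 123, 443, 514, 3306, 5044, 9092, 9200, 9600}

def guess_server_side(src_port: int, dst_port: int) -> str:
    """Return 'destination' or 'source' depending on which side looks like the server."""
    if dst_port in WELL_KNOWN_SERVER_PORTS and src_port not in WELL_KNOWN_SERVER_PORTS:
        return "destination"
    if src_port in WELL_KNOWN_SERVER_PORTS and dst_port not in WELL_KNOWN_SERVER_PORTS:
        return "source"
    # Fall back to the classic "lower port is the server" rule, which is far
    # less reliable today (many services listen on high ports: 9200, 9600, 9092, ...).
    return "destination" if dst_port < src_port else "source"

# Example: a client on an ephemeral port talking to Elasticsearch on 9200.
assert guess_server_side(52344, 9200) == "destination"
```

In a real pipeline this would only be one vote in the kind of multi-method consensus described above.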
I will also add that local/private/public isn't much help when determining client/server. However, reserved multicast and broadcast IP and MAC addresses will always be associated with the server end of the conversation. This is one input for the "consensus" method we use.
The last point I will make is that it is not an either/or situation. While client/server is the preferred perspective for most use cases, src/dst is needed for some types of threat detection. Consider a few security-related analytics scenarios...
So, depending on what we are looking for, our analytics configuration will sometimes use src/dst and sometimes client/server.
The host running the API service is also generating logs. From that service's perspective, it's the `server`. In the events coming from your server, it's the `client`.
Whoever got the first SYN is the server. Generally, the lower port. [Edit: our pcap drop rates are really low, but not zero, so we might miss that SYN; see @robcowart's comment. Also, with UDP you don't even get that: for a UDP service, unless you do protocol inspection, you can't really know whether the packet you saw was the request or the answer.] The agent in the middle may not be able to tell, though. When we map …
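A small sketch of the SYN rule, under the assumption that the capture includes the first packet of the flow; the function and field names are illustrative:

```python
def roles_from_first_packet(tcp_flags: str, src: str, dst: str):
    """If the first packet we saw is a bare SYN, its sender is the client
    and its receiver is the server. Otherwise we can't tell from this alone."""
    if "S" in tcp_flags and "A" not in tcp_flags:   # SYN without ACK
        return {"client": src, "server": dst}
    return None  # missed the handshake (packet drop, mid-flow capture, UDP, ...)

print(roles_from_first_packet("S", "192.0.2.10", "198.51.100.5"))
```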
You are so right!
Honestly, I don't expect a flow's interpretation of `client` and `server` to be perfect. Most of the time we're going to use those tags, we'll be applying them to logs coming from things like web servers. From inside a web server's event feed, the `client` and `server` are unambiguous. It just makes a little space.
And for cyber reasons, having all of your servers (on whatever boxes) call all of their clients `client` …
I agree with you @ave19, for some data sources the client and server are clear. We still set source and destination fields, but will also set something like …
@robcowart interesting... we have lots of different kinds of feeds, so lots of different parsing logic. Most of the time, we can go straight into the …
Although I can definitely understand your points @ave19, for me source and destination are clearer and less confusing/ambiguous than client and server. When two applications are exchanging data through an ESB, I'd really prefer to be able to use source and destination objects in the ESB logs. But then again, I'm a system engineer, not a network engineer.
@willemdh I hear you, but think about UDP. Or think about DNS in particular. One system sends a request to another, and the other answers it. Do you swap source and destination when the answer comes back?
@ave19 Ok, I can definitely use the client/server approach for F5 / Palo Alto use cases. So, looking at #51, this is where ECS would go then:
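Purely as an illustration (the values are made up and the exact layout is not from this thread), an event carrying both pairs in the style discussed above might look like:

```python
# Illustrative event with both pairs of fields; values are made up.
event = {
    "source":      {"ip": "10.0.0.5",     "port": 49152},
    "destination": {"ip": "203.0.113.10", "port": 443},
    # Set only when the role can be determined reliably:
    "client":      {"ip": "10.0.0.5",     "port": 49152},
    "server":      {"ip": "203.0.113.10", "port": 443},
}
```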
Shouldn't we move …
Thanks for all the discussion above. My takeaway so far is that both perspectives have value. What if we have all 4? I personally like adding `client` and `server` …
Heh, um, at the risk of scuttling my own topic: I was poking at this today and decided that … To be clear, this is mostly about logs coming from that running instance of the service (i.e. Apache). … The logs from that service will leave artifacts that allow me to collect information about the `client`.
@robcowart what I meant by "agent" was simply a monitoring agent like Packetbeat. Perhaps a misnomer, because in some cases the event source will be a device itself being poked from the outside; I just meant whatever was collecting the traffic event data. Given the consensus above on how tricky it can be to reliably determine who's the server and who's the client, I think we don't have a choice but to keep source & destination. Then, in cases where we can reliably determine server/client, we can add the appropriate fields. Or were you actually removing src/dst whenever you were able to reliably determine srv/cli?
As I mentioned above, both can be valuable, depending on what you are trying to determine. I mentioned in another issue that I would prefer to have src/dst and then a flag field like … Until there is more flexibility, I will continue to tell myself "disks are cheap" and will value functionality and a great user experience over a few extra HDDs.
I agree, both are valuable. I also agree disks are cheap! In a network flow monitoring situation, the only thing you can really reliably know is `source` and `destination`. When will we get per-document field aliasing? 😄 That would be the best scenario. In a service monitoring situation, if the service has an open port and responds to queries, it's a straight `client`/`server` situation.
I like where this is going, but to throw a wrench in it, I'm a huge user of Bro data (and also Suricata). I like the connection top-level object concept, but Bro tracks "client" and "server" a little differently, as does Suricata. Bro calls whoever initiates the TCP/IP connection the "originator" and the other system in the conversation the "responder". Going a layer deeper, Bro will analyze the protocol, and for something like HTTP it will record the "originator" and "responder" of that protocol. In most common protocols, the originator is the same at the TCP/IP layer and the HTTP layer. In several protocols that's not guaranteed, e.g. SMTP or FTP; in those protocols, it's completely possible that the "responder" of the TCP/IP connection initiates the protocol as the "originator". All that said, I think it makes sense to manage "connection" data at only the TCP/IP layer (or equivalent transport protocol). If there's protocol-specific information that confirms the direction of the application protocol, that can be recorded in a protocol-specific subobject (i.e. …). Note that I'm not trying to get into a religious war over client/server vs. originator/responder; for the purposes of ECS, I think they're equivalent. Also of note, Suricata uses … All that said, I'm in favor of the following (where the semantics of packets/bytes mean that endpoint sent them): …
In any case, if I'm receiving packets via a tap or SPAN port, I have no idea which direction the traffic is going (inbound vs. outbound). EDIT: Added example data
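A sketch of the originator/responder mapping described above, using Zeek/Bro conn-log field names; treating the transport-layer originator as the client is an assumption that, as just noted, can be wrong for protocols like SMTP or FTP:

```python
def zeek_conn_to_roles(conn: dict) -> dict:
    """Map a Zeek/Bro conn record's originator/responder onto client/server.
    This only reflects the transport-layer initiator; application-layer
    direction may differ for some protocols."""
    return {
        "client": {
            "ip":      conn["id.orig_h"],
            "port":    conn["id.orig_p"],
            "bytes":   conn.get("orig_bytes", 0),
            "packets": conn.get("orig_pkts", 0),
        },
        "server": {
            "ip":      conn["id.resp_h"],
            "port":    conn["id.resp_p"],
            "bytes":   conn.get("resp_bytes", 0),
            "packets": conn.get("resp_pkts", 0),
        },
    }
```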
@dcode I use client/server determination with Suricata data here... I would appreciate hearing your feedback on how it is handled, and whether you see any issues.
This is not feedback, just a more precise pointer ;-) The client vs. server code starts at line 601 here. See also various places between lines 209 and 468 for the traffic locality determination.
One thing I like about it is that it's entirely based on information taken from the event itself (including some fast translate-based enrichment). It doesn't depend on doing an Elasticsearch search per event.
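The locality classification mentioned in this thread (local, private, public, multicast) can be approximated with Python's standard ipaddress module; a minimal sketch, with category names chosen for illustration (this is not the logic of the linked config):

```python
import ipaddress

def classify_ip(ip_str: str) -> str:
    """Rough locality classification: loopback, link-local, multicast, private, or public."""
    ip = ipaddress.ip_address(ip_str)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_multicast:
        # Per the earlier comment, multicast/broadcast addresses sit on the
        # "server" end of the conversation.
        return "multicast"
    if ip.is_private:
        return "private"
    return "public"

print(classify_ip("10.1.2.3"))     # private
print(classify_ip("224.0.0.251"))  # multicast (e.g. mDNS)
print(classify_ip("8.8.8.8"))      # public
```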
@ave19 To answer your question on aliasing, here's the progress so far. The concept of an alias is available in recent builds, but still incomplete (in my opinion) for what we're trying to achieve. So if you use a recent build of Elasticsearch, you can search in Kibana -- and even leverage the new auto-complete -- based on your "original" field just as much as your alias. What's still missing is the ability to display based on the alias name. Your visualizations and API results will only contain the original field. I haven't checked yet whether the response includes a mapping of the aliases, so clients could handle this however they want. I suspect the alias mapping is not returned yet either.
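For context on the alias field type being discussed, a minimal index-mapping sketch written as a Python dict (the index layout and which field aliases which are illustrative choices):

```python
# Mapping sketch: "client.ip" defined as an alias of "source.ip".
# This body would be sent when creating the index (e.g. via curl or a client library).
mapping = {
    "mappings": {
        "properties": {
            "source": {
                "properties": {"ip": {"type": "ip"}}
            },
            "client": {
                "properties": {
                    # Queries against client.ip resolve to source.ip at search time.
                    "ip": {"type": "alias", "path": "source.ip"}
                }
            },
        }
    }
}
```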
Sounds like progress! Thanks
This discussion triggered a more general question on my end about what our "standard" is for reusing/composing objects. I opened a related issue so as not to mix it with this discussion: #71
+1 for having server, client, source, and destination. I can imagine some application logs may require all of them (DHCP, for example). Also, web application logs contain client and server (source and destination would be an odd fit here).
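As a made-up illustration of the DHCP case, where all four objects can carry distinct values once a relay agent is involved (field names and values are assumptions, not an agreed layout):

```python
# Made-up DHCP event for a relay-forwarded exchange: the packet's
# source/destination are the relay and the DHCP server, while the client of
# the DHCP conversation is the requesting host (identified here by MAC,
# since it may not have an IP address yet).
dhcp_event = {
    "source":      {"ip": "10.0.1.1",  "port": 67},   # relay agent
    "destination": {"ip": "10.0.0.10", "port": 67},   # DHCP server
    "client":      {"mac": "00:11:22:33:44:55"},      # requesting host
    "server":      {"ip": "10.0.0.10", "port": 67},
}
```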
So, pre 1.0-beta, I implemented as many ECS and ECS-friendly items as I could for RockNSM. In light of a firm decision, I went with … Now, understandably, having IPs in different fields makes it more difficult to build dashboards and such. In my final Logstash enrichment for generic ECS data, I added an additional field called … I'm not proposing we make …
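A sketch of that kind of aggregate enrichment, written here as a plain Python function rather than Logstash, with an illustrative field name since the original one does not appear above:

```python
def add_all_ips(event: dict, field: str = "related_ips") -> dict:
    """Copy every *.ip value (source/destination/client/server) into one array
    field, so a single dashboard filter can match any of them.
    The field name 'related_ips' is illustrative, not a decided ECS name."""
    ips = {
        event[obj]["ip"]
        for obj in ("source", "destination", "client", "server")
        if obj in event and "ip" in event[obj]
    }
    event[field] = sorted(ips)
    return event
```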
I love the idea of supporting …
Hey! We do a lot of network flow work. We have a sort of issue using "source" and "destination" because flow data comes in both directions and we get records for each. The data for a single session might look like:
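For illustration only (values made up), the two per-direction records for one session might look like:

```python
# Two flow records for the same HTTPS session, one per direction (made-up values).
flow_a_to_b = {"source":      {"ip": "10.0.0.5",     "port": 49152},
               "destination": {"ip": "203.0.113.10", "port": 443},
               "bytes": 1400}
flow_b_to_a = {"source":      {"ip": "203.0.113.10", "port": 443},
               "destination": {"ip": "10.0.0.5",     "port": 49152},
               "bytes": 5200}
```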
So that's a problem for us. The concepts of source and destination really only apply on a packet scale anyway. We'd like to normalize both of the records into:
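Again with made-up values, the normalized client/server form of the same session:

```python
# Both directions collapsed into one client/server view (made-up values).
session = {
    "client": {"ip": "10.0.0.5",     "port": 49152, "bytes": 1400},
    "server": {"ip": "203.0.113.10", "port": 443,   "bytes": 5200},
}
```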
This would also sort through things like DNS requests and other services that open a port.
Thoughts about that?