
Connection specific info field #40

Closed
vbohata opened this issue Jul 8, 2018 · 11 comments

@vbohata

vbohata commented Jul 8, 2018

Hello,

I often need to know (among other things) the source IP address of the agent that forwards logs to Logstash (the agent may be Beats, a syslog sender on some device, etc.). The logs usually contain information about connections (web proxy logs, firewall logs), so the "source" and "destination" fields should describe the connections recorded inside the logs themselves. The "host" field can identify the originator of the logs (for example, a switch). But I also need to know the IP of the log forwarder, so which field should it go in? Maybe there should be another field, named something like "conn", for connection-related info.

Consider the following scenario: ORIGINATOR DEVICE -> SYSLOG FORWARDER -> LOGSTASH
I think "host.ip" should hold the ORIGINATOR DEVICE IP, "source.*" and "destination.*" should hold parts of the log itself, and "conn" should hold the IP of the SYSLOG FORWARDER, i.e. the source IP of the connection made from the SYSLOG FORWARDER to LOGSTASH.
The same applies to forwarded Windows event logs and more ...
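A minimal Python sketch of the mapping proposed above, for illustration only. The `conn` field set and its `source_ip` key are the hypothetical names suggested in this comment, not existing ECS fields; the function name and the IP addresses (documentation ranges) are also made up.

```python
# Hypothetical mapping for ORIGINATOR DEVICE -> SYSLOG FORWARDER -> LOGSTASH.
# "conn" is the field set proposed in this comment, not an existing ECS field set.


def map_forwarded_event(originator_ip: str, forwarder_ip: str,
                        log_source_ip: str, log_destination_ip: str) -> dict:
    """Place each IP in the field proposed for it."""
    return {
        # Device that generated the log (e.g. a switch or firewall).
        "host": {"ip": originator_ip},
        # Connection described *inside* the log message.
        "source": {"ip": log_source_ip},
        "destination": {"ip": log_destination_ip},
        # Source IP of the TCP connection from the forwarder to Logstash.
        "conn": {"source_ip": forwarder_ip},
    }


if __name__ == "__main__":
    event = map_forwarded_event(
        originator_ip="192.0.2.10",         # ORIGINATOR DEVICE
        forwarder_ip="198.51.100.5",        # SYSLOG FORWARDER
        log_source_ip="203.0.113.7",        # client seen in the log line
        log_destination_ip="203.0.113.80",  # server seen in the log line
    )
    print(event)
```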

@willemdh
Contributor

willemdh commented Jul 9, 2018

Maybe network.forwarded_ip?

@MikePaquette
Contributor

@vbohata we defined the "device fields" to describe the entity that is actually forwarding the events to the Elastic Stack, when it is not the host on which the event happened. For example, an IDS, or network sensor, etc. In your case, the SYSLOG FORWARDER details should be mapped to the device.* fields.

Also, as you say, the ORIGINATOR DEVICE details should be mapped to the host.* fields, and the source.* and dest.* fields should capture the details of the log message.

Note that in many cases, some details in the host.* and source.* fields will be identical. This would allow any analysis content that uses either host fields or source fields to correctly identify the origin of the event.

Does this make sense?
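A small Python sketch of the mapping described in this reply, assuming the forwarder goes in `device.*`, the originator in `host.*`, and the logged connection in `source.*` / `destination.*` (the reply writes `dest.*`; the sketch uses the longer name). The `flatten` and `build_event` helpers are hypothetical and only serve to show the resulting dotted field names; they are not part of any Elastic API.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts into dotted keys, e.g. {"host": {"ip": x}} -> {"host.ip": x}."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def build_event(originator_ip: str, forwarder_ip: str,
                log_source_ip: str, log_destination_ip: str) -> dict[str, Any]:
    return {
        "device": {"ip": forwarder_ip},        # entity sending events to the Elastic Stack
        "host": {"ip": originator_ip},         # entity on which the event happened
        "source": {"ip": log_source_ip},       # details from the log message itself
        "destination": {"ip": log_destination_ip},
    }


if __name__ == "__main__":
    event = build_event("192.0.2.10", "198.51.100.5", "203.0.113.7", "203.0.113.80")
    for field, value in flatten(event).items():
        print(f"{field} = {value}")
```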

@vbohata
Author

vbohata commented Jul 10, 2018

The "device" field does not seem to be the correct field; its name is a little confusing for this use case. The docs describe the "TLS fields" as containing "the TLS related data about a specific connection", so those fields describe a connection. I therefore think the source IP of the connection should go under some other field set (together with the TLS fields). For example:

connection.source_ip
even for TLS it makes sense to put it under connection, like: connection.tls.....

This field would be clearer than device.ip. With device.ip I do not know which IP it is: the device can have tens of IPs, but I need to know the source IP used in the current TCP connection. How can I handle the situation with multiple IPs?

@MikePaquette
Contributor

@vbohata we agree that the device.* name is confusing, and we have been trying to think of a better name (suggestions welcome), but that is the field set that is intended to describe the attributes of the intermediary device that sends events to the Elastic Stack. Commonly, the device.ip would be populated with the management IP address of the sensor/SYSLOG FORWARDER.

There have been other discussions (e.g., #9) about creating a connection.* field set as an "anchor" for certain fields such as TLS and other flow-related or connection-related fields. Let's see how this develops.

The source.ip field should be populated with the actual IP address used as the source of the connection, regardless of how many interfaces a host may have.
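On the receiving side, the source IP of an incoming TCP connection is whatever the operating system reports as the peer address, and that already reflects the address the sender actually used, however many interfaces it has. A minimal Python sketch using only the standard `socket` module: a loopback listener stands in for Logstash and a thread stands in for the forwarder. The helper names are hypothetical, and nothing here is Logstash's own implementation.

```python
import socket
import threading


def receive_one(server: socket.socket) -> tuple[str, bytes]:
    """Accept one connection; return the peer IP and everything it sent."""
    conn, (peer_ip, _peer_port) = server.accept()  # IPv4 address is (host, port)
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # sender closed the connection
                break
            chunks.append(chunk)
    return peer_ip, b"".join(chunks)


def demo() -> tuple[str, bytes]:
    # Port 0 asks the OS for any free port on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]

        def forward() -> None:
            # Stand-in for the forwarder: connect, send one line, close.
            with socket.create_connection(("127.0.0.1", port)) as client:
                client.sendall(b"<134>sample forwarded log line\n")

        sender = threading.Thread(target=forward)
        sender.start()
        peer_ip, payload = receive_one(server)
        sender.join()
    return peer_ip, payload


if __name__ == "__main__":
    ip, data = demo()
    print(ip, data)
```

With IPv6 the address returned by `accept()` is a 4-tuple, so the unpacking would need adjusting; and if NAT sits between the forwarder and the receiver, the peer address is the translated one.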

@ruflin
Member

ruflin commented Jul 12, 2018

The way I understand the above, syslog is the agent here: https://github.com/elastic/ecs#agent In most cases I would think the agent IP is not relevant, but for the cases where it is, it could be put under agent.host.ip. The host object is reused here under the agent object.
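A minimal Python sketch of reusing the host field set under the agent object, as suggested here, so the forwarder's address lands in `agent.host.ip` while the originator stays in `host.ip`. The `host_fields` and `field_names` helpers, the host names, and the IPs are all hypothetical.

```python
from collections.abc import Iterator
from typing import Any


def host_fields(ip: str, name: str) -> dict[str, str]:
    """One definition of the host field set, reusable at any nesting level."""
    return {"ip": ip, "name": name}


def field_names(obj: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted name of every leaf field."""
    for key, value in obj.items():
        if isinstance(value, dict):
            yield from field_names(value, f"{prefix}{key}.")
        else:
            yield f"{prefix}{key}"


event = {
    "host": host_fields("192.0.2.10", "switch-01"),               # originator
    "agent": {"host": host_fields("198.51.100.5", "syslog-01")},  # forwarding agent
}

if __name__ == "__main__":
    print(sorted(field_names(event)))
```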

@MikePaquette
Copy link
Contributor

While there may be a syslog agent running on the host, the sensor/SYSLOG FORWARDER is a separate entity which should map to the device.* fields in the ECS model. This diagram shows what I mean.

[Screenshot (2018-07-12): diagram of the event flow, not reproduced]

If we don't use agent.* as a top level field set, then I think it would fit better to have host.agent.* rather than agent.host.*

@ruflin
Copy link
Member

ruflin commented Jul 13, 2018

It seems in the above setup the Syslog NG Server is only forwarding the event. Do we really need data about it?

What if we have 2-3 hops in between instead of only one. Do we want to store all this information?

Note: For the above I think device.* does not fit as when I think of device I think of hardware.

@MikePaquette
Copy link
Contributor

> It seems in the above setup the Syslog NG Server is only forwarding the event. Do we really need data about it?

Yes, we want to know a bit about the entity that is actually sending the data to the Elastic stack. In the case of a syslog server, we at least want to know about its IP address, identity, and location information. If it was a firewall or IDS, then we'd want to know more info, such as the rule set which was used to make a forwarding or detection decision. For example, an alert dashboard showing total event activity (e.g. network traffic, or intrusions detected) can be mapped to show which of my syslog servers is seeing the most activity, to help the analyst know how to take the next step in incident response.
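The dashboard idea above boils down to grouping events by the forwarding entity. A minimal Python sketch, assuming events are already parsed into nested dicts with the forwarder in `device.ip` (in Elasticsearch/Kibana the same question would typically be answered with a terms aggregation on that field). The function name and sample IPs are made up.

```python
from collections import Counter


def busiest_forwarders(events: list[dict], top: int = 3) -> list[tuple[str, int]]:
    """Count events per device.ip and return the `top` busiest forwarders."""
    counts = Counter(
        event["device"]["ip"]
        for event in events
        if "ip" in event.get("device", {})  # skip events without a forwarder IP
    )
    return counts.most_common(top)


if __name__ == "__main__":
    sample = (
        [{"device": {"ip": "198.51.100.5"}}] * 3
        + [{"device": {"ip": "198.51.100.6"}}]
        + [{"host": {"ip": "192.0.2.10"}}]  # no device.* fields at all
    )
    print(busiest_forwarders(sample))
```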

> What if we have 2-3 hops in between instead of only one. Do we want to store all this information?

I don't think so, but this is a good question. I was thinking that we capture just the entity that sends the event to the Elastic Stack, but what if we had an IDS, FW, or Bro sensor forwarding to a Syslog NG server? Then I think the Syslog server is just an aggregator, and is less important. We should think a bit more about this one. Note that we are not including network device hops here, only logical entities involved in event creation or processing.

> Note: For the above I think device.* does not fit as when I think of device I think of hardware.

As mentioned above, agreed that the device.* name can be confusing, and we have been trying to think of a better name (suggestions welcome). "Device" was chosen since this entity is commonly a firewall, network sensor, or intrusion detection system, all of which fit more nicely into the "device" category, and some existing schemas refer to this entity as a device. For example, Splunk CIM uses dvc to refer to the IDS device that detected an intrusion, the DLP system that detects data leakage, etc. In the diagram below, we see how a network sensor or IDS maps onto the device.* field set.

[Screenshot (2018-07-13): diagram of a network sensor/IDS mapped onto device.*, not reproduced]

@vbohata
Author

vbohata commented Jul 14, 2018

To clarify one of our use cases:

FIREWALL (or another network device such as a SWITCH or ROUTER or ...) -----> LINUX SERVER acting as a network log concentrator, which receives logs via rsyslog and stores them in local log files.
The LINUX SERVER also runs Filebeat, which reads the stored log files and sends them to Logstash.

So: FIREWALL -> LOG CONCENTRATOR (rsyslog + filebeat) -> LOGSTASH

So in reality the FIREWALL is the hardware "device" (not the log concentrator/forwarder). Here the "host" field should probably never be used as the top-level entity, because the FIREWALL is a host ... and the LINUX SERVER CONCENTRATOR is also a host.

Maybe device could be used only together with a custom second-level field that identifies the device or the device type itself, like:

device.originator.host.ip
device.relay.host.ip
To support multiple relays, device.relay would need to be an array, nested, or ... I don't know :)

I use the same field names (originator, relay) as https://tools.ietf.org/html/rfc5424; why reinvent the wheel?
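A minimal Python sketch of the structure proposed here, using the RFC 5424 roles: one originator and an ordered list of relays. The `device.originator` / `device.relay` layout is this comment's proposal rather than an ECS field set, and the helper names and IPs are hypothetical.

```python
from typing import Any


def build_chain_event(originator_ip: str, relay_ips: list[str]) -> dict[str, Any]:
    """Record the originator and every relay, in the order the message passed them."""
    return {
        "device": {
            "originator": {"host": {"ip": originator_ip}},
            # A list, so any number of relay hops can be stored.
            "relay": [{"host": {"ip": ip}} for ip in relay_ips],
        }
    }


def hop_path(event: dict[str, Any]) -> list[str]:
    """Return the originator IP followed by the relay IPs."""
    device = event["device"]
    return [device["originator"]["host"]["ip"]] + [
        relay["host"]["ip"] for relay in device["relay"]
    ]


if __name__ == "__main__":
    # FIREWALL -> LOG CONCENTRATOR (rsyslog + filebeat) -> LOGSTASH
    one_hop = build_chain_event("192.0.2.1", ["198.51.100.5"])
    # The same firewall behind two relays.
    two_hops = build_chain_event("192.0.2.1", ["198.51.100.5", "198.51.100.9"])
    print(" -> ".join(hop_path(one_hop)))
    print(" -> ".join(hop_path(two_hops)))
```

If each relay entry later carries more than one field (say an IP and a hostname), note that Elasticsearch indexes arrays of plain objects as flattened value lists and loses which values belonged to the same entry; the `nested` field type keeps them together, which bears on the "array, nested or ..." question above.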

@vbohata
Author

vbohata commented Jul 14, 2018

And maybe another field, device.collector.peer.host.ip, to identify the source IP address used to send logs from the relay device (it can be different, and sometimes it is useful to know both, or just device.collector.peer.host.ip).

@ebeahan
Member

ebeahan commented Aug 4, 2020

All - do we feel like this issue is still relevant?


5 participants