[RFC] Threat Intel - Stage 1 #1127
Merging Master - Stage 1
rfcs/text/0008-threat-intel.md
Outdated
* event.risk_score _risk score provided by threat intelligence source_
* event.original _raw intelligence event_

### Using existing ECS Fields nested under Threat.ioc.*
How about also adding the `process` fields to the possibilities of being nested? For example, a given `process.args` could be an IOC.
I originally dropped this in Slack, but I think it got lost.
On the bigger picture of "intel words and semantics", it makes sense to me that something like `W32.Trojan.22gp.1201`, as labelled perhaps by an AV engine or malware analysis feed (à la VirusTotal, ReversingLabs, etc.), should be assigned as a `rule.name`. The rationale being that there's clearly some sort of signature, whether atomic or behavioral.
Meanwhile, saying something is a variant of a given malware family is informative: it takes raw information and transforms it into intelligence, and it better aligns with ontologies like STIX or MAEC. I'm proposing something like:
{
"file": { "hash": "abcdef01234567890...", "name": "totally_bad.exe", ...},
"rule": [ {"author": "ALYac", "name": "Generic.Malware.SPV!PKprn.5A432451", "update": "20200915"} ],
"threat": {
"malware": {"name": "WickedBadness.A", "family": "WickedBadness"},
"intrusion_set": {"name": "APT127", ...}
}
}
Now, that may not align with where things are currently heading, and I'm not trying to open Pandora's box here... The above approach does a few things, IMO:
- It allows me to enrich data coming from various sources (i.e. my VT feed, file access data from an endpoint, or file transfer data from my proxy or IDS). The enrichment takes previous knowledge and tags this file, network, or process event with "threat" knowledge.
- It allows me to take a rule (IDS, AV, Elastic Signal, etc.) and say that "X" indicates `threat.malware.name: "WickedBadness.A"`, taking normal time series events and making the detection actionable with why we care about said detection.
- If the document only included the subset of information that provides a `file.hash.sha256` and `threat.malware.name`, the hash is immediately actionable in how we apply filters in Elasticsearch: it becomes a pivot field that is shared with my logging sources directly. I can click a single visualization on a Kibana dashboard, drill into documents about a given file transfer, and see in an adjacent visualization that the hash I clicked on is associated with a malware family or intrusion set.
Alternatively, if we nest the known file threat indicators under `threat.file.hash`, we either use lookup detection rules (which we could do in either case), or we rely on a third pivot field of `related.hash`, which duplicates data.
Just sharing some thinking... would love to discuss more in depth when the time is right. Right now may be premature, and if so... sorry for stirring the pot.
Just to distill this a bit, the IOCs should be actionable entities: an indicator. So `threat.malware.*` is context and descriptors of the IOC, but it's not an IOC in itself.
So while a lot of the fields are proposed under `threat.ioc.`, `threat.malware.{name,family,type}` makes sense to me to describe the indicator, which would likely be a hash value.
Discussed this in depth with @shimonmodi and got some additional context. I think `threat.ioc.*` makes sense from an enrichment perspective, but I think that, as stored in the "threat intel" index, keeping the shared top-level ECS fields makes the data more actionable from a retro-hunt perspective. Namely, if we get intelligence about events after they pass through the enrichment pipeline, those events will miss the threat data. Keeping them in the non-ioc fields allows hunting using dashboards and Discover. Copying the respective enrichment match to `threat.ioc.*` keeps the original document intact while also giving context on why we think this particular event is a threat.
I want to propose some hypothetical documents and how this might work. Visualizing what the data could be helps me better understand what we're trying to achieve. Here's an example process start event with file metadata (I added the file hash for the sake of the example):
{
"agent": {
"id": "0829aba6-34db-de36-1d42-30eac745e980",
"type": "endpoint",
"version": "7.10.0"
},
"process": {
"name": "svchost.exe",
"pid": 1644,
"entity_id": "MDgyOWFiYTYtMzRkYi1kZTM2LTFkNDItMzBlYWM3NDVlOTgwLTE2NDQtMTMyNDk3MTA2OTcuNDc1OTExNTAw",
"executable": "C:\\Windows\\System32\\svchost.exe"
},
"message": "Endpoint file event",
"@timestamp": "2020-11-17T19:07:46.0956672Z",
"file": {
"path": "C:\\Windows\\Prefetch\\SVCHOST.EXE-AE7DB802.pf",
"extension": "pf",
"name": "SVCHOST.EXE-AE7DB802.pf",
"hash": {
"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
}
},
"ecs": {
"version": "1.5.0"
},
"data_stream": {
"namespace": "default",
"type": "logs",
"dataset": "endpoint.events.file"
},
"host": {
"hostname": "WinDev2001Eval",
"os": {
"Ext": {
"variant": "Windows 10 Enterprise Evaluation"
},
"kernel": "1909 (10.0.18363.1139)",
"name": "Windows",
"family": "windows",
"version": "1909 (10.0.18363.1139)",
"platform": "windows",
"full": "Windows 10 Enterprise Evaluation 1909 (10.0.18363.1139)"
},
"ip": [
"192.168.93.145",
"10.203.18.111",
"fe80::4402:db6b:fd75:544",
"127.0.0.1",
"::1"
],
"name": "WinDev2001Eval",
"id": "5baae4dd-4abe-4ba5-a0fb-704d6e7a4328",
"mac": [
"00:0c:29:76:2b:0a",
"00:ff:7b:c6:4a:64"
],
"architecture": "x86_64"
},
"event": {
"sequence": 186465,
"ingested": "2020-11-17T19:08:01.275480865Z",
"created": "2020-11-17T19:07:46.0956672Z",
"kind": "event",
"module": "endpoint",
"action": "creation",
"id": "LurtOw/d18obyz+u++++/boE",
"category": [
"file"
],
"type": [
"creation"
],
"dataset": "endpoint.events.file"
},
"user": {
"domain": "NT AUTHORITY",
"name": "SYSTEM",
"id": "S-1-5-18"
}
}
Meanwhile you have the following document containing threat intel:
{
"file": {
"hash": {
"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
}
},
"threat": {
"malware": {
"name": "CryptoMinerX.B",
"family": "CryptoMinerX",
"type": "cryptominer"
},
"threat_actor": {
"name": "CryptoCurrency R Us",
"type": [
"criminal"
]
}
}
}
What I'd like to see then, is an enriched document like so (abbreviated):
{
"process": {
"name": "svchost.exe",
"pid": 1644,
"entity_id": "MDgyOWFiYTYtMzRkYi1kZTM2LTFkNDItMzBlYWM3NDVlOTgwLTE2NDQtMTMyNDk3MTA2OTcuNDc1OTExNTAw",
"executable": "C:\\Windows\\System32\\svchost.exe"
},
"message": "Endpoint file event",
"@timestamp": "2020-11-17T19:07:46.0956672Z",
"file": {
"path": "C:\\Windows\\Prefetch\\SVCHOST.EXE-AE7DB802.pf",
"extension": "pf",
"name": "SVCHOST.EXE-AE7DB802.pf",
"hash": {
"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
}
},
...
"threat": {
"ioc": {
"file": {
"hash": {
"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
}
}
},
"malware": {
"name": "CryptoMinerX.B",
"family": "CryptoMinerX",
"type": "cryptominer"
},
"threat_actor": {
"name": "CryptoCurrency R Us",
"type": [
"criminal"
]
}
}
}
DISCUSSION
Keeping the indicator in what I'll call the "actionable" ECS field (e.g. `file.hash.sha1`) allows me to search across all documents, including my logs and threat data. I can apply a single filter on that hash field. I can then visualize them all in a common dashboard to find events that occurred prior to receiving new threat intel.
Copying the matched IOC to the `threat.ioc` object in the enriched document answers the analyst question of "what specific information from this event indicates that it is a cryptominer?" It could just as easily have been the process arguments (missing from this example) or the path.
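To make the copy step concrete, here is a minimal Python sketch of that enrichment. It rests on assumptions: `get_path`, `set_path`, and `enrich_event` are hypothetical helpers, matching is done in memory on a single field rather than with the Elasticsearch `enrich` processor, and any existing `threat` object on the event is simply replaced.

```python
import copy


def get_path(doc, dotted):
    """Return the value at a dotted path such as 'file.hash.sha1', or None."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(doc, dotted, value):
    """Set a value at a dotted path, creating intermediate objects as needed."""
    keys = dotted.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def enrich_event(event, intel, indicator_field):
    """Hypothetical helper: if the event and the intel document share a value
    at indicator_field, return a copy of the event with the intel's threat.*
    context added and the matched indicator copied under threat.ioc.*.
    Without a match, the event is returned unchanged."""
    value = get_path(event, indicator_field)
    if value is None or value != get_path(intel, indicator_field):
        return event
    enriched = copy.deepcopy(event)
    enriched["threat"] = copy.deepcopy(intel.get("threat", {}))
    set_path(enriched, "threat.ioc." + indicator_field, value)
    return enriched


# Abbreviated versions of the example documents above.
event = {
    "process": {"name": "svchost.exe", "pid": 1644},
    "file": {"hash": {"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"}},
}
intel = {
    "file": {"hash": {"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"}},
    "threat": {"malware": {"name": "CryptoMinerX.B", "family": "CryptoMinerX"}},
}
enriched = enrich_event(event, intel, "file.hash.sha1")
```

The original `file.hash.sha1` stays where it was, so the same filter still hits both the event and the intel document, while `threat.ioc.file.hash.sha1` records which value triggered the match.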
This approach may complicate using the `enrich` processor on ingest. Namely, if we want to enrich the document with everything under `threat.*`, it's not just a single lookup. You'd have to look up, then rename some fields, which isn't awful, I think. Alternatively, the threat feed indexes could make `file.hash.sha1` an alias to `threat.ioc.file.hash.sha1`. This would require an alias for each indicator type field.
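As a rough illustration of the alias alternative, the sketch below builds index mappings in which each top-level indicator field is an Elasticsearch `alias` field pointing at its `threat.ioc.*` twin. The helper names, the `keyword` type for the concrete fields, and the choice of `url.full` as a second indicator are assumptions made for the example; nothing here is taken from the RFC's field definitions.

```python
import json


def nested_mapping(dotted, leaf):
    """Wrap a leaf mapping in nested 'properties' objects for a dotted path."""
    mapping = leaf
    for key in reversed(dotted.split(".")):
        mapping = {"properties": {key: mapping}}
    return mapping


def deep_merge(base, extra):
    """Recursively merge 'extra' into 'base' and return 'base'."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base


def alias_mappings(indicator_fields, prefix="threat.ioc."):
    """Build mappings where each top-level field aliases its threat.ioc.* twin.
    One alias entry is needed per indicator field, as noted above."""
    mappings = {}
    for field in indicator_fields:
        target = prefix + field
        # Concrete field that holds the data (keyword is an assumption).
        deep_merge(mappings, nested_mapping(target, {"type": "keyword"}))
        # Alias field that only redirects queries to the concrete field.
        deep_merge(mappings, nested_mapping(field, {"type": "alias", "path": target}))
    return mappings


mappings = alias_mappings(["file.hash.sha1", "url.full"])
print(json.dumps({"mappings": mappings}, indent=2))
```

Alias fields redirect queries and aggregations but are not part of `_source`, so this only helps with searching the threat index, not with what gets copied into an enriched event.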
Digging into the specifics of using the enrich processor, how would it handle multiple IOC matches? Say we also had an IOC that was `{"user": {"id": "S-1-5-18"}}`, which is obviously concocted; ideally, I'd like to see the threat data merged into one. It might imply a criminal crypto mining threat actor, and also a nation-state threat actor that's leveraging the tool with some other attack pattern.
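One possible shape for that merge, sketched in Python under assumptions: `merge_threat_context` is a hypothetical helper, every `threat.*` key becomes a list, and the second intel document (including its actor name) is invented purely for illustration. How the `enrich` processor itself would behave is left open here.

```python
def merge_threat_context(matches):
    """Hypothetical merge: collect threat.* context from several matching
    intel documents into lists, skipping exact duplicates and keeping
    first-seen order."""
    merged = {}
    for intel in matches:
        for key, value in intel.get("threat", {}).items():
            bucket = merged.setdefault(key, [])
            if value not in bucket:
                bucket.append(value)
    return merged


hash_match = {
    "file": {"hash": {"sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"}},
    "threat": {
        "malware": {"name": "CryptoMinerX.B", "family": "CryptoMinerX"},
        "threat_actor": {"name": "CryptoCurrency R Us", "type": ["criminal"]},
    },
}
# Invented second match, in the spirit of the concocted user.id IOC above.
user_match = {
    "user": {"id": "S-1-5-18"},
    "threat": {
        "threat_actor": {"name": "Example Nation-State Actor", "type": ["nation-state"]},
    },
}
merged = merge_threat_context([hash_match, user_match])
```

Note that arrays of objects like `threat_actor` here lose the association between sibling fields in Elasticsearch unless the field is mapped as `nested`, which is one reason a merge like this is not free.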
> If an IOC is a "bad IP" e.g. the essential information is the IP. But in the IOC index, should the value be captured in source.ip, destination.ip, client.ip or server.ip? All we need to do is capture "the IP" itself, then we look for it in the appropriate places, depending on the event source.

In the threat index this information could be stored in `threat.ioc.ip`. This goes for all "IOC types", IMO.
Storing them under `threat.ioc` in the threat index tells you the type of IOC and its value.
During enrichment you can then "add" the `threat.ioc.*` fields to tell the story of what was the match.

> I think a more concrete list of the actual descriptors / fields we expect to need would be helpful in figuring out the way forward.

As a start, existing ECS fields to be nested under `threat.ioc`:
Potential other additions:
- Registry
- User
- DNS
- Process
It's an extensive list; however, I do feel like all of these could be used as an IOC.
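A small Python sketch of the "capture the IP once, look for it in the appropriate places" idea. The candidate field list, the `match_ip_indicator` helper, and the sample addresses are assumptions for illustration; the point is that the indicator carries no direction, and the match result records which event field it was found in.

```python
IP_FIELDS = ["source.ip", "destination.ip", "client.ip", "server.ip", "host.ip"]


def get_path(doc, dotted):
    """Return the value at a dotted path such as 'source.ip', or None."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def match_ip_indicator(event, bad_ip, fields=IP_FIELDS):
    """Return the event fields whose value equals bad_ip.
    Array-valued fields (like host.ip) match if any element is equal."""
    hits = []
    for field in fields:
        value = get_path(event, field)
        if value == bad_ip or (isinstance(value, list) and bad_ip in value):
            hits.append(field)
    return hits


# Role-agnostic indicator, stored once under threat.ioc.ip.
indicator = {"threat": {"ioc": {"ip": "203.0.113.7"}}}
event = {
    "source": {"ip": "192.168.93.145"},
    "destination": {"ip": "203.0.113.7"},
}
matched_fields = match_ip_indicator(event, indicator["threat"]["ioc"]["ip"])
```

Here `matched_fields` is `["destination.ip"]`, which is the "story of what was the match" that could accompany the copied `threat.ioc.*` fields.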
During enrichment you can then "add" the threat.ioc.* fields to tell the story of what was the match.
I think this makes a lot of sense; knowing if the match was an "IP" or a "URL" or a "hash" is important.
@dcode @peasead - based on today's discussion with @MikePaquette, our next step is to move all actionable IOC information in threat intel documents to the top level, and not nest it under `threat.ioc.*` (as is currently proposed). We will be using `threat.ioc.*` for `file`, `url`, etc. for the enrichment use case.
> The bullet list below (`file.*`, `file.hash.*`, `url.*` ...) is useful to give us a general idea of the landscape of the types of IOCs. I think a more concrete list of the actual descriptors / fields we expect to need would be helpful in figuring out the way forward.

`threat.yml` and the table should now have the descriptors and fields requested.
https://github.com/elastic/ecs/blob/1347f2ba4c0ef6c00d5ffccee7fa2ad854f631d0/rfcs/text/0008/threat.yml
https://github.com/elastic/ecs/blob/1347f2ba4c0ef6c00d5ffccee7fa2ad854f631d0/rfcs/text/0008-threat-intel.md
@ebeahan is going to look at how non-directional network artifacts should be reflected, taking ECS as a whole into account in addition to `threat.*` and the enriched document.
@dcode I've been giving some thought to your proposal for how to use the This might already be solved, but I think we also need to find a way to indicate which field in the source document matched the IOC when there are more than one that could have . In your case it's clear that Anyhow, to aid my own understanding of your proposal, I've created this diagram. Can you review to see if it accurately portrays your proposal? (For now, I ignored the minimally enriched case of ingestion-time matching in this diagram.)
In today's review I'm responding to a few ongoing discussions, as well as pointing out a few small things that need adjustment.
One common thread across the discussions is that there will be two major usages of the fields: the fields used to describe IOCs in an "enrichment" index, and the fields that are appended to a live event that matches known IOCs. I think it may be useful to start fleshing out these two use cases in the "Usage" section. For example, are only certain fields meant to be copied to events, or all IOC fields?
Thanks everyone for the thought going into this 🤔 👍
I like the symmetry of having the various fields that could be used to describe an IOC be directly the ECS fields. However, I wonder if this would work in practice.
If an IOC is, e.g., a "bad IP", the essential information is the IP. But in the IOC index, should the value be captured in `source.ip`, `destination.ip`, `client.ip` or `server.ip`? All we need to do is capture "the IP" itself, then we look for it in the appropriate places, depending on the event source.
We may have a similar problem with TLS or x509 certs: in ECS we currently have fields to describe TLS exchanges, potentially mutual TLS, so `tls.client.*` and `tls.server.*`. But if, e.g., a cipher or hash is bad no matter which side it is on, which field should be used to capture it in the IOC descriptor?
To get back to the symmetry of being able to filter for an IOC's indicator and have both the IOC descriptor and actual events show up, I'm not actually sure this saves users anything. In Elasticsearch, unlike SQL, we have to be explicit about which values we want to match. We can't say, like in SQL, "give me all events and indicators that have the same value in `source.ip`", for example. If you have 10 fresh new bad IPs to test for, you'd have to explicitly make an include query for `["bad ip 1", "bad ip 2", ...]`.
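For illustration, this is roughly what such an explicit include query looks like when built in Python: a `bool` query with one `terms` clause per candidate field. The `build_ip_query` helper, the field list, and the sample addresses are assumptions; the dict is only a request body and is not sent anywhere.

```python
import json


def build_ip_query(bad_ips, fields=("source.ip", "destination.ip")):
    """Build an Elasticsearch query body that matches any document where
    one of the given fields contains one of the listed IPs."""
    return {
        "query": {
            "bool": {
                # One terms clause per field; the values must be spelled out.
                "should": [{"terms": {field: list(bad_ips)}} for field in fields],
                "minimum_should_match": 1,
            }
        }
    }


bad_ips = ["203.0.113.7", "198.51.100.23"]
body = build_ip_query(bad_ips)
print(json.dumps(body, indent=2))
```

The list of values has to be supplied by the caller each time, which is the explicitness the comment above is pointing at: there is no join between the indicator index and the event indices.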
The bullet list below (`file.*`, `file.hash.*`, `url.*` ...) is useful to give us a general idea of the landscape of the types of IOCs. I think a more concrete list of the actual descriptors / fields we expect to need would be helpful in figuring out the way forward.
Adding to existing discussions, mostly around the usage of fields.
Thanks to everyone for the great discussion throughout!
After reviewing the current doc and the accompanying discussions, I've noted a few outstanding items to help continue the progress. Very possible I overlooked something 😅.
- Is the proposed list of `threat.*` fields up-to-date with the most recent discussions/decisions?
- Including some mapping examples using the proposed fields would be very useful. For example, the classifications for the different types of IOCs that @webmat suggested ([RFC] Threat Intel - Stage 1 #1127 (comment)).
- Can the outcome noted here be captured in the document?
- The concern @MikePaquette raised is notable. Let's capture that as an additional concern.
Co-authored-by: Eric Beahan <[email protected]>
Two fields I listed in #1127 (review) made their way into the example definitions as field reuses. Are there any other fields listed from the proposal here that we need to capture the intent to nest?
Sorry about that @ebeahan, misunderstanding on my part. I got those cleaned up.
@ebeahan, we're zeroing in on this.
We have a meeting next week to get a final answer, but as it sits now: For the threat intelligence and enriched signals indices, they should both be the same. Options:
Where this ended up:
Once there is better support for nested field types in Kibana, there will be a migration to
Do we see this development affecting the timeline for this RFC's advancement? I imagine many users interested in Including the We're only targeting experimental support here, so I don't see these as items we must address now, though. Let's add both the decision from #1127 (comment) and these related concerns to the
Due to a bad copy/paste, these were causing the YAML to be invalid.
Documenting additional concerns.
LGTM 👍
We do need to capture @devonakerr as the sponsor in the `People` section, please.
I went ahead and addressed it. @devonakerr can you give this a final look? If all looks good, I'll update the advancement date and merge!
Yessir, apologies for the 11th hour updates.
I approve, sero sed serio.