[filebeat] VirusTotal Livehunt dataset - WIP #21815

Closed · wants to merge 36 commits

Commits (36)
- 7270a44 Adds virustotal module for livehunt notifications (dcode, Oct 1, 2020)
- 62cab07 initial docs (peasead, Oct 15, 2020)
- 422240f added dashboard (peasead, Oct 15, 2020)
- 74f346f Change results to nested field (dcode, Oct 16, 2020)
- fffecb4 updated dashboard (peasead, Oct 16, 2020)
- 59c7ecb spelling fixes (peasead, Oct 16, 2020)
- 1853f56 Adds VT dashboard and related viz (dcode, Oct 16, 2020)
- 8ef284c Merge branch 'dcode/virustotal-module' of github.com:dcode/beats into… (dcode, Oct 16, 2020)
- 60f0b3c Adds sample data for testing (dcode, Oct 16, 2020)
- 17339c3 Adds virustotal module for livehunt notifications (dcode, Oct 1, 2020)
- 0d5e31a initial docs (peasead, Oct 15, 2020)
- cc06e0b added dashboard (peasead, Oct 15, 2020)
- bf11b7c Change results to nested field (dcode, Oct 16, 2020)
- 29bd99f Adds VT dashboard and related viz (dcode, Oct 16, 2020)
- 6d04c5e updated dashboard (peasead, Oct 16, 2020)
- 3a7bbbc spelling fixes (peasead, Oct 16, 2020)
- 59fdfe4 Adds sample data for testing (dcode, Oct 16, 2020)
- da2d1db re-exported dashboards using dev tools (dcode, Oct 19, 2020)
- 819be31 Move raw logs to correct place (dcode, Oct 19, 2020)
- 9675f2e Merge branch 'dcode/virustotal-module' of github.com:dcode/beats into… (peasead, Oct 20, 2020)
- d27f4ef Renamed test data to `.log` (dcode, Oct 22, 2020)
- e1671e3 Merge branch 'dcode/virustotal-module' of github.com:dcode/beats into… (dcode, Oct 22, 2020)
- de1a9d4 Updated CHANGELOGs (peasead, Oct 22, 2020)
- 76cb9db Merge branch 'dcode/virustotal-module' of github.com:dcode/beats into… (peasead, Oct 22, 2020)
- 2d1ea4e updated dashboard and docs (peasead, Oct 29, 2020)
- f647b3b Parsed out packer list for all binaries, not just PEs (dcode, Oct 30, 2020)
- 4e86c74 Towards normalized symbol tables (dcode, Nov 9, 2020)
- 3e01dc1 Normalizing symbols (dcode, Nov 17, 2020)
- 2080552 move towards normalized symbol objects across all (dcode, Dec 14, 2020)
- f87b6b9 Merge remote-tracking branch 'upstream/master' into dcode/virustotal-… (dcode, Dec 14, 2020)
- 93e5f8a Merge remote-tracking branch 'upstream/master' into dcode/virustotal-… (dcode, Dec 14, 2020)
- 2b275c5 update to for nested fields (dcode, Dec 16, 2020)
- be22d2f Merge remote-tracking branch 'upstream/master' into dcode/virustotal-… (dcode, Dec 17, 2020)
- 959b6bd Adjust fields.yml to implement nested types (dcode, Dec 17, 2020)
- b8b1e72 Merge branch 'master' of github.com:elastic/beats into dcode/virustot… (dcode, Jan 14, 2021)
- f78e752 catch up and sample documents of working ideas (dcode, Jan 21, 2021)
1 change: 1 addition & 0 deletions CHANGELOG-developer.next.asciidoc
@@ -62,6 +62,7 @@ The list below covers the major changes between 7.0.0-rc2 and master only.

==== Added

- Add VirusTotal Intelligence Live Hunt module. {issue}21541[21541] {pull}21815[21815]
- Add configuration for APM instrumentation and expose the tracer trough the Beat object. {pull}17938[17938]
- Make the behavior of clientWorker and netClientWorker consistent when error is returned from publisher pipeline
- Metricset generator generates beta modules by default now. {pull}10657[10657]
7 changes: 7 additions & 0 deletions CHANGELOG.next.asciidoc
@@ -41,6 +41,13 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Change network.direction values to ECS recommended values (inbound, outbound). {issue}12445[12445] {pull}20695[20695]
- Docker container needs to be explicitly run as user root for auditing. {pull}21202[21202]
- File integrity dataset no longer includes the leading dot in `file.extension` values (e.g. it will report "png" instead of ".png") to comply with ECS. {pull}21644[21644]

*Filebeat*

- Introduce VirusTotal Intelligence Live Hunt module. {issue}21541[21541] {pull}21815[21815]

*Auditbeat*

- Use ECS 1.7 ingress/egress network directions instead of inbound/outbound. {pull}22991[22991]
- Use ingress/egress instead of inbound/outbound for ECS 1.7 in auditd module. {pull}23000[23000]

10 changes: 10 additions & 0 deletions x-pack/filebeat/filebeat.reference.yml
@@ -1990,6 +1990,16 @@ filebeat.modules:
  # can be added under this section.
  #input:

#------------------------------ VirusTotal Module ------------------------------
- module: virustotal
  # All logs
  livehunt:
    enabled: true
    # Set the VirusTotal private API key
    var.apikey: ""
    # Set retrieval limit, maximum 40, default 10
    var.limit: 10

#--------------------------------- Zeek Module ---------------------------------
- module: zeek
  capture_loss:
1 change: 1 addition & 0 deletions x-pack/filebeat/include/list.go

Some generated files are not rendered by default.

128 changes: 128 additions & 0 deletions x-pack/filebeat/module/virustotal/README.md
@@ -0,0 +1,128 @@
# Development Workflow

In all examples below, `filebeat.dev.yml` is a local, development-only configuration containing the VT API key, credentials for an Elastic Cloud instance, and so on.

My `filebeat.dev.yml` looks like this, with appropriate substitutions.

```yml
filebeat.modules:
- module: virustotal
  livehunt:
    enabled: True
    var.input: httpjson # httpjson or kafka

    # If consuming events from VirusTotal httpjson
    var.api_key: INSERT-YOUR-VT-API-KEY
    var.vt_fileinfo_url: https://www.virustotal.com/gui/file/
    var.limit: 40 # maximum 40, default is 10

    # If consuming raw events from Kafka
    var.kafka_brokers:
      - 127.0.0.1:9093
    var.kafka_topics:
      - virustotal.raw

cloud.id: "INSERT-YOUR-CLOUD-ID"
cloud.auth: "INSERT-YOUR-CLOUD-AUTH"
```

## Setup

Run `filebeat setup` to establish ILM policies, Elasticsearch index templates (mappings), and Kibana index patterns. This needs to happen every time you change a `fields.yml`. Setup overwrites existing items, except possibly the Kibana index pattern, so I manually delete that before running setup. I also delete the existing `filebeat-*` index in Elasticsearch to avoid mapping conflicts (a sketch of that cleanup follows the setup command below). See the notes on `kafka` and `elasticdump` below.

```shell
./filebeat -c filebeat.dev.yml -e setup -E setup.template.overwrite=true -E setup.ilm.overwrite=true -E setup.dashboards.directory=build/kibana
```
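
For reference, here is a minimal sketch of the manual cleanup described above. The endpoint, credentials, and saved-object ID shown here are placeholders of my own, not values from this PR; substitute your own cluster details and verify the index pattern ID before deleting.

```shell
# Hypothetical endpoints and credentials -- replace with your own cluster values
ES_URL="https://elastic:CHANGEME@YOUR-DEPLOYMENT.es.io:9243"
KIBANA_URL="https://elastic:CHANGEME@YOUR-DEPLOYMENT.kb.io:9243"

# Delete the existing filebeat-* indices to avoid mapping conflicts
curl -s -XDELETE "${ES_URL}/filebeat-*"

# Delete the existing Kibana index pattern (Beats typically use the saved-object
# ID "filebeat-*", but confirm the ID in your stack first)
curl -s -H 'kbn-xsrf: true' -XDELETE "${KIBANA_URL}/api/saved_objects/index-pattern/filebeat-*"
```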

## Kafka

I added Kafka as an input type because some users will find it useful, and it makes it easy to replay events from VT during development. I used `docker-compose` to stand up a local Kafka cluster.

```shell
# Download my compose file
curl -O https://gist.githubusercontent.com/dcode/a79d24624aee11ca713250cc5ba02a22/raw/e519b85bad45b3a2f757fbdc2f9808c94969cf13/docker-compose.yml

# Bring up cluster
docker-compose up -d
```
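
If the gist above is unavailable, a single-broker compose file along these lines should work for local testing. This is a minimal sketch of my own (not the contents of the gist); it assumes the Confluent images and exposes the broker on 127.0.0.1:9093 to match the module examples.

```yaml
# Minimal single-broker Kafka for local development (sketch; not the gist's contents)
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:6.0.1
    depends_on:
      - zookeeper
    ports:
      - "9093:9093"   # matches var.kafka_brokers in the examples
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:9092,EXTERNAL://127.0.0.1:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
```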

Configure filebeat to use this as an input for the module in your `filebeat.dev.yml`, where `virustotal.raw` is the topic name of the unmodified LiveHunt notification file objects.

```yaml
filebeat.modules:
- module: virustotal
  livehunt:
    enabled: True
    var.input: kafka
    var.kafka_brokers:
      - 127.0.0.1:9093
    var.kafka_topics:
      - virustotal.raw
```

## Replay Events using Kafka

First, save off existing events from the cluster. Do this before you delete the index in the **setup** step above.

```shell
# Install elasticdump, uses npm/node.js
npm install elasticdump -g

# Install kafkacat and jq
brew install kafkacat jq

# Dump the filebeat index
elasticdump --input=https://elastic:[email protected]:9243/filebeat-* \
--output=$ \
| gzip > data/filebeat-virustotal.json.gz

# Replay filebeat data into kafka topic (if setup using compose file above)
gzcat data/filebeat-virustotal.json.gz | jq -cr '._source.event.original' | kafkacat -b 127.0.0.1:9093 -P -t virustotal.raw
```

NOTES:

- `https://elastic:[email protected]:9243` is the HTTPS endpoint as retrieved from the Elastic Cloud panel, with the `elastic` username and password prefixed to the host. You can optionally use an HTTP auth ini file with `elasticdump`; see the `--help` output for specifics.
- Elasticdump can do a lot of things; in this scenario, I'm simply compressing the output and writing it to disk. These are the raw JSON documents as stored in Elasticsearch.
- I'm dumping only indices that match the pattern `filebeat-*`. You can make this more or less specific, as desired.
- I'm using `jq` here to output compact (single line) JSON documents as raw strings, which unquotes the field value.
- kafkacat here is connecting to the local broker `-b`, in producer mode `-P`, and writing to the topic `-t` using data from stdin.

If you want to validate or otherwise manipulate the raw data, use `kafkacat` in consumer mode (`-C`). This example shows the first 10 records, piped to `jq` for formatting.

```shell
kafkacat -b 127.0.0.1:9093 -C -t virustotal.raw | head | jq
```

Configure Filebeat to use the Kafka input as shown above, and run it until all events are replayed. After that, switch back to `httpjson` as the input type to stream new data.

```shell
./filebeat -c filebeat.dev.yml -e
```
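
For completeness, switching back is just a matter of flipping `var.input` in `filebeat.dev.yml`. This is a trimmed sketch of the same module configuration shown at the top of this document:

```yaml
filebeat.modules:
- module: virustotal
  livehunt:
    enabled: True
    var.input: httpjson
    var.api_key: INSERT-YOUR-VT-API-KEY
    var.limit: 40 # maximum 40, default is 10
```
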
## Delete all Kibana saved objects between test runs

```bash
#!/bin/bash

# From the docs: https://www.elastic.co/guide/en/kibana/current/saved-objects-api-get.html#saved-objects-api-get-params
# Types can be: visualization, dashboard, search, index-pattern, config, timelion-sheet
# You can also have a map type, which isn't in the docs linked above

function clear_kibana() {
  export KIBANA_API_URL="${KIBANA_API_URL:-http://elastic:[email protected]:5601}"
  export OBJECTS=$(curl -s "${KIBANA_API_URL}/api/saved_objects/_find?fields=id&type=index-pattern&type=visualization&type=dashboard&type=search&type=index-pattern&type=timelion-sheet&type=map&per_page=1000" | jq -rc '.saved_objects[] | {"type": .type, "id": .id } | @base64')

  # Loops through the base64-encoded JSON objects
  for item in ${OBJECTS}; do
    TYPE=$(echo "${item}" | base64 -d | jq -r '.type')
    ID=$(echo "${item}" | base64 -d | jq -r '.id')

    echo "Deleting ${TYPE} with ID ${ID}"
    curl -s -H 'kbn-xsrf: true' -XDELETE "${KIBANA_API_URL}/api/saved_objects/${TYPE}/${ID}" >/dev/null
  done
}

clear_kibana
```
8 changes: 8 additions & 0 deletions x-pack/filebeat/module/virustotal/_meta/config.yml
@@ -0,0 +1,8 @@
- module: virustotal
  # All logs
  livehunt:
    enabled: true
    # Set the VirusTotal private API key
    var.apikey: ""
    # Set retrieval limit, maximum 40, default 10
    var.limit: 10
7 changes: 7 additions & 0 deletions x-pack/filebeat/module/virustotal/_meta/fields.yml
@@ -0,0 +1,7 @@
#spellchecker: disable
- key: virustotal
  title: VirusTotal
  description: >
    Module for handling the VirusTotal API notifications
  fields:
