From b7ac174c08ee90771efef2e62cf09982d2f2a991 Mon Sep 17 00:00:00 2001
From: Seth Grover
Date: Thu, 22 Sep 2022 11:02:37 -0600
Subject: [PATCH] testing relative links

---
 docs/authsetup.md                        |  99 ++++++++
 docs/contributing-dashboards.md          |  41 +++
 docs/contributing-file-scanners.md       |  14 ++
 docs/contributing-local-modifications.md | 126 ++++++++++
 docs/contributing-logstash.md            |  51 ++++
 docs/contributing-new-image.md           |  18 ++
 docs/contributing-new-log-fields.md      |  11 +
 docs/contributing-pcap.md                |  12 +
 docs/contributing-style.md               |   5 +
 docs/contributing-zeek.md                |  15 ++
 docs/contributing.md                     | 302 -----------------------
 docs/hedgehog-config-root.md             |  47 ++++
 docs/hedgehog-config-user.md             | 189 ++++++++++++++
 docs/hedgehog-config-zeek-intel.md       |   7 +
 docs/hedgehog-config.md                  | 245 ------------------
 docs/main.md                             |   2 +-
 docs/malcolm-iso.md                      |   4 +-
 docs/opensearch-instances.md             |  78 ++++++
 docs/running.md                          | 179 --------------
 19 files changed, 716 insertions(+), 729 deletions(-)
 create mode 100644 docs/authsetup.md
 create mode 100644 docs/contributing-dashboards.md
 create mode 100644 docs/contributing-file-scanners.md
 create mode 100644 docs/contributing-local-modifications.md
 create mode 100644 docs/contributing-logstash.md
 create mode 100644 docs/contributing-new-image.md
 create mode 100644 docs/contributing-new-log-fields.md
 create mode 100644 docs/contributing-pcap.md
 create mode 100644 docs/contributing-style.md
 create mode 100644 docs/contributing-zeek.md
 create mode 100644 docs/hedgehog-config-root.md
 create mode 100644 docs/hedgehog-config-user.md
 create mode 100644 docs/hedgehog-config-zeek-intel.md
 create mode 100644 docs/opensearch-instances.md

diff --git a/docs/authsetup.md b/docs/authsetup.md
new file mode 100644
index 000000000..eb5075304
--- /dev/null
+++ b/docs/authsetup.md
@@ -0,0 +1,99 @@
### Configure authentication

Malcolm requires authentication to access the [user interface](#UserInterfaceURLs). [Nginx](https://nginx.org/) can authenticate users with either local TLS-encrypted HTTP basic authentication or a remote Lightweight Directory Access Protocol (LDAP) authentication server.

With the local basic authentication method, user accounts are managed by Malcolm and can be created, modified, and deleted using a [user management web interface](#AccountManagement). This method is suitable in instances where accounts and credentials do not need to be synced across many Malcolm installations.

With LDAP authentication, accounts are managed on a remote directory service, such as [Microsoft Active Directory Domain Services](https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/get-started/virtual-dc/active-directory-domain-services-overview) or [OpenLDAP](https://www.openldap.org/).

Malcolm's authentication method is defined in the `x-auth-variables` section near the top of the [`docker-compose.yml`](#DockerComposeYml) file with the `NGINX_BASIC_AUTH` environment variable: `true` for local TLS-encrypted HTTP basic authentication, `false` for LDAP authentication.
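
For reference, that section might look something like the following sketch (the YAML anchor name and exact quoting here are illustrative and may differ in your copy of `docker-compose.yml`):

```
x-auth-variables: &auth-variables
  # true = local TLS-encrypted HTTP basic authentication
  # false = Lightweight Directory Access Protocol (LDAP) authentication
  NGINX_BASIC_AUTH: 'true'
```
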
In either case, you **must** run `./scripts/auth_setup` before starting Malcolm for the first time in order to:

* define the local Malcolm administrator account username and password (although these credentials will only be used for basic authentication, not LDAP authentication)
* specify whether or not to (re)generate the self-signed certificates used for HTTPS access
    * key and certificate files are located in the `nginx/certs/` directory
* specify whether or not to (re)generate the self-signed certificates used by a remote log forwarder (see the `BEATS_SSL` environment variable above)
    * certificate authority, certificate, and key files for Malcolm's Logstash instance are located in the `logstash/certs/` directory
    * certificate authority, certificate, and key files to be copied to and used by the remote log forwarder are located in the `filebeat/certs/` directory; if using [Hedgehog Linux](#Hedgehog), these certificates should be copied to the `/opt/sensor/sensor_ctl/logstash-client-certificates` directory on the sensor
* specify whether or not to [store the username/password](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#authenticate-sender-account) for [email alert senders](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#create-destinations)
    * these parameters are stored securely in the OpenSearch keystore file `opensearch/opensearch.keystore`

#### Local account management

[`auth_setup`](#AuthSetup) is used to define the username and password for the administrator account. Once Malcolm is running, the administrator account can be used to manage other user accounts via a **Malcolm User Management** page served over HTTPS on port 488 (e.g., [https://localhost:488](https://localhost:488) if you are connecting locally).

Malcolm user accounts can be used to access the [interfaces](#UserInterfaceURLs) of all of its [components](#Components), including Arkime. Arkime uses its own internal database of user accounts, so when a Malcolm user account logs in to Arkime for the first time Malcolm creates a corresponding Arkime user account automatically. This being the case, it is *not* recommended to use the Arkime **Users** settings page or change the password via the **Password** form under the Arkime **Settings** page, as those settings would not be consistently used across Malcolm.

Users may change their passwords via the **Malcolm User Management** page by clicking **User Self Service**. A forgotten password can also be reset via an emailed link, though this requires SMTP server settings to be specified in `htadmin/config.ini` in the Malcolm installation directory.

#### Lightweight Directory Access Protocol (LDAP) authentication

The [nginx-auth-ldap](https://github.com/kvspb/nginx-auth-ldap) module serves as the interface between Malcolm's [Nginx](https://nginx.org/) web server and a remote LDAP server. When you run [`auth_setup`](#AuthSetup) for the first time, a sample LDAP configuration file is created at `nginx/nginx_ldap.conf`.

```
# This is a sample configuration for the ldap_server section of nginx.conf.
# Yours will vary depending on how your Active Directory/LDAP server is configured.
# See https://github.com/kvspb/nginx-auth-ldap#available-config-parameters for options.

ldap_server ad_server {
  url "ldap://ds.example.com:3268/DC=ds,DC=example,DC=com?sAMAccountName?sub?(objectClass=person)";

  binddn "bind_dn";
  binddn_passwd "bind_dn_password";

  group_attribute member;
  group_attribute_is_dn on;
  require group "CN=Malcolm,CN=Users,DC=ds,DC=example,DC=com";
  require valid_user;
  satisfy all;
}

auth_ldap_cache_enabled on;
auth_ldap_cache_expiration_time 10000;
auth_ldap_cache_size 1000;
```

This file is mounted into the `nginx` container when Malcolm is started to provide connection information for the LDAP server.

The contents of `nginx_ldap.conf` will vary depending on how the LDAP server is configured. Some of the [available parameters](https://github.com/kvspb/nginx-auth-ldap#available-config-parameters) in that file include:

* **`url`** - the `ldap://` or `ldaps://` connection URL for the remote LDAP server, which has the [following syntax](https://www.ietf.org/rfc/rfc2255.txt): `ldap[s]://<host>:<port>/<base_dn>?<attributes>?<scope>?<filter>`
* **`binddn`** and **`binddn_passwd`** - the account credentials used to query the LDAP directory
* **`group_attribute`** - the group attribute name which contains the member object (e.g., `member` or `memberUid`)
* **`group_attribute_is_dn`** - whether or not to search for the user's full distinguished name as the value in the group's member attribute
* **`require`** and **`satisfy`** - `require user`, `require group` and `require valid_user` can be used in conjunction with `satisfy any` or `satisfy all` to limit the users that are allowed to access the Malcolm instance

Before starting Malcolm, edit `nginx/nginx_ldap.conf` according to the specifics of your LDAP server and directory tree structure. Using an LDAP search tool such as [`ldapsearch`](https://www.openldap.org/software/man.cgi?query=ldapsearch) in Linux or [`dsquery`](https://social.technet.microsoft.com/wiki/contents/articles/2195.active-directory-dsquery-commands.aspx) in Windows may be of help as you formulate the configuration. Your changes should be made within the curly braces of the `ldap_server ad_server { … }` section. You can troubleshoot configuration file syntax errors and LDAP connection or credentials issues by running `./scripts/logs` (or `docker-compose logs nginx`) and examining the output of the `nginx` container.

The **Malcolm User Management** page described above is not available when using LDAP authentication.

##### LDAP connection security

Authentication over LDAP can be done in one of three ways, [two of which](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-adts/8e73932f-70cf-46d6-88b1-8d9f86235e81) offer data confidentiality protection:

* **StartTLS** - the [standard extension](https://tools.ietf.org/html/rfc2830) to the LDAP protocol to establish an encrypted SSL/TLS connection within an already established LDAP connection
* **LDAPS** - a commonly used (though unofficial and considered deprecated) method in which SSL negotiation takes place before any commands are sent from the client to the server
* **Unencrypted** (cleartext) (***not recommended***)

In addition to the `NGINX_BASIC_AUTH` environment variable being set to `false` in the `x-auth-variables` section near the top of the [`docker-compose.yml`](#DockerComposeYml) file, the `NGINX_LDAP_TLS_STUNNEL` environment variable is used in conjunction with the values in `nginx/nginx_ldap.conf` to define the LDAP connection security level. Use the following combinations of values to achieve the connection security methods above, respectively:

* **StartTLS**
    - `NGINX_LDAP_TLS_STUNNEL` set to `true` in [`docker-compose.yml`](#DockerComposeYml)
    - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf`
* **LDAPS**
    - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](#DockerComposeYml)
    - `url` should begin with `ldaps://` and its port should be either the default LDAPS port (636) or the default LDAPS Global Catalog port (3269) in `nginx/nginx_ldap.conf`
* **Unencrypted** (cleartext) (***not recommended***)
    - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](#DockerComposeYml)
    - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf`

For encrypted connections (whether using **StartTLS** or **LDAPS**), Malcolm will require and verify certificates when one or more trusted CA certificate files are placed in the `nginx/ca-trust/` directory. Otherwise, any certificate presented by the domain server will be accepted.

### TLS certificates

When you [set up authentication](#AuthSetup) for Malcolm, a set of unique [self-signed](https://en.wikipedia.org/wiki/Self-signed_certificate) TLS certificates is created and used to secure the connection between clients (e.g., your web browser) and Malcolm's browser-based interface. This is adequate for most Malcolm instances as they are often run locally or on internal networks, although your browser will most likely require you to add a security exception for the certificate the first time you connect to Malcolm.

Another option is to generate your own certificates (or have them issued to you) and have them placed in the `nginx/certs/` directory. The certificate and key file should be named `cert.pem` and `key.pem`, respectively.

A third possibility is to use a third-party reverse proxy (e.g., [Traefik](https://doc.traefik.io/traefik/) or [Caddy](https://caddyserver.com/docs/quick-starts/reverse-proxy)) to handle the issuance of the certificates for you and to broker the connections between clients and Malcolm. Reverse proxies such as these often implement the [ACME](https://datatracker.ietf.org/doc/html/rfc8555) protocol for domain name authentication and can be used to request certificates from certificate authorities like [Let's Encrypt](https://letsencrypt.org/how-it-works/). In this configuration, the reverse proxy will be encrypting the connections instead of Malcolm, so you'll need to set the `NGINX_SSL` environment variable to `false` in [`docker-compose.yml`](#DockerComposeYml) (or answer `no` to the "Require encrypted HTTPS connections?" question posed by `install.py`). If you are setting `NGINX_SSL` to `false`, **make sure** you understand what you are doing and ensure that external connections cannot reach ports over which Malcolm will be communicating without encryption, including verifying your local firewall configuration.
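
If you opt to provide your own self-signed certificate and key in `nginx/certs/`, an `openssl` command along these lines is one way to generate the pair (an illustrative sketch, not a Malcolm-specific command; the subject name `malcolm.example.com` is a placeholder, and the key size and lifetime should follow your own policy):

```
# generate a self-signed certificate (cert.pem) and unencrypted private key (key.pem)
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -subj "/CN=malcolm.example.com" \
  -keyout nginx/certs/key.pem -out nginx/certs/cert.pem
```
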
\ No newline at end of file
diff --git a/docs/contributing-dashboards.md b/docs/contributing-dashboards.md
new file mode 100644
index 000000000..835959f76
--- /dev/null
+++ b/docs/contributing-dashboards.md
@@ -0,0 +1,41 @@
## OpenSearch Dashboards

[OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) is an open-source fork of [Kibana](https://www.elastic.co/kibana/), which is [no longer open-source software](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0).

### Adding new visualizations and dashboards

Visualizations and dashboards can be [easily created](../../README.md#BuildDashboard) in OpenSearch Dashboards using its drag-and-drop WYSIWYG tools. Assuming you've created a new dashboard you wish to package with Malcolm, the dashboard and its visualization components can be exported using the following steps:

1. Identify the ID of the dashboard (found in the URL: e.g., for `/dashboards/app/dashboards#/view/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` the ID would be `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`)
1. Export the dashboard with that ID and save it in the `./dashboards/dashboards/` directory with the following command:
    ```
    export DASHID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx && \
      docker-compose exec dashboards curl -XGET \
      "http://localhost:5601/dashboards/api/opensearch-dashboards/dashboards/export?dashboard=$DASHID" > \
      ./dashboards/dashboards/$DASHID.json
    ```
1. It's preferable for Malcolm to dynamically create the `arkime_sessions3-*` index pattern rather than including it in imported dashboards, so edit the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.json` file that was generated, and carefully locate and remove the section with the `id` of `arkime_sessions3-*` and the `type` of `index-pattern` (including the comma preceding it):
    ```
    ,
    {
      "id": "arkime_sessions3-*",
      "type": "index-pattern",
      "namespaces": [
        "default"
      ],
      "updated_at": "2021-12-13T18:21:42.973Z",
      "version": "Wzk3MSwxXQ==",
      …
      "references": [],
      "migrationVersion": {
        "index-pattern": "7.6.0"
      }
    }
    ```
1. Include the new dashboard either by using a [bind mount](#Bind) for the `./dashboards/dashboards/` directory or by [rebuilding](#Build) the `dashboards-helper` Docker image. Dashboards are imported the first time Malcolm starts up.

### OpenSearch Dashboards plugins

The [dashboards.Dockerfile](../../Dockerfiles/dashboards.Dockerfile) installs the OpenSearch Dashboards plugins used by Malcolm (search for `opensearch-dashboards-plugin install` in that file). Additional Dashboards plugins could be installed by modifying this Dockerfile and [rebuilding](#Build) the `dashboards` Docker image.

Third-party or community plugins developed for Kibana will not install into OpenSearch Dashboards without source code modification. Depending on the plugin, this could range from very simple to very complex. As an illustrative example, the changes that were required to port the Sankey diagram visualization plugin from Kibana to OpenSearch Dashboards compatibility can be [viewed on GitHub](https://github.com/mmguero-dev/osd_sankey_vis/compare/edacf6b...main).
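
To sketch the Dockerfile approach mentioned above, installing an additional (hypothetical) plugin might amount to a line like the following added near the existing `opensearch-dashboards-plugin install` commands (the plugin URL is a placeholder, and the path to the `opensearch-dashboards-plugin` binary may differ in the image):

```
RUN /usr/share/opensearch-dashboards/bin/opensearch-dashboards-plugin install \
      https://example.com/releases/my-osd-plugin-1.0.0.zip
```
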
diff --git a/docs/contributing-file-scanners.md b/docs/contributing-file-scanners.md
new file mode 100644
index 000000000..dd3b3aa19
--- /dev/null
+++ b/docs/contributing-file-scanners.md
@@ -0,0 +1,14 @@
## Carved file scanners

Similar to the [PCAP processing pipeline](#PCAP) described above, new tools can plug into Malcolm's [automatic file extraction and scanning](../../README.md#ZeekFileExtraction) to examine file transfers carved from network traffic.

When Zeek extracts a file it observes being transferred in network traffic, the `file-monitor` container picks up those extracted files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that extracted file. At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the file scanners implemented in Malcolm include ClamAV, YARA, capa and VirusTotal, all of which are managed by the `file-monitor` container. The scripts involved in this code are:

* [shared/bin/zeek_carve_watcher.py](../../shared/bin/zeek_carve_watcher.py) - watches the directory to which Zeek extracts files and publishes information about those files to the ZeroMQ ventilator on port 5987
* [shared/bin/zeek_carve_scanner.py](../../shared/bin/zeek_carve_scanner.py) - subscribes to `zeek_carve_watcher.py`'s topic, performs file scanning for the ClamAV, YARA, capa and VirusTotal engines, and sends "hits" to another ZeroMQ sink on port 5988
* [shared/bin/zeek_carve_logger.py](../../shared/bin/zeek_carve_logger.py) - subscribes to `zeek_carve_scanner.py`'s topic and logs hits to a "fake" Zeek signatures.log file which is parsed and ingested by Logstash
* [shared/bin/zeek_carve_utils.py](../../shared/bin/zeek_carve_utils.py) - various variables and classes related to carved file scanning

Additional file scanners could either be added to the `file-monitor` service or, to avoid coupling with Malcolm's code, you could simply define a new service as instructed in the [Adding a new service](#NewImage) section and write your own scripts to subscribe and publish to the topics as described above. While that might be a bit of hand-waving, these general steps take care of the plumbing around extracting the file and notifying your tool, as well as handling the logging of "hits": you shouldn't really have to edit any *existing* code to add a new carved file scanner.

The `EXTRACTED_FILE_PIPELINE_DEBUG` and `EXTRACTED_FILE_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the carved file processing pipeline.
\ No newline at end of file
diff --git a/docs/contributing-local-modifications.md b/docs/contributing-local-modifications.md
new file mode 100644
index 000000000..77aeb5ba5
--- /dev/null
+++ b/docs/contributing-local-modifications.md
@@ -0,0 +1,126 @@
## Local modifications

There are several ways to customize Malcolm's runtime behavior via local changes to configuration files. Many commonly-tweaked settings are discussed in the project [README](../../README.md) (see [`docker-compose.yml` parameters](../../README.md#DockerComposeYml) and [Customizing event severity scoring](../../README.md#SeverityConfig) for some examples).

### Docker bind mounts

Some configuration changes can be put in place by modifying local copies of configuration files and then using a [Docker bind mount](https://docs.docker.com/storage/bind-mounts/) to overlay the modified file onto the running Malcolm container. This is already done for many files and directories used to persist Malcolm configuration and data. For example, the default list of bind mounted files and directories for each Malcolm service is as follows:

```
$ grep -P "^( - ./| [\w-]+:)" docker-compose-standalone.yml
  opensearch:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
      - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro
      - ./opensearch/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:rw
      - ./opensearch:/usr/share/opensearch/data:delegated
      - ./opensearch-backup:/opt/opensearch/backup:delegated
  dashboards-helper:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
  dashboards:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
  logstash:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
      - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro
      - ./logstash/maps/malcolm_severity.yaml:/etc/malcolm_severity.yaml:ro
      - ./logstash/certs/ca.crt:/certs/ca.crt:ro
      - ./logstash/certs/server.crt:/certs/server.crt:ro
      - ./logstash/certs/server.key:/certs/server.key:ro
      - ./cidr-map.txt:/usr/share/logstash/config/cidr-map.txt:ro
      - ./host-map.txt:/usr/share/logstash/config/host-map.txt:ro
      - ./net-map.json:/usr/share/logstash/config/net-map.json:ro
  filebeat:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
      - ./zeek-logs:/zeek
      - ./suricata-logs:/suricata
      - ./filebeat/certs/ca.crt:/certs/ca.crt:ro
      - ./filebeat/certs/client.crt:/certs/client.crt:ro
      - ./filebeat/certs/client.key:/certs/client.key:ro
  arkime:
      - ./auth.env
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
      - ./pcap:/data/pcap
      - ./arkime-logs:/opt/arkime/logs
      - ./arkime-raw:/opt/arkime/raw
  zeek:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./pcap:/pcap
      - ./zeek-logs/upload:/zeek/upload
      - ./zeek-logs/extract_files:/zeek/extract_files
      - ./zeek/intel:/opt/zeek/share/zeek/site/intel
  zeek-live:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./zeek-logs/live:/zeek/live
      - ./zeek-logs/extract_files:/zeek/extract_files
      - ./zeek/intel:/opt/zeek/share/zeek/site/intel
  suricata:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./suricata-logs:/var/log/suricata
      - ./pcap:/data/pcap
      - ./suricata/rules:/opt/suricata/rules:ro
  suricata-live:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./suricata-logs:/var/log/suricata
      - ./suricata/rules:/opt/suricata/rules:ro
  file-monitor:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./zeek-logs/extract_files:/zeek/extract_files
      - ./zeek-logs/current:/zeek/logs
      - ./yara/rules:/yara-rules/custom:ro
  pcap-capture:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./pcap/upload:/pcap
  pcap-monitor:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
      - ./zeek-logs:/zeek
      - ./pcap:/pcap
  upload:
      - ./auth.env
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./pcap/upload:/var/www/upload/server/php/chroot/files
  htadmin:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./htadmin/config.ini:/var/www/htadmin/config/config.ini:rw
      - ./htadmin/metadata:/var/www/htadmin/config/metadata:rw
      - ./nginx/htpasswd:/var/www/htadmin/config/htpasswd:rw
  freq:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
  name-map-ui:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./cidr-map.txt:/var/www/html/maps/cidr-map.txt:ro
      - ./host-map.txt:/var/www/html/maps/host-map.txt:ro
      - ./net-map.json:/var/www/html/maps/net-map.json:rw
  api:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
  nginx-proxy:
      - ./nginx/ca-trust:/var/local/ca-trust:ro
      - ./nginx/nginx_ldap.conf:/etc/nginx/nginx_ldap.conf:ro
      - ./nginx/htpasswd:/etc/nginx/.htpasswd:ro
      - ./nginx/certs:/etc/nginx/certs:ro
      - ./nginx/certs/dhparam.pem:/etc/nginx/dhparam/dhparam.pem:ro
```

So, for example, if you wanted to make a change to the `nginx-proxy` container's `nginx.conf` file, you could add the following line to the `volumes:` section of the `nginx-proxy` service in your `docker-compose.yml` file:

```
- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
```

The change would take effect after stopping and starting Malcolm.

See the documentation on [Docker bind mounts](https://docs.docker.com/storage/bind-mounts/) for more information on this technique.

### Building Malcolm's Docker images

Another method for modifying your local copies of Malcolm's services' containers is to [build your own](../../README.md#Build) containers with the modifications baked in.

For example, say you wanted to create a Malcolm container which includes a new dashboard for OpenSearch Dashboards and a new enrichment filter `.conf` file for Logstash. After placing these files under `./dashboards/dashboards` and `./logstash/pipelines/enrichment`, respectively, in your Malcolm working copy, run `./build.sh dashboards-helper logstash` to build just those containers. After the build completes, you can run `docker images` and see you have fresh images for `malcolmnetsec/dashboards-helper` and `malcolmnetsec/logstash-oss`. You may need to review the contents of the [Dockerfiles](../../Dockerfiles) to determine the correct service and filesystem location within that service's Docker image depending on what you're trying to accomplish.

Alternately, if you have forked Malcolm on GitHub, [workflow files](../../.github/workflows/) are provided which contain instructions for GitHub to build the Docker images and [sensor](#Hedgehog) and [Malcolm](#ISO) installer ISOs. The resulting images are named according to the pattern `ghcr.io/owner/malcolmnetsec/image:branch` (e.g., if you've forked Malcolm with the GitHub user `romeogdetlevjr`, the `arkime` container built for the `main` branch would be named `ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main`). To run your local instance of Malcolm using these images instead of the official ones, you'll need to edit your `docker-compose.yml` file(s) and replace the `image:` tags according to this new pattern, or use the bash helper script `./shared/bin/github_image_helper.sh` to pull and re-tag the images.
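
Continuing the hypothetical `romeogdetlevjr` fork example above, pointing your instance at your own images means changing each service's `image:` tag along these lines (a sketch of one service's entry; the same substitution applies to the rest):

```
services:
  arkime:
    image: ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main
    # …the rest of the service definition is unchanged…
```
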
\ No newline at end of file
diff --git a/docs/contributing-logstash.md b/docs/contributing-logstash.md
new file mode 100644
index 000000000..a2598d3b9
--- /dev/null
+++ b/docs/contributing-logstash.md
@@ -0,0 +1,51 @@
## Logstash

### Parsing a new log data source

Let's continue with the example of the `cooltool` service we added in the [PCAP processors](#PCAP) section above, assuming that `cooltool` generates some textual log files we want to parse and index into Malcolm.

You'd have configured `cooltool` in your `cooltool.Dockerfile` and its section in the `docker-compose` files to write logs into a subdirectory or subdirectories in a shared folder [bind mounted](#Bind) in such a way that both the `cooltool` and `filebeat` containers can access it. Referring to the `zeek` container as an example, this is how the `./zeek-logs` folder is handled; both the `filebeat` and `zeek` services have `./zeek-logs` in their `volumes:` section:

```
$ grep -P "^( - ./zeek-logs| [\w-]+:)" docker-compose.yml | grep -B1 "zeek-logs"
  filebeat:
    - ./zeek-logs:/data/zeek
--
  zeek:
    - ./zeek-logs/upload:/zeek/upload
…
```

You'll need to provide access to your `cooltool` logs in a similar fashion.

Next, tweak [`filebeat.yml`](../../filebeat/filebeat.yml) by adding a new log input path pointing to the `cooltool` logs to send them along to the `logstash` container. This modified `filebeat.yml` will need to be reflected in the `filebeat` container via [bind mount](#Bind) or by [rebuilding](#Build) it.

Logstash can then be easily extended to add more [`logstash/pipelines`](../../logstash/pipelines). At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the Logstash pipelines basically look like this:

* input (from `filebeat`) sends logs to 1..*n* **parse pipelines**
* each **parse pipeline** does what it needs to do to parse its logs then sends them to the [**enrichment pipeline**](#LogstashEnrichments)
* the [**enrichment pipeline**](../../logstash/pipelines/enrichment) performs common lookups on the fields that have been normalized and indexes the logs into the OpenSearch data store

So, in order to add a new **parse pipeline** for `cooltool` after tweaking [`filebeat.yml`](../../filebeat/filebeat.yml) as described above, create a `cooltool` directory under [`logstash/pipelines`](../../logstash/pipelines) which follows the same pattern as the `zeek` parse pipeline. This directory will have an input file (tiny), a filter file (possibly large), and an output file (tiny). In your filter file, be sure to set the field [`event.hash`](https://www.elastic.co/guide/en/ecs/master/ecs-event.html#field-event-hash) to a unique value to identify indexed documents in OpenSearch; the [fingerprint filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html) may be useful for this.

Finally, in your `docker-compose` files, set a new `LOGSTASH_PARSE_PIPELINE_ADDRESSES` environment variable under `logstash-variables` to `cooltool-parse,zeek-parse,suricata-parse,beats-parse` (assuming you named the pipeline address from the previous step `cooltool-parse`) so that logs sent from `filebeat` to `logstash` are forwarded to all parse pipelines.

### Parsing new Zeek logs

The following modifications must be made in order for Malcolm to be able to parse new Zeek log files:

1. Add a parsing section to [`logstash/pipelines/zeek/11_zeek_logs.conf`](../../logstash/pipelines/zeek/11_zeek_logs.conf)
    * Follow patterns for existing log files as an example
    * For common Zeek fields like the `id` four-tuple, timestamp, etc., use the same convention used by existing Zeek logs in that file (e.g., `ts`, `uid`, `orig_h`, `orig_p`, `resp_h`, `resp_p`)
    * Take care, especially when copy-pasting filter code, that the Zeek delimiter isn't modified from a tab character to a space character (see "*zeek's default delimiter is a literal tab, MAKE SURE YOUR EDITOR DOESN'T SCREW IT UP*" warnings in that file)
1. If necessary, perform log normalization in [`logstash/pipelines/zeek/12_zeek_normalize.conf`](../../logstash/pipelines/zeek/12_zeek_normalize.conf) for values like action (`event.action`), result (`event.result`), application protocol version (`network.protocol_version`), etc.
1. If necessary, define conversions for floating point or integer values in [`logstash/pipelines/zeek/13_zeek_convert.conf`](../../logstash/pipelines/zeek/13_zeek_convert.conf)
1. Identify the new fields and add them as described in [Adding new log fields](#NewFields)

### Enrichments

Malcolm's Logstash instance will do a lot of enrichments for you automatically: see the [enrichment pipeline](../../logstash/pipelines/enrichment), including MAC address to vendor by OUI, GeoIP, ASN, and a few others. In order to take advantage of these enrichments that are already in place, normalize new fields to use the same standardized field names Malcolm uses for things like IP addresses, MAC addresses, etc. You can add your own additional enrichments by creating new `.conf` files containing [Logstash filters](https://www.elastic.co/guide/en/logstash/7.10/filter-plugins.html) in the [enrichment pipeline](../../logstash/pipelines/enrichment) directory and using either of the techniques in the [Local modifications](#LocalMods) section to implement your changes in the `logstash` container.

### Logstash plugins

The [logstash.Dockerfile](../../Dockerfiles/logstash.Dockerfile) installs the Logstash plugins used by Malcolm (search for `logstash-plugin install` in that file). Additional Logstash plugins could be installed by modifying this Dockerfile and [rebuilding](#Build) the `logstash` Docker image.
\ No newline at end of file
diff --git a/docs/contributing-new-image.md b/docs/contributing-new-image.md
new file mode 100644
index 000000000..6f305e2f3
--- /dev/null
+++ b/docs/contributing-new-image.md
@@ -0,0 +1,18 @@
## Adding a new service (Docker image)

A new service can be added to Malcolm by following these steps:

1. Create a new subdirectory for the service (under the Malcolm working copy base directory) containing whatever source or configuration files are necessary to build and run the service
1. Create the service's Dockerfile in the [Dockerfiles](../../Dockerfiles) directory of your Malcolm working copy
1. Add a new section for your service under `services:` in the `docker-compose.yml` and `docker-compose-standalone.yml` files
1. If you want to enable automatic builds for your service on GitHub, create a new [workflow](../../.github/workflows/), using an existing workflow as an example

### Networking and firewall

If your service needs to expose a web interface to the user, you'll need to adjust the following files:

* Ensure your service's section in the `docker-compose` files uses the `expose` directive to indicate which ports it's providing
* Add the service to the `depends_on` section of the `nginx-proxy` service in the `docker-compose` files
* Modify the configuration of the `nginx-proxy` container (in [`nginx/nginx.conf`](../../nginx/nginx.conf)) to define `upstream` and `location` directives to point to your service

Avoid publishing ports directly from your container to the host machine's network interface if at all possible. The `nginx-proxy` container handles encryption and authentication and should sit in front of any user-facing interface provided by Malcolm.
\ No newline at end of file
diff --git a/docs/contributing-new-log-fields.md b/docs/contributing-new-log-fields.md
new file mode 100644
index 000000000..25f391349
--- /dev/null
+++ b/docs/contributing-new-log-fields.md
@@ -0,0 +1,11 @@
## Adding new log fields

As several of the sections in this document will reference adding new data source fields, we'll cover that here at the beginning.

Although OpenSearch is a NoSQL database and as such is "unstructured" and "schemaless," a new data source field must be defined in a few places in order for it to show up and be usable throughout Malcolm. Minimally, you'll probably want to do it in these three files:

* [`arkime/etc/config.ini`](../../arkime/etc/config.ini) - follow existing examples in the `[custom-fields]` and `[custom-views]` sections in order for [Arkime](https://arkime.com) to be aware of your new fields
* [`arkime/wise/source.zeeklogs.js`](../../arkime/wise/source.zeeklogs.js) - add new fields to the `allFields` array for Malcolm to create Arkime [value actions](https://arkime.com/settings#right-click) for your fields
* [`dashboards/templates/composable/component/__(name)__.json`](../../dashboards/templates/composable/component/) - add new fields to a new [composable index template](https://opensearch.org/docs/latest/opensearch/index-templates/#composable-index-templates) file in this directory (see the sketch below) and add its name (prefixed with `custom_`) to the `composed_of` section of [`dashboards/templates/malcolm_template.json`](../../dashboards/templates/malcolm_template.json) in order for it to be included as part of the `arkime_sessions3-*` [index template](https://opensearch.org/docs/latest/opensearch/index-templates/) used by Arkime and OpenSearch Dashboards in Malcolm

When possible, I recommend using (or at least taking inspiration from) the [Elastic Common Schema (ECS) Reference](https://www.elastic.co/guide/en/ecs/current/index.html) when deciding how to define new field names.
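
As a sketch of the composable index template piece in the list above, a minimal component template file (using a hypothetical `cooltool.operation` field; the existing files in that directory are the authoritative reference for the exact layout) might look like:

```
{
  "template": {
    "mappings": {
      "properties": {
        "cooltool.operation": { "type": "keyword" }
      }
    }
  }
}
```
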
\ No newline at end of file
diff --git a/docs/contributing-pcap.md b/docs/contributing-pcap.md
new file mode 100644
index 000000000..b847c0e2e
--- /dev/null
+++ b/docs/contributing-pcap.md
@@ -0,0 +1,12 @@
## PCAP processors

When a PCAP is uploaded (either through Malcolm's [upload web interface](../../README.md#Upload) or just copied manually into the `./pcap/upload/` directory), the `pcap-monitor` container has a script that picks up those PCAP files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that PCAP. At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), there are two of those in Malcolm: the `zeek` container and the `arkime` container. They actually both share the [same script](../../shared/bin/pcap_processor.py) to read from that topic and run the PCAP through Zeek and Arkime, respectively. If you're looking for an example to follow, the `zeek` container is the less complicated of the two. So, if you were looking to integrate a new PCAP processing tool into Malcolm (named `cooltool` for this example), the process would be something like:

1. Define your service as instructed in the [Adding a new service](#NewImage) section
    * Note how the existing `zeek` and `arkime` services use [bind mounts](#Bind) to access the local `./pcap` directory
1. Write a script (modelled after [the one](../../shared/bin/pcap_processor.py) `zeek` and `arkime` use, if you like) which subscribes to the PCAP topic port (`30441` as defined in [pcap_utils.py](../../shared/bin/pcap_utils.py)) and handles the PCAP files published there, each PCAP file represented by a JSON dictionary with `name`, `tags`, `size`, `type` and `mime` keys (search for `FILE_INFO_` in [pcap_utils.py](../../shared/bin/pcap_utils.py)). This script should be added to and run by your `cooltool.Dockerfile`-generated container.
1. Add whatever other logic is needed to get your tool's data into Malcolm, whether by writing it directly into OpenSearch or by sending log files for parsing and enrichment by [Logstash](#Logstash) (especially see the section on [Parsing a new log data source](#LogstashNewSource))

While that might be a bit of hand-waving, these general steps take care of the PCAP processing piece: you shouldn't really have to edit any *existing* code to add a new PCAP processor. You're just creating a new container for the Malcolm appliance that subscribes to the ZeroMQ topic and handles the PCAPs your tool receives.

The `PCAP_PIPELINE_DEBUG` and `PCAP_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the PCAP processing pipeline.
diff --git a/docs/contributing-style.md b/docs/contributing-style.md
new file mode 100644
index 000000000..7fdcb819c
--- /dev/null
+++ b/docs/contributing-style.md
@@ -0,0 +1,5 @@
## Style

### Python

For Python code found in Malcolm, the author uses [Black: The uncompromising Python code formatter](https://github.com/psf/black) with the options `--line-length 120 --skip-string-normalization`.
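
For reference, that corresponds to an invocation along these lines (a usage sketch; point it at whatever files you've changed):

```
black --line-length 120 --skip-string-normalization path/to/changed_file.py
```
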
\ No newline at end of file
diff --git a/docs/contributing-zeek.md b/docs/contributing-zeek.md
new file mode 100644
index 000000000..7666b5c43
--- /dev/null
+++ b/docs/contributing-zeek.md
@@ -0,0 +1,15 @@
## Zeek

### `local.zeek`

Some Zeek behavior can be tweaked without having to manually edit configuration files through the use of environment variables: search for `ZEEK` in the [`docker-compose.yml` parameters](../../README.md#DockerComposeYml) section of the documentation.

Other changes to Zeek's behavior could be made by modifying [local.zeek](../../zeek/config/local.zeek) and either using a [bind mount](#Bind) or [rebuilding](#Build) the `zeek` Docker image with the modification. See the [Zeek documentation](https://docs.zeek.org/en/master/quickstart.html#local-site-customization) for more information on customizing a Zeek instance. Note that changing Zeek's behavior could result in changes to the format of the logs Zeek generates, which could break Malcolm's parsing of those logs, so exercise caution.

### Adding a new Zeek package

The easiest way to add a new Zeek package to Malcolm is to add the git URL of that package to the `ZKG_GITHUB_URLS` array in the [zeek_install_plugins.sh](../../shared/bin/zeek_install_plugins.sh) script and then [rebuild](#Build) the `zeek` Docker image. This will cause your package to be installed (via the [`zkg`](https://docs.zeek.org/projects/package-manager/en/stable/zkg.html) command-line tool). See [Parsing new Zeek logs](#LogstashZeek) on how to process any new `.log` files if your package generates them.

### Zeek Intelligence Framework

See [Zeek Intelligence Framework](../../README.md#ZeekIntel) in the Malcolm README for information on how to use Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) with Malcolm.
\ No newline at end of file
diff --git a/docs/contributing.md b/docs/contributing.md
index 8f9a1b86d..851abe2ea 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -26,308 +26,6 @@ The purpose of this document is to provide some direction for those willing to m
 * [Carved file scanners](#Scanners)
 * [Style](#Style)
 
-## Local modifications
-
-There are several ways to customize Malcolm's runtime behavior via local changes to configuration files. Many commonly-tweaked settings are discussed in the project [README](../../README.md) (see [`docker-compose.yml` parameters](../../README.md#DockerComposeYml) and [Customizing event severity scoring](../../README.md#SeverityConfig) for some examples).
-
-### Docker bind mounts
-
-Some configuration changes can be put in place by modifying local copies of configuration files and then use a [Docker bind mount](https://docs.docker.com/storage/bind-mounts/) to overlay the modified file onto the running Malcolm container. This is already done for many files and directories used to persist Malcolm configuration and data.
For example, the default list of bind mounted files and directories for each Malcolm service is as follows: - -``` -$ grep -P "^( - ./| [\w-]+:)" docker-compose-standalone.yml - opensearch: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro - - ./opensearch/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:rw - - ./opensearch:/usr/share/opensearch/data:delegated - - ./opensearch-backup:/opt/opensearch/backup:delegated - dashboards-helper: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - dashboards: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - logstash: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro - - ./logstash/maps/malcolm_severity.yaml:/etc/malcolm_severity.yaml:ro - - ./logstash/certs/ca.crt:/certs/ca.crt:ro - - ./logstash/certs/server.crt:/certs/server.crt:ro - - ./logstash/certs/server.key:/certs/server.key:ro - - ./cidr-map.txt:/usr/share/logstash/config/cidr-map.txt:ro - - ./host-map.txt:/usr/share/logstash/config/host-map.txt:ro - - ./net-map.json:/usr/share/logstash/config/net-map.json:ro - filebeat: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./zeek-logs:/zeek - - ./suricata-logs:/suricata - - ./filebeat/certs/ca.crt:/certs/ca.crt:ro - - ./filebeat/certs/client.crt:/certs/client.crt:ro - - ./filebeat/certs/client.key:/certs/client.key:ro - arkime: - - ./auth.env - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./pcap:/data/pcap - - ./arkime-logs:/opt/arkime/logs - - ./arkime-raw:/opt/arkime/raw - zeek: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./pcap:/pcap - - ./zeek-logs/upload:/zeek/upload - - ./zeek-logs/extract_files:/zeek/extract_files - - ./zeek/intel:/opt/zeek/share/zeek/site/intel - zeek-live: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./zeek-logs/live:/zeek/live - - ./zeek-logs/extract_files:/zeek/extract_files - - ./zeek/intel:/opt/zeek/share/zeek/site/intel - suricata: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./suricata-logs:/var/log/suricata - - ./pcap:/data/pcap - - ./suricata/rules:/opt/suricata/rules:ro - suricata-live: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./suricata-logs:/var/log/suricata - - ./suricata/rules:/opt/suricata/rules:ro - file-monitor: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./zeek-logs/extract_files:/zeek/extract_files - - ./zeek-logs/current:/zeek/logs - - ./yara/rules:/yara-rules/custom:ro - pcap-capture: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./pcap/upload:/pcap - pcap-monitor: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./zeek-logs:/zeek - - ./pcap:/pcap - upload: - - ./auth.env - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./pcap/upload:/var/www/upload/server/php/chroot/files - htadmin: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./htadmin/config.ini:/var/www/htadmin/config/config.ini:rw - - ./htadmin/metadata:/var/www/htadmin/config/metadata:rw - - ./nginx/htpasswd:/var/www/htadmin/config/htpasswd:rw - freq: - - 
./nginx/ca-trust:/var/local/ca-trust:ro - name-map-ui: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./cidr-map.txt:/var/www/html/maps/cidr-map.txt:ro - - ./host-map.txt:/var/www/html/maps/host-map.txt:ro - - ./net-map.json:/var/www/html/maps/net-map.json:rw - api: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - nginx-proxy: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./nginx/nginx_ldap.conf:/etc/nginx/nginx_ldap.conf:ro - - ./nginx/htpasswd:/etc/nginx/.htpasswd:ro - - ./nginx/certs:/etc/nginx/certs:ro - - ./nginx/certs/dhparam.pem:/etc/nginx/dhparam/dhparam.pem:ro -``` - -So, for example, if you wanted to make a change to the `nginx-proxy` container's `nginx.conf` file, you could add the following line to the `volumes:` section of the `nginx-proxy` service in your `docker-compose.yml` file: - -``` -- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro -``` - -The change would take effect after stopping and starting Malcolm. - -See the documentation on [Docker bind mount](https://docs.docker.com/storage/bind-mounts/) for more information on this technique. - -### Building Malcolm's Docker images - -Another method for modifying your local copies of Malcolm's services' containers is to [build your own](../../README.md#Build) containers with the modifications baked-in. - -For example, say you wanted to create a Malcolm container which includes a new dashboard for OpenSearch Dashboards and a new enrichment filter `.conf` file for Logstash. After placing these files under `./dashboards/dashboards` and `./logstash/pipelines/enrichment`, respectively, in your Malcolm working copy, run `./build.sh dashboards-helper logstash` to build just those containers. After the build completes, you can run `docker images` and see you have fresh images for `malcolmnetsec/dashboards-helper` and `malcolmnetsec/logstash-oss`. You may need to review the contents of the [Dockerfiles](../../Dockerfiles) to determine the correct service and filesystem location within that service's Docker image depending on what you're trying to accomplish. - -Alternately, if you have forked Malcolm on GitHub, [workflow files](../../.github/workflows/) are provided which contain instructions for GitHub to build the docker images and [sensor](#Hedgehog) and [Malcolm](#ISO) installer ISOs. The resulting images are named according to the pattern `ghcr.io/owner/malcolmnetsec/image:branch` (e.g., if you've forked Malcolm with the github user `romeogdetlevjr`, the `arkime` container built for the `main` would be named `ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main`). To run your local instance of Malcolm using these images instead of the official ones, you'll need to edit your `docker-compose.yml` file(s) and replace the `image:` tags according to this new pattern, or use the bash helper script `./shared/bin/github_image_helper.sh` to pull and re-tag the images. - -## Adding a new service (Docker image) - -A new service can be added to Malcolm by following the following steps: - -1. Create a new subdirectory for the service (under the Malcolm working copy base directory) containing whatever source or configuration files are necessary to build and run the service -1. Create the service's Dockerfile in the [Dockerfiles](../../Dockerfiles) directory of your Malcolm working copy -1. Add a new section for your service under `services:` in the `docker-compose.yml` and `docker-compose-standalone.yml` files -1. 
If you want to enable automatic builds for your service on GitHub, create a new [workflow](../../.github/workflows/), using an existing workflow as an example - -### Networking and firewall - -If your service needs to expose a web interface to the user, you'll need to adjust the following files: - -* Ensure your service's section in the `docker-compose` files uses the `expose` directive to indicate which ports its providing -* Add the service to the `depends_on` section of the `nginx-proxy` service in the `docker-compose` files -* Modify the configuration of the `nginx-proxy` container (in [`nginx/nginx.conf`](../../nginx/nginx.conf)) to define `upstream` and `location` directives to point to your service - -Avoid publishing ports directly from your container to the host machine's network interface if at all possible. The `nginx-proxy` container handles encryption and authentication and should sit in front of any user-facing interface provided by Malcolm. - -## Adding new log fields - -As several of the sections in this document will reference adding new data source fields, we'll cover that here at the beginning. - -Although OpenSearch is a NoSQL database and as-such is "unstructured" and "schemaless," in order to add a new data source field you'll need to define that field in a few places in order for it to show up and be usable throughout Malcolm. Minimally, you'll probably want to do it in these three files - -* [`arkime/etc/config.ini`](../../arkime/etc/config.ini) - follow existing examples in the `[custom-fields]` and `[custom-views]` sections in order for [Arkime](https://arkime.com) to be aware of your new fields -* [`arkime/wise/source.zeeklogs.js`](../../arkime/wise/source.zeeklogs.js) - add new fields to the `allFields` array for Malcolm to create Arkime [value actions](https://arkime.com/settings#right-click) for your fields -* [`dashboards/templates/composable/component/__(name)__.json`](../../dashboards/templates/composable/component/) - add new fields to a new [composable index template](https://opensearch.org/docs/latest/opensearch/index-templates/#composable-index-templates) file in this directory and add its name (prefixed with `custom_`) to the `composed_of` section of [`dashboards/templates/malcolm_template.json`](../../dashboards/templates/malcolm_template.json) in order for it to be included as part of the `arkime_sessions3-*` [index template](https://opensearch.org/docs/latest/opensearch/index-templates/) used by Arkime and OpenSearch Dashboards in Malcolm - -When possible, I recommend you to use (or at least take inspiration from) the [Elastic Common Schema (ECS) Reference](https://www.elastic.co/guide/en/ecs/current/index.html) when deciding how to define new field names. - -## Zeek - -### `local.zeek` - -Some Zeek behavior can be tweaked without having to manually edit configuration files through the use of environment variables: search for `ZEEK` in the [`docker-compose.yml` parameters](../../README.md#DockerComposeYml) section of the documentation. - -Other changes to Zeek's behavior could be made by modifying [local.zeek](../../zeek/config/local.zeek) and either using a [bind mount](#Bind) or [rebuilding](#Build) the `zeek` Docker image with the modification. See the [Zeek documentation](https://docs.zeek.org/en/master/quickstart.html#local-site-customization) for more information on customizing a Zeek instance. 
Note that changing Zeek's behavior could result in changes to the format of the logs Zeek generates, which could break Malcolm's parsing of those logs, so exercise caution. - -### Adding a new Zeek package - -The easiest way to add a new Zeek package to Malcolm is to add the git URL of that package to the `ZKG_GITHUB_URLS` array in [zeek_install_plugins.sh](../../shared/bin/zeek_install_plugins.sh) script and then [rebuilding](#Build) the `zeek` Docker image. This will cause your package to be installed (via the [`zkg`](https://docs.zeek.org/projects/package-manager/en/stable/zkg.html) command-line tool). See [Parsing new Zeek logs](#LogstashZeek) on how to process any new `.log` files if your package generates them. - -### Zeek Intelligence Framework - -See [Zeek Intelligence Framework](../../README.md#ZeekIntel) in the Malcolm README for information on how to use Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) with Malcolm. - -## PCAP processors - -When a PCAP is uploaded (either through Malcolm's [upload web interface](../../README.md#Upload) or just copied manually into the `./pcap/upload/` directory), the `pcap-monitor` container has a script that picks up those PCAP files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that PCAP. In Malcolm at the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), there are two of those: the `zeek` container and the `arkime` container. In Malcolm, they actually both share the [same script](../../shared/bin/pcap_processor.py) to read from that topic and run the PCAP through Zeek and Arkime, respectively. If you're looking for an example to follow, the `zeek` container is the less complicated of the two. So, if you were looking to integrate a new PCAP processing tool into Malcolm (named `cooltool` for this example), the process would be something like: - -1. Define your service as instructed in the [Adding a new service](#NewImage) section - * Note how the existing `zeek` and `arkime` services use [bind mounts](#Bind) to access the local `./pcap` directory -1. Write a script (modelled after [the one](../../shared/bin/pcap_processor.py) `zeek` and `arkime` use, if you like) which subscribes to the PCAP topic port (`30441` as defined in [pcap_utils.py](../../shared/bin/pcap_utils.py)) and handles the PCAP files published there, each PCAP file represented by a JSON dictionary with `name`, `tags`, `size`, `type` and `mime` keys (search for `FILE_INFO_` in [pcap_utils.py](../../shared/bin/pcap_utils.py)). This script should be added to and run by your `cooltool.Dockerfile`-generated container. -1. Add whatever other logic needed to get your tool's data into Malcolm, whether by writing it directly info OpenSearch or by sending log files for parsing and enrichment by [Logstash](#Logstash) (especially see the section on [Parsing a new log data source](#LogstashNewSource)) - -While that might be a bit of hand-waving, these general steps take care of the PCAP processing piece: you shouldn't have to really edit any *existing* code to add a new PCAP processor. You're just creating a new container for the Malcolm appliance to the ZeroMQ topic and handle the PCAPs your tool receives. 
- -The `PCAP_PIPELINE_DEBUG` and `PCAP_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the PCAP processing pipeline. - -## Logstash - -### Parsing a new log data source - -Let's continue with the example of the `cooltool` service we added in the [PCAP processors](#PCAP) section above, assuming that `cooltool` generates some textual log files we want to parse and index into Malcolm. - -You'd have configured `cooltool` in your `cooltool.Dockerfile` and its section in the `docker-compose` files to write logs into a subdirectory or subdirectories in a shared folder [bind mounted](#Bind) in such a way that both the `cooltool` and `filebeat` containers can access. Referring to the `zeek` container as an example, this is how the `./zeek-logs` folder is handled; both the `filebeat` and `zeek` services have `./zeek-logs` in their `volumes:` section: - -``` -$ grep -P "^( - ./zeek-logs| [\w-]+:)" docker-compose.yml | grep -B1 "zeek-logs" - filebeat: - - ./zeek-logs:/data/zeek --- - zeek: - - ./zeek-logs/upload:/zeek/upload -… -``` - -You'll need to provide access to your `cooltool` logs in a similar fashion. - -Next, tweak [`filebeat.yml`](../../filebeat/filebeat.yml) by adding a new log input path pointing to the `cooltool` logs to send them along to the `logstash` container. This modified `filebeat.yml` will need to be reflected in the `filebeat` container via [bind mount](#Bind) or by [rebuilding](#Build) it. - -Logstash can then be easily extended to add more [`logstash/pipelines`](../../logstash/pipelines). At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the Logstash pipelines basically look like this: - -* input (from `filebeat`) sends logs to 1..*n* **parse pipelines** -* each **parse pipeline** does what it needs to do to parse its logs then sends them to the [**enrichment pipeline**](#LogstashEnrichments) -* the [**enrichment pipeline**](../../logstash/pipelines/enrichment) performs common lookups to the fields that have been normalized and indexes the logs into the OpenSearch data store - -So, in order to add a new **parse pipeline** for `cooltool` after tweaking [`filebeat.yml`](../../filebeat/filebeat.yml) as described above, create a `cooltool` directory under [`logstash/pipelines`](../../logstash/pipelines) which follows the same pattern as the `zeek` parse pipeline. This directory will have an input file (tiny), a filter file (possibly large), and an output file (tiny). In your filter file, be sure to set the field [`event.hash`](https://www.elastic.co/guide/en/ecs/master/ecs-event.html#field-event-hash) to a unique value to identify indexed documents in OpenSearch; the [fingerprint filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html) may be useful for this. - -Finally, in your `docker-compose` files, set a new `LOGSTASH_PARSE_PIPELINE_ADDRESSES` environment variable under `logstash-variables` to `cooltool-parse,zeek-parse,suricata-parse,beats-parse` (assuming you named the pipeline address from the previous step `cooltool-parse`) so that logs sent from `filebeat` to `logstash` are forwarded to all parse pipelines. - -### Parsing new Zeek logs - -The following modifications must be made in order for Malcolm to be able to parse new Zeek log files: - -1. 
Add a parsing section to [`logstash/pipelines/zeek/11_zeek_logs.conf`](../../logstash/pipelines/zeek/11_zeek_logs.conf) - * Follow patterns for existing log files as an example - * For common Zeek fields like the `id` four-tuple, timestamp, etc., use the same convention used by existing Zeek logs in that file (e.g., `ts`, `uid`, `orig_h`, `orig_p`, `resp_h`, `resp_p`) - * Take care, especially when copy-pasting filter code, that the Zeek delimiter isn't modified from a tab character to a space character (see "*zeek's default delimiter is a literal tab, MAKE SURE YOUR EDITOR DOESN'T SCREW IT UP*" warnings in that file) -1. If necessary, perform log normalization in [`logstash/pipelines/zeek/12_zeek_normalize.conf`](../../logstash/pipelines/zeek/12_zeek_normalize.conf) for values like action (`event.action`), result (`event.result`), application protocol version (`network.protocol_version`), etc. -1. If necessary, define conversions for floating point or integer values in [`logstash/pipelines/zeek/13_zeek_convert.conf`](../../logstash/pipelines/zeek/13_zeek_convert.conf) -1. Identify the new fields and add them as described in [Adding new log fields](#NewFields) - -### Enrichments - -Malcolm's Logstash instance performs many enrichments for you automatically (see the [enrichment pipeline](../../logstash/pipelines/enrichment)), including MAC address-to-vendor lookup by OUI, GeoIP, ASN, and a few others. In order to take advantage of these enrichments that are already in place, normalize new fields to use the same standardized field names Malcolm uses for things like IP addresses, MAC addresses, etc. You can add your own additional enrichments by creating new `.conf` files containing [Logstash filters](https://www.elastic.co/guide/en/logstash/7.10/filter-plugins.html) in the [enrichment pipeline](../../logstash/pipelines/enrichment) directory and using either of the techniques in the [Local modifications](#LocalMods) section to implement your changes in the `logstash` container. - -### Logstash plugins - -The [logstash.Dockerfile](../../Dockerfiles/logstash.Dockerfile) installs the Logstash plugins used by Malcolm (search for `logstash-plugin install` in that file). Additional Logstash plugins could be installed by modifying this Dockerfile and [rebuilding](#Build) the `logstash` Docker image. - -## OpenSearch Dashboards - -[OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) is an open-source fork of [Kibana](https://www.elastic.co/kibana/), which is [no longer open-source software](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0). - -### Adding new visualizations and dashboards - -Visualizations and dashboards can be [easily created](../../README.md#BuildDashboard) in OpenSearch Dashboards using its drag-and-drop WYSIWYG tools. Assuming you've created a new dashboard you wish to package with Malcolm, the dashboard and its visualization components can be exported using the following steps: - -1. Identify the ID of the dashboard (found in the URL: e.g., for `/dashboards/app/dashboards#/view/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` the ID would be `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) -1. Export the dashboard with that ID and save it in the `./dashboards/dashboards/` directory with the following command: - ``` - export DASHID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx && \ - docker-compose exec dashboards curl -XGET \ - "http://localhost:5601/dashboards/api/opensearch-dashboards/dashboards/export?dashboard=$DASHID" > \ - ./dashboards/dashboards/$DASHID.json - ``` -1. 
It's preferable for Malcolm to dynamically create the `arkime_sessions3-*` index template rather than including it in imported dashboards, so edit the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.json` file that was generated, and carefully locate and remove the section with the `id` of `arkime_sessions3-*` and the `type` of `index-pattern` (including the comma preceding it): - ``` - , - { - "id": "arkime_sessions3-*", - "type": "index-pattern", - "namespaces": [ - "default" - ], - "updated_at": "2021-12-13T18:21:42.973Z", - "version": "Wzk3MSwxXQ==", - … - "references": [], - "migrationVersion": { - "index-pattern": "7.6.0" - } - } - ``` -1. Include the new dashboard either by using a [bind mount](#Bind) for the `./dashboards/dashboards/` directory or by [rebuilding](#Build) the `dashboards-helper` Docker image. Dashboards are imported the first time Malcolm starts up. - -### OpenSearch Dashboards plugins - -The [dashboards.Dockerfile](../../Dockerfiles/dashboards.Dockerfile) installs the OpenSearch Dashboards plugins used by Malcolm (search for `opensearch-dashboards-plugin install` in that file). Additional Dashboards plugins could be installed by modifying this Dockerfile and [rebuilding](#Build) the `dashboards` Docker image. - -Third-party or community plugins developed for Kibana will not install into OpenSearch Dashboards without source code modification. Depending on the plugin, this could range from very simple to very complex. As an illustrative example, the changes that were required to port the Sankey diagram visualization plugin from Kibana to OpenSearch Dashboards can be [viewed on GitHub](https://github.com/mmguero-dev/osd_sankey_vis/compare/edacf6b...main). - -## Carved file scanners - -Similar to the [PCAP processing pipeline](#PCAP) described above, new tools can plug into Malcolm's [automatic file extraction and scanning](../../README.md#ZeekFileExtraction) to examine file transfers carved from network traffic. - -When Zeek extracts a file it observes being transferred in network traffic, the `file-monitor` container picks up those extracted files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that extracted file. At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the implemented file scanners include ClamAV, YARA, capa and VirusTotal, all of which are managed by the `file-monitor` container.
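As a rough sketch of what a new scanner's plumbing might look like (assuming the pyzmq library; the `file-monitor` hostname, the message framing, and the hit format are assumptions, and the port numbers are those given in the script descriptions that follow):

```
import json
import zmq

context = zmq.Context()

# pull extracted-file notifications from the watcher's ventilator (port 5987)
files = context.socket(zmq.PULL)
files.connect("tcp://file-monitor:5987")

# push "hits" on to the sink that the logger drains (port 5988)
hits = context.socket(zmq.PUSH)
hits.connect("tcp://file-monitor:5988")

def looks_malicious(file_info):
    return False  # placeholder for your scanning engine's verdict

while True:
    file_info = json.loads(files.recv_string())
    if looks_malicious(file_info):
        hits.send_string(json.dumps(file_info))
```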
The scripts involved in this code are: - -* [shared/bin/zeek_carve_watcher.py](../../shared/bin/zeek_carve_watcher.py) - watches the directory to which Zeek extracts files and publishes information about those files to the ZeroMQ ventilator on port 5987 -* [shared/bin/zeek_carve_scanner.py](../../shared/bin/zeek_carve_scanner.py) - subscribes to `zeek_carve_watcher.py`'s topic and performs file scanning for the ClamAV, YARA, capa and VirusTotal engines and sends "hits" to another ZeroMQ sink on port 5988 -* [shared/bin/zeek_carve_logger.py](../../shared/bin/zeek_carve_logger.py) - subscribes to `zeek_carve_scanner.py`'s topic and logs hits to a "fake" Zeek signatures.log file which is parsed and ingested by Logstash -* [shared/bin/zeek_carve_utils.py](../../shared/bin/zeek_carve_utils.py) - various variables and classes related to carved file scanning - -Additional file scanners could be added to the `file-monitor` service itself, or, to avoid coupling with Malcolm's code, you could simply define a new service as instructed in the [Adding a new service](#NewImage) section and write your own scripts to subscribe and publish to the topics as described above. While that might be a bit of hand-waving, these general steps take care of the plumbing around extracting the file and notifying your tool, as well as handling the logging of "hits": you shouldn't really have to edit any *existing* code to add a new carved file scanner. - -The `EXTRACTED_FILE_PIPELINE_DEBUG` and `EXTRACTED_FILE_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the carved file processing pipeline. - -## Style - -### Python - -For Python code found in Malcolm, the author uses [Black: The uncompromising Python code formatter](https://github.com/psf/black) with the options `--line-length 120 --skip-string-normalization`. - ## Copyright [Malcolm](https://github.com/idaholab/Malcolm) is Copyright 2022 Battelle Energy Alliance, LLC, and is developed and released through the cooperation of the [Cybersecurity and Infrastructure Security Agency](https://www.cisa.gov/) of the [U.S. Department of Homeland Security](https://www.dhs.gov/). diff --git a/docs/hedgehog-config-root.md b/docs/hedgehog-config-root.md new file mode 100644 index 000000000..eafe5e2a0 --- /dev/null +++ b/docs/hedgehog-config-root.md @@ -0,0 +1,47 @@ +## Interfaces, hostname, and time synchronization + +### Hostname + +The first step of sensor configuration is to configure the network interfaces and sensor hostname. Clicking the **Configure Interfaces and Hostname** toolbar icon (or, if you are at a command line prompt, running `configure-interfaces`) will prompt you for the root password you created during installation, after which the configuration welcome screen is shown. Select **Continue** to proceed. + +You may next select whether to configure the network interfaces, hostname, or time synchronization. + +![Selection to configure network interfaces, hostname, or time synchronization](./docs/images/root_config_mode.png) + +Selecting **Hostname**, you will be presented with a summary of the current sensor identification information, after which you may specify a new sensor hostname. This name will be used to tag all events forwarded from this sensor in the events' **host.name** field.
+ +![Specifying a new sensor hostname](./docs/images/hostname_setting.png) + +### Interfaces + +Returning to the configuration mode selection, choose **Interface**. You will be prompted if you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**. + +You will be presented with a list of interfaces to configure as the sensor management interface. This is the interface the sensor itself will use to communicate with the network in order to, for example, forward captured logs to an aggregator server. In order to do so, the management interface must be assigned an IP address. This is generally **not** the interface used for capturing data. Select the interface to which you wish to assign an IP address. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed. + +![Management interface selection](./docs/images/select_iface.png) + +Depending on the configuration of your network, you may now specify how the management interface will be assigned an IP address. In order to communicate with an event aggregator over the management interface, either **static** or **dhcp** must be selected. + +![Interface address source](./docs/images/iface_mode.png) + +If you select static, you will be prompted to enter the IP address, netmask, and gateway to assign to the management interface. + +![Static IP configuration](./docs/images/iface_static.png) + +In either case, upon selecting **OK** the network interface will be brought down, configured, and brought back up, and the result of the operation will be displayed. You may choose **Quit** upon returning to the configuration tool's welcome screen. + +### Time synchronization + +Returning to the configuration mode selection, choose **Time Sync**. Here you can configure the sensor to keep its time synchronized with either an NTP server (using the NTP protocol) or with an HTTP/HTTPS server, such as a local [Malcolm](https://github.com/idaholab/Malcolm) aggregator. On the next dialog, choose the time synchronization method you wish to configure. + +![Time synchronization method](./docs/images/time_sync_mode.png) + +If **htpdate** is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port `9200` may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server. + +![*htpdate* configuration](./docs/images/htpdate_setup.png) + +If **ntpdate** is selected, you will be prompted to enter the IP address or hostname of the NTP server. + +![NTP configuration](./docs/images/ntp_host.png) + +Upon configuring time synchronization, a "Time synchronization configured successfully!" message will be displayed, after which you will be returned to the welcome screen.
\ No newline at end of file diff --git a/docs/hedgehog-config-user.md b/docs/hedgehog-config-user.md new file mode 100644 index 000000000..64e2a9ce0 --- /dev/null +++ b/docs/hedgehog-config-user.md @@ -0,0 +1,189 @@ +## Capture, forwarding, and autostart services + +Clicking the **Configure Capture and Forwarding** toolbar icon (or, if you are at a command prompt, running `configure-capture`) will launch the configuration tool for capture and forwarding. The root password is not required as it was for the interface and hostname configuration, as sensor services are run under the non-privileged sensor account. Select **Continue** to proceed. You may select from a list of configuration options. + +![Select configuration mode](./docs/images/capture_config_main.png) + +### Capture + +Choose **Configure Capture** to configure parameters related to traffic capture and local analysis. You will be prompted if you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**. + +You will be presented with a list of network interfaces and prompted to select one or more capture interfaces. An interface used to capture traffic is generally a different interface from the one selected previously as the management interface, and each capture interface should be connected to a network tap or span port for traffic monitoring. Capture interfaces are usually not assigned an IP address as they are only used to passively “listen” to the traffic on the wire. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed. + +![Select capture interfaces](./docs/images/capture_iface_select.png) + +Upon choosing the capture interfaces and selecting OK, you may optionally provide a capture filter. This filter will be used to limit what traffic the PCAP service ([`tcpdump`](https://www.tcpdump.org/)) and the traffic analysis services ([`zeek`](https://www.zeek.org/) and [`suricata`](https://suricata.io/)) will see. Capture filters are specified using [Berkeley Packet Filter (BPF)](http://biot.com/capstats/bpf.html) syntax (see the examples below). Clicking **OK** will attempt to validate the capture filter, if specified, and will present a warning if the filter is invalid. + +![Specify capture filters](./docs/images/capture_filter.png) + +Next you must specify the paths where captured PCAP files and logs will be stored locally on the sensor. If the installation worked as expected, these paths should be prepopulated to reflect paths on the volumes formatted at install time for the purpose of storing these artifacts. Usually these paths will exist on separate storage volumes. Enabling the PCAP and log pruning autostart services (see the section on autostart services below) will enable monitoring of these paths to ensure that their contents do not consume more than 90% of their respective volumes' space. Choose **OK** to continue.
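Returning to the capture filter step above, here are two illustrative BPF expressions (each line is a separate, standalone filter, and the addresses shown are placeholders to be adapted to your own network): the first excludes the sensor's own management traffic to and from an aggregator at `192.0.2.10`, while the second limits capture to a single monitored subnet.

```
not (host 192.0.2.10 and port 9200)
net 10.0.0.0/16
```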
+ +![Specify capture paths](./docs/images/capture_paths.png) + +#### Automatic file extraction and scanning + +Hedgehog Linux can leverage Zeek's knowledge of network protocols to automatically detect file transfers and extract those files from network traffic as Zeek sees them. + +To specify which files should be extracted, specify the Zeek file carving mode: + +![Zeek file carving mode](./docs/images/zeek_file_carve_mode.png) + +If you're not sure what to choose, either of **mapped (except common plain text files)** (if you want to carve and scan almost all files) or **interesting** (if you only want to carve and scan files with [mime types of common attack vectors](./interface/sensor_ctl/zeek/extractor_override.interesting.zeek)) is probably a good choice. + +Next, specify which carved files to preserve (saved on the sensor under `/capture/bro/capture/extract_files/quarantine` by default). In order to not consume all of the sensor's available storage space, the oldest preserved files will be pruned along with the oldest Zeek logs as described below with **AUTOSTART_PRUNE_ZEEK** in the [autostart services](#ConfigAutostart) section. + +You'll be prompted to specify which engine(s) to use to analyze extracted files. Extracted files can be examined through any of four methods: + +![File scanners](./docs/images/zeek_file_carve_scanners.png) + +* scanning files with [**ClamAV**](https://www.clamav.net/); to enable this method, select **ZEEK_FILE_SCAN_CLAMAV** when specifying scanners for Zeek-carved files +* submitting file hashes to [**VirusTotal**](https://www.virustotal.com/en/#search); to enable this method, select **ZEEK_FILE_SCAN_VTOT** when specifying scanners for Zeek-carved files, then manually edit `/opt/sensor/sensor_ctl/control_vars.conf` and specify your [VirusTotal API key](https://developers.virustotal.com/reference) in `VTOT_API2_KEY` (see the example below) +* scanning files with [**Yara**](https://github.com/VirusTotal/yara); to enable this method, select **ZEEK_FILE_SCAN_YARA** when specifying scanners for Zeek-carved files +* scanning portable executable (PE) files with [**Capa**](https://github.com/fireeye/capa); to enable this method, select **ZEEK_FILE_SCAN_CAPA** when specifying scanners for Zeek-carved files + +Files which are flagged as potentially malicious will be logged as Zeek `signatures.log` entries, and can be viewed in the **Signatures** dashboard in [OpenSearch Dashboards](https://github.com/idaholab/Malcolm#DashboardsVisualizations) when forwarded to Malcolm. + +![File quarantine](./docs/images/file_quarantine.png) + +Finally, you will be presented with the list of configuration variables that will be used for capture, including the values which you have configured up to this point in this section. Upon choosing **OK** these values will be written back out to the sensor configuration file located at `/opt/sensor/sensor_ctl/control_vars.conf`. It is not recommended that you edit this file manually. After confirming these values, you will be presented with a confirmation that these settings have been written to the configuration file, and you will be returned to the welcome screen. + +### Forwarding + +Select **Configure Forwarding** to set up the forwarding of logs and statistics from the sensor to an aggregator server, such as [Malcolm](https://github.com/idaholab/Malcolm). + +![Configure forwarders](./docs/images/forwarder_config.png) + +There are five forwarder services used on the sensor, each for forwarding a different type of log or sensor metric.
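Before moving on to the individual forwarders, here is the VirusTotal example referenced above: a minimal sketch of the hand-edited lines in `/opt/sensor/sensor_ctl/control_vars.conf` (the `variable=value` form is an assumption based on this document, and the key shown is a placeholder):

```
ZEEK_FILE_SCAN_VTOT=true
VTOT_API2_KEY=your-virustotal-api-key-here
```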
+ +### capture: Arkime session forwarding + +[capture](https://github.com/arkime/arkime/tree/master/capture) is not only used to capture PCAP files, but also to parse raw traffic into sessions and forward this session metadata to an [OpenSearch](https://opensearch.org/) database so that it can be viewed in [Arkime viewer](https://arkime.com/), whether standalone or as part of a [Malcolm](https://github.com/idaholab/Malcolm) instance. If you're using Hedgehog Linux with Malcolm, please read [Correlating Zeek logs and Arkime sessions](https://github.com/idaholab/Malcolm#ZeekArkimeFlowCorrelation) in the Malcolm documentation for more information. + +First, select the OpenSearch connection transport protocol, either **HTTPS** or **HTTP**. If the metrics are being forwarded to Malcolm, select **HTTPS** to encrypt messages from the sensor to the aggregator using TLS v1.2 with ECDHE-RSA-AES128-GCM-SHA256. If **HTTPS** is chosen, you must choose whether to enable SSL certificate verification. If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**. + +![OpenSearch connection protocol](./docs/images/opensearch_connection_protocol.png) ![OpenSearch SSL verification](./docs/images/opensearch_ssl_verification.png) + +Next, enter the **OpenSearch host** IP address (i.e., the IP address of the aggregator) and port. These metrics are written to an OpenSearch database using a RESTful API, usually using port 9200. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator. + +![OpenSearch host and port](./docs/images/arkime-capture-ip-port.png) + +You will be asked to enter authentication credentials for the sensor's connections to the aggregator's OpenSearch API. After you've entered the username and the password, the sensor will attempt a test connection to OpenSearch using the connection information provided. + +![OpenSearch username](./docs/images/opensearch_username.png) ![OpenSearch password](./docs/images/opensearch_password.png) ![Successful OpenSearch connection](./docs/images/opensearch_connection_success.png) + +Next, you will be shown a dialog with a list of IP addresses used to populate an access control list (ACL) for hosts allowed to connect back to the sensor for retrieving session payloads from its PCAP files for display in Arkime viewer. The list will be prepopulated with the IP address entered a few screens prior to this one. + +![PCAP retrieval ACL](./docs/images/malcolm_arkime_reachback_acl.png) + +Finally, you'll be given the opportunity to review all of the Arkime `capture` options you've specified. Selecting **OK** will cause the parameters to be saved and you will be returned to the configuration tool's welcome screen. + +![capture settings confirmation](./docs/images/arkime_confirm.png) + +### filebeat: Zeek and Suricata log forwarding + +[Filebeat](https://www.elastic.co/products/beats/filebeat) is used to forward [Zeek](https://www.zeek.org/) and [Suricata](https://suricata.io/) logs to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. + +To configure filebeat, first provide the log path (the same path previously configured for log file generation).
+ +![Configure filebeat for log forwarding](./docs/images/filebeat_log_path.png) + +You must also provide the IP address of the Logstash instance to which the logs are to be forwarded, and the port on which Logstash is listening. These logs are forwarded using the Beats protocol, generally over port 5044. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator. + +![Configure filebeat for log forwarding](./docs/images/filebeat_ip_port.png) + +Next you are asked whether the connection used for log forwarding should be made **unencrypted** or over **SSL**. Unencrypted communication requires less processing overhead and is simpler to configure, but the contents of the logs may be visible to anyone who is able to intercept that traffic. + +![Filebeat SSL certificate verification](./docs/images/filebeat_ssl.png) + +If **SSL** is chosen, you must choose whether to enable [SSL certificate verification](https://www.elastic.co/guide/en/beats/filebeat/current/configuring-ssl-logstash.html). If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**. + +![Unencrypted vs. SSL encryption for log forwarding](./docs/images/filebeat_ssl_verify.png) + +The last step for SSL-encrypted log forwarding is to specify the SSL certificate authority, certificate, and key files. These files must match those used by the Logstash instance receiving the logs on the aggregator. If Malcolm's `auth_setup` script was used to generate these files they would be found in the `filebeat/certs/` subdirectory of the Malcolm installation and must be manually copied to the sensor (stored under `/opt/sensor/sensor_ctl/logstash-client-certificates` or in any other path accessible to the sensor account; see the copying example below). Specify the location of the certificate authorities file (e.g., `ca.crt`), the certificate file (e.g., `client.crt`), and the key file (e.g., `client.key`). + +![SSL certificate files](./docs/images/filebeat_certs.png) + +The Logstash instance receiving the events must be similarly configured with matching SSL certificate and key files. Under Malcolm, the `BEATS_SSL` variable must be set to `true` in Malcolm's `docker-compose.yml` file and the SSL files must exist in the `logstash/certs/` subdirectory of the Malcolm installation. + +Once you have specified all of the filebeat parameters, you will be presented with a summary of the settings related to the forwarding of these logs. Selecting **OK** will cause the parameters to be written to filebeat's configuration keystore under `/opt/sensor/sensor_ctl/logstash-client-certificates` and you will be returned to the configuration tool's welcome screen. + +![Confirm filebeat settings](./docs/images/filebeat_confirm.png) + +### miscbeat: System metrics forwarding + +The sensor uses [Fluent Bit](https://fluentbit.io/) to gather miscellaneous system resource metrics (CPU, network I/O, disk I/O, memory utilization, temperature, etc.) and the [Beats](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) protocol to forward these metrics to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. Metrics categories can be enabled/disabled as described in the [autostart services](#ConfigAutostart) section of this document.
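As an aside to the filebeat configuration above, here is a concrete illustration of that certificate-copying step, run from the Malcolm installation directory (the sensor hostname and user are placeholders; the paths are those given in this document):

```
scp filebeat/certs/ca.crt filebeat/certs/client.crt filebeat/certs/client.key \
    sensor@hedgehog.example.org:/opt/sensor/sensor_ctl/logstash-client-certificates/
```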
+ +This forwarder's configuration is almost identical to that of [filebeat](#filebeat) in the previous section. Select `miscbeat` from the forwarding configuration mode options and follow the same steps outlined above to set up this forwarder. + +### Autostart services + +Once the forwarders have been configured, the final step is to **Configure Autostart Services**. Choose this option from the configuration mode menu after the welcome screen of the sensor configuration tool. + +Even if you have configured capture and/or forwarder services as described in the previous sections, only the services enabled in the autostart configuration will run when the sensor starts up. The available autostart processes are as follows (recommended services are in **bold text**): + +* **AUTOSTART_ARKIME** - [capture](#arkime-capture) PCAP engine for traffic capture, as well as traffic parsing and metadata insertion into OpenSearch for viewing in [Arkime](https://arkime.com/). If you are using Hedgehog Linux along with [Malcolm](https://github.com/idaholab/Malcolm) or another Arkime installation, this is probably the packet capture engine you want to use. +* **AUTOSTART_CLAMAV_UPDATES** - Virus database update service for ClamAV (requires sensor to be connected to the internet) +* **AUTOSTART_FILEBEAT** - [filebeat](#filebeat) Zeek and Suricata log forwarder +* **AUTOSTART_FLUENTBIT_AIDE** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/exec) [AIDE](https://aide.github.io/) file system integrity checks +* **AUTOSTART_FLUENTBIT_AUDITLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/tail) [auditd](https://man7.org/linux/man-pages/man8/auditd.8.html) logs +* *AUTOSTART_FLUENTBIT_KMSG* - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/kernel-logs) the Linux kernel log buffer (these are generally reflected in syslog as well, which may make this agent redundant) +* **AUTOSTART_FLUENTBIT_METRICS** - [Fluent Bit](https://fluentbit.io/) agent for collecting [various](https://docs.fluentbit.io/manual/pipeline/inputs) system resource and performance metrics +* **AUTOSTART_FLUENTBIT_SYSLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/syslog) Linux syslog messages +* **AUTOSTART_FLUENTBIT_THERMAL** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/thermal) system temperatures +* **AUTOSTART_MISCBEAT** - [filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) forwarder which sends system metrics collected by [Fluent Bit](https://fluentbit.io/) to a remote Logstash instance (e.g., [Malcolm](https://github.com/idaholab/Malcolm)'s) +* *AUTOSTART_NETSNIFF* - [netsniff-ng](http://netsniff-ng.org/) PCAP engine for saving packet capture (PCAP) files +* **AUTOSTART_PRUNE_PCAP** - storage space monitor to ensure that PCAP files do not consume more than 90% of the total size of the storage volume to which PCAP files are written +* **AUTOSTART_PRUNE_ZEEK** - storage space monitor to ensure that Zeek logs do not consume more than 90% of the total size of the storage volume to which Zeek logs are written +* **AUTOSTART_SURICATA** - [Suricata](https://suricata.io/) traffic analysis engine +* **AUTOSTART_SURICATA_UPDATES** - Rule update service for Suricata (requires sensor to be connected to the internet) +* *AUTOSTART_TCPDUMP* 
- [tcpdump](https://www.tcpdump.org/) PCAP engine for saving packet capture (PCAP) files +* **AUTOSTART_ZEEK** - [Zeek](https://www.zeek.org/) traffic analysis engine + +Note that only one packet capture engine ([capture](https://arkime.com/), [netsniff-ng](http://netsniff-ng.org/), or [tcpdump](https://www.tcpdump.org/)) can be used. + +![Autostart services](./docs/images/autostarts.png) + +Once you have selected the autostart services, you will be prompted to confirm your selections. Doing so will cause these values to be written back out to the `/opt/sensor/sensor_ctl/control_vars.conf` configuration file. + +![Autostart services confirmation](./docs/images/autostarts_confirm.png) + +After you have completed configuring the sensor it is recommended that you reboot the sensor to ensure all new settings take effect. If rebooting is not an option, you may click the **Restart Sensor Services** menu icon in the top menu bar, or open a terminal and run: + +``` +/opt/sensor/sensor_ctl/shutdown && sleep 10 && /opt/sensor/sensor_ctl/supervisor.sh +``` + +This will cause the sensor services controller to stop, wait a few seconds, and restart. You can check the status of the sensor's processes by choosing **Sensor Status** from the sensor's kiosk mode, clicking the **Sensor Service Status** toolbar icon, or running `/opt/sensor/sensor_ctl/status` from the command line: + +``` +$ /opt/sensor/sensor_ctl/status +arkime:arkime-capture RUNNING pid 6455, uptime 0:03:17 +arkime:arkime-viewer RUNNING pid 6456, uptime 0:03:17 +beats:filebeat RUNNING pid 6457, uptime 0:03:17 +beats:miscbeat RUNNING pid 6458, uptime 0:03:17 +clamav:clamav-service RUNNING pid 6459, uptime 0:03:17 +clamav:clamav-updates RUNNING pid 6461, uptime 0:03:17 +fluentbit-auditlog RUNNING pid 6463, uptime 0:03:17 +fluentbit-kmsg STOPPED Not started +fluentbit-metrics:cpu RUNNING pid 6466, uptime 0:03:17 +fluentbit-metrics:df RUNNING pid 6471, uptime 0:03:17 +fluentbit-metrics:disk RUNNING pid 6468, uptime 0:03:17 +fluentbit-metrics:mem RUNNING pid 6472, uptime 0:03:17 +fluentbit-metrics:mem_p RUNNING pid 6473, uptime 0:03:17 +fluentbit-metrics:netif RUNNING pid 6474, uptime 0:03:17 +fluentbit-syslog RUNNING pid 6478, uptime 0:03:17 +fluentbit-thermal RUNNING pid 6480, uptime 0:03:17 +netsniff:netsniff-enp1s0 STOPPED Not started +prune:prune-pcap RUNNING pid 6484, uptime 0:03:17 +prune:prune-zeek RUNNING pid 6486, uptime 0:03:17 +supercronic RUNNING pid 6490, uptime 0:03:17 +suricata RUNNING pid 6501, uptime 0:03:17 +tcpdump:tcpdump-enp1s0 STOPPED Not started +zeek:capa RUNNING pid 6553, uptime 0:03:17 +zeek:clamav RUNNING pid 6512, uptime 0:03:17 +zeek:logger RUNNING pid 6554, uptime 0:03:17 +zeek:virustotal STOPPED Not started +zeek:watcher RUNNING pid 6510, uptime 0:03:17 +zeek:yara RUNNING pid 6548, uptime 0:03:17 +zeek:zeekctl RUNNING pid 6502, uptime 0:03:17 +``` \ No newline at end of file diff --git a/docs/hedgehog-config-zeek-intel.md b/docs/hedgehog-config-zeek-intel.md new file mode 100644 index 000000000..e50966346 --- /dev/null +++ b/docs/hedgehog-config-zeek-intel.md @@ -0,0 +1,7 @@ +### Zeek Intelligence Framework + +To quote Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) documentation, "The goals of Zeek’s Intelligence Framework are to consume intelligence data, make it available for matching, and provide infrastructure to improve performance and memory utilization. Data in the Intelligence Framework is an atomic piece of intelligence such as an IP address or an e-mail address. 
This atomic data will be packed with metadata such as a freeform source field, a freeform descriptive field, and a URL which might lead to more information about the specific item." Zeek [intelligence](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html) [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type) include IP addresses, URLs, file names, hashes, email addresses, and more. + +Hedgehog Linux doesn't come bundled with intelligence files from any particular feed, but they can be easily included in your local instance. Before Zeek starts, Hedgehog Linux configures it such that intelligence files will be automatically included in its local policy. Subdirectories under `/opt/sensor/sensor_ctl/zeek/intel` which contain their own `__load__.zeek` file will be `@load`-ed as-is, while subdirectories containing "loose" intelligence files will be [loaded](https://docs.zeek.org/en/master/frameworks/intel.html#loading-intelligence) automatically with a `redef Intel::read_files` directive. + +Note that Hedgehog Linux does not manage updates for these intelligence files. You should use the update mechanism suggested by your feeds' maintainers to keep them up to date. Adding and deleting intelligence files under this directory will take effect upon restarting Zeek. \ No newline at end of file diff --git a/docs/hedgehog-config.md b/docs/hedgehog-config.md index e7dbdba9e..b37bf2eff 100644 --- a/docs/hedgehog-config.md +++ b/docs/hedgehog-config.md @@ -15,248 +15,3 @@ Several icons are available in the top menu bar: * **Configure interfaces and hostname** – opens a dialog for configuring the sensor's network interfaces and setting the sensor's hostname * **Restart sensor services** - stops and restarts all of the [autostart services](#ConfigAutostart) -## Interfaces, hostname, and time synchronization - -### Hostname - -The first step of sensor configuration is to configure the network interfaces and sensor hostname. Clicking the **Configure Interfaces and Hostname** toolbar icon (or, if you are at a command line prompt, running `configure-interfaces`) will prompt you for the root password you created during installation, after which the configuration welcome screen is shown. Select **Continue** to proceed. - -You may next select whether to configure the network interfaces, hostname, or time synchronization. - -![Selection to configure network interfaces, hostname, or time synchronization](./docs/images/root_config_mode.png) - -Selecting **Hostname**, you will be presented with a summary of the current sensor identification information, after which you may specify a new sensor hostname. This name will be used to tag all events forwarded from this sensor in the events' **host.name** field. - -![Specifying a new sensor hostname](./docs/images/hostname_setting.png) - -### Interfaces - -Returning to the configuration mode selection, choose **Interface**. You will be prompted if you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**. - -You will be presented with a list of interfaces to configure as the sensor management interface. 
This is the interface the sensor itself will use to communicate with the network in order to, for example, forward captured logs to an aggregator server. In order to do so, the management interface must be assigned an IP address. This is generally **not** the interface used for capturing data. Select the interface to which you wish to assign an IP address. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed. - -![Management interface selection](./docs/images/select_iface.png) - -Depending on the configuration of your network, you may now specify how the management interface will be assigned an IP address. In order to communicate with an event aggregator over the management interface, either **static** or **dhcp** must be selected. - -![Interface address source](./docs/images/iface_mode.png) - -If you select static, you will be prompted to enter the IP address, netmask, and gateway to assign to the management interface. - -![Static IP configuration](./docs/images/iface_static.png) - -In either case, upon selecting **OK** the network interface will be brought down, configured, and brought back up, and the result of the operation will be displayed. You may choose **Quit** upon returning to the configuration tool's welcome screen. - -### Time synchronization - -Returning to the configuration mode selection, choose **Time Sync**. Here you can configure the sensor to keep its time synchronized with either an NTP server (using the NTP protocol) or with an HTTP/HTTPS server, such as a local [Malcolm](https://github.com/idaholab/Malcolm) aggregator. On the next dialog, choose the time synchronization method you wish to configure. - -![Time synchronization method](./docs/images/time_sync_mode.png) - -If **htpdate** is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port `9200` may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server. - -![*htpdate* configuration](./docs/images/htpdate_setup.png) - -If **ntpdate** is selected, you will be prompted to enter the IP address or hostname of the NTP server. - -![NTP configuration](./docs/images/ntp_host.png) - -Upon configuring time synchronization, a "Time synchronization configured successfully!" message will be displayed, after which you will be returned to the welcome screen. - -## Capture, forwarding, and autostart services - -Clicking the **Configure Capture and Forwarding** toolbar icon (or, if you are at a command prompt, running `configure-capture`) will launch the configuration tool for capture and forwarding. The root password is not required as it was for the interface and hostname configuration, as sensor services are run under the non-privileged sensor account. Select **Continue** to proceed. You may select from a list of configuration options. - -![Select configuration mode](./docs/images/capture_config_main.png) - -### Capture - -Choose **Configure Capture** to configure parameters related to traffic capture and local analysis. You will be prompted if you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you in its identification. 
This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**. - -You will be presented with a list of network interfaces and prompted to select one or more capture interfaces. An interface used to capture traffic is generally a different interface from the one selected previously as the management interface, and each capture interface should be connected to a network tap or span port for traffic monitoring. Capture interfaces are usually not assigned an IP address as they are only used to passively “listen” to the traffic on the wire. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed. - -![Select capture interfaces](./docs/images/capture_iface_select.png) - -Upon choosing the capture interfaces and selecting OK, you may optionally provide a capture filter. This filter will be used to limit what traffic the PCAP service ([`tcpdump`](https://www.tcpdump.org/)) and the traffic analysis services ([`zeek`](https://www.zeek.org/) and [`suricata`](https://suricata.io/)) will see. Capture filters are specified using [Berkeley Packet Filter (BPF)](http://biot.com/capstats/bpf.html) syntax. Clicking **OK** will attempt to validate the capture filter, if specified, and will present a warning if the filter is invalid. - -![Specify capture filters](./docs/images/capture_filter.png) - -Next you must specify the paths where captured PCAP files and logs will be stored locally on the sensor. If the installation worked as expected, these paths should be prepopulated to reflect paths on the volumes formatted at install time for the purpose of storing these artifacts. Usually these paths will exist on separate storage volumes. Enabling the PCAP and log pruning autostart services (see the section on autostart services below) will enable monitoring of these paths to ensure that their contents do not consume more than 90% of their respective volumes' space. Choose **OK** to continue. - -![Specify capture paths](./docs/images/capture_paths.png) - -#### Automatic file extraction and scanning - -Hedgehog Linux can leverage Zeek's knowledge of network protocols to automatically detect file transfers and extract those files from network traffic as Zeek sees them. - -To specify which files should be extracted, specify the Zeek file carving mode: - -![Zeek file carving mode](./docs/images/zeek_file_carve_mode.png) - -If you're not sure what to choose, either of **mapped (except common plain text files)** (if you want to carve and scan almost all files) or **interesting** (if you only want to carve and scan files with [mime types of common attack vectors](./interface/sensor_ctl/zeek/extractor_override.interesting.zeek)) is probably a good choice. - -Next, specify which carved files to preserve (saved on the sensor under `/capture/bro/capture/extract_files/quarantine` by default). In order to not consume all of the sensor's available storage space, the oldest preserved files will be pruned along with the oldest Zeek logs as described below with **AUTOSTART_PRUNE_ZEEK** in the [autostart services](#ConfigAutostart) section. - -You'll be prompted to specify which engine(s) to use to analyze extracted files. 
Extracted files can be examined through any of four methods: - -![File scanners](./docs/images/zeek_file_carve_scanners.png) - -* scanning files with [**ClamAV**](https://www.clamav.net/); to enable this method, select **ZEEK_FILE_SCAN_CLAMAV** when specifying scanners for Zeek-carved files -* submitting file hashes to [**VirusTotal**](https://www.virustotal.com/en/#search); to enable this method, select **ZEEK_FILE_SCAN_VTOT** when specifying scanners for Zeek-carved files, then manually edit `/opt/sensor/sensor_ctl/control_vars.conf` and specify your [VirusTotal API key](https://developers.virustotal.com/reference) in `VTOT_API2_KEY` -* scanning files with [**Yara**](https://github.com/VirusTotal/yara); to enable this method, select **ZEEK_FILE_SCAN_YARA** when specifying scanners for Zeek-carved files -* scanning portable executable (PE) files with [**Capa**](https://github.com/fireeye/capa); to enable this method, select **ZEEK_FILE_SCAN_CAPA** when specifying scanners for Zeek-carved files - -Files which are flagged as potentially malicious will be logged as Zeek `signatures.log` entries, and can be viewed in the **Signatures** dashboard in [OpenSearch Dashboards](https://github.com/idaholab/Malcolm#DashboardsVisualizations) when forwarded to Malcolm. - -![File quarantine](./docs/images/file_quarantine.png) - -Finally, you will be presented with the list of configuration variables that will be used for capture, including the values which you have configured up to this point in this section. Upon choosing **OK** these values will be written back out to the sensor configuration file located at `/opt/sensor/sensor_ctl/control_vars.conf`. It is not recommended that you edit this file manually. After confirming these values, you will be presented with a confirmation that these settings have been written to the configuration file, and you will be returned to the welcome screen. - -### Forwarding - -Select **Configure Forwarding** to set up the forwarding of logs and statistics from the sensor to an aggregator server, such as [Malcolm](https://github.com/idaholab/Malcolm). - -![Configure forwarders](./docs/images/forwarder_config.png) - -There are five forwarder services used on the sensor, each for forwarding a different type of log or sensor metric. - -### capture: Arkime session forwarding - -[capture](https://github.com/arkime/arkime/tree/master/capture) is not only used to capture PCAP files, but also to parse raw traffic into sessions and forward this session metadata to an [OpenSearch](https://opensearch.org/) database so that it can be viewed in [Arkime viewer](https://arkime.com/), whether standalone or as part of a [Malcolm](https://github.com/idaholab/Malcolm) instance. If you're using Hedgehog Linux with Malcolm, please read [Correlating Zeek logs and Arkime sessions](https://github.com/idaholab/Malcolm#ZeekArkimeFlowCorrelation) in the Malcolm documentation for more information. - -First, select the OpenSearch connection transport protocol, either **HTTPS** or **HTTP**. If the metrics are being forwarded to Malcolm, select **HTTPS** to encrypt messages from the sensor to the aggregator using TLS v1.2 with ECDHE-RSA-AES128-GCM-SHA256. If **HTTPS** is chosen, you must choose whether to enable SSL certificate verification. If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**. 
- -![OpenSearch connection protocol](./docs/images/opensearch_connection_protocol.png) ![OpenSearch SSL verification](./docs/images/opensearch_ssl_verification.png) - -Next, enter the **OpenSearch host** IP address (i.e., the IP address of the aggregator) and port. These metrics are written to an OpenSearch database using a RESTful API, usually using port 9200. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator. - -![OpenSearch host and port](./docs/images/arkime-capture-ip-port.png) - -You will be asked to enter authentication credentials for the sensor's connections to the aggregator's OpenSearch API. After you've entered the username and the password, the sensor will attempt a test connection to OpenSearch using the connection information provided. - -![OpenSearch username](./docs/images/opensearch_username.png) ![OpenSearch password](./docs/images/opensearch_password.png) ![Successful OpenSearch connection](./docs/images/opensearch_connection_success.png) - -Next, you will be shown a dialog with a list of IP addresses used to populate an access control list (ACL) for hosts allowed to connect back to the sensor for retrieving session payloads from its PCAP files for display in Arkime viewer. The list will be prepopulated with the IP address entered a few screens prior to this one. - -![PCAP retrieval ACL](./docs/images/malcolm_arkime_reachback_acl.png) - -Finally, you'll be given the opportunity to review all of the Arkime `capture` options you've specified. Selecting **OK** will cause the parameters to be saved and you will be returned to the configuration tool's welcome screen. - -![capture settings confirmation](./docs/images/arkime_confirm.png) - -### filebeat: Zeek and Suricata log forwarding - -[Filebeat](https://www.elastic.co/products/beats/filebeat) is used to forward [Zeek](https://www.zeek.org/) and [Suricata](https://suricata.io/) logs to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. - -To configure filebeat, first provide the log path (the same path previously configured for log file generation). - -![Configure filebeat for log forwarding](./docs/images/filebeat_log_path.png) - -You must also provide the IP address of the Logstash instance to which the logs are to be forwarded, and the port on which Logstash is listening. These logs are forwarded using the Beats protocol, generally over port 5044. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator. - -![Configure filebeat for log forwarding](./docs/images/filebeat_ip_port.png) - -Next you are asked whether the connection used for log forwarding should be made **unencrypted** or over **SSL**. Unencrypted communication requires less processing overhead and is simpler to configure, but the contents of the logs may be visible to anyone who is able to intercept that traffic. - -![Filebeat SSL certificate verification](./docs/images/filebeat_ssl.png) - -If **SSL** is chosen, you must choose whether to enable [SSL certificate verification](https://www.elastic.co/guide/en/beats/filebeat/current/configuring-ssl-logstash.html). 
If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**. - -![Unencrypted vs. SSL encryption for log forwarding](./docs/images/filebeat_ssl_verify.png) - -The last step for SSL-encrypted log forwarding is to specify the SSL certificate authority, certificate, and key files. These files must match those used by the Logstash instance receiving the logs on the aggregator. If Malcolm's `auth_setup` script was used to generate these files they would be found in the `filebeat/certs/` subdirectory of the Malcolm installation and must be manually copied to the sensor (stored under `/opt/sensor/sensor_ctl/logstash-client-certificates` or in any other path accessible to the sensor account). Specify the location of the certificate authorities file (e.g., `ca.crt`), the certificate file (e.g., `client.crt`), and the key file (e.g., `client.key`). - -![SSL certificate files](./docs/images/filebeat_certs.png) - -The Logstash instance receiving the events must be similarly configured with matching SSL certificate and key files. Under Malcolm, the `BEATS_SSL` variable must be set to `true` in Malcolm's `docker-compose.yml` file and the SSL files must exist in the `logstash/certs/` subdirectory of the Malcolm installation. - -Once you have specified all of the filebeat parameters, you will be presented with a summary of the settings related to the forwarding of these logs. Selecting **OK** will cause the parameters to be written to filebeat's configuration keystore under `/opt/sensor/sensor_ctl/logstash-client-certificates` and you will be returned to the configuration tool's welcome screen. - -![Confirm filebeat settings](./docs/images/filebeat_confirm.png) - -### miscbeat: System metrics forwarding - -The sensor uses [Fluent Bit](https://fluentbit.io/) to gather miscellaneous system resource metrics (CPU, network I/O, disk I/O, memory utilization, temperature, etc.) and the [Beats](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) protocol to forward these metrics to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. Metrics categories can be enabled/disabled as described in the [autostart services](#ConfigAutostart) section of this document. - -This forwarder's configuration is almost identical to that of [filebeat](#filebeat) in the previous section. Select `miscbeat` from the forwarding configuration mode options and follow the same steps outlined above to set up this forwarder. - -### Autostart services - -Once the forwarders have been configured, the final step is to **Configure Autostart Services**. Choose this option from the configuration mode menu after the welcome screen of the sensor configuration tool. - -Even if you have configured capture and/or forwarder services as described in the previous sections, only the services enabled in the autostart configuration will run when the sensor starts up. The available autostart processes are as follows (recommended services are in **bold text**): - -* **AUTOSTART_ARKIME** - [capture](#arkime-capture) PCAP engine for traffic capture, as well as traffic parsing and metadata insertion into OpenSearch for viewing in [Arkime](https://arkime.com/). 
If you are using Hedgehog Linux along with [Malcolm](https://github.com/idaholab/Malcolm) or another Arkime installation, this is probably the packet capture engine you want to use. -* **AUTOSTART_CLAMAV_UPDATES** - Virus database update service for ClamAV (requires sensor to be connected to the internet) -* **AUTOSTART_FILEBEAT** - [filebeat](#filebeat) Zeek and Suricata log forwarder -* **AUTOSTART_FLUENTBIT_AIDE** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/exec) [AIDE](https://aide.github.io/) file system integrity checks -* **AUTOSTART_FLUENTBIT_AUDITLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/tail) [auditd](https://man7.org/linux/man-pages/man8/auditd.8.html) logs -* *AUTOSTART_FLUENTBIT_KMSG* - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/kernel-logs) the Linux kernel log buffer (these are generally reflected in syslog as well, which may make this agent redundant) -* **AUTOSTART_FLUENTBIT_METRICS** - [Fluent Bit](https://fluentbit.io/) agent for collecting [various](https://docs.fluentbit.io/manual/pipeline/inputs) system resource and performance metrics -* **AUTOSTART_FLUENTBIT_SYSLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/syslog) Linux syslog messages -* **AUTOSTART_FLUENTBIT_THERMAL** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/thermal) system temperatures -* **AUTOSTART_MISCBEAT** - [filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) forwarder which sends system metrics collected by [Fluent Bit](https://fluentbit.io/) to a remote Logstash instance (e.g., [Malcolm](https://github.com/idaholab/Malcolm)'s) -* *AUTOSTART_NETSNIFF* - [netsniff-ng](http://netsniff-ng.org/) PCAP engine for saving packet capture (PCAP) files -* **AUTOSTART_PRUNE_PCAP** - storage space monitor to ensure that PCAP files do not consume more than 90% of the total size of the storage volume to which PCAP files are written -* **AUTOSTART_PRUNE_ZEEK** - storage space monitor to ensure that Zeek logs do not consume more than 90% of the total size of the storage volume to which Zeek logs are written -* **AUTOSTART_SURICATA** - [Suricata](https://suricata.io/) traffic analysis engine -* **AUTOSTART_SURICATA_UPDATES** - Rule update service for Suricata (requires sensor to be connected to the internet) -* *AUTOSTART_TCPDUMP* - [tcpdump](https://www.tcpdump.org/) PCAP engine for saving packet capture (PCAP) files -* **AUTOSTART_ZEEK** - [Zeek](https://www.zeek.org/) traffic analysis engine - -Note that only one packet capture engine ([capture](https://arkime.com/), [netsniff-ng](http://netsniff-ng.org/), or [tcpdump](https://www.tcpdump.org/)) can be used. - -![Autostart services](./docs/images/autostarts.png) - -Once you have selected the autostart services, you will be prompted to confirm your selections. Doing so will cause these values to be written back out to the `/opt/sensor/sensor_ctl/control_vars.conf` configuration file. - -![Autostart services confirmation](./docs/images/autostarts_confirm.png) - -After you have completed configuring the sensor it is recommended that you reboot the sensor to ensure all new settings take effect. 
If rebooting is not an option, you may click the **Restart Sensor Services** menu icon in the top menu bar, or open a terminal and run: - -``` -/opt/sensor/sensor_ctl/shutdown && sleep 10 && /opt/sensor/sensor_ctl/supervisor.sh -``` - -This will cause the sensor services controller to stop, wait a few seconds, and restart. You can check the status of the sensor's processes by choosing **Sensor Status** from the sensor's kiosk mode, clicking the **Sensor Service Status** toolbar icon, or running `/opt/sensor/sensor_ctl/status` from the command line: - -``` -$ /opt/sensor/sensor_ctl/status -arkime:arkime-capture            RUNNING   pid 6455, uptime 0:03:17 -arkime:arkime-viewer             RUNNING   pid 6456, uptime 0:03:17 -beats:filebeat                   RUNNING   pid 6457, uptime 0:03:17 -beats:miscbeat                   RUNNING   pid 6458, uptime 0:03:17 -clamav:clamav-service            RUNNING   pid 6459, uptime 0:03:17 -clamav:clamav-updates            RUNNING   pid 6461, uptime 0:03:17 -fluentbit-auditlog               RUNNING   pid 6463, uptime 0:03:17 -fluentbit-kmsg                   STOPPED   Not started -fluentbit-metrics:cpu            RUNNING   pid 6466, uptime 0:03:17 -fluentbit-metrics:df             RUNNING   pid 6471, uptime 0:03:17 -fluentbit-metrics:disk           RUNNING   pid 6468, uptime 0:03:17 -fluentbit-metrics:mem            RUNNING   pid 6472, uptime 0:03:17 -fluentbit-metrics:mem_p          RUNNING   pid 6473, uptime 0:03:17 -fluentbit-metrics:netif          RUNNING   pid 6474, uptime 0:03:17 -fluentbit-syslog                 RUNNING   pid 6478, uptime 0:03:17 -fluentbit-thermal                RUNNING   pid 6480, uptime 0:03:17 -netsniff:netsniff-enp1s0         STOPPED   Not started -prune:prune-pcap                 RUNNING   pid 6484, uptime 0:03:17 -prune:prune-zeek                 RUNNING   pid 6486, uptime 0:03:17 -supercronic                      RUNNING   pid 6490, uptime 0:03:17 -suricata                         RUNNING   pid 6501, uptime 0:03:17 -tcpdump:tcpdump-enp1s0           STOPPED   Not started -zeek:capa                        RUNNING   pid 6553, uptime 0:03:17 -zeek:clamav                      RUNNING   pid 6512, uptime 0:03:17 -zeek:logger                      RUNNING   pid 6554, uptime 0:03:17 -zeek:virustotal                  STOPPED   Not started -zeek:watcher                     RUNNING   pid 6510, uptime 0:03:17 -zeek:yara                        RUNNING   pid 6548, uptime 0:03:17 -zeek:zeekctl                     RUNNING   pid 6502, uptime 0:03:17 -``` - -### Zeek Intelligence Framework - -To quote Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) documentation, "The goals of Zeek’s Intelligence Framework are to consume intelligence data, make it available for matching, and provide infrastructure to improve performance and memory utilization. Data in the Intelligence Framework is an atomic piece of intelligence such as an IP address or an e-mail address. This atomic data will be packed with metadata such as a freeform source field, a freeform descriptive field, and a URL which might lead to more information about the specific item." Zeek [intelligence](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html) [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type) include IP addresses, URLs, file names, hashes, email addresses, and more. - -Hedgehog Linux doesn't come bundled with intelligence files from any particular feed, but they can easily be included in your local instance. Before Zeek starts, Hedgehog Linux configures it such that intelligence files will be automatically included in its local policy.
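As a purely hypothetical illustration (the feed and file names below are invented for the example), a populated intel directory on a sensor might look like the following; the next paragraph describes how each kind of subdirectory is handled:

```
$ find /opt/sensor/sensor_ctl/zeek/intel -type f
/opt/sensor/sensor_ctl/zeek/intel/feed_with_loader/__load__.zeek
/opt/sensor/sensor_ctl/zeek/intel/feed_with_loader/indicators.zeek
/opt/sensor/sensor_ctl/zeek/intel/loose_feed/bad_ips.txt
/opt/sensor/sensor_ctl/zeek/intel/loose_feed/bad_domains.txt
```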
Subdirectories under `/opt/sensor/sensor_ctl/zeek/intel` which contain their own `__load__.zeek` file will be `@load`-ed as-is, while subdirectories containing "loose" intelligence files will be [loaded](https://docs.zeek.org/en/master/frameworks/intel.html#loading-intelligence) automatically with a `redef Intel::read_files` directive. - -Note that Hedgehog Linux does not manage updates for these intelligence files. You should use the update mechanism suggested by your feeds' maintainers to keep them up to date. Adding and deleting intelligence files under this directory will take effect upon restarting Zeek. \ No newline at end of file diff --git a/docs/main.md b/docs/main.md index f25a0699b..9460bf79b 100644 --- a/docs/main.md +++ b/docs/main.md @@ -22,7 +22,7 @@ You can help steer Malcolm's development by sharing your ideas and feedback. Ple ## Table of Contents * [Automated Build Workflows Status](#BuildBadges) -* [Quick start](#QuickStart) +* [Quick start](quickstart.md#QuickStart) * [Getting Malcolm](#GetMalcolm) * [User interface](#UserInterfaceURLs) * [Overview](#Overview) diff --git a/docs/malcolm-iso.md b/docs/malcolm-iso.md index cfb3f9182..0a0136930 100644 --- a/docs/malcolm-iso.md +++ b/docs/malcolm-iso.md @@ -39,7 +39,7 @@ Finished, created "/malcolm-build/malcolm-iso/malcolm-6.4.0.iso" … ``` -By default, Malcolm's Docker images are not packaged with the installer ISO, assuming instead that you will pull the [latest images](https://hub.docker.com/u/malcolmnetsec) with a `docker-compose pull` command as described in the [Quick start](#QuickStart) section. If you wish to build an ISO with the latest Malcolm images included, follow the directions to create [pre-packaged installation files](#Packager), which include a tarball with a name like `malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz`. Then, pass that images tarball to the ISO build script with a `-d`, like this: +By default, Malcolm's Docker images are not packaged with the installer ISO, assuming instead that you will pull the [latest images](https://hub.docker.com/u/malcolmnetsec) with a `docker-compose pull` command as described in the [Quick start](quickstart.md#QuickStart) section. If you wish to build an ISO with the latest Malcolm images included, follow the directions to create [pre-packaged installation files](#Packager), which include a tarball with a name like `malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz`. Then, pass that images tarball to the ISO build script with a `-d`, like this: ``` $ ./malcolm-iso/build_via_vagrant.sh -f -d malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz @@ -76,6 +76,6 @@ Following these prompts, the installer will reboot and the Malcolm base operatin When the system boots for the first time, the Malcolm Docker images will load if the installer was built with pre-packaged installation files as described above. Wait for this operation to complete (the progress dialog will disappear when the images have finished loading) before continuing the setup. -Open a terminal (click the red terminal 🗔 icon next to the Debian swirl logo 🍥 menu button in the menu bar). At this point, setup is similar to the steps described in the [Quick start](#QuickStart) section. Navigate to the Malcolm directory (`cd ~/Malcolm`) and run [`auth_setup`](#AuthSetup) to configure authentication. If the ISO didn't have pre-packaged Malcolm images, or if you'd like to retrieve the latest updates, run `docker-compose pull`.
Finalize your configuration by running `scripts/install.py --configure` and follow the prompts as illustrated in the [installation example](#InstallationExample). +Open a terminal (click the red terminal 🗔 icon next to the Debian swirl logo 🍥 menu button in the menu bar). At this point, setup is similar to the steps described in the [Quick start](quickstart.md#QuickStart) section. Navigate to the Malcolm directory (`cd ~/Malcolm`) and run [`auth_setup`](#AuthSetup) to configure authentication. If the ISO didn't have pre-packaged Malcolm images, or if you'd like to retrieve the latest updates, run `docker-compose pull`. Finalize your configuration by running `scripts/install.py --configure` and follow the prompts as illustrated in the [installation example](#InstallationExample). Once Malcolm is configured, you can [start Malcolm](#Starting) via the command line or by clicking the circular yellow Malcolm icon in the menu bar. \ No newline at end of file diff --git a/docs/opensearch-instances.md b/docs/opensearch-instances.md new file mode 100644 index 000000000..ffcb57b68 --- /dev/null +++ b/docs/opensearch-instances.md @@ -0,0 +1,78 @@ +### OpenSearch instances + +Malcolm's default standalone configuration is to use a local [OpenSearch](https://opensearch.org/) instance in a Docker container to index and search network traffic metadata. OpenSearch can also run as a [cluster](https://opensearch.org/docs/latest/opensearch/cluster/) with instances distributed across multiple nodes with dedicated [roles](https://opensearch.org/docs/latest/opensearch/cluster/#nodes) like cluster manager, data node, ingest node, etc. + +As the permutations of OpenSearch cluster configurations are numerous, it is beyond Malcolm's scope to set up multi-node clusters. However, Malcolm can be configured to use a remote OpenSearch cluster rather than its own internal instance. + +The `OPENSEARCH_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) control whether Malcolm uses its own local OpenSearch instance or a remote OpenSearch instance as its primary data store. The configuration portion of the Malcolm install script ([`./scripts/install.py --configure`](#ConfigAndTuning)) can help you configure these options. + +For example, to use the default standalone configuration, answer `Y` when prompted `Should Malcolm use and maintain its own OpenSearch instance?`. + +Or, to use a remote OpenSearch cluster: + +``` +… +Should Malcolm use and maintain its own OpenSearch instance? (Y/n): n + +Enter primary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.123:9200 + +Require SSL certificate validation for communication with primary OpenSearch instance? (y/N): n + +You must run auth_setup after install.py to store OpenSearch connection credentials. +… +``` + +Whether the primary OpenSearch instance is a locally maintained single-node instance or is a remote cluster, Malcolm can be configured to additionally forward logs to a secondary remote OpenSearch instance. The `OPENSEARCH_SECONDARY_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) control this behavior. Configuration of a remote secondary OpenSearch instance is similar to that of a remote primary OpenSearch instance: + + +``` +… +Forward Logstash logs to a secondary remote OpenSearch instance?
(y/N): y + +Enter secondary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.124:9200 + +Require SSL certificate validation for communication with secondary OpenSearch instance? (y/N): n + +You must run auth_setup after install.py to store OpenSearch connection credentials. +… +``` + +#### Authentication and authorization for remote OpenSearch clusters + +In addition to setting the environment variables in [`docker-compose.yml`](#DockerComposeYml) as described above, you must provide Malcolm with credentials for it to be able to communicate with remote OpenSearch instances. These credentials are stored in the Malcolm installation directory as `.opensearch.primary.curlrc` and `.opensearch.secondary.curlrc` for the primary and secondary OpenSearch connections, respectively, and are bind mounted into the Docker containers which need to communicate with OpenSearch. These [cURL-formatted](https://everything.curl.dev/cmdline/configfile) config files can be generated for you by the [`auth_setup`](#AuthSetup) script as illustrated: + +``` +$ ./scripts/auth_setup + +… + +Store username/password for primary remote OpenSearch instance? (y/N): y + +OpenSearch username: servicedb +servicedb password: +servicedb password (again): + +Require SSL certificate validation for OpenSearch communication? (Y/n): n + +Store username/password for secondary remote OpenSearch instance? (y/N): y + +OpenSearch username: remotedb +remotedb password: +remotedb password (again): + +Require SSL certificate validation for OpenSearch communication? (Y/n): n + +… +``` + +These files are created with permissions such that only the user account running Malcolm can access them: + +``` +$ ls -la .opensearch.*.curlrc +-rw------- 1 user user 36 Aug 22 14:17 .opensearch.primary.curlrc +-rw------- 1 user user 35 Aug 22 14:18 .opensearch.secondary.curlrc +``` + +One caveat with Malcolm using a remote OpenSearch cluster as its primary document store is that the accounts used to access Malcolm's [web interfaces](#UserInterfaceURLs), particularly [OpenSearch Dashboards](#Dashboards), are in some instances passed directly through to OpenSearch itself. For this reason, both Malcolm and the remote primary OpenSearch instance must have the same account information. The easiest way to accomplish this is to use an Active Directory/LDAP server that both [Malcolm](#AuthLDAP) and [OpenSearch](https://opensearch.org/docs/latest/security-plugin/configuration/ldap/) use as a common authentication backend. + +See the OpenSearch documentation on [access control](https://opensearch.org/docs/latest/security-plugin/access-control/index/) for more information. \ No newline at end of file diff --git a/docs/running.md b/docs/running.md index af0867660..e70aaebc7 100644 --- a/docs/running.md +++ b/docs/running.md @@ -1,184 +1,5 @@ ## Running Malcolm -### OpenSearch instances - -Malcolm's default standalone configuration is to use a local [OpenSearch](https://opensearch.org/) instance in a Docker container to index and search network traffic metadata. OpenSearch can also run as a [cluster](https://opensearch.org/docs/latest/opensearch/cluster/) with instances distributed across multiple nodes with dedicated [roles](https://opensearch.org/docs/latest/opensearch/cluster/#nodes) like cluster manager, data node, ingest node, etc. - -As the permutations of OpenSearch cluster configurations are numerous, it is beyond Malcolm's scope to set up multi-node clusters.
However, Malcolm can be configured to use a remote OpenSearch cluster rather than its own internal instance. - -The `OPENSEARCH_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) control whether Malcolm uses its own local OpenSearch instance or a remote OpenSearch instance as its primary data store. The configuration portion of the Malcolm install script ([`./scripts/install.py --configure`](#ConfigAndTuning)) can help you configure these options. - -For example, to use the default standalone configuration, answer `Y` when prompted `Should Malcolm use and maintain its own OpenSearch instance?`. - -Or, to use a remote OpenSearch cluster: - -``` -… -Should Malcolm use and maintain its own OpenSearch instance? (Y/n): n - -Enter primary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.123:9200 - -Require SSL certificate validation for communication with primary OpenSearch instance? (y/N): n - -You must run auth_setup after install.py to store OpenSearch connection credentials. -… -``` - -Whether the primary OpenSearch instance is a locally maintained single-node instance or is a remote cluster, Malcolm can be configured to additionally forward logs to a secondary remote OpenSearch instance. The `OPENSEARCH_SECONDARY_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) control this behavior. Configuration of a remote secondary OpenSearch instance is similar to that of a remote primary OpenSearch instance: - - -``` -… -Forward Logstash logs to a secondary remote OpenSearch instance? (y/N): y - -Enter secondary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.124:9200 - -Require SSL certificate validation for communication with secondary OpenSearch instance? (y/N): n - -You must run auth_setup after install.py to store OpenSearch connection credentials. -… -``` - -#### Authentication and authorization for remote OpenSearch clusters - -In addition to setting the environment variables in [`docker-compose.yml`](#DockerComposeYml) as described above, you must provide Malcolm with credentials for it to be able to communicate with remote OpenSearch instances. These credentials are stored in the Malcolm installation directory as `.opensearch.primary.curlrc` and `.opensearch.secondary.curlrc` for the primary and secondary OpenSearch connections, respectively, and are bind mounted into the Docker containers which need to communicate with OpenSearch. These [cURL-formatted](https://everything.curl.dev/cmdline/configfile) config files can be generated for you by the [`auth_setup`](#AuthSetup) script as illustrated: - -``` -$ ./scripts/auth_setup - -… - -Store username/password for primary remote OpenSearch instance? (y/N): y - -OpenSearch username: servicedb -servicedb password: -servicedb password (again): - -Require SSL certificate validation for OpenSearch communication? (Y/n): n - -Store username/password for secondary remote OpenSearch instance? (y/N): y - -OpenSearch username: remotedb -remotedb password: -remotedb password (again): - -Require SSL certificate validation for OpenSearch communication?
(Y/n): n - -… -``` - -These files are created with permissions such that only the user account running Malcolm can access them: - -``` -$ ls -la .opensearch.*.curlrc --rw------- 1 user user 36 Aug 22 14:17 .opensearch.primary.curlrc --rw------- 1 user user 35 Aug 22 14:18 .opensearch.secondary.curlrc -``` - -One caveat with Malcolm using a remote OpenSearch cluster as its primary document store is that the accounts used to access Malcolm's [web interfaces](#UserInterfaceURLs), particularly [OpenSearch Dashboards](#Dashboards), are in some instances passed directly through to OpenSearch itself. For this reason, both Malcolm and the remote primary OpenSearch instance must have the same account information. The easiest way to accomplish this is to use an Active Directory/LDAP server that both [Malcolm](#AuthLDAP) and [OpenSearch](https://opensearch.org/docs/latest/security-plugin/configuration/ldap/) use as a common authentication backend. - -See the OpenSearch documentation on [access control](https://opensearch.org/docs/latest/security-plugin/access-control/index/) for more information. - -### Configure authentication - -Malcolm requires authentication to access the [user interface](#UserInterfaceURLs). [Nginx](https://nginx.org/) can authenticate users with either local TLS-encrypted HTTP basic authentication or using a remote Lightweight Directory Access Protocol (LDAP) authentication server. - -With the local basic authentication method, user accounts are managed by Malcolm and can be created, modified, and deleted using a [user management web interface](#AccountManagement). This method is suitable in instances where accounts and credentials do not need to be synced across many Malcolm installations. - -With LDAP authentication, accounts are managed on a remote directory service, such as [Microsoft Active Directory Domain Services](https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/get-started/virtual-dc/active-directory-domain-services-overview) or [OpenLDAP](https://www.openldap.org/). - -Malcolm's authentication method is defined in the `x-auth-variables` section near the top of the [`docker-compose.yml`](#DockerComposeYml) file with the `NGINX_BASIC_AUTH` environment variable: `true` for local TLS-encrypted HTTP basic authentication, `false` for LDAP authentication.
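For example, one quick way to confirm which method a given installation is using is to grep for the variable; the output below is a hypothetical sketch, and the exact YAML formatting in your `docker-compose.yml` may differ:

```
$ grep NGINX_BASIC_AUTH docker-compose.yml
      NGINX_BASIC_AUTH : 'true'
```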
- -In either case, you **must** run `./scripts/auth_setup` before starting Malcolm for the first time in order to: - -* define the local Malcolm administrator account username and password (although these credentials will only be used for basic authentication, not LDAP authentication) -* specify whether or not to (re)generate the self-signed certificates used for HTTPS access - * key and certificate files are located in the `nginx/certs/` directory -* specify whether or not to (re)generate the self-signed certificates used by a remote log forwarder (see the `BEATS_SSL` environment variable above) - * certificate authority, certificate, and key files for Malcolm's Logstash instance are located in the `logstash/certs/` directory - * certificate authority, certificate, and key files to be copied to and used by the remote log forwarder are located in the `filebeat/certs/` directory; if using [Hedgehog Linux](#Hedgehog), these certificates should be copied to the `/opt/sensor/sensor_ctl/logstash-client-certificates` directory on the sensor -* specify whether or not to [store the username/password](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#authenticate-sender-account) for [email alert senders](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#create-destinations) - * these parameters are stored securely in the OpenSearch keystore file `opensearch/opensearch.keystore` - -##### Local account management - -[`auth_setup`](#AuthSetup) is used to define the username and password for the administrator account. Once Malcolm is running, the administrator account can be used to manage other user accounts via a **Malcolm User Management** page served over HTTPS on port 488 (e.g., [https://localhost:488](https://localhost:488) if you are connecting locally). - -Malcolm user accounts can be used to access the [interfaces](#UserInterfaceURLs) of all of its [components](#Components), including Arkime. Arkime uses its own internal database of user accounts, so when a Malcolm user account logs in to Arkime for the first time Malcolm creates a corresponding Arkime user account automatically. This being the case, it is *not* recommended to use the Arkime **Users** settings page or change the password via the **Password** form under the Arkime **Settings** page, as those settings would not be consistently used across Malcolm. - -Users may change their passwords via the **Malcolm User Management** page by clicking **User Self Service**. A forgotten password can also be reset via an emailed link, though this requires SMTP server settings to be specified in `htadmin/config.ini` in the Malcolm installation directory. - -#### Lightweight Directory Access Protocol (LDAP) authentication - -The [nginx-auth-ldap](https://github.com/kvspb/nginx-auth-ldap) module serves as the interface between Malcolm's [Nginx](https://nginx.org/) web server and a remote LDAP server. When you run [`auth_setup`](#AuthSetup) for the first time, a sample LDAP configuration file is created at `nginx/nginx_ldap.conf`. - -``` -# This is a sample configuration for the ldap_server section of nginx.conf. -# Yours will vary depending on how your Active Directory/LDAP server is configured. -# See https://github.com/kvspb/nginx-auth-ldap#available-config-parameters for options. 
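# The "url" value below queries an Active Directory Global Catalog (port
# 3268), searching the subtree under DC=ds,DC=example,DC=com for person
# objects and matching login names against the sAMAccountName attribute;
# all host, DN, and credential values here are placeholders.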
- -ldap_server ad_server { - url "ldap://ds.example.com:3268/DC=ds,DC=example,DC=com?sAMAccountName?sub?(objectClass=person)"; - - binddn "bind_dn"; - binddn_passwd "bind_dn_password"; - - group_attribute member; - group_attribute_is_dn on; - require group "CN=Malcolm,CN=Users,DC=ds,DC=example,DC=com"; - require valid_user; - satisfy all; -} - -auth_ldap_cache_enabled on; -auth_ldap_cache_expiration_time 10000; -auth_ldap_cache_size 1000; -``` - -This file is mounted into the `nginx` container when Malcolm is started to provide connection information for the LDAP server. - -The contents of `nginx_ldap.conf` will vary depending on how the LDAP server is configured. Some of the [available parameters](https://github.com/kvspb/nginx-auth-ldap#available-config-parameters) in that file include: - -* **`url`** - the `ldap://` or `ldaps://` connection URL for the remote LDAP server, which has the [following syntax](https://www.ietf.org/rfc/rfc2255.txt): `ldap[s]://<host>:<port>/<base_dn>?<attributes>?<scope>?<filter>` -* **`binddn`** and **`binddn_password`** - the account credentials used to query the LDAP directory -* **`group_attribute`** - the group attribute name which contains the member object (e.g., `member` or `memberUid`) -* **`group_attribute_is_dn`** - whether or not to search for the user's full distinguished name as the value in the group's member attribute -* **`require`** and **`satisfy`** - `require user`, `require group` and `require valid_user` can be used in conjunction with `satisfy any` or `satisfy all` to limit the users that are allowed to access the Malcolm instance - -Before starting Malcolm, edit `nginx/nginx_ldap.conf` according to the specifics of your LDAP server and directory tree structure. Using an LDAP search tool such as [`ldapsearch`](https://www.openldap.org/software/man.cgi?query=ldapsearch) in Linux or [`dsquery`](https://social.technet.microsoft.com/wiki/contents/articles/2195.active-directory-dsquery-commands.aspx) in Windows may be of help as you formulate the configuration. Your changes should be made within the curly braces of the `ldap_server ad_server { … }` section. You can troubleshoot configuration file syntax errors and LDAP connection or credentials issues by running `./scripts/logs` (or `docker-compose logs nginx`) and examining the output of the `nginx` container. - -The **Malcolm User Management** page described above is not available when using LDAP authentication. - -##### LDAP connection security - -Authentication over LDAP can be done in one of three ways, [two of which](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-adts/8e73932f-70cf-46d6-88b1-8d9f86235e81) offer data confidentiality protection: - -* **StartTLS** - the [standard extension](https://tools.ietf.org/html/rfc2830) to the LDAP protocol to establish an encrypted SSL/TLS connection within an already established LDAP connection -* **LDAPS** - a commonly used (though unofficial and considered deprecated) method in which SSL negotiation takes place before any commands are sent from the client to the server -* **Unencrypted** (cleartext) (***not recommended***) - -In addition to the `NGINX_BASIC_AUTH` environment variable being set to `false` in the `x-auth-variables` section near the top of the [`docker-compose.yml`](#DockerComposeYml) file, the `NGINX_LDAP_TLS_STUNNEL` environment variable is used in conjunction with the values in `nginx/nginx_ldap.conf` to define the LDAP connection security level.
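If you are unsure which of these methods your directory server actually supports, a quick probe from the Malcolm host can help before you commit to a configuration. The following is a hedged sketch using `openssl s_client`; the hostname is the placeholder from the sample configuration above, and the `-starttls ldap` option requires a reasonably recent OpenSSL release:

```
# Probe LDAPS (TLS negotiated before any LDAP traffic) on its default port
$ openssl s_client -connect ds.example.com:636 -brief

# Probe StartTLS (TLS established within an existing LDAP connection)
$ openssl s_client -connect ds.example.com:389 -starttls ldap -brief
```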
Use the following combinations of values to achieve the connection security methods above, respectively: - -* **StartTLS** - - `NGINX_LDAP_TLS_STUNNEL` set to `true` in [`docker-compose.yml`](#DockerComposeYml) - - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf` -* **LDAPS** - - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](#DockerComposeYml) - - `url` should begin with `ldaps://` and its port should be either the default LDAPS port (636) or the default LDAPS Global Catalog port (3269) in `nginx/nginx_ldap.conf` -* **Unencrypted** (cleartext) (***not recommended***) - - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](#DockerComposeYml) - - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf` - -For encrypted connections (whether using **StartTLS** or **LDAPS**), Malcolm will require and verify certificates when one or more trusted CA certificate files are placed in the `nginx/ca-trust/` directory. Otherwise, any certificate presented by the domain server will be accepted. - -### TLS certificates - -When you [set up authentication](#AuthSetup) for Malcolm, a set of unique [self-signed](https://en.wikipedia.org/wiki/Self-signed_certificate) TLS certificates is created which is used to secure the connection between clients (e.g., your web browser) and Malcolm's browser-based interface. This is adequate for most Malcolm instances as they are often run locally or on internal networks, although your browser will most likely require you to add a security exception for the certificate the first time you connect to Malcolm. - -Another option is to generate your own certificates (or have them issued to you) and have them placed in the `nginx/certs/` directory. The certificate and key files should be named `cert.pem` and `key.pem`, respectively. - -A third possibility is to use a third-party reverse proxy (e.g., [Traefik](https://doc.traefik.io/traefik/) or [Caddy](https://caddyserver.com/docs/quick-starts/reverse-proxy)) to handle the issuance of the certificates for you and to broker the connections between clients and Malcolm. Reverse proxies such as these often implement the [ACME](https://datatracker.ietf.org/doc/html/rfc8555) protocol for domain name authentication and can be used to request certificates from certificate authorities like [Let's Encrypt](https://letsencrypt.org/how-it-works/). In this configuration, the reverse proxy will be encrypting the connections instead of Malcolm, so you'll need to set the `NGINX_SSL` environment variable to `false` in [`docker-compose.yml`](#DockerComposeYml) (or answer `no` to the "Require encrypted HTTPS connections?" question posed by `install.py`). If you are setting `NGINX_SSL` to `false`, **make sure** you understand what you are doing and ensure that external connections cannot reach ports over which Malcolm will be communicating without encryption, including verifying your local firewall configuration. - ### Starting Malcolm [Docker compose](https://docs.docker.com/compose/) is used to coordinate running the Docker containers. To start Malcolm, navigate to the directory containing `docker-compose.yml` and run: