diff --git a/README.md b/README.md index eed455f5b..f1f7b219c 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Malcolm -![](./docs/images/logo/Malcolm_banner.png) +![](./docs/images/logo/Malcolm_outline_banner_dark.png) [Malcolm](https://github.com/idaholab/Malcolm) is a powerful network traffic analysis tool suite designed with the following goals in mind: @@ -15,108 +15,17 @@ Although all of the open source tools which make up Malcolm are already availabl In short, Malcolm provides an easily deployable network analysis tool suite for full packet capture artifacts (PCAP files) and Zeek logs. While Internet access is required to build it, it is not required at runtime. -## Share your feedback +## Documentation -You can help steer Malcolm's development by sharing your ideas and feedback. Please take a few minutes to complete [this survey ↪](https://forms.gle/JYt9QwA5C4SYX8My6) (hosted on Google Forms) so we can understand the members of the Malcolm community and their use cases for this tool. +See the [**Malcolm documentation**](docs/README.md). -## Table of Contents +## Share your feedback -* [Automated Build Workflows Status](#BuildBadges) -* [Quick start](#QuickStart) - * [Getting Malcolm](#GetMalcolm) - * [User interface](#UserInterfaceURLs) -* [Overview](#Overview) -* [Components](#Components) -* [Supported Protocols](#Protocols) -* [Development](#Development) - * [Building from source](#Build) -* [Pre-Packaged installation files](#Packager) -* [Preparing your system](#Preparing) - * [Recommended system requirements](#SystemRequirements) - * [System configuration and tuning](#ConfigAndTuning) - * [`docker-compose.yml` parameters](#DockerComposeYml) - * [Linux host system configuration](#HostSystemConfigLinux) - * [macOS host system configuration](#HostSystemConfigMac) - * [Windows host system configuration](#HostSystemConfigWindows) -* [Running Malcolm](#Running) - * [OpenSearch instances](#OpenSearchInstance) - * [Authentication and authorization for remote OpenSearch clusters](#OpenSearchAuth) - * [Configure authentication](#AuthSetup) - * [Local account management](#AuthBasicAccountManagement) - * [Lightweight Directory Access Protocol (LDAP) authentication](#AuthLDAP) - - [LDAP connection security](#AuthLDAPSecurity) - * [TLS certificates](#TLSCerts) - * [Starting Malcolm](#Starting) - * [Stopping and restarting Malcolm](#StopAndRestart) - * [Clearing Malcolm's data](#Wipe) - * [Temporary read-only interface](#ReadOnlyUI) -* [Capture file and log archive upload](#Upload) - - [Tagging](#Tagging) - - [Processing uploaded PCAPs with Zeek and Suricata](#UploadPCAPProcessors) -* [Live analysis](#LiveAnalysis) - * [Using a network sensor appliance](#Hedgehog) - * [Monitoring local network interfaces](#LocalPCAP) - * [Manually forwarding logs from an external source](#ExternalForward) -* [Arkime](#Arkime) - * [Zeek log integration](#ArkimeZeek) - - [Correlating Zeek logs and Arkime sessions](#ZeekArkimeFlowCorrelation) - * [Help](#ArkimeHelp) - * [Sessions](#ArkimeSessions) - * [PCAP Export](#ArkimePCAPExport) - * [SPIView](#ArkimeSPIView) - * [SPIGraph](#ArkimeSPIGraph) - * [Connections](#ArkimeConnections) - * [Hunt](#ArkimeHunt) - * [Statistics](#ArkimeStats) - * [Settings](#ArkimeSettings) -* [OpenSearch Dashboards](#Dashboards) - * [Discover](#Discover) - - [Screenshots](#DiscoverGallery) - * [Visualizations and dashboards](#DashboardsVisualizations) - - [Prebuilt visualizations and dashboards](#PrebuiltVisualizations) - - [Screenshots](#PrebuiltVisualizationsGallery) - - 
[Building your own visualizations and dashboards](#BuildDashboard) - + [Screenshots](#NewVisualizationsGallery) -* [Search Queries in Arkime and OpenSearch](#SearchCheatSheet) -* [Other Malcolm features](#MalcolmFeatures) - - [Automatic file extraction and scanning](#ZeekFileExtraction) - - [Automatic host and subnet name assignment](#HostAndSubnetNaming) - + [IP/MAC address to hostname mapping via `host-map.txt`](#HostNaming) - + [CIDR subnet to network segment name mapping via `cidr-map.txt`](#SegmentNaming) - + [Defining hostname and CIDR subnet names interface](#NameMapUI) - + [Applying mapping changes](#ApplyMapping) - - [OpenSearch index management](#IndexManagement) - - [Event severity scoring](#Severity) - + [Customizing event severity scoring](#SeverityConfig) - - [Zeek Intelligence Framework](#ZeekIntel) - + [STIX™ and TAXII™](#ZeekIntelSTIX) - + [MISP](#ZeekIntelMISP) - - [Anomaly Detection](#AnomalyDetection) - - [Alerting](#Alerting) - + [Email Sender Accounts](#AlertingEmail) - - ["Best Guess" Fingerprinting for ICS Protocols](#ICSBestGuess) - - [Asset Management with NetBox](#NetBox) - - [CyberChef](#CyberChef) - - [API](#API) - + [Examples](#APIExamples) -* [Ingesting Third-party Logs](#ThirdPartyLogs) -* [Malcolm installer ISO](#ISO) - * [Installation](#ISOInstallation) - * [Generating the ISO](#ISOBuild) - * [Setup](#ISOSetup) - * [Time synchronization](#ConfigTime) - * [Hardening](#Hardening) - * [Compliance Exceptions](#ComplianceExceptions) -* [Installation example using Ubuntu 22.04 LTS](#InstallationExample) -* [Upgrading Malcolm](#UpgradePlan) -* [Modifying or Contributing to Malcolm](#Contributing) -* [Forks](#Forks) -* [Copyright](#Footer) -* [Contact](#Contact) +You can help steer Malcolm's development by sharing your ideas and feedback. Please take a few minutes to complete [this survey ↪](https://forms.gle/JYt9QwA5C4SYX8My6) (hosted on Google Forms) so we can understand the members of the Malcolm community and their use cases for this tool. ## Automated Builds Status -See [**Building from source**](#Build) to read how you can use GitHub [workflow files](./.github/workflows/) to build Malcolm. +See [**Building from source**](docs/development.md#Build) to read how you can use GitHub [workflow files](./.github/workflows/) to build Malcolm. ![api-build-and-push-ghcr](https://github.com/mmguero-dev/Malcolm/workflows/api-build-and-push-ghcr/badge.svg) ![arkime-build-and-push-ghcr](https://github.com/mmguero-dev/Malcolm/workflows/arkime-build-and-push-ghcr/badge.svg) @@ -138,3958 +47,6 @@ See [**Building from source**](#Build) to read how you can use GitHub [workflow ![malcolm-iso-build-docker-wrap-push-ghcr](https://github.com/mmguero-dev/Malcolm/workflows/malcolm-iso-build-docker-wrap-push-ghcr/badge.svg) ![sensor-iso-build-docker-wrap-push-ghcr](https://github.com/mmguero-dev/Malcolm/workflows/sensor-iso-build-docker-wrap-push-ghcr/badge.svg) -## Quick start - -### Getting Malcolm - -For a `TL;DR` example of downloading, configuring, and running Malcolm on a Linux platform, see [Installation example using Ubuntu 22.04 LTS](#InstallationExample). - -The scripts to control Malcolm require Python 3. The [`install.py`](#ConfigAndTuning) script requires the [requests](https://docs.python-requests.org/en/latest/) module for Python 3, and will make use of the [pythondialog](https://pythondialog.sourceforge.io/) module for user interaction (on Linux) if it is available. 
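-
-For example, on many Linux systems these prerequisites could be satisfied with `pip` (a sketch; package names and the preferred installation method may vary by distribution):
-```
-$ python3 -m pip install --user requests pythondialog
-```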
-
-#### Source code
-
-The files required to build and run Malcolm are available on its [GitHub page](https://github.com/idaholab/Malcolm/tree/main). Malcolm's source code is released under the terms of a permissive open source software license (see `License.txt` for the terms of its release).
-
-#### Building Malcolm from scratch
-
-The `build.sh` script can build Malcolm's Docker images from scratch. See [Building from source](#Build) for more information.
-
-#### Initial configuration
-
-You must run [`auth_setup`](#AuthSetup) prior to pulling Malcolm's Docker images. You should also ensure your system configuration and `docker-compose.yml` settings are tuned by running `./scripts/install.py` or `./scripts/install.py --configure` (see [System configuration and tuning](#ConfigAndTuning)).
-
-#### Pull Malcolm's Docker images
-
-Malcolm's Docker images are periodically built and hosted on [Docker Hub](https://hub.docker.com/u/malcolmnetsec). If you already have [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/), these prebuilt images can be pulled by navigating into the Malcolm directory (containing the `docker-compose.yml` file) and running `docker-compose pull` like this:
-```
-$ docker-compose pull
-Pulling api               ... done
-Pulling arkime            ... done
-Pulling dashboards        ... done
-Pulling dashboards-helper ... done
-Pulling file-monitor      ... done
-Pulling filebeat          ... done
-Pulling freq              ... done
-Pulling htadmin           ... done
-Pulling logstash          ... done
-Pulling name-map-ui       ... done
-Pulling netbox            ... done
-Pulling netbox-postgresql ... done
-Pulling netbox-redis      ... done
-Pulling nginx-proxy       ... done
-Pulling opensearch        ... done
-Pulling pcap-capture      ... done
-Pulling pcap-monitor      ... done
-Pulling suricata          ... done
-Pulling upload            ... done
-Pulling zeek              ... done
-```
-
-You can then observe that the images have been retrieved by running `docker images`:
-```
-$ docker images
-REPOSITORY                        TAG     IMAGE ID       CREATED      SIZE
-malcolmnetsec/api                 6.4.0   xxxxxxxxxxxx   3 days ago   158MB
-malcolmnetsec/arkime              6.4.0   xxxxxxxxxxxx   3 days ago   816MB
-malcolmnetsec/dashboards          6.4.0   xxxxxxxxxxxx   3 days ago   1.02GB
-malcolmnetsec/dashboards-helper   6.4.0   xxxxxxxxxxxx   3 days ago   184MB
-malcolmnetsec/file-monitor        6.4.0   xxxxxxxxxxxx   3 days ago   588MB
-malcolmnetsec/file-upload         6.4.0   xxxxxxxxxxxx   3 days ago   259MB
-malcolmnetsec/filebeat-oss        6.4.0   xxxxxxxxxxxx   3 days ago   624MB
-malcolmnetsec/freq                6.4.0   xxxxxxxxxxxx   3 days ago   132MB
-malcolmnetsec/htadmin             6.4.0   xxxxxxxxxxxx   3 days ago   242MB
-malcolmnetsec/logstash-oss        6.4.0   xxxxxxxxxxxx   3 days ago   1.35GB
-malcolmnetsec/name-map-ui         6.4.0   xxxxxxxxxxxx   3 days ago   143MB
-malcolmnetsec/netbox              6.4.0   xxxxxxxxxxxx   3 days ago   1.01GB
-malcolmnetsec/nginx-proxy         6.4.0   xxxxxxxxxxxx   3 days ago   121MB
-malcolmnetsec/opensearch          6.4.0   xxxxxxxxxxxx   3 days ago   1.17GB
-malcolmnetsec/pcap-capture        6.4.0   xxxxxxxxxxxx   3 days ago   121MB
-malcolmnetsec/pcap-monitor        6.4.0   xxxxxxxxxxxx   3 days ago   213MB
-malcolmnetsec/postgresql          6.4.0   xxxxxxxxxxxx   3 days ago   268MB
-malcolmnetsec/redis               6.4.0   xxxxxxxxxxxx   3 days ago   34.2MB
-malcolmnetsec/suricata            6.4.0   xxxxxxxxxxxx   3 days ago   278MB
-malcolmnetsec/zeek                6.4.0   xxxxxxxxxxxx   3 days ago   1GB
-```
-
-#### Import from pre-packaged tarballs
-
-Once built, the `malcolm_appliance_packager.sh` script can be used to create pre-packaged Malcolm tarballs for import on another machine. See [Pre-Packaged Installation Files](#Packager) for more information.
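-
-As a sketch of that workflow, an image tarball produced by the packager could later be imported on the destination machine with `docker load` (the filename below is illustrative):
-```
-$ docker load -i malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz
-```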
-
-### Starting and stopping Malcolm
-
-Use the scripts in the `scripts/` directory to start and stop Malcolm, view debug logs of a currently running instance, wipe the database and restore Malcolm to a fresh state, etc.
-
-### User interface
-
-A few minutes after starting Malcolm (probably 5 to 10 minutes for Logstash to be completely up, depending on the system), the following services will be accessible:
-
-* [Arkime](https://arkime.com/): [https://localhost:443](https://localhost:443)
-* [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/): [https://localhost/dashboards/](https://localhost/dashboards/) or [https://localhost:5601](https://localhost:5601)
-* [Capture File and Log Archive Upload (Web)](#Upload): [https://localhost/upload/](https://localhost/upload/)
-* [Capture File and Log Archive Upload (SFTP)](#Upload): `sftp://USERNAME@127.0.0.1:8022/files`
-* [Host and Subnet Name Mapping](#HostAndSubnetNaming) Editor: [https://localhost/name-map-ui/](https://localhost/name-map-ui/)
-* [NetBox](#NetBox): [https://localhost/netbox/](https://localhost/netbox/)
-* [Account Management](#AuthBasicAccountManagement): [https://localhost:488](https://localhost:488)
-
-## Overview
-
-![Malcolm Network Diagram](./docs/images/malcolm_network_diagram.png)
-
-Malcolm processes network traffic data in the form of packet capture (PCAP) files or Zeek logs. A [sensor](#Hedgehog) (packet capture appliance) monitors network traffic mirrored to it over a SPAN port on a network switch or router, or using a network TAP device. [Zeek](https://www.zeek.org/index.html) logs and [Arkime](https://molo.ch/) sessions are generated containing important session metadata from the traffic observed, which are then securely forwarded to a Malcolm instance. Full PCAP files are optionally stored locally on the sensor device for examination later.
-
-Malcolm parses the network session data and enriches it with additional lookups and mappings including GeoIP mapping, hardware manufacturer lookups from [organizationally unique identifiers (OUI)](http://standards-oui.ieee.org/oui/oui.txt) in MAC addresses, assigning names to [network segments](#SegmentNaming) and [hosts](#HostNaming) based on user-defined IP address and MAC mappings, performing [TLS fingerprinting](https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967), and many others.
-
-The enriched data is stored in an [OpenSearch](https://opensearch.org/) document store in a format suitable for analysis through two intuitive interfaces: OpenSearch Dashboards, a flexible data visualization plugin with dozens of prebuilt dashboards providing an at-a-glance overview of network protocols; and Arkime, a powerful tool for finding and identifying the network sessions comprising suspected security incidents. These tools can be accessed through a web browser from analyst workstations or for display in a security operations center (SOC). Logs can also optionally be forwarded on to another instance of Malcolm.
-
-![Malcolm Data Pipeline](./docs/images/malcolm_data_pipeline.png)
-
-For smaller networks, use at home by network security enthusiasts, or in the field for incident response engagements, Malcolm can also easily be deployed locally on an ordinary consumer workstation or laptop. Malcolm can process local artifacts such as locally-generated Zeek logs, locally-captured PCAP files, and PCAP files collected offline without the use of a dedicated sensor appliance.
- -## Components - -Malcolm leverages the following excellent open source tools, among others. - -* [Arkime](https://arkime.com/) (formerly Moloch) - for PCAP file processing, browsing, searching, analysis, and carving/exporting; Arkime itself consists of two parts: - * [capture](https://github.com/arkime/arkime/tree/master/capture) - a tool for traffic capture, as well as offline PCAP parsing and metadata insertion into OpenSearch - * [viewer](https://github.com/arkime/arkime/tree/master/viewer) - a browser-based interface for data visualization -* [OpenSearch](https://opensearch.org/) - a search and analytics engine for indexing and querying network traffic session metadata -* [Logstash](https://www.elastic.co/products/logstash) and [Filebeat](https://www.elastic.co/products/beats/filebeat) - for ingesting and parsing [Zeek](https://www.zeek.org/index.html) [Log Files](https://docs.zeek.org/en/stable/script-reference/log-files.html) and ingesting them into OpenSearch in a format that Arkime understands in the same way it natively understands PCAP data -* [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) - for creating additional ad-hoc visualizations and dashboards beyond that which is provided by Arkime viewer -* [Zeek](https://www.zeek.org/index.html) - a network analysis framework and IDS -* [Suricata](https://suricata.io/) - an IDS and threat detection engine -* [Yara](https://github.com/VirusTotal/yara) - a tool used to identify and classify malware samples -* [Capa](https://github.com/fireeye/capa) - a tool for detecting capabilities in executable files -* [ClamAV](https://www.clamav.net/) - an antivirus engine for scanning files extracted by Zeek -* [CyberChef](https://github.com/gchq/CyberChef) - a "swiss-army knife" data conversion tool -* [jQuery File Upload](https://github.com/blueimp/jQuery-File-Upload) - for uploading PCAP files and Zeek logs for processing -* [List.js](https://github.com/javve/list.js) - for the [host and subnet name mapping](#HostAndSubnetNaming) interface -* [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/) - for simple, reproducible deployment of the Malcolm appliance across environments and to coordinate communication between its various components -* [NetBox](https://netbox.dev/) - a suite for modeling and documenting modern networks -* [PostgreSQL](https://www.postgresql.org/) - a relational database for persisting NetBox's data -* [Redis](https://redis.io/) - an in-memory data store for caching NetBox session information -* [Nginx](https://nginx.org/) - for HTTPS and reverse proxying Malcolm components -* [nginx-auth-ldap](https://github.com/kvspb/nginx-auth-ldap) - an LDAP authentication module for nginx -* [Fluent Bit](https://fluentbit.io/) - for forwarding metrics to Malcolm from [network sensors](#Hedgehog) (packet capture appliances) -* [Mark Baggett](https://github.com/MarkBaggett)'s [freq](https://github.com/MarkBaggett/freq) - a tool for calculating entropy of strings -* [Florian Roth](https://github.com/Neo23x0)'s [Signature-Base](https://github.com/Neo23x0/signature-base) Yara ruleset -* These Zeek plugins: - * some of Amazon.com, Inc.'s [ICS protocol](https://github.com/amzn?q=zeek) analyzers - * Andrew Klaus's [Sniffpass](https://github.com/cybera/zeek-sniffpass) plugin for detecting cleartext passwords in HTTP POST requests - * Andrew Klaus's [zeek-httpattacks](https://github.com/precurse/zeek-httpattacks) plugin for detecting noncompliant HTTP requests - * ICS protocol 
analyzers for Zeek published by [DHS CISA](https://github.com/cisagov/ICSNPP) and [Idaho National Lab](https://github.com/idaholab/ICSNPP) - * Corelight's ["bad neighbor" (CVE-2020-16898)](https://github.com/corelight/CVE-2020-16898) plugin - * Corelight's ["Log4Shell" (CVE-2021-44228)](https://github.com/corelight/cve-2021-44228) plugin - * Corelight's ["OMIGOD" (CVE-2021-38647)](https://github.com/corelight/CVE-2021-38647) plugin - * Corelight's [Apache HTTP server 2.4.49-2.4.50 path traversal/RCE vulnerability (CVE-2021-41773)](https://github.com/corelight/CVE-2021-41773) plugin - * Corelight's [bro-xor-exe](https://github.com/corelight/bro-xor-exe-plugin) plugin - * Corelight's [callstranger-detector](https://github.com/corelight/callstranger-detector) plugin - * Corelight's [community ID](https://github.com/corelight/zeek-community-id) flow hashing plugin - * Corelight's [DCE/RPC remote code execution vulnerability (CVE-2022-26809)](https://github.com/corelight/cve-2022-26809) plugin - * Corelight's [HTTP More Filenames](https://github.com/corelight/http-more-files-names) plugin - * Corelight's [HTTP protocol stack vulnerability (CVE-2021-31166)](https://github.com/corelight/CVE-2021-31166) plugin - * Corelight's [pingback](https://github.com/corelight/pingback) plugin - * Corelight's [ripple20](https://github.com/corelight/ripple20) plugin - * Corelight's [SIGred](https://github.com/corelight/SIGred) plugin - * Corelight's [VMware Workspace ONE Access and Identity Manager RCE vulnerability (CVE-2022-22954)](https://github.com/corelight/cve-2022-22954) plugin - * Corelight's [Zerologon](https://github.com/corelight/zerologon) plugin - * Corelight's [Microsoft Excel privilege escalation detection (CVE-2021-42292)](https://github.com/corelight/CVE-2021-42292) plugin - * J-Gras' [Zeek::AF_Packet](https://github.com/J-Gras/zeek-af_packet-plugin) plugin - * Johanna Amann's [CVE-2020-0601](https://github.com/0xxon/cve-2020-0601) ECC certificate validation plugin and [CVE-2020-13777](https://github.com/0xxon/cve-2020-13777) GnuTLS unencrypted session ticket detection plugin - * Lexi Brent's [EternalSafety](https://github.com/0xl3x1/zeek-EternalSafety) plugin - * MITRE Cyber Analytics Repository's [Bro/Zeek ATT&CK®-Based Analytics (BZAR)](https://github.com/mitre-attack/car/tree/master/implementations) script - * Salesforce's [gQUIC](https://github.com/salesforce/GQUIC_Protocol_Analyzer) analyzer - * Salesforce's [HASSH](https://github.com/salesforce/hassh) SSH fingerprinting plugin - * Salesforce's [JA3](https://github.com/salesforce/ja3) TLS fingerprinting plugin - * Zeek's [Spicy](https://github.com/zeek/spicy) plugin framework -* [GeoLite2](https://dev.maxmind.com/geoip/geoip2/geolite2/) - Malcolm includes GeoLite2 data created by [MaxMind](https://www.maxmind.com) - -![Malcolm Components](./docs/images/malcolm_components.png) - -## Supported Protocols - -Malcolm uses [Zeek](https://docs.zeek.org/en/stable/script-reference/proto-analyzers.html) and [Arkime](https://github.com/arkime/arkime/tree/master/capture/parsers) to analyze network traffic. 
These tools provide varying degrees of visibility into traffic transmitted over the following network protocols: - -| Traffic | Wiki | Organization/Specification | Arkime | Zeek | -|---|:---:|:---:|:---:|:---:| -|Internet layer|[🔗](https://en.wikipedia.org/wiki/Internet_layer)|[🔗](https://tools.ietf.org/html/rfc791)|[✓](https://github.com/arkime/arkime/blob/master/capture/packet.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/conn/main.zeek.html#type-Conn::Info)| -|Border Gateway Protocol (BGP)|[🔗](https://en.wikipedia.org/wiki/Border_Gateway_Protocol)|[🔗](https://tools.ietf.org/html/rfc2283)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/bgp.c)|| -|Building Automation and Control (BACnet)|[🔗](https://en.wikipedia.org/wiki/BACnet)|[🔗](http://www.bacnet.org/)||[✓](https://github.com/cisagov/icsnpp-bacnet)| -|Bristol Standard Asynchronous Protocol (BSAP)|[🔗](https://en.wikipedia.org/wiki/Bristol_Standard_Asynchronous_Protocol)|[🔗](http://www.documentation.emersonprocess.com/groups/public/documents/specification_sheets/d301321x012.pdf)[🔗](http://www.documentation.emersonprocess.com/groups/public/documents/instruction_manuals/d301401x012.pdf)||[✓](https://github.com/cisagov/icsnpp-bsap)| -|Distributed Computing Environment / Remote Procedure Calls (DCE/RPC)|[🔗](https://en.wikipedia.org/wiki/DCE/RPC)|[🔗](https://pubs.opengroup.org/onlinepubs/009629399/toc.pdf)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dce-rpc/main.zeek.html#type-DCE_RPC::Info)| -|Dynamic Host Configuration Protocol (DHCP)|[🔗](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol)|[🔗](https://tools.ietf.org/html/rfc2131)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/dhcp.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dhcp/main.zeek.html#type-DHCP::Info)| -|Distributed Network Protocol 3 (DNP3)|[🔗](https://en.wikipedia.org/wiki/DNP3)|[🔗](https://www.dnp.org)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dnp3/main.zeek.html#type-DNP3::Info)[✓](https://github.com/cisagov/icsnpp-dnp3)| -|Domain Name System (DNS)|[🔗](https://en.wikipedia.org/wiki/Domain_Name_System)|[🔗](https://tools.ietf.org/html/rfc1035)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/dns.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dns/main.zeek.html#type-DNS::Info)| -|EtherCAT|[🔗](https://en.wikipedia.org/wiki/EtherCAT)|[🔗](https://www.ethercat.org/en/downloads/downloads_A02E436C7A97479F9261FDFA8A6D71E5.htm)||[✓](https://github.com/cisagov/icsnpp-ethercat)| -|EtherNet/IP / Common Industrial Protocol (CIP)|[🔗](https://en.wikipedia.org/wiki/EtherNet/IP) [🔗](https://en.wikipedia.org/wiki/Common_Industrial_Protocol)|[🔗](https://www.odva.org/Technology-Standards/EtherNet-IP/Overview)||[✓](https://github.com/cisagov/icsnpp-enip)| -|FTP (File Transfer Protocol)|[🔗](https://en.wikipedia.org/wiki/File_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc959)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ftp/info.zeek.html#type-FTP::Info)| -|GENISYS||[🔗](https://manualzz.com/doc/6363274/genisys-2000---ansaldo-sts---product-support#93)[🔗](https://gitlab.com/wireshark/wireshark/-/issues/3422)||[✓](https://github.com/cisagov/icsnpp-genisys)| -|Google Quick UDP Internet Connections 
(gQUIC)|[🔗](https://en.wikipedia.org/wiki/QUIC#Google_QUIC_(gQUIC))|[🔗](https://www.chromium.org/quic)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/quic.c)|[✓](https://github.com/salesforce/GQUIC_Protocol_Analyzer/blob/master/scripts/Salesforce/GQUIC/main.bro)| -|Hypertext Transfer Protocol (HTTP)|[🔗](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc7230)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/http.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/http/main.zeek.html#type-HTTP::Info)| -|IPsec|[🔗](https://en.wikipedia.org/wiki/IPsec)|[🔗](https://zeek.org/2021/04/20/zeeks-ipsec-protocol-analyzer/)||[✓](https://github.com/corelight/zeek-spicy-ipsec)| -|Internet Relay Chat (IRC)|[🔗](https://en.wikipedia.org/wiki/Internet_Relay_Chat)|[🔗](https://tools.ietf.org/html/rfc1459)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/irc.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/irc/main.zeek.html#type-IRC::Info)| -|Lightweight Directory Access Protocol (LDAP)|[🔗](https://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol)|[🔗](https://tools.ietf.org/html/rfc4511)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/ldap.c)|[✓](https://github.com/zeek/spicy-ldap)| -|Kerberos|[🔗](https://en.wikipedia.org/wiki/Kerberos_(protocol))|[🔗](https://tools.ietf.org/html/rfc4120)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/krb5.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/krb/main.zeek.html#type-KRB::Info)| -|Modbus|[🔗](https://en.wikipedia.org/wiki/Modbus)|[🔗](http://www.modbus.org/)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/modbus/main.zeek.html#type-Modbus::Info)[✓](https://github.com/cisagov/icsnpp-modbus)| -|MQ Telemetry Transport (MQTT)|[🔗](https://en.wikipedia.org/wiki/MQTT)|[🔗](https://mqtt.org/)||[✓](https://docs.zeek.org/en/stable/scripts/policy/protocols/mqtt/main.zeek.html)| -|MySQL|[🔗](https://en.wikipedia.org/wiki/MySQL)|[🔗](https://dev.mysql.com/doc/internals/en/client-server-protocol.html)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/mysql.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/mysql/main.zeek.html#type-MySQL::Info)| -|NT Lan Manager (NTLM)|[🔗](https://en.wikipedia.org/wiki/NT_LAN_Manager)|[🔗](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-nlmp/b38c36ed-2804-4868-a9ff-8dd3182128e4?redirectedfrom=MSDN)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ntlm/main.zeek.html#type-NTLM::Info)| -|Network Time Protocol (NTP)|[🔗](https://en.wikipedia.org/wiki/Network_Time_Protocol)|[🔗](http://www.ntp.org)||[✓](https://docs.zeek.org/en/latest/scripts/base/protocols/ntp/main.zeek.html#type-NTP::Info)| -|Oracle|[🔗](https://en.wikipedia.org/wiki/Oracle_Net_Services)|[🔗](https://docs.oracle.com/cd/E11882_01/network.112/e41945/layers.htm#NETAG004)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/oracle.c)|| -|Open Platform Communications Unified Architecture (OPC UA) Binary|[🔗](https://en.wikipedia.org/wiki/OPC_Unified_Architecture)|[🔗](https://opcfoundation.org/developer-tools/specifications-unified-architecture)||[✓](https://github.com/cisagov/icsnpp-opcua-binary)| -|Open Shortest Path First 
(OSPF)|[🔗](https://en.wikipedia.org/wiki/Open_Shortest_Path_First)|[🔗](https://datatracker.ietf.org/wg/ospf/charter/)[🔗](https://datatracker.ietf.org/doc/html/rfc2328)[🔗](https://datatracker.ietf.org/doc/html/rfc5340)||[✓](https://github.com/corelight/zeek-spicy-ospf)| -|OpenVPN|[🔗](https://en.wikipedia.org/wiki/OpenVPN)|[🔗](https://openvpn.net/community-resources/openvpn-protocol/)[🔗](https://zeek.org/2021/03/16/a-zeek-openvpn-protocol-analyzer/)||[✓](https://github.com/corelight/zeek-spicy-openvpn)| -|PostgreSQL|[🔗](https://en.wikipedia.org/wiki/PostgreSQL)|[🔗](https://www.postgresql.org/)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/postgresql.c)|| -|Process Field Net (PROFINET)|[🔗](https://en.wikipedia.org/wiki/PROFINET)|[🔗](https://us.profinet.com/technology/profinet/)||[✓](https://github.com/amzn/zeek-plugin-profinet/blob/master/scripts/main.zeek)| -|Remote Authentication Dial-In User Service (RADIUS)|[🔗](https://en.wikipedia.org/wiki/RADIUS)|[🔗](https://tools.ietf.org/html/rfc2865)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/radius.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/radius/main.zeek.html#type-RADIUS::Info)| -|Remote Desktop Protocol (RDP)|[🔗](https://en.wikipedia.org/wiki/Remote_Desktop_Protocol)|[🔗](https://docs.microsoft.com/en-us/windows/win32/termserv/remote-desktop-protocol?redirectedfrom=MSDN)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/rdp/main.zeek.html#type-RDP::Info)| -|Remote Framebuffer (RFB)|[🔗](https://en.wikipedia.org/wiki/RFB_protocol)|[🔗](https://tools.ietf.org/html/rfc6143)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/rfb/main.zeek.html#type-RFB::Info)| -|S7comm / Connection Oriented Transport Protocol (COTP)|[🔗](https://wiki.wireshark.org/S7comm) [🔗](https://wiki.wireshark.org/COTP)|[🔗](https://support.industry.siemens.com/cs/document/26483647/what-properties-advantages-and-special-features-does-the-s7-protocol-offer-?dti=0&lc=en-WW) [🔗](https://www.ietf.org/rfc/rfc0905.txt)||[✓](https://github.com/cisagov/icsnpp-s7comm)| -|Secure Shell (SSH)|[🔗](https://en.wikipedia.org/wiki/Secure_Shell)|[🔗](https://tools.ietf.org/html/rfc4253)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/ssh.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ssh/main.zeek.html#type-SSH::Info)| -|Secure Sockets Layer (SSL) / Transport Layer Security (TLS)|[🔗](https://en.wikipedia.org/wiki/Transport_Layer_Security)|[🔗](https://tools.ietf.org/html/rfc5246)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/socks.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ssl/main.zeek.html#type-SSL::Info)| -|Session Initiation Protocol (SIP)|[🔗](https://en.wikipedia.org/wiki/Session_Initiation_Protocol)|[🔗](https://tools.ietf.org/html/rfc3261)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/sip/main.zeek.html#type-SIP::Info)| -|Server Message Block (SMB) / Common Internet File System (CIFS)|[🔗](https://en.wikipedia.org/wiki/Server_Message_Block)|[🔗](https://docs.microsoft.com/en-us/windows/win32/fileio/microsoft-smb-protocol-and-cifs-protocol-overview)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/smb.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/smb/main.zeek.html)| -|Simple Mail Transfer Protocol 
(SMTP)|[🔗](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc5321)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/smtp.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/smtp/main.zeek.html#type-SMTP::Info)| -|Simple Network Management Protocol (SNMP)|[🔗](https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol)|[🔗](https://tools.ietf.org/html/rfc2578)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/smtp.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/snmp/main.zeek.html#type-SNMP::Info)| -|SOCKS|[🔗](https://en.wikipedia.org/wiki/SOCKS)|[🔗](https://tools.ietf.org/html/rfc1928)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/socks.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/socks/main.zeek.html#type-SOCKS::Info)| -|STUN (Session Traversal Utilities for NAT)|[🔗](https://en.wikipedia.org/wiki/STUN)|[🔗](https://datatracker.ietf.org/doc/html/rfc3489)|[✓](https://github.com/arkime/arkime/blob/main/capture/parsers/misc.c#L147)|[✓](https://github.com/corelight/zeek-spicy-stun)| -|Syslog|[🔗](https://en.wikipedia.org/wiki/Syslog)|[🔗](https://tools.ietf.org/html/rfc5424)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/tls.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/syslog/main.zeek.html#type-Syslog::Info)| -|Tabular Data Stream (TDS)|[🔗](https://en.wikipedia.org/wiki/Tabular_Data_Stream)|[🔗](https://www.freetds.org/tds.html) [🔗](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/b46a581a-39de-4745-b076-ec4dbb7d13ec)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/tds.c)|[✓](https://github.com/amzn/zeek-plugin-tds/blob/master/scripts/main.zeek)| -|Telnet / remote shell (rsh) / remote login (rlogin)|[🔗](https://en.wikipedia.org/wiki/Telnet)[🔗](https://en.wikipedia.org/wiki/Berkeley_r-commands)|[🔗](https://tools.ietf.org/html/rfc854)[🔗](https://tools.ietf.org/html/rfc1282)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/misc.c#L336)|[✓](https://docs.zeek.org/en/current/scripts/base/bif/plugins/Zeek_Login.events.bif.zeek.html)[❋](https://github.com/idaholab/Malcolm/blob/main/zeek/config/login.zeek)| -|TFTP (Trivial File Transfer Protocol)|[🔗](https://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc1350)||[✓](https://github.com/zeek/spicy-analyzers/blob/main/analyzer/protocol/tftp/tftp.zeek)| -|WireGuard|[🔗](https://en.wikipedia.org/wiki/WireGuard)|[🔗](https://www.wireguard.com/protocol/)[🔗](https://www.wireguard.com/papers/wireguard.pdf)||[✓](https://github.com/corelight/zeek-spicy-wireguard)| -|various tunnel protocols (e.g., GTP, GRE, Teredo, AYIYA, IP-in-IP, etc.)|[🔗](https://en.wikipedia.org/wiki/Tunneling_protocol)||[✓](https://github.com/arkime/arkime/blob/master/capture/packet.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/frameworks/tunnels/main.zeek.html#type-Tunnel::Info)| - -Additionally, Zeek is able to detect and, where possible, log the type, vendor and version of [various](https://docs.zeek.org/en/stable/scripts/base/frameworks/software/main.zeek.html#type-Software::Type) other [software protocols](https://en.wikipedia.org/wiki/Application_layer). - -As part of its network traffic analysis, Zeek can extract and analyze files transferred across the protocols it understands. 
In addition to generating logs for transferred files, deeper analysis is performed for the following file types:
-
-* [Portable executable](https://docs.zeek.org/en/stable/scripts/base/files/pe/main.zeek.html#type-PE::Info) files
-* [X.509](https://docs.zeek.org/en/stable/scripts/base/files/x509/main.zeek.html#type-X509::Info) certificates
-
-See [automatic file extraction and scanning](#ZeekFileExtraction) for additional features related to file scanning.
-
-See [Zeek log integration](#ArkimeZeek) for more information on how Malcolm integrates [Arkime sessions and Zeek logs](#ZeekArkimeFlowCorrelation) for analysis.
-
-## Development
-
-Checking out the [Malcolm source code](https://github.com/idaholab/Malcolm/tree/main) results in the following subdirectories in your `malcolm/` working copy:
-
-* `api` - code and configuration for the `api` container which provides a REST API to query Malcolm
-* `arkime` - code and configuration for the `arkime` container which processes PCAP files using `capture` and which serves the Viewer application
-* `arkime-logs` - an initially empty directory to which the `arkime` container will write some debug log files
-* `arkime-raw` - an initially empty directory to which the `arkime` container will write captured PCAP files; because Arkime, as employed by Malcolm, is currently used for processing previously-captured PCAP files, this directory is currently unused
-* `Dockerfiles` - a directory containing build instructions for Malcolm's docker images
-* `docs` - a directory containing instructions and documentation
-* `opensearch` - an initially empty directory where the OpenSearch database instance will reside
-* `opensearch-backup` - an initially empty directory for storing OpenSearch [index snapshots](#IndexManagement)
-* `filebeat` - code and configuration for the `filebeat` container which ingests Zeek logs and forwards them to the `logstash` container
-* `file-monitor` - code and configuration for the `file-monitor` container which can scan files extracted by Zeek
-* `file-upload` - code and configuration for the `upload` container which serves a web browser-based upload form for uploading PCAP files and Zeek logs, and which serves an SFTP share as an alternate method for upload
-* `freq-server` - code and configuration for the `freq` container used for calculating entropy of strings
-* `htadmin` - configuration for the `htadmin` user account management container
-* `dashboards` - code and configuration for the `dashboards` container for creating additional ad-hoc visualizations and dashboards beyond those provided by Arkime Viewer
-* `logstash` - code and configuration for the `logstash` container which parses Zeek logs and forwards them to the `opensearch` container
-* `malcolm-iso` - code and configuration for building an [installer ISO](#ISO) for a minimal Debian-based Linux installation for running Malcolm
-* `name-map-ui` - code and configuration for the `name-map-ui` container which provides the [host and subnet name mapping](#HostAndSubnetNaming) interface
-* `netbox` - code and configuration for the `netbox`, `netbox-postgres`, `netbox-redis` and `netbox-redis-cache` containers which provide asset management capabilities
-* `nginx` - configuration for the `nginx` reverse proxy container
-* `pcap` - an initially empty directory for PCAP files to be uploaded, processed, and stored
-* `pcap-capture` - code and configuration for the `pcap-capture` container which can capture network traffic
-* `pcap-monitor` - code and configuration for the `pcap-monitor` container which watches for new or uploaded PCAP files and notifies the other services to process them
-* `scripts` - control scripts for starting, stopping, restarting, etc. Malcolm
-* `sensor-iso` - code and configuration for building a [Hedgehog Linux](#Hedgehog) ISO
-* `shared` - miscellaneous code used by various Malcolm components
-* `suricata` - code and configuration for the `suricata` container which handles PCAP processing using Suricata
-* `suricata-logs` - an initially empty directory for Suricata logs to be uploaded, processed, and stored
-* `zeek` - code and configuration for the `zeek` container which handles PCAP processing using Zeek
-* `zeek-logs` - an initially empty directory for Zeek logs to be uploaded, processed, and stored
-
-and the following files of special note:
-
-* `auth.env` - the script `./scripts/auth_setup` prompts the user for the administrator credentials used by the Malcolm appliance, and `auth.env` is the environment file where those values are stored
-* `cidr-map.txt` - specify custom IP address to network segment mapping
-* `host-map.txt` - specify custom IP and/or MAC address to host mapping
-* `net-map.json` - an alternative to `cidr-map.txt` and `host-map.txt`, mapping hosts and network segments to their names in a JSON-formatted file
-* `docker-compose.yml` - the configuration file used by `docker-compose` to build, start, and stop an instance of the Malcolm appliance
-* `docker-compose-standalone.yml` - similar to `docker-compose.yml`, only used for the ["packaged"](#Packager) installation of Malcolm
-
-### Building from source
-
-Building the Malcolm docker images from scratch requires internet access to pull source files for its components. Once internet access is available, execute the following command to build all of the Docker images used by the Malcolm appliance:
-
-```
-$ ./scripts/build.sh
-```
-
-Then, go take a walk or something since it will be a while. When you're done, you can run `docker images` and see that you have fresh images for:
-
-* `malcolmnetsec/api` (based on `python:3-slim`)
-* `malcolmnetsec/arkime` (based on `debian:11-slim`)
-* `malcolmnetsec/dashboards-helper` (based on `alpine:3.16`)
-* `malcolmnetsec/dashboards` (based on `opensearchproject/opensearch-dashboards`)
-* `malcolmnetsec/file-monitor` (based on `debian:11-slim`)
-* `malcolmnetsec/file-upload` (based on `debian:11-slim`)
-* `malcolmnetsec/filebeat-oss` (based on `docker.elastic.co/beats/filebeat-oss`)
-* `malcolmnetsec/freq` (based on `debian:11-slim`)
-* `malcolmnetsec/htadmin` (based on `debian:11-slim`)
-* `malcolmnetsec/logstash-oss` (based on `opensearchproject/logstash-oss-with-opensearch-output-plugin`)
-* `malcolmnetsec/name-map-ui` (based on `alpine:3.16`)
-* `malcolmnetsec/netbox` (based on `netboxcommunity/netbox:latest`)
-* `malcolmnetsec/nginx-proxy` (based on `alpine:3.16`)
-* `malcolmnetsec/opensearch` (based on `opensearchproject/opensearch`)
-* `malcolmnetsec/pcap-capture` (based on `debian:11-slim`)
-* `malcolmnetsec/pcap-monitor` (based on `debian:11-slim`)
-* `malcolmnetsec/postgresql` (based on `postgres:14-alpine`)
-* `malcolmnetsec/redis` (based on `redis:7-alpine`)
-* `malcolmnetsec/suricata` (based on `debian:11-slim`)
-* `malcolmnetsec/zeek` (based on `debian:11-slim`)
-
-Alternatively, if you have forked Malcolm on GitHub, [workflow files](./.github/workflows/) are provided which contain instructions for GitHub to build the docker images and [sensor](#Hedgehog) and [Malcolm](#ISO) installer ISOs. The resulting images are named according to the pattern `ghcr.io/owner/malcolmnetsec/image:branch` (e.g., if you've forked Malcolm with the GitHub user `romeogdetlevjr`, the `arkime` container built for the `main` branch would be named `ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main`). To run your local instance of Malcolm using these images instead of the official ones, you'll need to edit your `docker-compose.yml` file(s) and replace the `image:` tags according to this new pattern, or use the bash helper script `./shared/bin/github_image_helper.sh` to pull and re-tag the images.
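-
-For example, a re-tagged override for a single service in `docker-compose.yml` might look like the following sketch (using the fork owner `romeogdetlevjr` from the example above; the rest of the service definition is omitted):
-```
-services:
-  arkime:
-    image: ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main
-```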
-
-## Pre-Packaged installation files
-
-### Creating pre-packaged installation files
-
-`scripts/malcolm_appliance_packager.sh` can be run to package up the configuration files (and, if necessary, the Docker images) which can be copied to a network share or USB drive for distribution to non-networked machines. For example:
-
-```
-$ ./scripts/malcolm_appliance_packager.sh
-You must set a username and password for Malcolm, and self-signed X.509 certificates will be generated
-
-Store administrator username/password for local Malcolm access? (Y/n): y
-
-Administrator username: analyst
-analyst password:
-analyst password (again):
-
-(Re)generate self-signed certificates for HTTPS access (Y/n): y
-
-(Re)generate self-signed certificates for a remote log forwarder (Y/n): y
-
-Store username/password for primary remote OpenSearch instance? (y/N): n
-
-Store username/password for secondary remote OpenSearch instance? (y/N): n
-
-Store username/password for email alert sender account? (y/N): n
-
-(Re)generate internal passwords for NetBox (Y/n): y
-
-Packaged Malcolm to "/home/user/tmp/malcolm_20190513_101117_f0d052c.tar.gz"
-
-Do you need to package docker images also [y/N]? y
-This might take a few minutes...
-
-Packaged Malcolm docker images to "/home/user/tmp/malcolm_20190513_101117_f0d052c_images.tar.gz"
-
-
-To install Malcolm:
-  1. Run install.py
-  2. Follow the prompts
-
-To start, stop, restart, etc. Malcolm:
-  Use the control scripts in the "scripts/" directory:
-    - start       (start Malcolm)
-    - stop        (stop Malcolm)
-    - restart     (restart Malcolm)
-    - logs        (monitor Malcolm logs)
-    - wipe        (stop Malcolm and clear its database)
-    - auth_setup  (change authentication-related settings)
-
-A minute or so after starting Malcolm, the following services will be accessible:
-  - Arkime: https://localhost/
-  - OpenSearch Dashboards: https://localhost/dashboards/
-  - PCAP upload (web): https://localhost/upload/
-  - PCAP upload (sftp): sftp://USERNAME@127.0.0.1:8022/files/
-  - Host and subnet name mapping editor: https://localhost/name-map-ui/
-  - NetBox: https://localhost/netbox/
-  - Account management: https://localhost:488/
-```
-
-The above example will result in the following artifacts for distribution as explained in the script's output:
-
-```
-$ ls -lh
-total 2.0G
--rwxr-xr-x 1 user user  61k May 13 11:32 install.py
--rw-r--r-- 1 user user 2.0G May 13 11:37 malcolm_20190513_101117_f0d052c_images.tar.gz
--rw-r--r-- 1 user user  683 May 13 11:37 malcolm_20190513_101117_f0d052c.README.txt
--rw-r--r-- 1 user user 183k May 13 11:32 malcolm_20190513_101117_f0d052c.tar.gz
-```
-
-### Installing from pre-packaged installation files
-
-If you have obtained pre-packaged installation files to install Malcolm on a non-networked machine via an internal network share or on a USB key, you likely have the following files:
-
-* `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.README.txt` - This readme file contains a minimal set of instructions for extracting the contents of the other tarballs and running the Malcolm appliance.
-* `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` - This tarball contains the configuration files and directory configuration used by an instance of Malcolm. It can be extracted via `tar -xf malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz`, upon which a directory will be created (named similarly to the tarball) containing the directories and configuration files. Alternatively, `install.py` can accept this filename as an argument and handle its extraction and initial configuration for you.
-* `malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz` - This tarball contains the Docker images used by Malcolm. It can be imported manually via `docker load -i malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz`.
-* `install.py` - This install script can load the Docker images and extract Malcolm configuration files from the aforementioned tarballs and do some initial configuration for you.
-
-Run `install.py malcolm_XXXXXXXX_XXXXXX_XXXXXXX.tar.gz` and follow the prompts. If you do not already have Docker and Docker Compose installed, the `install.py` script will help you install them.
-
-## Preparing your system
-
-### Recommended system requirements
-
-Malcolm runs on top of [Docker](https://www.docker.com/) which runs on recent releases of Linux, Apple macOS and Microsoft Windows 10.
-
-To quote the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html), "If there is one resource that you will run out of first, it will likely be memory." The same is true for Malcolm: you will want at least 16 gigabytes of RAM to run Malcolm comfortably. For processing large volumes of traffic, I'd recommend at a bare minimum a dedicated server with 16 cores and 16 gigabytes of RAM. Malcolm can run on less, but more is better. You're going to want as much hard drive space as possible, of course, as the amount of PCAP data you're able to analyze and store will be limited by your hard drive.
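-
-As a quick sanity check against these recommendations, a host's memory, CPU count, and free disk space can be inspected with standard tools before deploying Malcolm:
-```
-$ free -g    # total/available RAM in gigabytes
-$ nproc      # number of CPU cores
-$ df -h .    # free space on the filesystem that will hold Malcolm's data
-```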
- -Arkime's wiki has a couple of documents ([here](https://github.com/arkime/arkime#hardware-requirements) and [here](https://github.com/arkime/arkime/wiki/FAQ#what-kind-of-capture-machines-should-we-buy) and [here](https://github.com/arkime/arkime/wiki/FAQ#how-many-elasticsearch-nodes-or-machines-do-i-need) and a [calculator here](https://molo.ch/#estimators)) which may be helpful, although not everything in those documents will apply to a Docker-based setup like Malcolm. - -### System configuration and tuning - -If you already have Docker and Docker Compose installed, the `install.py` script can still help you tune system configuration and `docker-compose.yml` parameters for Malcolm. To run it in "configuration only" mode, bypassing the steps to install Docker and Docker Compose, run it like this: -``` -./scripts/install.py --configure -``` - -Although `install.py` will attempt to automate many of the following configuration and tuning parameters, they are nonetheless listed in the following sections for reference: - -#### `docker-compose.yml` parameters - -Edit `docker-compose.yml` and search for the `OPENSEARCH_JAVA_OPTS` key. Edit the `-Xms4g -Xmx4g` values, replacing `4g` with a number that is half of your total system memory, or just under 32 gigabytes, whichever is less. So, for example, if I had 64 gigabytes of memory I would edit those values to be `-Xms31g -Xmx31g`. This indicates how much memory can be allocated to the OpenSearch heaps. For a pleasant experience, I would suggest not using a value under 10 gigabytes. Similar values can be modified for Logstash with `LS_JAVA_OPTS`, where using 3 or 4 gigabytes is recommended. - -Various other environment variables inside of `docker-compose.yml` can be tweaked to control aspects of how Malcolm behaves, particularly with regards to processing PCAP files and Zeek logs. 
The environment variables of particular interest are located near the top of that file under **Commonly tweaked configuration options**, which include:
-
-* `ARKIME_ANALYZE_PCAP_THREADS` – the number of threads available to Arkime for analyzing PCAP files (default `1`)
-* `AUTO_TAG` – if set to `true`, Malcolm will automatically create Arkime sessions and Zeek logs with tags based on the filename, as described in [Tagging](#Tagging) (default `true`)
-* `BEATS_SSL` – if set to `true`, Logstash will require encrypted communications for any external [Beats](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html)-based forwarders from which it will accept logs (default `true`)
-* `CONNECTION_SECONDS_SEVERITY_THRESHOLD` - when [severity scoring](#Severity) is enabled, this variable indicates the duration threshold (in seconds) for assigning severity to long connections (default `3600`)
-* `EXTRACTED_FILE_CAPA_VERBOSE` – if set to `true`, all Capa rule hits will be logged; otherwise (`false`) only [MITRE ATT&CK® technique](https://attack.mitre.org/techniques) classifications will be logged
-* `EXTRACTED_FILE_ENABLE_CAPA` – if set to `true`, [Zeek-extracted files](#ZeekFileExtraction) that are determined to be PE (portable executable) files will be scanned with [Capa](https://github.com/fireeye/capa)
-* `EXTRACTED_FILE_ENABLE_CLAMAV` – if set to `true`, [Zeek-extracted files](#ZeekFileExtraction) will be scanned with [ClamAV](https://www.clamav.net/)
-* `EXTRACTED_FILE_ENABLE_YARA` – if set to `true`, [Zeek-extracted files](#ZeekFileExtraction) will be scanned with [Yara](https://github.com/VirusTotal/yara)
-* `EXTRACTED_FILE_HTTP_SERVER_ENABLE` – if set to `true`, the directory containing [Zeek-extracted files](#ZeekFileExtraction) will be served over HTTP at `./extracted-files/` (e.g., [https://localhost/extracted-files/](https://localhost/extracted-files/) if you are connecting locally)
-* `EXTRACTED_FILE_HTTP_SERVER_ENCRYPT` – if set to `true`, those Zeek-extracted files will be AES-256-CBC-encrypted in an `openssl enc`-compatible format (e.g., `openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe`)
-* `EXTRACTED_FILE_HTTP_SERVER_KEY` – specifies the AES-256-CBC decryption password for encrypted Zeek-extracted files; used in conjunction with `EXTRACTED_FILE_HTTP_SERVER_ENCRYPT`
-* `EXTRACTED_FILE_IGNORE_EXISTING` – if set to `true`, files extant in the `./zeek-logs/extract_files/` directory will be ignored on startup rather than scanned
-* `EXTRACTED_FILE_PRESERVATION` – determines behavior for preservation of [Zeek-extracted files](#ZeekFileExtraction)
-* `EXTRACTED_FILE_UPDATE_RULES` – if set to `true`, file scanner engines (e.g., ClamAV, Capa, Yara) will periodically update their rule definitions
-* `EXTRACTED_FILE_YARA_CUSTOM_ONLY` – if set to `true`, Malcolm will bypass the default [Yara ruleset](https://github.com/Neo23x0/signature-base) and use only user-defined rules in `./yara/rules`
-* `FREQ_LOOKUP` - if set to `true`, domain names (from DNS queries and SSL server names) will be assigned entropy scores as calculated by [`freq`](https://github.com/MarkBaggett/freq) (default `false`)
-* `FREQ_SEVERITY_THRESHOLD` - when [severity scoring](#Severity) is enabled, this variable indicates the entropy threshold for assigning severity to events with entropy scores calculated by [`freq`](https://github.com/MarkBaggett/freq); a lower value will only assign severity scores to fewer domain names with higher entropy (e.g., `2.0` for `NQZHTFHRMYMTVBQJE.COM`),
while a higher value will assign severity scores to more domain names with lower entropy (e.g., `7.5` for `naturallanguagedomain.example.org`) (default `2.0`) -* `LOGSTASH_OUI_LOOKUP` – if set to `true`, Logstash will map MAC addresses to vendors for all source and destination MAC addresses when analyzing Zeek logs (default `true`) -* `LOGSTASH_REVERSE_DNS` – if set to `true`, Logstash will perform a reverse DNS lookup for all external source and destination IP address values when analyzing Zeek logs (default `false`) -* `LOGSTASH_SEVERITY_SCORING` - if set to `true`, Logstash will perform [severity scoring](#Severity) when analyzing Zeek logs (default `true`) -* `MANAGE_PCAP_FILES` – if set to `true`, all PCAP files imported into Malcolm will be marked as available for deletion by Arkime if available storage space becomes too low (default `false`) -* `MAXMIND_GEOIP_DB_LICENSE_KEY` - Malcolm uses MaxMind's free GeoLite2 databases for GeoIP lookups. As of December 30, 2019, these databases are [no longer available](https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/) for download via a public URL. Instead, they must be downloaded using a MaxMind license key (available without charge [from MaxMind](https://www.maxmind.com/en/geolite2/signup)). The license key can be specified here for GeoIP database downloads during build- and run-time. -* `OPENSEARCH_LOCAL` - if set to `true`, Malcolm will use its own internal [OpenSearch instance](#OpenSearchInstance) (default `true`) -* `OPENSEARCH_URL` - when using Malcolm's internal OpenSearch instance (i.e., `OPENSEARCH_LOCAL` is `true`) this should be `http://opensearch:9200`, otherwise this value specifies the primary remote instance URL in the format `protocol://host:port` (default `http://opensearch:9200`) -* `OPENSEARCH_SSL_CERTIFICATE_VERIFICATION` - if set to `true`, connections to the primary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default `false`) -* `OPENSEARCH_SECONDARY` - if set to `true`, Malcolm will forward logs to a secondary remote OpenSearch instance in addition to the primary (local or remote) OpenSearch instance (default `false`) -* `OPENSEARCH_SECONDARY_URL` - when forwarding to a secondary remote OpenSearch instance (i.e., `OPENSEARCH_SECONDARY` is `true`) this value specifies the secondary remote instance URL in the format `protocol://host:port` -* `OPENSEARCH_SECONDARY_SSL_CERTIFICATE_VERIFICATION` - if set to `true`, connections to the secondary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default `false`) -* `NETBOX_DISABLED` - if set to `true`, Malcolm will **not** start [NetBox](#NetBox) and manage a [NetBox](#NetBox) instance (default `true`) -* `NETBOX_CRON` - if set to `true`, network traffic metadata will periodically be queried and used to populate Malcolm's [NetBox](#NetBox) instance -* `NGINX_BASIC_AUTH` - if set to `true`, use [TLS-encrypted HTTP basic](#AuthBasicAccountManagement) authentication (default); if set to `false`, use [Lightweight Directory Access Protocol (LDAP)](#AuthLDAP) authentication -* `NGINX_LOG_ACCESS_AND_ERRORS` - if set to `true`, all access to Malcolm via its [web interfaces](#UserInterfaceURLs) will be logged to OpenSearch (default `false`) -* `NGINX_SSL` - if set to `true`, require HTTPS connections to Malcolm's `nginx-proxy` container (default); if set to `false`, use unencrypted HTTP 
connections (using unsecured HTTP connections is **NOT** recommended unless you are running Malcolm behind another reverse proxy like Traefik, Caddy, etc.)
-* `PCAP_ENABLE_NETSNIFF` – if set to `true`, Malcolm will capture network traffic on the local network interface(s) indicated in `PCAP_IFACE` using [netsniff-ng](http://netsniff-ng.org/)
-* `PCAP_ENABLE_TCPDUMP` – if set to `true`, Malcolm will capture network traffic on the local network interface(s) indicated in `PCAP_IFACE` using [tcpdump](https://www.tcpdump.org/); there is no reason to enable *both* `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`
-* `PCAP_FILTER` – specifies a tcpdump-style filter expression for local packet capture; leave blank to capture all traffic
-* `PCAP_IFACE` – used to specify the network interface(s) for local packet capture if `PCAP_ENABLE_NETSNIFF`, `PCAP_ENABLE_TCPDUMP`, `ZEEK_LIVE_CAPTURE` or `SURICATA_LIVE_CAPTURE` are enabled; for multiple interfaces, separate the interface names with a comma (e.g., `'enp0s25'` or `'enp10s0,enp11s0'`)
-* `PCAP_IFACE_TWEAK` - if set to `true`, Malcolm will [use `ethtool`](shared/bin/nic-capture-setup.sh) to disable NIC hardware offloading features and adjust ring buffer sizes for capture interface(s); this should be `true` if the interface(s) are being used for capture only, `false` if they are being used for management/communication
-* `PCAP_ROTATE_MEGABYTES` – used to specify how large a locally-captured PCAP file can become (in megabytes) before it is closed for processing and a new PCAP file created
-* `PCAP_ROTATE_MINUTES` – used to specify a time interval (in minutes) after which a locally-captured PCAP file will be closed for processing and a new PCAP file created
-* `pipeline.workers`, `pipeline.batch.size` and `pipeline.batch.delay` - these settings are used to tune the performance and resource utilization of the `logstash` container; see [Tuning and Profiling Logstash Performance](https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html), [`logstash.yml`](https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html) and [Multiple Pipelines](https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html)
-* `PUID` and `PGID` - Docker runs all of its containers as the privileged `root` user by default. For better security, Malcolm immediately drops to non-privileged user accounts for executing internal processes wherever possible. The `PUID` (**p**rocess **u**ser **ID**) and `PGID` (**p**rocess **g**roup **ID**) environment variables allow Malcolm to map internal non-privileged user accounts to a corresponding [user account](https://en.wikipedia.org/wiki/User_identifier) on the host.
-* `SENSITIVE_COUNTRY_CODES` - when [severity scoring](#Severity) is enabled, this variable defines a comma-separated list of sensitive countries (using [ISO 3166-1 alpha-2 codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Current_codes)) (default `'AM,AZ,BY,CN,CU,DZ,GE,HK,IL,IN,IQ,IR,KG,KP,KZ,LY,MD,MO,PK,RU,SD,SS,SY,TJ,TM,TW,UA,UZ'`, taken from the U.S. Department of Energy Sensitive Country List)
Department of Energy Sensitive Country List) -* `SURICATA_AUTO_ANALYZE_PCAP_FILES` – if set to `true`, all PCAP files imported into Malcolm will automatically be analyzed by Suricata, and the resulting logs will also be imported (default `false`) -* `SURICATA_AUTO_ANALYZE_PCAP_THREADS` – the number of threads available to Malcolm for analyzing Suricata logs (default `1`) -* `SURICATA_CUSTOM_RULES_ONLY` – if set to `true`, Malcolm will bypass the default [Suricata ruleset](https://github.com/OISF/suricata/tree/master/rules) and use only user-defined rules (`./suricata/rules/*.rules`). -* `SURICATA_UPDATE_RULES` – if set to `true`, Suricata signatures will periodically be updated (default `false`) -* `SURICATA_LIVE_CAPTURE` - if set to `true`, Suricata will monitor live traffic on the local interface(s) defined by `PCAP_FILTER` -* `SURICATA_ROTATED_PCAP` - if set to `true`, Suricata can analyze captured PCAP files captured by `netsniff-ng` or `tcpdump` (see `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`, as well as `SURICATA_AUTO_ANALYZE_PCAP_FILES`); if `SURICATA_LIVE_CAPTURE` is `true`, this should be false, otherwise Suricata will see duplicate traffic -* `SURICATA_…` - the [`suricata` container entrypoint script](shared/bin/suricata_config_populate.py) can use **many** more environment variables to tweak [suricata.yaml](https://github.com/OISF/suricata/blob/master/suricata.yaml.in); in that script, `DEFAULT_VARS` defines those variables (albeit without the `SURICATA_` prefix you must add to each for use) -* `TOTAL_MEGABYTES_SEVERITY_THRESHOLD` - when [severity scoring](#Severity) is enabled, this variable indicates the size threshold (in megabytes) for assigning severity to large connections or file transfers (default `1000`) -* `VTOT_API2_KEY` – used to specify a [VirusTotal Public API v.20](https://www.virustotal.com/en/documentation/public-api/) key, which, if specified, will be used to submit hashes of [Zeek-extracted files](#ZeekFileExtraction) to VirusTotal -* `ZEEK_AUTO_ANALYZE_PCAP_FILES` – if set to `true`, all PCAP files imported into Malcolm will automatically be analyzed by Zeek, and the resulting logs will also be imported (default `false`) -* `ZEEK_AUTO_ANALYZE_PCAP_THREADS` – the number of threads available to Malcolm for analyzing Zeek logs (default `1`) -* `ZEEK_DISABLE_…` - if set to any non-blank value, each of these variables can be used to disable a certain Zeek function when it analyzes PCAP files (for example, setting `ZEEK_DISABLE_LOG_PASSWORDS` to `true` to disable logging of cleartext passwords) -* `ZEEK_DISABLE_BEST_GUESS_ICS` - see ["Best Guess" Fingerprinting for ICS Protocols](#ICSBestGuess) -* `ZEEK_EXTRACTOR_MODE` – determines the file extraction behavior for file transfers detected by Zeek; see [Automatic file extraction and scanning](#ZeekFileExtraction) for more details -* `ZEEK_INTEL_FEED_SINCE` - when querying a [TAXII](#ZeekIntelSTIX) or [MISP](#ZeekIntelMISP) feed, only process threat indicators that have been created or modified since the time represented by this value; it may be either a fixed date/time (`01/01/2021`) or relative interval (`30 days ago`) -* `ZEEK_INTEL_ITEM_EXPIRATION` - specifies the value for Zeek's [`Intel::item_expiration`](https://docs.zeek.org/en/current/scripts/base/frameworks/intel/main.zeek.html#id-Intel::item_expiration) timeout as used by the [Zeek Intelligence Framework](#ZeekIntel) (default `-1min`, which disables item expiration) -* `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` - specifies a [cron 
expression](https://en.wikipedia.org/wiki/Cron#CRON_expression) indicating the refresh interval for generating the [Zeek Intelligence Framework](#ZeekIntel) files (defaults to empty, which disables automatic refresh) -* `ZEEK_LIVE_CAPTURE` - if set to `true`, Zeek will monitor live traffic on the local interface(s) defined by `PCAP_FILTER` -* `ZEEK_ROTATED_PCAP` - if set to `true`, Zeek can analyze captured PCAP files captured by `netsniff-ng` or `tcpdump` (see `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`, as well as `ZEEK_AUTO_ANALYZE_PCAP_FILES`); if `ZEEK_LIVE_CAPTURE` is `true`, this should be false, otherwise Zeek will see duplicate traffic - -#### Linux host system configuration - -##### Installing Docker - -Docker installation instructions vary slightly by distribution. Please follow the links below to docker.com to find the instructions specific to your distribution: - -* [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) -* [Debian](https://docs.docker.com/install/linux/docker-ce/debian/) -* [Fedora](https://docs.docker.com/install/linux/docker-ce/fedora/) -* [CentOS](https://docs.docker.com/install/linux/docker-ce/centos/) -* [Binaries](https://docs.docker.com/install/linux/docker-ce/binaries/) - -After installing Docker, because Malcolm should be run as a non-root user, add your user to the `docker` group with something like: -``` -$ sudo usermod -aG docker yourusername -``` - -Following this, either reboot or log out then log back in. - -Docker starts automatically on DEB-based distributions. On RPM-based distributions, you need to start it manually or enable it using the appropriate `systemctl` or `service` command(s). - -You can test docker by running `docker info`, or (assuming you have internet access), `docker run --rm hello-world`. - -##### Installing docker-compose - -Please follow [this link](https://docs.docker.com/compose/install/) on docker.com for instructions on installing docker-compose. - -##### Operating system configuration - -The host system (ie., the one running Docker) will need to be configured for the [best possible OpenSearch performance](https://www.elastic.co/guide/en/elasticsearch/reference/master/system-config.html). Here are a few suggestions for Linux hosts (these may vary from distribution to distribution): - -* Append the following lines to `/etc/sysctl.conf`: - -``` -# the maximum number of open file handles -fs.file-max=2097152 - -# increase maximums for inotify watches -fs.inotify.max_user_watches=131072 -fs.inotify.max_queued_events=131072 -fs.inotify.max_user_instances=512 - -# the maximum number of memory map areas a process may have -vm.max_map_count=262144 - -# decrease "swappiness" (swapping out runtime memory vs. 
dropping pages) -vm.swappiness=1 - -# the maximum number of incoming connections -net.core.somaxconn=65535 - -# the % of system memory fillable with "dirty" pages before flushing -vm.dirty_background_ratio=40 - -# maximum % of dirty system memory before committing everything -vm.dirty_ratio=80 -``` - -* Depending on your distribution, create **either** the file `/etc/security/limits.d/limits.conf` containing: - -``` -# the maximum number of open file handles -* soft nofile 65535 -* hard nofile 65535 -# do not limit the size of memory that can be locked -* soft memlock unlimited -* hard memlock unlimited -``` - -**OR** the file `/etc/systemd/system.conf.d/limits.conf` containing: - -``` -[Manager] -# the maximum number of open file handles -DefaultLimitNOFILE=65535:65535 -# do not limit the size of memory that can be locked -DefaultLimitMEMLOCK=infinity -``` - -* Change the readahead value for the disk where the OpenSearch data will be stored. There are a few ways to do this. For example, you could add this line to `/etc/rc.local` (replacing `/dev/sda` with your disk block descriptor): - -``` -# change disk read-adhead value (# of blocks) -blockdev --setra 512 /dev/sda -``` - -* Change the I/O scheduler to `deadline` or `noop`. Again, this can be done in a variety of ways. The simplest is to add `elevator=deadline` to the arguments in `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, then running `sudo update-grub2` - -* If you are planning on using very large data sets, consider formatting the drive containing the `opensearch` volume as XFS. - -After making all of these changes, do a reboot for good measure! - -#### macOS host system configuration - -##### Automatic installation using `install.py` - -The `install.py` script will attempt to guide you through the installation of Docker and Docker Compose if they are not present. If that works for you, you can skip ahead to **Configure docker daemon option** in this section. - -##### Install Homebrew - -The easiest way to install and maintain docker on Mac is using the [Homebrew cask](https://brew.sh). Execute the following in a terminal. - -``` -$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" -$ brew install cask -$ brew tap homebrew/cask-versions -``` - -##### Install docker-edge - -``` -$ brew cask install docker-edge -``` -This will install the latest version of docker and docker-compose. It can be upgraded later using `brew` as well: -``` -$ brew cask upgrade --no-quarantine docker-edge -``` -You can now run docker from the Applications folder. - -##### Configure docker daemon option - -Some changes should be made for performance ([this link](http://markshust.com/2018/01/30/performance-tuning-docker-mac) gives a good succinct overview). - -* **Resource allocation** - For a good experience, you likely need at least a quad-core MacBook Pro with 16GB RAM and an SSD. I have run Malcolm on an older 2013 MacBook Pro with 8GB of RAM, but the more the better. Go in your system tray and select **Docker** → **Preferences** → **Advanced**. Set the resources available to docker to at least 4 CPUs and 8GB of RAM (>= 16GB is preferable). - -* **Volume mount performance** - You can speed up performance of volume mounts by removing unused paths from **Docker** → **Preferences** → **File Sharing**. For example, if you're only going to be mounting volumes under your home directory, you could share `/Users` but remove other paths. 
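-
-Regarding the **Resource allocation** point above, one way to double-check from a terminal the CPU and memory actually granted to the Docker virtual machine is to query `docker info` with a Go template (a quick sketch; the `NCPU` and `MemTotal` field names are assumptions that may vary between Docker versions):
-
-```
-$ docker info --format '{{.NCPU}} CPUs, {{.MemTotal}} bytes of RAM'
-```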
-
-After making these changes, right-click on the Docker 🐋 icon in the system tray and select **Restart**.
-
-#### Windows host system configuration
-
-##### Installing and configuring Docker Desktop for Windows
-
-Installing and configuring [Docker to run under Windows](https://docs.docker.com/desktop/windows/wsl/) must be done manually, rather than through the `install.py` script as is done for Linux and macOS.
-
-1. Be running Windows 10, version 1903 or higher
-1. Prepare your system and [install WSL](https://docs.microsoft.com/en-us/windows/wsl/install) and a Linux distribution by running `wsl --install -d Debian` in PowerShell as Administrator (these instructions are tested with Debian, but may work with other distributions)
-1. Install Docker Desktop for Windows either by downloading the installer from the [official Docker site](https://hub.docker.com/editions/community/docker-ce-desktop-windows) or installing it through [chocolatey](https://chocolatey.org/packages/docker-desktop).
-1. Follow the [Docker Desktop WSL 2 backend](https://docs.docker.com/desktop/windows/wsl/) instructions to finish configuration and review best practices
-1. Reboot
-1. Open the WSL distribution's terminal and run `docker info` to make sure Docker is running
-
-##### Finish Malcolm's configuration
-
-Once Docker is installed, configured, and running as described in the previous section, run [`./scripts/install.py --configure`](#ConfigAndTuning) to finish configuration of the local Malcolm installation. Malcolm will be controlled and run from within your WSL distribution's terminal environment.
-
-## Running Malcolm
-
-### OpenSearch instances
-
-Malcolm's default standalone configuration is to use a local [OpenSearch](https://opensearch.org/) instance in a Docker container to index and search network traffic metadata. OpenSearch can also run as a [cluster](https://opensearch.org/docs/latest/opensearch/cluster/) with instances distributed across multiple nodes with dedicated [roles](https://opensearch.org/docs/latest/opensearch/cluster/#nodes) like cluster manager, data node, ingest node, etc.
-
-As the permutations of OpenSearch cluster configurations are numerous, it is beyond Malcolm's scope to set up multi-node clusters. However, Malcolm can be configured to use a remote OpenSearch cluster rather than its own internal instance.
-
-The `OPENSEARCH_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) control whether Malcolm uses its own local OpenSearch instance or a remote OpenSearch instance as its primary data store. The configuration portion of Malcolm's install script ([`./scripts/install.py --configure`](#ConfigAndTuning)) can help you configure these options.
-
-For example, to use the default standalone configuration, answer `Y` when prompted `Should Malcolm use and maintain its own OpenSearch instance?`.
-
-Or, to use a remote OpenSearch cluster:
-
-```
-…
-Should Malcolm use and maintain its own OpenSearch instance? (Y/n): n
-
-Enter primary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.123:9200
-
-Require SSL certificate validation for communication with primary OpenSearch instance? (y/N): n
-
-You must run auth_setup after install.py to store OpenSearch connection credentials.
-…
-```
-
-Whether the primary OpenSearch instance is a locally maintained single-node instance or a remote cluster, Malcolm can additionally be configured to forward logs to a secondary remote OpenSearch instance. The `OPENSEARCH_SECONDARY_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) control this behavior. Configuration of a remote secondary OpenSearch instance is similar to that of a remote primary OpenSearch instance:
-
-```
-…
-Forward Logstash logs to a secondary remote OpenSearch instance? (y/N): y
-
-Enter secondary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.124:9200
-
-Require SSL certificate validation for communication with secondary OpenSearch instance? (y/N): n
-
-You must run auth_setup after install.py to store OpenSearch connection credentials.
-…
-```
-
-#### Authentication and authorization for remote OpenSearch clusters
-
-In addition to setting the environment variables in [`docker-compose.yml`](#DockerComposeYml) as described above, you must provide Malcolm with credentials for it to be able to communicate with remote OpenSearch instances. These credentials are stored in the Malcolm installation directory as `.opensearch.primary.curlrc` and `.opensearch.secondary.curlrc` for the primary and secondary OpenSearch connections, respectively, and are bind-mounted into the Docker containers which need to communicate with OpenSearch. These [cURL-formatted](https://everything.curl.dev/cmdline/configfile) config files can be generated for you by the [`auth_setup`](#AuthSetup) script as illustrated:
-
-```
-$ ./scripts/auth_setup
-
-…
-
-Store username/password for primary remote OpenSearch instance? (y/N): y
-
-OpenSearch username: servicedb
-servicedb password:
-servicedb password (again):
-
-Require SSL certificate validation for OpenSearch communication? (Y/n): n
-
-Store username/password for secondary remote OpenSearch instance? (y/N): y
-
-OpenSearch username: remotedb
-remotedb password:
-remotedb password (again):
-
-Require SSL certificate validation for OpenSearch communication? (Y/n): n
-
-…
-```
-
-These files are created with permissions such that only the user account running Malcolm can access them:
-
-```
-$ ls -la .opensearch.*.curlrc
--rw------- 1 user user 36 Aug 22 14:17 .opensearch.primary.curlrc
--rw------- 1 user user 35 Aug 22 14:18 .opensearch.secondary.curlrc
-```
-
-One caveat with Malcolm using a remote OpenSearch cluster as its primary document store is that the accounts used to access Malcolm's [web interfaces](#UserInterfaceURLs), particularly [OpenSearch Dashboards](#Dashboards), are in some instances passed directly through to OpenSearch itself. For this reason, both Malcolm and the remote primary OpenSearch instance must have the same account information. The easiest way to accomplish this is to use an Active Directory/LDAP server that both [Malcolm](#AuthLDAP) and [OpenSearch](https://opensearch.org/docs/latest/security-plugin/configuration/ldap/) use as a common authentication backend.
-
-See the OpenSearch documentation on [access control](https://opensearch.org/docs/latest/security-plugin/access-control/index/) for more information.
-
-### Configure authentication
-
-Malcolm requires authentication to access the [user interface](#UserInterfaceURLs). [Nginx](https://nginx.org/) can authenticate users with either local TLS-encrypted HTTP basic authentication or using a remote Lightweight Directory Access Protocol (LDAP) authentication server.
-
-With the local basic authentication method, user accounts are managed by Malcolm and can be created, modified, and deleted using a [user management web interface](#AuthBasicAccountManagement). This method is suitable in instances where accounts and credentials do not need to be synced across many Malcolm installations.
-
-With LDAP authentication, user accounts are managed on a remote directory service, such as [Microsoft Active Directory Domain Services](https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/get-started/virtual-dc/active-directory-domain-services-overview) or [OpenLDAP](https://www.openldap.org/).
-
-Malcolm's authentication method is defined in the `x-auth-variables` section near the top of the [`docker-compose.yml`](#DockerComposeYml) file with the `NGINX_BASIC_AUTH` environment variable: `true` for local TLS-encrypted HTTP basic authentication, `false` for LDAP authentication.
-
-In either case, you **must** run `./scripts/auth_setup` before starting Malcolm for the first time in order to:
-
-* define the local Malcolm administrator account username and password (although these credentials will only be used for basic authentication, not LDAP authentication)
-* specify whether or not to (re)generate the self-signed certificates used for HTTPS access
-    * key and certificate files are located in the `nginx/certs/` directory
-* specify whether or not to (re)generate the self-signed certificates used by a remote log forwarder (see the `BEATS_SSL` environment variable above)
-    * certificate authority, certificate, and key files for Malcolm's Logstash instance are located in the `logstash/certs/` directory
-    * certificate authority, certificate, and key files to be copied to and used by the remote log forwarder are located in the `filebeat/certs/` directory; if using [Hedgehog Linux](#Hedgehog), these certificates should be copied to the `/opt/sensor/sensor_ctl/logstash-client-certificates` directory on the sensor
-* specify whether or not to [store the username/password](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#authenticate-sender-account) for [email alert senders](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#create-destinations)
-    * these parameters are stored securely in the OpenSearch keystore file `opensearch/opensearch.keystore`
-
-#### Local account management
-
-[`auth_setup`](#AuthSetup) is used to define the username and password for the administrator account. Once Malcolm is running, the administrator account can be used to manage other user accounts via a **Malcolm User Management** page served over HTTPS on port 488 (e.g., [https://localhost:488](https://localhost:488) if you are connecting locally).
-
-Malcolm user accounts can be used to access the [interfaces](#UserInterfaceURLs) of all of its [components](#Components), including Arkime. Arkime uses its own internal database of user accounts, so when a Malcolm user account logs in to Arkime for the first time Malcolm creates a corresponding Arkime user account automatically. This being the case, it is *not* recommended to use the Arkime **Users** settings page or change the password via the **Password** form under the Arkime **Settings** page, as those settings would not be consistently used across Malcolm.
-
-Users may change their passwords via the **Malcolm User Management** page by clicking **User Self Service**. A forgotten password can also be reset via an emailed link, though this requires SMTP server settings to be specified in `htadmin/config.ini` in the Malcolm installation directory.
-
-#### Lightweight Directory Access Protocol (LDAP) authentication
-
-The [nginx-auth-ldap](https://github.com/kvspb/nginx-auth-ldap) module serves as the interface between Malcolm's [Nginx](https://nginx.org/) web server and a remote LDAP server. When you run [`auth_setup`](#AuthSetup) for the first time, a sample LDAP configuration file is created at `nginx/nginx_ldap.conf`.
-
-```
-# This is a sample configuration for the ldap_server section of nginx.conf.
-# Yours will vary depending on how your Active Directory/LDAP server is configured.
-# See https://github.com/kvspb/nginx-auth-ldap#available-config-parameters for options.
-
-ldap_server ad_server {
-  url "ldap://ds.example.com:3268/DC=ds,DC=example,DC=com?sAMAccountName?sub?(objectClass=person)";
-
-  binddn "bind_dn";
-  binddn_passwd "bind_dn_password";
-
-  group_attribute member;
-  group_attribute_is_dn on;
-  require group "CN=Malcolm,CN=Users,DC=ds,DC=example,DC=com";
-  require valid_user;
-  satisfy all;
-}
-
-auth_ldap_cache_enabled on;
-auth_ldap_cache_expiration_time 10000;
-auth_ldap_cache_size 1000;
-```
-
-This file is mounted into the `nginx` container when Malcolm is started to provide connection information for the LDAP server.
-
-The contents of `nginx_ldap.conf` will vary depending on how the LDAP server is configured. Some of the [available parameters](https://github.com/kvspb/nginx-auth-ldap#available-config-parameters) in that file include:
-
-* **`url`** - the `ldap://` or `ldaps://` connection URL for the remote LDAP server, which has the [following syntax](https://www.ietf.org/rfc/rfc2255.txt): `ldap[s]://<host>:<port>/<base_dn>?<attributes>?<scope>?<filter>`
-* **`binddn`** and **`binddn_password`** - the account credentials used to query the LDAP directory
-* **`group_attribute`** - the group attribute name which contains the member object (e.g., `member` or `memberUid`)
-* **`group_attribute_is_dn`** - whether or not to search for the user's full distinguished name as the value in the group's member attribute
-* **`require`** and **`satisfy`** - `require user`, `require group` and `require valid_user` can be used in conjunction with `satisfy any` or `satisfy all` to limit the users that are allowed to access the Malcolm instance
-
-Before starting Malcolm, edit `nginx/nginx_ldap.conf` according to the specifics of your LDAP server and directory tree structure. Using an LDAP search tool such as [`ldapsearch`](https://www.openldap.org/software/man.cgi?query=ldapsearch) in Linux or [`dsquery`](https://social.technet.microsoft.com/wiki/contents/articles/2195.active-directory-dsquery-commands.aspx) in Windows may be of help as you formulate the configuration (an example `ldapsearch` invocation is shown at the end of this section). Your changes should be made within the curly braces of the `ldap_server ad_server { … }` section. You can troubleshoot configuration file syntax errors and LDAP connection or credentials issues by running `./scripts/logs` (or `docker-compose logs nginx`) and examining the output of the `nginx` container.
-
-The **Malcolm User Management** page described above is not available when using LDAP authentication.
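-
-As an illustration, an `ldapsearch` query like the following (a sketch only; the hostname, bind DN, base DN, and account name are placeholders borrowed from the sample configuration above) can confirm that the bind credentials and search filter actually return the expected user entries before committing them to `nginx_ldap.conf`:
-
-```
-$ ldapsearch -x \
-    -H ldap://ds.example.com:3268 \
-    -D "bind_dn" -W \
-    -b "DC=ds,DC=example,DC=com" \
-    "(&(objectClass=person)(sAMAccountName=someuser))" \
-    sAMAccountName memberOf
-```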
-
-##### LDAP connection security
-
-Authentication over LDAP can be done in one of three ways, [two of which](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-adts/8e73932f-70cf-46d6-88b1-8d9f86235e81) offer data confidentiality protection:
-
-* **StartTLS** - the [standard extension](https://tools.ietf.org/html/rfc2830) to the LDAP protocol to establish an encrypted SSL/TLS connection within an already established LDAP connection
-* **LDAPS** - a commonly used (though unofficial and considered deprecated) method in which SSL negotiation takes place before any commands are sent from the client to the server
-* **Unencrypted** (cleartext) (***not recommended***)
-
-In addition to the `NGINX_BASIC_AUTH` environment variable being set to `false` in the `x-auth-variables` section near the top of the [`docker-compose.yml`](#DockerComposeYml) file, the `NGINX_LDAP_TLS_STUNNEL` environment variable is used in conjunction with the values in `nginx/nginx_ldap.conf` to define the LDAP connection security level. Use the following combinations of values to achieve the connection security methods above, respectively:
-
-* **StartTLS**
-    - `NGINX_LDAP_TLS_STUNNEL` set to `true` in [`docker-compose.yml`](#DockerComposeYml)
-    - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf`
-* **LDAPS**
-    - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](#DockerComposeYml)
-    - `url` should begin with `ldaps://` and its port should be either the default LDAPS port (636) or the default LDAPS Global Catalog port (3269) in `nginx/nginx_ldap.conf`
-* **Unencrypted** (cleartext) (***not recommended***)
-    - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](#DockerComposeYml)
-    - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf`
-
-For encrypted connections (whether using **StartTLS** or **LDAPS**), Malcolm will require and verify certificates when one or more trusted CA certificate files are placed in the `nginx/ca-trust/` directory. Otherwise, any certificate presented by the domain server will be accepted.
-
-### TLS certificates
-
-When you [set up authentication](#AuthSetup) for Malcolm a set of unique [self-signed](https://en.wikipedia.org/wiki/Self-signed_certificate) TLS certificates are created which are used to secure the connection between clients (e.g., your web browser) and Malcolm's browser-based interface. This is adequate for most Malcolm instances as they are often run locally or on internal networks, although your browser will most likely require you to add a security exception for the certificate the first time you connect to Malcolm.
-
-Another option is to generate your own certificates (or have them issued to you) and have them placed in the `nginx/certs/` directory. The certificate and key files should be named `cert.pem` and `key.pem`, respectively.
-
-A third possibility is to use a third-party reverse proxy (e.g., [Traefik](https://doc.traefik.io/traefik/) or [Caddy](https://caddyserver.com/docs/quick-starts/reverse-proxy)) to handle the issuance of the certificates for you and to broker the connections between clients and Malcolm. Reverse proxies such as these often implement the [ACME](https://datatracker.ietf.org/doc/html/rfc8555) protocol for domain name authentication and can be used to request certificates from certificate authorities like [Let's Encrypt](https://letsencrypt.org/how-it-works/). In this configuration, the reverse proxy will be encrypting the connections instead of Malcolm, so you'll need to set the `NGINX_SSL` environment variable to `false` in [`docker-compose.yml`](#DockerComposeYml) (or answer `no` to the "Require encrypted HTTPS connections?" question posed by `install.py`). If you are setting `NGINX_SSL` to `false`, **make sure** you understand what you are doing and ensure that external connections cannot reach ports over which Malcolm will be communicating without encryption, including verifying your local firewall configuration.
-
-### Starting Malcolm
-
-[Docker Compose](https://docs.docker.com/compose/) is used to coordinate running the Docker containers. To start Malcolm, navigate to the directory containing `docker-compose.yml` and run:
-```
-$ ./scripts/start
-```
-This will create the containers' virtual network and instantiate them, then leave them running in the background. The Malcolm containers may take several minutes to start up completely. To follow the debug output for an already-running Malcolm instance, run:
-```
-$ ./scripts/logs
-```
-You can also use `docker stats` to monitor the resource utilization of running containers.
-
-### Stopping and restarting Malcolm
-
-You can run `./scripts/stop` to stop the Docker containers and remove their virtual network. Alternatively, `./scripts/restart` will restart an instance of Malcolm. Because the data on disk is stored on the host in Docker volumes, doing these operations will not result in loss of data.
-
-Malcolm can be configured to be automatically restarted when the Docker system daemon restarts (for example, on system reboot). This behavior depends on the [value](https://docs.docker.com/config/containers/start-containers-automatically/) of the [`restart:`](https://docs.docker.com/compose/compose-file/#restart) setting for each service in the `docker-compose.yml` file. This value can be set by running [`./scripts/install.py --configure`](#ConfigAndTuning) and answering "yes" to "`Restart Malcolm upon system or Docker daemon restart?`."
-
-### Clearing Malcolm's data
-
-Run `./scripts/wipe` to stop the Malcolm instance and wipe its OpenSearch database (**including** [index snapshots and management policies](#IndexManagement) and [alerting configuration](#Alerting)).
-
-### Temporary read-only interface
-
-To temporarily set the Malcolm user interfaces into a read-only configuration, run the following commands from the Malcolm installation directory.
-
-First, to configure [Nginx](https://nginx.org/) to disable access to the upload and other interfaces for changing Malcolm settings, and to deny HTTP methods other than `GET` and `POST`:
-
-```
-docker-compose exec nginx-proxy bash -c "cp /etc/nginx/nginx_readonly.conf /etc/nginx/nginx.conf && nginx -s reload"
-```
-
-Second, to set the existing OpenSearch data store to read-only:
-
-```
-docker-compose exec dashboards-helper /data/opensearch_read_only.py -i _cluster
-```
-
-These commands must be re-run every time you restart Malcolm.
-
-Note that after you run these commands you may see an increase in error messages in the Malcolm containers' output, as various background processes will fail due to the read-only nature of the indices. Additionally, some features such as Arkime's [Hunt](#ArkimeHunt) and [building your own visualizations and dashboards](#BuildDashboard) in OpenSearch Dashboards will not function correctly in read-only mode.
-
-## Capture file and log archive upload
-
-Malcolm serves a web browser-based upload form for uploading PCAP files and Zeek logs at [https://localhost/upload/](https://localhost/upload/) if you are connecting locally.
-
-![Capture File and Log Archive Upload](./docs/images/screenshots/malcolm_upload.png)
-
-Additionally, there is a writable `files` directory on an SFTP server served on port 8022 (e.g., `sftp://USERNAME@localhost:8022/files/` if you are connecting locally).
-
-The types of files supported are:
-
-* PCAP files (of mime type `application/vnd.tcpdump.pcap` or `application/x-pcapng`)
-    - PCAPNG files are *partially* supported: Zeek is able to process PCAPNG files, but not all of Arkime's packet examination features work correctly
-* Zeek logs in archive files (`application/gzip`, `application/x-gzip`, `application/x-7z-compressed`, `application/x-bzip2`, `application/x-cpio`, `application/x-lzip`, `application/x-lzma`, `application/x-rar-compressed`, `application/x-tar`, `application/x-xz`, or `application/zip`)
-    - the location of the Zeek logs within the archive file's internal directory structure does not matter
-
-Files uploaded via these methods are monitored and moved automatically to other directories for processing to begin, generally within one minute of completion of the upload.
-
-### Tagging
-
-In addition to being processed for upload, Malcolm events will be tagged according to the components of the filenames of the PCAP files or Zeek log archive files from which the events were parsed. For example, records created from a PCAP file named `ACME_Scada_VLAN10.pcap` would be tagged with `ACME`, `Scada`, and `VLAN10`. Tags are extracted from filenames by splitting on the characters "," (comma), "-" (dash), and "_" (underscore). These tags are viewable and searchable (via the `tags` field) in Arkime and OpenSearch Dashboards. This behavior can be changed by modifying the `AUTO_TAG` [environment variable in `docker-compose.yml`](#DockerComposeYml).
-
-Tags may also be specified manually with the [browser-based upload form](#Upload).
-
-### Processing uploaded PCAPs with Zeek and Suricata
-
-The **Analyze with Zeek** and **Analyze with Suricata** checkboxes may be used when uploading PCAP files to cause them to be analyzed by Zeek and Suricata, respectively. This is functionally equivalent to the `ZEEK_AUTO_ANALYZE_PCAP_FILES` and `SURICATA_AUTO_ANALYZE_PCAP_FILES` environment variables [described above](#DockerComposeYml), only on a per-upload basis. Zeek can also automatically carve out files from file transfers; see [Automatic file extraction and scanning](#ZeekFileExtraction) for more details.
-
-## Live analysis
-
-### Using a network sensor appliance
-
-A dedicated network sensor appliance is the recommended method for capturing and analyzing live network traffic when performance and throughput are of utmost importance. [Hedgehog Linux](./sensor-iso/README.md) is a custom Debian-based operating system built to:
-
-* monitor network interfaces
-* capture packets to PCAP files
-* detect file transfers in network traffic and extract and scan those files for threats
-* generate and forward Zeek and Suricata logs, Arkime sessions, and other information to [Malcolm](https://github.com/idaholab/Malcolm)
-
-Please see the [Hedgehog Linux README](./sensor-iso/README.md) for more information.
-
-### Monitoring local network interfaces
-
-Malcolm's `pcap-capture`, `suricata-live` and `zeek-live` containers can monitor one or more local network interfaces, specified by the `PCAP_IFACE` environment variable in [`docker-compose.yml`](#DockerComposeYml). These containers are started with additional privileges (`IPC_LOCK`, `NET_ADMIN`, `NET_RAW`, and `SYS_ADMIN`) to allow opening network interfaces in promiscuous mode for capture.
-
-The instances of Zeek and Suricata (in the `suricata-live` and `zeek-live` containers when the `SURICATA_LIVE_CAPTURE` and `ZEEK_LIVE_CAPTURE` environment variables in [`docker-compose.yml`](#DockerComposeYml) are set to `true`, respectively) analyze traffic on-the-fly and generate log files containing network session metadata. These log files are in turn scanned by Filebeat and forwarded to Logstash for enrichment and indexing into the OpenSearch document store.
-
-In contrast, the `pcap-capture` container buffers traffic to PCAP files and periodically rotates these files for processing (by Arkime's `capture` utility in the `arkime` container) according to the thresholds defined by the `PCAP_ROTATE_MEGABYTES` and `PCAP_ROTATE_MINUTES` environment variables in [`docker-compose.yml`](#DockerComposeYml). If for some reason (e.g., a low-resource environment) you also want Zeek and Suricata to process these intermediate PCAP files rather than monitoring the network interfaces directly, you can set `SURICATA_ROTATED_PCAP`/`ZEEK_ROTATED_PCAP` to `true` and `SURICATA_LIVE_CAPTURE`/`ZEEK_LIVE_CAPTURE` to `false`.
-
-These various options for monitoring traffic on local network interfaces can also be configured by running [`./scripts/install.py --configure`](#ConfigAndTuning).
-
-Note that currently Microsoft Windows and Apple macOS platforms run Docker inside of a virtualized environment. Live traffic capture and analysis on those platforms would require additional configuration of virtual interfaces and port forwarding in Docker, which is outside the scope of this document.
-
-### Manually forwarding logs from an external source
-
-Malcolm's Logstash instance can also be configured to accept logs from a [remote forwarder](https://www.elastic.co/products/beats/filebeat) by running [`./scripts/install.py --configure`](#ConfigAndTuning) and answering "yes" to "`Expose Logstash port to external hosts?`." Enabling encrypted transport of these log files is discussed in [Configure authentication](#AuthSetup) and the description of the `BEATS_SSL` environment variable in the [`docker-compose.yml`](#DockerComposeYml) file.
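-
-As a concrete sketch (the sensor hostname and user are placeholders, and the certificate file names are assumed to match whatever `auth_setup` generated in `filebeat/certs/`), the client certificates could be copied to a remote forwarder such as a Hedgehog Linux sensor with `scp`:
-
-```
-$ scp filebeat/certs/*.crt filebeat/certs/*.key \
-      user@hedgehog.example.org:/opt/sensor/sensor_ctl/logstash-client-certificates/
-```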
-
-Configuring Filebeat to forward Zeek logs to Malcolm might look something like this example [`filebeat.yml`](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-reference-yml.html):
-```
-filebeat.inputs:
-- type: log
-  paths:
-    - /var/zeek/*.log
-  fields_under_root: true
-  compression_level: 0
-  exclude_lines: ['^\s*#']
-  scan_frequency: 10s
-  clean_inactive: 180m
-  ignore_older: 120m
-  close_inactive: 90m
-  close_renamed: true
-  close_removed: true
-  close_eof: false
-  clean_renamed: true
-  clean_removed: true
-
-output.logstash:
-  hosts: ["192.0.2.123:5044"]
-  ssl.enabled: true
-  ssl.certificate_authorities: ["/foo/bar/ca.crt"]
-  ssl.certificate: "/foo/bar/client.crt"
-  ssl.key: "/foo/bar/client.key"
-  ssl.supported_protocols: "TLSv1.2"
-  ssl.verification_mode: "none"
-```
-
-## Arkime
-
-The Arkime interface will be accessible over HTTPS on port 443 at the Docker host's IP address (e.g., [https://localhost](https://localhost) if you are connecting locally).
-
-### Zeek log integration
-
-A stock installation of Arkime extracts all of its network connection ("session") metadata ("SPI" or "Session Profile Information") from full packet capture artifacts (PCAP files). Zeek (formerly Bro) generates similar session metadata, linking network events to sessions via a connection UID. Malcolm aims to facilitate analysis of Zeek logs by mapping values from Zeek logs to the Arkime session database schema for equivalent fields, and by creating new "native" Arkime database fields for all the other Zeek log values for which there is not currently an equivalent in Arkime:
-
-![Zeek log session record](./docs/images/screenshots/arkime_session_zeek.png)
-
-In this way, when full packet capture is an option, analysis of PCAP files can be enhanced by the additional information Zeek provides. When full packet capture is not an option, similar analysis can still be performed using the same interfaces and processes using the Zeek logs alone.
-
-A few values of particular mention include **Data Source** (`event.provider` in OpenSearch), which can be used to distinguish from among the sources of the network traffic metadata record (e.g., `zeek` for Zeek logs and `arkime` for Arkime sessions); and **Log Type** (`event.dataset` in OpenSearch), which corresponds to the kind of Zeek `.log` file from which the record was created. In other words, a search could be restricted to records from `conn.log` by searching `event.provider == zeek && event.dataset == conn`, or restricted to records from `weird.log` by searching `event.provider == zeek && event.dataset == weird`.
-
-Click the icon of the owl **🦉** in the upper-left-hand corner to access the Arkime usage documentation (accessible at [https://localhost/help](https://localhost/help) if you are connecting locally), click the **Fields** label in the navigation pane, then search for `zeek` to see a list of the other Zeek log types and fields available to Malcolm.
-
-![Zeek fields](./docs/images/screenshots/arkime_help_fields.png)
-
-The values of records created from Zeek logs can be expanded and viewed like any native Arkime session by clicking the plus **➕** icon to the left of the record in the Sessions view. However, note that when dealing with these Zeek records the full packet contents are not available, so buttons dealing with viewing and exporting PCAP information will not behave as they would for records from PCAP files. Other than that, Zeek records and their values are usable in Malcolm just like native PCAP session records.
-
-#### Correlating Zeek logs and Arkime sessions
-
-The Arkime interface displays both Zeek logs and Arkime sessions alongside each other. Using fields common to both data sources, one can [craft queries](#SearchCheatSheet) to filter results matching desired criteria.
-
-A few fields of particular mention that help limit returned results to those Zeek logs and Arkime session records generated from the same network connection are [Community ID](https://github.com/corelight/community-id-spec) (`network.community_id`) and Zeek's [connection UID](https://docs.zeek.org/en/stable/examples/logs/#using-uids) (`zeek.uid`), which Malcolm maps to both Arkime's `rootId` field and the [ECS](https://www.elastic.co/guide/en/ecs/current/ecs-event.html#field-event-id) `event.id` field.
-
-Community ID is a specification for standard flow hashing [published by Corelight](https://github.com/corelight/community-id-spec) with the intent of making it easier to pivot from one dataset (e.g., Arkime sessions) to another (e.g., Zeek `conn.log` entries). In Malcolm both Arkime and [Zeek](https://github.com/corelight/zeek-community-id) populate this value, which makes it possible to filter for a specific network connection and see both data sources' results for that connection.
-
-The `rootId` field is used by Arkime to link session records together when a particular session has too many packets to be represented by a single session. When normalizing Zeek logs to Arkime's schema, Malcolm piggybacks on `rootId` to store Zeek's [connection UID](https://docs.zeek.org/en/stable/examples/logs/#using-uids) to cross-reference entries across Zeek log types. The connection UID is also stored in `zeek.uid`.
-
-Filtering on community ID OR'ed with Zeek UID (e.g., `network.community_id == "1:r7tGG//fXP1P0+BXH3zXETCtEFI=" || rootId == "CQcoro2z6adgtGlk42"`) is an effective way to see both the Arkime sessions and Zeek logs generated by a particular network connection.
-
-![Correlating Arkime sessions and Zeek logs](./docs/images/screenshots/arkime_correlate_communityid_uid.png)
-
-### Help
-
-Click the icon of the owl 🦉 in the upper-left-hand corner to access the Arkime usage documentation (accessible at [https://localhost/help](https://localhost/help) if you are connecting locally), which includes such topics as [search syntax](https://localhost/help#search), the [Sessions view](https://localhost/help#sessions), [SPIView](https://localhost/help#spiview), [SPIGraph](https://localhost/help#spigraph), and the [Connections](https://localhost/help#connections) graph.
-
-### Sessions
-
-The **Sessions** view provides low-level details of the sessions being investigated, whether they be Arkime sessions created from PCAP files or [Zeek logs mapped](#ArkimeZeek) to the Arkime session database schema.
-
-![Arkime's Sessions view](./docs/images/screenshots/arkime_sessions.png)
-
-The **Sessions** view contains many controls for filtering the sessions displayed from all sessions down to sessions of interest:
-
-* [search bar](https://localhost/help#search): Indicated by the magnifying glass **🔍** icon, the search bar allows defining filters on session/log metadata
-* [time bounding](https://localhost/help#timebounding) controls: The **🕘**, **Start**, **End**, **Bounding**, and **Interval** fields, and the **date histogram** can be used to visually zoom and pan the time range being examined.
-* search button: The **Search** button re-runs the sessions query with the filters currently specified.
-* views button: Indicated by the eyeball **👁** icon, views allow overlaying additional previously-specified filters onto the current sessions filters. For convenience, Malcolm provides several preconfigured Arkime views, including filtering on the `event.dataset` field.
-
-![Malcolm views](./docs/images/screenshots/arkime_apply_view.png)
-
-* map: A global map can be expanded by clicking the globe **🌎** icon. This allows filtering sessions by IP-based geolocation when possible.
-
-Some of these filter controls are also available on other Arkime pages (such as SPIView, SPIGraph, Connections, and Hunt).
-
-The number of sessions displayed per page, as well as the page currently displayed, can be specified using the paging controls underneath the time bounding controls.
-
-The sessions table is displayed below the filter controls. This table contains the sessions/logs matching the specified filters.
-
-To the left of the column headers are two buttons. The **Toggle visible columns** button, indicated by a grid **⊞** icon, allows toggling which columns are displayed in the sessions table. The **Save or load custom column configuration** button, indicated by a columns **◫** icon, allows saving the currently displayed columns or loading previously-saved configurations. This is useful for customizing which columns are displayed when investigating different types of traffic. Column headers can also be clicked to sort the results in the table, and column widths may be adjusted by dragging the separators between column headers.
-
-Details for individual sessions/logs can be expanded by clicking the plus **➕** icon on the left of each row. Each row may contain multiple sections and controls, depending on whether the row represents an Arkime session or a [Zeek log](#ArkimeZeek). Clicking the field names and values in the details sections allows additional filters to be specified or summary lists of unique values to be exported.
-
-When viewing Arkime session details (i.e., a session generated from a PCAP file), an additional packets section will be visible underneath the metadata sections. When the details of a session of this type are expanded, Arkime will read the packet(s) comprising the session for display here. Various controls can be used to adjust how the packet is displayed (enabling **natural** decoding and enabling **Show Images & Files** may produce visually pleasing results), and other options (including PCAP download, carving images and files, applying decoding filters, and examining payloads in [CyberChef](https://github.com/gchq/CyberChef)) are available.
-
-See also Arkime's usage documentation for more information on the [Sessions view](https://localhost/help#sessions).
-
-#### PCAP Export
-
-Clicking the down arrow **▼** icon to the far right of the search bar presents a list of actions including **PCAP Export** (see Arkime's [sessions help](https://localhost/help#sessions) for information on the other actions). When full PCAP sessions are displayed, the **PCAP Export** feature allows you to create a new PCAP file from the matching Arkime sessions, including controls for which sessions are included (open items, visible items, or all matching items) and whether or not to include linked segments. Click the **Export PCAP** button to generate the PCAP, after which you'll be presented with a browser download dialog to save or open the file. Note that depending on the scope of the filters specified this might take a long time (or possibly even time out).
-
-![Export PCAP](./docs/images/screenshots/arkime_export_pcap.png)
-
-See the [issues](#Issues) section of this document for an error that can occur using this feature when Zeek log sessions are displayed.
-
-### SPIView
-
-Arkime's **SPI** (**S**ession **P**rofile **I**nformation) **View** provides a quick and easy-to-use interface for exploring session/log metrics. The SPIView page lists categories for general session metrics (e.g., protocol, source and destination IP addresses, source and destination ports, etc.) as well as for all of the various types of network traffic understood by Malcolm. These categories can be expanded and the top *n* values displayed, along with each value's cardinality, for the fields of interest they contain.
-
-![Arkime's SPIView](./docs/images/screenshots/arkime_spiview.png)
-
-Click the plus **➕** icon to the right of a category to expand it. The values for specific fields are displayed by clicking the field description in the field list underneath the category name. The list of field names can be filtered by typing part of the field name in the *Search for fields to display in this category* text input. The **Load All** and **Unload All** buttons can be used to toggle display of all of the fields belonging to that category. Once displayed, a field's name or one of its values may be clicked to provide further actions for filtering or displaying that field or its values. Of particular interest may be the **Open [fieldname] SPI Graph** option when clicking on a field's name. This will open a new tab with the SPI Graph ([see below](#ArkimeSPIGraph)) populated with the field's top values.
-
-Note that because the SPIView page can potentially run many queries, SPIView limits the search domain to seven days (in other words, seven indices, as each index represents one day's worth of data). When using SPIView, you will have best results if you limit your search time frame to less than or equal to seven days. This limit can be adjusted by editing the `spiDataMaxIndices` setting in [config.ini](./etc/arkime/config.ini) and rebuilding the `malcolmnetsec/arkime` Docker container.
-
-See also Arkime's usage documentation for more information on [SPIView](https://localhost/help#spiview).
-
-### SPIGraph
-
-Arkime's **SPI** (**S**ession **P**rofile **I**nformation) **Graph** visualizes the occurrence of some field's top *n* values over time, and (optionally) geographically. This is particularly useful for identifying trends in a particular type of communication over time: traffic using a particular protocol when seen sparsely at regular intervals on that protocol's date histogram in the SPIGraph may indicate a connection check, polling, or beaconing (for example, see the `llmnr` protocol in the screenshot below).
-
-![Arkime's SPIGraph](./docs/images/screenshots/arkime_spigraph.png)
-
-Controls can be found underneath the time bounding controls for selecting the field of interest, the number of elements to be displayed, the sort order, and a periodic refresh of the data.
-
-See also Arkime's usage documentation for more information on [SPIGraph](https://localhost/help#spigraph).
-
-### Connections
-
-The **Connections** page presents network communications via a force-directed graph, making it easy to visualize logical relationships between network hosts.
-
-![Arkime's Connections graph](./docs/images/screenshots/arkime_connections.png)
-
-Controls are available for specifying the query size (where smaller values will execute more quickly but may only contain an incomplete representation of the top *n* sessions, and larger values may take longer to execute but will be more complete), which fields to use as the source and destination for node values, a minimum connections threshold, and the method for determining the "weight" of the link between two nodes. As is the case with most other visualizations in Arkime, the graph is interactive: clicking on a node or the link between two nodes can be used to modify query filters, and the nodes themselves may be repositioned by dragging and dropping them. A node's color indicates whether it communicated as a source/originator, a destination/responder, or both.
-
-While the default source and destination fields are *Src IP* and *Dst IP:Dst Port*, the Connections view is able to use any combination of fields. For example:
-
-* *Src OUI* and *Dst OUI* (hardware manufacturers)
-* *Src IP* and *Protocols*
-* *Originating Network Segment* and *Responding Network Segment* (see [CIDR subnet to network segment name mapping](#SegmentNaming))
-* *Originating GeoIP City* and *Responding GeoIP City*
-
-or any other combination of these or other fields.
-
-See also Arkime's usage documentation for more information on the [Connections graph](https://localhost/help#connections).
-
-### Hunt
-
-Arkime's **Hunt** feature allows an analyst to search within the packets themselves (including payload data) rather than simply searching the session metadata. The search string may be specified using ASCII (with or without case sensitivity), hex codes, or regular expressions. Once a hunt job is complete, matching sessions can be viewed in the [Sessions](#ArkimeSessions) view.
-
-Clicking the **Create a packet search job** button on the Hunt page will allow you to specify the following parameters for a new hunt job:
-
-* a packet search job **name**
-* a **maximum number of packets** to examine per session
-* the **search string** and its format (*ascii*, *ascii (case sensitive)*, *hex*, *regex*, or *hex regex*)
-* whether to search **source packets**, **destination packets**, or both
-* whether to search **raw** or **reassembled** packets
-
-Click the **➕ Create** button to begin the search. Arkime will scan the source PCAP files from which the sessions were created according to the search criteria. Note that whatever filters were specified when the hunt job is executed will apply to the hunt job as well; the number of sessions matching the current filters will be displayed above the hunt job parameters with text like "ⓘ Creating a new packet search job will search the packets of # sessions."
-
-![Hunt creation](./docs/images/screenshots/arkime_hunt_creation.png)
-
-Once a hunt job is submitted, it will be assigned a unique hunt ID (a long unique string of characters like `yuBHAGsBdljYmwGkbEMm`) and its progress will be updated periodically in the **Hunt Job Queue** with the execution percent complete, the number of matches found so far, and the other parameters with which the job was submitted. More details for the hunt job can be viewed by expanding its row with the plus **➕** icon on the left.
-
-![Hunt completed](./docs/images/screenshots/arkime_hunt_finished.png)
-
-Once the hunt job is complete (and a minute or so has passed, as the `huntId` must be added to the matching session records in the database), click the folder **📂** icon on the right side of the hunt job row to open a new [Sessions](#ArkimeSessions) tab with the search bar prepopulated to filter to sessions with packets matching the search criteria.
-
-![Hunt result sessions](./docs/images/screenshots/arkime_hunt_sessions.png)
-
-From this list of filtered sessions you can expand session details and explore packet payloads which matched the hunt search criteria.
-
-The hunt feature is available only for sessions created from full packet capture data, not Zeek logs. This being the case, it is a good idea to click the eyeball **👁** icon and select the **Arkime Sessions** view to exclude Zeek logs from candidate sessions prior to using the hunt feature.
-
-See also Arkime's usage documentation for more information on the [hunt feature](https://localhost/help#hunt).
-
-### Statistics
-
-Arkime provides several other reports which show information about the state of Arkime and the underlying OpenSearch database.
-
-The **Files** list displays a list of PCAP files processed by Arkime, the date and time of the earliest packet in each file, and the file size:
-
-![Arkime's Files list](./docs/images/screenshots/arkime_files.png)
-
-The **ES Indices** list (available under the **Stats** page) lists the OpenSearch indices within which log data is contained:
-
-![Arkime's ES indices list](./docs/images/screenshots/arkime_es_stats.png)
-
-The **History** view provides a historical list of queries issued to Arkime and the details of those queries:
-
-![Arkime's History view](./docs/images/screenshots/arkime_history.png)
-
-See also Arkime's usage documentation for more information on the [Files list](https://localhost/help#files), [statistics](https://localhost/help#files), and [history](https://localhost/help#history).
-
-### Settings
-
-#### General settings
-
-The **Settings** page can be used to adjust Arkime preferences, define additional custom views and column configurations, tweak the color theme, and more.
-
-See Arkime's usage documentation for more information on [settings](https://localhost/help#settings).
-
-![Arkime general settings](./docs/images/screenshots/arkime_general_settings.png)
-
-![Arkime custom view management](./docs/images/screenshots/arkime_view_settings.png)
-
-## OpenSearch Dashboards
-
-While Arkime provides very nice visualizations, especially for network traffic, [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) (an open source general-purpose data visualization tool for OpenSearch) can be used to create custom visualizations (tables, charts, graphs, dashboards, etc.) using the same data.
-
-The OpenSearch Dashboards container can be accessed at [https://localhost/dashboards/](https://localhost/dashboards/) if you are connecting locally. Several preconfigured dashboards for Zeek logs are included in Malcolm's OpenSearch Dashboards configuration.
-
-OpenSearch Dashboards has several components for data searching and visualization:
-
-### Discover
-
-The **Discover** view enables you to view events on a record-by-record basis (similar to a *session* record in Arkime or an individual line from a Zeek log).
See the official [Kibana User Guide](https://www.elastic.co/guide/en/kibana/7.10/index.html) (OpenSearch Dashboards is an open-source fork of Kibana, which is no longer open-source software) for information on using the Discover view: - -* [Discover](https://www.elastic.co/guide/en/kibana/7.10/discover.html) -* [Searching Your Data](https://www.elastic.co/guide/en/kibana/7.10/search.html) - -#### Screenshots - -![Discover view](./docs/images/screenshots/dashboards_discover.png) - -![Viewing the details of a session in Discover](./docs/images/screenshots/dashboards_discover_table.png) - -![Filtering by tags to display only sessions with public IP addresses](./docs/images/screenshots/dashboards_add_filter.png) - -![Changing the fields displayed in Discover](./docs/images/screenshots/dashboards_fields_list.png) - -![Opening a previously-saved search](./docs/images/screenshots/dashboards_open_search.png) - -### Visualizations and dashboards - -#### Prebuilt visualizations and dashboards - -Malcolm comes with dozens of prebuilt visualizations and dashboards for the network traffic represented by each of the Zeek log types. Click **Dashboard** to see a list of these dashboards. As is the case with all OpenSearch Dashboards visualizations, all of the charts, graphs, maps, and tables are interactive and can be clicked on to narrow or expand the scope of the data you are investigating. Similarly, click **Visualize** to explore the prebuilt visualizations used to build the dashboards. - -Many of Malcolm's prebuilt visualizations for Zeek logs were originally inspired by the excellent [Kibana Dashboards](https://github.com/Security-Onion-Solutions/securityonion-elastic/tree/master/kibana/dashboards) that are part of [Security Onion](https://securityonion.net/). 

##### Screenshots

![The Security Overview highlights security-related network events](./docs/images/screenshots/dashboards_security_overview.png)

![The ICS/IoT Security Overview dashboard displays information about ICS and IoT network traffic](./docs/images/screenshots/dashboards_ics_iot_security_overview.png)

![The Connections dashboard displays information about the "top talkers" across all types of sessions](./docs/images/screenshots/dashboards_connections.png)

![The HTTP dashboard displays important details about HTTP traffic](./docs/images/screenshots/dashboards_http.png)

![There are several Connections visualizations using locations from GeoIP lookups](./docs/images/screenshots/dashboards_latlon_map.png)

![OpenSearch Dashboards includes both coordinate and region map types](./docs/images/screenshots/dashboards_region_map.png)

![The Suricata Alerts dashboard highlights traffic which matched Suricata signatures](./docs/images/screenshots/dashboards_suricata_alerts.png)

![The Zeek Notices dashboard highlights things which Zeek determines are potentially bad](./docs/images/screenshots/dashboards_notices.png)

![The Zeek Signatures dashboard displays signature hits, such as antivirus hits on files extracted from network traffic](./docs/images/screenshots/dashboards_signatures.png)

![The Software dashboard displays the type, name, and version of software seen communicating on the network](./docs/images/screenshots/dashboards_software.png)

![The PE (portable executables) dashboard displays information about executable files transferred over the network](./docs/images/screenshots/dashboards_portable_executables.png)

![The SMTP dashboard highlights details about SMTP traffic](./docs/images/screenshots/dashboards_smtp.png)

![The SSL dashboard displays information about SSL versions, certificates, and TLS JA3 fingerprints](./docs/images/screenshots/dashboards_ssl.png)

![The files dashboard displays metrics about the files transferred over the network](./docs/images/screenshots/dashboards_files_source.png)

![This dashboard provides insight into DNP3 (Distributed Network Protocol), a protocol used commonly in electric and water utilities](./docs/images/screenshots/dashboards_dnp3.png)

![Modbus is a standard protocol found in many industrial control systems (ICS)](./docs/images/screenshots/dashboards_modbus.png)

![BACnet is a communications protocol for Building Automation and Control (BAC) networks](./docs/images/screenshots/dashboards_bacnet.png)

![EtherCAT is an Ethernet-based fieldbus system](./docs/images/screenshots/dashboards_ecat.png)

![EtherNet/IP is an industrial network protocol that adapts the Common Industrial Protocol to standard Ethernet](./docs/images/screenshots/dashboards_ethernetip.png)

![PROFINET is an industry technical standard for data communication over Industrial Ethernet](./docs/images/screenshots/dashboards_profinet.png)

![S7comm is a Siemens proprietary protocol that runs between programmable logic controllers (PLCs) of the Siemens family](./docs/images/screenshots/dashboards_s7comm.png)

#### Building your own visualizations and dashboards

See the official [Kibana User Guide](https://www.elastic.co/guide/en/kibana/7.10/index.html) and [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) (OpenSearch Dashboards is an open-source fork of Kibana, which is no longer open-source software) documentation for information on creating your own visualizations and dashboards:
* [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/)
* [Kibana Dashboards](https://www.elastic.co/guide/en/kibana/7.10/dashboard.html)
* [Timelion](https://www.elastic.co/guide/en/kibana/7.12/timelion.html)

##### Screenshots

![OpenSearch Dashboards boasts many types of visualizations for displaying your data](./docs/images/screenshots/dashboards_new_visualization.png)

## Search Queries in Arkime and OpenSearch Dashboards

OpenSearch Dashboards supports two query syntaxes: the legacy [Lucene](https://www.elastic.co/guide/en/kibana/current/lucene-query.html) syntax and [Dashboards Query Language (DQL)](https://opensearch.org/docs/1.2/dashboards/dql/), both of which are somewhat different from Arkime's query syntax (see the help at [https://localhost/help#search](https://localhost/help#search) if you are connecting locally). The Arkime interface is for searching and visualizing both Arkime sessions and Zeek logs. The prebuilt dashboards in the OpenSearch Dashboards interface are for searching and visualizing Zeek logs, but will not include Arkime sessions. Here are some common patterns used in building search query strings for Arkime and OpenSearch Dashboards, respectively. See the links provided for further documentation.

| | [Arkime Search String](https://localhost/help#search) | [OpenSearch Dashboards Search String (Lucene)](https://www.elastic.co/guide/en/kibana/current/lucene-query.html) | [OpenSearch Dashboards Search String (DQL)](https://www.elastic.co/guide/en/kibana/current/kuery-query.html)|
|---|:---:|:---:|:---:|
| Field exists |`event.dataset == EXISTS!`|`_exists_:event.dataset`|`event.dataset:*`|
| Field does not exist |`event.dataset != EXISTS!`|`NOT _exists_:event.dataset`|`NOT event.dataset:*`|
| Field matches a value |`port.dst == 22`|`destination.port:22`|`destination.port:22`|
| Field does not match a value |`port.dst != 22`|`NOT destination.port:22`|`NOT destination.port:22`|
| Field matches at least one of a list of values |`tags == [foo, bar]`|`tags:(foo OR bar)`|`tags:(foo or bar)`|
| Field range (inclusive) |`http.statuscode >= 200 && http.statuscode <= 300`|`http.statuscode:[200 TO 300]`|`http.statuscode >= 200 and http.statuscode <= 300`|
| Field range (exclusive) |`http.statuscode > 200 && http.statuscode < 300`|`http.statuscode:{200 TO 300}`|`http.statuscode > 200 and http.statuscode < 300`|
| Field range (mixed exclusivity) |`http.statuscode >= 200 && http.statuscode < 300`|`http.statuscode:[200 TO 300}`|`http.statuscode >= 200 and http.statuscode < 300`|
| Match all search terms (AND) |`(tags == [foo, bar]) && (http.statuscode == 401)`|`tags:(foo OR bar) AND http.statuscode:401`|`tags:(foo or bar) and http.statuscode:401`|
| Match any search terms (OR) |`(zeek.ftp.password == EXISTS!) || (zeek.http.password == EXISTS!) || (related.user == "anonymous")`|`_exists_:zeek.ftp.password OR _exists_:zeek.http.password OR related.user:"anonymous"`|`zeek.ftp.password:* or zeek.http.password:* or related.user:"anonymous"`|
| Global string search (anywhere in the document) |all Arkime search expressions are field-based|`microsoft`|`microsoft`|
| Wildcards|`host.dns == "*micro?oft*"` (`?` for single character, `*` for any characters)|`dns.host:*micro?oft*` (`?` for single character, `*` for any characters)|`dns.host:*micro*ft*` (`*` for any characters)|
| Regex |`host.http == /.*www\.f.*k\.com.*/`|`zeek.http.host:/.*www\.f.*k\.com.*/`|DQL does not support regex|
| IPv4 values |`ip == 0.0.0.0/0`|`source.ip:"0.0.0.0/0" OR destination.ip:"0.0.0.0/0"`|`source.ip:"0.0.0.0/0" or destination.ip:"0.0.0.0/0"`|
| IPv6 values |`(ip.src == EXISTS! || ip.dst == EXISTS!) && (ip != 0.0.0.0/0)`|`(_exists_:source.ip AND NOT source.ip:"0.0.0.0/0") OR (_exists_:destination.ip AND NOT destination.ip:"0.0.0.0/0")`|`(source.ip:* and not source.ip:"0.0.0.0/0") or (destination.ip:* and not destination.ip:"0.0.0.0/0")`|
| GeoIP information available |`country == EXISTS!`|`_exists_:destination.geo OR _exists_:source.geo`|`destination.geo:* or source.geo:*`|
| Zeek log type |`event.dataset == notice`|`event.dataset:notice`|`event.dataset:notice`|
| IP CIDR Subnets |`ip.src == 172.16.0.0/12`|`source.ip:"172.16.0.0/12"`|`source.ip:"172.16.0.0/12"`|
| Search time frame |Use Arkime time bounding controls under the search bar|Use OpenSearch Dashboards time range controls in the upper right-hand corner|Use OpenSearch Dashboards time range controls in the upper right-hand corner|

When building complex queries, it is **strongly recommended** that you enclose search terms and expressions in parentheses to control order of operations.

As Zeek logs are ingested, Malcolm parses and normalizes the logs' fields to match Arkime's underlying OpenSearch schema. A complete list of these fields can be found in the Arkime help (accessible at [https://localhost/help#fields](https://localhost/help#fields) if you are connecting locally).

Whenever possible, Zeek fields are mapped to existing corresponding Arkime fields: for example, the `orig_h` field in Zeek is mapped to Arkime's `source.ip` field. The original Zeek fields are also left intact. To complicate the issue, the Arkime interface uses its own aliases to reference those fields: the source IP field is referenced as `ip.src` (Arkime's alias) in Arkime and `source.ip` in OpenSearch Dashboards.

The table below shows the mapping of some of these fields.

| Field Description |Arkime Field Alias(es)|Arkime-mapped Zeek Field(s)|Zeek Field(s)|
|---|:---:|:---:|:---:|
| [Community ID](https://github.com/corelight/community-id-spec) Flow Hash ||`network.community_id`|`network.community_id`|
| Destination IP |`ip.dst`|`destination.ip`|`destination.ip`|
| Destination MAC |`mac.dst`|`destination.mac`|`destination.mac`|
| Destination Port |`port.dst`|`destination.port`|`destination.port`|
| Duration |`session.length`|`length`|`zeek.conn.duration`|
| First Packet Time |`starttime`|`firstPacket`|`zeek.ts`, `@timestamp`|
| IP Protocol |`ip.protocol`|`ipProtocol`|`network.transport`|
| Last Packet Time |`stoptime`|`lastPacket`||
| MIME Type |`email.bodymagic`, `http.bodymagic`|`http.bodyMagic`|`file.mime_type`, `zeek.files.mime_type`, `zeek.ftp.mime_type`, `zeek.http.orig_mime_types`, `zeek.http.resp_mime_types`, `zeek.irc.dcc_mime_type`|
| Protocol/Service |`protocols`|`protocol`|`network.transport`, `network.protocol`|
| Request Bytes |`databytes.src`, `bytes.src`|`source.bytes`, `client.bytes`|`zeek.conn.orig_bytes`, `zeek.conn.orig_ip_bytes`|
| Request Packets |`packets.src`|`source.packets`|`zeek.conn.orig_pkts`|
| Response Bytes |`databytes.dst`, `bytes.dst`|`destination.bytes`, `server.bytes`|`zeek.conn.resp_bytes`, `zeek.conn.resp_ip_bytes`|
| Response Packets |`packets.dst`|`destination.packets`|`zeek.conn.resp_pkts`|
| Source IP |`ip.src`|`source.ip`|`source.ip`|
| Source MAC |`mac.src`|`source.mac`|`source.mac`|
| Source Port |`port.src`|`source.port`|`source.port`|
| Total Bytes |`databytes`, `bytes`|`totDataBytes`, `network.bytes`||
| Total Packets |`packets`|`network.packets`||
| Username |`user`|`user`|`related.user`|
| Zeek Connection UID|||`zeek.uid`, `event.id`|
| Zeek File UID |||`zeek.fuid`, `event.id`|
| Zeek Log Type |||`event.dataset`|

In addition to the fields listed above, Arkime provides several special field aliases for matching any field of a particular type. While these aliases do not exist in OpenSearch Dashboards *per se*, they can be approximated as illustrated below.

| Matches Any | Arkime Special Field Example | OpenSearch Dashboards/Zeek Equivalent Example |
|---|:---:|:---:|
| IP Address | `ip == 192.168.0.1` | `source.ip:192.168.0.1 OR destination.ip:192.168.0.1` |
| Port | `port == [80, 443, 8080, 8443]` | `source.port:(80 OR 443 OR 8080 OR 8443) OR destination.port:(80 OR 443 OR 8080 OR 8443)` |
| Country (code) | `country == [RU,CN]` | `destination.geo.country_code2:(RU OR CN) OR source.geo.country_code2:(RU OR CN) OR dns.GEO:(RU OR CN)` |
| Country (name) | | `destination.geo.country_name:(Russia OR China) OR source.geo.country_name:(Russia OR China)` |
| ASN | `asn == "*Mozilla*"` | `source.as.full:*Mozilla* OR destination.as.full:*Mozilla* OR dns.ASN:*Mozilla*` |
| Host | `host == www.microsoft.com` | `zeek.http.host:www.microsoft.com (or zeek.dhcp.host_name, zeek.dns.host, zeek.ntlm.host, smb.host, etc.)` |
| Protocol (layers >= 4) | `protocols == tls` | `protocol:tls` |
| User | `user == EXISTS! && user != anonymous` | `_exists_:user AND (NOT user:anonymous)` |

For details on how to filter both Zeek logs and Arkime session records for a particular connection, see [Correlating Zeek logs and Arkime sessions](#ZeekArkimeFlowCorrelation).

## Other Malcolm features

### Automatic file extraction and scanning

Malcolm can leverage Zeek's knowledge of network protocols to automatically detect file transfers and extract those files from PCAPs as Zeek processes them.
This behavior can be enabled globally by modifying the `ZEEK_EXTRACTOR_MODE` [environment variable in `docker-compose.yml`](#DockerComposeYml), or on a per-upload basis for PCAP files uploaded via the [browser-based upload form](#Upload) when **Analyze with Zeek** is selected.

To specify which files should be extracted, the following values are acceptable in `ZEEK_EXTRACTOR_MODE`:

* `none`: no file extraction
* `interesting`: extraction of files with mime types of common attack vectors
* `mapped`: extraction of files with recognized mime types
* `known`: extraction of files for which any mime type can be determined
* `all`: extract all files

Extracted files can be examined through any of the following methods:

* submitting file hashes to [**VirusTotal**](https://www.virustotal.com/en/#search); to enable this method, specify the `VTOT_API2_KEY` [environment variable in `docker-compose.yml`](#DockerComposeYml)
* scanning files with [**ClamAV**](https://www.clamav.net/); to enable this method, set the `EXTRACTED_FILE_ENABLE_CLAMAV` [environment variable in `docker-compose.yml`](#DockerComposeYml) to `true`
* scanning files with [**Yara**](https://github.com/VirusTotal/yara); to enable this method, set the `EXTRACTED_FILE_ENABLE_YARA` [environment variable in `docker-compose.yml`](#DockerComposeYml) to `true`
* scanning PE (portable executable) files with [**Capa**](https://github.com/fireeye/capa); to enable this method, set the `EXTRACTED_FILE_ENABLE_CAPA` [environment variable in `docker-compose.yml`](#DockerComposeYml) to `true`

Files which are flagged via any of these methods will be logged as Zeek `signatures.log` entries, and can be viewed in the **Signatures** dashboard in OpenSearch Dashboards.

The `EXTRACTED_FILE_PRESERVATION` [environment variable in `docker-compose.yml`](#DockerComposeYml) determines the behavior for preservation of Zeek-extracted files:

* `quarantined`: preserve only flagged files in `./zeek-logs/extract_files/quarantine`
* `all`: preserve flagged files in `./zeek-logs/extract_files/quarantine` and all other extracted files in `./zeek-logs/extract_files/preserved`
* `none`: preserve no extracted files

The `EXTRACTED_FILE_HTTP_SERVER_…` [environment variables in `docker-compose.yml`](#DockerComposeYml) configure access to the Zeek-extracted files path by means of a simple HTTPS directory server. Beware that Zeek-extracted files may contain malware. As such, the files may be optionally encrypted upon download.

### Automatic host and subnet name assignment

#### IP/MAC address to hostname mapping via `host-map.txt`

The `host-map.txt` file in the Malcolm installation directory can be used to define names for network hosts based on IP and/or MAC addresses in Zeek logs. The default empty configuration looks like this:
```
# IP or MAC address to host name map:
# address|host name|required tag
#
# where:
# address: comma-separated list of IPv4, IPv6, or MAC addresses
# e.g., 172.16.10.41, 02:42:45:dc:a2:96, 2001:0db8:85a3:0000:0000:8a2e:0370:7334
#
# host name: host name to be assigned when event address(es) match
#
# required tag (optional): only check match and apply host name if the event
# contains this tag
#
```
Each non-comment line (i.e., a line not beginning with a `#`) defines an address-to-name mapping for a network host.
For example:
```
127.0.0.1,127.0.1.1,::1|localhost|
192.168.10.10|office-laptop.intranet.lan|
06:46:0b:a6:16:bf|serial-host.intranet.lan|testbed
```
Each line consists of three `|`-separated fields: address(es), hostname, and, optionally, a tag which, if specified, must belong to a log for the matching to occur.

As Zeek logs are processed into Malcolm's OpenSearch instance, the log's source and destination IP and MAC address fields (`source.ip`, `destination.ip`, `source.mac`, and `destination.mac`, respectively) are compared against the lists of addresses in `host-map.txt`. When a match is found, a new field is added to the log: `source.hostname` or `destination.hostname`, depending on whether the matching address belongs to the originating or responding host. If the third field (the "required tag" field) is specified, a log must also contain that value in its `tags` field in addition to matching the IP or MAC address specified in order for the corresponding `_hostname` field to be added.

`source.hostname` and `destination.hostname` may each contain multiple values. For example, if both a host's source IP address and source MAC address were matched by two different lines, `source.hostname` would contain the hostname values from both matching lines.

#### CIDR subnet to network segment name mapping via `cidr-map.txt`

The `cidr-map.txt` file in the Malcolm installation directory can be used to define names for network segments based on IP addresses in Zeek logs. The default empty configuration looks like this:
```
# CIDR to network segment format:
# IP(s)|segment name|required tag
#
# where:
# IP(s): comma-separated list of CIDR-formatted network IP addresses
# e.g., 10.0.0.0/8, 169.254.0.0/16, 172.16.10.41
#
# segment name: segment name to be assigned when event IP address(es) match
#
# required tag (optional): only check match and apply segment name if the event
# contains this tag
#
```
Each non-comment line (i.e., a line not beginning with a `#`) defines a subnet-to-name mapping for a network segment. For example:
```
192.168.50.0/24,192.168.40.0/24,10.0.0.0/8|corporate|
192.168.100.0/24|control|
192.168.200.0/24|dmz|
172.16.0.0/12|virtualized|testbed
```
Each line consists of three `|`-separated fields: CIDR-formatted subnet IP range(s), subnet name, and, optionally, a tag which, if specified, must belong to a log for the matching to occur.

As Zeek logs are processed into Malcolm's OpenSearch instance, the log's source and destination IP address fields (`source.ip` and `destination.ip`, respectively) are compared against the lists of addresses in `cidr-map.txt`. When a match is found, a new field is added to the log: `source.segment` or `destination.segment`, depending on whether the matching address belongs to the originating or responding host. If the third field (the "required tag" field) is specified, a log must also contain that value in its `tags` field in addition to its IP address falling within the subnet specified in order for the corresponding `_segment` field to be added.

`source.segment` and `destination.segment` may each contain multiple values. For example, if `cidr-map.txt` specifies multiple overlapping subnets on different lines, `source.segment` would contain the segment names from both matching lines if `source.ip` belonged to both subnets.
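
The matching semantics can be illustrated with a minimal Python sketch (an illustration only, built from the example `cidr-map.txt` entries above; Malcolm's actual enrichment is performed by its Logstash pipeline during ingestion):

```python
import ipaddress

# Parsed form of the example cidr-map.txt lines above:
# (subnets, segment name, required tag or None)
SEGMENT_MAP = [
    (['192.168.50.0/24', '192.168.40.0/24', '10.0.0.0/8'], 'corporate', None),
    (['192.168.100.0/24'], 'control', None),
    (['192.168.200.0/24'], 'dmz', None),
    (['172.16.0.0/12'], 'virtualized', 'testbed'),
]

def segment_names(ip, tags):
    """Return every segment name whose subnet(s) contain ip and whose
    required tag, if any, appears in the log's tags field."""
    addr = ipaddress.ip_address(ip)
    return [
        name
        for subnets, name, required_tag in SEGMENT_MAP
        if (required_tag is None or required_tag in tags)
        and any(addr in ipaddress.ip_network(subnet) for subnet in subnets)
    ]

print(segment_names('10.0.0.123', []))             # ['corporate']
print(segment_names('172.16.10.41', ['testbed']))  # ['virtualized']
print(segment_names('172.16.10.41', []))           # [] (required tag missing)
```

Overlapping subnets naturally yield multiple segment names, matching the multi-valued `source.segment`/`destination.segment` behavior described above.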

If both `source.segment` and `destination.segment` are added to a log, and if they contain different values, the tag `cross_segment` will be added to the log's `tags` field for convenient identification of cross-segment traffic. This traffic could be easily visualized using Arkime's **Connections** graph, by setting the **Src:** value to **Originating Network Segment** and the **Dst:** value to **Responding Network Segment**:

![Cross-segment traffic in Connections](./docs/images/screenshots/arkime_connections_segments.png)

#### Defining hostname and CIDR subnet names interface

As an alternative to manually editing `cidr-map.txt` and `host-map.txt`, a **Host and Subnet Name Mapping** editor is available at [https://localhost/name-map-ui/](https://localhost/name-map-ui/) if you are connecting locally. Upon loading, the editor is populated from `cidr-map.txt`, `host-map.txt` and `net-map.json`.

This editor provides the following controls:

* 🔎 **Search mappings** - narrow the list of visible items using a search filter
* **Type**, **Address**, **Name** and **Tag** *(column headings)* - sort the list of items by clicking a column header
* 📝 *(per item)* - modify the selected item
* 🚫 *(per item)* - remove the selected item
* 🖳 **host** / 🖧 **segment**, **Address**, **Name**, **Tag (optional)** and 💾 - save the item with these values (either adding a new item or updating the item being modified)
* 📥 **Import** - clear the list and replace it with the contents of an uploaded `net-map.json` file
* 📤 **Export** - format and download the list as a `net-map.json` file
* 💾 **Save Mappings** - format and store `net-map.json` in the Malcolm directory (replacing the existing `net-map.json` file)
* 🔁 **Restart Logstash** - restart log ingestion, parsing and enrichment

![Host and Subnet Name Mapping Editor](./docs/images/screenshots/malcolm_name_map_ui.png)

#### Applying mapping changes

When changes are made to either `cidr-map.txt`, `host-map.txt` or `net-map.json`, Malcolm's Logstash container must be restarted. The easiest way to do this is to restart Malcolm via `restart` (see [Stopping and restarting Malcolm](#StopAndRestart)) or by clicking the 🔁 **Restart Logstash** button in the [name mapping interface](#NameMapUI).

Restarting Logstash may take several minutes, after which log ingestion will be resumed.

### OpenSearch index management

Malcolm releases prior to v6.2.0 used environment variables to configure OpenSearch [Index State Management](https://opensearch.org/docs/latest/im-plugin/ism/index/) [policies](https://opensearch.org/docs/latest/im-plugin/ism/policies/).

Since then, OpenSearch Dashboards plugins with UIs for [Index State Management](https://opensearch.org/docs/latest/im-plugin/ism/index/) and [Snapshot Management](https://opensearch.org/docs/latest/opensearch/snapshots/sm-dashboards/) have been developed and released. Because these plugins provide more comprehensive and user-friendly interfaces for these features, the old environment variable-based configuration code has been removed from Malcolm, with the exception of the code that uses `OPENSEARCH_INDEX_SIZE_PRUNE_LIMIT` and `OPENSEARCH_INDEX_SIZE_PRUNE_NAME_SORT`, which deals with deleting the oldest network session metadata indices when the database exceeds a certain size.

Note that OpenSearch index state management and snapshot management deal only with disk space consumed by OpenSearch indices: they do not have anything to do with PCAP file storage.
The `MANAGE_PCAP_FILES` environment variable in the [`docker-compose.yml`](#DockerComposeYml) file can be used to allow Arkime to prune old PCAP files based on available disk space.

### Event severity scoring

As Zeek logs are parsed and enriched prior to indexing, a severity score up to `100` (a higher score indicating a more severe event) can be assigned when one or more of the following conditions are met:

* cross-segment network traffic (if [network subnets were defined](#HostAndSubnetNaming))
* connection origination and destination (e.g., inbound, outbound, external, internal)
* traffic to or from sensitive countries
    - The comma-separated list of countries (by [ISO 3166-1 alpha-2 code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Current_codes)) can be customized by setting the `SENSITIVE_COUNTRY_CODES` environment variable in [`docker-compose.yml`](#DockerComposeYml).
* domain names (from DNS queries and SSL server names) with high entropy as calculated by [freq](https://github.com/MarkBaggett/freq)
    - The entropy threshold for this condition to trigger can be adjusted by setting the `FREQ_SEVERITY_THRESHOLD` environment variable in [`docker-compose.yml`](#DockerComposeYml). A lower value will only assign severity scores to fewer domain names with higher entropy (e.g., `2.0` for `NQZHTFHRMYMTVBQJE.COM`), while a higher value will assign severity scores to more domain names with lower entropy (e.g., `7.5` for `naturallanguagedomain.example.org`).
* file transfers (categorized by mime type)
* `notice.log`, [`intel.log`](#ZeekIntel) and `weird.log` entries, including those generated by Zeek plugins detecting vulnerabilities (see the list of Zeek plugins under [Components](#Components))
* detection of cleartext passwords
* use of insecure or outdated protocols
* tunneled traffic or use of VPN protocols
* rejected or aborted connections
* common network services communicating over non-standard ports
* file scanning engine hits on [extracted files](#ZeekFileExtraction)
* large connection or file transfer
    - The size (in megabytes) threshold for this condition to trigger can be adjusted by setting the `TOTAL_MEGABYTES_SEVERITY_THRESHOLD` environment variable in [`docker-compose.yml`](#DockerComposeYml).
* long connection duration
    - The duration (in seconds) threshold for this condition to trigger can be adjusted by setting the `CONNECTION_SECONDS_SEVERITY_THRESHOLD` environment variable in [`docker-compose.yml`](#DockerComposeYml).

As this [feature](https://github.com/idaholab/Malcolm/issues/19) is improved, it's expected that additional categories will be identified and implemented for severity scoring.

When a Zeek log satisfies more than one of these conditions, its severity scores will be summed, with a maximum score of `100`. A Zeek log's severity score is indexed in the `event.severity` field and the conditions which contributed to its score are indexed in `event.severity_tags`.

![The Severity dashboard](./docs/images/screenshots/dashboards_severity.png)

#### Customizing event severity scoring

These categories' severity scores can be customized by editing `logstash/maps/malcolm_severity.yaml`:

* Each category can be assigned a number between `1` and `100` for severity scoring.
* Any category may be disabled by assigning it a score of `0`.
* A severity score can be assigned for any [supported protocol](#Protocols) by adding an entry with the key formatted like `"PROTOCOL_XYZ"`, where `XYZ` is the uppercased value of the protocol as stored in the `network.protocol` field. For example, to assign a score of `40` to Zeek logs generated for SSH traffic, you could add the following line to `malcolm_severity.yaml`:

```
"PROTOCOL_SSH": 40
```

Restart Logstash after modifying `malcolm_severity.yaml` for the changes to take effect. The [hostname and CIDR subnet names interface](#NameMapUI) provides a convenient button for restarting Logstash.

Severity scoring can be disabled globally by setting the `LOGSTASH_SEVERITY_SCORING` environment variable to `false` in the [`docker-compose.yml`](#DockerComposeYml) file and [restarting Malcolm](#StopAndRestart).

### Zeek Intelligence Framework

To quote Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) documentation, "The goals of Zeek's Intelligence Framework are to consume intelligence data, make it available for matching, and provide infrastructure to improve performance and memory utilization. Data in the Intelligence Framework is an atomic piece of intelligence such as an IP address or an e-mail address. This atomic data will be packed with metadata such as a freeform source field, a freeform descriptive field, and a URL which might lead to more information about the specific item." Zeek [intelligence](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html) [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type) include IP addresses, URLs, file names, hashes, email addresses, and more.

Malcolm doesn't come bundled with intelligence files from any particular feed, but they can be easily included in your local instance. On [startup](shared/bin/zeek_intel_setup.sh), Malcolm's `malcolmnetsec/zeek` docker container enumerates the subdirectories under `./zeek/intel` (which is [bind mounted](https://docs.docker.com/storage/bind-mounts/) into the container's runtime) and configures Zeek so that those intelligence files will be automatically included in its local policy. Subdirectories under `./zeek/intel` which contain their own `__load__.zeek` file will be `@load`-ed as-is, while subdirectories containing "loose" intelligence files will be [loaded](https://docs.zeek.org/en/master/frameworks/intel.html#loading-intelligence) automatically with a `redef Intel::read_files` directive.

Note that Malcolm does not manage updates for these intelligence files. You should use the update mechanism suggested by your feeds' maintainers to keep them up to date, or use a [TAXII](#ZeekIntelSTIX) or [MISP](#ZeekIntelMISP) feed as described below.

Adding and deleting intelligence files under this directory will take effect upon [restarting Malcolm](#StopAndRestart). Alternatively, you can use the `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` environment variable containing a [cron expression](https://en.wikipedia.org/wiki/Cron#CRON_expression) to specify the interval at which the intel files should be refreshed.
It can also be done manually without restarting Malcolm by running the following command from the Malcolm installation directory: - -``` -docker-compose exec --user $(id -u) zeek /usr/local/bin/entrypoint.sh true -``` - -For a public example of Zeek intelligence files, see Critical Path Security's [repository](https://github.com/CriticalPathSecurity/Zeek-Intelligence-Feeds) which aggregates data from various other threat feeds into Zeek's format. - -#### STIX™ and TAXII™ - -In addition to loading Zeek intelligence files, on startup Malcolm will [automatically generate](shared/bin/zeek_intel_from_threat_feed.py) a Zeek intelligence file for all [Structured Threat Information Expression (STIX™)](https://oasis-open.github.io/cti-documentation/stix/intro.html) [v2.0](https://docs.oasis-open.org/cti/stix/v2.0/stix-v2.0-part1-stix-core.html)/[v2.1](https://docs.oasis-open.org/cti/stix/v2.1/stix-v2.1.html) JSON files found under `./zeek/intel/STIX`. - -Additionally, if a special text file named `.stix_input.txt` is found in `./zeek/intel/STIX`, that file will be read and processed as a list of [TAXII™](https://oasis-open.github.io/cti-documentation/taxii/intro.html) [2.0](http://docs.oasis-open.org/cti/taxii/v2.0/cs01/taxii-v2.0-cs01.html)/[2.1](https://docs.oasis-open.org/cti/taxii/v2.1/csprd02/taxii-v2.1-csprd02.html) feeds, one per line, according to the following format (the username and password are optional): - -``` -taxii|version|discovery_url|collection_name|username|password -``` - -For example: - -``` -taxii|2.0|http://example.org/taxii/|IP Blocklist|guest|guest -taxii|2.1|https://example.com/taxii/api2/|URL Blocklist -… -``` - -Malcolm will attempt to query the TAXII feed(s) for `indicator` STIX objects and convert them to the Zeek intelligence format as described above. There are publicly available TAXII 2.x-compatible services provided by a number of organizations including [Anomali Labs](https://www.anomali.com/resources/limo) and [MITRE](https://www.mitre.org/capabilities/cybersecurity/overview/cybersecurity-blog/attck%E2%84%A2-content-available-in-stix%E2%84%A2-20-via), or you may choose from several open-source offerings to roll your own TAXII 2 server (e.g., [oasis-open/cti-taxii-server](https://github.com/oasis-open/cti-taxii-server), [freetaxii/server](https://github.com/freetaxii/server), [StephenOTT/TAXII-Server](https://github.com/StephenOTT/TAXII-Server), etc.). - -Note that only **indicators** of [**cyber-observable objects**](https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html#_mlbmudhl16lr) matched with the **equals (`=`)** [comparison operator](https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html#_t11hn314cr7w) against a **single value** can be expressed as Zeek intelligence items. More complex STIX indicators will be silently ignored. - -#### MISP - -In addition to loading Zeek intelligence files, on startup Malcolm will [automatically generate](shared/bin/zeek_intel_from_threat_feed.py) a Zeek intelligence file for all [Malware Information Sharing Platform (MISP)](https://www.misp-project.org/datamodels/) JSON files found under `./zeek/intel/MISP`. 
- -Additionally, if a special text file named `.misp_input.txt` is found in `./zeek/intel/MISP`, that file will be read and processed as a list of [MISP feed](https://misp.gitbooks.io/misp-book/content/managing-feeds/#feeds) URLs, one per line, according to the following format (the authentication key is optional): - -``` -misp|manifest_url|auth_key -``` - -For example: - -``` -misp|https://example.com/data/feed-osint/manifest.json|df97338db644c64fbfd90f3e03ba8870 -… -``` - -Malcolm will attempt to connect to the MISP feed(s) and retrieve [`Attribute`](https://www.misp-standard.org/rfc/misp-standard-core.html#name-attribute) objects of MISP events and convert them to the Zeek intelligence format as described above. There are publicly available [MISP feeds](https://www.misp-project.org/feeds/) and [communities](https://www.misp-project.org/communities/), or you may [run your own MISP instance](https://www.misp-project.org/2019/09/25/hostev-vs-own-misp.html/). - -Note that only a subset of MISP [attribute types](https://www.misp-project.org/datamodels/#attribute-categories-vs-types) can be expressed with the Zeek intelligence [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type). MISP attributes with other types will be silently ignored. - -### Anomaly Detection - -Malcolm uses the Anomaly Detection plugins for [OpenSearch](https://github.com/opensearch-project/anomaly-detection) and [OpenSearch Dashboards](https://github.com/opensearch-project/anomaly-detection-dashboards-plugin) to identify anomalous log data in near real-time using the [Random Cut Forest](https://api.semanticscholar.org/CorpusID:927435) (RCF) algorithm. This can be paired with [Alerting](#Alerting) to automatically notify when anomalies are found. See [Anomaly detection](https://opensearch.org/docs/latest/monitoring-plugins/ad/index/) in the OpenSearch documentation for usage instructions on how to create detectors for any of the many fields Malcolm supports. - -A fresh installation of Malcolm configures [several detectors](dashboards/anomaly_detectors) for detecting anomalous network traffic: - -* **network_protocol** - Detects anomalies based on application protocol (`network.protocol`) -* **action_result_user** - Detects anomalies in action (`event.action`), result (`event.result`) and user (`related.user`) within application protocols (`network.protocol`) -* **file_mime_type** - Detects anomalies based on transferred file type (`file.mime_type`) -* **total_bytes** - Detects anomalies based on traffic size (sum of `network.bytes`) - -These detectors are disabled by default, but may be enabled for anomaly detection over streaming or [historical data](https://aws.amazon.com/about-aws/whats-new/2022/01/amazon-opensearch-service-elasticsearch-anomaly-detection/). - -### Alerting - -Malcolm uses the Alerting plugins for [OpenSearch](https://github.com/opensearch-project/alerting) and [OpenSearch Dashboards](https://github.com/opensearch-project/alerting-dashboards-plugin). See [Alerting](https://opensearch.org/docs/latest/monitoring-plugins/alerting/index/) in the OpenSearch documentation for usage instructions. 

A fresh installation of Malcolm configures an example [custom webhook destination](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#create-destinations) named **Malcolm API Loopback Webhook** that directs the triggered alerts back into the [Malcolm API](#API) to be reindexed as a session record with `event.dataset` set to `alerting`. The corresponding monitor **Malcolm API Loopback Monitor** is disabled by default, as you'll likely want to configure the trigger conditions to suit your needs. These examples are provided to illustrate how triggers and monitors can interact with a custom webhook to process alerts.

#### Email Sender Accounts

When using an email account to send alerts, you must [authenticate each sender account](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#authenticate-sender-account) before you can send an email. The [`auth_setup`](#AuthSetup) script can be used to securely store the email account credentials:

```
./scripts/auth_setup

Store administrator username/password for local Malcolm access? (Y/n): n

(Re)generate self-signed certificates for HTTPS access (Y/n): n

(Re)generate self-signed certificates for a remote log forwarder (Y/n): n

Store username/password for primary remote OpenSearch instance? (y/N): n

Store username/password for secondary remote OpenSearch instance? (y/N): n

Store username/password for email alert sender account? (y/N): y

Email account username: analyst@example.org
analyst@example.org password:
analyst@example.org password (again):
Email alert sender account variables stored: opensearch.alerting.destination.email.destination_alpha.password, opensearch.alerting.destination.email.destination_alpha.username

(Re)generate internal passwords for NetBox (Y/n): n
```

This action should only be performed while Malcolm is [stopped](#StopAndRestart): otherwise the credentials will not be stored correctly.

### "Best Guess" Fingerprinting for ICS Protocols

There are many ICS (industrial control systems) protocols. While Malcolm's collection of [protocol parsers](#Protocols) includes a number of them, many, particularly those that are proprietary or less common, are unlikely to be supported with a full parser in the foreseeable future.

In an effort to help identify more ICS traffic, Malcolm can use a "best guess" method based on transport protocol (e.g., TCP or UDP) and port(s) to categorize potential traffic communicating over some ICS protocols without full parser support. This feature involves a [mapping table](https://github.com/idaholab/Malcolm/blob/master/zeek/config/guess_ics_map.txt) and a [Zeek script](https://github.com/idaholab/Malcolm/blob/master/zeek/config/guess.zeek) to look up the transport protocol and destination and/or source port to make a best guess at whether a connection belongs to one of those protocols. These potential ICS communications are categorized by vendor where possible.

Naturally, these lookups could produce false positives, so these connections are displayed in their own dashboard (the **Best Guess** dashboard found under the **ICS** section of Malcolm's [OpenSearch Dashboards](#DashboardsVisualizations) navigation pane). Values such as IP addresses, ports, or UIDs can be used to [pivot to other dashboards](#ZeekArkimeFlowCorrelation) to investigate further.

![](./docs/images/screenshots/dashboards_bestguess.png)

This feature is disabled by default, but it can be enabled by clearing (setting to `''`) the value of the `ZEEK_DISABLE_BEST_GUESS_ICS` environment variable in [`docker-compose.yml`](#DockerComposeYml).

### Asset Management with NetBox

Malcolm provides an instance of [NetBox](https://netbox.dev/), an open-source "solution for modeling and documenting modern networks." The NetBox web interface is available at [https://localhost/netbox/](https://localhost/netbox/) if you are connecting locally.

The design of a potentially deeper integration between Malcolm and NetBox is a work in progress. The purpose of an asset management system is to document the intended state of a network: were Malcolm to actively and aggressively populate NetBox with the live network state, a network configuration fault could result in an incorrect documented configuration. The Malcolm development team is investigating what data, if any, should automatically flow to NetBox based on traffic observed (enabled via the `NETBOX_CRON` [environment variable in `docker-compose.yml`](#DockerComposeYml)), and what NetBox inventory data could be used, if any, to enrich Malcolm's network traffic metadata. Well-considered suggestions in this area [are welcome](mailto:malcolm@inl.gov?subject=NetBox).

Please see the [NetBox page on GitHub](https://github.com/netbox-community/netbox), its [documentation](https://docs.netbox.dev/en/stable/) and its [public demo](https://demo.netbox.dev/) for more information.

### CyberChef

Malcolm provides an instance of [CyberChef](https://github.com/gchq/CyberChef), the "Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis." CyberChef is available at [https://localhost/cyberchef.html](https://localhost/cyberchef.html) if you are connecting locally.

Arkime's [Sessions](#ArkimeSessions) view has built-in CyberChef integration for Arkime sessions with full PCAP payloads available: expanding a session and opening the **Packet Options** drop-down menu in its payload section will provide options for **Open src packets with CyberChef** and **Open dst packets with CyberChef**.

### API

Malcolm provides a [REST API](./api/project/__init__.py) that can be used to programmatically query some aspects of Malcolm's status and data. Malcolm's API is not to be confused with the [Viewer API](https://arkime.com/apiv3) provided by Arkime, although there may be some overlap in functionality.

#### Ping

`GET` - /mapi/ping

Returns `pong` (for a simple "up" check).

Example output:

```json
{"ping":"pong"}
```

#### Version Information

`GET` - /mapi/version

Returns version information about Malcolm and version/[health](https://opensearch.org/docs/latest/opensearch/rest-api/cluster-health/) information about the underlying OpenSearch instance.
-Example output: - -```json -{ - "built": "2022-01-18T16:10:39Z", - "opensearch": { - "cluster_name": "docker-cluster", - "cluster_uuid": "TcSiEaOgTdO_l1IivYz2gA", - "name": "opensearch", - "tagline": "The OpenSearch Project: https://opensearch.org/", - "version": { - "build_date": "2021-12-21T01:36:21.407473Z", - "build_hash": "8a529d77c7432bc45b005ac1c4ba3b2741b57d4a", - "build_snapshot": false, - "build_type": "tar", - "lucene_version": "8.10.1", - "minimum_index_compatibility_version": "6.0.0-beta1", - "minimum_wire_compatibility_version": "6.8.0", - "number": "7.10.2" - } - }, - "opensearch_health": { - "active_primary_shards": 29, - "active_shards": 29, - "active_shards_percent_as_number": 82.85714285714286, - "cluster_name": "docker-cluster", - "delayed_unassigned_shards": 0, - "discovered_master": true, - "initializing_shards": 0, - "number_of_data_nodes": 1, - "number_of_in_flight_fetch": 0, - "number_of_nodes": 1, - "number_of_pending_tasks": 0, - "relocating_shards": 0, - "status": "yellow", - "task_max_waiting_in_queue_millis": 0, - "timed_out": false, - "unassigned_shards": 6 - }, - "sha": "8ddbbf4", - "version": "5.2.0" -} -``` -
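
Any HTTP client can drive these endpoints. As a minimal sketch using Python's [requests](https://docs.python-requests.org/en/latest/) module (the credentials are placeholders, and `verify=False` merely accepts Malcolm's default self-signed certificate):

```python
import requests

# Fetch Malcolm and OpenSearch version/health details (see example above)
response = requests.get(
    'https://localhost/mapi/version',
    auth=('username', 'password'),  # placeholder credentials
    verify=False,                   # accept the self-signed certificate
)
response.raise_for_status()
info = response.json()
print(info['version'], info['opensearch_health']['status'])
```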
- -#### Fields - -`GET` - /mapi/fields - -Returns the (very long) list of fields known to Malcolm, comprised of data from Arkime's [`fields` table](https://arkime.com/apiv3#fields-api), the Malcolm [OpenSearch template](./dashboards/templates/malcolm_template.json) and the OpenSearch Dashboards index pattern API. - -
-Example output: - -```json -{ - "fields": { - "@timestamp": { - "type": "date" - }, -… - "zeek.x509.san_uri": { - "description": "Subject Alternative Name URI", - "type": "string" - }, - "zeek.x509.san_uri.text": { - "type": "string" - } - }, - "total": 2005 -} -``` -
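
Because the list is long, it usually makes sense to filter it programmatically; for example (a sketch, with the same placeholder credentials as above), listing only the `zeek.x509.` fields and their types:

```python
import requests

response = requests.get(
    'https://localhost/mapi/fields',
    auth=('username', 'password'),  # placeholder credentials
    verify=False,
)

# The response maps each field name to a metadata dictionary (see above)
for name, meta in sorted(response.json()['fields'].items()):
    if name.startswith('zeek.x509.'):
        print(f"{name}: {meta.get('type')}")
```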
- - -#### Indices - -`GET` - /mapi/indices - -Lists [information related to the underlying OpenSearch indices](https://opensearch.org/docs/latest/opensearch/rest-api/cat/cat-indices/), similar to Arkime's [esindices](https://arkime.com/apiv3#esindices-api) API. - -
-Example output: - -```json - -{ - "indices": [ -… - { - "docs.count": "2268613", - "docs.deleted": "0", - "health": "green", - "index": "arkime_sessions3-210301", - "pri": "1", - "pri.store.size": "1.8gb", - "rep": "0", - "status": "open", - "store.size": "1.8gb", - "uuid": "w-4Q0ofBTdWO9KqeIIAAWg" - }, -… - ] -} -``` -
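
As a sketch (same placeholder credentials as above), the per-index document counts and storage sizes could be summarized like this:

```python
import requests

response = requests.get(
    'https://localhost/mapi/indices',
    auth=('username', 'password'),  # placeholder credentials
    verify=False,
)

# Each entry mirrors a row of OpenSearch's cat/indices output (see above)
for index in response.json()['indices']:
    print(f"{index['index']}: {index['docs.count']} docs, {index['store.size']}")
```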

#### Field Aggregations

`GET` or `POST` - /mapi/agg/`<fieldname>`

Executes an OpenSearch [bucket aggregation](https://opensearch.org/docs/latest/opensearch/bucket-agg/) query for the requested fields across all of Malcolm's indexed network traffic metadata.

Parameters:

* `fieldname` (URL parameter) - the name(s) of the field(s) to be queried (comma-separated if multiple fields) (default: `event.provider`)
* `limit` (query parameter) - the maximum number of records to return at each level of aggregation (default: 500)
* `from` (query parameter) - the time frame ([`gte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the beginning of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: "1 day ago")
* `to` (query parameter) - the time frame ([`lte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the end of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: "now")
* `filter` (query parameter) - field filters formatted as a JSON dictionary

The `from`, `to`, and `filter` parameters can be used to further restrict the range of documents returned. The `filter` dictionary should be formatted such that its keys are field names and its values are the values for which to filter. A field name may be prepended with a `!` to negate the filter (e.g., `{"event.provider":"zeek"}` vs. `{"!event.provider":"zeek"}`). Filtering for value `null` implies "is not set" or "does not exist" (e.g., `{"event.dataset":null}` means "the field `event.dataset` is `null`/is not set" while `{"!event.dataset":null}` means "the field `event.dataset` is not `null`/is set").

Examples of `filter` parameter:

* `{"!network.transport":"icmp"}` - `network.transport` is not `icmp`
* `{"network.direction":["inbound","outbound"]}` - `network.direction` is either `inbound` or `outbound`
* `{"event.provider":"zeek","event.dataset":["conn","dns"]}` - "`event.provider` is `zeek` and `event.dataset` is either `conn` or `dns`"
* `{"!event.dataset":null}` - "`event.dataset` is set (is not `null`)"

See [Examples](#APIExamples) for more examples of `filter` and corresponding output.

#### Document Lookup

`GET` or `POST` - /mapi/document

Executes an OpenSearch [query](https://opensearch.org/docs/latest/opensearch/bucket-agg/) for the matching documents across all of Malcolm's indexed network traffic metadata.

Parameters:

* `limit` (query parameter) - the maximum number of documents to return (default: 500)
* `from` (query parameter) - the time frame ([`gte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the beginning of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: the UNIX epoch)
* `to` (query parameter) - the time frame ([`lte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the end of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: "now")
* `filter` (query parameter) - field filters formatted as a JSON dictionary (see **Field Aggregations** for examples)
Example cURL command and output:

```
$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
    'https://localhost/mapi/document' \
    -d '{"limit": 10, "filter":{"zeek.uid":"CYeji2z7CKmPRGyga"}}'
```

```json
{
  "filter": {
    "zeek.uid": "CYeji2z7CKmPRGyga"
  },
  "range": [
    0,
    1643056677
  ],
  "results": [
    {
      "_id": "220124-CYeji2z7CKmPRGyga-http-7677",
      "_index": "arkime_sessions3-220124",
      "_score": 0.0,
      "_source": {
        "@timestamp": "2022-01-24T20:31:01.846Z",
        "@version": "1",
        "agent": {
          "hostname": "filebeat",
          "id": "bc25716b-8fe7-4de6-a357-65c7d3c15c33",
          "name": "filebeat",
          "type": "filebeat",
          "version": "7.10.2"
        },
        "client": {
          "bytes": 0
        },
        "destination": {
          "as": {
            "full": "AS54113 Fastly"
          },
          "geo": {
            "city_name": "Seattle",
            "continent_code": "NA",
            "country_code2": "US",
            "country_code3": "US",
            "country_iso_code": "US",
            "country_name": "United States",
            "dma_code": 819,
            "ip": "151.101.54.132",
            "latitude": 47.6092,
            "location": {
              "lat": 47.6092,
              "lon": -122.3314
            },
            "longitude": -122.3314,
            "postal_code": "98111",
            "region_code": "WA",
            "region_name": "Washington",
            "timezone": "America/Los_Angeles"
          },
          "ip": "151.101.54.132",
          "port": 80
        },
        "ecs": {
          "version": "1.6.0"
        },
        "event": {
          "action": [
            "GET"
          ],
          "category": [
            "web",
            "network"
          ],
…
```
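
The same lookup can be driven from Python; as a sketch mirroring the cURL example above (placeholder credentials again, and a negated `!`-prefixed filter term as described under **Field Aggregations**):

```python
import json
import requests

query = {
    'limit': 10,
    # "!" negates a term: match documents whose event.provider is not zeek
    'filter': {'!event.provider': 'zeek'},
}
response = requests.post(
    'https://localhost/mapi/document',
    json=query,                     # sent as the JSON request body
    auth=('username', 'password'),  # placeholder credentials
    verify=False,                   # accept the self-signed certificate
)
print(json.dumps(response.json(), indent=2))
```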
- -#### Examples - -Some security-related API examples: - -
-Protocols - -``` -/mapi/agg/network.type,network.transport,network.protocol,network.protocol_version -``` - -```json -{ - "fields": [ - "network.type", - "network.transport", - "network.protocol", - "network.protocol_version" - ], - "filter": null, - "range": [ - 1970, - 1643067256 - ], - "urls": [ - "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 442240, - "key": "ipv4", - "values": { - "buckets": [ - { - "doc_count": 279538, - "key": "udp", - "values": { - "buckets": [ - { - "doc_count": 266527, - "key": "bacnet", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 12365, - "key": "dns", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 78, - "key": "dhcp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 44, - "key": "ntp", - "values": { - "buckets": [ - { - "doc_count": 22, - "key": "4" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 3, - "key": "enip", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "krb", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "syslog", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 30824, - "key": "tcp", - "values": { - "buckets": [ - { - "doc_count": 7097, - "key": "smb", - "values": { - "buckets": [ - { - "doc_count": 4244, - "key": "1" - }, - { - "doc_count": 1438, - "key": "2" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1792, - "key": "http", - "values": { - "buckets": [ - { - "doc_count": 829, - "key": "1.0" - }, - { - "doc_count": 230, - "key": "1.1" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1280, - "key": "dce_rpc", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 857, - "key": "s7comm", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 426, - "key": "ntlm", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 378, - "key": "gssapi", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 146, - "key": "tds", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 125, - "key": "ssl", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 91, - "key": "tls", - "values": { - "buckets": [ - { - "doc_count": 48, - "key": "TLSv13" - }, - { - "doc_count": 28, - "key": "TLSv12" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 29, - "key": "ssh", - "values": { - "buckets": [ - { - "doc_count": 18, - "key": "2" - } - ], - 
"doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 26, - "key": "modbus", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 17, - "key": "iso_cotp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 8, - "key": "enip", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 6, - "key": "rdp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 4, - "key": "ftp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 4, - "key": "krb", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 4, - "key": "rfb", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 3, - "key": "ldap", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "telnet", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 848, - "key": "icmp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1573, - "key": "ipv6", - "values": { - "buckets": [ - { - "doc_count": 1486, - "key": "udp", - "values": { - "buckets": [ - { - "doc_count": 1433, - "key": "dns", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 80, - "key": "icmp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
-Software - -``` -/mapi/agg/zeek.software.name,zeek.software.unparsed_version -``` - -```json -{ - "fields": [ - "zeek.software.name", - "zeek.software.unparsed_version" - ], - "filter": null, - "range": [ - 1970, - 1643067759 - ], - "urls": [ - "/dashboards/app/dashboards#/view/87d990cc-9e0b-41e5-b8fe-b10ae1da0c85?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 6, - "key": "Chrome", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.36 Safari/525.19" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 6, - "key": "Nmap-SSH", - "values": { - "buckets": [ - { - "doc_count": 3, - "key": "Nmap-SSH1-Hostkey" - }, - { - "doc_count": 3, - "key": "Nmap-SSH2-Hostkey" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 5, - "key": "MSIE", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" - }, - { - "doc_count": 1, - "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E)" - }, - { - "doc_count": 1, - "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 4, - "key": "Firefox", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:34.0) Gecko/20100101 Firefox/34.0" - }, - { - "doc_count": 1, - "key": "Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 3, - "key": "ECS (sec", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "ECS (sec/96EE)" - }, - { - "doc_count": 1, - "key": "ECS (sec/97A6)" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 3, - "key": "NmapNSE", - "values": { - "buckets": [ - { - "doc_count": 3, - "key": "NmapNSE_1.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "Microsoft-Windows", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Microsoft-Windows/6.1 UPnP/1.0 
Windows-Media-Player-DMS/12.0.7601.17514 DLNADOC/1.50" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "Microsoft-Windows-NT", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Microsoft-Windows-NT/5.1 UPnP/1.0 UPnP-Device-Host/1.0 Microsoft-HTTPAPI/2.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "SimpleHTTP", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "SimpleHTTP/0.6 Python/2.7.17" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "Windows-Media-Player-DMS", - "values": { - "buckets": [ - { - "doc_count": 2, - "key": "Windows-Media-Player-DMS/12.0.7601.17514" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "A-B WWW", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "A-B WWW/0.1" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "CONF-CTR-NAE1", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "CONF-CTR-NAE1" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "ClearSCADA", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "ClearSCADA/6.72.4644.1" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "GoAhead-Webs", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "GoAhead-Webs" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "MSFT", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "MSFT 5.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Microsoft-IIS", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Microsoft-IIS/7.5" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Microsoft-WebDAV-MiniRedir", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Microsoft-WebDAV-MiniRedir/6.1.7601" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Python-urllib", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Python-urllib/2.7" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Schneider-WEB/V", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Schneider-WEB/V2.1.4" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Version", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Version_1.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "nginx", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "nginx" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "sublime-license-check", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "sublime-license-check/3.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
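-Because the responses are plain JSON, they are easy to post-process; a sketch, assuming the `jq` utility is available, that lists each bucket's document count and key for a single-field variant of the aggregation above:
-
-```
-$ curl -sk -u username -L 'https://localhost/mapi/agg/zeek.software.name' \
-    | jq -r '.values.buckets[] | "\(.doc_count)\t\(.key)"'
-```
-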
-User agent - -``` -/mapi/agg/user_agent.original -``` - -```json -{ - "fields": [ - "user_agent.original" - ], - "filter": null, - "range": [ - 1970, - 1643067845 - ], - "values": { - "buckets": [ - { - "doc_count": 230, - "key": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" - }, - { - "doc_count": 142, - "key": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" - }, - { - "doc_count": 114, - "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" - }, - { - "doc_count": 50, - "key": "Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)" - }, - { - "doc_count": 48, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 43, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 33, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:34.0) Gecko/20100101 Firefox/34.0" - }, - { - "doc_count": 17, - "key": "Python-urllib/2.7" - }, - { - "doc_count": 12, - "key": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" - }, - { - "doc_count": 9, - "key": "Microsoft-Windows/6.1 UPnP/1.0 Windows-Media-Player-DMS/12.0.7601.17514 DLNADOC/1.50" - }, - { - "doc_count": 9, - "key": "Windows-Media-Player-DMS/12.0.7601.17514" - }, - { - "doc_count": 8, - "key": "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko" - }, - { - "doc_count": 5, - "key": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 5, - "key": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.36 Safari/525.19" - }, - { - "doc_count": 3, - "key": "Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0" - }, - { - "doc_count": 2, - "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - }, - { - "doc_count": 1, - "key": "Microsoft-WebDAV-MiniRedir/6.1.7601" - }, - { - "doc_count": 1, - "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E)" - }, - { - "doc_count": 1, - "key": "sublime-license-check/3.0" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
-External traffic (outbound/inbound) - -``` -$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ - 'https://localhost/mapi/agg/network.protocol' \ - -d '{"filter":{"network.direction":["inbound","outbound"]}}' -``` - -```json -{ - "fields": [ - "network.protocol" - ], - "filter": { - "network.direction": [ - "inbound", - "outbound" - ] - }, - "range": [ - 1970, - 1643068000 - ], - "urls": [ - "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 202597, - "key": "bacnet" - }, - { - "doc_count": 129, - "key": "tls" - }, - { - "doc_count": 128, - "key": "ssl" - }, - { - "doc_count": 33, - "key": "http" - }, - { - "doc_count": 33, - "key": "ntp" - }, - { - "doc_count": 20, - "key": "dns" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
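-Judging from this example, a filter value may be given either as a single value or as a list matching any of the listed values; a sketch limited to outbound sessions only (same placeholder host and credentials as above):
-
-```
-$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
-    'https://localhost/mapi/agg/network.protocol' \
-    -d '{"filter":{"network.direction":"outbound"}}'
-```
-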
-Cross-segment traffic - -``` -$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ - 'https://localhost/mapi/agg/source.segment,destination.segment,network.protocol' \ - -d '{"filter":{"tags":"cross_segment"}}' -``` - -```json -{ - "fields": [ - "source.segment", - "destination.segment", - "network.protocol" - ], - "filter": { - "tags": "cross_segment" - }, - "range": [ - 1970, - 1643068080 - ], - "urls": [ - "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 6893, - "key": "Corporate", - "values": { - "buckets": [ - { - "doc_count": 6893, - "key": "OT", - "values": { - "buckets": [ - { - "doc_count": 891, - "key": "enip" - }, - { - "doc_count": 889, - "key": "cip" - }, - { - "doc_count": 202, - "key": "http" - }, - { - "doc_count": 146, - "key": "modbus" - }, - { - "doc_count": 1, - "key": "ftp" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 189, - "key": "OT", - "values": { - "buckets": [ - { - "doc_count": 138, - "key": "Corporate", - "values": { - "buckets": [ - { - "doc_count": 128, - "key": "http" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 51, - "key": "DMZ", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 28, - "key": "Battery Network", - "values": { - "buckets": [ - { - "doc_count": 25, - "key": "Combined Cycle BOP", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 3, - "key": "Solar Panel Network", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 20, - "key": "Combined Cycle BOP", - "values": { - "buckets": [ - { - "doc_count": 11, - "key": "Battery Network", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 9, - "key": "Solar Panel Network", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Solar Panel Network", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Combined Cycle BOP", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
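-Multi-field aggregations like this one nest each field's buckets under the previous field's `values`; a sketch, again assuming `jq` is available, that flattens the first two levels into source and destination segment pairs:
-
-```
-$ curl -sk -u username -L -XPOST -H 'Content-Type: application/json' \
-    'https://localhost/mapi/agg/source.segment,destination.segment,network.protocol' \
-    -d '{"filter":{"tags":"cross_segment"}}' \
-    | jq -r '.values.buckets[] | .key as $src | .values.buckets[] | "\($src) -> \(.key): \(.doc_count)"'
-```
-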
-Plaintext password - -``` -$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ - 'https://localhost/mapi/agg/network.protocol' \ - -d '{"filter":{"!related.password":null}}' -``` - -```json -{ - "fields": [ - "network.protocol" - ], - "filter": { - "!related.password": null - }, - "range": [ - 1970, - 1643068162 - ], - "urls": [ - "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 20, - "key": "http" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
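-The `!` prefix negates a filter term, so `{"filter":{"!related.password":null}}` appears to read as "`related.password` is not null," i.e., only sessions in which a password was observed. The same convention should apply to other fields; a sketch (the field chosen here is illustrative):
-
-```
-$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
-    'https://localhost/mapi/agg/network.protocol' \
-    -d '{"filter":{"!zeek.notice.category":null}}'
-```
-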
-Insecure/outdated protocols - -``` -$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ - 'https://localhost/mapi/agg/network.protocol,network.protocol_version' \ - -d '{"filter":{"event.severity_tags":"Insecure or outdated protocol"}}' -``` - -```json -{ - "fields": [ - "network.protocol", - "network.protocol_version" - ], - "filter": { - "event.severity_tags": "Insecure or outdated protocol" - }, - "range": [ - 1970, - 1643068248 - ], - "urls": [ - "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 4244, - "key": "smb", - "values": { - "buckets": [ - { - "doc_count": 4244, - "key": "1" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "ftp", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "rdp", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "5.1" - }, - { - "doc_count": 1, - "key": "5.2" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 2, - "key": "telnet", - "values": { - "buckets": [], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
-Notice categories - -``` -/mapi/agg/zeek.notice.category,zeek.notice.sub_category -``` - -```json -{ - "fields": [ - "zeek.notice.category", - "zeek.notice.sub_category" - ], - "filter": null, - "range": [ - 1970, - 1643068300 - ], - "urls": [ - "/dashboards/app/dashboards#/view/f1f09567-fc7f-450b-a341-19d2f2bb468b?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))", - "/dashboards/app/dashboards#/view/95479950-41f2-11ea-88fa-7151df485405?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 100, - "key": "ATTACK", - "values": { - "buckets": [ - { - "doc_count": 42, - "key": "Lateral_Movement_Extracted_File" - }, - { - "doc_count": 30, - "key": "Lateral_Movement" - }, - { - "doc_count": 17, - "key": "Discovery" - }, - { - "doc_count": 5, - "key": "Execution" - }, - { - "doc_count": 5, - "key": "Lateral_Movement_Multiple_Attempts" - }, - { - "doc_count": 1, - "key": "Lateral_Movement_and_Execution" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 14, - "key": "EternalSafety", - "values": { - "buckets": [ - { - "doc_count": 11, - "key": "EternalSynergy" - }, - { - "doc_count": 3, - "key": "ViolationPidMid" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 6, - "key": "Scan", - "values": { - "buckets": [ - { - "doc_count": 6, - "key": "Port_Scan" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - }, - { - "doc_count": 1, - "key": "Ripple20", - "values": { - "buckets": [ - { - "doc_count": 1, - "key": "Treck_TCP_observed" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
-
-Severity tags - -``` -/mapi/agg/event.severity_tags -``` - -```json -{ - "fields": [ - "event.severity_tags" - ], - "filter": null, - "range": [ - 1970, - 1643068363 - ], - "urls": [ - "/dashboards/app/dashboards#/view/d2dd0180-06b1-11ec-8c6b-353266ade330?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))", - "/dashboards/app/dashboards#/view/95479950-41f2-11ea-88fa-7151df485405?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" - ], - "values": { - "buckets": [ - { - "doc_count": 160180, - "key": "Outbound traffic" - }, - { - "doc_count": 43059, - "key": "Inbound traffic" - }, - { - "doc_count": 11091, - "key": "Connection attempt rejected" - }, - { - "doc_count": 8967, - "key": "Connection attempt, no reply" - }, - { - "doc_count": 7131, - "key": "Cross-segment traffic" - }, - { - "doc_count": 4250, - "key": "Insecure or outdated protocol" - }, - { - "doc_count": 2219, - "key": "External traffic" - }, - { - "doc_count": 1985, - "key": "Sensitive country" - }, - { - "doc_count": 760, - "key": "Weird" - }, - { - "doc_count": 537, - "key": "Connection aborted (originator)" - }, - { - "doc_count": 474, - "key": "Connection aborted (responder)" - }, - { - "doc_count": 206, - "key": "File transfer (high concern)" - }, - { - "doc_count": 100, - "key": "MITRE ATT&CK framework technique" - }, - { - "doc_count": 66, - "key": "Service on non-standard port" - }, - { - "doc_count": 64, - "key": "Signature (capa)" - }, - { - "doc_count": 30, - "key": "Signature (YARA)" - }, - { - "doc_count": 25, - "key": "Signature (ClamAV)" - }, - { - "doc_count": 20, - "key": "Cleartext password" - }, - { - "doc_count": 19, - "key": "Long connection" - }, - { - "doc_count": 15, - "key": "Notice (vulnerability)" - }, - { - "doc_count": 13, - "key": "File transfer (medium concern)" - }, - { - "doc_count": 6, - "key": "Notice (scan)" - }, - { - "doc_count": 1, - "key": "High volume connection" - } - ], - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0 - } -} -``` -
- -#### Event Logging - -`POST` - /mapi/event - -A webhook that accepts alert data to be reindexed into OpenSearch as session records for viewing in Malcolm's [dashboards](#Dashboards). See [Alerting](#Alerting) for more details and an example of how this API is used. - -
-Example input: - -```json - -{ - "alert": { - "monitor": { - "name": "Malcolm API Loopback Monitor" - }, - "trigger": { - "name": "Malcolm API Loopback Trigger", - "severity": 4 - }, - "period": { - "start": "2022-03-08T18:03:30.576Z", - "end": "2022-03-08T18:04:30.576Z" - }, - "results": [ - { - "_shards": { - "total": 5, - "failed": 0, - "successful": 5, - "skipped": 0 - }, - "hits": { - "hits": [], - "total": { - "value": 697, - "relation": "eq" - }, - "max_score": null - }, - "took": 1, - "timed_out": false - } - ], - "body": "", - "alert": "PLauan8BaL6eY1yCu9Xj", - "error": "" - } -} -``` -
- -
-Example output: - -```json - -{ - "_index": "arkime_sessions3-220308", - "_type": "_doc", - "_id": "220308-PLauan8BaL6eY1yCu9Xj", - "_version": 4, - "result": "updated", - "_shards": { - "total": 1, - "successful": 1, - "failed": 0 - }, - "_seq_no": 9045, - "_primary_term": 1 -} -``` -
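-
-As a sketch, a payload like the example input above could be submitted with the same `curl` conventions used for the aggregation examples earlier in this section (the `alert.json` file name is just a placeholder for that JSON):
-
-```
-$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
-    'https://localhost/mapi/event' \
-    -d @alert.json
-```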
-
-## Ingesting Third-Party Logs
-
-Malcolm uses [OpenSearch](https://opensearch.org/) and [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) for data storage, search, and visualization, and [Logstash](https://www.elastic.co/logstash/) for log processing. Because these tools are data agnostic, Malcolm can be configured to accept various host logs and other third-party logs sent from log forwarders such as [Fluent Bit](https://fluentbit.io/) and [Beats](https://www.elastic.co/beats/). Some examples of the types of logs these forwarders might send include:
-
-* System resource utilization metrics (CPU, memory, disk, network, etc.)
-* System temperatures
-* Linux system logs
-* Windows event logs
-* Process or service health status
-* Logs appended to textual log files (e.g., `tail`-ing a log file)
-* The output of an external script or program
-* Messages in the form of MQTT control packets
-* many more…
-
-Refer to [**Forwarding Third-Party Logs to Malcolm**](./scripts/third-party-logs/README.md) for more information.
-
-## Malcolm installer ISO
-
-Malcolm's Docker-based deployment model allows Malcolm to run on a variety of platforms. However, in some circumstances (for example, as a long-running appliance as part of a security operations center, or inside of a virtual machine) it may be desirable to install Malcolm as a dedicated standalone installation.
-
-Malcolm can be packaged into an installer ISO based on the current [stable release](https://wiki.debian.org/DebianStable) of [Debian](https://www.debian.org/). This [customized Debian installation](https://wiki.debian.org/DebianLive) is preconfigured with the bare minimum software needed to run Malcolm.
-
-### Generating the ISO
-
-Official downloads of the Malcolm installer ISO are not provided; however, it can be built easily on an internet-connected Linux host with Vagrant:
-
-* [Vagrant](https://www.vagrantup.com/)
-    - [`vagrant-reload`](https://github.com/aidanns/vagrant-reload) plugin
-    - [`vagrant-sshfs`](https://github.com/dustymabe/vagrant-sshfs) plugin
-    - [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box
-
-The build should work with either the [VirtualBox](https://www.virtualbox.org/) provider or the [libvirt](https://libvirt.org/) provider:
-
-* [VirtualBox](https://www.virtualbox.org/) [provider](https://www.vagrantup.com/docs/providers/virtualbox)
-    - [`vagrant-vbguest`](https://github.com/dotless-de/vagrant-vbguest) plugin
-* [libvirt](https://libvirt.org/)
-    - [`vagrant-libvirt`](https://github.com/vagrant-libvirt/vagrant-libvirt) provider plugin
-    - [`vagrant-mutate`](https://github.com/sciurus/vagrant-mutate) plugin to convert the [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box to `libvirt` format
-
-To perform a clean build of the Malcolm installer ISO, navigate to your local Malcolm working copy and run:
-
-```
-$ ./malcolm-iso/build_via_vagrant.sh -f
-…
-Starting build machine...
-Bringing machine 'default' up with 'virtualbox' provider...
-…
-```
-
-Building the ISO may take 30 minutes or more depending on your system.
As the build finishes, you will see the following message indicating success:
-
-```
-…
-Finished, created "/malcolm-build/malcolm-iso/malcolm-6.4.0.iso"
-…
-```
-
-By default, Malcolm's Docker images are not packaged with the installer ISO, assuming instead that you will pull the [latest images](https://hub.docker.com/u/malcolmnetsec) with a `docker-compose pull` command as described in the [Quick start](#QuickStart) section. If you wish to build an ISO with the latest Malcolm images included, follow the directions to create [pre-packaged installation files](#Packager), which include a tarball with a name like `malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz`. Then, pass that images tarball to the ISO build script with a `-d`, like this:
-
-```
-$ ./malcolm-iso/build_via_vagrant.sh -f -d malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz
-…
-```
-
-A system installed from the resulting ISO will load the Malcolm Docker images upon first boot. This method is desirable when the ISO is to be installed in an "air gapped" environment or for distribution to non-networked machines.
-
-Alternately, if you have forked Malcolm on GitHub, [workflow files](./.github/workflows/) are provided which contain instructions for GitHub to build the docker images and [sensor](#Hedgehog) and [Malcolm](#ISO) installer ISOs, specifically [`malcolm-iso-build-docker-wrap-push-ghcr.yml`](./.github/workflows/malcolm-iso-build-docker-wrap-push-ghcr.yml) for the Malcolm ISO. You'll need to run the workflows to build and push your fork's Malcolm docker images before building the ISO. The resulting ISO file is wrapped in a Docker image that provides an HTTP server from which the ISO may be downloaded.
-
-### Installation
-
-The installer is designed to require as little user input as possible. For this reason, there are NO user prompts and confirmations about partitioning and reformatting hard disks for use by the operating system. The installer assumes that all non-removable storage media (e.g., SSD, HDD, NVMe, etc.) are available for use and ⛔🆘😭💀 ***will partition and format them without warning*** 💀😭🆘⛔.
-
-The installer will ask for several pieces of information prior to installing the Malcolm base operating system:
-
-* Hostname
-* Domain name
-* Root password – (optional) a password for the privileged root account, which is rarely needed
-* User name – the name for the non-privileged service account under which Malcolm runs
-* User password – a password for the non-privileged service account
-* Encryption password (optional) – if the encrypted installation option was selected at boot time, the encryption password must be entered every time the system boots
-
-At the end of the installation process, you will be prompted with a few self-explanatory yes/no questions:
-
-* **Disable IPv6?**
-* **Automatically login to the GUI session?**
-* **Should the GUI session be locked due to inactivity?**
-* **Display the [Standard Mandatory DoD Notice and Consent Banner](https://www.stigviewer.com/stig/application_security_and_development/2018-12-24/finding/V-69349)?** *(only applies when installed on U.S. government information systems)*
-
-Following these prompts, the installer will reboot and the Malcolm base operating system will boot.
-
-### Setup
-
-When the system boots for the first time, the Malcolm Docker images will load if the installer was built with pre-packaged installation files as described above.
Wait for this operation to complete (the progress dialog will disappear when the images have finished loading) before continuing with setup.
-
-Open a terminal (click the red terminal 🗔 icon next to the Debian swirl logo 🍥 menu button in the menu bar). At this point, setup is similar to the steps described in the [Quick start](#QuickStart) section. Navigate to the Malcolm directory (`cd ~/Malcolm`) and run [`auth_setup`](#AuthSetup) to configure authentication. If the ISO didn't have pre-packaged Malcolm images, or if you'd like to retrieve the latest updates, run `docker-compose pull`. Finalize your configuration by running `scripts/install.py --configure` and follow the prompts as illustrated in the [installation example](#InstallationExample).
-
-Once Malcolm is configured, you can [start Malcolm](#Starting) via the command line or by clicking the circular yellow Malcolm icon in the menu bar.
-
-### Time synchronization
-
-If you wish to set up time synchronization via [NTP](http://www.ntp.org/) or `htpdate`, open a terminal and run `sudo configure-interfaces.py`. Select **Continue**, then choose **Time Sync**. Here you can configure the operating system to keep its time synchronized with either an NTP server (using the NTP protocol), another Malcolm instance, or another HTTP/HTTPS server. On the next dialog, choose the time synchronization method you wish to configure.
-
-If **htpdate** is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port `9200` may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server.
-
-If **ntpdate** is selected, you will be prompted to enter the IP address or hostname of the NTP server.
-
-Upon configuring time synchronization, a "Time synchronization configured successfully!" message will be displayed.
-
-### Hardening
-
-The Malcolm aggregator base operating system uses the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmarks, which target the following guidelines for establishing a secure configuration posture:
-
-* [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/)
-* [DISA STIG (Security Technical Implementation Guides) for RHEL 7](https://www.stigviewer.com/stig/red_hat_enterprise_linux_7/) v2r5 Ubuntu v1r2 [adapted](https://github.com/hardenedlinux/STIG-OS-mirror/blob/master/redhat-STIG-DOCs/U_Red_Hat_Enterprise_Linux_7_V2R5_STIG.zip) for a Debian operating system
-* Additional recommendations from [cisecurity.org](https://www.cisecurity.org/)
-
-#### Compliance Exceptions
-
-[Currently](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) there are 274 checks to determine compliance with the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmark.
-
-The Malcolm aggregator base operating system claims exceptions from the recommendations in this benchmark in the following categories:
-
-**1.1 Install Updates, Patches and Additional Security Software** - When the Malcolm aggregator appliance software is built, all of the latest applicable security patches and updates are included in it. How future updates are to be handled is still in design.
-
-**1.3 Enable verify the signature of local packages** - As the base distribution is not using embedded signatures, `debsig-verify` would reject all packages (see comment in `/etc/dpkg/dpkg.cfg`).
Enabling it after installation would disallow any future updates.
-
-**2.14 Add nodev option to /run/shm Partition**, **2.15 Add nosuid Option to /run/shm Partition**, **2.16 Add noexec Option to /run/shm Partition** - The Malcolm aggregator base operating system does not mount `/run/shm` as a separate partition, so these recommendations do not apply.
-
-**2.19 Disable Mounting of freevxfs Filesystems**, **2.20 Disable Mounting of jffs2 Filesystems**, **2.21 Disable Mounting of hfs Filesystems**, **2.22 Disable Mounting of hfsplus Filesystems**, **2.23 Disable Mounting of squashfs Filesystems**, **2.24 Disable Mounting of udf Filesystems** - The Malcolm aggregator base operating system is not compiling a custom Linux kernel, so these filesystems are inherently supported as they are part of Debian Linux's default kernel.
-
-**3.3 Set Boot Loader Password** - As maximizing availability is a system requirement, Malcolm should restart automatically without user intervention to ensure uninterrupted service. A boot loader password is not enabled.
-
-**4.8 Disable USB Devices** - The ability to ingest data (such as PCAP files) from a mounted USB mass storage device is a requirement of the system.
-
-**6.1 Ensure the X Window system is not installed**, **6.2 Ensure Avahi Server is not enabled**, **6.3 Ensure print server is not enabled** - An X Windows session is provided for displaying dashboards. The library packages `libavahi-common-data`, `libavahi-common3`, and `libcups2` are dependencies of some of the X components used by the Malcolm aggregator base operating system, but the `avahi` and `cups` services themselves are disabled.
-
-**6.17 Ensure virus scan Server is enabled**, **6.18 Ensure virus scan Server update is enabled** - As this is a network traffic analysis appliance rather than an end-user device, regular user files will not be created. A virus scan program would impact device performance and would be unnecessary.
-
-**7.1.1 Disable IP Forwarding**, **7.2.4 Log Suspicious Packets**, **7.2.7 Enable RFC-recommended Source Route Validation**, **7.4.1 Install TCP Wrappers** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, these recommendations do not apply.
-
-**8.1.1.2 Disable System on Audit Log Full**, **8.1.1.3 Keep All Auditing Information**, **8.1.1.5 Ensure set remote_server for audit service**, **8.1.1.6 Ensure enable_krb5 set to yes for remote audit service**, **8.1.1.7 Ensure set action for audit storage volume is fulled**, **8.1.1.8 Ensure set action for network failure on remote audit service**, **8.1.1.9 Set space left for auditd service**, a few other audit-related items under section **8.1**, **8.2.4 Configure rsyslog to Send Logs to a Remote Log Host** - As maximizing availability is a system requirement, audit processing failures will be logged on the device rather than halting the system. `auditd` is set up to log to syslog when its local storage capacity is reached.
-
-**8.4.2 Implement Periodic Execution of File Integrity** - This functionality is not configured by default, but it can be configured post-install by the end user.
-
-Password-related recommendations under **9.2** and **10.1** - The library package `libpam-pwquality` is used in favor of `libpam-cracklib`, which is what the [compliance scripts](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) are looking for.
Also, as a system running Malcolm is intended to be used as an appliance rather than as a general user-facing software platform, some exceptions to password enforcement policies are claimed.
-
-**9.3.13 Limit Access via SSH** - The Malcolm aggregator base operating system does not create multiple regular user accounts: only `root` and an aggregator service account are used. SSH access for `root` is disabled. SSH login with a password is also disallowed: only key-based authentication is accepted. The service account accepts no keys by default. As such, the `AllowUsers`, `AllowGroups`, `DenyUsers`, and `DenyGroups` values in `sshd_config` do not apply.
-
-**9.4 Restrict Access to the su Command** - The Malcolm aggregator base operating system does not create multiple regular user accounts: only `root` and an aggregator service account are used.
-
-**10.1.6 Remove nopasswd option from the sudoers configuration** - A very limited set of operations (a single script used to run the AIDE integrity check as a non-root user) has the NOPASSWD option set to allow it to be run in the background without user intervention.
-
-**10.1.10 Set maxlogins for all accounts** and **10.5 Set Timeout on ttys** - The Malcolm aggregator base operating system does not create multiple regular user accounts: only `root` and an aggregator service account are used.
-
-**12.10 Find SUID System Executables**, **12.11 Find SGID System Executables** - The few files found by [these](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.10_find_suid_files.sh) [scripts](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.11_find_sgid_files.sh) are valid exceptions required by the Malcolm aggregator base operating system's core requirements.
-
-**14.1 Defense for NAT Slipstreaming** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, this recommendation does not apply.
-
-Please review the notes for these additional guidelines. While not claiming an exception, the Malcolm aggregator base operating system may implement them in a manner different than is described by the [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/) or the [hardenedlinux/harbian-audit](https://github.com/hardenedlinux/harbian-audit) audit scripts.
-
-**4.1 Restrict Core Dumps** - The Malcolm aggregator base operating system disables core dumps using a configuration file for `ulimit` named `/etc/security/limits.d/limits.conf`. The [audit script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/4.1_restrict_core_dumps.sh) checking for this does not check the `limits.d` subdirectory, which is why this is incorrectly flagged as noncompliant.
-
-**5.4 Ensure ctrl-alt-del is disabled** - The Malcolm aggregator base operating system disables the `ctrl+alt+delete` key sequence by executing `systemctl disable ctrl-alt-del.target` during installation and the command `systemctl mask ctrl-alt-del.target` at boot time.
-
-**7.4.4 Create /etc/hosts.deny**, **7.7.1 Ensure Firewall is active**, **7.7.4.1 Ensure default deny firewall policy**, **7.7.4.2 Ensure loopback traffic is configured**, **7.7.4.3 Ensure default deny firewall policy**, **7.7.4.4 Ensure outbound and established connections are configured** - The Malcolm aggregator base operating system **is** configured with an appropriately locked-down software firewall (managed by "Uncomplicated Firewall" `ufw`).
However, the methods outlined in the CIS benchmark recommendations do not account for this configuration.
-
-**8.6 Verifies integrity all packages** - The [script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/8.7_verify_integrity_packages.sh) which verifies package integrity only "fails" because of missing (status `??5??????` displayed by the utility) language ("locale") files, which are removed as part of the Malcolm aggregator base operating system's trimming-down process. All non-locale-related system files pass integrity checks.
-
-## Installation example using Ubuntu 22.04 LTS
-
-Here's a step-by-step example of getting [Malcolm from GitHub](https://github.com/idaholab/Malcolm/tree/main), configuring your system and your Malcolm instance, and running it on a system running Ubuntu Linux. Your mileage may vary depending on your individual system configuration, but this should be a good starting point.
-
-The commands in this example should be executed as a non-root user.
-
-You can use `git` to clone Malcolm into a local working copy, or you can download and extract the artifacts from the [latest release](https://github.com/idaholab/Malcolm/releases).
-
-To install Malcolm from the latest Malcolm release, browse to the [Malcolm releases page on GitHub](https://github.com/idaholab/Malcolm/releases) and download at a minimum `install.py` and the `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` file, then navigate to your downloads directory:
-```
-user@host:~$ cd Downloads/
-user@host:~/Downloads$ ls
-malcolm_common.py install.py malcolm_20190611_095410_ce2d8de.tar.gz
-```
-
-If you are obtaining Malcolm using `git` instead, run the following command to clone Malcolm into a local working copy:
-```
-user@host:~$ git clone https://github.com/idaholab/Malcolm
-Cloning into 'Malcolm'...
-remote: Enumerating objects: 443, done.
-remote: Counting objects: 100% (443/443), done.
-remote: Compressing objects: 100% (310/310), done.
-remote: Total 443 (delta 81), reused 441 (delta 79), pack-reused 0
-Receiving objects: 100% (443/443), 6.87 MiB | 18.86 MiB/s, done.
-Resolving deltas: 100% (81/81), done.
-
-user@host:~$ cd Malcolm/
-```
-
-Next, run the `install.py` script to configure your system. Replace `user` in this example with your local account username, and follow the prompts. Most questions have an acceptable default you can accept by pressing the `Enter` key. Depending on whether you are installing Malcolm from the release tarball or inside of a git working copy, the questions below will be slightly different, but for the most part are the same.
-```
-user@host:~/Malcolm$ sudo ./scripts/install.py
-Installing required packages: ['apache2-utils', 'make', 'openssl', 'python3-dialog']
-
-"docker info" failed, attempt to install Docker? (Y/n): y
-
-Attempt to install Docker using official repositories? (Y/n): y
-Installing required packages: ['apt-transport-https', 'ca-certificates', 'curl', 'gnupg-agent', 'software-properties-common']
-Installing docker packages: ['docker-ce', 'docker-ce-cli', 'containerd.io']
-Installation of docker packages apparently succeeded
-
-Add a non-root user to the "docker" group?: y
-
-Enter user account: user
-
-Add another non-root user to the "docker" group?: n
-
-"docker-compose version" failed, attempt to install docker-compose? (Y/n): y
-
-Install docker-compose directly from docker github?
(Y/n): y -Download and installation of docker-compose apparently succeeded - -fs.file-max increases allowed maximum for file handles -fs.file-max= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -fs.inotify.max_user_watches increases allowed maximum for monitored files -fs.inotify.max_user_watches= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -fs.inotify.max_queued_events increases queue size for monitored files -fs.inotify.max_queued_events= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -fs.inotify.max_user_instances increases allowed maximum monitor file watchers -fs.inotify.max_user_instances= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -vm.max_map_count increases allowed maximum for memory segments -vm.max_map_count= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -net.core.somaxconn increases allowed maximum for socket connections -net.core.somaxconn= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -vm.swappiness adjusts the preference of the system to swap vs. drop runtime memory pages -vm.swappiness= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -vm.dirty_background_ratio defines the percentage of system memory fillable with "dirty" pages before flushing -vm.dirty_background_ratio= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -vm.dirty_ratio defines the maximum percentage of dirty system memory before committing everything -vm.dirty_ratio= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y - -/etc/security/limits.d/limits.conf increases the allowed maximums for file handles and memlocked segments -/etc/security/limits.d/limits.conf does not exist, create it? (Y/n): y -``` - -If you are configuring Malcolm from within a git working copy, `install.py` will now exit. Run `install.py` again like you did at the beginning of the example, only remove the `sudo` and add `--configure` to run `install.py` in "configuration only" mode. -``` -user@host:~/Malcolm$ ./scripts/install.py --configure -``` - -Alternately, if you are configuring Malcolm from the release tarball you will be asked if you would like to extract the contents of the tarball and to specify the installation directory and `install.py` will continue: -``` -Extract Malcolm runtime files from /home/user/Downloads/malcolm_20190611_095410_ce2d8de.tar.gz (Y/n): y - -Enter installation path for Malcolm [/home/user/Downloads/malcolm]: /home/user/Malcolm -Malcolm runtime files extracted to /home/user/Malcolm -``` - -Now that any necessary system configuration changes have been made, the local Malcolm instance will be configured: -``` -Malcolm processes will run as UID 1000 and GID 1000. Is this OK? (Y/n): y - -Should Malcolm use and maintain its own OpenSearch instance? (Y/n): y - -Forward Logstash logs to a secondary remote OpenSearch instance? (y/N): n - -Setting 10g for OpenSearch and 3g for Logstash. Is this OK? (Y/n): y - -Setting 3 workers for Logstash pipelines. Is this OK? (Y/n): y - -Restart Malcolm upon system or Docker daemon restart? (y/N): y -1: no -2: on-failure -3: always -4: unless-stopped -Select Malcolm restart behavior (unless-stopped): 4 - -Require encrypted HTTPS connections? (Y/n): y - -Will Malcolm be running behind another reverse proxy (Traefik, Caddy, etc.)? (y/N): n - -Specify external Docker network name (or leave blank for default networking) (): - -Authenticate against Lightweight Directory Access Protocol (LDAP) server? 
(y/N): n - -Store OpenSearch index snapshots locally in /home/user/Malcolm/opensearch-backup? (Y/n): y - -Compress OpenSearch index snapshots? (y/N): n - -Delete the oldest indices when the database exceeds a certain size? (y/N): n - -Automatically analyze all PCAP files with Suricata? (Y/n): y - -Download updated Suricata signatures periodically? (Y/n): y - -Automatically analyze all PCAP files with Zeek? (Y/n): y - -Perform reverse DNS lookup locally for source and destination IP addresses in logs? (y/N): n - -Perform hardware vendor OUI lookups for MAC addresses? (Y/n): y - -Perform string randomness scoring on some fields? (Y/n): y - -Expose OpenSearch port to external hosts? (y/N): n - -Expose Logstash port to external hosts? (y/N): n - -Expose Filebeat TCP port to external hosts? (y/N): y -1: json -2: raw -Select log format for messages sent to Filebeat TCP listener (json): 1 - -Source field to parse for messages sent to Filebeat TCP listener (message): message - -Target field under which to store decoded JSON fields for messages sent to Filebeat TCP listener (miscbeat): miscbeat - -Field to drop from events sent to Filebeat TCP listener (message): message - -Tag to apply to messages sent to Filebeat TCP listener (_malcolm_beats): _malcolm_beats - -Expose SFTP server (for PCAP upload) to external hosts? (y/N): n - -Enable file extraction with Zeek? (y/N): y -1: none -2: known -3: mapped -4: all -5: interesting -Select file extraction behavior (none): 5 -1: quarantined -2: all -3: none -Select file preservation behavior (quarantined): 1 - -Scan extracted files with ClamAV? (y/N): y - -Scan extracted files with Yara? (y/N): y - -Scan extracted PE files with Capa? (y/N): y - -Lookup extracted file hashes with VirusTotal? (y/N): n - -Download updated file scanner signatures periodically? (Y/n): y - -Should Malcolm capture live network traffic to PCAP files for analysis with Arkime? (y/N): y - -Capture packets using netsniff-ng? (Y/n): y - -Capture packets using tcpdump? (y/N): n - -Should Malcolm analyze live network traffic with Suricata? (y/N): y - -Should Malcolm analyze live network traffic with Zeek? (y/N): y - -Specify capture interface(s) (comma-separated): eth0 - -Capture filter (tcpdump-like filter expression; leave blank to capture all traffic) (): not port 5044 and not port 8005 and not port 9200 - -Disable capture interface hardware offloading and adjust ring buffer sizes? (y/N): n - -Malcolm has been installed to /home/user/Malcolm. See README.md for more information. -Scripts for starting and stopping Malcolm and changing authentication-related settings can be found in /home/user/Malcolm/scripts. -``` - -At this point you should **reboot your computer** so that the new system settings can be applied. After rebooting, log back in and return to the directory to which Malcolm was installed (or to which the git working copy was cloned). - -Now we need to [set up authentication](#AuthSetup) and generate some unique self-signed TLS certificates. You can replace `analyst` in this example with whatever username you wish to use to log in to the Malcolm web interface. -``` -user@host:~/Malcolm$ ./scripts/auth_setup - -Store administrator username/password for local Malcolm access? (Y/n): y - -Administrator username: analyst -analyst password: -analyst password (again): - -(Re)generate self-signed certificates for HTTPS access (Y/n): y - -(Re)generate self-signed certificates for a remote log forwarder (Y/n): y - -Store username/password for primary remote OpenSearch instance? 
(y/N): n - -Store username/password for secondary remote OpenSearch instance? (y/N): n - -Store username/password for email alert sender account? (y/N): n - -(Re)generate internal passwords for NetBox (Y/n): y -``` - -For now, rather than [build Malcolm from scratch](#Build), we'll pull images from [Docker Hub](https://hub.docker.com/u/malcolmnetsec): -``` -user@host:~/Malcolm$ docker-compose pull -Pulling api ... done -Pulling arkime ... done -Pulling dashboards ... done -Pulling dashboards-helper ... done -Pulling file-monitor ... done -Pulling filebeat ... done -Pulling freq ... done -Pulling htadmin ... done -Pulling logstash ... done -Pulling name-map-ui ... done -Pulling netbox ... done -Pulling netbox-postgresql ... done -Pulling netbox-redis ... done -Pulling nginx-proxy ... done -Pulling opensearch ... done -Pulling pcap-capture ... done -Pulling pcap-monitor ... done -Pulling suricata ... done -Pulling upload ... done -Pulling zeek ... done - -user@host:~/Malcolm$ docker images -REPOSITORY TAG IMAGE ID CREATED SIZE -malcolmnetsec/api 6.4.0 xxxxxxxxxxxx 3 days ago 158MB -malcolmnetsec/arkime 6.4.0 xxxxxxxxxxxx 3 days ago 816MB -malcolmnetsec/dashboards 6.4.0 xxxxxxxxxxxx 3 days ago 1.02GB -malcolmnetsec/dashboards-helper 6.4.0 xxxxxxxxxxxx 3 days ago 184MB -malcolmnetsec/file-monitor 6.4.0 xxxxxxxxxxxx 3 days ago 588MB -malcolmnetsec/file-upload 6.4.0 xxxxxxxxxxxx 3 days ago 259MB -malcolmnetsec/filebeat-oss 6.4.0 xxxxxxxxxxxx 3 days ago 624MB -malcolmnetsec/freq 6.4.0 xxxxxxxxxxxx 3 days ago 132MB -malcolmnetsec/htadmin 6.4.0 xxxxxxxxxxxx 3 days ago 242MB -malcolmnetsec/logstash-oss 6.4.0 xxxxxxxxxxxx 3 days ago 1.35GB -malcolmnetsec/name-map-ui 6.4.0 xxxxxxxxxxxx 3 days ago 143MB -malcolmnetsec/netbox 6.4.0 xxxxxxxxxxxx 3 days ago 1.01GB -malcolmnetsec/nginx-proxy 6.4.0 xxxxxxxxxxxx 3 days ago 121MB -malcolmnetsec/opensearch 6.4.0 xxxxxxxxxxxx 3 days ago 1.17GB -malcolmnetsec/pcap-capture 6.4.0 xxxxxxxxxxxx 3 days ago 121MB -malcolmnetsec/pcap-monitor 6.4.0 xxxxxxxxxxxx 3 days ago 213MB -malcolmnetsec/postgresql 6.4.0 xxxxxxxxxxxx 3 days ago 268MB -malcolmnetsec/redis 6.4.0 xxxxxxxxxxxx 3 days ago 34.2MB -malcolmnetsec/suricata 6.4.0 xxxxxxxxxxxx 3 days ago 278MB -malcolmnetsec/zeek 6.4.0 xxxxxxxxxxxx 3 days ago 1GB -``` - -Finally, we can start Malcolm. When Malcolm starts it will stream informational and debug messages to the console. If you wish, you can safely close the console or use `Ctrl+C` to stop these messages; Malcolm will continue running in the background. 
-```
-user@host:~/Malcolm$ ./scripts/start
-In a few minutes, Malcolm services will be accessible via the following URLs:
-------------------------------------------------------------------------------
-  - Arkime: https://localhost/
-  - OpenSearch Dashboards: https://localhost/dashboards/
-  - PCAP upload (web): https://localhost/upload/
-  - PCAP upload (sftp): sftp://username@127.0.0.1:8022/files/
-  - Host and subnet name mapping editor: https://localhost/name-map-ui/
-  - NetBox: https://localhost/netbox/
-  - Account management: https://localhost:488/
-
-NAME                          COMMAND                  SERVICE              STATUS               PORTS
-malcolm-api-1                 "/usr/local/bin/dock…"   api                  running (starting)   …
-malcolm-arkime-1              "/usr/local/bin/dock…"   arkime               running (starting)   …
-malcolm-dashboards-1          "/usr/local/bin/dock…"   dashboards           running (starting)   …
-malcolm-dashboards-helper-1   "/usr/local/bin/dock…"   dashboards-helper    running (starting)   …
-malcolm-file-monitor-1        "/usr/local/bin/dock…"   file-monitor         running (starting)   …
-malcolm-filebeat-1            "/usr/local/bin/dock…"   filebeat             running (starting)   …
-malcolm-freq-1                "/usr/local/bin/dock…"   freq                 running (starting)   …
-malcolm-htadmin-1             "/usr/local/bin/dock…"   htadmin              running (starting)   …
-malcolm-logstash-1            "/usr/local/bin/dock…"   logstash             running (starting)   …
-malcolm-name-map-ui-1         "/usr/local/bin/dock…"   name-map-ui          running (starting)   …
-malcolm-netbox-1              "/usr/bin/tini -- /u…"   netbox               running (starting)   …
-malcolm-netbox-postgres-1     "/usr/bin/docker-uid…"   netbox-postgres      running (starting)   …
-malcolm-netbox-redis-1        "/sbin/tini -- /usr/…"   netbox-redis         running (starting)   …
-malcolm-netbox-redis-cache-1  "/sbin/tini -- /usr/…"   netbox-redis-cache   running (starting)   …
-malcolm-nginx-proxy-1         "/usr/local/bin/dock…"   nginx-proxy          running (starting)   …
-malcolm-opensearch-1          "/usr/local/bin/dock…"   opensearch           running (starting)   …
-malcolm-pcap-capture-1        "/usr/local/bin/dock…"   pcap-capture         running              …
-malcolm-pcap-monitor-1        "/usr/local/bin/dock…"   pcap-monitor         running (starting)   …
-malcolm-suricata-1            "/usr/local/bin/dock…"   suricata             running (starting)   …
-malcolm-suricata-live-1       "/usr/local/bin/dock…"   suricata-live        running              …
-malcolm-upload-1              "/usr/local/bin/dock…"   upload               running (starting)   …
-malcolm-zeek-1                "/usr/local/bin/dock…"   zeek                 running (starting)   …
-malcolm-zeek-live-1           "/usr/local/bin/dock…"   zeek-live            running              …
-…
-```
-
-It will take several minutes for all of Malcolm's components to start up. Logstash will take the longest, probably 3 to 5 minutes. You'll know Logstash is fully ready when it prints a series of startup messages, ending with this:
-```
-…
-malcolm-logstash-1  | [2022-07-27T20:27:52,056][INFO ][logstash.agent           ] Pipelines running {:count=>6, :running_pipelines=>[:"malcolm-input", :"malcolm-output", :"malcolm-beats", :"malcolm-suricata", :"malcolm-enrichment", :"malcolm-zeek"], :non_running_pipelines=>[]}
-…
-```
-
-You can now open a web browser and navigate to one of the [Malcolm user interfaces](#UserInterfaceURLs).
-
-## Upgrading Malcolm
-
-At this time there is not an "official" upgrade procedure to get from one version of Malcolm to the next, as it may vary from platform to platform. However, the process is fairly simple and can be done by following these steps:
-
-### Update the underlying system
-
-You may wish to get the official updates for the underlying system's software packages before you proceed. Consult the documentation of your operating system for how to do this.
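-
-For example, on a Debian or Ubuntu host like the ones used elsewhere in this document, that might look like the following sketch (consult your distribution's documentation for the recommended procedure):
-
-```
-$ sudo apt-get update
-$ sudo apt-get upgrade
-```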
-
-If you are upgrading a Malcolm instance installed from the [Malcolm installation ISO](#ISOInstallation), follow scenario 2 below. Due to the Malcolm base operating system's [hardened](#Hardening) configuration, when updating the underlying system, temporarily set the umask value to the Debian default (`umask 0022` in the root shell in which updates are being performed) instead of the more restrictive Malcolm default. This will allow updates to be applied with the right permissions.
-
-### Scenario 1: Malcolm is a GitHub clone
-
-If you checked out a working copy of the Malcolm repository from GitHub with a `git clone` command, here are the basic steps to performing an upgrade:
-
-1. stop Malcolm
-    * `./scripts/stop`
-2. stash changes to `docker-compose.yml` and other files
-    * `git stash save "pre-upgrade Malcolm configuration changes"`
-3. pull changes from GitHub repository
-    * `git pull --rebase`
-4. pull new Docker images (this will take a while)
-    * `docker-compose pull`
-5. apply saved configuration change stashed earlier
-    * `git stash pop`
-6. if you see `Merge conflict` messages, resolve the [conflicts](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging#_basic_merge_conflicts) with your favorite text editor
-7. you may wish to re-run `install.py --configure` as described in [System configuration and tuning](#ConfigAndTuning) in case there are any new `docker-compose.yml` parameters for Malcolm that need to be set up
-8. start Malcolm
-    * `./scripts/start`
-9. you may be prompted to [configure authentication](#AuthSetup) if there are new authentication-related files that need to be generated
-    * you probably do not need to re-generate self-signed certificates
-
-### Scenario 2: Malcolm was installed from a packaged tarball
-
-If you installed Malcolm from [pre-packaged installation files](https://github.com/idaholab/Malcolm#Packager), here are the basic steps to perform an upgrade:
-
-1. stop Malcolm
-    * `./scripts/stop`
-2. uncompress the new pre-packaged installation files (using `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` as an example, the file and/or directory names will be different depending on the release)
-    * `tar xf malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz`
-3. backup current Malcolm scripts, configuration files and certificates
-    * `mkdir -p ./upgrade_backup_$(date +%Y-%m-%d)`
-    * `cp -r filebeat/ htadmin/ logstash/ nginx/ auth.env cidr-map.txt docker-compose.yml host-map.txt net-map.json ./scripts ./README.md ./upgrade_backup_$(date +%Y-%m-%d)/`
-4. replace scripts and local documentation in your existing installation with the new ones
-    * `rm -rf ./scripts ./README.md`
-    * `cp -r ./malcolm_YYYYMMDD_HHNNSS_xxxxxxx/scripts ./malcolm_YYYYMMDD_HHNNSS_xxxxxxx/README.md ./`
-5. replace (overwrite) `docker-compose.yml` file with new version
-    * `cp ./malcolm_YYYYMMDD_HHNNSS_xxxxxxx/docker-compose.yml ./docker-compose.yml`
-6. re-run `./scripts/install.py --configure` as described in [System configuration and tuning](#ConfigAndTuning)
-7. using a file comparison tool (e.g., `diff`, `meld`, `Beyond Compare`, etc.), compare `docker-compose.yml` and the `docker-compose.yml` file you backed up in step 3, and manually migrate over any customizations you wish to preserve from that file (e.g., `PCAP_FILTER`, `MAXMIND_GEOIP_DB_LICENSE_KEY`, `MANAGE_PCAP_FILES`; [anything else](#DockerComposeYml) you may have edited by hand in `docker-compose.yml` that's not prompted for in `install.py --configure`)
-8.
pull the new docker images (this will take a while)
-    * `docker-compose pull` to pull them from Docker Hub, or `docker load -i malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz` if you have an offline tarball of the Malcolm docker images
-9. start Malcolm
-    * `./scripts/start`
-10. you may be prompted to [configure authentication](#AuthSetup) if there are new authentication-related files that need to be generated
-    * you probably do not need to re-generate self-signed certificates
-
-### Post-upgrade
-
-#### Monitoring Malcolm
-
-If you are technically-minded, you may wish to follow the debug output provided by `./scripts/start` (or `./scripts/logs` if you need to re-open the log stream after you've closed it), although there is a lot there and it may be hard to distinguish whether or not something is okay.
-
-Running `docker-compose ps -a` should give you a good idea if all of Malcolm's Docker containers started up and, in some cases, may be able to indicate if the containers are "healthy" or not.
-
-After upgrading via one of the previous outlines, give Malcolm several minutes to get started. Once things are up and running, open one of Malcolm's [web interfaces](#UserInterfaceURLs) to verify that things are working.
-
-#### Loading new OpenSearch Dashboards visualizations
-
-Once the upgraded instance of Malcolm has started up, you'll probably want to import the new dashboards and visualizations for OpenSearch Dashboards. You can signal Malcolm to load the new visualizations by opening OpenSearch Dashboards, clicking **Management** → **Index Patterns**, then selecting the `arkime_sessions3-*` index pattern and clicking the delete **🗑** button near the upper-right of the window. Confirm the **Delete index pattern?** prompt by clicking **Delete**. Close the OpenSearch Dashboards browser window. After a few minutes the missing index pattern will be detected and OpenSearch Dashboards will be signalled to load its new dashboards and visualizations.
-
-### Major releases
-
-The Malcolm project uses [semantic versioning](https://semver.org/) when choosing version numbers. If you are moving between major releases (e.g., from v4.0.1 to v5.0.0), you're likely to find that there are enough major backwards compatibility-breaking changes that upgrading may not be worth the time and trouble. A fresh install is strongly recommended between major releases.
-
-## Modifying or Contributing to Malcolm
-
-If you are interested in contributing to the Malcolm project, please read the [Malcolm Contributor Guide](./docs/contributing/README.md).
-
 ## Forks
 
 [CISA](https://www.cisa.gov/) maintains the upstream source code repository for Malcolm at [https://github.com/cisagov/Malcolm](https://github.com/cisagov/Malcolm). The [Idaho National Lab](https://inl.gov/)'s fork of Malcolm, which is currently kept up-to-date with CISA's upstream development, can be found at [https://github.com/idaholab/Malcolm](https://github.com/idaholab/Malcolm).
diff --git a/_config.yml b/_config.yml
new file mode 100644
index 000000000..ce1c45574
--- /dev/null
+++ b/_config.yml
@@ -0,0 +1,33 @@
+title: Malcolm
+description: A powerful, easily deployable network traffic analysis tool suite
+logo: docs/images/logo/Malcolm_outline_banner_dark.png
+remote_theme: pages-themes/minimal@v0.2.0
+external_download_url: https://malcolm.fyi/download/
+youtube_url: https://www.youtube.com/c/MalcolmNetworkTrafficAnalysisToolSuite
+docs_uri: docs/
+alerting_docs_uri: docs/alerting.html
+anomaly_detection_docs_uri: docs/anomaly-detection.html
+api_docs_uri: docs/api.html
+arkime_docs_uri: docs/arkime.html
+components_docs_uri: docs/components.html
+configuring_docs_uri: docs/malcolm-preparation.html
+contributing_docs_uri: docs/contributing-guide.html
+dashboards_docs_uri: docs/dashboards.html
+hardening_docs_uri: docs/hardening.html
+hedgehog_docs_uri: docs/hedgehog.html
+live_analysis_docs_uri: docs/live-analysis.html
+protocols_docs_uri: docs/protocols.html
+queries_docs_uri: docs/queries-cheat-sheet.html
+quickstart_docs_uri: docs/quickstart.html
+severity_docs_uri: docs/severity.html
+thirdparty_logs_docs_uri: docs/third-party-logs.html
+upload_docs_uri: docs/upload.html
+github:
+  owner_name: Seth Grover
+plugins:
+  - jekyll-remote-theme
+  - jekyll-relative-links
+show_downloads: true
+relative_links:
+  enabled: true
+  collections: true
diff --git a/_includes/head-custom.html b/_includes/head-custom.html
new file mode 100644
index 000000000..30e2a3471
--- /dev/null
+++ b/_includes/head-custom.html
@@ -0,0 +1 @@
+
diff --git a/_layouts/default.html b/_layouts/default.html
new file mode 100644
index 000000000..7c44ecca6
--- /dev/null
+++ b/_layouts/default.html
@@ -0,0 +1,98 @@
+<!DOCTYPE html>
+<html lang="{{ site.lang | default: "en-US" }}">
+  <head>
+    <meta charset="UTF-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+{% seo %}
+    <link rel="stylesheet" href="{{ "/assets/css/style.css?v=" | append: site.github.build_revision | relative_url }}">
+    {% include head-custom.html %}
+  </head>
+  <body>
+    <div class="wrapper">
+      <header>
+        {% if site.logo %}
+          <a href="{{ "/" | absolute_url }}"><img src="{{ site.logo | relative_url }}" alt="Logo" /></a>
+        {% else %}
+          <h1><a href="{{ "/" | absolute_url }}">{{ site.title | default: site.github.repository_name }}</a></h1>
+        {% endif %}
+
+        <p>{{ site.description | default: site.github.project_tagline }}</p>
+
+        <p><a href="{{ site.quickstart_docs_uri | relative_url }}">Quick Start</a></p>
+
+        <p><a href="{{ site.docs_uri | relative_url }}">Documentation</a></p>
+
+        <p><a href="{{ site.components_docs_uri | relative_url }}">Components</a></p>
+
+        <p><a href="{{ site.protocols_docs_uri | relative_url }}">Supported Protocols</a></p>
+
+        <p><a href="{{ site.configuring_docs_uri | relative_url }}">Configuring</a></p>
+
+        <p><a href="{{ site.arkime_docs_uri | relative_url }}">Arkime</a></p>
+
+        <p><a href="{{ site.dashboards_docs_uri | relative_url }}">Dashboards</a></p>
+
+        <p><a href="{{ site.api_docs_uri | relative_url }}">API</a></p>
+
+        <p><a href="{{ site.hardening_docs_uri | relative_url }}">Hardening</a></p>
+
+        <p><a href="{{ site.hedgehog_docs_uri | relative_url }}">Hedgehog Linux</a></p>
+
+        <p><a href="{{ site.contributing_docs_uri | relative_url }}">Contribution Guide</a></p>
+      </header>
+      <section>
+      {{ content }}
+      </section>
+      <footer>
+      </footer>
+    </div>
+    <script src="{{ "/assets/js/scale.fix.js" | relative_url }}"></script>
+  </body>
+</html>
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 000000000..4cea66ec6
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,102 @@
+# Overview
+
+![Malcolm Network Diagram](./images/malcolm_network_diagram.png)
+
+Malcolm processes network traffic data in the form of packet capture (PCAP) files or Zeek logs. A [sensor](live-analysis.md#Hedgehog) (packet capture appliance) monitors network traffic mirrored to it over a SPAN port on a network switch or router, or using a network TAP device. [Zeek](https://www.zeek.org/index.html) logs and [Arkime](https://molo.ch/) sessions are generated containing important session metadata from the traffic observed, which are then securely forwarded to a Malcolm instance. Full PCAP files are optionally stored locally on the sensor device for examination later.
+
+Malcolm parses the network session data and enriches it with additional lookups and mappings including GeoIP mapping, hardware manufacturer lookups from [organizationally unique identifiers (OUI)](http://standards-oui.ieee.org/oui/oui.txt) in MAC addresses, assigning names to [network segments](host-and-subnet-mapping.md#SegmentNaming) and [hosts](host-and-subnet-mapping.md#HostNaming) based on user-defined IP address and MAC mappings, performing [TLS fingerprinting](https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967), and many others.
+
+The enriched data is stored in an [OpenSearch](https://opensearch.org/) document store in a format suitable for analysis through two intuitive interfaces: OpenSearch Dashboards, a flexible data visualization plugin with dozens of prebuilt dashboards providing an at-a-glance overview of network protocols; and Arkime, a powerful tool for finding and identifying the network sessions comprising suspected security incidents. These tools can be accessed through a web browser from analyst workstations or for display in a security operations center (SOC). Logs can also optionally be forwarded on to another instance of Malcolm.
+
+![Malcolm Data Pipeline](./images/malcolm_data_pipeline.png)
+
+For smaller networks, use at home by network security enthusiasts, or in the field for incident response engagements, Malcolm can also easily be deployed locally on an ordinary consumer workstation or laptop. Malcolm can process local artifacts such as locally-generated Zeek logs, locally-captured PCAP files, and PCAP files collected offline without the use of a dedicated sensor appliance.
+ + +* [Quick start](quickstart.md#QuickStart) + - [Getting Malcolm](quickstart.md#GetMalcolm) + - [User interface](quickstart.md#UserInterfaceURLs) +* [Components](components.md#Components) +* [Supported Protocols](protocols.md#Protocols) +* [Development](development.md#Development) + - [Building from source](development.md#Build) + - [Pre-Packaged installation files](development.md#Packager) +* [Configuration](malcolm-preparation.md#Configuration) + - [Recommended system requirements](system-requirements.md#SystemRequirements) + - [Malcolm Configuration](malcolm-config.md#ConfigAndTuning) + + [`docker-compose.yml` parameters](malcolm-config.md#DockerComposeYml) + - [Configure authentication](authsetup.md#AuthSetup) + + [Local account management](authsetup.md#AuthBasicAccountManagement) + + [Lightweight Directory Access Protocol (LDAP) authentication](authsetup.md#AuthLDAP) + * [LDAP connection security](authsetup.md#AuthLDAPSecurity) + + [TLS certificates](authsetup.md#TLSCerts) + - [Platform-specific Configuration](host-config.md#HostSystemConfig) + + [Linux host system configuration](host-config-linux.md#HostSystemConfigLinux) + + [macOS host system configuration](host-config-macos.md#HostSystemConfigMac) + + [Windows host system configuration](host-config-windows.md#HostSystemConfigWindows) +* [Running Malcolm](running.md#Running) + - [OpenSearch instances](opensearch-instances.md#OpenSearchInstance) + + [Authentication and authorization for remote OpenSearch clusters](opensearch-instances.md#OpenSearchAuth) + - [Starting Malcolm](running.md#Starting) + - [Stopping and restarting Malcolm](running.md#StopAndRestart) + - [Clearing Malcolm's data](running.md#Wipe) + - [Temporary read-only interface](running.md#ReadOnlyUI) +* [Capture file and log archive upload](upload.md#Upload) + - [Tagging](upload.md#Tagging) + - [Processing uploaded PCAPs with Zeek and Suricata](upload.md#UploadPCAPProcessors) +* [Live analysis](live-analysis.md#LiveAnalysis) + - [Using a network sensor appliance](live-analysis.md#Hedgehog) + - [Monitoring local network interfaces](live-analysis.md#LocalPCAP) + - [Manually forwarding logs from an external source](live-analysis.md#ExternalForward) +* [Arkime](arkime.md#Arkime) + - [Zeek log integration](arkime.md#ArkimeZeek) + + [Correlating Zeek logs and Arkime sessions](arkime.md#ZeekArkimeFlowCorrelation) + - [Help](arkime.md#ArkimeHelp) + - [Sessions](arkime.md#ArkimeSessions) + + [PCAP Export](arkime.md#ArkimePCAPExport) + - [SPIView](arkime.md#ArkimeSPIView) + - [SPIGraph](arkime.md#ArkimeSPIGraph) + - [Connections](arkime.md#ArkimeConnections) + - [Hunt](arkime.md#ArkimeHunt) + - [Statistics](arkime.md#ArkimeStats) + - [Settings](arkime.md#ArkimeSettings) +* [OpenSearch Dashboards](dashboards.md#Dashboards) + - [Discover](dashboards.md#Discover) + + [Screenshots](dashboards.md#DiscoverGallery) + - [Visualizations and dashboards](dashboards.md#DashboardsVisualizations) + + [Prebuilt visualizations and dashboards](dashboards.md#PrebuiltVisualizations) + * [Screenshots](dashboards.md#PrebuiltVisualizationsGallery) + + [Building your own visualizations and dashboards](dashboards.md#BuildDashboard) + * [Screenshots](dashboards.md#NewVisualizationsGallery) +* [Search Queries in Arkime and OpenSearch](queries-cheat-sheet.md#SearchCheatSheet) +* Other Malcolm features + - [Automatic file extraction and scanning](file-scanning.md#ZeekFileExtraction) + - [Automatic host and subnet name assignment](host-and-subnet-mapping.md#HostAndSubnetNaming) + + [IP/MAC address to 
hostname mapping via `host-map.txt`](host-and-subnet-mapping.md#HostNaming) + + [CIDR subnet to network segment name mapping via `cidr-map.txt`](host-and-subnet-mapping.md#SegmentNaming) + + [Defining hostname and CIDR subnet names interface](host-and-subnet-mapping.md#NameMapUI) + + [Applying mapping changes](host-and-subnet-mapping.md#ApplyMapping) + - [OpenSearch index management](index-management.md#IndexManagement) + - [Event severity scoring](severity.md#Severity) + + [Customizing event severity scoring](severity.md#SeverityConfig) + - [Zeek Intelligence Framework](zeek-intel.md#ZeekIntel) + + [STIX™ and TAXII™](zeek-intel.md#ZeekIntelSTIX) + + [MISP](zeek-intel.md#ZeekIntelMISP) + - [Anomaly Detection](anomaly-detection.md#AnomalyDetection) + - [Alerting](alerting.md#Alerting) + + [Email Sender Accounts](alerting.md#AlertingEmail) + - ["Best Guess" Fingerprinting for ICS Protocols](ics-best-guess.md#ICSBestGuess) + - [Asset Management with NetBox](netbox.md#NetBox) + - [CyberChef](cyberchef.md#CyberChef) + - [API](api.md#API) +* [Forwarding Third-Party Logs to Malcolm](third-party-logs.md#ThirdPartyLogs) +* [Malcolm installer ISO](malcolm-iso.md#ISO) + - [Installation](malcolm-iso.md#ISOInstallation) + - [Generating the ISO](malcolm-iso.md#ISOBuild) + - [Setup](malcolm-iso.md#ISOSetup) + - [Time synchronization](time-sync.md#ConfigTime) +* [Hardening](hardening.md#Hardening) + - [Compliance Exceptions](hardening.md#ComplianceExceptions) +* [Installation example using Ubuntu 22.04 LTS](ubuntu-install-example.md#InstallationExample) +* [Upgrading Malcolm](malcolm-upgrade.md#UpgradePlan) +* [Modifying or Contributing to Malcolm](contributing-guide.md#Contributing) \ No newline at end of file diff --git a/docs/alerting.md b/docs/alerting.md new file mode 100644 index 000000000..4aa5f83ab --- /dev/null +++ b/docs/alerting.md @@ -0,0 +1,37 @@ +# Alerting + +* [Alerting](#Alerting) + - [Email Sender Accounts](#AlertingEmail) + +Malcolm uses the Alerting plugins for [OpenSearch](https://github.com/opensearch-project/alerting) and [OpenSearch Dashboards](https://github.com/opensearch-project/alerting-dashboards-plugin). See [Alerting](https://opensearch.org/docs/latest/monitoring-plugins/alerting/index/) in the OpenSearch documentation for usage instructions. + +A fresh installation of Malcolm configures an example [custom webhook destination](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#create-destinations) named **Malcolm API Loopback Webhook** that directs the triggered alerts back into the [Malcolm API](api.md#API) to be reindexed as a session record with `event.dataset` set to `alerting`. The corresponding monitor **Malcolm API Loopback Monitor** is disabled by default, as you'll likely want to configure the trigger conditions to suit your needs. These examples are provided to illustrate how triggers and monitors can interact with a custom webhook to process alerts. + +## Email Sender Accounts + +When using an email account to send alerts, you must [authenticate each sender account](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#authenticate-sender-account) before you can send an email. The [`auth_setup`](authsetup.md#AuthSetup) script can be used to securely store the email account credentials: + +``` +./scripts/auth_setup + +Store administrator username/password for local Malcolm access? 
(Y/n): n + +(Re)generate self-signed certificates for HTTPS access (Y/n): n + +(Re)generate self-signed certificates for a remote log forwarder (Y/n): n + +Store username/password for primary remote OpenSearch instance? (y/N): n + +Store username/password for secondary remote OpenSearch instance? (y/N): n + +Store username/password for email alert sender account? (y/N): y + +Email account username: analyst@example.org +analyst@example.org password: +analyst@example.org password (again): +Email alert sender account variables stored: opensearch.alerting.destination.email.destination_alpha.password, opensearch.alerting.destination.email.destination_alpha.username + +(Re)generate internal passwords for NetBox (Y/n): n +``` + +This action should only be performed while Malcolm is [stopped](running.md#StopAndRestart): otherwise the credentials will not be stored correctly. \ No newline at end of file diff --git a/docs/anomaly-detection.md b/docs/anomaly-detection.md new file mode 100644 index 000000000..0123a615c --- /dev/null +++ b/docs/anomaly-detection.md @@ -0,0 +1,12 @@ +# Anomaly Detection + +Malcolm uses the Anomaly Detection plugins for [OpenSearch](https://github.com/opensearch-project/anomaly-detection) and [OpenSearch Dashboards](https://github.com/opensearch-project/anomaly-detection-dashboards-plugin) to identify anomalous log data in near real-time using the [Random Cut Forest](https://api.semanticscholar.org/CorpusID:927435) (RCF) algorithm. This can be paired with [Alerting](alerting.md#Alerting) to automatically notify when anomalies are found. See [Anomaly detection](https://opensearch.org/docs/latest/monitoring-plugins/ad/index/) in the OpenSearch documentation for usage instructions on how to create detectors for any of the many fields Malcolm supports. + +A fresh installation of Malcolm configures [several detectors](dashboards/anomaly_detectors) for detecting anomalous network traffic: + +* **network_protocol** - Detects anomalies based on application protocol (`network.protocol`) +* **action_result_user** - Detects anomalies in action (`event.action`), result (`event.result`) and user (`related.user`) within application protocols (`network.protocol`) +* **file_mime_type** - Detects anomalies based on transferred file type (`file.mime_type`) +* **total_bytes** - Detects anomalies based on traffic size (sum of `network.bytes`) + +These detectors are disabled by default, but may be enabled for anomaly detection over streaming or [historical data](https://aws.amazon.com/about-aws/whats-new/2022/01/amazon-opensearch-service-elasticsearch-anomaly-detection/). \ No newline at end of file diff --git a/docs/api-aggregations.md b/docs/api-aggregations.md new file mode 100644 index 000000000..a201aa76e --- /dev/null +++ b/docs/api-aggregations.md @@ -0,0 +1,24 @@ +# Field Aggregations + +`GET` or `POST` - /mapi/agg/`` + +Executes an OpenSearch [bucket aggregation](https://opensearch.org/docs/latest/opensearch/bucket-agg/) query for the requested fields across all of Malcolm's indexed network traffic metadata. 
+
+Parameters:
+
+* `fieldname` (URL parameter) - the name(s) of the field(s) to be queried (comma-separated if multiple fields) (default: `event.provider`)
+* `limit` (query parameter) - the maximum number of records to return at each level of aggregation (default: 500)
+* `from` (query parameter) - the time frame ([`gte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the beginning of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: "1 day ago")
+* `to` (query parameter) - the time frame ([`lte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the end of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: "now")
+* `filter` (query parameter) - field filters formatted as a JSON dictionary
+
+The `from`, `to`, and `filter` parameters can be used to further restrict the range of documents returned. The `filter` dictionary should be formatted such that its keys are field names and its values are the values for which to filter. A field name may be prepended with a `!` to negate the filter (e.g., `{"event.provider":"zeek"}` vs. `{"!event.provider":"zeek"}`). Filtering for the value `null` implies "is not set" or "does not exist" (e.g., `{"event.dataset":null}` means "the field `event.dataset` is `null`/is not set" while `{"!event.dataset":null}` means "the field `event.dataset` is not `null`/is set").
+
+Examples of the `filter` parameter:
+
+* `{"!network.transport":"icmp"}` - `network.transport` is not `icmp`
+* `{"network.direction":["inbound","outbound"]}` - `network.direction` is either `inbound` or `outbound`
+* `{"event.provider":"zeek","event.dataset":["conn","dns"]}` - `event.provider` is `zeek` and `event.dataset` is either `conn` or `dns`
+* `{"!event.dataset":null}` - `event.dataset` is set (is not `null`)
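+
+As a concrete sketch of how these pieces fit together (here `username` and `localhost` are placeholders for your own Malcolm credentials and address), the following request aggregates `network.protocol` over the default one-day window, restricted to Zeek-sourced records:
+
+```
+$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
+    'https://localhost/mapi/agg/network.protocol' \
+    -d '{"filter":{"event.provider":"zeek"}}'
+```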
+
+See [Examples](api-examples.md#APIExamples) for more examples of `filter` and corresponding output.
\ No newline at end of file
diff --git a/docs/api-document-lookup.md b/docs/api-document-lookup.md
new file mode 100644
index 000000000..07d1ddbce
--- /dev/null
+++ b/docs/api-document-lookup.md
@@ -0,0 +1,88 @@
+# Document Lookup
+
+`GET` or `POST` - /mapi/document
+
+Executes an OpenSearch [query](https://opensearch.org/docs/latest/opensearch/bucket-agg/) for the matching documents across all of Malcolm's indexed network traffic metadata.
+
+Parameters:
+
+* `limit` (query parameter) - the maximum number of documents to return (default: 500)
+* `from` (query parameter) - the time frame ([`gte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the beginning of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: the UNIX epoch)
+* `to` (query parameter) - the time frame ([`lte`](https://opensearch.org/docs/latest/opensearch/query-dsl/term/#range)) for the end of the search based on the session's `firstPacket` field value in a format supported by the [dateparser](https://github.com/scrapinghub/dateparser) library (default: "now")
+* `filter` (query parameter) - field filters formatted as a JSON dictionary (see **Field Aggregations** for examples)
+
+**Example cURL command and output:**
+
+```
+$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
+    'https://localhost/mapi/document' \
+    -d '{"limit": 10, "filter":{"zeek.uid":"CYeji2z7CKmPRGyga"}}'
+```
+
+```json
+{
+    "filter": {
+        "zeek.uid": "CYeji2z7CKmPRGyga"
+    },
+    "range": [
+        0,
+        1643056677
+    ],
+    "results": [
+        {
+            "_id": "220124-CYeji2z7CKmPRGyga-http-7677",
+            "_index": "arkime_sessions3-220124",
+            "_score": 0.0,
+            "_source": {
+                "@timestamp": "2022-01-24T20:31:01.846Z",
+                "@version": "1",
+                "agent": {
+                    "hostname": "filebeat",
+                    "id": "bc25716b-8fe7-4de6-a357-65c7d3c15c33",
+                    "name": "filebeat",
+                    "type": "filebeat",
+                    "version": "7.10.2"
+                },
+                "client": {
+                    "bytes": 0
+                },
+                "destination": {
+                    "as": {
+                        "full": "AS54113 Fastly"
+                    },
+                    "geo": {
+                        "city_name": "Seattle",
+                        "continent_code": "NA",
+                        "country_code2": "US",
+                        "country_code3": "US",
+                        "country_iso_code": "US",
+                        "country_name": "United States",
+                        "dma_code": 819,
+                        "ip": "151.101.54.132",
+                        "latitude": 47.6092,
+                        "location": {
+                            "lat": 47.6092,
+                            "lon": -122.3314
+                        },
+                        "longitude": -122.3314,
+                        "postal_code": "98111",
+                        "region_code": "WA",
+                        "region_name": "Washington",
+                        "timezone": "America/Los_Angeles"
+                    },
+                    "ip": "151.101.54.132",
+                    "port": 80
+                },
+                "ecs": {
+                    "version": "1.6.0"
+                },
+                "event": {
+                    "action": [
+                        "GET"
+                    ],
+                    "category": [
+                        "web",
+                        "network"
+                    ],
+…
+```
\ No newline at end of file
diff --git a/docs/api-event-logging.md b/docs/api-event-logging.md
new file mode 100644
index 000000000..e4391e3d5
--- /dev/null
+++ b/docs/api-event-logging.md
@@ -0,0 +1,67 @@
+# Event Logging
+
+`POST` - /mapi/event
+
+A webhook that accepts alert data to be reindexed into OpenSearch as session records for viewing in Malcolm's [dashboards](dashboards.md#Dashboards). See [Alerting](alerting.md#Alerting) for more details and an example of how this API is used.
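+
+One way to exercise this webhook by hand is to `POST` a payload to it with `curl`; in this sketch, `username` and `localhost` are placeholders for your own instance, and `alert.json` is a hypothetical file containing a JSON document like the **Example input** shown next:
+
+```
+$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \
+    'https://localhost/mapi/event' \
+    -d @alert.json
+```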
+ +**Example input:** + +```json +{ + "alert": { + "monitor": { + "name": "Malcolm API Loopback Monitor" + }, + "trigger": { + "name": "Malcolm API Loopback Trigger", + "severity": 4 + }, + "period": { + "start": "2022-03-08T18:03:30.576Z", + "end": "2022-03-08T18:04:30.576Z" + }, + "results": [ + { + "_shards": { + "total": 5, + "failed": 0, + "successful": 5, + "skipped": 0 + }, + "hits": { + "hits": [], + "total": { + "value": 697, + "relation": "eq" + }, + "max_score": null + }, + "took": 1, + "timed_out": false + } + ], + "body": "", + "alert": "PLauan8BaL6eY1yCu9Xj", + "error": "" + } +} +``` + +**Example output:** + +```json +{ + "_index": "arkime_sessions3-220308", + "_type": "_doc", + "_id": "220308-PLauan8BaL6eY1yCu9Xj", + "_version": 4, + "result": "updated", + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "_seq_no": 9045, + "_primary_term": 1 +} +``` \ No newline at end of file diff --git a/docs/api-examples.md b/docs/api-examples.md new file mode 100644 index 000000000..f060d8c11 --- /dev/null +++ b/docs/api-examples.md @@ -0,0 +1,1479 @@ +# Examples + +Some security-related API examples: + +* [Protocols](#Protocols) +* [Software](#Software) +* [User agent](#UserAgent) +* [External traffic (outbound/inbound)](#ExternalTraffic) +* [Cross-segment traffic](#CrossSegmentTraffic) +* [Plaintext password](#PlaintextPassword) +* [Insecure/outdated protocols](#InsecureProtocol) +* [Notice categories](#NoticeCategories) +* [Severity tags](#SeverityTags) + +## Protocols + +``` +/mapi/agg/network.type,network.transport,network.protocol,network.protocol_version +``` + +```json +{ + "fields": [ + "network.type", + "network.transport", + "network.protocol", + "network.protocol_version" + ], + "filter": null, + "range": [ + 1970, + 1643067256 + ], + "urls": [ + "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 442240, + "key": "ipv4", + "values": { + "buckets": [ + { + "doc_count": 279538, + "key": "udp", + "values": { + "buckets": [ + { + "doc_count": 266527, + "key": "bacnet", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 12365, + "key": "dns", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 78, + "key": "dhcp", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 44, + "key": "ntp", + "values": { + "buckets": [ + { + "doc_count": 22, + "key": "4" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 3, + "key": "enip", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "krb", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "syslog", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 30824, + "key": "tcp", + "values": { + "buckets": [ + { + "doc_count": 7097, + "key": "smb", + "values": { + "buckets": [ + { + "doc_count": 4244, + "key": "1" + }, + { + "doc_count": 1438, + "key": "2" + } + ], + "doc_count_error_upper_bound": 0, + 
"sum_other_doc_count": 0 + } + }, + { + "doc_count": 1792, + "key": "http", + "values": { + "buckets": [ + { + "doc_count": 829, + "key": "1.0" + }, + { + "doc_count": 230, + "key": "1.1" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1280, + "key": "dce_rpc", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 857, + "key": "s7comm", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 426, + "key": "ntlm", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 378, + "key": "gssapi", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 146, + "key": "tds", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 125, + "key": "ssl", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 91, + "key": "tls", + "values": { + "buckets": [ + { + "doc_count": 48, + "key": "TLSv13" + }, + { + "doc_count": 28, + "key": "TLSv12" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 29, + "key": "ssh", + "values": { + "buckets": [ + { + "doc_count": 18, + "key": "2" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 26, + "key": "modbus", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 17, + "key": "iso_cotp", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 8, + "key": "enip", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 6, + "key": "rdp", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 4, + "key": "ftp", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 4, + "key": "krb", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 4, + "key": "rfb", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 3, + "key": "ldap", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "telnet", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 848, + "key": "icmp", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1573, + "key": "ipv6", + "values": { + "buckets": [ + { + "doc_count": 1486, + "key": "udp", + "values": { + "buckets": [ + { + "doc_count": 1433, + "key": "dns", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 80, + "key": "icmp", + "values": { 
+ "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## Software + +``` +/mapi/agg/zeek.software.name,zeek.software.unparsed_version +``` + +```json +{ + "fields": [ + "zeek.software.name", + "zeek.software.unparsed_version" + ], + "filter": null, + "range": [ + 1970, + 1643067759 + ], + "urls": [ + "/dashboards/app/dashboards#/view/87d990cc-9e0b-41e5-b8fe-b10ae1da0c85?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 6, + "key": "Chrome", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.36 Safari/525.19" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 6, + "key": "Nmap-SSH", + "values": { + "buckets": [ + { + "doc_count": 3, + "key": "Nmap-SSH1-Hostkey" + }, + { + "doc_count": 3, + "key": "Nmap-SSH2-Hostkey" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 5, + "key": "MSIE", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" + }, + { + "doc_count": 1, + "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E)" + }, + { + "doc_count": 1, + "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 4, + "key": "Firefox", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:34.0) Gecko/20100101 Firefox/34.0" + }, + { + "doc_count": 1, + "key": "Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 3, + "key": "ECS (sec", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "ECS (sec/96EE)" + }, + { + "doc_count": 1, + "key": "ECS (sec/97A6)" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 3, + "key": "NmapNSE", + "values": { + "buckets": [ + { + "doc_count": 3, + "key": "NmapNSE_1.0" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)" + } + 
], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "Microsoft-Windows", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Microsoft-Windows/6.1 UPnP/1.0 Windows-Media-Player-DMS/12.0.7601.17514 DLNADOC/1.50" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "Microsoft-Windows-NT", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Microsoft-Windows-NT/5.1 UPnP/1.0 UPnP-Device-Host/1.0 Microsoft-HTTPAPI/2.0" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "SimpleHTTP", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "SimpleHTTP/0.6 Python/2.7.17" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "Windows-Media-Player-DMS", + "values": { + "buckets": [ + { + "doc_count": 2, + "key": "Windows-Media-Player-DMS/12.0.7601.17514" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "A-B WWW", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "A-B WWW/0.1" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "CONF-CTR-NAE1", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "CONF-CTR-NAE1" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "ClearSCADA", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "ClearSCADA/6.72.4644.1" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "GoAhead-Webs", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "GoAhead-Webs" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "MSFT", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "MSFT 5.0" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Microsoft-IIS", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Microsoft-IIS/7.5" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Microsoft-WebDAV-MiniRedir", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Microsoft-WebDAV-MiniRedir/6.1.7601" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Python-urllib", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Python-urllib/2.7" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Schneider-WEB/V", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Schneider-WEB/V2.1.4" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Version", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Version_1.0" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "nginx", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "nginx" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "sublime-license-check", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "sublime-license-check/3.0" + } + ], + "doc_count_error_upper_bound": 0, + 
"sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## User agent + +``` +/mapi/agg/user_agent.original +``` + +```json +{ + "fields": [ + "user_agent.original" + ], + "filter": null, + "range": [ + 1970, + 1643067845 + ], + "values": { + "buckets": [ + { + "doc_count": 230, + "key": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" + }, + { + "doc_count": 142, + "key": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" + }, + { + "doc_count": 114, + "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" + }, + { + "doc_count": 50, + "key": "Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)" + }, + { + "doc_count": 48, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 43, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 33, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:34.0) Gecko/20100101 Firefox/34.0" + }, + { + "doc_count": 17, + "key": "Python-urllib/2.7" + }, + { + "doc_count": 12, + "key": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" + }, + { + "doc_count": 9, + "key": "Microsoft-Windows/6.1 UPnP/1.0 Windows-Media-Player-DMS/12.0.7601.17514 DLNADOC/1.50" + }, + { + "doc_count": 9, + "key": "Windows-Media-Player-DMS/12.0.7601.17514" + }, + { + "doc_count": 8, + "key": "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko" + }, + { + "doc_count": 5, + "key": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 5, + "key": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.36 Safari/525.19" + }, + { + "doc_count": 3, + "key": "Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0" + }, + { + "doc_count": 2, + "key": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" + }, + { + "doc_count": 1, + "key": "Microsoft-WebDAV-MiniRedir/6.1.7601" + }, + { + "doc_count": 1, + "key": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E)" + }, + { + "doc_count": 1, + "key": "sublime-license-check/3.0" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## External traffic (outbound/inbound) + +``` +$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ + 'https://localhost/mapi/agg/network.protocol' \ + -d '{"filter":{"network.direction":["inbound","outbound"]}}' +``` + +```json +{ + "fields": [ + "network.protocol" + ], + "filter": { + "network.direction": [ + "inbound", + "outbound" + ] + }, + "range": [ + 1970, + 1643068000 + ], + "urls": [ + "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 202597, + "key": "bacnet" + }, + { + "doc_count": 129, + "key": "tls" + }, + { + "doc_count": 128, + "key": "ssl" + }, + { + "doc_count": 33, + "key": "http" + }, + { + "doc_count": 33, + "key": "ntp" + }, + { + "doc_count": 20, + "key": "dns" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## Cross-segment traffic + 
+``` +$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ + 'https://localhost/mapi/agg/source.segment,destination.segment,network.protocol' \ + -d '{"filter":{"tags":"cross_segment"}}' +``` + +```json +{ + "fields": [ + "source.segment", + "destination.segment", + "network.protocol" + ], + "filter": { + "tags": "cross_segment" + }, + "range": [ + 1970, + 1643068080 + ], + "urls": [ + "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 6893, + "key": "Corporate", + "values": { + "buckets": [ + { + "doc_count": 6893, + "key": "OT", + "values": { + "buckets": [ + { + "doc_count": 891, + "key": "enip" + }, + { + "doc_count": 889, + "key": "cip" + }, + { + "doc_count": 202, + "key": "http" + }, + { + "doc_count": 146, + "key": "modbus" + }, + { + "doc_count": 1, + "key": "ftp" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 189, + "key": "OT", + "values": { + "buckets": [ + { + "doc_count": 138, + "key": "Corporate", + "values": { + "buckets": [ + { + "doc_count": 128, + "key": "http" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 51, + "key": "DMZ", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 28, + "key": "Battery Network", + "values": { + "buckets": [ + { + "doc_count": 25, + "key": "Combined Cycle BOP", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 3, + "key": "Solar Panel Network", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 20, + "key": "Combined Cycle BOP", + "values": { + "buckets": [ + { + "doc_count": 11, + "key": "Battery Network", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 9, + "key": "Solar Panel Network", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Solar Panel Network", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Combined Cycle BOP", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## Plaintext password + +``` +$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ + 'https://localhost/mapi/agg/network.protocol' \ + -d '{"filter":{"!related.password":null}}' +``` + +```json +{ + "fields": [ + "network.protocol" + ], + "filter": { + "!related.password": null + }, + "range": [ + 1970, + 1643068162 + ], + "urls": [ + "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 20, + "key": "http" + } + ], + 
"doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## Insecure/outdated protocols + +``` +$ curl -k -u username -L -XPOST -H 'Content-Type: application/json' \ + 'https://localhost/mapi/agg/network.protocol,network.protocol_version' \ + -d '{"filter":{"event.severity_tags":"Insecure or outdated protocol"}}' +``` + +```json +{ + "fields": [ + "network.protocol", + "network.protocol_version" + ], + "filter": { + "event.severity_tags": "Insecure or outdated protocol" + }, + "range": [ + 1970, + 1643068248 + ], + "urls": [ + "/dashboards/app/dashboards#/view/abdd7550-2c7c-40dc-947e-f6d186a158c4?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 4244, + "key": "smb", + "values": { + "buckets": [ + { + "doc_count": 4244, + "key": "1" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "ftp", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "rdp", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "5.1" + }, + { + "doc_count": 1, + "key": "5.2" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 2, + "key": "telnet", + "values": { + "buckets": [], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## Notice categories + +``` +/mapi/agg/zeek.notice.category,zeek.notice.sub_category +``` + +```json +{ + "fields": [ + "zeek.notice.category", + "zeek.notice.sub_category" + ], + "filter": null, + "range": [ + 1970, + 1643068300 + ], + "urls": [ + "/dashboards/app/dashboards#/view/f1f09567-fc7f-450b-a341-19d2f2bb468b?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))", + "/dashboards/app/dashboards#/view/95479950-41f2-11ea-88fa-7151df485405?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 100, + "key": "ATTACK", + "values": { + "buckets": [ + { + "doc_count": 42, + "key": "Lateral_Movement_Extracted_File" + }, + { + "doc_count": 30, + "key": "Lateral_Movement" + }, + { + "doc_count": 17, + "key": "Discovery" + }, + { + "doc_count": 5, + "key": "Execution" + }, + { + "doc_count": 5, + "key": "Lateral_Movement_Multiple_Attempts" + }, + { + "doc_count": 1, + "key": "Lateral_Movement_and_Execution" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 14, + "key": "EternalSafety", + "values": { + "buckets": [ + { + "doc_count": 11, + "key": "EternalSynergy" + }, + { + "doc_count": 3, + "key": "ViolationPidMid" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 6, + "key": "Scan", + "values": { + "buckets": [ + { + "doc_count": 6, + "key": "Port_Scan" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + }, + { + "doc_count": 1, + "key": "Ripple20", + "values": { + "buckets": [ + { + "doc_count": 1, + "key": "Treck_TCP_observed" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` + +## Severity tags + +``` +/mapi/agg/event.severity_tags +``` + +```json +{ + "fields": [ + "event.severity_tags" + ], + "filter": null, + "range": 
[ + 1970, + 1643068363 + ], + "urls": [ + "/dashboards/app/dashboards#/view/d2dd0180-06b1-11ec-8c6b-353266ade330?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))", + "/dashboards/app/dashboards#/view/95479950-41f2-11ea-88fa-7151df485405?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'1970-01-01T00:32:50Z',to:now))" + ], + "values": { + "buckets": [ + { + "doc_count": 160180, + "key": "Outbound traffic" + }, + { + "doc_count": 43059, + "key": "Inbound traffic" + }, + { + "doc_count": 11091, + "key": "Connection attempt rejected" + }, + { + "doc_count": 8967, + "key": "Connection attempt, no reply" + }, + { + "doc_count": 7131, + "key": "Cross-segment traffic" + }, + { + "doc_count": 4250, + "key": "Insecure or outdated protocol" + }, + { + "doc_count": 2219, + "key": "External traffic" + }, + { + "doc_count": 1985, + "key": "Sensitive country" + }, + { + "doc_count": 760, + "key": "Weird" + }, + { + "doc_count": 537, + "key": "Connection aborted (originator)" + }, + { + "doc_count": 474, + "key": "Connection aborted (responder)" + }, + { + "doc_count": 206, + "key": "File transfer (high concern)" + }, + { + "doc_count": 100, + "key": "MITRE ATT&CK framework technique" + }, + { + "doc_count": 66, + "key": "Service on non-standard port" + }, + { + "doc_count": 64, + "key": "Signature (capa)" + }, + { + "doc_count": 30, + "key": "Signature (YARA)" + }, + { + "doc_count": 25, + "key": "Signature (ClamAV)" + }, + { + "doc_count": 20, + "key": "Cleartext password" + }, + { + "doc_count": 19, + "key": "Long connection" + }, + { + "doc_count": 15, + "key": "Notice (vulnerability)" + }, + { + "doc_count": 13, + "key": "File transfer (medium concern)" + }, + { + "doc_count": 6, + "key": "Notice (scan)" + }, + { + "doc_count": 1, + "key": "High volume connection" + } + ], + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0 + } +} +``` diff --git a/docs/api-fields.md b/docs/api-fields.md new file mode 100644 index 000000000..530c08a89 --- /dev/null +++ b/docs/api-fields.md @@ -0,0 +1,26 @@ +# Fields + +`GET` - /mapi/fields + +Returns the (very long) list of fields known to Malcolm, comprised of data from Arkime's [`fields` table](https://arkime.com/apiv3#fields-api), the Malcolm [OpenSearch template](./dashboards/templates/malcolm_template.json) and the OpenSearch Dashboards index pattern API. + +**Example output:** + +```json +{ + "fields": { + "@timestamp": { + "type": "date" + }, +… + "zeek.x509.san_uri": { + "description": "Subject Alternative Name URI", + "type": "string" + }, + "zeek.x509.san_uri.text": { + "type": "string" + } + }, + "total": 2005 +} +``` diff --git a/docs/api-indices.md b/docs/api-indices.md new file mode 100644 index 000000000..2003b7880 --- /dev/null +++ b/docs/api-indices.md @@ -0,0 +1,28 @@ +# Indices + +`GET` - /mapi/indices + +Lists [information related to the underlying OpenSearch indices](https://opensearch.org/docs/latest/opensearch/rest-api/cat/cat-indices/), similar to Arkime's [esindices](https://arkime.com/apiv3#esindices-api) API. 
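+
+For reference, the request itself is a simple authenticated `GET` (with `username` and `localhost` as placeholders for your own instance):
+
+```
+$ curl -k -u username -L 'https://localhost/mapi/indices'
+```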
+
+**Example output:**
+
+```json
+{
+    "indices": [
+…
+        {
+            "docs.count": "2268613",
+            "docs.deleted": "0",
+            "health": "green",
+            "index": "arkime_sessions3-210301",
+            "pri": "1",
+            "pri.store.size": "1.8gb",
+            "rep": "0",
+            "status": "open",
+            "store.size": "1.8gb",
+            "uuid": "w-4Q0ofBTdWO9KqeIIAAWg"
+        },
+…
+    ]
+}
+```
diff --git a/docs/api-ping.md b/docs/api-ping.md
new file mode 100644
index 000000000..75dc67070
--- /dev/null
+++ b/docs/api-ping.md
@@ -0,0 +1,11 @@
+# Ping
+
+`GET` - /mapi/ping
+
+Returns `pong` (for a simple "up" check).
+
+**Example output:**
+
+```
+{"ping":"pong"}
+```
\ No newline at end of file
diff --git a/docs/api-version.md b/docs/api-version.md
new file mode 100644
index 000000000..123648891
--- /dev/null
+++ b/docs/api-version.md
@@ -0,0 +1,49 @@
+# Version Information
+
+`GET` - /mapi/version
+
+Returns version information about Malcolm and version/[health](https://opensearch.org/docs/latest/opensearch/rest-api/cluster-health/) information about the underlying OpenSearch instance.
+
+**Example output:**
+
+```json
+{
+    "built": "2022-01-18T16:10:39Z",
+    "opensearch": {
+        "cluster_name": "docker-cluster",
+        "cluster_uuid": "TcSiEaOgTdO_l1IivYz2gA",
+        "name": "opensearch",
+        "tagline": "The OpenSearch Project: https://opensearch.org/",
+        "version": {
+            "build_date": "2021-12-21T01:36:21.407473Z",
+            "build_hash": "8a529d77c7432bc45b005ac1c4ba3b2741b57d4a",
+            "build_snapshot": false,
+            "build_type": "tar",
+            "lucene_version": "8.10.1",
+            "minimum_index_compatibility_version": "6.0.0-beta1",
+            "minimum_wire_compatibility_version": "6.8.0",
+            "number": "7.10.2"
+        }
+    },
+    "opensearch_health": {
+        "active_primary_shards": 29,
+        "active_shards": 29,
+        "active_shards_percent_as_number": 82.85714285714286,
+        "cluster_name": "docker-cluster",
+        "delayed_unassigned_shards": 0,
+        "discovered_master": true,
+        "initializing_shards": 0,
+        "number_of_data_nodes": 1,
+        "number_of_in_flight_fetch": 0,
+        "number_of_nodes": 1,
+        "number_of_pending_tasks": 0,
+        "relocating_shards": 0,
+        "status": "yellow",
+        "task_max_waiting_in_queue_millis": 0,
+        "timed_out": false,
+        "unassigned_shards": 6
+    },
+    "sha": "8ddbbf4",
+    "version": "5.2.0"
+}
+```
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 000000000..96aca5467
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,12 @@
+# API
+
+* [Field Aggregations](api-aggregations.md)
+* [Document Lookup](api-document-lookup.md)
+* [Event Logging](api-event-logging.md)
+* [Fields](api-fields.md)
+* [Indices](api-indices.md)
+* [Ping](api-ping.md)
+* [Version](api-version.md)
+* [Examples](api-examples.md)
+
+Malcolm provides a [REST API](./api/project/__init__.py) that can be used to programmatically query some aspects of Malcolm's status and data. Malcolm's API is not to be confused with the [Viewer API](https://arkime.com/apiv3) provided by Arkime, although there may be some overlap in functionality.
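+
+As a quick smoke test of the API as a whole, the [Ping](api-ping.md) endpoint can be queried with `curl` (again, `username` and `localhost` are placeholders for your own instance):
+
+```
+$ curl -k -u username -L 'https://localhost/mapi/ping'
+{"ping":"pong"}
+```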
diff --git a/docs/arkime.md b/docs/arkime.md
new file mode 100644
index 000000000..6dfa79a0c
--- /dev/null
+++ b/docs/arkime.md
@@ -0,0 +1,188 @@
+# Arkime
+
+* [Arkime](#Arkime)
+  - [Zeek log integration](#ArkimeZeek)
+    + [Correlating Zeek logs and Arkime sessions](#ZeekArkimeFlowCorrelation)
+  - [Help](#ArkimeHelp)
+  - [Sessions](#ArkimeSessions)
+    + [PCAP Export](#ArkimePCAPExport)
+  - [SPIView](#ArkimeSPIView)
+  - [SPIGraph](#ArkimeSPIGraph)
+  - [Connections](#ArkimeConnections)
+  - [Hunt](#ArkimeHunt)
+  - [Statistics](#ArkimeStats)
+  - [Settings](#ArkimeSettings)
+
+The Arkime interface will be accessible over HTTPS on port 443 at the Docker host's IP address (e.g., [https://localhost](https://localhost) if you are connecting locally).
+
+## Zeek log integration
+
+A stock installation of Arkime extracts all of its network connection ("session") metadata ("SPI" or "Session Profile Information") from full packet capture artifacts (PCAP files). Zeek (formerly Bro) generates similar session metadata, linking network events to sessions via a connection UID. Malcolm aims to facilitate analysis of Zeek logs by mapping values from Zeek logs to the Arkime session database schema for equivalent fields, and by creating new "native" Arkime database fields for all the other Zeek log values for which there is not currently an equivalent in Arkime:
+
+![Zeek log session record](./images/screenshots/arkime_session_zeek.png)
+
+In this way, when full packet capture is an option, analysis of PCAP files can be enhanced by the additional information Zeek provides. When full packet capture is not an option, similar analysis can still be performed using the same interfaces and processes using the Zeek logs alone.
+
+A few values worth particular mention include **Data Source** (`event.provider` in OpenSearch), which can be used to distinguish among the sources of a network traffic metadata record (e.g., `zeek` for Zeek logs and `arkime` for Arkime sessions); and **Log Type** (`event.dataset` in OpenSearch), which corresponds to the kind of Zeek `.log` file from which the record was created. In other words, a search could be restricted to records from `conn.log` by searching `event.provider == zeek && event.dataset == conn`, or restricted to records from `weird.log` by searching `event.provider == zeek && event.dataset == weird`.
+
+Click the icon of the owl **🦉** in the upper-left-hand corner to access the Arkime usage documentation (accessible at [https://localhost/help](https://localhost/help) if you are connecting locally), click the **Fields** label in the navigation pane, then search for `zeek` to see a list of the other Zeek log types and fields available to Malcolm.
+
+![Zeek fields](./images/screenshots/arkime_help_fields.png)
+
+The values of records created from Zeek logs can be expanded and viewed like any native Arkime session by clicking the plus **➕** icon to the left of the record in the Sessions view. However, note that when dealing with these Zeek records the full packet contents are not available, so buttons dealing with viewing and exporting PCAP information will not behave as they would for records from PCAP files. Other than that, Zeek records and their values are usable in Malcolm just like native PCAP session records.
+
+### Correlating Zeek logs and Arkime sessions
+
+The Arkime interface displays both Zeek logs and Arkime sessions alongside each other.
+Using fields common to both data sources, one can [craft queries](queries-cheat-sheet.md#SearchCheatSheet) to filter results matching desired criteria.
+
+A few fields worth particular mention that help limit returned results to those Zeek logs and Arkime session records generated from the same network connection are [Community ID](https://github.com/corelight/community-id-spec) (`network.community_id`) and Zeek's [connection UID](https://docs.zeek.org/en/stable/examples/logs/#using-uids) (`zeek.uid`), which Malcolm maps to both Arkime's `rootId` field and the [ECS](https://www.elastic.co/guide/en/ecs/current/ecs-event.html#field-event-id) `event.id` field.
+
+Community ID is a specification for standard flow hashing [published by Corelight](https://github.com/corelight/community-id-spec) with the intent of making it easier to pivot from one dataset (e.g., Arkime sessions) to another (e.g., Zeek `conn.log` entries). In Malcolm both Arkime and [Zeek](https://github.com/corelight/zeek-community-id) populate this value, which makes it possible to filter for a specific network connection and see both data sources' results for that connection.
+
+The `rootId` field is used by Arkime to link session records together when a particular session has too many packets to be represented by a single session. When normalizing Zeek logs to Arkime's schema, Malcolm piggybacks on `rootId` to store Zeek's [connection UID](https://docs.zeek.org/en/stable/examples/logs/#using-uids) to cross-reference entries across Zeek log types. The connection UID is also stored in `zeek.uid`.
+
+Filtering on community ID OR'ed with Zeek UID (e.g., `network.community_id == "1:r7tGG//fXP1P0+BXH3zXETCtEFI=" || rootId == "CQcoro2z6adgtGlk42"`) is an effective way to see both the Arkime sessions and Zeek logs generated by a particular network connection.
+
+![Correlating Arkime sessions and Zeek logs](./images/screenshots/arkime_correlate_communityid_uid.png)
+
+## Help
+
+Click the icon of the owl 🦉 in the upper-left-hand corner to access the Arkime usage documentation (accessible at [https://localhost/help](https://localhost/help) if you are connecting locally), which includes such topics as [search syntax](https://localhost/help#search), the [Sessions view](https://localhost/help#sessions), [SPIView](https://localhost/help#spiview), [SPIGraph](https://localhost/help#spigraph), and the [Connections](https://localhost/help#connections) graph.
+
+## Sessions
+
+The **Sessions** view provides low-level details of the sessions being investigated, whether they be Arkime sessions created from PCAP files or [Zeek logs mapped](#ArkimeZeek) to the Arkime session database schema.
+
+![Arkime's Sessions view](./images/screenshots/arkime_sessions.png)
+
+The **Sessions** view contains many controls for filtering the sessions displayed from all sessions down to sessions of interest:
+
+* [search bar](https://localhost/help#search): Indicated by the magnifying glass **🔍** icon, the search bar allows defining filters on session/log metadata
+* [time bounding](https://localhost/help#timebounding) controls: The **🕘**, **Start**, **End**, **Bounding**, and **Interval** fields, and the **date histogram** can be used to visually zoom and pan the time range being examined.
+* search button: The **Search** button re-runs the sessions query with the filters currently specified.
+* views button: Indicated by the eyeball **👁** icon, views allow overlaying additional previously-specified filters onto the current sessions filters.
For convenience, Malcolm provides several preconfigured Arkime views, including views that filter on the `event.dataset` field.
+
+![Malcolm views](./images/screenshots/arkime_apply_view.png)
+
+* map: A global map can be expanded by clicking the globe **🌎** icon. This allows filtering sessions by IP-based geolocation when possible.
+
+Some of these filter controls are also available on other Arkime pages (such as SPIView, SPIGraph, Connections, and Hunt).
+
+The number of sessions displayed per page, as well as the page currently displayed, can be specified using the paging controls underneath the time bounding controls.
+
+The sessions table is displayed below the filter controls. This table contains the sessions/logs matching the specified filters.
+
+To the left of the column headers are two buttons. The **Toggle visible columns** button, indicated by a grid **⊞** icon, allows toggling which columns are displayed in the sessions table. The **Save or load custom column configuration** button, indicated by a columns **◫** icon, allows saving the currently displayed columns or loading previously-saved configurations. This is useful for customizing which columns are displayed when investigating different types of traffic. Column headers can also be clicked to sort the results in the table, and column widths may be adjusted by dragging the separators between column headers.
+
+Details for individual sessions/logs can be expanded by clicking the plus **➕** icon on the left of each row. Each row may contain multiple sections and controls, depending on whether the row represents an Arkime session or a [Zeek log](#ArkimeZeek). Clicking the field names and values in the details sections allows additional filters to be specified or summary lists of unique values to be exported.
+
+When viewing Arkime session details (i.e., a session generated from a PCAP file), an additional packets section will be visible underneath the metadata sections. When the details of a session of this type are expanded, Arkime will read the packet(s) comprising the session for display here. Various controls can be used to adjust how the packet is displayed (enabling **natural** decoding and enabling **Show Images & Files** may produce visually pleasing results), and other options (including PCAP download, carving images and files, applying decoding filters, and examining payloads in [CyberChef](https://github.com/gchq/CyberChef)) are available.
+
+See also Arkime's usage documentation for more information on the [Sessions view](https://localhost/help#sessions).
+
+### PCAP Export
+
+Clicking the down arrow **▼** icon to the far right of the search bar presents a list of actions including **PCAP Export** (see Arkime's [sessions help](https://localhost/help#sessions) for information on the other actions). When full PCAP sessions are displayed, the **PCAP Export** feature allows you to create a new PCAP file from the matching Arkime sessions, including controls for which sessions are included (open items, visible items, or all matching items) and whether or not to include linked segments. Click the **Export PCAP** button to generate the PCAP, after which you'll be presented with a browser download dialog to save or open the file. Note that depending on the scope of the filters specified this might take a long time (or possibly even time out).
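+
+Because **PCAP Export** only makes sense for sessions backed by full packet capture, it can be helpful to narrow the session list to Arkime-generated records before exporting. A couple of illustrative search bar expressions, built from the fields described in [Zeek log integration](#ArkimeZeek):
+
+```
+# Only full-PCAP Arkime sessions (candidates for PCAP Export)
+event.provider == arkime
+
+# Only Zeek conn.log records (no packet data available)
+event.provider == zeek && event.dataset == conn
+```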
+
+![Export PCAP](./images/screenshots/arkime_export_pcap.png)
+
+## SPIView
+
+Arkime's **SPI** (**S**ession **P**rofile **I**nformation) **View** provides a quick and easy-to-use interface for exploring session/log metrics. The SPIView page lists categories for general session metrics (e.g., protocol, source and destination IP addresses, source and destination ports, etc.) as well as for all of the various types of network traffic understood by Malcolm. These categories can be expanded and the top *n* values displayed, along with each value's cardinality, for the fields of interest they contain.
+
+![Arkime's SPIView](./images/screenshots/arkime_spiview.png)
+
+Click the plus **➕** icon to the right of a category to expand it. The values for specific fields are displayed by clicking the field description in the field list underneath the category name. The list of field names can be filtered by typing part of the field name in the *Search for fields to display in this category* text input. The **Load All** and **Unload All** buttons can be used to toggle display of all of the fields belonging to that category. Once displayed, a field's name or one of its values may be clicked to provide further actions for filtering or displaying that field or its values. Of particular interest may be the **Open [fieldname] SPI Graph** option when clicking on a field's name. This will open a new tab with the SPI Graph ([see below](#ArkimeSPIGraph)) populated with the field's top values.
+
+Note that because the SPIView page can potentially run many queries, SPIView limits the search domain to seven days (in other words, seven indices, as each index represents one day's worth of data). When using SPIView, you will have best results if you limit your search time frame to less than or equal to seven days. This limit can be adjusted by editing the `spiDataMaxIndices` setting in [config.ini](./etc/arkime/config.ini) and rebuilding the `malcolmnetsec/arkime` docker container.
+
+See also Arkime's usage documentation for more information on [SPIView](https://localhost/help#spiview).
+
+## SPIGraph
+
+Arkime's **SPI** (**S**ession **P**rofile **I**nformation) **Graph** visualizes the occurrence of some field's top *n* values over time, and (optionally) geographically. This is particularly useful for identifying trends in a particular type of communication over time: traffic using a particular protocol when seen sparsely at regular intervals on that protocol's date histogram in the SPIGraph may indicate a connection check, polling, or beaconing (for example, see the `llmnr` protocol in the screenshot below).
+
+![Arkime's SPIGraph](./images/screenshots/arkime_spigraph.png)
+
+Controls can be found underneath the time bounding controls for selecting the field of interest, the number of elements to be displayed, the sort order, and a periodic refresh of the data.
+
+See also Arkime's usage documentation for more information on [SPIGraph](https://localhost/help#spigraph).
+
+## Connections
+
+The **Connections** page presents network communications via a force-directed graph, making it easy to visualize logical relationships between network hosts.
+
+![Arkime's Connections graph](./images/screenshots/arkime_connections.png)
+
+Controls are available for specifying the query size (where smaller values will execute more quickly but may only contain an incomplete representation of the top *n* sessions, and larger values may take longer to execute but will be more complete), which fields to use as the source and destination for node values, a minimum connections threshold, and the method for determining the "weight" of the link between two nodes. As is the case with most other visualizations in Arkime, the graph is interactive: clicking on a node or the link between two nodes can be used to modify query filters, and the nodes themselves may be repositioned by dragging and dropping them. A node's color indicates whether it communicated as a source/originator, a destination/responder, or both.
+
+While the default source and destination fields are *Src IP* and *Dst IP:Dst Port*, the Connections view is able to use any combination of fields. For example:
+
+* *Src OUI* and *Dst OUI* (hardware manufacturers)
+* *Src IP* and *Protocols*
+* *Originating Network Segment* and *Responding Network Segment* (see [CIDR subnet to network segment name mapping](host-and-subnet-mapping.md#SegmentNaming))
+* *Originating GeoIP City* and *Responding GeoIP City*
+
+or any other combination of these or other fields.
+
+See also Arkime's usage documentation for more information on the [Connections graph](https://localhost/help#connections).
+
+## Hunt
+
+Arkime's **Hunt** feature allows an analyst to search within the packets themselves (including payload data) rather than simply searching the session metadata. The search string may be specified using ASCII (with or without case sensitivity), hex codes, or regular expressions. Once a hunt job is complete, matching sessions can be viewed in the [Sessions](#ArkimeSessions) view.
+
+Clicking the **Create a packet search job** button on the Hunt page will allow you to specify the following parameters for a new hunt job:
+
+* a packet search job **name**
+* a **maximum number of packets** to examine per session
+* the **search string** and its format (*ascii*, *ascii (case sensitive)*, *hex*, *regex*, or *hex regex*)
+* whether to search **source packets**, **destination packets**, or both
+* whether to search **raw** or **reassembled** packets
+
+Click the **➕ Create** button to begin the search. Arkime will scan the source PCAP files from which the sessions were created according to the search criteria. Note that whatever filters are specified when the hunt job is created will apply to the hunt job as well; the number of sessions matching the current filters will be displayed above the hunt job parameters with text like "ⓘ Creating a new packet search job will search the packets of # sessions."
+
+![Hunt creation](./images/screenshots/arkime_hunt_creation.png)
+
+Once a hunt job is submitted, it will be assigned a unique hunt ID (a long unique string of characters like `yuBHAGsBdljYmwGkbEMm`) and its progress will be updated periodically in the **Hunt Job Queue** with the execution percent complete, the number of matches found so far, and the other parameters with which the job was submitted. More details for the hunt job can be viewed by expanding its row with the plus **➕** icon on the left.
+
+![Hunt completed](./images/screenshots/arkime_hunt_finished.png)
+
+Once the hunt job is complete (and a minute or so has passed, as the `huntId` must be added to the matching session records in the database), click the folder **📂** icon on the right side of the hunt job row to open a new [Sessions](#ArkimeSessions) tab with the search bar prepopulated to filter to sessions with packets matching the search criteria.
+
+![Hunt result sessions](./images/screenshots/arkime_hunt_sessions.png)
+
+From this list of filtered sessions you can expand session details and explore packet payloads which matched the hunt search criteria.
+
+The hunt feature is available only for sessions created from full packet capture data, not Zeek logs. This being the case, it is a good idea to click the eyeball **👁** icon and select the **Arkime Sessions** view to exclude Zeek logs from candidate sessions prior to using the hunt feature.
+
+See also Arkime's usage documentation for more information on the [hunt feature](https://localhost/help#hunt).
+
+## Statistics
+
+Arkime provides several other reports which show information about the state of Arkime and the underlying OpenSearch database.
+
+The **Files** list displays a list of PCAP files processed by Arkime, the date and time of the earliest packet in each file, and the file size:
+
+![Arkime's Files list](./images/screenshots/arkime_files.png)
+
+The **ES Indices** list (available under the **Stats** page) lists the OpenSearch indices within which log data is contained:
+
+![Arkime's ES indices list](./images/screenshots/arkime_es_stats.png)
+
+The **History** view provides a historical list of queries issued to Arkime and the details of those queries:
+
+![Arkime's History view](./images/screenshots/arkime_history.png)
+
+See also Arkime's usage documentation for more information on the [Files list](https://localhost/help#files), [statistics](https://localhost/help#stats), and [history](https://localhost/help#history).
+
+## Settings
+
+### General settings
+
+The **Settings** page can be used to tweak Arkime preferences, define additional custom views and column configurations, change the color theme, and more.
+
+See Arkime's usage documentation for more information on [settings](https://localhost/help#settings).
+
+![Arkime general settings](./images/screenshots/arkime_general_settings.png)
+
+![Arkime custom view management](./images/screenshots/arkime_view_settings.png)
\ No newline at end of file
diff --git a/docs/authsetup.md b/docs/authsetup.md
new file mode 100644
index 000000000..4a0fe6a10
--- /dev/null
+++ b/docs/authsetup.md
@@ -0,0 +1,105 @@
+# Configure authentication
+
+* [Configure authentication](#AuthSetup)
+    - [Local account management](#AuthBasicAccountManagement)
+    - [Lightweight Directory Access Protocol (LDAP) authentication](#AuthLDAP)
+        + [LDAP connection security](#AuthLDAPSecurity)
+    - [TLS certificates](#TLSCerts)
+
+Malcolm requires authentication to access the [user interface](quickstart.md#UserInterfaceURLs). [Nginx](https://nginx.org/) can authenticate users with either local TLS-encrypted HTTP basic authentication or a remote Lightweight Directory Access Protocol (LDAP) authentication server.
+
+With the local basic authentication method, user accounts are managed by Malcolm and can be created, modified, and deleted using a [user management web interface](#AuthBasicAccountManagement). This method is suitable in instances where accounts and credentials do not need to be synced across many Malcolm installations.
+
+With LDAP authentication, user accounts are managed on a remote directory service, such as [Microsoft Active Directory Domain Services](https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/get-started/virtual-dc/active-directory-domain-services-overview) or [OpenLDAP](https://www.openldap.org/).
+
+Malcolm's authentication method is defined in the `x-auth-variables` section near the top of the [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) file with the `NGINX_BASIC_AUTH` environment variable: `true` for local TLS-encrypted HTTP basic authentication, `false` for LDAP authentication.
+
+In either case, you **must** run `./scripts/auth_setup` before starting Malcolm for the first time in order to:
+
+* define the local Malcolm administrator account username and password (although these credentials will only be used for basic authentication, not LDAP authentication)
+* specify whether or not to (re)generate the self-signed certificates used for HTTPS access
+    * key and certificate files are located in the `nginx/certs/` directory
+* specify whether or not to (re)generate the self-signed certificates used by a remote log forwarder (see the `BEATS_SSL` environment variable)
+    * certificate authority, certificate, and key files for Malcolm's Logstash instance are located in the `logstash/certs/` directory
+    * certificate authority, certificate, and key files to be copied to and used by the remote log forwarder are located in the `filebeat/certs/` directory; if using [Hedgehog Linux](live-analysis.md#Hedgehog), these certificates should be copied to the `/opt/sensor/sensor_ctl/logstash-client-certificates` directory on the sensor
+* specify whether or not to [store the username/password](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#authenticate-sender-account) for [email alert senders](https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#create-destinations)
+    * these parameters are stored securely in the OpenSearch keystore file `opensearch/opensearch.keystore`
+
+## Local account management
+
+[`auth_setup`](#AuthSetup) is used to define the username and password for the administrator account. Once Malcolm is running, the administrator account can be used to manage other user accounts via a **Malcolm User Management** page served over HTTPS on port 488 (e.g., [https://localhost:488](https://localhost:488) if you are connecting locally).
+
+Malcolm user accounts can be used to access the [interfaces](quickstart.md#UserInterfaceURLs) of all of its [components](components.md#Components), including Arkime. Arkime uses its own internal database of user accounts, so when a Malcolm user account logs in to Arkime for the first time Malcolm creates a corresponding Arkime user account automatically. This being the case, it is *not* recommended to use the Arkime **Users** settings page or to change the password via the **Password** form under the Arkime **Settings** page, as those settings would not be consistently used across Malcolm.
+
+Users may change their passwords via the **Malcolm User Management** page by clicking **User Self Service**. A forgotten password can also be reset via an emailed link, though this requires SMTP server settings to be specified in `htadmin/config.ini` in the Malcolm installation directory.
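+
+For reference, a typical first-time credential setup, run from the Malcolm installation directory, looks like the following sketch (the script prompts interactively for each of the items listed above; no arguments are required):
+
+```
+# Define the administrator username/password and (re)generate certificates
+./scripts/auth_setup
+
+# Once Malcolm is started, additional accounts can be managed via the
+# Malcolm User Management interface, e.g., https://localhost:488
+```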
+
+## Lightweight Directory Access Protocol (LDAP) authentication
+
+The [nginx-auth-ldap](https://github.com/kvspb/nginx-auth-ldap) module serves as the interface between Malcolm's [Nginx](https://nginx.org/) web server and a remote LDAP server. When you run [`auth_setup`](#AuthSetup) for the first time, a sample LDAP configuration file is created at `nginx/nginx_ldap.conf`.
+
+```
+# This is a sample configuration for the ldap_server section of nginx.conf.
+# Yours will vary depending on how your Active Directory/LDAP server is configured.
+# See https://github.com/kvspb/nginx-auth-ldap#available-config-parameters for options.
+
+ldap_server ad_server {
+  url "ldap://ds.example.com:3268/DC=ds,DC=example,DC=com?sAMAccountName?sub?(objectClass=person)";
+
+  binddn "bind_dn";
+  binddn_passwd "bind_dn_password";
+
+  group_attribute member;
+  group_attribute_is_dn on;
+  require group "CN=Malcolm,CN=Users,DC=ds,DC=example,DC=com";
+  require valid_user;
+  satisfy all;
+}
+
+auth_ldap_cache_enabled on;
+auth_ldap_cache_expiration_time 10000;
+auth_ldap_cache_size 1000;
+```
+
+This file is mounted into the `nginx` container when Malcolm is started to provide connection information for the LDAP server.
+
+The contents of `nginx_ldap.conf` will vary depending on how the LDAP server is configured. Some of the [available parameters](https://github.com/kvspb/nginx-auth-ldap#available-config-parameters) in that file include:
+
+* **`url`** - the `ldap://` or `ldaps://` connection URL for the remote LDAP server, which has the [following syntax](https://www.ietf.org/rfc/rfc2255.txt): `ldap[s]://<host>:<port>/<basedn>?<attributes>?<scope>?<filter>`
+* **`binddn`** and **`binddn_passwd`** - the account credentials used to query the LDAP directory
+* **`group_attribute`** - the group attribute name which contains the member object (e.g., `member` or `memberUid`)
+* **`group_attribute_is_dn`** - whether or not to search for the user's full distinguished name as the value in the group's member attribute
+* **`require`** and **`satisfy`** - `require user`, `require group` and `require valid_user` can be used in conjunction with `satisfy any` or `satisfy all` to limit the users that are allowed to access the Malcolm instance
+
+Before starting Malcolm, edit `nginx/nginx_ldap.conf` according to the specifics of your LDAP server and directory tree structure. Using an LDAP search tool such as [`ldapsearch`](https://www.openldap.org/software/man.cgi?query=ldapsearch) in Linux or [`dsquery`](https://social.technet.microsoft.com/wiki/contents/articles/2195.active-directory-dsquery-commands.aspx) in Windows may be of help as you formulate the configuration (see the example below). Your changes should be made within the curly braces of the `ldap_server ad_server { … }` section. You can troubleshoot configuration file syntax errors and LDAP connection or credentials issues by running `./scripts/logs` (or `docker-compose logs nginx`) and examining the output of the `nginx` container.
+
+The **Malcolm User Management** page described above is not available when using LDAP authentication.
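+
+As an example of using such a tool to validate the values destined for `nginx_ldap.conf`, the following `ldapsearch` invocation (a sketch; the host, bind DN, base DN, and the account name `analyst` are placeholders borrowed from the sample configuration above) verifies that the bind credentials work and shows a user's group memberships:
+
+```
+# Simple-bind (-x) as the configured bind DN (-W prompts for its password)
+# and look up one account under the configured search base.
+ldapsearch -x \
+  -H ldap://ds.example.com:3268 \
+  -D "bind_dn" -W \
+  -b "DC=ds,DC=example,DC=com" \
+  "(&(objectClass=person)(sAMAccountName=analyst))" \
+  sAMAccountName memberOf
+```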
+
+### LDAP connection security
+
+Authentication over LDAP can be done in one of three ways, [two of which](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-adts/8e73932f-70cf-46d6-88b1-8d9f86235e81) offer data confidentiality protection:
+
+* **StartTLS** - the [standard extension](https://tools.ietf.org/html/rfc2830) to the LDAP protocol to establish an encrypted SSL/TLS connection within an already established LDAP connection
+* **LDAPS** - a commonly used (though unofficial and considered deprecated) method in which SSL negotiation takes place before any commands are sent from the client to the server
+* **Unencrypted** (cleartext) (***not recommended***)
+
+In addition to the `NGINX_BASIC_AUTH` environment variable being set to `false` in the `x-auth-variables` section near the top of the [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) file, the `NGINX_LDAP_TLS_STUNNEL` environment variable is used in conjunction with the values in `nginx/nginx_ldap.conf` to define the LDAP connection security level. Use the following combinations of values to achieve the connection security methods above, respectively:
+
+* **StartTLS**
+    - `NGINX_LDAP_TLS_STUNNEL` set to `true` in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml)
+    - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf`
+* **LDAPS**
+    - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml)
+    - `url` should begin with `ldaps://` and its port should be either the default LDAPS port (636) or the default LDAPS Global Catalog port (3269) in `nginx/nginx_ldap.conf`
+* **Unencrypted** (cleartext) (***not recommended***)
+    - `NGINX_LDAP_TLS_STUNNEL` set to `false` in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml)
+    - `url` should begin with `ldap://` and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in `nginx/nginx_ldap.conf`
+
+For encrypted connections (whether using **StartTLS** or **LDAPS**), Malcolm will require and verify certificates when one or more trusted CA certificate files are placed in the `nginx/ca-trust/` directory. Otherwise, any certificate presented by the domain server will be accepted.
+
+## TLS certificates
+
+When you [set up authentication](#AuthSetup) for Malcolm, a set of unique [self-signed](https://en.wikipedia.org/wiki/Self-signed_certificate) TLS certificates are created which are used to secure the connection between clients (e.g., your web browser) and Malcolm's browser-based interface. This is adequate for most Malcolm instances as they are often run locally or on internal networks, although your browser will most likely require you to add a security exception for the certificate the first time you connect to Malcolm.
+
+Another option is to generate your own certificates (or have them issued to you) and place them in the `nginx/certs/` directory. The certificate and key file should be named `cert.pem` and `key.pem`, respectively.
+
+A third possibility is to use a third-party reverse proxy (e.g., [Traefik](https://doc.traefik.io/traefik/) or [Caddy](https://caddyserver.com/docs/quick-starts/reverse-proxy)) to handle the issuance of the certificates for you and to broker the connections between clients and Malcolm.
Reverse proxies such as these often implement the [ACME](https://datatracker.ietf.org/doc/html/rfc8555) protocol for domain name authentication and can be used to request certificates from certificate authorities like [Let's Encrypt](https://letsencrypt.org/how-it-works/). In this configuration, the reverse proxy will be encrypting the connections instead of Malcolm, so you'll need to set the `NGINX_SSL` environment variable to `false` in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) (or answer `no` to the "Require encrypted HTTPS connections?" question posed by `install.py`). If you are setting `NGINX_SSL` to `false`, **make sure** you understand what you are doing and ensure that external connections cannot reach ports over which Malcolm will be communicating without encryption, including verifying your local firewall configuration.
\ No newline at end of file
diff --git a/docs/components.md b/docs/components.md
new file mode 100644
index 000000000..c885f477d
--- /dev/null
+++ b/docs/components.md
@@ -0,0 +1,59 @@
+# Components
+
+Malcolm leverages the following excellent open source tools, among others.
+
+* [Arkime](https://arkime.com/) (formerly Moloch) - for PCAP file processing, browsing, searching, analysis, and carving/exporting; Arkime itself consists of two parts:
+    * [capture](https://github.com/arkime/arkime/tree/master/capture) - a tool for traffic capture, as well as offline PCAP parsing and metadata insertion into OpenSearch
+    * [viewer](https://github.com/arkime/arkime/tree/master/viewer) - a browser-based interface for data visualization
+* [OpenSearch](https://opensearch.org/) - a search and analytics engine for indexing and querying network traffic session metadata
+* [Logstash](https://www.elastic.co/products/logstash) and [Filebeat](https://www.elastic.co/products/beats/filebeat) - for parsing [Zeek](https://www.zeek.org/index.html) [Log Files](https://docs.zeek.org/en/stable/script-reference/log-files.html) and ingesting them into OpenSearch in a format that Arkime understands in the same way it natively understands PCAP data
+* [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) - for creating additional ad-hoc visualizations and dashboards beyond that which is provided by Arkime viewer
+* [Zeek](https://www.zeek.org/index.html) - a network analysis framework and IDS
+* [Suricata](https://suricata.io/) - an IDS and threat detection engine
+* [Yara](https://github.com/VirusTotal/yara) - a tool used to identify and classify malware samples
+* [Capa](https://github.com/fireeye/capa) - a tool for detecting capabilities in executable files
+* [ClamAV](https://www.clamav.net/) - an antivirus engine for scanning files extracted by Zeek
+* [CyberChef](https://github.com/gchq/CyberChef) - a "swiss-army knife" data conversion tool
+* [jQuery File Upload](https://github.com/blueimp/jQuery-File-Upload) - for uploading PCAP files and Zeek logs for processing
+* [List.js](https://github.com/javve/list.js) - for the [host and subnet name mapping](host-and-subnet-mapping.md#HostAndSubnetNaming) interface
+* [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/) - for simple, reproducible deployment of the Malcolm appliance across environments and to coordinate communication between its various components
+* [NetBox](https://netbox.dev/) - a suite for modeling and documenting modern networks
+* [PostgreSQL](https://www.postgresql.org/) - a relational database for persisting NetBox's data
+* 
[Redis](https://redis.io/) - an in-memory data store for caching NetBox session information +* [Nginx](https://nginx.org/) - for HTTPS and reverse proxying Malcolm components +* [nginx-auth-ldap](https://github.com/kvspb/nginx-auth-ldap) - an LDAP authentication module for nginx +* [Fluent Bit](https://fluentbit.io/) - for forwarding metrics to Malcolm from [network sensors](live-analysis.md#Hedgehog) (packet capture appliances) +* [Mark Baggett](https://github.com/MarkBaggett)'s [freq](https://github.com/MarkBaggett/freq) - a tool for calculating entropy of strings +* [Florian Roth](https://github.com/Neo23x0)'s [Signature-Base](https://github.com/Neo23x0/signature-base) Yara ruleset +* These Zeek plugins: + * some of Amazon.com, Inc.'s [ICS protocol](https://github.com/amzn?q=zeek) analyzers + * Andrew Klaus's [Sniffpass](https://github.com/cybera/zeek-sniffpass) plugin for detecting cleartext passwords in HTTP POST requests + * Andrew Klaus's [zeek-httpattacks](https://github.com/precurse/zeek-httpattacks) plugin for detecting noncompliant HTTP requests + * ICS protocol analyzers for Zeek published by [DHS CISA](https://github.com/cisagov/ICSNPP) and [Idaho National Lab](https://github.com/idaholab/ICSNPP) + * Corelight's ["bad neighbor" (CVE-2020-16898)](https://github.com/corelight/CVE-2020-16898) plugin + * Corelight's ["Log4Shell" (CVE-2021-44228)](https://github.com/corelight/cve-2021-44228) plugin + * Corelight's ["OMIGOD" (CVE-2021-38647)](https://github.com/corelight/CVE-2021-38647) plugin + * Corelight's [Apache HTTP server 2.4.49-2.4.50 path traversal/RCE vulnerability (CVE-2021-41773)](https://github.com/corelight/CVE-2021-41773) plugin + * Corelight's [bro-xor-exe](https://github.com/corelight/bro-xor-exe-plugin) plugin + * Corelight's [callstranger-detector](https://github.com/corelight/callstranger-detector) plugin + * Corelight's [community ID](https://github.com/corelight/zeek-community-id) flow hashing plugin + * Corelight's [DCE/RPC remote code execution vulnerability (CVE-2022-26809)](https://github.com/corelight/cve-2022-26809) plugin + * Corelight's [HTTP More Filenames](https://github.com/corelight/http-more-files-names) plugin + * Corelight's [HTTP protocol stack vulnerability (CVE-2021-31166)](https://github.com/corelight/CVE-2021-31166) plugin + * Corelight's [pingback](https://github.com/corelight/pingback) plugin + * Corelight's [ripple20](https://github.com/corelight/ripple20) plugin + * Corelight's [SIGred](https://github.com/corelight/SIGred) plugin + * Corelight's [VMware Workspace ONE Access and Identity Manager RCE vulnerability (CVE-2022-22954)](https://github.com/corelight/cve-2022-22954) plugin + * Corelight's [Zerologon](https://github.com/corelight/zerologon) plugin + * Corelight's [Microsoft Excel privilege escalation detection (CVE-2021-42292)](https://github.com/corelight/CVE-2021-42292) plugin + * J-Gras' [Zeek::AF_Packet](https://github.com/J-Gras/zeek-af_packet-plugin) plugin + * Johanna Amann's [CVE-2020-0601](https://github.com/0xxon/cve-2020-0601) ECC certificate validation plugin and [CVE-2020-13777](https://github.com/0xxon/cve-2020-13777) GnuTLS unencrypted session ticket detection plugin + * Lexi Brent's [EternalSafety](https://github.com/0xl3x1/zeek-EternalSafety) plugin + * MITRE Cyber Analytics Repository's [Bro/Zeek ATT&CK®-Based Analytics (BZAR)](https://github.com/mitre-attack/car/tree/master/implementations) script + * Salesforce's [gQUIC](https://github.com/salesforce/GQUIC_Protocol_Analyzer) analyzer + * Salesforce's 
[HASSH](https://github.com/salesforce/hassh) SSH fingerprinting plugin
+    * Salesforce's [JA3](https://github.com/salesforce/ja3) TLS fingerprinting plugin
+    * Zeek's [Spicy](https://github.com/zeek/spicy) plugin framework
+* [GeoLite2](https://dev.maxmind.com/geoip/geoip2/geolite2/) - Malcolm includes GeoLite2 data created by [MaxMind](https://www.maxmind.com)
+
+![Malcolm Components](./images/malcolm_components.png)
\ No newline at end of file
diff --git a/docs/contributing-dashboards.md b/docs/contributing-dashboards.md
new file mode 100644
index 000000000..785c9fe2a
--- /dev/null
+++ b/docs/contributing-dashboards.md
@@ -0,0 +1,41 @@
+# OpenSearch Dashboards
+
+[OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) is an open-source fork of [Kibana](https://www.elastic.co/kibana/), which is [no longer open-source software](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0).
+
+## Adding new visualizations and dashboards
+
+Visualizations and dashboards can be [easily created](dashboards.md#BuildDashboard) in OpenSearch Dashboards using its drag-and-drop WYSIWYG tools. Assuming you've created a new dashboard you wish to package with Malcolm, the dashboard and its visualization components can be exported using the following steps:
+
+1. Identify the ID of the dashboard (found in the URL: e.g., for `/dashboards/app/dashboards#/view/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` the ID would be `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`)
+1. Export the dashboard with that ID and save it in the `./dashboards/dashboards/` directory with the following command:
+    ```
+    export DASHID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx && \
+      docker-compose exec dashboards curl -XGET \
+      "http://localhost:5601/dashboards/api/opensearch-dashboards/dashboards/export?dashboard=$DASHID" > \
+      ./dashboards/dashboards/$DASHID.json
+    ```
+1. It's preferable for Malcolm to dynamically create the `arkime_sessions3-*` index template rather than including it in imported dashboards, so edit the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.json` file that was generated, and carefully locate and remove the section with the `id` of `arkime_sessions3-*` and the `type` of `index-pattern` (including the comma preceding it):
+    ```
+    ,
+    {
+      "id": "arkime_sessions3-*",
+      "type": "index-pattern",
+      "namespaces": [
+        "default"
+      ],
+      "updated_at": "2021-12-13T18:21:42.973Z",
+      "version": "Wzk3MSwxXQ==",
+      …
+      "references": [],
+      "migrationVersion": {
+        "index-pattern": "7.6.0"
+      }
+    }
+    ```
+1. Include the new dashboard either by using a [bind mount](contributing-local-modifications.md#Bind) for the `./dashboards/dashboards/` directory or by [rebuilding](development.md#Build) the `dashboards-helper` Docker image. Dashboards are imported the first time Malcolm starts up.
+
+## OpenSearch Dashboards plugins
+
+The [dashboards.Dockerfile](../Dockerfiles/dashboards.Dockerfile) installs the OpenSearch Dashboards plugins used by Malcolm (search for `opensearch-dashboards-plugin install` in that file). Additional Dashboards plugins could be installed by modifying this Dockerfile and [rebuilding](development.md#Build) the `dashboards` Docker image.
+
+Third-party or community plugins developed for Kibana will not install into OpenSearch Dashboards without source code modification. Depending on the plugin, this could range from very simple to very complex.
As an illustrative example, the changes that were required to port the Sankey diagram visualization plugin from Kibana to OpenSearch Dashboards compatibility can be [viewed on GitHub](https://github.com/mmguero-dev/osd_sankey_vis/compare/edacf6b...main).
diff --git a/docs/contributing-file-scanners.md b/docs/contributing-file-scanners.md
new file mode 100644
index 000000000..2118516d8
--- /dev/null
+++ b/docs/contributing-file-scanners.md
@@ -0,0 +1,14 @@
+# Carved file scanners
+
+Similar to the [PCAP processing pipeline](contributing-pcap.md#PCAP), new tools can plug into Malcolm's [automatic file extraction and scanning](file-scanning.md#ZeekFileExtraction) to examine file transfers carved from network traffic.
+
+When Zeek extracts a file it observes being transferred in network traffic, the `file-monitor` container picks up those extracted files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that extracted file. In Malcolm at the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the implemented file scanners are ClamAV, YARA, capa, and VirusTotal, all of which are managed by the `file-monitor` container. The scripts involved in this code are:
+
+* [shared/bin/zeek_carve_watcher.py](../shared/bin/zeek_carve_watcher.py) - watches the directory to which Zeek extracts files and publishes information about those files to the ZeroMQ ventilator on port 5987
+* [shared/bin/zeek_carve_scanner.py](../shared/bin/zeek_carve_scanner.py) - subscribes to `zeek_carve_watcher.py`'s topic, performs file scanning for the ClamAV, YARA, capa, and VirusTotal engines, and sends "hits" to another ZeroMQ sink on port 5988
+* [shared/bin/zeek_carve_logger.py](../shared/bin/zeek_carve_logger.py) - subscribes to `zeek_carve_scanner.py`'s topic and logs hits to a "fake" Zeek signatures.log file, which is parsed and ingested by Logstash
+* [shared/bin/zeek_carve_utils.py](../shared/bin/zeek_carve_utils.py) - various variables and classes related to carved file scanning
+
+Additional file scanners could either be added to the `file-monitor` service, or, to avoid coupling with Malcolm's code, you could simply define a new service as instructed in the [Adding a new service](contributing-new-image.md#NewImage) section and write your own scripts to subscribe and publish to the topics as described above. While that might be a bit of hand-waving, these general steps take care of the plumbing around extracting the file and notifying your tool, as well as handling the logging of "hits": you shouldn't have to really edit any *existing* code to add a new carved file scanner.
+
+The `EXTRACTED_FILE_PIPELINE_DEBUG` and `EXTRACTED_FILE_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the carved file processing pipeline.
\ No newline at end of file
diff --git a/docs/contributing-guide.md b/docs/contributing-guide.md
new file mode 100644
index 000000000..47b984eeb
--- /dev/null
+++ b/docs/contributing-guide.md
@@ -0,0 +1,26 @@
+# Malcolm Contributor Guide
+
+The purpose of this document is to provide some direction for those willing to modify Malcolm, whether for local customization or for contribution to the Malcolm project.
+
+* [Local modifications](contributing-local-modifications.md#LocalMods)
+    + [Docker bind mounts](contributing-local-modifications.md#Bind)
+    + [Building Malcolm's Docker images](contributing-local-modifications.md#ContribBuild)
+* [Adding a new service (Docker image)](contributing-new-image.md#NewImage)
+    + [Networking and firewall](contributing-new-image.md#NewImageFirewall)
+* [Adding new log fields](contributing-new-log-fields.md#NewFields)
+- [Zeek](contributing-zeek.md#Zeek)
+    + [`local.zeek`](contributing-zeek.md#LocalZeek)
+    + [Adding a new Zeek package](contributing-zeek.md#ZeekPackage)
+    + [Zeek Intelligence Framework](contributing-zeek.md#ContributingZeekIntel)
+* [PCAP processors](contributing-pcap.md#PCAP)
+* [Logstash](contributing-logstash.md#Logstash)
+    + [Parsing a new log data source](contributing-logstash.md#LogstashNewSource)
+    + [Parsing new Zeek logs](contributing-logstash.md#LogstashZeek)
+    + [Enrichments](contributing-logstash.md#LogstashEnrichments)
+    + [Logstash plugins](contributing-logstash.md#LogstashPlugins)
+* [OpenSearch Dashboards](contributing-dashboards.md#dashboards)
+    + [Adding new visualizations and dashboards](contributing-dashboards.md#DashboardsNewViz)
+    + [OpenSearch Dashboards plugins](contributing-dashboards.md#DashboardsPlugins)
+* [Carved file scanners](contributing-file-scanners.md#Scanners)
+* [Style](contributing-style.md#Style)
diff --git a/docs/contributing-local-modifications.md b/docs/contributing-local-modifications.md
new file mode 100644
index 000000000..d69ee9674
--- /dev/null
+++ b/docs/contributing-local-modifications.md
@@ -0,0 +1,126 @@
+# Local modifications
+
+There are several ways to customize Malcolm's runtime behavior via local changes to configuration files. Many commonly-tweaked settings are discussed in the project [README](README.md) (see [`docker-compose.yml` parameters](malcolm-config.md#DockerComposeYml) and [Customizing event severity scoring](severity.md#SeverityConfig) for some examples).
+
+## Docker bind mounts
+
+Some configuration changes can be put in place by modifying local copies of configuration files and then using a [Docker bind mount](https://docs.docker.com/storage/bind-mounts/) to overlay the modified file onto the running Malcolm container. This is already done for many files and directories used to persist Malcolm configuration and data.
For example, the default list of bind mounted files and directories for each Malcolm service is as follows: + +``` +$ grep -P "^( - ./| [\w-]+:)" docker-compose-standalone.yml + opensearch: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro + - ./opensearch/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:rw + - ./opensearch:/usr/share/opensearch/data:delegated + - ./opensearch-backup:/opt/opensearch/backup:delegated + dashboards-helper: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + dashboards: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + logstash: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro + - ./logstash/maps/malcolm_severity.yaml:/etc/malcolm_severity.yaml:ro + - ./logstash/certs/ca.crt:/certs/ca.crt:ro + - ./logstash/certs/server.crt:/certs/server.crt:ro + - ./logstash/certs/server.key:/certs/server.key:ro + - ./cidr-map.txt:/usr/share/logstash/config/cidr-map.txt:ro + - ./host-map.txt:/usr/share/logstash/config/host-map.txt:ro + - ./net-map.json:/usr/share/logstash/config/net-map.json:ro + filebeat: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + - ./zeek-logs:/zeek + - ./suricata-logs:/suricata + - ./filebeat/certs/ca.crt:/certs/ca.crt:ro + - ./filebeat/certs/client.crt:/certs/client.crt:ro + - ./filebeat/certs/client.key:/certs/client.key:ro + arkime: + - ./auth.env + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + - ./pcap:/data/pcap + - ./arkime-logs:/opt/arkime/logs + - ./arkime-raw:/opt/arkime/raw + zeek: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./pcap:/pcap + - ./zeek-logs/upload:/zeek/upload + - ./zeek-logs/extract_files:/zeek/extract_files + - ./zeek/intel:/opt/zeek/share/zeek/site/intel + zeek-live: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./zeek-logs/live:/zeek/live + - ./zeek-logs/extract_files:/zeek/extract_files + - ./zeek/intel:/opt/zeek/share/zeek/site/intel + suricata: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./suricata-logs:/var/log/suricata + - ./pcap:/data/pcap + - ./suricata/rules:/opt/suricata/rules:ro + suricata-live: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./suricata-logs:/var/log/suricata + - ./suricata/rules:/opt/suricata/rules:ro + file-monitor: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./zeek-logs/extract_files:/zeek/extract_files + - ./zeek-logs/current:/zeek/logs + - ./yara/rules:/yara-rules/custom:ro + pcap-capture: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./pcap/upload:/pcap + pcap-monitor: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro + - ./zeek-logs:/zeek + - ./pcap:/pcap + upload: + - ./auth.env + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./pcap/upload:/var/www/upload/server/php/chroot/files + htadmin: + - ./nginx/ca-trust:/var/local/ca-trust:ro + - ./htadmin/config.ini:/var/www/htadmin/config/config.ini:rw + - ./htadmin/metadata:/var/www/htadmin/config/metadata:rw + - ./nginx/htpasswd:/var/www/htadmin/config/htpasswd:rw + freq: + - 
./nginx/ca-trust:/var/local/ca-trust:ro
+  name-map-ui:
+    - ./nginx/ca-trust:/var/local/ca-trust:ro
+    - ./cidr-map.txt:/var/www/html/maps/cidr-map.txt:ro
+    - ./host-map.txt:/var/www/html/maps/host-map.txt:ro
+    - ./net-map.json:/var/www/html/maps/net-map.json:rw
+  api:
+    - ./nginx/ca-trust:/var/local/ca-trust:ro
+    - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro
+  nginx-proxy:
+    - ./nginx/ca-trust:/var/local/ca-trust:ro
+    - ./nginx/nginx_ldap.conf:/etc/nginx/nginx_ldap.conf:ro
+    - ./nginx/htpasswd:/etc/nginx/.htpasswd:ro
+    - ./nginx/certs:/etc/nginx/certs:ro
+    - ./nginx/certs/dhparam.pem:/etc/nginx/dhparam/dhparam.pem:ro
+```
+
+So, for example, if you wanted to make a change to the `nginx-proxy` container's `nginx.conf` file, you could add the following line to the `volumes:` section of the `nginx-proxy` service in your `docker-compose.yml` file:
+
+```
+- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
+```
+
+The change would take effect after stopping and starting Malcolm.
+
+See the documentation on [Docker bind mounts](https://docs.docker.com/storage/bind-mounts/) for more information on this technique.
+
+## Building Malcolm's Docker images
+
+Another method for modifying your local copies of Malcolm's services' containers is to [build your own](development.md#Build) containers with the modifications baked in.
+
+For example, say you wanted to create a Malcolm container which includes a new dashboard for OpenSearch Dashboards and a new enrichment filter `.conf` file for Logstash. After placing these files under `./dashboards/dashboards` and `./logstash/pipelines/enrichment`, respectively, in your Malcolm working copy, run `./build.sh dashboards-helper logstash` to build just those containers. After the build completes, you can run `docker images` and see that you have fresh images for `malcolmnetsec/dashboards-helper` and `malcolmnetsec/logstash-oss`. You may need to review the contents of the [Dockerfiles](../Dockerfiles) to determine the correct service and filesystem location within that service's Docker image depending on what you're trying to accomplish.
+
+Alternately, if you have forked Malcolm on GitHub, [workflow files](../.github/workflows/) are provided which contain instructions for GitHub to build the docker images and [sensor](live-analysis.md#Hedgehog) and [Malcolm](malcolm-iso.md#ISO) installer ISOs. The resulting images are named according to the pattern `ghcr.io/owner/malcolmnetsec/image:branch` (e.g., if you've forked Malcolm with the GitHub user `romeogdetlevjr`, the `arkime` container built for the `main` branch would be named `ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main`). To run your local instance of Malcolm using these images instead of the official ones, you'll need to edit your `docker-compose.yml` file(s) and replace the `image:` tags according to this new pattern, or use the bash helper script `./shared/bin/github_image_helper.sh` to pull and re-tag the images.
\ No newline at end of file
diff --git a/docs/contributing-logstash.md b/docs/contributing-logstash.md
new file mode 100644
index 000000000..610fda03e
--- /dev/null
+++ b/docs/contributing-logstash.md
@@ -0,0 +1,51 @@
+# Logstash
+
+## Parsing a new log data source
+
+Let's continue with the example of the `cooltool` service we added in the [PCAP processors](contributing-pcap.md#PCAP) section, assuming that `cooltool` generates some textual log files we want to parse and index into Malcolm.
+
+You'd have configured `cooltool` in your `cooltool.Dockerfile` and its section in the `docker-compose` files to write logs into a subdirectory or subdirectories in a shared folder [bind mounted](contributing-local-modifications.md#Bind) in such a way that both the `cooltool` and `filebeat` containers can access it. Referring to the `zeek` container as an example, this is how the `./zeek-logs` folder is handled; both the `filebeat` and `zeek` services have `./zeek-logs` in their `volumes:` section:
+
+```
+$ grep -P "^( - ./zeek-logs| [\w-]+:)" docker-compose.yml | grep -B1 "zeek-logs"
+  filebeat:
+    - ./zeek-logs:/data/zeek
+--
+  zeek:
+    - ./zeek-logs/upload:/zeek/upload
+…
+```
+
+You'll need to provide access to your `cooltool` logs in a similar fashion.
+
+Next, tweak [`filebeat.yml`](../filebeat/filebeat.yml) by adding a new log input path pointing to the `cooltool` logs to send them along to the `logstash` container. This modified `filebeat.yml` will need to be reflected in the `filebeat` container via [bind mount](contributing-local-modifications.md#Bind) or by [rebuilding](development.md#Build) it.
+
+Logstash can then be easily extended to add more [`logstash/pipelines`](../logstash/pipelines). At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the Logstash pipelines basically look like this:
+
+* input (from `filebeat`) sends logs to 1..*n* **parse pipelines**
+* each **parse pipeline** does what it needs to do to parse its logs, then sends them to the [**enrichment pipeline**](#LogstashEnrichments)
+* the [**enrichment pipeline**](../logstash/pipelines/enrichment) performs common lookups to the fields that have been normalized and indexes the logs into the OpenSearch data store
+
+So, in order to add a new **parse pipeline** for `cooltool` after tweaking [`filebeat.yml`](../filebeat/filebeat.yml) as described above, create a `cooltool` directory under [`logstash/pipelines`](../logstash/pipelines) which follows the same pattern as the `zeek` parse pipeline. This directory will have an input file (tiny), a filter file (possibly large), and an output file (tiny). In your filter file, be sure to set the field [`event.hash`](https://www.elastic.co/guide/en/ecs/master/ecs-event.html#field-event-hash) to a unique value to identify indexed documents in OpenSearch; the [fingerprint filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html) may be useful for this.
+
+Finally, in your `docker-compose` files, set the `LOGSTASH_PARSE_PIPELINE_ADDRESSES` environment variable under `logstash-variables` to `cooltool-parse,zeek-parse,suricata-parse,beats-parse` (assuming you named the pipeline address from the previous step `cooltool-parse`) so that logs sent from `filebeat` to `logstash` are forwarded to all parse pipelines.
+
+## Parsing new Zeek logs
+
+The following modifications must be made in order for Malcolm to be able to parse new Zeek log files:
+
+1. 
Add a parsing section to [`logstash/pipelines/zeek/11_zeek_logs.conf`](../logstash/pipelines/zeek/11_zeek_logs.conf)
+    * Follow patterns for existing log files as an example
+    * For common Zeek fields like the `id` four-tuple, timestamp, etc., use the same convention used by existing Zeek logs in that file (e.g., `ts`, `uid`, `orig_h`, `orig_p`, `resp_h`, `resp_p`)
+    * Take care, especially when copy-pasting filter code, that the Zeek delimiter isn't modified from a tab character to a space character (see "*zeek's default delimiter is a literal tab, MAKE SURE YOUR EDITOR DOESN'T SCREW IT UP*" warnings in that file)
+1. If necessary, perform log normalization in [`logstash/pipelines/zeek/12_zeek_normalize.conf`](../logstash/pipelines/zeek/12_zeek_normalize.conf) for values like action (`event.action`), result (`event.result`), application protocol version (`network.protocol_version`), etc.
+1. If necessary, define conversions for floating point or integer values in [`logstash/pipelines/zeek/13_zeek_convert.conf`](../logstash/pipelines/zeek/13_zeek_convert.conf)
+1. Identify the new fields and add them as described in [Adding new log fields](contributing-new-log-fields.md#NewFields)
+
+## Enrichments
+
+Malcolm's Logstash instance will do a lot of enrichments for you automatically: see the [enrichment pipeline](../logstash/pipelines/enrichment), including MAC address to vendor by OUI, GeoIP, ASN, and a few others. In order to take advantage of these enrichments that are already in place, normalize new fields to use the same standardized field names Malcolm uses for things like IP addresses, MAC addresses, etc. You can add your own additional enrichments by creating new `.conf` files containing [Logstash filters](https://www.elastic.co/guide/en/logstash/7.10/filter-plugins.html) in the [enrichment pipeline](../logstash/pipelines/enrichment) directory and using either of the techniques in the [Local modifications](contributing-local-modifications.md#LocalMods) section to implement your changes in the `logstash` container.
+
+## Logstash plugins
+
+The [logstash.Dockerfile](../Dockerfiles/logstash.Dockerfile) installs the Logstash plugins used by Malcolm (search for `logstash-plugin install` in that file). Additional Logstash plugins could be installed by modifying this Dockerfile and [rebuilding](development.md#Build) the `logstash` Docker image.
\ No newline at end of file
diff --git a/docs/contributing-new-image.md b/docs/contributing-new-image.md
new file mode 100644
index 000000000..e8c8fb566
--- /dev/null
+++ b/docs/contributing-new-image.md
@@ -0,0 +1,18 @@
+# Adding a new service (Docker image)
+
+A new service can be added to Malcolm by following these steps:
+
+1. Create a new subdirectory for the service (under the Malcolm working copy base directory) containing whatever source or configuration files are necessary to build and run the service
+1. Create the service's Dockerfile in the [Dockerfiles](../Dockerfiles) directory of your Malcolm working copy
+1. Add a new section for your service under `services:` in the `docker-compose.yml` and `docker-compose-standalone.yml` files
+1. 
If you want to enable automatic builds for your service on GitHub, create a new [workflow](../.github/workflows/), using an existing workflow as an example
+
+## Networking and firewall
+
+If your service needs to expose a web interface to the user, you'll need to adjust the following files:
+
+* Ensure your service's section in the `docker-compose` files uses the `expose` directive to indicate which ports it's providing
+* Add the service to the `depends_on` section of the `nginx-proxy` service in the `docker-compose` files
+* Modify the configuration of the `nginx-proxy` container (in [`nginx/nginx.conf`](../nginx/nginx.conf)) to define `upstream` and `location` directives to point to your service
+
+Avoid publishing ports directly from your container to the host machine's network interface if at all possible. The `nginx-proxy` container handles encryption and authentication and should sit in front of any user-facing interface provided by Malcolm.
\ No newline at end of file
diff --git a/docs/contributing-new-log-fields.md b/docs/contributing-new-log-fields.md
new file mode 100644
index 000000000..17d7c09bb
--- /dev/null
+++ b/docs/contributing-new-log-fields.md
@@ -0,0 +1,11 @@
+# Adding new log fields
+
+As several of the sections in this document will reference adding new data source fields, we'll cover that here at the beginning.
+
+Although OpenSearch is a NoSQL database and as such is "unstructured" and "schemaless," in order to add a new data source field you'll need to define that field in a few places in order for it to show up and be usable throughout Malcolm. Minimally, you'll probably want to do it in these three files:
+
+* [`arkime/etc/config.ini`](../arkime/etc/config.ini) - follow existing examples in the `[custom-fields]` and `[custom-views]` sections in order for [Arkime](https://arkime.com) to be aware of your new fields
+* [`arkime/wise/source.zeeklogs.js`](../arkime/wise/source.zeeklogs.js) - add new fields to the `allFields` array for Malcolm to create Arkime [value actions](https://arkime.com/settings#right-click) for your fields
+* [`dashboards/templates/composable/component/__(name)__.json`](../dashboards/templates/composable/component/) - add new fields to a new [composable index template](https://opensearch.org/docs/latest/opensearch/index-templates/#composable-index-templates) file in this directory and add its name (prefixed with `custom_`) to the `composed_of` section of [`dashboards/templates/malcolm_template.json`](../dashboards/templates/malcolm_template.json) in order for it to be included as part of the `arkime_sessions3-*` [index template](https://opensearch.org/docs/latest/opensearch/index-templates/) used by Arkime and OpenSearch Dashboards in Malcolm
+
+When possible, I recommend you use (or at least take inspiration from) the [Elastic Common Schema (ECS) Reference](https://www.elastic.co/guide/en/ecs/current/index.html) when deciding how to define new field names.
\ No newline at end of file
diff --git a/docs/contributing-pcap.md b/docs/contributing-pcap.md
new file mode 100644
index 000000000..a896f19d8
--- /dev/null
+++ b/docs/contributing-pcap.md
@@ -0,0 +1,12 @@
+# PCAP processors
+
+When a PCAP is uploaded (either through Malcolm's [upload web interface](upload.md#Upload) or simply copied manually into the `./pcap/upload/` directory), the `pcap-monitor` container has a script that picks up those PCAP files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that PCAP.
In Malcolm at the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), there are two of those: the `zeek` container and the `arkime` container. They actually both share the [same script](../shared/bin/pcap_processor.py) to read from that topic and run the PCAP through Zeek and Arkime, respectively. If you're looking for an example to follow, the `zeek` container is the less complicated of the two. So, if you were looking to integrate a new PCAP processing tool into Malcolm (named `cooltool` for this example), the process would be something like:
+
+1. Define your service as instructed in the [Adding a new service](contributing-new-image.md#NewImage) section
+    * Note how the existing `zeek` and `arkime` services use [bind mounts](contributing-local-modifications.md#Bind) to access the local `./pcap` directory
+1. Write a script (modelled after [the one](../shared/bin/pcap_processor.py) `zeek` and `arkime` use, if you like) which subscribes to the PCAP topic port (`30441` as defined in [pcap_utils.py](../shared/bin/pcap_utils.py)) and handles the PCAP files published there, each PCAP file represented by a JSON dictionary with `name`, `tags`, `size`, `type` and `mime` keys (search for `FILE_INFO_` in [pcap_utils.py](../shared/bin/pcap_utils.py)). This script should be added to and run by your `cooltool.Dockerfile`-generated container.
+1. Add whatever other logic is needed to get your tool's data into Malcolm, whether by writing it directly into OpenSearch or by sending log files for parsing and enrichment by [Logstash](contributing-logstash.md#Logstash) (especially see the section on [Parsing a new log data source](contributing-logstash.md#LogstashNewSource))
+
+While that might be a bit of hand-waving, these general steps take care of the PCAP processing piece: you shouldn't have to really edit any *existing* code to add a new PCAP processor. You're just creating a new container for the Malcolm appliance that subscribes to the ZeroMQ topic and handles the PCAPs your tool receives.
+
+The `PCAP_PIPELINE_DEBUG` and `PCAP_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the PCAP processing pipeline.
diff --git a/docs/contributing-style.md b/docs/contributing-style.md
new file mode 100644
index 000000000..9211b23e1
--- /dev/null
+++ b/docs/contributing-style.md
@@ -0,0 +1,5 @@
+# Style
+
+## Python
+
+For Python code found in Malcolm, the author uses [Black: The uncompromising Python code formatter](https://github.com/psf/black) with the options `--line-length 120 --skip-string-normalization`.
\ No newline at end of file
diff --git a/docs/contributing-zeek.md b/docs/contributing-zeek.md
new file mode 100644
index 000000000..9564681de
--- /dev/null
+++ b/docs/contributing-zeek.md
@@ -0,0 +1,15 @@
+# Zeek
+
+## `local.zeek`
+
+Some Zeek behavior can be tweaked without having to manually edit configuration files through the use of environment variables: search for `ZEEK` in the [`docker-compose.yml` parameters](malcolm-config.md#DockerComposeYml) section of the documentation.
+
+Other changes to Zeek's behavior could be made by modifying [local.zeek](../zeek/config/local.zeek) and either using a [bind mount](contributing-local-modifications.md#Bind) or [rebuilding](development.md#Build) the `zeek` Docker image with the modification.
diff --git a/docs/contributing-style.md b/docs/contributing-style.md
new file mode 100644
index 000000000..9211b23e1
--- /dev/null
+++ b/docs/contributing-style.md
@@ -0,0 +1,5 @@
+# Style
+
+## Python
+
+For Python code found in Malcolm, the author uses [Black: The uncompromising Python code formatter](https://github.com/psf/black) with the options `--line-length 120 --skip-string-normalization`.
\ No newline at end of file
diff --git a/docs/contributing-zeek.md b/docs/contributing-zeek.md
new file mode 100644
index 000000000..9564681de
--- /dev/null
+++ b/docs/contributing-zeek.md
@@ -0,0 +1,15 @@
+# Zeek
+
+## `local.zeek`
+
+Some Zeek behavior can be tweaked through the use of environment variables, without having to manually edit configuration files: search for `ZEEK` in the [`docker-compose.yml` parameters](malcolm-config.md#DockerComposeYml) section of the documentation.
+
+Other changes to Zeek's behavior could be made by modifying [local.zeek](../zeek/config/local.zeek) and either using a [bind mount](contributing-local-modifications.md#Bind) or [rebuilding](development.md#Build) the `zeek` Docker image with the modification. See the [Zeek documentation](https://docs.zeek.org/en/master/quickstart.html#local-site-customization) for more information on customizing a Zeek instance. Note that changing Zeek's behavior could result in changes to the format of the logs Zeek generates, which could break Malcolm's parsing of those logs, so exercise caution.
+
+## Adding a new Zeek package
+
+The easiest way to add a new Zeek package to Malcolm is to add the git URL of that package to the `ZKG_GITHUB_URLS` array in the [zeek_install_plugins.sh](../shared/bin/zeek_install_plugins.sh) script and then [rebuild](development.md#Build) the `zeek` Docker image. This will cause your package to be installed (via the [`zkg`](https://docs.zeek.org/projects/package-manager/en/stable/zkg.html) command-line tool). See [Parsing new Zeek logs](contributing-logstash.md#LogstashZeek) on how to process any new `.log` files if your package generates them.
+
+## Zeek Intelligence Framework
+
+See [Zeek Intelligence Framework](zeek-intel.md#ZeekIntel) in the Malcolm README for information on how to use Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) with Malcolm.
\ No newline at end of file
diff --git a/docs/contributing/README.md b/docs/contributing/README.md
deleted file mode 100644
index 8f9a1b86d..000000000
--- a/docs/contributing/README.md
+++ /dev/null
@@ -1,339 +0,0 @@
-# Malcolm Contributor Guide
-
-The purpose of this document is to provide some direction for those willing to modify Malcolm, whether for local customization or for contribution to the Malcolm project.
-
-## Table of Contents
-
-* [Local modifications](#LocalMods)
-  + [Docker bind mounts](#Bind)
-  + [Building Malcolm's Docker images](#Build)
-* [Adding a new service (Docker image)](#NewImage)
-  + [Networking and firewall](#NewImageFirewall)
-* [Adding new log fields](#NewFields)
-- [Zeek](#Zeek)
-  + [`local.zeek`](#LocalZeek)
-  + [Adding a new Zeek package](#ZeekPackage)
-  + [Zeek Intelligence Framework](#ZeekIntel)
-* [PCAP processors](#PCAP)
-* [Logstash](#Logstash)
-  + [Parsing a new log data source](#LogstashNewSource)
-  + [Parsing new Zeek logs](#LogstashZeek)
-  + [Enrichments](#LogstashEnrichments)
-  + [Logstash plugins](#LogstashPlugins)
-* [OpenSearch Dashboards](#dashboards)
-  + [Adding new visualizations and dashboards](#DashboardsNewViz)
-  + [OpenSearch Dashboards plugins](#DashboardsPlugins)
-* [Carved file scanners](#Scanners)
-* [Style](#Style)
-
-## Local modifications
-
-There are several ways to customize Malcolm's runtime behavior via local changes to configuration files. Many commonly-tweaked settings are discussed in the project [README](../../README.md) (see [`docker-compose.yml` parameters](../../README.md#DockerComposeYml) and [Customizing event severity scoring](../../README.md#SeverityConfig) for some examples).
-
-### Docker bind mounts
-
-Some configuration changes can be put in place by modifying local copies of configuration files and then use a [Docker bind mount](https://docs.docker.com/storage/bind-mounts/) to overlay the modified file onto the running Malcolm container. This is already done for many files and directories used to persist Malcolm configuration and data.
For example, the default list of bind mounted files and directories for each Malcolm service is as follows: - -``` -$ grep -P "^( - ./| [\w-]+:)" docker-compose-standalone.yml - opensearch: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro - - ./opensearch/opensearch.keystore:/usr/share/opensearch/config/opensearch.keystore:rw - - ./opensearch:/usr/share/opensearch/data:delegated - - ./opensearch-backup:/opt/opensearch/backup:delegated - dashboards-helper: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - dashboards: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - logstash: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./.opensearch.secondary.curlrc:/var/local/opensearch.secondary.curlrc:ro - - ./logstash/maps/malcolm_severity.yaml:/etc/malcolm_severity.yaml:ro - - ./logstash/certs/ca.crt:/certs/ca.crt:ro - - ./logstash/certs/server.crt:/certs/server.crt:ro - - ./logstash/certs/server.key:/certs/server.key:ro - - ./cidr-map.txt:/usr/share/logstash/config/cidr-map.txt:ro - - ./host-map.txt:/usr/share/logstash/config/host-map.txt:ro - - ./net-map.json:/usr/share/logstash/config/net-map.json:ro - filebeat: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./zeek-logs:/zeek - - ./suricata-logs:/suricata - - ./filebeat/certs/ca.crt:/certs/ca.crt:ro - - ./filebeat/certs/client.crt:/certs/client.crt:ro - - ./filebeat/certs/client.key:/certs/client.key:ro - arkime: - - ./auth.env - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./pcap:/data/pcap - - ./arkime-logs:/opt/arkime/logs - - ./arkime-raw:/opt/arkime/raw - zeek: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./pcap:/pcap - - ./zeek-logs/upload:/zeek/upload - - ./zeek-logs/extract_files:/zeek/extract_files - - ./zeek/intel:/opt/zeek/share/zeek/site/intel - zeek-live: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./zeek-logs/live:/zeek/live - - ./zeek-logs/extract_files:/zeek/extract_files - - ./zeek/intel:/opt/zeek/share/zeek/site/intel - suricata: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./suricata-logs:/var/log/suricata - - ./pcap:/data/pcap - - ./suricata/rules:/opt/suricata/rules:ro - suricata-live: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./suricata-logs:/var/log/suricata - - ./suricata/rules:/opt/suricata/rules:ro - file-monitor: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./zeek-logs/extract_files:/zeek/extract_files - - ./zeek-logs/current:/zeek/logs - - ./yara/rules:/yara-rules/custom:ro - pcap-capture: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./pcap/upload:/pcap - pcap-monitor: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - - ./zeek-logs:/zeek - - ./pcap:/pcap - upload: - - ./auth.env - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./pcap/upload:/var/www/upload/server/php/chroot/files - htadmin: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./htadmin/config.ini:/var/www/htadmin/config/config.ini:rw - - ./htadmin/metadata:/var/www/htadmin/config/metadata:rw - - ./nginx/htpasswd:/var/www/htadmin/config/htpasswd:rw - freq: - - 
./nginx/ca-trust:/var/local/ca-trust:ro - name-map-ui: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./cidr-map.txt:/var/www/html/maps/cidr-map.txt:ro - - ./host-map.txt:/var/www/html/maps/host-map.txt:ro - - ./net-map.json:/var/www/html/maps/net-map.json:rw - api: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./.opensearch.primary.curlrc:/var/local/opensearch.primary.curlrc:ro - nginx-proxy: - - ./nginx/ca-trust:/var/local/ca-trust:ro - - ./nginx/nginx_ldap.conf:/etc/nginx/nginx_ldap.conf:ro - - ./nginx/htpasswd:/etc/nginx/.htpasswd:ro - - ./nginx/certs:/etc/nginx/certs:ro - - ./nginx/certs/dhparam.pem:/etc/nginx/dhparam/dhparam.pem:ro -``` - -So, for example, if you wanted to make a change to the `nginx-proxy` container's `nginx.conf` file, you could add the following line to the `volumes:` section of the `nginx-proxy` service in your `docker-compose.yml` file: - -``` -- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro -``` - -The change would take effect after stopping and starting Malcolm. - -See the documentation on [Docker bind mount](https://docs.docker.com/storage/bind-mounts/) for more information on this technique. - -### Building Malcolm's Docker images - -Another method for modifying your local copies of Malcolm's services' containers is to [build your own](../../README.md#Build) containers with the modifications baked-in. - -For example, say you wanted to create a Malcolm container which includes a new dashboard for OpenSearch Dashboards and a new enrichment filter `.conf` file for Logstash. After placing these files under `./dashboards/dashboards` and `./logstash/pipelines/enrichment`, respectively, in your Malcolm working copy, run `./build.sh dashboards-helper logstash` to build just those containers. After the build completes, you can run `docker images` and see you have fresh images for `malcolmnetsec/dashboards-helper` and `malcolmnetsec/logstash-oss`. You may need to review the contents of the [Dockerfiles](../../Dockerfiles) to determine the correct service and filesystem location within that service's Docker image depending on what you're trying to accomplish. - -Alternately, if you have forked Malcolm on GitHub, [workflow files](../../.github/workflows/) are provided which contain instructions for GitHub to build the docker images and [sensor](#Hedgehog) and [Malcolm](#ISO) installer ISOs. The resulting images are named according to the pattern `ghcr.io/owner/malcolmnetsec/image:branch` (e.g., if you've forked Malcolm with the github user `romeogdetlevjr`, the `arkime` container built for the `main` would be named `ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main`). To run your local instance of Malcolm using these images instead of the official ones, you'll need to edit your `docker-compose.yml` file(s) and replace the `image:` tags according to this new pattern, or use the bash helper script `./shared/bin/github_image_helper.sh` to pull and re-tag the images. - -## Adding a new service (Docker image) - -A new service can be added to Malcolm by following the following steps: - -1. Create a new subdirectory for the service (under the Malcolm working copy base directory) containing whatever source or configuration files are necessary to build and run the service -1. Create the service's Dockerfile in the [Dockerfiles](../../Dockerfiles) directory of your Malcolm working copy -1. Add a new section for your service under `services:` in the `docker-compose.yml` and `docker-compose-standalone.yml` files -1. 
If you want to enable automatic builds for your service on GitHub, create a new [workflow](../../.github/workflows/), using an existing workflow as an example - -### Networking and firewall - -If your service needs to expose a web interface to the user, you'll need to adjust the following files: - -* Ensure your service's section in the `docker-compose` files uses the `expose` directive to indicate which ports its providing -* Add the service to the `depends_on` section of the `nginx-proxy` service in the `docker-compose` files -* Modify the configuration of the `nginx-proxy` container (in [`nginx/nginx.conf`](../../nginx/nginx.conf)) to define `upstream` and `location` directives to point to your service - -Avoid publishing ports directly from your container to the host machine's network interface if at all possible. The `nginx-proxy` container handles encryption and authentication and should sit in front of any user-facing interface provided by Malcolm. - -## Adding new log fields - -As several of the sections in this document will reference adding new data source fields, we'll cover that here at the beginning. - -Although OpenSearch is a NoSQL database and as-such is "unstructured" and "schemaless," in order to add a new data source field you'll need to define that field in a few places in order for it to show up and be usable throughout Malcolm. Minimally, you'll probably want to do it in these three files - -* [`arkime/etc/config.ini`](../../arkime/etc/config.ini) - follow existing examples in the `[custom-fields]` and `[custom-views]` sections in order for [Arkime](https://arkime.com) to be aware of your new fields -* [`arkime/wise/source.zeeklogs.js`](../../arkime/wise/source.zeeklogs.js) - add new fields to the `allFields` array for Malcolm to create Arkime [value actions](https://arkime.com/settings#right-click) for your fields -* [`dashboards/templates/composable/component/__(name)__.json`](../../dashboards/templates/composable/component/) - add new fields to a new [composable index template](https://opensearch.org/docs/latest/opensearch/index-templates/#composable-index-templates) file in this directory and add its name (prefixed with `custom_`) to the `composed_of` section of [`dashboards/templates/malcolm_template.json`](../../dashboards/templates/malcolm_template.json) in order for it to be included as part of the `arkime_sessions3-*` [index template](https://opensearch.org/docs/latest/opensearch/index-templates/) used by Arkime and OpenSearch Dashboards in Malcolm - -When possible, I recommend you to use (or at least take inspiration from) the [Elastic Common Schema (ECS) Reference](https://www.elastic.co/guide/en/ecs/current/index.html) when deciding how to define new field names. - -## Zeek - -### `local.zeek` - -Some Zeek behavior can be tweaked without having to manually edit configuration files through the use of environment variables: search for `ZEEK` in the [`docker-compose.yml` parameters](../../README.md#DockerComposeYml) section of the documentation. - -Other changes to Zeek's behavior could be made by modifying [local.zeek](../../zeek/config/local.zeek) and either using a [bind mount](#Bind) or [rebuilding](#Build) the `zeek` Docker image with the modification. See the [Zeek documentation](https://docs.zeek.org/en/master/quickstart.html#local-site-customization) for more information on customizing a Zeek instance. 
Note that changing Zeek's behavior could result in changes to the format of the logs Zeek generates, which could break Malcolm's parsing of those logs, so exercise caution. - -### Adding a new Zeek package - -The easiest way to add a new Zeek package to Malcolm is to add the git URL of that package to the `ZKG_GITHUB_URLS` array in [zeek_install_plugins.sh](../../shared/bin/zeek_install_plugins.sh) script and then [rebuilding](#Build) the `zeek` Docker image. This will cause your package to be installed (via the [`zkg`](https://docs.zeek.org/projects/package-manager/en/stable/zkg.html) command-line tool). See [Parsing new Zeek logs](#LogstashZeek) on how to process any new `.log` files if your package generates them. - -### Zeek Intelligence Framework - -See [Zeek Intelligence Framework](../../README.md#ZeekIntel) in the Malcolm README for information on how to use Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) with Malcolm. - -## PCAP processors - -When a PCAP is uploaded (either through Malcolm's [upload web interface](../../README.md#Upload) or just copied manually into the `./pcap/upload/` directory), the `pcap-monitor` container has a script that picks up those PCAP files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that PCAP. In Malcolm at the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), there are two of those: the `zeek` container and the `arkime` container. In Malcolm, they actually both share the [same script](../../shared/bin/pcap_processor.py) to read from that topic and run the PCAP through Zeek and Arkime, respectively. If you're looking for an example to follow, the `zeek` container is the less complicated of the two. So, if you were looking to integrate a new PCAP processing tool into Malcolm (named `cooltool` for this example), the process would be something like: - -1. Define your service as instructed in the [Adding a new service](#NewImage) section - * Note how the existing `zeek` and `arkime` services use [bind mounts](#Bind) to access the local `./pcap` directory -1. Write a script (modelled after [the one](../../shared/bin/pcap_processor.py) `zeek` and `arkime` use, if you like) which subscribes to the PCAP topic port (`30441` as defined in [pcap_utils.py](../../shared/bin/pcap_utils.py)) and handles the PCAP files published there, each PCAP file represented by a JSON dictionary with `name`, `tags`, `size`, `type` and `mime` keys (search for `FILE_INFO_` in [pcap_utils.py](../../shared/bin/pcap_utils.py)). This script should be added to and run by your `cooltool.Dockerfile`-generated container. -1. Add whatever other logic needed to get your tool's data into Malcolm, whether by writing it directly info OpenSearch or by sending log files for parsing and enrichment by [Logstash](#Logstash) (especially see the section on [Parsing a new log data source](#LogstashNewSource)) - -While that might be a bit of hand-waving, these general steps take care of the PCAP processing piece: you shouldn't have to really edit any *existing* code to add a new PCAP processor. You're just creating a new container for the Malcolm appliance to the ZeroMQ topic and handle the PCAPs your tool receives. 
- -The `PCAP_PIPELINE_DEBUG` and `PCAP_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the PCAP processing pipeline. - -## Logstash - -### Parsing a new log data source - -Let's continue with the example of the `cooltool` service we added in the [PCAP processors](#PCAP) section above, assuming that `cooltool` generates some textual log files we want to parse and index into Malcolm. - -You'd have configured `cooltool` in your `cooltool.Dockerfile` and its section in the `docker-compose` files to write logs into a subdirectory or subdirectories in a shared folder [bind mounted](#Bind) in such a way that both the `cooltool` and `filebeat` containers can access. Referring to the `zeek` container as an example, this is how the `./zeek-logs` folder is handled; both the `filebeat` and `zeek` services have `./zeek-logs` in their `volumes:` section: - -``` -$ grep -P "^( - ./zeek-logs| [\w-]+:)" docker-compose.yml | grep -B1 "zeek-logs" - filebeat: - - ./zeek-logs:/data/zeek --- - zeek: - - ./zeek-logs/upload:/zeek/upload -… -``` - -You'll need to provide access to your `cooltool` logs in a similar fashion. - -Next, tweak [`filebeat.yml`](../../filebeat/filebeat.yml) by adding a new log input path pointing to the `cooltool` logs to send them along to the `logstash` container. This modified `filebeat.yml` will need to be reflected in the `filebeat` container via [bind mount](#Bind) or by [rebuilding](#Build) it. - -Logstash can then be easily extended to add more [`logstash/pipelines`](../../logstash/pipelines). At the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), the Logstash pipelines basically look like this: - -* input (from `filebeat`) sends logs to 1..*n* **parse pipelines** -* each **parse pipeline** does what it needs to do to parse its logs then sends them to the [**enrichment pipeline**](#LogstashEnrichments) -* the [**enrichment pipeline**](../../logstash/pipelines/enrichment) performs common lookups to the fields that have been normalized and indexes the logs into the OpenSearch data store - -So, in order to add a new **parse pipeline** for `cooltool` after tweaking [`filebeat.yml`](../../filebeat/filebeat.yml) as described above, create a `cooltool` directory under [`logstash/pipelines`](../../logstash/pipelines) which follows the same pattern as the `zeek` parse pipeline. This directory will have an input file (tiny), a filter file (possibly large), and an output file (tiny). In your filter file, be sure to set the field [`event.hash`](https://www.elastic.co/guide/en/ecs/master/ecs-event.html#field-event-hash) to a unique value to identify indexed documents in OpenSearch; the [fingerprint filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html) may be useful for this. - -Finally, in your `docker-compose` files, set a new `LOGSTASH_PARSE_PIPELINE_ADDRESSES` environment variable under `logstash-variables` to `cooltool-parse,zeek-parse,suricata-parse,beats-parse` (assuming you named the pipeline address from the previous step `cooltool-parse`) so that logs sent from `filebeat` to `logstash` are forwarded to all parse pipelines. - -### Parsing new Zeek logs - -The following modifications must be made in order for Malcolm to be able to parse new Zeek log files: - -1. 
Add a parsing section to [`logstash/pipelines/zeek/11_zeek_logs.conf`](../../logstash/pipelines/zeek/11_zeek_logs.conf) - * Follow patterns for existing log files as an example - * For common Zeek fields like the `id` four-tuple, timestamp, etc., use the same convention used by existing Zeek logs in that file (e.g., `ts`, `uid`, `orig_h`, `orig_p`, `resp_h`, `resp_p`) - * Take care, especially when copy-pasting filter code, that the Zeek delimiter isn't modified from a tab character to a space character (see "*zeek's default delimiter is a literal tab, MAKE SURE YOUR EDITOR DOESN'T SCREW IT UP*" warnings in that file) -1. If necessary, perform log normalization in [`logstash/pipelines/zeek/12_zeek_normalize.conf`](../../logstash/pipelines/zeek/12_zeek_normalize.conf) for values like action (`event.action`), result (`event.result`), application protocol version (`network.protocol_version`), etc. -1. If necessary, define conversions for floating point or integer values in [`logstash/pipelines/zeek/11_zeek_logs.conf`](../../logstash/pipelines/zeek/13_zeek_convert.conf) -1. Identify the new fields and add them as described in [Adding new log fields](#NewFields) - -### Enrichments - -Malcolm's Logstash instance will do a lot of enrichments for you automatically: see the [enrichment pipeline](../../logstash/pipelines/enrichment), including MAC address to vendor by OUI, GeoIP, ASN, and a few others. In order to take advantage of these enrichments that are already in place, normalize new fields to use the same standardized field names Malcolm uses for things like IP addresses, MAC addresses, etc. You can add your own additional enrichments by creating new `.conf` files containing [Logstash filters](https://www.elastic.co/guide/en/logstash/7.10/filter-plugins.html) in the [enrichment pipeline](../../logstash/pipelines/enrichment) directory and using either of the techniques in the [Local modifications](#LocalMods) section to implement your changes in the `logstash` container - -### Logstash plugins - -The [logstash.Dockerfile](../../Dockerfiles/logstash.Dockerfile) installs the Logstash plugins used by Malcolm (search for `logstash-plugin install` in that file). Additional Logstash plugins could be installed by modifying this Dockerfile and [rebuilding](#Build) the `logstash` Docker image. - -## OpenSearch Dashboards - -[OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) is an open-source fork of [Kibana](https://www.elastic.co/kibana/), which is [no longer open-source software](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0). - -### Adding new visualizations and dashboards - -Visualizations and dashboards can be [easily created](../../README.md#BuildDashboard) in OpenSearch Dashboards using its drag-and-drop WYSIWIG tools. Assuming you've created a new dashboard you wish to package with Malcolm, the dashboard and its visualization components can be exported using the following steps: - -1. Identify the ID of the dashboard (found in the URL: e.g., for `/dashboards/app/dashboards#/view/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` the ID would be `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) -1. Export the dashboard with that ID and save it in the `./dashboards./dashboards/` directory with the following command: - ``` - export DASHID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx && \ - docker-compose exec dashboards curl -XGET \ - "http://localhost:5601/dashboards/api/opensearch-dashboards/dashboards/export?dashboard=$DASHID" > \ - ./dashboards/dashboards/$DASHID.json - ``` -1. 
It's preferrable for Malcolm to dynamically create the `arkime_sessions3-*` index template rather than including it in imported dashboards, so edit the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.json` that was generated, and carefully locate and remove the section with the `id` of `arkime_sessions3-*` and the `type` of `index-pattern` (including the comma preceding it): - ``` - , - { - "id": "arkime_sessions3-*", - "type": "index-pattern", - "namespaces": [ - "default" - ], - "updated_at": "2021-12-13T18:21:42.973Z", - "version": "Wzk3MSwxXQ==", - … - "references": [], - "migrationVersion": { - "index-pattern": "7.6.0" - } - } - ``` -1. Include the new dashboard either by using a [bind mount](#Bind) for the `./dashboards./dashboards/` directory or by [rebuilding](#Build) the `dashboards-helper` Docker image. Dashboards are imported the first time Malcolm starts up. - -### OpenSearch Dashboards plugins - -The [dashboards.Dockerfile](../../Dockerfiles/dashboards.Dockerfile) installs the OpenSearch Dashboards plugins used by Malcolm (search for `opensearch-dashboards-plugin install` in that file). Additional Dashboards plugins could be installed by modifying this Dockerfile and [rebuilding](#Build) the `dashboards` Docker image. - -Third-party or community plugisn developed for Kibana will not install into OpenSearch dashboards without source code modification. Depending on the plugin, this could range from very smiple to very complex. As an illustrative example, the changes that were required to port the Sankey diagram visualization plugin from Kibana to OpenSearch Dashboards compatibility can be [viewed on GitHub](https://github.com/mmguero-dev/osd_sankey_vis/compare/edacf6b...main). - -## Carved file scanners - -Similar to the [PCAP processing pipeline](#PCAP) described above, new tools can plug into Malcolm's [automatic file extraction and scanning](../../README.md#ZeekFileExtraction) to examine file transfers carved from network traffic. - -When Zeek extracts a file it observes being transfered in network traffic, the `file-monitor` container picks up those extracted files and publishes to a [ZeroMQ](https://zeromq.org/) topic that can be subscribed to by any other process that wants to analyze that extracted file. In Malcolm at the time of this writing (as of the [v5.0.0 release](https://github.com/idaholab/Malcolm/releases/tag/v5.0.0)), currently implemented file scanners include ClamAV, YARA, capa and VirusTotal, all of which are managed by the `file-monitor` container. 
The scripts involved in this code are: - -* [shared/bin/zeek_carve_watcher.py](../../shared/bin/zeek_carve_watcher.py) - watches the directory to which Zeek extracts files and publishes information about those files to the ZeroMQ ventilator on port 5987 -* [shared/bin/zeek_carve_scanner.py](../../shared/bin/zeek_carve_scanner.py) - subscribes to `zeek_carve_watcher.py`'s topic and performs file scanning for the ClamAV, YARA, capa and VirusTotal engines and sends "hits" to another ZeroMQ sync on port 5988 -* [shared/bin/zeek_carve_logger.py](../../shared/bin/zeek_carve_logger.py) - subscribes to `zeek_carve_scanner.py`'s topic and logs hits to a "fake" Zeek signatures.log file which is parsed and ingested by Logstash -* [shared/bin/zeek_carve_utils.py](../../shared/bin/zeek_carve_utils.py) - various variables and classes related to carved file scanning - -Additional file scanners could either be added to the `file-monitor` service, or to avoid coupling with Malcolm's code you could simply define a new service as instructed in the [Adding a new service](#NewImage) section and write your own scripts to subscribe and publish to the topics as described above. While that might be a bit of hand-waving, these general steps take care of the plumbing around extracting the file and notifying your tool, as well as handling the logging of "hits": you shouldn't have to really edit any *existing* code to add a new carved file scanner. - -The `EXTRACTED_FILE_PIPELINE_DEBUG` and `EXTRACTED_FILE_PIPELINE_DEBUG_EXTRA` environment variables in the `docker-compose` files can be set to `true` to enable verbose debug logging from the output of the Docker containers involved in the carved file processing pipeline. - -## Style - -### Python - -For Python code found in Malcolm, the author uses [Black: The uncompromising Python code formatter](https://github.com/psf/black) with the options `--line-length 120 --skip-string-normalization`. - -## Copyright - -[Malcolm](https://github.com/idaholab/Malcolm) is Copyright 2022 Battelle Energy Alliance, LLC, and is developed and released through the cooperation of the [Cybersecurity and Infrastructure Security Agency](https://www.cisa.gov/) of the [U.S. Department of Homeland Security](https://www.dhs.gov/). - -See [`License.txt`](../../License.txt) for the terms of its release. - -### Contact information of author(s): - -[malcolm@inl.gov](mailto:malcolm@inl.gov?subject=Malcolm) diff --git a/docs/cyberchef.md b/docs/cyberchef.md new file mode 100644 index 000000000..c9a6b2aeb --- /dev/null +++ b/docs/cyberchef.md @@ -0,0 +1,5 @@ +# CyberChef + +Malcolm provides an instance of [CyberChef](https://github.com/gchq/CyberChef), the "Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis." CyberChef is available at at [https://localhost/cyberchef.html](https://localhost/cyberchef.html) if you are connecting locally. + +Arkime's [Sessions](arkime.md#ArkimeSessions) view has built-in CyberChef integration for Arkime sessions with full PCAP payloads available: expanding a session and opening the **Packet Options** drop-down menu in its payload section will provide options for **Open src packets with CyberChef** and **Open dst packets with CyberChef**. 
\ No newline at end of file diff --git a/docs/dashboards.md b/docs/dashboards.md new file mode 100644 index 000000000..7a5f5fc00 --- /dev/null +++ b/docs/dashboards.md @@ -0,0 +1,99 @@ +# OpenSearch Dashboards + +* [OpenSearch Dashboards](#Dashboards) + - [Discover](#Discover) + + [Screenshots](#DiscoverGallery) + - [Visualizations and dashboards](#DashboardsVisualizations) + + [Prebuilt visualizations and dashboards](#PrebuiltVisualizations) + * [Screenshots](#PrebuiltVisualizationsGallery) + + [Building your own visualizations and dashboards](#BuildDashboard) + * [Screenshots](#NewVisualizationsGallery) + +While Arkime provides very nice visualizations, especially for network traffic, [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) (an open source general-purpose data visualization tool for OpenSearch) can be used to create custom visualizations (tables, charts, graphs, dashboards, etc.) using the same data. + +The OpenSearch Dashboards container can be accessed at [https://localhost/dashboards/](https://localhost/dashboards/) if you are connecting locally. Several preconfigured dashboards for Zeek logs are included in Malcolm's OpenSearch Dashboards configuration. + +OpenSearch Dashboards has several components for data searching and visualization: + +## Discover + +The **Discover** view enables you to view events on a record-by-record basis (similar to a *session* record in Arkime or an individual line from a Zeek log). See the official [Kibana User Guide](https://www.elastic.co/guide/en/kibana/7.10/index.html) (OpenSearch Dashboards is an open-source fork of Kibana, which is no longer open-source software) for information on using the Discover view: + +* [Discover](https://www.elastic.co/guide/en/kibana/7.10/discover.html) +* [Searching Your Data](https://www.elastic.co/guide/en/kibana/7.10/search.html) + +### Screenshots + +![Discover view](./images/screenshots/dashboards_discover.png) + +![Viewing the details of a session in Discover](./images/screenshots/dashboards_discover_table.png) + +![Filtering by tags to display only sessions with public IP addresses](./images/screenshots/dashboards_add_filter.png) + +![Changing the fields displayed in Discover](./images/screenshots/dashboards_fields_list.png) + +![Opening a previously-saved search](./images/screenshots/dashboards_open_search.png) + +## Visualizations and dashboards + +### Prebuilt visualizations and dashboards + +Malcolm comes with dozens of prebuilt visualizations and dashboards for the network traffic represented by each of the Zeek log types. Click **Dashboard** to see a list of these dashboards. As is the case with all OpenSearch Dashboards visualizations, all of the charts, graphs, maps, and tables are interactive and can be clicked on to narrow or expand the scope of the data you are investigating. Similarly, click **Visualize** to explore the prebuilt visualizations used to build the dashboards. + +Many of Malcolm's prebuilt visualizations for Zeek logs were originally inspired by the excellent [Kibana Dashboards](https://github.com/Security-Onion-Solutions/securityonion-elastic/tree/master/kibana/dashboards) that are part of [Security Onion](https://securityonion.net/). 
+
+#### Screenshots
+
+![The Security Overview highlights security-related network events](./images/screenshots/dashboards_security_overview.png)
+
+![The ICS/IoT Security Overview dashboard displays information about ICS and IoT network traffic](./images/screenshots/dashboards_ics_iot_security_overview.png)
+
+![The Connections dashboard displays information about the "top talkers" across all types of sessions](./images/screenshots/dashboards_connections.png)
+
+![The HTTP dashboard displays important details about HTTP traffic](./images/screenshots/dashboards_http.png)
+
+![There are several Connections visualizations using locations from GeoIP lookups](./images/screenshots/dashboards_latlon_map.png)
+
+![OpenSearch Dashboards includes both coordinate and region map types](./images/screenshots/dashboards_region_map.png)
+
+![The Suricata Alerts dashboard highlights traffic which matched Suricata signatures](./images/screenshots/dashboards_suricata_alerts.png)
+
+![The Zeek Notices dashboard highlights things which Zeek determines are potentially bad](./images/screenshots/dashboards_notices.png)
+
+![The Zeek Signatures dashboard displays signature hits, such as antivirus hits on files extracted from network traffic](./images/screenshots/dashboards_signatures.png)
+
+![The Software dashboard displays the type, name, and version of software seen communicating on the network](./images/screenshots/dashboards_software.png)
+
+![The PE (portable executables) dashboard displays information about executable files transferred over the network](./images/screenshots/dashboards_portable_executables.png)
+
+![The SMTP dashboard highlights details about SMTP traffic](./images/screenshots/dashboards_smtp.png)
+
+![The SSL dashboard displays information about SSL versions, certificates, and TLS JA3 fingerprints](./images/screenshots/dashboards_ssl.png)
+
+![The files dashboard displays metrics about the files transferred over the network](./images/screenshots/dashboards_files_source.png)
+
+![This dashboard provides insight into DNP3 (Distributed Network Protocol), a protocol used commonly in electric and water utilities](./images/screenshots/dashboards_dnp3.png)
+
+![Modbus is a standard protocol found in many industrial control systems (ICS)](./images/screenshots/dashboards_modbus.png)
+
+![BACnet is a communications protocol for Building Automation and Control (BAC) networks](./images/screenshots/dashboards_bacnet.png)
+
+![EtherCAT is an Ethernet-based fieldbus system](./images/screenshots/dashboards_ecat.png)
+
+![EtherNet/IP is an industrial network protocol that adapts the Common Industrial Protocol to standard Ethernet](./images/screenshots/dashboards_ethernetip.png)
+
+![PROFINET is an industry technical standard for data communication over Industrial Ethernet](./images/screenshots/dashboards_profinet.png)
+
+![S7comm is a Siemens proprietary protocol that runs between programmable logic controllers (PLCs) of the Siemens family](./images/screenshots/dashboards_s7comm.png)
+
+### Building your own visualizations and dashboards
+
+See the official [Kibana User Guide](https://www.elastic.co/guide/en/kibana/7.10/index.html) and [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) (OpenSearch Dashboards is an open-source fork of Kibana, which is no longer open-source software) documentation for information on creating your own visualizations and dashboards:
+
+* [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/)
+* [Kibana Dashboards](https://www.elastic.co/guide/en/kibana/7.10/dashboard.html)
+* [Timelion](https://www.elastic.co/guide/en/kibana/7.12/timelion.html)
+
+#### Screenshots
+
+![OpenSearch Dashboards boasts many types of visualizations for displaying your data](./images/screenshots/dashboards_new_visualization.png)
\ No newline at end of file
diff --git a/docs/development.md b/docs/development.md
new file mode 100644
index 000000000..06e3c77ef
--- /dev/null
+++ b/docs/development.md
@@ -0,0 +1,161 @@
+# Development
+
+* [Development](#Development)
+  - [Building from source](#Build)
+  - [Pre-Packaged installation files](#Packager)
+
+Checking out the [Malcolm source code](https://github.com/idaholab/Malcolm/tree/main) results in the following subdirectories in your `malcolm/` working copy:
+
+* `api` - code and configuration for the `api` container which provides a REST API to query Malcolm
+* `arkime` - code and configuration for the `arkime` container which processes PCAP files using `capture` and which serves the Viewer application
+* `arkime-logs` - an initially empty directory to which the `arkime` container will write some debug log files
+* `arkime-raw` - an initially empty directory to which the `arkime` container will write captured PCAP files; because Arkime, as employed by Malcolm, is currently used for processing previously-captured PCAP files, this directory is currently unused
+* `Dockerfiles` - a directory containing build instructions for Malcolm's docker images
+* `docs` - a directory containing instructions and documentation
+* `opensearch` - an initially empty directory where the OpenSearch database instance will reside
+* `opensearch-backup` - an initially empty directory for storing OpenSearch [index snapshots](index-management.md#IndexManagement)
+* `filebeat` - code and configuration for the `filebeat` container which ingests Zeek logs and forwards them to the `logstash` container
+* `file-monitor` - code and configuration for the `file-monitor` container which can scan files extracted by Zeek
+* `file-upload` - code and configuration for the `upload` container which serves a web browser-based upload form for uploading PCAP files and Zeek logs, and which serves an SFTP share as an alternate method for upload
+* `freq-server` - code and configuration for the `freq` container used for calculating entropy of strings
+* `htadmin` - configuration for the `htadmin` user account management container
+* `dashboards` - code and configuration for the `dashboards` container for creating additional ad-hoc visualizations and dashboards beyond that which is provided by Arkime Viewer
+* `logstash` - code and configuration for the `logstash` container which parses Zeek logs and forwards them to the `opensearch` container
+* `malcolm-iso` - code and configuration for building an [installer ISO](malcolm-iso.md#ISO) for a minimal Debian-based Linux installation for running Malcolm
+* `name-map-ui` - code and configuration for the `name-map-ui` container which provides the [host and subnet name mapping](host-and-subnet-mapping.md#HostAndSubnetNaming) interface
+* `netbox` - code and configuration for the `netbox`, `netbox-postgres`, `netbox-redis` and `netbox-redis-cache` containers which provide asset management capabilities
+* `nginx` - configuration for the `nginx` reverse proxy container
+* `pcap` - an initially empty directory for PCAP files to be uploaded, processed, and stored
+* `pcap-capture` - code and configuration for the `pcap-capture` container which can capture network traffic
+* `pcap-monitor` - code and configuration for the `pcap-monitor` container which watches for new or uploaded PCAP files and notifies the other services to process them
+* `scripts` - control scripts for starting, stopping, restarting, etc. Malcolm
+* `sensor-iso` - code and configuration for building a [Hedgehog Linux](live-analysis.md#Hedgehog) ISO
+* `shared` - miscellaneous code used by various Malcolm components
+* `suricata` - code and configuration for the `suricata` container which handles PCAP processing using Suricata
+* `suricata-logs` - an initially empty directory for Suricata logs to be uploaded, processed, and stored
+* `zeek` - code and configuration for the `zeek` container which handles PCAP processing using Zeek
+* `zeek-logs` - an initially empty directory for Zeek logs to be uploaded, processed, and stored
+* `_includes` and `_layouts` - templates for the HTML version of the documentation
+
+and the following files of special note:
+
+* `auth.env` - the script `./scripts/auth_setup` prompts the user for the administrator credentials used by the Malcolm appliance, and `auth.env` is the environment file where those values are stored
+* `cidr-map.txt` - specify custom IP address to network segment mapping
+* `host-map.txt` - specify custom IP and/or MAC address to host mapping
+* `net-map.json` - an alternative to `cidr-map.txt` and `host-map.txt`, mapping hosts and network segments to their names in a JSON-formatted file
+* `docker-compose.yml` - the configuration file used by `docker-compose` to build, start, and stop an instance of the Malcolm appliance
+* `docker-compose-standalone.yml` - similar to `docker-compose.yml`, only used for the ["packaged"](#Packager) installation of Malcolm
+
+## Building from source
+
+Building the Malcolm docker images from scratch requires internet access to pull source files for its components. Once internet access is available, execute the following command to build all of the Docker images used by the Malcolm appliance:
+
+```
+$ ./scripts/build.sh
+```
+
+Then, go take a walk or something since it will be a while. When you're done, you can run `docker images` and see you have fresh images for:
+
+* `malcolmnetsec/api` (based on `python:3-slim`)
+* `malcolmnetsec/arkime` (based on `debian:11-slim`)
+* `malcolmnetsec/dashboards-helper` (based on `alpine:3.16`)
+* `malcolmnetsec/dashboards` (based on `opensearchproject/opensearch-dashboards`)
+* `malcolmnetsec/file-monitor` (based on `debian:11-slim`)
+* `malcolmnetsec/file-upload` (based on `debian:11-slim`)
+* `malcolmnetsec/filebeat-oss` (based on `docker.elastic.co/beats/filebeat-oss`)
+* `malcolmnetsec/freq` (based on `debian:11-slim`)
+* `malcolmnetsec/htadmin` (based on `debian:11-slim`)
+* `malcolmnetsec/logstash-oss` (based on `opensearchproject/logstash-oss-with-opensearch-output-plugin`)
+* `malcolmnetsec/name-map-ui` (based on `alpine:3.16`)
+* `malcolmnetsec/netbox` (based on `netboxcommunity/netbox:latest`)
+* `malcolmnetsec/nginx-proxy` (based on `alpine:3.16`)
+* `malcolmnetsec/opensearch` (based on `opensearchproject/opensearch`)
+* `malcolmnetsec/pcap-capture` (based on `debian:11-slim`)
+* `malcolmnetsec/pcap-monitor` (based on `debian:11-slim`)
+* `malcolmnetsec/postgresql` (based on `postgres:14-alpine`)
+* `malcolmnetsec/redis` (based on `redis:7-alpine`)
+* `malcolmnetsec/suricata` (based on `debian:11-slim`)
+* `malcolmnetsec/zeek` (based on `debian:11-slim`)
+
+Alternately, if you have forked Malcolm on GitHub, [workflow files](../.github/workflows/) are provided which contain instructions for GitHub to build the docker images and [sensor](live-analysis.md#Hedgehog) and [Malcolm](malcolm-iso.md#ISO) installer ISOs. The resulting images are named according to the pattern `ghcr.io/owner/malcolmnetsec/image:branch` (e.g., if you've forked Malcolm with the github user `romeogdetlevjr`, the `arkime` container built for the `main` branch would be named `ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main`). To run your local instance of Malcolm using these images instead of the official ones, you'll need to edit your `docker-compose.yml` file(s) and replace the `image:` tags according to this new pattern, or use the bash helper script `./shared/bin/github_image_helper.sh` to pull and re-tag the images.
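+
+For instance (a sketch, assuming the `romeogdetlevjr` fork and `main` branch from the example above; your compose file's surrounding structure may differ), the `arkime` service's entry in `docker-compose.yml` would be edited to read:
+
+```
+  arkime:
+    image: ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main
+```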
+
+## Pre-Packaged installation files
+
+### Creating pre-packaged installation files
+
+`scripts/malcolm_appliance_packager.sh` can be run to package up the configuration files (and, if necessary, the Docker images) which can be copied to a network share or USB drive for distribution to non-networked machines. For example:
+
+```
+$ ./scripts/malcolm_appliance_packager.sh
+You must set a username and password for Malcolm, and self-signed X.509 certificates will be generated
+
+Store administrator username/password for local Malcolm access? (Y/n): y
+
+Administrator username: analyst
+analyst password:
+analyst password (again):
+
+(Re)generate self-signed certificates for HTTPS access (Y/n): y
+
+(Re)generate self-signed certificates for a remote log forwarder (Y/n): y
+
+Store username/password for primary remote OpenSearch instance? (y/N): n
+
+Store username/password for secondary remote OpenSearch instance? (y/N): n
+
+Store username/password for email alert sender account? (y/N): n
+
+(Re)generate internal passwords for NetBox (Y/n): y
+
+Packaged Malcolm to "/home/user/tmp/malcolm_20190513_101117_f0d052c.tar.gz"
+
+Do you need to package docker images also [y/N]? y
+This might take a few minutes...
+
+Packaged Malcolm docker images to "/home/user/tmp/malcolm_20190513_101117_f0d052c_images.tar.gz"
+
+
+To install Malcolm:
+  1. Run install.py
+  2. Follow the prompts
+
+To start, stop, restart, etc. Malcolm:
+  Use the control scripts in the "scripts/" directory:
+   - start       (start Malcolm)
+   - stop        (stop Malcolm)
+   - restart     (restart Malcolm)
+   - logs        (monitor Malcolm logs)
+   - wipe        (stop Malcolm and clear its database)
+   - auth_setup  (change authentication-related settings)
+
+A minute or so after starting Malcolm, the following services will be accessible:
+  - Arkime: https://localhost/
+  - OpenSearch Dashboards: https://localhost/dashboards/
+  - PCAP upload (web): https://localhost/upload/
+  - PCAP upload (sftp): sftp://USERNAME@127.0.0.1:8022/files/
+  - Host and subnet name mapping editor: https://localhost/name-map-ui/
+  - NetBox: https://localhost/netbox/
+  - Account management: https://localhost:488/
+```
+
+The above example will result in the following artifacts for distribution as explained in the script's output:
+
+```
+$ ls -lh
+total 2.0G
+-rwxr-xr-x 1 user user  61k May 13 11:32 install.py
+-rw-r--r-- 1 user user 2.0G May 13 11:37 malcolm_20190513_101117_f0d052c_images.tar.gz
+-rw-r--r-- 1 user user  683 May 13 11:37 malcolm_20190513_101117_f0d052c.README.txt
+-rw-r--r-- 1 user user 183k May 13 11:32 malcolm_20190513_101117_f0d052c.tar.gz
+```
+
+### Installing from pre-packaged installation files
+
+If you have obtained pre-packaged installation files to install Malcolm on a non-networked machine via an internal network share or on a USB key, you likely have the following files:
+
+* `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.README.txt` - This readme file contains minimal setup instructions for extracting the contents of the other tarballs and running the Malcolm appliance.
+* `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` - This tarball contains the configuration files and directory configuration used by an instance of Malcolm. It can be extracted via `tar -xf malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` upon which a directory will be created (named similarly to the tarball) containing the directories and configuration files. Alternatively, `install.py` can accept this filename as an argument and handle its extraction and initial configuration for you.
+* `malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz` - This tarball contains the Docker images used by Malcolm. It can be imported manually via `docker load -i malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz`
+* `install.py` - This install script can load the Docker images and extract Malcolm configuration files from the aforementioned tarballs and do some initial configuration for you.
+
+Run `install.py malcolm_XXXXXXXX_XXXXXX_XXXXXXX.tar.gz` and follow the prompts. If you do not already have Docker and Docker Compose installed, the `install.py` script will help you install them.
\ No newline at end of file
diff --git a/docs/web/download.md b/docs/download.md
similarity index 91%
rename from docs/web/download.md
rename to docs/download.md
index 180655cda..e7fb9a52c 100644
--- a/docs/web/download.md
+++ b/docs/download.md
@@ -22,7 +22,7 @@ While official downloads of the Malcolm installer ISO are not provided, an **uno
 
 ### Installer ISO
 
-[Instructions are provided](/hedgehog/#ISOBuild) to generate the Hedgehog Linux ISO from source. While official downloads of the Hedgehog Linux ISO are not provided, an **unofficial build** of the ISO installer for the latest stable release is available for download here.
+[Instructions are provided](/hedgehog/#SensorISOBuild) to generate the Hedgehog Linux ISO from source. While official downloads of the Hedgehog Linux ISO are not provided, an **unofficial build** of the ISO installer for the latest stable release is available for download here.
 
 | ISO | SHA256 |
 |---|---|
diff --git a/docs/file-scanning.md b/docs/file-scanning.md
new file mode 100644
index 000000000..6d6396a9e
--- /dev/null
+++ b/docs/file-scanning.md
@@ -0,0 +1,28 @@
+# Automatic file extraction and scanning
+
+Malcolm can leverage Zeek's knowledge of network protocols to automatically detect file transfers and extract those files from PCAPs as Zeek processes them. This behavior can be enabled globally by modifying the `ZEEK_EXTRACTOR_MODE` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml), or on a per-upload basis for PCAP files uploaded via the [browser-based upload form](upload.md#Upload) when **Analyze with Zeek** is selected.
+
+To specify which files should be extracted, the following values are acceptable in `ZEEK_EXTRACTOR_MODE`:
+
+* `none`: no file extraction
+* `interesting`: extraction of files with mime types of common attack vectors
+* `mapped`: extraction of files with recognized mime types
+* `known`: extraction of files for which any mime type can be determined
+* `all`: extract all files
+
+Extracted files can be examined through any of the following methods:
+
+* submitting file hashes to [**VirusTotal**](https://www.virustotal.com/en/#search); to enable this method, specify the `VTOT_API2_KEY` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml)
+* scanning files with [**ClamAV**](https://www.clamav.net/); to enable this method, set the `EXTRACTED_FILE_ENABLE_CLAMAV` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) to `true`
+* scanning files with [**Yara**](https://github.com/VirusTotal/yara); to enable this method, set the `EXTRACTED_FILE_ENABLE_YARA` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) to `true`
+* scanning PE (portable executable) files with [**Capa**](https://github.com/fireeye/capa); to enable this method, set the `EXTRACTED_FILE_ENABLE_CAPA` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) to `true`
+
+Files which are flagged via any of these methods will be logged as Zeek `signatures.log` entries, and can be viewed in the **Signatures** dashboard in OpenSearch Dashboards.
+
+The `EXTRACTED_FILE_PRESERVATION` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) determines the behavior for preservation of Zeek-extracted files:
+
+* `quarantined`: preserve only flagged files in `./zeek-logs/extract_files/quarantine`
+* `all`: preserve flagged files in `./zeek-logs/extract_files/quarantine` and all other extracted files in `./zeek-logs/extract_files/preserved`
+* `none`: preserve no extracted files
+
+The `EXTRACTED_FILE_HTTP_SERVER_…` [environment variables in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) configure access to the Zeek-extracted files path by means of a simple HTTPS directory server. Beware that Zeek-extracted files may contain malware. As such, the files may be optionally encrypted upon download.
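+
+As a sketch (variable names as described above; the exact stanza and value quoting depend on how your `docker-compose.yml` is organized), enabling extraction of "interesting" files and a couple of the scanners might look like:
+
+```
+    environment:
+      - ZEEK_EXTRACTOR_MODE=interesting
+      - EXTRACTED_FILE_ENABLE_CLAMAV=true
+      - EXTRACTED_FILE_ENABLE_YARA=true
+      - EXTRACTED_FILE_PRESERVATION=quarantined
+```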
\ No newline at end of file
diff --git a/docs/hardening.md b/docs/hardening.md
new file mode 100644
index 000000000..71fa0996b
--- /dev/null
+++ b/docs/hardening.md
@@ -0,0 +1,62 @@
+# Hardening
+
+* [Hardening](#Hardening)
+  - [Compliance Exceptions](#ComplianceExceptions)
+
+The Malcolm aggregator base operating system uses the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmarks which target the following guidelines for establishing a secure configuration posture:
+
+* [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/)
+* [DISA STIG (Security Technical Implementation Guides) for RHEL 7](https://www.stigviewer.com/stig/red_hat_enterprise_linux_7/) v2r5 Ubuntu v1r2 [adapted](https://github.com/hardenedlinux/STIG-OS-mirror/blob/master/redhat-STIG-DOCs/U_Red_Hat_Enterprise_Linux_7_V2R5_STIG.zip) for a Debian operating system
+* Additional recommendations from [cisecurity.org](https://www.cisecurity.org/)
+
+## Compliance Exceptions
+
+[Currently](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) there are 274 checks to determine compliance with the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmark.
+
+The Malcolm aggregator base operating system claims exceptions from the recommendations in this benchmark in the following categories:
+
+**1.1 Install Updates, Patches and Additional Security Software** - When the Malcolm aggregator appliance software is built, all of the latest applicable security patches and updates are included in it. How future updates are to be handled is still in design.
+
+**1.3 Enable verify the signature of local packages** - As the base distribution is not using embedded signatures, `debsig-verify` would reject all packages (see comment in `/etc/dpkg/dpkg.cfg`). Enabling it after installation would disallow any future updates.
+
+**2.14 Add nodev option to /run/shm Partition**, **2.15 Add nosuid Option to /run/shm Partition**, **2.16 Add noexec Option to /run/shm Partition** - The Malcolm aggregator base operating system does not mount `/run/shm` as a separate partition, so these recommendations do not apply.
+
+**2.19 Disable Mounting of freevxfs Filesystems**, **2.20 Disable Mounting of jffs2 Filesystems**, **2.21 Disable Mounting of hfs Filesystems**, **2.22 Disable Mounting of hfsplus Filesystems**, **2.23 Disable Mounting of squashfs Filesystems**, **2.24 Disable Mounting of udf Filesystems** - The Malcolm aggregator base operating system is not compiling a custom Linux kernel, so these filesystems are inherently supported as they are part of Debian Linux's default kernel.
+
+**3.3 Set Boot Loader Password** - As maximizing availability is a system requirement, Malcolm should restart automatically without user intervention to ensure uninterrupted service. A boot loader password is not enabled.
+
+**4.8 Disable USB Devices** - The ability to ingest data (such as PCAP files) from a mounted USB mass storage device is a requirement of the system.
+
+**6.1 Ensure the X Window system is not installed**, **6.2 Ensure Avahi Server is not enabled**, **6.3 Ensure print server is not enabled** - An X Windows session is provided for displaying dashboards. The library packages `libavahi-common-data`, `libavahi-common3`, and `libcups2` are dependencies of some of the X components used by the Malcolm aggregator base operating system, but the `avahi` and `cups` services themselves are disabled.
+
+**6.17 Ensure virus scan Server is enabled**, **6.18 Ensure virus scan Server update is enabled** - As this is a network traffic analysis appliance rather than an end-user device, regular user files will not be created. A virus scan program would impact device performance and would be unnecessary.
+
+**7.1.1 Disable IP Forwarding**, **7.2.4 Log Suspicious Packets**, **7.2.7 Enable RFC-recommended Source Route Validation**, **7.4.1 Install TCP Wrappers** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, these recommendations do not apply.
+
+**8.1.1.2 Disable System on Audit Log Full**, **8.1.1.3 Keep All Auditing Information**, **8.1.1.5 Ensure set remote_server for audit service**, **8.1.1.6 Ensure enable_krb5 set to yes for remote audit service**, **8.1.1.7 Ensure set action for audit storage volume is fulled**, **8.1.1.8 Ensure set action for network failure on remote audit service**, **8.1.1.9 Set space left for auditd service**, a few other audit-related items under section **8.1**, **8.2.4 Configure rsyslog to Send Logs to a Remote Log Host** - As maximizing availability is a system requirement, audit processing failures will be logged on the device rather than halting the system. `auditd` is set up to write to syslog when its local storage capacity is reached.
+
+**8.4.2 Implement Periodic Execution of File Integrity** - This functionality is not configured by default, but it can be configured post-install by the end user.
+
+Password-related recommendations under **9.2** and **10.1** - The library package `libpam-pwquality` is used in favor of `libpam-cracklib` which is what the [compliance scripts](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) are looking for. Also, as a system running Malcolm is intended to be used as an appliance rather than a general user-facing software platform, some exceptions to password enforcement policies are claimed.
+
+**9.3.13 Limit Access via SSH** - The Malcolm aggregator base operating system does not create multiple regular user accounts: only `root` and an aggregator service account are used. SSH access for `root` is disabled. SSH login with a password is also disallowed: only key-based authentication is accepted. The service account accepts no keys by default. As such, the `AllowUsers`, `AllowGroups`, `DenyUsers`, and `DenyGroups` values in `sshd_config` do not apply.
+
+**9.4 Restrict Access to the su Command** - The Malcolm aggregator base operating system does not create multiple regular user accounts: only `root` and an aggregator service account are used.
+
+**10.1.6 Remove nopasswd option from the sudoers configuration** - A very limited set of operations (a single script used to run the AIDE integrity check as a non-root user) has the NOPASSWD option set to allow it to be run in the background without user intervention.
+
+**10.1.10 Set maxlogins for all accounts** and **10.5 Set Timeout on ttys** - The Malcolm aggregator base operating system does not create multiple regular user accounts: only `root` and an aggregator service account are used.
+
+**12.10 Find SUID System Executables**, **12.11 Find SGID System Executables** - The few files found by [these](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.10_find_suid_files.sh) [scripts](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.11_find_sgid_files.sh) are valid exceptions required by the Malcolm aggregator base operating system's core requirements.
+
+**14.1 Defense for NAT Slipstreaming** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, this recommendation does not apply.
+
+Please review the notes for these additional guidelines. While not claiming an exception, the Malcolm aggregator base operating system may implement them in a manner different than is described by the [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/) or the [hardenedlinux/harbian-audit](https://github.com/hardenedlinux/harbian-audit) audit scripts.
+
+**4.1 Restrict Core Dumps** - The Malcolm aggregator base operating system disables core dumps using a configuration file for `ulimit` named `/etc/security/limits.d/limits.conf`. The [audit script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/4.1_restrict_core_dumps.sh) checking for this does not check the `limits.d` subdirectory, which is why this is incorrectly flagged as noncompliant.
+
+**5.4 Ensure ctrl-alt-del is disabled** - The Malcolm aggregator base operating system disables the `ctrl+alt+delete` key sequence by executing `systemctl disable ctrl-alt-del.target` during installation and the command `systemctl mask ctrl-alt-del.target` at boot time.
+
+**7.4.4 Create /etc/hosts.deny**, **7.7.1 Ensure Firewall is active**, **7.7.4.1 Ensure default deny firewall policy**, **7.7.4.2 Ensure loopback traffic is configured**, **7.7.4.3 Ensure default deny firewall policy**, **7.7.4.4 Ensure outbound and established connections are configured** - The Malcolm aggregator base operating system **is** configured with an appropriately locked-down software firewall (managed by "Uncomplicated Firewall" `ufw`). However, the methods outlined in the CIS benchmark recommendations do not account for this configuration.
+
+**8.6 Verifies integrity all packages** - The [script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/8.7_verify_integrity_packages.sh) which verifies package integrity only "fails" because of missing (status `??5??????` displayed by the utility) language ("locale") files, which are removed as part of the Malcolm aggregator base operating system's trimming-down process. All non-locale-related system files pass integrity checks.
diff --git a/docs/hedgehog-boot.md b/docs/hedgehog-boot.md
new file mode 100644
index 000000000..a8bfeee64
--- /dev/null
+++ b/docs/hedgehog-boot.md
@@ -0,0 +1,17 @@
+# Boot
+
+Each time the sensor boots, a GRUB boot menu will be shown briefly, after which the sensor will proceed to load.
+
+## Kiosk mode
+
+![Kiosk mode sensor menu: resource statistics](./images/hedgehog/images/kiosk_mode_sensor_menu.png)
+
+The sensor automatically logs in as the sensor user account and runs in **kiosk mode**, which is intended to show an at-a-glance view of its resource utilization. Clicking the **☰** icon allows you to switch between the resource statistics view and the services view.
+
+![Kiosk mode sensor menu: services](./images/hedgehog/images/kiosk_mode_services_menu.png)
+
+The kiosk's services screen (designed with large clickable labels for small portable touch screens) can be used to start and stop essential services, get a status report of the currently running services, and clean all captured data from the sensor.
+
+!["Clean Sensor" confirmation prompt before deleting sensor data](./images/hedgehog/images/kiosk_mode_wipe_prompt.png)
+
+!["Sensor Status" report from the kiosk services menu](./images/hedgehog/images/kiosk_mode_status.png)
\ No newline at end of file
diff --git a/docs/hedgehog-config-root.md b/docs/hedgehog-config-root.md
new file mode 100644
index 000000000..d542544de
--- /dev/null
+++ b/docs/hedgehog-config-root.md
@@ -0,0 +1,47 @@
+# Interfaces, hostname, and time synchronization
+
+## Hostname
+
+The first step of sensor configuration is to configure the network interfaces and sensor hostname. Clicking the **Configure Interfaces and Hostname** toolbar icon (or, if you are at a command line prompt, running `configure-interfaces`) will prompt you for the root password you created during installation, after which the configuration welcome screen is shown. Select **Continue** to proceed.
+
+You may next select whether to configure the network interfaces, hostname, or time synchronization.
+
+![Selection to configure network interfaces, hostname, or time synchronization](./images/hedgehog/images/root_config_mode.png)
+
+Selecting **Hostname**, you will be presented with a summary of the current sensor identification information, after which you may specify a new sensor hostname. This name will be used to tag all events forwarded from this sensor in the events' **host.name** field.
+
+![Specifying a new sensor hostname](./images/hedgehog/images/hostname_setting.png)
+
+## Interfaces
+
+Returning to the configuration mode selection, choose **Interface**. You will be asked whether you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you identify it. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**.
+
+You will be presented with a list of interfaces to configure as the sensor management interface. This is the interface the sensor itself will use to communicate with the network in order to, for example, forward captured logs to an aggregator server. In order to do so, the management interface must be assigned an IP address. This is generally **not** the interface used for capturing data. Select the interface to which you wish to assign an IP address. The interfaces are listed by name and MAC address, and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed.
+
+![Management interface selection](./images/hedgehog/images/select_iface.png)
+
+Depending on the configuration of your network, you may now specify how the management interface will be assigned an IP address. In order to communicate with an event aggregator over the management interface, either **static** or **dhcp** must be selected.
+
+![Interface address source](./images/hedgehog/images/iface_mode.png)
+
+If you select **static**, you will be prompted to enter the IP address, netmask, and gateway to assign to the management interface.
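+
+For reference, a static assignment like the one made by this dialog corresponds roughly to the following standard Linux commands. This is only an illustrative sketch (the interface name `enp1s0` and the addresses shown are hypothetical); the configuration tool manages all of this for you:
+
+```
+ip addr add 192.168.10.5/24 dev enp1s0   # IP address and netmask
+ip link set enp1s0 up                    # bring the interface up
+ip route add default via 192.168.10.1   # default gateway
+```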
+
+![Static IP configuration](./images/hedgehog/images/iface_static.png)
+
+In either case, upon selecting **OK** the network interface will be brought down, configured, and brought back up, and the result of the operation will be displayed. You may choose **Quit** upon returning to the configuration tool's welcome screen.
+
+## Time synchronization
+
+Returning to the configuration mode selection, choose **Time Sync**. Here you can configure the sensor to keep its time synchronized with either an NTP server (using the NTP protocol) or an HTTP/HTTPS server, such as a local [Malcolm](https://github.com/idaholab/Malcolm) aggregator. On the next dialog, choose the time synchronization method you wish to configure.
+
+![Time synchronization method](./images/hedgehog/images/time_sync_mode.png)
+
+If **htpdate** is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port `9200` may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server.
+
+![*htpdate* configuration](./images/hedgehog/images/htpdate_setup.png)
+
+If **ntpdate** is selected, you will be prompted to enter the IP address or hostname of the NTP server.
+
+![NTP configuration](./images/hedgehog/images/ntp_host.png)
+
+Upon configuring time synchronization, a "Time synchronization configured successfully!" message will be displayed, after which you will be returned to the welcome screen.
\ No newline at end of file
diff --git a/docs/hedgehog-config-user.md b/docs/hedgehog-config-user.md
new file mode 100644
index 000000000..edf56c5da
--- /dev/null
+++ b/docs/hedgehog-config-user.md
@@ -0,0 +1,189 @@
+# Capture, forwarding, and autostart services
+
+Clicking the **Configure Capture and Forwarding** toolbar icon (or, if you are at a command prompt, running `configure-capture`) will launch the configuration tool for capture and forwarding. The root password is not required as it was for the interface and hostname configuration, as sensor services are run under the non-privileged sensor account. Select **Continue** to proceed. You may select from a list of configuration options.
+
+![Select configuration mode](./images/hedgehog/images/capture_config_main.png)
+
+## Capture
+
+Choose **Configure Capture** to configure parameters related to traffic capture and local analysis. You will be asked whether you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you identify it. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**.
+
+You will be presented with a list of network interfaces and prompted to select one or more capture interfaces. An interface used to capture traffic is generally a different interface than the one selected previously as the management interface, and each capture interface should be connected to a network tap or span port for traffic monitoring. Capture interfaces are usually not assigned an IP address as they are only used to passively “listen” to the traffic on the wire. The interfaces are listed by name and MAC address, and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed.
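+
+Incidentally, that link speed is read from the kernel, so you can also check it from a terminal, where a value of `-1` likewise indicates no detected link (the interface name `enp1s0` below is hypothetical):
+
+```
+$ cat /sys/class/net/enp1s0/speed
+1000
+```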
+
+![Select capture interfaces](./images/hedgehog/images/capture_iface_select.png)
+
+Upon choosing the capture interfaces and selecting **OK**, you may optionally provide a capture filter. This filter will be used to limit what traffic the PCAP service ([`tcpdump`](https://www.tcpdump.org/)) and the traffic analysis services ([`zeek`](https://www.zeek.org/) and [`suricata`](https://suricata.io/)) will see. Capture filters are specified using [Berkeley Packet Filter (BPF)](http://biot.com/capstats/bpf.html) syntax (e.g., `not host 10.0.0.1` to ignore traffic to and from a particular host). Clicking **OK** will attempt to validate the capture filter, if specified, and will present a warning if the filter is invalid.
+
+![Specify capture filters](./images/hedgehog/images/capture_filter.png)
+
+Next you must specify the paths where captured PCAP files and logs will be stored locally on the sensor. If the installation worked as expected, these paths should be prepopulated to reflect paths on the volumes formatted at install time for the purpose of storing these artifacts. Usually these paths will exist on separate storage volumes. Enabling the PCAP and log pruning autostart services (see the section on autostart services below) will enable monitoring of these paths to ensure that their contents do not consume more than 90% of their respective volumes' space. Choose **OK** to continue.
+
+![Specify capture paths](./images/hedgehog/images/capture_paths.png)
+
+### Automatic file extraction and scanning
+
+Hedgehog Linux can leverage Zeek's knowledge of network protocols to automatically detect file transfers and extract those files from network traffic as Zeek sees them.
+
+To specify which files should be extracted, specify the Zeek file carving mode:
+
+![Zeek file carving mode](./images/hedgehog/images/zeek_file_carve_mode.png)
+
+If you're not sure what to choose, either of **mapped (except common plain text files)** (if you want to carve and scan almost all files) or **interesting** (if you only want to carve and scan files with [mime types of common attack vectors](./interface/sensor_ctl/zeek/extractor_override.interesting.zeek)) is probably a good choice.
+
+Next, specify which carved files to preserve (saved on the sensor under `/capture/bro/capture/extract_files/quarantine` by default). In order to not consume all of the sensor's available storage space, the oldest preserved files will be pruned along with the oldest Zeek logs as described below with **AUTOSTART_PRUNE_ZEEK** in the [autostart services](#HedgehogConfigAutostart) section.
+
+You'll be prompted to specify which engine(s) to use to analyze extracted files.
Extracted files can be examined through any of the following methods:
+
+![File scanners](./images/hedgehog/images/zeek_file_carve_scanners.png)
+
+* scanning files with [**ClamAV**](https://www.clamav.net/); to enable this method, select **ZEEK_FILE_SCAN_CLAMAV** when specifying scanners for Zeek-carved files
+* submitting file hashes to [**VirusTotal**](https://www.virustotal.com/en/#search); to enable this method, select **ZEEK_FILE_SCAN_VTOT** when specifying scanners for Zeek-carved files, then manually edit `/opt/sensor/sensor_ctl/control_vars.conf` and specify your [VirusTotal API key](https://developers.virustotal.com/reference) in `VTOT_API2_KEY`
+* scanning files with [**Yara**](https://github.com/VirusTotal/yara); to enable this method, select **ZEEK_FILE_SCAN_YARA** when specifying scanners for Zeek-carved files
+* scanning portable executable (PE) files with [**Capa**](https://github.com/fireeye/capa); to enable this method, select **ZEEK_FILE_SCAN_CAPA** when specifying scanners for Zeek-carved files
+
+Files which are flagged as potentially malicious will be logged as Zeek `signatures.log` entries, and can be viewed in the **Signatures** dashboard in [OpenSearch Dashboards](https://github.com/idaholab/Malcolm#DashboardsVisualizations) when forwarded to Malcolm.
+
+![File quarantine](./images/hedgehog/images/file_quarantine.png)
+
+Finally, you will be presented with the list of configuration variables that will be used for capture, including the values which you have configured up to this point in this section. Upon choosing **OK** these values will be written back out to the sensor configuration file located at `/opt/sensor/sensor_ctl/control_vars.conf`. It is not recommended that you edit this file manually. After confirming these values, you will be presented with a confirmation that these settings have been written to the configuration file, and you will be returned to the welcome screen.
+
+## Forwarding
+
+Select **Configure Forwarding** to set up the forwarding of logs and statistics from the sensor to an aggregator server, such as [Malcolm](https://github.com/idaholab/Malcolm).
+
+![Configure forwarders](./images/hedgehog/images/forwarder_config.png)
+
+There are several forwarder services used on the sensor, each for forwarding a different type of log or sensor metric.
+
+## capture: Arkime session forwarding
+
+[capture](https://github.com/arkime/arkime/tree/master/capture) is not only used to capture PCAP files, but also to parse raw traffic into sessions and forward this session metadata to an [OpenSearch](https://opensearch.org/) database so that it can be viewed in [Arkime viewer](https://arkime.com/), whether standalone or as part of a [Malcolm](https://github.com/idaholab/Malcolm) instance. If you're using Hedgehog Linux with Malcolm, please read [Correlating Zeek logs and Arkime sessions](https://github.com/idaholab/Malcolm#ZeekArkimeFlowCorrelation) in the Malcolm documentation for more information.
+
+First, select the OpenSearch connection transport protocol, either **HTTPS** or **HTTP**. If the metrics are being forwarded to Malcolm, select **HTTPS** to encrypt messages from the sensor to the aggregator using TLS v1.2 with the ECDHE-RSA-AES128-GCM-SHA256 cipher suite. If **HTTPS** is chosen, you must choose whether to enable SSL certificate verification. If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**.
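+
+To illustrate what this connection looks like, the sensor is simply making HTTPS requests to the aggregator's OpenSearch REST API; a hypothetical `curl` invocation like the following (the address and username are placeholders, and `-k` skips certificate verification as you would for a self-signed certificate) exercises the same path and should return a small JSON document describing the OpenSearch instance:
+
+```
+curl -k -u sensor_user https://192.0.2.10:9200/
+```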
+
+![OpenSearch connection protocol](./images/hedgehog/images/opensearch_connection_protocol.png) ![OpenSearch SSL verification](./images/hedgehog/images/opensearch_ssl_verification.png)
+
+Next, enter the **OpenSearch host** IP address (i.e., the IP address of the aggregator) and port. These metrics are written to an OpenSearch database using a RESTful API, usually using port 9200. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator.
+
+![OpenSearch host and port](./images/hedgehog/images/arkime-capture-ip-port.png)
+
+You will be asked to enter authentication credentials for the sensor's connections to the aggregator's OpenSearch API. After you've entered the username and the password, the sensor will attempt a test connection to OpenSearch using the connection information provided.
+
+![OpenSearch username](./images/hedgehog/images/opensearch_username.png) ![OpenSearch password](./images/hedgehog/images/opensearch_password.png) ![Successful OpenSearch connection](./images/hedgehog/images/opensearch_connection_success.png)
+
+Next, you will be shown a dialog for a list of IP addresses used to populate an access control list (ACL) for hosts allowed to connect back to the sensor for retrieving session payloads from its PCAP files for display in Arkime viewer. The list will be prepopulated with the IP address entered a few screens prior to this one.
+
+![PCAP retrieval ACL](./images/hedgehog/images/malcolm_arkime_reachback_acl.png)
+
+Finally, you'll be given the opportunity to review all of the Arkime `capture` options you've specified. Selecting **OK** will cause the parameters to be saved and you will be returned to the configuration tool's welcome screen.
+
+![capture settings confirmation](./images/hedgehog/images/arkime_confirm.png)
+
+## filebeat: Zeek and Suricata log forwarding
+
+[Filebeat](https://www.elastic.co/products/beats/filebeat) is used to forward [Zeek](https://www.zeek.org/) and [Suricata](https://suricata.io/) logs to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database.
+
+To configure filebeat, first provide the log path (the same path previously configured for log file generation).
+
+![Configure filebeat for log forwarding](./images/hedgehog/images/filebeat_log_path.png)
+
+You must also provide the IP address of the Logstash instance to which the logs are to be forwarded, and the port on which Logstash is listening. These logs are forwarded using the Beats protocol, generally over port 5044. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator.
+
+![Configure filebeat for log forwarding](./images/hedgehog/images/filebeat_ip_port.png)
+
+Next you are asked whether the connection used for log forwarding should be **unencrypted** or encrypted over **SSL**. Unencrypted communication requires less processing overhead and is simpler to configure, but the contents of the logs may be visible to anyone who is able to intercept that traffic.
+
+![Unencrypted vs. SSL encryption for log forwarding](./images/hedgehog/images/filebeat_ssl.png)
+
+If **SSL** is chosen, you must choose whether to enable [SSL certificate verification](https://www.elastic.co/guide/en/beats/filebeat/current/configuring-ssl-logstash.html).
If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**.
+
+![Filebeat SSL certificate verification](./images/hedgehog/images/filebeat_ssl_verify.png)
+
+The last step for SSL-encrypted log forwarding is to specify the SSL certificate authority, certificate, and key files. These files must match those used by the Logstash instance receiving the logs on the aggregator. If Malcolm's `auth_setup` script was used to generate these files, they would be found in the `filebeat/certs/` subdirectory of the Malcolm installation and must be manually copied to the sensor (stored under `/opt/sensor/sensor_ctl/logstash-client-certificates` or in any other path accessible to the sensor account). Specify the location of the certificate authorities file (e.g., `ca.crt`), the certificate file (e.g., `client.crt`), and the key file (e.g., `client.key`).
+
+![SSL certificate files](./images/hedgehog/images/filebeat_certs.png)
+
+The Logstash instance receiving the events must be similarly configured with matching SSL certificate and key files. Under Malcolm, the `BEATS_SSL` variable must be set to `true` in Malcolm's `docker-compose.yml` file and the SSL files must exist in the `logstash/certs/` subdirectory of the Malcolm installation.
+
+Once you have specified all of the filebeat parameters, you will be presented with a summary of the settings related to the forwarding of these logs. Selecting **OK** will cause the parameters to be written to filebeat's configuration keystore under `/opt/sensor/sensor_ctl/logstash-client-certificates` and you will be returned to the configuration tool's welcome screen.
+
+![Confirm filebeat settings](./images/hedgehog/images/filebeat_confirm.png)
+
+## miscbeat: System metrics forwarding
+
+The sensor uses [Fluent Bit](https://fluentbit.io/) to gather miscellaneous system resource metrics (CPU, network I/O, disk I/O, memory utilization, temperature, etc.) and the [Beats](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) protocol to forward these metrics to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. Metrics categories can be enabled/disabled as described in the [autostart services](#HedgehogConfigAutostart) section of this document.
+
+This forwarder's configuration is almost identical to that of [filebeat](#Hedgehogfilebeat) in the previous section. Select `miscbeat` from the forwarding configuration mode options and follow the same steps outlined above to set up this forwarder.
+
+## Autostart services
+
+Once the forwarders have been configured, the final step is to **Configure Autostart Services**. Choose this option from the configuration mode menu after the welcome screen of the sensor configuration tool.
+
+Even if capture and forwarder services have been configured as described in the previous sections, only services enabled in the autostart configuration will run when the sensor starts up. The available autostart processes are as follows (recommended services are in **bold text**):
+
+* **AUTOSTART_ARKIME** - [capture](#Hedgehogarkime-capture) PCAP engine for traffic capture, as well as traffic parsing and metadata insertion into OpenSearch for viewing in [Arkime](https://arkime.com/).
If you are using Hedgehog Linux along with [Malcolm](https://github.com/idaholab/Malcolm) or another Arkime installation, this is probably the packet capture engine you want to use.
+* **AUTOSTART_CLAMAV_UPDATES** - Virus database update service for ClamAV (requires the sensor to be connected to the internet)
+* **AUTOSTART_FILEBEAT** - [filebeat](#Hedgehogfilebeat) Zeek and Suricata log forwarder
+* **AUTOSTART_FLUENTBIT_AIDE** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/exec) [AIDE](https://aide.github.io/) file system integrity checks
+* **AUTOSTART_FLUENTBIT_AUDITLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/tail) [auditd](https://man7.org/linux/man-pages/man8/auditd.8.html) logs
+* *AUTOSTART_FLUENTBIT_KMSG* - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/kernel-logs) the Linux kernel log buffer (these are generally reflected in syslog as well, which may make this agent redundant)
+* **AUTOSTART_FLUENTBIT_METRICS** - [Fluent Bit](https://fluentbit.io/) agent for collecting [various](https://docs.fluentbit.io/manual/pipeline/inputs) system resource and performance metrics
+* **AUTOSTART_FLUENTBIT_SYSLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/syslog) Linux syslog messages
+* **AUTOSTART_FLUENTBIT_THERMAL** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/thermal) system temperatures
+* **AUTOSTART_MISCBEAT** - [filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) forwarder which sends system metrics collected by [Fluent Bit](https://fluentbit.io/) to a remote Logstash instance (e.g., [Malcolm](https://github.com/idaholab/Malcolm)'s)
+* *AUTOSTART_NETSNIFF* - [netsniff-ng](http://netsniff-ng.org/) PCAP engine for saving packet capture (PCAP) files
+* **AUTOSTART_PRUNE_PCAP** - storage space monitor to ensure that PCAP files do not consume more than 90% of the total size of the storage volume to which PCAP files are written
+* **AUTOSTART_PRUNE_ZEEK** - storage space monitor to ensure that Zeek logs do not consume more than 90% of the total size of the storage volume to which Zeek logs are written
+* **AUTOSTART_SURICATA** - [Suricata](https://suricata.io/) traffic analysis engine
+* **AUTOSTART_SURICATA_UPDATES** - Rule update service for Suricata (requires the sensor to be connected to the internet)
+* *AUTOSTART_TCPDUMP* - [tcpdump](https://www.tcpdump.org/) PCAP engine for saving packet capture (PCAP) files
+* **AUTOSTART_ZEEK** - [Zeek](https://www.zeek.org/) traffic analysis engine
+
+Note that only one packet capture engine ([capture](https://arkime.com/), [netsniff-ng](http://netsniff-ng.org/), or [tcpdump](https://www.tcpdump.org/)) can be used at a time.
+
+![Autostart services](./images/hedgehog/images/autostarts.png)
+
+Once you have selected the autostart services, you will be prompted to confirm your selections. Doing so will cause these values to be written back out to the `/opt/sensor/sensor_ctl/control_vars.conf` configuration file.
+
+![Autostart services confirmation](./images/hedgehog/images/autostarts_confirm.png)
+
+After you have completed configuring the sensor, it is recommended that you reboot the sensor to ensure all new settings take effect.
If rebooting is not an option, you may click the **Restart Sensor Services** menu icon in the top menu bar, or open a terminal and run:
+
+```
+/opt/sensor/sensor_ctl/shutdown && sleep 10 && /opt/sensor/sensor_ctl/supervisor.sh
+```
+
+This will cause the sensor services controller to stop, wait a few seconds, and restart. You can check the status of the sensor's processes by choosing **Sensor Status** from the sensor's kiosk mode, clicking the **Sensor Service Status** toolbar icon, or running `/opt/sensor/sensor_ctl/status` from the command line:
+
+```
+$ /opt/sensor/sensor_ctl/status
+arkime:arkime-capture RUNNING pid 6455, uptime 0:03:17
+arkime:arkime-viewer RUNNING pid 6456, uptime 0:03:17
+beats:filebeat RUNNING pid 6457, uptime 0:03:17
+beats:miscbeat RUNNING pid 6458, uptime 0:03:17
+clamav:clamav-service RUNNING pid 6459, uptime 0:03:17
+clamav:clamav-updates RUNNING pid 6461, uptime 0:03:17
+fluentbit-auditlog RUNNING pid 6463, uptime 0:03:17
+fluentbit-kmsg STOPPED Not started
+fluentbit-metrics:cpu RUNNING pid 6466, uptime 0:03:17
+fluentbit-metrics:df RUNNING pid 6471, uptime 0:03:17
+fluentbit-metrics:disk RUNNING pid 6468, uptime 0:03:17
+fluentbit-metrics:mem RUNNING pid 6472, uptime 0:03:17
+fluentbit-metrics:mem_p RUNNING pid 6473, uptime 0:03:17
+fluentbit-metrics:netif RUNNING pid 6474, uptime 0:03:17
+fluentbit-syslog RUNNING pid 6478, uptime 0:03:17
+fluentbit-thermal RUNNING pid 6480, uptime 0:03:17
+netsniff:netsniff-enp1s0 STOPPED Not started
+prune:prune-pcap RUNNING pid 6484, uptime 0:03:17
+prune:prune-zeek RUNNING pid 6486, uptime 0:03:17
+supercronic RUNNING pid 6490, uptime 0:03:17
+suricata RUNNING pid 6501, uptime 0:03:17
+tcpdump:tcpdump-enp1s0 STOPPED Not started
+zeek:capa RUNNING pid 6553, uptime 0:03:17
+zeek:clamav RUNNING pid 6512, uptime 0:03:17
+zeek:logger RUNNING pid 6554, uptime 0:03:17
+zeek:virustotal STOPPED Not started
+zeek:watcher RUNNING pid 6510, uptime 0:03:17
+zeek:yara RUNNING pid 6548, uptime 0:03:17
+zeek:zeekctl RUNNING pid 6502, uptime 0:03:17
+```
\ No newline at end of file
diff --git a/docs/hedgehog-config-zeek-intel.md b/docs/hedgehog-config-zeek-intel.md
new file mode 100644
index 000000000..cef702d53
--- /dev/null
+++ b/docs/hedgehog-config-zeek-intel.md
@@ -0,0 +1,7 @@
+# Zeek Intelligence Framework
+
+To quote Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) documentation, "The goals of Zeek’s Intelligence Framework are to consume intelligence data, make it available for matching, and provide infrastructure to improve performance and memory utilization. Data in the Intelligence Framework is an atomic piece of intelligence such as an IP address or an e-mail address. This atomic data will be packed with metadata such as a freeform source field, a freeform descriptive field, and a URL which might lead to more information about the specific item." Zeek [intelligence](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html) [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type) include IP addresses, URLs, file names, hashes, email addresses, and more.
+
+Hedgehog Linux doesn't come bundled with intelligence files from any particular feed, but they can easily be included in your local instance. Before Zeek starts, Hedgehog Linux configures it such that intelligence files will be automatically included in its local policy.
Subdirectories under `/opt/sensor/sensor_ctl/zeek/intel` which contain their own `__load__.zeek` file will be `@load`-ed as-is, while subdirectories containing "loose" intelligence files will be [loaded](https://docs.zeek.org/en/master/frameworks/intel.html#loading-intelligence) automatically with a `redef Intel::read_files` directive.
+
+Note that Hedgehog Linux does not manage updates for these intelligence files. You should use the update mechanism suggested by your feeds' maintainers to keep them up to date. Adding and deleting intelligence files under this directory will take effect upon restarting Zeek.
\ No newline at end of file
diff --git a/docs/hedgehog-config.md b/docs/hedgehog-config.md
new file mode 100644
index 000000000..9fcb2fb4e
--- /dev/null
+++ b/docs/hedgehog-config.md
@@ -0,0 +1,17 @@
+# Configuration
+
+Kiosk mode can be exited by connecting an external USB keyboard and pressing **Alt+F4**, upon which the *sensor* user's desktop is shown.
+
+![Sensor login session desktop](./images/hedgehog/images/desktop.png)
+
+Several icons are available in the top menu bar:
+
+* **Terminal** - opens a command prompt in a terminal emulator
+* **Browser** - opens a web browser
+* **Kiosk** - returns the sensor to kiosk mode
+* **README** - displays this document
+* **Sensor status** - displays a list with the status of each sensor service
+* **Configure capture and forwarding** - opens a dialog for configuring the sensor's capture and forwarding services, as well as specifying which services should autostart upon boot
+* **Configure interfaces and hostname** - opens a dialog for configuring the sensor's network interfaces and setting the sensor's hostname
+* **Restart sensor services** - stops and restarts all of the [autostart services](hedgehog-config-user.md#HedgehogConfigAutostart)
+
diff --git a/docs/hedgehog-hardening.md b/docs/hedgehog-hardening.md
new file mode 100644
index 000000000..9db45a9b9
--- /dev/null
+++ b/docs/hedgehog-hardening.md
@@ -0,0 +1,59 @@
+# Appendix D - Hardening
+
+Hedgehog Linux uses the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmarks which target the following guidelines for establishing a secure configuration posture:
+
+* [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/)
+* [DISA STIG (Security Technical Implementation Guides) for RHEL 7](https://www.stigviewer.com/stig/red_hat_enterprise_linux_7/) v2r5 Ubuntu v1r2 [adapted](https://github.com/hardenedlinux/STIG-OS-mirror/blob/master/redhat-STIG-DOCs/U_Red_Hat_Enterprise_Linux_7_V2R5_STIG.zip) for a Debian operating system
+* Additional recommendations from [cisecurity.org](https://www.cisecurity.org/)
+
+## Compliance Exceptions
+
+[Currently](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) there are 274 checks to determine compliance with the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmark.
+
+Hedgehog Linux claims exceptions from the recommendations in this benchmark in the following categories:
+
+**1.1 Install Updates, Patches and Additional Security Software** - When the Hedgehog Linux appliance software is built, all of the latest applicable security patches and updates are included in it. How future updates are to be handled is still in design.
+
+**1.3 Enable verify the signature of local packages** - As the base distribution is not using embedded signatures, `debsig-verify` would reject all packages (see comment in `/etc/dpkg/dpkg.cfg`).
Enabling it after installation would disallow any future updates.
+
+**2.14 Add nodev option to /run/shm Partition**, **2.15 Add nosuid Option to /run/shm Partition**, **2.16 Add noexec Option to /run/shm Partition** - Hedgehog Linux does not mount `/run/shm` as a separate partition, so these recommendations do not apply.
+
+**2.19 Disable Mounting of freevxfs Filesystems**, **2.20 Disable Mounting of jffs2 Filesystems**, **2.21 Disable Mounting of hfs Filesystems**, **2.22 Disable Mounting of hfsplus Filesystems**, **2.23 Disable Mounting of squashfs Filesystems**, **2.24 Disable Mounting of udf Filesystems** - Hedgehog Linux is not compiling a custom Linux kernel, so these filesystems are inherently supported as they are part of Debian Linux's default kernel.
+
+**3.3 Set Boot Loader Password** - As maximizing availability is a system requirement, the sensor should restart automatically without user intervention to ensure uninterrupted service. A boot loader password is not enabled.
+
+**4.8 Disable USB Devices** - The ability to ingest data (such as PCAP files) from a mounted USB mass storage device is a requirement of the system.
+
+**6.1 Ensure the X Window system is not installed**, **6.2 Ensure Avahi Server is not enabled**, **6.3 Ensure print server is not enabled** - An X Window session is provided for displaying dashboards. The library packages `libavahi-common-data`, `libavahi-common3`, and `libcups2` are dependencies of some of the X components used by Hedgehog Linux, but the `avahi` and `cups` services themselves are disabled.
+
+**6.17 Ensure virus scan Server is enabled**, **6.18 Ensure virus scan Server update is enabled** - As this is a network traffic analysis appliance rather than an end-user device, regular user files will not be created. A virus scan program would impact device performance and would be unnecessary.
+
+**7.1.1 Disable IP Forwarding**, **7.2.4 Log Suspicious Packets**, **7.2.7 Enable RFC-recommended Source Route Validation**, **7.4.1 Install TCP Wrappers** - As Hedgehog Linux may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, these recommendations do not apply.
+
+**8.1.1.2 Disable System on Audit Log Full**, **8.1.1.3 Keep All Auditing Information**, **8.1.1.5 Ensure set remote_server for audit service**, **8.1.1.6 Ensure enable_krb5 set to yes for remote audit service**, **8.1.1.7 Ensure set action for audit storage volume is fulled**, **8.1.1.8 Ensure set action for network failure on remote audit service**, **8.1.1.9 Set space left for auditd service**, a few other audit-related items under section **8.1**, **8.2.4 Configure rsyslog to Send Logs to a Remote Log Host** - As maximizing availability is a system requirement, audit processing failures will be logged on the device rather than halting the system. `auditd` is set up to log to syslog when its local storage capacity is reached.
+
+**8.4.2 Implement Periodic Execution of File Integrity** - This functionality is not configured by default, but it can be configured post-install by the end user.
+
+Password-related recommendations under **9.2** and **10.1** - The library package `libpam-pwquality` is used in favor of `libpam-cracklib`, which is what the [compliance scripts](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) are looking for.
Also, as Hedgehog Linux is intended to be used as an appliance rather than a general user-facing software platform, some exceptions to password enforcement policies are claimed.
+
+**9.3.13 Limit Access via SSH** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used. SSH access for `root` is disabled. SSH login with a password is also disallowed: only key-based authentication is accepted. The service account accepts no keys by default. As such, the `AllowUsers`, `AllowGroups`, `DenyUsers`, and `DenyGroups` values in `sshd_config` do not apply.
+
+**9.4 Restrict Access to the su Command** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used.
+
+**10.1.6 Remove nopasswd option from the sudoers configuration** - A very limited set of operations (a single script used to run the AIDE integrity check as a non-root user) has the NOPASSWD option set to allow it to be run in the background without user intervention.
+
+**10.1.10 Set maxlogins for all accounts** and **10.5 Set Timeout on ttys** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used.
+
+**12.10 Find SUID System Executables**, **12.11 Find SGID System Executables** - The few files found by [these](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.10_find_suid_files.sh) [scripts](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.11_find_sgid_files.sh) are valid exceptions required by Hedgehog Linux's core requirements.
+
+**14.1 Defense for NAT Slipstreaming** - As Hedgehog Linux may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, this recommendation does not apply.
+
+Please review the notes for these additional guidelines. While not claiming an exception, Hedgehog Linux may implement them in a manner different than is described by the [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/) or the [hardenedlinux/harbian-audit](https://github.com/hardenedlinux/harbian-audit) audit scripts.
+
+**4.1 Restrict Core Dumps** - Hedgehog Linux disables core dumps using a configuration file for `ulimit` named `/etc/security/limits.d/limits.conf`. The [audit script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/4.1_restrict_core_dumps.sh) checking for this does not check the `limits.d` subdirectory, which is why this is incorrectly flagged as noncompliant.
+
+**5.4 Ensure ctrl-alt-del is disabled** - Hedgehog Linux disables the `ctrl+alt+delete` key sequence by executing `systemctl disable ctrl-alt-del.target` during installation and the command `systemctl mask ctrl-alt-del.target` at boot time.
+
+**7.4.4 Create /etc/hosts.deny**, **7.7.1 Ensure Firewall is active**, **7.7.4.1 Ensure default deny firewall policy**, **7.7.4.2 Ensure loopback traffic is configured**, **7.7.4.3 Ensure default deny firewall policy**, **7.7.4.4 Ensure outbound and established connections are configured** - Hedgehog Linux **is** configured with an appropriately locked-down software firewall (managed by "Uncomplicated Firewall" `ufw`). However, the methods outlined in the CIS benchmark recommendations do not account for this configuration.
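+
+For reference, the state of that firewall can be reviewed from a root shell with the standard `ufw` status command; the first line of output should read `Status: active` (the rules themselves are elided here):
+
+```
+$ ufw status verbose
+Status: active
+...
+```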
+
+**8.6 Verifies integrity all packages** - The [script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/8.7_verify_integrity_packages.sh) which verifies package integrity only "fails" because of missing (status `??5??????` displayed by the utility) language ("locale") files, which are removed as part of Hedgehog Linux's trimming-down process. All non-locale-related system files pass integrity checks.
\ No newline at end of file
diff --git a/docs/hedgehog-installation.md b/docs/hedgehog-installation.md
new file mode 100644
index 000000000..135b954dc
--- /dev/null
+++ b/docs/hedgehog-installation.md
@@ -0,0 +1,41 @@
+# Sensor installation
+
+## Image boot options
+
+The Hedgehog Linux installation image, when provided on an optical disc, USB thumb drive, or other removable medium, can be used to install or reinstall the sensor software.
+
+![Sensor installation image boot menu](./images/hedgehog/images/boot_options.png)
+
+The boot menu of the sensor installer image provides several options:
+
+* **Live system** and **Live system (fully in RAM)** may be used to run the sensor in a "live USB" mode without installing any software or making any persistent configuration changes on the sensor hardware.
+* **Install Hedgehog Linux** and **Install Hedgehog Linux (encrypted)** are used to [install the sensor](#HedgehogInstaller) onto the current system. Both selections install the same operating system and sensor software, the only difference being that the **encrypted** option encrypts the hard disks with a password (specified in a subsequent step during installation) that must be provided each time the sensor boots. There is some CPU overhead involved in an encrypted installation, so it is recommended that encrypted installations only be used for mobile installations (e.g., on a sensor that may be shipped or carried for an incident response) and that the unencrypted option be used for fixed sensors in secure environments.
+* **Install Hedgehog Linux (advanced configuration)** allows you to configure installation fully using all of the [Debian installer](https://www.debian.org/releases/stable/amd64/) settings and should only be selected by advanced users who know what they're doing.
+* **Rescue system** is included for debugging and/or system recovery and should not be needed in most cases.
+
+## Installer
+
+The sensor installer is designed to require as little user input as possible. For this reason, there are NO user prompts and confirmations about partitioning and reformatting hard disks for use by the sensor. The installer assumes that all non-removable storage media (e.g., SSD, HDD, NVMe, etc.) are available for use and ⛔🆘😭💀 ***will partition and format them without warning*** 💀😭🆘⛔.
+
+The installer will ask for a few pieces of information prior to installing the sensor operating system:
+
+* **Root password** – a password for the privileged root account which is rarely needed (only during the configuration of the sensor's network interfaces and setting the sensor hostname)
+* **User password** – a password for the non-privileged sensor account under which the various sensor capture and forwarding services run
+* **Encryption password** (optional) – if the encrypted installation option was selected at boot time, the encryption password must be entered every time the sensor boots
+
+Each of these passwords must be entered twice to ensure they were entered correctly.
+
+![Example of the installer's password prompt](./images/hedgehog/images/users_and_passwords.png)
+
+After the passwords have been entered, the installer will proceed to format the system drive and install Hedgehog Linux.
+
+![Installer progress](./images/hedgehog/images/installer_progress.png)
+
+At the end of the installation process, you will be prompted with a few self-explanatory yes/no questions:
+
+* **Disable IPv6?**
+* **Automatically login to the GUI session?**
+* **Should the GUI session be locked due to inactivity?**
+* **Display the [Standard Mandatory DoD Notice and Consent Banner](https://www.stigviewer.com/stig/application_security_and_development/2018-12-24/finding/V-69349)?** *(only applies when installed on U.S. government information systems)*
+
+Following these prompts, the installer will reboot and Hedgehog Linux will boot.
\ No newline at end of file
diff --git a/docs/hedgehog-iso-build.md b/docs/hedgehog-iso-build.md
new file mode 100644
index 000000000..d92f449cc
--- /dev/null
+++ b/docs/hedgehog-iso-build.md
@@ -0,0 +1,36 @@
+# Appendix A - Generating the ISO
+
+Official downloads of the Hedgehog Linux installer ISO are not provided; however, it can easily be built on an internet-connected Linux host with Vagrant:
+
+* [Vagrant](https://www.vagrantup.com/)
+    - [`vagrant-reload`](https://github.com/aidanns/vagrant-reload) plugin
+    - [`vagrant-sshfs`](https://github.com/dustymabe/vagrant-sshfs) plugin
+    - [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box
+
+The build should work with either the [VirtualBox](https://www.virtualbox.org/) provider or the [libvirt](https://libvirt.org/) provider:
+
+* [VirtualBox](https://www.virtualbox.org/) [provider](https://www.vagrantup.com/docs/providers/virtualbox)
+    - [`vagrant-vbguest`](https://github.com/dotless-de/vagrant-vbguest) plugin
+* [libvirt](https://libvirt.org/)
+    - [`vagrant-libvirt`](https://github.com/vagrant-libvirt/vagrant-libvirt) provider plugin
+    - [`vagrant-mutate`](https://github.com/sciurus/vagrant-mutate) plugin to convert [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box to `libvirt` format
+
+To perform a clean build of the Hedgehog Linux installer ISO, navigate to your local [Malcolm](https://github.com/idaholab/Malcolm/) working copy and run:
+
+```
+$ ./sensor-iso/build_via_vagrant.sh -f
+…
+Starting build machine...
+Bringing machine 'default' up with 'virtualbox' provider...
+…
+```
+
+Building the ISO may take 90 minutes or more depending on your system. When the build finishes, you will see the following message indicating success:
+
+```
+…
+Finished, created "/sensor-build/hedgehog-6.4.0.iso"
+…
+```
+
+Alternatively, if you have forked Malcolm on GitHub, [workflow files](../.github/workflows/) are provided which contain instructions for GitHub to build the Docker images and the Hedgehog and [Malcolm](https://github.com/idaholab/Malcolm) installer ISOs, specifically [`sensor-iso-build-docker-wrap-push-ghcr.yml`](../.github/workflows/sensor-iso-build-docker-wrap-push-ghcr.yml) for the Hedgehog ISO. The resulting ISO file is wrapped in a Docker image that provides an HTTP server from which the ISO may be downloaded.
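+
+As a rough sketch of how that wrapped ISO could then be retrieved (the image name, tag, and listening port below are hypothetical placeholders; consult your fork's workflow output for the actual values, and note the ISO filename comes from the build output above):
+
+```
+# run the wrapper image, publishing its embedded HTTP server locally
+docker run -d --rm -p 8000:8000 ghcr.io/YOUR_FORK/malcolm/hedgehog:main
+
+# download the ISO it serves
+wget http://localhost:8000/hedgehog-6.4.0.iso
+```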
\ No newline at end of file
diff --git a/docs/hedgehog-ssh.md b/docs/hedgehog-ssh.md
new file mode 100644
index 000000000..4a5f515d4
--- /dev/null
+++ b/docs/hedgehog-ssh.md
@@ -0,0 +1,19 @@
+# Appendix B - Configuring SSH access
+
+SSH access to the sensor's non-privileged sensor account is only available using secure key-based authentication, which can be enabled by adding a public SSH key to the **/home/sensor/.ssh/authorized_keys** file as illustrated below:
+
+```
+sensor@sensor:~$ mkdir -p ~/.ssh
+
+sensor@sensor:~$ ssh analyst@172.16.10.48 "cat ~/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys
+The authenticity of host '172.16.10.48 (172.16.10.48)' can't be established.
+ECDSA key fingerprint is SHA256:...
+Are you sure you want to continue connecting (yes/no)? yes
+Warning: Permanently added '172.16.10.48' (ECDSA) to the list of known hosts.
+analyst@172.16.10.48's password:
+
+sensor@sensor:~$ cat ~/.ssh/authorized_keys
+ssh-rsa AAA...kff analyst@SOC
+```
+
+SSH access should only be configured when necessary.
\ No newline at end of file
diff --git a/docs/hedgehog-troubleshooting.md b/docs/hedgehog-troubleshooting.md
new file mode 100644
index 000000000..736f50d40
--- /dev/null
+++ b/docs/hedgehog-troubleshooting.md
@@ -0,0 +1,9 @@
+# Appendix C - Troubleshooting
+
+Should the sensor not function as expected, first try rebooting the device. If the behavior continues, here are a few things that may help you diagnose the problem (items which may require Linux command line use are marked with **†**):
+
+* Stop / start services – Using the sensor's kiosk mode, attempt a **Services Stop** followed by a **Services Start**, then check **Sensor Status** to see which service(s) may not be running correctly.
+* Sensor configuration file – See `/opt/sensor/sensor_ctl/control_vars.conf` for sensor service settings. It is not recommended to manually edit this file unless you are sure of what you are doing.
+* Sensor control scripts – There are scripts under `/opt/sensor/sensor_ctl/` to control sensor services (e.g., `shutdown`, `start`, `status`, `stop`, etc.)
+* Sensor debug logs – Log files under `/opt/sensor/sensor_ctl/log/` may contain clues to processes that are not working correctly. If you can determine which service is failing, you can attempt to reconfigure it using the instructions in the Configure Capture and Forwarding section of this document.
+* `sensorwatch` script – Running `sensorwatch` on the command line will display the most recently modified PCAP and Zeek log files in their respective directories, how much storage space they are consuming, and the amount of used/free space on the volumes containing those files.
\ No newline at end of file
diff --git a/docs/hedgehog-upgrade.md b/docs/hedgehog-upgrade.md
new file mode 100644
index 000000000..b4e39ee3e
--- /dev/null
+++ b/docs/hedgehog-upgrade.md
@@ -0,0 +1,330 @@
+# Appendix E - Upgrades
+
+At this time there is not an "official" upgrade procedure to get from one release of Hedgehog Linux to the next. Upgrading the underlying operating system packages is generally straightforward, but not all of the Hedgehog Linux components are packaged into .deb archives yet as they should be, so for now it's a manual (and kind of nasty) process to Frankenstein an upgrade into existence. The author of this project intends to remedy this at some future point when time and resources allow.
+
+If possible, it would save you **a lot** of trouble to just [re-ISO](hedgehog-installation.md#HedgehogInstallation) your Hedgehog installation and start fresh, backing up the files (in `/opt/sensor/sensor_ctl`) first and reconfiguring or restoring them as needed afterwards.
+
+However, if reinstalling the system is not an option, here is the basic process for doing a manual upgrade of Hedgehog Linux. It should be understood that this process is very likely to break your system, that there is **no** guarantee of any kind that any of this will work, that these instructions may be incomplete, and that no support whatsoever will be provided regarding them. Really, it will be **much** easier if you re-ISO your installation. But for the brave among you, here you go. ⛔🆘😭💀
+
+## Prerequisites
+
+* A good understanding of the Linux command line
+* An existing installation of Hedgehog Linux **with internet access**
+* A copy of the Hedgehog Linux [ISO](hedgehog-iso-build.md#HedgehogISOBuild) for the version approximating the one you're upgrading to (i.e., the latest version), **and**
+    - Either a separate VM with that ISO installed **OR**
+    - A separate Linux workstation where you can manually mount that ISO to pull stuff off of it
+
+## Upgrade
+
+1. Obtain a root shell
+    - `su -`
+
+2. Temporarily set the umask value to the Debian default instead of the more restrictive Hedgehog Linux default. This will allow updates to be applied with the right permissions.
+    - `umask 0022`
+
+3. Create backups of some files
+    - `cp /etc/apt/sources.list /etc/apt/sources.list.bak`
+
+4. Set up alternate package sources, if needed
+    - In an offline/airgapped scenario, you could use [apt-mirror](https://apt-mirror.github.io) to mirror Debian repos and [bandersnatch](https://github.com/pypa/bandersnatch/) to mirror PyPI sources, or [combine them](https://github.com/mmguero/espejo) with Docker. If you were to do this, you'd probably want to make the following changes (and **revert them after the upgrade**):
+        + create `/etc/apt/apt.conf.d/80ssl-exceptions` to ignore self-signed certificate warnings from using your apt-mirror
+```
+Acquire::https {
+  Verify-Peer "false";
+  Verify-Host "false";
+}
+```
+
+        + modify `/etc/apt/sources.list` to point to your apt-mirror:
+
+```
+deb https://XXXXXX:443/debian buster main contrib non-free
+deb https://XXXXXX:443/debian-security buster/updates main contrib non-free
+deb https://XXXXXX:443/debian buster-updates main contrib non-free
+deb https://XXXXXX:443/debian buster-backports main contrib non-free
+```
+
+5. Update underlying system packages with `apt-get`
+    - `apt-get update && apt-get dist-upgrade`
+
+6. If there were [new system deb packages added](https://github.com/idaholab/Malcolm/tree/main/sensor-iso/config/package-lists) to this release of Hedgehog Linux (you might have to [manually compare](https://github.com/idaholab/Malcolm/commits/main/sensor-iso/config/package-lists) on GitHub), install them. If you're not sure, of course, you could just install everything, like this (although you may have to tweak some version numbers or something if the base distribution of your Hedgehog branch is different from `main`; in this example I'm not jumping between Debian releases, just upgrading within a release):
+```
+$ for LIST in apps desktopmanager net system; do curl -L -J -O https://raw.github.com/idaholab/Malcolm/main/sensor-iso/config/package-lists/$LIST.list.chroot; done
+...
+$ apt-get install $(cat *.list.chroot)
+```
+
+7.
Update underlying python packages with `python3 -m pip`
+    * `apt-get install -y build-essential git-core pkg-config python3-dev`
+    * `python3 -m pip list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -r -n1 python3 -m pip install -U`
+        - if this fails for some reason, you may need to reinstall pip first with `python3 -m pip install --force -U pip`
+        - some *very* old builds of Hedgehog Linux had separate Python 3.5 and 3.7 installations; in this case, you'd need to do this for both `python3 -m pip` and `python3.7 -m pip` (or whatever `python3.x` you have)
+    * If there were [new python packages](https://raw.githubusercontent.com/idaholab/Malcolm/master/sensor-iso/config/hooks/normal/0169-pip-installs.hook.chroot) added to this release of Hedgehog Linux (you might have to [manually compare](https://github.com/idaholab/Malcolm/blame/main/sensor-iso/config/hooks/normal/0169-pip-installs.hook.chroot) on GitHub), install them. If you are using a PyPI mirror, replace `XXXXXX` here with your mirror's IP. The `colorama` package is used here as an example; your package list might vary.
+        - `python3 -m pip install --no-compile --no-cache-dir --force-reinstall --upgrade --index-url=https://XXXXXX:443/pypi/simple --trusted-host=XXXXXX:443 colorama`
+
+8. Okay, **now** things start to get a little bit ugly. You're going to need access to the ISO of the release of Hedgehog Linux you're upgrading to, as we're going to grab some packages off of it. On another Linux system, [build it](hedgehog-iso-build.md#HedgehogISOBuild).
+
+9. Use a disk image mounter to mount the ISO, **or** if you want to just install the ISO in a VM and grab the files we need off of it, that's fine too. But I'll go through the example as if I've mounted the ISO.
+
+10. Navigate to the `/live/` directory, and mount the `filesystem.squashfs` file
+    - `sudo mount filesystem.squashfs /media/squash -t squashfs -o loop`
+    - **OR**
+    - `squashfuse filesystem.squashfs /home/user/media/squash`
+
+11. Very recent builds of Hedgehog Linux keep some build artifacts in `/opt/hedgehog_install_artifacts/`. You're going to want to grab those files and throw them in a temporary directory on the system you're upgrading, via SSH or whatever means you devise.
+```
+root@hedgehog:/tmp# scp -r user@otherbox:/media/squash/opt/hedgehog_install_artifacts/ ./
+user@otherbox's password:
+filebeat-tweaked-7.6.2-amd64.deb 100% 13MB 65.9MB/s 00:00
+arkime_2.2.3-1_amd64.deb 100% 113MB 32.2MB/s 00:03
+netsniff-ng_0.6.6-1_amd64.deb 100% 330KB 52.1MB/s 00:00
+zeek_3.0.20-1_amd64.deb 100% 26MB 63.1MB/s 00:00
+```
+
+12. Blow away the old `zeek` package; we're going to start clean with that one in particular. The others should be fine to upgrade in place.
+```
+root@hedgehog:/opt# apt-get --purge remove zeek
+Reading package lists... Done
+Building dependency tree
+Reading state information... Done
+The following packages will be REMOVED:
+  zeek*
+0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
+After this operation, 160 MB disk space will be freed.
+Do you want to continue? [Y/n] y
+(Reading database ... 118490 files and directories currently installed.)
+Removing zeek (3.0.20-1) ...
+
+12. Blow away the old `zeek` package; we're going to start clean with that one in particular. The others should be fine to upgrade in place.
+```
+root@hedgehog:/opt# apt-get --purge remove zeek
+Reading package lists... Done
+Building dependency tree
+Reading state information... Done
+The following packages will be REMOVED:
+  zeek*
+0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
+After this operation, 160 MB disk space will be freed.
+Do you want to continue? [Y/n] y
+(Reading database ... 118490 files and directories currently installed.)
+Removing zeek (3.0.20-1) ...
+dpkg: warning: while removing zeek, directory '/opt/zeek/spool' not empty so not removed
+dpkg: warning: while removing zeek, directory '/opt/zeek/share/zeek/site' not empty so not removed
+dpkg: warning: while removing zeek, directory '/opt/zeek/lib' not empty so not removed
+dpkg: warning: while removing zeek, directory '/opt/zeek/bin' not empty so not removed
+root@hedgehog:/opt# rm -rf /opt/zeek*
+```
+
+13. Install the new .deb files. You're going to have some warnings, but that's okay.
+```
+root@hedgehog:/tmp# dpkg -i hedgehog_install_artifacts/*.deb
+(Reading database ... 118149 files and directories currently installed.)
+Preparing to unpack .../filebeat-tweaked-7.6.2-amd64.deb ...
+Unpacking filebeat (7.6.2) over (6.8.4) ...
+dpkg: warning: unable to delete old directory '/usr/share/filebeat/kibana/6/dashboard': Directory not empty
+dpkg: warning: unable to delete old directory '/usr/share/filebeat/kibana/6': Directory not empty
+Preparing to unpack .../arkime_2.2.3-1_amd64.deb ...
+Unpacking arkime (2.2.3-1) over (2.0.1-1) ...
+Preparing to unpack .../netsniff-ng_0.6.6-1_amd64.deb ...
+Unpacking netsniff-ng (0.6.6-1) over (0.6.6-1) ...
+Preparing to unpack .../zeek_3.0.20-1_amd64.deb ...
+Unpacking zeek (3.0.20-1) over (3.0.0-1) ...
+Setting up filebeat (7.6.2) ...
+Installing new version of [...]
+[...]
+Setting up arkime (2.2.3-1) ...
+READ /opt/arkime/README.txt and RUN /opt/arkime/bin/Configure
+Setting up netsniff-ng (0.6.6-1) ...
+Setting up zeek (3.0.20-1) ...
+Processing triggers for systemd (232-25+deb9u12) ...
+Processing triggers for man-db (2.7.6.1-2) ...
+```
+
+14. Fix anything that might need fixing as far as the deb package requirements go
+    - `apt-get -f install`
+
+15. We just installed a Zeek .deb, but the third-party plugin packages and local config weren't part of that package. So we're going to `rsync` those from the other box where we have the ISO and `filesystem.squashfs` mounted as well:
+```
+root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/opt/zeek/ /opt/zeek
+user@otherbox's password:
+
+root@hedgehog:/tmp# ls -l /opt/zeek/share/zeek/site/
+total 52
+lrwxrwxrwx 1 root root 13 May 6 21:52 bzar -> packages/bzar
+lrwxrwxrwx 1 root root 22 May 6 21:50 cve-2020-0601 -> packages/cve-2020-0601
+-rw-r--r-- 1 root root 2031 Apr 30 16:02 extractor.zeek
+-rw-r--r-- 1 root root 39134 May 1 14:20 extractor_params.zeek
+lrwxrwxrwx 1 root root 14 May 6 21:52 hassh -> packages/hassh
+lrwxrwxrwx 1 root root 12 May 6 21:52 ja3 -> packages/ja3
+-rw-rw-r-- 1 root root 2005 May 6 21:54 local.zeek
+drwxr-xr-x 13 root root 4096 May 6 21:52 packages
+lrwxrwxrwx 1 root root 27 May 6 21:52 zeek-EternalSafety -> packages/zeek-EternalSafety
+lrwxrwxrwx 1 root root 26 May 6 21:52 zeek-community-id -> packages/zeek-community-id
+lrwxrwxrwx 1 root root 27 May 6 21:51 zeek-plugin-bacnet -> packages/zeek-plugin-bacnet
+lrwxrwxrwx 1 root root 25 May 6 21:51 zeek-plugin-enip -> packages/zeek-plugin-enip
+lrwxrwxrwx 1 root root 29 May 6 21:51 zeek-plugin-profinet -> packages/zeek-plugin-profinet
+lrwxrwxrwx 1 root root 27 May 6 21:52 zeek-plugin-s7comm -> packages/zeek-plugin-s7comm
+lrwxrwxrwx 1 root root 24 May 6 21:52 zeek-plugin-tds -> packages/zeek-plugin-tds
+```
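+
+Since most of what that `rsync` restored are symlinks into `/opt/zeek/share/zeek/site/packages/`, a quick optional sanity check is to look for dangling links (`find -L ... -type l` lists only links whose targets don't resolve):
+```
+root@hedgehog:/tmp# find -L /opt/zeek/share/zeek/site -type l
+```
+No output means nothing is dangling.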
+
+16. The `zeekctl` component of zeek doesn't like being run by an unprivileged user unless the whole directory is owned by that user. As Hedgehog Linux runs everything it can as an unprivileged user, we're going to reset zeek to a "clean" state after each reboot. Zeek's config files will get regenerated when Zeek itself is started. So, now make a complete backup of `/opt/zeek`, as it's going to have its ownership changed during runtime:
+```
+root@hedgehog:/tmp# rsync -a /opt/zeek/ /opt/zeek.orig
+
+root@hedgehog:/tmp# chown -R sensor:sensor /opt/zeek/*
+
+root@hedgehog:/tmp# chown -R root:root /opt/zeek.orig/*
+
+root@hedgehog:/tmp# ls -l /opt/ | grep zeek
+drwxr-xr-x 8 root root 4096 May 8 15:48 zeek
+drwxr-xr-x 8 root root 4096 May 8 15:48 zeek.orig
+```
+
+17. Grab other new scripts and stuff from our mount of the ISO using `rsync`:
+```
+root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/usr/local/bin/ /usr/local/bin
+user@otherbox's password:
+
+root@hedgehog:/tmp# ls -l /usr/local/bin/ | tail
+lrwxrwxrwx 1 root root 18 May 8 14:34 zeek -> /opt/zeek/bin/zeek
+-rwxr-xr-x 1 root staff 10349 Oct 29 2019 zeek_carve_logger.py
+-rwxr-xr-x 1 root staff 10467 Oct 29 2019 zeek_carve_scanner.py
+-rw-r--r-- 1 root staff 25756 Oct 29 2019 zeek_carve_utils.py
+-rwxr-xr-x 1 root staff 8787 Oct 29 2019 zeek_carve_watcher.py
+-rwxr-xr-x 1 root staff 4883 May 4 17:39 zeek_install_plugins.sh
+
+root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/opt/yara-rules/ /opt/yara-rules
+user@otherbox's password:
+
+root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/opt/capa-rules/ /opt/capa-rules
+user@otherbox's password:
+
+root@hedgehog:/tmp# ls -l /opt/ | grep '\-rules'
+drwxr-xr-x 8 root root 4096 May 8 15:48 capa-rules
+drwxr-xr-x 8 root root 24576 May 8 15:48 yara-rules
+
+root@hedgehog:/tmp# for BEAT in filebeat; do rsync -a user@otherbox:/media/squash/usr/share/$BEAT/kibana/ /usr/share/$BEAT/kibana; done
+user@otherbox's password:
+user@otherbox's password:
+
+root@hedgehog:/tmp# rsync -avP --delete user@otherbox:/media/squash/etc/audit/rules.d/ /etc/audit/rules.d/
+user@otherbox's password:
+
+root@hedgehog:/tmp# rsync -avP --delete user@otherbox:/media/squash/etc/sudoers.d/ /etc/sudoers.d/
+user@otherbox's password:
+
+root@hedgehog:/tmp# chmod 400 /etc/sudoers.d/*
+```
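+
+A malformed file in `/etc/sudoers.d/` can break `sudo` entirely, so it's worth syntax-checking what was just synced (optional, but cheap insurance); each file should report `parsed OK`:
+```
+root@hedgehog:/tmp# for F in /etc/sudoers.d/*; do visudo -cf "$F"; done
+```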
+
+18. Set capabilities and symlinks for network capture programs to be used by the unprivileged user:
+
+commands:
+
+```
+chown root:netdev /usr/sbin/netsniff-ng && \
+  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip CAP_SYS_ADMIN+eip' /usr/sbin/netsniff-ng
+chown root:netdev /opt/zeek/bin/zeek && \
+  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/zeek/bin/zeek
+chown root:netdev /sbin/ethtool && \
+  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /sbin/ethtool
+chown root:netdev /opt/zeek/bin/capstats && \
+  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /opt/zeek/bin/capstats
+chown root:netdev /usr/bin/tcpdump && \
+  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/tcpdump
+chown root:netdev /opt/arkime/bin/capture && \
+  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/arkime/bin/capture
+
+ln -s -f /opt/zeek/bin/zeek /usr/local/bin/
+ln -s -f /usr/sbin/netsniff-ng /usr/local/bin/
+ln -s -f /usr/bin/tcpdump /usr/local/bin/
+ln -s -f /opt/arkime/bin/capture /usr/local/bin/
+ln -s -f /opt/arkime/bin/npm /usr/local/bin
+ln -s -f /opt/arkime/bin/node /usr/local/bin
+ln -s -f /opt/arkime/bin/npx /usr/local/bin
+```
+
+example:
+
+```
+root@hedgehog:/tmp# chown root:netdev /usr/sbin/netsniff-ng && \
+> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip CAP_SYS_ADMIN+eip' /usr/sbin/netsniff-ng
+root@hedgehog:/tmp# chown root:netdev /opt/zeek/bin/zeek && \
+> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/zeek/bin/zeek
+root@hedgehog:/tmp# chown root:netdev /sbin/ethtool && \
+> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /sbin/ethtool
+root@hedgehog:/tmp# chown root:netdev /opt/zeek/bin/capstats && \
+> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /opt/zeek/bin/capstats
+root@hedgehog:/tmp# chown root:netdev /usr/bin/tcpdump && \
+> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/tcpdump
+root@hedgehog:/tmp# chown root:netdev /opt/arkime/bin/capture && \
+> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/arkime/bin/capture
+root@hedgehog:/tmp# ln -s -f /opt/zeek/bin/zeek /usr/local/bin/
+root@hedgehog:/tmp# ln -s -f /usr/sbin/netsniff-ng /usr/local/bin/
+root@hedgehog:/tmp# ln -s -f /usr/bin/tcpdump /usr/local/bin/
+root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/capture /usr/local/bin/
+root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/npm /usr/local/bin
+root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/node /usr/local/bin
+root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/npx /usr/local/bin
+```
+
+19. Back up unprivileged user sensor-specific config and scripts:
+    - `mv /opt/sensor/ /opt/sensor_upgrade_backup_$(date +%Y-%m-%d)`
+
+20. Grab unprivileged user sensor-specific config and scripts from our mount of the ISO using `rsync` and change their ownership to the unprivileged user:
+```
+root@hedgehog:/tmp# rsync -av user@otherbox:/media/squash/opt/sensor /opt/
+user@otherbox's password:
+receiving incremental file list
+created directory ./opt
+sensor/
+[...]
+
+sent 1,244 bytes received 1,646,409 bytes 470,758.00 bytes/sec
+total size is 1,641,629 speedup is 1.00
+
+root@hedgehog:/tmp# chown -R sensor:sensor /opt/sensor*
+
+root@hedgehog:/tmp# ls -l /opt/ | grep sensor
+drwxr-xr-x 4 sensor sensor 4096 May 6 22:00 sensor
+drwxr-x--- 4 sensor sensor 4096 May 8 14:33 sensor_upgrade_backup_2020-05-08
+```
+
+21. Leave the root shell and `cd` to `/opt`
+```
+root@hedgehog:~# exit
+logout
+
+sensor@hedgehog:~$ whoami
+sensor
+
+sensor@hedgehog:~$ cd /opt
+```
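+
+While you're here, you can also spot-check that the capabilities from step 18 took effect (optional; `getcap` works fine as the unprivileged user, and its exact output format varies a bit between libcap versions):
+```
+sensor@hedgehog:opt$ getcap /usr/sbin/netsniff-ng /opt/zeek/bin/zeek /opt/arkime/bin/capture
+```
+Each binary should list the capabilities applied above (e.g., `cap_net_raw` and `cap_net_admin`).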
+
+22. Compare the old and new `control_vars.conf` files
+```
+sensor@hedgehog:opt$ diff sensor_upgrade_backup_2020-05-08/sensor_ctl/control_vars.conf sensor/sensor_ctl/control_vars.conf
+1,2c1,2
+< export CAPTURE_INTERFACE=enp0s3
+< export CAPTURE_FILTER="not port 5044 and not port 5601 and not port 8005 and not port 9200 and not port 9600"
+---
+> export CAPTURE_INTERFACE=xxxx
+> export CAPTURE_FILTER=""
+4c4
+[...]
+```
+
+Examine the differences. If there aren't any new `export` variables, then you're probably safe to just replace the default version of `control_vars.conf` with the backed-up version:
+
+```
+sensor@hedgehog:opt$ cp sensor_upgrade_backup_2020-05-08/sensor_ctl/control_vars.conf sensor/sensor_ctl/control_vars.conf
+cp: overwrite 'sensor/sensor_ctl/control_vars.conf'? y
+```
+
+If there are major differences or new variables, continue on to the next step; in a minute you'll need to run `capture-config` to configure from scratch anyway.
+
+23. Restore certificates/keystores for forwarders from the backup `sensor_ctl` path to the new one
+```
+sensor@hedgehog:opt$ for BEAT in filebeat miscbeat; do cp /opt/sensor_upgrade_backup_2020-05-08/sensor_ctl/$BEAT/data/* /opt/sensor/sensor_ctl/$BEAT/data/; done
+
+sensor@hedgehog:opt$ cp /opt/sensor_upgrade_backup_2020-05-08/sensor_ctl/filebeat/{ca.crt,client.crt,client.key} /opt/sensor/sensor_ctl/logstash-client-certificates/
+```
+
+24. Despite what we just did, you may consider running `capture-config` to re-configure [capture, forwarding, and autostart services](hedgehog-config-user.md#HedgehogConfigUser) from scratch anyway. You can use the backed-up version of `control_vars.conf` to refer back to as a basis for things you might want to restore (e.g., `CAPTURE_INTERFACE`, `CAPTURE_FILTER`, `PCAP_PATH`, `ZEEK_LOG_PATH`, your autostart settings, etc.).
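+
+Either way, a recursive comparison of the backed-up and new `sensor_ctl` trees (optional) is a handy way to catch anything you haven't consciously restored or reconfigured:
+```
+sensor@hedgehog:opt$ diff -rq sensor_upgrade_backup_2020-05-08/sensor_ctl sensor/sensor_ctl
+```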
+
+25. Once you feel confident you've completed all of these steps, issue a reboot on the Hedgehog.
+
+## Post-upgrade
+
+Once the Hedgehog has come back up, check to make sure everything is working:
+
+* `/opt/sensor/sensor_ctl/status` should return `RUNNING` for the things you set to autorun (no `FATAL` errors)
+* `sensorwatch` should show current writes to Zeek log files and PCAP files (depending on your configuration)
+* `tail -f /opt/sensor/sensor_ctl/log/*` should show no egregious errors
+* `zeek --version`, `zeek -N local`, and `capture --version` ought to run and print out version information as expected
+* if you are forwarding to a [Malcolm](https://github.com/idaholab/Malcolm) aggregator, you should start seeing data momentarily
\ No newline at end of file
diff --git a/docs/hedgehog.md b/docs/hedgehog.md
new file mode 100644
index 000000000..aa0473131
--- /dev/null
+++ b/docs/hedgehog.md
@@ -0,0 +1,41 @@
+# Hedgehog Linux
+
+**Network Traffic Capture Appliance**
+
+![](./images/hedgehog/logo/hedgehog-color-w-text.png)
+
+Hedgehog Linux is a Debian-based operating system built to
+
+* monitor network interfaces
+* capture packets to PCAP files
+* detect file transfers in network traffic and extract and scan those files for threats
+* generate and forward Zeek logs, Arkime sessions, and other information to [Malcolm](https://github.com/idaholab/Malcolm)
+
+![sensor-iso-build-docker-wrap-push-ghcr](https://github.com/idaholab/Malcolm/workflows/sensor-iso-build-docker-wrap-push-ghcr/badge.svg)
+
+
+* [Sensor installation](hedgehog-installation.md#HedgehogInstallation)
+    - [Image boot options](hedgehog-installation.md#HedgehogBootOptions)
+    - [Installer](hedgehog-installation.md#HedgehogInstaller)
+* [Boot](hedgehog-boot.md#HedgehogBoot)
+    - [Kiosk mode](hedgehog-boot.md#HedgehogKioskMode)
+* [Configuration](hedgehog-config.md#HedgehogConfiguration)
+    - [Interfaces, hostname, and time synchronization](hedgehog-config-root.md#HedgehogConfigRoot)
+        + [Hostname](hedgehog-config-root.md#HedgehogConfigHostname)
+        + [Interfaces](hedgehog-config-root.md#HedgehogConfigIface)
+        + [Time synchronization](hedgehog-config-root.md#HedgehogConfigTime)
+    - [Capture, forwarding, and autostart services](hedgehog-config-user.md#HedgehogConfigUser)
+        + [Capture](hedgehog-config-user.md#HedgehogConfigCapture)
+            * [Automatic file extraction and scanning](hedgehog-config-user.md#HedgehogZeekFileExtraction)
+        + [Forwarding](hedgehog-config-user.md#HedgehogConfigForwarding)
+            * [arkime-capture](hedgehog-config-user.md#Hedgehogarkime-capture): Arkime session forwarding
+            * [filebeat](hedgehog-config-user.md#Hedgehogfilebeat): Zeek and Suricata log forwarding
+            * [miscbeat](hedgehog-config-user.md#Hedgehogmiscbeat): System metrics forwarding
+        + [Autostart services](hedgehog-config-user.md#HedgehogConfigAutostart)
+    - [Zeek Intelligence Framework](hedgehog-config-zeek-intel.md#HedgehogZeekIntel)
+* [Appendix A - Generating the ISO](hedgehog-iso-build.md#HedgehogISOBuild)
+* [Appendix B - Configuring SSH access](hedgehog-ssh.md#HedgehogConfigSSH)
+* [Appendix C - Troubleshooting](hedgehog-troubleshooting.md#HedgehogTroubleshooting)
+* [Appendix D - Hardening](hedgehog-hardening.md#HedgehogHardening)
+    - [Compliance exceptions](hedgehog-hardening.md#HedgehogComplianceExceptions)
+* [Appendix E - Upgrades](hedgehog-upgrade.md#HedgehogUpgradePlan)
diff --git a/docs/host-and-subnet-mapping.md b/docs/host-and-subnet-mapping.md
new file mode 100644
index 000000000..c220a1622
--- /dev/null
+++ b/docs/host-and-subnet-mapping.md
@@ -0,0 +1,94 @@
+# Automatic host and subnet name assignment
+
+* [Automatic host and subnet name assignment](host-and-subnet-mapping.md#HostAndSubnetNaming)
+    - [IP/MAC address to hostname mapping via `host-map.txt`](host-and-subnet-mapping.md#HostNaming)
+    - [CIDR subnet to network segment name mapping via `cidr-map.txt`](host-and-subnet-mapping.md#SegmentNaming)
+    - [Defining hostname and CIDR subnet names interface](host-and-subnet-mapping.md#NameMapUI)
+    - [Applying mapping changes](host-and-subnet-mapping.md#ApplyMapping)
+
+## IP/MAC address to hostname mapping via `host-map.txt`
+
+The `host-map.txt` file in the Malcolm installation directory can be used to define names for network hosts based on IP and/or MAC addresses in Zeek logs. The default empty configuration looks like this:
+```
+# IP or MAC address to host name map:
+#   address|host name|required tag
+#
+# where:
+#   address: comma-separated list of IPv4, IPv6, or MAC addresses
+#            e.g., 172.16.10.41, 02:42:45:dc:a2:96, 2001:0db8:85a3:0000:0000:8a2e:0370:7334
+#
+#   host name: host name to be assigned when event address(es) match
+#
+#   required tag (optional): only check match and apply host name if the event
+#                            contains this tag
+#
+```
+Each non-comment line (i.e., any line not beginning with a `#`) defines an address-to-name mapping for a network host. For example:
+```
+127.0.0.1,127.0.1.1,::1|localhost|
+192.168.10.10|office-laptop.intranet.lan|
+06:46:0b:a6:16:bf|serial-host.intranet.lan|testbed
+```
+Each line consists of three `|`-separated fields: address(es), hostname, and, optionally, a tag which, if specified, must belong to a log for the matching to occur.
+
+As Zeek logs are processed into Malcolm's OpenSearch instance, the log's source and destination IP and MAC address fields (`source.ip`, `destination.ip`, `source.mac`, and `destination.mac`, respectively) are compared against the lists of addresses in `host-map.txt`. When a match is found, a new field is added to the log: `source.hostname` or `destination.hostname`, depending on whether the matching address belongs to the originating or responding host. If the third field (the "required tag" field) is specified, a log must also contain that value in its `tags` field, in addition to matching the IP or MAC address specified, in order for the corresponding `_hostname` field to be added.
+
+`source.hostname` and `destination.hostname` may each contain multiple values. For example, if both a host's source IP address and source MAC address were matched by two different lines, `source.hostname` would contain the hostname values from both matching lines.
+
+## CIDR subnet to network segment name mapping via `cidr-map.txt`
+
+The `cidr-map.txt` file in the Malcolm installation directory can be used to define names for network segments based on IP addresses in Zeek logs. The default empty configuration looks like this:
+```
+# CIDR to network segment format:
+#   IP(s)|segment name|required tag
+#
+# where:
+#   IP(s): comma-separated list of CIDR-formatted network IP addresses
+#          e.g., 10.0.0.0/8, 169.254.0.0/16, 172.16.10.41
+#
+#   segment name: segment name to be assigned when event IP address(es) match
+#
+#   required tag (optional): only check match and apply segment name if the event
+#                            contains this tag
+#
+```
+Each non-comment line (i.e., any line not beginning with a `#`) defines a subnet-to-name mapping for a network segment. For example:
+```
+192.168.50.0/24,192.168.40.0/24,10.0.0.0/8|corporate|
+192.168.100.0/24|control|
+192.168.200.0/24|dmz|
+172.16.0.0/12|virtualized|testbed
+```
+Each line consists of three `|`-separated fields: CIDR-formatted subnet IP range(s), subnet name, and, optionally, a tag which, if specified, must belong to a log for the matching to occur.
+
+As Zeek logs are processed into Malcolm's OpenSearch instance, the log's source and destination IP address fields (`source.ip` and `destination.ip`, respectively) are compared against the lists of addresses in `cidr-map.txt`. When a match is found, a new field is added to the log: `source.segment` or `destination.segment`, depending on whether the matching address belongs to the originating or responding host. If the third field (the "required tag" field) is specified, a log must also contain that value in its `tags` field, in addition to its IP address falling within the subnet specified, in order for the corresponding `_segment` field to be added.
+
+`source.segment` and `destination.segment` may each contain multiple values. For example, if `cidr-map.txt` specifies multiple overlapping subnets on different lines, `source.segment` would contain the segment names from both matching lines if `source.ip` belonged to both subnets.
+
+If both `source.segment` and `destination.segment` are added to a log, and if they contain different values, the tag `cross_segment` will be added to the log's `tags` field for convenient identification of cross-segment traffic. This traffic can be easily visualized using Arkime's **Connections** graph by setting the **Src:** value to **Originating Network Segment** and the **Dst:** value to **Responding Network Segment**:
+
+![Cross-segment traffic in Connections](./images/screenshots/arkime_connections_segments.png)
+
+## Defining hostname and CIDR subnet names interface
+
+As an alternative to manually editing `cidr-map.txt` and `host-map.txt`, a **Host and Subnet Name Mapping** editor is available at [https://localhost/name-map-ui/](https://localhost/name-map-ui/) if you are connecting locally. Upon loading, the editor is populated from `cidr-map.txt`, `host-map.txt`, and `net-map.json`.
+
+This editor provides the following controls:
+
+* 🔎 **Search mappings** - narrow the list of visible items using a search filter
+* **Type**, **Address**, **Name**, and **Tag** *(column headings)* - sort the list of items by clicking a column header
+* 📝 *(per item)* - modify the selected item
+* 🚫 *(per item)* - remove the selected item
+* 🖳 **host** / 🖧 **segment**, **Address**, **Name**, **Tag (optional)**, and 💾 - save the item with these values (either adding a new item or updating the item being modified)
+* 📥 **Import** - clear the list and replace it with the contents of an uploaded `net-map.json` file
+* 📤 **Export** - format and download the list as a `net-map.json` file
+* 💾 **Save Mappings** - format and store `net-map.json` in the Malcolm directory (replacing the existing `net-map.json` file)
+* 🔁 **Restart Logstash** - restart log ingestion, parsing, and enrichment
+
+![Host and Subnet Name Mapping Editor](./images/screenshots/malcolm_name_map_ui.png)
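+
+For reference, `net-map.json` is simply a JSON rendering of the same mappings the editor displays. The snippet below is an illustrative guess at what two entries might look like, not a guaranteed schema; the authoritative format is whatever the 📤 **Export** button produces, so export a file from the editor before hand-editing one:
+```
+[
+  {"type": "host", "address": "192.168.10.10", "name": "office-laptop.intranet.lan", "tag": ""},
+  {"type": "segment", "address": "192.168.100.0/24", "name": "control", "tag": ""}
+]
+```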
+
+## Applying mapping changes
+
+When changes are made to `cidr-map.txt`, `host-map.txt`, or `net-map.json`, Malcolm's Logstash container must be restarted. The easiest way to do this is to restart Malcolm via `restart` (see [Stopping and restarting Malcolm](running.md#StopAndRestart)) or by clicking the 🔁 **Restart Logstash** button in the [name mapping interface](#NameMapUI) described above.
+
+Restarting Logstash may take several minutes, after which log ingestion will resume.
\ No newline at end of file
diff --git a/docs/host-config-linux.md b/docs/host-config-linux.md
new file mode 100644
index 000000000..db5c65db7
--- /dev/null
+++ b/docs/host-config-linux.md
@@ -0,0 +1,91 @@
+# Linux host system configuration
+
+## Installing Docker
+
+Docker installation instructions vary slightly by distribution. Please follow the links below to docker.com to find the instructions specific to your distribution:
+
+* [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/)
+* [Debian](https://docs.docker.com/install/linux/docker-ce/debian/)
+* [Fedora](https://docs.docker.com/install/linux/docker-ce/fedora/)
+* [CentOS](https://docs.docker.com/install/linux/docker-ce/centos/)
+* [Binaries](https://docs.docker.com/install/linux/docker-ce/binaries/)
+
+After installing Docker, because Malcolm should be run as a non-root user, add your user to the `docker` group with something like:
+```
+$ sudo usermod -aG docker yourusername
+```
+
+Following this, either reboot or log out and then log back in.
+
+Docker starts automatically on DEB-based distributions. On RPM-based distributions, you need to start it manually or enable it using the appropriate `systemctl` or `service` command(s).
+
+You can test Docker by running `docker info`, or (assuming you have internet access) `docker run --rm hello-world`.
+
+## Installing docker-compose
+
+Please follow [this link](https://docs.docker.com/compose/install/) on docker.com for instructions on installing docker-compose.
+
+## Operating system configuration
+
+The host system (i.e., the one running Docker) will need to be configured for the [best possible OpenSearch performance](https://www.elastic.co/guide/en/elasticsearch/reference/master/system-config.html). Here are a few suggestions for Linux hosts (these may vary from distribution to distribution):
+
+* Append the following lines to `/etc/sysctl.conf`:
+
+```
+# the maximum number of open file handles
+fs.file-max=2097152
+
+# increase maximums for inotify watches
+fs.inotify.max_user_watches=131072
+fs.inotify.max_queued_events=131072
+fs.inotify.max_user_instances=512
+
+# the maximum number of memory map areas a process may have
+vm.max_map_count=262144
+
+# decrease "swappiness" (swapping out runtime memory vs. dropping pages)
+vm.swappiness=1
+
+# the maximum number of incoming connections
+net.core.somaxconn=65535
+
+# the % of system memory fillable with "dirty" pages before flushing
+vm.dirty_background_ratio=40
+
+# maximum % of dirty system memory before committing everything
+vm.dirty_ratio=80
+```
+
+* Depending on your distribution, create **either** the file `/etc/security/limits.d/limits.conf` containing:
+
+```
+# the maximum number of open file handles
+* soft nofile 65535
+* hard nofile 65535
+# do not limit the size of memory that can be locked
+* soft memlock unlimited
+* hard memlock unlimited
+```
+
+**OR** the file `/etc/systemd/system.conf.d/limits.conf` containing:
+
+```
+[Manager]
+# the maximum number of open file handles
+DefaultLimitNOFILE=65535:65535
+# do not limit the size of memory that can be locked
+DefaultLimitMEMLOCK=infinity
+```
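+
+Note that the `/etc/sysctl.conf` additions won't take effect until the next boot unless you load them manually (the limits files, by contrast, generally only apply to new login sessions or after a reboot); a quick way to apply them immediately is:
+
+```
+$ sudo sysctl -p /etc/sysctl.conf
+```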
+
+* Change the readahead value for the disk where the OpenSearch data will be stored. There are a few ways to do this. For example, you could add this line to `/etc/rc.local` (replacing `/dev/sda` with your disk block descriptor):
+
+```
+# change disk read-ahead value (# of blocks)
+blockdev --setra 512 /dev/sda
+```
+
+* Change the I/O scheduler to `deadline` or `noop`. Again, this can be done in a variety of ways. The simplest is to add `elevator=deadline` to the arguments in `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, then run `sudo update-grub2`.
+
+* If you are planning on using very large data sets, consider formatting the drive containing the `opensearch` volume as XFS.
+
+After making all of these changes, do a reboot for good measure!
\ No newline at end of file
diff --git a/docs/host-config-macos.md b/docs/host-config-macos.md
new file mode 100644
index 000000000..74e277887
--- /dev/null
+++ b/docs/host-config-macos.md
@@ -0,0 +1,36 @@
+# macOS host system configuration
+
+## Automatic installation using `install.py`
+
+The `install.py` script will attempt to guide you through the installation of Docker and Docker Compose if they are not present. If that works for you, you can skip ahead to **Configure docker daemon option** in this section.
+
+## Install Homebrew
+
+The easiest way to install and maintain Docker on a Mac is using the [Homebrew cask](https://brew.sh). Execute the following in a terminal:
+
+```
+$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
+$ brew install cask
+$ brew tap homebrew/cask-versions
+```
+
+## Install docker-edge
+
+```
+$ brew cask install docker-edge
+```
+This will install the latest version of Docker and docker-compose. It can be upgraded later using `brew` as well:
+```
+$ brew cask upgrade --no-quarantine docker-edge
+```
+You can now run Docker from the Applications folder.
+
+## Configure docker daemon option
+
+Some changes should be made for performance ([this link](http://markshust.com/2018/01/30/performance-tuning-docker-mac) gives a good succinct overview).
+
+* **Resource allocation** - For a good experience, you likely need at least a quad-core MacBook Pro with 16GB RAM and an SSD. I have run Malcolm on an older 2013 MacBook Pro with 8GB of RAM, but the more the better. Go to your system tray and select **Docker** → **Preferences** → **Advanced**. Set the resources available to Docker to at least 4 CPUs and 8GB of RAM (>= 16GB is preferable).
+
+* **Volume mount performance** - You can speed up the performance of volume mounts by removing unused paths from **Docker** → **Preferences** → **File Sharing**. For example, if you're only going to be mounting volumes under your home directory, you could share `/Users` but remove other paths.
+
+After making these changes, right-click on the Docker 🐋 icon in the system tray and select **Restart**.
\ No newline at end of file
diff --git a/docs/host-config-windows.md b/docs/host-config-windows.md
new file mode 100644
index 000000000..a12716b33
--- /dev/null
+++ b/docs/host-config-windows.md
@@ -0,0 +1,16 @@
+# Windows host system configuration
+
+## Installing and configuring Docker Desktop for Windows
+
+Installing and configuring [Docker to run under Windows](https://docs.docker.com/desktop/windows/wsl/) must be done manually, rather than through the `install.py` script as is done for Linux and macOS.
+
+1. Be running Windows 10, version 1903 or higher
+1. Prepare your system and [install WSL](https://docs.microsoft.com/en-us/windows/wsl/install) and a Linux distribution by running `wsl --install -d Debian` in PowerShell as Administrator (these instructions are tested with Debian, but may work with other distributions)
+1. Install Docker Desktop for Windows either by downloading the installer from the [official Docker site](https://hub.docker.com/editions/community/docker-ce-desktop-windows) or installing it through [chocolatey](https://chocolatey.org/packages/docker-desktop)
+1. Follow the [Docker Desktop WSL 2 backend](https://docs.docker.com/desktop/windows/wsl/) instructions to finish configuration and review best practices
+1. Reboot
+1. Open the WSL distribution's terminal and run `docker info` to make sure Docker is running
+
+## Finish Malcolm's configuration
+
+Once Docker is installed, configured, and running as described in the previous section, run [`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning) to finish configuration of the local Malcolm installation. Malcolm will be controlled and run from within your WSL distribution's terminal environment.
\ No newline at end of file
diff --git a/docs/host-config.md b/docs/host-config.md
new file mode 100644
index 000000000..f92375141
--- /dev/null
+++ b/docs/host-config.md
@@ -0,0 +1,5 @@
+# Platform-specific Configuration
+
+* [Linux host system configuration](host-config-linux.md#HostSystemConfigLinux)
+* [macOS host system configuration](host-config-macos.md#HostSystemConfigMac)
+* [Windows host system configuration](host-config-windows.md#HostSystemConfigWindows)
\ No newline at end of file
diff --git a/docs/ics-best-guess.md b/docs/ics-best-guess.md
new file mode 100644
index 000000000..7e6eb61cc
--- /dev/null
+++ b/docs/ics-best-guess.md
@@ -0,0 +1,11 @@
+# "Best Guess" Fingerprinting for ICS Protocols
+
+There are many ICS (industrial control systems) protocols. While Malcolm's collection of [protocol parsers](protocols.md#Protocols) includes a number of them, many, particularly those that are proprietary or less common, are unlikely to be supported with a full parser in the foreseeable future.
+
+In an effort to help identify more ICS traffic, Malcolm can use a "best guess" method based on transport protocol (e.g., TCP or UDP) and port(s) to categorize potential traffic communicating over some ICS protocols without full parser support. This feature involves a [mapping table](https://github.com/idaholab/Malcolm/blob/master/zeek/config/guess_ics_map.txt) and a [Zeek script](https://github.com/idaholab/Malcolm/blob/master/zeek/config/guess.zeek) to look up the transport protocol and destination and/or source port to make a best guess at whether a connection belongs to one of those protocols. These potential ICS communications are categorized by vendor where possible.
+
+Naturally, these lookups could produce false positives, so these connections are displayed in their own dashboard (the **Best Guess** dashboard found under the **ICS** section of Malcolm's [OpenSearch Dashboards](dashboards.md#DashboardsVisualizations) navigation pane). Values such as IP addresses, ports, or UID can be used to [pivot to other dashboards](arkime.md#ZeekArkimeFlowCorrelation) to investigate further.
+
+![](./images/screenshots/dashboards_bestguess.png)
+
+This feature is disabled by default, but it can be enabled by clearing (setting to `''`) the value of the `ZEEK_DISABLE_BEST_GUESS_ICS` environment variable in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml).
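+
+For example, in the Zeek-related environment variable section of `docker-compose.yml` (treat this as an illustrative sketch rather than a literal excerpt; the surrounding lines and service layout differ between Malcolm versions):
+```
+    environment:
+      ZEEK_DISABLE_BEST_GUESS_ICS : ''
+```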
\ No newline at end of file diff --git a/docs/images/hedgehog/images/arkime-capture-ip-port.jpg b/docs/images/hedgehog/images/arkime-capture-ip-port.jpg new file mode 100644 index 000000000..8e2536437 Binary files /dev/null and b/docs/images/hedgehog/images/arkime-capture-ip-port.jpg differ diff --git a/sensor-iso/docs/images/arkime-capture-ip-port.png b/docs/images/hedgehog/images/arkime-capture-ip-port.png similarity index 100% rename from sensor-iso/docs/images/arkime-capture-ip-port.png rename to docs/images/hedgehog/images/arkime-capture-ip-port.png diff --git a/docs/images/hedgehog/images/arkime_confirm.jpg b/docs/images/hedgehog/images/arkime_confirm.jpg new file mode 100644 index 000000000..85b973f3d Binary files /dev/null and b/docs/images/hedgehog/images/arkime_confirm.jpg differ diff --git a/sensor-iso/docs/images/arkime_confirm.png b/docs/images/hedgehog/images/arkime_confirm.png similarity index 100% rename from sensor-iso/docs/images/arkime_confirm.png rename to docs/images/hedgehog/images/arkime_confirm.png diff --git a/docs/images/hedgehog/images/autostarts.jpg b/docs/images/hedgehog/images/autostarts.jpg new file mode 100644 index 000000000..a690502e8 Binary files /dev/null and b/docs/images/hedgehog/images/autostarts.jpg differ diff --git a/sensor-iso/docs/images/autostarts.png b/docs/images/hedgehog/images/autostarts.png similarity index 100% rename from sensor-iso/docs/images/autostarts.png rename to docs/images/hedgehog/images/autostarts.png diff --git a/docs/images/hedgehog/images/autostarts_confirm.jpg b/docs/images/hedgehog/images/autostarts_confirm.jpg new file mode 100644 index 000000000..c60dc2505 Binary files /dev/null and b/docs/images/hedgehog/images/autostarts_confirm.jpg differ diff --git a/sensor-iso/docs/images/autostarts_confirm.png b/docs/images/hedgehog/images/autostarts_confirm.png similarity index 100% rename from sensor-iso/docs/images/autostarts_confirm.png rename to docs/images/hedgehog/images/autostarts_confirm.png diff --git a/docs/images/hedgehog/images/boot_options.jpg b/docs/images/hedgehog/images/boot_options.jpg new file mode 100644 index 000000000..494356fea Binary files /dev/null and b/docs/images/hedgehog/images/boot_options.jpg differ diff --git a/sensor-iso/docs/images/boot_options.png b/docs/images/hedgehog/images/boot_options.png similarity index 100% rename from sensor-iso/docs/images/boot_options.png rename to docs/images/hedgehog/images/boot_options.png diff --git a/docs/images/hedgehog/images/capture_config_main.jpg b/docs/images/hedgehog/images/capture_config_main.jpg new file mode 100644 index 000000000..9bb58573f Binary files /dev/null and b/docs/images/hedgehog/images/capture_config_main.jpg differ diff --git a/sensor-iso/docs/images/capture_config_main.png b/docs/images/hedgehog/images/capture_config_main.png similarity index 100% rename from sensor-iso/docs/images/capture_config_main.png rename to docs/images/hedgehog/images/capture_config_main.png diff --git a/docs/images/hedgehog/images/capture_filter.jpg b/docs/images/hedgehog/images/capture_filter.jpg new file mode 100644 index 000000000..6d5876fbf Binary files /dev/null and b/docs/images/hedgehog/images/capture_filter.jpg differ diff --git a/sensor-iso/docs/images/capture_filter.png b/docs/images/hedgehog/images/capture_filter.png similarity index 100% rename from sensor-iso/docs/images/capture_filter.png rename to docs/images/hedgehog/images/capture_filter.png diff --git a/docs/images/hedgehog/images/capture_iface_select.jpg 
b/docs/images/hedgehog/images/capture_iface_select.jpg new file mode 100644 index 000000000..2b275f932 Binary files /dev/null and b/docs/images/hedgehog/images/capture_iface_select.jpg differ diff --git a/sensor-iso/docs/images/capture_iface_select.png b/docs/images/hedgehog/images/capture_iface_select.png similarity index 100% rename from sensor-iso/docs/images/capture_iface_select.png rename to docs/images/hedgehog/images/capture_iface_select.png diff --git a/docs/images/hedgehog/images/capture_paths.jpg b/docs/images/hedgehog/images/capture_paths.jpg new file mode 100644 index 000000000..ae6259d79 Binary files /dev/null and b/docs/images/hedgehog/images/capture_paths.jpg differ diff --git a/sensor-iso/docs/images/capture_paths.png b/docs/images/hedgehog/images/capture_paths.png similarity index 100% rename from sensor-iso/docs/images/capture_paths.png rename to docs/images/hedgehog/images/capture_paths.png diff --git a/docs/images/hedgehog/images/desktop.jpg b/docs/images/hedgehog/images/desktop.jpg new file mode 100644 index 000000000..3abcb12eb Binary files /dev/null and b/docs/images/hedgehog/images/desktop.jpg differ diff --git a/sensor-iso/docs/images/desktop.png b/docs/images/hedgehog/images/desktop.png similarity index 100% rename from sensor-iso/docs/images/desktop.png rename to docs/images/hedgehog/images/desktop.png diff --git a/docs/images/hedgehog/images/file_quarantine.jpg b/docs/images/hedgehog/images/file_quarantine.jpg new file mode 100644 index 000000000..5a324e6f9 Binary files /dev/null and b/docs/images/hedgehog/images/file_quarantine.jpg differ diff --git a/sensor-iso/docs/images/file_quarantine.png b/docs/images/hedgehog/images/file_quarantine.png similarity index 100% rename from sensor-iso/docs/images/file_quarantine.png rename to docs/images/hedgehog/images/file_quarantine.png diff --git a/docs/images/hedgehog/images/filebeat_certs.jpg b/docs/images/hedgehog/images/filebeat_certs.jpg new file mode 100644 index 000000000..ceb85018f Binary files /dev/null and b/docs/images/hedgehog/images/filebeat_certs.jpg differ diff --git a/sensor-iso/docs/images/filebeat_certs.png b/docs/images/hedgehog/images/filebeat_certs.png similarity index 100% rename from sensor-iso/docs/images/filebeat_certs.png rename to docs/images/hedgehog/images/filebeat_certs.png diff --git a/docs/images/hedgehog/images/filebeat_confirm.jpg b/docs/images/hedgehog/images/filebeat_confirm.jpg new file mode 100644 index 000000000..644880987 Binary files /dev/null and b/docs/images/hedgehog/images/filebeat_confirm.jpg differ diff --git a/sensor-iso/docs/images/filebeat_confirm.png b/docs/images/hedgehog/images/filebeat_confirm.png similarity index 100% rename from sensor-iso/docs/images/filebeat_confirm.png rename to docs/images/hedgehog/images/filebeat_confirm.png diff --git a/docs/images/hedgehog/images/filebeat_ip_port.jpg b/docs/images/hedgehog/images/filebeat_ip_port.jpg new file mode 100644 index 000000000..61a8a567b Binary files /dev/null and b/docs/images/hedgehog/images/filebeat_ip_port.jpg differ diff --git a/sensor-iso/docs/images/filebeat_ip_port.png b/docs/images/hedgehog/images/filebeat_ip_port.png similarity index 100% rename from sensor-iso/docs/images/filebeat_ip_port.png rename to docs/images/hedgehog/images/filebeat_ip_port.png diff --git a/docs/images/hedgehog/images/filebeat_log_path.jpg b/docs/images/hedgehog/images/filebeat_log_path.jpg new file mode 100644 index 000000000..34ff4436d Binary files /dev/null and b/docs/images/hedgehog/images/filebeat_log_path.jpg differ diff --git 
a/sensor-iso/docs/images/filebeat_log_path.png b/docs/images/hedgehog/images/filebeat_log_path.png similarity index 100% rename from sensor-iso/docs/images/filebeat_log_path.png rename to docs/images/hedgehog/images/filebeat_log_path.png diff --git a/docs/images/hedgehog/images/filebeat_ssl.jpg b/docs/images/hedgehog/images/filebeat_ssl.jpg new file mode 100644 index 000000000..850624495 Binary files /dev/null and b/docs/images/hedgehog/images/filebeat_ssl.jpg differ diff --git a/sensor-iso/docs/images/filebeat_ssl.png b/docs/images/hedgehog/images/filebeat_ssl.png similarity index 100% rename from sensor-iso/docs/images/filebeat_ssl.png rename to docs/images/hedgehog/images/filebeat_ssl.png diff --git a/docs/images/hedgehog/images/filebeat_ssl_verify.jpg b/docs/images/hedgehog/images/filebeat_ssl_verify.jpg new file mode 100644 index 000000000..9097178c4 Binary files /dev/null and b/docs/images/hedgehog/images/filebeat_ssl_verify.jpg differ diff --git a/sensor-iso/docs/images/filebeat_ssl_verify.png b/docs/images/hedgehog/images/filebeat_ssl_verify.png similarity index 100% rename from sensor-iso/docs/images/filebeat_ssl_verify.png rename to docs/images/hedgehog/images/filebeat_ssl_verify.png diff --git a/docs/images/hedgehog/images/forwarder_config.jpg b/docs/images/hedgehog/images/forwarder_config.jpg new file mode 100644 index 000000000..4b1d95c5e Binary files /dev/null and b/docs/images/hedgehog/images/forwarder_config.jpg differ diff --git a/sensor-iso/docs/images/forwarder_config.png b/docs/images/hedgehog/images/forwarder_config.png similarity index 100% rename from sensor-iso/docs/images/forwarder_config.png rename to docs/images/hedgehog/images/forwarder_config.png diff --git a/docs/images/hedgehog/images/hedgehog-color-w-text.jpg b/docs/images/hedgehog/images/hedgehog-color-w-text.jpg new file mode 100644 index 000000000..f095934ef Binary files /dev/null and b/docs/images/hedgehog/images/hedgehog-color-w-text.jpg differ diff --git a/sensor-iso/docs/images/hedgehog-color-w-text.png b/docs/images/hedgehog/images/hedgehog-color-w-text.png similarity index 100% rename from sensor-iso/docs/images/hedgehog-color-w-text.png rename to docs/images/hedgehog/images/hedgehog-color-w-text.png diff --git a/docs/images/hedgehog/images/hostname_setting.jpg b/docs/images/hedgehog/images/hostname_setting.jpg new file mode 100644 index 000000000..9d95a53cc Binary files /dev/null and b/docs/images/hedgehog/images/hostname_setting.jpg differ diff --git a/sensor-iso/docs/images/hostname_setting.png b/docs/images/hedgehog/images/hostname_setting.png similarity index 100% rename from sensor-iso/docs/images/hostname_setting.png rename to docs/images/hedgehog/images/hostname_setting.png diff --git a/docs/images/hedgehog/images/htpdate_freq.jpg b/docs/images/hedgehog/images/htpdate_freq.jpg new file mode 100644 index 000000000..8c22e8747 Binary files /dev/null and b/docs/images/hedgehog/images/htpdate_freq.jpg differ diff --git a/sensor-iso/docs/images/htpdate_freq.png b/docs/images/hedgehog/images/htpdate_freq.png similarity index 100% rename from sensor-iso/docs/images/htpdate_freq.png rename to docs/images/hedgehog/images/htpdate_freq.png diff --git a/docs/images/hedgehog/images/htpdate_host.jpg b/docs/images/hedgehog/images/htpdate_host.jpg new file mode 100644 index 000000000..c7571f4bf Binary files /dev/null and b/docs/images/hedgehog/images/htpdate_host.jpg differ diff --git a/sensor-iso/docs/images/htpdate_host.png b/docs/images/hedgehog/images/htpdate_host.png similarity index 100% rename from 
sensor-iso/docs/images/htpdate_host.png rename to docs/images/hedgehog/images/htpdate_host.png diff --git a/docs/images/hedgehog/images/htpdate_setup.jpg b/docs/images/hedgehog/images/htpdate_setup.jpg new file mode 100644 index 000000000..b95d0967a Binary files /dev/null and b/docs/images/hedgehog/images/htpdate_setup.jpg differ diff --git a/sensor-iso/docs/images/htpdate_setup.png b/docs/images/hedgehog/images/htpdate_setup.png similarity index 100% rename from sensor-iso/docs/images/htpdate_setup.png rename to docs/images/hedgehog/images/htpdate_setup.png diff --git a/docs/images/hedgehog/images/htpdate_test.jpg b/docs/images/hedgehog/images/htpdate_test.jpg new file mode 100644 index 000000000..6b7160f3b Binary files /dev/null and b/docs/images/hedgehog/images/htpdate_test.jpg differ diff --git a/sensor-iso/docs/images/htpdate_test.png b/docs/images/hedgehog/images/htpdate_test.png similarity index 100% rename from sensor-iso/docs/images/htpdate_test.png rename to docs/images/hedgehog/images/htpdate_test.png diff --git a/docs/images/hedgehog/images/iface_mode.jpg b/docs/images/hedgehog/images/iface_mode.jpg new file mode 100644 index 000000000..6645914a3 Binary files /dev/null and b/docs/images/hedgehog/images/iface_mode.jpg differ diff --git a/sensor-iso/docs/images/iface_mode.png b/docs/images/hedgehog/images/iface_mode.png similarity index 100% rename from sensor-iso/docs/images/iface_mode.png rename to docs/images/hedgehog/images/iface_mode.png diff --git a/docs/images/hedgehog/images/iface_static.jpg b/docs/images/hedgehog/images/iface_static.jpg new file mode 100644 index 000000000..4cdba03f6 Binary files /dev/null and b/docs/images/hedgehog/images/iface_static.jpg differ diff --git a/sensor-iso/docs/images/iface_static.png b/docs/images/hedgehog/images/iface_static.png similarity index 100% rename from sensor-iso/docs/images/iface_static.png rename to docs/images/hedgehog/images/iface_static.png diff --git a/docs/images/hedgehog/images/installer_progress.jpg b/docs/images/hedgehog/images/installer_progress.jpg new file mode 100644 index 000000000..20083997f Binary files /dev/null and b/docs/images/hedgehog/images/installer_progress.jpg differ diff --git a/sensor-iso/docs/images/installer_progress.png b/docs/images/hedgehog/images/installer_progress.png similarity index 100% rename from sensor-iso/docs/images/installer_progress.png rename to docs/images/hedgehog/images/installer_progress.png diff --git a/docs/images/hedgehog/images/kiosk_mode_sensor_menu.jpg b/docs/images/hedgehog/images/kiosk_mode_sensor_menu.jpg new file mode 100644 index 000000000..e08d59838 Binary files /dev/null and b/docs/images/hedgehog/images/kiosk_mode_sensor_menu.jpg differ diff --git a/sensor-iso/docs/images/kiosk_mode_sensor_menu.png b/docs/images/hedgehog/images/kiosk_mode_sensor_menu.png similarity index 100% rename from sensor-iso/docs/images/kiosk_mode_sensor_menu.png rename to docs/images/hedgehog/images/kiosk_mode_sensor_menu.png diff --git a/docs/images/hedgehog/images/kiosk_mode_services_menu.jpg b/docs/images/hedgehog/images/kiosk_mode_services_menu.jpg new file mode 100644 index 000000000..4cd2dc525 Binary files /dev/null and b/docs/images/hedgehog/images/kiosk_mode_services_menu.jpg differ diff --git a/sensor-iso/docs/images/kiosk_mode_services_menu.png b/docs/images/hedgehog/images/kiosk_mode_services_menu.png similarity index 100% rename from sensor-iso/docs/images/kiosk_mode_services_menu.png rename to docs/images/hedgehog/images/kiosk_mode_services_menu.png diff --git 
a/docs/images/hedgehog/images/kiosk_mode_status.jpg b/docs/images/hedgehog/images/kiosk_mode_status.jpg new file mode 100644 index 000000000..52e52a733 Binary files /dev/null and b/docs/images/hedgehog/images/kiosk_mode_status.jpg differ diff --git a/sensor-iso/docs/images/kiosk_mode_status.png b/docs/images/hedgehog/images/kiosk_mode_status.png similarity index 100% rename from sensor-iso/docs/images/kiosk_mode_status.png rename to docs/images/hedgehog/images/kiosk_mode_status.png diff --git a/docs/images/hedgehog/images/kiosk_mode_wipe_prompt.jpg b/docs/images/hedgehog/images/kiosk_mode_wipe_prompt.jpg new file mode 100644 index 000000000..61f7ad59b Binary files /dev/null and b/docs/images/hedgehog/images/kiosk_mode_wipe_prompt.jpg differ diff --git a/sensor-iso/docs/images/kiosk_mode_wipe_prompt.png b/docs/images/hedgehog/images/kiosk_mode_wipe_prompt.png similarity index 100% rename from sensor-iso/docs/images/kiosk_mode_wipe_prompt.png rename to docs/images/hedgehog/images/kiosk_mode_wipe_prompt.png diff --git a/docs/images/hedgehog/images/malcolm_arkime_reachback_acl.jpg b/docs/images/hedgehog/images/malcolm_arkime_reachback_acl.jpg new file mode 100644 index 000000000..0413492d8 Binary files /dev/null and b/docs/images/hedgehog/images/malcolm_arkime_reachback_acl.jpg differ diff --git a/sensor-iso/docs/images/malcolm_arkime_reachback_acl.png b/docs/images/hedgehog/images/malcolm_arkime_reachback_acl.png similarity index 100% rename from sensor-iso/docs/images/malcolm_arkime_reachback_acl.png rename to docs/images/hedgehog/images/malcolm_arkime_reachback_acl.png diff --git a/docs/images/hedgehog/images/ntp_host.jpg b/docs/images/hedgehog/images/ntp_host.jpg new file mode 100644 index 000000000..6d1813e9e Binary files /dev/null and b/docs/images/hedgehog/images/ntp_host.jpg differ diff --git a/sensor-iso/docs/images/ntp_host.png b/docs/images/hedgehog/images/ntp_host.png similarity index 100% rename from sensor-iso/docs/images/ntp_host.png rename to docs/images/hedgehog/images/ntp_host.png diff --git a/docs/images/hedgehog/images/opensearch_connection_protocol.jpg b/docs/images/hedgehog/images/opensearch_connection_protocol.jpg new file mode 100644 index 000000000..64bd309f4 Binary files /dev/null and b/docs/images/hedgehog/images/opensearch_connection_protocol.jpg differ diff --git a/sensor-iso/docs/images/opensearch_connection_protocol.png b/docs/images/hedgehog/images/opensearch_connection_protocol.png similarity index 100% rename from sensor-iso/docs/images/opensearch_connection_protocol.png rename to docs/images/hedgehog/images/opensearch_connection_protocol.png diff --git a/docs/images/hedgehog/images/opensearch_connection_success.jpg b/docs/images/hedgehog/images/opensearch_connection_success.jpg new file mode 100644 index 000000000..3540dcd3c Binary files /dev/null and b/docs/images/hedgehog/images/opensearch_connection_success.jpg differ diff --git a/sensor-iso/docs/images/opensearch_connection_success.png b/docs/images/hedgehog/images/opensearch_connection_success.png similarity index 100% rename from sensor-iso/docs/images/opensearch_connection_success.png rename to docs/images/hedgehog/images/opensearch_connection_success.png diff --git a/docs/images/hedgehog/images/opensearch_password.jpg b/docs/images/hedgehog/images/opensearch_password.jpg new file mode 100644 index 000000000..4b75fe4b1 Binary files /dev/null and b/docs/images/hedgehog/images/opensearch_password.jpg differ diff --git a/sensor-iso/docs/images/opensearch_password.png 
b/docs/images/hedgehog/images/opensearch_password.png similarity index 100% rename from sensor-iso/docs/images/opensearch_password.png rename to docs/images/hedgehog/images/opensearch_password.png diff --git a/docs/images/hedgehog/images/opensearch_ssl_verification.jpg b/docs/images/hedgehog/images/opensearch_ssl_verification.jpg new file mode 100644 index 000000000..3c23598de Binary files /dev/null and b/docs/images/hedgehog/images/opensearch_ssl_verification.jpg differ diff --git a/sensor-iso/docs/images/opensearch_ssl_verification.png b/docs/images/hedgehog/images/opensearch_ssl_verification.png similarity index 100% rename from sensor-iso/docs/images/opensearch_ssl_verification.png rename to docs/images/hedgehog/images/opensearch_ssl_verification.png diff --git a/docs/images/hedgehog/images/opensearch_username.jpg b/docs/images/hedgehog/images/opensearch_username.jpg new file mode 100644 index 000000000..72170e58c Binary files /dev/null and b/docs/images/hedgehog/images/opensearch_username.jpg differ diff --git a/sensor-iso/docs/images/opensearch_username.png b/docs/images/hedgehog/images/opensearch_username.png similarity index 100% rename from sensor-iso/docs/images/opensearch_username.png rename to docs/images/hedgehog/images/opensearch_username.png diff --git a/docs/images/hedgehog/images/root_config_mode.jpg b/docs/images/hedgehog/images/root_config_mode.jpg new file mode 100644 index 000000000..d4bda7b0f Binary files /dev/null and b/docs/images/hedgehog/images/root_config_mode.jpg differ diff --git a/sensor-iso/docs/images/root_config_mode.png b/docs/images/hedgehog/images/root_config_mode.png similarity index 100% rename from sensor-iso/docs/images/root_config_mode.png rename to docs/images/hedgehog/images/root_config_mode.png diff --git a/docs/images/hedgehog/images/select_iface.jpg b/docs/images/hedgehog/images/select_iface.jpg new file mode 100644 index 000000000..5c2fe0e3e Binary files /dev/null and b/docs/images/hedgehog/images/select_iface.jpg differ diff --git a/sensor-iso/docs/images/select_iface.png b/docs/images/hedgehog/images/select_iface.png similarity index 100% rename from sensor-iso/docs/images/select_iface.png rename to docs/images/hedgehog/images/select_iface.png diff --git a/docs/images/hedgehog/images/time_sync_mode.jpg b/docs/images/hedgehog/images/time_sync_mode.jpg new file mode 100644 index 000000000..509bc576d Binary files /dev/null and b/docs/images/hedgehog/images/time_sync_mode.jpg differ diff --git a/sensor-iso/docs/images/time_sync_mode.png b/docs/images/hedgehog/images/time_sync_mode.png similarity index 100% rename from sensor-iso/docs/images/time_sync_mode.png rename to docs/images/hedgehog/images/time_sync_mode.png diff --git a/docs/images/hedgehog/images/time_sync_success.jpg b/docs/images/hedgehog/images/time_sync_success.jpg new file mode 100644 index 000000000..11d1c0acb Binary files /dev/null and b/docs/images/hedgehog/images/time_sync_success.jpg differ diff --git a/sensor-iso/docs/images/time_sync_success.png b/docs/images/hedgehog/images/time_sync_success.png similarity index 100% rename from sensor-iso/docs/images/time_sync_success.png rename to docs/images/hedgehog/images/time_sync_success.png diff --git a/docs/images/hedgehog/images/users_and_passwords.jpg b/docs/images/hedgehog/images/users_and_passwords.jpg new file mode 100644 index 000000000..391a7ab8a Binary files /dev/null and b/docs/images/hedgehog/images/users_and_passwords.jpg differ diff --git a/sensor-iso/docs/images/users_and_passwords.png 
b/docs/images/hedgehog/images/users_and_passwords.png similarity index 100% rename from sensor-iso/docs/images/users_and_passwords.png rename to docs/images/hedgehog/images/users_and_passwords.png diff --git a/docs/images/hedgehog/images/zeek_file_carve_mode.jpg b/docs/images/hedgehog/images/zeek_file_carve_mode.jpg new file mode 100644 index 000000000..b1701f3be Binary files /dev/null and b/docs/images/hedgehog/images/zeek_file_carve_mode.jpg differ diff --git a/sensor-iso/docs/images/zeek_file_carve_mode.png b/docs/images/hedgehog/images/zeek_file_carve_mode.png similarity index 100% rename from sensor-iso/docs/images/zeek_file_carve_mode.png rename to docs/images/hedgehog/images/zeek_file_carve_mode.png diff --git a/docs/images/hedgehog/images/zeek_file_carve_scanners.jpg b/docs/images/hedgehog/images/zeek_file_carve_scanners.jpg new file mode 100644 index 000000000..322db2144 Binary files /dev/null and b/docs/images/hedgehog/images/zeek_file_carve_scanners.jpg differ diff --git a/sensor-iso/docs/images/zeek_file_carve_scanners.png b/docs/images/hedgehog/images/zeek_file_carve_scanners.png similarity index 100% rename from sensor-iso/docs/images/zeek_file_carve_scanners.png rename to docs/images/hedgehog/images/zeek_file_carve_scanners.png diff --git a/sensor-iso/docs/logo/Attribution.txt b/docs/images/hedgehog/logo/Attribution.txt similarity index 100% rename from sensor-iso/docs/logo/Attribution.txt rename to docs/images/hedgehog/logo/Attribution.txt diff --git a/sensor-iso/docs/logo/favicon.ico b/docs/images/hedgehog/logo/favicon.ico similarity index 100% rename from sensor-iso/docs/logo/favicon.ico rename to docs/images/hedgehog/logo/favicon.ico diff --git a/sensor-iso/docs/logo/font/ubuntu/CONTRIBUTING.txt b/docs/images/hedgehog/logo/font/ubuntu/CONTRIBUTING.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/CONTRIBUTING.txt rename to docs/images/hedgehog/logo/font/ubuntu/CONTRIBUTING.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/COPYRIGHT.txt b/docs/images/hedgehog/logo/font/ubuntu/COPYRIGHT.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/COPYRIGHT.txt rename to docs/images/hedgehog/logo/font/ubuntu/COPYRIGHT.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/DESCRIPTION.en_us.html b/docs/images/hedgehog/logo/font/ubuntu/DESCRIPTION.en_us.html similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/DESCRIPTION.en_us.html rename to docs/images/hedgehog/logo/font/ubuntu/DESCRIPTION.en_us.html diff --git a/sensor-iso/docs/logo/font/ubuntu/FONTLOG.txt b/docs/images/hedgehog/logo/font/ubuntu/FONTLOG.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/FONTLOG.txt rename to docs/images/hedgehog/logo/font/ubuntu/FONTLOG.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/LICENCE-FAQ.txt b/docs/images/hedgehog/logo/font/ubuntu/LICENCE-FAQ.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/LICENCE-FAQ.txt rename to docs/images/hedgehog/logo/font/ubuntu/LICENCE-FAQ.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/LICENCE.txt b/docs/images/hedgehog/logo/font/ubuntu/LICENCE.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/LICENCE.txt rename to docs/images/hedgehog/logo/font/ubuntu/LICENCE.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/METADATA.pb b/docs/images/hedgehog/logo/font/ubuntu/METADATA.pb similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/METADATA.pb rename to docs/images/hedgehog/logo/font/ubuntu/METADATA.pb diff --git 
a/sensor-iso/docs/logo/font/ubuntu/README.txt b/docs/images/hedgehog/logo/font/ubuntu/README.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/README.txt rename to docs/images/hedgehog/logo/font/ubuntu/README.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/TRADEMARKS.txt b/docs/images/hedgehog/logo/font/ubuntu/TRADEMARKS.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/TRADEMARKS.txt rename to docs/images/hedgehog/logo/font/ubuntu/TRADEMARKS.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/UFL.txt b/docs/images/hedgehog/logo/font/ubuntu/UFL.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/UFL.txt rename to docs/images/hedgehog/logo/font/ubuntu/UFL.txt diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-Bold.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Bold.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-Bold.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Bold.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-BoldItalic.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-BoldItalic.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-BoldItalic.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-BoldItalic.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-Italic.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Italic.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-Italic.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Italic.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-Light.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Light.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-Light.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Light.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-LightItalic.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-LightItalic.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-LightItalic.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-LightItalic.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-Medium.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Medium.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-Medium.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Medium.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-MediumItalic.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-MediumItalic.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-MediumItalic.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-MediumItalic.ttf diff --git a/sensor-iso/docs/logo/font/ubuntu/Ubuntu-Regular.ttf b/docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Regular.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntu/Ubuntu-Regular.ttf rename to docs/images/hedgehog/logo/font/ubuntu/Ubuntu-Regular.ttf diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/CONTRIBUTING.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/CONTRIBUTING.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/CONTRIBUTING.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/CONTRIBUTING.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/COPYRIGHT.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/COPYRIGHT.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/COPYRIGHT.txt rename to 
docs/images/hedgehog/logo/font/ubuntucondensed/COPYRIGHT.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/DESCRIPTION.en_us.html b/docs/images/hedgehog/logo/font/ubuntucondensed/DESCRIPTION.en_us.html similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/DESCRIPTION.en_us.html rename to docs/images/hedgehog/logo/font/ubuntucondensed/DESCRIPTION.en_us.html diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/FONTLOG.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/FONTLOG.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/FONTLOG.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/FONTLOG.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/LICENCE-FAQ.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/LICENCE-FAQ.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/LICENCE-FAQ.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/LICENCE-FAQ.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/LICENCE.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/LICENCE.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/LICENCE.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/LICENCE.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/METADATA.pb b/docs/images/hedgehog/logo/font/ubuntucondensed/METADATA.pb similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/METADATA.pb rename to docs/images/hedgehog/logo/font/ubuntucondensed/METADATA.pb diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/README.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/README.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/README.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/README.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/TRADEMARKS.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/TRADEMARKS.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/TRADEMARKS.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/TRADEMARKS.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/UFL.txt b/docs/images/hedgehog/logo/font/ubuntucondensed/UFL.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/UFL.txt rename to docs/images/hedgehog/logo/font/ubuntucondensed/UFL.txt diff --git a/sensor-iso/docs/logo/font/ubuntucondensed/UbuntuCondensed-Regular.ttf b/docs/images/hedgehog/logo/font/ubuntucondensed/UbuntuCondensed-Regular.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntucondensed/UbuntuCondensed-Regular.ttf rename to docs/images/hedgehog/logo/font/ubuntucondensed/UbuntuCondensed-Regular.ttf diff --git a/sensor-iso/docs/logo/font/ubuntumono/CONTRIBUTING.txt b/docs/images/hedgehog/logo/font/ubuntumono/CONTRIBUTING.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/CONTRIBUTING.txt rename to docs/images/hedgehog/logo/font/ubuntumono/CONTRIBUTING.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/COPYRIGHT.txt b/docs/images/hedgehog/logo/font/ubuntumono/COPYRIGHT.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/COPYRIGHT.txt rename to docs/images/hedgehog/logo/font/ubuntumono/COPYRIGHT.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/DESCRIPTION.en_us.html b/docs/images/hedgehog/logo/font/ubuntumono/DESCRIPTION.en_us.html similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/DESCRIPTION.en_us.html rename to 
docs/images/hedgehog/logo/font/ubuntumono/DESCRIPTION.en_us.html diff --git a/sensor-iso/docs/logo/font/ubuntumono/FONTLOG.txt b/docs/images/hedgehog/logo/font/ubuntumono/FONTLOG.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/FONTLOG.txt rename to docs/images/hedgehog/logo/font/ubuntumono/FONTLOG.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/LICENCE-FAQ.txt b/docs/images/hedgehog/logo/font/ubuntumono/LICENCE-FAQ.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/LICENCE-FAQ.txt rename to docs/images/hedgehog/logo/font/ubuntumono/LICENCE-FAQ.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/LICENCE.txt b/docs/images/hedgehog/logo/font/ubuntumono/LICENCE.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/LICENCE.txt rename to docs/images/hedgehog/logo/font/ubuntumono/LICENCE.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/METADATA.pb b/docs/images/hedgehog/logo/font/ubuntumono/METADATA.pb similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/METADATA.pb rename to docs/images/hedgehog/logo/font/ubuntumono/METADATA.pb diff --git a/sensor-iso/docs/logo/font/ubuntumono/README.txt b/docs/images/hedgehog/logo/font/ubuntumono/README.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/README.txt rename to docs/images/hedgehog/logo/font/ubuntumono/README.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/TRADEMARKS.txt b/docs/images/hedgehog/logo/font/ubuntumono/TRADEMARKS.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/TRADEMARKS.txt rename to docs/images/hedgehog/logo/font/ubuntumono/TRADEMARKS.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/UFL.txt b/docs/images/hedgehog/logo/font/ubuntumono/UFL.txt similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/UFL.txt rename to docs/images/hedgehog/logo/font/ubuntumono/UFL.txt diff --git a/sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-Bold.ttf b/docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-Bold.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-Bold.ttf rename to docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-Bold.ttf diff --git a/sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-BoldItalic.ttf b/docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-BoldItalic.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-BoldItalic.ttf rename to docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-BoldItalic.ttf diff --git a/sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-Italic.ttf b/docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-Italic.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-Italic.ttf rename to docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-Italic.ttf diff --git a/sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-Regular.ttf b/docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-Regular.ttf similarity index 100% rename from sensor-iso/docs/logo/font/ubuntumono/UbuntuMono-Regular.ttf rename to docs/images/hedgehog/logo/font/ubuntumono/UbuntuMono-Regular.ttf diff --git a/sensor-iso/docs/logo/hedgehog-bw-large.png b/docs/images/hedgehog/logo/hedgehog-bw-large.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-bw-large.png rename to docs/images/hedgehog/logo/hedgehog-bw-large.png diff --git a/sensor-iso/docs/logo/hedgehog-bw-small.png b/docs/images/hedgehog/logo/hedgehog-bw-small.png similarity index 100% rename from 
sensor-iso/docs/logo/hedgehog-bw-small.png rename to docs/images/hedgehog/logo/hedgehog-bw-small.png diff --git a/sensor-iso/docs/logo/hedgehog-bw-w-text-large.png b/docs/images/hedgehog/logo/hedgehog-bw-w-text-large.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-bw-w-text-large.png rename to docs/images/hedgehog/logo/hedgehog-bw-w-text-large.png diff --git a/sensor-iso/docs/logo/hedgehog-bw-w-text-small.png b/docs/images/hedgehog/logo/hedgehog-bw-w-text-small.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-bw-w-text-small.png rename to docs/images/hedgehog/logo/hedgehog-bw-w-text-small.png diff --git a/sensor-iso/docs/logo/hedgehog-bw-w-text.ai b/docs/images/hedgehog/logo/hedgehog-bw-w-text.ai similarity index 100% rename from sensor-iso/docs/logo/hedgehog-bw-w-text.ai rename to docs/images/hedgehog/logo/hedgehog-bw-w-text.ai diff --git a/sensor-iso/docs/logo/hedgehog-bw.ai b/docs/images/hedgehog/logo/hedgehog-bw.ai similarity index 100% rename from sensor-iso/docs/logo/hedgehog-bw.ai rename to docs/images/hedgehog/logo/hedgehog-bw.ai diff --git a/sensor-iso/docs/logo/hedgehog-color-large.png b/docs/images/hedgehog/logo/hedgehog-color-large.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color-large.png rename to docs/images/hedgehog/logo/hedgehog-color-large.png diff --git a/sensor-iso/docs/logo/hedgehog-color-small.png b/docs/images/hedgehog/logo/hedgehog-color-small.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color-small.png rename to docs/images/hedgehog/logo/hedgehog-color-small.png diff --git a/sensor-iso/docs/logo/hedgehog-color-w-text-large.png b/docs/images/hedgehog/logo/hedgehog-color-w-text-large.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color-w-text-large.png rename to docs/images/hedgehog/logo/hedgehog-color-w-text-large.png diff --git a/sensor-iso/docs/logo/hedgehog-color-w-text-small.png b/docs/images/hedgehog/logo/hedgehog-color-w-text-small.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color-w-text-small.png rename to docs/images/hedgehog/logo/hedgehog-color-w-text-small.png diff --git a/sensor-iso/docs/logo/hedgehog-color-w-text.ai b/docs/images/hedgehog/logo/hedgehog-color-w-text.ai similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color-w-text.ai rename to docs/images/hedgehog/logo/hedgehog-color-w-text.ai diff --git a/sensor-iso/docs/logo/hedgehog-color-w-text.png b/docs/images/hedgehog/logo/hedgehog-color-w-text.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color-w-text.png rename to docs/images/hedgehog/logo/hedgehog-color-w-text.png diff --git a/sensor-iso/docs/logo/hedgehog-color.ai b/docs/images/hedgehog/logo/hedgehog-color.ai similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color.ai rename to docs/images/hedgehog/logo/hedgehog-color.ai diff --git a/sensor-iso/docs/logo/hedgehog-color.eps b/docs/images/hedgehog/logo/hedgehog-color.eps similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color.eps rename to docs/images/hedgehog/logo/hedgehog-color.eps diff --git a/sensor-iso/docs/logo/hedgehog-color.png b/docs/images/hedgehog/logo/hedgehog-color.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-color.png rename to docs/images/hedgehog/logo/hedgehog-color.png diff --git a/sensor-iso/docs/logo/hedgehog-wallpaper-plain.png b/docs/images/hedgehog/logo/hedgehog-wallpaper-plain.png similarity index 100% rename from 
sensor-iso/docs/logo/hedgehog-wallpaper-plain.png rename to docs/images/hedgehog/logo/hedgehog-wallpaper-plain.png diff --git a/sensor-iso/docs/logo/hedgehog-wallpaper.png b/docs/images/hedgehog/logo/hedgehog-wallpaper.png similarity index 100% rename from sensor-iso/docs/logo/hedgehog-wallpaper.png rename to docs/images/hedgehog/logo/hedgehog-wallpaper.png diff --git a/sensor-iso/docs/logo/hedgehog-wallpaper.xcf b/docs/images/hedgehog/logo/hedgehog-wallpaper.xcf similarity index 100% rename from sensor-iso/docs/logo/hedgehog-wallpaper.xcf rename to docs/images/hedgehog/logo/hedgehog-wallpaper.xcf diff --git a/docs/index-management.md b/docs/index-management.md new file mode 100644 index 000000000..2277fbd07 --- /dev/null +++ b/docs/index-management.md @@ -0,0 +1,7 @@ +# OpenSearch index management + +Malcolm releases prior to v6.2.0 used environment variables to configure OpenSearch [Index State Management](https://opensearch.org/docs/latest/im-plugin/ism/index/) [policies](https://opensearch.org/docs/latest/im-plugin/ism/policies/). + +Since then, OpenSearch Dashboards has developed and released plugins with UIs for [Index State Management](https://opensearch.org/docs/latest/im-plugin/ism/index/) and [Snapshot Management](https://opensearch.org/docs/latest/opensearch/snapshots/sm-dashboards/). Because these plugins provide more comprehensive and user-friendly interfaces for these features, the old environment variable-based configuration code has been removed from Malcolm, with the exception of the code that uses `OPENSEARCH_INDEX_SIZE_PRUNE_LIMIT` and `OPENSEARCH_INDEX_SIZE_PRUNE_NAME_SORT`, which deals with deleting the oldest network session metadata indices when the database exceeds a certain size. + +Note that OpenSearch index state management and snapshot management only deal with disk space consumed by OpenSearch indices: they have nothing to do with PCAP file storage. The `MANAGE_PCAP_FILES` environment variable in the [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) file can be used to allow Arkime to prune old PCAP files based on available disk space. \ No newline at end of file diff --git a/docs/live-analysis.md b/docs/live-analysis.md new file mode 100644 index 000000000..11d046ec1 --- /dev/null +++ b/docs/live-analysis.md @@ -0,0 +1,62 @@ +# Live analysis + +* [Live analysis](#LiveAnalysis) + - [Using a network sensor appliance](#Hedgehog) + - [Monitoring local network interfaces](#LocalPCAP) + - [Manually forwarding logs from an external source](#ExternalForward) + +## Using a network sensor appliance + +A dedicated network sensor appliance is the recommended method for capturing and analyzing live network traffic when performance and throughput are of utmost importance. [Hedgehog Linux](hedgehog.md) is a custom Debian-based operating system built to: + +* monitor network interfaces +* capture packets to PCAP files +* detect file transfers in network traffic and extract and scan those files for threats +* generate and forward Zeek and Suricata logs, Arkime sessions, and other information to [Malcolm](https://github.com/idaholab/Malcolm) + +Please see the [Hedgehog Linux README](hedgehog.md) for more information. + +## Monitoring local network interfaces + +Malcolm's `pcap-capture`, `suricata-live` and `zeek-live` containers can monitor one or more local network interfaces, specified by the `PCAP_IFACE` environment variable in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml).
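+
+For example, to see which interface names are available for use in `PCAP_IFACE`, something like the following will do (a quick sketch; it assumes a Linux host with the `iproute2` tools installed):
+
+```
+# list the kernel's network interface names; use these values for PCAP_IFACE
+ip -brief link show
+```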
+ +These containers are started with additional privileges (`IPC_LOCK`, `NET_ADMIN`, `NET_RAW`, and `SYS_ADMIN`) to allow opening network interfaces in promiscuous mode for capture. + +The instances of Zeek and Suricata (in the `suricata-live` and `zeek-live` containers when the `SURICATA_LIVE_CAPTURE` and `ZEEK_LIVE_CAPTURE` environment variables in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) are set to `true`, respectively) analyze traffic on-the-fly and generate log files containing network session metadata. These log files are in turn scanned by Filebeat and forwarded to Logstash for enrichment and indexing into the OpenSearch document store. + +In contrast, the `pcap-capture` container buffers traffic to PCAP files and periodically rotates these files for processing (by Arkime's `capture` utility in the `arkime` container) according to the thresholds defined by the `PCAP_ROTATE_MEGABYTES` and `PCAP_ROTATE_MINUTES` environment variables in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml). If for some reason (e.g., a low-resource environment) you also want Zeek and Suricata to process these intermediate PCAP files rather than monitoring the network interfaces directly, you can set `SURICATA_ROTATED_PCAP`/`ZEEK_ROTATED_PCAP` to `true` and `SURICATA_LIVE_CAPTURE`/`ZEEK_LIVE_CAPTURE` to `false`. + +These various options for monitoring traffic on local network interfaces can also be configured by running [`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning). + +Note that currently Microsoft Windows and Apple macOS platforms run Docker inside of a virtualized environment. Live traffic capture and analysis on those platforms would require additional configuration of virtual interfaces and port forwarding in Docker, which is outside the scope of this document. + +## Manually forwarding logs from an external source + +Malcolm's Logstash instance can also be configured to accept logs from a [remote forwarder](https://www.elastic.co/products/beats/filebeat) by running [`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning) and answering "yes" to "`Expose Logstash port to external hosts?`." Enabling encrypted transport of these log files is discussed in [Configure authentication](authsetup.md#AuthSetup) and the description of the `BEATS_SSL` environment variable in the [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) file.
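+
+Before configuring the forwarder itself, it may be worth confirming that the external host can actually reach the exposed Logstash port. A minimal check might look like the following (a sketch only; `192.0.2.123:5044` matches the example below, and `nc` here is assumed to be an OpenBSD-style netcat that supports `-z`):
+
+```
+# from the external log source: is Malcolm's Logstash Beats port reachable?
+nc -zv 192.0.2.123 5044
+# if BEATS_SSL is true, this should also complete a TLS handshake:
+openssl s_client -connect 192.0.2.123:5044 </dev/null
+```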
+ +Configuring Filebeat to forward Zeek logs to Malcolm might look something like this example [`filebeat.yml`](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-reference-yml.html): +``` +filebeat.inputs: +- type: log + paths: + - /var/zeek/*.log + fields_under_root: true + compression_level: 0 + exclude_lines: ['^\s*#'] + scan_frequency: 10s + clean_inactive: 180m + ignore_older: 120m + close_inactive: 90m + close_renamed: true + close_removed: true + close_eof: false + clean_renamed: true + clean_removed: true + +output.logstash: + hosts: ["192.0.2.123:5044"] + ssl.enabled: true + ssl.certificate_authorities: ["/foo/bar/ca.crt"] + ssl.certificate: "/foo/bar/client.crt" + ssl.key: "/foo/bar/client.key" + ssl.supported_protocols: "TLSv1.2" + ssl.verification_mode: "none" +``` \ No newline at end of file diff --git a/docs/malcolm-config.md b/docs/malcolm-config.md new file mode 100644 index 000000000..839bd7caf --- /dev/null +++ b/docs/malcolm-config.md @@ -0,0 +1,77 @@ +# Malcolm Configuration + +If you already have Docker and Docker Compose installed, the `install.py` script can still help you tune system configuration and `docker-compose.yml` parameters for Malcolm. To run it in "configuration only" mode, bypassing the steps to install Docker and Docker Compose, run it like this: +``` +./scripts/install.py --configure +``` + +Although `install.py` will attempt to automate many of the following configuration and tuning parameters, they are nonetheless listed in the following sections for reference: + +## `docker-compose.yml` parameters + +Edit `docker-compose.yml` and search for the `OPENSEARCH_JAVA_OPTS` key. Edit the `-Xms4g -Xmx4g` values, replacing `4g` with a number that is half of your total system memory, or just under 32 gigabytes, whichever is less. So, for example, if I had 64 gigabytes of memory I would edit those values to be `-Xms31g -Xmx31g`. This indicates how much memory can be allocated to the OpenSearch heaps. For a pleasant experience, I would suggest not using a value under 10 gigabytes. Similar values can be modified for Logstash with `LS_JAVA_OPTS`, where using 3 or 4 gigabytes is recommended.
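+
+As a quick aside on the `OPENSEARCH_JAVA_OPTS` sizing described above, a calculation like the following can suggest a starting value (a sketch only; it assumes a Linux host with `/proc/meminfo` and applies the half-of-RAM, capped-at-31g rule of thumb):
+
+```
+# suggest an OpenSearch heap size: half of system RAM, capped at 31 gigabytes
+total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
+half_gb=$(( total_kb / 1024 / 1024 / 2 ))
+heap_gb=$(( half_gb < 31 ? half_gb : 31 ))
+echo "-Xms${heap_gb}g -Xmx${heap_gb}g"
+```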
+ +Various other environment variables inside of `docker-compose.yml` can be tweaked to control aspects of how Malcolm behaves, particularly with regard to processing PCAP files and Zeek logs. The environment variables of particular interest are located near the top of that file under **Commonly tweaked configuration options**, which include: + +* `ARKIME_ANALYZE_PCAP_THREADS` – the number of threads available to Arkime for analyzing PCAP files (default `1`) +* `AUTO_TAG` – if set to `true`, Malcolm will automatically create Arkime sessions and Zeek logs with tags based on the filename, as described in [Tagging](upload.md#Tagging) (default `true`) +* `BEATS_SSL` – if set to `true`, Logstash will require encrypted communications for any external [Beats](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html)-based forwarders from which it will accept logs (default `true`) +* `CONNECTION_SECONDS_SEVERITY_THRESHOLD` - when [severity scoring](severity.md#Severity) is enabled, this variable indicates the duration threshold (in seconds) for assigning severity to long connections (default `3600`) +* `EXTRACTED_FILE_CAPA_VERBOSE` – if set to `true`, all Capa rule hits will be logged; otherwise (`false`) only [MITRE ATT&CK® technique](https://attack.mitre.org/techniques) classifications will be logged +* `EXTRACTED_FILE_ENABLE_CAPA` – if set to `true`, [Zeek-extracted files](file-scanning.md#ZeekFileExtraction) that are determined to be PE (portable executable) files will be scanned with [Capa](https://github.com/fireeye/capa) +* `EXTRACTED_FILE_ENABLE_CLAMAV` – if set to `true`, [Zeek-extracted files](file-scanning.md#ZeekFileExtraction) will be scanned with [ClamAV](https://www.clamav.net/) +* `EXTRACTED_FILE_ENABLE_YARA` – if set to `true`, [Zeek-extracted files](file-scanning.md#ZeekFileExtraction) will be scanned with [Yara](https://github.com/VirusTotal/yara) +* `EXTRACTED_FILE_HTTP_SERVER_ENABLE` – if set to `true`, the directory containing [Zeek-extracted files](file-scanning.md#ZeekFileExtraction) will be served over HTTP at `./extracted-files/` (e.g., [https://localhost/extracted-files/](https://localhost/extracted-files/) if you are connecting locally) +* `EXTRACTED_FILE_HTTP_SERVER_ENCRYPT` – if set to `true`, those Zeek-extracted files will be AES-256-CBC-encrypted in an `openssl enc`-compatible format (e.g., `openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe`) +* `EXTRACTED_FILE_HTTP_SERVER_KEY` – specifies the AES-256-CBC decryption password for encrypted Zeek-extracted files; used in conjunction with `EXTRACTED_FILE_HTTP_SERVER_ENCRYPT` +* `EXTRACTED_FILE_IGNORE_EXISTING` – if set to `true`, files extant in the `./zeek-logs/extract_files/` directory will be ignored on startup rather than scanned +* `EXTRACTED_FILE_PRESERVATION` – determines behavior for preservation of [Zeek-extracted files](file-scanning.md#ZeekFileExtraction) +* `EXTRACTED_FILE_UPDATE_RULES` – if set to `true`, file scanner engines (e.g., ClamAV, Capa, Yara) will periodically update their rule definitions +* `EXTRACTED_FILE_YARA_CUSTOM_ONLY` – if set to `true`, Malcolm will bypass the default [Yara ruleset](https://github.com/Neo23x0/signature-base) and use only user-defined rules in `./yara/rules` +* `FREQ_LOOKUP` - if set to `true`, domain names (from DNS queries and SSL server names) will be assigned entropy scores as calculated by [`freq`](https://github.com/MarkBaggett/freq) (default `false`) +* `FREQ_SEVERITY_THRESHOLD` - when [severity scoring](severity.md#Severity) is enabled, this variable indicates the entropy threshold for assigning severity to events with entropy scores calculated by [`freq`](https://github.com/MarkBaggett/freq); a lower value will
assign severity scores to fewer domain names with higher entropy (e.g., `2.0` for `NQZHTFHRMYMTVBQJE.COM`), while a higher value will assign severity scores to more domain names with lower entropy (e.g., `7.5` for `naturallanguagedomain.example.org`) (default `2.0`) +* `LOGSTASH_OUI_LOOKUP` – if set to `true`, Logstash will map MAC addresses to vendors for all source and destination MAC addresses when analyzing Zeek logs (default `true`) +* `LOGSTASH_REVERSE_DNS` – if set to `true`, Logstash will perform a reverse DNS lookup for all external source and destination IP address values when analyzing Zeek logs (default `false`) +* `LOGSTASH_SEVERITY_SCORING` - if set to `true`, Logstash will perform [severity scoring](severity.md#Severity) when analyzing Zeek logs (default `true`) +* `MANAGE_PCAP_FILES` – if set to `true`, all PCAP files imported into Malcolm will be marked as available for deletion by Arkime if available storage space becomes too low (default `false`) +* `MAXMIND_GEOIP_DB_LICENSE_KEY` - Malcolm uses MaxMind's free GeoLite2 databases for GeoIP lookups. As of December 30, 2019, these databases are [no longer available](https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/) for download via a public URL. Instead, they must be downloaded using a MaxMind license key (available without charge [from MaxMind](https://www.maxmind.com/en/geolite2/signup)). The license key can be specified here for GeoIP database downloads during build- and run-time. +* `OPENSEARCH_LOCAL` - if set to `true`, Malcolm will use its own internal [OpenSearch instance](opensearch-instances.md#OpenSearchInstance) (default `true`) +* `OPENSEARCH_URL` - when using Malcolm's internal OpenSearch instance (i.e., `OPENSEARCH_LOCAL` is `true`) this should be `http://opensearch:9200`, otherwise this value specifies the primary remote instance URL in the format `protocol://host:port` (default `http://opensearch:9200`) +* `OPENSEARCH_SSL_CERTIFICATE_VERIFICATION` - if set to `true`, connections to the primary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default `false`) +* `OPENSEARCH_SECONDARY` - if set to `true`, Malcolm will forward logs to a secondary remote OpenSearch instance in addition to the primary (local or remote) OpenSearch instance (default `false`) +* `OPENSEARCH_SECONDARY_URL` - when forwarding to a secondary remote OpenSearch instance (i.e., `OPENSEARCH_SECONDARY` is `true`) this value specifies the secondary remote instance URL in the format `protocol://host:port` +* `OPENSEARCH_SECONDARY_SSL_CERTIFICATE_VERIFICATION` - if set to `true`, connections to the secondary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default `false`) +* `NETBOX_DISABLED` - if set to `true`, Malcolm will **not** start and manage a [NetBox](netbox.md#NetBox) instance (default `true`) +* `NETBOX_CRON` - if set to `true`, network traffic metadata will periodically be queried and used to populate Malcolm's [NetBox](netbox.md#NetBox) instance +* `NGINX_BASIC_AUTH` - if set to `true`, use [TLS-encrypted HTTP basic](authsetup.md#AuthBasicAccountManagement) authentication (default); if set to `false`, use [Lightweight Directory Access Protocol (LDAP)](authsetup.md#AuthLDAP) authentication +* `NGINX_LOG_ACCESS_AND_ERRORS` - if set to `true`, all access to Malcolm via its [web interfaces](quickstart.md#UserInterfaceURLs) will be
logged to OpenSearch (default `false`) +* `NGINX_SSL` - if set to `true`, require HTTPS connections to Malcolm's `nginx-proxy` container (default); if set to `false`, use unencrypted HTTP connections (using unsecured HTTP connections is **NOT** recommended unless you are running Malcolm behind another reverse proxy like Traefik, Caddy, etc.) +* `PCAP_ENABLE_NETSNIFF` – if set to `true`, Malcolm will capture network traffic on the local network interface(s) indicated in `PCAP_IFACE` using [netsniff-ng](http://netsniff-ng.org/) +* `PCAP_ENABLE_TCPDUMP` – if set to `true`, Malcolm will capture network traffic on the local network interface(s) indicated in `PCAP_IFACE` using [tcpdump](https://www.tcpdump.org/); there is no reason to enable *both* `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP` +* `PCAP_FILTER` – specifies a tcpdump-style filter expression for local packet capture; leave blank to capture all traffic +* `PCAP_IFACE` – used to specify the network interface(s) for local packet capture if `PCAP_ENABLE_NETSNIFF`, `PCAP_ENABLE_TCPDUMP`, `ZEEK_LIVE_CAPTURE` or `SURICATA_LIVE_CAPTURE` are enabled; for multiple interfaces, separate the interface names with a comma (e.g., `'enp0s25'` or `'enp10s0,enp11s0'`) +* `PCAP_IFACE_TWEAK` - if set to `true`, Malcolm will [use `ethtool`](shared/bin/nic-capture-setup.sh) to disable NIC hardware offloading features and adjust ring buffer sizes for capture interface(s); this should be `true` if the interface(s) are being used for capture only, `false` if they are being used for management/communication +* `PCAP_ROTATE_MEGABYTES` – used to specify how large a locally-captured PCAP file can become (in megabytes) before it is closed for processing and a new PCAP file created +* `PCAP_ROTATE_MINUTES` – used to specify a time interval (in minutes) after which a locally-captured PCAP file will be closed for processing and a new PCAP file created +* `pipeline.workers`, `pipeline.batch.size` and `pipeline.batch.delay` - these settings are used to tune the performance and resource utilization of the `logstash` container; see [Tuning and Profiling Logstash Performance](https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html), [`logstash.yml`](https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html) and [Multiple Pipelines](https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html) +* `PUID` and `PGID` - Docker runs all of its containers as the privileged `root` user by default. For better security, Malcolm immediately drops to non-privileged user accounts for executing internal processes wherever possible. The `PUID` (**p**rocess **u**ser **ID**) and `PGID` (**p**rocess **g**roup **ID**) environment variables allow Malcolm to map internal non-privileged user accounts to a corresponding [user account](https://en.wikipedia.org/wiki/User_identifier) on the host. +* `SENSITIVE_COUNTRY_CODES` - when [severity scoring](severity.md#Severity) is enabled, this variable defines a comma-separated list of sensitive countries (using [ISO 3166-1 alpha-2 codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Current_codes)) (default `'AM,AZ,BY,CN,CU,DZ,GE,HK,IL,IN,IQ,IR,KG,KP,KZ,LY,MD,MO,PK,RU,SD,SS,SY,TJ,TM,TW,UA,UZ'`, taken from the U.S.
Department of Energy Sensitive Country List) +* `SURICATA_AUTO_ANALYZE_PCAP_FILES` – if set to `true`, all PCAP files imported into Malcolm will automatically be analyzed by Suricata, and the resulting logs will also be imported (default `false`) +* `SURICATA_AUTO_ANALYZE_PCAP_THREADS` – the number of threads available to Malcolm for analyzing Suricata logs (default `1`) +* `SURICATA_CUSTOM_RULES_ONLY` – if set to `true`, Malcolm will bypass the default [Suricata ruleset](https://github.com/OISF/suricata/tree/master/rules) and use only user-defined rules (`./suricata/rules/*.rules`). +* `SURICATA_UPDATE_RULES` – if set to `true`, Suricata signatures will periodically be updated (default `false`) +* `SURICATA_LIVE_CAPTURE` - if set to `true`, Suricata will monitor live traffic on the local interface(s) defined by `PCAP_IFACE` +* `SURICATA_ROTATED_PCAP` - if set to `true`, Suricata can analyze PCAP files captured by `netsniff-ng` or `tcpdump` (see `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`, as well as `SURICATA_AUTO_ANALYZE_PCAP_FILES`); if `SURICATA_LIVE_CAPTURE` is `true`, this should be `false`; otherwise, Suricata will see duplicate traffic +* `SURICATA_…` - the [`suricata` container entrypoint script](shared/bin/suricata_config_populate.py) can use **many** more environment variables to tweak [suricata.yaml](https://github.com/OISF/suricata/blob/master/suricata.yaml.in); in that script, `DEFAULT_VARS` defines those variables (albeit without the `SURICATA_` prefix you must add to each for use) +* `TOTAL_MEGABYTES_SEVERITY_THRESHOLD` - when [severity scoring](severity.md#Severity) is enabled, this variable indicates the size threshold (in megabytes) for assigning severity to large connections or file transfers (default `1000`) +* `VTOT_API2_KEY` – used to specify a [VirusTotal Public API v2.0](https://www.virustotal.com/en/documentation/public-api/) key, which, if specified, will be used to submit hashes of [Zeek-extracted files](file-scanning.md#ZeekFileExtraction) to VirusTotal +* `ZEEK_AUTO_ANALYZE_PCAP_FILES` – if set to `true`, all PCAP files imported into Malcolm will automatically be analyzed by Zeek, and the resulting logs will also be imported (default `false`) +* `ZEEK_AUTO_ANALYZE_PCAP_THREADS` – the number of threads available to Malcolm for analyzing Zeek logs (default `1`) +* `ZEEK_DISABLE_…` - if set to any non-blank value, each of these variables can be used to disable a certain Zeek function when it analyzes PCAP files (for example, setting `ZEEK_DISABLE_LOG_PASSWORDS` to `true` to disable logging of cleartext passwords) +* `ZEEK_DISABLE_BEST_GUESS_ICS` - see ["Best Guess" Fingerprinting for ICS Protocols](ics-best-guess.md#ICSBestGuess) +* `ZEEK_EXTRACTOR_MODE` – determines the file extraction behavior for file transfers detected by Zeek; see [Automatic file extraction and scanning](file-scanning.md#ZeekFileExtraction) for more details +* `ZEEK_INTEL_FEED_SINCE` - when querying a [TAXII](zeek-intel.md#ZeekIntelSTIX) or [MISP](zeek-intel.md#ZeekIntelMISP) feed, only process threat indicators that have been created or modified since the time represented by this value; it may be either a fixed date/time (`01/01/2021`) or relative interval (`30 days ago`) +* `ZEEK_INTEL_ITEM_EXPIRATION` - specifies the value for Zeek's [`Intel::item_expiration`](https://docs.zeek.org/en/current/scripts/base/frameworks/intel/main.zeek.html#id-Intel::item_expiration) timeout as used by the [Zeek Intelligence Framework](zeek-intel.md#ZeekIntel) (default `-1min`, which disables item
expiration) +* `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` - specifies a [cron expression](https://en.wikipedia.org/wiki/Cron#CRON_expression) indicating the refresh interval for generating the [Zeek Intelligence Framework](zeek-intel.md#ZeekIntel) files (defaults to empty, which disables automatic refresh) +* `ZEEK_LIVE_CAPTURE` - if set to `true`, Zeek will monitor live traffic on the local interface(s) defined by `PCAP_IFACE` +* `ZEEK_ROTATED_PCAP` - if set to `true`, Zeek can analyze PCAP files captured by `netsniff-ng` or `tcpdump` (see `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`, as well as `ZEEK_AUTO_ANALYZE_PCAP_FILES`); if `ZEEK_LIVE_CAPTURE` is `true`, this should be `false`; otherwise, Zeek will see duplicate traffic \ No newline at end of file diff --git a/docs/malcolm-iso.md b/docs/malcolm-iso.md new file mode 100644 index 000000000..0b192818a --- /dev/null +++ b/docs/malcolm-iso.md @@ -0,0 +1,87 @@ +# Malcolm installer ISO + +* [Malcolm installer ISO](#ISO) + - [Installation](#ISOInstallation) + - [Generating the ISO](#ISOBuild) + - [Setup](#ISOSetup) + - [Time synchronization](time-sync.md#ConfigTime) + +Malcolm's Docker-based deployment model allows Malcolm to run on a variety of platforms. However, in some circumstances (for example, as a long-running appliance as part of a security operations center, or inside of a virtual machine) it may be desirable to install Malcolm as a dedicated standalone installation. + +Malcolm can be packaged into an installer ISO based on the current [stable release](https://wiki.debian.org/DebianStable) of [Debian](https://www.debian.org/). This [customized Debian installation](https://wiki.debian.org/DebianLive) is preconfigured with the bare minimum software needed to run Malcolm. + +## Generating the ISO + +Official downloads of the Malcolm installer ISO are not provided; however, it can be built easily on an internet-connected Linux host with Vagrant: + +* [Vagrant](https://www.vagrantup.com/) + - [`vagrant-reload`](https://github.com/aidanns/vagrant-reload) plugin + - [`vagrant-sshfs`](https://github.com/dustymabe/vagrant-sshfs) plugin + - [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box + +The build should work with either the [VirtualBox](https://www.virtualbox.org/) provider or the [libvirt](https://libvirt.org/) provider: + +* [VirtualBox](https://www.virtualbox.org/) [provider](https://www.vagrantup.com/docs/providers/virtualbox) + - [`vagrant-vbguest`](https://github.com/dotless-de/vagrant-vbguest) plugin +* [libvirt](https://libvirt.org/) + - [`vagrant-libvirt`](https://github.com/vagrant-libvirt/vagrant-libvirt) provider plugin + - [`vagrant-mutate`](https://github.com/sciurus/vagrant-mutate) plugin to convert [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box to `libvirt` format + +To perform a clean build of the Malcolm installer ISO, navigate to your local Malcolm working copy and run: + +``` +$ ./malcolm-iso/build_via_vagrant.sh -f +… +Starting build machine... +Bringing machine 'default' up with 'virtualbox' provider... +… +``` + +Building the ISO may take 30 minutes or more depending on your system.
As the build finishes, you will see the following message indicating success: + +``` +… +Finished, created "/malcolm-build/malcolm-iso/malcolm-6.4.0.iso" +… +``` + +By default, Malcolm's Docker images are not packaged with the installer ISO, assuming instead that you will pull the [latest images](https://hub.docker.com/u/malcolmnetsec) with a `docker-compose pull` command as described in the [Quick start](quickstart.md#QuickStart) section. If you wish to build an ISO with the latest Malcolm images included, follow the directions to create [pre-packaged installation files](development.md#Packager), which include a tarball with a name like `malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz`. Then, pass that images tarball to the ISO build script with a `-d`, like this: + +``` +$ ./malcolm-iso/build_via_vagrant.sh -f -d malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz +… +``` + +A system installed from the resulting ISO will load the Malcolm Docker images upon first boot. This method is desirable when the ISO is to be installed in an "air gapped" environment or for distribution to non-networked machines. + +Alternately, if you have forked Malcolm on GitHub, [workflow files](./.github/workflows/) are provided which contain instructions for GitHub to build the Docker images and [sensor](live-analysis.md#Hedgehog) and [Malcolm](#ISO) installer ISOs, specifically [`malcolm-iso-build-docker-wrap-push-ghcr.yml`](./.github/workflows/malcolm-iso-build-docker-wrap-push-ghcr.yml) for the Malcolm ISO. You'll need to run the workflows to build and push your fork's Malcolm Docker images before building the ISO. The resulting ISO file is wrapped in a Docker image that provides an HTTP server from which the ISO may be downloaded. + +## Installation + +The installer is designed to require as little user input as possible. For this reason, there are NO user prompts and confirmations about partitioning and reformatting hard disks for use by the operating system. The installer assumes that all non-removable storage media (e.g., SSD, HDD, NVMe, etc.) are available for use and ⛔🆘😭💀 ***will partition and format them without warning*** 💀😭🆘⛔. + +The installer will ask for several pieces of information prior to installing the Malcolm base operating system: + +* Hostname +* Domain name +* Root password – (optional) a password for the privileged root account, which is rarely needed +* User name – the name for the non-privileged service account under which Malcolm runs +* User password – a password for the non-privileged service account +* Encryption password (optional) – if the encrypted installation option was selected at boot time, the encryption password must be entered every time the system boots + +At the end of the installation process, you will be prompted with a few self-explanatory yes/no questions: + +* **Disable IPv6?** +* **Automatically login to the GUI session?** +* **Should the GUI session be locked due to inactivity?** +* **Display the [Standard Mandatory DoD Notice and Consent Banner](https://www.stigviewer.com/stig/application_security_and_development/2018-12-24/finding/V-69349)?** *(only applies when installed on U.S. government information systems)* + +Following these prompts, the installer will reboot and the Malcolm base operating system will boot. + +## Setup + +When the system boots for the first time, the Malcolm Docker images will load if the installer was built with pre-packaged installation files as described above.
Wait for this operation to finish (the progress dialog will disappear when the images have finished loading) before continuing with setup. + +Open a terminal (click the red terminal 🗔 icon next to the Debian swirl logo 🍥 menu button in the menu bar). At this point, setup is similar to the steps described in the [Quick start](quickstart.md#QuickStart) section. Navigate to the Malcolm directory (`cd ~/Malcolm`) and run [`auth_setup`](authsetup.md#AuthSetup) to configure authentication. If the ISO didn't have pre-packaged Malcolm images, or if you'd like to retrieve the latest updates, run `docker-compose pull`. Finalize your configuration by running `scripts/install.py --configure` and follow the prompts as illustrated in the [installation example](ubuntu-install-example.md#InstallationExample). + +Once Malcolm is configured, you can [start Malcolm](running.md#Starting) via the command line or by clicking the circular yellow Malcolm icon in the menu bar. \ No newline at end of file diff --git a/docs/malcolm-preparation.md b/docs/malcolm-preparation.md new file mode 100644 index 000000000..6e0861c60 --- /dev/null +++ b/docs/malcolm-preparation.md @@ -0,0 +1,15 @@ +# Configuration + +* [Configuration](#Configuration) + - [Recommended system requirements](system-requirements.md#SystemRequirements) + - [Malcolm Configuration](malcolm-config.md#ConfigAndTuning) + + [`docker-compose.yml` parameters](malcolm-config.md#DockerComposeYml) + - [Configure authentication](authsetup.md#AuthSetup) + + [Local account management](authsetup.md#AuthBasicAccountManagement) + + [Lightweight Directory Access Protocol (LDAP) authentication](authsetup.md#AuthLDAP) + * [LDAP connection security](authsetup.md#AuthLDAPSecurity) + + [TLS certificates](authsetup.md#TLSCerts) + - [Platform-specific Configuration](host-config.md#HostSystemConfig) + + [Linux host system configuration](host-config-linux.md#HostSystemConfigLinux) + + [macOS host system configuration](host-config-macos.md#HostSystemConfigMac) + + [Windows host system configuration](host-config-windows.md#HostSystemConfigWindows) \ No newline at end of file diff --git a/docs/malcolm-upgrade.md b/docs/malcolm-upgrade.md new file mode 100644 index 000000000..8984164e5 --- /dev/null +++ b/docs/malcolm-upgrade.md @@ -0,0 +1,73 @@ +# Upgrading Malcolm + +At this time there is not an "official" upgrade procedure to get from one version of Malcolm to the next, as it may vary from platform to platform. However, the process is fairly simple and can be done by following these steps: + +## Update the underlying system + +You may wish to get the official updates for the underlying system's software packages before you proceed. Consult the documentation of your operating system for how to do this. + +If you are upgrading a Malcolm instance installed from the [Malcolm installation ISO](malcolm-iso.md#ISOInstallation), follow scenario 2 below. Due to the Malcolm base operating system's [hardened](hardening.md#Hardening) configuration, when updating the underlying system, temporarily set the umask value to the Debian default (`umask 0022` in the root shell in which updates are being performed) instead of the more restrictive Malcolm default. This will allow updates to be applied with the right permissions. + +## Scenario 1: Malcolm is a GitHub clone + +If you checked out a working copy of the Malcolm repository from GitHub with a `git clone` command, here are the basic steps to perform an upgrade: + +1. stop Malcolm + * `./scripts/stop` +2.
stash changes to `docker-compose.yml` and other files + * `git stash save "pre-upgrade Malcolm configuration changes"` +3. pull changes from GitHub repository + * `git pull --rebase` +4. pull new Docker images (this will take a while) + * `docker-compose pull` +5. apply saved configuration change stashed earlier + * `git stash pop` +6. if you see `Merge conflict` messages, resolve the [conflicts](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging#_basic_merge_conflicts) with your favorite text editor +7. you may wish to re-run `install.py --configure` as described in [System configuration and tuning](malcolm-config.md#ConfigAndTuning) in case there are any new `docker-compose.yml` parameters for Malcolm that need to be set up +8. start Malcolm + * `./scripts/start` +9. you may be prompted to [configure authentication](authsetup.md#AuthSetup) if there are new authentication-related files that need to be generated + * you probably do not need to re-generate self-signed certificates + +## Scenario 2: Malcolm was installed from a packaged tarball + +If you installed Malcolm from [pre-packaged installation files](https://github.com/idaholab/Malcolm#Packager), here are the basic steps to perform an upgrade: + +1. stop Malcolm + * `./scripts/stop` +2. uncompress the new pre-packaged installation files (using `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` as an example, the file and/or directory names will be different depending on the release) + * `tar xf malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` +3. backup current Malcolm scripts, configuration files and certificates + * `mkdir -p ./upgrade_backup_$(date +%Y-%m-%d)` + * `cp -r filebeat/ htadmin/ logstash/ nginx/ auth.env cidr-map.txt docker-compose.yml host-map.txt net-map.json ./scripts ./README.md ./upgrade_backup_$(date +%Y-%m-%d)/` +4. replace scripts and local documentation in your existing installation with the new ones + * `rm -rf ./scripts ./README.md` + * `cp -r ./malcolm_YYYYMMDD_HHNNSS_xxxxxxx/scripts ./malcolm_YYYYMMDD_HHNNSS_xxxxxxx/README.md ./` +5. replace (overwrite) `docker-compose.yml` file with new version + * `cp ./malcolm_YYYYMMDD_HHNNSS_xxxxxxx/docker-compose.yml ./docker-compose.yml` +6. re-run `./scripts/install.py --configure` as described in [System configuration and tuning](malcolm-config.md#ConfigAndTuning) +7. using a file comparison tool (e.g., `diff`, `meld`, `Beyond Compare`, etc.), compare the new `docker-compose.yml` with the `docker-compose.yml` file you backed up in step 3, and manually migrate over any customizations you wish to preserve from that file (e.g., `PCAP_FILTER`, `MAXMIND_GEOIP_DB_LICENSE_KEY`, `MANAGE_PCAP_FILES`; [anything else](malcolm-config.md#DockerComposeYml) you may have edited by hand in `docker-compose.yml` that's not prompted for in `install.py --configure`) +8. pull the new Docker images (this will take a while) + * `docker-compose pull` to pull them from Docker Hub or `docker load -i malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz` if you have an offline tarball of the Malcolm Docker images +9. start Malcolm + * `./scripts/start` +10.
you may be prompted to [configure authentication](authsetup.md#AuthSetup) if there are new authentication-related files that need to be generated + * you probably do not need to re-generate self-signed certificates + +## Post-upgrade + +### Monitoring Malcolm + +If you are technically minded, you may wish to follow the debug output provided by `./scripts/start` (or `./scripts/logs` if you need to re-open the log stream after you've closed it), although there is a lot there and it may be hard to distinguish whether or not something is okay. + +Running `docker-compose ps -a` should give you a good idea if all of Malcolm's Docker containers started up and, in some cases, may be able to indicate if the containers are "healthy" or not. + +After upgrading following one of the previous outlines, give Malcolm several minutes to get started. Once things are up and running, open one of Malcolm's [web interfaces](quickstart.md#UserInterfaceURLs) to verify that things are working. + +### Loading new OpenSearch Dashboards visualizations + +Once the upgraded Malcolm instance has started up, you'll probably want to import the new dashboards and visualizations for OpenSearch Dashboards. You can signal Malcolm to load the new visualizations by opening OpenSearch Dashboards, clicking **Management** → **Index Patterns**, then selecting the `arkime_sessions3-*` index pattern and clicking the delete **🗑** button near the upper-right of the window. Confirm the **Delete index pattern?** prompt by clicking **Delete**. Close the OpenSearch Dashboards browser window. After a few minutes the missing index pattern will be detected and OpenSearch Dashboards will be signaled to load its new dashboards and visualizations. + +## Major releases + +The Malcolm project uses [semantic versioning](https://semver.org/) when choosing version numbers. If you are moving between major releases (e.g., from v4.0.1 to v5.0.0), you're likely to find that there are enough major backwards compatibility-breaking changes that upgrading may not be worth the time and trouble. A fresh install is strongly recommended between major releases. \ No newline at end of file diff --git a/docs/netbox.md b/docs/netbox.md new file mode 100644 index 000000000..2ffd84120 --- /dev/null +++ b/docs/netbox.md @@ -0,0 +1,7 @@ +# Asset Management with NetBox + +Malcolm provides an instance of [NetBox](https://netbox.dev/), an open-source "solution for modeling and documenting modern networks." The NetBox web interface is available at [https://localhost/netbox/](https://localhost/netbox/) if you are connecting locally. + +The design of a potentially deeper integration between Malcolm and NetBox is a work in progress. The purpose of an asset management system is to document the intended state of a network: were Malcolm to actively and aggressively populate NetBox with the live network state, a network misconfiguration could end up being documented as if it were intended. The Malcolm development team is investigating what data, if any, should automatically flow to NetBox based on traffic observed (enabled via the `NETBOX_CRON` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml)), and what NetBox inventory data could be used, if any, to enrich Malcolm's network traffic metadata. Well-considered suggestions in this area [are welcome](mailto:malcolm@inl.gov?subject=NetBox).
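+
+In the meantime, whether the bundled NetBox instance runs at all, and whether that periodic population occurs, is governed by the `NETBOX_DISABLED` and `NETBOX_CRON` [environment variables in `docker-compose.yml`](malcolm-config.md#DockerComposeYml); a quick way to review the current settings (illustrative only) is:
+
+```
+# show how the NetBox-related options are currently set
+grep -E 'NETBOX_(DISABLED|CRON)' docker-compose.yml
+```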
+ +Please see the [NetBox page on GitHub](https://github.com/netbox-community/netbox), its [documentation](https://docs.netbox.dev/en/stable/) and its [public demo](https://demo.netbox.dev/) for more information. \ No newline at end of file diff --git a/docs/opensearch-instances.md b/docs/opensearch-instances.md new file mode 100644 index 000000000..c77ab38c3 --- /dev/null +++ b/docs/opensearch-instances.md @@ -0,0 +1,81 @@ +# OpenSearch instances + +* [OpenSearch instances](#OpenSearchInstance) + - [Authentication and authorization for remote OpenSearch clusters](#OpenSearchAuth) + +Malcolm's default standalone configuration is to use a local [OpenSearch](https://opensearch.org/) instance in a Docker container to index and search network traffic metadata. OpenSearch can also run as a [cluster](https://opensearch.org/docs/latest/opensearch/cluster/) with instances distributed across multiple nodes with dedicated [roles](https://opensearch.org/docs/latest/opensearch/cluster/#nodes) like cluster manager, data node, ingest node, etc. + +As the permutations of OpenSearch cluster configurations are numerous, it is beyond Malcolm's scope to set up multi-node clusters. However, Malcolm can be configured to use a remote OpenSearch cluster rather than its own internal instance. + +The `OPENSEARCH_…` [environment variables in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) control whether Malcolm uses its own local OpenSearch instance or a remote OpenSearch instance as its primary data store. The configuration portion of the Malcolm install script ([`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning)) can help you configure these options. + +For example, to use the default standalone configuration, answer `Y` when prompted `Should Malcolm use and maintain its own OpenSearch instance?`. + +Or, to use a remote OpenSearch cluster: + +``` +… +Should Malcolm use and maintain its own OpenSearch instance? (Y/n): n + +Enter primary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.123:9200 + +Require SSL certificate validation for communication with primary OpenSearch instance? (y/N): n + +You must run auth_setup after install.py to store OpenSearch connection credentials. +… +``` + +Whether the primary OpenSearch instance is a locally maintained single-node instance or is a remote cluster, Malcolm can additionally be configured to forward logs to a secondary remote OpenSearch instance. The `OPENSEARCH_SECONDARY_…` [environment variables in `docker-compose.yml`](malcolm-config.md#DockerComposeYml) control this behavior. Configuration of a remote secondary OpenSearch instance is similar to that of a remote primary OpenSearch instance: + +``` +… +Forward Logstash logs to a secondary remote OpenSearch instance? (y/N): y + +Enter secondary remote OpenSearch connection URL (e.g., https://192.168.1.123:9200): https://192.168.1.124:9200 + +Require SSL certificate validation for communication with secondary OpenSearch instance? (y/N): n + +You must run auth_setup after install.py to store OpenSearch connection credentials. +… +``` + +## Authentication and authorization for remote OpenSearch clusters + +In addition to setting the environment variables in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) as described above, you must provide Malcolm with credentials for it to be able to communicate with remote OpenSearch instances.
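+
+Once those credentials are in place (they live in the `.opensearch.*.curlrc` files described next), a quick end-to-end check from the Malcolm installation directory might look like this (a sketch only; the URL is the example address used above, and `_cluster/health` is a standard OpenSearch REST endpoint):
+
+```
+# exercise the remote OpenSearch instance using the stored credentials
+curl --config .opensearch.primary.curlrc "https://192.168.1.123:9200/_cluster/health?pretty"
+```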
+ +These credentials are stored in the Malcolm installation directory as `.opensearch.primary.curlrc` and `.opensearch.secondary.curlrc` for the primary and secondary OpenSearch connections, respectively, and are bind mounted into the Docker containers that need to communicate with OpenSearch. These [cURL-formatted](https://everything.curl.dev/cmdline/configfile) config files can be generated for you by the [`auth_setup`](authsetup.md#AuthSetup) script as illustrated: + +``` +$ ./scripts/auth_setup + +… + +Store username/password for primary remote OpenSearch instance? (y/N): y + +OpenSearch username: servicedb +servicedb password: +servicedb password (again): + +Require SSL certificate validation for OpenSearch communication? (Y/n): n + +Store username/password for secondary remote OpenSearch instance? (y/N): y + +OpenSearch username: remotedb +remotedb password: +remotedb password (again): + +Require SSL certificate validation for OpenSearch communication? (Y/n): n + +… +``` + +These files are created with permissions such that only the user account running Malcolm can access them: + +``` +$ ls -la .opensearch.*.curlrc +-rw------- 1 user user 36 Aug 22 14:17 .opensearch.primary.curlrc +-rw------- 1 user user 35 Aug 22 14:18 .opensearch.secondary.curlrc +``` + +One caveat with Malcolm using a remote OpenSearch cluster as its primary document store is that the accounts used to access Malcolm's [web interfaces](quickstart.md#UserInterfaceURLs), particularly [OpenSearch Dashboards](dashboards.md#Dashboards), are in some instances passed directly through to OpenSearch itself. For this reason, both Malcolm and the remote primary OpenSearch instance must have the same account information. The easiest way to accomplish this is to use an Active Directory/LDAP server that both [Malcolm](authsetup.md#AuthLDAP) and [OpenSearch](https://opensearch.org/docs/latest/security-plugin/configuration/ldap/) use as a common authentication backend. + +See the OpenSearch documentation on [access control](https://opensearch.org/docs/latest/security-plugin/access-control/index/) for more information. \ No newline at end of file diff --git a/docs/protocols.md b/docs/protocols.md new file mode 100644 index 000000000..7d687e416 --- /dev/null +++ b/docs/protocols.md @@ -0,0 +1,64 @@ +# Supported Protocols + +Malcolm uses [Zeek](https://docs.zeek.org/en/stable/script-reference/proto-analyzers.html) and [Arkime](https://github.com/arkime/arkime/tree/master/capture/parsers) to analyze network traffic.
These tools provide varying degrees of visibility into traffic transmitted over the following network protocols: + +| Traffic | Wiki | Organization/Specification | Arkime | Zeek | +|---|:---:|:---:|:---:|:---:| +|Internet layer|[🔗](https://en.wikipedia.org/wiki/Internet_layer)|[🔗](https://tools.ietf.org/html/rfc791)|[✓](https://github.com/arkime/arkime/blob/master/capture/packet.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/conn/main.zeek.html#type-Conn::Info)| +|Border Gateway Protocol (BGP)|[🔗](https://en.wikipedia.org/wiki/Border_Gateway_Protocol)|[🔗](https://tools.ietf.org/html/rfc2283)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/bgp.c)|| +|Building Automation and Control (BACnet)|[🔗](https://en.wikipedia.org/wiki/BACnet)|[🔗](http://www.bacnet.org/)||[✓](https://github.com/cisagov/icsnpp-bacnet)| +|Bristol Standard Asynchronous Protocol (BSAP)|[🔗](https://en.wikipedia.org/wiki/Bristol_Standard_Asynchronous_Protocol)|[🔗](http://www.documentation.emersonprocess.com/groups/public/documents/specification_sheets/d301321x012.pdf)[🔗](http://www.documentation.emersonprocess.com/groups/public/documents/instruction_manuals/d301401x012.pdf)||[✓](https://github.com/cisagov/icsnpp-bsap)| +|Distributed Computing Environment / Remote Procedure Calls (DCE/RPC)|[🔗](https://en.wikipedia.org/wiki/DCE/RPC)|[🔗](https://pubs.opengroup.org/onlinepubs/009629399/toc.pdf)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dce-rpc/main.zeek.html#type-DCE_RPC::Info)| +|Dynamic Host Configuration Protocol (DHCP)|[🔗](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol)|[🔗](https://tools.ietf.org/html/rfc2131)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/dhcp.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dhcp/main.zeek.html#type-DHCP::Info)| +|Distributed Network Protocol 3 (DNP3)|[🔗](https://en.wikipedia.org/wiki/DNP3)|[🔗](https://www.dnp.org)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dnp3/main.zeek.html#type-DNP3::Info)[✓](https://github.com/cisagov/icsnpp-dnp3)| +|Domain Name System (DNS)|[🔗](https://en.wikipedia.org/wiki/Domain_Name_System)|[🔗](https://tools.ietf.org/html/rfc1035)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/dns.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/dns/main.zeek.html#type-DNS::Info)| +|EtherCAT|[🔗](https://en.wikipedia.org/wiki/EtherCAT)|[🔗](https://www.ethercat.org/en/downloads/downloads_A02E436C7A97479F9261FDFA8A6D71E5.htm)||[✓](https://github.com/cisagov/icsnpp-ethercat)| +|EtherNet/IP / Common Industrial Protocol (CIP)|[🔗](https://en.wikipedia.org/wiki/EtherNet/IP) [🔗](https://en.wikipedia.org/wiki/Common_Industrial_Protocol)|[🔗](https://www.odva.org/Technology-Standards/EtherNet-IP/Overview)||[✓](https://github.com/cisagov/icsnpp-enip)| +|FTP (File Transfer Protocol)|[🔗](https://en.wikipedia.org/wiki/File_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc959)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ftp/info.zeek.html#type-FTP::Info)| +|GENISYS||[🔗](https://manualzz.com/doc/6363274/genisys-2000---ansaldo-sts---product-support#93)[🔗](https://gitlab.com/wireshark/wireshark/-/issues/3422)||[✓](https://github.com/cisagov/icsnpp-genisys)| +|Google Quick UDP Internet Connections 
(gQUIC)|[🔗](https://en.wikipedia.org/wiki/QUIC#Google_QUIC_(gQUIC))|[🔗](https://www.chromium.org/quic)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/quic.c)|[✓](https://github.com/salesforce/GQUIC_Protocol_Analyzer/blob/master/scripts/Salesforce/GQUIC/main.bro)|
+|Hypertext Transfer Protocol (HTTP)|[🔗](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc7230)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/http.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/http/main.zeek.html#type-HTTP::Info)|
+|IPsec|[🔗](https://en.wikipedia.org/wiki/IPsec)|[🔗](https://zeek.org/2021/04/20/zeeks-ipsec-protocol-analyzer/)||[✓](https://github.com/corelight/zeek-spicy-ipsec)|
+|Internet Relay Chat (IRC)|[🔗](https://en.wikipedia.org/wiki/Internet_Relay_Chat)|[🔗](https://tools.ietf.org/html/rfc1459)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/irc.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/irc/main.zeek.html#type-IRC::Info)|
+|Lightweight Directory Access Protocol (LDAP)|[🔗](https://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol)|[🔗](https://tools.ietf.org/html/rfc4511)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/ldap.c)|[✓](https://github.com/zeek/spicy-ldap)|
+|Kerberos|[🔗](https://en.wikipedia.org/wiki/Kerberos_(protocol))|[🔗](https://tools.ietf.org/html/rfc4120)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/krb5.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/krb/main.zeek.html#type-KRB::Info)|
+|Modbus|[🔗](https://en.wikipedia.org/wiki/Modbus)|[🔗](http://www.modbus.org/)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/modbus/main.zeek.html#type-Modbus::Info)[✓](https://github.com/cisagov/icsnpp-modbus)|
+|MQ Telemetry Transport (MQTT)|[🔗](https://en.wikipedia.org/wiki/MQTT)|[🔗](https://mqtt.org/)||[✓](https://docs.zeek.org/en/stable/scripts/policy/protocols/mqtt/main.zeek.html)|
+|MySQL|[🔗](https://en.wikipedia.org/wiki/MySQL)|[🔗](https://dev.mysql.com/doc/internals/en/client-server-protocol.html)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/mysql.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/mysql/main.zeek.html#type-MySQL::Info)|
+|NT LAN Manager (NTLM)|[🔗](https://en.wikipedia.org/wiki/NT_LAN_Manager)|[🔗](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-nlmp/b38c36ed-2804-4868-a9ff-8dd3182128e4?redirectedfrom=MSDN)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ntlm/main.zeek.html#type-NTLM::Info)|
+|Network Time Protocol (NTP)|[🔗](https://en.wikipedia.org/wiki/Network_Time_Protocol)|[🔗](http://www.ntp.org)||[✓](https://docs.zeek.org/en/latest/scripts/base/protocols/ntp/main.zeek.html#type-NTP::Info)|
+|Oracle|[🔗](https://en.wikipedia.org/wiki/Oracle_Net_Services)|[🔗](https://docs.oracle.com/cd/E11882_01/network.112/e41945/layers.htm#NETAG004)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/oracle.c)||
+|Open Platform Communications Unified Architecture (OPC UA) Binary|[🔗](https://en.wikipedia.org/wiki/OPC_Unified_Architecture)|[🔗](https://opcfoundation.org/developer-tools/specifications-unified-architecture)||[✓](https://github.com/cisagov/icsnpp-opcua-binary)|
+|Open Shortest Path First 
(OSPF)|[🔗](https://en.wikipedia.org/wiki/Open_Shortest_Path_First)|[🔗](https://datatracker.ietf.org/wg/ospf/charter/)[🔗](https://datatracker.ietf.org/doc/html/rfc2328)[🔗](https://datatracker.ietf.org/doc/html/rfc5340)||[✓](https://github.com/corelight/zeek-spicy-ospf)|
+|OpenVPN|[🔗](https://en.wikipedia.org/wiki/OpenVPN)|[🔗](https://openvpn.net/community-resources/openvpn-protocol/)[🔗](https://zeek.org/2021/03/16/a-zeek-openvpn-protocol-analyzer/)||[✓](https://github.com/corelight/zeek-spicy-openvpn)|
+|PostgreSQL|[🔗](https://en.wikipedia.org/wiki/PostgreSQL)|[🔗](https://www.postgresql.org/)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/postgresql.c)||
+|Process Field Net (PROFINET)|[🔗](https://en.wikipedia.org/wiki/PROFINET)|[🔗](https://us.profinet.com/technology/profinet/)||[✓](https://github.com/amzn/zeek-plugin-profinet/blob/master/scripts/main.zeek)|
+|Remote Authentication Dial-In User Service (RADIUS)|[🔗](https://en.wikipedia.org/wiki/RADIUS)|[🔗](https://tools.ietf.org/html/rfc2865)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/radius.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/radius/main.zeek.html#type-RADIUS::Info)|
+|Remote Desktop Protocol (RDP)|[🔗](https://en.wikipedia.org/wiki/Remote_Desktop_Protocol)|[🔗](https://docs.microsoft.com/en-us/windows/win32/termserv/remote-desktop-protocol?redirectedfrom=MSDN)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/rdp/main.zeek.html#type-RDP::Info)|
+|Remote Framebuffer (RFB)|[🔗](https://en.wikipedia.org/wiki/RFB_protocol)|[🔗](https://tools.ietf.org/html/rfc6143)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/rfb/main.zeek.html#type-RFB::Info)|
+|S7comm / Connection Oriented Transport Protocol (COTP)|[🔗](https://wiki.wireshark.org/S7comm) [🔗](https://wiki.wireshark.org/COTP)|[🔗](https://support.industry.siemens.com/cs/document/26483647/what-properties-advantages-and-special-features-does-the-s7-protocol-offer-?dti=0&lc=en-WW) [🔗](https://www.ietf.org/rfc/rfc0905.txt)||[✓](https://github.com/cisagov/icsnpp-s7comm)|
+|Secure Shell (SSH)|[🔗](https://en.wikipedia.org/wiki/Secure_Shell)|[🔗](https://tools.ietf.org/html/rfc4253)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/ssh.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ssh/main.zeek.html#type-SSH::Info)|
+|Secure Sockets Layer (SSL) / Transport Layer Security (TLS)|[🔗](https://en.wikipedia.org/wiki/Transport_Layer_Security)|[🔗](https://tools.ietf.org/html/rfc5246)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/tls.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/ssl/main.zeek.html#type-SSL::Info)|
+|Session Initiation Protocol (SIP)|[🔗](https://en.wikipedia.org/wiki/Session_Initiation_Protocol)|[🔗](https://tools.ietf.org/html/rfc3261)||[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/sip/main.zeek.html#type-SIP::Info)|
+|Server Message Block (SMB) / Common Internet File System (CIFS)|[🔗](https://en.wikipedia.org/wiki/Server_Message_Block)|[🔗](https://docs.microsoft.com/en-us/windows/win32/fileio/microsoft-smb-protocol-and-cifs-protocol-overview)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/smb.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/smb/main.zeek.html)|
+|Simple Mail Transfer Protocol 
(SMTP)|[🔗](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc5321)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/smtp.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/smtp/main.zeek.html#type-SMTP::Info)| +|Simple Network Management Protocol (SNMP)|[🔗](https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol)|[🔗](https://tools.ietf.org/html/rfc2578)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/smtp.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/snmp/main.zeek.html#type-SNMP::Info)| +|SOCKS|[🔗](https://en.wikipedia.org/wiki/SOCKS)|[🔗](https://tools.ietf.org/html/rfc1928)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/socks.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/socks/main.zeek.html#type-SOCKS::Info)| +|STUN (Session Traversal Utilities for NAT)|[🔗](https://en.wikipedia.org/wiki/STUN)|[🔗](https://datatracker.ietf.org/doc/html/rfc3489)|[✓](https://github.com/arkime/arkime/blob/main/capture/parsers/misc.c#L147)|[✓](https://github.com/corelight/zeek-spicy-stun)| +|Syslog|[🔗](https://en.wikipedia.org/wiki/Syslog)|[🔗](https://tools.ietf.org/html/rfc5424)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/tls.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/protocols/syslog/main.zeek.html#type-Syslog::Info)| +|Tabular Data Stream (TDS)|[🔗](https://en.wikipedia.org/wiki/Tabular_Data_Stream)|[🔗](https://www.freetds.org/tds.html) [🔗](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/b46a581a-39de-4745-b076-ec4dbb7d13ec)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/tds.c)|[✓](https://github.com/amzn/zeek-plugin-tds/blob/master/scripts/main.zeek)| +|Telnet / remote shell (rsh) / remote login (rlogin)|[🔗](https://en.wikipedia.org/wiki/Telnet)[🔗](https://en.wikipedia.org/wiki/Berkeley_r-commands)|[🔗](https://tools.ietf.org/html/rfc854)[🔗](https://tools.ietf.org/html/rfc1282)|[✓](https://github.com/arkime/arkime/blob/master/capture/parsers/misc.c#L336)|[✓](https://docs.zeek.org/en/current/scripts/base/bif/plugins/Zeek_Login.events.bif.zeek.html)[❋](https://github.com/idaholab/Malcolm/blob/main/zeek/config/login.zeek)| +|TFTP (Trivial File Transfer Protocol)|[🔗](https://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol)|[🔗](https://tools.ietf.org/html/rfc1350)||[✓](https://github.com/zeek/spicy-analyzers/blob/main/analyzer/protocol/tftp/tftp.zeek)| +|WireGuard|[🔗](https://en.wikipedia.org/wiki/WireGuard)|[🔗](https://www.wireguard.com/protocol/)[🔗](https://www.wireguard.com/papers/wireguard.pdf)||[✓](https://github.com/corelight/zeek-spicy-wireguard)| +|various tunnel protocols (e.g., GTP, GRE, Teredo, AYIYA, IP-in-IP, etc.)|[🔗](https://en.wikipedia.org/wiki/Tunneling_protocol)||[✓](https://github.com/arkime/arkime/blob/master/capture/packet.c)|[✓](https://docs.zeek.org/en/stable/scripts/base/frameworks/tunnels/main.zeek.html#type-Tunnel::Info)| + +Additionally, Zeek is able to detect and, where possible, log the type, vendor and version of [various](https://docs.zeek.org/en/stable/scripts/base/frameworks/software/main.zeek.html#type-Software::Type) other [software protocols](https://en.wikipedia.org/wiki/Application_layer). + +As part of its network traffic analysis, Zeek can extract and analyze files transferred across the protocols it understands. 
In addition to generating logs for transferred files, deeper analysis is done into the following file types: + +* [Portable executable](https://docs.zeek.org/en/stable/scripts/base/files/pe/main.zeek.html#type-PE::Info) files +* [X.509](https://docs.zeek.org/en/stable/scripts/base/files/x509/main.zeek.html#type-X509::Info) certificates + +See [automatic file extraction and scanning](file-scanning.md#ZeekFileExtraction) for additional features related to file scanning. + +See [Zeek log integration](arkime.md#ArkimeZeek) for more information on how Malcolm integrates [Arkime sessions and Zeek logs](arkime.md#ZeekArkimeFlowCorrelation) for analysis. \ No newline at end of file diff --git a/docs/queries-cheat-sheet.md b/docs/queries-cheat-sheet.md new file mode 100644 index 000000000..751c4a65c --- /dev/null +++ b/docs/queries-cheat-sheet.md @@ -0,0 +1,74 @@ +# Search Queries in Arkime and OpenSearch Dashboards + +OpenSearch Dashboards supports two query syntaxes: the legacy [Lucene](https://www.elastic.co/guide/en/kibana/current/lucene-query.html) syntax and [Dashboards Query Language (DQL)](https://opensearch.org/docs/1.2/dashboards/dql/), both of which are somewhat different than Arkime's query syntax (see the help at [https://localhost/help#search](https://localhost/help#search) if you are connecting locally). The Arkime interface is for searching and visualizing both Arkime sessions and Zeek logs. The prebuilt dashboards in the OpenSearch Dashboards interface are for searching and visualizing Zeek logs, but will not include Arkime sessions. Here are some common patterns used in building search query strings for Arkime and OpenSearch Dashboards, respectively. See the links provided for further documentation. + +| | [Arkime Search String](https://localhost/help#search) | [OpenSearch Dashboards Search String (Lucene)](https://www.elastic.co/guide/en/kibana/current/lucene-query.html) | [OpenSearch Dashboards Search String (DQL)](https://www.elastic.co/guide/en/kibana/current/kuery-query.html)| +|---|:---:|:---:|:---:| +| Field exists |`event.dataset == EXISTS!`|`_exists_:event.dataset`|`event.dataset:*`| +| Field does not exist |`event.dataset != EXISTS!`|`NOT _exists_:event.dataset`|`NOT event.dataset:*`| +| Field matches a value |`port.dst == 22`|`destination.port:22`|`destination.port:22`| +| Field does not match a value |`port.dst != 22`|`NOT destination.port:22`|`NOT destination.port:22`| +| Field matches at least one of a list of values |`tags == [foo, bar]`|`tags:(foo OR bar)`|`tags:(foo or bar)`| +| Field range (inclusive) |`http.statuscode >= 200 && http.statuscode <= 300`|`http.statuscode:[200 TO 300]`|`http.statuscode >= 200 and http.statuscode <= 300`| +| Field range (exclusive) |`http.statuscode > 200 && http.statuscode < 300`|`http.statuscode:{200 TO 300}`|`http.statuscode > 200 and http.statuscode < 300`| +| Field range (mixed exclusivity) |`http.statuscode >= 200 && http.statuscode < 300`|`http.statuscode:[200 TO 300}`|`http.statuscode >= 200 and http.statuscode < 300`| +| Match all search terms (AND) |`(tags == [foo, bar]) && (http.statuscode == 401)`|`tags:(foo OR bar) AND http.statuscode:401`|`tags:(foo or bar) and http.statuscode:401`| +| Match any search terms (OR) |`(zeek.ftp.password == EXISTS!) || (zeek.http.password == EXISTS!) 
|| (related.user == "anonymous")`|`_exists_:zeek.ftp.password OR _exists_:zeek.http.password OR related.user:"anonymous"`|`zeek.ftp.password:* or zeek.http.password:* or related.user:"anonymous"`|
+| Global string search (anywhere in the document) |all Arkime search expressions are field-based|`microsoft`|`microsoft`|
+| Wildcards|`host.dns == "*micro?oft*"` (`?` for single character, `*` for any characters)|`dns.host:*micro?oft*` (`?` for single character, `*` for any characters)|`dns.host:*micro*ft*` (`*` for any characters)|
+| Regex |`host.http == /.*www\.f.*k\.com.*/`|`zeek.http.host:/.*www\.f.*k\.com.*/`|DQL does not support regex|
+| IPv4 values |`ip == 0.0.0.0/0`|`source.ip:"0.0.0.0/0" OR destination.ip:"0.0.0.0/0"`|`source.ip:"0.0.0.0/0" or destination.ip:"0.0.0.0/0"`|
+| IPv6 values |`(ip.src == EXISTS! || ip.dst == EXISTS!) && (ip != 0.0.0.0/0)`|`(_exists_:source.ip AND NOT source.ip:"0.0.0.0/0") OR (_exists_:destination.ip AND NOT destination.ip:"0.0.0.0/0")`|`(source.ip:* and not source.ip:"0.0.0.0/0") or (destination.ip:* and not destination.ip:"0.0.0.0/0")`|
+| GeoIP information available |`country == EXISTS!`|`_exists_:destination.geo OR _exists_:source.geo`|`destination.geo:* or source.geo:*`|
+| Zeek log type |`event.dataset == notice`|`event.dataset:notice`|`event.dataset:notice`|
+| IP CIDR Subnets |`ip.src == 172.16.0.0/12`|`source.ip:"172.16.0.0/12"`|`source.ip:"172.16.0.0/12"`|
+| Search time frame |Use Arkime time bounding controls under the search bar|Use OpenSearch Dashboards time range controls in the upper right-hand corner|Use OpenSearch Dashboards time range controls in the upper right-hand corner|
+
+When building complex queries, it is **strongly recommended** that you enclose search terms and expressions in parentheses to control order of operations.
+
+As Zeek logs are ingested, Malcolm parses and normalizes the logs' fields to match Arkime's underlying OpenSearch schema. A complete list of these fields can be found in the Arkime help (accessible at [https://localhost/help#fields](https://localhost/help#fields) if you are connecting locally).
+
+Whenever possible, Zeek fields are mapped to existing corresponding Arkime fields: for example, the `orig_h` field in Zeek is mapped to Arkime's `source.ip` field. The original Zeek fields are also left intact. To complicate the issue, the Arkime interface uses its own aliases to reference those fields: the source IP field is referenced as `ip.src` (Arkime's alias) in Arkime and as `source.ip` in OpenSearch Dashboards.
+
+The table below shows the mapping of some of these fields. 
+
+| Field Description |Arkime Field Alias(es)|Arkime-mapped Zeek Field(s)|Zeek Field(s)|
+|---|:---:|:---:|:---:|
+| [Community ID](https://github.com/corelight/community-id-spec) Flow Hash ||`network.community_id`|`network.community_id`|
+| Destination IP |`ip.dst`|`destination.ip`|`destination.ip`|
+| Destination MAC |`mac.dst`|`destination.mac`|`destination.mac`|
+| Destination Port |`port.dst`|`destination.port`|`destination.port`|
+| Duration |`session.length`|`length`|`zeek.conn.duration`|
+| First Packet Time |`starttime`|`firstPacket`|`zeek.ts`, `@timestamp`|
+| IP Protocol |`ip.protocol`|`ipProtocol`|`network.transport`|
+| Last Packet Time |`stoptime`|`lastPacket`||
+| MIME Type |`email.bodymagic`, `http.bodymagic`|`http.bodyMagic`|`file.mime_type`, `zeek.files.mime_type`, `zeek.ftp.mime_type`, `zeek.http.orig_mime_types`, `zeek.http.resp_mime_types`, `zeek.irc.dcc_mime_type`|
+| Protocol/Service |`protocols`|`protocol`|`network.transport`, `network.protocol`|
+| Request Bytes |`databytes.src`, `bytes.src`|`source.bytes`, `client.bytes`|`zeek.conn.orig_bytes`, `zeek.conn.orig_ip_bytes`|
+| Request Packets |`packets.src`|`source.packets`|`zeek.conn.orig_pkts`|
+| Response Bytes |`databytes.dst`, `bytes.dst`|`destination.bytes`, `server.bytes`|`zeek.conn.resp_bytes`, `zeek.conn.resp_ip_bytes`|
+| Response Packets |`packets.dst`|`destination.packets`|`zeek.conn.resp_pkts`|
+| Source IP |`ip.src`|`source.ip`|`source.ip`|
+| Source MAC |`mac.src`|`source.mac`|`source.mac`|
+| Source Port |`port.src`|`source.port`|`source.port`|
+| Total Bytes |`databytes`, `bytes`|`totDataBytes`, `network.bytes`||
+| Total Packets |`packets`|`network.packets`||
+| Username |`user`|`user`|`related.user`|
+| Zeek Connection UID|||`zeek.uid`, `event.id`|
+| Zeek File UID |||`zeek.fuid`, `event.id`|
+| Zeek Log Type |||`event.dataset`|
+
+In addition to the fields listed above, Arkime provides several special field aliases for matching any field of a particular type. While these aliases do not exist in OpenSearch Dashboards *per se*, they can be approximated as illustrated below.
+
+| Matches Any | Arkime Special Field Example | OpenSearch Dashboards/Zeek Equivalent Example |
+|---|:---:|:---:|
+| IP Address | `ip == 192.168.0.1` | `source.ip:192.168.0.1 OR destination.ip:192.168.0.1` |
+| Port | `port == [80, 443, 8080, 8443]` | `source.port:(80 OR 443 OR 8080 OR 8443) OR destination.port:(80 OR 443 OR 8080 OR 8443)` |
+| Country (code) | `country == [RU,CN]` | `destination.geo.country_code2:(RU OR CN) OR source.geo.country_code2:(RU OR CN) OR dns.GEO:(RU OR CN)` |
+| Country (name) | | `destination.geo.country_name:(Russia OR China) OR source.geo.country_name:(Russia OR China)` |
+| ASN | `asn == "*Mozilla*"` | `source.as.full:*Mozilla* OR destination.as.full:*Mozilla* OR dns.ASN:*Mozilla*` |
+| Host | `host == www.microsoft.com` | `zeek.http.host:www.microsoft.com (or zeek.dhcp.host_name, zeek.dns.host, zeek.ntlm.host, smb.host, etc.)` |
+| Protocol (layers >= 4) | `protocols == tls` | `protocol:tls` |
+| User | `user == EXISTS! && user != anonymous` | `_exists_:user AND (NOT user:anonymous)` |
+
+For details on how to filter both Zeek logs and Arkime session records for a particular connection, see [Correlating Zeek logs and Arkime sessions](arkime.md#ZeekArkimeFlowCorrelation). 
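+
+For a quick way to experiment with these expressions outside of the web interfaces, the same Lucene-style strings can be submitted to OpenSearch's `query_string` query directly. The following is only a sketch, not part of Malcolm's documented workflow: it assumes the OpenSearch port was exposed during configuration (the `Expose OpenSearch port to external hosts?` prompt), that `analyst` is a valid account, and that `arkime_sessions3-*` is the sessions index pattern on your instance.
+
+```
+# count Zeek notice entries from sensitive countries (illustrative account,
+# index pattern, and query); -k skips validation of the self-signed certificate
+curl -sSLk -u analyst "https://localhost:9200/arkime_sessions3-*/_search" \
+  -H 'Content-Type: application/json' \
+  -d '{"size": 0, "query": {"query_string": {"query": "event.dataset:notice AND source.geo.country_code2:(RU OR CN)"}}}'
+```
+
+The hit count is returned in the response's `hits.total` field; setting `"size": 0` suppresses the documents themselves.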
\ No newline at end of file
diff --git a/docs/Malcolm Network Traffic Analysis Quick Start Guide.odt b/docs/quick-start/Malcolm Network Traffic Analysis Quick Start Guide.odt
similarity index 100%
rename from docs/Malcolm Network Traffic Analysis Quick Start Guide.odt
rename to docs/quick-start/Malcolm Network Traffic Analysis Quick Start Guide.odt
diff --git a/docs/Malcolm Network Traffic Analysis Quick Start Guide.pdf b/docs/quick-start/Malcolm Network Traffic Analysis Quick Start Guide.pdf
similarity index 100%
rename from docs/Malcolm Network Traffic Analysis Quick Start Guide.pdf
rename to docs/quick-start/Malcolm Network Traffic Analysis Quick Start Guide.pdf
diff --git a/docs/quickstart.md b/docs/quickstart.md
new file mode 100644
index 000000000..6abeae160
--- /dev/null
+++ b/docs/quickstart.md
@@ -0,0 +1,97 @@
+# Quick start
+
+* [Quick start](#QuickStart)
+  - [Getting Malcolm](#GetMalcolm)
+  - [User interface](#UserInterfaceURLs)
+
+## Getting Malcolm
+
+For a `TL;DR` example of downloading, configuring, and running Malcolm on a Linux platform, see [Installation example using Ubuntu 22.04 LTS](ubuntu-install-example.md#InstallationExample).
+
+The scripts to control Malcolm require Python 3. The [`install.py`](malcolm-config.md#ConfigAndTuning) script requires the [requests](https://docs.python-requests.org/en/latest/) module for Python 3, and will make use of the [pythondialog](https://pythondialog.sourceforge.io/) module for user interaction (on Linux) if it is available.
+
+### Source code
+
+The files required to build and run Malcolm are available on its [GitHub page](https://github.com/idaholab/Malcolm/tree/main). Malcolm's source code is released under the terms of a permissive open source software license (see `License.txt` for the terms of its release).
+
+### Building Malcolm from scratch
+
+The `build.sh` script can build Malcolm's Docker images from scratch. See [Building from source](development.md#Build) for more information.
+
+### Initial configuration
+
+You must run [`auth_setup`](authsetup.md#AuthSetup) prior to pulling Malcolm's Docker images. You should also ensure your system configuration and `docker-compose.yml` settings are tuned by running `./scripts/install.py` or `./scripts/install.py --configure` (see [System configuration and tuning](malcolm-config.md#ConfigAndTuning)).
+
+### Pull Malcolm's Docker images
+
+Malcolm's Docker images are periodically built and hosted on [Docker Hub](https://hub.docker.com/u/malcolmnetsec). If you already have [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/), these prebuilt images can be pulled by navigating into the Malcolm directory (containing the `docker-compose.yml` file) and running `docker-compose pull` like this:
+```
+$ docker-compose pull
+Pulling api ... done
+Pulling arkime ... done
+Pulling dashboards ... done
+Pulling dashboards-helper ... done
+Pulling file-monitor ... done
+Pulling filebeat ... done
+Pulling freq ... done
+Pulling htadmin ... done
+Pulling logstash ... done
+Pulling name-map-ui ... done
+Pulling netbox ... done
+Pulling netbox-postgresql ... done
+Pulling netbox-redis ... done
+Pulling nginx-proxy ... done
+Pulling opensearch ... done
+Pulling pcap-capture ... done
+Pulling pcap-monitor ... done
+Pulling suricata ... done
+Pulling upload ... done
+Pulling zeek ... 
done +``` + +You can then observe that the images have been retrieved by running `docker images`: +``` +$ docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +malcolmnetsec/api 6.4.0 xxxxxxxxxxxx 3 days ago 158MB +malcolmnetsec/arkime 6.4.0 xxxxxxxxxxxx 3 days ago 816MB +malcolmnetsec/dashboards 6.4.0 xxxxxxxxxxxx 3 days ago 1.02GB +malcolmnetsec/dashboards-helper 6.4.0 xxxxxxxxxxxx 3 days ago 184MB +malcolmnetsec/file-monitor 6.4.0 xxxxxxxxxxxx 3 days ago 588MB +malcolmnetsec/file-upload 6.4.0 xxxxxxxxxxxx 3 days ago 259MB +malcolmnetsec/filebeat-oss 6.4.0 xxxxxxxxxxxx 3 days ago 624MB +malcolmnetsec/freq 6.4.0 xxxxxxxxxxxx 3 days ago 132MB +malcolmnetsec/htadmin 6.4.0 xxxxxxxxxxxx 3 days ago 242MB +malcolmnetsec/logstash-oss 6.4.0 xxxxxxxxxxxx 3 days ago 1.35GB +malcolmnetsec/name-map-ui 6.4.0 xxxxxxxxxxxx 3 days ago 143MB +malcolmnetsec/netbox 6.4.0 xxxxxxxxxxxx 3 days ago 1.01GB +malcolmnetsec/nginx-proxy 6.4.0 xxxxxxxxxxxx 3 days ago 121MB +malcolmnetsec/opensearch 6.4.0 xxxxxxxxxxxx 3 days ago 1.17GB +malcolmnetsec/pcap-capture 6.4.0 xxxxxxxxxxxx 3 days ago 121MB +malcolmnetsec/pcap-monitor 6.4.0 xxxxxxxxxxxx 3 days ago 213MB +malcolmnetsec/postgresql 6.4.0 xxxxxxxxxxxx 3 days ago 268MB +malcolmnetsec/redis 6.4.0 xxxxxxxxxxxx 3 days ago 34.2MB +malcolmnetsec/suricata 6.4.0 xxxxxxxxxxxx 3 days ago 278MB +malcolmnetsec/zeek 6.4.0 xxxxxxxxxxxx 3 days ago 1GB +``` + +### Import from pre-packaged tarballs + +Once built, the `malcolm_appliance_packager.sh` script can be used to create pre-packaged Malcolm tarballs for import on another machine. See [Pre-Packaged Installation Files](development.md#Packager) for more information. + +## Starting and stopping Malcolm + +Use the scripts in the `scripts/` directory to start and stop Malcolm, view debug logs of a currently running +instance, wipe the database and restore Malcolm to a fresh state, etc. 
+
+## User interface
+
+A few minutes after starting Malcolm (probably 5 to 10 minutes for Logstash to be completely up, depending on the system), the following services will be accessible:
+
+* [Arkime](https://arkime.com/): [https://localhost:443](https://localhost:443)
+* [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/): [https://localhost/dashboards/](https://localhost/dashboards/) or [https://localhost:5601](https://localhost:5601)
+* [Capture File and Log Archive Upload (Web)](upload.md#Upload): [https://localhost/upload/](https://localhost/upload/)
+* [Capture File and Log Archive Upload (SFTP)](upload.md#Upload): `sftp://username@127.0.0.1:8022/files`
+* [Host and Subnet Name Mapping](host-and-subnet-mapping.md#HostAndSubnetNaming) Editor: [https://localhost/name-map-ui/](https://localhost/name-map-ui/)
+* [NetBox](netbox.md#NetBox): [https://localhost/netbox/](https://localhost/netbox/)
+* [Account Management](authsetup.md#AuthBasicAccountManagement): [https://localhost:488](https://localhost:488)
\ No newline at end of file
diff --git a/docs/running.md b/docs/running.md
new file mode 100644
index 000000000..a2db0d40d
--- /dev/null
+++ b/docs/running.md
@@ -0,0 +1,51 @@
+# Running Malcolm
+
+* [Running Malcolm](#Running)
+  - [OpenSearch instances](opensearch-instances.md#OpenSearchInstance)
+    + [Authentication and authorization for remote OpenSearch clusters](opensearch-instances.md#OpenSearchAuth)
+  - [Starting Malcolm](#Starting)
+  - [Stopping and restarting Malcolm](#StopAndRestart)
+  - [Clearing Malcolm's data](#Wipe)
+  - [Temporary read-only interface](#ReadOnlyUI)
+
+## Starting Malcolm
+
+[Docker Compose](https://docs.docker.com/compose/) is used to coordinate running the Docker containers. To start Malcolm, navigate to the directory containing `docker-compose.yml` and run:
+```
+$ ./scripts/start
+```
+This will create the containers' virtual network and instantiate them, then leave them running in the background. The Malcolm containers may take several minutes to start up completely. To follow the debug output for an already-running Malcolm instance, run:
+```
+$ ./scripts/logs
+```
+You can also use `docker stats` to monitor the resource utilization of running containers.
+
+## Stopping and restarting Malcolm
+
+You can run `./scripts/stop` to stop the Docker containers and remove their virtual network. Alternatively, `./scripts/restart` will restart an instance of Malcolm. Because the data on disk is stored on the host in Docker volumes, these operations will not result in loss of data.
+
+Malcolm can be configured to be automatically restarted when the Docker system daemon restarts (for example, on system reboot). This behavior depends on the [value](https://docs.docker.com/config/containers/start-containers-automatically/) of the [`restart:`](https://docs.docker.com/compose/compose-file/#restart) setting for each service in the `docker-compose.yml` file (an illustrative excerpt is shown below). This value can be set by running [`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning) and answering "yes" to "`Restart Malcolm upon system or Docker daemon restart?`."
+
+## Clearing Malcolm's data
+
+Run `./scripts/wipe` to stop the Malcolm instance and wipe its OpenSearch database (**including** [index snapshots and management policies](index-management.md#IndexManagement) and [alerting configuration](alerting.md#Alerting)). 
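+
+For reference, the restart behavior described under **Stopping and restarting Malcolm** above is recorded per-service in `docker-compose.yml`. The excerpt below is only a sketch of what such an entry might look like (the service name and image tag are illustrative); in practice `./scripts/install.py --configure` manages this value for you:
+
+```
+services:
+  opensearch:
+    image: malcolmnetsec/opensearch:6.4.0
+    restart: unless-stopped    # one of "no", on-failure, always, unless-stopped
+    # … remaining service settings unchanged …
+```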
+
+## Temporary read-only interface
+
+To temporarily set the Malcolm user interfaces into a read-only configuration, run the following commands from the Malcolm installation directory.
+
+First, to configure Nginx to disable access to the upload and other interfaces for changing Malcolm settings, and to deny HTTP methods other than `GET` and `POST`:
+
+```
+docker-compose exec nginx-proxy bash -c "cp /etc/nginx/nginx_readonly.conf /etc/nginx/nginx.conf && nginx -s reload"
+```
+
+Second, to set the existing OpenSearch data store to read-only:
+
+```
+docker-compose exec dashboards-helper /data/opensearch_read_only.py -i _cluster
+```
+
+These commands must be re-run every time you restart Malcolm.
+
+Note that after you run these commands you may see an increase in error messages in the Malcolm containers' output, as various background processes will fail due to the read-only nature of the indices. Additionally, some features such as Arkime's [Hunt](arkime.md#ArkimeHunt) and [building your own visualizations and dashboards](dashboards.md#BuildDashboard) in OpenSearch Dashboards will not function correctly in read-only mode.
\ No newline at end of file
diff --git a/docs/severity.md b/docs/severity.md
new file mode 100644
index 000000000..2d8da26f0
--- /dev/null
+++ b/docs/severity.md
@@ -0,0 +1,47 @@
+# Event severity scoring
+
+* [Event severity scoring](#Severity)
+  - [Customizing event severity scoring](#SeverityConfig)
+
+As Zeek logs are parsed and enriched prior to indexing, a severity score up to `100` (a higher score indicating a more severe event) can be assigned when one or more of the following conditions are met:
+
+* cross-segment network traffic (if [network subnets were defined](host-and-subnet-mapping.md#HostAndSubnetNaming))
+* connection origination and destination (e.g., inbound, outbound, external, internal)
+* traffic to or from sensitive countries
+  - The comma-separated list of countries (by [ISO 3166-1 alpha-2 code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Current_codes)) can be customized by setting the `SENSITIVE_COUNTRY_CODES` environment variable in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml).
+* domain names (from DNS queries and SSL server names) with high entropy as calculated by [freq](https://github.com/MarkBaggett/freq)
+  - The entropy threshold for this condition to trigger can be adjusted by setting the `FREQ_SEVERITY_THRESHOLD` environment variable in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml). A lower value will assign severity scores to fewer domain names (only those with higher entropy, e.g., `2.0` for `NQZHTFHRMYMTVBQJE.COM`), while a higher value will assign severity scores to more domain names, including those with lower entropy (e.g., `7.5` for `naturallanguagedomain.example.org`). 
+* file transfers (categorized by mime type)
+* `notice.log`, [`intel.log`](zeek-intel.md#ZeekIntel) and `weird.log` entries, including those generated by Zeek plugins detecting vulnerabilities (see the list of Zeek plugins under [Components](components.md#Components))
+* detection of cleartext passwords
+* use of insecure or outdated protocols
+* tunneled traffic or use of VPN protocols
+* rejected or aborted connections
+* common network services communicating over non-standard ports
+* file scanning engine hits on [extracted files](file-scanning.md#ZeekFileExtraction)
+* large connection or file transfer
+  - The size (in megabytes) threshold for this condition to trigger can be adjusted by setting the `TOTAL_MEGABYTES_SEVERITY_THRESHOLD` environment variable in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml).
+* long connection duration
+  - The duration (in seconds) threshold for this condition to trigger can be adjusted by setting the `CONNECTION_SECONDS_SEVERITY_THRESHOLD` environment variable in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml).
+
+As this [feature](https://github.com/idaholab/Malcolm/issues/19) is improved, it's expected that additional categories will be identified and implemented for severity scoring.
+
+When a Zeek log satisfies more than one of these conditions, its severity scores will be summed, with a maximum score of `100`. A Zeek log's severity score is indexed in the `event.severity` field and the conditions which contributed to its score are indexed in `event.severity_tags`.
+
+![The Severity dashboard](./images/screenshots/dashboards_severity.png)
+
+## Customizing event severity scoring
+
+These categories' severity scores can be customized by editing `logstash/maps/malcolm_severity.yaml`:
+
+* Each category can be assigned a number between `1` and `100` for severity scoring.
+* Any category may be disabled by assigning it a score of `0`.
+* A severity score can be assigned for any [supported protocol](protocols.md#Protocols) by adding an entry with the key formatted like `"PROTOCOL_XYZ"`, where `XYZ` is the uppercased value of the protocol as stored in the `network.protocol` field. For example, to assign a score of `40` to Zeek logs generated for SSH traffic, you could add the following line to `malcolm_severity.yaml`:
+
+```
+"PROTOCOL_SSH": 40
+```
+
+Restart Logstash after modifying `malcolm_severity.yaml` for the changes to take effect. The [hostname and CIDR subnet names interface](host-and-subnet-mapping.md#NameMapUI) provides a convenient button for restarting Logstash.
+
+Severity scoring can be disabled globally by setting the `LOGSTASH_SEVERITY_SCORING` environment variable to `false` in the [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) file and [restarting Malcolm](running.md#StopAndRestart).
\ No newline at end of file
diff --git a/docs/system-requirements.md b/docs/system-requirements.md
new file mode 100644
index 000000000..e789504c1
--- /dev/null
+++ b/docs/system-requirements.md
@@ -0,0 +1,7 @@
+# Recommended system requirements
+
+Malcolm runs on top of [Docker](https://www.docker.com/), which runs on recent releases of Linux, Apple macOS and Microsoft Windows 10.
+
+To quote the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html), "If there is one resource that you will run out of first, it will likely be memory." The same is true for Malcolm: you will want at least 16 gigabytes of RAM to run Malcolm comfortably. 
For processing large volumes of traffic, we recommend at a bare minimum a dedicated server with 16 cores and 16 gigabytes of RAM. Malcolm can run on less, but more is better. You're going to want as much hard drive space as possible, of course, as the amount of PCAP data you're able to analyze and store will be limited by available disk space.
+
+Arkime's wiki has a few documents ([here](https://github.com/arkime/arkime#hardware-requirements), [here](https://github.com/arkime/arkime/wiki/FAQ#what-kind-of-capture-machines-should-we-buy), and [here](https://github.com/arkime/arkime/wiki/FAQ#how-many-elasticsearch-nodes-or-machines-do-i-need), plus a [calculator here](https://molo.ch/#estimators)) which may be helpful, although not everything in those documents will apply to a Docker-based setup like Malcolm.
\ No newline at end of file
diff --git a/scripts/third-party-logs/README.md b/docs/third-party-logs.md
similarity index 88%
rename from scripts/third-party-logs/README.md
rename to docs/third-party-logs.md
index 4e4b58530..804e8f66c 100644
--- a/scripts/third-party-logs/README.md
+++ b/docs/third-party-logs.md
@@ -12,10 +12,9 @@ Malcolm uses [OpenSearch](https://opensearch.org/) and [OpenSearch Dashboards](h
 * Messages in the form of MQTT control packets
 * many more...
 
-The types of third-party logs and metrics discussed in this document are *not* the same as the network session metadata provided by Arkime, Zeek and Suricata. Please refer to the [Malcolm Contributor Guide](../../docs/contributing/README.md) for information on integrating a new network traffic analysis provider.
-
-## Table of Contents
+The types of third-party logs and metrics discussed in this document are *not* the same as the network session metadata provided by Arkime, Zeek and Suricata. Please refer to the [Malcolm Contributor Guide](contributing-guide.md) for information on integrating a new network traffic analysis provider.
+
 * [Configuring Malcolm](#Malcolm)
     - [Secure communication](#MalcolmTLS)
 * [Fluent Bit](#FluentBit)
@@ -27,7 +26,7 @@ The types of third-party logs and metrics discussed in this document are *not* t
 
 ## Configuring Malcolm
 
-The environment variables in [`docker-compose.yml`](../../README.md#DockerComposeYml) for configuring how Malcolm accepts external logs are prefixed with `FILEBEAT_TCP_…`. These values can be specified during Malcolm configuration (i.e., when running [`./scripts/install.py --configure`](../../README.md#ConfigAndTuning)), as can be seen from the following excerpt from the [Installation example](../../README.md#InstallationExample):
+The environment variables in [`docker-compose.yml`](malcolm-config.md#DockerComposeYml) for configuring how Malcolm accepts external logs are prefixed with `FILEBEAT_TCP_…`. These values can be specified during Malcolm configuration (i.e., when running [`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning)), as can be seen from the following excerpt from the [Installation example](ubuntu-install-example.md#InstallationExample):
 
 ```
 …
@@ -57,11 +56,11 @@ The variables corresponding to these questions can be found in the `filebeat-var
 * `FILEBEAT_TCP_PARSE_DROP_FIELD` - name of field to drop (if it exists) in logs sent to the Filebeat TCP input listener
 * `FILEBEAT_TCP_TAG` - tag to append to events sent to the Filebeat TCP input listener
 
-These variables' values will depend on your forwarder and the format of the data it sends. 
Note that unless you are creating your own [Logstash pipeline](../../docs/contributing/README.md#LogstashNewSource), you probably want to choose the default `_malcolm_beats` for `FILEBEAT_TCP_TAG` in order for your logs to be picked up and ingested through Malcolm's `beats` pipeline. +These variables' values will depend on your forwarder and the format of the data it sends. Note that unless you are creating your own [Logstash pipeline](contributing-logstash.md#LogstashNewSource), you probably want to choose the default `_malcolm_beats` for `FILEBEAT_TCP_TAG` in order for your logs to be picked up and ingested through Malcolm's `beats` pipeline. ### Secure communication -In order to maintain the integrity and confidentiality of your data, Malcolm's default (set via the `BEATS_SSL` environment variable in `docker-compose.yml`) is to require connections from external forwarders to be encrypted using TLS. When [`./scripts/auth_setup`](../../README.md#AuthSetup) is run, self-signed certificates are generated which may be used by remote log forwarders. Located in the `filebeat/certs/` directory, the certificate authority and client certificate and key files should be copied to the host on which your forwarder is running and used when defining its settings for connecting to Malcolm. +In order to maintain the integrity and confidentiality of your data, Malcolm's default (set via the `BEATS_SSL` environment variable in `docker-compose.yml`) is to require connections from external forwarders to be encrypted using TLS. When [`./scripts/auth_setup`](authsetup.md#AuthSetup) is run, self-signed certificates are generated which may be used by remote log forwarders. Located in the `filebeat/certs/` directory, the certificate authority and client certificate and key files should be copied to the host on which your forwarder is running and used when defining its settings for connecting to Malcolm. ## Fluent Bit @@ -277,7 +276,7 @@ Running fluentbit_winev... fluentbit_winevtlog Elastic [Beats](https://www.elastic.co/beats/) can also be used to forward data to Malcolm's Filebeat TCP listener. Follow the [Get started with Beats](https://www.elastic.co/guide/en/beats/libbeat/current/getting-started.html) documentation for configuring Beats on your system. -In contrast to Fluent Bit, Beats forwarders write to Malcolm's Logstash input over TCP port 5044 (rather than its Filebeat TCP input). Answer `Y` when prompted `Expose Logstash port to external hosts?` during Malcolm configuration (i.e., when running [`./scripts/install.py --configure`](../../README.md#ConfigAndTuning)) to allow external remote Beats forwarders to send logs to Logstash. +In contrast to Fluent Bit, Beats forwarders write to Malcolm's Logstash input over TCP port 5044 (rather than its Filebeat TCP input). Answer `Y` when prompted `Expose Logstash port to external hosts?` during Malcolm configuration (i.e., when running [`./scripts/install.py --configure`](malcolm-config.md#ConfigAndTuning)) to allow external remote Beats forwarders to send logs to Logstash. 
Your Beat's [configuration YML file](https://www.elastic.co/guide/en/beats/libbeat/current/config-file-format.html) might look something like this sample [filebeat.yml](https://www.elastic.co/guide/en/beats/filebeat/current/configuring-howto-filebeat.html) file:
@@ -302,7 +301,7 @@ output.logstash:
   ssl.verification_mode: "none"
 ```
 
-The important bits to note in this example are the settings under [`output.logstash`](https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html) (including the TLS-related files described above in **Configuring Malcolm**) and the `_malcolm_beats` value in [`tags`](https://www.elastic.co/guide/en/beats/filebeat/current/add-tags.html): unless you are creating your own [Logstash pipeline](../../docs/contributing/README.md#LogstashNewSource), you probably want to use `_malcolm_beats` in order for your logs to be picked up and ingested through Malcolm's `beats` pipeline. This parts should apply regardless of the specific Beats forwarder you're using (e.g., Filebeat, Metricbeat, Winlogbeat, etc.).
+The important bits to note in this example are the settings under [`output.logstash`](https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html) (including the TLS-related files described above in **Configuring Malcolm**) and the `_malcolm_beats` value in [`tags`](https://www.elastic.co/guide/en/beats/filebeat/current/add-tags.html): unless you are creating your own [Logstash pipeline](contributing-logstash.md#LogstashNewSource), you probably want to use `_malcolm_beats` in order for your logs to be picked up and ingested through Malcolm's `beats` pipeline. These parts should apply regardless of the specific Beats forwarder you're using (e.g., Filebeat, Metricbeat, Winlogbeat, etc.).
 
 Most Beats forwarders can use [processors](https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html) to filter, transform and enhance data prior to sending it to Malcolm. Consult each forwarder's [documentation](https://www.elastic.co/beats/) to learn more about what processors are available and how to configure them. Use the [Console output](https://www.elastic.co/guide/en/beats/filebeat/current/console-output.html) for debugging and experimenting with how Beats forwarders format the logs they generate.
 
@@ -310,18 +309,18 @@ Most Beats forwarders can use [processors](https://www.elastic.co/guide/en/beats
 
 Because Malcolm could receive logs or metrics from virtually any provider, Malcolm most likely does not have prebuilt dashboards and visualizations for your third-party logs. Luckily, [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) provides visualization tools that can be used with whatever data is stored in Malcolm's OpenSearch document store. 
Here are some resources to help you get started understanding OpenSearch Dashboards and building custom visualizations for your data:
 
-* [OpenSearch Dashboards](../../README.md#Dashboards) in the Malcolm documentation
+* [OpenSearch Dashboards](dashboards.md) in the Malcolm documentation
 * [OpenSearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/) documentation
 * [Kibana User Guide](https://www.elastic.co/guide/en/kibana/7.10/index.html) (OpenSearch Dashboards is an open-source fork of Kibana, so much of its documentation also applies to OpenSearch Dashboards)
     - [Discover](https://www.elastic.co/guide/en/kibana/7.10/discover.html)
     - [Searching Your Data](https://www.elastic.co/guide/en/kibana/7.10/search.html)
     - [Kibana Dashboards](https://www.elastic.co/guide/en/kibana/7.10/dashboard.html)
     - [Timelion](https://www.elastic.co/guide/en/kibana/7.12/timelion.html)
-* [Search Queries in Arkime and OpenSearch](../../README.md#SearchCheatSheet)
+* [Search Queries in Arkime and OpenSearch](queries-cheat-sheet.md#SearchCheatSheet)
 
 ## Document Indices
 
-Third-party logs ingested into Malcolm as outlined in this document will be indexed into the `malcolm_beats_*` index pattern (unless you've created your own [Logstash pipeline](../../docs/contributing/README.md#LogstashNewSource)), which can be selected in the OpenSearch Dashboards' Discover view or when specifying the log source for a new visualization.
+Third-party logs ingested into Malcolm as outlined in this document will be indexed into the `malcolm_beats_*` index pattern (unless you've created your own [Logstash pipeline](contributing-logstash.md#LogstashNewSource)), which can be selected in the OpenSearch Dashboards' Discover view or when specifying the log source for a new visualization.
 
 Because these documents are indexed by OpenSearch dynamically as they are ingested by Logstash, their component fields will not show up as searchable in OpenSearch Dashboards visualizations until OpenSearch Dashboards' copy of the field list is refreshed. Malcolm periodically refreshes this list, but if fields are missing from your visualizations you may wish to do it manually.
diff --git a/docs/time-sync.md b/docs/time-sync.md
new file mode 100644
index 000000000..f4c3500fc
--- /dev/null
+++ b/docs/time-sync.md
@@ -0,0 +1,9 @@
+# Time synchronization
+
+If you wish to set up time synchronization via [NTP](http://www.ntp.org/) or `htpdate`, open a terminal and run `sudo configure-interfaces.py`. Select **Continue**, then choose **Time Sync**. Here you can configure the operating system to keep its time synchronized with either an NTP server (using the NTP protocol), another Malcolm instance, or another HTTP/HTTPS server. On the next dialog, choose the time synchronization method you wish to configure.
+
+If **htpdate** is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port `9200` may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server.
+
+If **ntpdate** is selected, you will be prompted to enter the IP address or hostname of the NTP server.
+
+Upon configuring time synchronization, a "Time synchronization configured successfully!" message will be displayed. 
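+
+The dialogs above perform this configuration for you; as a rough sketch only, the synchronization they set up is similar in effect to the following manual commands (the host names are illustrative, and the `htpdate` flag behavior is an assumption that should be confirmed against the version installed on your system):
+
+```
+# one-shot synchronization against an NTP server (the NTP method)
+sudo ntpdate time.example.org
+
+# gradually adjust the clock using an HTTP server's Date: header,
+# e.g., another Malcolm instance listening on port 9200 (the htpdate method)
+sudo htpdate -a malcolm.example.org:9200
+```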
\ No newline at end of file diff --git a/docs/ubuntu-install-example.md b/docs/ubuntu-install-example.md new file mode 100644 index 000000000..c61cfaac4 --- /dev/null +++ b/docs/ubuntu-install-example.md @@ -0,0 +1,323 @@ +# Installation example using Ubuntu 22.04 LTS + +Here's a step-by-step example of getting [Malcolm from GitHub](https://github.com/idaholab/Malcolm/tree/main), configuring your system and your Malcolm instance, and running it on a system running Ubuntu Linux. Your mileage may vary depending on your individual system configuration, but this should be a good starting point. + +The commands in this example should be executed as a non-root user. + +You can use `git` to clone Malcolm into a local working copy, or you can download and extract the artifacts from the [latest release](https://github.com/idaholab/Malcolm/releases). + +To install Malcolm from the latest Malcolm release, browse to the [Malcolm releases page on GitHub](https://github.com/idaholab/Malcolm/releases) and download at a minimum `install.py` and the `malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz` file, then navigate to your downloads directory: +``` +user@host:~$ cd Downloads/ +user@host:~/Downloads$ ls +malcolm_common.py install.py malcolm_20190611_095410_ce2d8de.tar.gz +``` + +If you are obtaining Malcolm using `git` instead, run the following command to clone Malcolm into a local working copy: +``` +user@host:~$ git clone https://github.com/idaholab/Malcolm +Cloning into 'Malcolm'... +remote: Enumerating objects: 443, done. +remote: Counting objects: 100% (443/443), done. +remote: Compressing objects: 100% (310/310), done. +remote: Total 443 (delta 81), reused 441 (delta 79), pack-reused 0 +Receiving objects: 100% (443/443), 6.87 MiB | 18.86 MiB/s, done. +Resolving deltas: 100% (81/81), done. + +user@host:~$ cd Malcolm/ +``` + +Next, run the `install.py` script to configure your system. Replace `user` in this example with your local account username, and follow the prompts. Most questions have an acceptable default you can accept by pressing the `Enter` key. Depending on whether you are installing Malcolm from the release tarball or inside of a git working copy, the questions below will be slightly different, but for the most part are the same. +``` +user@host:~/Malcolm$ sudo ./scripts/install.py +Installing required packages: ['apache2-utils', 'make', 'openssl', 'python3-dialog'] + +"docker info" failed, attempt to install Docker? (Y/n): y + +Attempt to install Docker using official repositories? (Y/n): y +Installing required packages: ['apt-transport-https', 'ca-certificates', 'curl', 'gnupg-agent', 'software-properties-common'] +Installing docker packages: ['docker-ce', 'docker-ce-cli', 'containerd.io'] +Installation of docker packages apparently succeeded + +Add a non-root user to the "docker" group?: y + +Enter user account: user + +Add another non-root user to the "docker" group?: n + +"docker-compose version" failed, attempt to install docker-compose? (Y/n): y + +Install docker-compose directly from docker github? (Y/n): y +Download and installation of docker-compose apparently succeeded + +fs.file-max increases allowed maximum for file handles +fs.file-max= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +fs.inotify.max_user_watches increases allowed maximum for monitored files +fs.inotify.max_user_watches= appears to be missing from /etc/sysctl.conf, append it? 
(Y/n): y + +fs.inotify.max_queued_events increases queue size for monitored files +fs.inotify.max_queued_events= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +fs.inotify.max_user_instances increases allowed maximum monitor file watchers +fs.inotify.max_user_instances= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +vm.max_map_count increases allowed maximum for memory segments +vm.max_map_count= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +net.core.somaxconn increases allowed maximum for socket connections +net.core.somaxconn= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +vm.swappiness adjusts the preference of the system to swap vs. drop runtime memory pages +vm.swappiness= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +vm.dirty_background_ratio defines the percentage of system memory fillable with "dirty" pages before flushing +vm.dirty_background_ratio= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +vm.dirty_ratio defines the maximum percentage of dirty system memory before committing everything +vm.dirty_ratio= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y + +/etc/security/limits.d/limits.conf increases the allowed maximums for file handles and memlocked segments +/etc/security/limits.d/limits.conf does not exist, create it? (Y/n): y +``` + +If you are configuring Malcolm from within a git working copy, `install.py` will now exit. Run `install.py` again like you did at the beginning of the example, only remove the `sudo` and add `--configure` to run `install.py` in "configuration only" mode. +``` +user@host:~/Malcolm$ ./scripts/install.py --configure +``` + +Alternately, if you are configuring Malcolm from the release tarball you will be asked if you would like to extract the contents of the tarball and to specify the installation directory and `install.py` will continue: +``` +Extract Malcolm runtime files from /home/user/Downloads/malcolm_20190611_095410_ce2d8de.tar.gz (Y/n): y + +Enter installation path for Malcolm [/home/user/Downloads/malcolm]: /home/user/Malcolm +Malcolm runtime files extracted to /home/user/Malcolm +``` + +Now that any necessary system configuration changes have been made, the local Malcolm instance will be configured: +``` +Malcolm processes will run as UID 1000 and GID 1000. Is this OK? (Y/n): y + +Should Malcolm use and maintain its own OpenSearch instance? (Y/n): y + +Forward Logstash logs to a secondary remote OpenSearch instance? (y/N): n + +Setting 10g for OpenSearch and 3g for Logstash. Is this OK? (Y/n): y + +Setting 3 workers for Logstash pipelines. Is this OK? (Y/n): y + +Restart Malcolm upon system or Docker daemon restart? (y/N): y +1: no +2: on-failure +3: always +4: unless-stopped +Select Malcolm restart behavior (unless-stopped): 4 + +Require encrypted HTTPS connections? (Y/n): y + +Will Malcolm be running behind another reverse proxy (Traefik, Caddy, etc.)? (y/N): n + +Specify external Docker network name (or leave blank for default networking) (): + +Authenticate against Lightweight Directory Access Protocol (LDAP) server? (y/N): n + +Store OpenSearch index snapshots locally in /home/user/Malcolm/opensearch-backup? (Y/n): y + +Compress OpenSearch index snapshots? (y/N): n + +Delete the oldest indices when the database exceeds a certain size? (y/N): n + +Automatically analyze all PCAP files with Suricata? (Y/n): y + +Download updated Suricata signatures periodically? 
(Y/n): y + +Automatically analyze all PCAP files with Zeek? (Y/n): y + +Perform reverse DNS lookup locally for source and destination IP addresses in logs? (y/N): n + +Perform hardware vendor OUI lookups for MAC addresses? (Y/n): y + +Perform string randomness scoring on some fields? (Y/n): y + +Expose OpenSearch port to external hosts? (y/N): n + +Expose Logstash port to external hosts? (y/N): n + +Expose Filebeat TCP port to external hosts? (y/N): y +1: json +2: raw +Select log format for messages sent to Filebeat TCP listener (json): 1 + +Source field to parse for messages sent to Filebeat TCP listener (message): message + +Target field under which to store decoded JSON fields for messages sent to Filebeat TCP listener (miscbeat): miscbeat + +Field to drop from events sent to Filebeat TCP listener (message): message + +Tag to apply to messages sent to Filebeat TCP listener (_malcolm_beats): _malcolm_beats + +Expose SFTP server (for PCAP upload) to external hosts? (y/N): n + +Enable file extraction with Zeek? (y/N): y +1: none +2: known +3: mapped +4: all +5: interesting +Select file extraction behavior (none): 5 +1: quarantined +2: all +3: none +Select file preservation behavior (quarantined): 1 + +Scan extracted files with ClamAV? (y/N): y + +Scan extracted files with Yara? (y/N): y + +Scan extracted PE files with Capa? (y/N): y + +Lookup extracted file hashes with VirusTotal? (y/N): n + +Download updated file scanner signatures periodically? (Y/n): y + +Should Malcolm capture live network traffic to PCAP files for analysis with Arkime? (y/N): y + +Capture packets using netsniff-ng? (Y/n): y + +Capture packets using tcpdump? (y/N): n + +Should Malcolm analyze live network traffic with Suricata? (y/N): y + +Should Malcolm analyze live network traffic with Zeek? (y/N): y + +Specify capture interface(s) (comma-separated): eth0 + +Capture filter (tcpdump-like filter expression; leave blank to capture all traffic) (): not port 5044 and not port 8005 and not port 9200 + +Disable capture interface hardware offloading and adjust ring buffer sizes? (y/N): n + +Malcolm has been installed to /home/user/Malcolm. See README.md for more information. +Scripts for starting and stopping Malcolm and changing authentication-related settings can be found in /home/user/Malcolm/scripts. +``` + +At this point you should **reboot your computer** so that the new system settings can be applied. After rebooting, log back in and return to the directory to which Malcolm was installed (or to which the git working copy was cloned). + +Now we need to [set up authentication](authsetup.md#AuthSetup) and generate some unique self-signed TLS certificates. You can replace `analyst` in this example with whatever username you wish to use to log in to the Malcolm web interface. +``` +user@host:~/Malcolm$ ./scripts/auth_setup + +Store administrator username/password for local Malcolm access? (Y/n): y + +Administrator username: analyst +analyst password: +analyst password (again): + +(Re)generate self-signed certificates for HTTPS access (Y/n): y + +(Re)generate self-signed certificates for a remote log forwarder (Y/n): y + +Store username/password for primary remote OpenSearch instance? (y/N): n + +Store username/password for secondary remote OpenSearch instance? (y/N): n + +Store username/password for email alert sender account? 
(y/N): n + +(Re)generate internal passwords for NetBox (Y/n): y +``` + +For now, rather than [build Malcolm from scratch](development.md#Build), we'll pull images from [Docker Hub](https://hub.docker.com/u/malcolmnetsec): +``` +user@host:~/Malcolm$ docker-compose pull +Pulling api ... done +Pulling arkime ... done +Pulling dashboards ... done +Pulling dashboards-helper ... done +Pulling file-monitor ... done +Pulling filebeat ... done +Pulling freq ... done +Pulling htadmin ... done +Pulling logstash ... done +Pulling name-map-ui ... done +Pulling netbox ... done +Pulling netbox-postgresql ... done +Pulling netbox-redis ... done +Pulling nginx-proxy ... done +Pulling opensearch ... done +Pulling pcap-capture ... done +Pulling pcap-monitor ... done +Pulling suricata ... done +Pulling upload ... done +Pulling zeek ... done + +user@host:~/Malcolm$ docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +malcolmnetsec/api 6.4.0 xxxxxxxxxxxx 3 days ago 158MB +malcolmnetsec/arkime 6.4.0 xxxxxxxxxxxx 3 days ago 816MB +malcolmnetsec/dashboards 6.4.0 xxxxxxxxxxxx 3 days ago 1.02GB +malcolmnetsec/dashboards-helper 6.4.0 xxxxxxxxxxxx 3 days ago 184MB +malcolmnetsec/file-monitor 6.4.0 xxxxxxxxxxxx 3 days ago 588MB +malcolmnetsec/file-upload 6.4.0 xxxxxxxxxxxx 3 days ago 259MB +malcolmnetsec/filebeat-oss 6.4.0 xxxxxxxxxxxx 3 days ago 624MB +malcolmnetsec/freq 6.4.0 xxxxxxxxxxxx 3 days ago 132MB +malcolmnetsec/htadmin 6.4.0 xxxxxxxxxxxx 3 days ago 242MB +malcolmnetsec/logstash-oss 6.4.0 xxxxxxxxxxxx 3 days ago 1.35GB +malcolmnetsec/name-map-ui 6.4.0 xxxxxxxxxxxx 3 days ago 143MB +malcolmnetsec/netbox 6.4.0 xxxxxxxxxxxx 3 days ago 1.01GB +malcolmnetsec/nginx-proxy 6.4.0 xxxxxxxxxxxx 3 days ago 121MB +malcolmnetsec/opensearch 6.4.0 xxxxxxxxxxxx 3 days ago 1.17GB +malcolmnetsec/pcap-capture 6.4.0 xxxxxxxxxxxx 3 days ago 121MB +malcolmnetsec/pcap-monitor 6.4.0 xxxxxxxxxxxx 3 days ago 213MB +malcolmnetsec/postgresql 6.4.0 xxxxxxxxxxxx 3 days ago 268MB +malcolmnetsec/redis 6.4.0 xxxxxxxxxxxx 3 days ago 34.2MB +malcolmnetsec/suricata 6.4.0 xxxxxxxxxxxx 3 days ago 278MB +malcolmnetsec/zeek 6.4.0 xxxxxxxxxxxx 3 days ago 1GB +``` + +Finally, we can start Malcolm. When Malcolm starts it will stream informational and debug messages to the console. If you wish, you can safely close the console or use `Ctrl+C` to stop these messages; Malcolm will continue running in the background. 
+``` +user@host:~/Malcolm$ ./scripts/start +In a few minutes, Malcolm services will be accessible via the following URLs: +------------------------------------------------------------------------------ + - Arkime: https://localhost/ + - OpenSearch Dashboards: https://localhost/dashboards/ + - PCAP upload (web): https://localhost/upload/ + - PCAP upload (sftp): sftp://username@127.0.0.1:8022/files/ + - Host and subnet name mapping editor: https://localhost/name-map-ui/ + - NetBox: https://localhost/netbox/ + - Account management: https://localhost:488/ + +NAME COMMAND SERVICE STATUS PORTS +malcolm-api-1 "/usr/local/bin/dock…" api running (starting) … +malcolm-arkime-1 "/usr/local/bin/dock…" arkime running (starting) … +malcolm-dashboards-1 "/usr/local/bin/dock…" dashboards running (starting) … +malcolm-dashboards-helper-1 "/usr/local/bin/dock…" dashboards-helper running (starting) … +malcolm-file-monitor-1 "/usr/local/bin/dock…" file-monitor running (starting) … +malcolm-filebeat-1 "/usr/local/bin/dock…" filebeat running (starting) … +malcolm-freq-1 "/usr/local/bin/dock…" freq running (starting) … +malcolm-htadmin-1 "/usr/local/bin/dock…" htadmin running (starting) … +malcolm-logstash-1 "/usr/local/bin/dock…" logstash running (starting) … +malcolm-name-map-ui-1 "/usr/local/bin/dock…" name-map-ui running (starting) … +malcolm-netbox-1 "/usr/bin/tini -- /u…" netbox running (starting) … +malcolm-netbox-postgres-1 "/usr/bin/docker-uid…" netbox-postgres running (starting) … +malcolm-netbox-redis-1 "/sbin/tini -- /usr/…" netbox-redis running (starting) … +malcolm-netbox-redis-cache-1 "/sbin/tini -- /usr/…" netbox-redis-cache running (starting) … +malcolm-nginx-proxy-1 "/usr/local/bin/dock…" nginx-proxy running (starting) … +malcolm-opensearch-1 "/usr/local/bin/dock…" opensearch running (starting) … +malcolm-pcap-capture-1 "/usr/local/bin/dock…" pcap-capture running … +malcolm-pcap-monitor-1 "/usr/local/bin/dock…" pcap-monitor running (starting) … +malcolm-suricata-1 "/usr/local/bin/dock…" suricata running (starting) … +malcolm-suricata-live-1 "/usr/local/bin/dock…" suricata-live running … +malcolm-upload-1 "/usr/local/bin/dock…" upload running (starting) … +malcolm-zeek-1 "/usr/local/bin/dock…" zeek running (starting) … +malcolm-zeek-live-1 "/usr/local/bin/dock…" zeek-live running … +… +``` + +It will take several minutes for all of Malcolm's components to start up. Logstash will take the longest, probably 3 to 5 minutes. You'll know Logstash is fully ready when you see Logstash spit out a bunch of starting up messages, ending with this: +``` +… +malcolm-logstash-1 | [2022-07-27T20:27:52,056][INFO ][logstash.agent ] Pipelines running {:count=>6, :running_pipelines=>[:"malcolm-input", :"malcolm-output", :"malcolm-beats", :"malcolm-suricata", :"malcolm-enrichment", :"malcolm-zeek"], :non_running_pipelines=>[]} +… +``` + +You can now open a web browser and navigate to one of the [Malcolm user interfaces](quickstart.md#UserInterfaceURLs). \ No newline at end of file diff --git a/docs/upload.md b/docs/upload.md new file mode 100644 index 000000000..a0a4e5fe1 --- /dev/null +++ b/docs/upload.md @@ -0,0 +1,30 @@ +# Capture file and log archive upload + +* [Capture file and log archive upload](#Upload) + - [Tagging](#Tagging) + - [Processing uploaded PCAPs with Zeek and Suricata](#UploadPCAPProcessors) + +Malcolm serves a web browser-based upload form for uploading PCAP files and Zeek logs at [https://localhost/upload/](https://localhost/upload/) if you are connecting locally. 
+ +![Capture File and Log Archive Upload](./images/screenshots/malcolm_upload.png) + +Additionally, there is a writable `files` directory on an SFTP server served on port 8022 (e.g., `sftp://USERNAME@localhost:8022/files/` if you are connecting locally). + +The types of files supported are: + +* PCAP files (of mime type `application/vnd.tcpdump.pcap` or `application/x-pcapng`) + - PCAPNG files are *partially* supported: Zeek is able to process PCAPNG files, but not all of Arkime's packet examination features work correctly +* Zeek logs in archive files (`application/gzip`, `application/x-gzip`, `application/x-7z-compressed`, `application/x-bzip2`, `application/x-cpio`, `application/x-lzip`, `application/x-lzma`, `application/x-rar-compressed`, `application/x-tar`, `application/x-xz`, or `application/zip`) + - it does not matter where the Zeek logs are located within the archive file's internal directory structure + +Files uploaded via these methods are monitored and moved automatically to other directories for processing to begin, generally within one minute of completion of the upload. + +## Tagging + +In addition to being processed for uploading, Malcolm events will be tagged according to the components of the filenames of the PCAP files or Zeek log archive files from which the events were parsed. For example, records created from a PCAP file named `ACME_Scada_VLAN10.pcap` would be tagged with `ACME`, `Scada`, and `VLAN10`. Tags are extracted from filenames by splitting on the characters `,` (comma), `-` (dash), and `_` (underscore). These tags are viewable and searchable (via the `tags` field) in Arkime and OpenSearch Dashboards. This behavior can be changed by modifying the `AUTO_TAG` [environment variable in `docker-compose.yml`](malcolm-config.md#DockerComposeYml). + +Tags may also be specified manually with the [browser-based upload form](#Upload). + +## Processing uploaded PCAPs with Zeek and Suricata + +The **Analyze with Zeek** and **Analyze with Suricata** checkboxes may be used when uploading PCAP files to cause them to be analyzed by Zeek and Suricata, respectively. This is functionally equivalent to the `ZEEK_AUTO_ANALYZE_PCAP_FILES` and `SURICATA_AUTO_ANALYZE_PCAP_FILES` environment variables [described above](malcolm-config.md#DockerComposeYml), only on a per-upload basis. Zeek can also automatically carve out files from file transfers; see [Automatic file extraction and scanning](file-scanning.md#ZeekFileExtraction) for more details. diff --git a/docs/zeek-intel.md b/docs/zeek-intel.md new file mode 100644 index 000000000..f60744dd2 --- /dev/null +++ b/docs/zeek-intel.md @@ -0,0 +1,62 @@ +# Zeek Intelligence Framework + +* [Zeek Intelligence Framework](#ZeekIntel) + - [STIX™ and TAXII™](#ZeekIntelSTIX) + - [MISP](#ZeekIntelMISP) + +To quote Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) documentation, "The goals of Zeek’s Intelligence Framework are to consume intelligence data, make it available for matching, and provide infrastructure to improve performance and memory utilization. Data in the Intelligence Framework is an atomic piece of intelligence such as an IP address or an e-mail address. This atomic data will be packed with metadata such as a freeform source field, a freeform descriptive field, and a URL which might lead to more information about the specific item."
Zeek [intelligence](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html) [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type) include IP addresses, URLs, file names, hashes, email addresses, and more. + +Malcolm doesn't come bundled with intelligence files from any particular feed, but they can be easily included in your local instance. On [startup](shared/bin/zeek_intel_setup.sh), Malcolm's `malcolmnetsec/zeek` docker container enumerates the subdirectories under `./zeek/intel` (which is [bind mounted](https://docs.docker.com/storage/bind-mounts/) into the container's runtime) and configures Zeek so that those intelligence files will be automatically included in its local policy. Subdirectories under `./zeek/intel` which contain their own `__load__.zeek` file will be `@load`-ed as-is, while subdirectories containing "loose" intelligence files will be [loaded](https://docs.zeek.org/en/master/frameworks/intel.html#loading-intelligence) automatically with a `redef Intel::read_files` directive. + +Note that Malcolm does not manage updates for these intelligence files. You should use the update mechanism suggested by your feeds' maintainers to keep them up to date, or use a [TAXII](#ZeekIntelSTIX) or [MISP](#ZeekIntelMISP) feed as described below. + +Adding and deleting intelligence files under this directory will take effect upon [restarting Malcolm](running.md#StopAndRestart). Alternately, you can set the `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` environment variable to a [cron expression](https://en.wikipedia.org/wiki/Cron#CRON_expression) specifying the interval at which the intel files should be refreshed. A refresh can also be triggered manually, without restarting Malcolm, by running the following command from the Malcolm installation directory: + +``` +docker-compose exec --user $(id -u) zeek /usr/local/bin/entrypoint.sh true +``` + +For a public example of Zeek intelligence files, see Critical Path Security's [repository](https://github.com/CriticalPathSecurity/Zeek-Intelligence-Feeds), which aggregates data from various other threat feeds into Zeek's format. + +## STIX™ and TAXII™ + +In addition to loading Zeek intelligence files, on startup Malcolm will [automatically generate](shared/bin/zeek_intel_from_threat_feed.py) a Zeek intelligence file for all [Structured Threat Information Expression (STIX™)](https://oasis-open.github.io/cti-documentation/stix/intro.html) [v2.0](https://docs.oasis-open.org/cti/stix/v2.0/stix-v2.0-part1-stix-core.html)/[v2.1](https://docs.oasis-open.org/cti/stix/v2.1/stix-v2.1.html) JSON files found under `./zeek/intel/STIX`. + +Additionally, if a special text file named `.stix_input.txt` is found in `./zeek/intel/STIX`, that file will be read and processed as a list of [TAXII™](https://oasis-open.github.io/cti-documentation/taxii/intro.html) [2.0](http://docs.oasis-open.org/cti/taxii/v2.0/cs01/taxii-v2.0-cs01.html)/[2.1](https://docs.oasis-open.org/cti/taxii/v2.1/csprd02/taxii-v2.1-csprd02.html) feeds, one per line, according to the following format (the username and password are optional): + +``` +taxii|version|discovery_url|collection_name|username|password +``` + +For example: + +``` +taxii|2.0|http://example.org/taxii/|IP Blocklist|guest|guest +taxii|2.1|https://example.com/taxii/api2/|URL Blocklist +… +``` + +Malcolm will attempt to query the TAXII feed(s) for `indicator` STIX objects and convert them to the Zeek intelligence format as described above.
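As a point of reference, Zeek intelligence files are tab-separated text files whose columns are declared by a `#fields` header row. A converted indicator might look something like the following sketch (the values shown are illustrative, not taken from a real feed; columns must be separated by literal tab characters):

```
#fields	indicator	indicator_type	meta.source	meta.desc	meta.url
198.51.100.5	Intel::ADDR	IP Blocklist	example TAXII indicator	http://example.org/taxii/
```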
There are publicly available TAXII 2.x-compatible services provided by a number of organizations including [Anomali Labs](https://www.anomali.com/resources/limo) and [MITRE](https://www.mitre.org/capabilities/cybersecurity/overview/cybersecurity-blog/attck%E2%84%A2-content-available-in-stix%E2%84%A2-20-via), or you may choose from several open-source offerings to roll your own TAXII 2 server (e.g., [oasis-open/cti-taxii-server](https://github.com/oasis-open/cti-taxii-server), [freetaxii/server](https://github.com/freetaxii/server), [StephenOTT/TAXII-Server](https://github.com/StephenOTT/TAXII-Server), etc.). + +Note that only **indicators** of [**cyber-observable objects**](https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html#_mlbmudhl16lr) matched with the **equals (`=`)** [comparison operator](https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html#_t11hn314cr7w) against a **single value** can be expressed as Zeek intelligence items. More complex STIX indicators will be silently ignored. + +## MISP + +In addition to loading Zeek intelligence files, on startup Malcolm will [automatically generate](shared/bin/zeek_intel_from_threat_feed.py) a Zeek intelligence file for all [Malware Information Sharing Platform (MISP)](https://www.misp-project.org/datamodels/) JSON files found under `./zeek/intel/MISP`. + +Additionally, if a special text file named `.misp_input.txt` is found in `./zeek/intel/MISP`, that file will be read and processed as a list of [MISP feed](https://misp.gitbooks.io/misp-book/content/managing-feeds/#feeds) URLs, one per line, according to the following format (the authentication key is optional): + +``` +misp|manifest_url|auth_key +``` + +For example: + +``` +misp|https://example.com/data/feed-osint/manifest.json|df97338db644c64fbfd90f3e03ba8870 +… +``` + +Malcolm will attempt to connect to the MISP feed(s) and retrieve [`Attribute`](https://www.misp-standard.org/rfc/misp-standard-core.html#name-attribute) objects of MISP events and convert them to the Zeek intelligence format as described above. There are publicly available [MISP feeds](https://www.misp-project.org/feeds/) and [communities](https://www.misp-project.org/communities/), or you may [run your own MISP instance](https://www.misp-project.org/2019/09/25/hostev-vs-own-misp.html/). + +Note that only a subset of MISP [attribute types](https://www.misp-project.org/datamodels/#attribute-categories-vs-types) can be expressed with the Zeek intelligence [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type). MISP attributes with other types will be silently ignored. 
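To illustrate the kind of type mapping involved, here is a minimal sketch in Python of rendering a MISP attribute as a Zeek intelligence line. This is not Malcolm's actual implementation, and the `TYPE_MAP` table below is an assumption based on the MISP and Zeek documentation rather than Malcolm's real mapping:

```python
# Minimal sketch: map common MISP attribute types to Zeek Intelligence Framework
# indicator types. Attribute types with no Zeek equivalent map to None and are
# dropped, mirroring the "silently ignored" behavior described above.
TYPE_MAP = {
    "ip-src": "Intel::ADDR",
    "ip-dst": "Intel::ADDR",
    "domain": "Intel::DOMAIN",
    "url": "Intel::URL",
    "filename": "Intel::FILE_NAME",
    "md5": "Intel::FILE_HASH",
    "sha1": "Intel::FILE_HASH",
    "sha256": "Intel::FILE_HASH",
    "email-src": "Intel::EMAIL",
}

def to_zeek_intel_line(attribute, source="MISP"):
    """Render a MISP attribute as a tab-separated Zeek intel line, or return
    None if the attribute's type cannot be expressed as a Zeek indicator."""
    zeek_type = TYPE_MAP.get(attribute.get("type", ""))
    value = attribute.get("value")
    if zeek_type is None or not value:
        return None
    return "\t".join((value, zeek_type, source, attribute.get("comment") or "-"))

# For example:
#   to_zeek_intel_line({"type": "ip-dst", "value": "198.51.100.5"})
# returns '198.51.100.5\tIntel::ADDR\tMISP\t-'
```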
\ No newline at end of file diff --git a/sensor-iso/README.md b/sensor-iso/README.md deleted file mode 100644 index 0ddfdf8c1..000000000 --- a/sensor-iso/README.md +++ /dev/null @@ -1,833 +0,0 @@ -# Hedgehog Linux -## Network Traffic Capture Appliance - -![](./docs/logo/hedgehog-color-w-text.png) - -Hedgehog Linux is a Debian-based operating system built to - -* monitor network interfaces -* capture packets to PCAP files -* detect file transfers in network traffic and extract and scan those files for threats -* generate and forward Zeek logs, Arkime sessions and other information to [Malcolm](https://github.com/idaholab/Malcolm) - -![sensor-iso-build-docker-wrap-push-ghcr](https://github.com/idaholab/Malcolm/workflows/sensor-iso-build-docker-wrap-push-ghcr/badge.svg) - -### Table of Contents - -* [Sensor installation](#Installation) - - [Image boot options](#BootOptions) - - [Installer](#Installer) -* [Boot](#Boot) - - [Kiosk mode](#KioskMode) -* [Configuration](#Configuration) - - [Interfaces, hostname, and time synchronization](#ConfigRoot) - + [Hostname](#ConfigHostname) - + [Interfaces](#ConfigIface) - + [Time synchronization](#ConfigTime) - - [Capture, forwarding, and autostart services](#ConfigUser) - + [Capture](#ConfigCapture) - * [Automatic file extraction and scanning](#ZeekFileExtraction) - + [Forwarding](#ConfigForwarding) - * [arkime-capture](#arkime-capture): Arkime session forwarding - * [filebeat](#filebeat): Zeek and Suricata log forwarding - * [miscbeat](#miscbeat): System metrics forwarding - + [Autostart services](#ConfigAutostart) -+ [Zeek Intelligence Framework](#ZeekIntel) -* [Appendix A - Generating the ISO](#ISOBuild) -* [Appendix B - Configuring SSH access](#ConfigSSH) -* [Appendix C - Troubleshooting](#Troubleshooting) -* [Appendix D - Hardening](#Hardening) - - [Compliance exceptions](#ComplianceExceptions) -* [Appendix E - Upgrades](#UpgradePlan) -* [Copyright](#Footer) - -# Sensor installation - -## Image boot options - -The Hedgehog Linux installation image, when provided on an optical disc, USB thumb drive, or other removable medium, can be used to install or reinstall the sensor software. - -![Sensor installation image boot menu](./docs/images/boot_options.png) - -The boot menu of the sensor installer image provides several options: - -* **Live system** and **Live system (fully in RAM)** may also be used to run the sensor in a "live USB" mode without installing any software or making any persistent configuration changes on the sensor hardware. -* **Install Hedgehog Linux** and **Install Hedgehog Linux (encrypted)** are used to [install the sensor](#Installer) onto the current system. Both selections install the same operating system and sensor software, the only difference being that the **encrypted** option encrypts the hard disks with a password (entered in a subsequent step during installation) that must be provided each time the sensor boots. There is some CPU overhead involved in an encrypted installation, so it is recommended that encrypted installations only be used for mobile installations (e.g., on a sensor that may be shipped or carried for an incident response) and that the unencrypted option be used for fixed sensors in secure environments. -* **Install Hedgehog Linux (advanced configuration)** allows you to configure the installation fully using all of the [Debian installer](https://www.debian.org/releases/stable/amd64/) settings and should only be selected for advanced users who know what they're doing.
-* **Rescue system** is included for debugging and/or system recovery and should not be needed in most cases. - -## Installer - -The sensor installer is designed to require as little user input as possible. For this reason, there are NO user prompts or confirmations about partitioning and reformatting hard disks for use by the sensor. The installer assumes that all non-removable storage media (e.g., SSD, HDD, NVMe, etc.) are available for use and ⛔🆘😭💀 ***will partition and format them without warning*** 💀😭🆘⛔. - -The installer will ask for a few pieces of information prior to installing the sensor operating system: - -* **Root password** – a password for the privileged root account which is rarely needed (only during the configuration of the sensor's network interfaces and setting the sensor hostname) -* **User password** – a password for the non-privileged sensor account under which the various sensor capture and forwarding services run -* **Encryption password** (optional) – if the encrypted installation option was selected at boot time, the encryption password must be entered every time the sensor boots - -Each of these passwords must be entered twice to ensure they were entered correctly. - -![Example of the installer's password prompt](./docs/images/users_and_passwords.png) - -After the passwords have been entered, the installer will proceed to format the system drive and install Hedgehog Linux. - -![Installer progress](./docs/images/installer_progress.png) - -At the end of the installation process, you will be prompted with a few self-explanatory yes/no questions: - -* **Disable IPv6?** -* **Automatically login to the GUI session?** -* **Should the GUI session be locked due to inactivity?** -* **Display the [Standard Mandatory DoD Notice and Consent Banner](https://www.stigviewer.com/stig/application_security_and_development/2018-12-24/finding/V-69349)?** *(only applies when installed on U.S. government information systems)* - -Following these prompts, the installer will reboot and Hedgehog Linux will boot. - -# Boot - -Each time the sensor boots, a GRUB boot menu will be shown briefly, after which the sensor will proceed to load. - -## Kiosk mode - -![Kiosk mode sensor menu: resource statistics](./docs/images/kiosk_mode_sensor_menu.png) - -The sensor automatically logs in as the sensor user account and runs in **kiosk mode**, which is intended to show an at-a-glance view of its resource utilization. Clicking the **☰** icon allows you to switch between the resource statistics view and the services view. - -![Kiosk mode sensor menu: services](./docs/images/kiosk_mode_services_menu.png) - -The kiosk's services screen (designed with large clickable labels for small portable touch screens) can be used to start and stop essential services, get a status report of the currently running services, and clean all captured data from the sensor. - -!["Clean Sensor" confirmation prompt before deleting sensor data](./docs/images/kiosk_mode_wipe_prompt.png) - -!["Sensor Status" report from the kiosk services menu](./docs/images/kiosk_mode_status.png) - -# Configuration - -Kiosk mode can be exited by connecting an external USB keyboard and pressing **Alt+F4**, upon which the *sensor* user's desktop is shown.
- -![Sensor login session desktop](./docs/images/desktop.png) - -Several icons are available in the top menu bar: - -* **Terminal** - opens a command prompt in a terminal emulator -* **Browser** - opens a web browser -* **Kiosk** – returns the sensor to kiosk mode -* **README** – displays this document -* **Sensor status** – displays a list with the status of each sensor service -* **Configure capture and forwarding** – opens a dialog for configuring the sensor's capture and forwarding services, as well as specifying which services should autostart upon boot -* **Configure interfaces and hostname** – opens a dialog for configuring the sensor's network interfaces and setting the sensor's hostname -* **Restart sensor services** - stops and restarts all of the [autostart services](#ConfigAutostart) - -## Interfaces, hostname, and time synchronization - -### Hostname - -The first step of sensor configuration is to configure the network interfaces and sensor hostname. Clicking the **Configure Interfaces and Hostname** toolbar icon (or, if you are at a command line prompt, running `configure-interfaces`) will prompt you for the root password you created during installation, after which the configuration welcome screen is shown. Select **Continue** to proceed. - -You may next select whether to configure the network interfaces, hostname, or time synchronization. - -![Selection to configure network interfaces, hostname, or time synchronization](./docs/images/root_config_mode.png) - -Selecting **Hostname**, you will be presented with a summary of the current sensor identification information, after which you may specify a new sensor hostname. This name will be used to tag all events forwarded from this sensor in the events' **host.name** field. - -![Specifying a new sensor hostname](./docs/images/hostname_setting.png) - -### Interfaces - -Returning to the configuration mode selection, choose **Interface**. You will be prompted if you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**. - -You will be presented with a list of interfaces to configure as the sensor management interface. This is the interface the sensor itself will use to communicate with the network in order to, for example, forward captured logs to an aggregate server. In order to do so, the management interface must be assigned an IP address. This is generally **not** the interface used for capturing data. Select the interface to which you wish to assign an IP address. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed. - -![Management interface selection](./docs/images/select_iface.png) - -Depending on the configuration of your network, you may now specify how the management interface will be assigned an IP address. In order to communicate with an event aggregator over the management interface, either **static** or **dhcp** must be selected. - -![Interface address source](./docs/images/iface_mode.png) - -If you select static, you will be prompted to enter the IP address, netmask, and gateway to assign to the management interface. 
- -![Static IP configuration](./docs/images/iface_static.png) - -In either case, upon selecting **OK** the network interface will be brought down, configured, and brought back up, and the result of the operation will be displayed. You may choose **Quit** upon returning to the configuration tool's welcome screen. - -### Time synchronization - -Returning to the configuration mode selection, choose **Time Sync**. Here you can configure the sensor to keep its time synchronized with either an NTP server (using the NTP protocol) or a local [Malcolm](https://github.com/idaholab/Malcolm) aggregator or another HTTP/HTTPS server. On the next dialog, choose the time synchronization method you wish to configure. - -![Time synchronization method](./docs/images/time_sync_mode.png) - -If **htpdate** is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port `9200` may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server. - -![*htpdate* configuration](./docs/images/htpdate_setup.png) - -If *ntpdate* is selected, you will be prompted to enter the IP address or hostname of the NTP server. - -![NTP configuration](./docs/images/ntp_host.png) - -Upon configuring time synchronization, a "Time synchronization configured successfully!" message will be displayed, after which you will be returned to the welcome screen. - -## Capture, forwarding, and autostart services - -Clicking the **Configure Capture and Forwarding** toolbar icon (or, if you are at a command prompt, running `configure-capture`) will launch the configuration tool for capture and forwarding. The root password is not required as it was for the interface and hostname configuration, as sensor services are run under the non-privileged sensor account. Select **Continue** to proceed. You may select from a list of configuration options. - -![Select configuration mode](./docs/images/capture_config_main.png) - -### Capture - -Choose **Configure Capture** to configure parameters related to traffic capture and local analysis. You will be prompted if you would like help identifying network interfaces. If you select **Yes**, you will be prompted to select a network interface, after which that interface's link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select **No**. - -You will be presented with a list of network interfaces and prompted to select one or more capture interfaces. An interface used to capture traffic is generally a different interface than the one selected previously as the management interface, and each capture interface should be connected to a network tap or span port for traffic monitoring. Capture interfaces are usually not assigned an IP address as they are only used to passively “listen” to the traffic on the wire. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a `-1` will be displayed instead of the interface speed. - -![Select capture interfaces](./docs/images/capture_iface_select.png) - -Upon choosing the capture interfaces and selecting OK, you may optionally provide a capture filter. 
This filter will be used to limit what traffic the PCAP service ([`tcpdump`](https://www.tcpdump.org/)) and the traffic analysis services ([`zeek`](https://www.zeek.org/) and [`suricata`](https://suricata.io/)) will see. Capture filters are specified using [Berkeley Packet Filter (BPF)](http://biot.com/capstats/bpf.html) syntax. Clicking **OK** will attempt to validate the capture filter, if specified, and will present a warning if the filter is invalid. - -![Specify capture filters](./docs/images/capture_filter.png) - -Next you must specify the paths where captured PCAP files and logs will be stored locally on the sensor. If the installation worked as expected, these paths should be prepopulated to reflect paths on the volumes formatted at install time for the purpose of storing these artifacts. Usually these paths will exist on separate storage volumes. Enabling the PCAP and log pruning autostart services (see the section on autostart services below) will monitor these paths to ensure that their contents do not consume more than 90% of their respective volumes' space. Choose **OK** to continue. - -![Specify capture paths](./docs/images/capture_paths.png) - -#### Automatic file extraction and scanning - -Hedgehog Linux can leverage Zeek's knowledge of network protocols to automatically detect file transfers and extract those files from network traffic as Zeek sees them. - -To specify which files should be extracted, select the Zeek file carving mode: - -![Zeek file carving mode](./docs/images/zeek_file_carve_mode.png) - -If you're not sure what to choose, either **mapped (except common plain text files)** (if you want to carve and scan almost all files) or **interesting** (if you only want to carve and scan files with [mime types of common attack vectors](./interface/sensor_ctl/zeek/extractor_override.interesting.zeek)) is probably a good choice. - -Next, specify which carved files to preserve (saved on the sensor under `/capture/bro/capture/extract_files/quarantine` by default). To avoid consuming all of the sensor's available storage space, the oldest preserved files will be pruned along with the oldest Zeek logs as described below with **AUTOSTART_PRUNE_ZEEK** in the [autostart services](#ConfigAutostart) section. - -You'll be prompted to specify which engine(s) to use to analyze extracted files.
Extracted files can be examined through any of four methods: - -![File scanners](./docs/images/zeek_file_carve_scanners.png) - -* scanning files with [**ClamAV**](https://www.clamav.net/); to enable this method, select **ZEEK_FILE_SCAN_CLAMAV** when specifying scanners for Zeek-carved files -* submitting file hashes to [**VirusTotal**](https://www.virustotal.com/en/#search); to enable this method, select **ZEEK_FILE_SCAN_VTOT** when specifying scanners for Zeek-carved files, then manually edit `/opt/sensor/sensor_ctl/control_vars.conf` and specify your [VirusTotal API key](https://developers.virustotal.com/reference) in `VTOT_API2_KEY` -* scanning files with [**Yara**](https://github.com/VirusTotal/yara); to enable this method, select **ZEEK_FILE_SCAN_YARA** when specifying scanners for Zeek-carved files -* scanning portable executable (PE) files with [**Capa**](https://github.com/fireeye/capa); to enable this method, select **ZEEK_FILE_SCAN_CAPA** when specifying scanners for Zeek-carved files - -Files which are flagged as potentially malicious will be logged as Zeek `signatures.log` entries, and can be viewed in the **Signatures** dashboard in [OpenSearch Dashboards](https://github.com/idaholab/Malcolm#DashboardsVisualizations) when forwarded to Malcolm. - -![File quarantine](./docs/images/file_quarantine.png) - -Finally, you will be presented with the list of configuration variables that will be used for capture, including the values which you have configured up to this point in this section. Upon choosing **OK** these values will be written back out to the sensor configuration file located at `/opt/sensor/sensor_ctl/control_vars.conf`. It is not recommended that you edit this file manually. After confirming these values, you will be presented with a confirmation that these settings have been written to the configuration file, and you will be returned to the welcome screen. - -### Forwarding - -Select **Configure Forwarding** to set up the forwarding of logs and statistics from the sensor to an aggregator server, such as [Malcolm](https://github.com/idaholab/Malcolm). - -![Configure forwarders](./docs/images/forwarder_config.png) - -There are three forwarder services used on the sensor, each forwarding a different type of log or sensor metric. - -### capture: Arkime session forwarding - -[capture](https://github.com/arkime/arkime/tree/master/capture) is not only used to capture PCAP files, but also to parse raw traffic into sessions and forward this session metadata to an [OpenSearch](https://opensearch.org/) database so that it can be viewed in [Arkime viewer](https://arkime.com/), whether standalone or as part of a [Malcolm](https://github.com/idaholab/Malcolm) instance. If you're using Hedgehog Linux with Malcolm, please read [Correlating Zeek logs and Arkime sessions](https://github.com/idaholab/Malcolm#ZeekArkimeFlowCorrelation) in the Malcolm documentation for more information. - -First, select the OpenSearch connection transport protocol, either **HTTPS** or **HTTP**. If the metrics are being forwarded to Malcolm, select **HTTPS** to encrypt messages from the sensor to the aggregator using TLS v1.2 with ECDHE-RSA-AES128-GCM-SHA256. If **HTTPS** is chosen, you must choose whether to enable SSL certificate verification. If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication)), choose **None**.
- -![OpenSearch connection protocol](./docs/images/opensearch_connection_protocol.png) ![OpenSearch SSL verification](./docs/images/opensearch_ssl_verification.png) - -Next, enter the **OpenSearch host** IP address (i.e., the IP address of the aggregator) and port. These metrics are written to an OpenSearch database using a RESTful API, usually using port 9200. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator. - -![OpenSearch host and port](./docs/images/arkime-capture-ip-port.png) - -You will be asked to enter authentication credentials for the sensor's connections to the aggregator's OpenSearch API. After you've entered the username and the password, the sensor will attempt a test connection to OpenSearch using the connection information provided. - -![OpenSearch username](./docs/images/opensearch_username.png) ![OpenSearch password](./docs/images/opensearch_password.png) ![Successful OpenSearch connection](./docs/images/opensearch_connection_success.png) - -Next, you will be shown a dialog for a list of IP addresses used to populate an access control list (ACL) for hosts allowed to connect back to the sensor for retrieving session payloads from its PCAP files for display in Arkime viewer. The list will be prepopulated with the IP address entered a few screens prior to this one. - -![PCAP retrieval ACL](./docs/images/malcolm_arkime_reachback_acl.png) - -Finally, you'll be given the opportunity to review all of the Arkime `capture` options you've specified. Selecting **OK** will cause the parameters to be saved and you will be returned to the configuration tool's welcome screen. - -![capture settings confirmation](./docs/images/arkime_confirm.png) - -### filebeat: Zeek and Suricata log forwarding - -[Filebeat](https://www.elastic.co/products/beats/filebeat) is used to forward [Zeek](https://www.zeek.org/) and [Suricata](https://suricata.io/) logs to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. - -To configure filebeat, first provide the log path (the same path previously configured for log file generation). - -![Configure filebeat for log forwarding](./docs/images/filebeat_log_path.png) - -You must also provide the IP address of the Logstash instance to which the logs are to be forwarded, and the port on which Logstash is listening. These logs are forwarded using the Beats protocol, generally over port 5044. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator. - -![Configure filebeat for log forwarding](./docs/images/filebeat_ip_port.png) - -Next you are asked whether the connection used for log forwarding should be done **unencrypted** or over **SSL**. Unencrypted communication requires less processing overhead and is simpler to configure, but the contents of the logs may be visible to anyone who is able to intercept that traffic. - -![Filebeat SSL certificate verification](./docs/images/filebeat_ssl.png)
If you are using a self-signed certificate (such as the one automatically created during [Malcolm's configuration](https://github.com/idaholab/Malcolm#configure-authentication), choose **None**. - -![Unencrypted vs. SSL encryption for log forwarding](./docs/images/filebeat_ssl_verify.png) - -The last step for SSL-encrypted log forwarding is to specify the SSL certificate authority, certificate, and key files. These files must match those used by the Logstash instance receiving the logs on the aggregator. If Malcolm's `auth_setup` script was used to generate these files they would be found in the `filebeat/certs/` subdirectory of the Malcolm installation and must be manually copied to the sensor (stored under `/opt/sensor/sensor_ctl/logstash-client-certificates` or in any other path accessible to the sensor account). Specify the location of the certificate authorities file (eg., `ca.crt`), the certificate file (eg., `client.crt`), and the key file (eg., `client.key`). - -![SSL certificate files](./docs/images/filebeat_certs.png) - -The Logstash instance receiving the events must be similarly configured with matching SSL certificate and key files. Under Malcolm, the `BEATS_SSL` variable must be set to `true` in Malcolm's `docker-compose.yml` file and the SSL files must exist in the `logstash/certs/` subdirectory of the Malcolm installation. - -Once you have specified all of the filebeat parameters, you will be presented with a summary of the settings related to the forwarding of these logs. Selecting **OK** will cause the parameters to be written to filebeat's configuration keystore under `/opt/sensor/sensor_ctl/logstash-client-certificates` and you will be returned to the configuration tool's welcome screen. - -![Confirm filebeat settings](./docs/images/filebeat_confirm.png) - -### miscbeat: System metrics forwarding - -The sensor uses [Fluent Bit](https://fluentbit.io/) to gather miscellaneous system resource metrics (CPU, network I/O, disk I/O, memory utilization, temperature, etc.) and the [Beats](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) protocol to forward these metrics to a remote [Logstash](https://www.elastic.co/products/logstash) instance for further enrichment prior to insertion into an [OpenSearch](https://opensearch.org/) database. Metrics categories can be enabled/disabled as described in the [autostart services](#ConfigAutostart) section of this document. - -This forwarder's configuration is almost identical to that of [filebeat](#filebeat) in the previous section. Select `miscbeat` from the forwarding configuration mode options and follow the same steps outlined above to set up this forwarder. - -### Autostart services - -Once the forwarders have been configured, the final step is to **Configure Autostart Services**. Choose this option from the configuration mode menu after the welcome screen of the sensor configuration tool. - -Despite configuring capture and/or forwarder services as described in previous sections, only services enabled in the autostart configuration will run when the sensor starts up. The available autostart processes are as follows (recommended services are in **bold text**): - -* **AUTOSTART_ARKIME** - [capture](#arkime-capture) PCAP engine for traffic capture, as well as traffic parsing and metadata insertion into OpenSearch for viewing in [Arkime](https://arkime.com/). 
If you are using Hedgehog Linux along with [Malcolm](https://github.com/idaholab/Malcolm) or another Arkime installation, this is probably the packet capture engine you want to use. -* **AUTOSTART_CLAMAV_UPDATES** - Virus database update service for ClamAV (requires sensor to be connected to the internet) -* **AUTOSTART_FILEBEAT** - [filebeat](#filebeat) Zeek and Suricata log forwarder -* **AUTOSTART_FLUENTBIT_AIDE** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/exec) [AIDE](https://aide.github.io/) file system integrity checks -* **AUTOSTART_FLUENTBIT_AUDITLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/tail) [auditd](https://man7.org/linux/man-pages/man8/auditd.8.html) logs -* *AUTOSTART_FLUENTBIT_KMSG* - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/kernel-logs) the Linux kernel log buffer (these are generally reflected in syslog as well, which may make this agent redundant) -* **AUTOSTART_FLUENTBIT_METRICS** - [Fluent Bit](https://fluentbit.io/) agent for collecting [various](https://docs.fluentbit.io/manual/pipeline/inputs) system resource and performance metrics -* **AUTOSTART_FLUENTBIT_SYSLOG** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/syslog) Linux syslog messages -* **AUTOSTART_FLUENTBIT_THERMAL** - [Fluent Bit](https://fluentbit.io/) agent [monitoring](https://docs.fluentbit.io/manual/pipeline/inputs/thermal) system temperatures -* **AUTOSTART_MISCBEAT** - [filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-tcp.html) forwarder which sends system metrics collected by [Fluent Bit](https://fluentbit.io/) to a remote Logstash instance (e.g., [Malcolm](https://github.com/idaholab/Malcolm)'s) -* *AUTOSTART_NETSNIFF* - [netsniff-ng](http://netsniff-ng.org/) PCAP engine for saving packet capture (PCAP) files -* **AUTOSTART_PRUNE_PCAP** - storage space monitor to ensure that PCAP files do not consume more than 90% of the total size of the storage volume to which PCAP files are written -* **AUTOSTART_PRUNE_ZEEK** - storage space monitor to ensure that Zeek logs do not consume more than 90% of the total size of the storage volume to which Zeek logs are written -* **AUTOSTART_SURICATA** - [Suricata](https://suricata.io/) traffic analysis engine -* **AUTOSTART_SURICATA_UPDATES** - Rule update service for Suricata (requires sensor to be connected to the internet) -* *AUTOSTART_TCPDUMP* - [tcpdump](https://www.tcpdump.org/) PCAP engine for saving packet capture (PCAP) files -* **AUTOSTART_ZEEK** - [Zeek](https://www.zeek.org/) traffic analysis engine - -Note that only one packet capture engine ([capture](https://arkime.com/), [netsniff-ng](http://netsniff-ng.org/), or [tcpdump](https://www.tcpdump.org/)) can be used. - -![Autostart services](./docs/images/autostarts.png) - -Once you have selected the autostart services, you will be prompted to confirm your selections. Doing so will cause these values to be written back out to the `/opt/sensor/sensor_ctl/control_vars.conf` configuration file. - -![Autostart services confirmation](./docs/images/autostarts_confirm.png) - -After you have completed configuring the sensor it is recommended that you reboot the sensor to ensure all new settings take effect. 
If rebooting is not an option, you may click the **Restart Sensor Services** menu icon in the top menu bar, or open a terminal and run: - -``` -/opt/sensor/sensor_ctl/shutdown && sleep 10 && /opt/sensor/sensor_ctl/supervisor.sh -``` - -This will cause the sensor services controller to stop, wait a few seconds, and restart. You can check the status of the sensor's processes by choosing **Sensor Status** from the sensor's kiosk mode, clicking the **Sensor Service Status** toolbar icon, or running `/opt/sensor/sensor_ctl/status` from the command line: - -``` -$ /opt/sensor/sensor_ctl/status -arkime:arkime-capture            RUNNING   pid 6455, uptime 0:03:17 -arkime:arkime-viewer             RUNNING   pid 6456, uptime 0:03:17 -beats:filebeat                   RUNNING   pid 6457, uptime 0:03:17 -beats:miscbeat                   RUNNING   pid 6458, uptime 0:03:17 -clamav:clamav-service            RUNNING   pid 6459, uptime 0:03:17 -clamav:clamav-updates            RUNNING   pid 6461, uptime 0:03:17 -fluentbit-auditlog               RUNNING   pid 6463, uptime 0:03:17 -fluentbit-kmsg                   STOPPED   Not started -fluentbit-metrics:cpu            RUNNING   pid 6466, uptime 0:03:17 -fluentbit-metrics:df             RUNNING   pid 6471, uptime 0:03:17 -fluentbit-metrics:disk           RUNNING   pid 6468, uptime 0:03:17 -fluentbit-metrics:mem            RUNNING   pid 6472, uptime 0:03:17 -fluentbit-metrics:mem_p          RUNNING   pid 6473, uptime 0:03:17 -fluentbit-metrics:netif          RUNNING   pid 6474, uptime 0:03:17 -fluentbit-syslog                 RUNNING   pid 6478, uptime 0:03:17 -fluentbit-thermal                RUNNING   pid 6480, uptime 0:03:17 -netsniff:netsniff-enp1s0         STOPPED   Not started -prune:prune-pcap                 RUNNING   pid 6484, uptime 0:03:17 -prune:prune-zeek                 RUNNING   pid 6486, uptime 0:03:17 -supercronic                      RUNNING   pid 6490, uptime 0:03:17 -suricata                         RUNNING   pid 6501, uptime 0:03:17 -tcpdump:tcpdump-enp1s0           STOPPED   Not started -zeek:capa                        RUNNING   pid 6553, uptime 0:03:17 -zeek:clamav                      RUNNING   pid 6512, uptime 0:03:17 -zeek:logger                      RUNNING   pid 6554, uptime 0:03:17 -zeek:virustotal                  STOPPED   Not started -zeek:watcher                     RUNNING   pid 6510, uptime 0:03:17 -zeek:yara                        RUNNING   pid 6548, uptime 0:03:17 -zeek:zeekctl                     RUNNING   pid 6502, uptime 0:03:17 -``` - -### Zeek Intelligence Framework - -To quote Zeek's [Intelligence Framework](https://docs.zeek.org/en/master/frameworks/intel.html) documentation, "The goals of Zeek’s Intelligence Framework are to consume intelligence data, make it available for matching, and provide infrastructure to improve performance and memory utilization. Data in the Intelligence Framework is an atomic piece of intelligence such as an IP address or an e-mail address. This atomic data will be packed with metadata such as a freeform source field, a freeform descriptive field, and a URL which might lead to more information about the specific item." Zeek [intelligence](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html) [indicator types](https://docs.zeek.org/en/master/scripts/base/frameworks/intel/main.zeek.html#type-Intel::Type) include IP addresses, URLs, file names, hashes, email addresses, and more. - -Hedgehog Linux doesn't come bundled with intelligence files from any particular feed, but they can be easily included in your local instance. Before Zeek starts, Hedgehog Linux configures it such that intelligence files will be automatically included in its local policy.
Subdirectories under `/opt/sensor/sensor_ctl/zeek/intel` which contain their own `__load__.zeek` file will be `@load`-ed as-is, while subdirectories containing "loose" intelligence files will be [loaded](https://docs.zeek.org/en/master/frameworks/intel.html#loading-intelligence) automatically with a `redef Intel::read_files` directive. - -Note that Hedgehog Linux does not manage updates for these intelligence files. You should use the update mechanism suggested by your feeds' maintainers to keep them up to date. Adding and deleting intelligence files under this directory will take effect upon restarting Zeek. - -# Appendix A - Generating the ISO - -Official downloads of the Hedgehog Linux installer ISO are not provided; however, the ISO can be built easily on an internet-connected Linux host with Vagrant: - -* [Vagrant](https://www.vagrantup.com/) - - [`vagrant-reload`](https://github.com/aidanns/vagrant-reload) plugin - - [`vagrant-sshfs`](https://github.com/dustymabe/vagrant-sshfs) plugin - - [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box - -The build should work with either the [VirtualBox](https://www.virtualbox.org/) provider or the [libvirt](https://libvirt.org/) provider: - -* [VirtualBox](https://www.virtualbox.org/) [provider](https://www.vagrantup.com/docs/providers/virtualbox) - - [`vagrant-vbguest`](https://github.com/dotless-de/vagrant-vbguest) plugin -* [libvirt](https://libvirt.org/) - - [`vagrant-libvirt`](https://github.com/vagrant-libvirt/vagrant-libvirt) provider plugin - - [`vagrant-mutate`](https://github.com/sciurus/vagrant-mutate) plugin to convert [`bento/debian-11`](https://app.vagrantup.com/bento/boxes/debian-11) Vagrant box to `libvirt` format - -To perform a clean build of the Hedgehog Linux installer ISO, navigate to your local [Malcolm](https://github.com/idaholab/Malcolm/) working copy and run: - -``` -$ ./sensor-iso/build_via_vagrant.sh -f -… -Starting build machine... -Bringing machine 'default' up with 'virtualbox' provider... -… -``` - -Building the ISO may take 90 minutes or more depending on your system. As the build finishes, you will see the following message indicating success: - -``` -… -Finished, created "/sensor-build/hedgehog-6.4.0.iso" -… -``` - -Alternately, if you have forked Malcolm on GitHub, [workflow files](../.github/workflows/) are provided which contain instructions for GitHub to build the Docker images as well as the Hedgehog and [Malcolm](https://github.com/idaholab/Malcolm) installer ISOs, specifically [`sensor-iso-build-docker-wrap-push-ghcr.yml`](../.github/workflows/sensor-iso-build-docker-wrap-push-ghcr.yml) for the Hedgehog ISO. The resulting ISO file is wrapped in a Docker image that provides an HTTP server from which the ISO may be downloaded. - -# Appendix B - Configuring SSH access - -SSH access to the sensor's non-privileged sensor account is only available using secure key-based authentication, which can be enabled by adding a public SSH key to the **/home/sensor/.ssh/authorized_keys** file as illustrated below: - -``` -sensor@sensor:~$ mkdir -p ~/.ssh - -sensor@sensor:~$ ssh analyst@172.16.10.48 "cat ~/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys -The authenticity of host '172.16.10.48 (172.16.10.48)' can't be established. -ECDSA key fingerprint is SHA256:... -Are you sure you want to continue connecting (yes/no)? yes -Warning: Permanently added '172.16.10.48' (ECDSA) to the list of known hosts.
-analyst@172.16.10.48's password: - -sensor@sensor:~$ cat ~/.ssh/authorized_keys -ssh-rsa AAA...kff analyst@SOC -``` - -SSH access should only be configured when necessary. - -# Appendix C - Troubleshooting - -Should the sensor not function as expected, first try rebooting the device. If the behavior continues, here are a few things that may help you diagnose the problem: - -* Stop / start services – Using the sensor's kiosk mode, attempt a **Services Stop** followed by a **Services Start**, then check **Sensor Status** to see which service(s) may not be running correctly. -* Sensor configuration file – See `/opt/sensor/sensor_ctl/control_vars.conf` for sensor service settings. It is not recommended to manually edit this file unless you are sure of what you are doing. -* Sensor control scripts – There are scripts under `/opt/sensor/sensor_ctl/` to control sensor services (e.g., `shutdown`, `start`, `status`, `stop`) -* Sensor debug logs – Log files under `/opt/sensor/sensor_ctl/log/` may contain clues to processes that are not working correctly. If you can determine which service is failing, you can attempt to reconfigure it using the instructions in the Configure Capture and Forwarding section of this document. -* `sensorwatch` script – Running `sensorwatch` on the command line will display the most recently modified PCAP and Zeek log files in their respective directories, how much storage space they are consuming, and the amount of used/free space on the volumes containing those files. - -# Appendix D - Hardening - -Hedgehog Linux uses the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmarks, which target the following guidelines for establishing a secure configuration posture: - -* [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/) -* [DISA STIG (Security Technical Implementation Guides) for RHEL 7](https://www.stigviewer.com/stig/red_hat_enterprise_linux_7/) v2r5 Ubuntu v1r2 [adapted](https://github.com/hardenedlinux/STIG-OS-mirror/blob/master/redhat-STIG-DOCs/U_Red_Hat_Enterprise_Linux_7_V2R5_STIG.zip) for a Debian operating system -* Additional recommendations from [cisecurity.org](https://www.cisecurity.org/) - -## Compliance Exceptions - -[Currently](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) there are 274 checks to determine compliance with the [harbian-audit](https://github.com/hardenedlinux/harbian-audit) benchmark. - -Hedgehog Linux claims exceptions from the recommendations in this benchmark in the following categories: - -**1.1 Install Updates, Patches and Additional Security Software** - When the Malcolm aggregator appliance software is built, all of the latest applicable security patches and updates are included in it. How future updates are to be handled is still in design. - -**1.3 Enable verify the signature of local packages** - As the base distribution is not using embedded signatures, `debsig-verify` would reject all packages (see comment in `/etc/dpkg/dpkg.cfg`). Enabling it after installation would disallow any future updates. - -**2.14 Add nodev option to /run/shm Partition**, **2.15 Add nosuid Option to /run/shm Partition**, **2.16 Add noexec Option to /run/shm Partition** - Hedgehog Linux does not mount `/run/shm` as a separate partition, so these recommendations do not apply.
- -**2.19 Disable Mounting of freevxfs Filesystems**, **2.20 Disable Mounting of jffs2 Filesystems**, **2.21 Disable Mounting of hfs Filesystems**, **2.22 Disable Mounting of hfsplus Filesystems**, **2.23 Disable Mounting of squashfs Filesystems**, **2.24 Disable Mounting of udf Filesystems** - Hedgehog Linux does not compile a custom Linux kernel, so these filesystems are inherently supported as they are part of Debian Linux's default kernel. - -**3.3 Set Boot Loader Password** - As maximizing availability is a system requirement, Malcolm should restart automatically without user intervention to ensure uninterrupted service. A boot loader password is not enabled. - -**4.8 Disable USB Devices** - The ability to ingest data (such as PCAP files) from a mounted USB mass storage device is a requirement of the system. - -**6.1 Ensure the X Window system is not installed**, **6.2 Ensure Avahi Server is not enabled**, **6.3 Ensure print server is not enabled** - An X Windows session is provided for displaying dashboards. The library packages `libavahi-common-data`, `libavahi-common3`, and `libcups2` are dependencies of some of the X components used by Hedgehog Linux, but the `avahi` and `cups` services themselves are disabled. - -**6.17 Ensure virus scan Server is enabled**, **6.18 Ensure virus scan Server update is enabled** - As this is a network traffic analysis appliance rather than an end-user device, regular user files will not be created. A virus scan program would impact device performance and would be unnecessary. - -**7.1.1 Disable IP Forwarding**, **7.2.4 Log Suspicious Packets**, **7.2.7 Enable RFC-recommended Source Route Validation**, **7.4.1 Install TCP Wrappers** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, these recommendations do not apply. - -**8.1.1.2 Disable System on Audit Log Full**, **8.1.1.3 Keep All Auditing Information**, **8.1.1.5 Ensure set remote_server for audit service**, **8.1.1.6 Ensure enable_krb5 set to yes for remote audit service**, **8.1.1.7 Ensure set action for audit storage volume is fulled**, **8.1.1.8 Ensure set action for network failure on remote audit service**, **8.1.1.9 Set space left for auditd service**, a few other audit-related items under section **8.1**, **8.2.4 Configure rsyslog to Send Logs to a Remote Log Host** - As maximizing availability is a system requirement, audit processing failures will be logged on the device rather than halting the system. `auditd` is set up to syslog when its local storage capacity is reached. - -**8.4.2 Implement Periodic Execution of File Integrity** - This functionality is not configured by default, but it can be configured post-install by the end user. - -Password-related recommendations under **9.2** and **10.1** - The library package `libpam-pwquality` is used in favor of `libpam-cracklib`, which is what the [compliance scripts](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) are looking for. Also, as a device running Malcolm is intended to be used as an appliance rather than a general user-facing software platform, some exceptions to password enforcement policies are claimed. - -**9.3.13 Limit Access via SSH** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used. SSH access for `root` is disabled. SSH login with a password is also disallowed: only key-based authentication is accepted.
-
-Hedgehog Linux claims exceptions from the recommendations in this benchmark in the following categories:
-
-**1.1 Install Updates, Patches and Additional Security Software** - When the Malcolm aggregator appliance software is built, all of the latest applicable security patches and updates are included in it. How future updates will be handled is still being designed.
-
-**1.3 Enable verify the signature of local packages** - As the base distribution is not using embedded signatures, `debsig-verify` would reject all packages (see the comment in `/etc/dpkg/dpkg.cfg`). Enabling it after installation would disallow any future updates.
-
-**2.14 Add nodev option to /run/shm Partition**, **2.15 Add nosuid Option to /run/shm Partition**, **2.16 Add noexec Option to /run/shm Partition** - Hedgehog Linux does not mount `/run/shm` as a separate partition, so these recommendations do not apply.
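-
-(This is easy to confirm on a running sensor; `findmnt` is shown here purely as an illustrative check and is not part of the benchmark scripts:)
-
-```
-# if /run/shm were a separate mount, findmnt would print its mount entry
-findmnt /run/shm || echo "/run/shm is not a separate mount point"
-```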
-
-**2.19 Disable Mounting of freevxfs Filesystems**, **2.20 Disable Mounting of jffs2 Filesystems**, **2.21 Disable Mounting of hfs Filesystems**, **2.22 Disable Mounting of hfsplus Filesystems**, **2.23 Disable Mounting of squashfs Filesystems**, **2.24 Disable Mounting of udf Filesystems** - Hedgehog Linux does not compile a custom Linux kernel, so these filesystems are inherently supported as they are part of Debian Linux's default kernel.
-
-**3.3 Set Boot Loader Password** - As maximizing availability is a system requirement, Malcolm should restart automatically without user intervention to ensure uninterrupted service. A boot loader password is not enabled.
-
-**4.8 Disable USB Devices** - The ability to ingest data (such as PCAP files) from a mounted USB mass storage device is a requirement of the system.
-
-**6.1 Ensure the X Window system is not installed**, **6.2 Ensure Avahi Server is not enabled**, **6.3 Ensure print server is not enabled** - An X Window session is provided for displaying dashboards. The library packages `libavahi-common-data`, `libavahi-common3`, and `libcups2` are dependencies of some of the X components used by Hedgehog Linux, but the `avahi` and `cups` services themselves are disabled.
-
-**6.17 Ensure virus scan Server is enabled**, **6.18 Ensure virus scan Server update is enabled** - As this is a network traffic analysis appliance rather than an end-user device, regular user files will not be created. A virus scan program would impact device performance and would be unnecessary.
-
-**7.1.1 Disable IP Forwarding**, **7.2.4 Log Suspicious Packets**, **7.2.7 Enable RFC-recommended Source Route Validation**, **7.4.1 Install TCP Wrappers** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, these recommendations do not apply.
-
-**8.1.1.2 Disable System on Audit Log Full**, **8.1.1.3 Keep All Auditing Information**, **8.1.1.5 Ensure set remote_server for audit service**, **8.1.1.6 Ensure enable_krb5 set to yes for remote audit service**, **8.1.1.7 Ensure set action for audit storage volume is fulled**, **8.1.1.8 Ensure set action for network failure on remote audit service**, **8.1.1.9 Set space left for auditd service**, a few other audit-related items under section **8.1**, **8.2.4 Configure rsyslog to Send Logs to a Remote Log Host** - As maximizing availability is a system requirement, audit processing failures will be logged on the device rather than halting the system. `auditd` is configured to log to syslog when its local storage capacity is reached.
-
-**8.4.2 Implement Periodic Execution of File Integrity** - This functionality is not configured by default, but it can be configured post-install by the end user.
-
-Password-related recommendations under **9.2** and **10.1** - The library package `libpam-pwquality` is used in favor of `libpam-cracklib`, which is what the [compliance scripts](https://github.com/hardenedlinux/harbian-audit/tree/master/bin/hardening) look for. Also, as a device running Malcolm is intended to be used as an appliance rather than as a general user-facing software platform, some exceptions to password enforcement policies are claimed.
-
-**9.3.13 Limit Access via SSH** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used. SSH access for `root` is disabled. SSH login with a password is also disallowed: only key-based authentication is accepted. The service account accepts no keys by default. As such, the `AllowUsers`, `AllowGroups`, `DenyUsers`, and `DenyGroups` values in `sshd_config` do not apply.
-
-**9.4 Restrict Access to the su Command** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used.
-
-**10.1.6 Remove nopasswd option from the sudoers configuration** - A very limited set of operations (a single script used to run the AIDE integrity check as a non-root user) has the NOPASSWD option set to allow it to be run in the background without user intervention.
-
-**10.1.10 Set maxlogins for all accounts** and **10.5 Set Timeout on ttys** - Hedgehog Linux does not create multiple regular user accounts: only `root` and a sensor service account are used.
-
-**12.10 Find SUID System Executables**, **12.11 Find SGID System Executables** - The few files found by [these](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.10_find_suid_files.sh) [scripts](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/12.11_find_sgid_files.sh) are valid exceptions required by Hedgehog Linux's core requirements.
-
-**14.1 Defense for NAT Slipstreaming** - As Malcolm may operate as a network traffic capture appliance sniffing packets on a network interface configured in promiscuous mode, this recommendation does not apply.
-
-Please review the notes for these additional guidelines. While not claiming an exception, Hedgehog Linux may implement them in a manner different from what is described by the [CIS Debian Linux 9/10 Benchmark](https://www.cisecurity.org/cis-benchmarks/cis-benchmarks-faq/) or the [hardenedlinux/harbian-audit](https://github.com/hardenedlinux/harbian-audit) audit scripts.
-
-**4.1 Restrict Core Dumps** - Hedgehog Linux disables core dumps using a configuration file for `ulimit` named `/etc/security/limits.d/limits.conf`. The [audit script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/4.1_restrict_core_dumps.sh) checking for this does not check the `limits.d` subdirectory, which is why this is incorrectly flagged as noncompliant.
-
-**5.4 Ensure ctrl-alt-del is disabled** - Hedgehog Linux disables the `ctrl+alt+delete` key sequence by executing `systemctl disable ctrl-alt-del.target` during installation and the command `systemctl mask ctrl-alt-del.target` at boot time.
-
-**7.4.4 Create /etc/hosts.deny**, **7.7.1 Ensure Firewall is active**, **7.7.4.1 Ensure default deny firewall policy**, **7.7.4.2 Ensure loopback traffic is configured**, **7.7.4.3 Ensure default deny firewall policy**, **7.7.4.4 Ensure outbound and established connections are configured** - Hedgehog Linux **is** configured with an appropriately locked-down software firewall (managed by "Uncomplicated Firewall" `ufw`). However, the methods outlined in the CIS benchmark recommendations do not account for this configuration.
-
-**8.6 Verifies integrity all packages** - The [script](https://github.com/hardenedlinux/harbian-audit/blob/master/bin/hardening/8.7_verify_integrity_packages.sh) which verifies package integrity only "fails" because of missing (status `??5??????` displayed by the utility) language ("locale") files, which are removed as part of Hedgehog Linux's trimming-down process. All non-locale-related system files pass integrity checks.
-
-# Appendix E - Upgrades
-
-At this time there is not an "official" upgrade procedure to get from one release of Hedgehog Linux to the next. Upgrading the underlying operating system packages is generally straightforward, but not all of the Hedgehog Linux components are packaged into .deb archives yet as they should be, so for now it's a manual (and kind of nasty) process to Frankenstein an upgrade into existence. The author of this project intends to remedy this at some future point when time and resources allow.
-
-If possible, it would save you **a lot** of trouble to just [re-ISO](#Installation) your Hedgehog installation and start fresh, backing up the files (in `/opt/sensor/sensor_ctl`) first and reconfiguring or restoring them as needed afterwards.
-
-However, if reinstalling the system is not an option, here is the basic process for doing a manual upgrade of Hedgehog Linux. It should be understood that this process is very likely to break your system, that there is **no** guarantee of any kind that any of this will work, that these instructions may not even be complete, and that no support whatsoever is provided regarding them. Really, it will be **much** easier if you re-ISO your installation. But for the brave among you, here you go. ⛔🆘😭💀
-
-## Prerequisites
-
-* A good understanding of the Linux command line
-* An existing installation of Hedgehog Linux **with internet access**
-* A copy of the Hedgehog Linux [ISO](#ISOBuild) for the version approximating the one you're upgrading to (i.e., the latest version), **and**
-    - Either a separate VM with that ISO installed **OR**
-    - A separate Linux workstation where you can manually mount that ISO to pull files off of it
-
-## Upgrade
-
-1. Obtain a root shell
-    - `su -`
-
-2. Temporarily set the umask value to the Debian default instead of the more restrictive Hedgehog Linux default. This will allow updates to be applied with the right permissions.
-    - `umask 0022`
-
-3. Create backups of some files
-    - `cp /etc/apt/sources.list /etc/apt/sources.list.bak`
-
-4. Set up alternate package sources, if needed
-    - In an offline/airgapped scenario, you could use [apt-mirror](https://apt-mirror.github.io) to mirror Debian repos and [bandersnatch](https://github.com/pypa/bandersnatch/) to mirror PyPI sources, or [combine them](https://github.com/mmguero/espejo) with Docker. If you were to do this, you'd probably want to make the following changes (and **revert them after the upgrade**):
-        + create `/etc/apt/apt.conf.d/80ssl-exceptions` to ignore self-signed certificate warnings from using your apt-mirror:
-```
-Acquire::https {
-  Verify-Peer "false";
-  Verify-Host "false";
-}
-```
-
-        + modify `/etc/apt/sources.list` to point to your apt-mirror:
-
-```
-deb https://XXXXXX:443/debian buster main contrib non-free
-deb https://XXXXXX:443/debian-security buster/updates main contrib non-free
-deb https://XXXXXX:443/debian buster-updates main contrib non-free
-deb https://XXXXXX:443/debian buster-backports main contrib non-free
-```
-
-5. Update underlying system packages with `apt-get`
-    - `apt-get update && apt-get dist-upgrade`
-
-6. If there were [new system deb packages added](https://github.com/idaholab/Malcolm/tree/main/sensor-iso/config/package-lists) to this release of Hedgehog Linux (you might have to [manually compare](https://github.com/idaholab/Malcolm/commits/main/sensor-iso/config/package-lists) on GitHub), install them. If you're not sure, of course, you could just install everything, like this (although you may have to tweak some version numbers or something if the base distribution of your Hedgehog branch is different from `main`; in this example I'm not jumping between Debian releases, just upgrading within a release):
-```
-$ for LIST in apps desktopmanager net system; do curl -L -J -O https://raw.github.com/idaholab/Malcolm/main/sensor-iso/config/package-lists/$LIST.list.chroot; done
-...
-$ apt-get install $(cat *.list.chroot)
-```
-
-7. Update underlying Python packages with `python3 -m pip`
-    * `apt-get install -y build-essential git-core pkg-config python3-dev`
-    * `python3 -m pip list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -r -n1 python3 -m pip install -U`
-        - if this fails for some reason, you may need to reinstall pip first with `python3 -m pip install --force -U pip`
-        - some *very* old builds of Hedgehog Linux had separate Python 3.5 and 3.7 installations: in this case, you'd need to do this for both `python3 -m pip` and `python3.7 -m pip` (or whatever `python3.x` you have)
-    * If there were [new python packages](https://raw.githubusercontent.com/idaholab/Malcolm/master/sensor-iso/config/hooks/normal/0169-pip-installs.hook.chroot) added to this release of Hedgehog Linux (you might have to [manually compare](https://github.com/idaholab/Malcolm/blame/main/sensor-iso/config/hooks/normal/0169-pip-installs.hook.chroot) on GitHub), install them. If you are using a PyPI mirror, replace `XXXXXX` here with your mirror's IP. The `colorama` package is used here as an example; your package list might vary.
-        - `python3 -m pip install --no-compile --no-cache-dir --force-reinstall --upgrade --index-url=https://XXXXXX:443/pypi/simple --trusted-host=XXXXXX:443 colorama`
-
-8. Okay, **now** things start to get a little bit ugly. You're going to need access to the ISO of the release of Hedgehog Linux you're upgrading to, as we're going to grab some packages off of it. On another Linux system, [build it](#ISOBuild).
-
-9. Use a disk image mounter to mount the ISO, **or** if you want to just install the ISO in a VM and grab the files we need off of it, that's fine too. But I'll go through the example as if I've mounted the ISO.
-
-10. Navigate to the `/live/` directory, and mount the `filesystem.squashfs` file
-    - `sudo mount filesystem.squashfs /media/squash -t squashfs -o loop`
-    - **OR**
-    - `squashfuse filesystem.squashfs /home/user/media/squash`
-
-11. Very recent builds of Hedgehog Linux keep some build artifacts in `/opt/hedgehog_install_artifacts/`. You're going to want to grab those files and throw them in a temporary directory on the system you're upgrading, via SSH or whatever means you devise.
-```
-root@hedgehog:/tmp# scp -r user@otherbox:/media/squash/opt/hedgehog_install_artifacts/ ./
-user@otherbox's password:
-filebeat-tweaked-7.6.2-amd64.deb    100%   13MB  65.9MB/s   00:00
-arkime_2.2.3-1_amd64.deb            100%  113MB  32.2MB/s   00:03
-netsniff-ng_0.6.6-1_amd64.deb       100%  330KB  52.1MB/s   00:00
-zeek_3.0.20-1_amd64.deb             100%   26MB  63.1MB/s   00:00
-```
-
-12. Blow away the old `zeek` package; we're going to start clean with that one in particular. The others should be fine to upgrade in place.
-```
-root@hedgehog:/opt# apt-get --purge remove zeek
-Reading package lists... Done
-Building dependency tree
-Reading state information... Done
-The following packages will be REMOVED:
-  zeek*
-0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
-After this operation, 160 MB disk space will be freed.
-Do you want to continue? [Y/n] y
-(Reading database ... 118490 files and directories currently installed.)
-Removing zeek (3.0.20-1) ...
-dpkg: warning: while removing zeek, directory '/opt/zeek/spool' not empty so not removed
-dpkg: warning: while removing zeek, directory '/opt/zeek/share/zeek/site' not empty so not removed
-dpkg: warning: while removing zeek, directory '/opt/zeek/lib' not empty so not removed
-dpkg: warning: while removing zeek, directory '/opt/zeek/bin' not empty so not removed
-root@hedgehog:/opt# rm -rf /opt/zeek*
-```
-
-13. Install the new .deb files. You're going to have some warnings, but that's okay.
-```
-root@hedgehog:/tmp# dpkg -i hedgehog_install_artifacts/*.deb
-(Reading database ... 118149 files and directories currently installed.)
-Preparing to unpack .../filebeat-tweaked-7.6.2-amd64.deb ...
-Unpacking filebeat (7.6.2) over (6.8.4) ...
-dpkg: warning: unable to delete old directory '/usr/share/filebeat/kibana/6/dashboard': Directory not empty
-dpkg: warning: unable to delete old directory '/usr/share/filebeat/kibana/6': Directory not empty
-Preparing to unpack .../arkime_2.2.3-1_amd64.deb ...
-Unpacking arkime (2.2.3-1) over (2.0.1-1) ...
-Preparing to unpack .../netsniff-ng_0.6.6-1_amd64.deb ...
-Unpacking netsniff-ng (0.6.6-1) over (0.6.6-1) ...
-Preparing to unpack .../zeek_3.0.20-1_amd64.deb ...
-Unpacking zeek (3.0.20-1) over (3.0.0-1) ...
-Setting up filebeat (7.6.2) ...
-Installing new version of [...]
-[...]
-Setting up arkime (2.2.3-1) ...
-READ /opt/arkime/README.txt and RUN /opt/arkime/bin/Configure
-Setting up netsniff-ng (0.6.6-1) ...
-Setting up zeek (3.0.20-1) ...
-Processing triggers for systemd (232-25+deb9u12) ...
-Processing triggers for man-db (2.7.6.1-2) ...
-```
-
-14. Fix anything that might need fixing as far as the deb package requirements go
-    - `apt-get -f install`
-
-15. We just installed a Zeek .deb, but the third-party plugin packages and local config weren't part of that package. So we're going to `rsync` those from the other box where we have the ISO and `filesystem.squashfs` mounted as well:
-```
-root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/opt/zeek/ /opt/zeek
-user@otherbox's password:
-
-root@hedgehog:/tmp# ls -l /opt/zeek/share/zeek/site/
-total 52
-lrwxrwxrwx  1 root root    13 May  6 21:52 bzar -> packages/bzar
-lrwxrwxrwx  1 root root    22 May  6 21:50 cve-2020-0601 -> packages/cve-2020-0601
--rw-r--r--  1 root root  2031 Apr 30 16:02 extractor.zeek
--rw-r--r--  1 root root 39134 May  1 14:20 extractor_params.zeek
-lrwxrwxrwx  1 root root    14 May  6 21:52 hassh -> packages/hassh
-lrwxrwxrwx  1 root root    12 May  6 21:52 ja3 -> packages/ja3
--rw-rw-r--  1 root root  2005 May  6 21:54 local.zeek
-drwxr-xr-x 13 root root  4096 May  6 21:52 packages
-lrwxrwxrwx  1 root root    27 May  6 21:52 zeek-EternalSafety -> packages/zeek-EternalSafety
-lrwxrwxrwx  1 root root    26 May  6 21:52 zeek-community-id -> packages/zeek-community-id
-lrwxrwxrwx  1 root root    27 May  6 21:51 zeek-plugin-bacnet -> packages/zeek-plugin-bacnet
-lrwxrwxrwx  1 root root    25 May  6 21:51 zeek-plugin-enip -> packages/zeek-plugin-enip
-lrwxrwxrwx  1 root root    29 May  6 21:51 zeek-plugin-profinet -> packages/zeek-plugin-profinet
-lrwxrwxrwx  1 root root    27 May  6 21:52 zeek-plugin-s7comm -> packages/zeek-plugin-s7comm
-lrwxrwxrwx  1 root root    24 May  6 21:52 zeek-plugin-tds -> packages/zeek-plugin-tds
-```
-
-16. The `zeekctl` component of Zeek doesn't like being run by an unprivileged user unless the whole directory is owned by that user. As Hedgehog Linux runs everything it can as an unprivileged user, we're going to reset Zeek to a "clean" state after each reboot. Zeek's config files will get regenerated when Zeek itself is started. So, now make a complete backup of `/opt/zeek`, as its ownership is going to be changed during runtime (a conceptual sketch of the boot-time reset follows below):
-```
-root@hedgehog:/tmp# rsync -a /opt/zeek/ /opt/zeek.orig
-
-root@hedgehog:/tmp# chown -R sensor:sensor /opt/zeek/*
-
-root@hedgehog:/tmp# chown -R root:root /opt/zeek.orig/*
-
-root@hedgehog:/tmp# ls -l /opt/ | grep zeek
-drwxr-xr-x 8 root root 4096 May  8 15:48 zeek
-drwxr-xr-x 8 root root 4096 May  8 15:48 zeek.orig
-```
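-
-Conceptually, the reset that happens at each boot amounts to something like the following (a hypothetical sketch only; Hedgehog Linux performs the reset via its own startup machinery, not this exact script):
-
-```
-# restore /opt/zeek from the pristine root-owned copy made above,
-# then hand the working copy back to the unprivileged user
-rsync -a --delete /opt/zeek.orig/ /opt/zeek
-chown -R sensor:sensor /opt/zeek/*
-```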
-
-17. Grab other new scripts and stuff from our mount of the ISO using `rsync`:
-```
-root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/usr/local/bin/ /usr/local/bin
-user@otherbox's password:
-
-root@hedgehog:/tmp# ls -l /usr/local/bin/ | tail
-lrwxrwxrwx 1 root root     18 May  8 14:34 zeek -> /opt/zeek/bin/zeek
--rwxr-xr-x 1 root staff 10349 Oct 29  2019 zeek_carve_logger.py
--rwxr-xr-x 1 root staff 10467 Oct 29  2019 zeek_carve_scanner.py
--rw-r--r-- 1 root staff 25756 Oct 29  2019 zeek_carve_utils.py
--rwxr-xr-x 1 root staff  8787 Oct 29  2019 zeek_carve_watcher.py
--rwxr-xr-x 1 root staff  4883 May  4 17:39 zeek_install_plugins.sh
-
-root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/opt/yara-rules/ /opt/yara-rules
-user@otherbox's password:
-
-root@hedgehog:/tmp# rsync -a user@otherbox:/media/squash/opt/capa-rules/ /opt/capa-rules
-user@otherbox's password:
-
-root@hedgehog:/tmp# ls -l /opt/ | grep '\-rules'
-drwxr-xr-x 8 root root  4096 May  8 15:48 capa-rules
-drwxr-xr-x 8 root root 24576 May  8 15:48 yara-rules
-
-root@hedgehog:/tmp# for BEAT in filebeat; do rsync -a user@otherbox:/media/squash/usr/share/$BEAT/kibana/ /usr/share/$BEAT/kibana; done
-user@otherbox's password:
-user@otherbox's password:
-
-root@hedgehog:/tmp# rsync -avP --delete user@otherbox:/media/squash/etc/audit/rules.d/ /etc/audit/rules.d/
-user@otherbox's password:
-
-root@hedgehog:/tmp# rsync -avP --delete user@otherbox:/media/squash/etc/sudoers.d/ /etc/sudoers.d/
-user@otherbox's password:
-
-root@hedgehog:/tmp# chmod 400 /etc/sudoers.d/*
-```
-
-18. Set capabilities and symlinks for network capture programs to be used by the unprivileged user:
-
-commands:
-
-```
-chown root:netdev /usr/sbin/netsniff-ng && \
-  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip CAP_SYS_ADMIN+eip' /usr/sbin/netsniff-ng
-chown root:netdev /opt/zeek/bin/zeek && \
-  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/zeek/bin/zeek
-chown root:netdev /sbin/ethtool && \
-  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /sbin/ethtool
-chown root:netdev /opt/zeek/bin/capstats && \
-  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /opt/zeek/bin/capstats
-chown root:netdev /usr/bin/tcpdump && \
-  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/tcpdump
-chown root:netdev /opt/arkime/bin/capture && \
-  setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/arkime/bin/capture
-
-ln -s -f /opt/zeek/bin/zeek /usr/local/bin/
-ln -s -f /usr/sbin/netsniff-ng /usr/local/bin/
-ln -s -f /usr/bin/tcpdump /usr/local/bin/
-ln -s -f /opt/arkime/bin/capture /usr/local/bin/
-ln -s -f /opt/arkime/bin/npm /usr/local/bin
-ln -s -f /opt/arkime/bin/node /usr/local/bin
-ln -s -f /opt/arkime/bin/npx /usr/local/bin
-```
-
-example:
-
-```
-root@hedgehog:/tmp# chown root:netdev /usr/sbin/netsniff-ng && \
-> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip CAP_SYS_ADMIN+eip' /usr/sbin/netsniff-ng
-root@hedgehog:/tmp# chown root:netdev /opt/zeek/bin/zeek && \
-> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/zeek/bin/zeek
-root@hedgehog:/tmp# chown root:netdev /sbin/ethtool && \
-> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /sbin/ethtool
-root@hedgehog:/tmp# chown root:netdev /opt/zeek/bin/capstats && \
-> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /opt/zeek/bin/capstats
-root@hedgehog:/tmp# chown root:netdev /usr/bin/tcpdump && \
-> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/tcpdump
-root@hedgehog:/tmp# chown root:netdev /opt/arkime/bin/capture && \
-> setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip CAP_IPC_LOCK+eip' /opt/arkime/bin/capture
-root@hedgehog:/tmp# ln -s -f /opt/zeek/bin/zeek /usr/local/bin/
-root@hedgehog:/tmp# ln -s -f /usr/sbin/netsniff-ng /usr/local/bin/
-root@hedgehog:/tmp# ln -s -f /usr/bin/tcpdump /usr/local/bin/
-root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/capture /usr/local/bin/
-root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/npm /usr/local/bin
-root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/node /usr/local/bin
-root@hedgehog:/tmp# ln -s -f /opt/arkime/bin/npx /usr/local/bin
-```
-
-19. Back up unprivileged user sensor-specific config and scripts:
-    - `mv /opt/sensor/ /opt/sensor_upgrade_backup_$(date +%Y-%m-%d)`
-
-20. Grab unprivileged user sensor-specific config and scripts from our mount of the ISO using `rsync` and change its ownership to the unprivileged user:
-```
-root@hedgehog:/tmp# rsync -av user@otherbox:/media/squash/opt/sensor /opt/
-user@otherbox's password:
-receiving incremental file list
-created directory ./opt
-sensor/
-[...]
-
-sent 1,244 bytes  received 1,646,409 bytes  470,758.00 bytes/sec
-total size is 1,641,629  speedup is 1.00
-
-root@hedgehog:/tmp# chown -R sensor:sensor /opt/sensor*
-
-root@hedgehog:/tmp# ls -l /opt/ | grep sensor
-drwxr-xr-x 4 sensor sensor 4096 May  6 22:00 sensor
-drwxr-x--- 4 sensor sensor 4096 May  8 14:33 sensor_upgrade_backup_2020-05-08
-```
-
-21. Leave the root shell and `cd` to `/opt`
-```
-root@hedgehog:~# exit
-logout
-
-sensor@hedgehog:~$ whoami
-sensor
-
-sensor@hedgehog:~$ cd /opt
-```
-
-22. Compare the old and new `control_vars.conf` files:
-```
-sensor@hedgehog:opt$ diff sensor_upgrade_backup_2020-05-08/sensor_ctl/control_vars.conf sensor/sensor_ctl/control_vars.conf
-1,2c1,2
-< export CAPTURE_INTERFACE=enp0s3
-< export CAPTURE_FILTER="not port 5044 and not port 5601 and not port 8005 and not port 9200 and not port 9600"
----
-> export CAPTURE_INTERFACE=xxxx
-> export CAPTURE_FILTER=""
-4c4
-[...]
-```
-
-Examine the differences. If there aren't any new `export` variables, then you're probably safe to just replace the default version of `control_vars.conf` with the backed-up version:
-
-```
-sensor@hedgehog:opt$ cp sensor_upgrade_backup_2020-05-08/sensor_ctl/control_vars.conf sensor/sensor_ctl/control_vars.conf
-cp: overwrite 'sensor/sensor_ctl/control_vars.conf'? y
-```
-
-If there are major differences or new variables, continue on to the next step; in a minute you'll need to run `capture-config` to configure everything from scratch anyway.
-
-23. Restore certificates/keystores for forwarders from the backup `sensor_ctl` path to the new one:
-```
-sensor@hedgehog:opt$ for BEAT in filebeat miscbeat; do cp /opt/sensor_upgrade_backup_2020-05-08/sensor_ctl/$BEAT/data/* /opt/sensor/sensor_ctl/$BEAT/data/; done
-
-sensor@hedgehog:opt$ cp /opt/sensor_upgrade_backup_2020-05-08/sensor_ctl/filebeat/{ca.crt,client.crt,client.key} /opt/sensor/sensor_ctl/logstash-client-certificates/
-```
-
-24. Despite what we just did, you may consider running `capture-config` to re-configure [capture, forwarding, and autostart services](#ConfigUser) from scratch anyway. You can use the backed-up version of `control_vars.conf` to refer back to as a basis for things you might want to restore (e.g., `CAPTURE_INTERFACE`, `CAPTURE_FILTER`, `PCAP_PATH`, `ZEEK_LOG_PATH`, your autostart settings, etc.).
-
-25. Once you feel confident you've completed all of these steps, issue a reboot on the Hedgehog.
-
-## Post-upgrade
-
-Once the Hedgehog has come back up, check to make sure everything is working (a consolidated sketch of these checks follows the list below):
-
-* `/opt/sensor/sensor_ctl/status` should return `RUNNING` for the things you set to autorun (no `FATAL` errors)
-* `sensorwatch` should show current writes to Zeek log files and PCAP files (depending on your configuration)
-* `tail -f /opt/sensor/sensor_ctl/log/*` should show no egregious errors
-* `zeek --version`, `zeek -N local`, and `capture --version` ought to run and print out version information as expected
-* if you are forwarding to a [Malcolm](https://github.com/idaholab/Malcolm) aggregator, you should start seeing data momentarily
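-
-Something like the following could serve as that consolidated check (a minimal sketch using only the commands listed above; adjust it for the services you actually autorun):
-
-```
-#!/bin/bash
-# quick post-upgrade sanity check (sketch)
-/opt/sensor/sensor_ctl/status
-zeek --version
-zeek -N local
-capture --version
-# surface any recent errors from the service logs
-tail -n 20 /opt/sensor/sensor_ctl/log/* | grep -i error || echo "no obvious errors in recent logs"
-```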
-
-# Copyright
-
-Hedgehog Linux - part of [Malcolm](https://github.com/idaholab/Malcolm) - is Copyright 2022 Battelle Energy Alliance, LLC, and is developed and released through the cooperation of the Cybersecurity and Infrastructure Security Agency of the U.S. Department of Homeland Security.
-
-See [`License.txt`](https://raw.githubusercontent.com/idaholab/Malcolm/main/License.txt) for the terms of its release.
- -### Contact information of author(s): - -[malcolm@inl.gov](mailto:malcolm@inl.gov?subject=Network%20sensor%20development) diff --git a/sensor-iso/doc.css b/sensor-iso/doc.css deleted file mode 100644 index cfef2a644..000000000 --- a/sensor-iso/doc.css +++ /dev/null @@ -1,324 +0,0 @@ -html { - font-size: 100%; - overflow-y: scroll; - -webkit-text-size-adjust: 100%; - -ms-text-size-adjust: 100%; -} - -body { - color: #444; - font-family: Georgia, Palatino, 'Palatino Linotype', Times, 'Times New Roman', serif; - font-size: 12px; - line-height: 1.7; - padding: 1em; - margin: auto; - max-width: 1366px; - background: #fefefe; -} - -a { - color: #0645ad; - text-decoration: none; -} - -a:visited { - color: #0b0080; -} - -a:hover { - color: #06e; -} - -a:active { - color: #faa700; -} - -a:focus { - outline: thin dotted; -} - -*::-moz-selection { - background: rgba(255, 255, 0, 0.3); - color: #000; -} - -*::selection { - background: rgba(255, 255, 0, 0.3); - color: #000; -} - -a::-moz-selection { - background: rgba(255, 255, 0, 0.3); - color: #0645ad; -} - -a::selection { - background: rgba(255, 255, 0, 0.3); - color: #0645ad; -} - -p { - margin: 1em 0; -} - -img { - max-width: 100%; -} - -h1, h2, h3, h4, h5, h6 { - color: #111; - line-height: 125%; - margin-top: 2em; - font-weight: normal; -} - -h4, h5, h6 { - font-weight: bold; -} - -h1 { - font-size: 2.5em; -} - -h2 { - font-size: 2em; -} - -h3 { - font-size: 1.5em; -} - -h4 { - font-size: 1.2em; -} - -h5 { - font-size: 1em; -} - -h6 { - font-size: 0.9em; -} - -blockquote { - color: #666666; - margin: 0; - padding-left: 3em; - border-left: 0.5em #EEE solid; -} - -hr { - display: block; - height: 2px; - border: 0; - border-top: 1px solid #aaa; - border-bottom: 1px solid #eee; - margin: 1em 0; - padding: 0; -} - -pre, code, kbd, samp { - color: #000; - font-family: monospace, monospace; - _font-family: 'courier new', monospace; - font-size: 0.98em; -} - -pre { - white-space: pre; - white-space: pre-wrap; - word-wrap: break-word; -} - -b, strong { - font-weight: bold; -} - -dfn { - font-style: italic; -} - -ins { - background: #ff9; - color: #000; - text-decoration: none; -} - -mark { - background: #ff0; - color: #000; - font-style: italic; - font-weight: bold; -} - -sub, sup { - font-size: 75%; - line-height: 0; - position: relative; - vertical-align: baseline; -} - -sup { - top: -0.5em; -} - -sub { - bottom: -0.25em; -} - -ul, ol { - margin: 1em 0; - padding: 0 0 0 2em; -} - -li p:last-child { - margin-bottom: 0; -} - -ul ul, ol ol { - margin: .3em 0; -} - -dl { - margin-bottom: 1em; -} - -dt { - font-weight: bold; - margin-bottom: .8em; -} - -dd { - margin: 0 0 .8em 2em; -} - -dd:last-child { - margin-bottom: 0; -} - -img { - border: 0; - -ms-interpolation-mode: bicubic; - vertical-align: middle; -} - -figure { - display: block; - text-align: center; - margin: 1em 0; -} - -figure img { - border: none; - margin: 0 auto; -} - -p.caption, figcaption { - font-size: 0.8em; - font-style: italic; - margin: 0 0 .8em; -} - -table { - margin-bottom: 2em; - border-bottom: 1px solid #ddd; - border-right: 1px solid #ddd; - border-spacing: 0; - border-collapse: collapse; -} - -table th { - padding: .2em 1em; - background-color: #eee; - border-top: 1px solid #ddd; - border-left: 1px solid #ddd; -} - -table td { - padding: .2em 1em; - border-top: 1px solid #ddd; - border-left: 1px solid #ddd; - vertical-align: top; -} - -.author { - font-size: 1.2em; - text-align: center; -} - -@media only screen and (min-width: 480px) { - body { - font-size: 14px; - } -} 
-@media only screen and (min-width: 768px) { - body { - font-size: 16px; - } -} -@media print { - * { - background: transparent !important; - color: black !important; - filter: none !important; - -ms-filter: none !important; - } - - body { - font-size: 12pt; - max-width: 100%; - } - - a, a:visited { - text-decoration: underline; - } - - hr { - height: 1px; - border: 0; - border-bottom: 1px solid black; - } - - a[href]:after { - content: " (" attr(href) ")"; - } - - abbr[title]:after { - content: " (" attr(title) ")"; - } - - .ir a:after, a[href^="javascript:"]:after, a[href^="#"]:after { - content: ""; - } - - pre, blockquote { - border: 1px solid #999; - padding-right: 1em; - page-break-inside: avoid; - } - - tr, img { - page-break-inside: avoid; - } - - img { - max-width: 100% !important; - } - - @page :left { - margin: 15mm 20mm 15mm 10mm; -} - - @page :right { - margin: 15mm 10mm 15mm 20mm; -} - - p, h2, h3 { - orphans: 3; - widows: 3; - } - - h2, h3 { - page-break-after: avoid; - } -}