From 25e8a865c09dfcfa327b06d6dc06dbfd98e04da6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Str=C3=B6mberg?= <t+github@chainguard.dev>
Date: Tue, 26 Nov 2024 08:02:53 -0500
Subject: [PATCH] Reframe README around the concept of differential analysis
 (#663)

* Reframe README around the concept of differential analysis

* more tuning
---
 README.md | 75 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index d4884c39f..9ce9fbafb 100644
--- a/README.md
+++ b/README.md
@@ -13,32 +13,66 @@
             subtle malware discovery tool
 ```
 
-malcontent detects supply-chain compromises and other malicious software. It has 3 modes of operation:
+malcontent discovers supply-chain compromises through the magic of context, differential analysis, and 14,500+ YARA rules.
 
-* ✨`diff`: show the risk-weighted capability drift between two versions of a program
-  * ☝️ **Our bread & butter: malcontent does this better than anyone else**
+
+```
+ ________      ________      ________      ________
+|        |    |        |    |        |    |        |
+| v1.0.0 | => | v1.0.1 | => | v1.0.2 | => | v1.0.3 |
+|________|    |________|    |________|    |________|
+
+               unchanged     HIGH-RISK     decreased
+               risk          increase      risk
+
+```
+
+malcontent has 3 modes of operation:
+
+* ✨ `diff`: risk-weighted differential analysis between two programs
 * 🕵️‍♀️ `analyze`: deep analysis of a program's capabilities
-* 🔍 `scan`: find malicious content across a broad set of file formats
+* 🔍 `scan`: basic scan for malicious content
 
-malcontent is a bit paranoid and prone to false positives. It is currently focused on finding threats that impact Linux and macOS platforms, but malcontent can also detect threats that impact other platforms.
+malcontent is at its best analyzing programs that run on Linux. It also performs admirably on programs designed for other UNIX platforms such as macOS and, to a lesser extent, on Windows programs.
 
 ## Features
 
 * 14,500+ [YARA](YARA) detection rules
   * Including third-party rules from companies such as Avast, Elastic, FireEye, Mandiant, Nextron, ReversingLabs, and more!
-* Analyzes binaries from nearly any operating system (Linux, macOS, FreeBSD, Windows, etc.)
-* Analyzes scripts (Python, shell, Javascript, Typescript, PHP, Perl, AppleScript)
-* Analyzes container images
-* Transparent archive support (apk, tar, zip, etc.)
+* Analyzes binary files in most common formats (ELF, Mach-O, a.out, PE)
+* Analyzes code from most common languages (AppleScript, C, Go, JavaScript, PHP, Perl, Ruby, Shell, TypeScript)
+* Transparent support for archives (apk, tar, zip, etc.) & container images
 * Multiple output formats (JSON, YAML, Markdown, Terminal)
 * Designed to work as part of a CI/CD pipeline
 * Supports air-gapped networks
 
 ## Modes
 
+### Diff
+
+malcontent's most powerful method for discovering malware is differential analysis of CI/CD artifacts. When used within a build system, malcontent has two significant contextual advantages over a traditional malware scanner:
+
+* Baseline of expected behavior (previous release)
+* Semantic versioning that describes how large a change to expect
+
+
+Using the [3CX Compromise](https://www.fortinet.com/blog/threat-research/3cx-desktop-app-compromised) as an example, malcontent trivially surfaces unexpectedly high-risk changes to libffmpeg:
+
+![diff screenshot](./images/diff.png)
+
+Each line that begins with a "++" represents a newly added capability. Each capability has a risk score based on how specific it is to malware.
+
+Like the diff(1) command it's based on, malcontent can diff between two binaries or directories. It can also diff two archive files or even two OCI images. Here are some helpful flags:
+
+* `--format=markdown`: output in markdown for use in GitHub Actions
+* `--min-file-risk=critical`: only show diffs for critical-level changes
+* `--quantity-increases-risk=false`: disable heuristics that increase file criticality due to result frequency
+* `--file-risk-change`: only show diffs for modified files when the source and destination files are of different risks
+* `--file-risk-increase`: only show diffs for modified files when the destination file is of a higher risk than the source file
+
 ### Scan
 
-Scan directories for possible malware. This is our simplest feature, but not particularly novel either. malcontent is pretty paranoid in this mode, so expect some false positives:
+malcontent's most basic feature scans directories for possible malware. malcontent is pretty paranoid in this mode, so expect some false positives:
 
 ![scan screenshot](./images/scan.png)
 
@@ -51,7 +85,7 @@ Useful flags:
 
 ### Analyze
 
-To analyze the capabilities of a program, use `mal analyze`. For example:
+To enumerate the capabilities of a program, use `mal analyze`. For example:
 
 ![analyze screenshot](./images/analyze.png)
 
@@ -62,23 +96,6 @@ The analyze mode emits a list of capabilities often seen in malware, categorized
 * `--format=json`: output to JSON for data parsing
 * `--min-risk=high`: only show high or critical risk findings
 
-### Diff
-
-To detect unexpected capability changes, try `diff` mode. This allows you to find far more subtle attacks than a general scan, as you generally have both a baseline "known good" version and the context to understand what capabilities a program needs to operate.
-
-Using the [3CX Compromise](https://www.fortinet.com/blog/threat-research/3cx-desktop-app-compromised) as an example, we're able to use malcontent to detect malicious code inserted in an otherwise harmless library:
-
-![diff screenshot](./images/diff.png)
-
-Each line that begins with a "++" represents a newly added capability. You can use it to diff entire directories recursively, even if they contain programs written in a variety of languages.
-
-For use in CI/CD pipelines, you may find the following flags helpful:
-
-* `--format=markdown`: output in markdown for use in GitHub Actions
-* `--min-file-risk=critical`: only show diffs for critical-level changes
-* `--quantity-increases-risk=false`: disable heuristics that increase file criticality due to result frequency
-* `--file-risk-change`: only show diffs for modified files when the source and destination files are of different risks
-* `--file-risk-increase`: only show diffs for modified files when the destination file is of a higher risk than the source file
 
 ## Installation
 
@@ -109,4 +126,4 @@ go install github.com/chainguard-dev/malcontent/cmd/mal@latest
 
 ## Help Wanted
 
-malcontent is an honest-to-goodness open-source project. If you are interested in contributing, check out [DEVELOPMENT.md](DEVELOPMENT.md). Send us a pull request, and we'll help you with the rest!
+malcontent is open source! If you are interested in contributing, check out [our development guide](DEVELOPMENT.md). Send us a pull request, and we'll help you with the rest!