From b9554289ca261b34ac7a5b44d76be74587045d1d Mon Sep 17 00:00:00 2001
From: github-actions

A BBOT scan in real-time - visualization with VivaGraphJS

Supported Platforms

Only Linux is supported at this time. Windows and macOS are not supported. If you use one of these platforms, consider using Docker.

BBOT offers multiple methods of installation, including pipx and Docker. If you plan to dev on BBOT, see Installation (Poetry).

Docker images are provided, along with helper script

Below are some examples of common scans.

Subdomains:
Subdomains (passive only):
Subdomains + port scan + web screenshots:
Subdomains + basic web scan:
Web spider:
Everything everywhere all at once:

BBOT works just fine without API keys. However, certain modules need them to function. If you have API keys and want to make use of these modules, you can place them either in your preset:

...in BBOT's global YAML config (

Note: this will ensure the API keys are used in all scans, regardless of preset.

...or directly on the command-line:

For more information, see Configuration. For a full list of modules, including which ones require API keys, see List of Modules.

Next Up: Scanning

BBOT does a lot more than just subdomain enumeration. However, subdomain enumeration is arguably the most important part of OSINT, and since there are so many subdomain enumeration tools out there, they're the easiest class of tool to compare it to.

Thanks to BBOT's recursive nature (and its

For a detailed analysis of this data, please see Subdomain Enumeration Tool Face-Off

Note that in this benchmark, Spiderfoot crashed after ~20 minutes due to excessive memory usage. Amass never finished and had to be cancelled after 24h. All other tools finished successfully.

We welcome contributions! If you have an idea for a new module, or are a Python developer who wants to get involved, please fork us or come talk to us on Discord. To get started devving, see the following links:

It's well-known that when you're doing recon, it's best to do it recursively.
However, there are very few recursive tools, and the main reason is that making a recursive tool is hard. In particular, it's very difficult to build a large-scale recursive system that interacts with the internet, and to keep it stable. When we first set out to make BBOT, we didn't know this, and it was definitely a lesson we learned the hard way. BBOT's stability is thanks to its extensive Unit Tests.

BBOT inherits its recursive philosophy from Spiderfoot, which means it is also event-driven. Each of BBOT's 100+ modules consumes a certain type of Event, uses it to discover something new, and produces new events, which get distributed to all the other modules. This happens again and again -- thousands of times during a scan -- spidering outwards in a recursive web of discovery.

Below is an interactive graph showing the relationships between modules and the event types they produce and consume.

Each BBOT module does one specific task, such as querying an API for subdomains, or running a tool like

For example, the

Because of this, enabling even one module has the potential to increase your results exponentially. This is exactly how BBOT is able to outperform other tools.

To learn more about how events flow inside BBOT, see BBOT Internal Architecture.

May 15th, 2024
February 21, 2024
January 29, 2024
January 11, 2024
October 24, 2023
October 11, 2023 Includes webhook output modules - Discord, Slack, and Teams!
August 4, 2023 New Features: Improvements / Fixes: New Modules:
March 10, 2023 New Modules: New Features:
December 15, 2022 New Modules: New Features:
October 12, 2022 Changes:

If you get errors resembling any of the above, it's probably because your Python version is too old. To install a newer version (3.9+ is required), you will need to do something like this:
self.strict_scope = strict_scope
self.acl_mode = acl_mode
self.special_event_types = {
- "ORG_STUB": re.compile(r"^ORG:(.*)", re.IGNORECASE),
- "ASN": re.compile(r"^ASN:(.*)", re.IGNORECASE),
+ "ORG_STUB": re.compile(r"^(?:ORG|ORG_STUB):(.*)", re.IGNORECASE),
+ "USERNAME": re.compile(r"^(?:USER|USERNAME):(.*)", re.IGNORECASE),
}
self._events = set()
self._radix = RadixTarget()
@@ -1494,8 +1494,8 @@
self.strict_scope = strict_scope
self.acl_mode = acl_mode
self.special_event_types = {
- "ORG_STUB": re.compile(r"^ORG:(.*)", re.IGNORECASE),
- "ASN": re.compile(r"^ASN:(.*)", re.IGNORECASE),
+ "ORG_STUB": re.compile(r"^(?:ORG|ORG_STUB):(.*)", re.IGNORECASE),
+ "USERNAME": re.compile(r"^(?:USER|USERNAME):(.*)", re.IGNORECASE),
}
self._events = set()
self._radix = RadixTarget()
diff --git a/Dev/dev/tests/index.html b/Dev/dev/tests/index.html
index 9a12b271b..4b961fcde 100644
--- a/Dev/dev/tests/index.html
+++ b/Dev/dev/tests/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/how_it_works/index.html b/Dev/how_it_works/index.html
index abcc3e5df..fc56f85bd 100644
--- a/Dev/how_it_works/index.html
+++ b/Dev/how_it_works/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/index.html b/Dev/index.html
index f6d837e52..a5e666546 100644
--- a/Dev/index.html
+++ b/Dev/index.html
@@ -19,7 +19,7 @@
-
+
diff --git a/Dev/modules/custom_yara_rules/index.html b/Dev/modules/custom_yara_rules/index.html
index d89d889c0..73121ec2a 100644
--- a/Dev/modules/custom_yara_rules/index.html
+++ b/Dev/modules/custom_yara_rules/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/Dev/modules/internal_modules/index.html b/Dev/modules/internal_modules/index.html
index 3013f554a..65eb61aba 100644
--- a/Dev/modules/internal_modules/index.html
+++ b/Dev/modules/internal_modules/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/Dev/modules/list_of_modules/index.html b/Dev/modules/list_of_modules/index.html
index aac4292ed..7888786d6 100644
--- a/Dev/modules/list_of_modules/index.html
+++ b/Dev/modules/list_of_modules/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/modules/nuclei/index.html b/Dev/modules/nuclei/index.html
index 201a3ba01..e83e51e9b 100644
--- a/Dev/modules/nuclei/index.html
+++ b/Dev/modules/nuclei/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/release_history/index.html b/Dev/release_history/index.html
index be29b6b74..d12f2c002 100644
--- a/Dev/release_history/index.html
+++ b/Dev/release_history/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/advanced/index.html b/Dev/scanning/advanced/index.html
index 8dc76b484..9671a05fe 100644
--- a/Dev/scanning/advanced/index.html
+++ b/Dev/scanning/advanced/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/configuration/index.html b/Dev/scanning/configuration/index.html
index 169e42d71..bd8465f5e 100644
--- a/Dev/scanning/configuration/index.html
+++ b/Dev/scanning/configuration/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/events/index.html b/Dev/scanning/events/index.html
index 0f8c64bff..11412bc8d 100644
--- a/Dev/scanning/events/index.html
+++ b/Dev/scanning/events/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/index.html b/Dev/scanning/index.html
index f155c16f1..d9523e41e 100644
--- a/Dev/scanning/index.html
+++ b/Dev/scanning/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/output/index.html b/Dev/scanning/output/index.html
index 43500f52c..6f0433fde 100644
--- a/Dev/scanning/output/index.html
+++ b/Dev/scanning/output/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/presets/index.html b/Dev/scanning/presets/index.html
index 439e8b8ec..ef28cdafd 100644
--- a/Dev/scanning/presets/index.html
+++ b/Dev/scanning/presets/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/presets_list/index.html b/Dev/scanning/presets_list/index.html
index 842b1a3b6..e0eb940ac 100644
--- a/Dev/scanning/presets_list/index.html
+++ b/Dev/scanning/presets_list/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/scanning/tips_and_tricks/index.html b/Dev/scanning/tips_and_tricks/index.html
index 81f3b491f..a9935c038 100644
--- a/Dev/scanning/tips_and_tricks/index.html
+++ b/Dev/scanning/tips_and_tricks/index.html
@@ -20,7 +20,7 @@
-
+
diff --git a/Dev/search/search_index.json b/Dev/search/search_index.json
index b82bca30a..1ea468ffe 100644
--- a/Dev/search/search_index.json
+++ b/Dev/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Getting Started","text":"
pipx
installs BBOT inside its own virtual environment.
"},{"location":"#docker","title":"Docker","text":"# stable version\npipx install bbot\n\n# bleeding edge (dev branch)\npipx install --pip-args '\\--pre' bbot\n\n# execute bbot command\nbbot --help\n
bbot-docker.sh
to persist your scan data.
"},{"location":"#example-commands","title":"Example Commands","text":"# bleeding edge (dev)\ndocker run -it blacklanternsecurity/bbot --help\n\n# stable\ndocker run -it blacklanternsecurity/bbot:stable --help\n\n# helper script\ngit clone https://github.com/blacklanternsecurity/bbot && cd bbot\n./bbot-docker.sh --help\n
# Perform a full subdomain enumeration on evilcorp.com\nbbot -t evilcorp.com -p subdomain-enum\n
# Perform a passive-only subdomain enumeration on evilcorp.com\nbbot -t evilcorp.com -p subdomain-enum -rf passive\n
# Port-scan every subdomain, screenshot every webpage, output to current directory\nbbot -t evilcorp.com -p subdomain-enum -m portscan gowitness -n my_scan -o .\n
# A basic web scan includes wappalyzer, robots.txt, and other non-intrusive web modules\nbbot -t evilcorp.com -p subdomain-enum web-basic\n
# Crawl www.evilcorp.com up to a max depth of 2, automatically extracting emails, secrets, etc.\nbbot -t www.evilcorp.com -p spider -c web.spider_distance=2 web.spider_depth=2\n
"},{"location":"#api-keys","title":"API Keys","text":"# Subdomains, emails, cloud buckets, port scan, basic web, web screenshots, nuclei\nbbot -t evilcorp.com -p kitchen-sink\n
description: My custom subdomain enum preset\n\ninclude:\n - subdomain-enum\n - cloud-enum\n\nconfig:\n modules:\n shodan_dns:\n api_key: deadbeef\n virustotal:\n api_key: cafebabe\n
~/.config/bbot/bbot.yml
):
modules:\n shodan_dns:\n api_key: deadbeef\n virustotal:\n api_key: cafebabe\n
# specify API key with -c\nbbot -t evilcorp.com -f subdomain-enum -c modules.shodan_dns.api_key=deadbeef modules.virustotal.api_key=cafebabe\n
dnsbrute_mutations
module with its NLP-powered subdomain mutations), it typically finds about 20-25% more than other tools such as Amass
or theHarvester
. This holds true especially for larger targets like delta.com
(1000+ subdomains):
"},{"location":"how_it_works/","title":"How it Works","text":""},{"location":"how_it_works/#bbots-recursive-philosophy","title":"BBOT's Recursive Philosophy","text":"nuclei
, and is carefully designed to work together with other modules inside BBOT's recursive system. The portscan
module consumes DNS_NAME
, and produces OPEN_TCP_PORT
. The sslcert
module consumes OPEN_TCP_PORT
and produces DNS_NAME
. You can see how even these two modules, when enabled together, will feed each other recursively.
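The feedback loop described above can be sketched in a few lines of plain Python. This is only an illustration and not BBOT's implementation: the `portscan` and `sslcert` functions, the `OPEN_PORTS` and `CERT_NAMES` lookup tables, and `run_scan` are hypothetical stand-ins, and the data in them is made up.

```python
from collections import deque

# Hypothetical, made-up lookup tables standing in for real network activity.
OPEN_PORTS = {"evilcorp.com": [443], "www.evilcorp.com": [443], "mail.evilcorp.com": []}
CERT_NAMES = {
    ("evilcorp.com", 443): ["www.evilcorp.com"],
    ("www.evilcorp.com", 443): ["mail.evilcorp.com", "evilcorp.com"],
}


def portscan(event):
    """Consumes DNS_NAME, produces OPEN_TCP_PORT."""
    _, host = event
    return [("OPEN_TCP_PORT", (host, port)) for port in OPEN_PORTS.get(host, [])]


def sslcert(event):
    """Consumes OPEN_TCP_PORT, produces DNS_NAME."""
    _, (host, port) = event
    return [("DNS_NAME", name) for name in CERT_NAMES.get((host, port), [])]


# Which modules watch which event type.
MODULES = {"DNS_NAME": [portscan], "OPEN_TCP_PORT": [sslcert]}


def run_scan(target):
    """Distribute events to interested modules until nothing new turns up."""
    queue = deque([("DNS_NAME", target)])
    seen = set(queue)
    while queue:
        event = queue.popleft()
        for module in MODULES.get(event[0], []):
            for new_event in module(event):
                if new_event not in seen:  # suppress duplicates
                    seen.add(new_event)
                    queue.append(new_event)
    return seen


if __name__ == "__main__":
    for event in sorted(run_scan("evilcorp.com"), key=str):
        print(event)
```

Starting from a single `DNS_NAME`, each module's output becomes the other's input, so the scan keeps expanding until no unseen events remain. The `seen` set is what keeps the loop from running forever.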
"},{"location":"release_history/#improvements","title":"Improvements","text":"
"},{"location":"release_history/#bugfixes","title":"Bugfixes","text":"
"},{"location":"release_history/#v116","title":"v1.1.6","text":"
"},{"location":"release_history/#bigfixes","title":"Bigfixes","text":"
"},{"location":"release_history/#new-modules_1","title":"New Modules","text":"
"},{"location":"release_history/#v115","title":"v1.1.5","text":"
"},{"location":"release_history/#bugfixes_1","title":"Bugfixes","text":"
"},{"location":"release_history/#v114","title":"v1.1.4","text":"
"},{"location":"release_history/#bugfixes_2","title":"Bugfixes","text":"
"},{"location":"release_history/#new-modules_2","title":"New Modules","text":"
"},{"location":"release_history/#v112","title":"v1.1.2","text":"
"},{"location":"release_history/#bugfixes_3","title":"Bugfixes","text":"
"},{"location":"release_history/#new-modules_3","title":"New Modules","text":"
"},{"location":"release_history/#v111","title":"v1.1.1","text":"
"},{"location":"release_history/#bugfixes_4","title":"Bugfixes","text":"
"},{"location":"release_history/#new-modules_4","title":"New Modules","text":"
"},{"location":"release_history/#v110","title":"v1.1.0","text":"
-lf
"},{"location":"release_history/#v105","title":"v1.0.5","text":"
"},{"location":"release_history/#v104","title":"v1.0.4","text":"
"},{"location":"release_history/#v103","title":"v1.0.3","text":"
"},{"location":"troubleshooting/","title":"Troubleshooting","text":""},{"location":"troubleshooting/#installation-troubleshooting","title":"Installation troubleshooting","text":"retries
option for httpx module
asset_inventory
output module
--help
output
Fatal error from pip prevented installation.
ERROR: No matching distribution found for bbot
bash: /home/user/.local/bin/bbot: /home/user/.local/pipx/venvs/bbot/bin/python: bad interpreter
# install a newer version of python\nsudo apt install python3.9 python3.9-venv\n# install pipx\npython3.9 -m pip install --user pipx\n# add pipx to your path\npython3.9 -m pipx ensurepath\n# reboot\nreboot\n# install bbot\npython3.9 -m pipx install bbot\n# run bbot\nbbot --help\n
ModuleNotFoundError
","text":"If you run into a ModuleNotFoundError
, try running your bbot
command again with --force-deps
. This will repair your modules' Python dependencies.
As a troubleshooting step, it is sometimes useful to clear out your older configs and let BBOT generate new ones. This will ensure that new defaults are properly restored, etc.
# make a backup of the old configs\nmv ~/.config/bbot ~/.config/bbot.bak\n\n# generate new configs\nbbot\n
"},{"location":"dev/","title":"BBOT Developer Reference","text":"BBOT exposes a Python API that allows you to create, start, and stop scans.
Documented in this section are commonly-used classes and functions within BBOT, along with usage examples.
"},{"location":"dev/#adding-bbot-to-your-python-project","title":"Adding BBOT to Your Python Project","text":"If you are using Poetry, you can add BBOT to your python environment like this:
# stable\npoetry add bbot\n\n# bleeding-edge (dev branch)\npoetry add bbot --allow-prereleases\n
"},{"location":"dev/#running-a-bbot-scan-from-python","title":"Running a BBOT Scan from Python","text":""},{"location":"dev/#synchronous","title":"Synchronous","text":"from bbot.scanner import Scanner\n\nif __name__ == \"__main__\":\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n for event in scan.start():\n print(event)\n
"},{"location":"dev/#asynchronous","title":"Asynchronous","text":"from bbot.scanner import Scanner\n\nasync def main():\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n async for event in scan.async_start():\n print(event.json())\n\nif __name__ == \"__main__\":\n import asyncio\n asyncio.run(main())\n
For a full listing of Scanner
attributes and functions, see the Scanner
Code Reference.
You can specify any number of targets:
# create a scan against multiple targets\nscan = Scanner(\n \"evilcorp.com\",\n \"evilcorp.org\",\n \"evilcorp.ce\",\n \"4.3.2.1\",\n \"1.2.3.4/24\",\n presets=[\"subdomain-enum\"]\n)\n\n# this is the same as:\ntargets = [\"evilcorp.com\", \"evilcorp.org\", \"evilcorp.ce\", \"4.3.2.1\", \"1.2.3.4/24\"]\nscan = Scanner(*targets, presets=[\"subdomain-enum\"])\n
For more details, including which types of targets are valid, see Targets
"},{"location":"dev/#other-custom-options","title":"Other Custom Options","text":"In many cases, using a Preset like subdomain-enum
is sufficient. However, the Scanner
is flexible and accepts many other arguments that can override the default functionality. You can specify flags
, modules
, output_modules
, a whitelist
or blacklist
, and custom config
options:
# create a scan against multiple targets\nscan = Scanner(\n # targets\n \"evilcorp.com\",\n \"4.3.2.1\",\n # enable these presets\n presets=[\"subdomain-enum\"],\n # whitelist these hosts\n whitelist=[\"evilcorp.com\", \"evilcorp.org\"],\n # blacklist these hosts\n blacklist=[\"prod.evilcorp.com\"],\n # also enable these individual modules\n modules=[\"nuclei\", \"ipstack\"],\n # exclude modules with these flags\n exclude_flags=[\"slow\"],\n # custom config options\n config={\n \"modules\": {\n \"nuclei\": {\n \"tags\": \"apache,nginx\"\n }\n }\n }\n)\n
For a list of all the possible scan options, see the Presets
Code Reference
Here is a basic overview of BBOT's internal architecture.
"},{"location":"dev/architecture/#queues","title":"Queues","text":"Being both recursive and event-driven, BBOT makes heavy use of queues. These enable smooth communication between the modules, and ensure that large numbers of events can be produced without slowing down or clogging up the scan.
Every module in BBOT has both an incoming and outgoing queue. Event types matching the module's WATCHED_EVENTS
(e.g. DNS_NAME
) are queued in its incoming queue, and processed by the module's handle_event()
(or handle_batch()
in the case of batched modules). If the module finds anything interesting, it creates an event and places it in its outgoing queue, to be processed by the scan and redistributed to other modules.
Below is a graph showing the internal event flow in BBOT. White lines represent queues. Notice how some modules run in sequence, while others run in parallel. With the exception of a few specific modules, most BBOT modules are parallelized.
For a higher-level overview, see How it Works.
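As a rough sketch of the incoming/outgoing queue pattern described above, the following uses `asyncio.Queue` from the standard library. It is not BBOT's code: `SketchModule`, its `worker()` loop, the `None` shutdown sentinel, and the hard-coded port are assumptions made purely for illustration.

```python
import asyncio


class SketchModule:
    """Hypothetical module with one incoming and one outgoing queue."""

    watched_events = ["DNS_NAME"]

    def __init__(self):
        self.incoming = asyncio.Queue()
        self.outgoing = asyncio.Queue()

    async def handle_event(self, event):
        # Pretend we found something interesting and queue it for the scan.
        _, data = event
        await self.outgoing.put(("OPEN_TCP_PORT", f"{data}:443"))

    async def worker(self):
        # Drain the incoming queue until the (assumed) None sentinel arrives.
        while True:
            event = await self.incoming.get()
            if event is None:
                break
            await self.handle_event(event)


async def demo():
    module = SketchModule()
    task = asyncio.create_task(module.worker())
    # Only event types the module watches are placed in its incoming queue.
    for event in [("DNS_NAME", "evilcorp.com"), ("URL", "http://evilcorp.com")]:
        if event[0] in module.watched_events:
            await module.incoming.put(event)
    await module.incoming.put(None)
    await task
    results = []
    while not module.outgoing.empty():
        results.append(module.outgoing.get_nowait())
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real scan, the outgoing events would be picked up by the scan and redistributed to every module that watches their type, rather than collected into a list.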
"},{"location":"dev/basemodule/","title":"BaseModule","text":""},{"location":"dev/basemodule/#bbot.modules.base.BaseModule","title":"BaseModule","text":"The base class for all BBOT modules.
Attributes:
watched_events
(List
) \u2013 Event types to watch.
produced_events
(List
) \u2013 Event types to produce.
meta
(Dict
) \u2013 Metadata about the module, such as whether authentication is required and a description.
flags
(List
) \u2013 Flags indicating the type of module (must have at least \"safe\" or \"aggressive\" and \"passive\" or \"active\").
deps_modules
(List
) \u2013 Other BBOT modules this module depends on. Empty list by default.
deps_pip
(List
) \u2013 Python dependencies to install via pip. Empty list by default.
deps_apt
(List
) \u2013 APT package dependencies to install. Empty list by default.
deps_shell
(List
) \u2013 Other dependencies installed via shell commands. Uses ansible.builtin.shell. Empty list by default.
deps_ansible
(List
) \u2013 Additional Ansible tasks for complex dependencies. Empty list by default.
accept_dupes
(bool
) \u2013 Whether to accept incoming duplicate events. Default is False.
suppress_dupes
(bool
) \u2013 Whether to suppress outgoing duplicate events. Default is True.
per_host_only
(bool
) \u2013 Limit the module to only scanning once per host. Default is False.
per_hostport_only
(bool
) \u2013 Limit the module to only scanning once per host:port. Default is False.
per_domain_only
(bool
) \u2013 Limit the module to only scanning once per domain. Default is False.
scope_distance_modifier
((int, None)
) \u2013 Modifies scope distance acceptance for events. Default is 0.
None == accept all events\n2 == accept events up to and including the scan's configured search distance plus two\n1 == accept events up to and including the scan's configured search distance plus one\n0 == (DEFAULT) accept events up to and including the scan's configured search distance\n
target_only
(bool
) \u2013 Accept only the initial target event(s). Default is False.
in_scope_only
(bool
) \u2013 Accept only explicitly in-scope events. Default is False.
options
(Dict
) \u2013 Customizable options for the module, e.g., {\"api_key\": \"\"}. Empty dict by default.
options_desc
(Dict
) \u2013 Descriptions for options, e.g., {\"api_key\": \"API Key\"}. Empty dict by default.
module_threads
(int
) \u2013 Maximum concurrent instances of handle_event() or handle_batch(). Default is 1.
batch_size
(int
) \u2013 Size of batches processed by handle_batch(). Default is 1.
batch_wait
(int
) \u2013 Seconds to wait before force-submitting a batch. Default is 10.
failed_request_abort_threshold
(int
) \u2013 Threshold for setting error state after failed HTTP requests (only takes effect when request_with_fail_count()
is used). Default is 5.
_preserve_graph
(bool
) \u2013 When set to True, accept events that may be duplicates but are necessary for construction of complete graph. Typically only enabled for output modules that need to maintain full chains of events, e.g. neo4j
and json
. Default is False.
_stats_exclude
(bool
) \u2013 Whether to exclude this module from scan statistics. Default is False.
_qsize
(int
) \u2013 Outgoing queue size (0 for infinite). Default is 1000.
_priority
(int
) \u2013 Priority level of events raised by this module, 1-5. Default is 3.
_name
(str
) \u2013 Module name, overridden automatically. Default is 'base'.
_type
(str
) \u2013 Module type, for differentiating between normal and output modules. Default is 'scan'.
bbot/modules/base.py
class BaseModule:\n \"\"\"The base class for all BBOT modules.\n\n Attributes:\n watched_events (List): Event types to watch.\n\n produced_events (List): Event types to produce.\n\n meta (Dict): Metadata about the module, such as whether authentication is required and a description.\n\n flags (List): Flags indicating the type of module (must have at least \"safe\" or \"aggressive\" and \"passive\" or \"active\").\n\n deps_modules (List): Other BBOT modules this module depends on. Empty list by default.\n\n deps_pip (List): Python dependencies to install via pip. Empty list by default.\n\n deps_apt (List): APT package dependencies to install. Empty list by default.\n\n deps_shell (List): Other dependencies installed via shell commands. Uses [ansible.builtin.shell](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/shell_module.html). Empty list by default.\n\n deps_ansible (List): Additional Ansible tasks for complex dependencies. Empty list by default.\n\n accept_dupes (bool): Whether to accept incoming duplicate events. Default is False.\n\n suppress_dupes (bool): Whether to suppress outgoing duplicate events. Default is True.\n\n per_host_only (bool): Limit the module to only scanning once per host. Default is False.\n\n per_hostport_only (bool): Limit the module to only scanning once per host:port. Default is False.\n\n per_domain_only (bool): Limit the module to only scanning once per domain. Default is False.\n\n scope_distance_modifier (int, None): Modifies scope distance acceptance for events. Default is 0.\n ```\n None == accept all events\n 2 == accept events up to and including the scan's configured search distance plus two\n 1 == accept events up to and including the scan's configured search distance plus one\n 0 == (DEFAULT) accept events up to and including the scan's configured search distance\n ```\n\n target_only (bool): Accept only the initial target event(s). 
Default is False.\n\n in_scope_only (bool): Accept only explicitly in-scope events. Default is False.\n\n options (Dict): Customizable options for the module, e.g., {\"api_key\": \"\"}. Empty dict by default.\n\n options_desc (Dict): Descriptions for options, e.g., {\"api_key\": \"API Key\"}. Empty dict by default.\n\n module_threads (int): Maximum concurrent instances of handle_event() or handle_batch(). Default is 1.\n\n batch_size (int): Size of batches processed by handle_batch(). Default is 1.\n\n batch_wait (int): Seconds to wait before force-submitting a batch. Default is 10.\n\n failed_request_abort_threshold (int): Threshold for setting error state after failed HTTP requests (only takes effect when `request_with_fail_count()` is used). Default is 5.\n\n _preserve_graph (bool): When set to True, accept events that may be duplicates but are necessary for construction of complete graph. Typically only enabled for output modules that need to maintain full chains of events, e.g. `neo4j` and `json`. Default is False.\n\n _stats_exclude (bool): Whether to exclude this module from scan statistics. Default is False.\n\n _qsize (int): Outgoing queue size (0 for infinite). Default is 1000.\n\n _priority (int): Priority level of events raised by this module, 1-5. Default is 3.\n\n _name (str): Module name, overridden automatically. Default is 'base'.\n\n _type (str): Module type, for differentiating between normal and output modules. 
Default is 'scan'.\n \"\"\"\n\n watched_events = []\n produced_events = []\n meta = {\"auth_required\": False, \"description\": \"Base module\"}\n flags = []\n options = {}\n options_desc = {}\n\n deps_modules = []\n deps_pip = []\n deps_apt = []\n deps_shell = []\n deps_ansible = []\n\n accept_dupes = False\n suppress_dupes = True\n per_host_only = False\n per_hostport_only = False\n per_domain_only = False\n scope_distance_modifier = 0\n target_only = False\n in_scope_only = False\n\n _module_threads = 1\n _batch_size = 1\n batch_wait = 10\n failed_request_abort_threshold = 5\n\n default_discovery_context = \"{module} discovered {event.type}: {event.data}\"\n\n _preserve_graph = False\n _stats_exclude = False\n _qsize = 1000\n _priority = 3\n _name = \"base\"\n _type = \"scan\"\n _intercept = False\n _shuffle_incoming_queue = True\n\n def __init__(self, scan):\n \"\"\"Initializes a module instance.\n\n Args:\n scan: The BBOT scan object associated with this module instance.\n\n Attributes:\n scan: The scan object associated with this module.\n\n errored (bool): Whether the module has errored out. 
Default is False.\n \"\"\"\n self.scan = scan\n self.errored = False\n self._log = None\n self._incoming_event_queue = None\n self._outgoing_event_queue = None\n # track incoming events to prevent unwanted duplicates\n self._incoming_dup_tracker = set()\n # tracks which subprocesses are running under this module\n self._proc_tracker = set()\n # seconds since we've submitted a batch\n self._last_submitted_batch = None\n # additional callbacks to be executed alongside self.cleanup()\n self.cleanup_callbacks = []\n self._cleanedup = False\n self._watched_events = None\n\n self._task_counter = TaskCounter()\n\n # string constant\n self._custom_filter_criteria_msg = \"it did not meet custom filter criteria\"\n\n # track number of failures (for .request_with_fail_count())\n self._request_failures = 0\n\n self._tasks = []\n self._event_received = asyncio.Condition()\n self._event_queued = asyncio.Condition()\n\n # used for optional \"per host\" tracking\n self._per_host_tracker = set()\n\n async def setup(self):\n \"\"\"\n Performs one-time setup tasks for the module.\n\n This method is responsible for preparing the module for its operation, which may include tasks\n such as downloading necessary resources, validating configuration parameters, or other preliminary\n checks.\n\n Returns:\n tuple:\n - bool or None: A status indicating the outcome of the setup process. 
Returns `True` if\n the setup was successful, `None` for a soft-fail where the module setup did not succeed\n but the scan will continue with the module disabled, and `False` for a hard-fail where\n the setup failure causes the scan to abort.\n - str, optional: A reason for the setup failure, provided only when the setup does not\n succeed (i.e., returns `None` or `False`).\n\n Examples:\n >>> async def setup(self):\n >>> if not self.config.get(\"api_key\"):\n >>> # Soft-fail: Configuration missing an API key\n >>> return None, \"No API key specified\"\n\n >>> async def setup(self):\n >>> try:\n >>> wordlist = await self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n >>> except WordlistError as e:\n >>> # Hard-fail: Error retrieving wordlist\n >>> return False, f\"Error retrieving wordlist: {e}\"\n\n >>> async def setup(self):\n >>> self.timeout = self.config.get(\"timeout\", 5)\n >>> # Success: Setup completed without issues\n >>> return True\n \"\"\"\n\n return True\n\n async def handle_event(self, event):\n \"\"\"Asynchronously handles incoming events that the module is configured to watch.\n\n This method is automatically invoked when an event that matches any in `watched_events` is encountered during a scan. Override this method to implement custom event-handling logic for your module.\n\n Args:\n event (Event): The event object containing details about the incoming event.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to 1.\n\n Returns:\n None\n \"\"\"\n pass\n\n async def handle_batch(self, *events):\n \"\"\"Handles incoming events in batches for optimized processing.\n\n This method is automatically called when multiple events that match any in `watched_events` are encountered and the `batch_size` attribute is set to a value greater than 1. 
Override this method to implement custom batch event-handling logic for your module.\n\n Args:\n *events (Event): A variable number of Event objects to be processed in a batch.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to a value greater than 1.\n\n Returns:\n None\n \"\"\"\n pass\n\n async def filter_event(self, event):\n \"\"\"Asynchronously filters incoming events based on custom criteria.\n\n Override this method for more granular control over which events are accepted by your module. This method is called automatically before `handle_event()` for each incoming event that matches any in `watched_events`.\n\n Args:\n event (Event): The incoming Event object to be filtered.\n\n Returns:\n tuple: A 2-tuple where the first value is a bool indicating whether the event should be accepted, and the second value is a string explaining the reason for its acceptance or rejection. By default, returns `(True, None)` to indicate acceptance without reason.\n\n Note:\n This method should be overridden if the module requires custom logic for event filtering.\n \"\"\"\n return True\n\n async def finish(self):\n \"\"\"Asynchronously performs final tasks as the scan nears completion.\n\n This method can be overridden to execute any necessary finalization logic. For example, if the module relies on a word cloud, you might wait for the scan to finish to ensure the word cloud is most complete before running an operation.\n\n Returns:\n None\n\n Warnings:\n This method may be called multiple times since it can raise events, which may re-trigger the \"finish\" phase of the scan. 
Optional to override.\n \"\"\"\n return\n\n async def report(self):\n \"\"\"Asynchronously executes a final task after the scan is complete but before cleanup.\n\n This method can be overridden to aggregate data and raise summary events at the end of the scan.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan.\n \"\"\"\n return\n\n async def cleanup(self):\n \"\"\"Asynchronously performs final cleanup operations after the scan is complete.\n\n This method can be overridden to implement custom cleanup logic. It is called only once per scan and may not raise events.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan and may not raise events.\n \"\"\"\n return\n\n async def require_api_key(self):\n \"\"\"\n Asynchronously checks if an API key is required and valid.\n\n Args:\n None\n\n Returns:\n bool or tuple: Returns True if API key is valid and ready.\n Returns a tuple (None, \"error message\") otherwise.\n\n Notes:\n - Fetches the API key from the configuration.\n - Calls the 'ping()' method to test API accessibility.\n - Sets the API key readiness status accordingly.\n \"\"\"\n self.api_key = self.config.get(\"api_key\", \"\")\n if self.auth_secret:\n try:\n await self.ping()\n self.hugesuccess(f\"API is ready\")\n return True\n except Exception as e:\n return None, f\"Error with API ({str(e).strip()})\"\n else:\n return None, \"No API key set\"\n\n async def ping(self):\n \"\"\"Asynchronously checks the health of the configured API.\n\n This method is used in conjunction with require_api_key() to verify that the API is not just configured, but also responsive. 
This method should include an assert statement to validate the API's health, typically by making a test request to a known endpoint.\n\n Example Usage:\n In your implementation, if the API has a \"/ping\" endpoint:\n async def ping(self):\n r = await self.request_with_fail_count(f\"{self.base_url}/ping\")\n resp_content = getattr(r, \"text\", \"\")\n assert getattr(r, \"status_code\", 0) == 200, resp_content\n\n Returns:\n None\n\n Raises:\n AssertionError: If the API does not respond as expected.\n \"\"\"\n return\n\n @property\n def batch_size(self):\n batch_size = self.config.get(\"batch_size\", None)\n # only allow overriding the batch size if its default value is greater than 1\n # this prevents modules from being accidentally neutered by an incorrect batch_size setting\n if batch_size is None or self._batch_size == 1:\n batch_size = self._batch_size\n return batch_size\n\n @property\n def module_threads(self):\n module_threads = self.config.get(\"module_threads\", None)\n if module_threads is None:\n module_threads = self._module_threads\n return module_threads\n\n @property\n def auth_secret(self):\n \"\"\"Indicates if the module is properly configured for authentication.\n\n This read-only property should be used to check whether all necessary attributes (e.g., API keys, tokens, etc.) are configured to perform authenticated requests in the module. 
Commonly used in setup or initialization steps.\n\n Returns:\n bool: True if the module is properly configured for authentication, otherwise False.\n \"\"\"\n return getattr(self, \"api_key\", \"\")\n\n def get_watched_events(self):\n \"\"\"Retrieve the set of events that the module is interested in observing.\n\n Override this method if the set of events the module should watch needs to be determined dynamically, e.g., based on configuration options or other runtime conditions.\n\n Returns:\n set: The set of event types that this module will handle.\n \"\"\"\n if self._watched_events is None:\n self._watched_events = set(self.watched_events)\n return self._watched_events\n\n async def _handle_batch(self):\n \"\"\"\n Asynchronously handles a batch of events in the module.\n\n Args:\n None\n\n Returns:\n bool: True if events were submitted for processing, False otherwise.\n\n Notes:\n - The method is wrapped in a task counter to monitor asynchronous operations.\n - Checks if there are any events in the incoming queue and module is not in an error state.\n - Invokes '_events_waiting()' to fetch a batch of events.\n - Calls the module's 'handle_batch()' method to process these events.\n - If a \"FINISHED\" event is found, invokes 'finish()' method of the module.\n \"\"\"\n finish = False\n async with self._task_counter.count(f\"{self.name}.handle_batch()\") as counter:\n submitted = False\n if self.batch_size <= 1:\n return\n if self.num_incoming_events > 0:\n events, finish = await self._events_waiting()\n if events and not self.errored:\n counter.n = len(events)\n self.verbose(f\"Handling batch of {len(events):,} events\")\n submitted = True\n async with self.scan._acatch(f\"{self.name}.handle_batch()\"):\n await self.handle_batch(*events)\n self.verbose(f\"Finished handling batch of {len(events):,} events\")\n if finish:\n context = f\"{self.name}.finish()\"\n async with self.scan._acatch(context), self._task_counter.count(context):\n await self.finish()\n return 
submitted\n\n def make_event(self, *args, **kwargs):\n \"\"\"Create an event for the scan.\n\n Raises a validation error if the event could not be created, unless raise_error is set to False.\n\n Args:\n *args: Positional arguments to be passed to the scan's make_event method.\n **kwargs: Keyword arguments to be passed to the scan's make_event method.\n raise_error (bool, optional): Whether to raise a validation error if the event could not be created. Defaults to False.\n\n Examples:\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n Event or None: The created event, or None if a validation error occurred and raise_error was False.\n\n Raises:\n ValidationError: If the event could not be validated and raise_error is True.\n \"\"\"\n raise_error = kwargs.pop(\"raise_error\", False)\n module = kwargs.pop(\"module\", None)\n if module is None:\n if (not args) or getattr(args[0], \"module\", None) is None:\n kwargs[\"module\"] = self\n try:\n event = self.scan.make_event(*args, **kwargs)\n except ValidationError as e:\n if raise_error:\n raise\n self.warning(f\"{e}\")\n return\n return event\n\n async def emit_event(self, *args, **kwargs):\n \"\"\"Emit an event to the event queue and distribute it to interested modules.\n\n This is how modules \"return\" data.\n\n The method first creates an event object by calling `self.make_event()` with the provided arguments.\n Then, the event is queued for outgoing distribution using `self.queue_outgoing_event()`.\n\n Args:\n *args: Positional arguments to be passed to `self.make_event()` for event creation.\n **kwargs: Keyword arguments to be passed for event creation or configuration of the emit action.\n ```markdown\n - on_success_callback: Optional callback function to execute upon successful event emission.\n - abort_if: Optional condition under which the event emission should be aborted.\n - quick: Optional flag to indicate whether the event should be processed 
quickly.\n ```\n\n Examples:\n >>> await self.emit_event(\"www.evilcorp.com\", parent=event, tags=[\"affiliate\"])\n\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n Event or None: The emitted event, or None if the event could not be created.\n\n Raises:\n ValidationError: If the event cannot be validated (handled in `self.make_event()`).\n \"\"\"\n event_kwargs = dict(kwargs)\n emit_kwargs = {}\n for o in (\"on_success_callback\", \"abort_if\", \"quick\"):\n v = event_kwargs.pop(o, None)\n if v is not None:\n emit_kwargs[o] = v\n event = self.make_event(*args, **event_kwargs)\n if event:\n await self.queue_outgoing_event(event, **emit_kwargs)\n return event\n\n async def _events_waiting(self, batch_size=None):\n \"\"\"\n Asynchronously fetches events from the incoming_event_queue, up to a specified batch size.\n\n Args:\n batch_size (int, optional): The batch size limit. Defaults to the module's `batch_size`. A value of -1 disables the limit.\n\n Returns:\n tuple: A tuple containing two elements:\n - events (list): A list of acceptable events from the queue.\n - finish (bool): A flag indicating if a \"FINISHED\" event is encountered.\n\n Notes:\n - The method pulls events from incoming_event_queue using 'get_nowait()'.\n - Events go through '_event_postcheck()' for validation.\n - \"FINISHED\" events are handled differently and the finish flag is set to True.\n - If the queue is empty or the batch size is reached, the loop breaks.\n \"\"\"\n if batch_size is None:\n batch_size = self.batch_size\n events = []\n finish = False\n while self.incoming_event_queue:\n if batch_size != -1 and len(events) > batch_size:\n break\n try:\n event = self.incoming_event_queue.get_nowait()\n self.debug(f\"Got {event} from {getattr(event, 'module', 'unknown_module')}\")\n acceptable, reason = await self._event_postcheck(event)\n if acceptable:\n if event.type == \"FINISHED\":\n finish = True\n else:\n events.append(event)\n self.scan.stats.event_consumed(event, self)\n elif reason:\n self.debug(f\"Not accepting {event} because {reason}\")\n except asyncio.queues.QueueEmpty:\n break\n return events, 
finish\n\n @property\n def num_incoming_events(self):\n ret = 0\n if self.incoming_event_queue is not False:\n ret = self.incoming_event_queue.qsize()\n return ret\n\n def start(self):\n self._tasks = [\n asyncio.create_task(self._worker(), name=f\"{self.scan.name}.{self.name}._worker()\")\n for _ in range(self.module_threads)\n ]\n\n async def _setup(self):\n \"\"\"\n Asynchronously sets up the module by invoking its 'setup()' method.\n\n This method catches exceptions during setup, sets the module's error state if necessary, and determines the\n status code based on the result of the setup process.\n\n Args:\n None\n\n Returns:\n tuple: A tuple containing the module's name, status (True for success, False for hard-fail, None for soft-fail),\n and an optional status message.\n\n Raises:\n Exception: Captured exceptions from the 'setup()' method are logged, but not propagated.\n\n Notes:\n - The 'setup()' method can return either a simple boolean status or a tuple of status and message.\n - A WordlistError exception triggers a soft-fail status.\n - The debug log will contain setup status information for the module.\n \"\"\"\n status_codes = {False: \"hard-fail\", None: \"soft-fail\", True: \"success\"}\n\n status = False\n self.debug(f\"Setting up module {self.name}\")\n try:\n result = await self.setup()\n if type(result) == tuple and len(result) == 2:\n status, msg = result\n else:\n status = result\n msg = status_codes[status]\n self.debug(f\"Finished setting up module {self.name}\")\n except Exception as e:\n self.set_error_state(f\"Unexpected error during module setup: {e}\", critical=True)\n msg = f\"{e}\"\n self.trace()\n return self, status, str(msg)\n\n async def _worker(self):\n \"\"\"\n The core worker loop for the module, responsible for handling events from the incoming event queue.\n\n This method is a coroutine and is run asynchronously. Multiple instances can run simultaneously based on\n the 'module_threads' configuration. 
The worker dequeues events from 'incoming_event_queue', performs\n necessary prechecks, and passes the event to the appropriate handler function.\n\n Args:\n None\n\n Returns:\n None\n\n Raises:\n asyncio.CancelledError: If the worker is cancelled during its operation.\n\n Notes:\n - The worker is sensitive to the 'stopping' flag of the scan. It will terminate if this flag is set.\n - The worker handles backpressure by pausing when the outgoing event queue is full.\n - Batch processing is supported and is activated when 'batch_size' > 1.\n - Each event is subject to a post-check via '_event_postcheck()' to decide whether it should be handled.\n - Special 'FINISHED' events trigger the 'finish()' method of the module.\n \"\"\"\n async with self.scan._acatch(context=self._worker, unhandled_is_critical=True):\n try:\n while not self.scan.stopping and not self.errored:\n # hold the reins if our outgoing queue is full\n if self._qsize > 0 and self.outgoing_event_queue.qsize() >= self._qsize:\n await asyncio.sleep(0.1)\n continue\n\n if self.batch_size > 1:\n submitted = await self._handle_batch()\n if not submitted:\n async with self._event_received:\n await self._event_received.wait()\n\n else:\n try:\n if self.incoming_event_queue is not False:\n event = await self.incoming_event_queue.get()\n else:\n self.debug(f\"Event queue is in bad state\")\n break\n except asyncio.queues.QueueEmpty:\n continue\n self.debug(f\"Got {event} from {getattr(event, 'module', 'unknown_module')}\")\n async with self._task_counter.count(f\"event_postcheck({event})\"):\n acceptable, reason = await self._event_postcheck(event)\n if acceptable:\n if event.type == \"FINISHED\":\n context = f\"{self.name}.finish()\"\n async with self.scan._acatch(context), self._task_counter.count(context):\n await self.finish()\n else:\n context = f\"{self.name}.handle_event({event})\"\n self.scan.stats.event_consumed(event, self)\n self.debug(f\"Handling {event}\")\n async with self.scan._acatch(context), 
self._task_counter.count(context):\n await self.handle_event(event)\n self.debug(f\"Finished handling {event}\")\n else:\n self.debug(f\"Not accepting {event} because {reason}\")\n except asyncio.CancelledError:\n # this trace was used for debugging leaked CancelledErrors from inside httpx\n # self.log.trace(\"Worker cancelled\")\n raise\n except BaseException as e:\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt,)):\n self.scan.stop()\n else:\n self.error(f\"Critical failure in module {self.name}: {e}\")\n self.error(traceback.format_exc())\n self.log.trace(f\"Worker stopped\")\n\n @property\n def max_scope_distance(self):\n if self.in_scope_only or self.target_only:\n return 0\n if self.scope_distance_modifier is None:\n return 999\n return max(0, self.scan.scope_search_distance + self.scope_distance_modifier)\n\n def _event_precheck(self, event):\n \"\"\"\n Pre-checks an event to determine if it should be accepted by the module for queuing.\n\n This method is called when an event is about to be enqueued into the module's incoming event queue.\n It applies various filters such as special signal event types, module error state, watched event types, and more\n to decide whether or not the event should be enqueued.\n\n Args:\n event (Event): The event object to check.\n\n Returns:\n tuple: A tuple (bool, str) where the bool indicates if the event should be accepted, and the str gives the reason.\n\n Examples:\n >>> result, reason = self._event_precheck(event)\n >>> if result:\n ... self.incoming_event_queue.put_nowait(event)\n ... else:\n ... 
self.debug(f\"Not accepting {event} because {reason}\")\n\n Notes:\n - The method considers special signal event types like \"FINISHED\".\n - Checks whether the module is in an error state.\n - Checks if the event type matches the types this module is interested in (`watched_events`).\n - Checks for events tagged as 'target' if the module has `target_only` flag set.\n - Applies specific filtering based on event type and module name.\n \"\"\"\n\n # special signal event types\n if event.type in (\"FINISHED\",):\n return True, \"its type is FINISHED\"\n if self.errored:\n return False, f\"module is in error state\"\n # exclude non-watched types\n if not any(t in self.get_watched_events() for t in (\"*\", event.type)):\n return False, \"its type is not in watched_events\"\n if self.target_only:\n if \"target\" not in event.tags:\n return False, \"it did not meet target_only filter criteria\"\n\n # exclude certain URLs (e.g. javascript):\n # TODO: revisit this after httpx rework\n if event.type.startswith(\"URL\") and self.name != \"httpx\" and \"httpx-only\" in event.tags:\n return False, \"its extension was listed in url_extension_httpx_only\"\n\n return True, \"precheck succeeded\"\n\n async def _event_postcheck(self, event):\n \"\"\"\n A simple wrapper for dup tracking\n \"\"\"\n # special exception for \"FINISHED\" event\n if event.type in (\"FINISHED\",):\n return True, \"\"\n acceptable, reason = await self._event_postcheck_inner(event)\n if acceptable:\n # check duplicates\n is_incoming_duplicate, reason = self.is_incoming_duplicate(event, add=True)\n if is_incoming_duplicate and not self.accept_dupes:\n return False, f\"module has already seen it\" + (f\" ({reason})\" if reason else \"\")\n\n return acceptable, reason\n\n async def _event_postcheck_inner(self, event):\n \"\"\"\n Post-checks an event to determine if it should be accepted by the module for handling.\n\n This method is called when an event is dequeued from the module's incoming event queue, right 
before it is actually processed.\n It applies various filters such as scope, custom filtering logic, and per-host tracking to decide the event's fate.\n\n Args:\n event (Event): The event object to check.\n\n Returns:\n tuple: A tuple (bool, str) where the bool indicates if the event should be accepted, and the str gives the reason.\n\n Notes:\n - Override the `filter_event` method for custom filtering logic.\n - This method also maintains host-based tracking when the `per_host_only` or similar flags are set.\n - The method will also update event production stats for output modules.\n \"\"\"\n # force-output certain events to the graph\n if self._is_graph_important(event):\n return True, \"event is critical to the graph\"\n\n # check scope distance\n filter_result, reason = self._scope_distance_check(event)\n if not filter_result:\n return filter_result, reason\n\n # custom filtering\n async with self.scan._acatch(context=self.filter_event):\n try:\n filter_result = await self.filter_event(event)\n except Exception as e:\n msg = f\"Unhandled exception in {self.name}.filter_event({event}): {e}\"\n self.error(msg)\n return False, msg\n msg = str(self._custom_filter_criteria_msg)\n with suppress(ValueError, TypeError):\n filter_result, reason = filter_result\n msg += f\": {reason}\"\n if not filter_result:\n return False, msg\n\n self.debug(f\"{event} passed post-check\")\n return True, \"\"\n\n def _scope_distance_check(self, event):\n if self.in_scope_only:\n if event.scope_distance > 0:\n return False, \"it did not meet in_scope_only filter criteria\"\n if self.scope_distance_modifier is not None:\n if event.scope_distance < 0:\n return False, f\"its scope_distance ({event.scope_distance}) is invalid.\"\n elif event.scope_distance > self.max_scope_distance:\n return (\n False,\n f\"its scope_distance ({event.scope_distance}) exceeds the maximum allowed by the scan ({self.scan.scope_search_distance}) + the module ({self.scope_distance_modifier}) == 
{self.max_scope_distance}\",\n )\n return True, \"\"\n\n async def _cleanup(self):\n if not self._cleanedup:\n self._cleanedup = True\n for callback in [self.cleanup] + self.cleanup_callbacks:\n context = f\"{self.name}.cleanup()\"\n if callable(callback):\n async with self.scan._acatch(context), self._task_counter.count(context):\n await self.helpers.execute_sync_or_async(callback)\n\n async def queue_event(self, event):\n \"\"\"\n Asynchronously queues an incoming event to the module's event queue for further processing.\n\n The function performs an initial check to see if the event is acceptable for queuing.\n If the event passes the check, it is put into the `incoming_event_queue`.\n\n Args:\n event: The event object to be queued.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `incoming_event_queue`.\n\n Examples:\n >>> await self.queue_event(some_event)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue incoming events.\n \"\"\"\n async with self._task_counter.count(\"queue_event()\", _log=False):\n if self.incoming_event_queue is False:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n return\n acceptable, reason = self._event_precheck(event)\n if not acceptable:\n if reason and reason != \"its type is not in watched_events\":\n self.debug(f\"Not queueing {event} because {reason}\")\n return\n else:\n self.debug(f\"Queueing {event} because {reason}\")\n try:\n self.incoming_event_queue.put_nowait(event)\n async with self._event_received:\n self._event_received.notify()\n if event.type != \"FINISHED\":\n self.scan._new_activity = True\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n\n async def queue_outgoing_event(self, event, **kwargs):\n \"\"\"\n Queues an outgoing event to the module's outgoing event queue for further processing.\n\n The function attempts to put the event into the `outgoing_event_queue` 
immediately.\n If that is not possible due to the current state of the module, the resulting AttributeError is caught and a debug message is logged.\n\n Args:\n event: The event object to be queued.\n **kwargs: Additional keyword arguments to be associated with the event.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `outgoing_event_queue`.\n\n Examples:\n >>> await self.queue_outgoing_event(some_outgoing_event, abort_if=lambda e: \"unresolved\" in e.tags)\n\n Raises:\n None: An AttributeError caused by an unusable outgoing queue is caught and logged at debug level.\n \"\"\"\n try:\n await self.outgoing_event_queue.put((event, kwargs))\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue outgoing event\")\n\n def set_error_state(self, message=None, clear_outgoing_queue=False, critical=False):\n \"\"\"\n Puts the module into an errored state where it cannot accept new events. Optionally logs a warning message.\n\n The function sets the module's `errored` attribute to True and logs a warning with the optional message.\n It also clears the incoming event queue to prevent further processing and updates its status to False.\n\n Args:\n message (str, optional): Additional message to be logged along with the warning.\n clear_outgoing_queue (bool, optional): Whether to also empty the outgoing event queue. Defaults to False.\n critical (bool, optional): Whether to log the message as an error instead of a warning. Defaults to False.\n\n Returns:\n None: The function doesn't return anything but updates the `errored` state and clears the incoming event queue.\n\n Examples:\n >>> self.set_error_state()\n >>> self.set_error_state(\"Failed to connect to the server\")\n\n Notes:\n - The function sets `self._incoming_event_queue` to False to prevent its further use.\n - If the module was already in an errored state, the function will not reset the error state or the queue.\n \"\"\"\n if not self.errored:\n log_msg = \"Setting error state\"\n if message is not None:\n log_msg += f\": {message}\"\n if critical:\n log_fn = self.error\n else:\n log_fn = self.warning\n log_fn(log_msg)\n self.errored = True\n # clear incoming queue\n if self.incoming_event_queue is 
not False:\n self.debug(f\"Emptying event_queue\")\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.incoming_event_queue.get_nowait()\n # set queue to False to prevent its use\n # if there are leftover objects in the queue, the scan will hang.\n self._incoming_event_queue = False\n\n if clear_outgoing_queue:\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.outgoing_event_queue.get_nowait()\n\n def is_incoming_duplicate(self, event, add=False):\n if event.type in (\"FINISHED\",):\n return False, \"\"\n reason = \"\"\n try:\n event_hash = self._incoming_dedup_hash(event)\n except Exception as e:\n msg = f\"Unhandled exception in {self.name}._incoming_dedup_hash({event}): {e}\"\n self.error(msg)\n return True, msg\n with suppress(TypeError, ValueError):\n event_hash, reason = event_hash\n is_dup = event_hash in self._incoming_dup_tracker\n if add:\n self._incoming_dup_tracker.add(event_hash)\n return is_dup, reason\n\n def _incoming_dedup_hash(self, event):\n \"\"\"\n Determines the criteria for what is considered to be a duplicate event if `accept_dupes` is False.\n \"\"\"\n if self.per_host_only:\n return self.get_per_host_hash(event), \"per_host_only=True\"\n if self.per_hostport_only:\n return self.get_per_hostport_hash(event), \"per_hostport_only=True\"\n elif self.per_domain_only:\n return self.get_per_domain_hash(event), \"per_domain_only=True\"\n return hash(event), \"\"\n\n def _outgoing_dedup_hash(self, event):\n \"\"\"\n Determines the criteria for what is considered to be a duplicate event if `suppress_dupes` is True.\n \"\"\"\n return hash((event, self.name))\n\n def get_per_host_hash(self, event):\n \"\"\"\n Computes a per-host hash value for a given event. 
This method may be optionally overridden in subclasses.\n\n The function uses the event's `host` to create a string to be hashed.\n\n Args:\n event (Event): The event object containing host information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_host_hash(event)\n \"\"\"\n return hash(event.host)\n\n def get_per_hostport_hash(self, event):\n \"\"\"\n Computes a per-host:port hash value for a given event. This method may be optionally overridden in subclasses.\n\n The function uses the event's `host`, `port`, and `scheme` (for URLs) to create a string to be hashed.\n The hash value is used for distinguishing events related to the same host.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_hostport_hash(event)\n \"\"\"\n parsed = getattr(event, \"parsed_url\", None)\n if parsed is None:\n to_hash = self.helpers.make_netloc(event.host, event.port)\n else:\n to_hash = f\"{parsed.scheme}://{parsed.netloc}/\"\n return hash(to_hash)\n\n def get_per_domain_hash(self, event):\n \"\"\"\n Computes a per-domain hash value for a given event. 
This method may be optionally overridden in subclasses.\n\n Events with the same root domain will receive the same hash value.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the domain.\n\n Examples:\n >>> event = self.make_event(\"https://www.example.com:8443\")\n >>> self.get_per_domain_hash(event)\n \"\"\"\n _, domain = self.helpers.split_domain(event.host)\n return hash(domain)\n\n @property\n def name(self):\n return str(self._name)\n\n @property\n def helpers(self):\n return self.scan.helpers\n\n @property\n def status(self):\n \"\"\"\n Provides the current status of the module as a dictionary.\n\n The dictionary contains the following keys:\n - 'events': A sub-dictionary with 'incoming' and 'outgoing' keys, representing the number of events in the respective queues.\n - 'tasks': The current value of the task counter.\n - 'errored': A boolean value indicating if the module is in an error state.\n - 'running': A boolean value indicating if the module is currently processing data.\n\n Returns:\n dict: A dictionary containing the current status of the module.\n\n Examples:\n >>> self.status\n {'events': {'incoming': 5, 'outgoing': 2}, 'tasks': 3, 'errored': False, 'running': True}\n \"\"\"\n status = {\n \"events\": {\"incoming\": self.num_incoming_events, \"outgoing\": self.outgoing_event_queue.qsize()},\n \"tasks\": self._task_counter.value,\n \"errored\": self.errored,\n }\n status[\"running\"] = self.running\n return status\n\n @property\n def running(self):\n \"\"\"Property indicating whether the module is currently processing data.\n\n This property checks if the task counter (`self._task_counter.value`) is greater than zero,\n indicating that there are ongoing tasks in the module.\n\n Returns:\n bool: True if the module is currently processing data, False otherwise.\n \"\"\"\n return self._task_counter.value > 0\n\n @property\n def finished(self):\n 
\"\"\"Property indicating whether the module has finished processing.\n\n This property checks three conditions to determine if the module is finished:\n 1. The module is not currently running (`self.running` is False).\n 2. The number of incoming events in the queue is zero or less (`self.num_incoming_events <= 0`).\n 3. The number of outgoing events in the queue is zero or less (`self.outgoing_event_queue.qsize() <= 0`).\n\n Returns:\n bool: True if the module has finished processing, False otherwise.\n \"\"\"\n return not self.running and self.num_incoming_events <= 0 and self.outgoing_event_queue.qsize() <= 0\n\n async def run_process(self, *args, **kwargs):\n kwargs[\"_proc_tracker\"] = self._proc_tracker\n return await self.helpers.run(*args, **kwargs)\n\n async def run_process_live(self, *args, **kwargs):\n kwargs[\"_proc_tracker\"] = self._proc_tracker\n async for line in self.helpers.run_live(*args, **kwargs):\n yield line\n\n async def request_with_fail_count(self, *args, **kwargs):\n \"\"\"Asynchronously perform an HTTP request while keeping track of consecutive failures.\n\n This function wraps the `self.helpers.request` method, incrementing a failure counter if\n the request returns None. 
When the failure counter exceeds `self.failed_request_abort_threshold`,\n the module is set to an error state.\n\n Args:\n *args: Positional arguments to pass to `self.helpers.request`.\n **kwargs: Keyword arguments to pass to `self.helpers.request`.\n\n Returns:\n Any: The response object or None if the request failed.\n\n Raises:\n None: Sets the module to an error state when the failure threshold is reached.\n \"\"\"\n r = await self.helpers.request(*args, **kwargs)\n if r is None:\n self._request_failures += 1\n else:\n self._request_failures = 0\n if self._request_failures >= self.failed_request_abort_threshold:\n self.set_error_state(f\"Setting error state due to {self._request_failures:,} failed HTTP requests\")\n return r\n\n @property\n def preset(self):\n return self.scan.preset\n\n @property\n def config(self):\n \"\"\"Property that provides easy access to the module's configuration in the scan's config.\n\n This property serves as a shortcut to retrieve the module-specific configuration from\n `self.scan.config`. 
If no configuration is found for this module, an empty dictionary is returned.\n\n Returns:\n dict: The configuration dictionary specific to this module.\n \"\"\"\n config = self.scan.config.get(\"modules\", {}).get(self.name, {})\n if config is None:\n config = {}\n return config\n\n @property\n def incoming_event_queue(self):\n if self._incoming_event_queue is None:\n if self._shuffle_incoming_queue:\n self._incoming_event_queue = ShuffleQueue()\n else:\n self._incoming_event_queue = asyncio.Queue()\n return self._incoming_event_queue\n\n @property\n def outgoing_event_queue(self):\n if self._outgoing_event_queue is None:\n self._outgoing_event_queue = ShuffleQueue(self._qsize)\n return self._outgoing_event_queue\n\n @property\n def priority(self):\n \"\"\"\n Gets the priority level of the module as an integer.\n\n The priority level is constrained to be between 1 and 5, inclusive.\n A lower value indicates a higher priority.\n\n Returns:\n int: The priority level of the module, constrained between 1 and 5.\n\n Examples:\n >>> self.priority\n 3\n \"\"\"\n return int(max(1, min(5, self._priority)))\n\n @property\n def auth_required(self):\n return self.meta.get(\"auth_required\", False)\n\n @property\n def http_timeout(self):\n \"\"\"\n Convenience shortcut to `http_timeout` in the config\n \"\"\"\n return self.scan.web_config.get(\"http_timeout\", 10)\n\n @property\n def log(self):\n if getattr(self, \"_log\", None) is None:\n self._log = logging.getLogger(f\"bbot.modules.{self.name}\")\n return self._log\n\n @property\n def memory_usage(self):\n \"\"\"Property that calculates the current memory usage of the module in bytes.\n\n This property uses the `get_size` function to estimate the memory consumption\n of the module object. The depth of the object graph traversal is limited to 3 levels\n to avoid performance issues. 
Commonly shared objects like `self.scan`, `self.helpers`,\n are excluded from the calculation to prevent double-counting.\n\n Returns:\n int: The estimated memory usage of the module in bytes.\n \"\"\"\n seen = {self.scan, self.helpers, self.log} # noqa\n return get_size(self, max_depth=3, seen=seen)\n\n def __str__(self):\n return self.name\n\n def log_table(self, *args, **kwargs):\n \"\"\"Logs a table to the console and optionally writes it to a file.\n\n This function generates a table using `self.helpers.make_table`, then logs each line\n of the table as an info-level log. If a table_name is provided, it also writes the table to a file.\n\n Args:\n *args: Variable length argument list to be passed to `self.helpers.make_table`.\n **kwargs: Arbitrary keyword arguments. If 'table_name' is specified, the table will be written to a file.\n\n Returns:\n str: The generated table as a string.\n\n Examples:\n >>> self.log_table(['Header1', 'Header2'], [['row1col1', 'row1col2'], ['row2col1', 'row2col2']], table_name=\"my_table\")\n \"\"\"\n table_name = kwargs.pop(\"table_name\", None)\n max_log_entries = kwargs.pop(\"max_log_entries\", None)\n table = self.helpers.make_table(*args, **kwargs)\n lines_logged = 0\n for line in table.splitlines():\n if max_log_entries is not None and lines_logged > max_log_entries:\n break\n self.info(line)\n lines_logged += 1\n if table_name is not None:\n date = self.helpers.make_date()\n filename = self.scan.home / f\"{self.helpers.tagify(table_name)}-table-{date}.txt\"\n with open(filename, \"w\") as f:\n f.write(table)\n self.verbose(f\"Wrote {table_name} to {filename}\")\n return table\n\n def _is_graph_important(self, event):\n return self.preserve_graph and getattr(event, \"_graph_important\", False) and not getattr(event, \"_omit\", False)\n\n @property\n def preserve_graph(self):\n preserve_graph = self.config.get(\"preserve_graph\", None)\n if preserve_graph is None:\n preserve_graph = self._preserve_graph\n return 
preserve_graph\n\n def debug(self, *args, trace=False, **kwargs):\n \"\"\"Logs debug messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.debug(\"This is a debug message\")\n >>> self.debug(\"This is a debug message with a trace\", trace=True)\n \"\"\"\n self.log.debug(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def verbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.verbose(\"This is a verbose message\")\n >>> self.verbose(\"This is a verbose message with a trace\", trace=True)\n \"\"\"\n self.log.verbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeverbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened white text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. 
Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeverbose(\"This is a huge verbose message\")\n >>> self.hugeverbose(\"This is a huge verbose message with a trace\", trace=True)\n \"\"\"\n self.log.hugeverbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def info(self, *args, trace=False, **kwargs):\n \"\"\"Logs informational messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.info(\"This is an informational message\")\n >>> self.info(\"This is an informational message with a trace\", trace=True)\n \"\"\"\n self.log.info(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeinfo(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened blue text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. 
Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeinfo(\"This is a huge informational message\")\n >>> self.hugeinfo(\"This is a huge informational message with a trace\", trace=True)\n \"\"\"\n self.log.hugeinfo(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def success(self, *args, trace=False, **kwargs):\n \"\"\"Logs a success message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.success(\"Operation completed successfully\")\n >>> self.success(\"Operation completed with a trace\", trace=True)\n \"\"\"\n self.log.success(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugesuccess(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened green text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugesuccess(\"This is a huge success message\")\n >>> self.hugesuccess(\"This is a huge success message with a trace\", trace=True)\n \"\"\"\n self.log.hugesuccess(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def warning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a warning message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. 
Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.warning(\"This is a warning message\")\n >>> self.warning(\"This is a warning message with a trace\", trace=False)\n \"\"\"\n self.log.warning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugewarning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened orange text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugewarning(\"This is a huge warning message\")\n >>> self.hugewarning(\"This is a huge warning message with a trace\", trace=False)\n \"\"\"\n self.log.hugewarning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def error(self, *args, trace=True, **kwargs):\n \"\"\"Logs an error message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.error(\"This is an error message\")\n >>> self.error(\"This is an error message with a trace\", trace=False)\n \"\"\"\n self.log.error(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def trace(self, msg=None):\n \"\"\"Logs the stack trace of the most recently caught exception.\n\n This method captures the type, value, and traceback of the most recent exception and logs it using the trace level. 
It is typically used for debugging purposes.\n\n Anything logged using this method will always be written to the scan's `debug.log`, even if debugging is not enabled.\n\n Examples:\n >>> try:\n >>> 1 / 0\n >>> except ZeroDivisionError:\n >>> self.trace()\n \"\"\"\n if msg is None:\n e_type, e_val, e_traceback = exc_info()\n if e_type is not None:\n self.log.trace(traceback.format_exc())\n else:\n self.log.trace(msg)\n\n def critical(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened red text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.critical(\"This is a critical message\")\n >>> self.critical(\"This is a critical message with a trace\", trace=False)\n \"\"\"\n self.log.critical(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.auth_secret","title":"auth_secret property
","text":"auth_secret\n
Indicates if the module is properly configured for authentication.
This read-only property should be used to check whether all necessary attributes (e.g., API keys, tokens, etc.) are configured to perform authenticated requests in the module. Commonly used in setup or initialization steps.
Returns:
bool
\u2013 True if the module is properly configured for authentication, otherwise False.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.config","title":"config property
","text":"config\n
Property that provides easy access to the module's configuration in the scan's config.
This property serves as a shortcut to retrieve the module-specific configuration from self.scan.config. If no configuration is found for this module, an empty dictionary is returned.
Returns:
dict
\u2013 The configuration dictionary specific to this module.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.finished","title":"finished property
","text":"finished\n
Property indicating whether the module has finished processing.
This property checks three conditions to determine if the module is finished:
1. The module is not currently running (self.running is False).
2. The number of incoming events in the queue is zero or less (self.num_incoming_events <= 0).
3. The number of outgoing events in the queue is zero or less (self.outgoing_event_queue.qsize() <= 0).
Returns:
bool
\u2013 True if the module has finished processing, False otherwise.
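The three conditions above can be sketched as a standalone predicate. This is only an illustration, not BBOT's implementation: the function name is_finished and its plain parameters are assumptions made for this example, standing in for self.running, self.num_incoming_events, and self.outgoing_event_queue.qsize().

```python
def is_finished(running, num_incoming_events, num_outgoing_events):
    # Hypothetical helper mirroring the three documented conditions:
    # not running, no incoming events left, no outgoing events left.
    return (
        not running
        and num_incoming_events <= 0
        and num_outgoing_events <= 0
    )
```

All three conditions must hold at once, so a module that is idle but still has queued events is not considered finished.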
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.http_timeout","title":"http_timeout property
","text":"http_timeout\n
Convenience shortcut to http_timeout in the config.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.memory_usage","title":"memory_usage property
","text":"memory_usage\n
Property that calculates the current memory usage of the module in bytes.
This property uses the get_size
function to estimate the memory consumption of the module object. The depth of the object graph traversal is limited to 3 levels to avoid performance issues. Commonly shared objects such as self.scan, self.helpers, and self.log are excluded from the calculation to prevent double-counting.
Returns:
int
\u2013 The estimated memory usage of the module in bytes.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.priority","title":"priority property
","text":"priority\n
Gets the priority level of the module as an integer.
The priority level is constrained to be between 1 and 5, inclusive. A lower value indicates a higher priority.
Returns:
int
\u2013 The priority level of the module, constrained between 1 and 5.
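One way to constrain a value to the documented range is a simple clamp. The source does not show how BBOT enforces the range, so the helper below (clamp_priority, with its low and high parameters) is a hypothetical sketch rather than the library's code.

```python
def clamp_priority(value, low=1, high=5):
    # Hypothetical helper: coerce to int, then clamp into [low, high].
    # A lower result means a higher priority, as described above.
    return max(low, min(high, int(value)))
```

Out-of-range inputs are pulled to the nearest bound instead of raising an error, which keeps a misconfigured module usable.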
Examples:
>>> self.priority\n3\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.running","title":"running property
","text":"running\n
Property indicating whether the module is currently processing data.
This property checks if the task counter (self._task_counter.value) is greater than zero, indicating that there are ongoing tasks in the module.
Returns:
bool
\u2013 True if the module is currently processing data, False otherwise.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.status","title":"status property
","text":"status\n
Provides the current status of the module as a dictionary.
The dictionary contains the following keys: events (a sub-dictionary with incoming and outgoing counts), tasks, errored, and running.
Returns:
dict
\u2013 A dictionary containing the current status of the module.
Examples:
>>> self.status\n{'events': {'incoming': 5, 'outgoing': 2}, 'tasks': 3, 'errored': False, 'running': True}\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.__init__","title":"__init__","text":"__init__(scan)\n
Initializes a module instance.
Parameters:
scan
\u2013 The BBOT scan object associated with this module instance.
Attributes:
scan
\u2013 The scan object associated with this module.
errored
(bool
) \u2013 Whether the module has errored out. Default is False.
Source code in bbot/modules/base.py
def __init__(self, scan):\n \"\"\"Initializes a module instance.\n\n Args:\n scan: The BBOT scan object associated with this module instance.\n\n Attributes:\n scan: The scan object associated with this module.\n\n errored (bool): Whether the module has errored out. Default is False.\n \"\"\"\n self.scan = scan\n self.errored = False\n self._log = None\n self._incoming_event_queue = None\n self._outgoing_event_queue = None\n # track incoming events to prevent unwanted duplicates\n self._incoming_dup_tracker = set()\n # tracks which subprocesses are running under this module\n self._proc_tracker = set()\n # seconds since we've submitted a batch\n self._last_submitted_batch = None\n # additional callbacks to be executed alongside self.cleanup()\n self.cleanup_callbacks = []\n self._cleanedup = False\n self._watched_events = None\n\n self._task_counter = TaskCounter()\n\n # string constant\n self._custom_filter_criteria_msg = \"it did not meet custom filter criteria\"\n\n # track number of failures (for .request_with_fail_count())\n self._request_failures = 0\n\n self._tasks = []\n self._event_received = asyncio.Condition()\n self._event_queued = asyncio.Condition()\n\n # used for optional \"per host\" tracking\n self._per_host_tracker = set()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.cleanup","title":"cleanup async
","text":"cleanup()\n
Asynchronously performs final cleanup operations after the scan is complete.
This method can be overridden to implement custom cleanup logic. It is called only once per scan and may not raise events.
Returns:
None
Note: This method is called only once per scan and may not raise events.
Source code in bbot/modules/base.py
async def cleanup(self):\n \"\"\"Asynchronously performs final cleanup operations after the scan is complete.\n\n This method can be overridden to implement custom cleanup logic. It is called only once per scan and may not raise events.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan and may not raise events.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.critical","title":"critical","text":"critical(*args, trace=True, **kwargs)\n
Logs a whole message in emboldened red text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.critical(\"This is a critical message\")\n>>> self.critical(\"This is a critical message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def critical(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened red text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.critical(\"This is a critical message\")\n >>> self.critical(\"This is a critical message with a trace\", trace=False)\n \"\"\"\n self.log.critical(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.debug","title":"debug","text":"debug(*args, trace=False, **kwargs)\n
Logs debug messages and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.debug(\"This is a debug message\")\n>>> self.debug(\"This is a debug message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def debug(self, *args, trace=False, **kwargs):\n \"\"\"Logs debug messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.debug(\"This is a debug message\")\n >>> self.debug(\"This is a debug message with a trace\", trace=True)\n \"\"\"\n self.log.debug(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.emit_event","title":"emit_event async
","text":"emit_event(*args, **kwargs)\n
Emit an event to the event queue and distribute it to interested modules.
This is how modules \"return\" data.
The method first creates an event object by calling self.make_event() with the provided arguments. Then, the event is queued for outgoing distribution using self.queue_outgoing_event().
Parameters:
*args
\u2013 Positional arguments to be passed to self.make_event() for event creation.
**kwargs
\u2013 Keyword arguments to be passed for event creation or configuration of the emit action.
- on_success_callback: Optional callback function to execute upon successful event emission.\n- abort_if: Optional condition under which the event emission should be aborted.\n- quick: Optional flag to indicate whether the event should be processed quickly.\n
Examples:
>>> await self.emit_event(\"www.evilcorp.com\", parent=event, tags=[\"affiliate\"])\n
>>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n>>> await self.emit_event(new_event)\n
Returns:
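Event
\u2013 The event returned by self.make_event(); it is queued for distribution only if it is truthy.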
Raises:
ValidationError
\u2013 If the event cannot be validated (handled in self.make_event()).
Source code in bbot/modules/base.py
async def emit_event(self, *args, **kwargs):\n \"\"\"Emit an event to the event queue and distribute it to interested modules.\n\n This is how modules \"return\" data.\n\n The method first creates an event object by calling `self.make_event()` with the provided arguments.\n Then, the event is queued for outgoing distribution using `self.queue_outgoing_event()`.\n\n Args:\n *args: Positional arguments to be passed to `self.make_event()` for event creation.\n **kwargs: Keyword arguments to be passed for event creation or configuration of the emit action.\n ```markdown\n - on_success_callback: Optional callback function to execute upon successful event emission.\n - abort_if: Optional condition under which the event emission should be aborted.\n - quick: Optional flag to indicate whether the event should be processed quickly.\n ```\n\n Examples:\n >>> await self.emit_event(\"www.evilcorp.com\", parent=event, tags=[\"affiliate\"])\n\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n None\n\n Raises:\n ValidationError: If the event cannot be validated (handled in `self.make_event()`).\n \"\"\"\n event_kwargs = dict(kwargs)\n emit_kwargs = {}\n for o in (\"on_success_callback\", \"abort_if\", \"quick\"):\n v = event_kwargs.pop(o, None)\n if v is not None:\n emit_kwargs[o] = v\n event = self.make_event(*args, **event_kwargs)\n if event:\n await self.queue_outgoing_event(event, **emit_kwargs)\n return event\n
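The keyword handling in the source above can be isolated into a small function to make the split easier to see. The names split_emit_kwargs and EMIT_OPTIONS are hypothetical and exist only for this sketch; the three option names are the ones listed in the source.

```python
EMIT_OPTIONS = ('on_success_callback', 'abort_if', 'quick')


def split_emit_kwargs(kwargs):
    # Copy first so the caller's dictionary is left untouched.
    event_kwargs = dict(kwargs)
    emit_kwargs = {}
    for name in EMIT_OPTIONS:
        value = event_kwargs.pop(name, None)
        # Options that are missing or None are dropped, not forwarded.
        if value is not None:
            emit_kwargs[name] = value
    return event_kwargs, emit_kwargs
```

Everything left in event_kwargs is what would be passed on for event creation, while emit_kwargs holds only the options that control the emit action itself.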
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.error","title":"error","text":"error(*args, trace=True, **kwargs)\n
Logs an error message, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.error(\"This is an error message\")\n>>> self.error(\"This is an error message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def error(self, *args, trace=True, **kwargs):\n \"\"\"Logs an error message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.error(\"This is an error message\")\n >>> self.error(\"This is an error message with a trace\", trace=False)\n \"\"\"\n self.log.error(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.filter_event","title":"filter_event async
","text":"filter_event(event)\n
Asynchronously filters incoming events based on custom criteria.
Override this method for more granular control over which events are accepted by your module. This method is called automatically before handle_event() for each incoming event that matches any in watched_events.
Parameters:
event
(Event
) \u2013 The incoming Event object to be filtered.
Returns:
tuple
\u2013 A 2-tuple where the first value is a bool indicating whether the event should be accepted, and the second value is a string explaining the reason for its acceptance or rejection. The base implementation returns a bare True, which indicates acceptance without a reason.
Note: This method should be overridden if the module requires custom logic for event filtering.
Source code in bbot/modules/base.py
async def filter_event(self, event):\n \"\"\"Asynchronously filters incoming events based on custom criteria.\n\n Override this method for more granular control over which events are accepted by your module. This method is called automatically before `handle_event()` for each incoming event that matches any in `watched_events`.\n\n Args:\n event (Event): The incoming Event object to be filtered.\n\n Returns:\n tuple: A 2-tuple where the first value is a bool indicating whether the event should be accepted, and the second value is a string explaining the reason for its acceptance or rejection. By default, returns `(True, None)` to indicate acceptance without reason.\n\n Note:\n This method should be overridden if the module requires custom logic for event filtering.\n \"\"\"\n return True\n
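A custom override might look like the sketch below, which returns the documented 2-tuple of a bool and a reason. ExampleModule is a stand-in class rather than a real BaseModule subclass, the port-443 rule is invented for illustration, and SimpleNamespace is used here only to fake an event with a port attribute.

```python
import asyncio
from types import SimpleNamespace


class ExampleModule:
    # Stand-in for a BaseModule subclass; not the real class.
    async def filter_event(self, event):
        # Accept only events on port 443 and explain the decision.
        if event.port != 443:
            return False, 'port is not 443'
        return True, 'port is 443'


module = ExampleModule()
accepted = asyncio.run(module.filter_event(SimpleNamespace(port=443)))
rejected = asyncio.run(module.filter_event(SimpleNamespace(port=80)))
```

Returning a reason alongside the bool makes it easier to see in the logs why an event was skipped.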
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.finish","title":"finish async
","text":"finish()\n
Asynchronously performs final tasks as the scan nears completion.
This method can be overridden to execute any necessary finalization logic. For example, if the module relies on a word cloud, you might wait for the scan to finish so that the word cloud is as complete as possible before running an operation.
Returns:
None
Source code in bbot/modules/base.py
async def finish(self):\n \"\"\"Asynchronously performs final tasks as the scan nears completion.\n\n This method can be overridden to execute any necessary finalization logic. For example, if the module relies on a word cloud, you might wait for the scan to finish to ensure the word cloud is most complete before running an operation.\n\n Returns:\n None\n\n Warnings:\n This method may be called multiple times since it can raise events, which may re-trigger the \"finish\" phase of the scan. Optional to override.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_per_domain_hash","title":"get_per_domain_hash","text":"get_per_domain_hash(event)\n
Computes a per-domain hash value for a given event. This method may be optionally overridden in subclasses.
Events with the same root domain will receive the same hash value.
Parameters:
event
(Event
) \u2013 The event object containing host, port, or parsed URL information.
Returns:
int
\u2013 The hash value computed for the domain.
Examples:
>>> event = self.make_event(\"https://www.example.com:8443\")\n>>> self.get_per_domain_hash(event)\n
Source code in bbot/modules/base.py
def get_per_domain_hash(self, event):\n \"\"\"\n Computes a per-domain hash value for a given event. This method may be optionally overridden in subclasses.\n\n Events with the same root domain will receive the same hash value.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the domain.\n\n Examples:\n >>> event = self.make_event(\"https://www.example.com:8443\")\n >>> self.get_per_domain_hash(event)\n \"\"\"\n _, domain = self.helpers.split_domain(event.host)\n return hash(domain)\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_per_host_hash","title":"get_per_host_hash","text":"get_per_host_hash(event)\n
Computes a per-host hash value for a given event. This method may be optionally overridden in subclasses.
The function uses the event's host to create a string to be hashed.
Parameters:
event
(Event
) \u2013 The event object containing host information.
Returns:
int
\u2013 The hash value computed for the host.
Examples:
>>> event = self.make_event(\"https://example.com:8443\")\n>>> self.get_per_host_hash(event)\n
Source code in bbot/modules/base.py
def get_per_host_hash(self, event):\n \"\"\"\n Computes a per-host hash value for a given event. This method may be optionally overridden in subclasses.\n\n The function uses the event's `host` to create a string to be hashed.\n\n Args:\n event (Event): The event object containing host information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_host_hash(event)\n \"\"\"\n return hash(event.host)\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_per_hostport_hash","title":"get_per_hostport_hash","text":"get_per_hostport_hash(event)\n
Computes a per-host:port hash value for a given event. This method may be optionally overridden in subclasses.
The function uses the event's host, port, and scheme (for URLs) to create a string to be hashed. The hash value is used to group events that share the same host and port (and, for URLs, the same scheme).
Parameters:
event
(Event
) \u2013 The event object containing host, port, or parsed URL information.
Returns:
int
\u2013 The hash value computed for the host.
Examples:
>>> event = self.make_event(\"https://example.com:8443\")\n>>> self.get_per_hostport_hash(event)\n
Source code in bbot/modules/base.py
def get_per_hostport_hash(self, event):\n \"\"\"\n Computes a per-host:port hash value for a given event. This method may be optionally overridden in subclasses.\n\n The function uses the event's `host`, `port`, and `scheme` (for URLs) to create a string to be hashed.\n The hash value is used for distinguishing events related to the same host.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_hostport_hash(event)\n \"\"\"\n parsed = getattr(event, \"parsed_url\", None)\n if parsed is None:\n to_hash = self.helpers.make_netloc(event.host, event.port)\n else:\n to_hash = f\"{parsed.scheme}://{parsed.netloc}/\"\n return hash(to_hash)\n
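The string that gets hashed can be illustrated without a BBOT event object. In the sketch below, hostport_key is a hypothetical function, urlparse from the standard library stands in for the event's parsed_url, and the plain host:port formatting is a simplified assumption in place of self.helpers.make_netloc().

```python
from urllib.parse import urlparse


def hostport_key(host, port=None, url=None):
    # Hypothetical stand-in that builds the string to be hashed.
    if url is not None:
        parsed = urlparse(url)
        # Path and query are discarded; only scheme and netloc remain.
        return f'{parsed.scheme}://{parsed.netloc}/'
    # Simplified assumption replacing self.helpers.make_netloc().
    return host if port is None else f'{host}:{port}'
```

Two URLs on the same scheme, host, and port produce the same key regardless of path, so they would also share a hash.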
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_watched_events","title":"get_watched_events","text":"get_watched_events()\n
Retrieve the set of events that the module is interested in observing.
Override this method if the set of events the module should watch needs to be determined dynamically, e.g., based on configuration options or other runtime conditions.
Returns:
set
\u2013 The set of event types that this module will handle.
Source code in bbot/modules/base.py
def get_watched_events(self):\n \"\"\"Retrieve the set of events that the module is interested in observing.\n\n Override this method if the set of events the module should watch needs to be determined dynamically, e.g., based on configuration options or other runtime conditions.\n\n Returns:\n set: The set of event types that this module will handle.\n \"\"\"\n if self._watched_events is None:\n self._watched_events = set(self.watched_events)\n return self._watched_events\n
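A dynamic override could extend the watched set based on configuration, keeping the same caching pattern as the source above. ExampleModule is a stand-in class, the include_urls option is hypothetical, and the event type names are used only for illustration.

```python
class ExampleModule:
    # Stand-in for a BaseModule subclass; not the real class.
    watched_events = ['DNS_NAME']

    def __init__(self, config):
        self.config = config
        self._watched_events = None

    def get_watched_events(self):
        # Compute once, then reuse the cached set.
        if self._watched_events is None:
            events = set(self.watched_events)
            # include_urls is a hypothetical option for this sketch.
            if self.config.get('include_urls', False):
                events.add('URL')
            self._watched_events = events
        return self._watched_events
```

Because the result is cached, changing the configuration after the first call has no effect on the returned set.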
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.handle_batch","title":"handle_batch async
","text":"handle_batch(*events)\n
Handles incoming events in batches for optimized processing.
This method is automatically called when multiple events that match any in watched_events are encountered and the batch_size attribute is set to a value greater than 1. Override this method to implement custom batch event-handling logic for your module.
Parameters:
*events
(Event
, default: ()
) \u2013 A variable number of Event objects to be processed in a batch.
Note: This method should be overridden if the batch_size attribute of the module is set to a value greater than 1.
Returns:
None
Source code in bbot/modules/base.py
async def handle_batch(self, *events):\n \"\"\"Handles incoming events in batches for optimized processing.\n\n This method is automatically called when multiple events that match any in `watched_events` are encountered and the `batch_size` attribute is set to a value greater than 1. Override this method to implement custom batch event-handling logic for your module.\n\n Args:\n *events (Event): A variable number of Event objects to be processed in a batch.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to a value greater than 1.\n\n Returns:\n None\n \"\"\"\n pass\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.handle_event","title":"handle_event async
","text":"handle_event(event)\n
Asynchronously handles incoming events that the module is configured to watch.
This method is automatically invoked when an event that matches any in watched_events is encountered during a scan. Override this method to implement custom event-handling logic for your module.
Parameters:
event
(Event
) \u2013 The event object containing details about the incoming event.
Note: This method should be overridden if the batch_size attribute of the module is set to 1.
Returns:
None
Source code in bbot/modules/base.py
async def handle_event(self, event):\n \"\"\"Asynchronously handles incoming events that the module is configured to watch.\n\n This method is automatically invoked when an event that matches any in `watched_events` is encountered during a scan. Override this method to implement custom event-handling logic for your module.\n\n Args:\n event (Event): The event object containing details about the incoming event.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to 1.\n\n Returns:\n None\n \"\"\"\n pass\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugeinfo","title":"hugeinfo","text":"hugeinfo(*args, trace=False, **kwargs)\n
Logs a whole message in emboldened blue text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugeinfo(\"This is a huge informational message\")\n>>> self.hugeinfo(\"This is a huge informational message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def hugeinfo(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened blue text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeinfo(\"This is a huge informational message\")\n >>> self.hugeinfo(\"This is a huge informational message with a trace\", trace=True)\n \"\"\"\n self.log.hugeinfo(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugesuccess","title":"hugesuccess","text":"hugesuccess(*args, trace=False, **kwargs)\n
Logs a whole message in emboldened green text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugesuccess(\"This is a huge success message\")\n>>> self.hugesuccess(\"This is a huge success message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def hugesuccess(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened green text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugesuccess(\"This is a huge success message\")\n >>> self.hugesuccess(\"This is a huge success message with a trace\", trace=True)\n \"\"\"\n self.log.hugesuccess(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugeverbose","title":"hugeverbose","text":"hugeverbose(*args, trace=False, **kwargs)\n
Logs a whole message in emboldened white text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugeverbose(\"This is a huge verbose message\")\n>>> self.hugeverbose(\"This is a huge verbose message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def hugeverbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened white text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeverbose(\"This is a huge verbose message\")\n >>> self.hugeverbose(\"This is a huge verbose message with a trace\", trace=True)\n \"\"\"\n self.log.hugeverbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugewarning","title":"hugewarning","text":"hugewarning(*args, trace=True, **kwargs)\n
Logs a whole message in emboldened orange text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugewarning(\"This is a huge warning message\")\n>>> self.hugewarning(\"This is a huge warning message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def hugewarning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened orange text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugewarning(\"This is a huge warning message\")\n >>> self.hugewarning(\"This is a huge warning message with a trace\", trace=False)\n \"\"\"\n self.log.hugewarning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.info","title":"info","text":"info(*args, trace=False, **kwargs)\n
Logs informational messages and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.info(\"This is an informational message\")\n>>> self.info(\"This is an informational message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def info(self, *args, trace=False, **kwargs):\n \"\"\"Logs informational messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.info(\"This is an informational message\")\n >>> self.info(\"This is an informational message with a trace\", trace=True)\n \"\"\"\n self.log.info(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.log_table","title":"log_table","text":"log_table(*args, **kwargs)\n
Logs a table to the console and optionally writes it to a file.
This function generates a table using self.helpers.make_table
, then logs each line of the table as an info-level log. If a table_name is provided, it also writes the table to a file.
Parameters:
*args
\u2013 Variable length argument list to be passed to self.helpers.make_table
.
**kwargs
\u2013 Arbitrary keyword arguments. If 'table_name' is specified, the table will be written to a file.
Returns:
str
\u2013 The generated table as a string.
Examples:
>>> self.log_table(['Header1', 'Header2'], [['row1col1', 'row1col2'], ['row2col1', 'row2col2']], table_name=\"my_table\")\n
Source code in bbot/modules/base.py
def log_table(self, *args, **kwargs):\n \"\"\"Logs a table to the console and optionally writes it to a file.\n\n This function generates a table using `self.helpers.make_table`, then logs each line\n of the table as an info-level log. If a table_name is provided, it also writes the table to a file.\n\n Args:\n *args: Variable length argument list to be passed to `self.helpers.make_table`.\n **kwargs: Arbitrary keyword arguments. If 'table_name' is specified, the table will be written to a file.\n\n Returns:\n str: The generated table as a string.\n\n Examples:\n >>> self.log_table(['Header1', 'Header2'], [['row1col1', 'row1col2'], ['row2col1', 'row2col2']], table_name=\"my_table\")\n \"\"\"\n table_name = kwargs.pop(\"table_name\", None)\n max_log_entries = kwargs.pop(\"max_log_entries\", None)\n table = self.helpers.make_table(*args, **kwargs)\n lines_logged = 0\n for line in table.splitlines():\n if max_log_entries is not None and lines_logged > max_log_entries:\n break\n self.info(line)\n lines_logged += 1\n if table_name is not None:\n date = self.helpers.make_date()\n filename = self.scan.home / f\"{self.helpers.tagify(table_name)}-table-{date}.txt\"\n with open(filename, \"w\") as f:\n f.write(table)\n self.verbose(f\"Wrote {table_name} to {filename}\")\n return table\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.make_event","title":"make_event","text":"make_event(*args, **kwargs)\n
Create an event for the scan.
If the event cannot be validated, a warning is logged and None is returned, unless raise_error is set to True, in which case the validation error is raised.
Parameters:
*args
\u2013 Positional arguments to be passed to the scan's make_event method.
**kwargs
\u2013 Keyword arguments to be passed to the scan's make_event method.
raise_error
(bool
) \u2013 Whether to raise a validation error if the event could not be created. Defaults to False.
Examples:
>>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n>>> await self.emit_event(new_event)\n
Returns:
Event or None: The created event, or None if a validation error occurred and raise_error was False.
Raises:
ValidationError
\u2013 If the event could not be validated and raise_error is True.
Source code in bbot/modules/base.py
def make_event(self, *args, **kwargs):\n \"\"\"Create an event for the scan.\n\n Raises a validation error if the event could not be created, unless raise_error is set to False.\n\n Args:\n *args: Positional arguments to be passed to the scan's make_event method.\n **kwargs: Keyword arguments to be passed to the scan's make_event method.\n raise_error (bool, optional): Whether to raise a validation error if the event could not be created. Defaults to False.\n\n Examples:\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n Event or None: The created event, or None if a validation error occurred and raise_error was False.\n\n Raises:\n ValidationError: If the event could not be validated and raise_error is True.\n \"\"\"\n raise_error = kwargs.pop(\"raise_error\", False)\n module = kwargs.pop(\"module\", None)\n if module is None:\n if (not args) or getattr(args[0], \"module\", None) is None:\n kwargs[\"module\"] = self\n try:\n event = self.scan.make_event(*args, **kwargs)\n except ValidationError as e:\n if raise_error:\n raise\n self.warning(f\"{e}\")\n return\n return event\n
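The raise_error behavior described above can be sketched in isolation. This is a minimal illustration, not BBOT's implementation: the `ValidationError` class, the `build_event` validator, and the `warnings` list are hypothetical stand-ins for the scan's real event machinery and logger.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the validation error raised on bad event data."""


def build_event(data):
    # Hypothetical validator: accept only non-empty strings.
    if not isinstance(data, str) or not data:
        raise ValidationError(f"invalid event data: {data!r}")
    return {"data": data}


def make_event(data, raise_error=False, warnings=None):
    """Return the event, or None if validation fails and raise_error is False."""
    try:
        return build_event(data)
    except ValidationError as e:
        if raise_error:
            # Caller asked for strict behavior: propagate the error.
            raise
        # Default behavior: record a warning and return None.
        if warnings is not None:
            warnings.append(str(e))
        return None
```

Callers that rely on the default should check for None before emitting the result.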
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.ping","title":"ping async
","text":"ping()\n
Asynchronously checks the health of the configured API.
This method is used in conjunction with require_api_key() to verify that the API is not just configured, but also responsive. This method should include an assert statement to validate the API's health, typically by making a test request to a known endpoint.
Example Usage:
In your implementation, if the API has a \"/ping\" endpoint:
async def ping(self):\n    r = await self.request_with_fail_count(f\"{self.base_url}/ping\")\n    resp_content = getattr(r, \"text\", \"\")\n    assert getattr(r, \"status_code\", 0) == 200, resp_content\n
Returns:
None
Raises:
AssertionError
\u2013 If the API does not respond as expected.
Source code in bbot/modules/base.py
async def ping(self):\n \"\"\"Asynchronously checks the health of the configured API.\n\n This method is used in conjunction with require_api_key() to verify that the API is not just configured, but also responsive. This method should include an assert statement to validate the API's health, typically by making a test request to a known endpoint.\n\n Example Usage:\n In your implementation, if the API has a \"/ping\" endpoint:\n async def ping(self):\n r = await self.request_with_fail_count(f\"{self.base_url}/ping\")\n resp_content = getattr(r, \"text\", \"\")\n assert getattr(r, \"status_code\", 0) == 200, resp_content\n\n Returns:\n None\n\n Raises:\n AssertionError: If the API does not respond as expected.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.queue_event","title":"queue_event async
","text":"queue_event(event)\n
Asynchronously queues an incoming event to the module's event queue for further processing.
The function performs an initial check to see if the event is acceptable for queuing. If the event passes the check, it is put into the incoming_event_queue
.
Parameters:
event
\u2013 The event object to be queued.
Returns:
None
\u2013 The function doesn't return anything but modifies the state of the incoming_event_queue
.
Examples:
>>> await self.queue_event(some_event)\n
Raises:
AttributeError
\u2013 If the module is not in an acceptable state to queue incoming events.
Source code in bbot/modules/base.py
async def queue_event(self, event):\n \"\"\"\n Asynchronously queues an incoming event to the module's event queue for further processing.\n\n The function performs an initial check to see if the event is acceptable for queuing.\n If the event passes the check, it is put into the `incoming_event_queue`.\n\n Args:\n event: The event object to be queued.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `incoming_event_queue`.\n\n Examples:\n >>> await self.queue_event(some_event)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue incoming events.\n \"\"\"\n async with self._task_counter.count(\"queue_event()\", _log=False):\n if self.incoming_event_queue is False:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n return\n acceptable, reason = self._event_precheck(event)\n if not acceptable:\n if reason and reason != \"its type is not in watched_events\":\n self.debug(f\"Not queueing {event} because {reason}\")\n return\n else:\n self.debug(f\"Queueing {event} because {reason}\")\n try:\n self.incoming_event_queue.put_nowait(event)\n async with self._event_received:\n self._event_received.notify()\n if event.type != \"FINISHED\":\n self.scan._new_activity = True\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.queue_outgoing_event","title":"queue_outgoing_event async
","text":"queue_outgoing_event(event, **kwargs)\n
Queues an outgoing event to the module's outgoing event queue for further processing.
The function attempts to put the event into the outgoing_event_queue
immediately. If it's not possible due to the current state of the module, an AttributeError is raised, and a debug log is generated.
Parameters:
event
\u2013 The event object to be queued.
**kwargs
\u2013 Additional keyword arguments to be associated with the event.
Returns:
None
\u2013 The function doesn't return anything but modifies the state of the outgoing_event_queue
.
Examples:
>>> self.queue_outgoing_event(some_outgoing_event, abort_if=lambda e: \"unresolved\" in e.tags)\n
Raises:
AttributeError
\u2013 If the module is not in an acceptable state to queue outgoing events.
Source code in bbot/modules/base.py
async def queue_outgoing_event(self, event, **kwargs):\n \"\"\"\n Queues an outgoing event to the module's outgoing event queue for further processing.\n\n The function attempts to put the event into the `outgoing_event_queue` immediately.\n If it's not possible due to the current state of the module, an AttributeError is raised, and a debug log is generated.\n\n Args:\n event: The event object to be queued.\n **kwargs: Additional keyword arguments to be associated with the event.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `outgoing_event_queue`.\n\n Examples:\n >>> self.queue_outgoing_event(some_outgoing_event, abort_if=lambda e: \"unresolved\" in e.tags)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue outgoing events.\n \"\"\"\n try:\n await self.outgoing_event_queue.put((event, kwargs))\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue outgoing event\")\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.report","title":"report async
","text":"report()\n
Asynchronously executes a final task after the scan is complete but before cleanup.
This method can be overridden to aggregate data and raise summary events at the end of the scan.
Returns:
None
Note: This method is called only once per scan.
Source code in bbot/modules/base.py
async def report(self):\n \"\"\"Asynchronously executes a final task after the scan is complete but before cleanup.\n\n This method can be overridden to aggregate data and raise summary events at the end of the scan.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.request_with_fail_count","title":"request_with_fail_count async
","text":"request_with_fail_count(*args, **kwargs)\n
Asynchronously perform an HTTP request while keeping track of consecutive failures.
This function wraps the self.helpers.request
method, incrementing a failure counter if the request returns None and resetting it on success. When the counter reaches self.failed_request_abort_threshold
, the module is set to an error state.
Parameters:
*args
\u2013 Positional arguments to pass to self.helpers.request
.
**kwargs
\u2013 Keyword arguments to pass to self.helpers.request
.
Returns:
Any
\u2013 The response object or None if the request failed.
Raises:
None
\u2013 Sets the module to an error state when the failure threshold is reached.
Source code in bbot/modules/base.py
async def request_with_fail_count(self, *args, **kwargs):\n \"\"\"Asynchronously perform an HTTP request while keeping track of consecutive failures.\n\n This function wraps the `self.helpers.request` method, incrementing a failure counter if\n the request returns None. When the failure counter exceeds `self.failed_request_abort_threshold`,\n the module is set to an error state.\n\n Args:\n *args: Positional arguments to pass to `self.helpers.request`.\n **kwargs: Keyword arguments to pass to `self.helpers.request`.\n\n Returns:\n Any: The response object or None if the request failed.\n\n Raises:\n None: Sets the module to an error state when the failure threshold is reached.\n \"\"\"\n r = await self.helpers.request(*args, **kwargs)\n if r is None:\n self._request_failures += 1\n else:\n self._request_failures = 0\n if self._request_failures >= self.failed_request_abort_threshold:\n self.set_error_state(f\"Setting error state due to {self._request_failures:,} failed HTTP requests\")\n return r\n
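The consecutive-failure counting can be tried out without any network access. The sketch below is an assumption-laden toy: `FailCounter`, its `_request` method, and the canned `responses` list are hypothetical, standing in for the module and for `self.helpers.request`.

```python
import asyncio


class FailCounter:
    def __init__(self, responses, abort_threshold=3):
        self._responses = iter(responses)
        self.abort_threshold = abort_threshold
        self.failures = 0
        self.errored = False

    async def _request(self):
        # Hypothetical stand-in for an HTTP helper: None means the request failed.
        return next(self._responses)

    async def request_with_fail_count(self):
        r = await self._request()
        if r is None:
            self.failures += 1
        else:
            # Any success resets the streak, so only consecutive failures count.
            self.failures = 0
        if self.failures >= self.abort_threshold:
            self.errored = True
        return r


async def demo(responses, n):
    client = FailCounter(responses)
    for _ in range(n):
        await client.request_with_fail_count()
    return client.failures, client.errored
```

With a threshold of 3, two failures followed by a success leave the module healthy, while three failures in a row set the error flag.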
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.require_api_key","title":"require_api_key async
","text":"require_api_key()\n
Asynchronously checks if an API key is required and valid.
Returns:
bool or tuple: Returns True if API key is valid and ready. Returns a tuple (None, \"error message\") otherwise.
Source code in bbot/modules/base.py
async def require_api_key(self):\n \"\"\"\n Asynchronously checks if an API key is required and valid.\n\n Args:\n None\n\n Returns:\n bool or tuple: Returns True if API key is valid and ready.\n Returns a tuple (None, \"error message\") otherwise.\n\n Notes:\n - Fetches the API key from the configuration.\n - Calls the 'ping()' method to test API accessibility.\n - Sets the API key readiness status accordingly.\n \"\"\"\n self.api_key = self.config.get(\"api_key\", \"\")\n if self.auth_secret:\n try:\n await self.ping()\n self.hugesuccess(f\"API is ready\")\n return True\n except Exception as e:\n return None, f\"Error with API ({str(e).strip()})\"\n else:\n return None, \"No API key set\"\n
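How ping() and require_api_key() interact can be shown with a toy class. This is only a sketch: `FakeAPIModule` and its `healthy` flag are hypothetical, and a real module would make a test request inside ping() instead.

```python
import asyncio


class FakeAPIModule:
    def __init__(self, api_key="", healthy=True):
        self.api_key = api_key
        self.healthy = healthy  # hypothetical switch simulating API health

    async def ping(self):
        # A real module would issue a test request here and assert on the response.
        assert self.healthy, "API did not respond as expected"

    async def require_api_key(self):
        if not self.api_key:
            return None, "No API key set"
        try:
            await self.ping()
            return True
        except Exception as e:
            # Any failure in ping() becomes a soft-fail with a reason.
            return None, f"Error with API ({str(e).strip()})"
```

The return value has the same shape that setup() expects, so a module's setup() can simply return the result of `await self.require_api_key()`.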
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.set_error_state","title":"set_error_state","text":"set_error_state(message=None, clear_outgoing_queue=False, critical=False)\n
Puts the module into an errored state where it cannot accept new events. Optionally logs a warning message.
The function sets the module's errored
attribute to True and logs a warning with the optional message. It also clears the incoming event queue to prevent further processing and updates its status to False.
Parameters:
message
(str
, default: None
) \u2013 Additional message to be logged along with the warning.
Returns:
None
\u2013 The function doesn't return anything but updates the errored
state and clears the incoming event queue.
Examples:
>>> self.set_error_state()\n>>> self.set_error_state(\"Failed to connect to the server\")\n
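Notes:
The function sets self._incoming_event_queue
to False to prevent its further use.
If the module was already in an errored state, the function will not reset the error state or the queue.
Source code in bbot/modules/base.py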
def set_error_state(self, message=None, clear_outgoing_queue=False, critical=False):\n \"\"\"\n Puts the module into an errored state where it cannot accept new events. Optionally logs a warning message.\n\n The function sets the module's `errored` attribute to True and logs a warning with the optional message.\n It also clears the incoming event queue to prevent further processing and updates its status to False.\n\n Args:\n message (str, optional): Additional message to be logged along with the warning.\n\n Returns:\n None: The function doesn't return anything but updates the `errored` state and clears the incoming event queue.\n\n Examples:\n >>> self.set_error_state()\n >>> self.set_error_state(\"Failed to connect to the server\")\n\n Notes:\n - The function sets `self._incoming_event_queue` to False to prevent its further use.\n - If the module was already in an errored state, the function will not reset the error state or the queue.\n \"\"\"\n if not self.errored:\n log_msg = \"Setting error state\"\n if message is not None:\n log_msg += f\": {message}\"\n if critical:\n log_fn = self.error\n else:\n log_fn = self.warning\n log_fn(log_msg)\n self.errored = True\n # clear incoming queue\n if self.incoming_event_queue is not False:\n self.debug(f\"Emptying event_queue\")\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.incoming_event_queue.get_nowait()\n # set queue to None to prevent its use\n # if there are leftover objects in the queue, the scan will hang.\n self._incoming_event_queue = False\n\n if clear_outgoing_queue:\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.outgoing_event_queue.get_nowait()\n
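The queue-emptying idiom used above (calling get_nowait() in a loop and suppressing QueueEmpty) works on any asyncio.Queue. A minimal standalone sketch, with the helper name `drain` being an assumption for illustration:

```python
import asyncio
from contextlib import suppress


def drain(queue):
    """Remove and return everything currently in the queue without blocking."""
    drained = []
    # get_nowait() raises QueueEmpty once the queue is exhausted, which ends the loop.
    with suppress(asyncio.QueueEmpty):
        while True:
            drained.append(queue.get_nowait())
    return drained


async def demo():
    q = asyncio.Queue()
    for item in ("a", "b", "c"):
        q.put_nowait(item)
    items = drain(q)
    return items, q.empty()
```

Because nothing is awaited inside the loop, no other task can add items while the queue is being emptied.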
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.setup","title":"setup async
","text":"setup()\n
Performs one-time setup tasks for the module.
This method is responsible for preparing the module for its operation, which may include tasks such as downloading necessary resources, validating configuration parameters, or other preliminary checks.
Returns:
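tuple
\u2013
bool or None: A status indicating the outcome of the setup process. Returns True
if the setup was successful, None
for a soft-fail where the module setup did not succeed but the scan will continue with the module disabled, and False
for a hard-fail where the setup failure causes the scan to abort.
str, optional: A reason for the setup failure, provided only when the setup does not succeed (i.e., returns None
or False
).
Examples: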
>>> async def setup(self):\n>>> if not self.config.get(\"api_key\"):\n>>> # Soft-fail: Configuration missing an API key\n>>> return None, \"No API key specified\"\n
>>> async def setup(self):\n>>> try:\n>>> wordlist = await self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n>>> except WordlistError as e:\n>>> # Hard-fail: Error retrieving wordlist\n>>> return False, f\"Error retrieving wordlist: {e}\"\n
>>> async def setup(self):\n>>> self.timeout = self.config.get(\"timeout\", 5)\n>>> # Success: Setup completed without issues\n>>> return True\n
Source code in bbot/modules/base.py
async def setup(self):\n \"\"\"\n Performs one-time setup tasks for the module.\n\n This method is responsible for preparing the module for its operation, which may include tasks\n such as downloading necessary resources, validating configuration parameters, or other preliminary\n checks.\n\n Returns:\n tuple:\n - bool or None: A status indicating the outcome of the setup process. Returns `True` if\n the setup was successful, `None` for a soft-fail where the module setup did not succeed\n but the scan will continue with the module disabled, and `False` for a hard-fail where\n the setup failure causes the scan to abort.\n - str, optional: A reason for the setup failure, provided only when the setup does not\n succeed (i.e., returns `None` or `False`).\n\n Examples:\n >>> async def setup(self):\n >>> if not self.config.get(\"api_key\"):\n >>> # Soft-fail: Configuration missing an API key\n >>> return None, \"No API key specified\"\n\n >>> async def setup(self):\n >>> try:\n >>> wordlist = await self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n >>> except WordlistError as e:\n >>> # Hard-fail: Error retrieving wordlist\n >>> return False, f\"Error retrieving wordlist: {e}\"\n\n >>> async def setup(self):\n >>> self.timeout = self.config.get(\"timeout\", 5)\n >>> # Success: Setup completed without issues\n >>> return True\n \"\"\"\n\n return True\n
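Since setup() may return either a bare status or a (status, reason) tuple, a caller has to normalize the two shapes. The helper below is a hypothetical sketch of that normalization, not the code BBOT itself uses; the outcome labels are made up for illustration.

```python
def interpret_setup_result(result):
    """Normalize a setup() return value into (outcome, reason)."""
    if isinstance(result, tuple):
        status, reason = result
    else:
        status, reason = result, ""
    if status is True:
        outcome = "ready"
    elif status is None:
        outcome = "soft-fail"  # module disabled, scan continues
    else:
        outcome = "hard-fail"  # scan aborts
    return outcome, reason
```

For example, `interpret_setup_result((None, "No API key specified"))` yields the soft-fail outcome together with its reason, while a bare `True` yields the ready outcome with an empty reason.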
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.success","title":"success","text":"success(*args, trace=False, **kwargs)\n
Logs a success message, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.success(\"Operation completed successfully\")\n>>> self.success(\"Operation completed with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def success(self, *args, trace=False, **kwargs):\n \"\"\"Logs a success message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.success(\"Operation completed successfully\")\n >>> self.success(\"Operation completed with a trace\", trace=True)\n \"\"\"\n self.log.success(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.trace","title":"trace","text":"trace(msg=None)\n
Logs the stack trace of the most recently caught exception.
This method captures the type, value, and traceback of the most recent exception and logs it using the trace level. It is typically used for debugging purposes.
Anything logged using this method will always be written to the scan's debug.log
, even if debugging is not enabled.
Examples:
>>> try:\n>>> 1 / 0\n>>> except ZeroDivisionError:\n>>> self.trace()\n
Source code in bbot/modules/base.py
def trace(self, msg=None):\n \"\"\"Logs the stack trace of the most recently caught exception.\n\n This method captures the type, value, and traceback of the most recent exception and logs it using the trace level. It is typically used for debugging purposes.\n\n Anything logged using this method will always be written to the scan's `debug.log`, even if debugging is not enabled.\n\n Examples:\n >>> try:\n >>> 1 / 0\n >>> except ZeroDivisionError:\n >>> self.trace()\n \"\"\"\n if msg is None:\n e_type, e_val, e_traceback = exc_info()\n if e_type is not None:\n self.log.trace(traceback.format_exc())\n else:\n self.log.trace(msg)\n
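The exception capture that trace() relies on comes from the standard library. A small standalone sketch of the same check, where the function names are assumptions for illustration:

```python
import sys
import traceback


def format_current_exception():
    """Return the formatted traceback of the exception being handled, or None."""
    e_type, _, _ = sys.exc_info()
    if e_type is None:
        # Not inside an except block: there is nothing to report.
        return None
    return traceback.format_exc()


def demo():
    try:
        1 / 0
    except ZeroDivisionError:
        return format_current_exception()
```

Called outside an except block the helper returns None, which is why trace() is a no-op when no exception has been caught.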
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.verbose","title":"verbose","text":"verbose(*args, trace=False, **kwargs)\n
Logs messages and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.verbose(\"This is a verbose message\")\n>>> self.verbose(\"This is a verbose message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def verbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.verbose(\"This is a verbose message\")\n >>> self.verbose(\"This is a verbose message with a trace\", trace=True)\n \"\"\"\n self.log.verbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.warning","title":"warning","text":"warning(*args, trace=True, **kwargs)\n
Logs a warning message, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.warning(\"This is a warning message\")\n>>> self.warning(\"This is a warning message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def warning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a warning message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.warning(\"This is a warning message\")\n >>> self.warning(\"This is a warning message with a trace\", trace=False)\n \"\"\"\n self.log.warning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/core/","title":"BBOTCore","text":""},{"location":"dev/core/#bbot.core.core.BBOTCore","title":"BBOTCore","text":"This is the first thing that loads when you import BBOT.
Unlike a Preset, BBOTCore holds only the config, not scan-specific stuff like targets, flags, modules, etc.
Its main jobs are:
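set up logging
keep separation between the default
and custom
config (this allows presets to only display the config options that have changed)
allow for easy merging of configs
load quickly
Source code in bbot/core/core.py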
class BBOTCore:\n \"\"\"\n This is the first thing that loads when you import BBOT.\n\n Unlike a Preset, BBOTCore holds only the config, not scan-specific stuff like targets, flags, modules, etc.\n\n Its main jobs are:\n\n - set up logging\n - keep separation between the `default` and `custom` config (this allows presets to only display the config options that have changed)\n - allow for easy merging of configs\n - load quickly\n \"\"\"\n\n # used for filtering out sensitive config values\n secrets_strings = [\"api_key\", \"username\", \"password\", \"token\", \"secret\", \"_id\"]\n # don't filter/remove entries under this key\n secrets_exclude_keys = [\"modules\"]\n\n def __init__(self):\n self._logger = None\n self._files_config = None\n\n self.bbot_sudo_pass = None\n\n self._config = None\n self._custom_config = None\n\n # bare minimum == logging\n self.logger\n self.log = logging.getLogger(\"bbot.core\")\n\n import multiprocessing\n\n self.process_name = multiprocessing.current_process().name\n\n @property\n def home(self):\n return Path(self.config[\"home\"]).expanduser().resolve()\n\n @property\n def cache_dir(self):\n return self.home / \"cache\"\n\n @property\n def tools_dir(self):\n return self.home / \"tools\"\n\n @property\n def temp_dir(self):\n return self.home / \"temp\"\n\n @property\n def lib_dir(self):\n return self.home / \"lib\"\n\n @property\n def scans_dir(self):\n return self.home / \"scans\"\n\n @property\n def config(self):\n \"\"\"\n .config is just .default_config + .custom_config merged together\n\n any new values should be added to custom_config.\n \"\"\"\n if self._config is None:\n self._config = OmegaConf.merge(self.default_config, self.custom_config)\n # set read-only flag (change .custom_config instead)\n OmegaConf.set_readonly(self._config, True)\n return self._config\n\n @property\n def default_config(self):\n \"\"\"\n The default BBOT config (from `defaults.yml`). 
Read-only.\n \"\"\"\n global DEFAULT_CONFIG\n if DEFAULT_CONFIG is None:\n self.default_config = self.files_config.get_default_config()\n # ensure bbot home dir\n if not \"home\" in self.default_config:\n self.default_config[\"home\"] = \"~/.bbot\"\n return DEFAULT_CONFIG\n\n @default_config.setter\n def default_config(self, value):\n # we temporarily clear out the config so it can be refreshed if/when default_config changes\n global DEFAULT_CONFIG\n self._config = None\n DEFAULT_CONFIG = value\n # set read-only flag (change .custom_config instead)\n OmegaConf.set_readonly(DEFAULT_CONFIG, True)\n\n @property\n def custom_config(self):\n \"\"\"\n Custom BBOT config (from `~/.config/bbot/bbot.yml`)\n \"\"\"\n # we temporarily clear out the config so it can be refreshed if/when custom_config changes\n self._config = None\n if self._custom_config is None:\n self.custom_config = self.files_config.get_custom_config()\n return self._custom_config\n\n @custom_config.setter\n def custom_config(self, value):\n # we temporarily clear out the config so it can be refreshed if/when custom_config changes\n self._config = None\n # ensure the modules key is always a dictionary\n modules_entry = value.get(\"modules\", None)\n if modules_entry is not None and not OmegaConf.is_dict(modules_entry):\n value[\"modules\"] = {}\n self._custom_config = value\n\n def no_secrets_config(self, config):\n from .helpers.misc import clean_dict\n\n with suppress(ValueError):\n config = OmegaConf.to_object(config)\n\n return clean_dict(\n config,\n *self.secrets_strings,\n fuzzy=True,\n exclude_keys=self.secrets_exclude_keys,\n )\n\n def secrets_only_config(self, config):\n from .helpers.misc import filter_dict\n\n with suppress(ValueError):\n config = OmegaConf.to_object(config)\n\n return filter_dict(\n config,\n *self.secrets_strings,\n fuzzy=True,\n exclude_keys=self.secrets_exclude_keys,\n )\n\n def merge_custom(self, config):\n \"\"\"\n Merge a config into the custom config.\n \"\"\"\n 
self.custom_config = OmegaConf.merge(self.custom_config, OmegaConf.create(config))\n\n def merge_default(self, config):\n \"\"\"\n Merge a config into the default config.\n \"\"\"\n self.default_config = OmegaConf.merge(self.default_config, OmegaConf.create(config))\n\n def copy(self):\n \"\"\"\n Return a semi-shallow copy of self. (`custom_config` is copied, but `default_config` stays the same)\n \"\"\"\n core_copy = copy(self)\n core_copy._custom_config = self._custom_config.copy()\n return core_copy\n\n @property\n def files_config(self):\n \"\"\"\n Get the configs from `bbot.yml` and `defaults.yml`\n \"\"\"\n if self._files_config is None:\n from .config import files\n\n self.files = files\n self._files_config = files.BBOTConfigFiles(self)\n return self._files_config\n\n def create_process(self, *args, **kwargs):\n if os.environ.get(\"BBOT_TESTING\", \"\") == \"True\":\n process = self.create_thread(*args, **kwargs)\n else:\n if self.process_name == \"MainProcess\":\n from .helpers.process import BBOTProcess\n\n process = BBOTProcess(*args, **kwargs)\n else:\n raise BBOTError(f\"Tried to start server from process {self.process_name}\")\n process.daemon = True\n return process\n\n def create_thread(self, *args, **kwargs):\n from .helpers.process import BBOTThread\n\n return BBOTThread(*args, **kwargs)\n\n @property\n def logger(self):\n self.config\n if self._logger is None:\n from .config.logger import BBOTLogger\n\n self._logger = BBOTLogger(self)\n return self._logger\n
"},{"location":"dev/core/#bbot.core.core.BBOTCore.config","title":"config property
","text":"config\n
.config is just .default_config + .custom_config merged together
any new values should be added to custom_config.
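The layering described above can be sketched with plain dictionaries. This is only an illustration, not BBOT's implementation: the real code merges OmegaConf objects, and the `merge` helper, the `example_module` key, and all values below are hypothetical. It assumes that, as with `OmegaConf.merge`, the later (custom) config wins on conflicts.

```python
from copy import deepcopy


def merge(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # both sides are dicts: merge them key by key
            merged[key] = merge(merged[key], value)
        else:
            # otherwise the override value replaces the base value
            merged[key] = deepcopy(value)
    return merged


# hypothetical values, for illustration only
default_config = {"home": "~/.bbot", "modules": {"example_module": {"api_key": ""}}}
custom_config = {"modules": {"example_module": {"api_key": "hypothetical-key"}}}

# the effective config is the defaults overlaid with the custom values
config = merge(default_config, custom_config)
print(config)
```

Because the defaults are never modified in this sketch, new values only ever enter through the custom layer, which mirrors the advice to add them to custom_config.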
"},{"location":"dev/core/#bbot.core.core.BBOTCore.custom_config","title":"custom_configproperty
writable
","text":"custom_config\n
Custom BBOT config (from ~/.config/bbot/bbot.yml)
property
writable
","text":"default_config\n
The default BBOT config (from defaults.yml). Read-only.
property
","text":"files_config\n
Get the configs from bbot.yml and defaults.yml
copy()\n
Return a semi-shallow copy of self. (custom_config is copied, but default_config stays the same)
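A minimal sketch of what "semi-shallow" means here, assuming a hypothetical `Core` class with plain dictionaries in place of BBOT's OmegaConf objects. In the real class the default config is a module-level global rather than an instance attribute.

```python
from copy import copy


class Core:
    """Hypothetical stand-in for BBOTCore, for illustration only."""

    def __init__(self, default_config, custom_config):
        self.default_config = default_config
        self._custom_config = custom_config

    def copy(self):
        # shallow-copy the object, then give the copy its own custom config
        core_copy = copy(self)
        core_copy._custom_config = self._custom_config.copy()
        return core_copy


original = Core({"home": "~/.bbot"}, {"debug": False})
clone = original.copy()

# changing a top-level key on the copy leaves the original untouched
clone._custom_config["debug"] = True
print(original._custom_config, clone._custom_config)
```

Both objects still refer to the same default config, so only the custom layer is independent.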
Source code in bbot/core/core.py
def copy(self):\n \"\"\"\n Return a semi-shallow copy of self. (`custom_config` is copied, but `default_config` stays the same)\n \"\"\"\n core_copy = copy(self)\n core_copy._custom_config = self._custom_config.copy()\n return core_copy\n
"},{"location":"dev/core/#bbot.core.core.BBOTCore.merge_custom","title":"merge_custom","text":"merge_custom(config)\n
Merge a config into the custom config.
Source code in bbot/core/core.py
def merge_custom(self, config):\n \"\"\"\n Merge a config into the custom config.\n \"\"\"\n self.custom_config = OmegaConf.merge(self.custom_config, OmegaConf.create(config))\n
"},{"location":"dev/core/#bbot.core.core.BBOTCore.merge_default","title":"merge_default","text":"merge_default(config)\n
Merge a config into the default config.
Source code in bbot/core/core.py
def merge_default(self, config):\n \"\"\"\n Merge a config into the default config.\n \"\"\"\n self.default_config = OmegaConf.merge(self.default_config, OmegaConf.create(config))\n
"},{"location":"dev/dev_environment/","title":"Setting Up a Dev Environment","text":"The following will show you how to set up a fully functioning python environment for devving on BBOT.
"},{"location":"dev/dev_environment/#installation-poetry","title":"Installation (Poetry)","text":"Poetry is the recommended method of installation if you want to dev on BBOT. To set up a dev environment with Poetry, you can follow these steps:
# clone your forked repo and cd into it\ngit clone git@github.com:<username>/bbot.git\ncd bbot\n\n# install poetry\ncurl -sSL https://install.python-poetry.org | python3 -\n\n# install pip dependencies\npoetry install\n# install pre-commit hooks, etc.\npoetry run pre-commit install\n\n# enter virtual environment\npoetry shell\n\nbbot --help\n
bbot
command.# auto-format code indentation, etc.\nblack .\n\n# run tests\n./bbot/test/run_tests.sh\n
dev
branch of the main BBOT repo.
Below is a simple Discord bot designed to run BBOT scans.
examples/discord_bot.py\nimport discord\nfrom discord.ext import commands\n\nfrom bbot.scanner import Scanner\nfrom bbot.modules.output.discord import Discord\n\n\nclass BBOTDiscordBot(commands.Cog):\n \"\"\"\n A simple Discord bot capable of running a BBOT scan.\n\n To set up:\n 1. Go to Discord Developer Portal (https://discord.com/developers)\n 2. Create a new application\n 3. Create an invite link for the bot, visit the link to invite it to your server\n - Your Application --> OAuth2 --> URL Generator\n - For Scopes, select \"bot\"\n - For Bot Permissions, select:\n - Read Messages/View Channels\n - Send Messages\n 4. Turn on \"Message Content Intent\"\n - Your Application --> Bot --> Privileged Gateway Intents --> Message Content Intent\n 5. Copy your Discord Bot Token and put it at the bottom of this file (in bot.run())\n - Your Application --> Bot --> Reset Token\n 6. Run this script\n\n To scan evilcorp.com, you would type:\n\n /scan evilcorp.com\n\n Results will be output to the same channel.\n \"\"\"\n\n def __init__(self):\n self.current_scan = None\n\n @commands.command(name=\"scan\", description=\"Scan a target with BBOT.\")\n async def scan(self, ctx, target: str):\n if self.current_scan is not None:\n self.current_scan.stop()\n await ctx.send(f\"Starting scan against {target}.\")\n\n # creates scan instance\n self.current_scan = Scanner(target, flags=\"subdomain-enum\")\n discord_module = Discord(self.current_scan)\n\n seen = set()\n num_events = 0\n # start scan and iterate through results\n async for event in self.current_scan.async_start():\n if hash(event) in seen:\n continue\n seen.add(hash(event))\n await ctx.send(discord_module.format_message(event))\n num_events += 1\n\n await ctx.send(f\"Finished scan against {target}. {num_events:,} results.\")\n self.current_scan = None\n\n\nif __name__ == \"__main__\":\n intents = discord.Intents.default()\n intents.message_content = True\n bot = commands.Bot(command_prefix=\"/\", intents=intents)\n\n @bot.event\n async def on_ready():\n print(f\"We have logged in as {bot.user}\")\n await bot.add_cog(BBOTDiscordBot())\n\n bot.run(\"DISCORD_BOT_TOKEN_HERE\")\n
"},{"location":"dev/engine/","title":"Engine","text":""},{"location":"dev/engine/#bbot.core.engine.EngineBase","title":"EngineBase","text":"Base Engine class for Server and Client.
An Engine is a simple and lightweight RPC implementation that allows offloading async tasks to a separate process. It leverages ZeroMQ in a ROUTER-DEALER configuration.
BBOT makes use of this by spawning a dedicated engine for DNS and HTTP tasks. This offloads I/O and helps free up the main event loop for other tasks.
To use Engine, you must subclass both EngineClient and EngineServer.
See the respective EngineClient and EngineServer classes for usage examples.
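The request format that travels between client and server can be sketched with the standard library alone. This is a simplified illustration based on the key names visible in the source listings (`c` for the command id, `a` for positional arguments, `k` for keyword arguments); it leaves out ZeroMQ, error handling, and the special cancel and shutdown signals, and `read_message` is a hypothetical helper rather than a BBOT function.

```python
import pickle

# server-side mapping of command ids to method names
CMDS = {0: "my_function", 1: "my_generator"}

# client-side lookup: add name -> id entries alongside id -> name
lookup = dict(CMDS)
for cmd_id, name in list(lookup.items()):
    lookup[name] = cmd_id


def make_message(command, args=None, kwargs=None):
    """Serialize a request; empty args/kwargs are omitted from the payload."""
    message = {"c": lookup[command]}
    if args:
        message["a"] = args
    if kwargs:
        message["k"] = kwargs
    return pickle.dumps(message)


def read_message(binary):
    """Hypothetical server-side decode: returns (method name, args, kwargs)."""
    message = pickle.loads(binary)
    return CMDS[message["c"]], message.get("a", ()), message.get("k", {})


wire = make_message("my_function", kwargs={"arg1": 42})
print(read_message(wire))
```

Sending a small integer id instead of the method name keeps each request compact, while the reverse lookup lets the client keep using readable names.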
Source code in bbot/core/engine.py
class EngineBase:\n \"\"\"\n Base Engine class for Server and Client.\n\n An Engine is a simple and lightweight RPC implementation that allows offloading async tasks\n to a separate process. It leverages ZeroMQ in a ROUTER-DEALER configuration.\n\n BBOT makes use of this by spawning a dedicated engine for DNS and HTTP tasks.\n This offloads I/O and helps free up the main event loop for other tasks.\n\n To use Engine, you must subclass both EngineClient and EngineServer.\n\n See the respective EngineClient and EngineServer classes for usage examples.\n \"\"\"\n\n ERROR_CLASS = BBOTEngineError\n\n def __init__(self, debug=False):\n self._shutdown_status = False\n self.log = logging.getLogger(f\"bbot.core.{self.__class__.__name__.lower()}\")\n self._debug = debug\n\n def pickle(self, obj):\n try:\n return pickle.dumps(obj)\n except Exception as e:\n self.log.error(f\"Error serializing object: {obj}: {e}\")\n self.log.trace(traceback.format_exc())\n return error_sentinel\n\n def unpickle(self, binary):\n try:\n return pickle.loads(binary)\n except Exception as e:\n self.log.error(f\"Error deserializing binary: {e}\")\n self.log.trace(f\"Offending binary: {binary}\")\n self.log.trace(traceback.format_exc())\n return error_sentinel\n\n async def _infinite_retry(self, callback, *args, **kwargs):\n interval = kwargs.pop(\"_interval\", 15)\n context = kwargs.pop(\"_context\", \"\")\n # default overall timeout of 5 minutes (15 second interval * 20 iterations)\n max_retries = kwargs.pop(\"_max_retries\", 4 * 5)\n if not context:\n context = f\"{callback.__name__}({args}, {kwargs})\"\n retries = 0\n while not self._shutdown_status:\n try:\n return await asyncio.wait_for(callback(*args, **kwargs), timeout=interval)\n except (TimeoutError, asyncio.exceptions.TimeoutError):\n self.log.debug(f\"{self.name}: Timeout after {interval:,} seconds{context}, retrying...\")\n retries += 1\n if max_retries is not None and retries > max_retries:\n raise TimeoutError(f\"Timed out after 
{max_retries*interval:,} seconds {context}\")\n\n def debug(self, *args, **kwargs):\n if self._debug:\n self.log.debug(*args, **kwargs)\n
"},{"location":"dev/engine/#bbot.core.engine.EngineClient","title":"EngineClient","text":" Bases: EngineBase
The client portion of BBOT's RPC Engine.
To create an engine, you must create a subclass of this class and also define methods for each of your desired functions.
Note that this only supports async functions. If you need to offload a synchronous function to another CPU, use BBOT's multiprocessing pool instead.
Any CPU- or I/O-intensive logic should be implemented in the EngineServer.
These functions are typically stubs whose only job is to forward the arguments to the server.
Functions with the same names should be defined on the EngineServer.
The EngineClient must specify its associated server class via the SERVER_CLASS variable.
Depending on whether your function is a generator, you will use either run_and_return() or run_and_yield().
Examples:
>>> from bbot.core.engine import EngineClient\n>>>\n>>> class MyClient(EngineClient):\n>>> SERVER_CLASS = MyServer\n>>>\n>>> async def my_function(self, **kwargs):\n>>> return await self.run_and_return(\"my_function\", **kwargs)\n>>>\n>>> async def my_generator(self, **kwargs):\n>>> async for _ in self.run_and_yield(\"my_generator\", **kwargs):\n>>> yield _\n
Source code in bbot/core/engine.py
class EngineClient(EngineBase):\n \"\"\"\n The client portion of BBOT's RPC Engine.\n\n To create an engine, you must create a subclass of this class and also\n define methods for each of your desired functions.\n\n Note that this only supports async functions. If you need to offload a synchronous function to another CPU, use BBOT's multiprocessing pool instead.\n\n Any CPU or I/O intense logic should be implemented in the EngineServer.\n\n These functions are typically stubs whose only job is to forward the arguments to the server.\n\n Functions with the same names should be defined on the EngineServer.\n\n The EngineClient must specify its associated server class via the `SERVER_CLASS` variable.\n\n Depending on whether your function is a generator, you will use either `run_and_return()`, or `run_and_yield`.\n\n Examples:\n >>> from bbot.core.engine import EngineClient\n >>>\n >>> class MyClient(EngineClient):\n >>> SERVER_CLASS = MyServer\n >>>\n >>> async def my_function(self, **kwargs)\n >>> return await self.run_and_return(\"my_function\", **kwargs)\n >>>\n >>> async def my_generator(self, **kwargs):\n >>> async for _ in self.run_and_yield(\"my_generator\", **kwargs):\n >>> yield _\n \"\"\"\n\n SERVER_CLASS = None\n\n def __init__(self, debug=False, **kwargs):\n self.name = f\"EngineClient {self.__class__.__name__}\"\n super().__init__(debug=debug)\n self.process = None\n if self.SERVER_CLASS is None:\n raise ValueError(f\"Must set EngineClient SERVER_CLASS, {self.SERVER_CLASS}\")\n self.CMDS = dict(self.SERVER_CLASS.CMDS)\n for k, v in list(self.CMDS.items()):\n self.CMDS[v] = k\n self.socket_address = f\"zmq_{rand_string(8)}.sock\"\n self.socket_path = Path(tempfile.gettempdir()) / self.socket_address\n self.server_kwargs = kwargs.pop(\"server_kwargs\", {})\n self._server_process = None\n self.context = zmq.asyncio.Context()\n self.context.setsockopt(zmq.LINGER, 0)\n self.sockets = set()\n\n def check_error(self, message):\n if isinstance(message, dict) and 
len(message) == 1 and \"_e\" in message:\n error, trace = message[\"_e\"]\n error = self.ERROR_CLASS(error)\n error.engine_traceback = trace\n raise error\n return False\n\n async def run_and_return(self, command, *args, **kwargs):\n fn_str = f\"{command}({args}, {kwargs})\"\n self.debug(f\"{self.name}: executing run-and-return {fn_str}\")\n if self._shutdown_status and not command == \"_shutdown\":\n self.log.verbose(f\"{self.name} has been shut down and is not accepting new tasks\")\n return\n async with self.new_socket() as socket:\n try:\n message = self.make_message(command, args=args, kwargs=kwargs)\n if message is error_sentinel:\n return\n await socket.send(message)\n binary = await self._infinite_retry(socket.recv, _context=f\"waiting for return value from {fn_str}\")\n except BaseException:\n try:\n await self.send_cancel_message(socket, fn_str)\n except Exception:\n self.log.debug(f\"{self.name}: {fn_str} failed to send cancel message after exception\")\n self.log.trace(traceback.format_exc())\n raise\n # self.log.debug(f\"{self.name}.{command}({kwargs}) got binary: {binary}\")\n message = self.unpickle(binary)\n self.debug(f\"{self.name}: {fn_str} got return value: {message}\")\n # error handling\n if self.check_error(message):\n return\n return message\n\n async def run_and_yield(self, command, *args, **kwargs):\n fn_str = f\"{command}({args}, {kwargs})\"\n self.debug(f\"{self.name}: executing run-and-yield {fn_str}\")\n if self._shutdown_status:\n self.log.verbose(\"Engine has been shut down and is not accepting new tasks\")\n return\n message = self.make_message(command, args=args, kwargs=kwargs)\n if message is error_sentinel:\n return\n async with self.new_socket() as socket:\n # TODO: synchronize server-side generator by limiting qsize\n # socket.setsockopt(zmq.RCVHWM, 1)\n # socket.setsockopt(zmq.SNDHWM, 1)\n await socket.send(message)\n while 1:\n try:\n binary = await self._infinite_retry(\n socket.recv, _context=f\"waiting for new iteration 
from {fn_str}\"\n )\n # self.log.debug(f\"{self.name}.{command}({kwargs}) got binary: {binary}\")\n message = self.unpickle(binary)\n self.debug(f\"{self.name}: {fn_str} got iteration: {message}\")\n # error handling\n if self.check_error(message) or self.check_stop(message):\n break\n yield message\n except (StopAsyncIteration, GeneratorExit) as e:\n exc_name = e.__class__.__name__\n self.debug(f\"{self.name}.{command} got {exc_name}\")\n try:\n await self.send_cancel_message(socket, fn_str)\n except Exception:\n self.debug(f\"{self.name}.{command} failed to send cancel message after {exc_name}\")\n self.log.trace(traceback.format_exc())\n break\n\n async def send_cancel_message(self, socket, context):\n \"\"\"\n Send a cancel message and wait for confirmation from the server\n \"\"\"\n # -1 == special \"cancel\" signal\n message = pickle.dumps({\"c\": -1})\n await self._infinite_retry(socket.send, message)\n while 1:\n response = await self._infinite_retry(\n socket.recv, _context=f\"waiting for CANCEL_OK from {context}\", _max_retries=4\n )\n response = pickle.loads(response)\n if isinstance(response, dict):\n response = response.get(\"m\", \"\")\n if response == \"CANCEL_OK\":\n break\n\n async def send_shutdown_message(self):\n async with self.new_socket() as socket:\n # -99 == special shutdown message\n message = pickle.dumps({\"c\": -99})\n with suppress(TimeoutError, asyncio.exceptions.TimeoutError):\n await asyncio.wait_for(socket.send(message), 0.5)\n with suppress(TimeoutError, asyncio.exceptions.TimeoutError):\n while 1:\n response = await asyncio.wait_for(socket.recv(), 0.5)\n response = pickle.loads(response)\n if isinstance(response, dict):\n response = response.get(\"m\", \"\")\n if response == \"SHUTDOWN_OK\":\n break\n\n def check_stop(self, message):\n if isinstance(message, dict) and len(message) == 1 and \"_s\" in message:\n return True\n return False\n\n def make_message(self, command, args=None, kwargs=None):\n try:\n cmd_id = 
self.CMDS[command]\n except KeyError:\n raise KeyError(f'Command \"{command}\" not found. Available commands: {\",\".join(self.available_commands)}')\n message = {\"c\": cmd_id}\n if args:\n message[\"a\"] = args\n if kwargs:\n message[\"k\"] = kwargs\n return pickle.dumps(message)\n\n @property\n def available_commands(self):\n return [s for s in self.CMDS if isinstance(s, str)]\n\n def start_server(self):\n import multiprocessing\n\n process_name = multiprocessing.current_process().name\n if process_name == \"MainProcess\":\n kwargs = dict(self.server_kwargs)\n # if we're in tests, we use a single event loop to avoid weird race conditions\n # this allows us to more easily mock http, etc.\n if os.environ.get(\"BBOT_TESTING\", \"\") == \"True\":\n kwargs[\"_loop\"] = get_event_loop()\n kwargs[\"debug\"] = self._debug\n self.process = CORE.create_process(\n target=self.server_process,\n args=(\n self.SERVER_CLASS,\n self.socket_path,\n ),\n kwargs=kwargs,\n custom_name=f\"BBOT {self.__class__.__name__}\",\n )\n self.process.start()\n return self.process\n else:\n raise BBOTEngineError(\n f\"Tried to start server from process {process_name}. 
Did you forget \\\"if __name__ == '__main__'?\\\"\"\n )\n\n @staticmethod\n def server_process(server_class, socket_path, **kwargs):\n try:\n loop = kwargs.pop(\"_loop\", None)\n engine_server = server_class(socket_path, **kwargs)\n if loop is not None:\n future = asyncio.run_coroutine_threadsafe(engine_server.worker(), loop)\n future.result()\n else:\n asyncio.run(engine_server.worker())\n except (asyncio.CancelledError, KeyboardInterrupt, CancelledError):\n return\n except Exception:\n import traceback\n\n log = logging.getLogger(\"bbot.core.engine.server\")\n log.critical(f\"Unhandled error in {server_class.__name__} server process: {traceback.format_exc()}\")\n\n @asynccontextmanager\n async def new_socket(self):\n if self._server_process is None:\n self._server_process = self.start_server()\n while not self.socket_path.exists():\n self.debug(f\"{self.name}: waiting for server process to start...\")\n await asyncio.sleep(0.1)\n socket = self.context.socket(zmq.DEALER)\n socket.setsockopt(zmq.LINGER, 0)\n socket.connect(f\"ipc://{self.socket_path}\")\n self.sockets.add(socket)\n try:\n yield socket\n finally:\n self.sockets.remove(socket)\n with suppress(Exception):\n socket.close()\n\n async def shutdown(self):\n if not self._shutdown_status:\n self._shutdown_status = True\n self.log.verbose(f\"{self.name}: shutting down...\")\n # send shutdown signal\n await self.send_shutdown_message()\n # then terminate context\n try:\n self.context.destroy(linger=0)\n except Exception:\n print(traceback.format_exc(), file=sys.stderr)\n try:\n self.context.term()\n except Exception:\n print(traceback.format_exc(), file=sys.stderr)\n # delete socket file on exit\n self.socket_path.unlink(missing_ok=True)\n
"},{"location":"dev/engine/#bbot.core.engine.EngineClient.send_cancel_message","title":"send_cancel_message async
","text":"send_cancel_message(socket, context)\n
Send a cancel message and wait for confirmation from the server
Source code in bbot/core/engine.py
async def send_cancel_message(self, socket, context):\n \"\"\"\n Send a cancel message and wait for confirmation from the server\n \"\"\"\n # -1 == special \"cancel\" signal\n message = pickle.dumps({\"c\": -1})\n await self._infinite_retry(socket.send, message)\n while 1:\n response = await self._infinite_retry(\n socket.recv, _context=f\"waiting for CANCEL_OK from {context}\", _max_retries=4\n )\n response = pickle.loads(response)\n if isinstance(response, dict):\n response = response.get(\"m\", \"\")\n if response == \"CANCEL_OK\":\n break\n
"},{"location":"dev/engine/#bbot.core.engine.EngineServer","title":"EngineServer","text":" Bases: EngineBase
The server portion of BBOT's RPC Engine.
Methods defined here must match the methods in your EngineClient.
To use the functions, you must create mappings for them in the CMDS attribute, as shown below.
Examples:
>>> from bbot.core.engine import EngineServer\n>>>\n>>> class MyServer(EngineServer):\n>>> CMDS = {\n>>> 0: \"my_function\",\n>>> 1: \"my_generator\",\n>>> }\n>>>\n>>> async def my_function(self, arg1=None):\n>>> await asyncio.sleep(1)\n>>> return str(arg1)\n>>>\n>>> async def my_generator(self):\n>>> for i in range(10):\n>>> await asyncio.sleep(1)\n>>> yield i\n
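The way a server can pick between run-and-return and run-and-yield handling for a mapped command can be sketched as follows. This is a simplified, hypothetical illustration: `MyServer` here does not subclass the real EngineServer, and `dispatch` collects generator output into a list instead of streaming each item over a socket. The only library behaviour it relies on is `inspect.isasyncgenfunction`, which the source listing also uses.

```python
import asyncio
import inspect


class MyServer:
    """Hypothetical server; stands in for an EngineServer subclass."""

    CMDS = {0: "my_function", 1: "my_generator"}

    async def my_function(self, arg1=None):
        return str(arg1)

    async def my_generator(self):
        for i in range(3):
            yield i


async def dispatch(server, cmd_id, *args, **kwargs):
    """Look up the method by command id and run it according to its kind."""
    fn = getattr(server, server.CMDS[cmd_id])
    if inspect.isasyncgenfunction(fn):
        # async generator: consume every iteration
        return [item async for item in fn(*args, **kwargs)]
    # plain coroutine: await a single return value
    return await fn(*args, **kwargs)


server = MyServer()
single = asyncio.run(dispatch(server, 0, arg1=42))
streamed = asyncio.run(dispatch(server, 1))
print(single, streamed)
```

Both methods must be declared with `async def`: a plain `def` containing `await` is a syntax error, and a synchronous generator would not be recognised as an async generator function.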
Source code in bbot/core/engine.py
class EngineServer(EngineBase):\n \"\"\"\n The server portion of BBOT's RPC Engine.\n\n Methods defined here must match the methods in your EngineClient.\n\n To use the functions, you must create mappings for them in the CMDS attribute, as shown below.\n\n Examples:\n >>> from bbot.core.engine import EngineServer\n >>>\n >>> class MyServer(EngineServer):\n >>> CMDS = {\n >>> 0: \"my_function\",\n >>> 1: \"my_generator\",\n >>> }\n >>>\n >>> def my_function(self, arg1=None):\n >>> await asyncio.sleep(1)\n >>> return str(arg1)\n >>>\n >>> def my_generator(self):\n >>> for i in range(10):\n >>> await asyncio.sleep(1)\n >>> yield i\n \"\"\"\n\n CMDS = {}\n\n def __init__(self, socket_path, debug=False):\n self.name = f\"EngineServer {self.__class__.__name__}\"\n super().__init__(debug=debug)\n self.socket_path = socket_path\n self.client_id_var = contextvars.ContextVar(\"client_id\", default=None)\n # task <--> client id mapping\n self.tasks = {}\n # child tasks spawned by main tasks\n self.child_tasks = {}\n if self.socket_path is not None:\n # create ZeroMQ context\n self.context = zmq.asyncio.Context()\n self.context.setsockopt(zmq.LINGER, 0)\n # ROUTER socket can handle multiple concurrent requests\n self.socket = self.context.socket(zmq.ROUTER)\n self.socket.setsockopt(zmq.LINGER, 0)\n # create socket file\n self.socket.bind(f\"ipc://{self.socket_path}\")\n\n @contextlib.contextmanager\n def client_id_context(self, value):\n token = self.client_id_var.set(value)\n try:\n yield\n finally:\n self.client_id_var.reset(token)\n\n async def run_and_return(self, client_id, command_fn, *args, **kwargs):\n fn_str = f\"{command_fn.__name__}({args}, {kwargs})\"\n with self.client_id_context(client_id):\n try:\n self.debug(f\"{self.name}: run-and-return {fn_str}\")\n result = error_sentinel\n try:\n result = await command_fn(*args, **kwargs)\n except BaseException as e:\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n error = f\"Error in 
{self.name}.{fn_str}: {e}\"\n self.debug(error)\n trace = traceback.format_exc()\n self.debug(trace)\n result = {\"_e\": (error, trace)}\n finally:\n self.tasks.pop(client_id, None)\n if result is not error_sentinel:\n self.debug(f\"{self.name}: Sending response to {fn_str}: {result}\")\n await self.send_socket_multipart(client_id, result)\n except BaseException as e:\n self.log.critical(\n f\"Unhandled exception in {self.name}.run_and_return({client_id}, {command_fn}, {args}, {kwargs}): {e}\"\n )\n self.log.critical(traceback.format_exc())\n finally:\n self.debug(f\"{self.name} finished run-and-return {command_fn.__name__}({args}, {kwargs})\")\n\n async def run_and_yield(self, client_id, command_fn, *args, **kwargs):\n fn_str = f\"{command_fn.__name__}({args}, {kwargs})\"\n with self.client_id_context(client_id):\n try:\n self.debug(f\"{self.name}: run-and-yield {fn_str}\")\n try:\n async for _ in command_fn(*args, **kwargs):\n self.debug(f\"{self.name}: sending iteration for {command_fn.__name__}(): {_}\")\n await self.send_socket_multipart(client_id, _)\n except BaseException as e:\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n error = f\"Error in {self.name}.{fn_str}: {e}\"\n trace = traceback.format_exc()\n self.debug(error)\n self.debug(trace)\n result = {\"_e\": (error, trace)}\n await self.send_socket_multipart(client_id, result)\n finally:\n self.debug(f\"{self.name} reached end of run-and-yield iteration for {command_fn.__name__}()\")\n # _s == special signal that means StopIteration\n await self.send_socket_multipart(client_id, {\"_s\": None})\n self.tasks.pop(client_id, None)\n except BaseException as e:\n self.log.critical(\n f\"Unhandled exception in {self.name}.run_and_yield({client_id}, {command_fn}, {args}, {kwargs}): {e}\"\n )\n self.log.critical(traceback.format_exc())\n finally:\n self.debug(f\"{self.name} finished run-and-yield {command_fn.__name__}()\")\n\n async def send_socket_multipart(self, client_id, 
message):\n try:\n message = pickle.dumps(message)\n await self._infinite_retry(self.socket.send_multipart, [client_id, message])\n except Exception as e:\n self.log.verbose(f\"Error sending ZMQ message: {e}\")\n self.log.trace(traceback.format_exc())\n\n def check_error(self, message):\n if message is error_sentinel:\n return True\n\n async def worker(self):\n self.debug(f\"{self.name}: starting worker\")\n try:\n while 1:\n client_id, binary = await self.socket.recv_multipart()\n message = self.unpickle(binary)\n # self.log.debug(f\"{self.name} got message: {message}\")\n if self.check_error(message):\n continue\n\n cmd = message.get(\"c\", None)\n if not isinstance(cmd, int):\n self.log.warning(f\"{self.name}: no command sent in message: {message}\")\n continue\n\n # -1 == cancel task\n if cmd == -1:\n self.debug(f\"{self.name} got cancel signal\")\n await self.send_socket_multipart(client_id, {\"m\": \"CANCEL_OK\"})\n await self.cancel_task(client_id)\n continue\n\n # -99 == shutdown task\n if cmd == -99:\n self.debug(f\"{self.name} got shutdown signal\")\n await self.send_socket_multipart(client_id, {\"m\": \"SHUTDOWN_OK\"})\n await self._shutdown()\n return\n\n args = message.get(\"a\", ())\n if not isinstance(args, tuple):\n self.log.warning(f\"{self.name}: received invalid args of type {type(args)}, should be tuple\")\n continue\n kwargs = message.get(\"k\", {})\n if not isinstance(kwargs, dict):\n self.log.warning(f\"{self.name}: received invalid kwargs of type {type(kwargs)}, should be dict\")\n continue\n\n command_name = self.CMDS[cmd]\n command_fn = getattr(self, command_name, None)\n\n if command_fn is None:\n self.log.warning(f'{self.name} has no function named \"{command_fn}\"')\n continue\n\n if inspect.isasyncgenfunction(command_fn):\n # self.log.debug(f\"{self.name}: creating run-and-yield coroutine for {command_name}()\")\n coroutine = self.run_and_yield(client_id, command_fn, *args, **kwargs)\n else:\n # self.log.debug(f\"{self.name}: creating 
run-and-return coroutine for {command_name}()\")\n coroutine = self.run_and_return(client_id, command_fn, *args, **kwargs)\n\n # self.log.debug(f\"{self.name}: creating task for {command_name}() coroutine\")\n task = asyncio.create_task(coroutine)\n self.tasks[client_id] = task, command_fn, args, kwargs\n # self.log.debug(f\"{self.name}: finished creating task for {command_name}() coroutine\")\n except BaseException as e:\n await self._shutdown()\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.log.error(f\"{self.name}: error in EngineServer worker: {e}\")\n self.log.trace(traceback.format_exc())\n finally:\n self.debug(f\"{self.name}: finished worker()\")\n\n async def _shutdown(self):\n if not self._shutdown_status:\n self.log.verbose(f\"{self.name}: shutting down...\")\n self._shutdown_status = True\n await self.cancel_all_tasks()\n try:\n self.context.destroy(linger=0)\n except Exception:\n self.log.trace(traceback.format_exc())\n try:\n self.context.term()\n except Exception:\n self.log.trace(traceback.format_exc())\n self.log.verbose(f\"{self.name}: finished shutting down\")\n\n def new_child_task(self, client_id, coro):\n task = asyncio.create_task(coro)\n try:\n self.child_tasks[client_id].add(task)\n except KeyError:\n self.child_tasks[client_id] = {task}\n return task\n\n async def finished_tasks(self, client_id, timeout=None):\n child_tasks = self.child_tasks.get(client_id, set())\n try:\n done, pending = await asyncio.wait(child_tasks, return_when=asyncio.FIRST_COMPLETED, timeout=timeout)\n except BaseException as e:\n if isinstance(e, (TimeoutError, asyncio.exceptions.TimeoutError)):\n done = set()\n self.log.warning(f\"{self.name}: Timeout after {timeout:,} seconds in finished_tasks({child_tasks})\")\n for task in child_tasks:\n task.cancel()\n else:\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.log.error(f\"{self.name}: Unhandled exception in finished_tasks({child_tasks}): 
{e}\")\n self.log.trace(traceback.format_exc())\n raise\n self.child_tasks[client_id] = pending\n return done\n\n async def cancel_task(self, client_id):\n parent_task = self.tasks.pop(client_id, None)\n if parent_task is None:\n return\n parent_task, _cmd, _args, _kwargs = parent_task\n self.debug(f\"{self.name}: Cancelling client id {client_id} (task: {parent_task})\")\n parent_task.cancel()\n child_tasks = self.child_tasks.pop(client_id, set())\n if child_tasks:\n self.debug(f\"{self.name}: Cancelling {len(child_tasks):,} child tasks for client id {client_id}\")\n for child_task in child_tasks:\n child_task.cancel()\n\n for task in [parent_task] + list(child_tasks):\n await self._cancel_task(task)\n\n async def _cancel_task(self, task):\n try:\n await asyncio.wait_for(task, timeout=10)\n except (TimeoutError, asyncio.exceptions.TimeoutError):\n self.log.trace(f\"{self.name}: Timeout cancelling task: {task}\")\n return\n except (KeyboardInterrupt, asyncio.CancelledError):\n return\n except BaseException as e:\n self.log.error(f\"Unhandled error in {task.get_coro().__name__}(): {e}\")\n self.log.trace(traceback.format_exc())\n\n async def cancel_all_tasks(self):\n for client_id in list(self.tasks):\n await self.cancel_task(client_id)\n for client_id, tasks in self.child_tasks.items():\n for task in tasks:\n await self._cancel_task(task)\n
"},{"location":"dev/event/","title":"Event","text":"This is a developer reference. For a high-level description of BBOT events including a full list of event types, see Events
"},{"location":"dev/event/#bbot.core.event.base.make_event","title":"make_event","text":"make_event(data, event_type=None, parent=None, context=None, module=None, scan=None, scans=None, tags=None, confidence=100, dummy=False, internal=None)\n
Creates and returns a new event object or modifies an existing one.
This function serves as a factory for creating new event objects, either by generating a new Event object or by updating an existing event with additional metadata. If data is already an event, it updates the event based on the additional parameters provided.
Parameters:
data (Union[str, dict, BaseEvent]) \u2013 The primary data for the event or an existing event object.
event_type (str, default: None) \u2013 Type of the event, e.g., 'IP_ADDRESS'. Auto-detected if not provided.
parent (BaseEvent, default: None) \u2013 Parent event leading to this event's discovery.
context (str, default: None) \u2013 Description of circumstances leading to event's discovery.
module (str, default: None) \u2013 Module that discovered the event.
scan (Scan, default: None) \u2013 BBOT Scan object associated with the event.
scans (List[Scan], default: None) \u2013 Multiple BBOT Scan objects, primarily used for unserialization.
tags (Union[str, List[str]], default: None) \u2013 Descriptive tags for the event, as a list or a single string.
confidence (int, default: 100) \u2013 Confidence level for the event, on a scale of 1-100. Defaults to 100.
dummy (bool, default: False) \u2013 Disables data validations if set to True. Defaults to False.
internal (Any, default: None) \u2013 Makes the event internal if set to True. Defaults to None.
Returns:
BaseEvent \u2013 A new or updated event object.
Raises:
ValidationError \u2013 Raised when there's an error in event data or type sanitization.
Examples:
If inside a module, e.g. from within its handle_event():
>>> self.make_event(\"1.2.3.4\", parent=event)\nIP_ADDRESS(\"1.2.3.4\", module=portscan, tags={'ipv4', 'distance-1'})\n
If you're outside a module but you have a scan object:
>>> scan.make_event(\"1.2.3.4\", parent=scan.root_event)\nIP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4', 'distance-1'})\n
If you're outside a scan and just messing around:
>>> from bbot.core.event.base import make_event\n>>> make_event(\"1.2.3.4\", dummy=True)\nIP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4'})\n
Note: When working within a module's handle_event(), use the instance method self.make_event() instead of calling this function directly.
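The tags parameter accepts either a single string or a list. A minimal sketch of that normalization, assuming a hypothetical `normalize_tags` helper (in make_event the same steps are written inline):

```python
def normalize_tags(tags):
    """Return tags as a set, accepting None, a single string, or an iterable of strings."""
    if not tags:
        return set()
    if isinstance(tags, str):
        # wrap a lone string so it is not split into characters
        return {tags}
    return set(tags)


# when make_event receives an existing event, new tags are unioned with its current ones
existing_tags = {"ipv4"}
combined = normalize_tags("in-scope").union(existing_tags)
print(sorted(combined))
```

Converting to a set removes duplicate tags and makes the union with an existing event's tags straightforward.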
Source code in bbot/core/event/base.py
def make_event(\n data,\n event_type=None,\n parent=None,\n context=None,\n module=None,\n scan=None,\n scans=None,\n tags=None,\n confidence=100,\n dummy=False,\n internal=None,\n):\n \"\"\"\n Creates and returns a new event object or modifies an existing one.\n\n This function serves as a factory for creating new event objects, either by generating a new `Event`\n object or by updating an existing event with additional metadata. If `data` is already an event,\n it updates the event based on the additional parameters provided.\n\n Parameters:\n data (Union[str, dict, BaseEvent]): The primary data for the event or an existing event object.\n event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'. Auto-detected if not provided.\n parent (BaseEvent, optional): Parent event leading to this event's discovery.\n context (str, optional): Description of circumstances leading to event's discovery.\n module (str, optional): Module that discovered the event.\n scan (Scan, optional): BBOT Scan object associated with the event.\n scans (List[Scan], optional): Multiple BBOT Scan objects, primarily used for unserialization.\n tags (Union[str, List[str]], optional): Descriptive tags for the event, as a list or a single string.\n confidence (int, optional): Confidence level for the event, on a scale of 1-100. Defaults to 100.\n dummy (bool, optional): Disables data validations if set to True. Defaults to False.\n internal (Any, optional): Makes the event internal if set to True. Defaults to None.\n\n Returns:\n BaseEvent: A new or updated event object.\n\n Raises:\n ValidationError: Raised when there's an error in event data or type sanitization.\n\n Examples:\n If inside a module, e.g. 
from within its `handle_event()`:\n >>> self.make_event(\"1.2.3.4\", parent=event)\n IP_ADDRESS(\"1.2.3.4\", module=portscan, tags={'ipv4', 'distance-1'})\n\n If you're outside a module but you have a scan object:\n >>> scan.make_event(\"1.2.3.4\", parent=scan.root_event)\n IP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4', 'distance-1'})\n\n If you're outside a scan and just messing around:\n >>> from bbot.core.event.base import make_event\n >>> make_event(\"1.2.3.4\", dummy=True)\n IP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4'})\n\n Note:\n When working within a module's `handle_event()`, use the instance method\n `self.make_event()` instead of calling this function directly.\n \"\"\"\n\n # allow tags to be either a string or an array\n if not tags:\n tags = []\n elif isinstance(tags, str):\n tags = [tags]\n tags = set(tags)\n\n if is_event(data):\n data = copy(data)\n if scan is not None and not data.scan:\n data.scan = scan\n if scans is not None and not data.scans:\n data.scans = scans\n if module is not None:\n data.module = module\n if parent is not None:\n data.parent = parent\n if context is not None:\n data.discovery_context = context\n if internal == True:\n data.internal = True\n if tags:\n data.tags = tags.union(data.tags)\n event_type = data.type\n return data\n else:\n if event_type is None:\n event_type, data = get_event_type(data)\n if not dummy:\n log.debug(f'Autodetected event type \"{event_type}\" based on data: \"{data}\"')\n\n event_type = str(event_type).strip().upper()\n\n # Catch these common whoopsies\n if event_type in (\"DNS_NAME\", \"IP_ADDRESS\"):\n # DNS_NAME <--> EMAIL_ADDRESS confusion\n if validators.soft_validate(data, \"email\"):\n event_type = \"EMAIL_ADDRESS\"\n else:\n # DNS_NAME <--> IP_ADDRESS confusion\n try:\n data = validators.validate_host(data)\n except Exception as e:\n log.trace(traceback.format_exc())\n raise ValidationError(f'Error sanitizing event data \"{data}\" for type \"{event_type}\": {e}')\n data_is_ip = 
is_ip(data)\n if event_type == \"DNS_NAME\" and data_is_ip:\n event_type = \"IP_ADDRESS\"\n elif event_type == \"IP_ADDRESS\" and not data_is_ip:\n event_type = \"DNS_NAME\"\n # USERNAME <--> EMAIL_ADDRESS confusion\n if event_type == \"USERNAME\" and validators.soft_validate(data, \"email\"):\n event_type = \"EMAIL_ADDRESS\"\n tags.add(\"affiliate\")\n\n event_class = globals().get(event_type, DefaultEvent)\n\n return event_class(\n data,\n event_type=event_type,\n parent=parent,\n context=context,\n module=module,\n scan=scan,\n scans=scans,\n tags=tags,\n confidence=confidence,\n _dummy=dummy,\n _internal=internal,\n )\n
"},{"location":"dev/event/#bbot.core.event.base.event_from_json","title":"event_from_json","text":"event_from_json(j, siem_friendly=False)\n
Creates an event object from a JSON dictionary.
This function deserializes a JSON dictionary to create a new event object, using the make_event
function for the actual object creation. It sets additional attributes such as the timestamp and scope distance based on the input JSON.
Parameters:
j
(Dict
) \u2013 JSON dictionary containing the event attributes. Must include keys \"data\" and \"type\".
Returns:
BaseEvent
\u2013 A new event object initialized with attributes from the JSON dictionary.
Raises:
ValidationError
\u2013 Raised when the JSON dictionary is missing required fields.
The function assumes that the input JSON dictionary is valid and may raise exceptions if required keys are missing. Make sure to validate the JSON input beforehand.
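A minimal sketch of the kind of pre-validation the note recommends. It assumes, based on the source listing for this function, that `type`, `data`, `timestamp`, and `scope_distance` are the keys read without a default. `missing_event_keys` is a hypothetical helper for illustration, not part of BBOT, and it does not cover the nested `data` layout used when `siem_friendly=True`.

```python
# Hypothetical pre-check; not part of BBOT.
# Keys that event_from_json() appears to read without a fallback value.
REQUIRED_KEYS = ("type", "data", "timestamp", "scope_distance")


def missing_event_keys(j):
    """Return the required keys that are absent from the dictionary j."""
    return [k for k in REQUIRED_KEYS if k not in j]


record = {"type": "IP_ADDRESS", "data": "1.2.3.4", "scope_distance": 0}
print(missing_event_keys(record))  # ['timestamp']
```

An empty list means the dictionary has every key this sketch checks for; anything else can be reported before attempting deserialization.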
Source code in bbot/core/event/base.py
def event_from_json(j, siem_friendly=False):\n \"\"\"\n Creates an event object from a JSON dictionary.\n\n This function deserializes a JSON dictionary to create a new event object, using the `make_event` function\n for the actual object creation. It sets additional attributes such as the timestamp and scope distance\n based on the input JSON.\n\n Parameters:\n j (Dict): JSON dictionary containing the event attributes.\n Must include keys \"data\" and \"type\".\n\n Returns:\n BaseEvent: A new event object initialized with attributes from the JSON dictionary.\n\n Raises:\n ValidationError: Raised when the JSON dictionary is missing required fields.\n\n Note:\n The function assumes that the input JSON dictionary is valid and may raise exceptions\n if required keys are missing. Make sure to validate the JSON input beforehand.\n \"\"\"\n try:\n event_type = j[\"type\"]\n kwargs = {\n \"event_type\": event_type,\n \"scans\": j.get(\"scans\", []),\n \"tags\": j.get(\"tags\", []),\n \"confidence\": j.get(\"confidence\", 100),\n \"context\": j.get(\"discovery_context\", None),\n \"dummy\": True,\n }\n if siem_friendly:\n data = j[\"data\"][event_type]\n else:\n data = j[\"data\"]\n kwargs[\"data\"] = data\n event = make_event(**kwargs)\n\n resolved_hosts = j.get(\"resolved_hosts\", [])\n event._resolved_hosts = set(resolved_hosts)\n\n event.timestamp = datetime.datetime.fromisoformat(j[\"timestamp\"])\n event.scope_distance = j[\"scope_distance\"]\n parent_id = j.get(\"parent\", None)\n if parent_id is not None:\n event._parent_id = parent_id\n return event\n except KeyError as e:\n raise ValidationError(f\"Event missing required field: {e}\")\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent","title":"BaseEvent","text":"Represents a piece of data discovered during a BBOT scan.
An Event contains various attributes that provide metadata about the discovered data. The attributes assist in understanding the context of the Event and facilitate further filtering and querying. Events are integral in the construction of visual graphs and are the cornerstone of data exchange between BBOT modules.
You can inherit from this class when creating a new event type. However, it's not always necessary. You only need to subclass if you want to layer additional functionality on top of the base class.
Attributes:
type
(str
) \u2013 Specifies the type of the event, e.g., IP_ADDRESS
, DNS_NAME
.
id
(str
) \u2013 A unique identifier for the event.
data
(str or dict
) \u2013 The main data for the event, e.g., a URL or IP address.
data_graph
(str
) \u2013 Representation of self.data
for Neo4j graph nodes.
data_human
(str
) \u2013 Representation of self.data
for human output.
data_id
(str
) \u2013 Representation of self.data
used to calculate the event's ID (and ultimately its hash, which is used for deduplication).
data_json
(str
) \u2013 Representation of self.data
to be used in JSON serialization.
host
(str, IPvXAddress, or IPvXNetwork
) \u2013 The associated IP address or hostname for the event.
host_stem
(str
) \u2013 An abbreviated representation of hostname that removes the TLD, e.g. \"www.evilcorp\". Used by the word cloud.
port
(int or None
) \u2013 The port associated with the event, if applicable, else None.
words
(set
) \u2013 A set of relevant keywords extracted from the event. Used by the word cloud.
scope_distance
(int
) \u2013 Indicates how many hops the event is from the main scope; 0 means in-scope.
web_spider_distance
(int
) \u2013 The spider distance from the web root, specific to web crawling.
scan
(Scanner
) \u2013 The scan object that generated the event.
timestamp
(datetime
) \u2013 The time at which the data was discovered.
resolved_hosts
(list of str
) \u2013 List of hosts to which the event data resolves, applicable for URLs and DNS names.
parent
(BaseEvent
) \u2013 The parent event that led to the discovery of this event.
parent_id
(str
) \u2013 The id
attribute of the parent event.
tags
(set of str
) \u2013 Descriptive tags for the event, e.g., mx-record
, in-scope
.
module
(BaseModule
) \u2013 The module that discovered the event.
module_sequence
(str
) \u2013 The sequence of modules that participated in the discovery.
Examples:
{\n \"type\": \"URL\",\n \"id\": \"URL:017ec8e5dc158c0fd46f07169f8577fb4b45e89a\",\n \"data\": \"http://www.blacklanternsecurity.com/\",\n \"web_spider_distance\": 0,\n \"scope_distance\": 0,\n \"scan\": \"SCAN:4d786912dbc97be199da13074699c318e2067a7f\",\n \"timestamp\": 1688526222.723366,\n \"resolved_hosts\": [\"185.199.108.153\"],\n \"parent\": \"OPEN_TCP_PORT:cf7e6a937b161217eaed99f0c566eae045d094c7\",\n \"tags\": [\"in-scope\", \"distance-0\", \"dir\", \"ip-185-199-108-153\", \"status-301\", \"http-title-301-moved-permanently\"],\n \"module\": \"httpx\",\n \"module_sequence\": \"httpx\"\n}\n
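The `id` in the example has the form `TYPE:<40 hex characters>`. According to the docstring of the `id` property, it combines the event type with a SHA1 of the event's data. The following is a rough sketch of that shape using `hashlib`, assuming the data is hashed as UTF-8 text. `sketch_event_id` is a hypothetical name, and BBOT's own hashing helper and `data_id` may treat some event types differently, so the digest is not guaranteed to match the one shown above.

```python
import hashlib


def sketch_event_id(event_type, data):
    """Build an ID of the form TYPE:<sha1 hex digest of the data>."""
    # Assumption: the data is a string and is hashed as UTF-8 bytes.
    digest = hashlib.sha1(data.encode("utf-8")).hexdigest()
    return f"{event_type}:{digest}"


event_id = sketch_event_id("URL", "http://www.blacklanternsecurity.com/")
print(event_id)
```

Because the ID depends only on the type and the data, two events with the same type and data get the same ID, which is what makes deduplication by hash possible.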
Source code in bbot/core/event/base.py
class BaseEvent:\n \"\"\"\n Represents a piece of data discovered during a BBOT scan.\n\n An Event contains various attributes that provide metadata about the discovered data.\n The attributes assist in understanding the context of the Event and facilitate further\n filtering and querying. Events are integral in the construction of visual graphs and\n are the cornerstone of data exchange between BBOT modules.\n\n You can inherit from this class when creating a new event type. However, it's not always\n necessary. You only need to subclass if you want to layer additional functionality on\n top of the base class.\n\n Attributes:\n type (str): Specifies the type of the event, e.g., `IP_ADDRESS`, `DNS_NAME`.\n id (str): A unique identifier for the event.\n data (str or dict): The main data for the event, e.g., a URL or IP address.\n data_graph (str): Representation of `self.data` for Neo4j graph nodes.\n data_human (str): Representation of `self.data` for human output.\n data_id (str): Representation of `self.data` used to calculate the event's ID (and ultimately its hash, which is used for deduplication)\n data_json (str): Representation of `self.data` to be used in JSON serialization.\n host (str, IPvXAddress, or IPvXNetwork): The associated IP address or hostname for the event\n host_stem (str): An abbreviated representation of hostname that removes the TLD, e.g. \"www.evilcorp\". Used by the word cloud.\n port (int or None): The port associated with the event, if applicable, else None.\n words (set): A list of relevant keywords extracted from the event. 
Used by the word cloud.\n scope_distance (int): Indicates how many hops the event is from the main scope; 0 means in-scope.\n web_spider_distance (int): The spider distance from the web root, specific to web crawling.\n scan (Scanner): The scan object that generated the event.\n timestamp (datetime.datetime): The time at which the data was discovered.\n resolved_hosts (list of str): List of hosts to which the event data resolves, applicable for URLs and DNS names.\n parent (BaseEvent): The parent event that led to the discovery of this event.\n parent_id (str): The `id` attribute of the parent event.\n tags (set of str): Descriptive tags for the event, e.g., `mx-record`, `in-scope`.\n module (BaseModule): The module that discovered the event.\n module_sequence (str): The sequence of modules that participated in the discovery.\n\n Examples:\n ```json\n {\n \"type\": \"URL\",\n \"id\": \"URL:017ec8e5dc158c0fd46f07169f8577fb4b45e89a\",\n \"data\": \"http://www.blacklanternsecurity.com/\",\n \"web_spider_distance\": 0,\n \"scope_distance\": 0,\n \"scan\": \"SCAN:4d786912dbc97be199da13074699c318e2067a7f\",\n \"timestamp\": 1688526222.723366,\n \"resolved_hosts\": [\"185.199.108.153\"],\n \"parent\": \"OPEN_TCP_PORT:cf7e6a937b161217eaed99f0c566eae045d094c7\",\n \"tags\": [\"in-scope\", \"distance-0\", \"dir\", \"ip-185-199-108-153\", \"status-301\", \"http-title-301-moved-permanently\"],\n \"module\": \"httpx\",\n \"module_sequence\": \"httpx\"\n }\n ```\n \"\"\"\n\n # Always emit this event type even if it's not in scope\n _always_emit = False\n # Always emit events with these tags even if they're not in scope\n _always_emit_tags = [\"affiliate\", \"target\"]\n # Bypass scope checking and dns resolution, distribute immediately to modules\n # This is useful for \"end-of-line\" events like FINDING and VULNERABILITY\n _quick_emit = False\n # Whether this event has been retroactively marked as part of an important discovery chain\n _graph_important = False\n # Disables 
certain data validations\n _dummy = False\n # Data validation, if data is a dictionary\n _data_validator = None\n # Whether to increment scope distance if the child and parent hosts are the same\n _scope_distance_increment_same_host = False\n # Don't allow duplicates to occur within a parent chain\n # In other words, don't emit the event if the same one already exists in its discovery context\n _suppress_chain_dupes = False\n\n def __init__(\n self,\n data,\n event_type,\n parent=None,\n context=None,\n module=None,\n scan=None,\n scans=None,\n tags=None,\n confidence=100,\n timestamp=None,\n _dummy=False,\n _internal=None,\n ):\n \"\"\"\n Initializes an Event object with the given parameters.\n\n In most cases, you should use `make_event()` instead of instantiating this class directly.\n `make_event()` is much friendlier, and can auto-detect the event type for you.\n\n Attributes:\n data (str, dict): The primary data for the event.\n event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'.\n parent (BaseEvent, optional): Parent event that led to this event's discovery. Defaults to None.\n module (str, optional): Module that discovered the event. Defaults to None.\n scan (Scan, optional): BBOT Scan object. Required unless _dummy is True. Defaults to None.\n scans (list of Scan, optional): BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.\n tags (list of str, optional): Descriptive tags for the event. Defaults to None.\n confidence (int, optional): Confidence level for the event, on a scale of 1-100. Defaults to 100.\n timestamp (datetime, optional): Time of event discovery. Defaults to current UTC time.\n _dummy (bool, optional): If True, disables certain data validations. Defaults to False.\n _internal (Any, optional): If specified, makes the event internal. 
Defaults to None.\n\n Raises:\n ValidationError: If either `scan` or `parent` are not specified and `_dummy` is False.\n \"\"\"\n\n self._id = None\n self._hash = None\n self._data = None\n self.__host = None\n self._tags = set()\n self._port = None\n self._omit = False\n self.__words = None\n self._parent = None\n self._priority = None\n self._parent_id = None\n self._host_original = None\n self._scope_distance = None\n self._module_priority = None\n self._resolved_hosts = set()\n self.dns_children = dict()\n self._discovery_context = \"\"\n self._discovery_context_regex = re.compile(r\"\\{(?:event|module)[^}]*\\}\")\n self.web_spider_distance = 0\n\n # for creating one-off events without enforcing parent requirement\n self._dummy = _dummy\n self.module = module\n self._type = event_type\n\n # keep track of whether this event has been recorded by the scan\n self._stats_recorded = False\n\n if timestamp is not None:\n self.timestamp = timestamp\n else:\n try:\n self.timestamp = datetime.datetime.now(datetime.UTC)\n except AttributeError:\n self.timestamp = datetime.datetime.utcnow()\n\n self.confidence = int(confidence)\n self._internal = False\n\n # self.scan holds the instantiated scan object (for helpers, etc.)\n self.scan = scan\n if (not self.scan) and (not self._dummy):\n raise ValidationError(f\"Must specify scan\")\n # self.scans holds a list of scan IDs from scans that encountered this event\n self.scans = []\n if scans is not None:\n self.scans = scans\n if self.scan:\n self.scans = list(set([self.scan.id] + self.scans))\n\n try:\n self.data = self._sanitize_data(data)\n except Exception as e:\n log.trace(traceback.format_exc())\n raise ValidationError(f'Error sanitizing event data \"{data}\" for type \"{self.type}\": {e}')\n\n if not self.data:\n raise ValidationError(f'Invalid event data \"{data}\" for type \"{self.type}\"')\n\n self.parent = parent\n if (not self.parent) and (not self._dummy):\n raise ValidationError(f\"Must specify event parent\")\n\n 
if tags is not None:\n for tag in tags:\n self.add_tag(tag)\n\n # internal events are not ingested by output modules\n if not self._dummy:\n # removed this second part because it was making certain sslcert events internal\n if _internal: # or parent._internal:\n self.internal = True\n\n if not context:\n context = getattr(self.module, \"default_discovery_context\", \"\")\n if context:\n self.discovery_context = context\n\n @property\n def data(self):\n return self._data\n\n @property\n def confidence(self):\n return self._confidence\n\n @confidence.setter\n def confidence(self, confidence):\n self._confidence = min(100, max(1, int(confidence)))\n\n @property\n def cumulative_confidence(self):\n \"\"\"\n Considers the confidence of parent events. This is useful for filtering out speculative/unreliable events.\n\n E.g. an event with a confidence of 50 whose parent is also 50 would have a cumulative confidence of 25.\n\n A confidence of 100 will reset the cumulative confidence to 100.\n \"\"\"\n if self._confidence == 100 or self.parent is None or self.parent is self:\n return self._confidence\n return int(self._confidence * self.parent.cumulative_confidence / 100)\n\n @property\n def resolved_hosts(self):\n if is_ip(self.host):\n return {\n self.host,\n }\n return self._resolved_hosts\n\n @data.setter\n def data(self, data):\n self._hash = None\n self._data_hash = None\n self._id = None\n self.__host = None\n self._port = None\n self._data = data\n\n @property\n def internal(self):\n return self._internal\n\n @internal.setter\n def internal(self, value):\n \"\"\"\n Marks the event as internal, excluding it from output but allowing normal exchange between scan modules.\n\n Internal events are typically speculative and may not be interesting by themselves but can lead to\n the discovery of interesting events. 
This method sets the `_internal` attribute to True and adds the\n \"internal\" tag.\n\n Examples of internal events include `OPEN_TCP_PORT`s from the `speculate` module,\n `IP_ADDRESS`es from the `ipneighbor` module, or out-of-scope `DNS_NAME`s that originate\n from DNS resolutions.\n\n The purpose of internal events is to enable speculative/explorative discovery without cluttering\n the console with irrelevant or uninteresting events.\n \"\"\"\n if not value in (True, False):\n raise ValueError(f'\"internal\" must be boolean, not {type(value)}')\n if value == True:\n self.add_tag(\"internal\")\n else:\n self.remove_tag(\"internal\")\n self._internal = value\n\n @property\n def host(self):\n \"\"\"\n An abbreviated representation of the data that allows comparison with other events.\n For host types, this is a hostname.\n This allows comparison of an email or a URL with a domain, and vice versa\n bob@evilcorp.com --> evilcorp.com\n https://evilcorp.com --> evilcorp.com\n evilcorp.com:80 --> evilcorp.com\n\n For IP_* types, this is an instantiated object representing the event's data\n E.g. 
for IP_ADDRESS, it could be an ipaddress.IPv4Address() or IPv6Address() object\n \"\"\"\n if self.__host is None:\n self.host = self._host()\n return self.__host\n\n @host.setter\n def host(self, host):\n if self._host_original is None:\n self._host_original = host\n self.__host = host\n\n @property\n def host_original(self):\n \"\"\"\n Original host data, in case it was changed due to a wildcard DNS, etc.\n \"\"\"\n if self._host_original is None:\n return self.host\n return self._host_original\n\n @property\n def closest_host(self):\n \"\"\"\n Walk up the chain of parents events until we hit the first one with a host\n \"\"\"\n if self.host is not None or self.parent is None or self.parent is self:\n return self.host\n return self.parent.closest_host\n\n @property\n def port(self):\n self.host\n if getattr(self, \"parsed_url\", None):\n if self.parsed_url.port is not None:\n return self.parsed_url.port\n elif self.parsed_url.scheme == \"https\":\n return 443\n elif self.parsed_url.scheme == \"http\":\n return 80\n return self._port\n\n @property\n def host_stem(self):\n \"\"\"\n An abbreviated representation of hostname that removes the TLD\n E.g. 
www.evilcorp.com --> www.evilcorp\n \"\"\"\n if self.host and type(self.host) == str:\n return domain_stem(self.host)\n else:\n return f\"{self.host}\"\n\n @property\n def discovery_context(self):\n return self._discovery_context\n\n @discovery_context.setter\n def discovery_context(self, context):\n def replace(match):\n s = match.group()\n return s.format(module=self.module, event=self)\n\n try:\n self._discovery_context = self._discovery_context_regex.sub(replace, context)\n except Exception as e:\n log.trace(f\"Error formatting discovery context for {self}: {e} (context: '{context}')\")\n self._discovery_context = context\n\n @property\n def discovery_path(self):\n \"\"\"\n This event's full discovery context, including those of all its parents\n \"\"\"\n parent_path = []\n if self.parent is not None and self.parent is not self:\n parent_path = self.parent.discovery_path\n return parent_path + [[self.id, self.discovery_context]]\n\n @property\n def words(self):\n if self.__words is None:\n self.__words = set(self._words())\n return self.__words\n\n def _words(self):\n return set()\n\n @property\n def tags(self):\n return self._tags\n\n @tags.setter\n def tags(self, tags):\n self._tags = set()\n if isinstance(tags, str):\n tags = (tags,)\n for tag in tags:\n self.add_tag(tag)\n\n def add_tag(self, tag):\n self._tags.add(tagify(tag))\n\n def remove_tag(self, tag):\n with suppress(KeyError):\n self._tags.remove(tagify(tag))\n\n @property\n def always_emit(self):\n \"\"\"\n If this returns True, the event will always be distributed to output modules regardless of scope distance\n \"\"\"\n always_emit_tags = any(t in self.tags for t in self._always_emit_tags)\n no_host_information = not bool(self.host)\n return self._always_emit or always_emit_tags or no_host_information\n\n @property\n def quick_emit(self):\n no_host_information = not bool(self.host)\n return self._quick_emit or no_host_information\n\n @property\n def id(self):\n \"\"\"\n A uniquely identifiable 
hash of the event from the event type + a SHA1 of its data\n \"\"\"\n if self._id is None:\n self._id = f\"{self.type}:{self.data_hash.hex()}\"\n return self._id\n\n @property\n def data_hash(self):\n \"\"\"\n A raw byte hash of the event's data\n \"\"\"\n if self._data_hash is None:\n self._data_hash = sha1(self.data_id).digest()\n return self._data_hash\n\n @property\n def scope_distance(self):\n return self._scope_distance\n\n @scope_distance.setter\n def scope_distance(self, scope_distance):\n \"\"\"\n Setter for the scope_distance attribute, ensuring it only decreases.\n\n The scope_distance attribute is designed to never increase; it can only be set to smaller values than\n the current one. If a larger value is provided, it is ignored. The setter also updates the event's\n tags to reflect the new scope distance.\n\n Parameters:\n scope_distance (int): The new scope distance to set, must be a non-negative integer.\n\n Note:\n The method will automatically update the relevant 'distance-' tags associated with the event.\n \"\"\"\n if scope_distance < 0:\n raise ValueError(f\"Invalid scope distance: {scope_distance}\")\n # ensure scope distance does not increase (only allow setting to smaller values)\n if self.scope_distance is None:\n new_scope_distance = scope_distance\n else:\n new_scope_distance = min(self.scope_distance, scope_distance)\n if self._scope_distance != new_scope_distance:\n # remove old scope distance tags\n for t in list(self.tags):\n if t.startswith(\"distance-\"):\n self.remove_tag(t)\n if scope_distance == 0:\n self.add_tag(\"in-scope\")\n self.remove_tag(\"affiliate\")\n else:\n self.remove_tag(\"in-scope\")\n self.add_tag(f\"distance-{new_scope_distance}\")\n self._scope_distance = new_scope_distance\n # apply recursively to parent events\n parent_scope_distance = getattr(self.parent, \"scope_distance\", None)\n if parent_scope_distance is not None and self.parent is not self:\n self.parent.scope_distance = scope_distance + 1\n\n 
@property\n def scope_description(self):\n \"\"\"\n Returns a single word describing the scope of the event.\n\n \"in-scope\" if the event is in scope, \"affiliate\" if it's an affiliate, otherwise \"distance-{scope_distance}\"\n \"\"\"\n if self.scope_distance == 0:\n return \"in-scope\"\n elif \"affiliate\" in self.tags:\n return \"affiliate\"\n return f\"distance-{self.scope_distance}\"\n\n @property\n def parent(self):\n return self._parent\n\n @parent.setter\n def parent(self, parent):\n \"\"\"\n Setter for the parent attribute, ensuring it's a valid event and updating scope distance.\n\n Sets the parent of the event and automatically adjusts the scope distance based on the parent event's\n scope distance. The scope distance is incremented by 1 if the host of the parent event is different\n from the current event's host.\n\n Parameters:\n parent (BaseEvent): The new parent event to set. Must be a valid event object.\n\n Note:\n If an invalid parent is provided and the event is not a dummy, a warning will be logged.\n \"\"\"\n if is_event(parent):\n self._parent = parent\n hosts_are_same = (self.host and parent.host) and (self.host == parent.host)\n new_scope_distance = int(parent.scope_distance)\n if self.host and parent.scope_distance is not None:\n # only increment the scope distance if the host changes\n if self._scope_distance_increment_same_host or not hosts_are_same:\n new_scope_distance += 1\n self.scope_distance = new_scope_distance\n # inherit certain tags\n if hosts_are_same:\n # inherit web spider distance from parent\n self.web_spider_distance = getattr(parent, \"web_spider_distance\", 0)\n event_has_url = getattr(self, \"parsed_url\", None) is not None\n for t in parent.tags:\n if t in (\"affiliate\",):\n self.add_tag(t)\n elif t.startswith(\"mutation-\"):\n self.add_tag(t)\n # only add these tags if the event has a URL\n if event_has_url:\n if t in (\"spider-danger\", \"spider-max\"):\n self.add_tag(t)\n elif not self._dummy:\n 
log.warning(f\"Tried to set invalid parent on {self}: (got: {parent})\")\n\n @property\n def parent_id(self):\n parent_id = getattr(self.get_parent(), \"id\", None)\n if parent_id is not None:\n return parent_id\n return self._parent_id\n\n @property\n def validators(self):\n \"\"\"\n Depending on whether the scan attribute is accessible, return either a config-aware or non-config-aware validator\n\n This exists to prevent a chicken-and-egg scenario during the creation of certain events such as URLs,\n whose sanitization behavior is different depending on the config.\n\n However, thanks to this property, validation can still work in the absence of a config.\n \"\"\"\n if self.scan is not None:\n return self.scan.helpers.config_aware_validators\n return validators\n\n def get_parent(self):\n \"\"\"\n Takes into account events with the _omit flag\n \"\"\"\n if getattr(self.parent, \"_omit\", False):\n return self.parent.get_parent()\n return self.parent\n\n def get_parents(self, omit=False, include_self=False):\n parents = []\n e = self\n if include_self:\n parents.append(self)\n while 1:\n if omit:\n parent = e.get_parent()\n else:\n parent = e.parent\n if parent is None:\n break\n if e == parent:\n break\n parents.append(parent)\n e = parent\n return parents\n\n def _host(self):\n return None\n\n def _sanitize_data(self, data):\n \"\"\"\n Validates and sanitizes the event's data during instantiation.\n\n By default, uses the '_data_load' method to pre-process the data and then applies the '_data_validator'\n to validate and create a sanitized dictionary. 
Raises a ValidationError if any of the validations fail.\n Subclasses can override this method to provide custom validation logic.\n\n Returns:\n Any: The sanitized data.\n\n Raises:\n ValidationError: If the data fails to validate.\n \"\"\"\n data = self._data_load(data)\n if self._data_validator is not None:\n if not isinstance(data, dict):\n raise ValidationError(f\"data is not of type dict: {data}\")\n data = self._data_validator(**data).model_dump(exclude_none=True)\n return self.sanitize_data(data)\n\n def sanitize_data(self, data):\n return data\n\n @property\n def data_human(self):\n \"\"\"\n Human representation of event.data\n \"\"\"\n return self._data_human()\n\n def _data_human(self):\n if isinstance(self.data, (dict, list)):\n with suppress(Exception):\n return json.dumps(self.data, sort_keys=True)\n return smart_decode(self.data)\n\n def _data_load(self, data):\n \"\"\"\n How to load the event data (JSON-decode it, etc.)\n \"\"\"\n return data\n\n @property\n def data_id(self):\n \"\"\"\n Representation of the event.data used to calculate the event's ID\n \"\"\"\n return self._data_id()\n\n def _data_id(self):\n return self.data\n\n @property\n def pretty_string(self):\n \"\"\"\n A human-friendly representation of the event's data. 
Used for graph representation.\n\n If the event's data is a dictionary, the function will try to return a JSON-formatted string.\n Otherwise, it will use smart_decode to convert the data into a string representation.\n\n Override if necessary.\n\n Returns:\n str: The graphical representation of the event's data.\n \"\"\"\n return self._pretty_string()\n\n def _pretty_string(self):\n return self._data_human()\n\n @property\n def data_graph(self):\n \"\"\"\n Representation of event.data for neo4j graph nodes\n \"\"\"\n return self.pretty_string\n\n @property\n def data_json(self):\n \"\"\"\n JSON representation of event.data\n \"\"\"\n return self.data\n\n def __contains__(self, other):\n \"\"\"\n Allows events to be compared using the \"in\" operator:\n E.g.:\n if some_event in other_event:\n ...\n \"\"\"\n try:\n other = make_event(other, dummy=True)\n except ValidationError:\n return False\n # if hashes match\n if other == self:\n return True\n # if hosts match\n if self.host and other.host:\n if self.host == other.host:\n return True\n # hostnames and IPs\n radixtarget = RadixTarget()\n radixtarget.insert(self.host)\n return bool(radixtarget.search(other.host))\n return False\n\n def json(self, mode=\"json\", siem_friendly=False):\n \"\"\"\n Serializes the event object to a JSON-compatible dictionary.\n\n By default, it includes attributes such as 'type', 'id', 'data', 'scope_distance', and others that are present.\n Additional specific attributes can be serialized based on the mode specified.\n\n Parameters:\n mode (str): Specifies the data serialization mode. Default is \"json\". Other options include \"graph\", \"human\", and \"id\".\n siem_friendly (bool): Whether to format the JSON in a way that's friendly to SIEM ingestion by Elastic, Splunk, etc. 
This ensures the value of \"data\" is always the same type (a dictionary).\n\n Returns:\n dict: JSON-serializable dictionary representation of the event object.\n \"\"\"\n # type, ID, scope description\n j = dict()\n for i in (\"type\", \"id\", \"scope_description\"):\n v = getattr(self, i, \"\")\n if v:\n j.update({i: v})\n # event data\n data_attr = getattr(self, f\"data_{mode}\", None)\n if data_attr is not None:\n data = data_attr\n else:\n data = smart_decode(self.data)\n if siem_friendly:\n j[\"data\"] = {self.type: data}\n else:\n j[\"data\"] = data\n # host, dns children\n if self.host:\n j[\"host\"] = str(self.host)\n j[\"resolved_hosts\"] = sorted(str(h) for h in self.resolved_hosts)\n j[\"dns_children\"] = {k: list(v) for k, v in self.dns_children.items()}\n # web spider distance\n web_spider_distance = getattr(self, \"web_spider_distance\", None)\n if web_spider_distance is not None:\n j[\"web_spider_distance\"] = web_spider_distance\n # scope distance\n j[\"scope_distance\"] = self.scope_distance\n # scan\n if self.scan:\n j[\"scan\"] = self.scan.id\n # timestamp\n j[\"timestamp\"] = self.timestamp.isoformat()\n # parent event\n parent_id = self.parent_id\n if parent_id:\n j[\"parent\"] = parent_id\n # tags\n if self.tags:\n j.update({\"tags\": list(self.tags)})\n # parent module\n if self.module:\n j.update({\"module\": str(self.module)})\n # sequence of modules that led to discovery\n if self.module_sequence:\n j.update({\"module_sequence\": str(self.module_sequence)})\n # discovery context\n j[\"discovery_context\"] = self.discovery_context\n j[\"discovery_path\"] = self.discovery_path\n\n # normalize non-primitive python objects\n for k, v in list(j.items()):\n if k == \"data\":\n continue\n if type(v) not in (str, int, float, bool, list, dict, type(None)):\n try:\n j[k] = json.dumps(v, sort_keys=True)\n except Exception:\n j[k] = smart_decode(v)\n return j\n\n @staticmethod\n def from_json(j):\n \"\"\"\n Convenience shortcut to create an Event 
object from a JSON-compatible dictionary.\n\n Calls the `event_from_json()` function to deserialize the event.\n\n Parameters:\n j (dict): The JSON-compatible dictionary containing event data.\n\n Returns:\n Event: The deserialized Event object.\n \"\"\"\n return event_from_json(j)\n\n @property\n def module_sequence(self):\n \"\"\"\n Get a human-friendly string that represents the sequence of modules responsible for generating this event.\n\n Includes the names of omitted parent events to provide a complete view of the module sequence leading to this event.\n\n Returns:\n str: The module sequence in human-friendly format.\n \"\"\"\n module_name = getattr(self.module, \"name\", \"\")\n if getattr(self.parent, \"_omit\", False):\n module_name = f\"{self.parent.module_sequence}->{module_name}\"\n return module_name\n\n @property\n def module_priority(self):\n if self._module_priority is None:\n module = getattr(self, \"module\", None)\n self._module_priority = int(max(1, min(5, getattr(module, \"priority\", 3))))\n return self._module_priority\n\n @module_priority.setter\n def module_priority(self, priority):\n self._module_priority = int(max(1, min(5, priority)))\n\n @property\n def priority(self):\n if self._priority is None:\n timestamp = self.timestamp.timestamp()\n if self.parent.timestamp == self.timestamp:\n self._priority = (timestamp,)\n else:\n self._priority = getattr(self.parent, \"priority\", ()) + (timestamp,)\n\n return self._priority\n\n @property\n def type(self):\n return self._type\n\n @type.setter\n def type(self, val):\n self._type = val\n self._hash = None\n self._id = None\n\n @property\n def _host_size(self):\n \"\"\"\n Used for sorting events by their host size, so that parent ones (e.g. 
IP subnets) come first\n \"\"\"\n if self.host:\n if isinstance(self.host, str):\n # smaller domains should come first\n return len(self.host)\n else:\n try:\n # bigger IP subnets should come first\n return -self.host.num_addresses\n except AttributeError:\n # IP addresses default to 1\n return 1\n return 0\n\n def __iter__(self):\n \"\"\"\n For dict(event)\n \"\"\"\n yield from self.json().items()\n\n def __lt__(self, other):\n \"\"\"\n For queue sorting\n \"\"\"\n return self.priority < getattr(other, \"priority\", (0,))\n\n def __gt__(self, other):\n \"\"\"\n For queue sorting\n \"\"\"\n return self.priority > getattr(other, \"priority\", (0,))\n\n def __eq__(self, other):\n try:\n other = make_event(other, dummy=True)\n except ValidationError:\n return False\n return hash(self) == hash(other)\n\n def __hash__(self):\n if self._hash is None:\n self._hash = hash(self.id)\n return self._hash\n\n def __str__(self):\n max_event_len = 80\n d = str(self.data)\n return f'{self.type}(\"{d[:max_event_len]}{(\"...\" if len(d) > max_event_len else \"\")}\", module={self.module}, tags={self.tags})'\n\n def __repr__(self):\n return str(self)\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent.pretty_string","title":"pretty_string property
","text":"pretty_string\n
A human-friendly representation of the event's data. Used for graph representation.
If the event's data is a dictionary, the function will try to return a JSON-formatted string. Otherwise, it will use smart_decode to convert the data into a string representation.
Override if necessary.
Returns:
str
\u2013 The graphical representation of the event's data.
property
","text":"module_sequence\n
Get a human-friendly string that represents the sequence of modules responsible for generating this event.
Includes the names of omitted parent events to provide a complete view of the module sequence leading to this event.
Returns:
str
\u2013 The module sequence in human-friendly format.
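The chaining behaviour described above can be sketched in isolation. This is a minimal illustration, not BBOT code: `FakeModule` and `FakeEvent` are hypothetical stand-ins that carry only the attributes the property reads, and the module names are chosen purely for the example.

```python
class FakeModule:
    """Hypothetical stand-in for a module; it only carries a name."""

    def __init__(self, name):
        self.name = name


class FakeEvent:
    """Hypothetical stand-in for an event with just the fields used below."""

    def __init__(self, module, parent=None, omit=False):
        self.module = module
        self.parent = parent
        self._omit = omit

    @property
    def module_sequence(self):
        module_name = getattr(self.module, "name", "")
        # If the parent was omitted from output, prepend its own sequence
        if getattr(self.parent, "_omit", False):
            module_name = f"{self.parent.module_sequence}->{module_name}"
        return module_name


root = FakeEvent(FakeModule("TARGET"))
hidden = FakeEvent(FakeModule("dnsresolve"), parent=root, omit=True)
visible = FakeEvent(FakeModule("sslcert"), parent=hidden)
print(visible.module_sequence)
```

Because only omitted parents are folded in, an event whose parent was emitted normally reports just its own module name.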
__init__(data, event_type, parent=None, context=None, module=None, scan=None, scans=None, tags=None, confidence=100, timestamp=None, _dummy=False, _internal=None)\n
Initializes an Event object with the given parameters.
In most cases, you should use make_event()
instead of instantiating this class directly. make_event()
is much friendlier, and can auto-detect the event type for you.
Attributes:
data
((str, dict)
) \u2013 The primary data for the event.
event_type
(str
) \u2013 Type of the event, e.g., 'IP_ADDRESS'.
parent
(BaseEvent
) \u2013 Parent event that led to this event's discovery. Defaults to None.
module
(str
) \u2013 Module that discovered the event. Defaults to None.
scan
(Scan
) \u2013 BBOT Scan object. Required unless _dummy is True. Defaults to None.
scans
(list of Scan
) \u2013 BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.
tags
(list of str
) \u2013 Descriptive tags for the event. Defaults to None.
confidence
(int
) \u2013 Confidence level for the event, on a scale of 1-100. Defaults to 100.
timestamp
(datetime
) \u2013 Time of event discovery. Defaults to current UTC time.
_dummy
(bool
) \u2013 If True, disables certain data validations. Defaults to False.
_internal
(Any
) \u2013 If specified, makes the event internal. Defaults to None.
Raises:
ValidationError
\u2013 If either scan
or parent
are not specified and _dummy
is False.
bbot/core/event/base.py
def __init__(\n self,\n data,\n event_type,\n parent=None,\n context=None,\n module=None,\n scan=None,\n scans=None,\n tags=None,\n confidence=100,\n timestamp=None,\n _dummy=False,\n _internal=None,\n):\n \"\"\"\n Initializes an Event object with the given parameters.\n\n In most cases, you should use `make_event()` instead of instantiating this class directly.\n `make_event()` is much friendlier, and can auto-detect the event type for you.\n\n Attributes:\n data (str, dict): The primary data for the event.\n event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'.\n parent (BaseEvent, optional): Parent event that led to this event's discovery. Defaults to None.\n module (str, optional): Module that discovered the event. Defaults to None.\n scan (Scan, optional): BBOT Scan object. Required unless _dummy is True. Defaults to None.\n scans (list of Scan, optional): BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.\n tags (list of str, optional): Descriptive tags for the event. Defaults to None.\n confidence (int, optional): Confidence level for the event, on a scale of 1-100. Defaults to 100.\n timestamp (datetime, optional): Time of event discovery. Defaults to current UTC time.\n _dummy (bool, optional): If True, disables certain data validations. Defaults to False.\n _internal (Any, optional): If specified, makes the event internal. 
Defaults to None.\n\n Raises:\n ValidationError: If either `scan` or `parent` are not specified and `_dummy` is False.\n \"\"\"\n\n self._id = None\n self._hash = None\n self._data = None\n self.__host = None\n self._tags = set()\n self._port = None\n self._omit = False\n self.__words = None\n self._parent = None\n self._priority = None\n self._parent_id = None\n self._host_original = None\n self._scope_distance = None\n self._module_priority = None\n self._resolved_hosts = set()\n self.dns_children = dict()\n self._discovery_context = \"\"\n self._discovery_context_regex = re.compile(r\"\\{(?:event|module)[^}]*\\}\")\n self.web_spider_distance = 0\n\n # for creating one-off events without enforcing parent requirement\n self._dummy = _dummy\n self.module = module\n self._type = event_type\n\n # keep track of whether this event has been recorded by the scan\n self._stats_recorded = False\n\n if timestamp is not None:\n self.timestamp = timestamp\n else:\n try:\n self.timestamp = datetime.datetime.now(datetime.UTC)\n except AttributeError:\n self.timestamp = datetime.datetime.utcnow()\n\n self.confidence = int(confidence)\n self._internal = False\n\n # self.scan holds the instantiated scan object (for helpers, etc.)\n self.scan = scan\n if (not self.scan) and (not self._dummy):\n raise ValidationError(f\"Must specify scan\")\n # self.scans holds a list of scan IDs from scans that encountered this event\n self.scans = []\n if scans is not None:\n self.scans = scans\n if self.scan:\n self.scans = list(set([self.scan.id] + self.scans))\n\n try:\n self.data = self._sanitize_data(data)\n except Exception as e:\n log.trace(traceback.format_exc())\n raise ValidationError(f'Error sanitizing event data \"{data}\" for type \"{self.type}\": {e}')\n\n if not self.data:\n raise ValidationError(f'Invalid event data \"{data}\" for type \"{self.type}\"')\n\n self.parent = parent\n if (not self.parent) and (not self._dummy):\n raise ValidationError(f\"Must specify event parent\")\n\n 
if tags is not None:\n for tag in tags:\n self.add_tag(tag)\n\n # internal events are not ingested by output modules\n if not self._dummy:\n # removed this second part because it was making certain sslcert events internal\n if _internal: # or parent._internal:\n self.internal = True\n\n if not context:\n context = getattr(self.module, \"default_discovery_context\", \"\")\n if context:\n self.discovery_context = context\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent.json","title":"json","text":"json(mode='json', siem_friendly=False)\n
Serializes the event object to a JSON-compatible dictionary.
By default, it includes attributes such as 'type', 'id', 'data', 'scope_distance', and others that are present. Additional specific attributes can be serialized based on the mode specified.
Parameters:
mode
(str
, default: 'json'
) \u2013 Specifies the data serialization mode. Default is \"json\". Other options include \"graph\", \"human\", and \"id\".
siem_friendly
(bool
, default: False
) \u2013 Whether to format the JSON in a way that's friendly to SIEM ingestion by Elastic, Splunk, etc. This ensures the value of \"data\" is always the same type (a dictionary).
Returns:
dict
\u2013 JSON-serializable dictionary representation of the event object.
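To make the effect of siem_friendly concrete, here is a small sketch of how the "data" field changes shape. The helper name `wrap_event_data` is an assumption made for illustration and is not part of BBOT.

```python
def wrap_event_data(event_type, data, siem_friendly=False):
    """Hypothetical helper showing the two shapes of the "data" field."""
    if siem_friendly:
        # Key the data by event type so "data" is always a dictionary
        return {"data": {event_type: data}}
    return {"data": data}


plain = wrap_event_data("DNS_NAME", "www.evilcorp.com")
siem = wrap_event_data("DNS_NAME", "www.evilcorp.com", siem_friendly=True)
print(plain)
print(siem)
```

Nesting the value under the event type means a SIEM index never sees "data" as a string for one event and a dictionary for another.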
bbot/core/event/base.py
def json(self, mode=\"json\", siem_friendly=False):\n \"\"\"\n Serializes the event object to a JSON-compatible dictionary.\n\n By default, it includes attributes such as 'type', 'id', 'data', 'scope_distance', and others that are present.\n Additional specific attributes can be serialized based on the mode specified.\n\n Parameters:\n mode (str): Specifies the data serialization mode. Default is \"json\". Other options include \"graph\", \"human\", and \"id\".\n siem_friendly (bool): Whether to format the JSON in a way that's friendly to SIEM ingestion by Elastic, Splunk, etc. This ensures the value of \"data\" is always the same type (a dictionary).\n\n Returns:\n dict: JSON-serializable dictionary representation of the event object.\n \"\"\"\n # type, ID, scope description\n j = dict()\n for i in (\"type\", \"id\", \"scope_description\"):\n v = getattr(self, i, \"\")\n if v:\n j.update({i: v})\n # event data\n data_attr = getattr(self, f\"data_{mode}\", None)\n if data_attr is not None:\n data = data_attr\n else:\n data = smart_decode(self.data)\n if siem_friendly:\n j[\"data\"] = {self.type: data}\n else:\n j[\"data\"] = data\n # host, dns children\n if self.host:\n j[\"host\"] = str(self.host)\n j[\"resolved_hosts\"] = sorted(str(h) for h in self.resolved_hosts)\n j[\"dns_children\"] = {k: list(v) for k, v in self.dns_children.items()}\n # web spider distance\n web_spider_distance = getattr(self, \"web_spider_distance\", None)\n if web_spider_distance is not None:\n j[\"web_spider_distance\"] = web_spider_distance\n # scope distance\n j[\"scope_distance\"] = self.scope_distance\n # scan\n if self.scan:\n j[\"scan\"] = self.scan.id\n # timestamp\n j[\"timestamp\"] = self.timestamp.isoformat()\n # parent event\n parent_id = self.parent_id\n if parent_id:\n j[\"parent\"] = parent_id\n # tags\n if self.tags:\n j.update({\"tags\": list(self.tags)})\n # parent module\n if self.module:\n j.update({\"module\": str(self.module)})\n # sequence of modules that led to 
discovery\n if self.module_sequence:\n j.update({\"module_sequence\": str(self.module_sequence)})\n # discovery context\n j[\"discovery_context\"] = self.discovery_context\n j[\"discovery_path\"] = self.discovery_path\n\n # normalize non-primitive python objects\n for k, v in list(j.items()):\n if k == \"data\":\n continue\n if type(v) not in (str, int, float, bool, list, dict, type(None)):\n try:\n j[k] = json.dumps(v, sort_keys=True)\n except Exception:\n j[k] = smart_decode(v)\n return j\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent.from_json","title":"from_json staticmethod
","text":"from_json(j)\n
Convenience shortcut to create an Event object from a JSON-compatible dictionary.
Calls the event_from_json()
function to deserialize the event.
Parameters:
j
(dict
) \u2013 The JSON-compatible dictionary containing event data.
Returns:
Event
\u2013 The deserialized Event object.
bbot/core/event/base.py
@staticmethod\ndef from_json(j):\n \"\"\"\n Convenience shortcut to create an Event object from a JSON-compatible dictionary.\n\n Calls the `event_from_json()` function to deserialize the event.\n\n Parameters:\n j (dict): The JSON-compatible dictionary containing event data.\n\n Returns:\n Event: The deserialized Event object.\n \"\"\"\n return event_from_json(j)\n
"},{"location":"dev/module_howto/","title":"How to Write a BBOT Module","text":"Here we'll go over a basic example of writing a custom BBOT module.
"},{"location":"dev/module_howto/#create-the-python-file","title":"Create the python file","text":".py
file in bbot/modules
At the top of the file, import BaseModule
Declare a class that inherits from BaseModule
Define in watched_events what type of data your module will consume
Define in produced_events what type of data your module will produce
Define (via flags) whether your module is active or passive, and whether it's safe or aggressive
Put your main logic in .handle_event()
Here is an example of a simple module that performs whois lookups:
bbot/modules/whois.py
from bbot.modules.base import BaseModule\n\nclass whois(BaseModule):\n watched_events = [\"DNS_NAME\"] # watch for DNS_NAME events\n produced_events = [\"WHOIS\"] # we produce WHOIS events\n flags = [\"passive\", \"safe\"]\n meta = {\"description\": \"Query WhoisXMLAPI for WHOIS data\"}\n options = {\"api_key\": \"\"} # module config options\n options_desc = {\"api_key\": \"WhoisXMLAPI Key\"}\n per_domain_only = True # only run once per domain\n\n base_url = \"https://www.whoisxmlapi.com/whoisserver/WhoisService\"\n\n # one-time setup - runs at the beginning of the scan\n async def setup(self):\n self.api_key = self.config.get(\"api_key\")\n if not self.api_key:\n # soft-fail if no API key is set\n return None, \"Must set API key\"\n # setup succeeded\n return True\n\n async def handle_event(self, event):\n self.hugesuccess(f\"Got {event} (event.data: {event.data})\")\n _, domain = self.helpers.split_domain(event.data)\n url = f\"{self.base_url}?apiKey={self.api_key}&domainName={domain}&outputFormat=JSON\"\n self.hugeinfo(f\"Visiting {url}\")\n response = await self.helpers.request(url)\n if response is not None:\n await self.emit_event(response.json(), \"WHOIS\", parent=event)\n
"},{"location":"dev/module_howto/#test-your-new-module","title":"Test your new module","text":"After saving the module, you can run it with -m
:
# run a scan enabling the module in bbot/modules/whois.py\nbbot -t evilcorp.com -m whois\n
"},{"location":"dev/module_howto/#debugging-your-module-bbots-colorful-log-functions","title":"Debugging Your Module - BBOT's Colorful Log Functions","text":"You probably noticed the use of self.hugesuccess()
. This function is part of BBOT's built-in logging capability, and it prints whatever you give it in bright green. These colorful log functions can be useful for debugging.
BBOT log levels:
critical: bright red
hugesuccess: bright green
hugewarning: bright orange
hugeinfo: bright blue
error: red
warning: orange
info: blue
verbose: grey (must use -v to see)
debug: grey (must use -d to see)
For details on how tests are written, see Unit Tests.
"},{"location":"dev/module_howto/#handle_event-and-emit_event","title":"handle_event()
and emit_event()
","text":"The handle_event()
method is the most important part of the module. By overriding this method, you control what the module does. During a scan, when an event from your watched_events
is encountered (a DNS_NAME
in this example), handle_event()
is automatically called with that event as its argument.
The emit_event()
method is how modules return data. When you call emit_event()
, it creates an event and outputs it, sending it to any modules that are interested in that data type.
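The relationship between watched_events, handle_event() and emit_event() can be sketched with a toy dispatcher. This is a simplified illustration under stated assumptions: `Bus`, `MiniModule`, `Upper` and `Collector` are hypothetical names, events are plain dictionaries, and the real scanner does considerably more than this direct fan-out.

```python
import asyncio


class MiniModule:
    """Hypothetical, heavily simplified stand-in for a module."""

    watched_events = []

    def __init__(self, bus):
        self.bus = bus
        self.seen = []

    async def emit_event(self, data, event_type):
        # Hand the new event back to the dispatcher for routing
        await self.bus.dispatch({"type": event_type, "data": data})

    async def handle_event(self, event):
        self.seen.append(event["data"])


class Upper(MiniModule):
    watched_events = ["WORD"]

    async def handle_event(self, event):
        await super().handle_event(event)
        await self.emit_event(event["data"].upper(), "UPPER_WORD")


class Collector(MiniModule):
    watched_events = ["UPPER_WORD"]


class Bus:
    """Routes each event to every module that watches its type."""

    def __init__(self):
        self.modules = []

    async def dispatch(self, event):
        for module in self.modules:
            if event["type"] in module.watched_events:
                await module.handle_event(event)


async def demo():
    bus = Bus()
    upper, collector = Upper(bus), Collector(bus)
    bus.modules = [upper, collector]
    await bus.dispatch({"type": "WORD", "data": "evilcorp"})
    return upper.seen, collector.seen


upper_seen, collector_seen = asyncio.run(demo())
print(upper_seen, collector_seen)
```

The point of the sketch is that a module never calls another module directly: it only declares what it watches and emits what it finds, and routing is left to the dispatcher.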
setup()
","text":"A module's setup()
method is used for performing one-time setup at the start of the scan, like downloading a wordlist or checking to make sure an API key is valid. It needs to return either:
True - module setup succeeded
None - module setup soft-failed (scan will continue but module will be disabled)
False - module setup hard-failed (scan will abort)
Optionally, it can also return a reason. Here are some examples:
async def setup(self):\n if not self.config.get(\"api_key\"):\n # soft-fail\n return None, \"No API key specified\"\n\nasync def setup(self):\n try:\n wordlist = self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n except WordlistError as e:\n # hard-fail\n return False, f\"Error downloading wordlist: {e}\"\n\nasync def setup(self):\n self.timeout = self.config.get(\"timeout\", 5)\n # success\n return True\n
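As a rough sketch of how a caller could interpret these return values, the helper below normalises either a bare status or a (status, reason) pair. The function name `interpret_setup_result` and the outcome strings are assumptions for illustration and do not describe BBOT's internal handling.

```python
def interpret_setup_result(result):
    """Hypothetical helper: normalise a setup() return value to (outcome, reason)."""
    if isinstance(result, tuple) and len(result) == 2:
        status, reason = result
    else:
        status, reason = result, ""
    if status is True:
        outcome = "success"
    elif status is False:
        outcome = "hard-fail"
    else:
        # None (including a setup() that forgets to return) is a soft-fail
        outcome = "soft-fail"
    return outcome, reason


print(interpret_setup_result(True))
print(interpret_setup_result((None, "No API key specified")))
print(interpret_setup_result((False, "Error downloading wordlist")))
```

Note that under this reading a setup() method that falls off the end without returning anything is treated as a soft-fail, so a successful setup should return True explicitly.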
"},{"location":"dev/module_howto/#module-config-options","title":"Module Config Options","text":"Each module can have its own set of config options. These live in the options
and options_desc
attributes on your class. Both are dictionaries; options
is for defaults and options_desc
is for descriptions. Here is a typical example:
class nmap(BaseModule):\n # ...\n options = {\n \"top_ports\": 100,\n \"ports\": \"\",\n \"timing\": \"T4\",\n \"skip_host_discovery\": True,\n }\n options_desc = {\n \"top_ports\": \"Top ports to scan (default 100) (to override, specify 'ports')\",\n \"ports\": \"Ports to scan\",\n \"timing\": \"-T<0-5>: Set timing template (higher is faster)\",\n \"skip_host_discovery\": \"skip host discovery (-Pn)\",\n }\n\n async def setup(self):\n self.ports = self.config.get(\"ports\", \"\")\n self.timing = self.config.get(\"timing\", \"T4\")\n self.top_ports = self.config.get(\"top_ports\", 100)\n self.skip_host_discovery = self.config.get(\"skip_host_discovery\", True)\n return True\n
Once you've defined these variables, you can pass the options via -c
:
bbot -m nmap -c modules.nmap.top_ports=250\n
... or via the config:
~/.config/bbot/bbot.yml
modules:\n nmap:\n top_ports: 250\n
Inside the module, you access them via self.config
, e.g.:
self.config.get(\"top_ports\")\n
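The dotted path used with -c maps onto the same nested structure as the YAML file. The sketch below shows that mapping with plain dictionaries; `apply_override` is a hypothetical helper written for this example, and BBOT's own config handling is not implemented this way.

```python
def apply_override(config, dotted_assignment):
    """Hypothetical helper: apply an "a.b.c=value" override to a nested dict."""
    path, _, raw_value = dotted_assignment.partition("=")
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Keep the type of the existing default when one is present
    default = node.get(keys[-1])
    if isinstance(default, bool):
        value = raw_value.lower() in ("1", "true", "yes")
    elif isinstance(default, int):
        value = int(raw_value)
    else:
        value = raw_value
    node[keys[-1]] = value
    return config


config = {"modules": {"nmap": {"top_ports": 100, "ports": "", "timing": "T4"}}}
apply_override(config, "modules.nmap.top_ports=250")
print(config["modules"]["nmap"]["top_ports"])
```

Options that are not overridden keep the defaults declared in the module's options dictionary.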
"},{"location":"dev/module_howto/#module-dependencies","title":"Module Dependencies","text":"BBOT automates module dependencies with Ansible. If your module relies on a third-party binary, OS package, or python library, you can specify them in the deps_*
attributes of your module.
class MyModule(BaseModule):\n ...\n deps_apt = [\"chromium-browser\"]\n deps_ansible = [\n {\n \"name\": \"install dev tools\",\n \"package\": {\"name\": [\"gcc\", \"git\", \"make\"], \"state\": \"present\"},\n \"become\": True,\n \"ignore_errors\": True,\n },\n {\n \"name\": \"Download massdns source code\",\n \"git\": {\n \"repo\": \"https://github.com/blechschmidt/massdns.git\",\n \"dest\": \"#{BBOT_TEMP}/massdns\",\n \"single_branch\": True,\n \"version\": \"master\",\n },\n },\n {\n \"name\": \"Build massdns\",\n \"command\": {\"chdir\": \"#{BBOT_TEMP}/massdns\", \"cmd\": \"make\", \"creates\": \"#{BBOT_TEMP}/massdns/bin/massdns\"},\n },\n {\n \"name\": \"Install massdns\",\n \"copy\": {\"src\": \"#{BBOT_TEMP}/massdns/bin/massdns\", \"dest\": \"#{BBOT_TOOLS}/\", \"mode\": \"u+x,g+x,o+x\"},\n },\n ]\n
"},{"location":"dev/presets/","title":"Presets","text":""},{"location":"dev/presets/#bbot.scanner.Preset","title":"Preset","text":"A preset is the central config for a BBOT scan. It contains everything a scan needs to run -- targets, modules, flags, config options like API keys, etc.
You can create a preset manually and pass it into Scanner(preset=preset)
. Or, you can pass Preset
's kwargs into Scanner()
and it will create the preset for you implicitly.
Presets can include other presets (which can in turn include other presets, and so on). This works by merging each preset in turn using Preset.merge()
. The order matters. In case of a conflict, the last preset to be merged takes priority.
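The "last merged wins" rule can be illustrated with plain dictionaries. This is only a sketch: `merge_presets` is a hypothetical function, presets are reduced to dicts, and the config merge here is shallow, which is a simplification made for the example.

```python
def merge_presets(*presets):
    """Hypothetical sketch: sets are unioned, later config values override earlier ones."""
    merged = {"modules": set(), "flags": set(), "config": {}}
    for preset in presets:
        merged["modules"] |= set(preset.get("modules", ()))
        merged["flags"] |= set(preset.get("flags", ()))
        # dict.update lets the most recently merged preset override earlier values
        merged["config"].update(preset.get("config", {}))
    return merged


base = {"modules": ["portscan"], "config": {"http_proxy": "http://127.0.0.1"}}
extra = {"modules": ["sslcert"], "config": {"http_proxy": "http://10.0.0.1"}}
result = merge_presets(base, extra)
print(sorted(result["modules"]), result["config"]["http_proxy"])
```

Additive fields such as modules and flags accumulate across presets, while a scalar setting that appears in several presets ends up with the value from whichever preset was merged last.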
Presets can be loaded from or saved to YAML. BBOT has a number of ready-made presets for common tasks like subdomain enumeration, web spidering, dirbusting, etc.
Presets are highly customizable via conditions
, which use the Jinja2 templating engine. Using conditions
, you can define custom logic to inspect the final preset before the scan starts, and change it if need be. Based on the state of the preset, you can print a warning message, abort the scan, enable/disable modules, etc.
Attributes:
target
(Target
) \u2013 Target(s) of scan.
whitelist
(Target
) \u2013 Scan whitelist (by default this is the same as target
).
blacklist
(Target
) \u2013 Scan blacklist (this takes ultimate precedence).
strict_scope
(bool
) \u2013 If True, subdomains of targets are not considered to be in-scope.
helpers
(ConfigAwareHelper
) \u2013 Helper containing various reusable functions, regexes, etc.
output_dir
(Path
) \u2013 Output directory for scan.
scan_name
(str
) \u2013 Name of scan. Defaults to random value, e.g. \"demonic_jimmy\".
name
(str
) \u2013 Human-friendly name of preset. Used mainly for logging purposes.
description
(str
) \u2013 Description of preset.
modules
(set
) \u2013 Combined modules to enable for the scan. Includes scan modules, internal modules, and output modules.
scan_modules
(set
) \u2013 Modules to enable for the scan.
output_modules
(set
) \u2013 Output modules to enable for the scan. (note: if no output modules are specified, this is not populated until .bake())
internal_modules
(set
) \u2013 Internal modules for the scan. (note: not populated until .bake())
exclude_modules
(set
) \u2013 Modules to exclude from the scan. When set, automatically removes excluded modules.
flags
(set
) \u2013 Flags to enable for the scan. When set, automatically enables modules.
require_flags
(set
) \u2013 Require modules to have these flags. When set, automatically removes offending modules.
exclude_flags
(set
) \u2013 Exclude modules that have any of these flags. When set, automatically removes offending modules.
module_dirs
(set
) \u2013 Custom directories from which to load modules (alias to self.module_loader.module_dirs
). When set, automatically preloads contained modules.
config
(DictConfig
) \u2013 BBOT config (alias to core.config
)
core
(BBOTCore
) \u2013 Local copy of BBOTCore object.
verbose
(bool
) \u2013 Whether log level is currently set to verbose. When set, updates log level for all BBOT log handlers.
debug
(bool
) \u2013 Whether log level is currently set to debug. When set, updates log level for all BBOT log handlers.
silent
(bool
) \u2013 Whether logging is currently disabled. When set to True, silences all stderr.
Examples:
>>> preset = Preset(\n \"evilcorp.com\",\n \"1.2.3.0/24\",\n flags=[\"subdomain-enum\"],\n modules=[\"nuclei\"],\n config={\"http_proxy\": \"http://127.0.0.1\"}\n )\n>>> scan = Scanner(preset=preset)\n
>>> preset = Preset.from_yaml_file(\"my_preset.yml\")\n>>> scan = Scanner(preset=preset)\n
Source code in bbot/scanner/preset/preset.py
class Preset:\n \"\"\"\n A preset is the central config for a BBOT scan. It contains everything a scan needs to run --\n targets, modules, flags, config options like API keys, etc.\n\n You can create a preset manually and pass it into `Scanner(preset=preset)`.\n Or, you can pass `Preset`'s kwargs into `Scanner()` and it will create the preset for you implicitly.\n\n Presets can include other presets (which can in turn include other presets, and so on).\n This works by merging each preset in turn using `Preset.merge()`.\n The order matters. In case of a conflict, the last preset to be merged wins priority.\n\n Presets can be loaded from or saved to YAML. BBOT has a number of ready-made presets for common tasks like\n subdomain enumeration, web spidering, dirbusting, etc.\n\n Presets are highly customizable via `conditions`, which use the Jinja2 templating engine.\n Using `conditions`, you can define custom logic to inspect the final preset before the scan starts, and change it if need be.\n Based on the state of the preset, you can print a warning message, abort the scan, enable/disable modules, etc..\n\n Attributes:\n target (Target): Target(s) of scan.\n whitelist (Target): Scan whitelist (by default this is the same as `target`).\n blacklist (Target): Scan blacklist (this takes ultimate precedence).\n strict_scope (bool): If True, subdomains of targets are not considered to be in-scope.\n helpers (ConfigAwareHelper): Helper containing various reusable functions, regexes, etc.\n output_dir (pathlib.Path): Output directory for scan.\n scan_name (str): Name of scan. Defaults to random value, e.g. \"demonic_jimmy\".\n name (str): Human-friendly name of preset. Used mainly for logging purposes.\n description (str): Description of preset.\n modules (set): Combined modules to enable for the scan. 
Includes scan modules, internal modules, and output modules.\n scan_modules (set): Modules to enable for the scan.\n output_modules (set): Output modules to enable for the scan. (note: if no output modules are specified, this is not populated until .bake())\n internal_modules (set): Internal modules for the scan. (note: not populated until .bake())\n exclude_modules (set): Modules to exclude from the scan. When set, automatically removes excluded modules.\n flags (set): Flags to enable for the scan. When set, automatically enables modules.\n require_flags (set): Require modules to have these flags. When set, automatically removes offending modules.\n exclude_flags (set): Exclude modules that have any of these flags. When set, automatically removes offending modules.\n module_dirs (set): Custom directories from which to load modules (alias to `self.module_loader.module_dirs`). When set, automatically preloads contained modules.\n config (omegaconf.dictconfig.DictConfig): BBOT config (alias to `core.config`)\n core (BBOTCore): Local copy of BBOTCore object.\n verbose (bool): Whether log level is currently set to verbose. When set, updates log level for all BBOT log handlers.\n debug (bool): Whether log level is currently set to debug. When set, updates log level for all BBOT log handlers.\n silent (bool): Whether logging is currently disabled. 
When set to True, silences all stderr.\n\n Examples:\n >>> preset = Preset(\n \"evilcorp.com\",\n \"1.2.3.0/24\",\n flags=[\"subdomain-enum\"],\n modules=[\"nuclei\"],\n config={\"http_proxy\": \"http://127.0.0.1\"}\n )\n >>> scan = Scanner(preset=preset)\n\n >>> preset = Preset.from_yaml_file(\"my_preset.yml\")\n >>> scan = Scanner(preset=preset)\n \"\"\"\n\n def __init__(\n self,\n *targets,\n whitelist=None,\n blacklist=None,\n strict_scope=False,\n modules=None,\n output_modules=None,\n exclude_modules=None,\n flags=None,\n require_flags=None,\n exclude_flags=None,\n config=None,\n module_dirs=None,\n include=None,\n presets=None,\n output_dir=None,\n scan_name=None,\n name=None,\n description=None,\n conditions=None,\n force_start=False,\n verbose=False,\n debug=False,\n silent=False,\n _exclude=None,\n _log=True,\n ):\n \"\"\"\n Initializes the Preset class.\n\n Args:\n *targets (str): Target(s) to scan. Types supported: hostnames, IPs, CIDRs, emails, open ports.\n whitelist (list, optional): Whitelisted target(s) to scan. Defaults to the same as `targets`.\n blacklist (list, optional): Blacklisted target(s). Takes ultimate precedence. Defaults to empty.\n strict_scope (bool, optional): If True, subdomains of targets are not in-scope.\n modules (list[str], optional): List of scan modules to enable for the scan. Defaults to empty list.\n output_modules (list[str], optional): List of output modules to use. 
Defaults to csv, human, and json.\n exclude_modules (list[str], optional): List of modules to exclude from the scan.\n require_flags (list[str], optional): Only enable modules if they have these flags.\n exclude_flags (list[str], optional): Don't enable modules if they have any of these flags.\n module_dirs (list[str], optional): additional directories to load modules from.\n config (dict, optional): Additional scan configuration settings.\n include (list[str], optional): names or filenames of other presets to include.\n presets (list[str], optional): an alias for `include`.\n output_dir (str or Path, optional): Directory to store scan output. Defaults to BBOT home directory (`~/.bbot`).\n scan_name (str, optional): Human-readable name of the scan. If not specified, it will be random, e.g. \"demonic_jimmy\".\n name (str, optional): Human-readable name of the preset. Used mainly for logging.\n description (str, optional): Description of the preset.\n conditions (list[str], optional): Custom conditions to be executed before scan start. Written in Jinja2.\n force_start (bool, optional): If True, ignore conditional aborts and failed module setups. Just run the scan!\n verbose (bool, optional): Set the BBOT logger to verbose mode.\n debug (bool, optional): Set the BBOT logger to debug mode.\n silent (bool, optional): Silence all stderr (effectively disables the BBOT logger).\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. 
This will record which modules/flags are enabled, etc.\n \"\"\"\n # internal variables\n self._cli = False\n self._log = _log\n self.scan = None\n self._args = None\n self._environ = None\n self._helpers = None\n self._module_loader = None\n self._yaml_str = \"\"\n self._baked = False\n\n self._default_output_modules = None\n self._default_internal_modules = None\n\n # modules / flags\n self.modules = set()\n self.exclude_modules = set()\n self.flags = set()\n self.exclude_flags = set()\n self.require_flags = set()\n\n # modules + flags\n if modules is None:\n modules = []\n if isinstance(modules, str):\n modules = [modules]\n if output_modules is None:\n output_modules = []\n if isinstance(output_modules, str):\n output_modules = [output_modules]\n if exclude_modules is None:\n exclude_modules = []\n if isinstance(exclude_modules, str):\n exclude_modules = [exclude_modules]\n if flags is None:\n flags = []\n if isinstance(flags, str):\n flags = [flags]\n if exclude_flags is None:\n exclude_flags = []\n if isinstance(exclude_flags, str):\n exclude_flags = [exclude_flags]\n if require_flags is None:\n require_flags = []\n if isinstance(require_flags, str):\n require_flags = [require_flags]\n\n # these are used only for preserving the modules as specified in the original preset\n # this is to ensure the preset looks the same when reserialized\n self.explicit_scan_modules = set() if modules is None else set(modules)\n self.explicit_output_modules = set() if output_modules is None else set(output_modules)\n\n # whether to force-start the scan (ignoring conditional aborts and failed module setups)\n self.force_start = force_start\n\n # scan output directory\n self.output_dir = output_dir\n # name of scan\n self.scan_name = scan_name\n\n # name of preset, default blank\n self.name = name or \"\"\n # preset description, default blank\n self.description = description or \"\"\n\n # custom conditions, evaluated during .bake()\n self.conditions = []\n if conditions is not 
None:\n for condition in conditions:\n self.conditions.append((self.name, condition))\n\n # keeps track of loaded preset files to prevent infinite circular inclusions\n self._preset_files_loaded = set()\n if _exclude is not None:\n for _filename in _exclude:\n self._preset_files_loaded.add(Path(_filename).resolve())\n\n # bbot core config\n self.core = CORE.copy()\n if config is None:\n config = omegaconf.OmegaConf.create({})\n # merge custom configs if specified by the user\n self.core.merge_custom(config)\n\n # log verbosity\n # actual log verbosity isn't set until .bake()\n self.verbose = verbose\n self.debug = debug\n self.silent = silent\n\n # custom module directories\n self._module_dirs = set()\n self.module_dirs = module_dirs\n\n # target / whitelist / blacklist\n self.strict_scope = strict_scope\n # these are temporary receptacles until they all get .baked() together\n self._seeds = set(targets if targets else [])\n self._whitelist = set(whitelist) if whitelist else whitelist\n self._blacklist = set(blacklist if blacklist else [])\n\n self._target = None\n\n # \"presets\" is alias to \"include\"\n if presets and include:\n raise ValueError(\n 'Cannot use both \"presets\" and \"include\" args at the same time (presets is only an alias to include). 
Please pick only one :)'\n )\n if presets and not include:\n include = presets\n # include other presets\n if include and not isinstance(include, (list, tuple, set)):\n include = [include]\n if include:\n for included_preset in include:\n self.include_preset(included_preset)\n\n # we don't fill self.modules yet (that happens in .bake())\n self.explicit_scan_modules.update(set(modules))\n self.explicit_output_modules.update(set(output_modules))\n self.exclude_modules.update(set(exclude_modules))\n self.flags.update(set(flags))\n self.exclude_flags.update(set(exclude_flags))\n self.require_flags.update(set(require_flags))\n\n @property\n def bbot_home(self):\n return Path(self.config.get(\"home\", \"~/.bbot\")).expanduser().resolve()\n\n @property\n def target(self):\n if self._target is None:\n raise ValueError(\"Cannot access target before preset is baked (use ._seeds instead)\")\n return self._target\n\n @property\n def whitelist(self):\n if self._target is None:\n raise ValueError(\"Cannot access whitelist before preset is baked (use ._whitelist instead)\")\n return self.target.whitelist\n\n @property\n def blacklist(self):\n if self._target is None:\n raise ValueError(\"Cannot access blacklist before preset is baked (use ._blacklist instead)\")\n return self.target.blacklist\n\n @property\n def preset_dir(self):\n return self.bbot_home / \"presets\"\n\n @property\n def default_output_modules(self):\n if self._default_output_modules is not None:\n output_modules = self._default_output_modules\n else:\n output_modules = [\"python\", \"csv\", \"txt\", \"json\"]\n if self._cli:\n output_modules.append(\"stdout\")\n return output_modules\n\n @property\n def default_internal_modules(self):\n preloaded_internal = self.module_loader.preloaded(type=\"internal\")\n if self._default_internal_modules is not None:\n internal_modules = self._default_internal_modules\n else:\n internal_modules = list(preloaded_internal)\n return {k: preloaded_internal[k] for k in 
internal_modules}\n\n def merge(self, other):\n \"\"\"\n Merge another preset into this one.\n\n If there are any config conflicts, `other` will win over `self`.\n\n Args:\n other (Preset): The preset to merge into this one.\n\n Examples:\n >>> preset1 = Preset(modules=[\"portscan\"])\n >>> preset1.scan_modules\n ['portscan']\n >>> preset2 = Preset(modules=[\"sslcert\"])\n >>> preset2.scan_modules\n ['sslcert']\n >>> preset1.merge(preset2)\n >>> preset1.scan_modules\n ['portscan', 'sslcert']\n \"\"\"\n self.log_debug(f'Merging preset \"{other.name}\" into \"{self.name}\"')\n # config\n self.core.merge_custom(other.core.custom_config)\n self.module_loader.core = self.core\n # module dirs\n # modules + flags\n # establish requirements / exclusions first\n self.exclude_modules.update(other.exclude_modules)\n self.require_flags.update(other.require_flags)\n self.exclude_flags.update(other.exclude_flags)\n # then it's okay to start enabling modules\n self.explicit_scan_modules.update(other.explicit_scan_modules)\n self.explicit_output_modules.update(other.explicit_output_modules)\n self.flags.update(other.flags)\n\n # target / scope\n self._seeds.update(other._seeds)\n # leave whitelist as None until we encounter one\n if other._whitelist is not None:\n if self._whitelist is None:\n self._whitelist = set(other._whitelist)\n else:\n self._whitelist.update(other._whitelist)\n self._blacklist.update(other._blacklist)\n self.strict_scope = self.strict_scope or other.strict_scope\n\n # log verbosity\n if other.silent:\n self.silent = other.silent\n if other.verbose:\n self.verbose = other.verbose\n if other.debug:\n self.debug = other.debug\n # scan name\n if other.scan_name is not None:\n self.scan_name = other.scan_name\n if other.output_dir is not None:\n self.output_dir = other.output_dir\n # conditions\n if other.conditions:\n self.conditions.extend(other.conditions)\n # misc\n self.force_start = self.force_start | other.force_start\n self._cli = self._cli | 
other._cli\n\n def bake(self, scan=None):\n \"\"\"\n Return a \"baked\" copy of this preset, ready for use by a BBOT scan.\n\n Baking a preset finalizes it by populating `preset.modules` based on flags,\n performing final validations, and substituting environment variables in preloaded modules.\n It also evaluates custom `conditions` as specified in the preset.\n\n This function is automatically called in Scanner.__init__(). There is no need to call it manually.\n \"\"\"\n self.log_debug(\"Getting baked\")\n # create a copy of self\n baked_preset = copy(self)\n baked_preset.scan = scan\n # copy core\n baked_preset.core = self.core.copy()\n # copy module loader\n baked_preset._module_loader = self.module_loader.copy()\n # prepare os environment\n os_environ = baked_preset.environ.prepare()\n # find and replace preloaded modules with os environ\n # this is different from the config variable substitution because it modifies\n # the preloaded modules, i.e. their ansible playbooks\n baked_preset.module_loader.find_and_replace(**os_environ)\n # update os environ\n os.environ.clear()\n os.environ.update(os_environ)\n\n # validate flags, config options\n baked_preset.validate()\n\n # validate log level options\n baked_preset.apply_log_level(apply_core=scan is not None)\n\n # assign baked preset to our scan\n if scan is not None:\n scan.preset = baked_preset\n\n # now that our requirements / exclusions are validated, we can start enabling modules\n # enable scan modules\n for module in baked_preset.explicit_scan_modules:\n baked_preset.add_module(module, module_type=\"scan\")\n\n # enable output modules\n output_modules_to_enable = set(baked_preset.explicit_output_modules)\n default_output_modules = self.default_output_modules\n output_module_override = any(m in default_output_modules for m in output_modules_to_enable)\n # if none of the default output modules have been explicitly specified, enable them all\n if not output_module_override:\n 
output_modules_to_enable.update(self.default_output_modules)\n for module in output_modules_to_enable:\n baked_preset.add_module(module, module_type=\"output\", raise_error=False)\n\n # enable internal modules\n for internal_module, preloaded in self.default_internal_modules.items():\n is_enabled = baked_preset.config.get(internal_module, True)\n is_excluded = internal_module in baked_preset.exclude_modules\n if is_enabled and not is_excluded:\n baked_preset.add_module(internal_module, module_type=\"internal\", raise_error=False)\n\n # disable internal modules if requested\n for internal_module in baked_preset.internal_modules:\n if baked_preset.config.get(internal_module, True) == False:\n baked_preset.exclude_modules.add(internal_module)\n\n # enable modules by flag\n for flag in baked_preset.flags:\n for module, preloaded in baked_preset.module_loader.preloaded().items():\n module_flags = preloaded.get(\"flags\", [])\n module_type = preloaded.get(\"type\", \"scan\")\n if flag in module_flags:\n self.log_debug(f'Enabling module \"{module}\" because it has flag \"{flag}\"')\n baked_preset.add_module(module, module_type, raise_error=False)\n\n # ensure we have output modules\n if not baked_preset.output_modules:\n for output_module in self.default_output_modules:\n baked_preset.add_module(output_module, module_type=\"output\", raise_error=False)\n\n # create target object\n from bbot.scanner.target import BBOTTarget\n\n baked_preset._target = BBOTTarget(\n *list(self._seeds),\n whitelist=self._whitelist,\n blacklist=self._blacklist,\n strict_scope=self.strict_scope,\n scan=scan,\n )\n\n # evaluate conditions\n if baked_preset.conditions:\n from .conditions import ConditionEvaluator\n\n evaluator = ConditionEvaluator(baked_preset)\n evaluator.evaluate()\n\n self._baked = True\n return baked_preset\n\n def parse_args(self):\n \"\"\"\n Parse CLI arguments, and merge them into this preset.\n\n Used in `cli.py`.\n \"\"\"\n self._cli = True\n 
self.merge(self.args.preset_from_args())\n\n @property\n def module_dirs(self):\n return self.module_loader.module_dirs\n\n @module_dirs.setter\n def module_dirs(self, module_dirs):\n if module_dirs:\n if isinstance(module_dirs, str):\n module_dirs = [module_dirs]\n for m in module_dirs:\n self.module_loader.add_module_dir(m)\n self._module_dirs.add(m)\n\n @property\n def scan_modules(self):\n return [m for m in self.modules if self.preloaded_module(m).get(\"type\", \"scan\") == \"scan\"]\n\n @property\n def output_modules(self):\n return [m for m in self.modules if self.preloaded_module(m).get(\"type\", \"scan\") == \"output\"]\n\n @property\n def internal_modules(self):\n return [m for m in self.modules if self.preloaded_module(m).get(\"type\", \"scan\") == \"internal\"]\n\n def add_module(self, module_name, module_type=\"scan\", raise_error=True):\n self.log_debug(f'Adding module \"{module_name}\" of type \"{module_type}\"')\n is_valid, reason, preloaded = self._is_valid_module(module_name, module_type, raise_error=raise_error)\n if not is_valid:\n self.log_debug(f'Unable to add {module_type} module \"{module_name}\": {reason}')\n return\n self.modules.add(module_name)\n for module_dep in preloaded.get(\"deps\", {}).get(\"modules\", []):\n if module_dep != module_name and module_dep not in self.modules:\n self.log_verbose(f'Adding module \"{module_dep}\" because {module_name} depends on it')\n self.add_module(module_dep, raise_error=False)\n\n def preloaded_module(self, module):\n return self.module_loader.preloaded()[module]\n\n @property\n def config(self):\n return self.core.config\n\n @property\n def web_config(self):\n return self.core.config.get(\"web\", {})\n\n def apply_log_level(self, apply_core=False):\n # silent takes precedence\n if self.silent:\n self.verbose = False\n self.debug = False\n if apply_core:\n self.core.logger.log_level = \"CRITICAL\"\n for key in (\"verbose\", \"debug\"):\n with suppress(omegaconf.errors.ConfigKeyError):\n del 
self.core.custom_config[key]\n else:\n # then debug\n if self.debug:\n self.verbose = False\n if apply_core:\n self.core.logger.log_level = \"DEBUG\"\n with suppress(omegaconf.errors.ConfigKeyError):\n del self.core.custom_config[\"verbose\"]\n else:\n # finally verbose\n if self.verbose and apply_core:\n self.core.logger.log_level = \"VERBOSE\"\n\n @property\n def helpers(self):\n if self._helpers is None:\n from bbot.core.helpers.helper import ConfigAwareHelper\n\n self._helpers = ConfigAwareHelper(preset=self)\n return self._helpers\n\n @property\n def module_loader(self):\n self.environ\n if self._module_loader is None:\n from bbot.core.modules import MODULE_LOADER\n\n self._module_loader = MODULE_LOADER\n self._module_loader.ensure_config_files()\n\n return self._module_loader\n\n @property\n def environ(self):\n if self._environ is None:\n from .environ import BBOTEnviron\n\n self._environ = BBOTEnviron(self)\n return self._environ\n\n @property\n def args(self):\n if self._args is None:\n from .args import BBOTArgs\n\n self._args = BBOTArgs(self)\n return self._args\n\n def in_scope(self, host):\n return self.target.in_scope(host)\n\n def blacklisted(self, host):\n return self.target.blacklisted(host)\n\n def whitelisted(self, host):\n return self.target.whitelisted(host)\n\n @classmethod\n def from_dict(cls, preset_dict, name=None, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a Python dictionary object.\n\n Args:\n preset_dict (dict): Preset in dictionary form\n name (str, optional): Name of preset\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. 
This will record which modules/flags are enabled, etc.\n\n Returns:\n Preset: The loaded preset\n\n Examples:\n >>> preset = Preset.from_dict({\"target\": [\"evilcorp.com\"], \"modules\": [\"portscan\"]})\n \"\"\"\n new_preset = cls(\n *preset_dict.get(\"target\", []),\n whitelist=preset_dict.get(\"whitelist\"),\n blacklist=preset_dict.get(\"blacklist\"),\n modules=preset_dict.get(\"modules\"),\n output_modules=preset_dict.get(\"output_modules\"),\n exclude_modules=preset_dict.get(\"exclude_modules\"),\n flags=preset_dict.get(\"flags\"),\n require_flags=preset_dict.get(\"require_flags\"),\n exclude_flags=preset_dict.get(\"exclude_flags\"),\n verbose=preset_dict.get(\"verbose\", False),\n debug=preset_dict.get(\"debug\", False),\n silent=preset_dict.get(\"silent\", False),\n config=preset_dict.get(\"config\"),\n strict_scope=preset_dict.get(\"strict_scope\", False),\n module_dirs=preset_dict.get(\"module_dirs\", []),\n include=list(preset_dict.get(\"include\", [])),\n scan_name=preset_dict.get(\"scan_name\"),\n output_dir=preset_dict.get(\"output_dir\"),\n name=preset_dict.get(\"name\", name),\n description=preset_dict.get(\"description\"),\n conditions=preset_dict.get(\"conditions\", []),\n _exclude=_exclude,\n _log=_log,\n )\n return new_preset\n\n def include_preset(self, filename):\n \"\"\"\n Load a preset from a yaml file and merge it into this one.\n\n If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Args:\n filename (Path): The preset YAML file to merge\n\n Examples:\n >>> preset.include_preset(\"/home/user/my_preset.yml\")\n \"\"\"\n self.log_debug(f'Including preset \"{filename}\"')\n preset_filename = PRESET_PATH.find(filename)\n preset_from_yaml = self.from_yaml_file(preset_filename, _exclude=self._preset_files_loaded)\n if preset_from_yaml is not False:\n self.merge(preset_from_yaml)\n self._preset_files_loaded.add(preset_filename)\n\n @classmethod\n def from_yaml_file(cls, 
filename, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a YAML file. If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Examples:\n >>> preset = Preset.from_yaml_file(\"/home/user/my_preset.yml\")\n \"\"\"\n filename = Path(filename).resolve()\n try:\n return _preset_cache[filename]\n except KeyError:\n if _exclude is None:\n _exclude = set()\n if _exclude is not None and filename in _exclude:\n log.debug(f\"Not loading {filename} because it was already loaded {_exclude}\")\n return False\n log.debug(f\"Loading {filename} because it's not in excluded list ({_exclude})\")\n _exclude = set(_exclude)\n _exclude.add(filename)\n try:\n yaml_str = open(filename).read()\n except FileNotFoundError:\n raise PresetNotFoundError(f'Could not find preset at \"{filename}\" - file does not exist')\n preset = cls.from_dict(\n omegaconf.OmegaConf.create(yaml_str), name=filename.stem, _exclude=_exclude, _log=_log\n )\n preset._yaml_str = yaml_str\n _preset_cache[filename] = preset\n return preset\n\n @classmethod\n def from_yaml_string(cls, yaml_preset):\n \"\"\"\n Create a preset from a YAML file. 
If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Examples:\n >>> yaml_string = '''\n >>> target:\n >>> - evilcorp.com\n >>> modules:\n >>> - portscan'''\n >>> preset = Preset.from_yaml_string(yaml_string)\n \"\"\"\n return cls.from_dict(omegaconf.OmegaConf.create(yaml_preset))\n\n def to_dict(self, include_target=False, full_config=False, redact_secrets=False):\n \"\"\"\n Convert this preset into a Python dictionary.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n\n Returns:\n dict: The preset in dictionary form\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> preset.to_dict()\n {\"flags\": [\"subdomain-enum\"], \"modules\": [\"portscan\"]}\n \"\"\"\n preset_dict = {}\n\n # config\n if full_config:\n config = self.core.config\n else:\n config = self.core.custom_config\n config = omegaconf.OmegaConf.to_object(config)\n if redact_secrets:\n config = self.core.no_secrets_config(config)\n if config:\n preset_dict[\"config\"] = config\n\n # scope\n if include_target:\n target = sorted(str(t.data) for t in self.target.seeds)\n whitelist = []\n if self.target.whitelist is not None:\n whitelist = sorted(str(t.data) for t in self.target.whitelist)\n blacklist = sorted(str(t.data) for t in self.target.blacklist)\n if target:\n preset_dict[\"target\"] = target\n if whitelist and whitelist != target:\n preset_dict[\"whitelist\"] = whitelist\n if blacklist:\n preset_dict[\"blacklist\"] = blacklist\n if self.strict_scope:\n preset_dict[\"strict_scope\"] = True\n\n # flags + modules\n if self.require_flags:\n preset_dict[\"require_flags\"] = sorted(self.require_flags)\n if self.exclude_flags:\n preset_dict[\"exclude_flags\"] = sorted(self.exclude_flags)\n if self.exclude_modules:\n 
preset_dict[\"exclude_modules\"] = sorted(self.exclude_modules)\n if self.flags:\n preset_dict[\"flags\"] = sorted(self.flags)\n if self.explicit_scan_modules:\n preset_dict[\"modules\"] = sorted(self.explicit_scan_modules)\n if self.explicit_output_modules:\n preset_dict[\"output_modules\"] = sorted(self.explicit_output_modules)\n\n # log verbosity\n if self.verbose:\n preset_dict[\"verbose\"] = True\n if self.debug:\n preset_dict[\"debug\"] = True\n if self.silent:\n preset_dict[\"silent\"] = True\n\n # misc scan options\n if self.scan_name:\n preset_dict[\"scan_name\"] = self.scan_name\n if self.scan_name:\n preset_dict[\"output_dir\"] = self.output_dir\n\n # conditions\n if self.conditions:\n preset_dict[\"conditions\"] = [c[-1] for c in self.conditions]\n\n return preset_dict\n\n def to_yaml(self, include_target=False, full_config=False, sort_keys=False):\n \"\"\"\n Return the preset in the form of a YAML string.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n sort_keys (bool, optional): If True, sort YAML keys alphabetically\n\n Returns:\n str: The preset in the form of a YAML string\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> print(preset.to_yaml())\n flags:\n - subdomain-enum\n modules:\n - portscan\n \"\"\"\n preset_dict = self.to_dict(include_target=include_target, full_config=full_config)\n return yaml.dump(preset_dict, sort_keys=sort_keys)\n\n def _is_valid_module(self, module, module_type, name_only=False, raise_error=True):\n if module_type == \"scan\":\n module_choices = self.module_loader.scan_module_choices\n elif module_type == \"output\":\n module_choices = self.module_loader.output_module_choices\n elif module_type == \"internal\":\n module_choices = self.module_loader.internal_module_choices\n else:\n raise 
ValidationError(f'Unknown module type \"{module}\"')\n\n if not module in module_choices:\n raise ValidationError(get_closest_match(module, module_choices, msg=f\"{module_type} module\"))\n\n try:\n preloaded = self.module_loader.preloaded()[module]\n except KeyError:\n raise ValidationError(f'Unknown module \"{module}\"')\n\n if name_only:\n return True, \"\", preloaded\n\n if module in self.exclude_modules:\n reason = \"the module has been excluded\"\n if raise_error:\n raise ValidationError(f'Unable to add {module_type} module \"{module}\" because {reason}')\n return False, reason, {}\n\n module_flags = preloaded.get(\"flags\", [])\n _module_type = preloaded.get(\"type\", \"scan\")\n if module_type:\n if _module_type != module_type:\n reason = f'its type ({_module_type}) is not \"{module_type}\"'\n if raise_error:\n raise ValidationError(f'Unable to add {module_type} module \"{module}\" because {reason}')\n return False, reason, preloaded\n\n if _module_type == \"scan\":\n if self.exclude_flags:\n for f in module_flags:\n if f in self.exclude_flags:\n return False, f'it has excluded flag, \"{f}\"', preloaded\n if self.require_flags and not all(f in module_flags for f in self.require_flags):\n return False, f'it doesn\\'t have the required flags ({\",\".join(self.require_flags)})', preloaded\n\n return True, \"\", preloaded\n\n def validate(self):\n \"\"\"\n Validate module/flag exclusions/requirements, and CLI config options if applicable.\n \"\"\"\n if self._cli:\n self.args.validate()\n\n # validate excluded modules\n for excluded_module in self.exclude_modules:\n if not excluded_module in self.module_loader.all_module_choices:\n raise ValidationError(\n get_closest_match(excluded_module, self.module_loader.all_module_choices, msg=\"module\")\n )\n # validate excluded flags\n for excluded_flag in self.exclude_flags:\n if not excluded_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(excluded_flag, 
self.module_loader.flag_choices, msg=\"flag\"))\n # validate required flags\n for required_flag in self.require_flags:\n if not required_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(required_flag, self.module_loader.flag_choices, msg=\"flag\"))\n # validate flags\n for flag in self.flags:\n if not flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(flag, self.module_loader.flag_choices, msg=\"flag\"))\n\n @property\n def all_presets(self):\n \"\"\"\n Recursively find all the presets and return them as a dictionary\n \"\"\"\n preset_dir = self.preset_dir\n home_dir = Path.home()\n\n # first, add local preset dir to PRESET_PATH\n PRESET_PATH.add_path(self.preset_dir)\n\n # ensure local preset directory exists\n mkdir(preset_dir)\n\n global DEFAULT_PRESETS\n if DEFAULT_PRESETS is None:\n presets = dict()\n for ext in (\"yml\", \"yaml\"):\n for preset_path in PRESET_PATH:\n # for every yaml file\n for original_filename in preset_path.rglob(f\"**/*.{ext}\"):\n # not including symlinks\n if original_filename.is_symlink():\n continue\n\n # try to load it as a preset\n try:\n loaded_preset = self.from_yaml_file(original_filename, _log=True)\n if loaded_preset is False:\n continue\n except Exception as e:\n log.warning(f'Failed to load preset at \"{original_filename}\": {e}')\n log.trace(traceback.format_exc())\n continue\n\n # category is the parent folder(s), if any\n category = str(original_filename.relative_to(preset_path).parent)\n if category == \".\":\n category = \"\"\n\n local_preset = original_filename\n # populate symlinks in local preset dir\n if not original_filename.is_relative_to(preset_dir):\n relative_preset = original_filename.relative_to(preset_path)\n local_preset = preset_dir / relative_preset\n mkdir(local_preset.parent, check_writable=False)\n if not local_preset.exists():\n local_preset.symlink_to(original_filename)\n\n # collapse home directory into \"~\"\n if 
local_preset.is_relative_to(home_dir):\n local_preset = Path(\"~\") / local_preset.relative_to(home_dir)\n\n presets[local_preset] = (loaded_preset, category, preset_path, original_filename)\n\n # sort by name\n DEFAULT_PRESETS = dict(sorted(presets.items(), key=lambda x: x[-1][0].name))\n return DEFAULT_PRESETS\n\n def presets_table(self, include_modules=True):\n \"\"\"\n Return a table of all the presets in the form of a string\n \"\"\"\n table = []\n header = [\"Preset\", \"Category\", \"Description\", \"# Modules\"]\n if include_modules:\n header.append(\"Modules\")\n for yaml_file, (loaded_preset, category, preset_path, original_file) in self.all_presets.items():\n loaded_preset = loaded_preset.bake()\n num_modules = f\"{len(loaded_preset.scan_modules):,}\"\n row = [loaded_preset.name, category, loaded_preset.description, num_modules]\n if include_modules:\n row.append(\", \".join(sorted(loaded_preset.scan_modules)))\n table.append(row)\n return make_table(table, header)\n\n def log_verbose(self, msg):\n if self._log:\n log.verbose(f\"Preset {self.name}: {msg}\")\n\n def log_debug(self, msg):\n if self._log:\n log.debug(f\"Preset {self.name}: {msg}\")\n
"},{"location":"dev/presets/#bbot.scanner.Preset.all_presets","title":"all_presets property
","text":"all_presets\n
Recursively find all the presets and return them as a dictionary
"},{"location":"dev/presets/#bbot.scanner.Preset.__init__","title":"__init__","text":"__init__(*targets, whitelist=None, blacklist=None, strict_scope=False, modules=None, output_modules=None, exclude_modules=None, flags=None, require_flags=None, exclude_flags=None, config=None, module_dirs=None, include=None, presets=None, output_dir=None, scan_name=None, name=None, description=None, conditions=None, force_start=False, verbose=False, debug=False, silent=False, _exclude=None, _log=True)\n
Initializes the Preset class.
Parameters:
*targets (str, default: ()) \u2013 Target(s) to scan. Types supported: hostnames, IPs, CIDRs, emails, open ports.
whitelist (list, default: None) \u2013 Whitelisted target(s) to scan. Defaults to the same as targets.
blacklist (list, default: None) \u2013 Blacklisted target(s). Takes ultimate precedence. Defaults to empty.
strict_scope (bool, default: False) \u2013 If True, subdomains of targets are not in-scope.
modules (list[str], default: None) \u2013 List of scan modules to enable for the scan. Defaults to empty list.
output_modules (list[str], default: None) \u2013 List of output modules to use. Defaults to python, csv, txt, and json.
exclude_modules (list[str], default: None) \u2013 List of modules to exclude from the scan.
require_flags (list[str], default: None) \u2013 Only enable modules if they have these flags.
exclude_flags (list[str], default: None) \u2013 Don't enable modules if they have any of these flags.
module_dirs (list[str], default: None) \u2013 Additional directories to load modules from.
config (dict, default: None) \u2013 Additional scan configuration settings.
include (list[str], default: None) \u2013 Names or filenames of other presets to include.
presets (list[str], default: None) \u2013 An alias for include.
output_dir (str or Path, default: None) \u2013 Directory to store scan output. Defaults to BBOT home directory (~/.bbot).
scan_name (str, default: None) \u2013 Human-readable name of the scan. If not specified, it will be random, e.g. \"demonic_jimmy\".
name (str, default: None) \u2013 Human-readable name of the preset. Used mainly for logging.
description (str, default: None) \u2013 Description of the preset.
conditions (list[str], default: None) \u2013 Custom conditions to be executed before scan start. Written in Jinja2.
force_start (bool, default: False) \u2013 If True, ignore conditional aborts and failed module setups. Just run the scan!
verbose (bool, default: False) \u2013 Set the BBOT logger to verbose mode.
debug (bool, default: False) \u2013 Set the BBOT logger to debug mode.
silent (bool, default: False) \u2013 Silence all stderr (effectively disables the BBOT logger).
_exclude (list[Path], default: None) \u2013 Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.
_log (bool, default: True) \u2013 Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.
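The module and flag arguments accept None, a single string, or a list, and the constructor normalizes each of them before storing it as a set. A minimal sketch of that normalization, assuming a hypothetical helper named `as_list` (the real constructor inlines these checks for each argument and does not define such a function):

```python
def as_list(value):
    # Hypothetical helper, not part of BBOT: Preset.__init__ repeats
    # these two checks inline for modules, flags, and the other lists.
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is treated as a single-item list.
        return [value]
    return list(value)


# Each normalized value is then stored as a set, as the constructor does
# for its explicit modules and flags.
for raw in (None, 'portscan', ['portscan', 'sslcert']):
    print(sorted(set(as_list(raw))))
```

Wrapping a bare string matters because passing it straight to `set()` would split it into individual characters.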
Source code in bbot/scanner/preset/preset.py
def __init__(\n self,\n *targets,\n whitelist=None,\n blacklist=None,\n strict_scope=False,\n modules=None,\n output_modules=None,\n exclude_modules=None,\n flags=None,\n require_flags=None,\n exclude_flags=None,\n config=None,\n module_dirs=None,\n include=None,\n presets=None,\n output_dir=None,\n scan_name=None,\n name=None,\n description=None,\n conditions=None,\n force_start=False,\n verbose=False,\n debug=False,\n silent=False,\n _exclude=None,\n _log=True,\n):\n \"\"\"\n Initializes the Preset class.\n\n Args:\n *targets (str): Target(s) to scan. Types supported: hostnames, IPs, CIDRs, emails, open ports.\n whitelist (list, optional): Whitelisted target(s) to scan. Defaults to the same as `targets`.\n blacklist (list, optional): Blacklisted target(s). Takes ultimate precedence. Defaults to empty.\n strict_scope (bool, optional): If True, subdomains of targets are not in-scope.\n modules (list[str], optional): List of scan modules to enable for the scan. Defaults to empty list.\n output_modules (list[str], optional): List of output modules to use. Defaults to csv, human, and json.\n exclude_modules (list[str], optional): List of modules to exclude from the scan.\n require_flags (list[str], optional): Only enable modules if they have these flags.\n exclude_flags (list[str], optional): Don't enable modules if they have any of these flags.\n module_dirs (list[str], optional): additional directories to load modules from.\n config (dict, optional): Additional scan configuration settings.\n include (list[str], optional): names or filenames of other presets to include.\n presets (list[str], optional): an alias for `include`.\n output_dir (str or Path, optional): Directory to store scan output. Defaults to BBOT home directory (`~/.bbot`).\n scan_name (str, optional): Human-readable name of the scan. If not specified, it will be random, e.g. \"demonic_jimmy\".\n name (str, optional): Human-readable name of the preset. 
Used mainly for logging.\n description (str, optional): Description of the preset.\n conditions (list[str], optional): Custom conditions to be executed before scan start. Written in Jinja2.\n force_start (bool, optional): If True, ignore conditional aborts and failed module setups. Just run the scan!\n verbose (bool, optional): Set the BBOT logger to verbose mode.\n debug (bool, optional): Set the BBOT logger to debug mode.\n silent (bool, optional): Silence all stderr (effectively disables the BBOT logger).\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.\n \"\"\"\n # internal variables\n self._cli = False\n self._log = _log\n self.scan = None\n self._args = None\n self._environ = None\n self._helpers = None\n self._module_loader = None\n self._yaml_str = \"\"\n self._baked = False\n\n self._default_output_modules = None\n self._default_internal_modules = None\n\n # modules / flags\n self.modules = set()\n self.exclude_modules = set()\n self.flags = set()\n self.exclude_flags = set()\n self.require_flags = set()\n\n # modules + flags\n if modules is None:\n modules = []\n if isinstance(modules, str):\n modules = [modules]\n if output_modules is None:\n output_modules = []\n if isinstance(output_modules, str):\n output_modules = [output_modules]\n if exclude_modules is None:\n exclude_modules = []\n if isinstance(exclude_modules, str):\n exclude_modules = [exclude_modules]\n if flags is None:\n flags = []\n if isinstance(flags, str):\n flags = [flags]\n if exclude_flags is None:\n exclude_flags = []\n if isinstance(exclude_flags, str):\n exclude_flags = [exclude_flags]\n if require_flags is None:\n require_flags = []\n if isinstance(require_flags, str):\n require_flags = [require_flags]\n\n # these are used only for preserving the 
modules as specified in the original preset\n # this is to ensure the preset looks the same when reserialized\n self.explicit_scan_modules = set() if modules is None else set(modules)\n self.explicit_output_modules = set() if output_modules is None else set(output_modules)\n\n # whether to force-start the scan (ignoring conditional aborts and failed module setups)\n self.force_start = force_start\n\n # scan output directory\n self.output_dir = output_dir\n # name of scan\n self.scan_name = scan_name\n\n # name of preset, default blank\n self.name = name or \"\"\n # preset description, default blank\n self.description = description or \"\"\n\n # custom conditions, evaluated during .bake()\n self.conditions = []\n if conditions is not None:\n for condition in conditions:\n self.conditions.append((self.name, condition))\n\n # keeps track of loaded preset files to prevent infinite circular inclusions\n self._preset_files_loaded = set()\n if _exclude is not None:\n for _filename in _exclude:\n self._preset_files_loaded.add(Path(_filename).resolve())\n\n # bbot core config\n self.core = CORE.copy()\n if config is None:\n config = omegaconf.OmegaConf.create({})\n # merge custom configs if specified by the user\n self.core.merge_custom(config)\n\n # log verbosity\n # actual log verbosity isn't set until .bake()\n self.verbose = verbose\n self.debug = debug\n self.silent = silent\n\n # custom module directories\n self._module_dirs = set()\n self.module_dirs = module_dirs\n\n # target / whitelist / blacklist\n self.strict_scope = strict_scope\n # these are temporary receptacles until they all get .baked() together\n self._seeds = set(targets if targets else [])\n self._whitelist = set(whitelist) if whitelist else whitelist\n self._blacklist = set(blacklist if blacklist else [])\n\n self._target = None\n\n # \"presets\" is alias to \"include\"\n if presets and include:\n raise ValueError(\n 'Cannot use both \"presets\" and \"include\" args at the same time (presets is only an 
alias to include). Please pick only one :)'\n )\n if presets and not include:\n include = presets\n # include other presets\n if include and not isinstance(include, (list, tuple, set)):\n include = [include]\n if include:\n for included_preset in include:\n self.include_preset(included_preset)\n\n # we don't fill self.modules yet (that happens in .bake())\n self.explicit_scan_modules.update(set(modules))\n self.explicit_output_modules.update(set(output_modules))\n self.exclude_modules.update(set(exclude_modules))\n self.flags.update(set(flags))\n self.exclude_flags.update(set(exclude_flags))\n self.require_flags.update(set(require_flags))\n
"},{"location":"dev/presets/#bbot.scanner.Preset.bake","title":"bake","text":"bake(scan=None)\n
Return a \"baked\" copy of this preset, ready for use by a BBOT scan.
Baking a preset finalizes it by populating preset.modules
based on flags, performing final validations, and substituting environment variables in preloaded modules. It also evaluates custom conditions
as specified in the preset.
This function is automatically called in Scanner.__init__(). There is no need to call it manually.
Source code in bbot/scanner/preset/preset.py
def bake(self, scan=None):\n \"\"\"\n Return a \"baked\" copy of this preset, ready for use by a BBOT scan.\n\n Baking a preset finalizes it by populating `preset.modules` based on flags,\n performing final validations, and substituting environment variables in preloaded modules.\n It also evaluates custom `conditions` as specified in the preset.\n\n This function is automatically called in Scanner.__init__(). There is no need to call it manually.\n \"\"\"\n self.log_debug(\"Getting baked\")\n # create a copy of self\n baked_preset = copy(self)\n baked_preset.scan = scan\n # copy core\n baked_preset.core = self.core.copy()\n # copy module loader\n baked_preset._module_loader = self.module_loader.copy()\n # prepare os environment\n os_environ = baked_preset.environ.prepare()\n # find and replace preloaded modules with os environ\n # this is different from the config variable substitution because it modifies\n # the preloaded modules, i.e. their ansible playbooks\n baked_preset.module_loader.find_and_replace(**os_environ)\n # update os environ\n os.environ.clear()\n os.environ.update(os_environ)\n\n # validate flags, config options\n baked_preset.validate()\n\n # validate log level options\n baked_preset.apply_log_level(apply_core=scan is not None)\n\n # assign baked preset to our scan\n if scan is not None:\n scan.preset = baked_preset\n\n # now that our requirements / exclusions are validated, we can start enabling modules\n # enable scan modules\n for module in baked_preset.explicit_scan_modules:\n baked_preset.add_module(module, module_type=\"scan\")\n\n # enable output modules\n output_modules_to_enable = set(baked_preset.explicit_output_modules)\n default_output_modules = self.default_output_modules\n output_module_override = any(m in default_output_modules for m in output_modules_to_enable)\n # if none of the default output modules have been explicitly specified, enable them all\n if not output_module_override:\n 
output_modules_to_enable.update(self.default_output_modules)\n for module in output_modules_to_enable:\n baked_preset.add_module(module, module_type=\"output\", raise_error=False)\n\n # enable internal modules\n for internal_module, preloaded in self.default_internal_modules.items():\n is_enabled = baked_preset.config.get(internal_module, True)\n is_excluded = internal_module in baked_preset.exclude_modules\n if is_enabled and not is_excluded:\n baked_preset.add_module(internal_module, module_type=\"internal\", raise_error=False)\n\n # disable internal modules if requested\n for internal_module in baked_preset.internal_modules:\n if baked_preset.config.get(internal_module, True) == False:\n baked_preset.exclude_modules.add(internal_module)\n\n # enable modules by flag\n for flag in baked_preset.flags:\n for module, preloaded in baked_preset.module_loader.preloaded().items():\n module_flags = preloaded.get(\"flags\", [])\n module_type = preloaded.get(\"type\", \"scan\")\n if flag in module_flags:\n self.log_debug(f'Enabling module \"{module}\" because it has flag \"{flag}\"')\n baked_preset.add_module(module, module_type, raise_error=False)\n\n # ensure we have output modules\n if not baked_preset.output_modules:\n for output_module in self.default_output_modules:\n baked_preset.add_module(output_module, module_type=\"output\", raise_error=False)\n\n # create target object\n from bbot.scanner.target import BBOTTarget\n\n baked_preset._target = BBOTTarget(\n *list(self._seeds),\n whitelist=self._whitelist,\n blacklist=self._blacklist,\n strict_scope=self.strict_scope,\n scan=scan,\n )\n\n # evaluate conditions\n if baked_preset.conditions:\n from .conditions import ConditionEvaluator\n\n evaluator = ConditionEvaluator(baked_preset)\n evaluator.evaluate()\n\n self._baked = True\n return baked_preset\n
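As a rough illustration of the flag-based step in bake(), the standalone sketch below selects every module whose flags include one of the requested flags. The function name modules_for_flags and the sample module metadata are hypothetical and not part of BBOT; in the real code, exclusions are handled by add_module() and validate(), not by a single helper.

```python
def modules_for_flags(preloaded, flags, exclude_modules=()):
    # Collect every module that carries at least one requested flag,
    # skipping anything that was explicitly excluded (assumed behavior).
    enabled = set()
    for flag in flags:
        for name, info in preloaded.items():
            if flag in info.get('flags', []) and name not in exclude_modules:
                enabled.add(name)
    return sorted(enabled)


# Hypothetical preloaded metadata, shaped like module_loader.preloaded()
preloaded = {
    'portscan': {'flags': ['active', 'portscan'], 'type': 'scan'},
    'sslcert': {'flags': ['active', 'subdomain-enum'], 'type': 'scan'},
    'crt': {'flags': ['passive', 'subdomain-enum'], 'type': 'scan'},
}

enabled = modules_for_flags(preloaded, {'subdomain-enum'}, exclude_modules={'crt'})
print(enabled)
```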
"},{"location":"dev/presets/#bbot.scanner.Preset.from_dict","title":"from_dict classmethod
","text":"from_dict(preset_dict, name=None, _exclude=None, _log=False)\n
Create a preset from a Python dictionary object.
Parameters:
preset_dict
(dict
) \u2013 Preset in dictionary form
name
(str
, default: None
) \u2013 Name of preset
_exclude
(list[Path]
, default: None
) \u2013 Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.
_log
(bool
, default: False
) \u2013 Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.
Returns:
Preset
\u2013 The loaded preset
Examples:
>>> preset = Preset.from_dict({\"target\": [\"evilcorp.com\"], \"modules\": [\"portscan\"]})\n
Source code in bbot/scanner/preset/preset.py
@classmethod\ndef from_dict(cls, preset_dict, name=None, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a Python dictionary object.\n\n Args:\n preset_dict (dict): Preset in dictionary form\n name (str, optional): Name of preset\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.\n\n Returns:\n Preset: The loaded preset\n\n Examples:\n >>> preset = Preset.from_dict({\"target\": [\"evilcorp.com\"], \"modules\": [\"portscan\"]})\n \"\"\"\n new_preset = cls(\n *preset_dict.get(\"target\", []),\n whitelist=preset_dict.get(\"whitelist\"),\n blacklist=preset_dict.get(\"blacklist\"),\n modules=preset_dict.get(\"modules\"),\n output_modules=preset_dict.get(\"output_modules\"),\n exclude_modules=preset_dict.get(\"exclude_modules\"),\n flags=preset_dict.get(\"flags\"),\n require_flags=preset_dict.get(\"require_flags\"),\n exclude_flags=preset_dict.get(\"exclude_flags\"),\n verbose=preset_dict.get(\"verbose\", False),\n debug=preset_dict.get(\"debug\", False),\n silent=preset_dict.get(\"silent\", False),\n config=preset_dict.get(\"config\"),\n strict_scope=preset_dict.get(\"strict_scope\", False),\n module_dirs=preset_dict.get(\"module_dirs\", []),\n include=list(preset_dict.get(\"include\", [])),\n scan_name=preset_dict.get(\"scan_name\"),\n output_dir=preset_dict.get(\"output_dir\"),\n name=preset_dict.get(\"name\", name),\n description=preset_dict.get(\"description\"),\n conditions=preset_dict.get(\"conditions\", []),\n _exclude=_exclude,\n _log=_log,\n )\n return new_preset\n
"},{"location":"dev/presets/#bbot.scanner.Preset.from_yaml_file","title":"from_yaml_file classmethod
","text":"from_yaml_file(filename, _exclude=None, _log=False)\n
Create a preset from a YAML file. If the full path is not specified, BBOT will look in all the usual places for it.
The file extension is optional.
Examples:
>>> preset = Preset.from_yaml_file(\"/home/user/my_preset.yml\")\n
Source code in bbot/scanner/preset/preset.py
@classmethod\ndef from_yaml_file(cls, filename, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a YAML file. If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Examples:\n >>> preset = Preset.from_yaml_file(\"/home/user/my_preset.yml\")\n \"\"\"\n filename = Path(filename).resolve()\n try:\n return _preset_cache[filename]\n except KeyError:\n if _exclude is None:\n _exclude = set()\n if _exclude is not None and filename in _exclude:\n log.debug(f\"Not loading {filename} because it was already loaded {_exclude}\")\n return False\n log.debug(f\"Loading {filename} because it's not in excluded list ({_exclude})\")\n _exclude = set(_exclude)\n _exclude.add(filename)\n try:\n yaml_str = open(filename).read()\n except FileNotFoundError:\n raise PresetNotFoundError(f'Could not find preset at \"{filename}\" - file does not exist')\n preset = cls.from_dict(\n omegaconf.OmegaConf.create(yaml_str), name=filename.stem, _exclude=_exclude, _log=_log\n )\n preset._yaml_str = yaml_str\n _preset_cache[filename] = preset\n return preset\n
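The _exclude argument is what stops two presets that include each other from recursing forever. A minimal sketch of that idea follows, using an in-memory dictionary instead of YAML files. The function load_with_includes and the presets mapping are assumptions made for illustration and do not exist in BBOT.

```python
def load_with_includes(name, presets, _exclude=None):
    # Copy the exclusion set so sibling branches do not affect each other.
    _exclude = set() if _exclude is None else set(_exclude)
    if name in _exclude:
        # Already being loaded further up the chain: skip it.
        return None
    _exclude.add(name)
    merged = set(presets[name].get('modules', []))
    for included in presets[name].get('include', []):
        child = load_with_includes(included, presets, _exclude)
        if child is not None:
            merged |= child
    return merged


# Two hypothetical presets that include each other
presets = {
    'a': {'modules': ['portscan'], 'include': ['b']},
    'b': {'modules': ['sslcert'], 'include': ['a']},
}

modules = sorted(load_with_includes('a', presets))
print(modules)
```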
"},{"location":"dev/presets/#bbot.scanner.Preset.from_yaml_string","title":"from_yaml_string classmethod
","text":"from_yaml_string(yaml_preset)\n
Create a preset from a YAML string.
Examples:
>>> yaml_string = '''\n>>> target:\n>>> - evilcorp.com\n>>> modules:\n>>> - portscan'''\n>>> preset = Preset.from_yaml_string(yaml_string)\n
Source code in bbot/scanner/preset/preset.py
@classmethod\ndef from_yaml_string(cls, yaml_preset):\n \"\"\"\n Create a preset from a YAML string.\n\n Examples:\n >>> yaml_string = '''\n >>> target:\n >>> - evilcorp.com\n >>> modules:\n >>> - portscan'''\n >>> preset = Preset.from_yaml_string(yaml_string)\n \"\"\"\n return cls.from_dict(omegaconf.OmegaConf.create(yaml_preset))\n
"},{"location":"dev/presets/#bbot.scanner.Preset.include_preset","title":"include_preset","text":"include_preset(filename)\n
Load a preset from a yaml file and merge it into this one.
If the full path is not specified, BBOT will look in all the usual places for it.
The file extension is optional.
Parameters:
filename
(Path
) \u2013 The preset YAML file to merge
Examples:
>>> preset.include_preset(\"/home/user/my_preset.yml\")\n
Source code in bbot/scanner/preset/preset.py
def include_preset(self, filename):\n \"\"\"\n Load a preset from a yaml file and merge it into this one.\n\n If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Args:\n filename (Path): The preset YAML file to merge\n\n Examples:\n >>> preset.include_preset(\"/home/user/my_preset.yml\")\n \"\"\"\n self.log_debug(f'Including preset \"{filename}\"')\n preset_filename = PRESET_PATH.find(filename)\n preset_from_yaml = self.from_yaml_file(preset_filename, _exclude=self._preset_files_loaded)\n if preset_from_yaml is not False:\n self.merge(preset_from_yaml)\n self._preset_files_loaded.add(preset_filename)\n
"},{"location":"dev/presets/#bbot.scanner.Preset.merge","title":"merge","text":"merge(other)\n
Merge another preset into this one.
If there are any config conflicts, other
will win over self
.
Parameters:
other
(Preset
) \u2013 The preset to merge into this one.
Examples:
>>> preset1 = Preset(modules=[\"portscan\"])\n>>> preset1.scan_modules\n['portscan']\n>>> preset2 = Preset(modules=[\"sslcert\"])\n>>> preset2.scan_modules\n['sslcert']\n>>> preset1.merge(preset2)\n>>> preset1.scan_modules\n['portscan', 'sslcert']\n
Source code in bbot/scanner/preset/preset.py
def merge(self, other):\n \"\"\"\n Merge another preset into this one.\n\n If there are any config conflicts, `other` will win over `self`.\n\n Args:\n other (Preset): The preset to merge into this one.\n\n Examples:\n >>> preset1 = Preset(modules=[\"portscan\"])\n >>> preset1.scan_modules\n ['portscan']\n >>> preset2 = Preset(modules=[\"sslcert\"])\n >>> preset2.scan_modules\n ['sslcert']\n >>> preset1.merge(preset2)\n >>> preset1.scan_modules\n ['portscan', 'sslcert']\n \"\"\"\n self.log_debug(f'Merging preset \"{other.name}\" into \"{self.name}\"')\n # config\n self.core.merge_custom(other.core.custom_config)\n self.module_loader.core = self.core\n # module dirs\n # modules + flags\n # establish requirements / exclusions first\n self.exclude_modules.update(other.exclude_modules)\n self.require_flags.update(other.require_flags)\n self.exclude_flags.update(other.exclude_flags)\n # then it's okay to start enabling modules\n self.explicit_scan_modules.update(other.explicit_scan_modules)\n self.explicit_output_modules.update(other.explicit_output_modules)\n self.flags.update(other.flags)\n\n # target / scope\n self._seeds.update(other._seeds)\n # leave whitelist as None until we encounter one\n if other._whitelist is not None:\n if self._whitelist is None:\n self._whitelist = set(other._whitelist)\n else:\n self._whitelist.update(other._whitelist)\n self._blacklist.update(other._blacklist)\n self.strict_scope = self.strict_scope or other.strict_scope\n\n # log verbosity\n if other.silent:\n self.silent = other.silent\n if other.verbose:\n self.verbose = other.verbose\n if other.debug:\n self.debug = other.debug\n # scan name\n if other.scan_name is not None:\n self.scan_name = other.scan_name\n if other.output_dir is not None:\n self.output_dir = other.output_dir\n # conditions\n if other.conditions:\n self.conditions.extend(other.conditions)\n # misc\n self.force_start = self.force_start | other.force_start\n self._cli = self._cli | other._cli\n
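Two details of merge() are easy to miss: the whitelist stays None until one of the presets actually defines one, and verbosity flags are only ever switched on, never off. The sketch below reproduces just those two rules on plain values. The helper names merge_whitelist and merge_flag are hypothetical.

```python
def merge_whitelist(mine, theirs):
    # Leave the whitelist as None until a preset actually provides one.
    if theirs is None:
        return mine
    if mine is None:
        return set(theirs)
    return set(mine) | set(theirs)


def merge_flag(mine, theirs):
    # A truthy value in the other preset wins; a falsy one changes nothing.
    return theirs if theirs else mine


print(merge_whitelist(None, None))
print(merge_whitelist(None, {'evilcorp.com'}))
print(merge_flag(True, False))
```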
"},{"location":"dev/presets/#bbot.scanner.Preset.parse_args","title":"parse_args","text":"parse_args()\n
Parse CLI arguments, and merge them into this preset.
Used in cli.py
.
Source code in bbot/scanner/preset/preset.py
def parse_args(self):\n \"\"\"\n Parse CLI arguments, and merge them into this preset.\n\n Used in `cli.py`.\n \"\"\"\n self._cli = True\n self.merge(self.args.preset_from_args())\n
"},{"location":"dev/presets/#bbot.scanner.Preset.presets_table","title":"presets_table","text":"presets_table(include_modules=True)\n
Return a table of all the presets in the form of a string
Source code in bbot/scanner/preset/preset.py
def presets_table(self, include_modules=True):\n \"\"\"\n Return a table of all the presets in the form of a string\n \"\"\"\n table = []\n header = [\"Preset\", \"Category\", \"Description\", \"# Modules\"]\n if include_modules:\n header.append(\"Modules\")\n for yaml_file, (loaded_preset, category, preset_path, original_file) in self.all_presets.items():\n loaded_preset = loaded_preset.bake()\n num_modules = f\"{len(loaded_preset.scan_modules):,}\"\n row = [loaded_preset.name, category, loaded_preset.description, num_modules]\n if include_modules:\n row.append(\", \".join(sorted(loaded_preset.scan_modules)))\n table.append(row)\n return make_table(table, header)\n
"},{"location":"dev/presets/#bbot.scanner.Preset.to_dict","title":"to_dict","text":"to_dict(include_target=False, full_config=False, redact_secrets=False)\n
Convert this preset into a Python dictionary.
Parameters:
include_target
(bool
, default: False
) \u2013 If True, include target, whitelist, and blacklist in the dictionary
full_config
(bool
, default: False
) \u2013 If True, include the entire config, not just what's changed from the defaults.
Returns:
dict
\u2013 The preset in dictionary form
Examples:
>>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n>>> preset.to_dict()\n{\"flags\": [\"subdomain-enum\"], \"modules\": [\"portscan\"]}\n
Source code in bbot/scanner/preset/preset.py
def to_dict(self, include_target=False, full_config=False, redact_secrets=False):\n \"\"\"\n Convert this preset into a Python dictionary.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n\n Returns:\n dict: The preset in dictionary form\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> preset.to_dict()\n {\"flags\": [\"subdomain-enum\"], \"modules\": [\"portscan\"]}\n \"\"\"\n preset_dict = {}\n\n # config\n if full_config:\n config = self.core.config\n else:\n config = self.core.custom_config\n config = omegaconf.OmegaConf.to_object(config)\n if redact_secrets:\n config = self.core.no_secrets_config(config)\n if config:\n preset_dict[\"config\"] = config\n\n # scope\n if include_target:\n target = sorted(str(t.data) for t in self.target.seeds)\n whitelist = []\n if self.target.whitelist is not None:\n whitelist = sorted(str(t.data) for t in self.target.whitelist)\n blacklist = sorted(str(t.data) for t in self.target.blacklist)\n if target:\n preset_dict[\"target\"] = target\n if whitelist and whitelist != target:\n preset_dict[\"whitelist\"] = whitelist\n if blacklist:\n preset_dict[\"blacklist\"] = blacklist\n if self.strict_scope:\n preset_dict[\"strict_scope\"] = True\n\n # flags + modules\n if self.require_flags:\n preset_dict[\"require_flags\"] = sorted(self.require_flags)\n if self.exclude_flags:\n preset_dict[\"exclude_flags\"] = sorted(self.exclude_flags)\n if self.exclude_modules:\n preset_dict[\"exclude_modules\"] = sorted(self.exclude_modules)\n if self.flags:\n preset_dict[\"flags\"] = sorted(self.flags)\n if self.explicit_scan_modules:\n preset_dict[\"modules\"] = sorted(self.explicit_scan_modules)\n if self.explicit_output_modules:\n preset_dict[\"output_modules\"] = sorted(self.explicit_output_modules)\n\n # log verbosity\n if 
self.verbose:\n preset_dict[\"verbose\"] = True\n if self.debug:\n preset_dict[\"debug\"] = True\n if self.silent:\n preset_dict[\"silent\"] = True\n\n # misc scan options\n if self.scan_name:\n preset_dict[\"scan_name\"] = self.scan_name\n # only emit output_dir when one is actually set\n if self.output_dir:\n preset_dict[\"output_dir\"] = self.output_dir\n\n # conditions\n if self.conditions:\n preset_dict[\"conditions\"] = [c[-1] for c in self.conditions]\n\n return preset_dict\n
"},{"location":"dev/presets/#bbot.scanner.Preset.to_yaml","title":"to_yaml","text":"to_yaml(include_target=False, full_config=False, sort_keys=False)\n
Return the preset in the form of a YAML string.
Parameters:
include_target
(bool
, default: False
) \u2013 If True, include target, whitelist, and blacklist in the dictionary
full_config
(bool
, default: False
) \u2013 If True, include the entire config, not just what's changed from the defaults.
sort_keys
(bool
, default: False
) \u2013 If True, sort YAML keys alphabetically
Returns:
str
\u2013 The preset in the form of a YAML string
Examples:
>>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n>>> print(preset.to_yaml())\nflags:\n- subdomain-enum\nmodules:\n- portscan\n
Source code in bbot/scanner/preset/preset.py
def to_yaml(self, include_target=False, full_config=False, sort_keys=False):\n \"\"\"\n Return the preset in the form of a YAML string.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n sort_keys (bool, optional): If True, sort YAML keys alphabetically\n\n Returns:\n str: The preset in the form of a YAML string\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> print(preset.to_yaml())\n flags:\n - subdomain-enum\n modules:\n - portscan\n \"\"\"\n preset_dict = self.to_dict(include_target=include_target, full_config=full_config)\n return yaml.dump(preset_dict, sort_keys=sort_keys)\n
"},{"location":"dev/presets/#bbot.scanner.Preset.validate","title":"validate","text":"validate()\n
Validate module/flag exclusions/requirements, and CLI config options if applicable.
Source code in bbot/scanner/preset/preset.py
def validate(self):\n \"\"\"\n Validate module/flag exclusions/requirements, and CLI config options if applicable.\n \"\"\"\n if self._cli:\n self.args.validate()\n\n # validate excluded modules\n for excluded_module in self.exclude_modules:\n if not excluded_module in self.module_loader.all_module_choices:\n raise ValidationError(\n get_closest_match(excluded_module, self.module_loader.all_module_choices, msg=\"module\")\n )\n # validate excluded flags\n for excluded_flag in self.exclude_flags:\n if not excluded_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(excluded_flag, self.module_loader.flag_choices, msg=\"flag\"))\n # validate required flags\n for required_flag in self.require_flags:\n if not required_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(required_flag, self.module_loader.flag_choices, msg=\"flag\"))\n # validate flags\n for flag in self.flags:\n if not flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(flag, self.module_loader.flag_choices, msg=\"flag\"))\n
"},{"location":"dev/scanner/","title":"Scanner","text":""},{"location":"dev/scanner/#bbot.scanner.Scanner","title":"Scanner","text":"A class representing a single BBOT scan
Examples:
Create scan with multiple targets:
>>> my_scan = Scanner(\"evilcorp.com\", \"1.2.3.0/24\", modules=[\"portscan\", \"sslcert\", \"httpx\"])\n
Create scan with custom config:
>>> config = {\"http_proxy\": \"http://127.0.0.1:8080\", \"modules\": {\"portscan\": {\"top_ports\": 2000}}}\n>>> my_scan = Scanner(\"www.evilcorp.com\", modules=[\"portscan\", \"httpx\"], config=config)\n
Start the scan, iterating over events as they're discovered (synchronous):
>>> for event in my_scan.start():\n>>> print(event)\n
Start the scan, iterating over events as they're discovered (asynchronous):
>>> async for event in my_scan.async_start():\n>>> print(event)\n
Start the scan without consuming events (synchronous):
>>> my_scan.start_without_generator()\n
Start the scan without consuming events (asynchronous):
>>> await my_scan.async_start_without_generator()\n
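The synchronous start() is a thin wrapper that drives the asynchronous generator from regular code. The sketch below shows one way such a wrapper can be written with only the standard library. It is an illustrative assumption, not BBOT's actual async_to_sync_gen helper, and fake_scan_events is a stand-in for a real scan.

```python
import asyncio


def async_to_sync_gen_sketch(async_gen):
    # Drive an async generator from synchronous code, one item at a time.
    loop = asyncio.new_event_loop()
    try:
        while True:
            try:
                yield loop.run_until_complete(async_gen.__anext__())
            except StopAsyncIteration:
                break
    finally:
        # Make sure the async generator is finalized before closing the loop.
        loop.run_until_complete(async_gen.aclose())
        loop.close()


async def fake_scan_events():
    # Stand-in for Scanner.async_start(): yields three placeholder events.
    for i in range(3):
        await asyncio.sleep(0)
        yield f'EVENT_{i}'


events = list(async_to_sync_gen_sketch(fake_scan_events()))
print(events)
```

Creating a private event loop keeps the wrapper usable from code that has no running loop, at the cost of not being callable from inside one.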
Attributes:
status
(str
) \u2013 Status of scan, representing its current state. It can take on the following string values, each of which is mapped to an integer code in _status_codes
:
- \"NOT_STARTED\" (0): Initial status before the scan starts.\n- \"STARTING\" (1): Status when the scan is initializing.\n- \"RUNNING\" (2): Status when the scan is in progress.\n- \"FINISHING\" (3): Status when the scan is in the process of finalizing.\n- \"CLEANING_UP\" (4): Status when the scan is cleaning up resources.\n- \"ABORTING\" (5): Status when the scan is in the process of being aborted.\n- \"ABORTED\" (6): Status when the scan has been aborted.\n- \"FAILED\" (7): Status when the scan has encountered a failure.\n- \"FINISHED\" (8): Status when the scan has successfully completed.\n
_status_code
(int
) \u2013 The numerical representation of the current scan status, stored for internal use. It is mapped according to the values in _status_codes
.
target
(Target
) \u2013 Target of scan (alias to self.preset.target
).
preset
(Preset
) \u2013 The main scan Preset in its baked form.
config
(DictConfig
) \u2013 BBOT config (alias to self.preset.config
).
whitelist
(Target
) \u2013 Scan whitelist (by default this is the same as target
) (alias to self.preset.whitelist
).
blacklist
(Target
) \u2013 Scan blacklist (this takes ultimate precedence) (alias to self.preset.blacklist
).
helpers
(ConfigAwareHelper
) \u2013 Helper containing various reusable functions, regexes, etc. (alias to self.preset.helpers
).
output_dir
(Path
) \u2013 Output directory for scan (alias to self.preset.output_dir
).
name
(str
) \u2013 Name of scan (alias to self.preset.scan_name
).
dispatcher
(Dispatcher
) \u2013 Triggers certain events when the scan status
changes.
modules
(dict
) \u2013 Holds all loaded modules in this format: {\"module_name\": Module()}
.
stats
(ScanStats
) \u2013 Holds high-level scan statistics such as how many events have been produced and consumed by each module.
home
(Path
) \u2013 Base output directory of the scan (default: ~/.bbot/scans/<scan_name>
).
running
(bool
) \u2013 Whether the scan is currently running.
stopping
(bool
) \u2013 Whether the scan is currently stopping.
stopped
(bool
) \u2013 Whether the scan is currently stopped.
aborting
(bool
) \u2013 Whether the scan is aborted or currently aborting.
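Notes:
The status is read-only once set to \"ABORTING\" until it transitions to \"ABORTED\".
Invalid statuses are logged but not applied.
Setting a status will trigger the on_status
event in the dispatcher.
Source code in bbot/scanner/scanner.py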
class Scanner:\n \"\"\"A class representing a single BBOT scan\n\n Examples:\n Create scan with multiple targets:\n >>> my_scan = Scanner(\"evilcorp.com\", \"1.2.3.0/24\", modules=[\"portscan\", \"sslcert\", \"httpx\"])\n\n Create scan with custom config:\n >>> config = {\"http_proxy\": \"http://127.0.0.1:8080\", \"modules\": {\"portscan\": {\"top_ports\": 2000}}}\n >>> my_scan = Scanner(\"www.evilcorp.com\", modules=[\"portscan\", \"httpx\"], config=config)\n\n Start the scan, iterating over events as they're discovered (synchronous):\n >>> for event in my_scan.start():\n >>> print(event)\n\n Start the scan, iterating over events as they're discovered (asynchronous):\n >>> async for event in my_scan.async_start():\n >>> print(event)\n\n Start the scan without consuming events (synchronous):\n >>> my_scan.start_without_generator()\n\n Start the scan without consuming events (asynchronous):\n >>> await my_scan.async_start_without_generator()\n\n Attributes:\n status (str): Status of scan, representing its current state. It can take on the following string values, each of which is mapped to an integer code in `_status_codes`:\n ```markdown\n - \"NOT_STARTED\" (0): Initial status before the scan starts.\n - \"STARTING\" (1): Status when the scan is initializing.\n - \"RUNNING\" (2): Status when the scan is in progress.\n - \"FINISHING\" (3): Status when the scan is in the process of finalizing.\n - \"CLEANING_UP\" (4): Status when the scan is cleaning up resources.\n - \"ABORTING\" (5): Status when the scan is in the process of being aborted.\n - \"ABORTED\" (6): Status when the scan has been aborted.\n - \"FAILED\" (7): Status when the scan has encountered a failure.\n - \"FINISHED\" (8): Status when the scan has successfully completed.\n ```\n _status_code (int): The numerical representation of the current scan status, stored for internal use. 
It is mapped according to the values in `_status_codes`.\n target (Target): Target of scan (alias to `self.preset.target`).\n preset (Preset): The main scan Preset in its baked form.\n config (omegaconf.dictconfig.DictConfig): BBOT config (alias to `self.preset.config`).\n whitelist (Target): Scan whitelist (by default this is the same as `target`) (alias to `self.preset.whitelist`).\n blacklist (Target): Scan blacklist (this takes ultimate precedence) (alias to `self.preset.blacklist`).\n helpers (ConfigAwareHelper): Helper containing various reusable functions, regexes, etc. (alias to `self.preset.helpers`).\n output_dir (pathlib.Path): Output directory for scan (alias to `self.preset.output_dir`).\n name (str): Name of scan (alias to `self.preset.scan_name`).\n dispatcher (Dispatcher): Triggers certain events when the scan `status` changes.\n modules (dict): Holds all loaded modules in this format: `{\"module_name\": Module()}`.\n stats (ScanStats): Holds high-level scan statistics such as how many events have been produced and consumed by each module.\n home (pathlib.Path): Base output directory of the scan (default: `~/.bbot/scans/<scan_name>`).\n running (bool): Whether the scan is currently running.\n stopping (bool): Whether the scan is currently stopping.\n stopped (bool): Whether the scan is currently stopped.\n aborting (bool): Whether the scan is aborted or currently aborting.\n\n Notes:\n - The status is read-only once set to \"ABORTING\" until it transitions to \"ABORTED.\"\n - Invalid statuses are logged but not applied.\n - Setting a status will trigger the `on_status` event in the dispatcher.\n \"\"\"\n\n _status_codes = {\n \"NOT_STARTED\": 0,\n \"STARTING\": 1,\n \"RUNNING\": 2,\n \"FINISHING\": 3,\n \"CLEANING_UP\": 4,\n \"ABORTING\": 5,\n \"ABORTED\": 6,\n \"FAILED\": 7,\n \"FINISHED\": 8,\n }\n\n def __init__(\n self,\n *targets,\n scan_id=None,\n dispatcher=None,\n **kwargs,\n ):\n \"\"\"\n Initializes the Scanner class.\n\n If a premade 
`preset` is specified, it will be used for the scan.\n Otherwise, `Scanner` accepts the same arguments as `Preset`, which are passed through and used to create a new preset.\n\n Args:\n *targets (list[str], optional): Scan targets (passed through to `Preset`).\n preset (Preset, optional): Preset to use for the scan.\n scan_id (str, optional): Unique identifier for the scan. Auto-generates if None.\n dispatcher (Dispatcher, optional): Dispatcher object to use. Defaults to new Dispatcher.\n **kwargs (list[str], optional): Additional keyword arguments (passed through to `Preset`).\n \"\"\"\n if scan_id is not None:\n self.id = str(scan_id)\n else:\n self.id = f\"SCAN:{sha1(rand_string(20)).hexdigest()}\"\n\n preset = kwargs.pop(\"preset\", None)\n kwargs[\"_log\"] = True\n\n from .preset import Preset\n\n if preset is None:\n preset = Preset(*targets, **kwargs)\n else:\n if not isinstance(preset, Preset):\n raise ValidationError(f'Preset must be of type Preset, not \"{type(preset).__name__}\"')\n self.preset = preset.bake(self)\n\n # scan name\n if preset.scan_name is None:\n tries = 0\n while 1:\n if tries > 5:\n scan_name = f\"{rand_string(4)}_{rand_string(4)}\"\n break\n scan_name = random_name()\n if self.preset.output_dir is not None:\n home_path = Path(self.preset.output_dir).resolve() / scan_name\n else:\n home_path = self.preset.bbot_home / \"scans\" / scan_name\n if not home_path.exists():\n break\n tries += 1\n else:\n scan_name = str(preset.scan_name)\n self.name = scan_name\n\n # scan output dir\n if preset.output_dir is not None:\n self.home = Path(preset.output_dir).resolve() / self.name\n else:\n self.home = self.preset.bbot_home / \"scans\" / self.name\n\n self._status = \"NOT_STARTED\"\n self._status_code = 0\n\n self.modules = OrderedDict({})\n self._modules_loaded = False\n self.dummy_modules = {}\n\n if dispatcher is None:\n from .dispatcher import Dispatcher\n\n self.dispatcher = Dispatcher()\n else:\n self.dispatcher = dispatcher\n 
self.dispatcher.set_scan(self)\n\n # scope distance\n self.scope_config = self.config.get(\"scope\", {})\n self.scope_search_distance = max(0, int(self.scope_config.get(\"search_distance\", 0)))\n self.scope_report_distance = int(self.scope_config.get(\"report_distance\", 1))\n\n # web config\n self.web_config = self.config.get(\"web\", {})\n self.web_spider_distance = self.web_config.get(\"spider_distance\", 0)\n self.web_spider_depth = self.web_config.get(\"spider_depth\", 1)\n self.web_spider_links_per_page = self.web_config.get(\"spider_links_per_page\", 20)\n max_redirects = self.web_config.get(\"http_max_redirects\", 5)\n self.web_max_redirects = max(max_redirects, self.web_spider_distance)\n self.http_proxy = self.web_config.get(\"http_proxy\", \"\")\n self.http_timeout = self.web_config.get(\"http_timeout\", 10)\n self.httpx_timeout = self.web_config.get(\"httpx_timeout\", 5)\n self.http_retries = self.web_config.get(\"http_retries\", 1)\n self.httpx_retries = self.web_config.get(\"httpx_retries\", 1)\n self.useragent = self.web_config.get(\"user_agent\", \"BBOT\")\n # custom HTTP headers warning\n self.custom_http_headers = self.web_config.get(\"http_headers\", {})\n if self.custom_http_headers:\n self.warning(\n \"You have enabled custom HTTP headers. 
These will be attached to all in-scope requests and all requests made by httpx.\"\n )\n\n # url file extensions\n self.url_extension_blacklist = set(e.lower() for e in self.config.get(\"url_extension_blacklist\", []))\n self.url_extension_httpx_only = set(e.lower() for e in self.config.get(\"url_extension_httpx_only\", []))\n\n # url querystring behavior\n self.url_querystring_remove = self.config.get(\"url_querystring_remove\", True)\n\n # blob inclusion\n self._file_blobs = self.config.get(\"file_blobs\", False)\n self._folder_blobs = self.config.get(\"folder_blobs\", False)\n\n # how often to print scan status\n self.status_frequency = self.config.get(\"status_frequency\", 15)\n\n from .stats import ScanStats\n\n self.stats = ScanStats(self)\n\n self._prepped = False\n self._finished_init = False\n self._new_activity = False\n self._cleanedup = False\n self._omitted_event_types = None\n\n self.__loop = None\n self._manager_worker_loop_tasks = []\n self.init_events_task = None\n self.ticker_task = None\n self.dispatcher_tasks = []\n\n self._stopping = False\n\n self._dns_strings = None\n self._dns_regexes = None\n\n self.__log_handlers = None\n self._log_handler_backup = []\n\n async def _prep(self):\n \"\"\"\n Creates the scan's output folder, loads its modules, and calls their .setup() methods.\n \"\"\"\n\n self.helpers.mkdir(self.home)\n if not self._prepped:\n # save scan preset\n with open(self.home / \"preset.yml\", \"w\") as f:\n f.write(self.preset.to_yaml())\n\n # log scan overview\n start_msg = f\"Scan with {len(self.preset.scan_modules):,} modules seeded with {len(self.target):,} targets\"\n details = []\n if self.whitelist != self.target:\n details.append(f\"{len(self.whitelist):,} in whitelist\")\n if self.blacklist:\n details.append(f\"{len(self.blacklist):,} in blacklist\")\n if details:\n start_msg += f\" ({', '.join(details)})\"\n self.hugeinfo(start_msg)\n\n # load scan modules (this imports and instantiates them)\n # up to this point they were 
only preloaded\n await self.load_modules()\n\n # run each module's .setup() method\n succeeded, hard_failed, soft_failed = await self.setup_modules()\n\n # intercept modules get sewn together like human centipede\n self.intercept_modules = [m for m in self.modules.values() if m._intercept]\n for i, intercept_module in enumerate(self.intercept_modules[1:]):\n prev_intercept_module = self.intercept_modules[i]\n self.debug(\n f\"Setting intercept module {intercept_module.name}._incoming_event_queue to previous intercept module {prev_intercept_module.name}.outgoing_event_queue\"\n )\n intercept_module._incoming_event_queue = prev_intercept_module.outgoing_event_queue\n\n # abort if there are no output modules\n num_output_modules = len([m for m in self.modules.values() if m._type == \"output\"])\n if num_output_modules < 1:\n raise ScanError(\"Failed to load output modules. Aborting.\")\n # abort if any of the module .setup()s hard-failed (i.e. they errored or returned False)\n total_failed = len(hard_failed + soft_failed)\n if hard_failed:\n msg = f\"Setup hard-failed for {len(hard_failed):,} modules ({','.join(hard_failed)})\"\n self._fail_setup(msg)\n\n total_modules = total_failed + len(self.modules)\n success_msg = f\"Setup succeeded for {len(self.modules):,}/{total_modules:,} modules.\"\n\n self.success(success_msg)\n self._prepped = True\n\n def start(self):\n for event in async_to_sync_gen(self.async_start()):\n yield event\n\n def start_without_generator(self):\n for event in async_to_sync_gen(self.async_start()):\n pass\n\n async def async_start_without_generator(self):\n async for event in self.async_start():\n pass\n\n async def async_start(self):\n \"\"\" \"\"\"\n failed = True\n scan_start_time = datetime.now()\n try:\n await self._prep()\n\n self._start_log_handlers()\n self.trace(f'Ran BBOT {__version__} at {scan_start_time}, command: {\" \".join(sys.argv)}')\n self.trace(f\"Target: {self.preset.target.json}\")\n self.trace(f\"Preset: 
{self.preset.to_dict(redact_secrets=True)}\")\n\n if not self.target:\n self.warning(f\"No scan targets specified\")\n\n # start status ticker\n self.ticker_task = asyncio.create_task(\n self._status_ticker(self.status_frequency), name=f\"{self.name}._status_ticker()\"\n )\n\n self.status = \"STARTING\"\n\n if not self.modules:\n self.error(f\"No modules loaded\")\n self.status = \"FAILED\"\n return\n else:\n self.hugesuccess(f\"Starting scan {self.name}\")\n\n await self.dispatcher.on_start(self)\n\n self.status = \"RUNNING\"\n self._start_modules()\n self.verbose(f\"{len(self.modules):,} modules started\")\n\n # distribute seed events\n self.init_events_task = asyncio.create_task(\n self.ingress_module.init_events(self.target.events), name=f\"{self.name}.ingress_module.init_events()\"\n )\n\n # main scan loop\n while 1:\n # abort if we're aborting\n if self.aborting:\n self._drain_queues()\n break\n\n # yield events as they come (async for event in scan.async_start())\n if \"python\" in self.modules:\n events, finish = await self.modules[\"python\"]._events_waiting(batch_size=-1)\n for e in events:\n yield e\n if events:\n continue\n\n # break if initialization finished and the scan is no longer active\n if self._finished_init and self.modules_finished:\n new_activity = await self.finish()\n if not new_activity:\n break\n\n await asyncio.sleep(0.1)\n\n failed = False\n\n except BaseException as e:\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.stop()\n failed = False\n else:\n try:\n raise\n except ScanError as e:\n self.error(f\"{e}\")\n\n except BBOTError as e:\n self.critical(f\"Error during scan: {e}\")\n\n except Exception:\n self.critical(f\"Unexpected error during scan:\\n{traceback.format_exc()}\")\n\n finally:\n tasks = self._cancel_tasks()\n self.debug(f\"Awaiting {len(tasks):,} tasks\")\n for task in tasks:\n # self.debug(f\"Awaiting {task}\")\n with contextlib.suppress(BaseException):\n await 
asyncio.wait_for(task, timeout=0.1)\n self.debug(f\"Awaited {len(tasks):,} tasks\")\n await self._report()\n await self._cleanup()\n\n log_fn = self.hugesuccess\n if self.status == \"ABORTING\":\n self.status = \"ABORTED\"\n log_fn = self.hugewarning\n elif failed:\n self.status = \"FAILED\"\n log_fn = self.critical\n else:\n self.status = \"FINISHED\"\n\n scan_run_time = datetime.now() - scan_start_time\n scan_run_time = self.helpers.human_timedelta(scan_run_time)\n log_fn(f\"Scan {self.name} completed in {scan_run_time} with status {self.status}\")\n\n await self.dispatcher.on_finish(self)\n\n self._stop_log_handlers()\n\n def _start_modules(self):\n self.verbose(f\"Starting module worker loops\")\n for module in self.modules.values():\n module.start()\n\n async def setup_modules(self, remove_failed=True):\n \"\"\"Asynchronously initializes all loaded modules by invoking their `setup()` methods.\n\n Args:\n remove_failed (bool): Flag indicating whether to remove modules that fail setup.\n\n Returns:\n tuple:\n succeeded - List of modules that successfully set up.\n hard_failed - List of modules that encountered a hard failure during setup.\n soft_failed - List of modules that encountered a soft failure during setup.\n\n Raises:\n ScanError: If no output modules could be loaded.\n\n Notes:\n Hard-failed modules are set to an error state and removed if `remove_failed` is True.\n Soft-failed modules are not set to an error state but are also removed if `remove_failed` is True.\n \"\"\"\n await self.load_modules()\n self.verbose(f\"Setting up modules\")\n succeeded = []\n hard_failed = []\n soft_failed = []\n\n async for task in self.helpers.as_completed([m._setup() for m in self.modules.values()]):\n module, status, msg = await task\n if status == True:\n self.debug(f\"Setup succeeded for {module.name} ({msg})\")\n succeeded.append(module.name)\n elif status == False:\n self.warning(f\"Setup hard-failed for {module.name}: {msg}\")\n 
self.modules[module.name].set_error_state()\n hard_failed.append(module.name)\n else:\n self.info(f\"Setup soft-failed for {module.name}: {msg}\")\n soft_failed.append(module.name)\n if (not status) and (module._intercept or remove_failed):\n # if a intercept module fails setup, we always remove it\n self.modules.pop(module.name)\n\n return succeeded, hard_failed, soft_failed\n\n async def load_modules(self):\n \"\"\"Asynchronously import and instantiate all scan modules, including internal and output modules.\n\n This method is automatically invoked by `setup_modules()`. It performs several key tasks in the following sequence:\n\n 1. Install dependencies for each module via `self.helpers.depsinstaller.install()`.\n 2. Load scan modules and updates the `modules` dictionary.\n 3. Load internal modules and updates the `modules` dictionary.\n 4. Load output modules and updates the `modules` dictionary.\n 5. Sorts modules based on their `_priority` attribute.\n\n If any modules fail to load or their dependencies fail to install, a ScanError will be raised (unless `self.force_start` is True).\n\n Attributes:\n succeeded, failed (tuple): A tuple containing lists of modules that succeeded or failed during the dependency installation.\n loaded_modules, loaded_internal_modules, loaded_output_modules (dict): Dictionaries of successfully loaded modules.\n failed, failed_internal, failed_output (list): Lists of module names that failed to load.\n\n Raises:\n ScanError: If any module dependencies fail to install or modules fail to load, and if `self.force_start` is False.\n\n Returns:\n None\n\n Note:\n After all modules are loaded, they are sorted by `_priority` and stored in the `modules` dictionary.\n \"\"\"\n if not self._modules_loaded:\n if not self.preset.modules:\n self.warning(f\"No modules to load\")\n return\n\n if not self.preset.scan_modules:\n self.warning(f\"No scan modules to load\")\n\n # install module dependencies\n succeeded, failed = await 
self.helpers.depsinstaller.install(*self.preset.modules)\n if failed:\n msg = f\"Failed to install dependencies for {len(failed):,} modules: {','.join(failed)}\"\n self._fail_setup(msg)\n modules = sorted([m for m in self.preset.scan_modules if m in succeeded])\n output_modules = sorted([m for m in self.preset.output_modules if m in succeeded])\n internal_modules = sorted([m for m in self.preset.internal_modules if m in succeeded])\n\n # Load scan modules\n self.verbose(f\"Loading {len(modules):,} scan modules: {','.join(modules)}\")\n loaded_modules, failed = self._load_modules(modules)\n self.modules.update(loaded_modules)\n if len(failed) > 0:\n msg = f\"Failed to load {len(failed):,} scan modules: {','.join(failed)}\"\n self._fail_setup(msg)\n if loaded_modules:\n self.info(\n f\"Loaded {len(loaded_modules):,}/{len(self.preset.scan_modules):,} scan modules ({','.join(loaded_modules)})\"\n )\n\n # Load internal modules\n self.verbose(f\"Loading {len(internal_modules):,} internal modules: {','.join(internal_modules)}\")\n loaded_internal_modules, failed_internal = self._load_modules(internal_modules)\n self.modules.update(loaded_internal_modules)\n if len(failed_internal) > 0:\n msg = f\"Failed to load {len(failed_internal):,} internal modules: {','.join(failed_internal)}\"\n self._fail_setup(msg)\n if loaded_internal_modules:\n self.info(\n f\"Loaded {len(loaded_internal_modules):,}/{len(self.preset.internal_modules):,} internal modules ({','.join(loaded_internal_modules)})\"\n )\n\n # Load output modules\n self.verbose(f\"Loading {len(output_modules):,} output modules: {','.join(output_modules)}\")\n loaded_output_modules, failed_output = self._load_modules(output_modules)\n self.modules.update(loaded_output_modules)\n if len(failed_output) > 0:\n msg = f\"Failed to load {len(failed_output):,} output modules: {','.join(failed_output)}\"\n self._fail_setup(msg)\n if loaded_output_modules:\n self.info(\n f\"Loaded 
{len(loaded_output_modules):,}/{len(self.preset.output_modules):,} output modules, ({','.join(loaded_output_modules)})\"\n )\n\n # builtin intercept modules\n self.ingress_module = ScanIngress(self)\n self.egress_module = ScanEgress(self)\n self.modules[self.ingress_module.name] = self.ingress_module\n self.modules[self.egress_module.name] = self.egress_module\n\n # sort modules by priority\n self.modules = OrderedDict(sorted(self.modules.items(), key=lambda x: getattr(x[-1], \"priority\", 3)))\n\n self._modules_loaded = True\n\n @property\n def modules_finished(self):\n finished_modules = [m.finished for m in self.modules.values()]\n return all(finished_modules)\n\n def kill_module(self, module_name, message=None):\n from signal import SIGINT\n\n module = self.modules[module_name]\n if module._intercept:\n self.warning(f'Cannot kill module \"{module_name}\" because it is critical to the scan')\n return\n module.set_error_state(message=message, clear_outgoing_queue=True)\n for proc in module._proc_tracker:\n with contextlib.suppress(Exception):\n proc.send_signal(SIGINT)\n self.helpers.cancel_tasks_sync(module._tasks)\n\n @property\n def incoming_event_queues(self):\n return self.ingress_module.incoming_queues\n\n @property\n def num_queued_events(self):\n total = 0\n for q in self.incoming_event_queues:\n total += len(q._queue)\n return total\n\n def modules_status(self, _log=False):\n finished = True\n status = {\"modules\": {}}\n\n sorted_modules = []\n for module_name, module in self.modules.items():\n if module_name.startswith(\"_\"):\n continue\n sorted_modules.append(module)\n mod_status = module.status\n if mod_status[\"running\"]:\n finished = False\n status[\"modules\"][module_name] = mod_status\n\n # sort modules by name\n sorted_modules.sort(key=lambda m: m.name)\n\n status[\"finished\"] = finished\n\n modules_errored = [m for m, s in status[\"modules\"].items() if s[\"errored\"]]\n\n max_mem_percent = 90\n mem_status = self.helpers.memory_status()\n # 
abort if we don't have the memory\n mem_percent = mem_status.percent\n if mem_percent > max_mem_percent:\n free_memory = mem_status.available\n free_memory_human = self.helpers.bytes_to_human(free_memory)\n self.warning(f\"System memory is at {mem_percent:.1f}% ({free_memory_human} remaining)\")\n\n if _log:\n modules_status = []\n for m, s in status[\"modules\"].items():\n running = s[\"running\"]\n incoming = s[\"events\"][\"incoming\"]\n outgoing = s[\"events\"][\"outgoing\"]\n tasks = s[\"tasks\"]\n total = sum([incoming, outgoing, tasks])\n if running or total > 0:\n modules_status.append((m, running, incoming, outgoing, tasks, total))\n modules_status.sort(key=lambda x: x[-1], reverse=True)\n\n if modules_status:\n modules_status_str = \", \".join([f\"{m}({i:,}:{t:,}:{o:,})\" for m, r, i, o, t, _ in modules_status])\n self.info(f\"{self.name}: Modules running (incoming:processing:outgoing) {modules_status_str}\")\n else:\n self.info(f\"{self.name}: No modules running\")\n event_type_summary = sorted(self.stats.events_emitted_by_type.items(), key=lambda x: x[-1], reverse=True)\n if event_type_summary:\n self.info(\n f'{self.name}: Events produced so far: {\", \".join([f\"{k}: {v}\" for k,v in event_type_summary])}'\n )\n else:\n self.info(f\"{self.name}: No events produced yet\")\n\n if modules_errored:\n self.verbose(\n f'{self.name}: Modules errored: {len(modules_errored):,} ({\", \".join([m for m in modules_errored])})'\n )\n\n num_queued_events = self.num_queued_events\n if num_queued_events:\n self.info(\n f\"{self.name}: {num_queued_events:,} events in queue ({self.stats.speedometer.speed:,} processed in the past {self.status_frequency} seconds)\"\n )\n else:\n self.info(\n f\"{self.name}: No events in queue ({self.stats.speedometer.speed:,} processed in the past {self.status_frequency} seconds)\"\n )\n\n if self.log_level <= logging.DEBUG:\n # status debugging\n scan_active_status = []\n scan_active_status.append(f\"scan._finished_init: 
{self._finished_init}\")\n scan_active_status.append(f\"scan.modules_finished: {self.modules_finished}\")\n for m in sorted_modules:\n running = m.running\n scan_active_status.append(f\" {m}.finished: {m.finished}\")\n scan_active_status.append(f\" running: {running}\")\n if running:\n scan_active_status.append(f\" tasks:\")\n for task in list(m._task_counter.tasks.values()):\n scan_active_status.append(f\" - {task}:\")\n scan_active_status.append(f\" incoming_queue_size: {m.num_incoming_events}\")\n scan_active_status.append(f\" outgoing_queue_size: {m.outgoing_event_queue.qsize()}\")\n for line in scan_active_status:\n self.debug(line)\n\n # log module memory usage\n module_memory_usage = []\n for module in sorted_modules:\n memory_usage = module.memory_usage\n module_memory_usage.append((module.name, memory_usage))\n module_memory_usage.sort(key=lambda x: x[-1], reverse=True)\n self.debug(f\"MODULE MEMORY USAGE:\")\n for module_name, usage in module_memory_usage:\n self.debug(f\" - {module_name}: {self.helpers.bytes_to_human(usage)}\")\n\n status.update({\"modules_errored\": len(modules_errored)})\n\n return status\n\n def stop(self):\n \"\"\"Stops the in-progress scan and performs necessary cleanup.\n\n This method sets the scan's status to \"ABORTING,\" cancels any pending tasks, and drains event queues. 
It also kills child processes spawned during the scan.\n\n Returns:\n None\n \"\"\"\n if not self._stopping:\n self._stopping = True\n self.status = \"ABORTING\"\n self.hugewarning(\"Aborting scan\")\n self.trace()\n self._cancel_tasks()\n self._drain_queues()\n self.helpers.kill_children()\n self._drain_queues()\n self.helpers.kill_children()\n self.debug(\"Finished aborting scan\")\n\n async def finish(self):\n \"\"\"Finalizes the scan by invoking the `finished()` method on all active modules if new activity is detected.\n\n The method is idempotent and will return False if no new activity has been recorded since the last invocation.\n\n Returns:\n bool: True if new activity has been detected and the `finished()` method is invoked on all modules.\n False if no new activity has been detected since the last invocation.\n\n Notes:\n This method alters the scan's status to \"FINISHING\" if new activity is detected.\n \"\"\"\n # if new events were generated since last time we were here\n if self._new_activity:\n self._new_activity = False\n self.status = \"FINISHING\"\n # Trigger .finished() on every module and start over\n log.info(\"Finishing scan\")\n for module in self.modules.values():\n finished_event = self.make_event(f\"FINISHED\", \"FINISHED\", dummy=True, tags={module.name})\n await module.queue_event(finished_event)\n self.verbose(\"Completed finish()\")\n return True\n # Return False if no new events were generated since last time\n self.verbose(\"Completed final finish()\")\n return False\n\n def _drain_queues(self):\n \"\"\"Empties all the event queues for each loaded module and the manager's incoming event queue.\n\n This method iteratively empties both the incoming and outgoing event queues of each module, as well as the incoming event queue of the scan manager.\n\n Returns:\n None\n \"\"\"\n self.debug(\"Draining queues\")\n for module in self.modules.values():\n with contextlib.suppress(asyncio.queues.QueueEmpty):\n while 1:\n if 
module.incoming_event_queue:\n module.incoming_event_queue.get_nowait()\n with contextlib.suppress(asyncio.queues.QueueEmpty):\n while 1:\n if module.outgoing_event_queue:\n module.outgoing_event_queue.get_nowait()\n self.debug(\"Finished draining queues\")\n\n def _cancel_tasks(self):\n \"\"\"Cancels all asynchronous tasks and shuts down the process pool.\n\n This method collects all pending tasks from each module, the dispatcher,\n and the scan manager. After collecting these tasks, it cancels them synchronously\n using a helper function. Finally, it shuts down the process pool, canceling any\n pending futures.\n\n Returns:\n None\n \"\"\"\n self.debug(\"Cancelling all scan tasks\")\n tasks = []\n # module workers\n for m in self.modules.values():\n tasks += getattr(m, \"_tasks\", [])\n # init events\n if self.init_events_task:\n tasks.append(self.init_events_task)\n # ticker\n if self.ticker_task:\n tasks.append(self.ticker_task)\n # dispatcher\n tasks += self.dispatcher_tasks\n # manager worker loops\n tasks += self._manager_worker_loop_tasks\n self.helpers.cancel_tasks_sync(tasks)\n # process pool\n self.helpers.process_pool.shutdown(cancel_futures=True)\n self.debug(\"Finished cancelling all scan tasks\")\n return tasks\n\n async def _report(self):\n \"\"\"Asynchronously executes the `report()` method for each module in the scan.\n\n This method is called once at the end of each scan and is responsible for\n triggering the `report()` function for each module. It executes irrespective\n of whether the scan was aborted or completed successfully. 
The method makes\n use of an asynchronous context manager (`_acatch`) to handle exceptions and\n a task counter to keep track of the task's context.\n\n Returns:\n None\n \"\"\"\n for mod in self.modules.values():\n context = f\"{mod.name}.report()\"\n async with self._acatch(context), mod._task_counter.count(context):\n await mod.report()\n\n async def _cleanup(self):\n \"\"\"Asynchronously executes the `cleanup()` method for each module in the scan.\n\n This method is called once at the end of the scan to perform resource cleanup\n tasks. It is executed regardless of whether the scan was aborted or completed\n successfully. The scan status is set to \"CLEANING_UP\" during the execution.\n After calling the `cleanup()` method for each module, it performs additional\n cleanup tasks such as removing the scan's home directory if empty and cleaning\n old scans.\n\n Returns:\n None\n \"\"\"\n # clean up self\n if not self._cleanedup:\n self._cleanedup = True\n self.status = \"CLEANING_UP\"\n # clean up dns engine\n if self.helpers._dns is not None:\n await self.helpers.dns.shutdown()\n # clean up web engine\n if self.helpers._web is not None:\n await self.helpers.web.shutdown()\n # clean up modules\n for mod in self.modules.values():\n await mod._cleanup()\n with contextlib.suppress(Exception):\n self.home.rmdir()\n self.helpers.clean_old_scans()\n\n def in_scope(self, *args, **kwargs):\n return self.preset.in_scope(*args, **kwargs)\n\n def whitelisted(self, *args, **kwargs):\n return self.preset.whitelisted(*args, **kwargs)\n\n def blacklisted(self, *args, **kwargs):\n return self.preset.blacklisted(*args, **kwargs)\n\n @property\n def core(self):\n return self.preset.core\n\n @property\n def config(self):\n return self.preset.core.config\n\n @property\n def target(self):\n return self.preset.target\n\n @property\n def whitelist(self):\n return self.preset.whitelist\n\n @property\n def blacklist(self):\n return self.preset.blacklist\n\n @property\n def 
helpers(self):\n return self.preset.helpers\n\n @property\n def force_start(self):\n return self.preset.force_start\n\n @property\n def word_cloud(self):\n return self.helpers.word_cloud\n\n @property\n def stopping(self):\n return not self.running\n\n @property\n def stopped(self):\n return self._status_code > 5\n\n @property\n def running(self):\n return 0 < self._status_code < 4\n\n @property\n def aborting(self):\n return 5 <= self._status_code <= 6\n\n @property\n def status(self):\n return self._status\n\n @property\n def omitted_event_types(self):\n if self._omitted_event_types is None:\n self._omitted_event_types = self.config.get(\"omit_event_types\", [])\n return self._omitted_event_types\n\n @status.setter\n def status(self, status):\n \"\"\"\n Block setting after status has been aborted\n \"\"\"\n status = str(status).strip().upper()\n if status in self._status_codes:\n if self.status == \"ABORTING\" and not status == \"ABORTED\":\n self.debug(f'Attempt to set invalid status \"{status}\" on aborted scan')\n else:\n if status != self._status:\n self._status = status\n self._status_code = self._status_codes[status]\n self.dispatcher_tasks.append(\n asyncio.create_task(\n self.dispatcher.catch(self.dispatcher.on_status, self._status, self.id),\n name=f\"{self.name}.dispatcher.on_status({status})\",\n )\n )\n else:\n self.debug(f'Scan status is already \"{status}\"')\n else:\n self.debug(f'Attempt to set invalid status \"{status}\" on scan')\n\n def make_event(self, *args, **kwargs):\n kwargs[\"scan\"] = self\n event = make_event(*args, **kwargs)\n return event\n\n @property\n def root_event(self):\n \"\"\"\n The root scan event, e.g.:\n ```json\n {\n \"type\": \"SCAN\",\n \"id\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"data\": \"pixilated_kathryn (SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54)\",\n \"scope_distance\": 0,\n \"scan\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"timestamp\": 1694548779.616255,\n \"parent\": 
\"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"tags\": [\n \"distance-0\"\n ],\n \"module\": \"TARGET\",\n \"module_sequence\": \"TARGET\"\n }\n ```\n \"\"\"\n root_event = self.make_event(data=self.json, event_type=\"SCAN\", dummy=True)\n root_event._id = self.id\n root_event.scope_distance = 0\n root_event.parent = root_event\n root_event.module = self._make_dummy_module(name=\"TARGET\", _type=\"TARGET\")\n root_event.discovery_context = f\"Scan {self.name} started at {root_event.timestamp}\"\n return root_event\n\n @property\n def dns_strings(self):\n \"\"\"\n A list of DNS hostname strings generated from the scan target\n \"\"\"\n if self._dns_strings is None:\n dns_targets = set(t.host for t in self.target if t.host and isinstance(t.host, str))\n dns_whitelist = set(t.host for t in self.whitelist if t.host and isinstance(t.host, str))\n dns_targets.update(dns_whitelist)\n dns_targets = sorted(dns_targets, key=len)\n dns_targets_set = set()\n dns_strings = []\n for t in dns_targets:\n if not any(x in dns_targets_set for x in self.helpers.domain_parents(t, include_self=True)):\n dns_strings.append(t)\n self._dns_strings = dns_strings\n return self._dns_strings\n\n def _generate_dns_regexes(self, pattern):\n \"\"\"\n Generates a list of compiled DNS hostname regexes based on the provided pattern.\n This method centralizes the regex compilation to avoid redundancy in the dns_regexes and dns_regexes_yara methods.\n\n Args:\n pattern (str):\n Returns:\n list[re.Pattern]: A list of compiled regex patterns if enabled, otherwise an empty list.\n \"\"\"\n\n dns_regexes = []\n for t in self.dns_strings:\n regex_pattern = re.compile(f\"{pattern}{re.escape(t)})\", re.I)\n log.debug(f\"Generated Regex [{regex_pattern.pattern}] for domain {t}\")\n dns_regexes.append(regex_pattern)\n return dns_regexes\n\n @property\n def dns_regexes(self):\n \"\"\"\n A list of DNS hostname regexes generated from the scan target\n For the purpose of extracting hostnames\n\n 
Examples:\n Extract hostnames from text:\n >>> for regex in scan.dns_regexes:\n ... for match in regex.finditer(response.text):\n ... hostname = match.group().lower()\n \"\"\"\n if not self._dns_regexes:\n self._dns_regexes = self._generate_dns_regexes(r\"((?:(?:[\\w-]+)\\.)+\")\n return self._dns_regexes\n\n @property\n def dns_regexes_yara(self):\n \"\"\"\n Returns a list of DNS hostname regexes formatted specifically for compatibility with YARA rules.\n \"\"\"\n return self._generate_dns_regexes(r\"(([a-z0-9-]+\\.)+\")\n\n @property\n def json(self):\n \"\"\"\n A dictionary representation of the scan including its name, ID, targets, whitelist, blacklist, and modules\n \"\"\"\n j = dict()\n for i in (\"id\", \"name\"):\n v = getattr(self, i, \"\")\n if v:\n j.update({i: v})\n j[\"target\"] = self.preset.target.json\n j[\"preset\"] = self.preset.to_dict(redact_secrets=True)\n return j\n\n def debug(self, *args, trace=False, **kwargs):\n log.debug(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def verbose(self, *args, trace=False, **kwargs):\n log.verbose(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeverbose(self, *args, trace=False, **kwargs):\n log.hugeverbose(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def info(self, *args, trace=False, **kwargs):\n log.info(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeinfo(self, *args, trace=False, **kwargs):\n log.hugeinfo(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def success(self, *args, trace=False, **kwargs):\n log.success(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugesuccess(self, *args, trace=False, **kwargs):\n log.hugesuccess(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def warning(self, *args, trace=True, **kwargs):\n log.warning(*args, extra={\"scan_id\": self.id}, 
**kwargs)\n if trace:\n self.trace()\n\n def hugewarning(self, *args, trace=True, **kwargs):\n log.hugewarning(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def error(self, *args, trace=True, **kwargs):\n log.error(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def trace(self, msg=None):\n if msg is None:\n e_type, e_val, e_traceback = exc_info()\n if e_type is not None:\n log.trace(traceback.format_exc())\n else:\n log.trace(msg)\n\n def critical(self, *args, trace=True, **kwargs):\n log.critical(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n @property\n def log_level(self):\n \"\"\"\n Return the current log level, e.g. logging.INFO\n \"\"\"\n return self.core.logger.log_level\n\n @property\n def _log_handlers(self):\n if self.__log_handlers is None:\n self.helpers.mkdir(self.home)\n main_handler = logging.handlers.TimedRotatingFileHandler(\n str(self.home / \"scan.log\"), when=\"d\", interval=1, backupCount=14\n )\n main_handler.addFilter(lambda x: x.levelno != logging.TRACE and x.levelno >= logging.VERBOSE)\n debug_handler = logging.handlers.TimedRotatingFileHandler(\n str(self.home / \"debug.log\"), when=\"d\", interval=1, backupCount=14\n )\n debug_handler.addFilter(lambda x: x.levelno >= logging.DEBUG)\n self.__log_handlers = [main_handler, debug_handler]\n return self.__log_handlers\n\n def _start_log_handlers(self):\n # add log handlers\n for handler in self._log_handlers:\n self.core.logger.add_log_handler(handler)\n # temporarily disable main ones\n for handler_name in (\"file_main\", \"file_debug\"):\n handler = self.core.logger.log_handlers.get(handler_name, None)\n if handler is not None and handler not in self._log_handler_backup:\n self._log_handler_backup.append(handler)\n self.core.logger.remove_log_handler(handler)\n\n def _stop_log_handlers(self):\n # remove log handlers\n for handler in self._log_handlers:\n 
self.core.logger.remove_log_handler(handler)\n # restore main ones\n for handler in self._log_handler_backup:\n self.core.logger.add_log_handler(handler)\n\n def _fail_setup(self, msg):\n msg = str(msg)\n if self.force_start:\n self.error(msg)\n else:\n msg += \" (--force to run module anyway)\"\n raise ScanError(msg)\n\n def _load_modules(self, modules):\n modules = [str(m) for m in modules]\n loaded_modules = {}\n failed = set()\n for module_name, module_class in self.preset.module_loader.load_modules(modules).items():\n if module_class:\n try:\n loaded_modules[module_name] = module_class(self)\n self.verbose(f'Loaded module \"{module_name}\"')\n continue\n except Exception:\n self.warning(f\"Failed to load module {module_class}\")\n else:\n self.warning(f'Failed to load unknown module \"{module_name}\"')\n failed.add(module_name)\n return loaded_modules, failed\n\n async def _status_ticker(self, interval=15):\n async with self._acatch():\n while 1:\n await asyncio.sleep(interval)\n self.modules_status(_log=True)\n\n @contextlib.asynccontextmanager\n async def _acatch(self, context=\"scan\", finally_callback=None, unhandled_is_critical=False):\n \"\"\"\n Async version of catch()\n\n async with catch():\n await do_stuff()\n \"\"\"\n try:\n yield\n except BaseException as e:\n self._handle_exception(e, context=context, unhandled_is_critical=unhandled_is_critical)\n\n def _handle_exception(self, e, context=\"scan\", finally_callback=None, unhandled_is_critical=False):\n if callable(context):\n context = f\"{context.__qualname__}()\"\n filename, lineno, funcname = self.helpers.get_traceback_details(e)\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt,)):\n log.debug(f\"Interrupted\")\n self.stop()\n elif isinstance(e, BrokenPipeError):\n log.debug(f\"BrokenPipeError in {filename}:{lineno}:{funcname}(): {e}\")\n elif isinstance(e, asyncio.CancelledError):\n raise\n elif isinstance(e, Exception):\n traceback_str = getattr(e, \"engine_traceback\", None)\n if 
traceback_str is None:\n traceback_str = traceback.format_exc()\n if unhandled_is_critical:\n log.critical(f\"Error in {context}: {filename}:{lineno}:{funcname}(): {e}\")\n log.critical(traceback_str)\n else:\n log.error(f\"Error in {context}: {filename}:{lineno}:{funcname}(): {e}\")\n log.trace(traceback_str)\n if callable(finally_callback):\n finally_callback(e)\n\n def _make_dummy_module(self, name, _type=\"scan\"):\n \"\"\"\n Construct a dummy module, for attachment to events\n \"\"\"\n try:\n return self.dummy_modules[name]\n except KeyError:\n dummy = DummyModule(scan=self, name=name, _type=_type)\n self.dummy_modules[name] = dummy\n return dummy\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.dns_regexes","title":"dns_regexes property
","text":"dns_regexes\n
A list of DNS hostname regexes generated from the scan target, for the purpose of extracting hostnames.
Examples:
Extract hostnames from text:
>>> for regex in scan.dns_regexes:\n... for match in regex.finditer(response.text):\n... hostname = match.group().lower()\n
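As a rough, self-contained sketch of the same idea outside BBOT: the snippet below builds one case-insensitive regex per base domain by prepending a subdomain-label prefix to the escaped domain, then pulls hostnames out of a blob of text. The names build_dns_regexes and extract_hostnames, the sample domain, and the simplified label class in LABEL_PREFIX are assumptions made for illustration; the real property derives its domains from the scan target and uses its own prefix pattern.

```python
import re

# Hypothetical stand-in for the prefix used by the real property:
# one or more dot-terminated labels made of letters, digits, '_' or '-'.
LABEL_PREFIX = '((?:(?:[A-Za-z0-9_-]+)[.])+'


def build_dns_regexes(domains):
    # One compiled, case-insensitive regex per base domain.
    # The trailing ')' closes the group opened in LABEL_PREFIX.
    return [re.compile(LABEL_PREFIX + re.escape(d) + ')', re.I) for d in domains]


def extract_hostnames(text, regexes):
    # Collect every subdomain match, lowercased and de-duplicated.
    found = set()
    for regex in regexes:
        for match in regex.finditer(text):
            found.add(match.group().lower())
    return sorted(found)


regexes = build_dns_regexes(['evilcorp.com'])
sample = 'See WWW.EvilCorp.com and api.dev.evilcorp.com, but not example.org'
hostnames = extract_hostnames(sample, regexes)
```

Because the prefix requires at least one label before the domain, the bare domain on its own is not matched; in this sample run, hostnames holds only the two subdomains, lowercased.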
"},{"location":"dev/scanner/#bbot.scanner.Scanner.dns_regexes_yara","title":"dns_regexes_yara property
","text":"dns_regexes_yara\n
Returns a list of DNS hostname regexes formatted specifically for compatibility with YARA rules.
"},{"location":"dev/scanner/#bbot.scanner.Scanner.dns_strings","title":"dns_strings property
","text":"dns_strings\n
A list of DNS hostname strings generated from the scan target
"},{"location":"dev/scanner/#bbot.scanner.Scanner.json","title":"jsonproperty
","text":"json\n
A dictionary representation of the scan including its name, ID, targets, whitelist, blacklist, and modules
"},{"location":"dev/scanner/#bbot.scanner.Scanner.log_level","title":"log_levelproperty
","text":"log_level\n
Return the current log level, e.g. logging.INFO
"},{"location":"dev/scanner/#bbot.scanner.Scanner.root_event","title":"root_eventproperty
","text":"root_event\n
The root scan event, e.g.:
{\n \"type\": \"SCAN\",\n \"id\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"data\": \"pixilated_kathryn (SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54)\",\n \"scope_distance\": 0,\n \"scan\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"timestamp\": 1694548779.616255,\n \"parent\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"tags\": [\n \"distance-0\"\n ],\n \"module\": \"TARGET\",\n \"module_sequence\": \"TARGET\"\n}\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.__init__","title":"__init__","text":"__init__(*targets, scan_id=None, dispatcher=None, **kwargs)\n
Initializes the Scanner class.
If a premade preset is specified, it will be used for the scan. Otherwise, Scan accepts the same arguments as Preset, which are passed through and used to create a new preset.
Parameters:
*targets (list[str], default: ()) \u2013 Scan targets (passed through to Preset).
preset (Preset) \u2013 Preset to use for the scan.
scan_id (str, default: None) \u2013 Unique identifier for the scan. Auto-generated if None.
dispatcher (Dispatcher, default: None) \u2013 Dispatcher object to use. Defaults to a new Dispatcher.
**kwargs (list[str], default: {}) \u2013 Additional keyword arguments (passed through to Preset).
Source code in bbot/scanner/scanner.py
def __init__(\n self,\n *targets,\n scan_id=None,\n dispatcher=None,\n **kwargs,\n):\n \"\"\"\n Initializes the Scanner class.\n\n If a premade `preset` is specified, it will be used for the scan.\n Otherwise, `Scan` accepts the same arguments as `Preset`, which are passed through and used to create a new preset.\n\n Args:\n *targets (list[str], optional): Scan targets (passed through to `Preset`).\n preset (Preset, optional): Preset to use for the scan.\n scan_id (str, optional): Unique identifier for the scan. Auto-generates if None.\n dispatcher (Dispatcher, optional): Dispatcher object to use. Defaults to new Dispatcher.\n **kwargs (list[str], optional): Additional keyword arguments (passed through to `Preset`).\n \"\"\"\n if scan_id is not None:\n self.id = str(id)\n else:\n self.id = f\"SCAN:{sha1(rand_string(20)).hexdigest()}\"\n\n preset = kwargs.pop(\"preset\", None)\n kwargs[\"_log\"] = True\n\n from .preset import Preset\n\n if preset is None:\n preset = Preset(*targets, **kwargs)\n else:\n if not isinstance(preset, Preset):\n raise ValidationError(f'Preset must be of type Preset, not \"{type(preset).__name__}\"')\n self.preset = preset.bake(self)\n\n # scan name\n if preset.scan_name is None:\n tries = 0\n while 1:\n if tries > 5:\n scan_name = f\"{rand_string(4)}_{rand_string(4)}\"\n break\n scan_name = random_name()\n if self.preset.output_dir is not None:\n home_path = Path(self.preset.output_dir).resolve() / scan_name\n else:\n home_path = self.preset.bbot_home / \"scans\" / scan_name\n if not home_path.exists():\n break\n tries += 1\n else:\n scan_name = str(preset.scan_name)\n self.name = scan_name\n\n # scan output dir\n if preset.output_dir is not None:\n self.home = Path(preset.output_dir).resolve() / self.name\n else:\n self.home = self.preset.bbot_home / \"scans\" / self.name\n\n self._status = \"NOT_STARTED\"\n self._status_code = 0\n\n self.modules = OrderedDict({})\n self._modules_loaded = False\n self.dummy_modules = {}\n\n if dispatcher 
is None:\n from .dispatcher import Dispatcher\n\n self.dispatcher = Dispatcher()\n else:\n self.dispatcher = dispatcher\n self.dispatcher.set_scan(self)\n\n # scope distance\n self.scope_config = self.config.get(\"scope\", {})\n self.scope_search_distance = max(0, int(self.scope_config.get(\"search_distance\", 0)))\n self.scope_report_distance = int(self.scope_config.get(\"report_distance\", 1))\n\n # web config\n self.web_config = self.config.get(\"web\", {})\n self.web_spider_distance = self.web_config.get(\"spider_distance\", 0)\n self.web_spider_depth = self.web_config.get(\"spider_depth\", 1)\n self.web_spider_links_per_page = self.web_config.get(\"spider_links_per_page\", 20)\n max_redirects = self.web_config.get(\"http_max_redirects\", 5)\n self.web_max_redirects = max(max_redirects, self.web_spider_distance)\n self.http_proxy = self.web_config.get(\"http_proxy\", \"\")\n self.http_timeout = self.web_config.get(\"http_timeout\", 10)\n self.httpx_timeout = self.web_config.get(\"httpx_timeout\", 5)\n self.http_retries = self.web_config.get(\"http_retries\", 1)\n self.httpx_retries = self.web_config.get(\"httpx_retries\", 1)\n self.useragent = self.web_config.get(\"user_agent\", \"BBOT\")\n # custom HTTP headers warning\n self.custom_http_headers = self.web_config.get(\"http_headers\", {})\n if self.custom_http_headers:\n self.warning(\n \"You have enabled custom HTTP headers. 
These will be attached to all in-scope requests and all requests made by httpx.\"\n )\n\n # url file extensions\n self.url_extension_blacklist = set(e.lower() for e in self.config.get(\"url_extension_blacklist\", []))\n self.url_extension_httpx_only = set(e.lower() for e in self.config.get(\"url_extension_httpx_only\", []))\n\n # url querystring behavior\n self.url_querystring_remove = self.config.get(\"url_querystring_remove\", True)\n\n # blob inclusion\n self._file_blobs = self.config.get(\"file_blobs\", False)\n self._folder_blobs = self.config.get(\"folder_blobs\", False)\n\n # how often to print scan status\n self.status_frequency = self.config.get(\"status_frequency\", 15)\n\n from .stats import ScanStats\n\n self.stats = ScanStats(self)\n\n self._prepped = False\n self._finished_init = False\n self._new_activity = False\n self._cleanedup = False\n self._omitted_event_types = None\n\n self.__loop = None\n self._manager_worker_loop_tasks = []\n self.init_events_task = None\n self.ticker_task = None\n self.dispatcher_tasks = []\n\n self._stopping = False\n\n self._dns_strings = None\n self._dns_regexes = None\n\n self.__log_handlers = None\n self._log_handler_backup = []\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.async_start","title":"async_start async
","text":"async_start()\n
Source code in bbot/scanner/scanner.py
async def async_start(self):\n \"\"\" \"\"\"\n failed = True\n scan_start_time = datetime.now()\n try:\n await self._prep()\n\n self._start_log_handlers()\n self.trace(f'Ran BBOT {__version__} at {scan_start_time}, command: {\" \".join(sys.argv)}')\n self.trace(f\"Target: {self.preset.target.json}\")\n self.trace(f\"Preset: {self.preset.to_dict(redact_secrets=True)}\")\n\n if not self.target:\n self.warning(f\"No scan targets specified\")\n\n # start status ticker\n self.ticker_task = asyncio.create_task(\n self._status_ticker(self.status_frequency), name=f\"{self.name}._status_ticker()\"\n )\n\n self.status = \"STARTING\"\n\n if not self.modules:\n self.error(f\"No modules loaded\")\n self.status = \"FAILED\"\n return\n else:\n self.hugesuccess(f\"Starting scan {self.name}\")\n\n await self.dispatcher.on_start(self)\n\n self.status = \"RUNNING\"\n self._start_modules()\n self.verbose(f\"{len(self.modules):,} modules started\")\n\n # distribute seed events\n self.init_events_task = asyncio.create_task(\n self.ingress_module.init_events(self.target.events), name=f\"{self.name}.ingress_module.init_events()\"\n )\n\n # main scan loop\n while 1:\n # abort if we're aborting\n if self.aborting:\n self._drain_queues()\n break\n\n # yield events as they come (async for event in scan.async_start())\n if \"python\" in self.modules:\n events, finish = await self.modules[\"python\"]._events_waiting(batch_size=-1)\n for e in events:\n yield e\n if events:\n continue\n\n # break if initialization finished and the scan is no longer active\n if self._finished_init and self.modules_finished:\n new_activity = await self.finish()\n if not new_activity:\n break\n\n await asyncio.sleep(0.1)\n\n failed = False\n\n except BaseException as e:\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.stop()\n failed = False\n else:\n try:\n raise\n except ScanError as e:\n self.error(f\"{e}\")\n\n except BBOTError as e:\n self.critical(f\"Error during 
scan: {e}\")\n\n except Exception:\n self.critical(f\"Unexpected error during scan:\\n{traceback.format_exc()}\")\n\n finally:\n tasks = self._cancel_tasks()\n self.debug(f\"Awaiting {len(tasks):,} tasks\")\n for task in tasks:\n # self.debug(f\"Awaiting {task}\")\n with contextlib.suppress(BaseException):\n await asyncio.wait_for(task, timeout=0.1)\n self.debug(f\"Awaited {len(tasks):,} tasks\")\n await self._report()\n await self._cleanup()\n\n log_fn = self.hugesuccess\n if self.status == \"ABORTING\":\n self.status = \"ABORTED\"\n log_fn = self.hugewarning\n elif failed:\n self.status = \"FAILED\"\n log_fn = self.critical\n else:\n self.status = \"FINISHED\"\n\n scan_run_time = datetime.now() - scan_start_time\n scan_run_time = self.helpers.human_timedelta(scan_run_time)\n log_fn(f\"Scan {self.name} completed in {scan_run_time} with status {self.status}\")\n\n await self.dispatcher.on_finish(self)\n\n self._stop_log_handlers()\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.finish","title":"finish async
","text":"finish()\n
Finalizes the scan by invoking the finished() method on all active modules if new activity is detected.
The method is idempotent and will return False if no new activity has been recorded since the last invocation.
Returns:
bool \u2013 True if new activity has been detected and the finished() method is invoked on all modules. False if no new activity has been detected since the last invocation.
Notes:
This method alters the scan's status to \"FINISHING\" if new activity is detected.
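The return contract described here (True once per burst of new activity, False on repeated calls) can be sketched in isolation. `ActivityTracker` and its attribute names are hypothetical stand-ins for illustration, not BBOT classes.

```python
class ActivityTracker:
    # Hypothetical stand-in for the scan's new-activity bookkeeping.
    def __init__(self):
        self.new_activity = False
        self.status = 'RUNNING'
        self.finish_rounds = 0

    def record_event(self):
        # Called whenever something new happens during the scan.
        self.new_activity = True

    def finish(self):
        # Consume the flag: only the first call after new activity returns True.
        if self.new_activity:
            self.new_activity = False
            self.status = 'FINISHING'
            self.finish_rounds += 1
            return True
        return False

tracker = ActivityTracker()
tracker.record_event()
first = tracker.finish()
second = tracker.finish()
print(first, second, tracker.status)
```

Because the flag is cleared on the first call, calling finish() again without new activity has no further effect, which is what lets the main loop treat a False result as the signal to stop.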
Source code in bbot/scanner/scanner.py
async def finish(self):\n \"\"\"Finalizes the scan by invoking the `finished()` method on all active modules if new activity is detected.\n\n The method is idempotent and will return False if no new activity has been recorded since the last invocation.\n\n Returns:\n bool: True if new activity has been detected and the `finished()` method is invoked on all modules.\n False if no new activity has been detected since the last invocation.\n\n Notes:\n This method alters the scan's status to \"FINISHING\" if new activity is detected.\n \"\"\"\n # if new events were generated since last time we were here\n if self._new_activity:\n self._new_activity = False\n self.status = \"FINISHING\"\n # Trigger .finished() on every module and start over\n log.info(\"Finishing scan\")\n for module in self.modules.values():\n finished_event = self.make_event(f\"FINISHED\", \"FINISHED\", dummy=True, tags={module.name})\n await module.queue_event(finished_event)\n self.verbose(\"Completed finish()\")\n return True\n # Return False if no new events were generated since last time\n self.verbose(\"Completed final finish()\")\n return False\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.load_modules","title":"load_modules async
","text":"load_modules()\n
Asynchronously import and instantiate all scan modules, including internal and output modules.
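This method is automatically invoked by setup_modules(). It performs several key tasks in the following sequence:
1. Install dependencies for each module via self.helpers.depsinstaller.install().
2. Load scan modules and update the modules dictionary.
3. Load internal modules and update the modules dictionary.
4. Load output modules and update the modules dictionary.
5. Sort modules based on their _priority attribute.
If any modules fail to load or their dependencies fail to install, a ScanError will be raised (unless self.force_start is True).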
Attributes:
succeeded, failed (tuple) \u2013 A tuple containing lists of modules that succeeded or failed during the dependency installation.
loaded_modules, loaded_internal_modules, loaded_output_modules (dict) \u2013 Dictionaries of successfully loaded modules.
failed, failed_internal, failed_output (list) \u2013 Lists of module names that failed to load.
Raises:
ScanError \u2013 If any module dependencies fail to install or modules fail to load, and if self.force_start is False.
Returns:
None
Note:
After all modules are loaded, they are sorted by _priority and stored in the modules dictionary.
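The final sorting step can be tried on its own. The sketch below mirrors the sort expression in the source listing for this method, which reads a `priority` attribute and falls back to 3 when it is missing; `FakeModule` and the module names are hypothetical.

```python
from collections import OrderedDict

class FakeModule:
    # Hypothetical stand-in for a loaded module.
    def __init__(self, name, priority=None):
        self.name = name
        if priority is not None:
            self.priority = priority

def sort_by_priority(modules, default=3):
    # Lower numbers sort first; modules without a priority use the default.
    # sorted() is stable, so ties keep their original insertion order.
    return OrderedDict(
        sorted(modules.items(), key=lambda item: getattr(item[1], 'priority', default))
    )

modules = {
    'mod_a': FakeModule('mod_a', priority=5),
    'mod_b': FakeModule('mod_b', priority=1),
    'mod_c': FakeModule('mod_c'),
    'mod_d': FakeModule('mod_d', priority=3),
}
ordered = sort_by_priority(modules)
print(list(ordered))
```

Here `mod_c` has no priority and sorts alongside `mod_d` at the default value, ahead of it only because it was inserted first.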
Source code in bbot/scanner/scanner.py
async def load_modules(self):\n \"\"\"Asynchronously import and instantiate all scan modules, including internal and output modules.\n\n This method is automatically invoked by `setup_modules()`. It performs several key tasks in the following sequence:\n\n 1. Install dependencies for each module via `self.helpers.depsinstaller.install()`.\n 2. Load scan modules and updates the `modules` dictionary.\n 3. Load internal modules and updates the `modules` dictionary.\n 4. Load output modules and updates the `modules` dictionary.\n 5. Sorts modules based on their `_priority` attribute.\n\n If any modules fail to load or their dependencies fail to install, a ScanError will be raised (unless `self.force_start` is True).\n\n Attributes:\n succeeded, failed (tuple): A tuple containing lists of modules that succeeded or failed during the dependency installation.\n loaded_modules, loaded_internal_modules, loaded_output_modules (dict): Dictionaries of successfully loaded modules.\n failed, failed_internal, failed_output (list): Lists of module names that failed to load.\n\n Raises:\n ScanError: If any module dependencies fail to install or modules fail to load, and if `self.force_start` is False.\n\n Returns:\n None\n\n Note:\n After all modules are loaded, they are sorted by `_priority` and stored in the `modules` dictionary.\n \"\"\"\n if not self._modules_loaded:\n if not self.preset.modules:\n self.warning(f\"No modules to load\")\n return\n\n if not self.preset.scan_modules:\n self.warning(f\"No scan modules to load\")\n\n # install module dependencies\n succeeded, failed = await self.helpers.depsinstaller.install(*self.preset.modules)\n if failed:\n msg = f\"Failed to install dependencies for {len(failed):,} modules: {','.join(failed)}\"\n self._fail_setup(msg)\n modules = sorted([m for m in self.preset.scan_modules if m in succeeded])\n output_modules = sorted([m for m in self.preset.output_modules if m in succeeded])\n internal_modules = sorted([m for m in 
self.preset.internal_modules if m in succeeded])\n\n # Load scan modules\n self.verbose(f\"Loading {len(modules):,} scan modules: {','.join(modules)}\")\n loaded_modules, failed = self._load_modules(modules)\n self.modules.update(loaded_modules)\n if len(failed) > 0:\n msg = f\"Failed to load {len(failed):,} scan modules: {','.join(failed)}\"\n self._fail_setup(msg)\n if loaded_modules:\n self.info(\n f\"Loaded {len(loaded_modules):,}/{len(self.preset.scan_modules):,} scan modules ({','.join(loaded_modules)})\"\n )\n\n # Load internal modules\n self.verbose(f\"Loading {len(internal_modules):,} internal modules: {','.join(internal_modules)}\")\n loaded_internal_modules, failed_internal = self._load_modules(internal_modules)\n self.modules.update(loaded_internal_modules)\n if len(failed_internal) > 0:\n msg = f\"Failed to load {len(loaded_internal_modules):,} internal modules: {','.join(loaded_internal_modules)}\"\n self._fail_setup(msg)\n if loaded_internal_modules:\n self.info(\n f\"Loaded {len(loaded_internal_modules):,}/{len(self.preset.internal_modules):,} internal modules ({','.join(loaded_internal_modules)})\"\n )\n\n # Load output modules\n self.verbose(f\"Loading {len(output_modules):,} output modules: {','.join(output_modules)}\")\n loaded_output_modules, failed_output = self._load_modules(output_modules)\n self.modules.update(loaded_output_modules)\n if len(failed_output) > 0:\n msg = f\"Failed to load {len(failed_output):,} output modules: {','.join(failed_output)}\"\n self._fail_setup(msg)\n if loaded_output_modules:\n self.info(\n f\"Loaded {len(loaded_output_modules):,}/{len(self.preset.output_modules):,} output modules, ({','.join(loaded_output_modules)})\"\n )\n\n # builtin intercept modules\n self.ingress_module = ScanIngress(self)\n self.egress_module = ScanEgress(self)\n self.modules[self.ingress_module.name] = self.ingress_module\n self.modules[self.egress_module.name] = self.egress_module\n\n # sort modules by priority\n self.modules = 
OrderedDict(sorted(self.modules.items(), key=lambda x: getattr(x[-1], \"priority\", 3)))\n\n self._modules_loaded = True\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.setup_modules","title":"setup_modules async
","text":"setup_modules(remove_failed=True)\n
Asynchronously initializes all loaded modules by invoking their setup() methods.
Parameters:
remove_failed (bool, default: True) \u2013 Flag indicating whether to remove modules that fail setup.
Returns:
tuple \u2013 (succeeded, hard_failed, soft_failed), where succeeded is the list of modules that successfully set up, hard_failed is the list of modules that encountered a hard failure during setup, and soft_failed is the list of modules that encountered a soft failure during setup.
Raises:
ScanError \u2013 If no output modules could be loaded.
Notes:
Hard-failed modules are set to an error state and removed if remove_failed is True. Soft-failed modules are not set to an error state but are also removed if remove_failed is True.
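The three outcomes can be illustrated with a small classifier. This is a sketch under assumptions: `classify_setup_results` is a hypothetical function, and it assumes each setup result is a (name, status, message) triple where status is True for success, False for a hard failure, and None for a soft failure, as the source listing for this method suggests.

```python
def classify_setup_results(results, modules, remove_failed=True):
    # results: iterable of (name, status, msg) triples (assumed shape).
    # modules: dict of loaded modules, pruned in place when remove_failed is set.
    succeeded, hard_failed, soft_failed = [], [], []
    for name, status, msg in results:
        if status is True:
            succeeded.append(name)
        elif status is False:
            hard_failed.append(name)
        else:
            soft_failed.append(name)
        # Both kinds of failure lead to removal when remove_failed is True.
        if not status and remove_failed:
            modules.pop(name, None)
    return succeeded, hard_failed, soft_failed

modules = {'mod_a': object(), 'mod_b': object(), 'mod_c': object()}
results = [
    ('mod_a', True, 'ok'),
    ('mod_b', False, 'missing binary'),
    ('mod_c', None, 'no API key'),
]
outcome = classify_setup_results(results, modules)
print(outcome, sorted(modules))
```

The sketch leaves out the error-state handling for hard failures and the special case for intercept modules, which the real method removes regardless of remove_failed.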
Source code in bbot/scanner/scanner.py
async def setup_modules(self, remove_failed=True):\n \"\"\"Asynchronously initializes all loaded modules by invoking their `setup()` methods.\n\n Args:\n remove_failed (bool): Flag indicating whether to remove modules that fail setup.\n\n Returns:\n tuple:\n succeeded - List of modules that successfully set up.\n hard_failed - List of modules that encountered a hard failure during setup.\n soft_failed - List of modules that encountered a soft failure during setup.\n\n Raises:\n ScanError: If no output modules could be loaded.\n\n Notes:\n Hard-failed modules are set to an error state and removed if `remove_failed` is True.\n Soft-failed modules are not set to an error state but are also removed if `remove_failed` is True.\n \"\"\"\n await self.load_modules()\n self.verbose(f\"Setting up modules\")\n succeeded = []\n hard_failed = []\n soft_failed = []\n\n async for task in self.helpers.as_completed([m._setup() for m in self.modules.values()]):\n module, status, msg = await task\n if status == True:\n self.debug(f\"Setup succeeded for {module.name} ({msg})\")\n succeeded.append(module.name)\n elif status == False:\n self.warning(f\"Setup hard-failed for {module.name}: {msg}\")\n self.modules[module.name].set_error_state()\n hard_failed.append(module.name)\n else:\n self.info(f\"Setup soft-failed for {module.name}: {msg}\")\n soft_failed.append(module.name)\n if (not status) and (module._intercept or remove_failed):\n # if a intercept module fails setup, we always remove it\n self.modules.pop(module.name)\n\n return succeeded, hard_failed, soft_failed\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.stop","title":"stop","text":"stop()\n
Stops the in-progress scan and performs necessary cleanup.
This method sets the scan's status to \"ABORTING\", cancels any pending tasks, and drains event queues. It also kills child processes spawned during the scan.
Returns:
None
Source code in bbot/scanner/scanner.py
def stop(self):\n \"\"\"Stops the in-progress scan and performs necessary cleanup.\n\n This method sets the scan's status to \"ABORTING,\" cancels any pending tasks, and drains event queues. It also kills child processes spawned during the scan.\n\n Returns:\n None\n \"\"\"\n if not self._stopping:\n self._stopping = True\n self.status = \"ABORTING\"\n self.hugewarning(\"Aborting scan\")\n self.trace()\n self._cancel_tasks()\n self._drain_queues()\n self.helpers.kill_children()\n self._drain_queues()\n self.helpers.kill_children()\n self.debug(\"Finished aborting scan\")\n
"},{"location":"dev/target/","title":"Target","text":""},{"location":"dev/target/#bbot.scanner.target.Target","title":"Target","text":"A class representing a target. Can contain an unlimited number of hosts, IP or IP ranges, URLs, etc.
Attributes:
strict_scope (bool) \u2013 Flag indicating whether to consider child domains in-scope. If set to True, only the exact hosts specified and not their children are considered part of the target.
_radix (RadixTree) \u2013 Radix tree for quick IP/DNS lookups.
_events (set) \u2013 Flat set of contained events.
Examples:
Basic usage
>>> target = Target(\"evilcorp.com\", \"1.2.3.0/24\", scan=scan)\n>>> len(target)\n257\n>>> list(target.events)\n[\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'}),\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n]\n>>> \"www.evilcorp.com\" in target\nTrue\n>>> \"1.2.3.4\" in target\nTrue\n>>> \"4.3.2.1\" in target\nFalse\n>>> \"https://admin.evilcorp.com\" in target\nTrue\n>>> \"bob@evilcorp.com\" in target\nTrue\n
Event correlation
>>> target.get(\"www.evilcorp.com\")\nDNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n>>> target.get(\"1.2.3.4\")\nIP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n
Target comparison
>>> target2 = Target(\"www.evilcorp.com\", scan=scan)\n>>> target2 == target\nFalse\n>>> target2 in target\nTrue\n>>> target in target2\nFalse\n
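Notes:
- Targets are only precise down to the individual host. Ports and protocols are not considered in scope calculations.
- If you specify \"https://evilcorp.com:8443\" as a target, all of evilcorp.com (including subdomains and other ports and protocols) will be considered part of the target.
- If you do not want to include child subdomains, use strict_scope=True.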
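The membership and counting behaviour in the examples above can be approximated with the standard library alone. This is a rough sketch, not BBOT's radix-tree implementation: `host_in_target` and `count_hosts` are hypothetical helpers that accept only bare hostnames, IP addresses, and CIDR ranges (no URLs or email addresses).

```python
import ipaddress

def host_in_target(host, targets, strict_scope=False):
    # Hypothetical helper: is host covered by any entry in targets?
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None
    for target in targets:
        try:
            network = ipaddress.ip_network(target, strict=False)
        except ValueError:
            network = None
        if ip is not None and network is not None:
            # IP against IP range: plain containment check.
            if ip.version == network.version and ip in network:
                return True
        elif ip is None and network is None:
            # Hostname against hostname: exact match, or child domain
            # unless strict_scope is enabled.
            h, t = host.lower(), target.lower()
            if h == t or (not strict_scope and h.endswith('.' + t)):
                return True
    return False

def count_hosts(targets):
    # IP networks count every address; anything else counts as one host.
    total = 0
    for target in targets:
        try:
            total += ipaddress.ip_network(target, strict=False).num_addresses
        except ValueError:
            total += 1
    return total

targets = ['evilcorp.com', '1.2.3.0/24']
print(count_hosts(targets), host_in_target('www.evilcorp.com', targets))
```

The count of 257 in the example falls out of this arithmetic: 256 addresses in the /24 plus one DNS name.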
Source code in bbot/scanner/target.py
class Target:\n \"\"\"\n A class representing a target. Can contain an unlimited number of hosts, IP or IP ranges, URLs, etc.\n\n Attributes:\n strict_scope (bool): Flag indicating whether to consider child domains in-scope.\n If set to True, only the exact hosts specified and not their children are considered part of the target.\n\n _radix (RadixTree): Radix tree for quick IP/DNS lookups.\n _events (set): Flat set of contained events.\n\n Examples:\n Basic usage\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> len(target)\n 257\n >>> list(t.events)\n [\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'}),\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n ]\n >>> \"www.evilcorp.com\" in target\n True\n >>> \"1.2.3.4\" in target\n True\n >>> \"4.3.2.1\" in target\n False\n >>> \"https://admin.evilcorp.com\" in target\n True\n >>> \"bob@evilcorp.com\" in target\n True\n\n Event correlation\n >>> target.get(\"www.evilcorp.com\")\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n >>> target.get(\"1.2.3.4\")\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n\n Target comparison\n >>> target2 = Targets(scan, \"www.evilcorp.com\")\n >>> target2 == target\n False\n >>> target2 in target\n True\n >>> target in target2\n False\n\n Notes:\n - Targets are only precise down to the individual host. 
Ports and protocols are not considered in scope calculations.\n - If you specify \"https://evilcorp.com:8443\" as a target, all of evilcorp.com (including subdomains and other ports and protocols) will be considered part of the target\n - If you do not want to include child subdomains, use `strict_scope=True`\n \"\"\"\n\n def __init__(self, *targets, strict_scope=False, scan=None, acl_mode=False):\n \"\"\"\n Initialize a Target object.\n\n Args:\n *targets: One or more targets (e.g., domain names, IP ranges) to be included in this Target.\n strict_scope (bool): Whether to consider subdomains of target domains in-scope\n scan (Scan): Reference to the Scan object that instantiated the Target.\n acl_mode (bool): Stricter deduplication for more efficient checks\n\n Notes:\n - If you are instantiating a target from within a BBOT module, use `self.helpers.make_target()` instead. (this removes the need to pass in a scan object.)\n - The strict_scope flag can be set to restrict scope calculation to only exactly-matching hosts and not their child subdomains.\n - Each target is processed and stored as an `Event` in the '_events' dictionary.\n \"\"\"\n self.scan = scan\n self.strict_scope = strict_scope\n self.acl_mode = acl_mode\n self.special_event_types = {\n \"ORG_STUB\": re.compile(r\"^ORG:(.*)\", re.IGNORECASE),\n \"ASN\": re.compile(r\"^ASN:(.*)\", re.IGNORECASE),\n }\n self._events = set()\n self._radix = RadixTarget()\n\n for target_event in self._make_events(targets):\n self._add_event(target_event)\n\n self._hash = None\n\n def add(self, t, event_type=None):\n \"\"\"\n Add a target or merge events from another Target object into this Target.\n\n Args:\n t: The target to be added. 
It can be either a string, an event object, or another Target object.\n\n Attributes Modified:\n _events (dict): The dictionary is updated to include the new target's events.\n\n Examples:\n >>> target.add('example.com')\n\n Notes:\n - If `t` is of the same class as this Target, all its events are merged.\n - If `t` is an event, it is directly added to `_events`.\n \"\"\"\n if not isinstance(t, (list, tuple, set)):\n t = [t]\n for single_target in t:\n if isinstance(single_target, self.__class__):\n for event in single_target.events:\n self._add_event(event)\n else:\n if is_event(single_target):\n event = single_target\n else:\n try:\n event = make_event(\n single_target, event_type=event_type, dummy=True, tags=[\"target\"], scan=self.scan\n )\n except ValidationError as e:\n # allow commented lines\n if not str(t).startswith(\"#\"):\n log.trace(traceback.format_exc())\n raise ValidationError(f'Could not add target \"{t}\": {e}')\n self._add_event(event)\n\n @property\n def events(self):\n \"\"\"\n Returns all events in the target.\n\n Yields:\n Event object: One of the Event objects stored in the `_events` dictionary.\n\n Examples:\n >>> target = Target(scan, \"example.com\")\n >>> for event in target.events:\n ... 
print(event)\n\n Notes:\n - This property is read-only.\n \"\"\"\n return self._events\n\n @property\n def hosts(self):\n return [e.host for e in self.events]\n\n def copy(self):\n \"\"\"\n Creates and returns a copy of the Target object, including a shallow copy of the `_events` and `_radix` attributes.\n\n Returns:\n Target: A new Target object with the sameattributes as the original.\n A shallow copy of the `_events` dictionary is made.\n\n Examples:\n >>> original_target = Target(scan, \"example.com\")\n >>> copied_target = original_target.copy()\n >>> copied_target is original_target\n False\n >>> copied_target == original_target\n True\n >>> copied_target in original_target\n True\n >>> original_target in copied_target\n True\n\n Notes:\n - The `scan` object reference is kept intact in the copied Target object.\n \"\"\"\n self_copy = self.__class__()\n self_copy._events = set(self._events)\n self_copy._radix = copy.copy(self._radix)\n return self_copy\n\n def get(self, host, single=True):\n \"\"\"\n Gets the event associated with the specified host from the target's radix tree.\n\n Args:\n host (Event, Target, or str): The hostname, IP, URL, or event to look for.\n single (bool): Whether to return a single event. 
If False, return all events matching the host\n\n Returns:\n Event or None: Returns the Event object associated with the given host if it exists, otherwise returns None.\n\n Examples:\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> target.get(\"www.evilcorp.com\")\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n >>> target.get(\"1.2.3.4\")\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n\n Notes:\n - The method returns the first event that matches the given host.\n - If `strict_scope` is False, it will also consider parent domains and IP ranges.\n \"\"\"\n try:\n event = make_event(host, dummy=True)\n except ValidationError:\n return\n if event.host:\n return self.get_host(event.host, single=single)\n\n def get_host(self, host, single=True):\n \"\"\"\n A more efficient version of .get() that only accepts hostnames and IP addresses\n \"\"\"\n host = make_ip_type(host)\n with suppress(KeyError, StopIteration):\n result = self._radix.search(host)\n if result is not None:\n ret = set()\n for event in result:\n # if the result is a dns name and strict scope is enabled\n if isinstance(event.host, str) and self.strict_scope:\n # if the result doesn't exactly equal the host, abort\n if event.host != host:\n return\n if single:\n return event\n else:\n ret.add(event)\n if ret and not single:\n return ret\n\n def _sort_events(self, events):\n return sorted(events, key=lambda x: x._host_size)\n\n def _make_events(self, targets):\n events = []\n for target in targets:\n event_type = None\n for eventtype, regex in self.special_event_types.items():\n if isinstance(target, str):\n match = regex.match(target)\n if match:\n target = match.groups()[0]\n event_type = eventtype\n break\n events.append(make_event(target, event_type=event_type, dummy=True, scan=self.scan))\n return self._sort_events(events)\n\n def _add_event(self, event):\n skip = False\n if event.host:\n radix_data = 
self._radix.search(event.host)\n if self.acl_mode:\n # skip if the hostname/IP/subnet (or its parent) has already been added\n if radix_data is not None and not self.strict_scope:\n skip = True\n else:\n event_type = \"IP_RANGE\" if event.type == \"IP_RANGE\" else \"DNS_NAME\"\n event = make_event(event.host, event_type=event_type, dummy=True, scan=self.scan)\n if not skip:\n # if strict scope is enabled and it's not an exact host match, we add a whole new entry\n if radix_data is None or (self.strict_scope and event.host not in radix_data):\n radix_data = {event}\n self._radix.insert(event.host, radix_data)\n # otherwise, we add the event to the set\n else:\n radix_data.add(event)\n # clear hash\n self._hash = None\n elif self.acl_mode and not self.strict_scope:\n # skip if we're in ACL mode and there's no host\n skip = True\n if not skip:\n self._events.add(event)\n\n def _contains(self, other):\n if self.get(other) is not None:\n return True\n return False\n\n def __str__(self):\n return \",\".join([str(e.data) for e in self.events][:5])\n\n def __iter__(self):\n yield from self.events\n\n def __contains__(self, other):\n # if \"other\" is a Target\n if isinstance(other, self.__class__):\n contained_in_self = [self._contains(e) for e in other.events]\n return all(contained_in_self)\n else:\n return self._contains(other)\n\n def __bool__(self):\n return bool(self._events)\n\n def __eq__(self, other):\n return self.hash == other.hash\n\n @property\n def hash(self):\n if self._hash is None:\n # Create a new SHA-1 hash object\n sha1_hash = sha1()\n # Update the SHA-1 object with the hash values of each object\n for event_type, event_hash in sorted([(e.type.encode(), e.data_hash) for e in self.events]):\n sha1_hash.update(event_type)\n sha1_hash.update(event_hash)\n if self.strict_scope:\n sha1_hash.update(b\"\\x00\")\n self._hash = sha1_hash.digest()\n return self._hash\n\n def __len__(self):\n \"\"\"\n Calculates and returns the total number of hosts within this 
target, not counting duplicate events.\n\n Returns:\n int: The total number of unique hosts present within the target's `_events`.\n\n Examples:\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> len(target)\n 257\n\n Notes:\n - If a host is represented as an IP network, all individual IP addresses in that network are counted.\n - For other types of hosts, each unique event is counted as one.\n \"\"\"\n num_hosts = 0\n for event in self._events:\n if isinstance(event.host, (ipaddress.IPv4Network, ipaddress.IPv6Network)):\n num_hosts += event.host.num_addresses\n else:\n num_hosts += 1\n return num_hosts\n
"},{"location":"dev/target/#bbot.scanner.target.Target.events","title":"events property
","text":"events\n
Returns all events in the target.
Yields:
Event object: One of the Event objects stored in the _events set.
Examples:
>>> target = Target(\"example.com\", scan=scan)\n>>> for event in target.events:\n... print(event)\n
Notes
__init__(*targets, strict_scope=False, scan=None, acl_mode=False)\n
Initialize a Target object.
Parameters:
*targets
\u2013 One or more targets (e.g., domain names, IP ranges) to be included in this Target.
strict_scope
(bool
, default: False
) \u2013 If True, only exactly-matching hosts are in scope; subdomains of target domains are excluded.
scan
(Scan
, default: None
) \u2013 Reference to the Scan object that instantiated the Target.
acl_mode
(bool
, default: False
) \u2013 Stricter deduplication for more efficient checks
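Notes
If you are instantiating a target from within a BBOT module, use self.helpers.make_target() instead. (this removes the need to pass in a scan object.)
The strict_scope flag can be set to restrict scope calculation to only exactly-matching hosts and not their child subdomains.
Each target is processed and stored as an Event in the _events set.
Source code in bbot/scanner/target.py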
def __init__(self, *targets, strict_scope=False, scan=None, acl_mode=False):\n \"\"\"\n Initialize a Target object.\n\n Args:\n *targets: One or more targets (e.g., domain names, IP ranges) to be included in this Target.\n strict_scope (bool): Whether to consider subdomains of target domains in-scope\n scan (Scan): Reference to the Scan object that instantiated the Target.\n acl_mode (bool): Stricter deduplication for more efficient checks\n\n Notes:\n - If you are instantiating a target from within a BBOT module, use `self.helpers.make_target()` instead. (this removes the need to pass in a scan object.)\n - The strict_scope flag can be set to restrict scope calculation to only exactly-matching hosts and not their child subdomains.\n - Each target is processed and stored as an `Event` in the '_events' dictionary.\n \"\"\"\n self.scan = scan\n self.strict_scope = strict_scope\n self.acl_mode = acl_mode\n self.special_event_types = {\n \"ORG_STUB\": re.compile(r\"^ORG:(.*)\", re.IGNORECASE),\n \"ASN\": re.compile(r\"^ASN:(.*)\", re.IGNORECASE),\n }\n self._events = set()\n self._radix = RadixTarget()\n\n for target_event in self._make_events(targets):\n self._add_event(target_event)\n\n self._hash = None\n
"},{"location":"dev/target/#bbot.scanner.target.Target.add","title":"add","text":"add(t, event_type=None)\n
Add a target or merge events from another Target object into this Target.
Parameters:
t
\u2013 The target to be added. It can be either a string, an event object, or another Target object.
Attributes Modified:
_events (set): The set is updated to include the new target's events.
Examples:
>>> target.add('example.com')\n
Notes
If t is of the same class as this Target, all its events are merged.
If t is an event, it is directly added to _events.
Source code in bbot/scanner/target.py
def add(self, t, event_type=None):\n \"\"\"\n Add a target or merge events from another Target object into this Target.\n\n Args:\n t: The target to be added. It can be either a string, an event object, or another Target object.\n\n Attributes Modified:\n _events (dict): The dictionary is updated to include the new target's events.\n\n Examples:\n >>> target.add('example.com')\n\n Notes:\n - If `t` is of the same class as this Target, all its events are merged.\n - If `t` is an event, it is directly added to `_events`.\n \"\"\"\n if not isinstance(t, (list, tuple, set)):\n t = [t]\n for single_target in t:\n if isinstance(single_target, self.__class__):\n for event in single_target.events:\n self._add_event(event)\n else:\n if is_event(single_target):\n event = single_target\n else:\n try:\n event = make_event(\n single_target, event_type=event_type, dummy=True, tags=[\"target\"], scan=self.scan\n )\n except ValidationError as e:\n # allow commented lines\n if not str(t).startswith(\"#\"):\n log.trace(traceback.format_exc())\n raise ValidationError(f'Could not add target \"{t}\": {e}')\n self._add_event(event)\n
"},{"location":"dev/target/#bbot.scanner.target.Target.copy","title":"copy","text":"copy()\n
Creates and returns a copy of the Target object, including a shallow copy of the _events
and _radix
attributes.
Returns:
Target
\u2013 A new Target object with the same attributes as the original. A shallow copy of the _events
set is made.
Examples:
>>> original_target = Target(scan, \"example.com\")\n>>> copied_target = original_target.copy()\n>>> copied_target is original_target\nFalse\n>>> copied_target == original_target\nTrue\n>>> copied_target in original_target\nTrue\n>>> original_target in copied_target\nTrue\n
Notes
The scan object reference is kept intact in the copied Target object.
Source code in bbot/scanner/target.py
def copy(self):\n \"\"\"\n Creates and returns a copy of the Target object, including a shallow copy of the `_events` and `_radix` attributes.\n\n Returns:\n Target: A new Target object with the same attributes as the original.\n A shallow copy of the `_events` set is made.\n\n Examples:\n >>> original_target = Target(scan, \"example.com\")\n >>> copied_target = original_target.copy()\n >>> copied_target is original_target\n False\n >>> copied_target == original_target\n True\n >>> copied_target in original_target\n True\n >>> original_target in copied_target\n True\n\n Notes:\n - The `scan` object reference is kept intact in the copied Target object.\n \"\"\"\n self_copy = self.__class__()\n self_copy._events = set(self._events)\n self_copy._radix = copy.copy(self._radix)\n return self_copy\n
"},{"location":"dev/target/#bbot.scanner.target.Target.get","title":"get","text":"get(host, single=True)\n
Gets the event associated with the specified host from the target's radix tree.
Parameters:
host
(Event, Target, or str
) \u2013 The hostname, IP, URL, or event to look for.
single
(bool
, default: True
) \u2013 Whether to return a single event. If False, return all events matching the host
Returns:
Event or None: Returns the Event object associated with the given host if it exists, otherwise returns None.
Examples:
>>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n>>> target.get(\"www.evilcorp.com\")\nDNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n>>> target.get(\"1.2.3.4\")\nIP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n
Notes
The method returns the first event that matches the given host.
If strict_scope is False, it will also consider parent domains and IP ranges.
Source code in bbot/scanner/target.py
def get(self, host, single=True):\n \"\"\"\n Gets the event associated with the specified host from the target's radix tree.\n\n Args:\n host (Event, Target, or str): The hostname, IP, URL, or event to look for.\n single (bool): Whether to return a single event. If False, return all events matching the host\n\n Returns:\n Event or None: Returns the Event object associated with the given host if it exists, otherwise returns None.\n\n Examples:\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> target.get(\"www.evilcorp.com\")\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n >>> target.get(\"1.2.3.4\")\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n\n Notes:\n - The method returns the first event that matches the given host.\n - If `strict_scope` is False, it will also consider parent domains and IP ranges.\n \"\"\"\n try:\n event = make_event(host, dummy=True)\n except ValidationError:\n return\n if event.host:\n return self.get_host(event.host, single=single)\n
"},{"location":"dev/target/#bbot.scanner.target.Target.get_host","title":"get_host","text":"get_host(host, single=True)\n
A more efficient version of .get() that only accepts hostnames and IP addresses
Source code in bbot/scanner/target.py
def get_host(self, host, single=True):\n \"\"\"\n A more efficient version of .get() that only accepts hostnames and IP addresses\n \"\"\"\n host = make_ip_type(host)\n with suppress(KeyError, StopIteration):\n result = self._radix.search(host)\n if result is not None:\n ret = set()\n for event in result:\n # if the result is a dns name and strict scope is enabled\n if isinstance(event.host, str) and self.strict_scope:\n # if the result doesn't exactly equal the host, abort\n if event.host != host:\n return\n if single:\n return event\n else:\n ret.add(event)\n if ret and not single:\n return ret\n
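The effect of strict_scope on a lookup can be illustrated without the radix tree. The sketch below is an assumption-laden stand-in, not BBOT's implementation: find_match is a hypothetical function, the target is a plain set of domain strings, and IP ranges are left out entirely.

```python
def find_match(targets, host, strict_scope=False):
    # Hypothetical stand-in for the radix lookup: try the host itself first,
    # then each parent domain, from most specific to least specific.
    labels = host.split('.')
    for i in range(len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in targets:
            # with strict scope enabled, only an exact match is accepted
            if strict_scope and candidate != host:
                return None
            return candidate
    return None


targets = {'evilcorp.com'}
print(find_match(targets, 'www.evilcorp.com'))                     # evilcorp.com
print(find_match(targets, 'www.evilcorp.com', strict_scope=True))  # None
print(find_match(targets, 'evilcorp.com', strict_scope=True))      # evilcorp.com
```

As in get_host(), a subdomain resolves to its parent target unless strict scope is enabled, in which case only the exact host matches.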
"},{"location":"dev/tests/","title":"Unit Tests","text":"BBOT takes tests seriously. Every module must have a custom-written test that actually tests its functionality. Don't worry if you want to contribute but you aren't used to writing tests. If you open a draft PR, we will help write them :)
We use black and flake8 for linting, and pytest for tests.
"},{"location":"dev/tests/#running-tests-locally","title":"Running tests locally","text":"We have GitHub Actions that automatically run tests whenever you open a Pull Request. However, you can also run the tests locally with pytest
:
# format code with black\npoetry run black .\n\n# lint with flake8\npoetry run flake8\n\n# run all tests with pytest (takes roughly 30 minutes)\npoetry run pytest\n
"},{"location":"dev/tests/#running-specific-tests","title":"Running specific tests","text":"If you only want to run a single test, you can select it with -k
:
# run only the sslcert test\npoetry run pytest -k test_module_sslcert\n
You can also filter like this:
# run all the module tests except for sslcert\npoetry run pytest -k \"test_module_ and not test_module_sslcert\"\n
If you want to see the output of your module, you can enable --log-cli-level
:
poetry run pytest --log-cli-level=DEBUG\n
"},{"location":"dev/tests/#example-writing-a-module-test","title":"Example: Writing a Module Test","text":"To write a test for your module, create a new python file in bbot/test/test_step_2/module_tests
. Your filename must be test_module_<module_name>
:
from .base import ModuleTestBase\n\n\nclass TestMyModule(ModuleTestBase):\n targets = [\"blacklanternsecurity.com\"]\n config_overrides = {\"modules\": {\"mymodule\": {\"api_key\": \"deadbeef\"}}}\n\n async def setup_after_prep(self, module_test):\n # mock HTTP response\n module_test.httpx_mock.add_response(\n url=\"https://api.com/sudomains?apikey=deadbeef&domain=blacklanternsecurity.com\",\n json={\n \"subdomains\": [\n \"www.blacklanternsecurity.com\",\n \"dev.blacklanternsecurity.com\"\n ],\n },\n )\n # mock DNS\n await module_test.mock_dns(\n {\n \"blacklanternsecurity.com\": {\"A\": [\"1.2.3.4\"]},\n \"www.blacklanternsecurity.com\": {\"A\": [\"1.2.3.4\"]},\n \"dev.blacklanternsecurity.com\": {\"A\": [\"1.2.3.4\"]},\n }\n )\n\n def check(self, module_test, events):\n # here is where we check to make sure it worked\n dns_names = [e.data for e in events if e.type == \"DNS_NAME\"]\n # temporary log messages for debugging\n for e in dns_names:\n self.log.critical(e)\n assert \"www.blacklanternsecurity.com\" in dns_names, \"failed to find subdomain #1\"\n assert \"dev.blacklanternsecurity.com\" in dns_names, \"failed to find subdomain #2\"\n
"},{"location":"dev/tests/#debugging-a-test","title":"Debugging a test","text":"Similar to debugging from within a module, you can debug from within a test using self.log.critical()
, etc:
def check(self, module_test, events):\n for e in events:\n # bright red\n self.log.critical(e.type)\n # bright green\n self.log.hugesuccess(e.data)\n # bright orange\n self.log.hugewarning(e.tags)\n # bright blue\n self.log.hugeinfo(e.parent)\n
"},{"location":"dev/tests/#more-advanced-tests","title":"More advanced tests","text":"If you have questions about tests or need to write a more advanced test, come talk to us on GitHub or Discord.
It's also a good idea to look through our existing tests. BBOT has over a hundred of them, so you might find one that's similar to what you're trying to do.
"},{"location":"dev/helpers/","title":"BBOT Helpers","text":"In this section are various helper functions that are designed to make your life easier when devving on BBOT. Whether you're extending BBOT by writing a module or working on its core engine, these functions are designed to act as useful machine parts to perform essential tasks, such as making a web request or executing a DNS query.
The vast majority of these helpers can be accessed directly from the .helpers
attribute of a scan or module, like so:
class MyModule(BaseModule):\n\n ...\n\n async def handle_event(self, event):\n # Web Request\n response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n # DNS query\n for ip in await self.helpers.resolve(\"www.evilcorp.com\"):\n self.hugesuccess(str(ip))\n\n # Execute shell command\n completed_process = await self.run_process(\"ls\", \"-l\")\n self.hugesuccess(completed_process.stdout)\n\n # Split a DNS name into subdomain / domain\n self.helpers.split_domain(\"www.internal.evilcorp.co.uk\")\n # (\"www.internal\", \"evilcorp.co.uk\")\n
Next Up: Command Helpers -->
"},{"location":"dev/helpers/command/","title":"Command Helpers","text":"These are helpers related to executing shell commands. They are used throughout BBOT and its modules for executing various binaries such as masscan
, nuclei
, etc.
These helpers can be invoked directly from self.helpers
, but inside a module you should always use self.run_process()
or self.run_process_live()
. These are light wrappers which ensure the running process is tracked by the module so that it can be easily terminated should the user need to kill the module:
# simple subprocess\nls_result = await self.run_process(\"ls\", \"-l\")\nfor line in ls_result.stdout.splitlines():\n # ...\n\n# iterate through each line in real time\nasync for line in self.run_process_live([\"grep\", \"-R\"]):\n # ...\n
"},{"location":"dev/helpers/command/#bbot.core.helpers.command.run","title":"run async
","text":"run(self, *command, check=False, text=True, idle_timeout=None, **kwargs)\n
Runs a command asynchronously and gets its output as a string.
This method is a simple helper for executing a command and capturing its output.\nIf an error occurs during execution, it can optionally raise an error or just log the stderr.\n\nArgs:\n *command (str): The command to run as separate arguments.\n check (bool, optional): If set to True, raises an error if the subprocess exits with a non-zero status.\n Defaults to False.\n text (bool, optional): If set to True, decodes the subprocess output to string. Defaults to True.\n idle_timeout (int, optional): Sets a limit on the number of seconds the process can run before throwing a TimeoutError\n **kwargs (dict): Additional keyword arguments for the subprocess.\n\nReturns:\n CompletedProcess: A completed process object with attributes for the command, return code, stdout, and stderr.\n\nRaises:\n CalledProcessError: If the subprocess exits with a non-zero status and `check=True`.\n\nExamples:\n >>> process = await run([\"ls\", \"/tmp\"])\n >>> process.stdout\n \"file1.txt\\nfile2.txt\"\n
Source code in bbot/core/helpers/command.py
async def run(self, *command, check=False, text=True, idle_timeout=None, **kwargs):\n \"\"\"Runs a command asynchronously and gets its output as a string.\n\n This method is a simple helper for executing a command and capturing its output.\n If an error occurs during execution, it can optionally raise an error or just log the stderr.\n\n Args:\n *command (str): The command to run as separate arguments.\n check (bool, optional): If set to True, raises an error if the subprocess exits with a non-zero status.\n Defaults to False.\n text (bool, optional): If set to True, decodes the subprocess output to string. Defaults to True.\n idle_timeout (int, optional): Sets a limit on the number of seconds the process can run before throwing a TimeoutError\n **kwargs (dict): Additional keyword arguments for the subprocess.\n\n Returns:\n CompletedProcess: A completed process object with attributes for the command, return code, stdout, and stderr.\n\n Raises:\n CalledProcessError: If the subprocess exits with a non-zero status and `check=True`.\n\n Examples:\n >>> process = await run([\"ls\", \"/tmp\"])\n >>> process.stdout\n \"file1.txt\\nfile2.txt\"\n \"\"\"\n # proc_tracker optionally keeps track of which processes are running under which modules\n # this allows for graceful SIGINTing of a module's processes in the case when it's killed\n proc_tracker = kwargs.pop(\"_proc_tracker\", set())\n log_stderr = kwargs.pop(\"_log_stderr\", True)\n proc, _input, command = await self._spawn_proc(*command, **kwargs)\n if proc is not None:\n proc_tracker.add(proc)\n try:\n if _input is not None:\n if isinstance(_input, (list, tuple)):\n _input = b\"\\n\".join(smart_encode(i) for i in _input) + b\"\\n\"\n else:\n _input = smart_encode(_input)\n\n try:\n if idle_timeout is not None:\n stdout, stderr = await asyncio.wait_for(proc.communicate(_input), timeout=idle_timeout)\n else:\n stdout, stderr = await proc.communicate(_input)\n except asyncio.exceptions.TimeoutError:\n 
proc.send_signal(SIGINT)\n raise\n\n # surface stderr\n if text:\n if stderr is not None:\n stderr = smart_decode(stderr)\n if stdout is not None:\n stdout = smart_decode(stdout)\n if proc.returncode:\n if check:\n raise CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)\n if stderr and log_stderr:\n command_str = \" \".join(command)\n log.warning(f\"Stderr for run({command_str}):\\n\\t{stderr}\")\n\n return CompletedProcess(command, proc.returncode, stdout, stderr)\n finally:\n proc_tracker.remove(proc)\n
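The core of run() is a subprocess whose communicate() call is wrapped in asyncio.wait_for. The following standalone sketch shows that pattern using only the standard library. It is not BBOT's helper: run_with_timeout is a hypothetical name, there is no process tracking or stderr logging, and it kills the process on timeout where BBOT sends SIGINT.

```python
import asyncio
import sys


async def run_with_timeout(*command, timeout=None):
    # Hypothetical simplified helper: spawn the command and capture its output.
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # timeout=None waits indefinitely, as run() does without idle_timeout
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # assumption: kill() is used here for portability; BBOT sends SIGINT instead
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, stdout.decode(), stderr.decode()


# run the current Python interpreter so the example works on any platform
returncode, stdout, stderr = asyncio.run(
    run_with_timeout(sys.executable, '-c', 'print(123)', timeout=10)
)
print(returncode, stdout.strip())  # 0 123
```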
"},{"location":"dev/helpers/command/#bbot.core.helpers.command.run_live","title":"run_live async
","text":"run_live(self, *command, check=False, text=True, idle_timeout=None, **kwargs)\n
Runs a command asynchronously and iterates through its output line by line in realtime.
This method is useful for executing a command and capturing its output on-the-fly, as it is generated. If an error occurs during execution, it can optionally raise an error or just log the stderr.
Parameters:
*command
(str
, default: ()
) \u2013 The command to run as separate arguments.
check
(bool
, default: False
) \u2013 If set to True, raises an error if the subprocess exits with a non-zero status. Defaults to False.
text
(bool
, default: True
) \u2013 If set to True, decodes the subprocess output to string. Defaults to True.
idle_timeout
(int
, default: None
) \u2013 Sets a limit on the number of seconds the process can remain idle (no lines sent to stdout) before throwing a TimeoutError
**kwargs
(dict
, default: {}
) \u2013 Additional keyword arguments for the subprocess.
Yields:
str or bytes: The output lines of the command, either as a decoded string (if text=True
) or as bytes (if text=False
).
Raises:
CalledProcessError
\u2013 If the subprocess exits with a non-zero status and check=True
.
Examples:
>>> async for line in run_live([\"tail\", \"-f\", \"/var/log/auth.log\"]):\n... log.info(line)\n
Source code in bbot/core/helpers/command.py
async def run_live(self, *command, check=False, text=True, idle_timeout=None, **kwargs):\n \"\"\"Runs a command asynchronously and iterates through its output line by line in realtime.\n\n This method is useful for executing a command and capturing its output on-the-fly, as it is generated.\n If an error occurs during execution, it can optionally raise an error or just log the stderr.\n\n Args:\n *command (str): The command to run as separate arguments.\n check (bool, optional): If set to True, raises an error if the subprocess exits with a non-zero status.\n Defaults to False.\n text (bool, optional): If set to True, decodes the subprocess output to string. Defaults to True.\n idle_timeout (int, optional): Sets a limit on the number of seconds the process can remain idle (no lines sent to stdout) before throwing a TimeoutError\n **kwargs (dict): Additional keyword arguments for the subprocess.\n\n Yields:\n str or bytes: The output lines of the command, either as a decoded string (if `text=True`)\n or as bytes (if `text=False`).\n\n Raises:\n CalledProcessError: If the subprocess exits with a non-zero status and `check=True`.\n\n Examples:\n >>> async for line in run_live([\"tail\", \"-f\", \"/var/log/auth.log\"]):\n ... 
log.info(line)\n \"\"\"\n # proc_tracker optionally keeps track of which processes are running under which modules\n # this allows for graceful SIGINTing of a module's processes in the case when it's killed\n proc_tracker = kwargs.pop(\"_proc_tracker\", set())\n log_stderr = kwargs.pop(\"_log_stderr\", True)\n proc, _input, command = await self._spawn_proc(*command, **kwargs)\n if proc is not None:\n proc_tracker.add(proc)\n try:\n input_task = None\n if _input is not None:\n input_task = asyncio.create_task(_write_stdin(proc, _input))\n\n while 1:\n try:\n if idle_timeout is not None:\n line = await asyncio.wait_for(proc.stdout.readline(), timeout=idle_timeout)\n else:\n line = await proc.stdout.readline()\n except asyncio.exceptions.TimeoutError:\n proc.send_signal(SIGINT)\n raise\n except ValueError as e:\n command_str = \" \".join([str(c) for c in command])\n log.warning(f\"Error executing command {command_str}: {e}\")\n log.trace(traceback.format_exc())\n continue\n if not line:\n break\n if text:\n line = smart_decode(line).rstrip(\"\\r\\n\")\n else:\n line = line.rstrip(b\"\\r\\n\")\n yield line\n\n if input_task is not None:\n try:\n await input_task\n except ConnectionError:\n log.trace(f\"ConnectionError in command: {command}, kwargs={kwargs}\")\n log.trace(traceback.format_exc())\n await proc.wait()\n\n if proc.returncode:\n stdout, stderr = await proc.communicate()\n if text:\n if stderr is not None:\n stderr = smart_decode(stderr)\n if stdout is not None:\n stdout = smart_decode(stdout)\n if check:\n raise CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)\n # surface stderr\n if stderr and log_stderr:\n command_str = \" \".join(command)\n log.warning(f\"Stderr for run_live({command_str}):\\n\\t{stderr}\")\n finally:\n proc_tracker.remove(proc)\n
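The idle timeout in run_live() applies to each readline() call, not to the process as a whole, so a command may run for a long time as long as it keeps producing output. A minimal sketch of that loop follows, under stated assumptions: iter_lines is a hypothetical name, stdin handling and stderr logging are omitted, and the process is killed on timeout where BBOT sends SIGINT.

```python
import asyncio
import sys


async def iter_lines(*command, idle_timeout=None):
    # Hypothetical simplified generator: yield decoded stdout lines as they arrive.
    proc = await asyncio.create_subprocess_exec(*command, stdout=asyncio.subprocess.PIPE)
    while True:
        try:
            # the timeout restarts for every line, so it only limits idle time
            line = await asyncio.wait_for(proc.stdout.readline(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            # no output within idle_timeout seconds: stop the process and re-raise
            proc.kill()
            await proc.wait()
            raise
        if not line:
            # an empty read means the process closed its stdout
            break
        yield line.decode().rstrip()
    await proc.wait()


async def main():
    script = 'print(1); print(2)'
    return [line async for line in iter_lines(sys.executable, '-c', script, idle_timeout=10)]


lines = asyncio.run(main())
print(lines)  # ['1', '2']
```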
"},{"location":"dev/helpers/dns/","title":"DNS","text":"These are helpers related to DNS resolution. They are used throughout BBOT and its modules for performing DNS lookups and detecting DNS wildcards, etc.
Note that these helpers can be invoked directly from self.helpers
, e.g.:
await self.helpers.resolve(\"evilcorp.com\")\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper","title":"DNSHelper","text":" Bases: EngineClient
Source code in bbot/core/helpers/dns/dns.py
class DNSHelper(EngineClient):\n\n SERVER_CLASS = DNSEngine\n ERROR_CLASS = DNSError\n\n \"\"\"Helper class for DNS-related operations within BBOT.\n\n This class provides mechanisms for host resolution, wildcard domain detection, event tagging, and more.\n It centralizes all DNS-related activities in BBOT, offering both synchronous and asynchronous methods\n for DNS resolution, as well as various utilities for batch resolution and DNS query filtering.\n\n Attributes:\n parent_helper: A reference to the instantiated `ConfigAwareHelper` (typically `scan.helpers`).\n resolver (BBOTAsyncResolver): An asynchronous DNS resolver tailored for BBOT with rate-limiting capabilities.\n timeout (int): The timeout value for DNS queries. Defaults to 5 seconds.\n retries (int): The number of retries for failed DNS queries. Defaults to 1.\n abort_threshold (int): The threshold for aborting after consecutive failed queries. Defaults to 50.\n runaway_limit (int): Maximum allowed distance for consecutive DNS resolutions. Defaults to 5.\n all_rdtypes (list): A list of DNS record types to be considered during operations.\n wildcard_ignore (tuple): Domains to be ignored during wildcard detection.\n wildcard_tests (int): Number of tests to be run for wildcard detection. Defaults to 5.\n _wildcard_cache (dict): Cache for wildcard detection results.\n _dns_cache (LRUCache): Cache for DNS resolution results, limited in size.\n resolver_file (Path): File containing system's current resolver nameservers.\n filter_bad_ptrs (bool): Whether to filter out DNS names that appear to be auto-generated PTR records. 
Defaults to True.\n\n Args:\n parent_helper: The parent helper object with configuration details and utilities.\n\n Raises:\n DNSError: If an issue arises when creating the BBOTAsyncResolver instance.\n\n Examples:\n >>> dns_helper = DNSHelper(parent_config)\n >>> resolved_host = dns_helper.resolver.resolve(\"example.com\")\n \"\"\"\n\n def __init__(self, parent_helper):\n self.parent_helper = parent_helper\n self.config = self.parent_helper.config\n self.dns_config = self.config.get(\"dns\", {})\n engine_debug = self.config.get(\"engine\", {}).get(\"debug\", False)\n super().__init__(server_kwargs={\"config\": self.config}, debug=engine_debug)\n\n # resolver\n self.timeout = self.dns_config.get(\"timeout\", 5)\n self.resolver = dns.asyncresolver.Resolver()\n self.resolver.rotate = True\n self.resolver.timeout = self.timeout\n self.resolver.lifetime = self.timeout\n\n self.runaway_limit = self.config.get(\"runaway_limit\", 5)\n\n # wildcard handling\n self.wildcard_disable = self.dns_config.get(\"wildcard_disable\", False)\n self.wildcard_ignore = RadixTarget()\n for d in self.dns_config.get(\"wildcard_ignore\", []):\n self.wildcard_ignore.insert(d)\n\n # copy the system's current resolvers to a text file for tool use\n self.system_resolvers = dns.resolver.Resolver().nameservers\n # TODO: DNS server speed test (start in background task)\n self.resolver_file = self.parent_helper.tempfile(self.system_resolvers, pipe=False)\n\n # brute force helper\n self._brute = None\n\n self._is_wildcard_cache = LFUCache(maxsize=1000)\n self._is_wildcard_domain_cache = LFUCache(maxsize=1000)\n\n async def resolve(self, query, **kwargs):\n return await self.run_and_return(\"resolve\", query=query, **kwargs)\n\n async def resolve_raw(self, query, **kwargs):\n return await self.run_and_return(\"resolve_raw\", query=query, **kwargs)\n\n async def resolve_batch(self, queries, **kwargs):\n agen = self.run_and_yield(\"resolve_batch\", queries=queries, **kwargs)\n while 1:\n try:\n yield 
await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n async def resolve_raw_batch(self, queries):\n agen = self.run_and_yield(\"resolve_raw_batch\", queries=queries)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n @property\n def brute(self):\n if self._brute is None:\n from .brute import DNSBrute\n\n self._brute = DNSBrute(self.parent_helper)\n return self._brute\n\n @async_cachedmethod(lambda self: self._is_wildcard_cache)\n async def is_wildcard(self, query, ips=None, rdtype=None):\n \"\"\"\n Use this method to check whether a *host* is a wildcard entry\n\n This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.\n\n If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.\n\n Args:\n query (str): The hostname to check for a wildcard entry.\n ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.\n rdtype (str, optional): The DNS record type (e.g., \"A\", \"AAAA\") to consider during the check.\n\n Returns:\n dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.\n Keys are DNS record types like \"A\", \"AAAA\", etc.\n Values are tuples where the first element is a boolean indicating if the query is a wildcard,\n and the second element is the wildcard parent if it's a wildcard.\n\n Raises:\n ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.\n\n Examples:\n >>> is_wildcard(\"www.github.io\")\n {\"A\": (True, \"github.io\"), \"AAAA\": (True, \"github.io\")}\n\n >>> is_wildcard(\"www.evilcorp.com\", ips=[\"93.184.216.34\"], rdtype=\"A\")\n {\"A\": (False, \"evilcorp.com\")}\n\n Note:\n `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)\n \"\"\"\n if [ips, 
rdtype].count(None) == 1:\n raise ValueError(\"Both ips and rdtype must be specified\")\n\n query = self._wildcard_prevalidation(query)\n if not query:\n return {}\n\n # skip check if the query is a domain\n if is_domain(query):\n return {}\n\n return await self.run_and_return(\"is_wildcard\", query=query, ips=ips, rdtype=rdtype)\n\n @async_cachedmethod(lambda self: self._is_wildcard_domain_cache)\n async def is_wildcard_domain(self, domain, log_info=False):\n domain = self._wildcard_prevalidation(domain)\n if not domain:\n return {}\n\n return await self.run_and_return(\"is_wildcard_domain\", domain=domain, log_info=False)\n\n def _wildcard_prevalidation(self, host):\n if self.wildcard_disable:\n return False\n\n host = clean_dns_record(host)\n # skip check if it's an IP or a plain hostname\n if is_ip(host) or not \".\" in host:\n return False\n\n # skip if query isn't a dns name\n if not is_dns_name(host):\n return False\n\n # skip check if the query's parent domain is excluded in the config\n wildcard_ignore = self.wildcard_ignore.search(host)\n if wildcard_ignore:\n log.debug(f\"Skipping wildcard detection on {host} because {wildcard_ignore} is excluded in the config\")\n return False\n\n return host\n\n async def _mock_dns(self, mock_data):\n from .mock import MockResolver\n\n self.resolver = MockResolver(mock_data)\n await self.run_and_return(\"_mock_dns\", mock_data=mock_data)\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.resolve","title":"resolve async
","text":"resolve(query, **kwargs)\n
Source code in bbot/core/helpers/dns/dns.py
async def resolve(self, query, **kwargs):\n return await self.run_and_return(\"resolve\", query=query, **kwargs)\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.resolve_batch","title":"resolve_batch async
","text":"resolve_batch(queries, **kwargs)\n
Source code in bbot/core/helpers/dns/dns.py
async def resolve_batch(self, queries, **kwargs):\n agen = self.run_and_yield(\"resolve_batch\", queries=queries, **kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.resolve_raw","title":"resolve_raw async
","text":"resolve_raw(query, **kwargs)\n
Source code in bbot/core/helpers/dns/dns.py
async def resolve_raw(self, query, **kwargs):\n return await self.run_and_return(\"resolve_raw\", query=query, **kwargs)\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.is_wildcard","title":"is_wildcard async
","text":"is_wildcard(query, ips=None, rdtype=None)\n
Use this method to check whether a host is a wildcard entry
This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.
If you want to know whether a domain is using wildcard DNS, use is_wildcard_domain()
instead.
Parameters:
query
(str
) \u2013 The hostname to check for a wildcard entry.
ips
(list
, default: None
) \u2013 List of IPs to compare against, typically obtained from a previous DNS resolution of the query.
rdtype
(str
, default: None
) \u2013 The DNS record type (e.g., \"A\", \"AAAA\") to consider during the check.
Returns:
dict
\u2013 A dictionary indicating if the query is a wildcard for each checked DNS record type. Keys are DNS record types like \"A\", \"AAAA\", etc. Values are tuples where the first element is a boolean indicating if the query is a wildcard, and the second element is the wildcard parent if it's a wildcard.
Raises:
ValueError
\u2013 If only one of ips
or rdtype
is specified or if no valid IPs are specified.
Examples:
>>> is_wildcard(\"www.github.io\")\n{\"A\": (True, \"github.io\"), \"AAAA\": (True, \"github.io\")}\n
>>> is_wildcard(\"www.evilcorp.com\", ips=[\"93.184.216.34\"], rdtype=\"A\")\n{\"A\": (False, \"evilcorp.com\")}\n
Note
is_wildcard can be True, False, or None (indicating that wildcard detection was inconclusive)
Source code in bbot/core/helpers/dns/dns.py
@async_cachedmethod(lambda self: self._is_wildcard_cache)\nasync def is_wildcard(self, query, ips=None, rdtype=None):\n \"\"\"\n Use this method to check whether a *host* is a wildcard entry\n\n This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.\n\n If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.\n\n Args:\n query (str): The hostname to check for a wildcard entry.\n ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.\n rdtype (str, optional): The DNS record type (e.g., \"A\", \"AAAA\") to consider during the check.\n\n Returns:\n dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.\n Keys are DNS record types like \"A\", \"AAAA\", etc.\n Values are tuples where the first element is a boolean indicating if the query is a wildcard,\n and the second element is the wildcard parent if it's a wildcard.\n\n Raises:\n ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.\n\n Examples:\n >>> is_wildcard(\"www.github.io\")\n {\"A\": (True, \"github.io\"), \"AAAA\": (True, \"github.io\")}\n\n >>> is_wildcard(\"www.evilcorp.com\", ips=[\"93.184.216.34\"], rdtype=\"A\")\n {\"A\": (False, \"evilcorp.com\")}\n\n Note:\n `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)\n \"\"\"\n if [ips, rdtype].count(None) == 1:\n raise ValueError(\"Both ips and rdtype must be specified\")\n\n query = self._wildcard_prevalidation(query)\n if not query:\n return {}\n\n # skip check if the query is a domain\n if is_domain(query):\n return {}\n\n return await self.run_and_return(\"is_wildcard\", query=query, ips=ips, rdtype=rdtype)\n
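The first check in is_wildcard() enforces that ips and rdtype are passed together or not at all. The sketch below isolates that check; check_args is a hypothetical function written for illustration and is not part of BBOT.

```python
def check_args(ips=None, rdtype=None):
    # Exactly one None means the caller supplied only half of the pair.
    if [ips, rdtype].count(None) == 1:
        raise ValueError('Both ips and rdtype must be specified')
    return True


print(check_args())                               # True (neither given)
print(check_args(ips=['1.2.3.4'], rdtype='A'))    # True (both given)
try:
    check_args(ips=['1.2.3.4'])
except ValueError as e:
    print(e)                                      # Both ips and rdtype must be specified
```

Counting None values keeps the both-or-neither rule in a single comparison.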
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.is_wildcard_domain","title":"is_wildcard_domain async
","text":"is_wildcard_domain(domain, log_info=False)\n
Source code in bbot/core/helpers/dns/dns.py
@async_cachedmethod(lambda self: self._is_wildcard_domain_cache)\nasync def is_wildcard_domain(self, domain, log_info=False):\n domain = self._wildcard_prevalidation(domain)\n if not domain:\n return {}\n\n return await self.run_and_return(\"is_wildcard_domain\", domain=domain, log_info=False)\n
"},{"location":"dev/helpers/interactsh/","title":"Interact.sh","text":"A pure python implementation of ProjectDiscovery's interact.sh.
\"Interactsh is an open-source tool for detecting out-of-band interactions. It is a tool designed to detect vulnerabilities that cause external interactions.\"
This class facilitates interactions with the interact.sh service for out-of-band data exfiltration and vulnerability confirmation. It allows for customization by accepting server and token parameters from the configuration provided by parent_helper.
Attributes:
parent_helper
(ConfigAwareHelper
) \u2013 An instance of a helper class containing configuration data.
server
(str
) \u2013 The server to be used. If None (the default), a random server will be chosen from a predetermined list.
correlation_id
(str
) \u2013 An identifier to correlate requests and responses. Default is None.
custom_server
(str
) \u2013 Optional. A custom interact.sh server. Loaded from configuration.
token
(str
) \u2013 Optional. A token for interact.sh API. Loaded from configuration.
_poll_task
(AsyncTask
) \u2013 The task responsible for polling the interact.sh server.
Examples:
# instantiate interact.sh client (no requests are sent yet)\n>>> interactsh_client = self.helpers.interactsh()\n# register with an interact.sh server\n>>> interactsh_domain = await interactsh_client.register()\n[INFO] Registering with interact.sh server: oast.me\n[INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n# simulate an out-of-band interaction\n>>> await self.helpers.request(f\"https://{interactsh_domain}/test\")\n# wait for out-of-band interaction to be registered\n>>> await asyncio.sleep(10)\n>>> data_list = await interactsh_client.poll()\n>>> print(data_list)\n[\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"q-type\": \"A\",\n \"raw-request\": \"...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:23.187226851Z\"\n },\n {\n \"protocol\": \"http\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"raw-request\": \"GET /test HTTP/1.1 ...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:24.155677967Z\"\n }\n]\n# finally, shut down the client\n>>> await interactsh_client.deregister()\n
Source code in bbot/core/helpers/interactsh.py
class Interactsh:\n \"\"\"\n A pure python implementation of ProjectDiscovery's interact.sh.\n\n *\"Interactsh is an open-source tool for detecting out-of-band interactions. It is a tool designed to detect vulnerabilities that cause external interactions.\"*\n\n - https://app.interactsh.com\n - https://github.com/projectdiscovery/interactsh\n\n This class facilitates interactions with the interact.sh service for\n out-of-band data exfiltration and vulnerability confirmation. It allows\n for customization by accepting server and token parameters from the\n configuration provided by `parent_helper`.\n\n Attributes:\n parent_helper (ConfigAwareHelper): An instance of a helper class containing configuration data.\n server (str): The server to be used. If None (the default), a random server will be chosen from a predetermined list.\n correlation_id (str): An identifier to correlate requests and responses. Default is None.\n custom_server (str): Optional. A custom interact.sh server. Loaded from configuration.\n token (str): Optional. A token for interact.sh API. 
Loaded from configuration.\n _poll_task (AsyncTask): The task responsible for polling the interact.sh server.\n\n Examples:\n ```python\n # instantiate interact.sh client (no requests are sent yet)\n >>> interactsh_client = self.helpers.interactsh()\n # register with an interact.sh server\n >>> interactsh_domain = await interactsh_client.register()\n [INFO] Registering with interact.sh server: oast.me\n [INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n # simulate an out-of-band interaction\n >>> await self.helpers.request(f\"https://{interactsh_domain}/test\")\n # wait for out-of-band interaction to be registered\n >>> await asyncio.sleep(10)\n >>> data_list = await interactsh_client.poll()\n >>> print(data_list)\n [\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"q-type\": \"A\",\n \"raw-request\": \"...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:23.187226851Z\"\n },\n {\n \"protocol\": \"http\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"raw-request\": \"GET /test HTTP/1.1 ...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:24.155677967Z\"\n }\n ]\n # finally, shut down the client\n >>> await interactsh_client.deregister()\n ```\n \"\"\"\n\n def __init__(self, parent_helper, poll_interval=10):\n self.parent_helper = parent_helper\n self.server = None\n self.correlation_id = None\n self.custom_server = self.parent_helper.config.get(\"interactsh_server\", None)\n self.token = self.parent_helper.config.get(\"interactsh_token\", None)\n self.poll_interval = poll_interval\n self._poll_task = None\n\n async def register(self, callback=None):\n \"\"\"\n Registers the instance with an interact.sh server and sets up polling.\n\n Generates RSA keys for secure 
communication, builds a correlation ID,\n and sends a POST request to an interact.sh server to register. Optionally,\n starts an asynchronous polling task to listen for interactions.\n\n Args:\n callback (callable, optional): A function to be called each time new interactions are received.\n\n Returns:\n str: The registered domain for out-of-band interactions.\n\n Raises:\n InteractshError: If registration with an interact.sh server fails.\n\n Examples:\n >>> interactsh_client = self.helpers.interactsh()\n >>> registered_domain = await interactsh_client.register()\n [INFO] Registering with interact.sh server: oast.me\n [INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n \"\"\"\n rsa = RSA.generate(1024)\n\n self.public_key = rsa.publickey().exportKey()\n self.private_key = rsa.exportKey()\n\n encoded_public_key = base64.b64encode(self.public_key).decode(\"utf8\")\n\n uuid = uuid4().hex.ljust(33, \"a\")\n guid = \"\".join(i if i.isdigit() else chr(ord(i) + random.randint(0, 20)) for i in uuid)\n\n self.correlation_id = guid[:20]\n self.secret = str(uuid4())\n headers = {}\n\n if self.custom_server:\n if not self.token:\n log.verbose(\"Interact.sh token is not set\")\n else:\n headers[\"Authorization\"] = self.token\n self.server_list = [str(self.custom_server)]\n else:\n self.server_list = random.sample(server_list, k=len(server_list))\n for server in self.server_list:\n log.info(f\"Registering with interact.sh server: {server}\")\n data = {\n \"public-key\": encoded_public_key,\n \"secret-key\": self.secret,\n \"correlation-id\": self.correlation_id,\n }\n r = await self.parent_helper.request(\n f\"https://{server}/register\", headers=headers, json=data, method=\"POST\"\n )\n if r is None:\n continue\n try:\n msg = r.json().get(\"message\", \"\")\n assert \"registration successful\" in msg\n except Exception:\n log.debug(f\"Failed to register with interactsh server 
{self.server}\")\n continue\n self.server = server\n self.domain = f\"{guid}.{self.server}\"\n break\n\n if not self.server:\n raise InteractshError(f\"Failed to register with an interactsh server\")\n\n log.info(\n f\"Successfully registered to interactsh server {self.server} with correlation_id {self.correlation_id} [{self.domain}]\"\n )\n\n if callable(callback):\n self._poll_task = asyncio.create_task(self.poll_loop(callback))\n\n return self.domain\n\n async def deregister(self):\n \"\"\"\n Deregisters the instance from the interact.sh server and cancels the polling task.\n\n Sends a POST request to the server to deregister, using the correlation ID\n and secret key generated during registration. Optionally, if a polling\n task was started, it is cancelled.\n\n Raises:\n InteractshError: If required information is missing or if deregistration fails.\n\n Examples:\n >>> await interactsh_client.deregister()\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to deregister\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n data = {\"secret-key\": self.secret, \"correlation-id\": self.correlation_id}\n\n r = await self.parent_helper.request(\n f\"https://{self.server}/deregister\", headers=headers, json=data, method=\"POST\"\n )\n\n if self._poll_task is not None:\n self._poll_task.cancel()\n\n if \"success\" not in getattr(r, \"text\", \"\"):\n raise InteractshError(f\"Failed to de-register with interactsh server {self.server}\")\n\n async def poll(self):\n \"\"\"\n Polls the interact.sh server for interactions tied to the current instance.\n\n Sends a GET request to the server to fetch interactions associated with the\n current correlation_id and secret key. 
Returned interactions are decrypted\n using an AES key provided by the server response.\n\n Raises:\n InteractshError: If required information for polling is missing.\n\n Returns:\n list: A list of decrypted interaction data dictionaries.\n\n Examples:\n >>> data_list = await interactsh_client.poll()\n >>> print(data_list)\n [\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n ...\n },\n ...\n ]\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to poll\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n try:\n r = await self.parent_helper.request(\n f\"https://{self.server}/poll?id={self.correlation_id}&secret={self.secret}\", headers=headers\n )\n if r is None:\n raise InteractshError(\"Error polling interact.sh: No response from server\")\n\n ret = []\n data_list = r.json().get(\"data\", None)\n if data_list:\n aes_key = r.json()[\"aes_key\"]\n\n for data in data_list:\n decrypted_data = self._decrypt(aes_key, data)\n ret.append(decrypted_data)\n return ret\n except Exception as e:\n raise InteractshError(f\"Error polling interact.sh: {e}\")\n\n async def poll_loop(self, callback):\n \"\"\"\n Starts a polling loop to continuously check for interactions with the interact.sh server.\n\n Continuously polls the interact.sh server for interactions tied to the current instance,\n using the `poll` method. 
When interactions are received, it executes the given callback\n function with each interaction data.\n\n Parameters:\n callback (callable): The function to be called for every interaction received from the server.\n\n Returns:\n awaitable: An awaitable object that executes the internal `_poll_loop` method.\n\n Examples:\n >>> await interactsh_client.poll_loop(my_callback)\n \"\"\"\n async with self.parent_helper.scan._acatch(context=self._poll_loop):\n return await self._poll_loop(callback)\n\n async def _poll_loop(self, callback):\n while 1:\n if self.parent_helper.scan.stopping:\n await asyncio.sleep(1)\n continue\n data_list = []\n try:\n data_list = await self.poll()\n except InteractshError as e:\n log.warning(e)\n log.trace(traceback.format_exc())\n if not data_list:\n await asyncio.sleep(self.poll_interval)\n continue\n for data in data_list:\n if data:\n await self.parent_helper.execute_sync_or_async(callback, data)\n\n def _decrypt(self, aes_key, data):\n \"\"\"\n Decrypts and returns the data received from the interact.sh server.\n\n Uses RSA and AES for decrypting the data. RSA with PKCS1_OAEP and SHA256 is used to decrypt the AES key,\n and then AES (CFB mode) is used to decrypt the actual data payload.\n\n Parameters:\n aes_key (str): The AES key for decryption, encrypted with RSA and base64 encoded.\n data (str): The data payload to decrypt, which is base64 encoded and AES encrypted.\n\n Returns:\n dict: The decrypted data, loaded as a JSON object.\n\n Examples:\n >>> decrypted_data = self._decrypt(aes_key, data)\n \"\"\"\n private_key = RSA.importKey(self.private_key)\n cipher = PKCS1_OAEP.new(private_key, hashAlgo=SHA256)\n aes_plain_key = cipher.decrypt(base64.b64decode(aes_key))\n decode = base64.b64decode(data)\n bs = AES.block_size\n iv = decode[:bs]\n cryptor = AES.new(key=aes_plain_key, mode=AES.MODE_CFB, IV=iv, segment_size=128)\n plain_text = cryptor.decrypt(decode)\n return json.loads(plain_text[16:])\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.deregister","title":"deregister async
","text":"deregister()\n
Deregisters the instance from the interact.sh server and cancels the polling task.
Sends a POST request to the server to deregister, using the correlation ID and secret key generated during registration. Optionally, if a polling task was started, it is cancelled.
Raises:
InteractshError
\u2013 If required information is missing or if deregistration fails.
Examples:
>>> await interactsh_client.deregister()\n
Source code in bbot/core/helpers/interactsh.py
async def deregister(self):\n \"\"\"\n Deregisters the instance from the interact.sh server and cancels the polling task.\n\n Sends a POST request to the server to deregister, using the correlation ID\n and secret key generated during registration. Optionally, if a polling\n task was started, it is cancelled.\n\n Raises:\n InteractshError: If required information is missing or if deregistration fails.\n\n Examples:\n >>> await interactsh_client.deregister()\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to deregister\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n data = {\"secret-key\": self.secret, \"correlation-id\": self.correlation_id}\n\n r = await self.parent_helper.request(\n f\"https://{self.server}/deregister\", headers=headers, json=data, method=\"POST\"\n )\n\n if self._poll_task is not None:\n self._poll_task.cancel()\n\n if \"success\" not in getattr(r, \"text\", \"\"):\n raise InteractshError(f\"Failed to de-register with interactsh server {self.server}\")\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.poll","title":"poll async
","text":"poll()\n
Polls the interact.sh server for interactions tied to the current instance.
Sends a GET request to the server to fetch interactions associated with the current correlation_id and secret key. Returned interactions are decrypted using an AES key provided by the server response.
Raises:
InteractshError
\u2013 If required information for polling is missing.
Returns:
list
\u2013 A list of decrypted interaction data dictionaries.
Examples:
>>> data_list = await interactsh_client.poll()\n>>> print(data_list)\n[\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n ...\n },\n ...\n]\n
Source code in bbot/core/helpers/interactsh.py
async def poll(self):\n \"\"\"\n Polls the interact.sh server for interactions tied to the current instance.\n\n Sends a GET request to the server to fetch interactions associated with the\n current correlation_id and secret key. Returned interactions are decrypted\n using an AES key provided by the server response.\n\n Raises:\n InteractshError: If required information for polling is missing.\n\n Returns:\n list: A list of decrypted interaction data dictionaries.\n\n Examples:\n >>> data_list = await interactsh_client.poll()\n >>> print(data_list)\n [\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n ...\n },\n ...\n ]\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to poll\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n try:\n r = await self.parent_helper.request(\n f\"https://{self.server}/poll?id={self.correlation_id}&secret={self.secret}\", headers=headers\n )\n if r is None:\n raise InteractshError(\"Error polling interact.sh: No response from server\")\n\n ret = []\n data_list = r.json().get(\"data\", None)\n if data_list:\n aes_key = r.json()[\"aes_key\"]\n\n for data in data_list:\n decrypted_data = self._decrypt(aes_key, data)\n ret.append(decrypted_data)\n return ret\n except Exception as e:\n raise InteractshError(f\"Error polling interact.sh: {e}\")\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.poll_loop","title":"poll_loop async
","text":"poll_loop(callback)\n
Starts a polling loop to continuously check for interactions with the interact.sh server.
Continuously polls the interact.sh server for interactions tied to the current instance, using the poll method. When interactions are received, it executes the given callback function with each interaction data.
Parameters:
callback
(callable
) \u2013 The function to be called for every interaction received from the server.
Returns:
awaitable
\u2013 An awaitable object that executes the internal _poll_loop method.
Examples:
>>> await interactsh_client.poll_loop(my_callback)\n
Source code in bbot/core/helpers/interactsh.py
async def poll_loop(self, callback):\n \"\"\"\n Starts a polling loop to continuously check for interactions with the interact.sh server.\n\n Continuously polls the interact.sh server for interactions tied to the current instance,\n using the `poll` method. When interactions are received, it executes the given callback\n function with each interaction data.\n\n Parameters:\n callback (callable): The function to be called for every interaction received from the server.\n\n Returns:\n awaitable: An awaitable object that executes the internal `_poll_loop` method.\n\n Examples:\n >>> await interactsh_client.poll_loop(my_callback)\n \"\"\"\n async with self.parent_helper.scan._acatch(context=self._poll_loop):\n return await self._poll_loop(callback)\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.register","title":"register async
","text":"register(callback=None)\n
Registers the instance with an interact.sh server and sets up polling.
Generates RSA keys for secure communication, builds a correlation ID, and sends a POST request to an interact.sh server to register. Optionally, starts an asynchronous polling task to listen for interactions.
Parameters:
callback
(callable
, default: None
) \u2013 A function to be called each time new interactions are received.
Returns:
str
\u2013 The registered domain for out-of-band interactions.
Raises:
InteractshError
\u2013 If registration with an interact.sh server fails.
Examples:
>>> interactsh_client = self.helpers.interactsh()\n>>> registered_domain = await interactsh_client.register()\n[INFO] Registering with interact.sh server: oast.me\n[INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n
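The correlation ID step can be tried in isolation. This is a minimal sketch that mirrors the ID-building lines from the `register` source in this section, using only the standard library; the function name `make_interactsh_ids` is hypothetical, and the RSA key generation and HTTP registration are left out.

```python
import random
from uuid import uuid4


def make_interactsh_ids():
    """Hypothetical helper: build a 33-character guid and its 20-character correlation ID."""
    # 32 hex characters from the UUID, padded to 33 characters with "a"
    uuid = uuid4().hex.ljust(33, "a")
    # Digits are kept; hex letters (a-f) are shifted by 0-20 places, staying within a-z
    guid = "".join(c if c.isdigit() else chr(ord(c) + random.randint(0, 20)) for c in uuid)
    # The correlation ID is the first 20 characters of the guid
    return guid, guid[:20]


guid, correlation_id = make_interactsh_ids()
print(guid, correlation_id)
```

In the source above, the registered domain is then formed as the full guid followed by the server name.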
Source code in bbot/core/helpers/interactsh.py
async def register(self, callback=None):\n \"\"\"\n Registers the instance with an interact.sh server and sets up polling.\n\n Generates RSA keys for secure communication, builds a correlation ID,\n and sends a POST request to an interact.sh server to register. Optionally,\n starts an asynchronous polling task to listen for interactions.\n\n Args:\n callback (callable, optional): A function to be called each time new interactions are received.\n\n Returns:\n str: The registered domain for out-of-band interactions.\n\n Raises:\n InteractshError: If registration with an interact.sh server fails.\n\n Examples:\n >>> interactsh_client = self.helpers.interactsh()\n >>> registered_domain = await interactsh_client.register()\n [INFO] Registering with interact.sh server: oast.me\n [INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n \"\"\"\n rsa = RSA.generate(1024)\n\n self.public_key = rsa.publickey().exportKey()\n self.private_key = rsa.exportKey()\n\n encoded_public_key = base64.b64encode(self.public_key).decode(\"utf8\")\n\n uuid = uuid4().hex.ljust(33, \"a\")\n guid = \"\".join(i if i.isdigit() else chr(ord(i) + random.randint(0, 20)) for i in uuid)\n\n self.correlation_id = guid[:20]\n self.secret = str(uuid4())\n headers = {}\n\n if self.custom_server:\n if not self.token:\n log.verbose(\"Interact.sh token is not set\")\n else:\n headers[\"Authorization\"] = self.token\n self.server_list = [str(self.custom_server)]\n else:\n self.server_list = random.sample(server_list, k=len(server_list))\n for server in self.server_list:\n log.info(f\"Registering with interact.sh server: {server}\")\n data = {\n \"public-key\": encoded_public_key,\n \"secret-key\": self.secret,\n \"correlation-id\": self.correlation_id,\n }\n r = await self.parent_helper.request(\n f\"https://{server}/register\", headers=headers, json=data, method=\"POST\"\n )\n if r is None:\n continue\n try:\n msg = 
r.json().get(\"message\", \"\")\n assert \"registration successful\" in msg\n except Exception:\n log.debug(f\"Failed to register with interactsh server {self.server}\")\n continue\n self.server = server\n self.domain = f\"{guid}.{self.server}\"\n break\n\n if not self.server:\n raise InteractshError(f\"Failed to register with an interactsh server\")\n\n log.info(\n f\"Successfully registered to interactsh server {self.server} with correlation_id {self.correlation_id} [{self.domain}]\"\n )\n\n if callable(callback):\n self._poll_task = asyncio.create_task(self.poll_loop(callback))\n\n return self.domain\n
"},{"location":"dev/helpers/misc/","title":"Misc Helpers","text":"These are miscellaneous helpers, used throughout BBOT and its modules for simple tasks such as parsing domains, ports, urls, etc.
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.as_completed","title":"as_completed async
","text":"as_completed(coros)\n
Async generator that yields completed Tasks as they are completed.
Parameters:
coros
(iterable
) \u2013 An iterable of coroutine objects or asyncio Tasks.
Yields:
asyncio.Task: A Task object that has completed its execution.
Examples:
>>> async def main():\n... async for task in as_completed([coro1(), coro2(), coro3()]):\n... result = task.result()\n... print(f'Task completed with result: {result}')\n
>>> asyncio.run(main())\n
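For a version that runs outside BBOT, the sketch below copies the generator logic from the source in this section and feeds it two coroutines with different delays. `delayed` and `collect_in_completion_order` are hypothetical names used only for this illustration.

```python
import asyncio


async def as_completed(coros):
    # Same logic as the helper documented here, copied so the sketch is self-contained
    tasks = {c if isinstance(c, asyncio.Task) else asyncio.create_task(c): c for c in coros}
    while tasks:
        done, _ = await asyncio.wait(tasks.keys(), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            tasks.pop(task)
            yield task


async def delayed(value, delay):
    # Hypothetical stand-in for real work: sleep, then return a value
    await asyncio.sleep(delay)
    return value


async def collect_in_completion_order():
    results = []
    async for task in as_completed([delayed("slow", 0.2), delayed("fast", 0.01)]):
        results.append(task.result())
    return results


print(asyncio.run(collect_in_completion_order()))
```

Results arrive in completion order rather than submission order, so the shorter delay is reported first.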
Source code in bbot/core/helpers/misc.py
async def as_completed(coros):\n \"\"\"\n Async generator that yields completed Tasks as they are completed.\n\n Args:\n coros (iterable): An iterable of coroutine objects or asyncio Tasks.\n\n Yields:\n asyncio.Task: A Task object that has completed its execution.\n\n Examples:\n >>> async def main():\n ... async for task in as_completed([coro1(), coro2(), coro3()]):\n ... result = task.result()\n ... print(f'Task completed with result: {result}')\n\n >>> asyncio.run(main())\n \"\"\"\n tasks = {coro if isinstance(coro, asyncio.Task) else asyncio.create_task(coro): coro for coro in coros}\n while tasks:\n done, _ = await asyncio.wait(tasks.keys(), return_when=asyncio.FIRST_COMPLETED)\n for task in done:\n tasks.pop(task)\n yield task\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.backup_file","title":"backup_file","text":"backup_file(filename, max_backups=10)\n
Renames a file by appending an iteration number as a backup. Recursively renames files up to a specified maximum number of backups.
Parameters:
filename
(str or Path
) \u2013 The file to backup.
max_backups
(int
, default: 10
) \u2013 The maximum number of backups to keep. Defaults to 10.
Returns:
pathlib.Path: The new backup filepath.
Examples:
>>> backup_file(\"/tmp/test.txt\")\nPosixPath(\"/tmp/test.0.txt\")\n>>> backup_file(\"/tmp/test.0.txt\")\nPosixPath(\"/tmp/test.1.txt\")\n>>> backup_file(\"/tmp/test.1.txt\")\nPosixPath(\"/tmp/test.2.txt\")\n
Source code in bbot/core/helpers/misc.py
def backup_file(filename, max_backups=10):\n \"\"\"\n Renames a file by appending an iteration number as a backup. Recursively renames\n files up to a specified maximum number of backups.\n\n Args:\n filename (str or pathlib.Path): The file to backup.\n max_backups (int, optional): The maximum number of backups to keep. Defaults to 10.\n\n Returns:\n pathlib.Path: The new backup filepath.\n\n Examples:\n >>> backup_file(\"/tmp/test.txt\")\n PosixPath(\"/tmp/test.0.txt\")\n >>> backup_file(\"/tmp/test.0.txt\")\n PosixPath(\"/tmp/test.1.txt\")\n >>> backup_file(\"/tmp/test.1.txt\")\n PosixPath(\"/tmp/test.2.txt\")\n \"\"\"\n filename = Path(filename).resolve()\n suffixes = [s.strip(\".\") for s in filename.suffixes]\n iteration = 1\n with suppress(Exception):\n iteration = min(max_backups - 1, max(0, int(suffixes[0]))) + 1\n suffixes = suffixes[1:]\n stem = filename.stem.split(\".\")[0]\n destination = filename.parent / f\"{stem}.{iteration}.{'.'.join(suffixes)}\"\n if destination.exists() and iteration < max_backups:\n backup_file(destination)\n if filename.exists():\n filename.rename(destination)\n return destination\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.best_http_status","title":"best_http_status","text":"best_http_status(code1, code2)\n
Determine the better HTTP status code between two given codes.
The 'better' status code is considered based on typical usage and priority in HTTP communication. Lower codes are generally better than higher codes. Within the same class (e.g., 2xx), a lower code is better. Between different classes, the order of preference is 2xx > 3xx > 1xx > 4xx > 5xx.
Parameters:
code1
(int
) \u2013 The first HTTP status code.
code2
(int
) \u2013 The second HTTP status code.
Returns:
int
\u2013 The better HTTP status code between the two provided codes.
Examples:
>>> best_http_status(200, 404)\n200\n>>> best_http_status(500, 400)\n400\n>>> best_http_status(301, 302)\n301\n
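The class ordering can be checked with a condensed copy of the function's logic. The sketch below restates the comparison from the source in this section and, as an illustrative extension that is not part of the documented API, folds it over a list with `functools.reduce` to pick the best code out of several.

```python
from functools import reduce


def best_http_status(code1, code2):
    # Class priority: 2xx > 3xx > 1xx > 4xx > 5xx (a lower number is preferred)
    priority_order = {2: 1, 3: 2, 1: 3, 4: 4, 5: 5}
    p1 = priority_order.get(int(code1) // 100, 10)
    p2 = priority_order.get(int(code2) // 100, 10)
    if p1 != p2:
        return code1 if p1 < p2 else code2
    # Same class: the lower code wins
    return min(code1, code2)


# Pick the best status out of several observed responses
observed = [503, 404, 302, 101]
print(reduce(best_http_status, observed))  # 302: 3xx outranks 1xx, 4xx and 5xx
```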
Source code in bbot/core/helpers/misc.py
def best_http_status(code1, code2):\n \"\"\"\n Determine the better HTTP status code between two given codes.\n\n The 'better' status code is considered based on typical usage and priority in HTTP communication.\n Lower codes are generally better than higher codes. Within the same class (e.g., 2xx), a lower code is better.\n Between different classes, the order of preference is 2xx > 3xx > 1xx > 4xx > 5xx.\n\n Args:\n code1 (int): The first HTTP status code.\n code2 (int): The second HTTP status code.\n\n Returns:\n int: The better HTTP status code between the two provided codes.\n\n Examples:\n >>> best_http_status(200, 404)\n 200\n >>> best_http_status(500, 400)\n 400\n >>> best_http_status(301, 302)\n 301\n \"\"\"\n\n # Classify the codes into their respective categories (1xx, 2xx, 3xx, 4xx, 5xx)\n def classify_code(code):\n return int(code) // 100\n\n class1 = classify_code(code1)\n class2 = classify_code(code2)\n\n # Priority order for classes\n priority_order = {2: 1, 3: 2, 1: 3, 4: 4, 5: 5}\n\n # Compare based on class priority\n p1 = priority_order.get(class1, 10)\n p2 = priority_order.get(class2, 10)\n if p1 != p2:\n return code1 if p1 < p2 else code2\n\n # If in the same class, the lower code is better\n return min(code1, code2)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.bytes_to_human","title":"bytes_to_human","text":"bytes_to_human(_bytes)\n
Convert a bytes size to a human-readable string.
This function converts a numeric bytes value into a human-readable string format, complete with the appropriate unit symbol (B, KB, MB, GB, etc.).
Parameters:
_bytes
(int
) \u2013 The number of bytes to convert.
Returns:
str
\u2013 A string representing the number of bytes in a more readable format, rounded to two decimal places.
Examples:
>>> bytes_to_human(1234129384)\n'1.15GB'\n
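A few more inputs make the unit boundaries easier to see. The sketch below repeats the conversion loop from the source in this section (leaving out the unused lookup table) so it can run on its own; the extra sample values are illustrative.

```python
def bytes_to_human(_bytes):
    sizes = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
    for size in sizes:
        if abs(_bytes) < 1024.0:
            if size == sizes[0]:
                # Plain bytes are shown as an integer, without decimals
                _bytes = str(int(_bytes))
            else:
                _bytes = f"{_bytes:.2f}"
            return f"{_bytes}{size}"
        # Move up one unit (binary multiples of 1024)
        _bytes /= 1024
    raise ValueError(f'Unable to convert "{_bytes}" to human filesize')


for value in (512, 1024, 1536, 1234129384):
    print(value, "->", bytes_to_human(value))
```

Units are binary multiples of 1024, so 1024 bytes is reported as 1.00KB rather than 1.02KB.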
Source code in bbot/core/helpers/misc.py
def bytes_to_human(_bytes):\n \"\"\"Convert a bytes size to a human-readable string.\n\n This function converts a numeric bytes value into a human-readable string format, complete\n with the appropriate unit symbol (B, KB, MB, GB, etc.).\n\n Args:\n _bytes (int): The number of bytes to convert.\n\n Returns:\n str: A string representing the number of bytes in a more readable format, rounded to two\n decimal places.\n\n Examples:\n >>> bytes_to_human(1234129384)\n '1.15GB'\n \"\"\"\n sizes = [\"B\", \"KB\", \"MB\", \"GB\", \"TB\", \"PB\", \"EB\", \"ZB\"]\n units = {}\n for count, size in enumerate(sizes):\n units[size] = pow(1024, count)\n for size in sizes:\n if abs(_bytes) < 1024.0:\n if size == sizes[0]:\n _bytes = str(int(_bytes))\n else:\n _bytes = f\"{_bytes:.2f}\"\n return f\"{_bytes}{size}\"\n _bytes /= 1024\n raise ValueError(f'Unable to convert \"{_bytes}\" to human filesize')\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.can_sudo_without_password","title":"can_sudo_without_password","text":"can_sudo_without_password()\n
Check if the current user has passwordless sudo access.
This function checks whether the current user can use sudo without entering a password. It runs a command with sudo and checks the return code to determine this.
Returns:
bool
\u2013 True if the current user can use sudo without a password, False otherwise.
Examples:
>>> can_sudo_without_password()\nTrue\n
Source code in bbot/core/helpers/misc.py
def can_sudo_without_password():\n \"\"\"Check if the current user has passwordless sudo access.\n\n This function checks whether the current user can use sudo without entering a password.\n It runs a command with sudo and checks the return code to determine this.\n\n Returns:\n bool: True if the current user can use sudo without a password, False otherwise.\n\n Examples:\n >>> can_sudo_without_password()\n True\n \"\"\"\n if os.geteuid() != 0:\n env = dict(os.environ)\n env[\"SUDO_ASKPASS\"] = \"/bin/false\"\n try:\n sp.run([\"sudo\", \"-K\"], stderr=sp.DEVNULL, stdout=sp.DEVNULL, check=True, env=env)\n sp.run([\"sudo\", \"-An\", \"/bin/true\"], stderr=sp.DEVNULL, stdout=sp.DEVNULL, check=True, env=env)\n except sp.CalledProcessError:\n return False\n return True\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cancel_tasks","title":"cancel_tasks async
","text":"cancel_tasks(tasks, ignore_errors=True)\n
Asynchronously cancels a list of asyncio tasks.
Parameters:
tasks
(list[Task]
) \u2013 A list of asyncio Task objects to cancel.
ignore_errors
(bool
, default: True
) \u2013 Whether to ignore errors other than asyncio.CancelledError. Defaults to True.
Examples:
>>> async def main():\n... task1 = asyncio.create_task(async_function1())\n... task2 = asyncio.create_task(async_function2())\n... await cancel_tasks([task1, task2])\n...\n>>> asyncio.run(main())\n
Note: This function will not cancel the current task that it is called from.
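The cancel-then-await pattern can be sketched with plain asyncio. Note that this is a different technique from the source below: it collects the cancellation results with `asyncio.gather(..., return_exceptions=True)` instead of awaiting each task in a loop. `cancel_all` and `demo` are hypothetical names.

```python
import asyncio


async def cancel_all(tasks):
    """Hypothetical sketch: cancel every task except the one we are running in."""
    current = asyncio.current_task()
    others = [t for t in tasks if t is not current]
    for t in others:
        t.cancel()
    # return_exceptions=True collects each CancelledError instead of raising it here
    await asyncio.gather(*others, return_exceptions=True)


async def demo():
    # Three long sleeps stand in for real background work
    tasks = [asyncio.create_task(asyncio.sleep(60)) for _ in range(3)]
    await cancel_all(tasks)
    return [t.cancelled() for t in tasks]


print(asyncio.run(demo()))
```

Awaiting the tasks after calling `cancel()` matters: cancellation is only requested by `cancel()`, and the task finishes unwinding once the event loop runs it again.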
Source code in bbot/core/helpers/misc.py
async def cancel_tasks(tasks, ignore_errors=True):\n \"\"\"\n Asynchronously cancels a list of asyncio tasks.\n\n Args:\n tasks (list[Task]): A list of asyncio Task objects to cancel.\n ignore_errors (bool, optional): Whether to ignore errors other than asyncio.CancelledError. Defaults to True.\n\n Examples:\n >>> async def main():\n ... task1 = asyncio.create_task(async_function1())\n ... task2 = asyncio.create_task(async_function2())\n ... await cancel_tasks([task1, task2])\n ...\n >>> asyncio.run(main())\n\n Note:\n This function will not cancel the current task that it is called from.\n \"\"\"\n current_task = asyncio.current_task()\n tasks = [t for t in tasks if t != current_task]\n for task in tasks:\n # log.debug(f\"Cancelling task: {task}\")\n task.cancel()\n if ignore_errors:\n for task in tasks:\n try:\n await task\n except BaseException as e:\n if not isinstance(e, asyncio.CancelledError):\n import traceback\n\n log.trace(traceback.format_exc())\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cancel_tasks_sync","title":"cancel_tasks_sync","text":"cancel_tasks_sync(tasks)\n
Synchronously cancels a list of asyncio tasks.
Parameters:
tasks
(list[Task]
) \u2013 A list of asyncio Task objects to cancel.
Examples:
>>> loop = asyncio.get_event_loop()\n>>> task1 = loop.create_task(some_async_function1())\n>>> task2 = loop.create_task(some_async_function2())\n>>> cancel_tasks_sync([task1, task2])\n
Note This function will not cancel the current task from which it is called.
Source code in bbot/core/helpers/misc.py
def cancel_tasks_sync(tasks):\n \"\"\"\n Synchronously cancels a list of asyncio tasks.\n\n Args:\n tasks (list[Task]): A list of asyncio Task objects to cancel.\n\n Examples:\n >>> loop = asyncio.get_event_loop()\n >>> task1 = loop.create_task(some_async_function1())\n >>> task2 = loop.create_task(some_async_function2())\n >>> cancel_tasks_sync([task1, task2])\n\n Note:\n This function will not cancel the current task from which it is called.\n \"\"\"\n current_task = asyncio.current_task()\n for task in tasks:\n if task != current_task:\n # log.debug(f\"Cancelling task: {task}\")\n task.cancel()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.chain_lists","title":"chain_lists","text":"chain_lists(l, try_files=False, msg=None, remove_blank=True, validate=False, validate_chars='<>:\"/\\\\|?*)')\n
Chains together list elements, allowing for entries separated by commas.
This function takes a list l
and flattens it by splitting its entries on commas. It also allows you to optionally open entries as files and add their contents to the list.
The order of entries is preserved, and deduplication is performed automatically.
Parameters:
l
(list
) \u2013 The list of strings to chain together.
try_files
(bool
, default: False
) \u2013 Whether to try to open entries as files. Defaults to False.
msg
(str
, default: None
) \u2013 An optional message to log when reading from a file. Defaults to None.
remove_blank
(bool
, default: True
) \u2013 Whether to remove blank entries from the list. Defaults to True.
validate
(bool
, default: False
) \u2013 Whether to perform validation for undesirable characters. Defaults to False.
validate_chars
(str
, default: '<>:\"/\\\\|?*)'
) \u2013 When validation is enabled, the additional set of characters to block (non-printable ASCII characters are blocked automatically). Defaults to '<>:\"/\\\\|?*)'
Returns:
list
\u2013 The list of chained elements.
Raises:
ValueError
\u2013 If the input string contains invalid characters, when enabled (off by default).
Examples:
>>> chain_lists([\"a\", \"b,c,d\"])\n['a', 'b', 'c', 'd']\n
>>> chain_lists([\"a,file.txt\", \"c,d\"], try_files=True)\n['a', 'f_line1', 'f_line2', 'f_line3', 'c', 'd']\n
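The order-preserving deduplication can be sketched as follows. This is a simplified illustration under stated assumptions: the name chain is hypothetical, entries are split on plain commas rather than with the helper's own split_regex, and file handling and validation are left out.

```python
def chain(entries):
    # Hypothetical simplified version: no try_files, no validation.
    if isinstance(entries, str):
        entries = [entries]
    seen = {}  # dicts keep insertion order, so the keys act as an ordered set
    for entry in entries:
        for part in entry.split(','):
            seen[part.strip()] = None
    # Drop blank entries, mirroring remove_blank=True
    return [item for item in seen if item]


chained = chain(['a', 'b,c,d', 'c, a', ''])
print(chained)
```

Using dictionary keys instead of a set is what keeps the first-seen order of entries while still discarding duplicates.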
Source code in bbot/core/helpers/misc.py
def chain_lists(\n l,\n try_files=False,\n msg=None,\n remove_blank=True,\n validate=False,\n validate_chars='<>:\"/\\\\|?*)',\n):\n \"\"\"Chains together list elements, allowing for entries separated by commas.\n\n This function takes a list `l` and flattens it by splitting its entries on commas.\n It also allows you to optionally open entries as files and add their contents to the list.\n\n The order of entries is preserved, and deduplication is performed automatically.\n\n Args:\n l (list): The list of strings to chain together.\n try_files (bool, optional): Whether to try to open entries as files. Defaults to False.\n msg (str, optional): An optional message to log when reading from a file. Defaults to None.\n remove_blank (bool, optional): Whether to remove blank entries from the list. Defaults to True.\n validate (bool, optional): Whether to perform validation for undesirable characters. Defaults to False.\n validate_chars (str, optional): When performing validation, what additional set of characters to block (blocks non-printable ascii automatically). 
Defaults to '<>:\"/\\\\|?*)'\n\n Returns:\n list: The list of chained elements.\n\n Raises:\n ValueError: If the input string contains invalid characters, when enabled (off by default).\n\n Examples:\n >>> chain_lists([\"a\", \"b,c,d\"])\n ['a', 'b', 'c', 'd']\n\n >>> chain_lists([\"a,file.txt\", \"c,d\"], try_files=True)\n ['a', 'f_line1', 'f_line2', 'f_line3', 'c', 'd']\n \"\"\"\n if isinstance(l, str):\n l = [l]\n final_list = dict()\n for entry in l:\n for s in split_regex.split(entry):\n f = s.strip()\n if validate:\n if any((c in validate_chars) or (ord(c) < 32 and c != \" \") for c in f):\n raise ValueError(f\"Invalid character in string: {f}\")\n f_path = Path(f).resolve()\n if try_files and f_path.is_file():\n if msg is not None:\n new_msg = str(msg).format(filename=f_path)\n log.info(new_msg)\n for line in str_or_file(f):\n final_list[line] = None\n else:\n final_list[f] = None\n\n ret = list(final_list)\n if remove_blank:\n ret = [r for r in ret if r]\n return ret\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.clean_dict","title":"clean_dict","text":"clean_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None)\n
Recursively clean unwanted keys from a dictionary. Useful for removing secrets from a config.
Parameters:
d
(dict
) \u2013 The input dictionary.
*key_names
\u2013 Names of keys to remove.
fuzzy
(bool
, default: False
) \u2013 Whether to perform fuzzy matching on keys.
exclude_keys
((list, None)
, default: None
) \u2013 List of keys to be excluded from removal.
_prev_key
((str, None)
, default: None
) \u2013 For internal recursive use; the previous key in the hierarchy.
Returns:
dict
\u2013 A dictionary cleaned of the keys specified in key_names.
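This entry has no example, so here is a minimal sketch of the recursive removal. It is an illustration only: strip_keys is a hypothetical, simplified re-creation without exclude_keys support, and the config layout is made up.

```python
import copy


def strip_keys(d, *key_names, fuzzy=False):
    # Hypothetical simplified version of the helper (no exclude_keys).
    d = copy.deepcopy(d)  # never mutate the caller's dictionary
    if isinstance(d, dict):
        for key, val in list(d.items()):
            if key in key_names or (fuzzy and any(k in key for k in key_names)):
                d.pop(key)
                continue
            d[key] = strip_keys(val, *key_names, fuzzy=fuzzy)
    return d


# Made-up config used only for illustration
config = {'modules': {'example_module': {'api_key': 'secret', 'timeout': 10}}, 'api_key': 'top'}
cleaned = strip_keys(config, 'api_key')
fuzzy_cleaned = strip_keys({'example_api_key': 'x', 'name': 'y'}, 'api_key', fuzzy=True)
print(cleaned, fuzzy_cleaned)
```

With fuzzy=True a key is removed when any of the given names is a substring of it, which is how a key such as example_api_key is caught by api_key.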
Source code in bbot/core/helpers/misc.py
def clean_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):\n \"\"\"\n Recursively clean unwanted keys from a dictionary.\n Useful for removing secrets from a config.\n\n Args:\n d (dict): The input dictionary.\n *key_names: Names of keys to remove.\n fuzzy (bool): Whether to perform fuzzy matching on keys.\n exclude_keys (list, None): List of keys to be excluded from removal.\n _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.\n\n Returns:\n dict: A dictionary cleaned of the keys specified in key_names.\n\n \"\"\"\n if exclude_keys is None:\n exclude_keys = []\n if isinstance(exclude_keys, str):\n exclude_keys = [exclude_keys]\n d = copy.deepcopy(d)\n if isinstance(d, dict):\n for key, val in list(d.items()):\n if key in key_names or (fuzzy and any(k in key for k in key_names)):\n if _prev_key not in exclude_keys:\n d.pop(key)\n continue\n d[key] = clean_dict(val, *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)\n return d\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.clean_dns_record","title":"clean_dns_record","text":"clean_dns_record(record)\n
Cleans and formats a given DNS record for further processing.
This function converts the DNS record to text format if it is not already a string. It also removes any trailing dots and converts the record to lowercase.
Parameters:
record
(str or Rdata
) \u2013 The DNS record to clean.
Returns:
str
\u2013 The cleaned and formatted DNS record.
Examples:
>>> clean_dns_record('www.evilcorp.com.')\n'www.evilcorp.com'\n
>>> from dns.rrset import from_text\n>>> record = from_text('www.evilcorp.com', 3600, 'IN', 'A', '1.2.3.4')[0]\n>>> clean_dns_record(record)\n'1.2.3.4'\n
Source code in bbot/core/helpers/misc.py
def clean_dns_record(record):\n \"\"\"\n Cleans and formats a given DNS record for further processing.\n\n This static method converts the DNS record to text format if it's not already a string.\n It also removes any trailing dots and converts the record to lowercase.\n\n Args:\n record (str or dns.rdata.Rdata): The DNS record to clean.\n\n Returns:\n str: The cleaned and formatted DNS record.\n\n Examples:\n >>> clean_dns_record('www.evilcorp.com.')\n 'www.evilcorp.com'\n\n >>> from dns.rrset import from_text\n >>> record = from_text('www.evilcorp.com', 3600, 'IN', 'A', '1.2.3.4')[0]\n >>> clean_dns_record(record)\n '1.2.3.4'\n \"\"\"\n if not isinstance(record, str):\n record = str(record.to_text())\n return str(record).rstrip(\".\").lower()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.clean_old","title":"clean_old","text":"clean_old(d, keep=10, filter=lambda x: True, key=latest_mtime, reverse=True, raise_error=False)\n
Clean up old files and directories within a given directory based on various filtering and sorting options.
This function removes the oldest files and directories in the provided directory 'd' that exceed a specified threshold ('keep'). The items to be deleted can be filtered using a lambda function 'filter', and they are sorted by a key function, defaulting to latest modification time.
Parameters:
d
(str or Path
) \u2013 The directory path to clean up.
keep
(int
, default: 10
) \u2013 The number of items to keep. Ones beyond this count will be removed.
filter
(Callable
, default: lambda x: True
) \u2013 A lambda function for filtering which files or directories to consider. Defaults to a lambda function that returns True for all.
key
(Callable
, default: latest_mtime
) \u2013 A function to sort the files and directories. Defaults to latest modification time.
reverse
(bool
, default: True
) \u2013 Whether to reverse the order of sorted items before removing. Defaults to True.
raise_error
(bool
, default: False
) \u2013 Whether to raise an error if directory deletion fails. Defaults to False.
Examples:
>>> clean_old(\"~/.bbot/scans\", filter=lambda x: x.is_dir() and scan_name_regex.match(x.name))\n
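The keep-the-newest behaviour can be sketched in a throwaway directory. This is a rough illustration, not the helper itself: keep_newest is a hypothetical name, and it sorts by each entry's own st_mtime, whereas the real default key is latest_mtime, whose exact behaviour is not shown in this section.

```python
import os
import shutil
import tempfile
from pathlib import Path


def keep_newest(directory, keep=2):
    # Hypothetical simplified version: newest first, delete everything after `keep`.
    paths = sorted(Path(directory).iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    for path in paths[keep:]:
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()


root = tempfile.mkdtemp()
for i, name in enumerate(['scan_a', 'scan_b', 'scan_c', 'scan_d']):
    scan_dir = Path(root) / name
    scan_dir.mkdir()
    os.utime(scan_dir, (1000 + i, 1000 + i))  # explicit mtimes make the order deterministic

keep_newest(root, keep=2)
remaining = sorted(p.name for p in Path(root).iterdir())
shutil.rmtree(root)
print(remaining)
```

Because the list is sorted newest first, slicing with paths[keep:] selects exactly the entries that fall outside the retention count.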
Source code in bbot/core/helpers/misc.py
def clean_old(d, keep=10, filter=lambda x: True, key=latest_mtime, reverse=True, raise_error=False):\n \"\"\"Clean up old files and directories within a given directory based on various filtering and sorting options.\n\n This function removes the oldest files and directories in the provided directory 'd' that exceed a specified\n threshold ('keep'). The items to be deleted can be filtered using a lambda function 'filter', and they are\n sorted by a key function, defaulting to latest modification time.\n\n Args:\n d (str or Path): The directory path to clean up.\n keep (int): The number of items to keep. Ones beyond this count will be removed.\n filter (Callable): A lambda function for filtering which files or directories to consider.\n Defaults to a lambda function that returns True for all.\n key (Callable): A function to sort the files and directories. Defaults to latest modification time.\n reverse (bool): Whether to reverse the order of sorted items before removing. Defaults to True.\n raise_error (bool): Whether to raise an error if directory deletion fails. Defaults to False.\n\n Examples:\n >>> clean_old(\"~/.bbot/scans\", filter=lambda x: x.is_dir() and scan_name_regex.match(x.name))\n \"\"\"\n d = Path(d)\n if not d.is_dir():\n return\n paths = [x for x in d.iterdir() if filter(x)]\n paths.sort(key=key, reverse=reverse)\n for path in paths[keep:]:\n try:\n log.debug(f\"Removing {path}\")\n rm_rf(path)\n except Exception as e:\n msg = f\"Failed to delete directory: {path}, {e}\"\n if raise_error:\n raise errors.DirectoryDeletionError()\n log.warning(msg)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.closest_match","title":"closest_match","text":"closest_match(s, choices, n=1, cutoff=0.0)\n
Finds the closest matching strings from a list of choices based on a given string.
This function uses the difflib library to find the closest matches to a given string s
from a list of choices
. It can return either the single best match or a list of the top n
best matches.
Parameters:
s
(str
) \u2013 The string for which to find the closest match.
choices
(list
) \u2013 A list of strings to compare against.
n
(int
, default: 1
) \u2013 The number of best matches to return. Defaults to 1.
cutoff
(float
, default: 0.0
) \u2013 A float value that defines the similarity threshold. Strings with similarity below this value are not considered. Defaults to 0.0.
Returns:
str or list: Either the closest matching string or a list of the n
closest matching strings. Returns None if there are no choices or no matches.
Examples:
>>> closest_match(\"asdf\", [\"asd\", \"fds\"])\n'asd'\n>>> closest_match(\"asdf\", [\"asd\", \"fds\", \"asdff\"], n=3)\n['asdff', 'asd', 'fds']\n
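The underlying difflib call can be tried directly. The sketch below is an illustration with made-up choices; suggest is a hypothetical wrapper, not part of the library.

```python
import difflib


def suggest(word, choices, cutoff=0.0):
    # Hypothetical wrapper: return the single best match, or None if nothing qualifies.
    matches = difflib.get_close_matches(word, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


choices = ['subdomains', 'web-basic', 'spider']  # made-up list for illustration
best = suggest('subdomain', choices)
no_match = suggest('zzz', choices, cutoff=0.6)
print(best, no_match)
```

A cutoff of 0.0 means some match is returned whenever choices is non-empty, which suits did-you-mean style suggestions; raising the cutoff makes the function return nothing for unrelated input.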
Source code in bbot/core/helpers/misc.py
def closest_match(s, choices, n=1, cutoff=0.0):\n \"\"\"Finds the closest matching strings from a list of choices based on a given string.\n\n This function uses the difflib library to find the closest matches to a given string `s` from a list of `choices`.\n It can return either the single best match or a list of the top `n` best matches.\n\n Args:\n s (str): The string for which to find the closest match.\n choices (list): A list of strings to compare against.\n n (int, optional): The number of best matches to return. Defaults to 1.\n cutoff (float, optional): A float value that defines the similarity threshold. Strings with similarity below this value are not considered. Defaults to 0.0.\n\n Returns:\n str or list: Either the closest matching string or a list of the `n` closest matching strings.\n\n Examples:\n >>> closest_match(\"asdf\", [\"asd\", \"fds\"])\n 'asd'\n >>> closest_match(\"asdf\", [\"asd\", \"fds\", \"asdff\"], n=3)\n ['asdff', 'asd', 'fds']\n \"\"\"\n import difflib\n\n matches = difflib.get_close_matches(s, choices, n=n, cutoff=cutoff)\n if not choices or not matches:\n return\n if n == 1:\n return matches[0]\n return matches\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cloudcheck","title":"cloudcheck","text":"cloudcheck(ip)\n
Checks whether an IP address belongs to a cloud provider and returns the provider name, type, and subnet.
Parameters:
ip
(str
) \u2013 The IP address to check.
Returns:
tuple
\u2013 A tuple containing provider name (str), provider type (str), and subnet (IPv4Network).
Examples:
>>> cloudcheck(\"168.62.20.37\")\n('Azure', 'cloud', IPv4Network('168.62.0.0/19'))\n
Source code in bbot/core/helpers/misc.py
def cloudcheck(ip):\n \"\"\"\n Check whether an IP address belongs to a cloud provider and returns the provider name, type, and subnet.\n\n Args:\n ip (str): The IP address to check.\n\n Returns:\n tuple: A tuple containing provider name (str), provider type (str), and subnet (IPv4Network).\n\n Examples:\n >>> cloudcheck(\"168.62.20.37\")\n ('Azure', 'cloud', IPv4Network('168.62.0.0/19'))\n \"\"\"\n import cloudcheck as _cloudcheck\n\n return _cloudcheck.check(ip)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cpu_architecture","title":"cpu_architecture","text":"cpu_architecture()\n
Return the CPU architecture of the current system.
This function fetches and returns the architecture type of the CPU where the code is being executed. It maps common identifiers like \"x86_64\" to more general types like \"amd64\".
Returns:
str
\u2013 A string representing the CPU architecture, such as \"amd64\", \"armv7\", or \"arm64\".
Examples:
>>> cpu_architecture()\n'amd64'\n
Source code in bbot/core/helpers/misc.py
def cpu_architecture():\n \"\"\"Return the CPU architecture of the current system.\n\n This function fetches and returns the architecture type of the CPU where the code is being executed.\n It maps common identifiers like \"x86_64\" to more general types like \"amd64\".\n\n Returns:\n str: A string representing the CPU architecture, such as \"amd64\", \"armv7\", or \"arm64\".\n\n Examples:\n >>> cpu_architecture()\n 'amd64'\n \"\"\"\n import platform\n\n uname = platform.uname()\n arch = uname.machine.lower()\n if arch.startswith(\"aarch\"):\n return \"arm64\"\n elif arch == \"x86_64\":\n return \"amd64\"\n return arch\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.delete_file","title":"delete_file","text":"delete_file(path)\n
Deletes a file at the given path.
Parameters:
path
(str or Path
) \u2013 The path to the file to be deleted.
Note This function suppresses all exceptions to ensure that the program continues running even if the file could not be deleted.
Examples:
>>> delete_file(\"/tmp/test/file1.txt\")\n
Source code in bbot/core/helpers/misc.py
def delete_file(path):\n \"\"\"Deletes a file at the given path.\n\n Args:\n path (str or Path): The path to the file to be deleted.\n\n Note:\n This function suppresses all exceptions to ensure that the program continues running even if the file could not be deleted.\n\n Examples:\n >>> delete_file(\"/tmp/test/file1.txt\")\n \"\"\"\n with suppress(Exception):\n Path(path).unlink(missing_ok=True)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.domain_parents","title":"domain_parents","text":"domain_parents(d, include_self=False)\n
Generate a list of parent domains for a given domain string.
This function takes an input string d
and generates a list of parent domains in decreasing order of specificity. If include_self
is set to True, the list will also include the input domain if it is not a top-level domain.
Parameters:
d
(str
) \u2013 The input string representing a domain or subdomain.
include_self
(bool
, default: False
) \u2013 Whether to include the input domain itself. Defaults to False.
Yields:
str
\u2013 Parent domains of the input string in decreasing order of specificity.
Examples:
>>> list(domain_parents(\"test.www.evilcorp.co.uk\"))\n[\"www.evilcorp.co.uk\", \"evilcorp.co.uk\"]\n
Notes Port, if present in input, is preserved in the output.
Source code in bbot/core/helpers/misc.py
def domain_parents(d, include_self=False):\n \"\"\"\n Generate a list of parent domains for a given domain string.\n\n This function takes an input string `d` and generates a list of parent domains in decreasing order of specificity.\n If `include_self` is set to True, the list will also include the input domain if it is not a top-level domain.\n\n Args:\n d (str): The input string representing a domain or subdomain.\n include_self (bool, optional): Whether to include the input domain itself. Defaults to False.\n\n Yields:\n str: Parent domains of the input string in decreasing order of specificity.\n\n Examples:\n >>> list(domain_parents(\"test.www.evilcorp.co.uk\"))\n [\"www.evilcorp.co.uk\", \"evilcorp.co.uk\"]\n\n Notes:\n - Port, if present in input, is preserved in the output.\n \"\"\"\n\n parent = str(d)\n if include_self and not is_domain(parent):\n yield parent\n while 1:\n parent = parent_domain(parent)\n if is_subdomain(parent):\n yield parent\n continue\n elif is_domain(parent):\n yield parent\n break\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.domain_stem","title":"domain_stem","text":"domain_stem(domain)\n
Returns an abbreviated representation of the hostname by removing the TLD (Top-Level Domain).
Parameters:
domain
(str
) \u2013 The full domain name to be abbreviated.
Returns:
str
\u2013 An abbreviated domain string without the TLD.
Examples:
>>> domain_stem(\"www.evilcorp.com\")\n\"www.evilcorp\"\n
Notes Utilizes the tldextract function for domain parsing.
Source code in bbot/core/helpers/misc.py
def domain_stem(domain):\n \"\"\"\n Returns an abbreviated representation of the hostname by removing the TLD (Top-Level Domain).\n\n Args:\n domain (str): The full domain name to be abbreviated.\n\n Returns:\n str: An abbreviated domain string without the TLD.\n\n Examples:\n >>> domain_stem(\"www.evilcorp.com\")\n \"www.evilcorp\"\n\n Notes:\n - Utilizes the `tldextract` function for domain parsing.\n \"\"\"\n parsed = tldextract(str(domain))\n return f\".\".join(parsed.subdomain.split(\".\") + parsed.domain.split(\".\")).strip(\".\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.execute_sync_or_async","title":"execute_sync_or_async async
","text":"execute_sync_or_async(callback, *args, **kwargs)\n
Execute a function or coroutine, handling either synchronous or asynchronous invocation.
Parameters:
callback
(Union[Callable, Coroutine]
) \u2013 The function or coroutine to execute.
*args
\u2013 Variable-length argument list to pass to the callback.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the callback.
Returns:
Any
\u2013 The return value from the executed function or coroutine.
Examples:
>>> async def foo_async(x):\n... return x + 1\n>>> def foo_sync(x):\n... return x + 1\n
>>> asyncio.run(execute_sync_or_async(foo_async, 1))\n2\n
>>> asyncio.run(execute_sync_or_async(foo_sync, 1))\n2\n
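The dispatch between plain functions and coroutine functions can be sketched with the standard library. This is an assumption-laden illustration: call_either is a hypothetical name, and inspect.iscoroutinefunction stands in for the library's is_async_function helper, which may behave differently.

```python
import asyncio
import inspect


async def call_either(callback, *args, **kwargs):
    # Assumption: inspect.iscoroutinefunction approximates is_async_function.
    if inspect.iscoroutinefunction(callback):
        return await callback(*args, **kwargs)
    return callback(*args, **kwargs)


async def double_async(x):
    await asyncio.sleep(0)
    return x * 2


def double_sync(x):
    return x * 2


async def main():
    # The caller does not need to know which kind of callback it was given
    return [await call_either(cb, 21) for cb in (double_async, double_sync)]


values = asyncio.run(main())
print(values)
```

This pattern is handy when user-supplied callbacks may be written either way and the calling code is already running inside an event loop.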
Source code in bbot/core/helpers/misc.py
async def execute_sync_or_async(callback, *args, **kwargs):\n \"\"\"\n Execute a function or coroutine, handling either synchronous or asynchronous invocation.\n\n Args:\n callback (Union[Callable, Coroutine]): The function or coroutine to execute.\n *args: Variable-length argument list to pass to the callback.\n **kwargs: Arbitrary keyword arguments to pass to the callback.\n\n Returns:\n Any: The return value from the executed function or coroutine.\n\n Examples:\n >>> async def foo_async(x):\n ... return x + 1\n >>> def foo_sync(x):\n ... return x + 1\n\n >>> asyncio.run(execute_sync_or_async(foo_async, 1))\n 2\n\n >>> asyncio.run(execute_sync_or_async(foo_sync, 1))\n 2\n \"\"\"\n if is_async_function(callback):\n return await callback(*args, **kwargs)\n else:\n return callback(*args, **kwargs)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_emails","title":"extract_emails","text":"extract_emails(s)\n
Extracts email addresses from a body of text.
This function takes in a string and yields all email addresses found in it. The emails are converted to lower case before yielding. It utilizes regular expressions for email pattern matching.
Parameters:
s
(str
) \u2013 The input string from which to extract email addresses.
Yields:
str
\u2013 Yields email addresses found in the input string, in lower case.
Examples:
>>> list(extract_emails(\"Contact us at info@evilcorp.com and support@evilcorp.com\"))\n['info@evilcorp.com', 'support@evilcorp.com']\n
Source code in bbot/core/helpers/misc.py
def extract_emails(s):\n \"\"\"\n Extract email addresses from a body of text\n\n This function takes in a string and yields all email addresses found in it.\n The emails are converted to lower case before yielding. It utilizes\n regular expressions for email pattern matching.\n\n Args:\n s (str): The input string from which to extract email addresses.\n\n Yields:\n str: Yields email addresses found in the input string, in lower case.\n\n Examples:\n >>> list(extract_emails(\"Contact us at info@evilcorp.com and support@evilcorp.com\"))\n ['info@evilcorp.com', 'support@evilcorp.com']\n \"\"\"\n for email in bbot_regexes.email_regex.findall(smart_decode(s)):\n yield email.lower()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_host","title":"extract_host","text":"extract_host(s)\n
Attempts to find and extract the host portion of a string.
Parameters:
s
(str
) \u2013 The string from which to extract the host.
Returns:
tuple
\u2013 A tuple containing three strings: (hostname (None if not found), string_before_hostname, string_after_hostname).
Examples:
>>> extract_host(\"evilcorp.com:80\")\n(\"evilcorp.com\", \"\", \":80\")\n
>>> extract_host(\"http://evilcorp.com:80/asdf.php?a=b\")\n(\"evilcorp.com\", \"http://\", \":80/asdf.php?a=b\")\n
>>> extract_host(\"bob@evilcorp.com\")\n(\"evilcorp.com\", \"bob@\", \"\")\n
>>> extract_host(\"[dead::beef]:22\")\n(\"dead::beef\", \"[\", \"]:22\")\n
>>> extract_host(\"ftp://username:password@my-ftp.com/my-file.csv\")\n(\n \"my-ftp.com\",\n \"ftp://username:password@\",\n \"/my-file.csv\",\n)\n
Source code in bbot/core/helpers/misc.py
def extract_host(s):\n \"\"\"\n Attempts to find and extract the host portion of a string.\n\n Args:\n s (str): The string from which to extract the host.\n\n Returns:\n tuple: A tuple containing three strings:\n (hostname (None if not found), string_before_hostname, string_after_hostname).\n\n Examples:\n >>> extract_host(\"evilcorp.com:80\")\n (\"evilcorp.com\", \"\", \":80\")\n\n >>> extract_host(\"http://evilcorp.com:80/asdf.php?a=b\")\n (\"evilcorp.com\", \"http://\", \":80/asdf.php?a=b\")\n\n >>> extract_host(\"bob@evilcorp.com\")\n (\"evilcorp.com\", \"bob@\", \"\")\n\n >>> extract_host(\"[dead::beef]:22\")\n (\"dead::beef\", \"[\", \"]:22\")\n\n >>> extract_host(\"ftp://username:password@my-ftp.com/my-file.csv\")\n (\n \"my-ftp.com\",\n \"ftp://username:password@\",\n \"/my-file.csv\",\n )\n \"\"\"\n s = smart_decode(s)\n match = bbot_regexes.extract_host_regex.search(s)\n\n if match:\n hostname = match.group(1)\n before = s[: match.start(1)]\n after = s[match.end(1) :]\n host, port = split_host_port(hostname)\n netloc = make_netloc(host, port)\n if netloc != hostname:\n # invalid host / port\n return (None, s, \"\")\n if host is not None:\n if port is not None:\n after = f\":{port}{after}\"\n if is_ip(host, version=6) and hostname.startswith(\"[\"):\n before = f\"{before}[\"\n after = f\"]{after}\"\n hostname = str(host)\n return (hostname, before, after)\n\n return (None, s, \"\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_params_json","title":"extract_params_json","text":"extract_params_json(json_data, compare_mode='getparam')\n
Extracts key-value pairs from a JSON object and returns them as a set of tuples. Used by the paramminer_headers
module.
Parameters:
json_data
(str
) \u2013 JSON-formatted string containing key-value pairs.
Returns:
set
\u2013 A set of tuples containing the keys and their corresponding values present in the JSON object.
Examples:
>>> extract_params_json('{\"a\": 1, \"b\": {\"c\": 2}}')\n{('a', 1), ('b', {'c': 2}), ('c', 2)}\n
Source code in bbot/core/helpers/misc.py
def extract_params_json(json_data, compare_mode=\"getparam\"):\n \"\"\"\n Extracts key-value pairs from a JSON object and returns them as a set of tuples. Used by the `paramminer_headers` module.\n\n Args:\n json_data (str): JSON-formatted string containing key-value pairs.\n\n Returns:\n set: A set of tuples containing the keys and their corresponding values present in the JSON object.\n\n Raises:\n Returns an empty set if JSONDecodeError occurs.\n\n Examples:\n >>> extract_params_json('{\"a\": 1, \"b\": {\"c\": 2}}')\n {('a', 1), ('b', {'c': 2}), ('c', 2)}\n \"\"\"\n try:\n data = json.loads(json_data)\n except json.JSONDecodeError:\n return set()\n\n key_value_pairs = set()\n stack = [(data, \"\")]\n\n while stack:\n current_data, path = stack.pop()\n if isinstance(current_data, dict):\n for key, value in current_data.items():\n full_key = f\"{path}.{key}\" if path else key\n if isinstance(value, dict):\n stack.append((value, full_key))\n elif isinstance(value, list):\n stack.append((value, full_key))\n else:\n if validate_parameter(full_key, compare_mode):\n key_value_pairs.add((full_key, value))\n elif isinstance(current_data, list):\n for item in current_data:\n if isinstance(item, (dict, list)):\n stack.append((item, path))\n return key_value_pairs\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_params_xml","title":"extract_params_xml","text":"extract_params_xml(xml_data, compare_mode='getparam')\n
Extracts tags and their text values from an XML object and returns them as a set of tuples.
Parameters:
xml_data
(str
) \u2013 XML-formatted string containing elements.
Returns:
set
\u2013 A set of tuples containing the tags and their corresponding text values present in the XML object.
Examples:
>>> extract_params_xml('<root><child1><child2>value</child2></child1></root>')\n{('root', None), ('child1', None), ('child2', 'value')}\n
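The stack-based traversal can be sketched without the validation step. This is a simplified illustration: tag_text_pairs is a hypothetical name, and every tag is accepted because validate_parameter is not reproduced here.

```python
import xml.etree.ElementTree as ET


def tag_text_pairs(xml_data):
    # Hypothetical simplified version: no parameter validation.
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError:
        return set()  # malformed XML yields an empty set instead of raising
    pairs = set()
    stack = [root]
    while stack:
        element = stack.pop()
        pairs.add((element.tag, element.text))
        stack.extend(element)  # iterating an Element yields its direct children
    return pairs


pairs = tag_text_pairs('<user><name>bob</name><role>admin</role></user>')
broken = tag_text_pairs('<broken')
print(pairs, broken)
```

An explicit stack avoids recursion, so deeply nested documents do not run into Python's recursion limit.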
Source code in bbot/core/helpers/misc.py
def extract_params_xml(xml_data, compare_mode=\"getparam\"):\n \"\"\"\n Extracts tags and their text values from an XML object and returns them as a set of tuples.\n\n Args:\n xml_data (str): XML-formatted string containing elements.\n\n Returns:\n set: A set of tuples containing the tags and their corresponding text values present in the XML object.\n\n Raises:\n Returns an empty set if ParseError occurs.\n\n Examples:\n >>> extract_params_xml('<root><child1><child2>value</child2></child1></root>')\n {('root', None), ('child1', None), ('child2', 'value')}\n \"\"\"\n import xml.etree.ElementTree as ET\n\n try:\n root = ET.fromstring(xml_data)\n except ET.ParseError:\n return set()\n\n tag_value_pairs = set()\n stack = [root]\n\n while stack:\n current_element = stack.pop()\n if validate_parameter(current_element.tag, compare_mode):\n tag_value_pairs.add((current_element.tag, current_element.text))\n for child in current_element:\n stack.append(child)\n return tag_value_pairs\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_words","title":"extract_words","text":"extract_words(data, acronyms=True, wordninja=True, model=None, max_length=100, word_regexes=None)\n
Intelligently extracts words from given data.
This function uses regular expressions and optionally wordninja to extract words from a given text string. Thanks to wordninja it can handle concatenated words intelligently.
Parameters:
data
(str
) \u2013 The data from which words are to be extracted.
acronyms
(bool
, default: True
) \u2013 Whether to include acronyms. Defaults to True.
wordninja
(bool
, default: True
) \u2013 Whether to use the wordninja library to split concatenated words. Defaults to True.
model
(object
, default: None
) \u2013 A custom wordninja model for special types of data such as DNS names.
max_length
(int
, default: 100
) \u2013 Maximum length for a word to be included. Defaults to 100.
word_regexes
(list
, default: None
) \u2013 A list of compiled regular expression objects for word extraction. Defaults to None.
Returns:
set
\u2013 A set of extracted words.
Examples:
>>> extract_words('blacklanternsecurity')\n{'black', 'lantern', 'security', 'bls', 'blacklanternsecurity'}\n
Source code in bbot/core/helpers/misc.py
def extract_words(data, acronyms=True, wordninja=True, model=None, max_length=100, word_regexes=None):\n \"\"\"Intelligently extracts words from given data.\n\n This function uses regular expressions and optionally wordninja to extract words\n from a given text string. Thanks to wordninja it can handle concatenated words intelligently.\n\n Args:\n data (str): The data from which words are to be extracted.\n acronyms (bool, optional): Whether to include acronyms. Defaults to True.\n wordninja (bool, optional): Whether to use the wordninja library to split concatenated words. Defaults to True.\n model (object, optional): A custom wordninja model for special types of data such as DNS names.\n max_length (int, optional): Maximum length for a word to be included. Defaults to 100.\n word_regexes (list, optional): A list of compiled regular expression objects for word extraction. Defaults to None.\n\n Returns:\n set: A set of extracted words.\n\n Examples:\n >>> extract_words('blacklanternsecurity')\n {'black', 'lantern', 'security', 'bls', 'blacklanternsecurity'}\n \"\"\"\n import wordninja as _wordninja\n\n if word_regexes is None:\n word_regexes = bbot_regexes.word_regexes\n words = set()\n data = smart_decode(data)\n for r in word_regexes:\n for word in set(r.findall(data)):\n # blacklanternsecurity\n if len(word) <= max_length:\n words.add(word)\n\n # blacklanternsecurity --> ['black', 'lantern', 'security']\n # max_slice_length = 3\n for word in list(words):\n if wordninja:\n if model is None:\n model = _wordninja\n subwords = model.split(word)\n for subword in subwords:\n words.add(subword)\n # this section generates compound words\n # it is interesting but currently disabled the quality of its output doesn't quite justify its quantity\n # blacklanternsecurity --> ['black', 'lantern', 'security', 'blacklantern', 'lanternsecurity']\n # for s, e in combinations(range(len(subwords) + 1), 2):\n # if e - s <= max_slice_length:\n # subword_slice = 
\"\".join(subwords[s:e])\n # words.add(subword_slice)\n # blacklanternsecurity --> bls\n if acronyms:\n if len(subwords) > 1:\n words.add(\"\".join([c[0] for c in subwords if len(c) > 0]))\n\n return words\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.filesize","title":"filesize","text":"filesize(f)\n
Get the file size of a given file.
This function takes a file path as an argument and returns its size in bytes. If the path does not point to a file, the function returns 0.
Parameters:
f
(str or Path
) \u2013 The file path for which to get the size.
Returns:
int
\u2013 The size of the file in bytes, or 0 if the path does not point to a file.
Examples:
>>> filesize(\"/path/to/file.txt\")\n1024\n
Source code in bbot/core/helpers/misc.py
def filesize(f):\n \"\"\"Get the file size of a given file.\n\n This function takes a file path as an argument and returns its size in bytes. If the path\n does not point to a file, the function returns 0.\n\n Args:\n f (str or Path): The file path for which to get the size.\n\n Returns:\n int: The size of the file in bytes, or 0 if the path does not point to a file.\n\n Examples:\n >>> filesize(\"/path/to/file.txt\")\n 1024\n \"\"\"\n f = Path(f)\n if f.is_file():\n return f.stat().st_size\n return 0\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.filter_dict","title":"filter_dict","text":"filter_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None)\n
Recursively filter a dictionary based on key names.
Parameters:
d
(dict
) \u2013 The input dictionary.
*key_names
\u2013 Names of keys to filter for.
fuzzy
(bool
, default: False
) \u2013 Whether to perform fuzzy matching on keys.
exclude_keys
((list, None)
, default: None
) \u2013 List of keys to be excluded from the final dict.
_prev_key
((str, None)
, default: None
) \u2013 For internal recursive use; the previous key in the hierarchy.
Returns:
dict
\u2013 A dictionary containing only the keys specified in key_names.
Examples:
>>> filter_dict({\"key1\": \"test\", \"key2\": \"asdf\"}, \"key2\")\n{\"key2\": \"asdf\"}\n>>> filter_dict({\"key1\": \"test\", \"key2\": {\"key3\": \"asdf\"}}, \"key1\", \"key3\", exclude_keys=\"key2\")\n{'key1': 'test'}\n
Source code in bbot/core/helpers/misc.py
def filter_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):\n \"\"\"\n Recursively filter a dictionary based on key names.\n\n Args:\n d (dict): The input dictionary.\n *key_names: Names of keys to filter for.\n fuzzy (bool): Whether to perform fuzzy matching on keys.\n exclude_keys (list, None): List of keys to be excluded from the final dict.\n _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.\n\n Returns:\n dict: A dictionary containing only the keys specified in key_names.\n\n Examples:\n >>> filter_dict({\"key1\": \"test\", \"key2\": \"asdf\"}, \"key2\")\n {\"key2\": \"asdf\"}\n >>> filter_dict({\"key1\": \"test\", \"key2\": {\"key3\": \"asdf\"}}, \"key1\", \"key3\", exclude_keys=\"key2\")\n {'key1': 'test'}\n \"\"\"\n if exclude_keys is None:\n exclude_keys = []\n if isinstance(exclude_keys, str):\n exclude_keys = [exclude_keys]\n ret = {}\n if isinstance(d, dict):\n for key in d:\n if key in key_names or (fuzzy and any(k in key for k in key_names)):\n if not any(k in exclude_keys for k in [key, _prev_key]):\n ret[key] = copy.deepcopy(d[key])\n elif isinstance(d[key], list) or isinstance(d[key], dict):\n child = filter_dict(d[key], *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)\n if child:\n ret[key] = child\n return ret\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.gen_numbers","title":"gen_numbers","text":"gen_numbers(n, padding=2)\n
Generates numbers with variable padding and returns them as a set of strings.
Parameters:
n
(int
) \u2013 The upper limit of numbers to generate, exclusive.
padding
(int
, default: 2
) \u2013 The maximum number of digits to pad the numbers with. Defaults to 2.
Returns:
set
\u2013 A set of string representations of numbers with varying degrees of padding.
Examples:
>>> gen_numbers(5)\n{'0', '00', '01', '02', '03', '04', '1', '2', '3', '4'}\n
>>> gen_numbers(3, padding=3)\n{'0', '00', '000', '001', '002', '01', '02', '1', '2'}\n
>>> gen_numbers(5, padding=1)\n{'0', '1', '2', '3', '4'}\n
Source code in bbot/core/helpers/misc.py
def gen_numbers(n, padding=2):\n \"\"\"Generates numbers with variable padding and returns them as a set of strings.\n\n Args:\n n (int): The upper limit of numbers to generate, exclusive.\n padding (int, optional): The maximum number of digits to pad the numbers with. Defaults to 2.\n\n Returns:\n set: A set of string representations of numbers with varying degrees of padding.\n\n Examples:\n >>> gen_numbers(5)\n {'0', '00', '01', '02', '03', '04', '1', '2', '3', '4'}\n\n >>> gen_numbers(3, padding=3)\n {'0', '00', '000', '001', '002', '01', '02', '1', '2'}\n\n >>> gen_numbers(5, padding=1)\n {'0', '1', '2', '3', '4'}\n \"\"\"\n results = set()\n for i in range(n):\n for p in range(1, padding + 1):\n results.add(str(i).zfill(p))\n return results\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_closest_match","title":"get_closest_match","text":"get_closest_match(s, choices, msg=None)\n
Finds the closest match from a list of choices for a given string.
This function is particularly useful for CLI applications where you want to validate flags or modules.
Parameters:
s
(str
) \u2013 The string for which to find the closest match.
choices
(list
) \u2013 A list of strings to compare against.
msg
(str
, default: None
) \u2013 Additional message to prepend in the warning message. Defaults to None.
Returns:
str
\u2013 A message naming the closest match found in choices.
Examples:
>>> get_closest_match(\"some_module\", [\"some_mod\", \"some_other_mod\"], msg=\"module\")\n# Output: Could not find module \"some_module\". Did you mean \"some_mod\"?\n
Source code in bbot/core/helpers/misc.py
def get_closest_match(s, choices, msg=None):\n \"\"\"Finds the closest match from a list of choices for a given string.\n\n This function is particularly useful for CLI applications where you want to validate flags or modules.\n\n Args:\n s (str): The string for which to find the closest match.\n choices (list): A list of strings to compare against.\n msg (str, optional): Additional message to prepend in the warning message. Defaults to None.\n loglevel (str, optional): The log level to use for the warning message. Defaults to \"HUGEWARNING\".\n exitcode (int, optional): The exit code to use when exiting the program. Defaults to 2.\n\n Examples:\n >>> get_closest_match(\"some_module\", [\"some_mod\", \"some_other_mod\"], msg=\"module\")\n # Output: Could not find module \"some_module\". Did you mean \"some_mod\"?\n \"\"\"\n if msg is None:\n msg = \"\"\n else:\n msg += \" \"\n closest = closest_match(s, choices)\n return f'Could not find {msg}\"{s}\". Did you mean \"{closest}\"?'\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_exception_chain","title":"get_exception_chain","text":"get_exception_chain(e)\n
Retrieves the full chain of exceptions leading to the given exception.
Parameters:
e
(BaseException
) \u2013 The exception for which to get the chain.
Returns:
list[BaseException]: List of exceptions in the chain, from the given exception back to the root cause.
Examples:
>>> try:\n... raise ValueError(\"This is a value error\")\n... except ValueError as e:\n... exc_chain = get_exception_chain(e)\n... for exc in exc_chain:\n... print(exc)\nThis is a value error\n
Source code in bbot/core/helpers/misc.py
def get_exception_chain(e):\n \"\"\"\n Retrieves the full chain of exceptions leading to the given exception.\n\n Args:\n e (BaseException): The exception for which to get the chain.\n\n Returns:\n list[BaseException]: List of exceptions in the chain, from the given exception back to the root cause.\n\n Examples:\n >>> try:\n ... raise ValueError(\"This is a value error\")\n ... except ValueError as e:\n ... exc_chain = get_exception_chain(e)\n ... for exc in exc_chain:\n ... print(exc)\n This is a value error\n \"\"\"\n exception_chain = []\n current_exception = e\n while current_exception is not None:\n exception_chain.append(current_exception)\n current_exception = getattr(current_exception, \"__context__\", None)\n return exception_chain\n
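The loop above follows the __context__ attribute, which Python sets both for implicit chaining (raising inside an except block) and for explicit raise ... from. The following is a minimal standalone sketch of that walk; walk_context is a hypothetical stand-in written for illustration, not BBOT's helper.

```python
def walk_context(exc):
    # Follow __context__ links from the given exception back to the root cause.
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = exc.__context__
    return chain


try:
    try:
        raise KeyError('missing')
    except KeyError:
        # Raising inside an except block sets __context__ implicitly.
        raise ValueError('wrapped')
except ValueError as e:
    names = [type(x).__name__ for x in walk_context(e)]

print(names)  # ['ValueError', 'KeyError']
```

The outermost exception comes first and the root cause last, which matches the ordering described in the Returns section.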
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_file_extension","title":"get_file_extension","text":"get_file_extension(s)\n
Extracts the file extension from a given string representing a URL or file path.
Parameters:
s
(str
) \u2013 The string from which to extract the file extension.
Returns:
str
\u2013 The file extension, or an empty string if no extension is found.
Examples:
>>> get_file_extension(\"https://evilcorp.com/api/test.php\")\n\"php\"\n>>> get_file_extension(\"/etc/test.conf\")\n\"conf\"\n>>> get_file_extension(\"/etc/passwd\")\n\"\"\n
Source code in bbot/core/helpers/misc.py
def get_file_extension(s):\n \"\"\"\n Extracts the file extension from a given string representing a URL or file path.\n\n Args:\n s (str): The string from which to extract the file extension.\n\n Returns:\n str: The file extension, or an empty string if no extension is found.\n\n Examples:\n >>> get_file_extension(\"https://evilcorp.com/api/test.php\")\n \"php\"\n >>> get_file_extension(\"/etc/test.conf\")\n \"conf\"\n >>> get_file_extension(\"/etc/passwd\")\n \"\"\n \"\"\"\n s = str(s).lower().strip()\n rightmost_section = s.rsplit(\"/\", 1)[-1]\n if \".\" in rightmost_section:\n extension = rightmost_section.rsplit(\".\", 1)[-1]\n return extension\n return \"\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_keys_in_dot_syntax","title":"get_keys_in_dot_syntax","text":"get_keys_in_dot_syntax(config)\n
Retrieve all keys in an OmegaConf configuration in dot notation.
This function converts an OmegaConf configuration into a list of keys represented in dot notation.
Parameters:
config
(DictConfig
) \u2013 The OmegaConf configuration object.
Returns:
List[str]: A list of keys in dot notation.
Examples:
>>> config = OmegaConf.create({\n... \"web\": {\n... \"test\": True\n... },\n... \"db\": {\n... \"host\": \"localhost\",\n... \"port\": 5432\n... }\n... })\n>>> get_keys_in_dot_syntax(config)\n['web.test', 'db.host', 'db.port']\n
Source code in bbot/core/helpers/misc.py
def get_keys_in_dot_syntax(config):\n \"\"\"Retrieve all keys in an OmegaConf configuration in dot notation.\n\n This function converts an OmegaConf configuration into a list of keys\n represented in dot notation.\n\n Args:\n config (DictConfig): The OmegaConf configuration object.\n\n Returns:\n List[str]: A list of keys in dot notation.\n\n Examples:\n >>> config = OmegaConf.create({\n ... \"web\": {\n ... \"test\": True\n ... },\n ... \"db\": {\n ... \"host\": \"localhost\",\n ... \"port\": 5432\n ... }\n ... })\n >>> get_keys_in_dot_syntax(config)\n ['web.test', 'db.host', 'db.port']\n \"\"\"\n from omegaconf import OmegaConf\n\n container = OmegaConf.to_container(config, resolve=True)\n keys = []\n\n def recursive_keys(d, parent_key=\"\"):\n for k, v in d.items():\n full_key = f\"{parent_key}.{k}\" if parent_key else k\n if isinstance(v, dict):\n recursive_keys(v, full_key)\n else:\n keys.append(full_key)\n\n recursive_keys(container)\n return keys\n
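The recursion above can be tried without OmegaConf by running it on a plain dictionary. This is a sketch under that assumption; dot_keys is a hypothetical name, and the real helper first converts the DictConfig with OmegaConf.to_container.

```python
def dot_keys(d, parent=''):
    # Hypothetical stand-in that operates on a plain dict instead of a DictConfig.
    keys = []
    for k, v in d.items():
        full = parent + '.' + k if parent else k
        if isinstance(v, dict):
            # Descend into nested mappings, carrying the dotted prefix along.
            keys.extend(dot_keys(v, full))
        else:
            keys.append(full)
    return keys


flat = dot_keys({'web': {'test': True}, 'db': {'host': 'localhost', 'port': 5432}})
print(flat)  # ['web.test', 'db.host', 'db.port']
```

Only leaf values produce an entry, so an empty nested dictionary contributes no key in this sketch.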
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_size","title":"get_size","text":"get_size(obj, max_depth=5, seen=None)\n
Roughly estimate the memory footprint of a Python object using recursion.
Parameters:
obj
(any
) \u2013 The object whose size is to be determined.
max_depth
(int
, default: 5
) \u2013 Maximum depth to which nested objects will be inspected. Defaults to 5.
seen
(set
, default: None
) \u2013 Objects that have already been accounted for, to avoid loops.
Returns:
int
\u2013 Approximate memory footprint of the object in bytes.
Examples:
>>> get_size(my_list)\n4200\n
>>> get_size(my_dict, max_depth=3)\n8400\n
Source code in bbot/core/helpers/misc.py
def get_size(obj, max_depth=5, seen=None):\n \"\"\"\n Roughly estimate the memory footprint of a Python object using recursion.\n\n Parameters:\n obj (any): The object whose size is to be determined.\n max_depth (int, optional): Maximum depth to which nested objects will be inspected. Defaults to 5.\n seen (set, optional): Objects that have already been accounted for, to avoid loops.\n\n Returns:\n int: Approximate memory footprint of the object in bytes.\n\n Examples:\n >>> get_size(my_list)\n 4200\n\n >>> get_size(my_dict, max_depth=3)\n 8400\n \"\"\"\n from collections.abc import Mapping\n\n # If seen is not provided, initialize an empty set\n if seen is None:\n seen = set()\n # Get the id of the object\n obj_id = id(obj)\n # Decrease the maximum depth for the next recursion\n new_max_depth = max_depth - 1\n # If the object has already been seen or we've reached the maximum recursion depth, return 0\n if obj_id in seen or new_max_depth <= 0:\n return 0\n # Get the size of the object\n size = sys.getsizeof(obj)\n # Add the object's id to the set of seen objects\n seen.add(obj_id)\n # If the object has a __dict__ attribute, we want to measure its size\n if hasattr(obj, \"__dict__\"):\n # Iterate over the Method Resolution Order (MRO) of the class of the object\n for cls in obj.__class__.__mro__:\n # If the class's __dict__ contains a __dict__ key\n if \"__dict__\" in cls.__dict__:\n for k, v in obj.__dict__.items():\n size += get_size(k, new_max_depth, seen)\n size += get_size(v, new_max_depth, seen)\n break\n # If the object is a mapping (like a dictionary), we want to measure the size of its items\n if isinstance(obj, Mapping):\n with suppress(StopIteration):\n k, v = next(iter(obj.items()))\n size += (get_size(k, new_max_depth, seen) + get_size(v, new_max_depth, seen)) * len(obj)\n # If the object is a container (like a list or tuple) but not a string or bytes-like object\n elif isinstance(obj, (list, tuple, set)):\n with suppress(StopIteration):\n size += 
get_size(next(iter(obj)), new_max_depth, seen) * len(obj)\n # If the object has __slots__, we want to measure the size of the attributes in __slots__\n if hasattr(obj, \"__slots__\"):\n size += sum(get_size(getattr(obj, s), new_max_depth, seen) for s in obj.__slots__ if hasattr(obj, s))\n return size\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_traceback_details","title":"get_traceback_details","text":"get_traceback_details(e)\n
Retrieves detailed information from the traceback of an exception.
Parameters:
e
(BaseException
) \u2013 The exception for which to get traceback details.
Returns:
tuple
\u2013 A tuple containing filename (str), line number (int), and function name (str) where the exception was raised.
Examples:
>>> try:\n... raise ValueError(\"This is a value error\")\n... except ValueError as e:\n... filename, lineno, funcname = get_traceback_details(e)\n... print(f\"File: {filename}, Line: {lineno}, Function: {funcname}\")\nFile: <stdin>, Line: 2, Function: <module>\n
Source code in bbot/core/helpers/misc.py
def get_traceback_details(e):\n \"\"\"\n Retrieves detailed information from the traceback of an exception.\n\n Args:\n e (BaseException): The exception for which to get traceback details.\n\n Returns:\n tuple: A tuple containing filename (str), line number (int), and function name (str) where the exception was raised.\n\n Examples:\n >>> try:\n ... raise ValueError(\"This is a value error\")\n ... except ValueError as e:\n ... filename, lineno, funcname = get_traceback_details(e)\n ... print(f\"File: {filename}, Line: {lineno}, Function: {funcname}\")\n File: <stdin>, Line: 2, Function: <module>\n \"\"\"\n import traceback\n\n tb = traceback.extract_tb(e.__traceback__)\n last_frame = tb[-1] # Get the last frame in the traceback (the one where the exception was raised)\n filename = last_frame.filename\n lineno = last_frame.lineno\n funcname = last_frame.name\n return filename, lineno, funcname\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.grouper","title":"grouper","text":"grouper(iterable, n)\n
Grouper groups an iterable into chunks of a given size.
Parameters:
iterable
(iterable
) \u2013 The iterable to be chunked.
n
(int
) \u2013 The size of each chunk.
Returns:
iterator
\u2013 An iterator that produces lists of elements from the original iterable, each of length n or less.
Examples:
>>> list(grouper('ABCDEFG', 3))\n[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]\n
Source code in bbot/core/helpers/misc.py
def grouper(iterable, n):\n \"\"\"\n Grouper groups an iterable into chunks of a given size.\n\n Args:\n iterable (iterable): The iterable to be chunked.\n n (int): The size of each chunk.\n\n Returns:\n iterator: An iterator that produces lists of elements from the original iterable, each of length `n` or less.\n\n Examples:\n >>> list(grouper('ABCDEFG', 3))\n [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]\n \"\"\"\n from itertools import islice\n\n iterable = iter(iterable)\n return iter(lambda: list(islice(iterable, n)), [])\n
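The source above relies on the two-argument form of iter(), which calls a function repeatedly until it returns the sentinel value. A minimal standalone sketch of the same idiom follows; chunked is a hypothetical name used only for illustration.

```python
from itertools import islice


def chunked(iterable, n):
    # Hypothetical stand-in mirroring the two-argument iter() idiom.
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns [].
    return iter(lambda: list(islice(it, n)), [])


chunks = list(chunked(range(7), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because islice pulls from a single shared iterator, each call resumes where the previous chunk stopped, and the first empty list ends the iteration.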
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.human_timedelta","title":"human_timedelta","text":"human_timedelta(d)\n
Convert a TimeDelta object into a human-readable string.
This function takes a datetime.timedelta object and converts it into a string format that is easier to read and understand.
Parameters:
d
(timedelta
) \u2013 The TimeDelta object to convert.
Returns:
str
\u2013 A string representation of the TimeDelta object in human-readable form.
Examples:
>>> from datetime import datetime\n>>>\n>>> start_time = datetime.now()\n>>> end_time = datetime.now()\n>>> elapsed_time = end_time - start_time\n>>> human_timedelta(elapsed_time)\n'2 hours, 30 minutes, 15 seconds'\n
Source code in bbot/core/helpers/misc.py
def human_timedelta(d):\n \"\"\"Convert a TimeDelta object into a human-readable string.\n\n This function takes a datetime.timedelta object and converts it into a string format that\n is easier to read and understand.\n\n Args:\n d (datetime.timedelta): The TimeDelta object to convert.\n\n Returns:\n str: A string representation of the TimeDelta object in human-readable form.\n\n Examples:\n >>> from datetime import datetime\n >>>\n >>> start_time = datetime.now()\n >>> end_time = datetime.now()\n >>> elapsed_time = end_time - start_time\n >>> human_timedelta(elapsed_time)\n '2 hours, 30 minutes, 15 seconds'\n \"\"\"\n hours, remainder = divmod(d.seconds, 3600)\n minutes, seconds = divmod(remainder, 60)\n result = []\n if hours:\n result.append(f\"{hours:,} hour\" + (\"s\" if hours > 1 else \"\"))\n if minutes:\n result.append(f\"{minutes:,} minute\" + (\"s\" if minutes > 1 else \"\"))\n if seconds:\n result.append(f\"{seconds:,} second\" + (\"s\" if seconds > 1 else \"\"))\n ret = \", \".join(result)\n if not ret:\n ret = \"0 seconds\"\n return ret\n
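The source above reads d.seconds, so it appears that whole days are not reflected in the output. The sketch below uses only the standard library to show the difference between timedelta.seconds and timedelta.total_seconds(); hms is a hypothetical helper, not part of BBOT.

```python
from datetime import timedelta

d = timedelta(days=1, hours=2, minutes=30, seconds=15)

# .seconds holds only the sub-day remainder; whole days live in .days.
print(d.seconds)               # 9015
print(int(d.total_seconds()))  # 95415


def hms(total):
    # Hypothetical helper: split a second count into hours, minutes, seconds.
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


print(hms(d.seconds))               # (2, 30, 15)
print(hms(int(d.total_seconds())))  # (26, 30, 15)
```

For durations under 24 hours the two values are identical, so the distinction only matters for long-running measurements.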
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.human_to_bytes","title":"human_to_bytes","text":"human_to_bytes(filesize)\n
Convert a human-readable file size string to its bytes equivalent.
This function takes a human-readable file size string, such as \"2.5GB\", and converts it to its equivalent number of bytes.
Parameters:
filesize
(str or int
) \u2013 The human-readable file size string or integer bytes value to convert.
Returns:
int
\u2013 The number of bytes equivalent to the input human-readable file size.
Raises:
ValueError
\u2013 If the input string cannot be converted to bytes.
Examples:
>>> human_to_bytes(\"23.23gb\")\n24943022571\n
Source code in bbot/core/helpers/misc.py
def human_to_bytes(filesize):\n \"\"\"Convert a human-readable file size string to its bytes equivalent.\n\n This function takes a human-readable file size string, such as \"2.5GB\", and converts it\n to its equivalent number of bytes.\n\n Args:\n filesize (str or int): The human-readable file size string or integer bytes value to convert.\n\n Returns:\n int: The number of bytes equivalent to the input human-readable file size.\n\n Raises:\n ValueError: If the input string cannot be converted to bytes.\n\n Examples:\n >>> human_to_bytes(\"23.23gb\")\n 24943022571\n \"\"\"\n if isinstance(filesize, int):\n return filesize\n sizes = [\"B\", \"KB\", \"MB\", \"GB\", \"TB\", \"PB\", \"EB\", \"ZB\"]\n units = {}\n for count, size in enumerate(sizes):\n size_increment = pow(1024, count)\n units[size] = size_increment\n if len(size) == 2:\n units[size[0]] = size_increment\n match = filesize_regex.match(filesize)\n try:\n if match:\n num, size = match.groups()\n size = size.upper()\n size_increment = units[size]\n return int(float(num) * size_increment)\n except KeyError:\n pass\n raise ValueError(f'Unable to convert filesize \"{filesize}\" to bytes')\n
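The conversion above multiplies the numeric part by a power of 1024, so 23.23gb becomes int(23.23 * 1024**3). The sketch below reproduces that arithmetic in isolation. SIZE_PATTERN, UNITS and to_bytes are hypothetical names, and the pattern is an assumption, since BBOT's actual filesize_regex is not shown in this section.

```python
import re

# Assumed pattern for illustration only; BBOT's real filesize_regex is not shown here.
SIZE_PATTERN = re.compile('^([0-9.]+) *([a-zA-Z]+)$')

# Binary multiples, as in the source above: B = 1024**0, KB = 1024**1, ...
UNITS = {}
for power, name in enumerate(['B', 'KB', 'MB', 'GB', 'TB']):
    UNITS[name] = 1024 ** power
    if len(name) == 2:
        UNITS[name[0]] = 1024 ** power  # single-letter alias, e.g. G for GB


def to_bytes(text):
    # Hypothetical stand-in for the conversion step.
    match = SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError('cannot parse ' + repr(text))
    number, unit = match.groups()
    return int(float(number) * UNITS[unit.upper()])


print(to_bytes('23.23gb'))  # 24943022571
print(to_bytes('1k'))       # 1024
```

The int() call truncates the fractional byte, which is why 24943022571.52 is reported as 24943022571.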
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.in_exception_chain","title":"in_exception_chain","text":"in_exception_chain(e, exc_types)\n
Given an Exception and a list of Exception types, returns whether any of the specified types are contained anywhere in the Exception chain.
Parameters:
e
(BaseException
) \u2013 The exception to check
exc_types
(list[Exception]
) \u2013 Exception types to look for in the chain, for example those that indicate an intentional cancellation.
Returns:
bool
\u2013 Whether the error is the result of an intentional cancellation
Examples:
>>> try:\n... raise ValueError(\"This is a value error\")\n... except Exception as e:\n... if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n... raise\n
Source code in bbot/core/helpers/misc.py
def in_exception_chain(e, exc_types):\n \"\"\"\n Given an Exception and a list of Exception types, returns whether any of the specified types are contained anywhere in the Exception chain.\n\n Args:\n e (BaseException): The exception to check\n exc_types (list[Exception]): Exception types to look for in the chain, for example those that indicate an intentional cancellation.\n\n Returns:\n bool: Whether the error is the result of an intentional cancellation\n\n Examples:\n >>> try:\n ... raise ValueError(\"This is a value error\")\n ... except Exception as e:\n ... if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n ... raise\n \"\"\"\n return any([isinstance(_, exc_types) for _ in get_exception_chain(e)])\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.integer_to_ordinal","title":"integer_to_ordinal","text":"integer_to_ordinal(n)\n
Convert an integer to its ordinal representation.
Parameters:
n
(int
) \u2013 The integer to convert.
Returns:
str
\u2013 The ordinal representation of the integer.
Examples:
>>> integer_to_ordinal(1)\n'1st'\n>>> integer_to_ordinal(2)\n'2nd'\n>>> integer_to_ordinal(3)\n'3rd'\n>>> integer_to_ordinal(11)\n'11th'\n>>> integer_to_ordinal(21)\n'21st'\n>>> integer_to_ordinal(101)\n'101st'\n
Source code in bbot/core/helpers/misc.py
def integer_to_ordinal(n):\n \"\"\"\n Convert an integer to its ordinal representation.\n\n Args:\n n (int): The integer to convert.\n\n Returns:\n str: The ordinal representation of the integer.\n\n Examples:\n >>> integer_to_ordinal(1)\n '1st'\n >>> integer_to_ordinal(2)\n '2nd'\n >>> integer_to_ordinal(3)\n '3rd'\n >>> integer_to_ordinal(11)\n '11th'\n >>> integer_to_ordinal(21)\n '21st'\n >>> integer_to_ordinal(101)\n '101st'\n \"\"\"\n # Check the last digit\n last_digit = n % 10\n # Check the last two digits for special cases (11th, 12th, 13th)\n last_two_digits = n % 100\n\n if 10 <= last_two_digits <= 20:\n suffix = \"th\"\n else:\n if last_digit == 1:\n suffix = \"st\"\n elif last_digit == 2:\n suffix = \"nd\"\n elif last_digit == 3:\n suffix = \"rd\"\n else:\n suffix = \"th\"\n\n return f\"{n}{suffix}\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.ip_network_parents","title":"ip_network_parents","text":"ip_network_parents(i, include_self=False)\n
Generates all parent IP networks for a given IP address or network, optionally including the network itself.
Parameters:
i
(str or IPv4Network / IPv6Network
) \u2013 The IP address or network to find parents for.
include_self
(bool
, default: False
) \u2013 Whether to include the network itself in the result. Default is False.
Yields:
ipaddress.IPv4Network or ipaddress.IPv6Network: Parent IP networks in descending order of prefix length.
Examples:
>>> list(ip_network_parents(\"192.168.1.1\"))\n[ipaddress.IPv4Network('192.168.1.0/31'), ipaddress.IPv4Network('192.168.1.0/30'), ... , ipaddress.IPv4Network('0.0.0.0/0')]\n
Notes:
Utilizes Python's built-in ipaddress module for network operations.
Source code in bbot/core/helpers/misc.py
def ip_network_parents(i, include_self=False):\n \"\"\"\n Generates all parent IP networks for a given IP address or network, optionally including the network itself.\n\n Args:\n i (str or ipaddress.IPv4Network/ipaddress.IPv6Network): The IP address or network to find parents for.\n include_self (bool, optional): Whether to include the network itself in the result. Default is False.\n\n Yields:\n ipaddress.IPv4Network or ipaddress.IPv6Network: Parent IP networks in descending order of prefix length.\n\n Examples:\n >>> list(ip_network_parents(\"192.168.1.1\"))\n [ipaddress.IPv4Network('192.168.1.0/31'), ipaddress.IPv4Network('192.168.1.0/30'), ... , ipaddress.IPv4Network('0.0.0.0/0')]\n\n Notes:\n - Utilizes Python's built-in `ipaddress` module for network operations.\n \"\"\"\n net = ipaddress.ip_network(i, strict=False)\n for i in range(net.prefixlen - (0 if include_self else 1), -1, -1):\n yield ipaddress.ip_network(f\"{net.network_address}/{i}\", strict=False)\n
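The same sequence of parent networks can be produced with the standard library's supernet() method. This is an alternative sketch using only the ipaddress module, not the approach taken in the source above, which rebuilds each network from the network address and a prefix length.

```python
import ipaddress

net = ipaddress.ip_network('192.168.1.1', strict=False)  # a /32 host network

# Walk from the next-shorter prefix down to /0, as the generator above does.
parents = [net.supernet(new_prefix=p) for p in range(net.prefixlen - 1, -1, -1)]

print(parents[0])    # 192.168.1.0/31
print(parents[-1])   # 0.0.0.0/0
print(len(parents))  # 32
```

Starting the range at net.prefixlen instead of net.prefixlen - 1 would include the network itself, which corresponds to include_self=True.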
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_async_function","title":"is_async_function","text":"is_async_function(f)\n
Check if a given function is an asynchronous function.
Parameters:
f
(function
) \u2013 The function to check.
Returns:
bool
\u2013 True if the function is asynchronous, False otherwise.
Examples:
>>> async def foo():\n... pass\n>>> is_async_function(foo)\nTrue\n
Source code in bbot/core/helpers/misc.py
def is_async_function(f):\n \"\"\"\n Check if a given function is an asynchronous function.\n\n Args:\n f (function): The function to check.\n\n Returns:\n bool: True if the function is asynchronous, False otherwise.\n\n Examples:\n >>> async def foo():\n ... pass\n >>> is_async_function(foo)\n True\n \"\"\"\n import inspect\n\n return inspect.iscoroutinefunction(f)\n
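Since the helper delegates to inspect.iscoroutinefunction, it is worth knowing how that standard-library check treats related kinds of callables. The sketch below calls the inspect module directly; fetch, stream and plain are hypothetical functions defined only for the demonstration.

```python
import inspect


async def fetch():
    return 1


async def stream():
    yield 1


def plain():
    return 1


print(inspect.iscoroutinefunction(fetch))   # True
print(inspect.iscoroutinefunction(stream))  # False, this is an async generator function
print(inspect.isasyncgenfunction(stream))   # True
print(inspect.iscoroutinefunction(plain))   # False
```

An async def that contains yield is an async generator function rather than a coroutine function, so it needs the separate isasyncgenfunction check.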
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_dns_name","title":"is_dns_name","text":"is_dns_name(d, include_local=True)\n
Determines if the given string is a valid DNS name.
Parameters:
d
(str
) \u2013 The string to be checked.
include_local
(bool
, default: True
) \u2013 Consider local hostnames to be valid (hostnames without periods)
Returns:
bool
\u2013 True if the string is a valid DNS name, False otherwise.
Examples:
>>> is_dns_name('www.example.com')\nTrue\n>>> is_dns_name('localhost')\nTrue\n>>> is_dns_name('localhost', include_local=False)\nFalse\n>>> is_dns_name('192.168.1.1')\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_dns_name(d, include_local=True):\n \"\"\"\n Determines if the given string is a valid DNS name.\n\n Args:\n d (str): The string to be checked.\n include_local (bool): Consider local hostnames to be valid (hostnames without periods)\n\n Returns:\n bool: True if the string is a valid DNS name, False otherwise.\n\n Examples:\n >>> is_dns_name('www.example.com')\n True\n >>> is_dns_name('localhost')\n True\n >>> is_dns_name('localhost', include_local=False)\n False\n >>> is_dns_name('192.168.1.1')\n False\n \"\"\"\n if is_ip(d):\n return False\n d = smart_decode(d)\n if include_local:\n if bbot_regexes.hostname_regex.match(d):\n return True\n if bbot_regexes.dns_name_regex.match(d):\n return True\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_domain","title":"is_domain","text":"is_domain(d)\n
Check if the given input represents a domain without subdomains.
This function takes an input string d and returns True if it represents a domain without any subdomains. Otherwise, it returns False.
Parameters:
d
(str
) \u2013 The input string containing the domain.
Returns:
bool
\u2013 True if the input is a domain without subdomains, False otherwise.
Examples:
>>> is_domain(\"evilcorp.co.uk\")\nTrue\n
>>> is_domain(\"www.evilcorp.co.uk\")\nFalse\n
Notes:
Port, if present in input, is ignored.
Source code in bbot/core/helpers/misc.py
def is_domain(d):\n \"\"\"\n Check if the given input represents a domain without subdomains.\n\n This function takes an input string `d` and returns True if it represents a domain without any subdomains.\n Otherwise, it returns False.\n\n Args:\n d (str): The input string containing the domain.\n\n Returns:\n bool: True if the input is a domain without subdomains, False otherwise.\n\n Examples:\n >>> is_domain(\"evilcorp.co.uk\")\n True\n\n >>> is_domain(\"www.evilcorp.co.uk\")\n False\n\n Notes:\n - Port, if present in input, is ignored.\n \"\"\"\n d, _ = split_host_port(d)\n if is_ip(d):\n return False\n extracted = tldextract(d)\n if extracted.registered_domain:\n if not extracted.subdomain:\n return True\n else:\n return d.count(\".\") == 1\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_file","title":"is_file","text":"is_file(f)\n
Check if a path points to a file.
Parameters:
f
(str
) \u2013 Path to the file.
Returns:
bool
\u2013 True if the path is a file, False otherwise.
Examples:
>>> is_file(\"/etc/passwd\")\nTrue\n
>>> is_file(\"/nonexistent\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_file(f):\n \"\"\"\n Check if a path points to a file.\n\n Parameters:\n f (str): Path to the file.\n\n Returns:\n bool: True if the path is a file, False otherwise.\n\n Examples:\n >>> is_file(\"/etc/passwd\")\n True\n\n >>> is_file(\"/nonexistent\")\n False\n \"\"\"\n with suppress(Exception):\n return Path(f).is_file()\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_ip","title":"is_ip","text":"is_ip(d, version=None)\n
Checks if the given string or object represents a valid IP address.
Parameters:
d
(str or IPvXAddress
) \u2013 The IP address to check.
version
(int
, default: None
) \u2013 The IP version to validate (4 or 6). Default is None.
Returns:
bool
\u2013 True if the string or object is a valid IP address, False otherwise.
Examples:
>>> is_ip('192.168.1.1')\nTrue\n>>> is_ip('bad::c0de', version=6)\nTrue\n>>> is_ip('bad::c0de', version=4)\nFalse\n>>> is_ip('evilcorp.com')\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_ip(d, version=None):\n \"\"\"\n Checks if the given string or object represents a valid IP address.\n\n Args:\n d (str or ipaddress.IPvXAddress): The IP address to check.\n version (int, optional): The IP version to validate (4 or 6). Default is None.\n\n Returns:\n bool: True if the string or object is a valid IP address, False otherwise.\n\n Examples:\n >>> is_ip('192.168.1.1')\n True\n >>> is_ip('bad::c0de', version=6)\n True\n >>> is_ip('bad::c0de', version=4)\n False\n >>> is_ip('evilcorp.com')\n False\n \"\"\"\n try:\n ip = ipaddress.ip_address(d)\n if version is None or ip.version == version:\n return True\n except Exception:\n pass\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_ip_type","title":"is_ip_type","text":"is_ip_type(i)\n
Checks if the given object is an instance of an IPv4 or IPv6 type from the ipaddress module.
Parameters:
i
(_BaseV4 or _BaseV6
) \u2013 The IP object to check.
Returns:
bool
\u2013 True if the object is an instance of ipaddress._BaseV4 or ipaddress._BaseV6, False otherwise.
Examples:
>>> is_ip_type(ipaddress.IPv6Address('dead::beef'))\nTrue\n>>> is_ip_type(ipaddress.IPv4Network('192.168.1.0/24'))\nTrue\n>>> is_ip_type(\"192.168.1.0/24\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_ip_type(i):\n \"\"\"\n Checks if the given object is an instance of an IPv4 or IPv6 type from the ipaddress module.\n\n Args:\n i (ipaddress._BaseV4 or ipaddress._BaseV6): The IP object to check.\n\n Returns:\n bool: True if the object is an instance of ipaddress._BaseV4 or ipaddress._BaseV6, False otherwise.\n\n Examples:\n >>> is_ip_type(ipaddress.IPv6Address('dead::beef'))\n True\n >>> is_ip_type(ipaddress.IPv4Network('192.168.1.0/24'))\n True\n >>> is_ip_type(\"192.168.1.0/24\")\n False\n \"\"\"\n return ipaddress._IPAddressBase in i.__class__.__mro__\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_port","title":"is_port","text":"is_port(p)\n
Checks if the given string represents a valid port number.
Parameters:
p
(str or int
) \u2013 The port number to check.
Returns:
bool
\u2013 True if the port number is valid, False otherwise.
Examples:
>>> is_port('80')\nTrue\n>>> is_port('70000')\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_port(p):\n \"\"\"\n Checks if the given string represents a valid port number.\n\n Args:\n p (str or int): The port number to check.\n\n Returns:\n bool: True if the port number is valid, False otherwise.\n\n Examples:\n >>> is_port('80')\n True\n >>> is_port('70000')\n False\n \"\"\"\n\n p = str(p)\n return p and p.isdigit() and 0 <= int(p) <= 65535\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_ptr","title":"is_ptr","text":"is_ptr(d)\n
Check if the given input represents a PTR record domain.
This function takes an input string d
and returns True if it matches the PTR record format. Otherwise, it returns False.
Parameters:
d
(str
) \u2013 The input string potentially representing a PTR record domain.
Returns:
bool
\u2013 True if the input matches PTR record format, False otherwise.
Examples:
>>> is_ptr(\"wsc-11-22-33-44.evilcorp.com\")\nTrue\n
>>> is_ptr(\"www2.evilcorp.com\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_ptr(d):\n \"\"\"\n Check if the given input represents a PTR record domain.\n\n This function takes an input string `d` and returns True if it matches the PTR record format.\n Otherwise, it returns False.\n\n Args:\n d (str): The input string potentially representing a PTR record domain.\n\n Returns:\n bool: True if the input matches PTR record format, False otherwise.\n\n Examples:\n >>> is_ptr(\"wsc-11-22-33-44.evilcorp.com\")\n True\n\n >>> is_ptr(\"www2.evilcorp.com\")\n False\n \"\"\"\n return bool(bbot_regexes.ptr_regex.search(str(d)))\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_subdomain","title":"is_subdomain","text":"is_subdomain(d)\n
Check if the given input represents a subdomain.
This function takes an input string d
and returns True if it represents a subdomain. Otherwise, it returns False.
Parameters:
d
(str
) \u2013 The input string containing the domain or subdomain.
Returns:
bool
\u2013 True if the input is a subdomain, False otherwise.
Examples:
>>> is_subdomain(\"www.evilcorp.co.uk\")\nTrue\n
>>> is_subdomain(\"evilcorp.co.uk\")\nFalse\n
Notes: Port, if present in input, is ignored.
Source code in bbot/core/helpers/misc.py
def is_subdomain(d):\n \"\"\"\n Check if the given input represents a subdomain.\n\n This function takes an input string `d` and returns True if it represents a subdomain.\n Otherwise, it returns False.\n\n Args:\n d (str): The input string containing the domain or subdomain.\n\n Returns:\n bool: True if the input is a subdomain, False otherwise.\n\n Examples:\n >>> is_subdomain(\"www.evilcorp.co.uk\")\n True\n\n >>> is_subdomain(\"evilcorp.co.uk\")\n False\n\n Notes:\n - Port, if present in input, is ignored.\n \"\"\"\n d, _ = split_host_port(d)\n if is_ip(d):\n return False\n extracted = tldextract(d)\n if extracted.registered_domain:\n if extracted.subdomain:\n return True\n else:\n return d.count(\".\") > 1\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_uri","title":"is_uri","text":"is_uri(u, return_scheme=False)\n
Check if the given input represents a URI and optionally return its scheme.
This function takes an input string u
and returns True if it matches a URI format. When return_scheme
is True, it returns the URI scheme instead of a boolean.
Parameters:
u
(str
) \u2013 The input string potentially representing a URI.
return_scheme
(bool
, default: False
) \u2013 Whether to return the URI scheme. Defaults to False.
Returns:
Union[bool, str]: True if the input matches a URI format; the URI scheme if return_scheme
is True.
Examples:
>>> is_uri(\"http://evilcorp.com\")\nTrue\n
>>> is_uri(\"ftp://evilcorp.com\")\nTrue\n
>>> is_uri(\"evilcorp.com\")\nFalse\n
>>> is_uri(\"ftp://evilcorp.com\", return_scheme=True)\n\"ftp\"\n
Source code in bbot/core/helpers/misc.py
def is_uri(u, return_scheme=False):\n \"\"\"\n Check if the given input represents a URI and optionally return its scheme.\n\n This function takes an input string `u` and returns True if it matches a URI format.\n When `return_scheme` is True, it returns the URI scheme instead of a boolean.\n\n Args:\n u (str): The input string potentially representing a URI.\n return_scheme (bool, optional): Whether to return the URI scheme. Defaults to False.\n\n Returns:\n Union[bool, str]: True if the input matches a URI format; the URI scheme if `return_scheme` is True.\n\n Examples:\n >>> is_uri(\"http://evilcorp.com\")\n True\n\n >>> is_uri(\"ftp://evilcorp.com\")\n True\n\n >>> is_uri(\"evilcorp.com\")\n False\n\n >>> is_uri(\"ftp://evilcorp.com\", return_scheme=True)\n \"ftp\"\n \"\"\"\n match = uri_regex.match(u)\n if return_scheme:\n if match:\n return match.groups()[0].lower()\n return \"\"\n return bool(match)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_url","title":"is_url","text":"is_url(u)\n
Check if the given input represents a valid URL.
This function takes an input string u
and returns True if it matches any of the predefined URL formats. Otherwise, it returns False.
Parameters:
u
(str
) \u2013 The input string potentially representing a URL.
Returns:
bool
\u2013 True if the input matches a valid URL format, False otherwise.
Examples:
>>> is_url(\"https://evilcorp.com\")\nTrue\n
>>> is_url(\"not-a-url\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_url(u):\n \"\"\"\n Check if the given input represents a valid URL.\n\n This function takes an input string `u` and returns True if it matches any of the predefined URL formats.\n Otherwise, it returns False.\n\n Args:\n u (str): The input string potentially representing a URL.\n\n Returns:\n bool: True if the input matches a valid URL format, False otherwise.\n\n Examples:\n >>> is_url(\"https://evilcorp.com\")\n True\n\n >>> is_url(\"not-a-url\")\n False\n \"\"\"\n u = str(u)\n for r in bbot_regexes.event_type_regexes[\"URL\"]:\n if r.match(u):\n return True\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.kill_children","title":"kill_children","text":"kill_children(parent_pid=None, sig=None)\n
Forgive me father for I have sinned
Source code in bbot/core/helpers/misc.py
def kill_children(parent_pid=None, sig=None):\n \"\"\"\n Forgive me father for I have sinned\n \"\"\"\n import psutil\n import signal\n\n if sig is None:\n sig = signal.SIGTERM\n\n try:\n parent = psutil.Process(parent_pid)\n except psutil.NoSuchProcess:\n log.debug(f\"No such PID: {parent_pid}\")\n return\n log.debug(f\"Killing children of process ID {parent.pid}\")\n children = parent.children(recursive=True)\n for child in children:\n log.debug(f\"Killing child with PID {child.pid}\")\n if child.name != \"python\":\n try:\n child.send_signal(sig)\n except psutil.NoSuchProcess:\n log.debug(f\"No such PID: {child.pid}\")\n except psutil.AccessDenied:\n log.debug(f\"Error killing PID: {child.pid} - access denied\")\n log.debug(f\"Finished killing children of process ID {parent.pid}\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.latest_mtime","title":"latest_mtime","text":"latest_mtime(d)\n
Get the latest modified time of any file or sub-directory in a given directory.
This function takes a directory path as an argument and returns the latest modified time of any contained file or directory, recursively. It's useful for sorting directories by modified time for cleanup or other purposes.
Parameters:
d
(str or Path
) \u2013 The directory path to search for the latest modified time.
Returns:
float
\u2013 The latest modified time in Unix timestamp format.
Examples:
>>> latest_mtime(\"~/.bbot/scans/mushy_susan\")\n1659016928.2848816\n
Source code in bbot/core/helpers/misc.py
def latest_mtime(d):\n \"\"\"Get the latest modified time of any file or sub-directory in a given directory.\n\n This function takes a directory path as an argument and returns the latest modified time\n of any contained file or directory, recursively. It's useful for sorting directories by\n modified time for cleanup or other purposes.\n\n Args:\n d (str or Path): The directory path to search for the latest modified time.\n\n Returns:\n float: The latest modified time in Unix timestamp format.\n\n Examples:\n >>> latest_mtime(\"~/.bbot/scans/mushy_susan\")\n 1659016928.2848816\n \"\"\"\n d = Path(d).resolve()\n mtimes = [d.lstat().st_mtime]\n if d.is_dir():\n to_list = d.glob(\"**/*\")\n else:\n to_list = [d]\n for e in to_list:\n mtimes.append(e.lstat().st_mtime)\n return max(mtimes)\n
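The recursive scan described above can be sketched with the standard library alone. The name `newest_mtime` and the temporary directory layout are illustrative assumptions, not part of BBOT.

```python
import os
import tempfile
from pathlib import Path


def newest_mtime(directory):
    # Hypothetical stand-in for latest_mtime: start with the directory itself,
    # then compare against every entry found recursively beneath it.
    directory = Path(directory).resolve()
    mtimes = [directory.lstat().st_mtime]
    if directory.is_dir():
        mtimes.extend(entry.lstat().st_mtime for entry in directory.rglob('*'))
    return max(mtimes)


with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'sub' / 'deep.txt'
    nested.parent.mkdir()
    nested.write_text('data')
    # Push the nested file's mtime into the future so it is clearly the newest.
    future = nested.lstat().st_mtime + 3600
    os.utime(nested, (future, future))
    newest = newest_mtime(tmp)
```

Because the scan is recursive, a change deep inside a scan folder is enough to make the whole folder count as recently modified.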
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.list_files","title":"list_files","text":"list_files(directory, filter=lambda x: True)\n
Lists files in a given directory that meet a specified filter condition.
Parameters:
directory
(str
) \u2013 The directory where to list files.
filter
(callable
, default: lambda x: True
) \u2013 A function to filter the files. Defaults to a lambda function that returns True for all files.
Yields:
Path
\u2013 A Path object for each file that meets the filter condition.
Examples:
>>> list(list_files(\"/tmp/test\"))\n[Path('/tmp/test/file1.py'), Path('/tmp/test/file2.txt')]\n
>>> list(list_files(\"/tmp/test\", filter=lambda f: f.suffix == \".py\"))\n[Path('/tmp/test/file1.py')]\n
Source code in bbot/core/helpers/misc.py
def list_files(directory, filter=lambda x: True):\n \"\"\"Lists files in a given directory that meet a specified filter condition.\n\n Args:\n directory (str): The directory where to list files.\n filter (callable, optional): A function to filter the files. Defaults to a lambda function that returns True for all files.\n\n Yields:\n Path: A Path object for each file that meets the filter condition.\n\n Examples:\n >>> list(list_files(\"/tmp/test\"))\n [Path('/tmp/test/file1.py'), Path('/tmp/test/file2.txt')]\n\n >>> list(list_files(\"/tmp/test\", filter=lambda f: f.suffix == \".py\"))\n [Path('/tmp/test/file1.py')]\n \"\"\"\n directory = Path(directory).resolve()\n if directory.is_dir():\n for file in directory.iterdir():\n if file.is_file() and filter(file):\n yield file\n
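A minimal sketch of the same filter-while-iterating pattern, using only the standard library. The names `iter_files` and `keep` are hypothetical stand-ins chosen for illustration; the predicate is passed to the generator itself rather than to `list()`.

```python
import tempfile
from pathlib import Path


def iter_files(directory, keep=lambda path: True):
    # Hypothetical stand-in for list_files: yield regular files directly
    # inside the directory (no recursion) that pass the keep() predicate.
    directory = Path(directory).resolve()
    if directory.is_dir():
        for entry in directory.iterdir():
            if entry.is_file() and keep(entry):
                yield entry


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'file1.py').write_text('print(1)')
    (Path(tmp) / 'file2.txt').write_text('notes')
    (Path(tmp) / 'subdir').mkdir()
    # Sub-directories are skipped because only regular files are yielded.
    all_names = sorted(p.name for p in iter_files(tmp))
    py_names = sorted(p.name for p in iter_files(tmp, keep=lambda p: p.suffix == '.py'))
```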
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_date","title":"make_date","text":"make_date(d=None, microseconds=False)\n
Generates a string representation of the current date and time, with optional microsecond precision.
Parameters:
d
(datetime
, default: None
) \u2013 A datetime object to convert. Defaults to the current date and time.
microseconds
(bool
, default: False
) \u2013 Whether to include microseconds. Defaults to False.
Returns:
str
\u2013 A string representation of the date and time, formatted as YYYYMMDD_HHMM_SS or YYYYMMDD_HHMM_SSFFFFFF if microseconds are included.
Examples:
>>> make_date()\n\"20220707_1325_50\"\n>>> make_date(microseconds=True)\n\"20220707_1330_35167617\"\n
Source code in bbot/core/helpers/misc.py
def make_date(d=None, microseconds=False):\n \"\"\"\n Generates a string representation of the current date and time, with optional microsecond precision.\n\n Args:\n d (datetime, optional): A datetime object to convert. Defaults to the current date and time.\n microseconds (bool, optional): Whether to include microseconds. Defaults to False.\n\n Returns:\n str: A string representation of the date and time, formatted as YYYYMMDD_HHMM_SS or YYYYMMDD_HHMM_SSFFFFFF if microseconds are included.\n\n Examples:\n >>> make_date()\n \"20220707_1325_50\"\n >>> make_date(microseconds=True)\n \"20220707_1330_35167617\"\n \"\"\"\n from datetime import datetime\n\n f = \"%Y%m%d_%H%M_%S\"\n if microseconds:\n f += \"%f\"\n if d is None:\n d = datetime.now()\n return d.strftime(f)\n
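The format string can be exercised on its own with a fixed timestamp. This is a sketch using only `datetime`; the specific date is an arbitrary value picked so that the output is reproducible.

```python
from datetime import datetime

# Fixed timestamp so the output is reproducible; make_date() would use now().
moment = datetime(2022, 7, 7, 13, 25, 50, 167617)

base_format = '%Y%m%d_%H%M_%S'
without_micro = moment.strftime(base_format)
# %f appends the six-digit microsecond field directly after the seconds.
with_micro = moment.strftime(base_format + '%f')

print(without_micro)  # 20220707_1325_50
print(with_micro)     # 20220707_1325_50167617
```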
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_ip_type","title":"make_ip_type","text":"make_ip_type(s)\n
Convert a string to its corresponding IP address or network type.
This function attempts to convert the input string s
into either an IPv4 or IPv6 address object, or an IPv4 or IPv6 network object. If none of these conversions are possible, the original string is returned.
Parameters:
s
(str
) \u2013 The input string to be converted.
Returns:
Union[IPv4Address, IPv6Address, IPv4Network, IPv6Network, str]: The converted object or original string.
Examples:
>>> make_ip_type(\"dead::beef\")\nIPv6Address('dead::beef')\n
>>> make_ip_type(\"192.168.1.0/24\")\nIPv4Network('192.168.1.0/24')\n
>>> make_ip_type(\"evilcorp.com\")\n'evilcorp.com'\n
Source code in bbot/core/helpers/misc.py
def make_ip_type(s):\n \"\"\"\n Convert a string to its corresponding IP address or network type.\n\n This function attempts to convert the input string `s` into either an IPv4 or IPv6 address object,\n or an IPv4 or IPv6 network object. If none of these conversions are possible, the original string is returned.\n\n Args:\n s (str): The input string to be converted.\n\n Returns:\n Union[IPv4Address, IPv6Address, IPv4Network, IPv6Network, str]: The converted object or original string.\n\n Examples:\n >>> make_ip_type(\"dead::beef\")\n IPv6Address('dead::beef')\n\n >>> make_ip_type(\"192.168.1.0/24\")\n IPv4Network('192.168.1.0/24')\n\n >>> make_ip_type(\"evilcorp.com\")\n 'evilcorp.com'\n \"\"\"\n if not s:\n raise ValueError(f'Invalid hostname: \"{s}\"')\n # IP address\n with suppress(Exception):\n return ipaddress.ip_address(s)\n # IP network\n with suppress(Exception):\n return ipaddress.ip_network(s, strict=False)\n return s\n
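The address-then-network fallback can be sketched with the `ipaddress` module alone. The name `to_ip_type` is a hypothetical stand-in, and this sketch catches only `ValueError` where the original suppresses every exception.

```python
import ipaddress


def to_ip_type(value):
    # Hypothetical stand-in for make_ip_type: try a single address first,
    # then a network, and fall back to the original string.
    for parser in (ipaddress.ip_address, lambda v: ipaddress.ip_network(v, strict=False)):
        try:
            return parser(value)
        except ValueError:
            continue
    return value


address = to_ip_type('dead::beef')
# strict=False masks off host bits instead of raising ValueError.
network = to_ip_type('192.168.1.10/24')
hostname = to_ip_type('evilcorp.com')
```

With `strict=False`, an input such as `192.168.1.10/24` is normalized to the enclosing network `192.168.1.0/24` rather than rejected.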
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_netloc","title":"make_netloc","text":"make_netloc(host, port)\n
Constructs a network location string from a given host and port.
Parameters:
host
(str
) \u2013 The hostname or IP address.
port
(int
) \u2013 The port number. If None, the port is omitted.
Returns:
str
\u2013 A network location string in the form 'host' or 'host:port'.
Examples:
>>> make_netloc(\"192.168.1.1\", None)\n\"192.168.1.1\"\n
>>> make_netloc(\"192.168.1.1\", 443)\n\"192.168.1.1:443\"\n
>>> make_netloc(\"evilcorp.com\", 80)\n\"evilcorp.com:80\"\n
>>> make_netloc(\"dead::beef\", None)\n\"[dead::beef]\"\n
>>> make_netloc(\"dead::beef\", 443)\n\"[dead::beef]:443\"\n
Source code in bbot/core/helpers/misc.py
def make_netloc(host, port):\n \"\"\"Constructs a network location string from a given host and port.\n\n Args:\n host (str): The hostname or IP address.\n port (int, optional): The port number. If None, the port is omitted.\n\n Returns:\n str: A network location string in the form 'host' or 'host:port'.\n\n Examples:\n >>> make_netloc(\"192.168.1.1\", None)\n \"192.168.1.1\"\n\n >>> make_netloc(\"192.168.1.1\", 443)\n \"192.168.1.1:443\"\n\n >>> make_netloc(\"evilcorp.com\", 80)\n \"evilcorp.com:80\"\n\n >>> make_netloc(\"dead::beef\", None)\n \"[dead::beef]\"\n\n >>> make_netloc(\"dead::beef\", 443)\n \"[dead::beef]:443\"\n \"\"\"\n if is_ip(host, version=6):\n host = f\"[{host}]\"\n if port is None:\n return host\n return f\"{host}:{port}\"\n
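A standalone sketch of the IPv6 bracketing rule, assuming only the standard library. The name `netloc` and its default `port=None` are illustrative choices, not BBOT's signature.

```python
import ipaddress


def netloc(host, port=None):
    # Hypothetical stand-in for make_netloc. IPv6 literals are wrapped in
    # brackets so the colons in the address are not confused with the port separator.
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f'[{host}]'
    except ValueError:
        pass  # not an IP address, treat it as a hostname
    return host if port is None else f'{host}:{port}'


v6_with_port = netloc('dead::beef', 443)
v4_no_port = netloc('192.168.1.1')
name_with_port = netloc('evilcorp.com', 80)
```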
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_table","title":"make_table","text":"make_table(rows, header, **kwargs)\n
Generate a formatted table from the given rows and headers.
This function uses the tabulate
package to generate a table with formatting options. It can accept various input formats and table styles, which can be customized using optional arguments.
Parameters:
rows, header
\u2013 The table rows and column headers, passed positionally to tabulate.tabulate
.
**kwargs
\u2013 Keyword arguments to customize table formatting:
- tablefmt (str, optional): Table format. Default is 'grid'.
- disable_numparse (bool, optional): Disable automatic number parsing. Default is True.
- maxcolwidths (int, optional): Maximum column width. Default is 40.
Returns:
str
\u2013 A string representing the formatted table.
Examples:
>>> print(make_table([[\"row1\", \"row1\"], [\"row2\", \"row2\"]], [\"header1\", \"header2\"]))\n+-----------+-----------+\n| header1 | header2 |\n+===========+===========+\n| row1 | row1 |\n+-----------+-----------+\n| row2 | row2 |\n+-----------+-----------+\n
Source code in bbot/core/helpers/misc.py
def make_table(rows, header, **kwargs):\n \"\"\"Generate a formatted table from the given rows and headers.\n\n This function uses the `tabulate` package to generate a table with formatting options.\n It can accept various input formats and table styles, which can be customized using optional arguments.\n\n Args:\n *args: Positional arguments to be passed to `tabulate.tabulate`.\n **kwargs: Keyword arguments to customize table formatting.\n - tablefmt (str, optional): Table format. Default is 'grid'.\n - disable_numparse (bool, optional): Disable automatic number parsing. Default is True.\n - maxcolwidths (int, optional): Maximum column width. Default is 40.\n\n Returns:\n str: A string representing the formatted table.\n\n Examples:\n >>> print(make_table([[\"row1\", \"row1\"], [\"row2\", \"row2\"]], [\"header1\", \"header2\"]))\n +-----------+-----------+\n | header1 | header2 |\n +===========+===========+\n | row1 | row1 |\n +-----------+-----------+\n | row2 | row2 |\n +-----------+-----------+\n \"\"\"\n from tabulate import tabulate\n\n # fix IndexError: list index out of range\n if not rows:\n rows = [[]]\n tablefmt = os.environ.get(\"BBOT_TABLE_FORMAT\", None)\n defaults = {\"tablefmt\": \"grid\", \"disable_numparse\": True, \"maxcolwidths\": None}\n if tablefmt is None:\n defaults.update({\"maxcolwidths\": 40})\n else:\n defaults.update({\"tablefmt\": tablefmt})\n for k, v in defaults.items():\n if k not in kwargs:\n kwargs[k] = v\n # don't wrap columns in markdown\n if tablefmt in (\"github\", \"markdown\"):\n kwargs.pop(\"maxcolwidths\")\n # escape problematic markdown characters in rows\n\n def markdown_escape(s):\n return str(s).replace(\"|\", \"&#124;\")\n\n rows = [[markdown_escape(f) for f in row] for row in rows]\n header = [markdown_escape(h) for h in header]\n return tabulate(rows, header, **kwargs)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.memory_status","title":"memory_status","text":"memory_status()\n
Return statistics on system memory consumption.
The function returns a psutil
named tuple that contains statistics on system virtual memory usage, such as total memory, used memory, available memory, and more.
Returns:
psutil._pslinux.svmem: A named tuple representing various statistics about system virtual memory usage.
Examples:
>>> mem = memory_status()\n>>> mem.available\n13195399168\n
>>> mem = memory_status()\n>>> mem.percent\n79.0\n
Source code in bbot/core/helpers/misc.py
def memory_status():\n \"\"\"Return statistics on system memory consumption.\n\n The function returns a `psutil` named tuple that contains statistics on\n system virtual memory usage, such as total memory, used memory, available\n memory, and more.\n\n Returns:\n psutil._pslinux.svmem: A named tuple representing various statistics\n about system virtual memory usage.\n\n Examples:\n >>> mem = memory_status()\n >>> mem.available\n 13195399168\n\n >>> mem = memory_status()\n >>> mem.percent\n 79.0\n \"\"\"\n import psutil\n\n return psutil.virtual_memory()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.mkdir","title":"mkdir","text":"mkdir(path, check_writable=True, raise_error=True)\n
Creates a directory and optionally checks if it's writable.
Parameters:
path
(str or Path
) \u2013 The directory to create.
check_writable
(bool
, default: True
) \u2013 Whether to check if the directory is writable. Default is True.
raise_error
(bool
, default: True
) \u2013 Whether to raise an error if the directory creation fails. Default is True.
Returns:
bool
\u2013 True if the directory is successfully created (and writable, if check_writable=True); otherwise False.
Raises:
DirectoryCreationError
\u2013 Raised if the directory cannot be created and raise_error=True
.
Examples:
>>> mkdir(\"/tmp/new_dir\")\nTrue\n>>> mkdir(\"/restricted_dir\", check_writable=False, raise_error=False)\nFalse\n
Source code in bbot/core/helpers/misc.py
def mkdir(path, check_writable=True, raise_error=True):\n \"\"\"\n Creates a directory and optionally checks if it's writable.\n\n Args:\n path (str or Path): The directory to create.\n check_writable (bool, optional): Whether to check if the directory is writable. Default is True.\n raise_error (bool, optional): Whether to raise an error if the directory creation fails. Default is True.\n\n Returns:\n bool: True if the directory is successfully created (and writable, if check_writable=True); otherwise False.\n\n Raises:\n DirectoryCreationError: Raised if the directory cannot be created and `raise_error=True`.\n\n Examples:\n >>> mkdir(\"/tmp/new_dir\")\n True\n >>> mkdir(\"/restricted_dir\", check_writable=False, raise_error=False)\n False\n \"\"\"\n path = Path(path).resolve()\n touchfile = path / f\".{rand_string()}\"\n try:\n path.mkdir(exist_ok=True, parents=True)\n if check_writable:\n touchfile.touch()\n return True\n except Exception as e:\n if raise_error:\n raise errors.DirectoryCreationError(f\"Failed to create directory at {path}: {e}\")\n finally:\n with suppress(Exception):\n touchfile.unlink()\n return False\n
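The touch-file writability check can be sketched independently of BBOT. The helper name `ensure_writable_dir` and the use of `uuid` for the probe filename are assumptions made for this example; the original uses `rand_string()` and raises `DirectoryCreationError` on failure.

```python
import tempfile
import uuid
from pathlib import Path


def ensure_writable_dir(path):
    # Hypothetical helper mirroring the touch-file idea: create the directory,
    # then prove it is writable by creating and removing a throwaway file.
    path = Path(path).resolve()
    probe = path / f'.{uuid.uuid4().hex}'
    try:
        path.mkdir(parents=True, exist_ok=True)
        probe.touch()
        return True
    except OSError:
        return False
    finally:
        # Runs on success and failure alike, so no probe file is left behind.
        probe.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'a' / 'b'
    created = ensure_writable_dir(target)
    leftover = list(target.iterdir())
```

Creating a real file is a more direct test than inspecting permission bits, since it also catches read-only mounts and similar conditions.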
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.os_platform","title":"os_platform","text":"os_platform()\n
Return the OS platform of the current system.
This function fetches and returns the OS type where the code is being executed. It converts the platform identifier to lowercase.
Returns:
str
\u2013 A string representing the OS platform, such as \"linux\", \"darwin\", or \"windows\".
Examples:
>>> os_platform()\n'linux'\n
Source code in bbot/core/helpers/misc.py
def os_platform():\n \"\"\"Return the OS platform of the current system.\n\n This function fetches and returns the OS type where the code is being executed.\n It converts the platform identifier to lowercase.\n\n Returns:\n str: A string representing the OS platform, such as \"linux\", \"darwin\", or \"windows\".\n\n Examples:\n >>> os_platform()\n 'linux'\n \"\"\"\n import platform\n\n return platform.system().lower()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.os_platform_friendly","title":"os_platform_friendly","text":"os_platform_friendly()\n
Return a human-friendly OS platform string, suitable for golang release binaries.
This function fetches the OS platform and modifies it to a more human-readable format if necessary. Specifically, it changes \"darwin\" to \"macOS\".
Returns:
str
\u2013 A string representing the human-friendly OS platform, such as \"macOS\", \"linux\", or \"windows\".
Examples:
>>> os_platform_friendly()\n'macOS'\n
Source code in bbot/core/helpers/misc.py
def os_platform_friendly():\n \"\"\"Return a human-friendly OS platform string, suitable for golang release binaries.\n\n This function fetches the OS platform and modifies it to a more human-readable format if necessary.\n Specifically, it changes \"darwin\" to \"macOS\".\n\n Returns:\n str: A string representing the human-friendly OS platform, such as \"macOS\", \"linux\", or \"windows\".\n\n Examples:\n >>> os_platform_friendly()\n 'macOS'\n \"\"\"\n p = os_platform()\n if p == \"darwin\":\n return \"macOS\"\n return p\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.parent_domain","title":"parent_domain","text":"parent_domain(d)\n
Retrieve the parent domain of a given subdomain string.
This function takes an input string d
representing a subdomain and returns its parent domain. If the input does not represent a subdomain, it returns the input as is.
Parameters:
d
(str
) \u2013 The input string representing a subdomain or domain.
Returns:
str
\u2013 The parent domain of the subdomain, or the original input if it is not a subdomain.
Examples:
>>> parent_domain(\"www.internal.evilcorp.co.uk\")\n\"internal.evilcorp.co.uk\"\n
>>> parent_domain(\"www.internal.evilcorp.co.uk:8080\")\n\"internal.evilcorp.co.uk:8080\"\n
>>> parent_domain(\"www.evilcorp.co.uk\")\n\"evilcorp.co.uk\"\n
>>> parent_domain(\"evilcorp.co.uk\")\n\"evilcorp.co.uk\"\n
Notes: Port, if present in input, is preserved in the output.
Source code in bbot/core/helpers/misc.py
def parent_domain(d):\n \"\"\"\n Retrieve the parent domain of a given subdomain string.\n\n This function takes an input string `d` representing a subdomain and returns its parent domain.\n If the input does not represent a subdomain, it returns the input as is.\n\n Args:\n d (str): The input string representing a subdomain or domain.\n\n Returns:\n str: The parent domain of the subdomain, or the original input if it is not a subdomain.\n\n Examples:\n >>> parent_domain(\"www.internal.evilcorp.co.uk\")\n \"internal.evilcorp.co.uk\"\n\n >>> parent_domain(\"www.internal.evilcorp.co.uk:8080\")\n \"internal.evilcorp.co.uk:8080\"\n\n >>> parent_domain(\"www.evilcorp.co.uk\")\n \"evilcorp.co.uk\"\n\n >>> parent_domain(\"evilcorp.co.uk\")\n \"evilcorp.co.uk\"\n\n Notes:\n - Port, if present in input, is preserved in the output.\n \"\"\"\n host, port = split_host_port(d)\n if is_subdomain(d):\n return make_netloc(\".\".join(str(host).split(\".\")[1:]), port)\n return d\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.parent_url","title":"parent_url","text":"parent_url(u)\n
Retrieve the parent URL of a given URL.
This function takes an input string u
representing a URL and returns its parent URL. If the input URL does not have a parent (i.e., it's already the top-level), it returns None.
Parameters:
u
(str
) \u2013 The input string representing a URL.
Returns:
Union[str, None]: The parent URL of the input URL, or None if it has no parent.
Examples:
>>> parent_url(\"https://evilcorp.com/sub/path/\")\n\"https://evilcorp.com/sub/\"\n
>>> parent_url(\"https://evilcorp.com/\")\nNone\n
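Notes: Only the path component of the URL is modified. All other components like scheme, netloc, query, and fragment are preserved.
Source code in bbot/core/helpers/misc.py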
def parent_url(u):\n \"\"\"\n Retrieve the parent URL of a given URL.\n\n This function takes an input string `u` representing a URL and returns its parent URL.\n If the input URL does not have a parent (i.e., it's already the top-level), it returns None.\n\n Args:\n u (str): The input string representing a URL.\n\n Returns:\n Union[str, None]: The parent URL of the input URL, or None if it has no parent.\n\n Examples:\n >>> parent_url(\"https://evilcorp.com/sub/path/\")\n \"https://evilcorp.com/sub/\"\n\n >>> parent_url(\"https://evilcorp.com/\")\n None\n\n Notes:\n - Only the path component of the URL is modified.\n - All other components like scheme, netloc, query, and fragment are preserved.\n \"\"\"\n parsed = urlparse(u)\n path = Path(parsed.path)\n if path.parent == path:\n return None\n else:\n return urlunparse(parsed._replace(path=str(path.parent)))\n
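A standalone sketch of the path-only rewrite, using `urllib.parse` and `pathlib`. The name `url_parent` is a hypothetical stand-in. One caveat of this sketch: `pathlib` normalizes away trailing slashes, so the parent path it produces has none.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse, urlunparse


def url_parent(url):
    # Hypothetical stand-in for parent_url: only the path component changes.
    parsed = urlparse(url)
    path = PurePosixPath(parsed.path)
    if path.parent == path:
        return None  # already at the root
    return urlunparse(parsed._replace(path=str(path.parent)))


nested = url_parent('https://evilcorp.com/sub/path/?q=1')
root = url_parent('https://evilcorp.com/')

print(nested)  # https://evilcorp.com/sub?q=1
print(root)    # None
```

The query string survives because only the `path` field of the parsed result is replaced.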
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.parse_port_string","title":"parse_port_string","text":"parse_port_string(port_string)\n
Parses a string containing ports and port ranges into a list of individual ports.
Parameters:
port_string
(str
) \u2013 The string containing individual ports and port ranges separated by commas.
Returns:
list
\u2013 A list of individual ports parsed from the input string.
Raises:
ValueError
\u2013 If the input string contains invalid ports or port ranges.
Examples:
>>> parse_port_string(\"22,80,1000-1002\")\n[22, 80, 1000, 1001, 1002]\n
>>> parse_port_string(\"1-2,3-5\")\n[1, 2, 3, 4, 5]\n
>>> parse_port_string(\"invalid\")\nValueError: Invalid port or port range: invalid\n
Source code in bbot/core/helpers/misc.py
def parse_port_string(port_string):\n \"\"\"\n Parses a string containing ports and port ranges into a list of individual ports.\n\n Args:\n port_string (str): The string containing individual ports and port ranges separated by commas.\n\n Returns:\n list: A list of individual ports parsed from the input string.\n\n Raises:\n ValueError: If the input string contains invalid ports or port ranges.\n\n Examples:\n >>> parse_port_string(\"22,80,1000-1002\")\n [22, 80, 1000, 1001, 1002]\n\n >>> parse_port_string(\"1-2,3-5\")\n [1, 2, 3, 4, 5]\n\n >>> parse_port_string(\"invalid\")\n ValueError: Invalid port or port range: invalid\n \"\"\"\n elements = str(port_string).split(\",\")\n ports = []\n\n for element in elements:\n if element.isdigit():\n port = int(element)\n if 1 <= port <= 65535:\n ports.append(port)\n else:\n raise ValueError(f\"Invalid port: {element}\")\n elif \"-\" in element:\n range_parts = element.split(\"-\")\n if len(range_parts) != 2 or not all(part.isdigit() for part in range_parts):\n raise ValueError(f\"Invalid port or port range: {element}\")\n start, end = map(int, range_parts)\n if not (1 <= start < end <= 65535):\n raise ValueError(f\"Invalid port range: {element}\")\n ports.extend(range(start, end + 1))\n else:\n raise ValueError(f\"Invalid port or port range: {element}\")\n\n return ports\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.rand_string","title":"rand_string","text":"rand_string(length=10, digits=True)\n
Generates a random string of specified length.
Parameters:
length
(int
, default: 10
) \u2013 The length of the random string. Defaults to 10.
digits
(bool
, default: True
) \u2013 Whether to include digits in the string. Defaults to True.
Returns:
str
\u2013 A random string of the specified length.
Examples:
>>> rand_string()\n'c4hp4i9jzx'\n>>> rand_string(20)\n'ap4rsdtg5iw7ey7y3oa5'\n>>> rand_string(30, digits=False)\n'xdmyxtglqfzqktngkesyulwbfrihva'\n
Source code in bbot/core/helpers/misc.py
def rand_string(length=10, digits=True):\n \"\"\"\n Generates a random string of specified length.\n\n Args:\n length (int, optional): The length of the random string. Defaults to 10.\n digits (bool, optional): Whether to include digits in the string. Defaults to True.\n\n Returns:\n str: A random string of the specified length.\n\n Examples:\n >>> rand_string()\n 'c4hp4i9jzx'\n >>> rand_string(20)\n 'ap4rsdtg5iw7ey7y3oa5'\n >>> rand_string(30, digits=False)\n 'xdmyxtglqfzqktngkesyulwbfrihva'\n \"\"\"\n pool = rand_pool\n if digits:\n pool = rand_pool_digits\n return \"\".join([random.choice(pool) for _ in range(int(length))])\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.read_file","title":"read_file","text":"read_file(filename)\n
Reads a file line by line and yields each line without line breaks.
Parameters:
filename
(str or Path
) \u2013 The path to the file to read.
Yields:
str
\u2013 A line from the file without the trailing line break.
Examples:
>>> for line in read_file(\"/tmp/file.txt\"):\n... print(line)\nfile_line1\nfile_line2\nfile_line3\n
Source code in bbot/core/helpers/misc.py
def read_file(filename):\n \"\"\"Reads a file line by line and yields each line without line breaks.\n\n Args:\n filename (str or Path): The path to the file to read.\n\n Yields:\n str: A line from the file without the trailing line break.\n\n Examples:\n >>> for line in read_file(\"/tmp/file.txt\"):\n ... print(line)\n file_line1\n file_line2\n file_line3\n \"\"\"\n with open(filename, errors=\"ignore\") as f:\n for line in f:\n yield line.rstrip(\"\\r\\n\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.recursive_decode","title":"recursive_decode","text":"recursive_decode(data, max_depth=5)\n
Recursively decodes doubly or triply-encoded strings to their original form.
Supports both URL-encoding and backslash-escapes (including unicode)
Parameters:
data
(str
) \u2013 The data to decode.
max_depth
(int
, default: 5
) \u2013 Maximum recursion depth for decoding. Defaults to 5.
Returns:
str
\u2013 The decoded string.
Examples:
>>> recursive_decode(\"Hello%20world%21\")\n\"Hello world!\"\n>>> recursive_decode(\"Hello%20%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442\")\n\"Hello \u041f\u0440\u0438\u0432\u0435\u0442\"\n>>> recursive_decode(\"%5Cu0020%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442%5Cu0021\")\n\" \u041f\u0440\u0438\u0432\u0435\u0442!\"\n
Source code in bbot/core/helpers/misc.py
def recursive_decode(data, max_depth=5):\n \"\"\"\n Recursively decodes doubly or triply-encoded strings to their original form.\n\n Supports both URL-encoding and backslash-escapes (including unicode)\n\n Args:\n data (str): The data to decode.\n max_depth (int, optional): Maximum recursion depth for decoding. Defaults to 5.\n\n Returns:\n str: The decoded string.\n\n Examples:\n >>> recursive_decode(\"Hello%20world%21\")\n \"Hello world!\"\n >>> recursive_decode(\"Hello%20%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442\")\n \"Hello \u041f\u0440\u0438\u0432\u0435\u0442\"\n >>> recursive_decode(\"%5Cu0020%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442%5Cu0021\")\n \" \u041f\u0440\u0438\u0432\u0435\u0442!\"\n \"\"\"\n import codecs\n\n # Decode newline and tab escapes\n data = backslash_regex.sub(\n lambda match: {\"n\": \"\\n\", \"t\": \"\\t\", \"r\": \"\\r\", \"b\": \"\\b\", \"v\": \"\\v\"}.get(match.group(\"char\")), data\n )\n data = smart_decode(data)\n if max_depth == 0:\n return data\n # Decode URL encoding\n data = unquote(data, errors=\"ignore\")\n # Decode Unicode escapes\n with suppress(UnicodeEncodeError):\n data = ensure_utf8_compliant(codecs.decode(data, \"unicode_escape\", errors=\"ignore\"))\n # Check if there's still URL-encoded or Unicode-escaped content\n if encoded_regex.search(data):\n # If yes, continue decoding\n return recursive_decode(data, max_depth=max_depth - 1)\n return data\n
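The layered-decoding idea can be illustrated with a deliberately simplified, iterative sketch. It handles URL-encoding only and omits the backslash and unicode escape handling of the real function; `decode_layers` is a hypothetical name.

```python
from urllib.parse import unquote


def decode_layers(data, max_depth=5):
    # Simplified illustration: only URL-encoding is handled here, whereas
    # recursive_decode also processes backslash and unicode escapes.
    for _ in range(max_depth):
        decoded = unquote(data)
        if decoded == data:
            break  # nothing left to decode
        data = decoded
    return data


single = decode_layers('Hello%20world%21')
double = decode_layers('Hello%2520world%2521')
capped = decode_layers('Hello%2520world', max_depth=1)

print(single)  # Hello world!
print(double)  # Hello world!
print(capped)  # Hello%20world
```

The depth cap bounds the work done on adversarial input that is encoded many times over.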
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.rm_at_exit","title":"rm_at_exit","text":"rm_at_exit(path)\n
Registers a file to be automatically deleted when the program exits.
Parameters:
path
(str or Path
) \u2013 The path to the file to be deleted upon program exit.
Examples:
>>> rm_at_exit(\"/tmp/test/file1.txt\")\n
Source code in bbot/core/helpers/misc.py
def rm_at_exit(path):\n \"\"\"Registers a file to be automatically deleted when the program exits.\n\n Args:\n path (str or Path): The path to the file to be deleted upon program exit.\n\n Examples:\n >>> rm_at_exit(\"/tmp/test/file1.txt\")\n \"\"\"\n import atexit\n\n atexit.register(delete_file, path)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.rm_rf","title":"rm_rf","text":"rm_rf(f)\n
Recursively delete a directory
Parameters:
f
(str or Path
) \u2013 The directory path to delete.
Examples:
>>> rm_rf(\"/tmp/httpx98323849\")\n
Source code in bbot/core/helpers/misc.py
def rm_rf(f):\n \"\"\"Recursively delete a directory\n\n Args:\n f (str or Path): The directory path to delete.\n\n Examples:\n >>> rm_rf(\"/tmp/httpx98323849\")\n \"\"\"\n import shutil\n\n shutil.rmtree(f)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.search_dict_by_key","title":"search_dict_by_key","text":"search_dict_by_key(key, d)\n
Search a nested dictionary or list of dictionaries by a key and yield all matching values.
Parameters:
key
(str
) \u2013 The key to search for.
d
(Union[dict, list]
) \u2013 The dictionary or list of dictionaries to search.
Yields:
Any
\u2013 Yields all values that match the provided key.
Examples:
>>> d = {'a': 1, 'b': {'c': 2, 'a': 3}, 'd': [{'a': 4}, {'e': 5}]}\n>>> list(search_dict_by_key('a', d))\n[1, 3, 4]\n
Source code in bbot/core/helpers/misc.py
def search_dict_by_key(key, d):\n \"\"\"Search a nested dictionary or list of dictionaries by a key and yield all matching values.\n\n Args:\n key (str): The key to search for.\n d (Union[dict, list]): The dictionary or list of dictionaries to search.\n\n Yields:\n Any: Yields all values that match the provided key.\n\n Examples:\n >>> d = {'a': 1, 'b': {'c': 2, 'a': 3}, 'd': [{'a': 4}, {'e': 5}]}\n >>> list(search_dict_by_key('a', d))\n [1, 3, 4]\n \"\"\"\n if isinstance(d, dict):\n if key in d:\n yield d[key]\n for k, v in d.items():\n yield from search_dict_by_key(key, v)\n elif isinstance(d, list):\n for v in d:\n yield from search_dict_by_key(key, v)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.search_dict_values","title":"search_dict_values","text":"search_dict_values(d, *regexes)\n
Recursively search a dictionary's values based on provided regex patterns.
Parameters:
d
(Union[dict, list, str]
) \u2013 The dictionary, list, or string to search.
*regexes
\u2013 Arbitrary number of compiled regex patterns.
Returns:
Generator
\u2013 Yields matching values based on the provided regex patterns.
Examples:
>>> dict_to_search = {\n... \"key1\": {\n... \"key2\": [\n... {\n... \"key3\": \"A URL: https://www.evilcorp.com\"\n... }\n... ]\n... }\n... }\n>>> url_regexes = re.compile(r'https?://[^\\s<>\"]+|www\\.[^\\s<>\"]+')\n>>> list(search_dict_values(dict_to_search, url_regexes))\n[\"https://www.evilcorp.com\"]\n
Source code in bbot/core/helpers/misc.py
def search_dict_values(d, *regexes):\n \"\"\"Recursively search a dictionary's values based on provided regex patterns.\n\n Args:\n d (Union[dict, list, str]): The dictionary, list, or string to search.\n *regexes: Arbitrary number of compiled regex patterns.\n\n Returns:\n Generator: Yields matching values based on the provided regex patterns.\n\n Examples:\n >>> dict_to_search = {\n ... \"key1\": {\n ... \"key2\": [\n ... {\n ... \"key3\": \"A URL: https://www.evilcorp.com\"\n ... }\n ... ]\n ... }\n ... }\n >>> url_regexes = re.compile(r'https?://[^\\\\s<>\"]+|www\\\\.[^\\\\s<>\"]+')\n >>> list(search_dict_values(dict_to_search, url_regexes))\n [\"https://www.evilcorp.com\"]\n \"\"\"\n\n results = set()\n if isinstance(d, str):\n for r in regexes:\n for match in r.finditer(d):\n result = match.group()\n h = hash(result)\n if h not in results:\n results.add(h)\n yield result\n elif isinstance(d, dict):\n for _, v in d.items():\n yield from search_dict_values(v, *regexes)\n elif isinstance(d, list):\n for v in d:\n yield from search_dict_values(v, *regexes)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.search_format_dict","title":"search_format_dict","text":"search_format_dict(d, **kwargs)\n
Recursively format string values in a dictionary or list using the provided keyword arguments.
Parameters:
d
(Union[dict, list, str]
) \u2013 The dictionary, list, or string to format.
**kwargs
\u2013 Arbitrary keyword arguments used for string formatting.
Returns:
Union[dict, list, str]: The formatted dictionary, list, or string.
Examples:
>>> search_format_dict({\"test\": \"#{name} is awesome\"}, name=\"keanu\")\n{\"test\": \"keanu is awesome\"}\n
Source code in bbot/core/helpers/misc.py
def search_format_dict(d, **kwargs):\n \"\"\"Recursively format string values in a dictionary or list using the provided keyword arguments.\n\n Args:\n d (Union[dict, list, str]): The dictionary, list, or string to format.\n **kwargs: Arbitrary keyword arguments used for string formatting.\n\n Returns:\n Union[dict, list, str]: The formatted dictionary, list, or string.\n\n Examples:\n >>> search_format_dict({\"test\": \"#{name} is awesome\"}, name=\"keanu\")\n {\"test\": \"keanu is awesome\"}\n \"\"\"\n if isinstance(d, dict):\n return {k: search_format_dict(v, **kwargs) for k, v in d.items()}\n elif isinstance(d, list):\n return [search_format_dict(v, **kwargs) for v in d]\n elif isinstance(d, str):\n for find, replace in kwargs.items():\n find = \"#{\" + str(find) + \"}\"\n d = d.replace(find, replace)\n return d\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.sha1","title":"sha1","text":"sha1(data)\n
Computes the SHA-1 hash of the given data.
Parameters:
data
(str or dict
) \u2013 The data to hash. If a dictionary, it is first converted to a JSON string with sorted keys.
Returns:
hashlib.Hash: SHA-1 hash object of the input data.
Examples:
>>> sha1(\"asdf\").hexdigest()\n'3da541559918a808c2402bba5012f6c60b27661c'\n
Source code in bbot/core/helpers/misc.py
def sha1(data):\n \"\"\"\n Computes the SHA-1 hash of the given data.\n\n Args:\n data (str or dict): The data to hash. If a dictionary, it is first converted to a JSON string with sorted keys.\n\n Returns:\n hashlib.Hash: SHA-1 hash object of the input data.\n\n Examples:\n >>> sha1(\"asdf\").hexdigest()\n '3da541559918a808c2402bba5012f6c60b27661c'\n \"\"\"\n from hashlib import sha1 as hashlib_sha1\n\n if isinstance(data, dict):\n data = json.dumps(data, sort_keys=True)\n return hashlib_sha1(smart_encode(data))\n
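Sorting the keys before serializing is what makes the hash of a dictionary independent of insertion order. The sketch below shows this using hashlib and json directly; the digest helper and the sample dictionaries are hypothetical and stand in for the BBOT helper.

```python
import json
from hashlib import sha1

a = {'host': 'evilcorp.com', 'port': 443}
b = {'port': 443, 'host': 'evilcorp.com'}  # same content, different insertion order


def digest(d):
    # Serialize with sorted keys so key order cannot affect the hash.
    return sha1(json.dumps(d, sort_keys=True).encode('utf-8')).hexdigest()


same_hash = digest(a) == digest(b)
# Without sort_keys, the two serializations differ.
same_json = json.dumps(a) == json.dumps(b)
```

With sorted keys, same_hash is True even though the unsorted JSON strings differ.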
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_decode","title":"smart_decode","text":"smart_decode(data)\n
Decodes the input data to a UTF-8 string, silently ignoring errors.
Parameters:
data
(str or bytes
) \u2013 The data to decode.
Returns:
str
\u2013 The decoded string.
Examples:
>>> smart_decode(b\"asdf\")\n\"asdf\"\n>>> smart_decode(\"asdf\")\n\"asdf\"\n
Source code in bbot/core/helpers/misc.py
def smart_decode(data):\n \"\"\"\n Decodes the input data to a UTF-8 string, silently ignoring errors.\n\n Args:\n data (str or bytes): The data to decode.\n\n Returns:\n str: The decoded string.\n\n Examples:\n >>> smart_decode(b\"asdf\")\n \"asdf\"\n >>> smart_decode(\"asdf\")\n \"asdf\"\n \"\"\"\n if isinstance(data, bytes):\n return data.decode(\"utf-8\", errors=\"ignore\")\n else:\n return str(data)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_decode_punycode","title":"smart_decode_punycode","text":"smart_decode_punycode(text: str) -> str\n
xn--eckwd4c7c.xn--zckzah --> \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8
Source code in bbot/core/helpers/misc.py
def smart_decode_punycode(text: str) -> str:\n \"\"\"\n xn--eckwd4c7c.xn--zckzah --> \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8\n \"\"\"\n import idna\n\n host, before, after = extract_host(text)\n if host is None:\n return text\n\n try:\n host = idna.decode(host)\n except UnicodeError:\n pass # If decoding fails, leave the host as it is\n\n return f\"{before}{host}{after}\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_encode","title":"smart_encode","text":"smart_encode(data)\n
Encodes the input data to bytes using UTF-8 encoding, silently ignoring errors.
Parameters:
data
(str or bytes
) \u2013 The data to encode.
Returns:
bytes
\u2013 The encoded bytes.
Examples:
>>> smart_encode(\"asdf\")\nb\"asdf\"\n>>> smart_encode(b\"asdf\")\nb\"asdf\"\n
Source code in bbot/core/helpers/misc.py
def smart_encode(data):\n \"\"\"\n Encodes the input data to bytes using UTF-8 encoding, silently ignoring errors.\n\n Args:\n data (str or bytes): The data to encode.\n\n Returns:\n bytes: The encoded bytes.\n\n Examples:\n >>> smart_encode(\"asdf\")\n b\"asdf\"\n >>> smart_encode(b\"asdf\")\n b\"asdf\"\n \"\"\"\n if isinstance(data, bytes):\n return data\n return str(data).encode(\"utf-8\", errors=\"ignore\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_encode_punycode","title":"smart_encode_punycode","text":"smart_encode_punycode(text: str) -> str\n
\u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8 --> xn--eckwd4c7c.xn--zckzah
Source code in bbot/core/helpers/misc.py
def smart_encode_punycode(text: str) -> str:\n \"\"\"\n \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8 --> xn--eckwd4c7c.xn--zckzah\n \"\"\"\n import idna\n\n host, before, after = extract_host(text)\n if host is None:\n return text\n\n try:\n host = idna.encode(host).decode(errors=\"ignore\")\n except UnicodeError:\n pass # If encoding fails, leave the host as it is\n\n return f\"{before}{host}{after}\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.split_domain","title":"split_domain","text":"split_domain(hostname)\n
Splits the hostname into its subdomain and registered domain components.
Parameters:
hostname
(str
) \u2013 The full hostname to be split.
Returns:
tuple
\u2013 A tuple containing the subdomain and registered domain.
Examples:
>>> split_domain(\"www.internal.evilcorp.co.uk\")\n(\"www.internal\", \"evilcorp.co.uk\")\n
Notes:
Utilizes the tldextract function to first break down the hostname.
Source code in bbot/core/helpers/misc.py
def split_domain(hostname):\n \"\"\"\n Splits the hostname into its subdomain and registered domain components.\n\n Args:\n hostname (str): The full hostname to be split.\n\n Returns:\n tuple: A tuple containing the subdomain and registered domain.\n\n Examples:\n >>> split_domain(\"www.internal.evilcorp.co.uk\")\n (\"www.internal\", \"evilcorp.co.uk\")\n\n Notes:\n - Utilizes the `tldextract` function to first break down the hostname.\n \"\"\"\n if is_ip(hostname):\n return (\"\", hostname)\n parsed = tldextract(hostname)\n subdomain = parsed.subdomain\n domain = parsed.registered_domain\n if not domain:\n split = hostname.split(\".\")\n subdomain = \".\".join(split[:-2])\n domain = \".\".join(split[-2:])\n return (subdomain, domain)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.split_host_port","title":"split_host_port","text":"split_host_port(d)\n
Parse a string containing a host and port into a tuple.
This function takes an input string d
and returns a tuple containing the host and port. The host is converted to its appropriate IP address type if possible. The port is inferred based on the scheme if not provided.
Parameters:
d
(str
) \u2013 The input string containing the host and possibly the port.
Returns:
Tuple[Union[IPv4Address, IPv6Address, str], Optional[int]]: Tuple containing the host and port.
Examples:
>>> split_host_port(\"evilcorp.com:443\")\n(\"evilcorp.com\", 443)\n
>>> split_host_port(\"192.168.1.1:443\")\n(IPv4Address('192.168.1.1'), 443)\n
>>> split_host_port(\"[dead::beef]:443\")\n(IPv6Address('dead::beef'), 443)\n
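Notes:
If port is not provided, it is inferred based on the scheme: for https and wss, port 443 is used; for http and ws, port 80 is used.
Source code in bbot/core/helpers/misc.py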
def split_host_port(d):\n \"\"\"\n Parse a string containing a host and port into a tuple.\n\n This function takes an input string `d` and returns a tuple containing the host and port.\n The host is converted to its appropriate IP address type if possible. The port is inferred\n based on the scheme if not provided.\n\n Args:\n d (str): The input string containing the host and possibly the port.\n\n Returns:\n Tuple[Union[IPv4Address, IPv6Address, str], Optional[int]]: Tuple containing the host and port.\n\n Examples:\n >>> split_host_port(\"evilcorp.com:443\")\n (\"evilcorp.com\", 443)\n\n >>> split_host_port(\"192.168.1.1:443\")\n (IPv4Address('192.168.1.1'), 443)\n\n >>> split_host_port(\"[dead::beef]:443\")\n (IPv6Address('dead::beef'), 443)\n\n Notes:\n - If port is not provided, it is inferred based on the scheme:\n - For \"https\" and \"wss\", port 443 is used.\n - For \"http\" and \"ws\", port 80 is used.\n \"\"\"\n d = str(d)\n host = None\n port = None\n scheme = None\n if is_ip(d):\n return make_ip_type(d), port\n\n match = bbot_regexes.split_host_port_regex.match(d)\n if match is None:\n raise ValueError(f'split_port() failed to parse \"{d}\"')\n scheme = match.group(\"scheme\")\n netloc = match.group(\"netloc\")\n if netloc is None:\n raise ValueError(f'split_port() failed to parse \"{d}\"')\n\n match = bbot_regexes.extract_open_port_regex.match(netloc)\n if match is None:\n raise ValueError(f'split_port() failed to parse netloc \"{netloc}\"')\n\n host = match.group(2)\n if host is None:\n host = match.group(1)\n if host is None:\n raise ValueError(f'split_port() failed to locate host in netloc \"{netloc}\"')\n\n port = match.group(3)\n if port is None and scheme is not None:\n scheme = scheme.lower()\n if scheme in (\"https\", \"wss\"):\n port = 443\n elif scheme in (\"http\", \"ws\"):\n port = 80\n elif port is not None:\n with suppress(ValueError):\n port = int(port)\n\n return make_ip_type(host), port\n
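The scheme-based fallback described in the docstring notes can be illustrated with urllib.parse. This is a minimal sketch under stated assumptions, not the BBOT implementation: infer_port and DEFAULT_PORTS are hypothetical names, the input must include a scheme, and the host is returned as a plain string instead of being converted to an IP address type.

```python
from urllib.parse import urlsplit

# Default ports per scheme, following the docstring notes.
DEFAULT_PORTS = {'https': 443, 'wss': 443, 'http': 80, 'ws': 80}


def infer_port(url):
    # Hypothetical helper: return (host, port), using the scheme default
    # when the URL carries no explicit port.
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme.lower())
    return parts.hostname, port


https_default = infer_port('https://evilcorp.com')
explicit = infer_port('http://evilcorp.com:8080/')
unknown = infer_port('ftp://evilcorp.com')
```

An explicit port always wins, and a scheme outside the table leaves the port as None, which mirrors the Optional[int] in the return type.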
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.split_list","title":"split_list","text":"split_list(alist, wanted_parts=2)\n
Splits a list into a specified number of approximately equal parts.
Parameters:
alist
(list
) \u2013 The list to be split.
wanted_parts
(int
, default: 2
) \u2013 The number of parts to split the list into.
Returns:
list
\u2013 A list of lists, each containing a portion of the original list.
Examples:
>>> split_list([1, 2, 3, 4, 5])\n[[1, 2], [3, 4, 5]]\n
Source code in bbot/core/helpers/misc.py
def split_list(alist, wanted_parts=2):\n \"\"\"\n Splits a list into a specified number of approximately equal parts.\n\n Args:\n alist (list): The list to be split.\n wanted_parts (int): The number of parts to split the list into.\n\n Returns:\n list: A list of lists, each containing a portion of the original list.\n\n Examples:\n >>> split_list([1, 2, 3, 4, 5])\n [[1, 2], [3, 4, 5]]\n \"\"\"\n length = len(alist)\n return [alist[i * length // wanted_parts : (i + 1) * length // wanted_parts] for i in range(wanted_parts)]\n
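To see how the integer-division slicing distributes items, the function is repeated below so the example runs on its own, with inputs that do not divide evenly. The sample lists are illustrative only.

```python
def split_list(alist, wanted_parts=2):
    # Same slicing expression as the source above.
    length = len(alist)
    return [alist[i * length // wanted_parts : (i + 1) * length // wanted_parts] for i in range(wanted_parts)]


# Ten items into three parts: the leftover item lands in the last part.
thirds = split_list(list(range(10)), wanted_parts=3)
sizes = [len(part) for part in thirds]

# More parts than items: some parts come back empty.
sparse = split_list([1, 2], wanted_parts=3)
```

The part sizes here are 3, 3, and 4, and the sparse case yields an empty first part, so callers that need non-empty chunks should filter the result.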
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.str_or_file","title":"str_or_file","text":"str_or_file(s)\n
Reads a string or file and yields its content line-by-line.
This function tries to open the given string s
as a file and yields its lines. If it fails to open s
as a file, it treats s
as a regular string and yields it as is.
Parameters:
s
(str
) \u2013 The string or file path to read.
Yields:
str
\u2013 Either lines from the file or the original string.
Examples:
>>> list(str_or_file(\"file.txt\"))\n['file_line1', 'file_line2', 'file_line3']\n>>> list(str_or_file(\"not_a_file\"))\n['not_a_file']\n
Source code in bbot/core/helpers/misc.py
def str_or_file(s):\n \"\"\"Reads a string or file and yields its content line-by-line.\n\n This function tries to open the given string `s` as a file and yields its lines.\n If it fails to open `s` as a file, it treats `s` as a regular string and yields it as is.\n\n Args:\n s (str): The string or file path to read.\n\n Yields:\n str: Either lines from the file or the original string.\n\n Examples:\n >>> list(str_or_file(\"file.txt\"))\n ['file_line1', 'file_line2', 'file_line3']\n >>> list(str_or_file(\"not_a_file\"))\n ['not_a_file']\n \"\"\"\n try:\n with open(s, errors=\"ignore\") as f:\n for line in f:\n yield line.rstrip(\"\\r\\n\")\n except OSError:\n yield s\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.subdomain_depth","title":"subdomain_depth","text":"subdomain_depth(d)\n
Calculate the depth of subdomains within a given domain name.
Parameters:
d
(str
) \u2013 The domain name to analyze.
Returns:
int
\u2013 The depth of the subdomain. For example, a hostname \"5.4.3.2.1.evilcorp.com\"
has a subdomain depth of 5.
Source code in bbot/core/helpers/misc.py
def subdomain_depth(d):\n \"\"\"\n Calculate the depth of subdomains within a given domain name.\n\n Args:\n d (str): The domain name to analyze.\n\n Returns:\n int: The depth of the subdomain. For example, a hostname \"5.4.3.2.1.evilcorp.com\"\n has a subdomain depth of 5.\n \"\"\"\n subdomain, domain = split_domain(d)\n if not subdomain:\n return 0\n return subdomain.count(\".\") + 1\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.swap_status","title":"swap_status","text":"swap_status()\n
Return statistics on swap memory consumption.
The function returns a psutil
named tuple that contains statistics on system swap memory usage, such as total swap, used swap, free swap, and more.
Returns:
psutil._common.sswap: A named tuple representing various statistics about system swap memory usage.
Examples:
>>> swap = swap_status()\n>>> swap.total\n4294967296\n
>>> swap = swap_status()\n>>> swap.used\n2097152\n
Source code in bbot/core/helpers/misc.py
def swap_status():\n \"\"\"Return statistics on swap memory consumption.\n\n The function returns a `psutil` named tuple that contains statistics on\n system swap memory usage, such as total swap, used swap, free swap, and more.\n\n Returns:\n psutil._common.sswap: A named tuple representing various statistics\n about system swap memory usage.\n\n Examples:\n >>> swap = swap_status()\n >>> swap.total\n 4294967296\n\n >>> swap = swap_status()\n >>> swap.used\n 2097152\n \"\"\"\n import psutil\n\n return psutil.swap_memory()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.tagify","title":"tagify","text":"tagify(s, delimiter=None, maxlen=None)\n
Sanitize a string into a tag-friendly format.
Converts a given string to lowercase and replaces all characters not matching [a-z0-9] with hyphens. Optionally truncates the result to 'maxlen' characters.
Parameters:
s
(str
) \u2013 The input string to sanitize.
delimiter
(str
, default: None
) \u2013 The string used to replace disallowed characters. Defaults to a hyphen when None.
maxlen
(int
, default: None
) \u2013 The maximum length for the tag. Defaults to None.
Returns:
str
\u2013 A sanitized, tag-friendly string.
Examples:
>>> tagify(\"HTTP Web Title\")\n'http-web-title'\n>>> tagify(\"HTTP Web Title\", maxlen=8)\n'http-web'\n
Source code in bbot/core/helpers/misc.py
def tagify(s, delimiter=None, maxlen=None):\n \"\"\"Sanitize a string into a tag-friendly format.\n\n Converts a given string to lowercase and replaces all characters not matching\n [a-z0-9] with hyphens. Optionally truncates the result to 'maxlen' characters.\n\n Args:\n s (str): The input string to sanitize.\n maxlen (int, optional): The maximum length for the tag. Defaults to None.\n\n Returns:\n str: A sanitized, tag-friendly string.\n\n Examples:\n >>> tagify(\"HTTP Web Title\")\n 'http-web-title'\n >>> tagify(\"HTTP Web Title\", maxlen=8)\n 'http-web'\n \"\"\"\n if delimiter is None:\n delimiter = \"-\"\n ret = str(s).lower()\n return tag_filter_regex.sub(delimiter, ret)[:maxlen].strip(delimiter)\n
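The order of operations in the return statement matters: substitution happens first, then truncation, then stripping of delimiters left at either end. The sketch below reproduces that order with an assumed pattern, since tag_filter_regex is not shown in this section; TAG_FILTER and tagify_sketch are hypothetical names.

```python
import re

# Assumed stand-in for tag_filter_regex: one or more characters outside a-z0-9.
TAG_FILTER = re.compile('[^a-z0-9]+')


def tagify_sketch(s, delimiter='-', maxlen=None):
    lowered = str(s).lower()
    # Substitute first, then truncate, then strip delimiters at either end.
    return TAG_FILTER.sub(delimiter, lowered)[:maxlen].strip(delimiter)


full = tagify_sketch('HTTP Web Title')
cut = tagify_sketch('HTTP Web Title', maxlen=5)
underscored = tagify_sketch('HTTP Web Title', delimiter='_')
```

Because stripping runs after truncation, a cut that ends on a delimiter (as with maxlen=5) does not leave a trailing hyphen, so the result can be shorter than maxlen.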
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.tldextract","title":"tldextract","text":"tldextract(data)\n
Extracts the subdomain, domain, and suffix from a URL string.
Parameters:
data
(str
) \u2013 The URL string to be processed.
Returns:
ExtractResult
\u2013 A named tuple containing the subdomain, domain, and suffix.
Examples:
>>> tldextract(\"www.evilcorp.co.uk\")\nExtractResult(subdomain='www', domain='evilcorp', suffix='co.uk')\n
Notes:
Utilizes smart_decode to preprocess the data.
Makes use of the tldextract library for extraction.
Source code in bbot/core/helpers/misc.py
def tldextract(data):\n \"\"\"\n Extracts the subdomain, domain, and suffix from a URL string.\n\n Args:\n data (str): The URL string to be processed.\n\n Returns:\n ExtractResult: A named tuple containing the subdomain, domain, and suffix.\n\n Examples:\n >>> tldextract(\"www.evilcorp.co.uk\")\n ExtractResult(subdomain='www', domain='evilcorp', suffix='co.uk')\n\n Notes:\n - Utilizes `smart_decode` to preprocess the data.\n - Makes use of the `tldextract` library for extraction.\n \"\"\"\n import tldextract as _tldextract\n\n return _tldextract.extract(smart_decode(data))\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.top_tcp_ports","title":"top_tcp_ports","text":"top_tcp_ports(n, as_string=False)\n
Returns the top n TCP ports as evaluated by nmap
Source code in bbot/core/helpers/misc.py
def top_tcp_ports(n, as_string=False):\n \"\"\"\n Returns the top *n* TCP ports as evaluated by nmap\n \"\"\"\n top_ports_file = Path(__file__).parent.parent.parent / \"wordlists\" / \"top_open_ports_nmap.txt\"\n\n global top_ports_cache\n if top_ports_cache is None:\n # Read the open ports from the file\n with open(top_ports_file, \"r\") as f:\n top_ports_cache = [int(line.strip()) for line in f]\n\n # If n is greater than the length of the ports list, add remaining ports from range(1, 65536)\n unique_ports = set(top_ports_cache)\n top_ports_cache.extend([port for port in range(1, 65536) if port not in unique_ports])\n\n top_ports = top_ports_cache[:n]\n if as_string:\n return \",\".join([str(s) for s in top_ports])\n return top_ports\n
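The padding step means a request for more ports than the ranked list holds still returns a full result: ranked ports come first, followed by every remaining port in numeric order. The sketch below shows that step in isolation; pad_ports and the three-entry ranking are hypothetical and are not the nmap data.

```python
def pad_ports(ranked_ports, limit=65536):
    # Keep the ranked ports first, then append every remaining port in numeric order.
    seen = set(ranked_ports)
    return list(ranked_ports) + [port for port in range(1, limit) if port not in seen]


ranked = [80, 443, 22]  # hypothetical ranking for illustration
all_ports = pad_ports(ranked)
top_five = all_ports[:5]
```

Slicing the padded list then behaves the same for any n up to 65535, which is why the function can cache a single list and slice it per call.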
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.truncate_filename","title":"truncate_filename","text":"truncate_filename(file_path, max_length=255)\n
Truncate the filename while preserving the file extension to ensure the total path length does not exceed the maximum length.
Parameters:
file_path
(str
) \u2013 The original file path.
max_length
(int
, default: 255
) \u2013 The maximum allowed length for the total path. Default is 255.
Returns:
pathlib.Path: A new Path object with the truncated filename.
Raises:
ValueError
\u2013 If the directory path is too long to accommodate any filename within the limit.
Examples:
>>> truncate_filename('/path/to/example_long_filename.txt', 20)\nPosixPath('/path/to/example.txt')\n
Source code in bbot/core/helpers/misc.py
def truncate_filename(file_path, max_length=255):\n \"\"\"\n Truncate the filename while preserving the file extension to ensure the total path length does not exceed the maximum length.\n\n Args:\n file_path (str): The original file path.\n max_length (int): The maximum allowed length for the total path. Default is 255.\n\n Returns:\n pathlib.Path: A new Path object with the truncated filename.\n\n Raises:\n ValueError: If the directory path is too long to accommodate any filename within the limit.\n\n Example:\n >>> truncate_filename('/path/to/example_long_filename.txt', 20)\n PosixPath('/path/to/example.txt')\n \"\"\"\n p = Path(file_path)\n directory, stem, suffix = p.parent, p.stem, p.suffix\n\n max_filename_length = max_length - len(str(directory)) - len(suffix) - 1 # 1 for the '/' separator\n\n if max_filename_length <= 0:\n raise ValueError(\"The directory path is too long to accommodate any filename within the limit.\")\n\n if len(stem) > max_filename_length:\n truncated_stem = stem[:max_filename_length]\n else:\n truncated_stem = stem\n\n new_path = directory / (truncated_stem + suffix)\n return new_path\n
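The length budget in the docstring example can be worked through step by step. The sketch below repeats the arithmetic with PurePosixPath so that it behaves the same on any platform; the variable names are chosen for illustration.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/path/to/example_long_filename.txt')
max_length = 20

# Budget left for the stem: total minus directory, suffix, and one separator.
budget = max_length - len(str(p.parent)) - len(p.suffix) - 1

# Keep only as much of the stem as the budget allows, then reattach the suffix.
truncated = p.parent / (p.stem[:budget] + p.suffix)
```

The directory takes 8 characters, the suffix 4, and the separator 1, which leaves 7 for the stem and produces a path of exactly 20 characters.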
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.url_parents","title":"url_parents","text":"url_parents(u)\n
Generate a list of parent URLs for a given URL string.
This function takes an input string u
representing a URL and generates a list of its parent URLs in decreasing order of specificity.
Parameters:
u
(str
) \u2013 The input string representing a URL.
Returns:
List[str]: A list of parent URLs of the input URL in decreasing order of specificity.
Examples:
>>> url_parents(\"http://www.evilcorp.co.uk/admin/tools/cmd.php\")\n[\"http://www.evilcorp.co.uk/admin/tools/\", \"http://www.evilcorp.co.uk/admin/\", \"http://www.evilcorp.co.uk/\"]\n
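Notes:
The list is generated by continuously calling parent_url until it returns None.
All components of the URL except for the path are preserved.
Source code in bbot/core/helpers/misc.py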
def url_parents(u):\n \"\"\"\n Generate a list of parent URLs for a given URL string.\n\n This function takes an input string `u` representing a URL and generates a list of its parent URLs in decreasing order of specificity.\n\n Args:\n u (str): The input string representing a URL.\n\n Returns:\n List[str]: A list of parent URLs of the input URL in decreasing order of specificity.\n\n Examples:\n >>> url_parents(\"http://www.evilcorp.co.uk/admin/tools/cmd.php\")\n [\"http://www.evilcorp.co.uk/admin/tools/\", \"http://www.evilcorp.co.uk/admin/\", \"http://www.evilcorp.co.uk/\"]\n\n Notes:\n - The list is generated by continuously calling `parent_url` until it returns None.\n - All components of the URL except for the path are preserved.\n \"\"\"\n parent_list = []\n while 1:\n parent = parent_url(u)\n if parent is None:\n return parent_list\n elif parent not in parent_list:\n parent_list.append(parent)\n u = parent\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.verify_sudo_password","title":"verify_sudo_password","text":"verify_sudo_password(sudo_pass)\n
Verify if the given sudo password is correct.
This function checks whether the sudo password provided is valid for the current user. It runs a command with sudo, feeding in the password via stdin, and checks the return code.
Parameters:
sudo_pass
(str
) \u2013 The sudo password to verify.
Returns:
bool
\u2013 True if the sudo password is correct, False otherwise.
Examples:
>>> verify_sudo_password(\"mysecretpassword\")\nTrue\n
Source code in bbot/core/helpers/misc.py
def verify_sudo_password(sudo_pass):\n \"\"\"Verify if the given sudo password is correct.\n\n This function checks whether the sudo password provided is valid for the current user.\n It runs a command with sudo, feeding in the password via stdin, and checks the return code.\n\n Args:\n sudo_pass (str): The sudo password to verify.\n\n Returns:\n bool: True if the sudo password is correct, False otherwise.\n\n Examples:\n >>> verify_sudo_password(\"mysecretpassword\")\n True\n \"\"\"\n try:\n sp.run(\n [\"sudo\", \"-S\", \"-k\", \"true\"],\n input=smart_encode(sudo_pass),\n stderr=sp.DEVNULL,\n stdout=sp.DEVNULL,\n check=True,\n )\n except sp.CalledProcessError:\n return False\n return True\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.weighted_shuffle","title":"weighted_shuffle","text":"weighted_shuffle(items, weights)\n
Shuffles a list of items based on their corresponding weights.
Parameters:
items
(list
) \u2013 The list of items to shuffle.
weights
(list
) \u2013 The list of weights corresponding to each item.
Returns:
list
\u2013 A new list containing the shuffled items.
Examples:
>>> items = ['apple', 'banana', 'cherry']\n>>> weights = [0.4, 0.5, 0.1]\n>>> weighted_shuffle(items, weights)\n['banana', 'apple', 'cherry']\n>>> weighted_shuffle(items, weights)\n['apple', 'banana', 'cherry']\n>>> weighted_shuffle(items, weights)\n['apple', 'banana', 'cherry']\n>>> weighted_shuffle(items, weights)\n['banana', 'apple', 'cherry']\n
Note: The sum of all weights does not have to be 1. They will be normalized internally.
Source code in bbot/core/helpers/misc.py
def weighted_shuffle(items, weights):\n \"\"\"\n Shuffles a list of items based on their corresponding weights.\n\n Args:\n items (list): The list of items to shuffle.\n weights (list): The list of weights corresponding to each item.\n\n Returns:\n list: A new list containing the shuffled items.\n\n Examples:\n >>> items = ['apple', 'banana', 'cherry']\n >>> weights = [0.4, 0.5, 0.1]\n >>> weighted_shuffle(items, weights)\n ['banana', 'apple', 'cherry']\n >>> weighted_shuffle(items, weights)\n ['apple', 'banana', 'cherry']\n >>> weighted_shuffle(items, weights)\n ['apple', 'banana', 'cherry']\n >>> weighted_shuffle(items, weights)\n ['banana', 'apple', 'cherry']\n\n Note:\n The sum of all weights does not have to be 1. They will be normalized internally.\n \"\"\"\n # Create a list of tuples where each tuple is (item, weight)\n pool = list(zip(items, weights))\n\n shuffled_items = []\n\n # While there are still items to be chosen...\n while pool:\n # Normalize weights\n total = sum(weight for item, weight in pool)\n weights = [weight / total for item, weight in pool]\n\n # Choose an index based on weight\n chosen_index = random.choices(range(len(pool)), weights=weights, k=1)[0]\n\n # Add the chosen item to the shuffled list\n chosen_item, chosen_weight = pool.pop(chosen_index)\n shuffled_items.append(chosen_item)\n\n return shuffled_items\n
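One way to check that the weights behave as intended is to count how often each item lands in first place over many runs. The sketch below uses a condensed copy of the helper so it runs on its own; it skips the explicit normalization because random.choices accepts relative weights. The seed and trial count are arbitrary choices for illustration.

```python
import random
from collections import Counter


def weighted_shuffle(items, weights):
    # Condensed copy of the helper above: weighted draws without replacement.
    pool = list(zip(items, weights))
    shuffled = []
    while pool:
        index = random.choices(range(len(pool)), weights=[w for _, w in pool], k=1)[0]
        shuffled.append(pool.pop(index)[0])
    return shuffled


random.seed(0)  # fixed seed so repeated runs give the same counts
items = ['apple', 'banana', 'cherry']
weights = [0.4, 0.5, 0.1]
first_place = Counter(weighted_shuffle(items, weights)[0] for _ in range(10000))
```

The first-place counts should roughly follow the weights, with banana leading and cherry trailing, while every result remains a permutation of the input.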
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.which","title":"which","text":"which(*executables)\n
Finds the full path of the first available executable from a list of executables.
Parameters:
*executables
(str
, default: ()
) \u2013 One or more executable names to search for.
Returns:
str
\u2013 The full path of the first available executable, or None if none are found.
Examples:
>>> which(\"python\", \"python3\")\n\"/usr/bin/python\"\n
Source code in bbot/core/helpers/misc.py
def which(*executables):\n \"\"\"Finds the full path of the first available executable from a list of executables.\n\n Args:\n *executables (str): One or more executable names to search for.\n\n Returns:\n str: The full path of the first available executable, or None if none are found.\n\n Examples:\n >>> which(\"python\", \"python3\")\n \"/usr/bin/python\"\n \"\"\"\n import shutil\n\n for e in executables:\n location = shutil.which(e)\n if location:\n return location\n
"},{"location":"dev/helpers/web/","title":"Web","text":"These are helpers for making various web requests.
Note that these helpers can be invoked directly from self.helpers
, e.g.:
self.helpers.request(\"https://www.evilcorp.com\")\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper","title":"WebHelper","text":" Bases: EngineClient
Source code in bbot/core/helpers/web/web.py
class WebHelper(EngineClient):\n\n SERVER_CLASS = HTTPEngine\n ERROR_CLASS = WebError\n\n \"\"\"\n Main utility class for managing HTTP operations in BBOT. It serves as a wrapper around the BBOTAsyncClient,\n which itself is a subclass of httpx.AsyncClient. The class provides functionalities to make HTTP requests,\n download files, and handle cached wordlists.\n\n Attributes:\n parent_helper (object): The parent helper object containing scan configurations.\n http_debug (bool): Flag to indicate whether HTTP debugging is enabled.\n ssl_verify (bool): Flag to indicate whether SSL verification is enabled.\n web_client (BBOTAsyncClient): An instance of BBOTAsyncClient for making HTTP requests.\n client_only_options (tuple): A tuple of options only applicable to the web client.\n\n Examples:\n Basic web request:\n >>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n Download file:\n >>> filename = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\")\n\n Download wordlist (cached for 30 days by default):\n >>> filename = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n \"\"\"\n\n def __init__(self, parent_helper):\n self.parent_helper = parent_helper\n self.preset = self.parent_helper.preset\n self.config = self.preset.config\n self.web_config = self.config.get(\"web\", {})\n self.web_spider_depth = self.web_config.get(\"spider_depth\", 1)\n self.web_spider_distance = self.web_config.get(\"spider_distance\", 0)\n self.target = self.preset.target\n self.ssl_verify = self.config.get(\"ssl_verify\", False)\n engine_debug = self.config.get(\"engine\", {}).get(\"debug\", False)\n super().__init__(\n server_kwargs={\"config\": self.config, \"target\": self.parent_helper.preset.target.radix_only},\n debug=engine_debug,\n )\n\n def AsyncClient(self, *args, **kwargs):\n from .client import BBOTAsyncClient\n\n return BBOTAsyncClient.from_config(self.config, self.target, *args, persist_cookies=False, **kwargs)\n\n 
async def request(self, *args, **kwargs):\n \"\"\"\n Asynchronous function for making HTTP requests, intended to be the most basic web request function\n used widely across BBOT and within this helper class. Handles various exceptions and timeouts\n that might occur during the request.\n\n This function automatically respects the scan's global timeout, proxy, headers, etc.\n Headers you specify will be merged with the scan's. Your arguments take ultimate precedence,\n meaning you can override the scan's values if you want.\n\n Args:\n url (str): The URL to send the request to.\n method (str, optional): The HTTP method to use for the request. Defaults to 'GET'.\n headers (dict, optional): Dictionary of HTTP headers to send with the request.\n params (dict, optional): Dictionary, list of tuples, or bytes to send in the query string.\n cookies (dict, optional): Dictionary or CookieJar object containing cookies.\n json (Any, optional): A JSON serializable Python object to send in the body.\n data (dict, optional): Dictionary, list of tuples, or bytes to send in the body.\n files (dict, optional): Dictionary of 'name': file-like-objects for multipart encoding upload.\n auth (tuple, optional): Auth tuple to enable Basic/Digest/Custom HTTP auth.\n timeout (float, optional): The maximum time to wait for the request to complete.\n proxies (dict, optional): Dictionary mapping protocol schemes to proxy URLs.\n allow_redirects (bool, optional): Enables or disables redirection. Defaults to None.\n stream (bool, optional): Enables or disables response streaming.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.\n client (httpx.AsyncClient, optional): A specific httpx.AsyncClient to use for the request. Defaults to self.web_client.\n cache_for (int, optional): Time in seconds to cache the request. Not used currently. 
Defaults to None.\n\n Raises:\n httpx.TimeoutException: If the request times out.\n httpx.ConnectError: If the connection fails.\n httpx.RequestError: For other request-related errors.\n\n Returns:\n httpx.Response or None: The HTTP response object returned by the httpx library.\n\n Examples:\n >>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n >>> response = await self.helpers.request(\"https://api.evilcorp.com/\", method=\"POST\", data=\"stuff\")\n\n Note:\n If the web request fails, it will return None unless `raise_error` is `True`.\n \"\"\"\n return await self.run_and_return(\"request\", *args, **kwargs)\n\n async def request_batch(self, urls, *args, **kwargs):\n \"\"\"\n Given a list of URLs, request them in parallel and yield responses as they come in.\n\n Args:\n urls (list[str]): List of URLs to visit\n *args: Positional arguments to pass through to httpx\n **kwargs: Keyword arguments to pass through to httpx\n\n Examples:\n >>> async for url, response in self.helpers.request_batch(urls, headers={\"X-Test\": \"Test\"}):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_batch\", urls, *args, **kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n async def request_custom_batch(self, urls_and_kwargs):\n \"\"\"\n Make web requests in parallel with custom options for each request. Yield responses as they come in.\n\n Similar to `request_batch` except it allows individual arguments for each URL.\n\n Args:\n urls_and_kwargs (list[tuple]): List of tuples in the format: (url, kwargs, custom_tracker)\n where custom_tracker is an optional value for your own internal use. 
You may use it to\n help correlate requests, etc.\n\n Examples:\n >>> urls_and_kwargs = [\n >>> (\"http://evilcorp.com/1\", {\"method\": \"GET\"}, \"request-1\"),\n >>> (\"http://evilcorp.com/2\", {\"method\": \"POST\"}, \"request-2\"),\n >>> ]\n >>> async for url, kwargs, custom_tracker, response in self.helpers.request_custom_batch(\n >>> urls_and_kwargs\n >>> ):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_custom_batch\", urls_and_kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n async def download(self, url, **kwargs):\n \"\"\"\n Asynchronous function for downloading files from a given URL. Supports caching with an optional\n time period in hours via the \"cache_hrs\" keyword argument. In case of successful download,\n returns the full path of the saved filename. If the download fails, returns None.\n\n Args:\n url (str): The URL of the file to download.\n filename (str, optional): The filename to save the downloaded file as.\n If not provided, will generate based on URL.\n max_size (str or int): Maximum filesize as a string (\"5MB\") or integer in bytes.\n cache_hrs (float, optional): The number of hours to cache the downloaded file.\n A negative value disables caching. Defaults to -1.\n method (str, optional): The HTTP method to use for the request, defaults to 'GET'.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. 
Defaults to False.\n **kwargs: Additional keyword arguments to pass to the httpx request.\n\n Returns:\n Path or None: The full path of the downloaded file as a Path object if successful, otherwise None.\n\n Examples:\n >>> filepath = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\", cache_hrs=24)\n \"\"\"\n success = False\n filename = kwargs.pop(\"filename\", self.parent_helper.cache_filename(url))\n filename = truncate_filename(Path(filename).resolve())\n kwargs[\"filename\"] = filename\n max_size = kwargs.pop(\"max_size\", None)\n if max_size is not None:\n max_size = self.parent_helper.human_to_bytes(max_size)\n kwargs[\"max_size\"] = max_size\n cache_hrs = float(kwargs.pop(\"cache_hrs\", -1))\n if cache_hrs > 0 and self.parent_helper.is_cached(url):\n log.debug(f\"{url} is cached at {self.parent_helper.cache_filename(url)}\")\n success = True\n else:\n success = await self.run_and_return(\"download\", url, **kwargs)\n\n if success:\n return filename\n\n async def wordlist(self, path, lines=None, **kwargs):\n \"\"\"\n Asynchronous function for retrieving wordlists, either from a local path or a URL.\n Allows for optional line-based truncation and caching. 
Returns the full path of the wordlist\n file or a truncated version of it.\n\n Args:\n path (str): The local or remote path of the wordlist.\n lines (int, optional): Number of lines to read from the wordlist.\n If specified, will return a truncated wordlist with this many lines.\n cache_hrs (float, optional): Number of hours to cache the downloaded wordlist.\n Defaults to 720 hours (30 days) for remote wordlists.\n **kwargs: Additional keyword arguments to pass to the 'download' function for remote wordlists.\n\n Returns:\n Path: The full path of the wordlist (or its truncated version) as a Path object.\n\n Raises:\n WordlistError: If the path is invalid or the wordlist could not be retrieved or found.\n\n Examples:\n Fetching full wordlist\n >>> wordlist_path = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n\n Fetching and truncating to the first 100 lines\n >>> wordlist_path = await self.helpers.wordlist(\"/root/rockyou.txt\", lines=100)\n \"\"\"\n if not path:\n raise WordlistError(f\"Invalid wordlist: {path}\")\n if not \"cache_hrs\" in kwargs:\n kwargs[\"cache_hrs\"] = 720\n if self.parent_helper.is_url(path):\n filename = await self.download(str(path), **kwargs)\n if filename is None:\n raise WordlistError(f\"Unable to retrieve wordlist from {path}\")\n else:\n filename = Path(path).resolve()\n if not filename.is_file():\n raise WordlistError(f\"Unable to find wordlist at {path}\")\n\n if lines is None:\n return filename\n else:\n lines = int(lines)\n with open(filename) as f:\n read_lines = f.readlines()\n cache_key = f\"{filename}:{lines}\"\n truncated_filename = self.parent_helper.cache_filename(cache_key)\n with open(truncated_filename, \"w\") as f:\n for line in read_lines[:lines]:\n f.write(line)\n return truncated_filename\n\n async def api_page_iter(self, url, page_size=100, json=True, next_key=None, **requests_kwargs):\n \"\"\"\n An asynchronous generator function for iterating through paginated API data.\n\n This function 
continuously makes requests to a specified API URL, incrementing the page number\n or applying a custom pagination function, and yields the received data one page at a time.\n It is well-suited for APIs that provide paginated results.\n\n Args:\n url (str): The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.\n page_size (int, optional): The number of items per page. Defaults to 100.\n json (bool, optional): If True, attempts to deserialize the response content to a JSON object. Defaults to True.\n next_key (callable, optional): A function that takes the last page's data and returns the URL for the next page. Defaults to None.\n **requests_kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Yields:\n dict or httpx.Response: If 'json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.\n\n Note:\n The loop will continue indefinitely unless manually stopped. Make sure to break out of the loop once the last page has been received.\n\n Examples:\n >>> agen = api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')\n >>> try:\n >>> async for page in agen:\n >>> subdomains = page[\"subdomains\"]\n >>> self.hugesuccess(subdomains)\n >>> if not subdomains:\n >>> break\n >>> finally:\n >>> agen.aclose()\n \"\"\"\n page = 1\n offset = 0\n result = None\n while 1:\n if result and callable(next_key):\n try:\n new_url = next_key(result)\n except Exception as e:\n log.debug(f\"Failed to extract next page of results from {url}: {e}\")\n log.debug(traceback.format_exc())\n else:\n new_url = url.format(page=page, page_size=page_size, offset=offset)\n result = await self.request(new_url, **requests_kwargs)\n if result is None:\n log.verbose(f\"api_page_iter() got no response for {url}\")\n break\n try:\n if json:\n result = result.json()\n yield result\n except Exception:\n log.warning(f'Error in api_page_iter() for url: \"{new_url}\"')\n 
log.trace(traceback.format_exc())\n break\n finally:\n offset += page_size\n page += 1\n\n async def curl(self, *args, **kwargs):\n \"\"\"\n An asynchronous function that runs a cURL command with specified arguments and options.\n\n This function constructs and executes a cURL command based on the provided parameters.\n It offers support for various cURL options such as headers, post data, and cookies.\n\n Args:\n *args: Variable length argument list for positional arguments. Unused in this function.\n url (str): The URL for the cURL request. Mandatory.\n raw_path (bool, optional): If True, activates '--path-as-is' in cURL. Defaults to False.\n headers (dict, optional): A dictionary of HTTP headers to include in the request.\n ignore_bbot_global_settings (bool, optional): If True, ignores the global settings of BBOT. Defaults to False.\n post_data (dict, optional): A dictionary containing data to be sent in the request body.\n method (str, optional): The HTTP method to use for the request (e.g., 'GET', 'POST').\n cookies (dict, optional): A dictionary of cookies to include in the request.\n path_override (str, optional): Overrides the request-target to use in the HTTP request line.\n head_mode (bool, optional): If True, includes '-I' to fetch headers only. 
Defaults to None.\n raw_body (str, optional): Raw string to be sent in the body of the request.\n **kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Returns:\n str: The output of the cURL command.\n\n Raises:\n CurlError: If 'url' is not supplied.\n\n Examples:\n >>> output = await curl(url=\"https://example.com\", headers={\"X-Header\": \"Wat\"})\n >>> print(output)\n \"\"\"\n url = kwargs.get(\"url\", \"\")\n\n if not url:\n raise CurlError(\"No URL supplied to CURL helper\")\n\n curl_command = [\"curl\", url, \"-s\"]\n\n raw_path = kwargs.get(\"raw_path\", False)\n if raw_path:\n curl_command.append(\"--path-as-is\")\n\n # respect global ssl verify settings\n if self.ssl_verify is not True:\n curl_command.append(\"-k\")\n\n headers = kwargs.get(\"headers\", {})\n\n ignore_bbot_global_settings = kwargs.get(\"ignore_bbot_global_settings\", False)\n\n if ignore_bbot_global_settings:\n log.debug(\"ignore_bbot_global_settings enabled. Global settings will not be applied\")\n else:\n http_timeout = self.parent_helper.web_config.get(\"http_timeout\", 20)\n user_agent = self.parent_helper.web_config.get(\"user_agent\", \"BBOT\")\n\n if \"User-Agent\" not in headers:\n headers[\"User-Agent\"] = user_agent\n\n # only add custom headers if the URL is in-scope\n if self.parent_helper.preset.in_scope(url):\n for hk, hv in self.web_config.get(\"http_headers\", {}).items():\n headers[hk] = hv\n\n # add the timeout\n if not \"timeout\" in kwargs:\n timeout = http_timeout\n\n curl_command.append(\"-m\")\n curl_command.append(str(timeout))\n\n for k, v in headers.items():\n if isinstance(v, list):\n for x in v:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {x}\")\n\n else:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {v}\")\n\n post_data = kwargs.get(\"post_data\", {})\n if len(post_data.items()) > 0:\n curl_command.append(\"-d\")\n post_data_str = \"\"\n for k, v in post_data.items():\n post_data_str += 
f\"&{k}={v}\"\n curl_command.append(post_data_str.lstrip(\"&\"))\n\n method = kwargs.get(\"method\", \"\")\n if method:\n curl_command.append(\"-X\")\n curl_command.append(method)\n\n cookies = kwargs.get(\"cookies\", \"\")\n if cookies:\n curl_command.append(\"-b\")\n cookies_str = \"\"\n for k, v in cookies.items():\n cookies_str += f\"{k}={v}; \"\n curl_command.append(f'{cookies_str.rstrip(\" \")}')\n\n path_override = kwargs.get(\"path_override\", None)\n if path_override:\n curl_command.append(\"--request-target\")\n curl_command.append(f\"{path_override}\")\n\n head_mode = kwargs.get(\"head_mode\", None)\n if head_mode:\n curl_command.append(\"-I\")\n\n raw_body = kwargs.get(\"raw_body\", None)\n if raw_body:\n curl_command.append(\"-d\")\n curl_command.append(raw_body)\n\n output = (await self.parent_helper.run(curl_command)).stdout\n return output\n\n def beautifulsoup(\n self,\n markup,\n features=\"html.parser\",\n builder=None,\n parse_only=None,\n from_encoding=None,\n exclude_encodings=None,\n element_classes=None,\n **kwargs,\n ):\n \"\"\"\n Naviate, Search, Modify, Parse, or PrettyPrint HTML Content.\n More information at https://beautiful-soup-4.readthedocs.io/en/latest/\n\n Args:\n markup: A string or a file-like object representing markup to be parsed.\n features: Desirable features of the parser to be used.\n This may be the name of a specific parser (\"lxml\",\n \"lxml-xml\", \"html.parser\", or \"html5lib\") or it may be\n the type of markup to be used (\"html\", \"html5\", \"xml\").\n Defaults to 'html.parser'.\n builder: A TreeBuilder subclass to instantiate (or instance to use)\n instead of looking one up based on `features`.\n parse_only: A SoupStrainer. 
Only parts of the document\n matching the SoupStrainer will be considered.\n from_encoding: A string indicating the encoding of the\n document to be parsed.\n exclude_encodings = A list of strings indicating\n encodings known to be wrong.\n element_classes = A dictionary mapping BeautifulSoup\n classes like Tag and NavigableString, to other classes you'd\n like to be instantiated instead as the parse tree is\n built.\n **kwargs = For backwards compatibility purposes.\n\n Returns:\n soup: An instance of the BeautifulSoup class\n\n Todo:\n - Write tests for this function\n\n Examples:\n >>> soup = self.helpers.beautifulsoup(event.data[\"body\"], \"html.parser\")\n Perform an html parse of the 'markup' argument and return a soup instance\n\n >>> email_type = soup.find(type=\"email\")\n Searches the soup instance for all occurances of the passed in argument\n \"\"\"\n try:\n soup = BeautifulSoup(\n markup, features, builder, parse_only, from_encoding, exclude_encodings, element_classes, **kwargs\n )\n return soup\n except Exception as e:\n log.debug(f\"Error parsing beautifulsoup: {e}\")\n return False\n\n user_keywords = [re.compile(r, re.I) for r in [\"user\", \"login\", \"email\"]]\n pass_keywords = [re.compile(r, re.I) for r in [\"pass\"]]\n\n def is_login_page(self, html):\n \"\"\"\n Determines if the provided HTML content contains a login page.\n\n This function parses the HTML to search for forms with input fields typically used for\n authentication. 
If it identifies password fields or a combination of username and password\n fields, it returns True.\n\n Args:\n html (str): The HTML content to analyze.\n\n Returns:\n bool: True if the HTML contains a login page, otherwise False.\n\n Examples:\n >>> is_login_page('<form><input type=\"text\" name=\"username\"><input type=\"password\" name=\"password\"></form>')\n True\n\n >>> is_login_page('<form><input type=\"text\" name=\"search\"></form>')\n False\n \"\"\"\n try:\n soup = BeautifulSoup(html, \"html.parser\")\n except Exception as e:\n log.debug(f\"Error parsing html: {e}\")\n return False\n\n forms = soup.find_all(\"form\")\n\n # first, check for obvious password fields\n for form in forms:\n if form.find_all(\"input\", {\"type\": \"password\"}):\n return True\n\n # next, check for forms that have both a user-like and password-like field\n for form in forms:\n user_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.user_keywords)\n pass_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.pass_keywords)\n if user_fields and pass_fields:\n return True\n return False\n\n def response_to_json(self, response):\n \"\"\"\n Convert web response to JSON object, similar to the output of `httpx -irr -json`\n \"\"\"\n\n if response is None:\n return\n\n import mmh3\n from datetime import datetime\n from hashlib import md5, sha256\n from bbot.core.helpers.misc import tagify, urlparse, split_host_port, smart_decode\n\n request = response.request\n url = str(request.url)\n parsed_url = urlparse(url)\n netloc = parsed_url.netloc\n scheme = parsed_url.scheme.lower()\n host, port = split_host_port(f\"{scheme}://{netloc}\")\n\n raw_headers = \"\\r\\n\".join([f\"{k}: {v}\" for k, v in response.headers.items()])\n raw_headers_encoded = raw_headers.encode()\n\n headers = {}\n for k, v in response.headers.items():\n k = tagify(k, delimiter=\"_\")\n headers[k] = v\n\n j = {\n \"timestamp\": datetime.now().isoformat(),\n \"hash\": {\n 
\"body_md5\": md5(response.content).hexdigest(),\n \"body_mmh3\": mmh3.hash(response.content),\n \"body_sha256\": sha256(response.content).hexdigest(),\n # \"body_simhash\": \"TODO\",\n \"header_md5\": md5(raw_headers_encoded).hexdigest(),\n \"header_mmh3\": mmh3.hash(raw_headers_encoded),\n \"header_sha256\": sha256(raw_headers_encoded).hexdigest(),\n # \"header_simhash\": \"TODO\",\n },\n \"header\": headers,\n \"body\": smart_decode(response.content),\n \"content_type\": headers.get(\"content_type\", \"\").split(\";\")[0].strip(),\n \"url\": url,\n \"host\": str(host),\n \"port\": port,\n \"scheme\": scheme,\n \"method\": response.request.method,\n \"path\": parsed_url.path,\n \"raw_header\": raw_headers,\n \"status_code\": response.status_code,\n }\n\n return j\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.ERROR_CLASS","title":"ERROR_CLASS class-attribute
instance-attribute
","text":"ERROR_CLASS = WebError\n
Main utility class for managing HTTP operations in BBOT. It serves as a wrapper around the BBOTAsyncClient, which itself is a subclass of httpx.AsyncClient. The class provides functionalities to make HTTP requests, download files, and handle cached wordlists.
Attributes:
parent_helper (object) \u2013 The parent helper object containing scan configurations.
http_debug (bool) \u2013 Flag to indicate whether HTTP debugging is enabled.
ssl_verify (bool) \u2013 Flag to indicate whether SSL verification is enabled.
web_client (BBOTAsyncClient) \u2013 An instance of BBOTAsyncClient for making HTTP requests.
client_only_options (tuple) \u2013 A tuple of options only applicable to the web client.
Examples:
Basic web request:
>>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n
Download file:
>>> filename = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\")\n
Download wordlist (cached for 30 days by default):
>>> filename = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.api_page_iter","title":"api_page_iter async
","text":"api_page_iter(url, page_size=100, json=True, next_key=None, **requests_kwargs)\n
An asynchronous generator function for iterating through paginated API data.
This function continuously makes requests to a specified API URL, incrementing the page number or applying a custom pagination function, and yields the received data one page at a time. It is well-suited for APIs that provide paginated results.
Parameters:
url (str) \u2013 The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.
page_size (int, default: 100) \u2013 The number of items per page. Defaults to 100.
json (bool, default: True) \u2013 If True, attempts to deserialize the response content to a JSON object. Defaults to True.
next_key (callable, default: None) \u2013 A function that takes the last page's data and returns the URL for the next page. Defaults to None.
**requests_kwargs \u2013 Arbitrary keyword arguments that will be forwarded to the HTTP request function.
Yields:
dict or httpx.Response: If 'json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.
The loop stops on its own only when a request returns no response or an exception is raised while handling a page. Otherwise it continues indefinitely, so make sure to break out of the loop once the last page has been received.
Examples:
>>> agen = self.helpers.api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')\n>>> try:\n>>>     async for page in agen:\n>>>         subdomains = page[\"subdomains\"]\n>>>         self.hugesuccess(subdomains)\n>>>         if not subdomains:\n>>>             break\n>>> finally:\n>>>     await agen.aclose()\n
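The placeholder handling described above can be illustrated without BBOT. The following is a minimal standalone sketch, not BBOT code: page_urls is a hypothetical helper that only reproduces the counter arithmetic visible in the source listing (page starts at 1, offset starts at 0, and after each request offset grows by page_size and page grows by 1).

```python
def page_urls(template, page_size=100, pages=3):
    # Hypothetical helper: formats the URL template the same way the
    # pagination loop does, for a fixed number of pages.
    page, offset = 1, 0
    urls = []
    for _ in range(pages):
        # Placeholders missing from the template are simply ignored by str.format
        urls.append(template.format(page=page, page_size=page_size, offset=offset))
        offset += page_size
        page += 1
    return urls


urls = page_urls(
    'https://api.example.com/data?page={page}&page_size={page_size}&offset={offset}',
    page_size=50,
)
for u in urls:
    print(u)
```

A template may use any subset of the three placeholders, so an API that paginates by offset alone can omit page entirely.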
Source code in bbot/core/helpers/web/web.py
async def api_page_iter(self, url, page_size=100, json=True, next_key=None, **requests_kwargs):\n \"\"\"\n An asynchronous generator function for iterating through paginated API data.\n\n This function continuously makes requests to a specified API URL, incrementing the page number\n or applying a custom pagination function, and yields the received data one page at a time.\n It is well-suited for APIs that provide paginated results.\n\n Args:\n url (str): The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.\n page_size (int, optional): The number of items per page. Defaults to 100.\n json (bool, optional): If True, attempts to deserialize the response content to a JSON object. Defaults to True.\n next_key (callable, optional): A function that takes the last page's data and returns the URL for the next page. Defaults to None.\n **requests_kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Yields:\n dict or httpx.Response: If 'json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.\n\n Note:\n The loop will continue indefinitely unless manually stopped. 
Make sure to break out of the loop once the last page has been received.\n\n Examples:\n >>> agen = api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')\n >>> try:\n >>> async for page in agen:\n >>> subdomains = page[\"subdomains\"]\n >>> self.hugesuccess(subdomains)\n >>> if not subdomains:\n >>> break\n >>> finally:\n >>> agen.aclose()\n \"\"\"\n page = 1\n offset = 0\n result = None\n while 1:\n if result and callable(next_key):\n try:\n new_url = next_key(result)\n except Exception as e:\n log.debug(f\"Failed to extract next page of results from {url}: {e}\")\n log.debug(traceback.format_exc())\n else:\n new_url = url.format(page=page, page_size=page_size, offset=offset)\n result = await self.request(new_url, **requests_kwargs)\n if result is None:\n log.verbose(f\"api_page_iter() got no response for {url}\")\n break\n try:\n if json:\n result = result.json()\n yield result\n except Exception:\n log.warning(f'Error in api_page_iter() for url: \"{new_url}\"')\n log.trace(traceback.format_exc())\n break\n finally:\n offset += page_size\n page += 1\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.beautifulsoup","title":"beautifulsoup","text":"beautifulsoup(markup, features='html.parser', builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, element_classes=None, **kwargs)\n
Navigate, Search, Modify, Parse, or PrettyPrint HTML Content. More information at https://beautiful-soup-4.readthedocs.io/en/latest/
Parameters:
markup \u2013 A string or a file-like object representing markup to be parsed.
features \u2013 Desirable features of the parser to be used. This may be the name of a specific parser (\"lxml\", \"lxml-xml\", \"html.parser\", or \"html5lib\") or it may be the type of markup to be used (\"html\", \"html5\", \"xml\"). Defaults to 'html.parser'.
builder \u2013 A TreeBuilder subclass to instantiate (or instance to use) instead of looking one up based on features.
parse_only \u2013 A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered.
from_encoding \u2013 A string indicating the encoding of the document to be parsed.
exclude_encodings \u2013 A list of strings indicating encodings known to be wrong.
element_classes \u2013 A dictionary mapping BeautifulSoup classes like Tag and NavigableString to other classes you'd like to be instantiated instead as the parse tree is built.
**kwargs \u2013 For backwards compatibility purposes.
Returns:
soup \u2013 An instance of the BeautifulSoup class, or False if parsing raises an exception.
Examples:
>>> soup = self.helpers.beautifulsoup(event.data[\"body\"], \"html.parser\")\nParses the 'markup' argument as HTML and returns a soup instance\n
>>> email_type = soup.find(type=\"email\")\nSearches the soup instance for the first element matching the passed-in argument\n
Source code in bbot/core/helpers/web/web.py
def beautifulsoup(\n self,\n markup,\n features=\"html.parser\",\n builder=None,\n parse_only=None,\n from_encoding=None,\n exclude_encodings=None,\n element_classes=None,\n **kwargs,\n):\n \"\"\"\n Naviate, Search, Modify, Parse, or PrettyPrint HTML Content.\n More information at https://beautiful-soup-4.readthedocs.io/en/latest/\n\n Args:\n markup: A string or a file-like object representing markup to be parsed.\n features: Desirable features of the parser to be used.\n This may be the name of a specific parser (\"lxml\",\n \"lxml-xml\", \"html.parser\", or \"html5lib\") or it may be\n the type of markup to be used (\"html\", \"html5\", \"xml\").\n Defaults to 'html.parser'.\n builder: A TreeBuilder subclass to instantiate (or instance to use)\n instead of looking one up based on `features`.\n parse_only: A SoupStrainer. Only parts of the document\n matching the SoupStrainer will be considered.\n from_encoding: A string indicating the encoding of the\n document to be parsed.\n exclude_encodings = A list of strings indicating\n encodings known to be wrong.\n element_classes = A dictionary mapping BeautifulSoup\n classes like Tag and NavigableString, to other classes you'd\n like to be instantiated instead as the parse tree is\n built.\n **kwargs = For backwards compatibility purposes.\n\n Returns:\n soup: An instance of the BeautifulSoup class\n\n Todo:\n - Write tests for this function\n\n Examples:\n >>> soup = self.helpers.beautifulsoup(event.data[\"body\"], \"html.parser\")\n Perform an html parse of the 'markup' argument and return a soup instance\n\n >>> email_type = soup.find(type=\"email\")\n Searches the soup instance for all occurances of the passed in argument\n \"\"\"\n try:\n soup = BeautifulSoup(\n markup, features, builder, parse_only, from_encoding, exclude_encodings, element_classes, **kwargs\n )\n return soup\n except Exception as e:\n log.debug(f\"Error parsing beautifulsoup: {e}\")\n return False\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.curl","title":"curl async
","text":"curl(*args, **kwargs)\n
An asynchronous function that runs a cURL command with specified arguments and options.
This function constructs and executes a cURL command based on the provided parameters. It offers support for various cURL options such as headers, post data, and cookies.
Parameters:
*args \u2013 Variable length argument list for positional arguments. Unused in this function.
url (str) \u2013 The URL for the cURL request. Mandatory.
raw_path (bool) \u2013 If True, activates '--path-as-is' in cURL. Defaults to False.
headers (dict) \u2013 A dictionary of HTTP headers to include in the request.
ignore_bbot_global_settings (bool) \u2013 If True, ignores the global settings of BBOT. Defaults to False.
post_data (dict) \u2013 A dictionary containing data to be sent in the request body.
method (str) \u2013 The HTTP method to use for the request (e.g., 'GET', 'POST').
cookies (dict) \u2013 A dictionary of cookies to include in the request.
path_override (str) \u2013 Overrides the request-target to use in the HTTP request line.
head_mode (bool) \u2013 If True, includes '-I' to fetch headers only. Defaults to None.
raw_body (str) \u2013 Raw string to be sent in the body of the request.
**kwargs \u2013 Arbitrary keyword arguments that will be forwarded to the HTTP request function.
Returns:
str \u2013 The output of the cURL command.
Raises:
CurlError \u2013 If 'url' is not supplied.
Examples:
>>> output = await self.helpers.curl(url=\"https://example.com\", headers={\"X-Header\": \"Wat\"})\n>>> print(output)\n
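As a rough illustration of how the options above map onto a command line, the sketch below rebuilds a subset of the argument list in plain Python. build_curl_args is a hypothetical stand-in, not part of BBOT; it follows the ordering seen in the source listing (headers, post data, method, cookies) and leaves out the global settings, timeout, and SSL handling. Values are joined as-is, with no URL encoding, as in the listing.

```python
def build_curl_args(url, headers=None, post_data=None, method='', cookies=None):
    # Hypothetical helper: assembles the curl argument list for a few options.
    cmd = ['curl', url, '-s']
    for name, value in (headers or {}).items():
        cmd += ['-H', f'{name}: {value}']
    if post_data:
        # key=value pairs joined with ampersands, without URL encoding
        cmd += ['-d', '&'.join(f'{k}={v}' for k, v in post_data.items())]
    if method:
        cmd += ['-X', method]
    if cookies:
        # each cookie ends with a semicolon, pairs separated by one space
        cmd += ['-b', ' '.join(f'{k}={v};' for k, v in cookies.items())]
    return cmd


args = build_curl_args(
    'https://example.com',
    headers={'X-Header': 'Wat'},
    post_data={'a': 1, 'b': 2},
    cookies={'session': 'abc'},
)
print(args)
```

Because the values are not encoded, callers who need special characters in post_data would have to encode them beforehand or pass a prepared string through raw_body.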
Source code in bbot/core/helpers/web/web.py
async def curl(self, *args, **kwargs):\n \"\"\"\n An asynchronous function that runs a cURL command with specified arguments and options.\n\n This function constructs and executes a cURL command based on the provided parameters.\n It offers support for various cURL options such as headers, post data, and cookies.\n\n Args:\n *args: Variable length argument list for positional arguments. Unused in this function.\n url (str): The URL for the cURL request. Mandatory.\n raw_path (bool, optional): If True, activates '--path-as-is' in cURL. Defaults to False.\n headers (dict, optional): A dictionary of HTTP headers to include in the request.\n ignore_bbot_global_settings (bool, optional): If True, ignores the global settings of BBOT. Defaults to False.\n post_data (dict, optional): A dictionary containing data to be sent in the request body.\n method (str, optional): The HTTP method to use for the request (e.g., 'GET', 'POST').\n cookies (dict, optional): A dictionary of cookies to include in the request.\n path_override (str, optional): Overrides the request-target to use in the HTTP request line.\n head_mode (bool, optional): If True, includes '-I' to fetch headers only. 
Defaults to None.\n raw_body (str, optional): Raw string to be sent in the body of the request.\n **kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Returns:\n str: The output of the cURL command.\n\n Raises:\n CurlError: If 'url' is not supplied.\n\n Examples:\n >>> output = await curl(url=\"https://example.com\", headers={\"X-Header\": \"Wat\"})\n >>> print(output)\n \"\"\"\n url = kwargs.get(\"url\", \"\")\n\n if not url:\n raise CurlError(\"No URL supplied to CURL helper\")\n\n curl_command = [\"curl\", url, \"-s\"]\n\n raw_path = kwargs.get(\"raw_path\", False)\n if raw_path:\n curl_command.append(\"--path-as-is\")\n\n # respect global ssl verify settings\n if self.ssl_verify is not True:\n curl_command.append(\"-k\")\n\n headers = kwargs.get(\"headers\", {})\n\n ignore_bbot_global_settings = kwargs.get(\"ignore_bbot_global_settings\", False)\n\n if ignore_bbot_global_settings:\n log.debug(\"ignore_bbot_global_settings enabled. Global settings will not be applied\")\n else:\n http_timeout = self.parent_helper.web_config.get(\"http_timeout\", 20)\n user_agent = self.parent_helper.web_config.get(\"user_agent\", \"BBOT\")\n\n if \"User-Agent\" not in headers:\n headers[\"User-Agent\"] = user_agent\n\n # only add custom headers if the URL is in-scope\n if self.parent_helper.preset.in_scope(url):\n for hk, hv in self.web_config.get(\"http_headers\", {}).items():\n headers[hk] = hv\n\n # add the timeout\n if not \"timeout\" in kwargs:\n timeout = http_timeout\n\n curl_command.append(\"-m\")\n curl_command.append(str(timeout))\n\n for k, v in headers.items():\n if isinstance(v, list):\n for x in v:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {x}\")\n\n else:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {v}\")\n\n post_data = kwargs.get(\"post_data\", {})\n if len(post_data.items()) > 0:\n curl_command.append(\"-d\")\n post_data_str = \"\"\n for k, v in post_data.items():\n post_data_str += 
f\"&{k}={v}\"\n curl_command.append(post_data_str.lstrip(\"&\"))\n\n method = kwargs.get(\"method\", \"\")\n if method:\n curl_command.append(\"-X\")\n curl_command.append(method)\n\n cookies = kwargs.get(\"cookies\", \"\")\n if cookies:\n curl_command.append(\"-b\")\n cookies_str = \"\"\n for k, v in cookies.items():\n cookies_str += f\"{k}={v}; \"\n curl_command.append(f'{cookies_str.rstrip(\" \")}')\n\n path_override = kwargs.get(\"path_override\", None)\n if path_override:\n curl_command.append(\"--request-target\")\n curl_command.append(f\"{path_override}\")\n\n head_mode = kwargs.get(\"head_mode\", None)\n if head_mode:\n curl_command.append(\"-I\")\n\n raw_body = kwargs.get(\"raw_body\", None)\n if raw_body:\n curl_command.append(\"-d\")\n curl_command.append(raw_body)\n\n output = (await self.parent_helper.run(curl_command)).stdout\n return output\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.download","title":"download async
","text":"download(url, **kwargs)\n
Asynchronous function for downloading files from a given URL. Supports caching with an optional time period in hours via the \"cache_hrs\" keyword argument. In case of successful download, returns the full path of the saved filename. If the download fails, returns None.
Parameters:
url
(str
) \u2013 The URL of the file to download.
filename
(str
) \u2013 The filename to save the downloaded file as. If not provided, will generate based on URL.
max_size
(str or int
) \u2013 Maximum filesize as a string (\"5MB\") or integer in bytes.
cache_hrs
(float
) \u2013 The number of hours to cache the downloaded file. Caching is only used when this value is greater than zero; zero or a negative value disables it. Defaults to -1.
method
(str
) \u2013 The HTTP method to use for the request, defaults to 'GET'.
raise_error
(bool
) \u2013 Whether to raise exceptions on HTTP connection and timeout errors. Defaults to False.
**kwargs
\u2013 Additional keyword arguments to pass to the httpx request.
Returns:
Path or None: The full path of the downloaded file as a Path object if successful, otherwise None.
Examples:
>>> filepath = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\", cache_hrs=24)\n
Source code in bbot/core/helpers/web/web.py
async def download(self, url, **kwargs):\n \"\"\"\n Asynchronous function for downloading files from a given URL. Supports caching with an optional\n time period in hours via the \"cache_hrs\" keyword argument. In case of successful download,\n returns the full path of the saved filename. If the download fails, returns None.\n\n Args:\n url (str): The URL of the file to download.\n filename (str, optional): The filename to save the downloaded file as.\n If not provided, will generate based on URL.\n max_size (str or int): Maximum filesize as a string (\"5MB\") or integer in bytes.\n cache_hrs (float, optional): The number of hours to cache the downloaded file.\n A negative value disables caching. Defaults to -1.\n method (str, optional): The HTTP method to use for the request, defaults to 'GET'.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.\n **kwargs: Additional keyword arguments to pass to the httpx request.\n\n Returns:\n Path or None: The full path of the downloaded file as a Path object if successful, otherwise None.\n\n Examples:\n >>> filepath = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\", cache_hrs=24)\n \"\"\"\n success = False\n filename = kwargs.pop(\"filename\", self.parent_helper.cache_filename(url))\n filename = truncate_filename(Path(filename).resolve())\n kwargs[\"filename\"] = filename\n max_size = kwargs.pop(\"max_size\", None)\n if max_size is not None:\n max_size = self.parent_helper.human_to_bytes(max_size)\n kwargs[\"max_size\"] = max_size\n cache_hrs = float(kwargs.pop(\"cache_hrs\", -1))\n if cache_hrs > 0 and self.parent_helper.is_cached(url):\n log.debug(f\"{url} is cached at {self.parent_helper.cache_filename(url)}\")\n success = True\n else:\n success = await self.run_and_return(\"download\", url, **kwargs)\n\n if success:\n return filename\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.is_login_page","title":"is_login_page","text":"is_login_page(html)\n
Determines if the provided HTML content contains a login page.
This function parses the HTML to search for forms with input fields typically used for authentication. If it identifies password fields or a combination of username and password fields, it returns True.
Parameters:
html
(str
) \u2013 The HTML content to analyze.
Returns:
bool
\u2013 True if the HTML contains a login page, otherwise False.
Examples:
>>> is_login_page('<form><input type=\"text\" name=\"username\"><input type=\"password\" name=\"password\"></form>')\nTrue\n
>>> is_login_page('<form><input type=\"text\" name=\"search\"></form>')\nFalse\n
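The two-pass check described above (any password field, or a user-like field together with a password-like field in the same form) can be sketched without BBOT. This is a minimal illustration that swaps BeautifulSoup for the standard library's html.parser; the keyword sets and the names LoginFormDetector and looks_like_login_page are hypothetical and are not BBOT's actual values.

```python
from html.parser import HTMLParser

# Hypothetical keyword sets; BBOT's real user_keywords / pass_keywords are not shown here.
USER_KEYWORDS = {'user', 'username', 'login', 'email'}
PASS_KEYWORDS = {'pass', 'password', 'passwd'}


class LoginFormDetector(HTMLParser):
    # Tracks, per <form>, whether user-like and password-like inputs were seen.
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.has_user = False
        self.has_pass = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            # A new form resets the per-form flags.
            self.in_form, self.has_user, self.has_pass = True, False, False
        elif tag == 'input' and self.in_form:
            name = (attrs.get('name') or '').lower()
            # An explicit password field is enough on its own.
            if (attrs.get('type') or '').lower() == 'password':
                self.found = True
            self.has_user = self.has_user or name in USER_KEYWORDS
            self.has_pass = self.has_pass or name in PASS_KEYWORDS
            if self.has_user and self.has_pass:
                self.found = True

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def looks_like_login_page(html):
    detector = LoginFormDetector()
    detector.feed(html)
    return detector.found
```

Unlike the real helper, this sketch only inspects inputs nested inside a form element and does not catch parser errors.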
Source code in bbot/core/helpers/web/web.py
def is_login_page(self, html):\n \"\"\"\n Determines if the provided HTML content contains a login page.\n\n This function parses the HTML to search for forms with input fields typically used for\n authentication. If it identifies password fields or a combination of username and password\n fields, it returns True.\n\n Args:\n html (str): The HTML content to analyze.\n\n Returns:\n bool: True if the HTML contains a login page, otherwise False.\n\n Examples:\n >>> is_login_page('<form><input type=\"text\" name=\"username\"><input type=\"password\" name=\"password\"></form>')\n True\n\n >>> is_login_page('<form><input type=\"text\" name=\"search\"></form>')\n False\n \"\"\"\n try:\n soup = BeautifulSoup(html, \"html.parser\")\n except Exception as e:\n log.debug(f\"Error parsing html: {e}\")\n return False\n\n forms = soup.find_all(\"form\")\n\n # first, check for obvious password fields\n for form in forms:\n if form.find_all(\"input\", {\"type\": \"password\"}):\n return True\n\n # next, check for forms that have both a user-like and password-like field\n for form in forms:\n user_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.user_keywords)\n pass_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.pass_keywords)\n if user_fields and pass_fields:\n return True\n return False\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.request","title":"request async
","text":"request(*args, **kwargs)\n
Asynchronous function for making HTTP requests, intended to be the most basic web request function used widely across BBOT and within this helper class. Handles various exceptions and timeouts that might occur during the request.
This function automatically respects the scan's global timeout, proxy, headers, etc. Headers you specify will be merged with the scan's. Your arguments take ultimate precedence, meaning you can override the scan's values if you want.
Parameters:
url
(str
) \u2013 The URL to send the request to.
method
(str
) \u2013 The HTTP method to use for the request. Defaults to 'GET'.
headers
(dict
) \u2013 Dictionary of HTTP headers to send with the request.
params
(dict
) \u2013 Dictionary, list of tuples, or bytes to send in the query string.
cookies
(dict
) \u2013 Dictionary or CookieJar object containing cookies.
json
(Any
) \u2013 A JSON serializable Python object to send in the body.
data
(dict
) \u2013 Dictionary, list of tuples, or bytes to send in the body.
files
(dict
) \u2013 Dictionary of 'name': file-like-objects for multipart encoding upload.
auth
(tuple
) \u2013 Auth tuple to enable Basic/Digest/Custom HTTP auth.
timeout
(float
) \u2013 The maximum time to wait for the request to complete.
proxies
(dict
) \u2013 Dictionary mapping protocol schemes to proxy URLs.
allow_redirects
(bool
) \u2013 Enables or disables redirection. Defaults to None.
stream
(bool
) \u2013 Enables or disables response streaming.
raise_error
(bool
) \u2013 Whether to raise exceptions on HTTP connection and timeout errors. Defaults to False.
client
(AsyncClient
) \u2013 A specific httpx.AsyncClient to use for the request. Defaults to self.web_client.
cache_for
(int
) \u2013 Time in seconds to cache the request. Not used currently. Defaults to None.
Raises:
TimeoutException
\u2013 If the request times out.
ConnectError
\u2013 If the connection fails.
RequestError
\u2013 For other request-related errors.
Returns:
httpx.Response or None: The HTTP response object returned by the httpx library.
Examples:
>>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n
>>> response = await self.helpers.request(\"https://api.evilcorp.com/\", method=\"POST\", data=\"stuff\")\n
Note: If the web request fails, it will return None unless raise_error is True.
Source code in bbot/core/helpers/web/web.py
async def request(self, *args, **kwargs):\n \"\"\"\n Asynchronous function for making HTTP requests, intended to be the most basic web request function\n used widely across BBOT and within this helper class. Handles various exceptions and timeouts\n that might occur during the request.\n\n This function automatically respects the scan's global timeout, proxy, headers, etc.\n Headers you specify will be merged with the scan's. Your arguments take ultimate precedence,\n meaning you can override the scan's values if you want.\n\n Args:\n url (str): The URL to send the request to.\n method (str, optional): The HTTP method to use for the request. Defaults to 'GET'.\n headers (dict, optional): Dictionary of HTTP headers to send with the request.\n params (dict, optional): Dictionary, list of tuples, or bytes to send in the query string.\n cookies (dict, optional): Dictionary or CookieJar object containing cookies.\n json (Any, optional): A JSON serializable Python object to send in the body.\n data (dict, optional): Dictionary, list of tuples, or bytes to send in the body.\n files (dict, optional): Dictionary of 'name': file-like-objects for multipart encoding upload.\n auth (tuple, optional): Auth tuple to enable Basic/Digest/Custom HTTP auth.\n timeout (float, optional): The maximum time to wait for the request to complete.\n proxies (dict, optional): Dictionary mapping protocol schemes to proxy URLs.\n allow_redirects (bool, optional): Enables or disables redirection. Defaults to None.\n stream (bool, optional): Enables or disables response streaming.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.\n client (httpx.AsyncClient, optional): A specific httpx.AsyncClient to use for the request. Defaults to self.web_client.\n cache_for (int, optional): Time in seconds to cache the request. Not used currently. 
Defaults to None.\n\n Raises:\n httpx.TimeoutException: If the request times out.\n httpx.ConnectError: If the connection fails.\n httpx.RequestError: For other request-related errors.\n\n Returns:\n httpx.Response or None: The HTTP response object returned by the httpx library.\n\n Examples:\n >>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n >>> response = await self.helpers.request(\"https://api.evilcorp.com/\", method=\"POST\", data=\"stuff\")\n\n Note:\n If the web request fails, it will return None unless `raise_error` is `True`.\n \"\"\"\n return await self.run_and_return(\"request\", *args, **kwargs)\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.request_batch","title":"request_batch async
","text":"request_batch(urls, *args, **kwargs)\n
Given a list of URLs, request them in parallel and yield responses as they come in.
Parameters:
urls
(list[str]
) \u2013 List of URLs to visit
*args
\u2013 Positional arguments to pass through to httpx
**kwargs
\u2013 Keyword arguments to pass through to httpx
Examples:
>>> async for url, response in self.helpers.request_batch(urls, headers={\"X-Test\": \"Test\"}):\n>>> if response is not None and response.status_code == 200:\n>>> self.hugesuccess(response)\n
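The "yield responses as they come in" behaviour can be illustrated with plain asyncio. This is a rough sketch of the general pattern, not BBOT's implementation: fake_fetch, batch, and collect are hypothetical names, and the network call is replaced by a sleep so the example runs offline.

```python
import asyncio


async def fake_fetch(url, delay):
    # Stand-in for a real HTTP request; sleeps instead of touching the network.
    await asyncio.sleep(delay)
    return url, f'response for {url}'


async def batch(urls_and_delays):
    # Start every request at once, then yield each result as soon as it is ready.
    tasks = [asyncio.create_task(fake_fetch(u, d)) for u, d in urls_and_delays]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(urls_and_delays):
    # Gather the URLs in completion order, which may differ from input order.
    return [url async for url, _ in batch(urls_and_delays)]
```

Because results arrive in completion order, the caller receives the URL alongside each response, which is the same reason request_batch yields (url, response) pairs.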
Source code in bbot/core/helpers/web/web.py
async def request_batch(self, urls, *args, **kwargs):\n \"\"\"\n Given a list of URLs, request them in parallel and yield responses as they come in.\n\n Args:\n urls (list[str]): List of URLs to visit\n *args: Positional arguments to pass through to httpx\n **kwargs: Keyword arguments to pass through to httpx\n\n Examples:\n >>> async for url, response in self.helpers.request_batch(urls, headers={\"X-Test\": \"Test\"}):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_batch\", urls, *args, **kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.request_custom_batch","title":"request_custom_batch async
","text":"request_custom_batch(urls_and_kwargs)\n
Make web requests in parallel with custom options for each request. Yield responses as they come in.
Similar to request_batch, except that it allows individual arguments for each URL.
Parameters:
urls_and_kwargs
(list[tuple]
) \u2013 List of tuples in the format: (url, kwargs, custom_tracker) where custom_tracker is an optional value for your own internal use. You may use it to help correlate requests, etc.
Examples:
>>> urls_and_kwargs = [\n>>> (\"http://evilcorp.com/1\", {\"method\": \"GET\"}, \"request-1\"),\n>>> (\"http://evilcorp.com/2\", {\"method\": \"POST\"}, \"request-2\"),\n>>> ]\n>>> async for url, kwargs, custom_tracker, response in self.helpers.request_custom_batch(\n>>> urls_and_kwargs\n>>> ):\n>>> if response is not None and response.status_code == 200:\n>>> self.hugesuccess(response)\n
Source code in bbot/core/helpers/web/web.py
async def request_custom_batch(self, urls_and_kwargs):\n \"\"\"\n Make web requests in parallel with custom options for each request. Yield responses as they come in.\n\n Similar to `request_batch` except it allows individual arguments for each URL.\n\n Args:\n urls_and_kwargs (list[tuple]): List of tuples in the format: (url, kwargs, custom_tracker)\n where custom_tracker is an optional value for your own internal use. You may use it to\n help correlate requests, etc.\n\n Examples:\n >>> urls_and_kwargs = [\n >>> (\"http://evilcorp.com/1\", {\"method\": \"GET\"}, \"request-1\"),\n >>> (\"http://evilcorp.com/2\", {\"method\": \"POST\"}, \"request-2\"),\n >>> ]\n >>> async for url, kwargs, custom_tracker, response in self.helpers.request_custom_batch(\n >>> urls_and_kwargs\n >>> ):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_custom_batch\", urls_and_kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.response_to_json","title":"response_to_json","text":"response_to_json(response)\n
Convert web response to JSON object, similar to the output of httpx -irr -json
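As a rough illustration of the hash fields in that output, the sketch below computes the MD5 and SHA-256 digests for a body and a header block using only hashlib. The function name hash_summary is hypothetical, the MurmurHash3 fields are left out because they need the third-party mmh3 package, and the inputs are plain bytes and a dict rather than an httpx response.

```python
from hashlib import md5, sha256


def hash_summary(body, headers):
    # body: raw response bytes; headers: mapping of header name to value.
    raw_headers = '\r\n'.join(f'{k}: {v}' for k, v in headers.items())
    raw = raw_headers.encode()
    return {
        'body_md5': md5(body).hexdigest(),
        'body_sha256': sha256(body).hexdigest(),
        'header_md5': md5(raw).hexdigest(),
        'header_sha256': sha256(raw).hexdigest(),
        'raw_header': raw_headers,
    }
```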
Source code in bbot/core/helpers/web/web.py
def response_to_json(self, response):\n \"\"\"\n Convert web response to JSON object, similar to the output of `httpx -irr -json`\n \"\"\"\n\n if response is None:\n return\n\n import mmh3\n from datetime import datetime\n from hashlib import md5, sha256\n from bbot.core.helpers.misc import tagify, urlparse, split_host_port, smart_decode\n\n request = response.request\n url = str(request.url)\n parsed_url = urlparse(url)\n netloc = parsed_url.netloc\n scheme = parsed_url.scheme.lower()\n host, port = split_host_port(f\"{scheme}://{netloc}\")\n\n raw_headers = \"\\r\\n\".join([f\"{k}: {v}\" for k, v in response.headers.items()])\n raw_headers_encoded = raw_headers.encode()\n\n headers = {}\n for k, v in response.headers.items():\n k = tagify(k, delimiter=\"_\")\n headers[k] = v\n\n j = {\n \"timestamp\": datetime.now().isoformat(),\n \"hash\": {\n \"body_md5\": md5(response.content).hexdigest(),\n \"body_mmh3\": mmh3.hash(response.content),\n \"body_sha256\": sha256(response.content).hexdigest(),\n # \"body_simhash\": \"TODO\",\n \"header_md5\": md5(raw_headers_encoded).hexdigest(),\n \"header_mmh3\": mmh3.hash(raw_headers_encoded),\n \"header_sha256\": sha256(raw_headers_encoded).hexdigest(),\n # \"header_simhash\": \"TODO\",\n },\n \"header\": headers,\n \"body\": smart_decode(response.content),\n \"content_type\": headers.get(\"content_type\", \"\").split(\";\")[0].strip(),\n \"url\": url,\n \"host\": str(host),\n \"port\": port,\n \"scheme\": scheme,\n \"method\": response.request.method,\n \"path\": parsed_url.path,\n \"raw_header\": raw_headers,\n \"status_code\": response.status_code,\n }\n\n return j\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.wordlist","title":"wordlist async
","text":"wordlist(path, lines=None, **kwargs)\n
Asynchronous function for retrieving wordlists, either from a local path or a URL. Allows for optional line-based truncation and caching. Returns the full path of the wordlist file or a truncated version of it.
Parameters:
path
(str
) \u2013 The local or remote path of the wordlist.
lines
(int
, default: None
) \u2013 Number of lines to read from the wordlist. If specified, will return a truncated wordlist with this many lines.
cache_hrs
(float
) \u2013 Number of hours to cache the downloaded wordlist. Defaults to 720 hours (30 days) for remote wordlists.
**kwargs
\u2013 Additional keyword arguments to pass to the 'download' function for remote wordlists.
Returns:
Path
\u2013 The full path of the wordlist (or its truncated version) as a Path object.
Raises:
WordlistError
\u2013 If the path is invalid or the wordlist could not be retrieved or found.
Examples:
Fetching full wordlist
>>> wordlist_path = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n
Fetching and truncating to the first 100 lines
>>> wordlist_path = await self.helpers.wordlist(\"/root/rockyou.txt\", lines=100)\n
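The truncation step can be sketched on its own. This is a minimal illustration under stated assumptions: truncate_wordlist is a hypothetical name, and the output goes to a fresh temporary directory rather than BBOT's cache.

```python
import tempfile
from pathlib import Path


def truncate_wordlist(path, lines):
    # Copy the first `lines` lines of `path` into a new file and return its path.
    source = Path(path)
    if not source.is_file():
        raise FileNotFoundError(f'Unable to find wordlist at {path}')
    limit = int(lines)
    out_dir = Path(tempfile.mkdtemp())
    truncated = out_dir / f'{source.name}.{limit}'
    with open(source) as src, open(truncated, 'w') as dst:
        for i, line in enumerate(src):
            if i >= limit:
                break
            dst.write(line)
    return truncated
```

This version stops reading after the requested number of lines, which avoids loading a large wordlist fully into memory.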
Source code in bbot/core/helpers/web/web.py
async def wordlist(self, path, lines=None, **kwargs):\n \"\"\"\n Asynchronous function for retrieving wordlists, either from a local path or a URL.\n Allows for optional line-based truncation and caching. Returns the full path of the wordlist\n file or a truncated version of it.\n\n Args:\n path (str): The local or remote path of the wordlist.\n lines (int, optional): Number of lines to read from the wordlist.\n If specified, will return a truncated wordlist with this many lines.\n cache_hrs (float, optional): Number of hours to cache the downloaded wordlist.\n Defaults to 720 hours (30 days) for remote wordlists.\n **kwargs: Additional keyword arguments to pass to the 'download' function for remote wordlists.\n\n Returns:\n Path: The full path of the wordlist (or its truncated version) as a Path object.\n\n Raises:\n WordlistError: If the path is invalid or the wordlist could not be retrieved or found.\n\n Examples:\n Fetching full wordlist\n >>> wordlist_path = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n\n Fetching and truncating to the first 100 lines\n >>> wordlist_path = await self.helpers.wordlist(\"/root/rockyou.txt\", lines=100)\n \"\"\"\n if not path:\n raise WordlistError(f\"Invalid wordlist: {path}\")\n if not \"cache_hrs\" in kwargs:\n kwargs[\"cache_hrs\"] = 720\n if self.parent_helper.is_url(path):\n filename = await self.download(str(path), **kwargs)\n if filename is None:\n raise WordlistError(f\"Unable to retrieve wordlist from {path}\")\n else:\n filename = Path(path).resolve()\n if not filename.is_file():\n raise WordlistError(f\"Unable to find wordlist at {path}\")\n\n if lines is None:\n return filename\n else:\n lines = int(lines)\n with open(filename) as f:\n read_lines = f.readlines()\n cache_key = f\"{filename}:{lines}\"\n truncated_filename = self.parent_helper.cache_filename(cache_key)\n with open(truncated_filename, \"w\") as f:\n for line in read_lines[:lines]:\n f.write(line)\n return truncated_filename\n
"},{"location":"dev/helpers/wordcloud/","title":"Word Cloud","text":"These are helpers related to BBOT's Word Cloud, a mechanism for storing target-specific keywords that are useful for custom wordlists, etc.
Note that these helpers can be invoked directly from self.helpers, e.g.:
self.helpers.word_cloud\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.DNSMutator","title":"DNSMutator","text":" Bases: Mutator
DNS-specific mutator used by the dnsbrute_mutations module to generate target-specific subdomain mutations.
This class extends the Mutator base class to add DNS-specific logic for generating subdomain mutations based on input words. It utilizes custom word extraction patterns and a wordninja model trained on DNS-specific data.
Examples:
>>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n>>> s.start_without_generator()\n>>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n[\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n]\n
Source code in bbot/core/helpers/wordcloud.py
class DNSMutator(Mutator):\n \"\"\"\n DNS-specific mutator used by the `dnsbrute_mutations` module to generate target-specific subdomain mutations.\n\n This class extends the Mutator base class to add DNS-specific logic for generating\n subdomain mutations based on input words. It utilizes custom word extraction patterns\n and a wordninja model trained on DNS-specific data.\n\n Examples:\n >>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n >>> s.start_without_generator()\n >>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n [\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n ]\n \"\"\"\n\n extract_word_regexes = [\n re.compile(r, re.I)\n for r in [\n r\"[a-z]+\",\n r\"[a-z_-]+\",\n r\"[a-z0-9]+\",\n r\"[a-z0-9_-]+\",\n ]\n ]\n\n def __init__(self, *args, **kwargs):\n super().__init__(*args, **kwargs)\n wordlist_dir = Path(__file__).parent.parent.parent / \"wordlists\"\n wordninja_dns_wordlist = wordlist_dir / \"wordninja_dns.txt.gz\"\n self.model = wordninja.LanguageModel(wordninja_dns_wordlist)\n\n def mutations(self, words, max_mutations=None):\n if isinstance(words, str):\n words = [words]\n new_words = set()\n for word in words:\n for e in extract_words(word, acronyms=False, model=self.model, word_regexes=self.extract_word_regexes):\n new_words.add(e)\n return super().mutations(new_words, max_mutations=max_mutations)\n\n def add_word(self, word):\n spans = set()\n mutations = set()\n for r in self.extract_word_regexes:\n for match in r.finditer(word):\n span = match.span()\n if span not in spans:\n spans.add(span)\n for start, end in spans:\n match_str = word[start:end]\n # skip digits\n if match_str.isdigit():\n continue\n before = word[:start]\n after = word[end:]\n basic_mutation = (before, None, after)\n mutations.add(basic_mutation)\n match_str_split = self.model.split(match_str)\n if len(match_str_split) > 1:\n for i, s in enumerate(match_str_split):\n if s.isdigit():\n continue\n 
split_before = \"\".join(match_str_split[:i])\n split_after = \"\".join(match_str_split[i + 1 :])\n wordninja_mutation = (before + split_before, None, split_after + after)\n mutations.add(wordninja_mutation)\n for m in mutations:\n self._add_mutation(m)\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.Mutator","title":"Mutator","text":" Bases: dict
Base class for generating mutations from a list of words. It accumulates words and produces mutations from them.
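One way to picture the accumulate-then-mutate idea is as a counter of templates, where each template is a tuple of string segments and None marks the slot for the new word. The sketch below is a simplified stand-in, not the Mutator class itself: the templates are hand-written assumptions of what might be learned from names like www-test and www1, and apply_templates is a hypothetical helper.

```python
from collections import Counter

# Each template is a tuple of segments; None marks where the new word is placed.
templates = Counter()
templates[('www-', None)] += 1    # e.g. learned from the 'test' part of www-test
templates[(None, '-test')] += 1   # e.g. learned from the 'www' part of www-test
templates[(None, '1')] += 1       # e.g. learned from www1


def apply_templates(word, templates, limit=None):
    # Fill the None slot of the most frequent templates with the given word.
    results = set()
    for template, _count in templates.most_common(limit):
        results.add(''.join(word if seg is None else seg for seg in template))
    return results
```

Keeping a count per template is what lets a limit select the most frequently observed patterns first.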
Source code in bbot/core/helpers/wordcloud.py
class Mutator(dict):\n \"\"\"\n Base class for generating mutations from a list of words.\n It accumulates words and produces mutations from them.\n \"\"\"\n\n def mutations(self, words, max_mutations=None):\n mutations = self.top_mutations(max_mutations)\n ret = set()\n if isinstance(words, str):\n words = [words]\n for word in words:\n for m in self.mutate(word, mutations=mutations):\n ret.add(\"\".join(m))\n return ret\n\n def mutate(self, word, max_mutations=None, mutations=None):\n if mutations is None:\n mutations = self.top_mutations(max_mutations)\n for mutation, count in mutations.items():\n ret = []\n for s in mutation:\n if s is not None:\n ret.append(s)\n else:\n ret.append(word)\n yield ret\n\n def top_mutations(self, n=None):\n if n is not None:\n return dict(sorted(self.items(), key=lambda x: x[-1], reverse=True)[:n])\n else:\n return dict(self)\n\n def _add_mutation(self, mutation):\n if None not in mutation:\n return\n mutation = tuple([m for m in mutation if m != \"\"])\n try:\n self[mutation] += 1\n except KeyError:\n self[mutation] = 1\n\n def add_word(self, word):\n pass\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud","title":"WordCloud","text":" Bases: dict
WordCloud is a specialized dictionary-like class for storing and aggregating words extracted from various data sources such as DNS names and URLs. The class is intended to facilitate the generation of target-specific wordlists and mutations.
The WordCloud class can be accessed and manipulated like a standard Python dictionary. It also offers additional methods for generating mutations based on the words it contains.
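The dictionary-like behaviour can be sketched with a small dict subclass that counts words and keeps only the most frequent ones. TinyCloud is a hypothetical, stripped-down stand-in for illustration only; it omits the parent helper, the devops mutations, and the DNS mutator.

```python
class TinyCloud(dict):
    # Maps each word to the number of times it has been added.
    def add_word(self, word, lowercase=True):
        if lowercase:
            word = word.lower()
        self[word] = self.get(word, 0) + 1

    def truncate(self, limit):
        # Keep only the `limit` most frequent words.
        top = sorted(self.items(), key=lambda kv: kv[1], reverse=True)[:limit]
        self.clear()
        self.update(top)
```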
Attributes:
parent_helper
\u2013 The parent helper object that provides necessary utilities.
devops_mutations
\u2013 A set containing common devops-related mutations, loaded from a file.
dns_mutator
\u2013 An instance of the DNSMutator class for generating DNS-based mutations.
Examples:
>>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n>>> s.start_without_generator()\n>>> print(s.helpers.word_cloud)\n{\n \"evilcorp\": 2,\n \"ec\": 2,\n \"www1\": 1,\n \"evil\": 2,\n \"www\": 2,\n \"w1\": 1,\n \"corp\": 2,\n \"1\": 1,\n \"wt\": 1,\n \"test\": 1,\n \"www-test\": 1\n}\n
>>> s.helpers.word_cloud.mutations([\"word\"], cloud=True, numbers=0, devops=False, letters=False)\n[\n [\n \"1\",\n \"word\"\n ],\n [\n \"corp\",\n \"word\"\n ],\n [\n \"ec\",\n \"word\"\n ],\n [\n \"evil\",\n \"word\"\n ],\n ...\n]\n
>>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n[\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n]\n
Source code in bbot/core/helpers/wordcloud.py
class WordCloud(dict):\n \"\"\"\n WordCloud is a specialized dictionary-like class for storing and aggregating\n words extracted from various data sources such as DNS names and URLs. The class\n is intended to facilitate the generation of target-specific wordlists and mutations.\n\n The WordCloud class can be accessed and manipulated like a standard Python dictionary.\n It also offers additional methods for generating mutations based on the words it contains.\n\n Attributes:\n parent_helper: The parent helper object that provides necessary utilities.\n devops_mutations: A set containing common devops-related mutations, loaded from a file.\n dns_mutator: An instance of the DNSMutator class for generating DNS-based mutations.\n\n Examples:\n >>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n >>> s.start_without_generator()\n >>> print(s.helpers.word_cloud)\n {\n \"evilcorp\": 2,\n \"ec\": 2,\n \"www1\": 1,\n \"evil\": 2,\n \"www\": 2,\n \"w1\": 1,\n \"corp\": 2,\n \"1\": 1,\n \"wt\": 1,\n \"test\": 1,\n \"www-test\": 1\n }\n\n >>> s.helpers.word_cloud.mutations([\"word\"], cloud=True, numbers=0, devops=False, letters=False)\n [\n [\n \"1\",\n \"word\"\n ],\n [\n \"corp\",\n \"word\"\n ],\n [\n \"ec\",\n \"word\"\n ],\n [\n \"evil\",\n \"word\"\n ],\n ...\n ]\n\n >>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n [\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n ]\n \"\"\"\n\n def __init__(self, parent_helper, *args, **kwargs):\n self.parent_helper = parent_helper\n\n devops_filename = self.parent_helper.wordlist_dir / \"devops_mutations.txt\"\n self.devops_mutations = set(self.parent_helper.read_file(devops_filename))\n\n self.dns_mutator = DNSMutator()\n\n super().__init__(*args, **kwargs)\n\n def mutations(\n self, words, devops=True, cloud=True, letters=True, numbers=5, number_padding=2, substitute_numbers=True\n ):\n \"\"\"\n Generate various mutations for the given list of words based on 
different criteria.\n\n Yields tuples of strings which can be joined on the desired delimiter, e.g. \"-\" or \"_\".\n\n Args:\n words (Union[str, Iterable[str]]): A single word or list of words to mutate.\n devops (bool): Whether to include devops-related mutations.\n cloud (bool): Whether to include mutations from the word cloud.\n letters (bool): Whether to include letter-based mutations.\n numbers (int): The maximum numeric mutations to include.\n number_padding (int): Padding for numeric mutations.\n substitute_numbers (bool): Whether to substitute numbers in mutations.\n\n Yields:\n tuple: A tuple containing each of the mutation segments.\n \"\"\"\n if isinstance(words, str):\n words = (words,)\n results = set()\n for word in words:\n h = hash(word)\n if not h in results:\n results.add(h)\n yield (word,)\n if numbers > 0:\n if substitute_numbers:\n for word in words:\n for number_mutation in self.get_number_mutations(word, n=numbers, padding=number_padding):\n h = hash(number_mutation)\n if not h in results:\n results.add(h)\n yield (number_mutation,)\n for word in words:\n for modifier in self.modifiers(\n devops=devops, cloud=cloud, letters=letters, numbers=numbers, number_padding=number_padding\n ):\n a = (word, modifier)\n b = (modifier, word)\n for _ in (a, b):\n h = hash(_)\n if h not in results:\n results.add(h)\n yield _\n\n def modifiers(self, devops=True, cloud=True, letters=True, numbers=5, number_padding=2):\n modifiers = set()\n if devops:\n modifiers.update(self.devops_mutations)\n if cloud:\n modifiers.update(set(self))\n if letters:\n modifiers.update(set(string.ascii_lowercase))\n if numbers > 0:\n modifiers.update(self.parent_helper.gen_numbers(numbers, number_padding))\n return modifiers\n\n def absorb_event(self, event):\n \"\"\"\n Absorbs an event from a BBOT scan into the word cloud.\n\n This method updates the word cloud by extracting words from the given event. 
It aims to avoid including PTR\n (Pointer) records, as they tend to produce unhelpful mutations in the word cloud.\n\n Args:\n event (Event): The event object containing the words to be absorbed into the word cloud.\n \"\"\"\n for word in event.words:\n self.add_word(word)\n if event.scope_distance == 0 and event.type.startswith(\"DNS_NAME\"):\n subdomain = tldextract(event.data).subdomain\n if subdomain and not self.parent_helper.is_ptr(subdomain):\n for s in subdomain.split(\".\"):\n self.dns_mutator.add_word(s)\n\n def absorb_word(self, word, wordninja=True):\n \"\"\"\n Absorbs a word into the word cloud after splitting it using a word extraction algorithm.\n\n This method splits the input word into smaller meaningful words using word extraction, and then adds each\n of them to the word cloud. The splitting is done using a predefined algorithm in the parent helper.\n\n Args:\n word (str): The word to be split and absorbed into the word cloud.\n wordninja (bool, optional): If True, word extraction is enabled. Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.absorb_word(\"blacklantern\")\n >>> print(self.helpers.word_cloud)\n {\n \"blacklantern\": 1,\n \"black\": 1,\n \"bl\": 1,\n \"lantern\": 1\n }\n \"\"\"\n for w in self.parent_helper.extract_words(word, wordninja=wordninja):\n self.add_word(w)\n\n def add_word(self, word, lowercase=True):\n \"\"\"\n Adds a word to the word cloud.\n\n This method updates the word cloud by adding a given word. If the word already exists in the cloud,\n its frequency count is incremented by 1. Optionally, the word can be converted to lowercase before adding.\n\n Args:\n word (str): The word to be added to the word cloud.\n lowercase (bool, optional): If True, the word will be converted to lowercase before adding. 
Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.add_word(\"Example\")\n >>> self.helpers.word_cloud.add_word(\"example\")\n >>> print(self.helpers.word_cloud)\n {'example': 2}\n \"\"\"\n if lowercase:\n word = word.lower()\n try:\n self[word] += 1\n except KeyError:\n self[word] = 1\n\n def get_number_mutations(self, base, n=5, padding=2):\n \"\"\"\n Generates mutations of a base string by modifying the numerical parts or appending numbers.\n\n This method detects existing numbers in the base string and tries incrementing and decrementing them within a\n specified range. It also appends numbers at the end or after each word to generate more mutations.\n\n Args:\n base (str): The base string to generate mutations from.\n n (int, optional): The range of numbers to use for incrementing/decrementing. Defaults to 5.\n padding (int, optional): Zero-pad numbers up to this length. Defaults to 2.\n\n Returns:\n set: A set of mutated strings based on the base input.\n\n Examples:\n >>> self.helpers.word_cloud.get_number_mutations(\"www2-test\", n=2)\n {\n \"www0-test\",\n \"www1-test\",\n \"www2-test\",\n \"www2-test0\",\n \"www2-test00\",\n \"www2-test01\",\n \"www2-test1\",\n \"www3-test\",\n \"www4-test\"\n }\n \"\"\"\n results = set()\n\n # detects numbers and increments/decrements them\n # e.g. 
for \"base2_p013\", we would try:\n # - \"base0_p013\" through \"base12_p013\"\n # - \"base2_p003\" through \"base2_p023\"\n # limited to three iterations for sanity's sake\n for match in list(self.parent_helper.regexes.num_regex.finditer(base))[-3:]:\n span = match.span()\n before = base[: span[0]]\n after = base[span[-1] :]\n number = base[span[0] : span[-1]]\n numlen = len(number)\n maxnum = min(int(\"9\" * numlen), int(number) + n)\n minnum = max(0, int(number) - n)\n for i in range(minnum, maxnum + 1):\n filled_num = str(i).zfill(numlen)\n results.add(f\"{before}{filled_num}{after}\")\n if not number.startswith(\"0\"):\n results.add(f\"{before}{i}{after}\")\n\n # appends numbers after each word\n # e.g., for \"base_www\", we would try:\n # - \"base1_www\", \"base2_www\", etc.\n # - \"base_www1\", \"base_www2\", etc.\n # limited to three iterations for sanity's sake\n number_suffixes = self.parent_helper.gen_numbers(n, padding)\n for match in list(self.parent_helper.regexes.word_regex.finditer(base))[-3:]:\n span = match.span()\n for suffix in number_suffixes:\n before = base[: span[-1]]\n after = base[span[-1] :]\n # skip if there's already a number\n if len(after) > 1 and not after[0].isdigit():\n results.add(f\"{before}{suffix}{after}\")\n # basic cases so we don't miss anything\n for s in number_suffixes:\n results.add(f\"{base}{s}\")\n results.add(base)\n\n return results\n\n def truncate(self, limit):\n \"\"\"\n Truncates the word cloud dictionary to retain only the top `limit` entries based on their occurrence frequencies.\n\n Args:\n limit (int): The maximum number of entries to retain in the word cloud.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.truncate(2)\n >>> self.helpers.word_cloud\n {'cherry': 8, 'apple': 5}\n \"\"\"\n new_self = dict(self.json(limit=limit))\n self.clear()\n self.update(new_self)\n\n def json(self, limit=None):\n \"\"\"\n Returns the word cloud as 
a sorted OrderedDict, optionally truncated to the top `limit` entries.\n\n Args:\n limit (int, optional): The maximum number of entries to include in the returned OrderedDict. If None, all entries are included.\n\n Returns:\n OrderedDict: A dictionary sorted by word frequencies, potentially truncated to the top `limit` entries.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.json(limit=2)\n OrderedDict([('cherry', 8), ('apple', 5)])\n \"\"\"\n cloud_sorted = sorted(self.items(), key=lambda x: x[-1], reverse=True)\n if limit is not None:\n cloud_sorted = cloud_sorted[:limit]\n return OrderedDict(cloud_sorted)\n\n @property\n def default_filename(self):\n return self.parent_helper.preset.scan.home / f\"wordcloud.tsv\"\n\n def save(self, filename=None, limit=None):\n \"\"\"\n Saves the word cloud to a file. The cloud can optionally be truncated to the top `limit` entries.\n\n Args:\n filename (str, optional): The path to the file where the word cloud will be saved. If None, uses a default filename.\n limit (int, optional): The maximum number of entries to save to the file. 
If None, all entries are saved.\n\n Returns:\n tuple: A tuple containing a boolean indicating success or failure, and the resolved filename.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.save(filename=\"word_cloud.txt\", limit=2)\n (True, Path('word_cloud.txt'))\n \"\"\"\n if filename is None:\n filename = self.default_filename\n else:\n filename = Path(filename).resolve()\n try:\n if not self.parent_helper.mkdir(filename.parent):\n log.error(f\"Failure creating or error writing to {filename.parent} when saving word cloud\")\n return\n if len(self) > 0:\n log.debug(f\"Saving word cloud to {filename}\")\n with open(str(filename), mode=\"w\", newline=\"\") as f:\n c = csv.writer(f, delimiter=\"\\t\")\n for word, count in self.json(limit).items():\n c.writerow([count, word])\n log.debug(f\"Saved word cloud ({len(self):,} words) to {filename}\")\n return True, filename\n else:\n log.debug(f\"No words to save\")\n except Exception as e:\n import traceback\n\n log.warning(f\"Failed to save word cloud to {filename}: {e}\")\n log.trace(traceback.format_exc())\n return False, filename\n\n def load(self, filename=None):\n \"\"\"\n Loads a word cloud from a file. The file can be either a standard wordlist with one entry per line\n or a .tsv (tab-separated) file where the first row is the count and the second row is the associated entry.\n\n Args:\n filename (str, optional): The path to the file from which to load the word cloud. 
If None, uses a default filename.\n \"\"\"\n if filename is None:\n wordcloud_path = self.default_filename\n else:\n wordcloud_path = Path(filename).resolve()\n log.verbose(f\"Loading word cloud from {wordcloud_path}\")\n try:\n with open(str(wordcloud_path), newline=\"\") as f:\n c = csv.reader(f, delimiter=\"\\t\")\n for row in c:\n if len(row) == 1:\n self.add_word(row[0])\n elif len(row) == 2:\n with suppress(Exception):\n count, word = row\n count = int(count)\n self[word] = count\n if len(self) > 0:\n log.success(f\"Loaded word cloud ({len(self):,} words) from {wordcloud_path}\")\n except Exception as e:\n import traceback\n\n log_fn = log.debug\n if filename is not None:\n log_fn = log.warning\n log_fn(f\"Failed to load word cloud from {wordcloud_path}: {e}\")\n if filename is not None:\n log.trace(traceback.format_exc())\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.absorb_event","title":"absorb_event","text":"absorb_event(event)\n
Absorbs an event from a BBOT scan into the word cloud.
This method updates the word cloud by extracting words from the given event. It aims to avoid including PTR (Pointer) records, as they tend to produce unhelpful mutations in the word cloud.
Parameters:
event
(Event
) \u2013 The event object containing the words to be absorbed into the word cloud.
bbot/core/helpers/wordcloud.py
def absorb_event(self, event):\n \"\"\"\n Absorbs an event from a BBOT scan into the word cloud.\n\n This method updates the word cloud by extracting words from the given event. It aims to avoid including PTR\n (Pointer) records, as they tend to produce unhelpful mutations in the word cloud.\n\n Args:\n event (Event): The event object containing the words to be absorbed into the word cloud.\n \"\"\"\n for word in event.words:\n self.add_word(word)\n if event.scope_distance == 0 and event.type.startswith(\"DNS_NAME\"):\n subdomain = tldextract(event.data).subdomain\n if subdomain and not self.parent_helper.is_ptr(subdomain):\n for s in subdomain.split(\".\"):\n self.dns_mutator.add_word(s)\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.absorb_word","title":"absorb_word","text":"absorb_word(word, wordninja=True)\n
Absorbs a word into the word cloud after splitting it using a word extraction algorithm.
This method splits the input word into smaller meaningful words using word extraction, and then adds each of them to the word cloud. The splitting is done using a predefined algorithm in the parent helper.
Parameters:
word
(str
) \u2013 The word to be split and absorbed into the word cloud.
wordninja
(bool
, default: True
) \u2013 If True, word extraction is enabled. Defaults to True.
Examples:
>>> self.helpers.word_cloud.absorb_word(\"blacklantern\")\n>>> print(self.helpers.word_cloud)\n{\n \"blacklantern\": 1,\n \"black\": 1,\n \"bl\": 1,\n \"lantern\": 1\n}\n
Source code in bbot/core/helpers/wordcloud.py
def absorb_word(self, word, wordninja=True):\n \"\"\"\n Absorbs a word into the word cloud after splitting it using a word extraction algorithm.\n\n This method splits the input word into smaller meaningful words using word extraction, and then adds each\n of them to the word cloud. The splitting is done using a predefined algorithm in the parent helper.\n\n Args:\n word (str): The word to be split and absorbed into the word cloud.\n wordninja (bool, optional): If True, word extraction is enabled. Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.absorb_word(\"blacklantern\")\n >>> print(self.helpers.word_cloud)\n {\n \"blacklantern\": 1,\n \"black\": 1,\n \"bl\": 1,\n \"lantern\": 1\n }\n \"\"\"\n for w in self.parent_helper.extract_words(word, wordninja=wordninja):\n self.add_word(w)\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.add_word","title":"add_word","text":"add_word(word, lowercase=True)\n
Adds a word to the word cloud.
This method updates the word cloud by adding a given word. If the word already exists in the cloud, its frequency count is incremented by 1. Optionally, the word can be converted to lowercase before adding.
Parameters:
word
(str
) \u2013 The word to be added to the word cloud.
lowercase
(bool
, default: True
) \u2013 If True, the word will be converted to lowercase before adding. Defaults to True.
Examples:
>>> self.helpers.word_cloud.add_word(\"Example\")\n>>> self.helpers.word_cloud.add_word(\"example\")\n>>> print(self.helpers.word_cloud)\n{'example': 2}\n
Source code in bbot/core/helpers/wordcloud.py
def add_word(self, word, lowercase=True):\n \"\"\"\n Adds a word to the word cloud.\n\n This method updates the word cloud by adding a given word. If the word already exists in the cloud,\n its frequency count is incremented by 1. Optionally, the word can be converted to lowercase before adding.\n\n Args:\n word (str): The word to be added to the word cloud.\n lowercase (bool, optional): If True, the word will be converted to lowercase before adding. Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.add_word(\"Example\")\n >>> self.helpers.word_cloud.add_word(\"example\")\n >>> print(self.helpers.word_cloud)\n {'example': 2}\n \"\"\"\n if lowercase:\n word = word.lower()\n try:\n self[word] += 1\n except KeyError:\n self[word] = 1\n
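The counting behaviour can be tried outside BBOT with a plain dict subclass. This is only a minimal sketch: `MiniCloud` is a hypothetical stand-in, not the real `WordCloud` class (which needs a parent helper), and it uses `dict.get` where the original uses try/except.

```python
class MiniCloud(dict):
    '''Hypothetical stand-in for WordCloud: a dict mapping word -> count.'''

    def add_word(self, word, lowercase=True):
        # Normalise case first so 'Example' and 'example' share one counter
        if lowercase:
            word = word.lower()
        # Start at 0 for unseen words, then increment
        self[word] = self.get(word, 0) + 1


cloud = MiniCloud()
cloud.add_word('Example')
cloud.add_word('example')
cloud.add_word('Example', lowercase=False)
print(dict(cloud))  # {'example': 2, 'Example': 1}
```

Passing `lowercase=False` keeps the original casing, so the same word can end up under two keys.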
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.get_number_mutations","title":"get_number_mutations","text":"get_number_mutations(base, n=5, padding=2)\n
Generates mutations of a base string by modifying the numerical parts or appending numbers.
This method detects existing numbers in the base string and tries incrementing and decrementing them within a specified range. It also appends numbers at the end or after each word to generate more mutations.
Parameters:
base
(str
) \u2013 The base string to generate mutations from.
n
(int
, default: 5
) \u2013 The range of numbers to use for incrementing/decrementing. Defaults to 5.
padding
(int
, default: 2
) \u2013 Zero-pad numbers up to this length. Defaults to 2.
Returns:
set
\u2013 A set of mutated strings based on the base input.
Examples:
>>> self.helpers.word_cloud.get_number_mutations(\"www2-test\", n=2)\n{\n \"www0-test\",\n \"www1-test\",\n \"www2-test\",\n \"www2-test0\",\n \"www2-test00\",\n \"www2-test01\",\n \"www2-test1\",\n \"www3-test\",\n \"www4-test\"\n}\n
Source code in bbot/core/helpers/wordcloud.py
def get_number_mutations(self, base, n=5, padding=2):\n \"\"\"\n Generates mutations of a base string by modifying the numerical parts or appending numbers.\n\n This method detects existing numbers in the base string and tries incrementing and decrementing them within a\n specified range. It also appends numbers at the end or after each word to generate more mutations.\n\n Args:\n base (str): The base string to generate mutations from.\n n (int, optional): The range of numbers to use for incrementing/decrementing. Defaults to 5.\n padding (int, optional): Zero-pad numbers up to this length. Defaults to 2.\n\n Returns:\n set: A set of mutated strings based on the base input.\n\n Examples:\n >>> self.helpers.word_cloud.get_number_mutations(\"www2-test\", n=2)\n {\n \"www0-test\",\n \"www1-test\",\n \"www2-test\",\n \"www2-test0\",\n \"www2-test00\",\n \"www2-test01\",\n \"www2-test1\",\n \"www3-test\",\n \"www4-test\"\n }\n \"\"\"\n results = set()\n\n # detects numbers and increments/decrements them\n # e.g. 
for \"base2_p013\", we would try:\n # - \"base0_p013\" through \"base12_p013\"\n # - \"base2_p003\" through \"base2_p023\"\n # limited to three iterations for sanity's sake\n for match in list(self.parent_helper.regexes.num_regex.finditer(base))[-3:]:\n span = match.span()\n before = base[: span[0]]\n after = base[span[-1] :]\n number = base[span[0] : span[-1]]\n numlen = len(number)\n maxnum = min(int(\"9\" * numlen), int(number) + n)\n minnum = max(0, int(number) - n)\n for i in range(minnum, maxnum + 1):\n filled_num = str(i).zfill(numlen)\n results.add(f\"{before}{filled_num}{after}\")\n if not number.startswith(\"0\"):\n results.add(f\"{before}{i}{after}\")\n\n # appends numbers after each word\n # e.g., for \"base_www\", we would try:\n # - \"base1_www\", \"base2_www\", etc.\n # - \"base_www1\", \"base_www2\", etc.\n # limited to three iterations for sanity's sake\n number_suffixes = self.parent_helper.gen_numbers(n, padding)\n for match in list(self.parent_helper.regexes.word_regex.finditer(base))[-3:]:\n span = match.span()\n for suffix in number_suffixes:\n before = base[: span[-1]]\n after = base[span[-1] :]\n # skip if there's already a number\n if len(after) > 1 and not after[0].isdigit():\n results.add(f\"{before}{suffix}{after}\")\n # basic cases so we don't miss anything\n for s in number_suffixes:\n results.add(f\"{base}{s}\")\n results.add(base)\n\n return results\n
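The increment/decrement step can be sketched on its own. This is a simplified illustration, not the method above: `bump_numbers` is a hypothetical name, the digit pattern `[0-9]+` is an assumption standing in for the helper's `num_regex`, and the suffix-appending step is left out.

```python
import re


def bump_numbers(base, n=2):
    '''Simplified sketch: vary each run of digits in base by +/- n.'''
    results = set()
    for match in re.finditer('[0-9]+', base):
        start, end = match.span()
        number = match.group()
        width = len(number)
        # Stay within the original digit count, e.g. '2' never becomes '10'
        high = min(int('9' * width), int(number) + n)
        low = max(0, int(number) - n)
        for i in range(low, high + 1):
            # zfill keeps zero padding such as '013' -> '012'
            results.add(base[:start] + str(i).zfill(width) + base[end:])
    return results


print(sorted(bump_numbers('www2-test', n=2)))
# ['www0-test', 'www1-test', 'www2-test', 'www3-test', 'www4-test']
```

Because the upper bound is capped at the largest number with the same digit count, a single-digit number never grows past 9 in this sketch.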
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.json","title":"json","text":"json(limit=None)\n
Returns the word cloud as a sorted OrderedDict, optionally truncated to the top limit
entries.
Parameters:
limit
(int
, default: None
) \u2013 The maximum number of entries to include in the returned OrderedDict. If None, all entries are included.
Returns:
OrderedDict
\u2013 A dictionary sorted by word frequencies, potentially truncated to the top limit
entries.
Examples:
>>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n>>> self.helpers.word_cloud.json(limit=2)\nOrderedDict([('cherry', 8), ('apple', 5)])\n
Source code in bbot/core/helpers/wordcloud.py
def json(self, limit=None):\n \"\"\"\n Returns the word cloud as a sorted OrderedDict, optionally truncated to the top `limit` entries.\n\n Args:\n limit (int, optional): The maximum number of entries to include in the returned OrderedDict. If None, all entries are included.\n\n Returns:\n OrderedDict: A dictionary sorted by word frequencies, potentially truncated to the top `limit` entries.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.json(limit=2)\n OrderedDict([('cherry', 8), ('apple', 5)])\n \"\"\"\n cloud_sorted = sorted(self.items(), key=lambda x: x[-1], reverse=True)\n if limit is not None:\n cloud_sorted = cloud_sorted[:limit]\n return OrderedDict(cloud_sorted)\n
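The sort-then-slice idea works on any plain dict, which makes it easy to check by hand. A minimal sketch with made-up counts:

```python
from collections import OrderedDict

counts = {'apple': 5, 'banana': 2, 'cherry': 8}

# Sort (word, count) pairs by count, highest first
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Keep only the two most frequent entries
top_two = OrderedDict(ranked[:2])
print(list(top_two.items()))  # [('cherry', 8), ('apple', 5)]
```

Python's sort is stable, so words with equal counts keep their original insertion order.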
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.load","title":"load","text":"load(filename=None)\n
Loads a word cloud from a file. The file can be either a standard wordlist with one entry per line or a .tsv (tab-separated) file where the first column is the count and the second column is the associated entry.
Parameters:
filename
(str
, default: None
) \u2013 The path to the file from which to load the word cloud. If None, uses a default filename.
bbot/core/helpers/wordcloud.py
def load(self, filename=None):\n \"\"\"\n Loads a word cloud from a file. The file can be either a standard wordlist with one entry per line\n or a .tsv (tab-separated) file where the first row is the count and the second row is the associated entry.\n\n Args:\n filename (str, optional): The path to the file from which to load the word cloud. If None, uses a default filename.\n \"\"\"\n if filename is None:\n wordcloud_path = self.default_filename\n else:\n wordcloud_path = Path(filename).resolve()\n log.verbose(f\"Loading word cloud from {wordcloud_path}\")\n try:\n with open(str(wordcloud_path), newline=\"\") as f:\n c = csv.reader(f, delimiter=\"\\t\")\n for row in c:\n if len(row) == 1:\n self.add_word(row[0])\n elif len(row) == 2:\n with suppress(Exception):\n count, word = row\n count = int(count)\n self[word] = count\n if len(self) > 0:\n log.success(f\"Loaded word cloud ({len(self):,} words) from {wordcloud_path}\")\n except Exception as e:\n import traceback\n\n log_fn = log.debug\n if filename is not None:\n log_fn = log.warning\n log_fn(f\"Failed to load word cloud from {wordcloud_path}: {e}\")\n if filename is not None:\n log.trace(traceback.format_exc())\n
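Both accepted formats can be exercised without touching the filesystem, since `csv.reader` takes any iterable of lines. The sketch below is an approximation of the parsing loop only, using a plain dict and made-up rows. It assumes the count is in the first column and the word in the second.

```python
import csv

TAB = chr(9)  # tab character, written this way to avoid escape sequences

# One bare wordlist entry, two valid count/word rows, one malformed row
lines = [
    'Admin',
    TAB.join(['8', 'cherry']),
    TAB.join(['5', 'apple']),
    TAB.join(['many', 'pear']),
]

cloud = {}
for row in csv.reader(lines, delimiter=TAB):
    if len(row) == 1:
        # Plain wordlist entry: lowercase it and count one occurrence
        word = row[0].lower()
        cloud[word] = cloud.get(word, 0) + 1
    elif len(row) == 2:
        count, word = row
        try:
            cloud[word] = int(count)
        except ValueError:
            # Non-numeric count: skip the row quietly
            continue

print(cloud)  # {'admin': 1, 'cherry': 8, 'apple': 5}
```

Note that a two-column row overwrites any existing count for that word rather than adding to it.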
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.mutations","title":"mutations","text":"mutations(words, devops=True, cloud=True, letters=True, numbers=5, number_padding=2, substitute_numbers=True)\n
Generate various mutations for the given list of words based on different criteria.
Yields tuples of strings which can be joined on the desired delimiter, e.g. \"-\" or \"_\".
Parameters:
words
(Union[str, Iterable[str]]
) \u2013 A single word or list of words to mutate.
devops
(bool
, default: True
) \u2013 Whether to include devops-related mutations.
cloud
(bool
, default: True
) \u2013 Whether to include mutations from the word cloud.
letters
(bool
, default: True
) \u2013 Whether to include letter-based mutations.
numbers
(int
, default: 5
) \u2013 The maximum numeric mutations to include.
number_padding
(int
, default: 2
) \u2013 Padding for numeric mutations.
substitute_numbers
(bool
, default: True
) \u2013 Whether to substitute numbers in mutations.
Yields:
tuple
\u2013 A tuple containing each of the mutation segments.
bbot/core/helpers/wordcloud.py
def mutations(\n self, words, devops=True, cloud=True, letters=True, numbers=5, number_padding=2, substitute_numbers=True\n):\n \"\"\"\n Generate various mutations for the given list of words based on different criteria.\n\n Yields tuples of strings which can be joined on the desired delimiter, e.g. \"-\" or \"_\".\n\n Args:\n words (Union[str, Iterable[str]]): A single word or list of words to mutate.\n devops (bool): Whether to include devops-related mutations.\n cloud (bool): Whether to include mutations from the word cloud.\n letters (bool): Whether to include letter-based mutations.\n numbers (int): The maximum numeric mutations to include.\n number_padding (int): Padding for numeric mutations.\n substitute_numbers (bool): Whether to substitute numbers in mutations.\n\n Yields:\n tuple: A tuple containing each of the mutation segments.\n \"\"\"\n if isinstance(words, str):\n words = (words,)\n results = set()\n for word in words:\n h = hash(word)\n if not h in results:\n results.add(h)\n yield (word,)\n if numbers > 0:\n if substitute_numbers:\n for word in words:\n for number_mutation in self.get_number_mutations(word, n=numbers, padding=number_padding):\n h = hash(number_mutation)\n if not h in results:\n results.add(h)\n yield (number_mutation,)\n for word in words:\n for modifier in self.modifiers(\n devops=devops, cloud=cloud, letters=letters, numbers=numbers, number_padding=number_padding\n ):\n a = (word, modifier)\n b = (modifier, word)\n for _ in (a, b):\n h = hash(_)\n if h not in results:\n results.add(h)\n yield _\n
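Since the generator yields tuples rather than finished strings, the caller decides how to join them. The sketch below shows one way to do that. `join_mutations` and the sample tuples are hypothetical and only stand in for real output from `mutations()`.

```python
def join_mutations(mutations, delimiters=('-', '_', '')):
    '''Join each tuple of segments on every delimiter, skipping duplicates.'''
    seen = set()
    for segments in mutations:
        for delimiter in delimiters:
            candidate = delimiter.join(segments)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


# Hypothetical sample of tuples in the shape mutations() yields
sample = [('www',), ('www', 'dev'), ('dev', 'www')]
print(list(join_mutations(sample)))
# ['www', 'www-dev', 'www_dev', 'wwwdev', 'dev-www', 'dev_www', 'devwww']
```

Single-segment tuples produce the same string for every delimiter, which is why the sketch deduplicates.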
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.save","title":"save","text":"save(filename=None, limit=None)\n
Saves the word cloud to a file. The cloud can optionally be truncated to the top limit
entries.
Parameters:
filename
(str
, default: None
) \u2013 The path to the file where the word cloud will be saved. If None, uses a default filename.
limit
(int
, default: None
) \u2013 The maximum number of entries to save to the file. If None, all entries are saved.
Returns:
tuple
\u2013 A tuple containing a boolean indicating success or failure, and the resolved filename.
Examples:
>>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n>>> self.helpers.word_cloud.save(filename=\"word_cloud.txt\", limit=2)\n(True, Path('word_cloud.txt'))\n
Source code in bbot/core/helpers/wordcloud.py
def save(self, filename=None, limit=None):\n \"\"\"\n Saves the word cloud to a file. The cloud can optionally be truncated to the top `limit` entries.\n\n Args:\n filename (str, optional): The path to the file where the word cloud will be saved. If None, uses a default filename.\n limit (int, optional): The maximum number of entries to save to the file. If None, all entries are saved.\n\n Returns:\n tuple: A tuple containing a boolean indicating success or failure, and the resolved filename.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.save(filename=\"word_cloud.txt\", limit=2)\n (True, Path('word_cloud.txt'))\n \"\"\"\n if filename is None:\n filename = self.default_filename\n else:\n filename = Path(filename).resolve()\n try:\n if not self.parent_helper.mkdir(filename.parent):\n log.error(f\"Failure creating or error writing to {filename.parent} when saving word cloud\")\n return\n if len(self) > 0:\n log.debug(f\"Saving word cloud to {filename}\")\n with open(str(filename), mode=\"w\", newline=\"\") as f:\n c = csv.writer(f, delimiter=\"\\t\")\n for word, count in self.json(limit).items():\n c.writerow([count, word])\n log.debug(f\"Saved word cloud ({len(self):,} words) to {filename}\")\n return True, filename\n else:\n log.debug(f\"No words to save\")\n except Exception as e:\n import traceback\n\n log.warning(f\"Failed to save word cloud to {filename}: {e}\")\n log.trace(traceback.format_exc())\n return False, filename\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.truncate","title":"truncate","text":"truncate(limit)\n
Truncates the word cloud dictionary to retain only the top limit
entries based on their occurrence frequencies.
Parameters:
limit
(int
) \u2013 The maximum number of entries to retain in the word cloud.
Examples:
>>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n>>> self.helpers.word_cloud.truncate(2)\n>>> self.helpers.word_cloud\n{'cherry': 8, 'apple': 5}\n
Source code in bbot/core/helpers/wordcloud.py
def truncate(self, limit):\n \"\"\"\n Truncates the word cloud dictionary to retain only the top `limit` entries based on their occurrence frequencies.\n\n Args:\n limit (int): The maximum number of entries to retain in the word cloud.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.truncate(2)\n >>> self.helpers.word_cloud\n {'cherry': 8, 'apple': 5}\n \"\"\"\n new_self = dict(self.json(limit=limit))\n self.clear()\n self.update(new_self)\n
"},{"location":"modules/custom_yara_rules/","title":"Custom Yara Rules","text":""},{"location":"modules/custom_yara_rules/#overview","title":"Overview","text":"Through the excavate
internal module, BBOT supports searching through HTTP response data using custom YARA rules.
This feature can be utilized with the command line option --custom-yara-rules
or -cy
, followed by a file containing the YARA rules.
Example:
bbot -m httpx --custom-yara-rules=test.yara -t http://example.com/\n
Where test.yara
is a file on the filesystem. The file can contain multiple YARA rules, separated by lines.
YARA rules can be quite simple, the simplest example being a single string search:
rule find_string {\n strings:\n $str1 = \"AAAABBBB\"\n\n condition:\n $str1\n}\n
To look for multiple strings and match if any one of them hits:
rule find_string {\n strings:\n $str1 = \"AAAABBBB\"\n $str2 = \"CCCCDDDD\"\n\n condition:\n any of them\n}\n
One of the most important capabilities is the use of regexes within the rule, as shown in the following example.
rule find_AAAABBBB_regex {\n strings:\n $regex = /A{1,4}B{1,4}/\n\n condition:\n $regex\n}\n
Note: YARA uses its own regex engine, which is not a 1:1 match with Python's. This means many existing regexes will have to be modified before they work with YARA. The good news is that YARA's regex engine is fast, far faster than Python's.
Further discussion of the art of writing complex YARA rules goes far beyond the scope of this documentation. A good place to start learning more is the official YARA documentation.
The YARA engine provides plenty of room to make highly complex signatures possible, with various conditional operators available. Multiple signatures can be linked together to create sophisticated detection rules that can identify a wide range of specific content. This flexibility allows the crafting of efficient rules for detecting security vulnerabilities, leveraging logical operators, regular expressions, and other powerful features. Additionally, YARA's modular structure supports easy updates and maintenance of signature sets.
"},{"location":"modules/custom_yara_rules/#custom-options","title":"Custom options","text":"BBOT supports the use of a few custom meta
attributes within YARA rules, which will alter the behavior of the rule and the post-processing of the results.
The description of the rule. Will end up in the description of any produced events if defined.
Example with no description provided:
[FINDING] {\"description\": \"Custom Yara Rule [find_string] Matched via identifier [str1]\", \"host\": \"example.com\", \"url\": \"http://example.com\"} excavate\n
Example with the description added:
[FINDING] {\"description\": \"Custom Yara Rule [AAAABBBB] with description: [contains our test string] Matched via identifier [str1]\", \"host\": \"example.com\", \"url\": \"http://example.com\"} excavate\n
That FINDING was produced with the following signature:
rule AAAABBBB {\n\n meta:\n description = \"contains our test string\"\n strings:\n $str1 = \"AAAABBBB\"\n condition:\n $str1\n}\n
"},{"location":"modules/custom_yara_rules/#tags","title":"tags","text":"Tags specified with this option will be passed-on to any resulting emitted events. Tags are provided as a comma separated string, as shown below:
Lets expand on the previous example:
rule AAAABBBB {\n\n meta:\n description = \"contains our test string\"\n tags = \"tag1,tag2,tag3\"\n strings:\n $str1 = \"AAAABBBB\"\n condition:\n $str1\n}\n
Now, the BBOT FINDING includes these custom tags, as with the following output:
[FINDING] {\"description\": \"Custom Yara Rule [AAAABBBB] with description: [contains our test string] Matched via identifier [str1]\", \"host\": \"example.com\", \"url\": \"http://example.com/\"} excavate (tag1, tag2, tag3)\n
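A comma-separated tags string can be split into individual tags as sketched below. This only illustrates the format. `parse_tags` is a hypothetical helper, and the whitespace handling is an assumption rather than a description of how BBOT parses the value.

```python
def parse_tags(meta_value):
    '''Split a comma-separated tags string, dropping blanks and stray spaces.'''
    return [tag.strip() for tag in meta_value.split(',') if tag.strip()]


print(parse_tags('tag1,tag2, tag3,'))  # ['tag1', 'tag2', 'tag3']
```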
"},{"location":"modules/custom_yara_rules/#emit_match","title":"emit_match","text":"When set to True, the contents returned from a successful extraction via a YARA regex will be included in the FINDING event which is emitted.
Consider the following example YARA rule:
rule SubstackLink\n{\n meta:\n description = \"contains a Substack link\"\n emit_match = true\n strings:\n $substack_link = /https?:\\/\\/[a-zA-Z0-9.-]+\\.substack\\.com/\n condition:\n $substack_link\n}\n
When run against the Black Lantern Security homepage with the following BBOT command:
bbot -m httpx --custom-yara-rules=substack.yara -t http://www.blacklanternsecurity.com/\n
We get the following result. Note that the finding now contains the actual link that was identified with the regex.
[FINDING] {\"description\": \"Custom Yara Rule [SubstackLink] with description: [contains a Substack link] Matched via identifier [substack_link] and extracted [https://blacklanternsecurity.substack.com]\", \"host\": \"www.blacklanternsecurity.com\", \"url\": \"https://www.blacklanternsecurity.com/\"} excavate\n
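Before putting a regex into a rule, it can help to prototype it against sample text. The sketch below uses Python's `re` module with a pattern equivalent to the one in the rule, with dots written as `[.]`. YARA's regex engine differs from Python's, so this is a rough check rather than a guarantee that the rule behaves identically. The sample body is made up.

```python
import re

# Equivalent to the $substack_link pattern, with literal dots written as [.]
SUBSTACK_LINK = re.compile('https?://[a-zA-Z0-9.-]+[.]substack[.]com')

# Made-up fragment of an HTTP response body
body = '<a href=https://blacklanternsecurity.substack.com/p/post>Blog</a>'

print(SUBSTACK_LINK.findall(body))
# ['https://blacklanternsecurity.substack.com']
```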
"},{"location":"modules/internal_modules/","title":"List of Modules","text":""},{"location":"modules/internal_modules/#what-are-internal-modules","title":"What are internal modules?","text":"Internal modules are just like regular modules, except that they run all the time. They do not have to be explicitly enabled. They can, however, be explicitly disabled if needed.
Turning them off is simple: each one has a root-level config option that can be set to False to disable it:
# Infer certain events from others, e.g. IPs from IP ranges, DNS_NAMEs from URLs, etc.\nspeculate: True\n# Passively search event data for URLs, hostnames, emails, etc.\nexcavate: True\n# Summarize activity at the end of a scan\naggregate: True\n# DNS resolution\ndnsresolve: True\n# Cloud provider tagging\ncloudcheck: True\n
These modules provide core functionality that is normally essential for a typical BBOT scan. Let's take a quick look at each one:
"},{"location":"modules/internal_modules/#aggregate","title":"aggregate","text":"Summarize statistics at the end of a scan. Disable if you don't want to see this table.
"},{"location":"modules/internal_modules/#cloud","title":"cloud","text":"The cloud module looks at events and tries to determine if they are associated with a cloud provider and tags them as such, and can also identify certain cloud resources
"},{"location":"modules/internal_modules/#dns","title":"dns","text":"The DNS internal module controls the basic DNS resoultion the BBOT performs, and all of the supporting machinery like wildcard detection, etc.
"},{"location":"modules/internal_modules/#excavate","title":"excavate","text":"The excavate internal module designed to passively extract valuable information from HTTP response data. It primarily uses YARA regexes to extract information, with various events being produced from the post-processing of the YARA results.
Here is a summary of the data it produces:
"},{"location":"modules/internal_modules/#urls","title":"URLs","text":"By extracting URLs from all visited pages, this is actually already half of a web-spider. The other half is recursion, which is baked in to BBOT from the ground up. Therefore, protections are in place by default in the form of web_spider_distance
and web_spider_depth
settings. These settings govern restrictions to URLs recursively harvested from HTTP responses, preventing endless runaway scans. However, in the right situation the controlled use of a web-spider is extremely powerful.
Parameter Extraction The parameter extraction functionality identifies and extracts key web parameters from HTTP responses and produces WEB_PARAMETER
events. This includes parameters found in GET and POST requests, HTML forms, and jQuery requests. Currently, these are only used by the hunt
module, and by the paramminer
modules, to a limited degree. However, future functionality will make extensive use of these events.
Detect email addresses within HTTP_RESPONSE data.
"},{"location":"modules/internal_modules/#error-detection","title":"Error Detection","text":"Scans for verbose error messages in HTTP responses and raw text data. By identifying specific error signatures from various programming languages and frameworks, this feature helps uncover misconfigurations, debugging information, and potential vulnerabilities. This insight is invaluable for identifying weak points or anomalies in web applications.
"},{"location":"modules/internal_modules/#content-security-policy-csp-extraction","title":"Content Security Policy (CSP) Extraction","text":"The CSP extraction capability focuses on extracting domains from Content-Security-Policy headers. By analyzing these headers, BBOT can identify additional domains which can get fed back into the scan.
"},{"location":"modules/internal_modules/#serialization-detection","title":"Serialization Detection","text":"Serialized objects are a common source of serious security vulnerablities. Excavate aims to detect those used in Java, .NET, and PHP applications.
"},{"location":"modules/internal_modules/#functionality-detection","title":"Functionality Detection","text":"Looks for specific web functionalities such as file upload fields and WSDL URLs. By identifying these elements, BBOT can pinpoint areas of the application that may require further scrutiny for security vulnerabilities.
"},{"location":"modules/internal_modules/#non-http-scheme-detection","title":"Non-HTTP Scheme Detection","text":"The non-HTTP scheme detection capability extracts URLs with non-HTTP schemes, such as ftp, mailto, and javascript. By identifying these URLs, BBOT can uncover additional vectors for attack or information leakage.
"},{"location":"modules/internal_modules/#custom-yara-rules","title":"Custom Yara Rules","text":"Excavate supports the use of custom YARA rules, which wil be added to the other rules before the scan start. For more info, view this.
"},{"location":"modules/internal_modules/#speculate","title":"speculate","text":"Speculate is all about inferring one data type from another, particularly when certain tools like port scanners are not enabled. This is essential functionality for most BBOT scans, allowing for the discovery of web resources when starting with a DNS-only target list without a port scanner. It bridges gaps in the data, providing a more comprehensive view of the target by leveraging existing information.
For a list of module config options, see Module Options.
"},{"location":"modules/nuclei/","title":"Nuclei","text":""},{"location":"modules/nuclei/#overview","title":"Overview","text":"BBOT integrates with Nuclei, an open-source web vulnerability scanner by Project Discovery. This is one of the ways BBOT makes it possible to go from a single target domain/IP all the way to confirmed vulnerabilities, in one scan.
You can specify individual nuclei templates by setting modules.nuclei.templates to their comma-separated filenames:
bbot -m nuclei -c modules.nuclei.templates=http/takeovers/airee-takeover.yaml,http/takeovers/cargo-takeover.yaml\n
...or via the config:
modules:\n  nuclei:\n    templates: http/takeovers/airee-takeover.yaml,http/takeovers/cargo-takeover.yaml\n
"},{"location":"modules/nuclei/#configuration-and-options","title":"Configuration and Options","text":"The Nuclei module has many configuration options:
Config Option Type Description Default
modules.nuclei.batch_size int Number of targets to send to Nuclei per batch (default 200) 200
modules.nuclei.budget int Used in budget mode to set the number of requests which will be allotted to the nuclei scan 1
modules.nuclei.concurrency int maximum number of templates to be executed in parallel (default 25) 25
modules.nuclei.directory_only bool Filter out 'file' URL event (default True) True
modules.nuclei.etags str tags to exclude from the scan
modules.nuclei.mode str manual | technology | severe | budget. Technology: Only activate based on technology events that match nuclei tags (nuclei -as mode). Manual (DEFAULT): Fully manual settings. Severe: Only critical and high severity templates without intrusive. Budget: Limit Nuclei to a specified number of HTTP requests manual
modules.nuclei.ratelimit int maximum number of requests to send per second (default 150) 150
modules.nuclei.retries int number of times to retry a failed request (default 0) 0
modules.nuclei.severity str Filter based on severity field available in the template.
modules.nuclei.silent bool Don't display nuclei's banner or status messages False
modules.nuclei.tags str execute a subset of templates that contain the provided tags
modules.nuclei.templates str template or template directory paths to include in the scan
modules.nuclei.version str nuclei version 3.2.0
Most of these you probably will NOT want to change. In particular, we advise against changing the version of Nuclei, as it's possible the latest version won't work right with BBOT.
We also do not recommend changing directory_only mode. This will cause Nuclei to process every URL. Because BBOT is recursive, this can get very out-of-hand very quickly, depending on which other modules are in use.
"},{"location":"modules/nuclei/#modes","title":"Modes","text":"The modes with the Nuclei module are generally in place to help you limit the number of templates you are scanning with, to make your scans quicker.
"},{"location":"modules/nuclei/#manual","title":"Manual","text":"This is the default setting, and will use all templates. However, if you're looking to do something particular, you might pair this with some of the pass-through options shown in the next setting.
"},{"location":"modules/nuclei/#severe","title":"Severe","text":"severe mode uses only high/critical severity templates. It also excludes the intrusive tag. This is intended to be a shortcut for times when you need to rapidly identify high severity vulnerabilities but can't afford the full scan. Because most templates are INFO, LOW, or MEDIUM, your scan will finish much faster.
"},{"location":"modules/nuclei/#technology","title":"Technology","text":"This is equivalent to the Nuclei '-as' scan option. It only use templates that match detected technologies, using wappalyzer-based signatures. This can be a nice way to run a light-weight scan that still has a chance to find some good vulnerabilities.
"},{"location":"modules/nuclei/#budget","title":"Budget","text":"Budget mode is unique to BBOT.
For larger scans with thousands of targets, doing a FULL Nuclei scan (1000s of Requests) for each is not realistic. As an alternative to the other modes, you can take advantage of Nuclei's \"collapsible\" template feature.
For only the cost of one (or more) \"extra\" request(s) per host, it can activate several hundred templates. These are templates which happen to look at a BaseUrl, and typically look for a specific string or other attribute. Nuclei is smart about reusing the request data when it can, and we can use this to our advantage.
The budget parameter is the number of extra requests per host you are willing to send to \"feed\" Nuclei templates (defaults to 1). It is meant for those times when vulnerability scanning isn't the main focus, but you want to look for easy wins.
Of course, there is a rapidly diminishing return when you set the value to more than a handful. Eventually, this becomes 1 template per 1 budget value increase. However, in the 1-10 range there is a lot of value. This graphic should give you a rough visual idea of this concept.
"},{"location":"modules/nuclei/#nuclei-pass-through-options","title":"Nuclei pass-through options","text":"Most of the rest of the options are usually passed straight through to Nuclei when its executed. You can do things like set specific tags to include, (or exclude with etags), exactly how you'd do with Nuclei directly. You can also limit the templates with severity.
The ratelimit and concurrency settings default to the same defaults that Nuclei does. These are relatively sane settings, but if you are in a sensitive environment it can certainly help to turn them down.
templates will allow you to set your own templates directory. This can be very useful if you have your own custom templates that you want to use with BBOT.
"},{"location":"modules/nuclei/#example-commands","title":"Example Commands","text":"# Scan a SINGLE target with a basic port scan and web modules\nbbot -f web-basic -m portscan nuclei --allow-deadly -t app.evilcorp.com\n
# Scanning MULTIPLE targets\nbbot -f web-basic -m portscan nuclei --allow-deadly -t app1.evilcorp.com app2.evilcorp.com app3.evilcorp.com\n
# Scanning MULTIPLE targets while performing subdomain enumeration\nbbot -f subdomain-enum web-basic -m portscan nuclei --allow-deadly -t app1.evilcorp.com app2.evilcorp.com app3.evilcorp.com\n
# Scanning MULTIPLE targets on a BUDGET\nbbot -f subdomain-enum web-basic -m portscan nuclei --allow-deadly -c modules.nuclei.mode=budget -t app1.evilcorp.com app2.evilcorp.com app3.evilcorp.com\n
"},{"location":"scanning/","title":"Scanning Overview","text":""},{"location":"scanning/#scan-names","title":"Scan Names","text":"Every BBOT scan gets a random, mildly-entertaining name like demonic_jimmy
. Output for that scan, including scan stats and any web screenshots, is saved to a folder by that name in ~/.bbot/scans. The most recent 20 scans are kept, and older ones are removed.
If you don't want a random name, you can change it with -n. You can also change the location of BBOT's output with -o:
# save everything to the folder \"my_scan\" in the current directory\nbbot -t evilcorp.com -f subdomain-enum -m gowitness -n my_scan -o .\n
If you reuse a scan name, BBOT will automatically append to your previous output files.
"},{"location":"scanning/#targets-t","title":"Targets (-t
)","text":"Targets declare what's in-scope, and seed a scan with initial data. BBOT accepts an unlimited number of targets. They can be any of the following:
DNS_NAME (evilcorp.com)
IP_ADDRESS (1.2.3.4)
IP_RANGE (1.2.3.0/24)
OPEN_TCP_PORT (192.168.0.1:80)
URL (https://www.evilcorp.com)
Note that BBOT only discriminates down to the host level. This means, for example, if you specify a URL https://www.evilcorp.com as the target, the scan will be seeded with that URL, but the scope of the scan will be the entire host, www.evilcorp.com. Other ports/URLs on that same host may also be scanned.
You can specify targets directly on the command line, load them from files, or both! For example:
$ cat targets.txt\n4.3.2.1\n10.0.0.2:80\n1.2.3.0/24\nevilcorp.com\nevilcorp.co.uk\nhttps://www.evilcorp.co.uk\n\n# load targets from a file and from the command-line\n$ bbot -t targets.txt fsociety.com 5.6.7.0/24 -m nmap\n
On start, BBOT automatically converts Targets into Events.
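As a rough illustration of that conversion, the sketch below guesses an event type from a target string. It is a simplified stand-in, not BBOT's actual parser: the function name classify_target is hypothetical, and only the event type names are taken from the list above.

```python
import ipaddress


def classify_target(target):
    '''Guess an event type for a target string (illustrative only).'''
    # Anything with a scheme separator is treated as a URL.
    if '://' in target:
        return 'URL'
    # A bare IPv4/IPv6 address.
    try:
        ipaddress.ip_address(target)
        return 'IP_ADDRESS'
    except ValueError:
        pass
    # CIDR notation such as 1.2.3.0/24.
    try:
        ipaddress.ip_network(target, strict=False)
        return 'IP_RANGE'
    except ValueError:
        pass
    # host:port, where the port is purely numeric.
    host, sep, port = target.rpartition(':')
    if sep and host and port.isdigit():
        return 'OPEN_TCP_PORT'
    # Everything else is assumed to be a hostname.
    return 'DNS_NAME'


for target in ['4.3.2.1', '10.0.0.2:80', '1.2.3.0/24', 'evilcorp.com', 'https://www.evilcorp.co.uk']:
    print(target, '->', classify_target(target))
```

The checks run from most to least specific, so a plain IP address is never mistaken for a /32 range.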
"},{"location":"scanning/#modules-m","title":"Modules (-m
)","text":"To see a full list of modules and their descriptions, use bbot -l
or see List of Modules.
Modules are the part of BBOT that does the work -- port scanning, subdomain brute-forcing, API querying, etc. Modules consume Events (IP_ADDRESS, DNS_NAME, etc.) from each other, process the data in a useful way, then emit the results as new events. You can enable individual modules with -m.
# Enable modules: nmap, sslcert, and httpx\nbbot -t www.evilcorp.com -m nmap sslcert httpx\n
"},{"location":"scanning/#types-of-modules","title":"Types of Modules","text":"Modules fall into three categories:
Scan modules: nmap, sslcert, httpx, etc. Enable with -m.
Output modules: human, json, and csv are enabled by default. Enable others with -om. (See: Output)
Internal modules: enabled by default; they can be disabled in the config (e.g. -c speculate=false).
aggregate: Summarizes results at the end of a scan
excavate: Extracts useful data such as subdomains from webpages, etc.
speculate: Intelligently infers new events, e.g. OPEN_TCP_PORT from URL or IP_ADDRESS from IP_NETWORK.
For details on the inner workings of modules, see Creating a Module.
"},{"location":"scanning/#flags-f","title":"Flags (-f
)","text":"Flags are how BBOT categorizes its modules. In a way, you can think of them as groups. Flags let you enable a bunch of similar modules at the same time without having to specify them each individually. For example, -f subdomain-enum
would enable every module with the subdomain-enum
flag.
# list all subdomain-enum modules\nbbot -f subdomain-enum -l\n
"},{"location":"scanning/#filtering-modules","title":"Filtering Modules","text":"Modules can be easily enabled/disabled based on their flags:
-f Enable these flags (e.g. -f subdomain-enum)
-rf Require modules to have this flag (e.g. -rf safe)
-ef Exclude these flags (e.g. -ef slow)
-em Exclude these individual modules (e.g. -em ipneighbor)
-lf List all available flags
Every module is either safe or aggressive, and either active or passive. These can be useful for filtering. For example, if you wanted to enable all the safe modules, but exclude active ones, you could do:
# Enable safe modules but exclude active ones\nbbot -t evilcorp.com -f safe -ef active\n
This is equivalent to requiring the passive flag:
# Enable safe modules but only if they're also passive\nbbot -t evilcorp.com -f safe -rf passive\n
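The equivalence of those two commands can be checked with a small model of flag filtering. This is only a sketch of the set logic, not BBOT's implementation: the select_modules helper is hypothetical, and the flag sets for httpx and nuclei are assumed for the sake of the example (only the securitytrails flags are taken from this page).

```python
def select_modules(modules, enable=(), require=(), exclude=()):
    '''Return module names picked by -f (enable), -rf (require), -ef (exclude).'''
    enable, require, exclude = set(enable), set(require), set(exclude)
    selected = []
    for name, flags in modules.items():
        flags = set(flags)
        if not flags & enable:
            continue  # -f: module must carry at least one enabled flag
        if not require <= flags:
            continue  # -rf: module must carry every required flag
        if flags & exclude:
            continue  # -ef: module must carry none of the excluded flags
        selected.append(name)
    return sorted(selected)


# Flag sets below are illustrative assumptions, apart from securitytrails.
MODULES = {
    'securitytrails': {'passive', 'safe', 'subdomain-enum'},
    'httpx': {'active', 'safe', 'web-basic'},
    'nuclei': {'active', 'aggressive', 'deadly'},
}

print(select_modules(MODULES, enable=['safe'], exclude=['active']))
print(select_modules(MODULES, enable=['safe'], require=['passive']))
```

Both calls select the same modules because every module is either active or passive, so excluding one flag is the same as requiring the other.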
A single module can have multiple flags. For example, the securitytrails module is passive, safe, subdomain-enum. Below is a full list of flags and their associated modules.
BBOT modules have external dependencies ranging from OS packages (openssl) to binaries (nmap) to Python libraries (wappalyzer). When a module is enabled, installation of its dependencies happens at runtime with Ansible. BBOT provides several command-line flags to control how dependencies are installed.
--no-deps - Don't install module dependencies
--force-deps - Force install all module dependencies
--retry-deps - Try again to install failed module dependencies
--ignore-failed-deps - Run modules even if they have failed dependencies
--install-all-deps - Install dependencies for all modules (useful if you are provisioning a pentest system and want to install everything ahead of time)
For details on how Ansible playbooks are attached to BBOT modules, see How to Write a Module.
"},{"location":"scanning/#scope","title":"Scope","text":"For pentesters and bug bounty hunters, staying in scope is extremely important. BBOT takes this seriously, meaning that active modules (e.g. nuclei
) will only touch in-scope resources.
By default, scope is whatever you specify with -t. This includes child subdomains. For example, if you specify -t evilcorp.com, all its subdomains (www.evilcorp.com, mail.evilcorp.com, etc.) also become in-scope.
Since BBOT is recursive, it would quickly resort to scanning the entire internet without some kind of restraining mechanism. To solve this problem, every event discovered by BBOT is assigned a Scope Distance. Scope distance represents how far out from the main scope that data was discovered.
For example, if your target is evilcorp.com, www.evilcorp.com would have a scope distance of 0 (i.e. in-scope). If BBOT discovers that www.evilcorp.com resolves to 1.2.3.4, 1.2.3.4 is one hop away, which means it would have a scope distance of 1. If 1.2.3.4 has a PTR record that points to ecorp.blob.core.windows.net, ecorp.blob.core.windows.net is two hops away, so its scope distance is 2.
Scope distance continues to increase the further out you get. Most modules (e.g. nuclei and nmap) only consume in-scope events. Certain other passive modules such as asn accept out to distance 1. By default, DNS resolution happens out to a distance of 2. Upon its discovery, any event that's determined to be in-scope (e.g. www.evilcorp.com) immediately becomes distance 0, and the cycle starts over.
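The hop counting described above can be sketched in a few lines. This is a simplified model under stated assumptions, not BBOT's code: the helper names are hypothetical, targets are treated as plain hostnames, and api.evilcorp.com is an invented host used to show the reset to 0.

```python
def is_in_scope(host, targets):
    '''A host is in scope if it equals a target or is one of its subdomains.'''
    return any(host == t or host.endswith('.' + t) for t in targets)


def next_distance(parent_distance, host, targets):
    '''Scope distance of a host discovered from a parent event.'''
    if is_in_scope(host, targets):
        return 0  # in-scope discoveries reset the distance
    return parent_distance + 1  # otherwise one hop further out


targets = ['evilcorp.com']
# Each host is assumed to have been discovered from the one before it.
chain = ['www.evilcorp.com', '1.2.3.4', 'ecorp.blob.core.windows.net', 'api.evilcorp.com']
distance = 0
for host in chain:
    distance = next_distance(distance, host, targets)
    print(host, distance)
```

A module that only consumes in-scope events would, in this model, simply ignore anything whose distance is greater than 0.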
By default, BBOT only displays in-scope events (with a few exceptions such as STORAGE_BUCKETs). If you want to see more, you must increase the config value of scope.report_distance:
# display out-of-scope events up to one hop away from the main scope\nbbot -t evilcorp.com -f subdomain-enum -c scope.report_distance=1\n
"},{"location":"scanning/#strict-scope","title":"Strict Scope","text":"If you want to scan only that specific target hostname and none of its children, you can specify --strict-scope
.
Note that --strict-scope only applies to targets and whitelists, but not blacklists. This means that if you put internal.evilcorp.com in your blacklist, you can be sure none of its subdomains will be scanned, even when using --strict-scope.
BBOT allows precise control over scope with whitelists and blacklists. These both use the same syntax as --targets, meaning they accept the same event types, and you can specify an unlimited number of them, via a file, the CLI, or both.
--whitelist enables you to override what's in scope. For example, if you want to run nuclei against evilcorp.com, but stay only inside their corporate IP range of 1.2.3.0/24, you can accomplish this like so:
# Seed scan with evilcorp.com, but restrict scope to 1.2.3.0/24\nbbot -t evilcorp.com --whitelist 1.2.3.0/24 -f subdomain-enum -m nmap nuclei --allow-deadly\n
--blacklist takes ultimate precedence. Anything in the blacklist is completely excluded from the scan, even if it's in the whitelist.
# Scan evilcorp.com, but exclude internal.evilcorp.com and its children\nbbot -t evilcorp.com --blacklist internal.evilcorp.com -f subdomain-enum -m nmap nuclei --allow-deadly\n
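The precedence rules for whitelists, blacklists, and strict scope can be summarized as a small decision function. The sketch below is an approximation for illustration only: host_matches and in_scope are hypothetical names, and entries are limited to hostnames, IP addresses, and CIDR ranges.

```python
import ipaddress


def host_matches(host, entry, strict=False):
    '''True if host falls under a whitelist/blacklist entry.'''
    try:
        network = ipaddress.ip_network(entry, strict=False)
    except ValueError:
        network = None
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None
    if network is not None:
        # IP or CIDR entry: only IP hosts can match.
        return address is not None and address in network
    if address is not None:
        return False  # an IP host never matches a hostname entry
    if host == entry:
        return True
    # Subdomains match unless strict scope is requested.
    return not strict and host.endswith('.' + entry)


def in_scope(host, whitelist, blacklist, strict_scope=False):
    # The blacklist wins, and always covers children, even in strict scope.
    if any(host_matches(host, entry) for entry in blacklist):
        return False
    return any(host_matches(host, entry, strict=strict_scope) for entry in whitelist)


print(in_scope('1.2.3.77', ['1.2.3.0/24'], []))
print(in_scope('dev.internal.evilcorp.com', ['evilcorp.com'], ['internal.evilcorp.com']))
```

When no whitelist is given, the targets themselves would be passed as the whitelist, matching the default described above.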
"},{"location":"scanning/#dns-wildcards","title":"DNS Wildcards","text":"BBOT has robust wildcard detection built-in. It can reliably detect wildcard domains, and will tag them accordingly:
[DNS_NAME] github.io TARGET (a-record, a-wildcard-domain, aaaa-wildcard-domain, wildcard-domain)\n ^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^\n
Wildcard hosts are collapsed into a single host beginning with _wildcard:
[DNS_NAME] _wildcard.github.io TARGET (a-record, a-wildcard, a-wildcard-domain, aaaa-record, aaaa-wildcard, aaaa-wildcard-domain, wildcard, wildcard-domain)\n ^^^^^^^^^\n
If you don't want this, you can disable wildcard detection on a domain-to-domain basis in the config:
~/.bbot/config/bbot.yml
dns:\n  wildcard_ignore:\n    - evilcorp.com\n    - evilcorp.co.uk\n
There are certain edge cases (such as with dynamic DNS rules) where BBOT's wildcard detection fails. In these cases, you can try increasing the number of wildcard checks in the config:
~/.bbot/config/bbot.yml
# default == 10\ndns:\n  wildcard_tests: 20\n
If that doesn't work you can consider blacklisting the offending domain.
"},{"location":"scanning/advanced/","title":"Advanced","text":"Below you can find some advanced uses of BBOT.
"},{"location":"scanning/advanced/#bbot-as-a-python-library","title":"BBOT as a Python library","text":""},{"location":"scanning/advanced/#synchronous","title":"Synchronous","text":"from bbot.scanner import Scanner\n\nif __name__ == \"__main__\":\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n for event in scan.start():\n print(event)\n
"},{"location":"scanning/advanced/#asynchronous","title":"Asynchronous","text":"from bbot.scanner import Scanner\n\nasync def main():\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n async for event in scan.async_start():\n print(event.json())\n\nif __name__ == \"__main__\":\n import asyncio\n asyncio.run(main())\n
"},{"location":"scanning/advanced/#command-line-help","title":"Command-Line Help","text":"usage: bbot [-h] [-t TARGET [TARGET ...]] [-w WHITELIST [WHITELIST ...]] [-b BLACKLIST [BLACKLIST ...]] [--strict-scope] [-p [PRESET ...]] [-c [CONFIG ...]] [-lp]\n [-m MODULE [MODULE ...]] [-l] [-lmo] [-em MODULE [MODULE ...]] [-f FLAG [FLAG ...]] [-lf] [-rf FLAG [FLAG ...]] [-ef FLAG [FLAG ...]] [--allow-deadly] [-n SCAN_NAME] [-v]\n [-d] [-s] [--force] [-y] [--dry-run] [--current-preset] [--current-preset-full] [-o DIR] [-om MODULE [MODULE ...]] [--json] [--brief]\n [--event-types EVENT_TYPES [EVENT_TYPES ...]] [--no-deps | --force-deps | --retry-deps | --ignore-failed-deps | --install-all-deps] [--version]\n [-H CUSTOM_HEADERS [CUSTOM_HEADERS ...]] [--custom-yara-rules CUSTOM_YARA_RULES]\n\nBighuge BLS OSINT Tool\n\noptions:\n -h, --help show this help message and exit\n\nTarget:\n -t TARGET [TARGET ...], --targets TARGET [TARGET ...]\n Targets to seed the scan\n -w WHITELIST [WHITELIST ...], --whitelist WHITELIST [WHITELIST ...]\n What's considered in-scope (by default it's the same as --targets)\n -b BLACKLIST [BLACKLIST ...], --blacklist BLACKLIST [BLACKLIST ...]\n Don't touch these things\n --strict-scope Don't consider subdomains of target/whitelist to be in-scope\n\nPresets:\n -p [PRESET ...], --preset [PRESET ...]\n Enable BBOT preset(s)\n -c [CONFIG ...], --config [CONFIG ...]\n Custom config options in key=value format: e.g. 'modules.shodan.api_key=1234'\n -lp, --list-presets List available presets.\n\nModules:\n -m MODULE [MODULE ...], --modules MODULE [MODULE ...]\n Modules to enable. 
Choices: viewdns,postman,baddns_zone,dehashed,bucket_file_enum,asn,generic_ssrf,github_codesearch,columbus,azure_realm,dotnetnuke,dockerhub,credshed,passivetotal,certspotter,builtwith,otx,ipneighbor,fingerprintx,oauth,robots,dnsbrute_mutations,httpx,paramminer_headers,digitorus,gitlab,hunt,hunterio,trufflehog,ffuf,nuclei,badsecrets,git,bucket_firebase,ffuf_shortnames,urlscan,docker_pull,ip2location,subdomaincenter,telerik,pgp,zoomeye,shodan_dns,trickest,dnscommonsrv,ntlm,myssl,internetdb,emailformat,dastardly,azure_tenant,github_workflows,crt,affiliates,wayback,ajaxpro,wafw00f,iis_shortnames,sslcert,chaos,newsletters,host_header,bucket_amazon,vhost,paramminer_cookies,virustotal,rapiddns,leakix,dnsbrute,baddns,url_manipulation,code_repository,smuggler,bevigil,paramminer_getparams,unstructured,skymem,securitytrails,sitedossier,git_clone,bucket_azure,bucket_google,bypass403,wpscan,dnsdumpster,wappalyzer,dnscaa,social,hackertarget,github_org,fullhunt,filedownload,binaryedge,gowitness,anubisdb,portscan,ipstack,secretsdb,c99,censys,bucket_digitalocean\n -l, --list-modules List available modules.\n -lmo, --list-module-options\n Show all module config options\n -em MODULE [MODULE ...], --exclude-modules MODULE [MODULE ...]\n Exclude these modules.\n -f FLAG [FLAG ...], --flags FLAG [FLAG ...]\n Enable modules by flag. Choices: subdomain-hijack,web-paramminer,subdomain-enum,code-enum,cloud-enum,iis-shortnames,web-thorough,baddns,portscan,slow,social-enum,affiliates,safe,web-screenshots,deadly,report,web-basic,email-enum,active,service-enum,aggressive,passive\n -lf, --list-flags List available flags.\n -rf FLAG [FLAG ...], --require-flags FLAG [FLAG ...]\n Only enable modules with these flags (e.g. -rf passive)\n -ef FLAG [FLAG ...], --exclude-flags FLAG [FLAG ...]\n Disable modules with these flags. (e.g. 
-ef aggressive)\n --allow-deadly Enable the use of highly aggressive modules\n\nScan:\n -n SCAN_NAME, --name SCAN_NAME\n Name of scan (default: random)\n -v, --verbose Be more verbose\n -d, --debug Enable debugging\n -s, --silent Be quiet\n --force Run scan even in the case of condition violations or failed module setups\n -y, --yes Skip scan confirmation prompt\n --dry-run Abort before executing scan\n --current-preset Show the current preset in YAML format\n --current-preset-full\n Show the current preset in its full form, including defaults\n\nOutput:\n -o DIR, --output-dir DIR\n Directory to output scan results\n -om MODULE [MODULE ...], --output-modules MODULE [MODULE ...]\n Output module(s). Choices: subdomains,emails,web_report,json,txt,websocket,slack,asset_inventory,neo4j,splunk,csv,stdout,http,python,discord,teams\n --json, -j Output scan data in JSON format\n --brief, -br Output only the data itself\n --event-types EVENT_TYPES [EVENT_TYPES ...]\n Choose which event types to display\n\nModule dependencies:\n Control how modules install their dependencies\n\n --no-deps Don't install module dependencies\n --force-deps Force install all module dependencies\n --retry-deps Try again to install failed module dependencies\n --ignore-failed-deps Run modules even if they have failed dependencies\n --install-all-deps Install dependencies for all modules\n\nMisc:\n --version show BBOT version and exit\n -H CUSTOM_HEADERS [CUSTOM_HEADERS ...], --custom-headers CUSTOM_HEADERS [CUSTOM_HEADERS ...]\n List of custom headers as key value pairs (header=value).\n --custom-yara-rules CUSTOM_YARA_RULES, -cy CUSTOM_YARA_RULES\n Add custom yara rules to excavate\n\nEXAMPLES\n\n Subdomains:\n bbot -t evilcorp.com -p subdomain-enum\n\n Subdomains (passive only):\n bbot -t evilcorp.com -p subdomain-enum -rf passive\n\n Subdomains + port scan + web screenshots:\n bbot -t evilcorp.com -p subdomain-enum -m portscan gowitness -n my_scan -o .\n\n Subdomains + basic web scan:\n bbot -t 
evilcorp.com -p subdomain-enum web-basic\n\n Web spider:\n bbot -t www.evilcorp.com -p spider -c web.spider_distance=2 web.spider_depth=2\n\n Everything everywhere all at once:\n bbot -t evilcorp.com -p kitchen-sink\n\n List modules:\n bbot -l\n\n List presets:\n bbot -lp\n\n List flags:\n bbot -lf\n
"},{"location":"scanning/configuration/","title":"Configuration Overview","text":"Normally, Presets are used to configure a scan. However, there may be cases where you want to change BBOT's global defaults so a certain option is always set, even if it's not specified in a preset.
BBOT has a YAML config at ~/.config/bbot/bbot.yml. This is the first config that BBOT loads, so it's a good place to put default settings like http_proxy, max_threads, or http_user_agent. You can also put any module settings here, including API keys.
For a list of all possible config options, see:
For examples of common config changes, see Tips and Tricks.
"},{"location":"scanning/configuration/#configuration-files","title":"Configuration Files","text":"BBOT loads its config from the following files, in this order (last one loaded == highest priority):
~/.config/bbot/bbot.yml <-- Global BBOT config
Presets (-p) <-- Presets are good for scan-specific settings
Command line (-c) <-- CLI overrides everything
bbot.yml will be automatically created for you when you first run BBOT.
You can specify config options either via the command line or the config. For example, if you want to proxy your BBOT scan through a local proxy like Burp Suite, you could either do:
# send BBOT traffic through an HTTP proxy\nbbot -t evilcorp.com -c http_proxy=http://127.0.0.1:8080\n
Or, in ~/.config/bbot/bbot.yml:
http_proxy: http://127.0.0.1:8080\n
These two are equivalent.
Config options specified via the command-line take precedence over all others. You can give BBOT a custom config file with -c myconf.yml, or individual arguments like this: -c modules.shodan_dns.api_key=deadbeef. To display the full and current BBOT config, including any command-line arguments, use bbot -c.
Note that placing the following in bbot.yml:
~/.bbot/config/bbot.yml
modules:\n  shodan_dns:\n    api_key: deadbeef\n
is the same as:
bbot -c modules.shodan_dns.api_key=deadbeef\n
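To make that equivalence and the loading order concrete, here is a minimal sketch of how a dotted key=value option maps onto nested config, and how later sources override earlier ones. It is an assumption-laden illustration rather than BBOT's own loader: dotted_to_nested and deep_merge are hypothetical helpers, and the sample values are invented.

```python
def dotted_to_nested(option):
    '''Turn a dotted key=value string into a nested dict.'''
    key, _, value = option.partition('=')
    parts = key.split('.')
    nested = current = {}
    for part in parts[:-1]:
        current[part] = {}
        current = current[part]
    current[parts[-1]] = value
    return nested


def deep_merge(base, override):
    '''Merge override into base; values in override win.'''
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Lowest to highest priority: global config, preset, command line.
global_config = {'modules': {'shodan_dns': {'api_key': 'from_bbot_yml'}}}
preset = {'modules': {'nuclei': {'mode': 'budget'}}}
cli = dotted_to_nested('modules.shodan_dns.api_key=deadbeef')

config = {}
for layer in (global_config, preset, cli):
    config = deep_merge(config, layer)
print(config)
```

Merging nested dictionaries key by key, instead of replacing whole sections, is what lets a single command-line option override one setting without discarding the rest of the file.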
"},{"location":"scanning/configuration/#global-config-options","title":"Global Config Options","text":"Below is a full list of the config options supported, along with their defaults.
defaults.yml### BASIC OPTIONS ###\n\n# BBOT working directory\nhome: ~/.bbot\n# How many scan results to keep before cleaning up the older ones\nkeep_scans: 20\n# Interval for displaying status messages\nstatus_frequency: 15\n# Include the raw data of files (i.e. PDFs, web screenshots) as base64 in the event\nfile_blobs: false\n# Include the raw data of directories (i.e. git repos) as tar.gz base64 in the event\nfolder_blobs: false\n\n### SCOPE ###\n\nscope:\n # Filter by scope distance which events are displayed in the output\n # 0 == show only in-scope events (affiliates are always shown)\n # 1 == show all events up to distance-1 (1 hop from target)\n report_distance: 0\n # How far out from the main scope to search\n # Do not change this setting unless you know what you're doing\n search_distance: 0\n\n### DNS ###\n\ndns:\n # Completely disable DNS resolution (careful if you have IP whitelists/blacklists, consider using minimal=true instead)\n disable: false\n # Speed up scan by not creating any new DNS events, and only resolving A and AAAA records\n minimal: false\n # How many instances of the dns module to run concurrently\n threads: 20\n # How many concurrent DNS resolvers to use when brute-forcing\n # (under the hood this is passed through directly to massdns -s)\n brute_threads: 1000\n # How far away from the main target to explore via DNS resolution (independent of scope.search_distance)\n # This is safe to change\n search_distance: 1\n # Limit how many DNS records can be followed in a row (stop malicious/runaway DNS records)\n runaway_limit: 5\n # DNS query timeout\n timeout: 5\n # How many times to retry DNS queries\n retries: 1\n # Completely disable BBOT's DNS wildcard detection\n wildcard_disable: False\n # Disable BBOT's DNS wildcard detection for select domains\n wildcard_ignore: []\n # How many sanity checks to make when verifying wildcard DNS\n # Increase this value if BBOT's wildcard detection isn't working\n wildcard_tests: 10\n # Skip DNS 
requests for a certain domain and rdtype after encountering this many timeouts or SERVFAILs\n # This helps prevent faulty DNS servers from hanging up the scan\n abort_threshold: 50\n # Don't show PTR records containing IP addresses\n filter_ptrs: true\n # Enable/disable debug messages for DNS queries\n debug: false\n # For performance reasons, always skip these DNS queries\n # Microsoft's DNS infrastructure is misconfigured so that certain queries to mail.protection.outlook.com always time out\n omit_queries:\n - SRV:mail.protection.outlook.com\n - CNAME:mail.protection.outlook.com\n - TXT:mail.protection.outlook.com\n\n### WEB ###\n\nweb:\n # HTTP proxy\n http_proxy: \n # Web user-agent\n user_agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.2151.97\n # Set the maximum number of HTTP links that can be followed in a row (0 == no spidering allowed)\n spider_distance: 0\n # Set the maximum directory depth for the web spider\n spider_depth: 1\n # Set the maximum number of links that can be followed per page\n spider_links_per_page: 25\n # HTTP timeout (for Python requests; API calls, etc.)\n http_timeout: 10\n # HTTP timeout (for httpx)\n httpx_timeout: 5\n # Custom HTTP headers (e.g. cookies, etc.)\n # in the format { \"Header-Key\": \"header_value\" }\n # These are attached to all in-scope HTTP requests\n # Note that some modules (e.g. 
github) may end up sending these to out-of-scope resources\n http_headers: {}\n # HTTP retries (for Python requests; API calls, etc.)\n http_retries: 1\n # HTTP retries (for httpx)\n httpx_retries: 1\n # Enable/disable debug messages for web requests/responses\n debug: false\n # Maximum number of HTTP redirects to follow\n http_max_redirects: 5\n # Whether to verify SSL certificates\n ssl_verify: false\n\n# Tool dependencies\ndeps:\n ffuf:\n version: \"2.1.0\"\n\n### ADVANCED OPTIONS ###\n\n# Load BBOT modules from these custom paths\nmodule_paths: []\n\n# Infer certain events from others, e.g. IPs from IP ranges, DNS_NAMEs from URLs, etc.\nspeculate: True\n# Passively search event data for URLs, hostnames, emails, etc.\nexcavate: True\n# Summarize activity at the end of a scan\naggregate: True\n# DNS resolution, wildcard detection, etc.\ndnsresolve: True\n# Cloud provider tagging\ncloudcheck: True\n\n# How to handle installation of module dependencies\n# Choices are:\n# - abort_on_failure (default) - if a module dependency fails to install, abort the scan\n# - retry_failed - try again to install failed dependencies\n# - ignore_failed - run the scan regardless of what happens with dependency installation\n# - disable - completely disable BBOT's dependency system (you are responsible for installing tools, pip packages, etc.)\ndeps_behavior: abort_on_failure\n\n# Strip querystring from URLs by default\nurl_querystring_remove: True\n# When query string is retained, by default collapse parameter values down to a single value per parameter\nurl_querystring_collapse: True\n\n# Completely ignore URLs with these extensions\nurl_extension_blacklist:\n # images\n - png\n - jpg\n - bmp\n - ico\n - jpeg\n - gif\n - svg\n - webp\n # web/fonts\n - css\n - woff\n - woff2\n - ttf\n - eot\n - sass\n - scss\n # audio\n - mp3\n - m4a\n - wav\n - flac\n # video\n - mp4\n - mkv\n - avi\n - wmv\n - mov\n - flv\n - webm\n# Distribute URLs with these extensions only to httpx (these are 
omitted from output)\nurl_extension_httpx_only:\n - js\n# Don't output these types of events (they are still distributed to modules)\nomit_event_types:\n - HTTP_RESPONSE\n - RAW_TEXT\n - URL_UNVERIFIED\n - DNS_NAME_UNRESOLVED\n - FILESYSTEM\n - WEB_PARAMETER\n - RAW_DNS_RECORD\n # - IP_ADDRESS\n\n# Custom interactsh server settings\ninteractsh_server: null\ninteractsh_token: null\ninteractsh_disable: false\n
"},{"location":"scanning/configuration/#module-config-options","title":"Module Config Options","text":"Many modules accept their own configuration options. These options have the ability to change their behavior. For example, the portscan
module accepts options for ports, rate, etc. Below is a list of all possible module config options.
['https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=vindex&search=<query>', 'http://the.earth.li:11371/pks/lookup?fingerprint=on&op=vindex&search=<query>', 'https://pgpkeys.eu/pks/lookup?search=<query>&op=index', 'https://pgp.mit.edu/pks/lookup?search=<query>&op=index']
modules.securitytrails.api_key str SecurityTrails API key modules.shodan_dns.api_key str Shodan API key modules.trickest.api_key str Trickest API key modules.trufflehog.concurrency int Number of concurrent workers 8 modules.trufflehog.only_verified bool Only report credentials that have been verified True modules.trufflehog.version str trufflehog version 3.75.1 modules.unstructured.extensions list File extensions to parse ['bak', 'bash', 'bashrc', 'conf', 'cfg', 'crt', 'csv', 'db', 'sqlite', 'doc', 'docx', 'ica', 'indd', 'ini', 'key', 'pub', 'log', 'markdown', 'md', 'odg', 'odp', 'ods', 'odt', 'pdf', 'pem', 'pps', 'ppsx', 'ppt', 'pptx', 'ps1', 'rdp', 'sh', 'sql', 'swp', 'sxw', 'txt', 'vbs', 'wpd', 'xls', 'xlsx', 'xml', 'yml', 'yaml'] modules.unstructured.ignore_folders list Subfolders to ignore when crawling downloaded folders ['.git'] modules.urlscan.urls bool Emit URLs in addition to DNS_NAMEs False modules.virustotal.api_key str VirusTotal API Key modules.wayback.garbage_threshold int Dedupe similar urls if they are in a group of this size or higher (lower values == less garbage data) 10 modules.wayback.urls bool emit URLs in addition to DNS_NAMEs False modules.zoomeye.api_key str ZoomEye API key modules.zoomeye.include_related bool Include domains which may be related to the target False modules.zoomeye.max_pages int How many pages of results to fetch 20 modules.asset_inventory.output_file str Set a custom output file modules.asset_inventory.recheck bool When use_previous=True, don't retain past details like open ports or findings. Instead, allow them to be rediscovered by the new scan False modules.asset_inventory.summary_netmask int Subnet mask to use when summarizing IP addresses at end of scan 16 modules.asset_inventory.use_previous bool Emit previous asset inventory as new events (use in conjunction with -n <old_scan_name>)
False modules.csv.output_file str Output to CSV file modules.discord.event_types list Types of events to send ['VULNERABILITY', 'FINDING'] modules.discord.min_severity str Only allow VULNERABILITY events of this severity or higher LOW modules.discord.webhook_url str Discord webhook URL modules.emails.output_file str Output to file modules.http.bearer str Authorization Bearer token modules.http.method str HTTP method POST modules.http.password str Password (basic auth) modules.http.siem_friendly bool Format JSON in a SIEM-friendly way for ingestion into Elastic, Splunk, etc. False modules.http.timeout int HTTP timeout 10 modules.http.url str Web URL modules.http.username str Username (basic auth) modules.json.output_file str Output to file modules.json.siem_friendly bool Output JSON in a SIEM-friendly format for ingestion into Elastic, Splunk, etc. False modules.neo4j.password str Neo4j password bbotislife modules.neo4j.uri str Neo4j server + port bolt://localhost:7687 modules.neo4j.username str Neo4j username neo4j modules.slack.event_types list Types of events to send ['VULNERABILITY', 'FINDING'] modules.slack.min_severity str Only allow VULNERABILITY events of this severity or higher LOW modules.slack.webhook_url str Slack webhook URL modules.splunk.hectoken str HEC Token modules.splunk.index str Index to send data to modules.splunk.source str Source path to be added to the metadata modules.splunk.timeout int HTTP timeout 10 modules.splunk.url str Web URL modules.stdout.accept_dupes bool Whether to show duplicate events, default True True modules.stdout.event_fields list Which event fields to display [] modules.stdout.event_types list Which events to display, default all event types [] modules.stdout.format str Which text format to display, choices: text,json text modules.stdout.in_scope_only bool Whether to only show in-scope events False modules.subdomains.include_unresolved bool Include unresolved subdomains in output False modules.subdomains.output_file str
Output to file modules.teams.event_types list Types of events to send ['VULNERABILITY', 'FINDING'] modules.teams.min_severity str Only allow VULNERABILITY events of this severity or higher LOW modules.teams.webhook_url str Teams webhook URL modules.txt.output_file str Output to file modules.web_report.css_theme_file str CSS theme URL for HTML output https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.1.0/github-markdown.min.css modules.web_report.output_file str Output to file modules.websocket.preserve_graph bool Preserve full chains of events in the graph (prevents orphans) True modules.websocket.token str Authorization Bearer token modules.websocket.url str Web URL modules.excavate.custom_yara_rules str Include custom Yara rules modules.excavate.retain_querystring bool Keep the querystring intact on emitted WEB_PARAMETERS False modules.excavate.yara_max_match_data int Sets the maximum amount of text that can be extracted from a YARA regex 2000 modules.speculate.max_hosts int Max number of IP_RANGE hosts to convert into IP_ADDRESS events 65536 modules.speculate.ports str The set of ports to speculate on 80,443"},{"location":"scanning/events/","title":"Events","text":"An Event is a piece of data discovered by BBOT. Examples include IP_ADDRESS
, DNS_NAME
, EMAIL_ADDRESS
, URL
, etc. When you run a BBOT scan, events are constantly being exchanged between modules. They are also output to the console:
[DNS_NAME]     www.evilcorp.com    sslcert          (distance-0, in-scope, resolved, subdomain, a-record)\n^^^^^^^^       ^^^^^^^^^^^^^^^^    ^^^^^^^          ^^^^^^^^^^\nevent type     event data          source module    tags\n
In addition to the obvious data (e.g. www.evilcorp.com
), an event also contains other useful information such as:
- .discovery_path showing exactly how the event was discovered, starting from the first scan target
- .timestamp of when the data was discovered
- .module that discovered it
- .parent event that led to its discovery
- .scope_distance (how many hops it is from the main scope, 0 == in-scope)
- .tags that describe the data (mx-record, http-title, etc.)

These attributes allow us to construct a visual graph of events (e.g. in Neo4j) and query/filter/grep them more easily. Here is what a typical event looks like in JSON format:
{\n \"type\": \"URL\",\n \"id\": \"URL:c9962277277393f8895d2a4fa9b7f70b15f3af3e\",\n \"scope_description\": \"in-scope\",\n \"data\": \"https://blog.blacklanternsecurity.com/\",\n \"host\": \"blog.blacklanternsecurity.com\",\n \"resolved_hosts\": [\n \"104.18.40.87\"\n ],\n \"dns_children\": {\n \"A\": [\n \"104.18.40.87\",\n \"172.64.147.169\"\n ]\n },\n \"web_spider_distance\": 0,\n \"scope_distance\": 0,\n \"scan\": \"SCAN:9224b49405e6d1607fd615243577d9ca86c7d206\",\n \"timestamp\": 1717260760.157012,\n \"parent\": \"OPEN_TCP_PORT:ebe3d6c10b41f60e3590ce6436ab62510b91c758\",\n \"tags\": [\n \"in-scope\",\n \"http-title-black-lantern-security-blsops\",\n \"dir\",\n \"ip-104-18-40-87\",\n \"cdn-cloudflare\",\n \"status-200\"\n ],\n \"module\": \"httpx\",\n \"module_sequence\": \"httpx\",\n \"discovery_context\": \"httpx visited blog.blacklanternsecurity.com:443 and got status code 200 at https://blog.blacklanternsecurity.com/\",\n \"discovery_path\": [\n \"Scan difficult_arthur seeded with DNS_NAME: blacklanternsecurity.com\",\n \"certspotter searched certspotter API for \\\"blacklanternsecurity.com\\\" and found DNS_NAME: blog.blacklanternsecurity.com\",\n \"speculated OPEN_TCP_PORT: blog.blacklanternsecurity.com:443\",\n \"httpx visited blog.blacklanternsecurity.com:443 and got status code 200 at https://blog.blacklanternsecurity.com/\"\n ]\n}\n
For a more detailed description of BBOT events, see Developer Documentation - Event.
Below is a full list of event types along with which modules produce/consume them.
"},{"location":"scanning/events/#list-of-event-types","title":"List of Event Types","text":"Event Type # Consuming Modules # Producing Modules Consuming Modules Producing Modules * 15 0 affiliates, cloudcheck, csv, discord, dnsresolve, http, json, neo4j, python, slack, splunk, stdout, teams, txt, websocket ASN 0 1 asn AZURE_TENANT 1 0 speculate CODE_REPOSITORY 3 5 docker_pull, git_clone, github_workflows code_repository, dockerhub, github_codesearch, github_org, gitlab DNS_NAME 56 41 anubisdb, asset_inventory, azure_realm, azure_tenant, baddns, baddns_zone, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, columbus, credshed, crt, dehashed, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, emailformat, fullhunt, github_codesearch, hackertarget, hunterio, internetdb, leakix, myssl, oauth, otx, passivetotal, pgp, portscan, postman, rapiddns, securitytrails, shodan_dns, sitedossier, skymem, speculate, subdomaincenter, subdomains, trickest, urlscan, viewdns, virustotal, wayback, zoomeye anubisdb, azure_tenant, bevigil, binaryedge, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, hackertarget, hunterio, internetdb, leakix, myssl, ntlm, oauth, otx, passivetotal, rapiddns, securitytrails, shodan_dns, sitedossier, speculate, sslcert, subdomaincenter, trickest, urlscan, vhost, viewdns, virustotal, wayback, zoomeye DNS_NAME_UNRESOLVED 3 0 baddns, speculate, subdomains EMAIL_ADDRESS 1 8 emails credshed, dehashed, dnscaa, emailformat, hunterio, pgp, skymem, sslcert FILESYSTEM 2 5 trufflehog, unstructured docker_pull, filedownload, git_clone, github_workflows, unstructured FINDING 2 28 asset_inventory, web_report ajaxpro, baddns, baddns_zone, badsecrets, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, bypass403, dastardly, git, 
gitlab, host_header, hunt, internetdb, newsletters, ntlm, nuclei, paramminer_cookies, paramminer_getparams, secretsdb, smuggler, speculate, telerik, trufflehog, url_manipulation, wpscan GEOLOCATION 0 2 ip2location, ipstack HASHED_PASSWORD 0 2 credshed, dehashed HTTP_RESPONSE 19 1 ajaxpro, asset_inventory, badsecrets, dastardly, dotnetnuke, excavate, filedownload, gitlab, host_header, newsletters, ntlm, paramminer_cookies, paramminer_getparams, paramminer_headers, secretsdb, speculate, telerik, wappalyzer, wpscan httpx IP_ADDRESS 8 3 asn, asset_inventory, internetdb, ip2location, ipneighbor, ipstack, portscan, speculate asset_inventory, ipneighbor, speculate IP_RANGE 2 0 portscan, speculate OPEN_TCP_PORT 4 4 asset_inventory, fingerprintx, httpx, sslcert asset_inventory, internetdb, portscan, speculate ORG_STUB 2 1 dockerhub, github_org speculate PASSWORD 0 2 credshed, dehashed PROTOCOL 0 1 fingerprintx RAW_TEXT 0 1 unstructured SOCIAL 5 3 dockerhub, github_org, gitlab, gowitness, speculate dockerhub, gitlab, social STORAGE_BUCKET 7 5 bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, speculate bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google TECHNOLOGY 4 8 asset_inventory, gitlab, web_report, wpscan badsecrets, dotnetnuke, gitlab, gowitness, internetdb, nuclei, wappalyzer, wpscan URL 19 2 ajaxpro, asset_inventory, bypass403, ffuf, generic_ssrf, git, gowitness, httpx, iis_shortnames, ntlm, nuclei, robots, smuggler, speculate, telerik, url_manipulation, vhost, wafw00f, web_report gowitness, httpx URL_HINT 1 1 ffuf_shortnames iis_shortnames URL_UNVERIFIED 6 16 code_repository, filedownload, httpx, oauth, social, speculate azure_realm, bevigil, bucket_file_enum, dnscaa, dockerhub, excavate, ffuf, ffuf_shortnames, github_codesearch, gowitness, hunterio, postman, robots, urlscan, wayback, wpscan USERNAME 1 2 speculate credshed, dehashed VHOST 1 1 web_report vhost VULNERABILITY 2 12 
asset_inventory, web_report ajaxpro, baddns, baddns_zone, badsecrets, dastardly, dotnetnuke, generic_ssrf, internetdb, nuclei, telerik, trufflehog, wpscan WAF 1 1 asset_inventory wafw00f WEBSCREENSHOT 0 1 gowitness WEB_PARAMETER 4 4 hunt, paramminer_cookies, paramminer_getparams, paramminer_headers excavate, paramminer_cookies, paramminer_getparams, paramminer_headers"},{"location":"scanning/events/#findings-vs-vulnerabilities","title":"Findings Vs. Vulnerabilities","text":"BBOT has a sharp distinction between Findings and Vulnerabilities:
VULNERABILITY
FINDING
By making this separation, actionable vulnerabilities can be identified quickly in the midst of a large scan.
"},{"location":"scanning/output/","title":"Output","text":"By default, BBOT saves its output in TXT, JSON, and CSV formats. The filenames are logged at the end of each scan:
Every BBOT scan gets a unique and mildly-entertaining name like demonic_jimmy
. Output for that scan, including scan stats and any web screenshots, is saved to a folder by that name in ~/.bbot/scans
. The most recent 20 scans are kept, and older ones are removed. You can change the location of BBOT's output with --output
, and you can also pick a custom scan name with --name
.
If you reuse a scan name, BBOT appends to that scan's original output files and leverages the previous scan's results.
"},{"location":"scanning/output/#output-modules","title":"Output Modules","text":"Multiple simultaneous output formats are possible because of output modules. Output modules are similar to normal modules except they are enabled with -om
.
The stdout
output module is what you see when you execute BBOT in the terminal. By default it looks the same as the txt
module, but it has options you can customize. You can filter by event type, choose the data format (text
, json
), and which fields you want to see:
txt
output is tab-delimited, so it's easy to grep:
# grep out only the DNS_NAMEs (-F matches the literal string instead of a bracket expression)\ngrep -F '[DNS_NAME]' ~/.bbot/scans/extreme_johnny/output.txt | cut -f2\nevilcorp.com\nwww.evilcorp.com\nmail.evilcorp.com\n
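The same extraction can be sketched in Python. This is a minimal illustration rather than anything shipped with BBOT: it assumes each line of output.txt holds tab-separated fields in the order seen in the console output (bracketed event type, event data, source module, tags), and the helper name dns_names_from_txt is hypothetical.

```python
def dns_names_from_txt(lines):
    # Hypothetical helper: collect the data field of every [DNS_NAME] line.
    names = []
    for line in lines:
        # Assumed layout: [TYPE]<TAB>data<TAB>module<TAB>(tags)
        fields = line.strip().split('\t')
        if len(fields) >= 2 and fields[0].strip() == '[DNS_NAME]':
            names.append(fields[1])
    return names


# Made-up sample lines in the assumed layout
sample = [
    '[DNS_NAME]\twww.evilcorp.com\tsslcert\t(in-scope, resolved, subdomain)',
    '[IP_ADDRESS]\t1.2.3.4\tA\t(ipv4)',
    '[DNS_NAME]\tmail.evilcorp.com\tcertspotter\t(in-scope, resolved)',
]
print(dns_names_from_txt(sample))
```

Comparing the first field exactly, instead of pattern-matching the whole line, avoids false hits when an event's data or tags happen to contain the text DNS_NAME.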
"},{"location":"scanning/output/#csv","title":"CSV","text":"The csv
output module produces a CSV like this:
If you manually enable the json
output module, it will go to stdout:
bbot -t evilcorp.com -om json | jq\n
You will then see events like this:
{\n \"type\": \"IP_ADDRESS\",\n \"id\": \"IP_ADDRESS:13cd09c2adf0860a582240229cd7ad1dccdb5eb1\",\n \"data\": \"1.2.3.4\",\n \"scope_distance\": 1,\n \"scan\": \"SCAN:64c0e076516ae7aa6502fd99489693d0d5ec26cc\",\n \"timestamp\": 1688518967.740472,\n \"resolved_hosts\": [\"1.2.3.4\"],\n \"parent\": \"DNS_NAME:2da045542abbf86723f22383d04eb453e573723c\",\n \"tags\": [\"distance-1\", \"ipv4\", \"internal\"],\n \"module\": \"A\",\n \"module_sequence\": \"A\"\n}\n
You can filter on the JSON output with jq
:
# pull out only the .data attribute of every DNS_NAME\n$ jq -r 'select(.type==\"DNS_NAME\") | .data' ~/.bbot/scans/extreme_johnny/output.json\nevilcorp.com\nwww.evilcorp.com\nmail.evilcorp.com\n
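For filtering beyond what a jq one-liner comfortably expresses, the events can also be loaded in Python. The sketch below assumes output.json contains one JSON event per line and relies only on the type, data, and scope_distance fields shown in the example event; the function names load_events and in_scope_data are illustrative, not part of BBOT.

```python
import json
from collections import Counter


def load_events(lines):
    # Parse one JSON event per non-empty line (assumed newline-delimited layout).
    return [json.loads(line) for line in lines if line.strip()]


def in_scope_data(events, event_type):
    # Keep events of the given type whose scope_distance is 0 (in-scope).
    return [e['data'] for e in events
            if e.get('type') == event_type and e.get('scope_distance') == 0]


# Made-up sample events; a real run would read the lines of output.json instead
sample_lines = [
    json.dumps({'type': 'DNS_NAME', 'data': 'www.evilcorp.com', 'scope_distance': 0}),
    json.dumps({'type': 'IP_ADDRESS', 'data': '1.2.3.4', 'scope_distance': 1}),
    json.dumps({'type': 'DNS_NAME', 'data': 'cdn.example.net', 'scope_distance': 1}),
]
events = load_events(sample_lines)
print(in_scope_data(events, 'DNS_NAME'))
print(Counter(e['type'] for e in events))
```

To run it against a real scan, pass an open file handle for the scan's output.json to load_events in place of sample_lines.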
"},{"location":"scanning/output/#discord-slack-teams","title":"Discord / Slack / Teams","text":"BBOT supports output via webhooks to discord
, slack
, and teams
. To use them, you must specify a webhook URL either in the config:
modules:\n  discord:\n    webhook_url: https://discord.com/api/webhooks/1234/deadbeef\n
...or on the command line:
bbot -t evilcorp.com -om discord -c modules.discord.webhook_url=https://discord.com/api/webhooks/1234/deadbeef\n
By default, only VULNERABILITY
and FINDING
events are sent, but this can be customized by setting event_types
in the config like so:
modules:\n  discord:\n    event_types:\n      - VULNERABILITY\n      - FINDING\n      - STORAGE_BUCKET\n
...or on the command line:
bbot -t evilcorp.com -om discord -c 'modules.discord.event_types=[\"STORAGE_BUCKET\",\"FINDING\",\"VULNERABILITY\"]'\n
You can also filter on the severity of VULNERABILITY
events by setting min_severity
:
modules:\n  discord:\n    min_severity: HIGH\n
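The combined effect of event_types and min_severity can be sketched as a single decision function. This is an illustration only: the severity ranking below is an assumption rather than something taken from BBOT's source, and should_send is a hypothetical name.

```python
# Assumed severity ranking, lowest to highest; not taken from BBOT's source.
SEVERITY_ORDER = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL']


def should_send(event_type, severity=None,
                event_types=('VULNERABILITY', 'FINDING'), min_severity='LOW'):
    # Drop any event whose type is not in the configured list.
    if event_type not in event_types:
        return False
    # Only VULNERABILITY events are checked against min_severity.
    if event_type == 'VULNERABILITY':
        if severity not in SEVERITY_ORDER:
            return False
        return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)
    return True


print(should_send('VULNERABILITY', 'MEDIUM', min_severity='HIGH'))
print(should_send('VULNERABILITY', 'CRITICAL', min_severity='HIGH'))
print(should_send('FINDING', min_severity='HIGH'))
print(should_send('DNS_NAME'))
```

In this sketch FINDING events pass regardless of min_severity, matching the description above that the severity filter applies to VULNERABILITY events.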
"},{"location":"scanning/output/#http","title":"HTTP","text":"The http
output module sends events in JSON format to a desired HTTP endpoint.
# POST scan results to localhost\nbbot -t evilcorp.com -om http -c modules.http.url=http://localhost:8000\n
You can customize the HTTP method if needed. Authentication is also supported:
# ~/.bbot/config/bbot.yml\nmodules:\n  http:\n    url: https://localhost:8000\n    method: PUT\n    # Authorization: Bearer\n    bearer: <bearer_token>\n    # OR\n    username: bob\n    password: P@ssw0rd\n
"},{"location":"scanning/output/#splunk","title":"Splunk","text":"The splunk
output module sends events in JSON format to a desired Splunk instance via HEC.
You can customize this output with the following config options:
# ~/.bbot/config/bbot.yml\nmodules:\n  splunk:\n    # The full URL with the URI `/services/collector/event`\n    url: https://localhost:8088/services/collector/event\n    # Generated from splunk webui\n    hectoken: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\n    # Defaults to `main` if not set\n    index: my-specific-index\n    # Defaults to `bbot` if not set\n    source: /my/source.json\n
"},{"location":"scanning/output/#asset-inventory","title":"Asset Inventory","text":"The asset_inventory
module produces a CSV like this:
The subdomains
output module produces a simple text file containing only in-scope, resolved subdomains:
evilcorp.com\nwww.evilcorp.com\nmail.evilcorp.com\nportal.evilcorp.com\n
"},{"location":"scanning/output/#neo4j","title":"Neo4j","text":"Neo4j is the funnest (and prettiest) way to view and interact with BBOT data.
# start Neo4j in the background with docker\ndocker run -d -p 7687:7687 -p 7474:7474 -v \"$(pwd)/neo4j/:/data/\" -e NEO4J_AUTH=neo4j/bbotislife neo4j\n
-om neo4j
bbot -f subdomain-enum -t evilcorp.com -om neo4j\n
neo4j
/ bbotislife
Neo4j uses Cypher as its graph query language. Cypher uses a small set of common clauses to build queries over relationships and present the results in multiple formats.
Cypher queries can be broken down into three pieces: selection, filter, and presentation. The selection piece identifies the data to search; 90% of the time the \"MATCH\" clause will be enough, although there are also ways to read from CSV or JSON data files. All of the examples here use the \"MATCH\" clause. The filter piece narrows the results to the data you need, using the \"WHERE\" clause (most basic operators can be used). Finally, the presentation piece determines how the data is returned to the querier. While Neo4j is a graph database, its results can also be viewed as a traditional table.
A simple query to grab every URL event with \".com\" in the BBOT data field would look like this: MATCH (u:URL) WHERE u.data contains \".com\" RETURN u
In this query the following can be identified:
- Within the MATCH statement, \"u\" is a variable name of your choosing, while the \"URL\" label corresponds directly to the BBOT event type.
- The WHERE statement allows the query to filter on any of the BBOT event properties like data, tags, or even the label itself.
- The RETURN statement presents the whole URL event, but this can be narrowed down to present any of the specific properties of the BBOT event (RETURN u.data, u.tags
).
The following are a few recommended queries to get started with:
// Get all \"in-scope\" DNS Nodes and return just data and tags properties\nMATCH (n:DNS_NAME)\nWHERE \"in-scope\" IN n.tags\nRETURN n.data, n.tags\n
// Get the count of labels/BBOT events in the Neo4j Database\nMATCH (n)\nRETURN labels(n), count(n)\n
// Get a graph of open ports associated with each domain\nMATCH z = ((n:DNS_NAME) --> (p:OPEN_TCP_PORT))\nRETURN z\n
// Get all domains and IP addresses with open TCP ports\nMATCH (n) --> (p:OPEN_TCP_PORT)\nWHERE \"in-scope\" in n.tags and (n:DNS_NAME or n:IP_ADDRESS)\n// last() takes the final ':'-separated piece, i.e. the port as a single string\nWITH *, last(split(p.data, ':')) AS port\nRETURN n.data, collect(distinct port)\n
// Clear the database\nMATCH (n) DETACH DELETE n\n
This is not an exhaustive list of clauses, filters, or other ways to use Cypher; treat it as a starting point. To build more advanced queries, consider reading Neo4j's Cypher documentation.
Additional note: these sample queries are dependent on the existence of the data in the target neo4j database.
"},{"location":"scanning/presets/","title":"Presets","text":"Once you start customizing BBOT, your commands can start to get really long. Presets let you put all your scan settings in a single file:
bbot -p ./my_preset.yml\n
A Preset is a YAML file that can include scan targets, modules, and config options like API keys.
A typical preset looks like this:
# subdomain-enum.yml\ndescription: Enumerate subdomains via APIs, brute-force\n\nflags:\n  - subdomain-enum\n\noutput_modules:\n  - subdomains\n
"},{"location":"scanning/presets/#how-to-use-presets-p","title":"How to use Presets (-p
)","text":"BBOT has a ready-made collection of presets for common tasks like subdomain enumeration and web spidering. They live in ~/.bbot/presets
.
To list them, you can do:
# list available presets\nbbot -lp\n
Enable them with -p
:
# do a subdomain enumeration \nbbot -t evilcorp.com -p subdomain-enum\n\n# multiple presets - subdomain enumeration + web spider\nbbot -t evilcorp.com -p subdomain-enum spider\n\n# start with a preset but only enable modules that have the 'passive' flag\nbbot -t evilcorp.com -p subdomain-enum -rf passive\n\n# preset + manual config override\nbbot -t www.evilcorp.com -p spider -c web.spider_distance=10\n
You can build on the default presets, or create your own. Here's an example of a custom preset that builds on subdomain-enum
:
description: Do a subdomain enumeration + basic web scan + nuclei\n\ntarget:\n  - evilcorp.com\n\ninclude:\n  # include these default presets\n  - subdomain-enum\n  - web-basic\n\nmodules:\n  # enable nuclei in addition to the other modules\n  - nuclei\n\nconfig:\n  # global config options\n  web:\n    http_proxy: http://127.0.0.1:8080\n  # module config options\n  modules:\n    # api keys\n    securitytrails:\n      api_key: 21a270d5f59c9b05813a72bb41707266\n    virustotal:\n      api_key: 4f41243847da693a4f356c0486114bc6\n
To execute your custom preset, you do:
bbot -p ./my_subdomains.yml\n
"},{"location":"scanning/presets/#preset-load-order","title":"Preset Load Order","text":"When you enable multiple presets, the order matters. In the case of a conflict, the last preset will always win. This means, for example, if you have a custom preset called my_spider
that sets web.spider_distance
to 1:
config:\n  web:\n    spider_distance: 1\n
...and you enable it alongside the default spider
preset in this order:
bbot -t evilcorp.com -p ./my_spider.yml spider\n
...the value of web.spider_distance
will be overridden by spider
. To ensure this doesn't happen, you would want to switch the order of the presets:
bbot -t evilcorp.com -p spider ./my_spider.yml\n
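The last-preset-wins rule can be illustrated with plain nested dictionaries. This is a sketch of the general idea, not BBOT's actual merge code: merge_config and apply_presets are hypothetical helpers, and the two configs mirror only the web settings shown in the examples.

```python
import copy


def merge_config(base, override):
    # Recursively merge two nested dicts; values from override win on conflict.
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_presets(configs):
    # Apply configs left to right, mirroring the order given to -p.
    result = {}
    for config in configs:
        result = merge_config(result, config)
    return result


spider = {'web': {'spider_distance': 2, 'spider_depth': 4}}
my_spider = {'web': {'spider_distance': 1}}

# my_spider listed first: the spider preset overrides it
print(apply_presets([my_spider, spider])['web']['spider_distance'])
# my_spider listed last: its value wins
print(apply_presets([spider, my_spider])['web']['spider_distance'])
```

Because the merge is recursive, keys that only one preset sets (such as spider_depth here) survive in either order; only conflicting keys depend on ordering.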
"},{"location":"scanning/presets/#validating-presets","title":"Validating Presets","text":"To make sure BBOT is configured the way you expect, you can always check the --current-preset
output, which shows the final version of the config that will be used when BBOT executes:
# verify the preset is what you want\nbbot -p ./mypreset.yml --current-preset\n
"},{"location":"scanning/presets/#advanced-usage","title":"Advanced Usage","text":"BBOT Presets support advanced features like environment variable substitution and custom conditions.
"},{"location":"scanning/presets/#environment-variables","title":"Environment Variables","text":"You can insert environment variables into your preset like this: ${env:<variable>}
:
description: Do a nuclei scan\n\ntarget:\n  - evilcorp.com\n\nmodules:\n  - nuclei\n\nconfig:\n  modules:\n    nuclei:\n      # allow the nuclei tags to be specified at runtime via an environment variable\n      tags: ${env:NUCLEI_TAGS}\n
NUCLEI_TAGS=apache,nginx bbot -p ./my_nuclei.yml\n
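Conceptually, the substitution replaces each ${env:<variable>} placeholder with the value found in the process environment. The sketch below shows that idea with a regular expression; it is not BBOT's implementation, the name substitute_env is hypothetical, and replacing unset variables with an empty string is an assumption made for the example.

```python
import os
import re

# Matches ${env:NAME}; the pattern is an assumption based on the syntax shown above.
ENV_PATTERN = re.compile(r'\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}')


def substitute_env(text, environ=None):
    # Replace each ${env:NAME} with its value; unset variables become ''.
    environ = os.environ if environ is None else environ
    return ENV_PATTERN.sub(lambda m: environ.get(m.group(1), ''), text)


print(substitute_env('tags: ${env:NUCLEI_TAGS}', {'NUCLEI_TAGS': 'apache,nginx'}))
```

Passing a dictionary as the second argument makes the behavior easy to check without touching the real environment; omitting it falls back to os.environ.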
"},{"location":"scanning/presets/#conditions","title":"Conditions","text":"Sometimes, you might need to add custom logic to a preset. BBOT supports this via conditions
. The conditions
attribute allows you to specify a list of custom conditions that will be evaluated before the scan starts. This is useful for performing last-minute sanity checks, or changing the behavior of the scan based on custom criteria.
description: Abort if nuclei templates aren't specified\n\nmodules:\n  - nuclei\n\nconditions:\n  - |\n    {% if not config.modules.nuclei.templates %}\n      {{ abort(\"Don't forget to set your templates!\") }}\n    {% endif %}\n
# my_preset.yml\ndescription: Enable ffuf but only when the web spider isn't also enabled\n\nmodules:\n  - ffuf\n\nconditions:\n  - |\n    {% if config.web.spider_distance > 0 and config.web.spider_depth > 0 %}\n      {{ warn(\"Disabling ffuf because the web spider is enabled\") }}\n      {{ preset.exclude_module(\"ffuf\") }}\n    {% endif %}\n
Conditions use Jinja, which means they can contain Python-like expressions. They run inside a sandboxed environment that has access to the following variables and functions:
- preset - the current preset object
- config - the current config (an alias for preset.config)
- warn(message) - display a custom warning message to the user
- abort(message) - abort the scan with an optional message

If you aren't able to accomplish what you want with conditions, or if you need access to a new variable/function, please let us know on GitHub.
"},{"location":"scanning/presets_list/","title":"List of Presets","text":"Below is a list of every default BBOT preset, including its YAML.
"},{"location":"scanning/presets_list/#cloud-enum","title":"cloud-enum","text":"Enumerate cloud resources such as storage buckets, etc.
cloud-enum.yml
# ~/.bbot/presets/cloud-enum.yml\ndescription: Enumerate cloud resources such as storage buckets, etc.\n\ninclude:\n  - subdomain-enum\n\nflags:\n  - cloud-enum\n
Modules: 53
"},{"location":"scanning/presets_list/#code-enum","title":"code-enum","text":"Enumerate Git repositories, Docker images, etc.
code-enum.yml
# ~/.bbot/presets/code-enum.yml\ndescription: Enumerate Git repositories, Docker images, etc.\n\nflags:\n  - code-enum\n
Modules: 10
"},{"location":"scanning/presets_list/#dirbust-heavy","title":"dirbust-heavy","text":"Recursive web directory brute-force (aggressive)
dirbust-heavy.yml
# ~/.bbot/presets/web/dirbust-heavy.yml\ndescription: Recursive web directory brute-force (aggressive)\n\ninclude:\n  - spider\n\nflags:\n  - iis-shortnames\n\nmodules:\n  - ffuf\n  - wayback\n\nconfig:\n  modules:\n    iis_shortnames:\n      # we exploit the shortnames vulnerability to produce URL_HINTs which are consumed by ffuf_shortnames\n      detect_only: False\n    ffuf:\n      depth: 3\n      lines: 5000\n      extensions:\n        - php\n        - asp\n        - aspx\n        - ashx\n        - asmx\n        - jsp\n        - jspx\n        - cfm\n        - zip\n        - conf\n        - config\n        - xml\n        - json\n        - yml\n        - yaml\n    # emit URLs from wayback\n    wayback:\n      urls: True\n
Category: web
Modules: 5
"},{"location":"scanning/presets_list/#dirbust-light","title":"dirbust-light","text":"Basic web directory brute-force (surface-level directories only)
dirbust-light.yml
# ~/.bbot/presets/web/dirbust-light.yml\ndescription: Basic web directory brute-force (surface-level directories only)\n\ninclude:\n  - iis-shortnames\n\nmodules:\n  - ffuf\n\nconfig:\n  modules:\n    ffuf:\n      # wordlist size = 1000\n      lines: 1000\n
Category: web
Modules: 4
"},{"location":"scanning/presets_list/#dotnet-audit","title":"dotnet-audit","text":"Comprehensive scan for all IIS/.NET specific modules and module settings
dotnet-audit.yml
# ~/.bbot/presets/web/dotnet-audit.yml\ndescription: Comprehensive scan for all IIS/.NET specific modules and module settings\n\n\ninclude:\n  - iis-shortnames\n\nmodules:\n  - httpx\n  - badsecrets\n  - ffuf_shortnames\n  - ffuf\n  - telerik\n  - ajaxpro\n  - dotnetnuke\n\nconfig:\n  modules:\n    ffuf:\n      extensions: asp,aspx,ashx,asmx,ascx\n    telerik:\n      exploit_RAU_crypto: True\n
Category: web
Modules: 8
"},{"location":"scanning/presets_list/#email-enum","title":"email-enum","text":"Enumerate email addresses from APIs, web crawling, etc.
email-enum.yml
# ~/.bbot/presets/email-enum.yml\ndescription: Enumerate email addresses from APIs, web crawling, etc.\n\nflags:\n  - email-enum\n\noutput_modules:\n  - emails\n
Modules: 7
"},{"location":"scanning/presets_list/#iis-shortnames","title":"iis-shortnames","text":"Recursively enumerate IIS shortnames
iis-shortnames.yml
# ~/.bbot/presets/web/iis-shortnames.yml\ndescription: Recursively enumerate IIS shortnames\n\nflags:\n  - iis-shortnames\n\nconfig:\n  modules:\n    iis_shortnames:\n      # exploit the vulnerability\n      detect_only: false\n
Category: web
Modules: 3
"},{"location":"scanning/presets_list/#kitchen-sink","title":"kitchen-sink","text":"Everything everywhere all at once
kitchen-sink.yml
# ~/.bbot/presets/kitchen-sink.yml\ndescription: Everything everywhere all at once\n\ninclude:\n  - subdomain-enum\n  - cloud-enum\n  - code-enum\n  - email-enum\n  - spider\n  - web-basic\n  - paramminer\n  - dirbust-light\n  - web-screenshots\n\nconfig:\n  modules:\n    baddns:\n      enable_references: True\n
Modules: 75
"},{"location":"scanning/presets_list/#paramminer","title":"paramminer","text":"Discover new web parameters via brute-force
paramminer.yml
# ~/.bbot/presets/web/paramminer.yml\ndescription: Discover new web parameters via brute-force\n\nflags:\n  - web-paramminer\n\nmodules:\n  - httpx\n\nconfig:\n  web:\n    spider_distance: 1\n    spider_depth: 4\n
Category: web
Modules: 4
"},{"location":"scanning/presets_list/#spider","title":"spider","text":"Recursive web spider
spider.yml
# ~/.bbot/presets/spider.yml\ndescription: Recursive web spider\n\nmodules:\n  - httpx\n\nconfig:\n  web:\n    # how many links to follow in a row\n    spider_distance: 2\n    # don't follow links whose directory depth is higher than 4\n    spider_depth: 4\n    # maximum number of links to follow per page\n    spider_links_per_page: 25\n
Modules: 1
"},{"location":"scanning/presets_list/#subdomain-enum","title":"subdomain-enum","text":"Enumerate subdomains via APIs, brute-force
subdomain-enum.yml
# ~/.bbot/presets/subdomain-enum.yml\ndescription: Enumerate subdomains via APIs, brute-force\n\nflags:\n  # enable every module with the subdomain-enum flag\n  - subdomain-enum\n\noutput_modules:\n  # output unique subdomains to TXT file\n  - subdomains\n\nconfig:\n  dns:\n    threads: 25\n    brute_threads: 1000\n  # put your API keys here\n  modules:\n    github:\n      api_key: \"\"\n    chaos:\n      api_key: \"\"\n    securitytrails:\n      api_key: \"\"\n
Modules: 46
"},{"location":"scanning/presets_list/#web-basic","title":"web-basic","text":"Quick web scan
web-basic.yml
# ~/.bbot/presets/web-basic.yml\ndescription: Quick web scan\n\ninclude:\n  - iis-shortnames\n\nflags:\n  - web-basic\n
Modules: 18
"},{"location":"scanning/presets_list/#web-screenshots","title":"web-screenshots","text":"Take screenshots of webpages
web-screenshots.yml
# ~/.bbot/presets/web-screenshots.yml\ndescription: Take screenshots of webpages\n\nflags:\n  - web-screenshots\n\nconfig:\n  modules:\n    gowitness:\n      resolution_x: 1440\n      resolution_y: 900\n      # folder to output web screenshots (default is inside ~/.bbot/scans/scan_name)\n      output_path: \"\"\n      # whether to take screenshots of social media pages\n      social: True\n
Modules: 3
"},{"location":"scanning/presets_list/#web-thorough","title":"web-thorough","text":"Aggressive web scan
web-thorough.yml
# ~/.bbot/presets/web-thorough.yml\ndescription: Aggressive web scan\n\ninclude:\n  # include the web-basic preset\n  - web-basic\n\nflags:\n  - web-thorough\n
Modules: 29
"},{"location":"scanning/presets_list/#table-of-default-presets","title":"Table of Default Presets","text":"Here is the same data, but in a table:
Preset Category Description # Modules Modules cloud-enum Enumerate cloud resources such as storage buckets, etc. 53 anubisdb, asn, azure_realm, azure_tenant, baddns, baddns_zone, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, github_codesearch, github_org, hackertarget, httpx, hunterio, internetdb, ipneighbor, leakix, myssl, oauth, otx, passivetotal, postman, rapiddns, securitytrails, shodan_dns, sitedossier, social, sslcert, subdomaincenter, trickest, urlscan, virustotal, wayback, zoomeye code-enum Enumerate Git repositories, Docker images, etc. 10 code_repository, dockerhub, git, github_codesearch, github_org, gitlab, httpx, postman, social, trufflehog dirbust-heavy web Recursive web directory brute-force (aggressive) 5 ffuf, ffuf_shortnames, httpx, iis_shortnames, wayback dirbust-light web Basic web directory brute-force (surface-level directories only) 4 ffuf, ffuf_shortnames, httpx, iis_shortnames dotnet-audit web Comprehensive scan for all IIS/.NET specific modules and module settings 8 ajaxpro, badsecrets, dotnetnuke, ffuf, ffuf_shortnames, httpx, iis_shortnames, telerik email-enum Enumerate email addresses from APIs, web crawling, etc. 
7 dehashed, dnscaa, emailformat, hunterio, pgp, skymem, sslcert iis-shortnames web Recursively enumerate IIS shortnames 3 ffuf_shortnames, httpx, iis_shortnames kitchen-sink Everything everywhere all at once 75 anubisdb, asn, azure_realm, azure_tenant, baddns, baddns_zone, badsecrets, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, code_repository, columbus, crt, dehashed, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, dockerhub, emailformat, ffuf, ffuf_shortnames, filedownload, fullhunt, git, github_codesearch, github_org, gitlab, gowitness, hackertarget, httpx, hunterio, iis_shortnames, internetdb, ipneighbor, leakix, myssl, ntlm, oauth, otx, paramminer_cookies, paramminer_getparams, paramminer_headers, passivetotal, pgp, postman, rapiddns, robots, secretsdb, securitytrails, shodan_dns, sitedossier, skymem, social, sslcert, subdomaincenter, trickest, trufflehog, urlscan, virustotal, wappalyzer, wayback, zoomeye paramminer web Discover new web parameters via brute-force 4 httpx, paramminer_cookies, paramminer_getparams, paramminer_headers spider Recursive web spider 1 httpx subdomain-enum Enumerate subdomains via APIs, brute-force 46 anubisdb, asn, azure_realm, azure_tenant, baddns_zone, bevigil, binaryedge, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, github_codesearch, github_org, hackertarget, httpx, hunterio, internetdb, ipneighbor, leakix, myssl, oauth, otx, passivetotal, postman, rapiddns, securitytrails, shodan_dns, sitedossier, social, sslcert, subdomaincenter, trickest, urlscan, virustotal, wayback, zoomeye web-basic Quick web scan 18 azure_realm, baddns, badsecrets, bucket_amazon, bucket_azure, bucket_firebase, bucket_google, ffuf_shortnames, filedownload, git, httpx, iis_shortnames, ntlm, oauth, robots, 
secretsdb, sslcert, wappalyzer web-screenshots Take screenshots of webpages 3 gowitness, httpx, social web-thorough Aggressive web scan 29 ajaxpro, azure_realm, baddns, badsecrets, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, bypass403, dastardly, dotnetnuke, ffuf_shortnames, filedownload, generic_ssrf, git, host_header, httpx, hunt, iis_shortnames, ntlm, oauth, robots, secretsdb, smuggler, sslcert, telerik, url_manipulation, wappalyzer"},{"location":"scanning/tips_and_tricks/","title":"Tips and Tricks","text":"Below are some helpful tricks to help you in your adventures.
"},{"location":"scanning/tips_and_tricks/#change-verbosity-during-scan","title":"Change Verbosity During Scan","text":"Press enter during a BBOT scan to change the log level. This will allow you to see debugging messages, etc.
"},{"location":"scanning/tips_and_tricks/#kill-individual-module-during-scan","title":"Kill Individual Module During Scan","text":"Sometimes a certain module can get stuck or slow down the scan. If this happens and you want to kill it, just type \"kill <module>
\" in the terminal and press enter. This will kill and disable the module for the rest of the scan.
You can also kill multiple modules at a time by specifying them in a space or comma-separated list:
kill httpx sslcert\n
"},{"location":"scanning/tips_and_tricks/#common-config-changes","title":"Common Config Changes","text":""},{"location":"scanning/tips_and_tricks/#speed-up-slow-modules","title":"Speed Up Slow Modules","text":"BBOT modules can be parallelized so that more than one instance runs at a time. By default, many modules are already set to reasonable defaults:
class baddns(BaseModule):\n module_threads = 8\n
To override this, you can set a module's module_threads
in the config:
# increase baddns threads to 20\nbbot -t evilcorp.com -m baddns -c modules.baddns.module_threads=20\n
"},{"location":"scanning/tips_and_tricks/#boost-dns-brute-force-speed","title":"Boost DNS Brute-force Speed","text":"If you have a fast internet connection or are running BBOT from a cloud VM, you can speed up subdomain enumeration by cranking the threads for massdns
. The default is 1000
, which is about 1MB/s of DNS traffic:
# massdns with 5000 resolvers, about 5MB/s\nbbot -t evilcorp.com -f subdomain-enum -c dns.brute_threads=5000\n
"},{"location":"scanning/tips_and_tricks/#web-spider","title":"Web Spider","text":"The web spider is great for finding juicy data like subdomains, email addresses, and JavaScript secrets buried in webpages. However, since it can lengthen the duration of a scan, it's disabled by default. To enable the web spider, you must increase the value of web.spider_distance
.
The web spider is controlled with three config values:
web.spider_depth (default: 1): the maximum directory depth allowed. This is to prevent the spider from delving too deep into a website.
web.spider_distance (0 == all spidering disabled, default: 0): the maximum number of links that can be followed in a row. This is designed to limit the spider in cases where web.spider_depth fails (e.g. for an ecommerce website with thousands of base-level URLs).
web.spider_links_per_page (default: 25): the maximum number of links per page that can be followed. This is designed to save you in cases where a single page has hundreds or thousands of links.
Here is a typical example:
spider.yml
config:\n web:\n spider_depth: 2\n spider_distance: 2\n spider_links_per_page: 25\n
# run the web spider against www.evilcorp.com\nbbot -t www.evilcorp.com -m httpx -c spider.yml\n
You can also pair the web spider with subdomain enumeration:
# spider every subdomain of evilcorp.com\nbbot -t evilcorp.com -f subdomain-enum -c spider.yml\n
"},{"location":"scanning/tips_and_tricks/#ingesting-bbot-data-into-siem-elastic-splunk","title":"Ingesting BBOT Data Into SIEM (Elastic, Splunk)","text":"If your goal is to feed BBOT data into a SIEM such as Elastic, be sure to enable this option when scanning:
bbot -t evilcorp.com -c modules.json.siem_friendly=true\n
This nests the event's .data
beneath its event type like so:
{\n \"type\": \"DNS_NAME\",\n \"data\": {\n \"DNS_NAME\": \"blacklanternsecurity.com\"\n }\n}\n
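The nesting described above can be sketched in a few lines of Python. This is only an illustration of the transformation, not BBOT's own code: the helper name to_siem_friendly and the flat input shape (data stored directly as a string) are assumptions made for the example.

```python
import json


def to_siem_friendly(event):
    # Hypothetical helper: copy the event and nest its data beneath its type
    nested = dict(event)
    nested['data'] = {event['type']: event['data']}
    return nested


# Assumed shape of an event before nesting
flat_event = {'type': 'DNS_NAME', 'data': 'blacklanternsecurity.com'}
siem_event = to_siem_friendly(flat_event)
print(json.dumps(siem_event, indent=2))
```

Keying the data by event type gives each event type its own field, which tends to suit SIEMs that expect a field to keep one consistent shape.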
"},{"location":"scanning/tips_and_tricks/#custom-http-proxy","title":"Custom HTTP Proxy","text":"Web pentesters may appreciate BBOT's ability to quickly populate Burp Suite site maps for all subdomains in a target. If your scan includes gowitness, this will capture the traffic as if you manually visited each website in your browser -- including auxiliary web resources and javascript API calls. To accomplish this, set the web.http_proxy
config option like so:
# enumerate subdomains, take web screenshots, proxy through Burp\nbbot -t evilcorp.com -f subdomain-enum -m gowitness -c web.http_proxy=http://127.0.0.1:8080\n
"},{"location":"scanning/tips_and_tricks/#display-http_response-events","title":"Display HTTP_RESPONSE
Events","text":"BBOT's httpx
module emits HTTP_RESPONSE
events, but by default they're hidden from output. These events contain the full raw HTTP body along with headers, etc. If you want to see them, you can modify omit_event_types
in the config:
omit_event_types:\n - URL_UNVERIFIED\n # - HTTP_RESPONSE\n
"},{"location":"scanning/tips_and_tricks/#display-out-of-scope-events","title":"Display Out-of-scope Events","text":"By default, BBOT only shows in-scope events (with a few exceptions for things like storage buckets). If you want to see events that BBOT is emitting internally (such as for DNS resolution, etc.), you can increase scope.report_distance
in the config or on the command line like so:
# display events up to scope distance 2 (default == 0)\nbbot -f subdomain-enum -t evilcorp.com -c scope.report_distance=2\n
"},{"location":"scanning/tips_and_tricks/#speed-up-scans-by-disabling-dns-resolution","title":"Speed Up Scans By Disabling DNS Resolution","text":"If you already have a list of discovered targets (e.g. URLs), you can speed up the scan by skipping BBOT's DNS resolution. You can do this by setting dns.disable
to true
:
# completely disable DNS resolution\nbbot -m httpx gowitness wappalyzer -t urls.txt -c dns.disable=true\n
Note that the above setting completely disables DNS resolution, meaning even A
and AAAA
records are not resolved. This can cause problems if you're using an IP whitelist or blacklist. In this case, you'll want to use dns.minimal
instead:
# only resolve A and AAAA records\nbbot -m httpx gowitness wappalyzer -t urls.txt -c dns.minimal=true\n
"},{"location":"scanning/tips_and_tricks/#faq","title":"FAQ","text":""},{"location":"scanning/tips_and_tricks/#what-is-url_unverified","title":"What is URL_UNVERIFIED
?","text":"URL_UNVERIFIED
events are URLs that haven't yet been visited by httpx
. Once httpx
visits them, it reraises them as URL
s, tagged with their resulting status code.
For example, when excavate
gets an HTTP_RESPONSE
event, it extracts links from the raw HTTP response as URL_UNVERIFIED
s and then passes them back to httpx
to be visited.
By default, URL_UNVERIFIED
s are hidden from output. If you want to see all of them including the out-of-scope ones, you can do it by changing omit_event_types
and scope.report_distance
in the config like so:
# visit www.evilcorp.com and extract all the links\nbbot -t www.evilcorp.com -m httpx -c omit_event_types=[] scope.report_distance=2\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Getting Started","text":"A BBOT scan in real-time - visualization with VivaGraphJS
"},{"location":"#installation","title":"Installation","text":"Supported Platforms
Only Linux is supported at this time. Windows and macOS are not supported. If you use one of these platforms, consider using Docker.
BBOT offers multiple methods of installation, including pipx and Docker. If you plan to dev on BBOT, see Installation (Poetry).
"},{"location":"#python-pip-pipx","title":"Python (pip / pipx)","text":"Notepipx
installs BBOT inside its own virtual environment.
# stable version\npipx install bbot\n\n# bleeding edge (dev branch)\npipx install --pip-args '\\--pre' bbot\n\n# execute bbot command\nbbot --help\n
"},{"location":"#docker","title":"Docker","text":"Docker images are provided, along with helper script bbot-docker.sh
to persist your scan data.
# bleeding edge (dev)\ndocker run -it blacklanternsecurity/bbot --help\n\n# stable\ndocker run -it blacklanternsecurity/bbot:stable --help\n\n# helper script\ngit clone https://github.com/blacklanternsecurity/bbot && cd bbot\n./bbot-docker.sh --help\n
"},{"location":"#example-commands","title":"Example Commands","text":"Below are some examples of common scans.
Subdomains:
# Perform a full subdomain enumeration on evilcorp.com\nbbot -t evilcorp.com -p subdomain-enum\n
Subdomains (passive only):
# Perform a passive-only subdomain enumeration on evilcorp.com\nbbot -t evilcorp.com -p subdomain-enum -rf passive\n
Subdomains + port scan + web screenshots:
# Port-scan every subdomain, screenshot every webpage, output to current directory\nbbot -t evilcorp.com -p subdomain-enum -m portscan gowitness -n my_scan -o .\n
Subdomains + basic web scan:
# A basic web scan includes wappalyzer, robots.txt, and other non-intrusive web modules\nbbot -t evilcorp.com -p subdomain-enum web-basic\n
Web spider:
# Crawl www.evilcorp.com up to a max depth of 2, automatically extracting emails, secrets, etc.\nbbot -t www.evilcorp.com -p spider -c web.spider_distance=2 web.spider_depth=2\n
Everything everywhere all at once:
# Subdomains, emails, cloud buckets, port scan, basic web, web screenshots, nuclei\nbbot -t evilcorp.com -p kitchen-sink\n
"},{"location":"#api-keys","title":"API Keys","text":"BBOT works just fine without API keys. However, there are certain modules that need them to function. If you have API keys and want to make use of these modules, you can place them either in your preset:
my_preset.yml
description: My custom subdomain enum preset\n\ninclude:\n - subdomain-enum\n - cloud-enum\n\nconfig:\n modules:\n shodan_dns:\n api_key: deadbeef\n virustotal:\n api_key: cafebabe\n
...in BBOT's global YAML config (~/.config/bbot/bbot.yml
):
Note: this will ensure the API keys are used in all scans, regardless of preset.
~/.config/bbot/bbot.yml
modules:\n shodan_dns:\n api_key: deadbeef\n virustotal:\n api_key: cafebabe\n
...or directly on the command-line:
# specify API key with -c\nbbot -t evilcorp.com -f subdomain-enum -c modules.shodan_dns.api_key=deadbeef modules.virustotal.api_key=cafebabe\n
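The dotted keys passed to -c address the same nested structure as the YAML examples above. The Python sketch below shows that correspondence. It is a simplified stand-in, not BBOT's actual config parser: parse_overrides is a hypothetical name, and every value is kept as a plain string.

```python
def parse_overrides(pairs):
    # Turn ['a.b.c=value', ...] into a nested dict (values stay strings)
    config = {}
    for pair in pairs:
        dotted_key, _, value = pair.partition('=')
        *parents, leaf = dotted_key.split('.')
        node = config
        for name in parents:
            # Walk down, creating intermediate levels as needed
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


overrides = parse_overrides([
    'modules.shodan_dns.api_key=deadbeef',
    'modules.virustotal.api_key=cafebabe',
])
print(overrides)
```

The resulting dictionary has the same layout as the modules section of the YAML config shown earlier.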
For more information, see Configuration. For a full list of modules, including which ones require API keys, see List of Modules.
Next Up: Scanning -->
"},{"location":"comparison/","title":"Comparison to Other Tools","text":"BBOT does a lot more than just subdomain enumeration. However, subdomain enumeration is arguably the most important part of OSINT, and since there are so many subdomain enumeration tools out there, they're the easiest class of tool to compare it to.
Thanks to BBOT's recursive nature (and its dnsbrute_mutations
module with its NLP-powered subdomain mutations), it typically finds about 20-25% more than other tools such as Amass
or theHarvester
. This holds true especially for larger targets like delta.com
(1000+ subdomains):
For a detailed analysis of this data, please see Subdomain Enumeration Tool Face-Off
"},{"location":"comparison/#ebaycom-larger-domain","title":"Ebay.com (larger domain)","text":"Note that in this benchmark, Spiderfoot crashed after ~20 minutes due to excessive memory usage. Amass never finished and had to be cancelled after 24h. All other tools finished successfully.
"},{"location":"contribution/","title":"Contribution","text":"We welcome contributions! If you have an idea for a new module, or are a Python developer who wants to get involved, please fork us or come talk to us on Discord.
To get started devving, see the following links:
It's well known that when you're doing recon, it's best to do it recursively. However, there are very few recursive tools, and the main reason is that making a recursive tool is hard. In particular, it's very difficult to build a large-scale recursive system that interacts with the internet, and to keep it stable. When we first set out to make BBOT, we didn't know this, and it was definitely a lesson we learned the hard way. BBOT's stability is thanks to its extensive Unit Tests.
BBOT inherits its recursive philosophy from Spiderfoot, which means it is also event-driven. Each of BBOT's 100+ modules consumes a certain type of Event, uses it to discover something new, and produces new events, which get distributed to all the other modules. This happens again and again -- thousands of times during a scan -- spidering outwards in a recursive web of discovery.
Below is an interactive graph showing the relationships between modules and the event types they produce and consume.
"},{"location":"how_it_works/#how-bbot-modules-work-together","title":"How BBOT Modules Work Together","text":"Each BBOT module does one specific task, such as querying an API for subdomains, or running a tool like nuclei
, and is carefully designed to work together with other modules inside BBOT's recursive system.
For example, the portscan
module consumes DNS_NAME
, and produces OPEN_TCP_PORT
. The sslcert
module consumes OPEN_TCP_PORT
and produces DNS_NAME
. You can see how even these two modules, when enabled together, will feed each other recursively.
Because of this, enabling even one module has the potential to increase your results exponentially. This is exactly how BBOT is able to outperform other tools.
To learn more about how events flow inside BBOT, see BBOT Internal Architecture.
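The feedback loop between portscan and sslcert can be illustrated with a toy model. The sketch below is not BBOT code: the two functions only borrow the module names, and the OPEN_PORTS and CERT_NAMES tables are made-up data standing in for real network activity.

```python
from collections import deque

# Hypothetical lookup tables standing in for real scanning results
OPEN_PORTS = {'evilcorp.com': [443], 'vpn.evilcorp.com': [443]}
CERT_NAMES = {
    'evilcorp.com:443': ['vpn.evilcorp.com'],
    'vpn.evilcorp.com:443': ['evilcorp.com'],
}


def portscan(data):
    # Consumes DNS_NAME, produces OPEN_TCP_PORT
    return [('OPEN_TCP_PORT', f'{data}:{port}') for port in OPEN_PORTS.get(data, [])]


def sslcert(data):
    # Consumes OPEN_TCP_PORT, produces DNS_NAME
    return [('DNS_NAME', name) for name in CERT_NAMES.get(data, [])]


# Which toy modules watch which event type
MODULES = {'DNS_NAME': [portscan], 'OPEN_TCP_PORT': [sslcert]}


def run_scan(target):
    queue = deque([('DNS_NAME', target)])
    seen = set(queue)
    while queue:
        event_type, data = queue.popleft()
        for module in MODULES.get(event_type, []):
            for new_event in module(data):
                # Skip duplicates so the recursion terminates
                if new_event not in seen:
                    seen.add(new_event)
                    queue.append(new_event)
    return seen


events = run_scan('evilcorp.com')
for event_type, data in sorted(events):
    print(event_type, data)
```

Starting from a single DNS name, the two toy modules keep feeding each other until no new events appear. The duplicate check is what stops the loop here.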
"},{"location":"release_history/","title":"Release History","text":""},{"location":"release_history/#v117","title":"v1.1.7","text":"May 15th, 2024
"},{"location":"release_history/#new-modules","title":"New Modules","text":"February 21, 2024
"},{"location":"release_history/#improvements_1","title":"Improvements","text":"January 29, 2024
"},{"location":"release_history/#improvements_2","title":"Improvements","text":"January 11, 2024
"},{"location":"release_history/#improvements_3","title":"Improvements","text":"October 24, 2023
"},{"location":"release_history/#improvements_4","title":"Improvements","text":"October 11, 2023
Includes webhook output modules - Discord, Slack, and Teams!
"},{"location":"release_history/#improvements_5","title":"Improvements","text":"August 4, 2023
New Features:
-lf
Improvements / Fixes:
New Modules:
March 10, 2023
New Modules:
New Features:
December 15, 2022
New Modules:
New Features:
October 12, 2022
Changes:
retries
option for httpx module
asset_inventory
output module
--help
output
Fatal error from pip prevented installation.
ERROR: No matching distribution found for bbot
bash: /home/user/.local/bin/bbot: /home/user/.local/pipx/venvs/bbot/bin/python: bad interpreter
If you get errors resembling any of the above, it's probably because your Python version is too old. To install a newer version (3.9+ is required), you will need to do something like this:
# install a newer version of python\nsudo apt install python3.9 python3.9-venv\n# install pipx\npython3.9 -m pip install --user pipx\n# add pipx to your path\npython3.9 -m pipx ensurepath\n# reboot\nreboot\n# install bbot\npython3.9 -m pipx install bbot\n# run bbot\nbbot --help\n
"},{"location":"troubleshooting/#modulenotfounderror","title":"ModuleNotFoundError
","text":"If you run into a ModuleNotFoundError
, try running your bbot
command again with --force-deps
. This will repair your modules' Python dependencies.
As a troubleshooting step, it is sometimes useful to clear out your older configs and let BBOT generate new ones. This will ensure that new defaults are properly restored, etc.
# make a backup of the old configs\nmv ~/.config/bbot ~/.config/bbot.bak\n\n# generate new configs\nbbot\n
"},{"location":"dev/","title":"BBOT Developer Reference","text":"BBOT exposes a Python API that allows you to create, start, and stop scans.
Documented in this section are commonly-used classes and functions within BBOT, along with usage examples.
"},{"location":"dev/#adding-bbot-to-your-python-project","title":"Adding BBOT to Your Python Project","text":"If you are using Poetry, you can add BBOT to your python environment like this:
# stable\npoetry add bbot\n\n# bleeding-edge (dev branch)\npoetry add bbot --allow-prereleases\n
"},{"location":"dev/#running-a-bbot-scan-from-python","title":"Running a BBOT Scan from Python","text":""},{"location":"dev/#synchronous","title":"Synchronous","text":"from bbot.scanner import Scanner\n\nif __name__ == \"__main__\":\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n for event in scan.start():\n print(event)\n
"},{"location":"dev/#asynchronous","title":"Asynchronous","text":"from bbot.scanner import Scanner\n\nasync def main():\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n async for event in scan.async_start():\n print(event.json())\n\nif __name__ == \"__main__\":\n import asyncio\n asyncio.run(main())\n
For a full listing of Scanner
attributes and functions, see the Scanner
Code Reference.
You can specify any number of targets:
# create a scan against multiple targets\nscan = Scanner(\n \"evilcorp.com\",\n \"evilcorp.org\",\n \"evilcorp.ce\",\n \"4.3.2.1\",\n \"1.2.3.4/24\",\n presets=[\"subdomain-enum\"]\n)\n\n# this is the same as:\ntargets = [\"evilcorp.com\", \"evilcorp.org\", \"evilcorp.ce\", \"4.3.2.1\", \"1.2.3.4/24\"]\nscan = Scanner(*targets, presets=[\"subdomain-enum\"])\n
For more details, including which types of targets are valid, see Targets
"},{"location":"dev/#other-custom-options","title":"Other Custom Options","text":"In many cases, using a Preset like subdomain-enum
is sufficient. However, the Scanner
is flexible and accepts many other arguments that can override the default functionality. You can specify flags
, modules
, output_modules
, a whitelist
or blacklist
, and custom config
options:
# create a scan against multiple targets\nscan = Scanner(\n # targets\n \"evilcorp.com\",\n \"4.3.2.1\",\n # enable these presets\n presets=[\"subdomain-enum\"],\n # whitelist these hosts\n whitelist=[\"evilcorp.com\", \"evilcorp.org\"],\n # blacklist these hosts\n blacklist=[\"prod.evilcorp.com\"],\n # also enable these individual modules\n modules=[\"nuclei\", \"ipstack\"],\n # exclude modules with these flags\n exclude_flags=[\"slow\"],\n # custom config options\n config={\n \"modules\": {\n \"nuclei\": {\n \"tags\": \"apache,nginx\"\n }\n }\n }\n)\n
For a list of all the possible scan options, see the Presets
Code Reference
Here is a basic overview of BBOT's internal architecture.
"},{"location":"dev/architecture/#queues","title":"Queues","text":"Being both recursive and event-driven, BBOT makes heavy use of queues. These enable smooth communication between the modules, and ensure that large numbers of events can be produced without slowing down or clogging up the scan.
Every module in BBOT has both an incoming and outgoing queue. Event types matching the module's WATCHED_EVENTS
(e.g. DNS_NAME
) are queued in its incoming queue, and processed by the module's handle_event()
(or handle_batch()
in the case of batched modules). If the module finds anything interesting, it creates an event and places it in its outgoing queue, to be processed by the scan and redistributed to other modules.
Below is a graph showing the internal event flow in BBOT. White lines represent queues. Notice how some modules run in sequence, while others run in parallel. With the exception of a few specific modules, most BBOT modules are parallelized.
For a higher-level overview, see How it Works.
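The incoming and outgoing queue pattern described in this section can be sketched with asyncio. This is a minimal illustration under stated assumptions, not BBOT's implementation: ToyModule, its worker loop, the FINDING event type, and the use of None as a stop signal are all invented for the example.

```python
import asyncio


class ToyModule:
    # Event types this toy module wants to receive
    watched_events = ['DNS_NAME']

    def __init__(self):
        self.incoming = asyncio.Queue()
        self.outgoing = asyncio.Queue()

    async def handle_event(self, event):
        event_type, data = event
        # Anything interesting goes to the outgoing queue
        await self.outgoing.put(('FINDING', f'saw {data}'))

    async def worker(self):
        while True:
            event = await self.incoming.get()
            if event is None:  # stop signal (an assumption of this sketch)
                break
            await self.handle_event(event)


async def main():
    module = ToyModule()
    task = asyncio.create_task(module.worker())
    for event in [('DNS_NAME', 'evilcorp.com'), ('IP_ADDRESS', '1.2.3.4')]:
        # Only queue events whose type the module watches
        if event[0] in module.watched_events:
            await module.incoming.put(event)
    await module.incoming.put(None)
    await task
    results = []
    while not module.outgoing.empty():
        results.append(module.outgoing.get_nowait())
    return results


results = asyncio.run(main())
print(results)
```

In a real scan, the events on the outgoing queue would be redistributed to other modules rather than collected into a list.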
"},{"location":"dev/basemodule/","title":"BaseModule","text":""},{"location":"dev/basemodule/#bbot.modules.base.BaseModule","title":"BaseModule","text":"The base class for all BBOT modules.
Attributes:
watched_events
(List
) \u2013 Event types to watch.
produced_events
(List
) \u2013 Event types to produce.
meta
(Dict
) \u2013 Metadata about the module, such as whether authentication is required and a description.
flags
(List
) \u2013 Flags indicating the type of module (must have at least \"safe\" or \"aggressive\" and \"passive\" or \"active\").
deps_modules
(List
) \u2013 Other BBOT modules this module depends on. Empty list by default.
deps_pip
(List
) \u2013 Python dependencies to install via pip. Empty list by default.
deps_apt
(List
) \u2013 APT package dependencies to install. Empty list by default.
deps_shell
(List
) \u2013 Other dependencies installed via shell commands. Uses ansible.builtin.shell. Empty list by default.
deps_ansible
(List
) \u2013 Additional Ansible tasks for complex dependencies. Empty list by default.
accept_dupes
(bool
) \u2013 Whether to accept incoming duplicate events. Default is False.
suppress_dupes
(bool
) \u2013 Whether to suppress outgoing duplicate events. Default is True.
per_host_only
(bool
) \u2013 Limit the module to only scanning once per host. Default is False.
per_hostport_only
(bool
) \u2013 Limit the module to only scanning once per host:port. Default is False.
per_domain_only
(bool
) \u2013 Limit the module to only scanning once per domain. Default is False.
scope_distance_modifier
((int, None)
) \u2013 Modifies scope distance acceptance for events. Default is 0.
None == accept all events\n2 == accept events up to and including the scan's configured search distance plus two\n1 == accept events up to and including the scan's configured search distance plus one\n0 == (DEFAULT) accept events up to and including the scan's configured search distance\n
target_only
(bool
) \u2013 Accept only the initial target event(s). Default is False.
in_scope_only
(bool
) \u2013 Accept only explicitly in-scope events. Default is False.
options
(Dict
) \u2013 Customizable options for the module, e.g., {\"api_key\": \"\"}. Empty dict by default.
options_desc
(Dict
) \u2013 Descriptions for options, e.g., {\"api_key\": \"API Key\"}. Empty dict by default.
module_threads
(int
) \u2013 Maximum concurrent instances of handle_event() or handle_batch(). Default is 1.
batch_size
(int
) \u2013 Size of batches processed by handle_batch(). Default is 1.
batch_wait
(int
) \u2013 Seconds to wait before force-submitting a batch. Default is 10.
failed_request_abort_threshold
(int
) \u2013 Threshold for setting error state after failed HTTP requests (only takes effect when request_with_fail_count()
is used). Default is 5.
_preserve_graph
(bool
) \u2013 When set to True, accept events that may be duplicates but are necessary for construction of complete graph. Typically only enabled for output modules that need to maintain full chains of events, e.g. neo4j
and json
. Default is False.
_stats_exclude
(bool
) \u2013 Whether to exclude this module from scan statistics. Default is False.
_qsize
(int
) \u2013 Outgoing queue size (0 for infinite). Default is 0.
_priority
(int
) \u2013 Priority level of events raised by this module, 1-5. Default is 3.
_name
(str
) \u2013 Module name, overridden automatically. Default is 'base'.
_type
(str
) \u2013 Module type, for differentiating between normal and output modules. Default is 'scan'.
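The scope_distance_modifier rule listed among the attributes above can be written as a small predicate. This is a sketch of the documented rule only, not the code BBOT runs: the function name accepts_event and its parameter names are assumptions made for illustration.

```python
def accepts_event(event_scope_distance, search_distance, scope_distance_modifier=0):
    # None means the module accepts events at any scope distance
    if scope_distance_modifier is None:
        return True
    # Otherwise accept up to the scan's search distance plus the modifier
    return event_scope_distance <= search_distance + scope_distance_modifier


# With a search distance of 0, the default modifier accepts only distance 0
print(accepts_event(0, 0))
print(accepts_event(1, 0))
print(accepts_event(1, 0, scope_distance_modifier=1))
print(accepts_event(5, 0, scope_distance_modifier=None))
```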
bbot/modules/base.py
class BaseModule:\n \"\"\"The base class for all BBOT modules.\n\n Attributes:\n watched_events (List): Event types to watch.\n\n produced_events (List): Event types to produce.\n\n meta (Dict): Metadata about the module, such as whether authentication is required and a description.\n\n flags (List): Flags indicating the type of module (must have at least \"safe\" or \"aggressive\" and \"passive\" or \"active\").\n\n deps_modules (List): Other BBOT modules this module depends on. Empty list by default.\n\n deps_pip (List): Python dependencies to install via pip. Empty list by default.\n\n deps_apt (List): APT package dependencies to install. Empty list by default.\n\n deps_shell (List): Other dependencies installed via shell commands. Uses [ansible.builtin.shell](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/shell_module.html). Empty list by default.\n\n deps_ansible (List): Additional Ansible tasks for complex dependencies. Empty list by default.\n\n accept_dupes (bool): Whether to accept incoming duplicate events. Default is False.\n\n suppress_dupes (bool): Whether to suppress outgoing duplicate events. Default is True.\n\n per_host_only (bool): Limit the module to only scanning once per host. Default is False.\n\n per_hostport_only (bool): Limit the module to only scanning once per host:port. Default is False.\n\n per_domain_only (bool): Limit the module to only scanning once per domain. Default is False.\n\n scope_distance_modifier (int, None): Modifies scope distance acceptance for events. Default is 0.\n ```\n None == accept all events\n 2 == accept events up to and including the scan's configured search distance plus two\n 1 == accept events up to and including the scan's configured search distance plus one\n 0 == (DEFAULT) accept events up to and including the scan's configured search distance\n ```\n\n target_only (bool): Accept only the initial target event(s). 
Default is False.\n\n in_scope_only (bool): Accept only explicitly in-scope events. Default is False.\n\n options (Dict): Customizable options for the module, e.g., {\"api_key\": \"\"}. Empty dict by default.\n\n options_desc (Dict): Descriptions for options, e.g., {\"api_key\": \"API Key\"}. Empty dict by default.\n\n module_threads (int): Maximum concurrent instances of handle_event() or handle_batch(). Default is 1.\n\n batch_size (int): Size of batches processed by handle_batch(). Default is 1.\n\n batch_wait (int): Seconds to wait before force-submitting a batch. Default is 10.\n\n failed_request_abort_threshold (int): Threshold for setting error state after failed HTTP requests (only takes effect when `request_with_fail_count()` is used. Default is 5.\n\n _preserve_graph (bool): When set to True, accept events that may be duplicates but are necessary for construction of complete graph. Typically only enabled for output modules that need to maintain full chains of events, e.g. `neo4j` and `json`. Default is False.\n\n _stats_exclude (bool): Whether to exclude this module from scan statistics. Default is False.\n\n _qsize (int): Outgoing queue size (0 for infinite). Default is 0.\n\n _priority (int): Priority level of events raised by this module, 1-5. Default is 3.\n\n _name (str): Module name, overridden automatically. Default is 'base'.\n\n _type (str): Module type, for differentiating between normal and output modules. 
Default is 'scan'.\n \"\"\"\n\n watched_events = []\n produced_events = []\n meta = {\"auth_required\": False, \"description\": \"Base module\"}\n flags = []\n options = {}\n options_desc = {}\n\n deps_modules = []\n deps_pip = []\n deps_apt = []\n deps_shell = []\n deps_ansible = []\n\n accept_dupes = False\n suppress_dupes = True\n per_host_only = False\n per_hostport_only = False\n per_domain_only = False\n scope_distance_modifier = 0\n target_only = False\n in_scope_only = False\n\n _module_threads = 1\n _batch_size = 1\n batch_wait = 10\n failed_request_abort_threshold = 5\n\n default_discovery_context = \"{module} discovered {event.type}: {event.data}\"\n\n _preserve_graph = False\n _stats_exclude = False\n _qsize = 1000\n _priority = 3\n _name = \"base\"\n _type = \"scan\"\n _intercept = False\n _shuffle_incoming_queue = True\n\n def __init__(self, scan):\n \"\"\"Initializes a module instance.\n\n Args:\n scan: The BBOT scan object associated with this module instance.\n\n Attributes:\n scan: The scan object associated with this module.\n\n errored (bool): Whether the module has errored out. 
Default is False.\n \"\"\"\n self.scan = scan\n self.errored = False\n self._log = None\n self._incoming_event_queue = None\n self._outgoing_event_queue = None\n # track incoming events to prevent unwanted duplicates\n self._incoming_dup_tracker = set()\n # tracks which subprocesses are running under this module\n self._proc_tracker = set()\n # seconds since we've submitted a batch\n self._last_submitted_batch = None\n # additional callbacks to be executed alongside self.cleanup()\n self.cleanup_callbacks = []\n self._cleanedup = False\n self._watched_events = None\n\n self._task_counter = TaskCounter()\n\n # string constant\n self._custom_filter_criteria_msg = \"it did not meet custom filter criteria\"\n\n # track number of failures (for .request_with_fail_count())\n self._request_failures = 0\n\n self._tasks = []\n self._event_received = asyncio.Condition()\n self._event_queued = asyncio.Condition()\n\n # used for optional \"per host\" tracking\n self._per_host_tracker = set()\n\n async def setup(self):\n \"\"\"\n Performs one-time setup tasks for the module.\n\n This method is responsible for preparing the module for its operation, which may include tasks\n such as downloading necessary resources, validating configuration parameters, or other preliminary\n checks.\n\n Returns:\n tuple:\n - bool or None: A status indicating the outcome of the setup process. 
Returns `True` if\n the setup was successful, `None` for a soft-fail where the module setup did not succeed\n but the scan will continue with the module disabled, and `False` for a hard-fail where\n the setup failure causes the scan to abort.\n - str, optional: A reason for the setup failure, provided only when the setup does not\n succeed (i.e., returns `None` or `False`).\n\n Examples:\n >>> async def setup(self):\n >>> if not self.config.get(\"api_key\"):\n >>> # Soft-fail: Configuration missing an API key\n >>> return None, \"No API key specified\"\n\n >>> async def setup(self):\n >>> try:\n >>> wordlist = await self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n >>> except WordlistError as e:\n >>> # Hard-fail: Error retrieving wordlist\n >>> return False, f\"Error retrieving wordlist: {e}\"\n\n >>> async def setup(self):\n >>> self.timeout = self.config.get(\"timeout\", 5)\n >>> # Success: Setup completed without issues\n >>> return True\n \"\"\"\n\n return True\n\n async def handle_event(self, event):\n \"\"\"Asynchronously handles incoming events that the module is configured to watch.\n\n This method is automatically invoked when an event that matches any in `watched_events` is encountered during a scan. Override this method to implement custom event-handling logic for your module.\n\n Args:\n event (Event): The event object containing details about the incoming event.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to 1.\n\n Returns:\n None\n \"\"\"\n pass\n\n async def handle_batch(self, *events):\n \"\"\"Handles incoming events in batches for optimized processing.\n\n This method is automatically called when multiple events that match any in `watched_events` are encountered and the `batch_size` attribute is set to a value greater than 1. 
Override this method to implement custom batch event-handling logic for your module.\n\n Args:\n *events (Event): A variable number of Event objects to be processed in a batch.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to a value greater than 1.\n\n Returns:\n None\n \"\"\"\n pass\n\n async def filter_event(self, event):\n \"\"\"Asynchronously filters incoming events based on custom criteria.\n\n Override this method for more granular control over which events are accepted by your module. This method is called automatically before `handle_event()` for each incoming event that matches any in `watched_events`.\n\n Args:\n event (Event): The incoming Event object to be filtered.\n\n Returns:\n tuple: A 2-tuple where the first value is a bool indicating whether the event should be accepted, and the second value is a string explaining the reason for its acceptance or rejection. By default, returns `(True, None)` to indicate acceptance without reason.\n\n Note:\n This method should be overridden if the module requires custom logic for event filtering.\n \"\"\"\n return True\n\n async def finish(self):\n \"\"\"Asynchronously performs final tasks as the scan nears completion.\n\n This method can be overridden to execute any necessary finalization logic. For example, if the module relies on a word cloud, you might wait for the scan to finish to ensure the word cloud is most complete before running an operation.\n\n Returns:\n None\n\n Warnings:\n This method may be called multiple times since it can raise events, which may re-trigger the \"finish\" phase of the scan. 
Optional to override.\n \"\"\"\n return\n\n async def report(self):\n \"\"\"Asynchronously executes a final task after the scan is complete but before cleanup.\n\n This method can be overridden to aggregate data and raise summary events at the end of the scan.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan.\n \"\"\"\n return\n\n async def cleanup(self):\n \"\"\"Asynchronously performs final cleanup operations after the scan is complete.\n\n This method can be overridden to implement custom cleanup logic. It is called only once per scan and may not raise events.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan and may not raise events.\n \"\"\"\n return\n\n async def require_api_key(self):\n \"\"\"\n Asynchronously checks if an API key is required and valid.\n\n Args:\n None\n\n Returns:\n bool or tuple: Returns True if API key is valid and ready.\n Returns a tuple (None, \"error message\") otherwise.\n\n Notes:\n - Fetches the API key from the configuration.\n - Calls the 'ping()' method to test API accessibility.\n - Sets the API key readiness status accordingly.\n \"\"\"\n self.api_key = self.config.get(\"api_key\", \"\")\n if self.auth_secret:\n try:\n await self.ping()\n self.hugesuccess(f\"API is ready\")\n return True\n except Exception as e:\n return None, f\"Error with API ({str(e).strip()})\"\n else:\n return None, \"No API key set\"\n\n async def ping(self):\n \"\"\"Asynchronously checks the health of the configured API.\n\n This method is used in conjunction with require_api_key() to verify that the API is not just configured, but also responsive. 
This method should include an assert statement to validate the API's health, typically by making a test request to a known endpoint.\n\n Example Usage:\n In your implementation, if the API has a \"/ping\" endpoint:\n async def ping(self):\n r = await self.request_with_fail_count(f\"{self.base_url}/ping\")\n resp_content = getattr(r, \"text\", \"\")\n assert getattr(r, \"status_code\", 0) == 200, resp_content\n\n Returns:\n None\n\n Raises:\n AssertionError: If the API does not respond as expected.\n \"\"\"\n return\n\n @property\n def batch_size(self):\n batch_size = self.config.get(\"batch_size\", None)\n # only allow overriding the batch size if its default value is greater than 1\n # this prevents modules from being accidentally neutered by an incorrect batch_size setting\n if batch_size is None or self._batch_size == 1:\n batch_size = self._batch_size\n return batch_size\n\n @property\n def module_threads(self):\n module_threads = self.config.get(\"module_threads\", None)\n if module_threads is None:\n module_threads = self._module_threads\n return module_threads\n\n @property\n def auth_secret(self):\n \"\"\"Indicates if the module is properly configured for authentication.\n\n This read-only property should be used to check whether all necessary attributes (e.g., API keys, tokens, etc.) are configured to perform authenticated requests in the module. 
Commonly used in setup or initialization steps.\n\n Returns:\n bool: True if the module is properly configured for authentication, otherwise False.\n \"\"\"\n return getattr(self, \"api_key\", \"\")\n\n def get_watched_events(self):\n \"\"\"Retrieve the set of events that the module is interested in observing.\n\n Override this method if the set of events the module should watch needs to be determined dynamically, e.g., based on configuration options or other runtime conditions.\n\n Returns:\n set: The set of event types that this module will handle.\n \"\"\"\n if self._watched_events is None:\n self._watched_events = set(self.watched_events)\n return self._watched_events\n\n async def _handle_batch(self):\n \"\"\"\n Asynchronously handles a batch of events in the module.\n\n Args:\n None\n\n Returns:\n bool: True if events were submitted for processing, False otherwise.\n\n Notes:\n - The method is wrapped in a task counter to monitor asynchronous operations.\n - Checks if there are any events in the incoming queue and module is not in an error state.\n - Invokes '_events_waiting()' to fetch a batch of events.\n - Calls the module's 'handle_batch()' method to process these events.\n - If a \"FINISHED\" event is found, invokes 'finish()' method of the module.\n \"\"\"\n finish = False\n async with self._task_counter.count(f\"{self.name}.handle_batch()\") as counter:\n submitted = False\n if self.batch_size <= 1:\n return\n if self.num_incoming_events > 0:\n events, finish = await self._events_waiting()\n if events and not self.errored:\n counter.n = len(events)\n self.verbose(f\"Handling batch of {len(events):,} events\")\n submitted = True\n async with self.scan._acatch(f\"{self.name}.handle_batch()\"):\n await self.handle_batch(*events)\n self.verbose(f\"Finished handling batch of {len(events):,} events\")\n if finish:\n context = f\"{self.name}.finish()\"\n async with self.scan._acatch(context), self._task_counter.count(context):\n await self.finish()\n return 
submitted\n\n def make_event(self, *args, **kwargs):\n \"\"\"Create an event for the scan.\n\n Raises a validation error if the event could not be created, unless raise_error is set to False.\n\n Args:\n *args: Positional arguments to be passed to the scan's make_event method.\n **kwargs: Keyword arguments to be passed to the scan's make_event method.\n raise_error (bool, optional): Whether to raise a validation error if the event could not be created. Defaults to False.\n\n Examples:\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n Event or None: The created event, or None if a validation error occurred and raise_error was False.\n\n Raises:\n ValidationError: If the event could not be validated and raise_error is True.\n \"\"\"\n raise_error = kwargs.pop(\"raise_error\", False)\n module = kwargs.pop(\"module\", None)\n if module is None:\n if (not args) or getattr(args[0], \"module\", None) is None:\n kwargs[\"module\"] = self\n try:\n event = self.scan.make_event(*args, **kwargs)\n except ValidationError as e:\n if raise_error:\n raise\n self.warning(f\"{e}\")\n return\n return event\n\n async def emit_event(self, *args, **kwargs):\n \"\"\"Emit an event to the event queue and distribute it to interested modules.\n\n This is how modules \"return\" data.\n\n The method first creates an event object by calling `self.make_event()` with the provided arguments.\n Then, the event is queued for outgoing distribution using `self.queue_outgoing_event()`.\n\n Args:\n *args: Positional arguments to be passed to `self.make_event()` for event creation.\n **kwargs: Keyword arguments to be passed for event creation or configuration of the emit action.\n ```markdown\n - on_success_callback: Optional callback function to execute upon successful event emission.\n - abort_if: Optional condition under which the event emission should be aborted.\n - quick: Optional flag to indicate whether the event should be processed 
quickly.\n ```\n\n Examples:\n >>> await self.emit_event(\"www.evilcorp.com\", parent=event, tags=[\"affiliate\"])\n\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n None\n\n Raises:\n ValidationError: If the event cannot be validated (handled in `self.make_event()`).\n \"\"\"\n event_kwargs = dict(kwargs)\n emit_kwargs = {}\n for o in (\"on_success_callback\", \"abort_if\", \"quick\"):\n v = event_kwargs.pop(o, None)\n if v is not None:\n emit_kwargs[o] = v\n event = self.make_event(*args, **event_kwargs)\n if event:\n await self.queue_outgoing_event(event, **emit_kwargs)\n return event\n\n async def _events_waiting(self, batch_size=None):\n \"\"\"\n Asynchronously fetches events from the incoming_event_queue, up to a specified batch size.\n\n Args:\n None\n\n Returns:\n tuple: A tuple containing two elements:\n - events (list): A list of acceptable events from the queue.\n - finish (bool): A flag indicating if a \"FINISHED\" event is encountered.\n\n Notes:\n - The method pulls events from incoming_event_queue using 'get_nowait()'.\n - Events go through '_event_postcheck()' for validation.\n - \"FINISHED\" events are handled differently and the finish flag is set to True.\n - If the queue is empty or the batch size is reached, the loop breaks.\n \"\"\"\n if batch_size is None:\n batch_size = self.batch_size\n events = []\n finish = False\n while self.incoming_event_queue:\n if batch_size != -1 and len(events) > self.batch_size:\n break\n try:\n event = self.incoming_event_queue.get_nowait()\n self.debug(f\"Got {event} from {getattr(event, 'module', 'unknown_module')}\")\n acceptable, reason = await self._event_postcheck(event)\n if acceptable:\n if event.type == \"FINISHED\":\n finish = True\n else:\n events.append(event)\n self.scan.stats.event_consumed(event, self)\n elif reason:\n self.debug(f\"Not accepting {event} because {reason}\")\n except asyncio.queues.QueueEmpty:\n break\n return events, 
finish\n\n @property\n def num_incoming_events(self):\n ret = 0\n if self.incoming_event_queue is not False:\n ret = self.incoming_event_queue.qsize()\n return ret\n\n def start(self):\n self._tasks = [\n asyncio.create_task(self._worker(), name=f\"{self.scan.name}.{self.name}._worker()\")\n for _ in range(self.module_threads)\n ]\n\n async def _setup(self):\n \"\"\"\n Asynchronously sets up the module by invoking its 'setup()' method.\n\n This method catches exceptions during setup, sets the module's error state if necessary, and determines the\n status code based on the result of the setup process.\n\n Args:\n None\n\n Returns:\n tuple: A tuple containing the module's name, status (True for success, False for hard-fail, None for soft-fail),\n and an optional status message.\n\n Raises:\n Exception: Captured exceptions from the 'setup()' method are logged, but not propagated.\n\n Notes:\n - The 'setup()' method can return either a simple boolean status or a tuple of status and message.\n - A WordlistError exception triggers a soft-fail status.\n - The debug log will contain setup status information for the module.\n \"\"\"\n status_codes = {False: \"hard-fail\", None: \"soft-fail\", True: \"success\"}\n\n status = False\n self.debug(f\"Setting up module {self.name}\")\n try:\n result = await self.setup()\n if type(result) == tuple and len(result) == 2:\n status, msg = result\n else:\n status = result\n msg = status_codes[status]\n self.debug(f\"Finished setting up module {self.name}\")\n except Exception as e:\n self.set_error_state(f\"Unexpected error during module setup: {e}\", critical=True)\n msg = f\"{e}\"\n self.trace()\n return self, status, str(msg)\n\n async def _worker(self):\n \"\"\"\n The core worker loop for the module, responsible for handling events from the incoming event queue.\n\n This method is a coroutine and is run asynchronously. Multiple instances can run simultaneously based on\n the 'module_threads' configuration. 
The worker dequeues events from 'incoming_event_queue', performs\n necessary prechecks, and passes the event to the appropriate handler function.\n\n Args:\n None\n\n Returns:\n None\n\n Raises:\n asyncio.CancelledError: If the worker is cancelled during its operation.\n\n Notes:\n - The worker is sensitive to the 'stopping' flag of the scan. It will terminate if this flag is set.\n - The worker handles backpressure by pausing when the outgoing event queue is full.\n - Batch processing is supported and is activated when 'batch_size' > 1.\n - Each event is subject to a post-check via '_event_postcheck()' to decide whether it should be handled.\n - Special 'FINISHED' events trigger the 'finish()' method of the module.\n \"\"\"\n async with self.scan._acatch(context=self._worker, unhandled_is_critical=True):\n try:\n while not self.scan.stopping and not self.errored:\n # hold the reins if our outgoing queue is full\n if self._qsize > 0 and self.outgoing_event_queue.qsize() >= self._qsize:\n await asyncio.sleep(0.1)\n continue\n\n if self.batch_size > 1:\n submitted = await self._handle_batch()\n if not submitted:\n async with self._event_received:\n await self._event_received.wait()\n\n else:\n try:\n if self.incoming_event_queue is not False:\n event = await self.incoming_event_queue.get()\n else:\n self.debug(f\"Event queue is in bad state\")\n break\n except asyncio.queues.QueueEmpty:\n continue\n self.debug(f\"Got {event} from {getattr(event, 'module', 'unknown_module')}\")\n async with self._task_counter.count(f\"event_postcheck({event})\"):\n acceptable, reason = await self._event_postcheck(event)\n if acceptable:\n if event.type == \"FINISHED\":\n context = f\"{self.name}.finish()\"\n async with self.scan._acatch(context), self._task_counter.count(context):\n await self.finish()\n else:\n context = f\"{self.name}.handle_event({event})\"\n self.scan.stats.event_consumed(event, self)\n self.debug(f\"Handling {event}\")\n async with self.scan._acatch(context), 
self._task_counter.count(context):\n await self.handle_event(event)\n self.debug(f\"Finished handling {event}\")\n else:\n self.debug(f\"Not accepting {event} because {reason}\")\n except asyncio.CancelledError:\n # this trace was used for debugging leaked CancelledErrors from inside httpx\n # self.log.trace(\"Worker cancelled\")\n raise\n except BaseException as e:\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt,)):\n self.scan.stop()\n else:\n self.error(f\"Critical failure in module {self.name}: {e}\")\n self.error(traceback.format_exc())\n self.log.trace(f\"Worker stopped\")\n\n @property\n def max_scope_distance(self):\n if self.in_scope_only or self.target_only:\n return 0\n if self.scope_distance_modifier is None:\n return 999\n return max(0, self.scan.scope_search_distance + self.scope_distance_modifier)\n\n def _event_precheck(self, event):\n \"\"\"\n Pre-checks an event to determine if it should be accepted by the module for queuing.\n\n This method is called when an event is about to be enqueued into the module's incoming event queue.\n It applies various filters such as special signal event types, module error state, watched event types, and more\n to decide whether or not the event should be enqueued.\n\n Args:\n event (Event): The event object to check.\n\n Returns:\n tuple: A tuple (bool, str) where the bool indicates if the event should be accepted, and the str gives the reason.\n\n Examples:\n >>> result, reason = self._event_precheck(event)\n >>> if result:\n ... self.incoming_event_queue.put_nowait(event)\n ... else:\n ... 
self.debug(f\"Not accepting {event} because {reason}\")\n\n Notes:\n - The method considers special signal event types like \"FINISHED\".\n - Checks whether the module is in an error state.\n - Checks if the event type matches the types this module is interested in (`watched_events`).\n - Checks for events tagged as 'target' if the module has `target_only` flag set.\n - Applies specific filtering based on event type and module name.\n \"\"\"\n\n # special signal event types\n if event.type in (\"FINISHED\",):\n return True, \"its type is FINISHED\"\n if self.errored:\n return False, f\"module is in error state\"\n # exclude non-watched types\n if not any(t in self.get_watched_events() for t in (\"*\", event.type)):\n return False, \"its type is not in watched_events\"\n if self.target_only:\n if \"target\" not in event.tags:\n return False, \"it did not meet target_only filter criteria\"\n\n # exclude certain URLs (e.g. javascript):\n # TODO: revisit this after httpx rework\n if event.type.startswith(\"URL\") and self.name != \"httpx\" and \"httpx-only\" in event.tags:\n return False, \"its extension was listed in url_extension_httpx_only\"\n\n return True, \"precheck succeeded\"\n\n async def _event_postcheck(self, event):\n \"\"\"\n A simple wrapper for dup tracking\n \"\"\"\n # special exception for \"FINISHED\" event\n if event.type in (\"FINISHED\",):\n return True, \"\"\n acceptable, reason = await self._event_postcheck_inner(event)\n if acceptable:\n # check duplicates\n is_incoming_duplicate, reason = self.is_incoming_duplicate(event, add=True)\n if is_incoming_duplicate and not self.accept_dupes:\n return False, f\"module has already seen it\" + (f\" ({reason})\" if reason else \"\")\n\n return acceptable, reason\n\n async def _event_postcheck_inner(self, event):\n \"\"\"\n Post-checks an event to determine if it should be accepted by the module for handling.\n\n This method is called when an event is dequeued from the module's incoming event queue, right 
before it is actually processed.\n It applies various filters such as scope, custom filtering logic, and per-host tracking to decide the event's fate.\n\n Args:\n event (Event): The event object to check.\n\n Returns:\n tuple: A tuple (bool, str) where the bool indicates if the event should be accepted, and the str gives the reason.\n\n Notes:\n - Override the `filter_event` method for custom filtering logic.\n - This method also maintains host-based tracking when the `per_host_only` or similar flags are set.\n - The method will also update event production stats for output modules.\n \"\"\"\n # force-output certain events to the graph\n if self._is_graph_important(event):\n return True, \"event is critical to the graph\"\n\n # check scope distance\n filter_result, reason = self._scope_distance_check(event)\n if not filter_result:\n return filter_result, reason\n\n # custom filtering\n async with self.scan._acatch(context=self.filter_event):\n try:\n filter_result = await self.filter_event(event)\n except Exception as e:\n msg = f\"Unhandled exception in {self.name}.filter_event({event}): {e}\"\n self.error(msg)\n return False, msg\n msg = str(self._custom_filter_criteria_msg)\n with suppress(ValueError, TypeError):\n filter_result, reason = filter_result\n msg += f\": {reason}\"\n if not filter_result:\n return False, msg\n\n self.debug(f\"{event} passed post-check\")\n return True, \"\"\n\n def _scope_distance_check(self, event):\n if self.in_scope_only:\n if event.scope_distance > 0:\n return False, \"it did not meet in_scope_only filter criteria\"\n if self.scope_distance_modifier is not None:\n if event.scope_distance < 0:\n return False, f\"its scope_distance ({event.scope_distance}) is invalid.\"\n elif event.scope_distance > self.max_scope_distance:\n return (\n False,\n f\"its scope_distance ({event.scope_distance}) exceeds the maximum allowed by the scan ({self.scan.scope_search_distance}) + the module ({self.scope_distance_modifier}) == 
{self.max_scope_distance}\",\n )\n return True, \"\"\n\n async def _cleanup(self):\n if not self._cleanedup:\n self._cleanedup = True\n for callback in [self.cleanup] + self.cleanup_callbacks:\n context = f\"{self.name}.cleanup()\"\n if callable(callback):\n async with self.scan._acatch(context), self._task_counter.count(context):\n await self.helpers.execute_sync_or_async(callback)\n\n async def queue_event(self, event):\n \"\"\"\n Asynchronously queues an incoming event to the module's event queue for further processing.\n\n The function performs an initial check to see if the event is acceptable for queuing.\n If the event passes the check, it is put into the `incoming_event_queue`.\n\n Args:\n event: The event object to be queued.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `incoming_event_queue`.\n\n Examples:\n >>> await self.queue_event(some_event)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue incoming events.\n \"\"\"\n async with self._task_counter.count(\"queue_event()\", _log=False):\n if self.incoming_event_queue is False:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n return\n acceptable, reason = self._event_precheck(event)\n if not acceptable:\n if reason and reason != \"its type is not in watched_events\":\n self.debug(f\"Not queueing {event} because {reason}\")\n return\n else:\n self.debug(f\"Queueing {event} because {reason}\")\n try:\n self.incoming_event_queue.put_nowait(event)\n async with self._event_received:\n self._event_received.notify()\n if event.type != \"FINISHED\":\n self.scan._new_activity = True\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n\n async def queue_outgoing_event(self, event, **kwargs):\n \"\"\"\n Queues an outgoing event to the module's outgoing event queue for further processing.\n\n The function attempts to put the event into the `outgoing_event_queue` 
immediately.\n If it's not possible due to the current state of the module, the resulting AttributeError is caught and a debug message is logged.\n\n Args:\n event: The event object to be queued.\n **kwargs: Additional keyword arguments to be associated with the event.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `outgoing_event_queue`.\n\n Examples:\n >>> await self.queue_outgoing_event(some_outgoing_event, abort_if=lambda e: \"unresolved\" in e.tags)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue outgoing events.\n \"\"\"\n try:\n await self.outgoing_event_queue.put((event, kwargs))\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue outgoing event\")\n\n def set_error_state(self, message=None, clear_outgoing_queue=False, critical=False):\n \"\"\"\n Puts the module into an errored state where it cannot accept new events. Optionally logs a warning message.\n\n The function sets the module's `errored` attribute to True and logs a warning with the optional message.\n It also clears the incoming event queue to prevent further processing and updates its status to False.\n\n Args:\n message (str, optional): Additional message to be logged along with the warning.\n\n Returns:\n None: The function doesn't return anything but updates the `errored` state and clears the incoming event queue.\n\n Examples:\n >>> self.set_error_state()\n >>> self.set_error_state(\"Failed to connect to the server\")\n\n Notes:\n - The function sets `self._incoming_event_queue` to False to prevent its further use.\n - If the module was already in an errored state, the function will not reset the error state or the queue.\n \"\"\"\n if not self.errored:\n log_msg = \"Setting error state\"\n if message is not None:\n log_msg += f\": {message}\"\n if critical:\n log_fn = self.error\n else:\n log_fn = self.warning\n log_fn(log_msg)\n self.errored = True\n # clear incoming queue\n if self.incoming_event_queue is 
not False:\n self.debug(f\"Emptying event_queue\")\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.incoming_event_queue.get_nowait()\n # set queue to False to prevent its use\n # if there are leftover objects in the queue, the scan will hang.\n self._incoming_event_queue = False\n\n if clear_outgoing_queue:\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.outgoing_event_queue.get_nowait()\n\n def is_incoming_duplicate(self, event, add=False):\n if event.type in (\"FINISHED\",):\n return False, \"\"\n reason = \"\"\n try:\n event_hash = self._incoming_dedup_hash(event)\n except Exception as e:\n msg = f\"Unhandled exception in {self.name}._incoming_dedup_hash({event}): {e}\"\n self.error(msg)\n return True, msg\n with suppress(TypeError, ValueError):\n event_hash, reason = event_hash\n is_dup = event_hash in self._incoming_dup_tracker\n if add:\n self._incoming_dup_tracker.add(event_hash)\n return is_dup, reason\n\n def _incoming_dedup_hash(self, event):\n \"\"\"\n Determines the criteria for what is considered to be a duplicate event if `accept_dupes` is False.\n \"\"\"\n if self.per_host_only:\n return self.get_per_host_hash(event), \"per_host_only=True\"\n if self.per_hostport_only:\n return self.get_per_hostport_hash(event), \"per_hostport_only=True\"\n elif self.per_domain_only:\n return self.get_per_domain_hash(event), \"per_domain_only=True\"\n return hash(event), \"\"\n\n def _outgoing_dedup_hash(self, event):\n \"\"\"\n Determines the criteria for what is considered to be a duplicate event if `suppress_dupes` is True.\n \"\"\"\n return hash((event, self.name))\n\n def get_per_host_hash(self, event):\n \"\"\"\n Computes a per-host hash value for a given event. 
This method may be optionally overridden in subclasses.\n\n The function uses the event's `host` to create a string to be hashed.\n\n Args:\n event (Event): The event object containing host information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_host_hash(event)\n \"\"\"\n return hash(event.host)\n\n def get_per_hostport_hash(self, event):\n \"\"\"\n Computes a per-host:port hash value for a given event. This method may be optionally overridden in subclasses.\n\n The function uses the event's `host`, `port`, and `scheme` (for URLs) to create a string to be hashed.\n The hash value is used for distinguishing events related to the same host.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_hostport_hash(event)\n \"\"\"\n parsed = getattr(event, \"parsed_url\", None)\n if parsed is None:\n to_hash = self.helpers.make_netloc(event.host, event.port)\n else:\n to_hash = f\"{parsed.scheme}://{parsed.netloc}/\"\n return hash(to_hash)\n\n def get_per_domain_hash(self, event):\n \"\"\"\n Computes a per-domain hash value for a given event. 
This method may be optionally overridden in subclasses.\n\n Events with the same root domain will receive the same hash value.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the domain.\n\n Examples:\n >>> event = self.make_event(\"https://www.example.com:8443\")\n >>> self.get_per_domain_hash(event)\n \"\"\"\n _, domain = self.helpers.split_domain(event.host)\n return hash(domain)\n\n @property\n def name(self):\n return str(self._name)\n\n @property\n def helpers(self):\n return self.scan.helpers\n\n @property\n def status(self):\n \"\"\"\n Provides the current status of the module as a dictionary.\n\n The dictionary contains the following keys:\n - 'events': A sub-dictionary with 'incoming' and 'outgoing' keys, representing the number of events in the respective queues.\n - 'tasks': The current value of the task counter.\n - 'errored': A boolean value indicating if the module is in an error state.\n - 'running': A boolean value indicating if the module is currently processing data.\n\n Returns:\n dict: A dictionary containing the current status of the module.\n\n Examples:\n >>> self.status\n {'events': {'incoming': 5, 'outgoing': 2}, 'tasks': 3, 'errored': False, 'running': True}\n \"\"\"\n status = {\n \"events\": {\"incoming\": self.num_incoming_events, \"outgoing\": self.outgoing_event_queue.qsize()},\n \"tasks\": self._task_counter.value,\n \"errored\": self.errored,\n }\n status[\"running\"] = self.running\n return status\n\n @property\n def running(self):\n \"\"\"Property indicating whether the module is currently processing data.\n\n This property checks if the task counter (`self._task_counter.value`) is greater than zero,\n indicating that there are ongoing tasks in the module.\n\n Returns:\n bool: True if the module is currently processing data, False otherwise.\n \"\"\"\n return self._task_counter.value > 0\n\n @property\n def finished(self):\n 
\"\"\"Property indicating whether the module has finished processing.\n\n This property checks three conditions to determine if the module is finished:\n 1. The module is not currently running (`self.running` is False).\n 2. The number of incoming events in the queue is zero or less (`self.num_incoming_events <= 0`).\n 3. The number of outgoing events in the queue is zero or less (`self.outgoing_event_queue.qsize() <= 0`).\n\n Returns:\n bool: True if the module has finished processing, False otherwise.\n \"\"\"\n return not self.running and self.num_incoming_events <= 0 and self.outgoing_event_queue.qsize() <= 0\n\n async def run_process(self, *args, **kwargs):\n kwargs[\"_proc_tracker\"] = self._proc_tracker\n return await self.helpers.run(*args, **kwargs)\n\n async def run_process_live(self, *args, **kwargs):\n kwargs[\"_proc_tracker\"] = self._proc_tracker\n async for line in self.helpers.run_live(*args, **kwargs):\n yield line\n\n async def request_with_fail_count(self, *args, **kwargs):\n \"\"\"Asynchronously perform an HTTP request while keeping track of consecutive failures.\n\n This function wraps the `self.helpers.request` method, incrementing a failure counter if\n the request returns None. 
When the failure counter reaches `self.failed_request_abort_threshold`,\n the module is set to an error state.\n\n Args:\n *args: Positional arguments to pass to `self.helpers.request`.\n **kwargs: Keyword arguments to pass to `self.helpers.request`.\n\n Returns:\n Any: The response object or None if the request failed.\n\n Raises:\n None: Sets the module to an error state when the failure threshold is reached.\n \"\"\"\n r = await self.helpers.request(*args, **kwargs)\n if r is None:\n self._request_failures += 1\n else:\n self._request_failures = 0\n if self._request_failures >= self.failed_request_abort_threshold:\n self.set_error_state(f\"Setting error state due to {self._request_failures:,} failed HTTP requests\")\n return r\n\n @property\n def preset(self):\n return self.scan.preset\n\n @property\n def config(self):\n \"\"\"Property that provides easy access to the module's configuration in the scan's config.\n\n This property serves as a shortcut to retrieve the module-specific configuration from\n `self.scan.config`. 
If no configuration is found for this module, an empty dictionary is returned.\n\n Returns:\n dict: The configuration dictionary specific to this module.\n \"\"\"\n config = self.scan.config.get(\"modules\", {}).get(self.name, {})\n if config is None:\n config = {}\n return config\n\n @property\n def incoming_event_queue(self):\n if self._incoming_event_queue is None:\n if self._shuffle_incoming_queue:\n self._incoming_event_queue = ShuffleQueue()\n else:\n self._incoming_event_queue = asyncio.Queue()\n return self._incoming_event_queue\n\n @property\n def outgoing_event_queue(self):\n if self._outgoing_event_queue is None:\n self._outgoing_event_queue = ShuffleQueue(self._qsize)\n return self._outgoing_event_queue\n\n @property\n def priority(self):\n \"\"\"\n Gets the priority level of the module as an integer.\n\n The priority level is constrained to be between 1 and 5, inclusive.\n A lower value indicates a higher priority.\n\n Returns:\n int: The priority level of the module, constrained between 1 and 5.\n\n Examples:\n >>> self.priority\n 3\n \"\"\"\n return int(max(1, min(5, self._priority)))\n\n @property\n def auth_required(self):\n return self.meta.get(\"auth_required\", False)\n\n @property\n def http_timeout(self):\n \"\"\"\n Convenience shortcut to `http_timeout` in the config\n \"\"\"\n return self.scan.web_config.get(\"http_timeout\", 10)\n\n @property\n def log(self):\n if getattr(self, \"_log\", None) is None:\n self._log = logging.getLogger(f\"bbot.modules.{self.name}\")\n return self._log\n\n @property\n def memory_usage(self):\n \"\"\"Property that calculates the current memory usage of the module in bytes.\n\n This property uses the `get_size` function to estimate the memory consumption\n of the module object. The depth of the object graph traversal is limited to 3 levels\n to avoid performance issues. 
Commonly shared objects like `self.scan`, `self.helpers`,\n are excluded from the calculation to prevent double-counting.\n\n Returns:\n int: The estimated memory usage of the module in bytes.\n \"\"\"\n seen = {self.scan, self.helpers, self.log} # noqa\n return get_size(self, max_depth=3, seen=seen)\n\n def __str__(self):\n return self.name\n\n def log_table(self, *args, **kwargs):\n \"\"\"Logs a table to the console and optionally writes it to a file.\n\n This function generates a table using `self.helpers.make_table`, then logs each line\n of the table as an info-level log. If a table_name is provided, it also writes the table to a file.\n\n Args:\n *args: Variable length argument list to be passed to `self.helpers.make_table`.\n **kwargs: Arbitrary keyword arguments. If 'table_name' is specified, the table will be written to a file.\n\n Returns:\n str: The generated table as a string.\n\n Examples:\n >>> self.log_table(['Header1', 'Header2'], [['row1col1', 'row1col2'], ['row2col1', 'row2col2']], table_name=\"my_table\")\n \"\"\"\n table_name = kwargs.pop(\"table_name\", None)\n max_log_entries = kwargs.pop(\"max_log_entries\", None)\n table = self.helpers.make_table(*args, **kwargs)\n lines_logged = 0\n for line in table.splitlines():\n if max_log_entries is not None and lines_logged > max_log_entries:\n break\n self.info(line)\n lines_logged += 1\n if table_name is not None:\n date = self.helpers.make_date()\n filename = self.scan.home / f\"{self.helpers.tagify(table_name)}-table-{date}.txt\"\n with open(filename, \"w\") as f:\n f.write(table)\n self.verbose(f\"Wrote {table_name} to {filename}\")\n return table\n\n def _is_graph_important(self, event):\n return self.preserve_graph and getattr(event, \"_graph_important\", False) and not getattr(event, \"_omit\", False)\n\n @property\n def preserve_graph(self):\n preserve_graph = self.config.get(\"preserve_graph\", None)\n if preserve_graph is None:\n preserve_graph = self._preserve_graph\n return 
preserve_graph\n\n def debug(self, *args, trace=False, **kwargs):\n \"\"\"Logs debug messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.debug(\"This is a debug message\")\n >>> self.debug(\"This is a debug message with a trace\", trace=True)\n \"\"\"\n self.log.debug(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def verbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.verbose(\"This is a verbose message\")\n >>> self.verbose(\"This is a verbose message with a trace\", trace=True)\n \"\"\"\n self.log.verbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeverbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened white text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. 
Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeverbose(\"This is a huge verbose message\")\n >>> self.hugeverbose(\"This is a huge verbose message with a trace\", trace=True)\n \"\"\"\n self.log.hugeverbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def info(self, *args, trace=False, **kwargs):\n \"\"\"Logs informational messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.info(\"This is an informational message\")\n >>> self.info(\"This is an informational message with a trace\", trace=True)\n \"\"\"\n self.log.info(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeinfo(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened blue text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. 
Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeinfo(\"This is a huge informational message\")\n >>> self.hugeinfo(\"This is a huge informational message with a trace\", trace=True)\n \"\"\"\n self.log.hugeinfo(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def success(self, *args, trace=False, **kwargs):\n \"\"\"Logs a success message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.success(\"Operation completed successfully\")\n >>> self.success(\"Operation completed with a trace\", trace=True)\n \"\"\"\n self.log.success(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugesuccess(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened green text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugesuccess(\"This is a huge success message\")\n >>> self.hugesuccess(\"This is a huge success message with a trace\", trace=True)\n \"\"\"\n self.log.hugesuccess(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def warning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a warning message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. 
Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.warning(\"This is a warning message\")\n >>> self.warning(\"This is a warning message with a trace\", trace=False)\n \"\"\"\n self.log.warning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugewarning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened orange text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugewarning(\"This is a huge warning message\")\n >>> self.hugewarning(\"This is a huge warning message with a trace\", trace=False)\n \"\"\"\n self.log.hugewarning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def error(self, *args, trace=True, **kwargs):\n \"\"\"Logs an error message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.error(\"This is an error message\")\n >>> self.error(\"This is an error message with a trace\", trace=False)\n \"\"\"\n self.log.error(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n\n def trace(self, msg=None):\n \"\"\"Logs the stack trace of the most recently caught exception.\n\n This method captures the type, value, and traceback of the most recent exception and logs it using the trace level. 
It is typically used for debugging purposes.\n\n Anything logged using this method will always be written to the scan's `debug.log`, even if debugging is not enabled.\n\n Examples:\n >>> try:\n >>> 1 / 0\n >>> except ZeroDivisionError:\n >>> self.trace()\n \"\"\"\n if msg is None:\n e_type, e_val, e_traceback = exc_info()\n if e_type is not None:\n self.log.trace(traceback.format_exc())\n else:\n self.log.trace(msg)\n\n def critical(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened red text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.critical(\"This is a critical message\")\n >>> self.critical(\"This is a critical message with a trace\", trace=False)\n \"\"\"\n self.log.critical(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
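The consecutive-failure counter used by request_with_fail_count() earlier in this listing can be sketched in isolation. This is a minimal stand-in, not BBOT code: the FailureCounter class and its attribute names are hypothetical. Note that the listing compares with >=, so the error state is set as soon as the count reaches the threshold.

```python
from typing import Optional


class FailureCounter:
    # Hypothetical stand-in for the counter logic in request_with_fail_count().

    def __init__(self, abort_threshold: int = 5):
        self.abort_threshold = abort_threshold
        self.failures = 0
        self.errored = False

    def record(self, response: Optional[object]) -> None:
        # A None response counts as a failure; any real response resets the streak.
        if response is None:
            self.failures += 1
        else:
            self.failures = 0
        # Same comparison as the listing: reaching the threshold is enough.
        if self.failures >= self.abort_threshold:
            self.errored = True


counter = FailureCounter(abort_threshold=3)
for response in (None, None, 'ok', None, None, None):
    counter.record(response)
```

Only consecutive failures count: the successful response in the middle resets the streak, so the error state is reached on the final three None responses.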
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.auth_secret","title":"auth_secret property
","text":"auth_secret\n
Indicates if the module is properly configured for authentication.
This read-only property should be used to check whether all necessary attributes (e.g., API keys, tokens, etc.) are configured to perform authenticated requests in the module. Commonly used in setup or initialization steps.
Returns:
bool
\u2013 True if the module is properly configured for authentication, otherwise False.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.config","title":"config property
","text":"config\n
Property that provides easy access to the module's configuration in the scan's config.
This property serves as a shortcut to retrieve the module-specific configuration from self.scan.config
. If no configuration is found for this module, an empty dictionary is returned.
Returns:
dict
\u2013 The configuration dictionary specific to this module.
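The lookup behind this property can be sketched with plain dictionaries. The function name module_config and the module names in the sample data are illustrative assumptions; only the nested .get() chain and the None fallback come from the source listing.

```python
def module_config(scan_config: dict, module_name: str) -> dict:
    # Same chain as the property: modules -> <module name>, defaulting to {}.
    config = scan_config.get('modules', {}).get(module_name, {})
    # A key that is present but set to None also falls back to an empty dict.
    if config is None:
        config = {}
    return config


# Hypothetical scan config with one configured module and one set to None.
scan_config = {
    'modules': {
        'example_module': {'api_key': 'deadbeef'},
        'empty_module': None,
    }
}
```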
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.finished","title":"finished property
","text":"finished\n
Property indicating whether the module has finished processing.
This property checks three conditions to determine if the module is finished:
1. The module is not currently running (self.running is False).
2. The number of incoming events in the queue is zero or less (self.num_incoming_events <= 0).
3. The number of outgoing events in the queue is zero or less (self.outgoing_event_queue.qsize() <= 0).
Returns:
bool
\u2013 True if the module has finished processing, False otherwise.
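The three conditions combine with a logical AND. A minimal sketch follows, assuming the running flag and both queue sizes are passed in as plain values; the function name is_finished is hypothetical, and the property's source is not shown in this section.

```python
def is_finished(running: bool, num_incoming: int, num_outgoing: int) -> bool:
    # All three conditions must hold: idle, nothing queued in, nothing queued out.
    return (not running) and num_incoming <= 0 and num_outgoing <= 0
```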
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.http_timeout","title":"http_timeout property
","text":"http_timeout\n
Convenience shortcut to the http_timeout
value in the scan's web config (self.scan.web_config). Falls back to 10 if the option is not set.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.memory_usage","title":"memory_usage property
","text":"memory_usage\n
Property that calculates the current memory usage of the module in bytes.
This property uses the get_size
function to estimate the memory consumption of the module object. The depth of the object graph traversal is limited to 3 levels to avoid performance issues. Commonly shared objects such as self.scan
, self.helpers
, and self.log are excluded from the calculation to prevent double-counting.
Returns:
int
\u2013 The estimated memory usage of the module in bytes.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.priority","title":"priority property
","text":"priority\n
Gets the priority level of the module as an integer.
The priority level is constrained to be between 1 and 5, inclusive. A lower value indicates a higher priority.
Returns:
int
\u2013 The priority level of the module, constrained between 1 and 5.
Examples:
>>> self.priority\n3\n
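The clamping expression from the source listing can be tried on its own. The helper name clamp_priority is hypothetical; the expression itself is the one used by the property.

```python
def clamp_priority(raw) -> int:
    # min() caps the value at 5, max() raises it to at least 1, int() truncates.
    return int(max(1, min(5, raw)))
```

Values outside the range are pulled to the nearest bound rather than rejected, so a misconfigured priority never raises an error.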
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.running","title":"running property
","text":"running\n
Property indicating whether the module is currently processing data.
This property checks if the task counter (self._task_counter.value
) is greater than zero, indicating that there are ongoing tasks in the module.
Returns:
bool
\u2013 True if the module is currently processing data, False otherwise.
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.status","title":"status property
","text":"status\n
Provides the current status of the module as a dictionary.
The dictionary contains the following keys: events (a sub-dictionary with incoming and outgoing counts), tasks, errored, and running.
Returns:
dict
\u2013 A dictionary containing the current status of the module.
Examples:
>>> self.status\n{'events': {'incoming': 5, 'outgoing': 2}, 'tasks': 3, 'errored': False, 'running': True}\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.__init__","title":"__init__","text":"__init__(scan)\n
Initializes a module instance.
Parameters:
scan
\u2013 The BBOT scan object associated with this module instance.
Attributes:
scan
\u2013 The scan object associated with this module.
errored
(bool
) \u2013 Whether the module has errored out. Default is False.
Source code in bbot/modules/base.py
def __init__(self, scan):\n \"\"\"Initializes a module instance.\n\n Args:\n scan: The BBOT scan object associated with this module instance.\n\n Attributes:\n scan: The scan object associated with this module.\n\n errored (bool): Whether the module has errored out. Default is False.\n \"\"\"\n self.scan = scan\n self.errored = False\n self._log = None\n self._incoming_event_queue = None\n self._outgoing_event_queue = None\n # track incoming events to prevent unwanted duplicates\n self._incoming_dup_tracker = set()\n # tracks which subprocesses are running under this module\n self._proc_tracker = set()\n # seconds since we've submitted a batch\n self._last_submitted_batch = None\n # additional callbacks to be executed alongside self.cleanup()\n self.cleanup_callbacks = []\n self._cleanedup = False\n self._watched_events = None\n\n self._task_counter = TaskCounter()\n\n # string constant\n self._custom_filter_criteria_msg = \"it did not meet custom filter criteria\"\n\n # track number of failures (for .request_with_fail_count())\n self._request_failures = 0\n\n self._tasks = []\n self._event_received = asyncio.Condition()\n self._event_queued = asyncio.Condition()\n\n # used for optional \"per host\" tracking\n self._per_host_tracker = set()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.cleanup","title":"cleanup async
","text":"cleanup()\n
Asynchronously performs final cleanup operations after the scan is complete.
This method can be overridden to implement custom cleanup logic. It is called only once per scan and may not raise events.
Returns:
None
Source code in bbot/modules/base.py
async def cleanup(self):\n \"\"\"Asynchronously performs final cleanup operations after the scan is complete.\n\n This method can be overridden to implement custom cleanup logic. It is called only once per scan and may not raise events.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan and may not raise events.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.critical","title":"critical","text":"critical(*args, trace=True, **kwargs)\n
Logs a whole message in emboldened red text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.critical(\"This is a critical message\")\n>>> self.critical(\"This is a critical message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def critical(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened red text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.critical(\"This is a critical message\")\n >>> self.critical(\"This is a critical message with a trace\", trace=False)\n \"\"\"\n self.log.critical(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
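All of the logging wrappers share one pattern: log the message, then optionally log the traceback of the exception currently being handled. The sketch below reproduces that pattern with the standard library only. BBOT's custom log levels (trace, hugeinfo, and so on) are not modeled, and the names trace_message and error are illustrative rather than BBOT's API.

```python
import logging
import traceback
from sys import exc_info

log = logging.getLogger('example.module')


def trace_message():
    # Return the formatted traceback of the exception being handled, or None.
    e_type, _, _ = exc_info()
    if e_type is None:
        return None
    return traceback.format_exc()


def error(msg, trace=True):
    # Log the message first, then the traceback if one is available.
    log.error(msg)
    if trace:
        tb = trace_message()
        if tb is not None:
            log.debug(tb)


outside = trace_message()  # no exception is being handled here
try:
    1 / 0
except ZeroDivisionError:
    inside = trace_message()
    error('division failed')
```

In the listings, warning(), hugewarning(), error(), and critical() default to trace=True, while the remaining wrappers default to trace=False.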
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.debug","title":"debug","text":"debug(*args, trace=False, **kwargs)\n
Logs debug messages and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.debug(\"This is a debug message\")\n>>> self.debug(\"This is a debug message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def debug(self, *args, trace=False, **kwargs):\n \"\"\"Logs debug messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.debug(\"This is a debug message\")\n >>> self.debug(\"This is a debug message with a trace\", trace=True)\n \"\"\"\n self.log.debug(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.emit_event","title":"emit_event async
","text":"emit_event(*args, **kwargs)\n
Emit an event to the event queue and distribute it to interested modules.
This is how modules \"return\" data.
The method first creates an event object by calling self.make_event()
with the provided arguments. Then, the event is queued for outgoing distribution using self.queue_outgoing_event()
.
Parameters:
*args
\u2013 Positional arguments to be passed to self.make_event()
for event creation.
**kwargs
\u2013 Keyword arguments to be passed for event creation or configuration of the emit action.
- on_success_callback: Optional callback function to execute upon successful event emission.\n- abort_if: Optional condition under which the event emission should be aborted.\n- quick: Optional flag to indicate whether the event should be processed quickly.\n
Examples:
>>> await self.emit_event(\"www.evilcorp.com\", parent=event, tags=[\"affiliate\"])\n
>>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n>>> await self.emit_event(new_event)\n
Returns:
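Event
\u2013 The event produced by self.make_event(), as returned by the final return event statement in the source listing. It is queued only if it is truthy.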
Raises:
ValidationError
\u2013 If the event cannot be validated (handled in self.make_event()
).
Source code in bbot/modules/base.py
async def emit_event(self, *args, **kwargs):\n \"\"\"Emit an event to the event queue and distribute it to interested modules.\n\n This is how modules \"return\" data.\n\n The method first creates an event object by calling `self.make_event()` with the provided arguments.\n Then, the event is queued for outgoing distribution using `self.queue_outgoing_event()`.\n\n Args:\n *args: Positional arguments to be passed to `self.make_event()` for event creation.\n **kwargs: Keyword arguments to be passed for event creation or configuration of the emit action.\n ```markdown\n - on_success_callback: Optional callback function to execute upon successful event emission.\n - abort_if: Optional condition under which the event emission should be aborted.\n - quick: Optional flag to indicate whether the event should be processed quickly.\n ```\n\n Examples:\n >>> await self.emit_event(\"www.evilcorp.com\", parent=event, tags=[\"affiliate\"])\n\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n None\n\n Raises:\n ValidationError: If the event cannot be validated (handled in `self.make_event()`).\n \"\"\"\n event_kwargs = dict(kwargs)\n emit_kwargs = {}\n for o in (\"on_success_callback\", \"abort_if\", \"quick\"):\n v = event_kwargs.pop(o, None)\n if v is not None:\n emit_kwargs[o] = v\n event = self.make_event(*args, **event_kwargs)\n if event:\n await self.queue_outgoing_event(event, **emit_kwargs)\n return event\n
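The keyword-argument handling in the listing above can be isolated as a small function. The name split_emit_kwargs and the sample values are hypothetical; the three option names and the drop-if-None behavior are taken from the listing.

```python
EMIT_OPTIONS = ('on_success_callback', 'abort_if', 'quick')


def split_emit_kwargs(kwargs: dict):
    # Work on a copy so the caller's dict is left untouched.
    event_kwargs = dict(kwargs)
    emit_kwargs = {}
    for name in EMIT_OPTIONS:
        value = event_kwargs.pop(name, None)
        # Options explicitly set to None are removed and not forwarded.
        if value is not None:
            emit_kwargs[name] = value
    return event_kwargs, emit_kwargs


original = {'parent': 'parent-event', 'tags': ['affiliate'], 'quick': True, 'abort_if': None}
event_kwargs, emit_kwargs = split_emit_kwargs(original)
```

Everything left in event_kwargs goes to event creation, while emit_kwargs configures the queueing step.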
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.error","title":"error","text":"error(*args, trace=True, **kwargs)\n
Logs an error message, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.error(\"This is an error message\")\n>>> self.error(\"This is an error message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def error(self, *args, trace=True, **kwargs):\n \"\"\"Logs an error message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.error(\"This is an error message\")\n >>> self.error(\"This is an error message with a trace\", trace=False)\n \"\"\"\n self.log.error(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.filter_event","title":"filter_event async
","text":"filter_event(event)\n
Asynchronously filters incoming events based on custom criteria.
Override this method for more granular control over which events are accepted by your module. This method is called automatically before handle_event()
for each incoming event that matches any in watched_events
.
Parameters:
event
(Event
) \u2013 The incoming Event object to be filtered.
Returns:
tuple
\u2013 A 2-tuple where the first value is a bool indicating whether the event should be accepted, and the second value is a string explaining the reason for its acceptance or rejection. The docstring gives the default as (True, None)
, but the base implementation shown in the source listing returns a bare True rather than a tuple.
Note: This method should be overridden if the module requires custom logic for event filtering.
Source code in bbot/modules/base.py
async def filter_event(self, event):\n \"\"\"Asynchronously filters incoming events based on custom criteria.\n\n Override this method for more granular control over which events are accepted by your module. This method is called automatically before `handle_event()` for each incoming event that matches any in `watched_events`.\n\n Args:\n event (Event): The incoming Event object to be filtered.\n\n Returns:\n tuple: A 2-tuple where the first value is a bool indicating whether the event should be accepted, and the second value is a string explaining the reason for its acceptance or rejection. By default, returns `(True, None)` to indicate acceptance without reason.\n\n Note:\n This method should be overridden if the module requires custom logic for event filtering.\n \"\"\"\n return True\n
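An override that follows the documented 2-tuple convention might look like the sketch below. It is written as a standalone coroutine so it runs without BBOT installed: FakeEvent is a hypothetical stand-in that models only a host attribute, and the .internal rule is an invented example criterion.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    # Hypothetical stand-in for a BBOT event; only host is modeled.
    host: str


async def filter_event(event):
    # Reject with a reason string, or accept with one.
    if event.host.endswith('.internal'):
        return False, 'host is in the .internal zone'
    return True, 'host is acceptable'


accepted = asyncio.run(filter_event(FakeEvent('www.evilcorp.com')))
rejected = asyncio.run(filter_event(FakeEvent('db.internal')))
```

In a real module the same body would live in a method taking self as its first argument.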
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.finish","title":"finish async
","text":"finish()\n
Asynchronously performs final tasks as the scan nears completion.
This method can be overridden to execute any necessary finalization logic. For example, if the module relies on a word cloud, you might wait for the scan to finish so that the word cloud is as complete as possible before running an operation.
Returns:
None
Source code in bbot/modules/base.py
async def finish(self):\n \"\"\"Asynchronously performs final tasks as the scan nears completion.\n\n This method can be overridden to execute any necessary finalization logic. For example, if the module relies on a word cloud, you might wait for the scan to finish to ensure the word cloud is most complete before running an operation.\n\n Returns:\n None\n\n Warnings:\n This method may be called multiple times since it can raise events, which may re-trigger the \"finish\" phase of the scan. Optional to override.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_per_domain_hash","title":"get_per_domain_hash","text":"get_per_domain_hash(event)\n
Computes a per-domain hash value for a given event. This method may be optionally overridden in subclasses.
Events with the same root domain will receive the same hash value.
Parameters:
event
(Event
) \u2013 The event object containing host, port, or parsed URL information.
Returns:
int
\u2013 The hash value computed for the domain.
Examples:
>>> event = self.make_event(\"https://www.example.com:8443\")\n>>> self.get_per_domain_hash(event)\n
Source code in bbot/modules/base.py
def get_per_domain_hash(self, event):\n \"\"\"\n Computes a per-domain hash value for a given event. This method may be optionally overridden in subclasses.\n\n Events with the same root domain will receive the same hash value.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the domain.\n\n Examples:\n >>> event = self.make_event(\"https://www.example.com:8443\")\n >>> self.get_per_domain_hash(event)\n \"\"\"\n _, domain = self.helpers.split_domain(event.host)\n return hash(domain)\n
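To illustrate why hosts under the same root domain collapse to one hash, here is a sketch that swaps in a deliberately naive splitter. It is an assumption, not the behavior of self.helpers.split_domain: it keeps the last two labels and therefore mishandles multi-label suffixes such as co.uk.

```python
def naive_root_domain(host: str) -> str:
    # Assumption: the root domain is simply the last two dot-separated labels.
    labels = host.rstrip('.').split('.')
    return '.'.join(labels[-2:])


def per_domain_hash(host: str) -> int:
    # Hash the root domain, so every subdomain maps to the same value.
    return hash(naive_root_domain(host))
```

Because the result comes from Python's built-in hash() on a string, it is stable within one process but differs between interpreter runs unless PYTHONHASHSEED is fixed.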
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_per_host_hash","title":"get_per_host_hash","text":"get_per_host_hash(event)\n
Computes a per-host hash value for a given event. This method may be optionally overridden in subclasses.
The function uses the event's host
to create a string to be hashed.
Parameters:
event
(Event
) \u2013 The event object containing host information.
Returns:
int
\u2013 The hash value computed for the host.
Examples:
>>> event = self.make_event(\"https://example.com:8443\")\n>>> self.get_per_host_hash(event)\n
Source code in bbot/modules/base.py
def get_per_host_hash(self, event):\n \"\"\"\n Computes a per-host hash value for a given event. This method may be optionally overridden in subclasses.\n\n The function uses the event's `host` to create a string to be hashed.\n\n Args:\n event (Event): The event object containing host information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_host_hash(event)\n \"\"\"\n return hash(event.host)\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_per_hostport_hash","title":"get_per_hostport_hash","text":"get_per_hostport_hash(event)\n
Computes a per-host:port hash value for a given event. This method may be optionally overridden in subclasses.
The function uses the event's host
, port
, and scheme
(for URLs) to create a string to be hashed. The hash value is used to distinguish events related to the same host and port.
Parameters:
event
(Event
) \u2013 The event object containing host, port, or parsed URL information.
Returns:
int
\u2013 The hash value computed for the host:port combination.
Examples:
>>> event = self.make_event(\"https://example.com:8443\")\n>>> self.get_per_hostport_hash(event)\n
Source code in bbot/modules/base.py
def get_per_hostport_hash(self, event):\n \"\"\"\n Computes a per-host:port hash value for a given event. This method may be optionally overridden in subclasses.\n\n The function uses the event's `host`, `port`, and `scheme` (for URLs) to create a string to be hashed.\n The hash value is used for distinguishing events related to the same host.\n\n Args:\n event (Event): The event object containing host, port, or parsed URL information.\n\n Returns:\n int: The hash value computed for the host.\n\n Examples:\n >>> event = self.make_event(\"https://example.com:8443\")\n >>> self.get_per_hostport_hash(event)\n \"\"\"\n parsed = getattr(event, \"parsed_url\", None)\n if parsed is None:\n to_hash = self.helpers.make_netloc(event.host, event.port)\n else:\n to_hash = f\"{parsed.scheme}://{parsed.netloc}/\"\n return hash(to_hash)\n
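The string that gets hashed can be sketched with the standard library. This version uses urllib.parse.urlsplit in place of the event's parsed_url attribute and passes plain host:port strings through unchanged instead of calling self.helpers.make_netloc; the function name hostport_key is hypothetical.

```python
from urllib.parse import urlsplit


def hostport_key(target: str) -> str:
    # URLs are reduced to scheme://netloc/, dropping path and query.
    if '://' in target:
        parsed = urlsplit(target)
        return f'{parsed.scheme}://{parsed.netloc}/'
    # Anything else is assumed to already be a host:port string.
    return target


def per_hostport_hash(target: str) -> int:
    return hash(hostport_key(target))
```

Two URLs on the same scheme, host, and port share a key regardless of path, while changing the scheme produces a different key.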
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.get_watched_events","title":"get_watched_events","text":"get_watched_events()\n
Retrieve the set of events that the module is interested in observing.
Override this method if the set of events the module should watch needs to be determined dynamically, e.g., based on configuration options or other runtime conditions.
Returns:
set
\u2013 The set of event types that this module will handle.
Source code in bbot/modules/base.py
def get_watched_events(self):\n \"\"\"Retrieve the set of events that the module is interested in observing.\n\n Override this method if the set of events the module should watch needs to be determined dynamically, e.g., based on configuration options or other runtime conditions.\n\n Returns:\n set: The set of event types that this module will handle.\n \"\"\"\n if self._watched_events is None:\n self._watched_events = set(self.watched_events)\n return self._watched_events\n
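The listing above builds the set lazily and caches it. A self-contained sketch of that pattern follows; the class name WatchedEventsSketch and the event type strings are illustrative assumptions.

```python
class WatchedEventsSketch:
    # Class-level declaration, possibly with duplicates.
    watched_events = ['DNS_NAME', 'URL', 'DNS_NAME']

    def __init__(self):
        self._watched_events = None

    def get_watched_events(self):
        # Build the set on first access, then reuse the cached object.
        if self._watched_events is None:
            self._watched_events = set(self.watched_events)
        return self._watched_events


module = WatchedEventsSketch()
first = module.get_watched_events()
second = module.get_watched_events()
```

Because the result is cached, an override that computes the set dynamically only pays that cost once per module instance.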
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.handle_batch","title":"handle_batch async
","text":"handle_batch(*events)\n
Handles incoming events in batches for optimized processing.
This method is automatically called when multiple events that match any in watched_events
are encountered and the batch_size
attribute is set to a value greater than 1. Override this method to implement custom batch event-handling logic for your module.
Parameters:
*events
(Event
, default: ()
) \u2013 A variable number of Event objects to be processed in a batch.
Note: This method should be overridden if the batch_size
attribute of the module is set to a value greater than 1.
Returns:
None
Source code in bbot/modules/base.py
async def handle_batch(self, *events):\n \"\"\"Handles incoming events in batches for optimized processing.\n\n This method is automatically called when multiple events that match any in `watched_events` are encountered and the `batch_size` attribute is set to a value greater than 1. Override this method to implement custom batch event-handling logic for your module.\n\n Args:\n *events (Event): A variable number of Event objects to be processed in a batch.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to a value greater than 1.\n\n Returns:\n None\n \"\"\"\n pass\n
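How events reach handle_batch() is not shown in this section, so the following is only a sketch of the calling convention: events are grouped into chunks of at most batch_size and each chunk is unpacked into positional arguments. The make_batches and run helpers are hypothetical, and the handler simply reports how many events it received.

```python
import asyncio


async def handle_batch(*events):
    # Hypothetical handler: report how many events arrived in this call.
    return len(events)


def make_batches(events, batch_size):
    # Split the event list into consecutive chunks of at most batch_size.
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    return [tuple(events[i:i + batch_size]) for i in range(0, len(events), batch_size)]


async def run(events, batch_size):
    # Each batch is unpacked so the handler receives events as *events.
    return [await handle_batch(*batch) for batch in make_batches(events, batch_size)]


sizes = asyncio.run(run(['a', 'b', 'c', 'd', 'e'], 2))
```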
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.handle_event","title":"handle_event async
","text":"handle_event(event)\n
Asynchronously handles incoming events that the module is configured to watch.
This method is automatically invoked when an event that matches any in watched_events
is encountered during a scan. Override this method to implement custom event-handling logic for your module.
Parameters:
event
(Event
) \u2013 The event object containing details about the incoming event.
Note: This method should be overridden if the batch_size
attribute of the module is set to 1.
Returns:
None
Source code in bbot/modules/base.py
async def handle_event(self, event):\n \"\"\"Asynchronously handles incoming events that the module is configured to watch.\n\n This method is automatically invoked when an event that matches any in `watched_events` is encountered during a scan. Override this method to implement custom event-handling logic for your module.\n\n Args:\n event (Event): The event object containing details about the incoming event.\n\n Note:\n This method should be overridden if the `batch_size` attribute of the module is set to 1.\n\n Returns:\n None\n \"\"\"\n pass\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugeinfo","title":"hugeinfo","text":"hugeinfo(*args, trace=False, **kwargs)\n
Logs a whole message in emboldened blue text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugeinfo(\"This is a huge informational message\")\n>>> self.hugeinfo(\"This is a huge informational message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def hugeinfo(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened blue text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeinfo(\"This is a huge informational message\")\n >>> self.hugeinfo(\"This is a huge informational message with a trace\", trace=True)\n \"\"\"\n self.log.hugeinfo(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugesuccess","title":"hugesuccess","text":"hugesuccess(*args, trace=False, **kwargs)\n
Logs a whole message in emboldened green text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugesuccess(\"This is a huge success message\")\n>>> self.hugesuccess(\"This is a huge success message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def hugesuccess(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened green text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugesuccess(\"This is a huge success message\")\n >>> self.hugesuccess(\"This is a huge success message with a trace\", trace=True)\n \"\"\"\n self.log.hugesuccess(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugeverbose","title":"hugeverbose","text":"hugeverbose(*args, trace=False, **kwargs)\n
Logs a whole message in emboldened white text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugeverbose(\"This is a huge verbose message\")\n>>> self.hugeverbose(\"This is a huge verbose message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def hugeverbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs a whole message in emboldened white text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugeverbose(\"This is a huge verbose message\")\n >>> self.hugeverbose(\"This is a huge verbose message with a trace\", trace=True)\n \"\"\"\n self.log.hugeverbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.hugewarning","title":"hugewarning","text":"hugewarning(*args, trace=True, **kwargs)\n
Logs a whole message in emboldened orange text, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.hugewarning(\"This is a huge warning message\")\n>>> self.hugewarning(\"This is a huge warning message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def hugewarning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a whole message in emboldened orange text, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.hugewarning(\"This is a huge warning message\")\n >>> self.hugewarning(\"This is a huge warning message with a trace\", trace=False)\n \"\"\"\n self.log.hugewarning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.info","title":"info","text":"info(*args, trace=False, **kwargs)\n
Logs informational messages and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.info(\"This is an informational message\")\n>>> self.info(\"This is an informational message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def info(self, *args, trace=False, **kwargs):\n \"\"\"Logs informational messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.info(\"This is an informational message\")\n >>> self.info(\"This is an informational message with a trace\", trace=True)\n \"\"\"\n self.log.info(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.log_table","title":"log_table","text":"log_table(*args, **kwargs)\n
Logs a table to the console and optionally writes it to a file.
This function generates a table using self.helpers.make_table
, then logs each line of the table as an info-level log. If a table_name is provided, it also writes the table to a file.
Parameters:
*args
\u2013 Variable length argument list to be passed to self.helpers.make_table
.
**kwargs
\u2013 Arbitrary keyword arguments. If 'table_name' is specified, the table will be written to a file.
Returns:
str
\u2013 The generated table as a string.
Examples:
>>> self.log_table(['Header1', 'Header2'], [['row1col1', 'row1col2'], ['row2col1', 'row2col2']], table_name=\"my_table\")\n
Source code in bbot/modules/base.py
def log_table(self, *args, **kwargs):\n \"\"\"Logs a table to the console and optionally writes it to a file.\n\n This function generates a table using `self.helpers.make_table`, then logs each line\n of the table as an info-level log. If a table_name is provided, it also writes the table to a file.\n\n Args:\n *args: Variable length argument list to be passed to `self.helpers.make_table`.\n **kwargs: Arbitrary keyword arguments. If 'table_name' is specified, the table will be written to a file.\n\n Returns:\n str: The generated table as a string.\n\n Examples:\n >>> self.log_table(['Header1', 'Header2'], [['row1col1', 'row1col2'], ['row2col1', 'row2col2']], table_name=\"my_table\")\n \"\"\"\n table_name = kwargs.pop(\"table_name\", None)\n max_log_entries = kwargs.pop(\"max_log_entries\", None)\n table = self.helpers.make_table(*args, **kwargs)\n lines_logged = 0\n for line in table.splitlines():\n if max_log_entries is not None and lines_logged > max_log_entries:\n break\n self.info(line)\n lines_logged += 1\n if table_name is not None:\n date = self.helpers.make_date()\n filename = self.scan.home / f\"{self.helpers.tagify(table_name)}-table-{date}.txt\"\n with open(filename, \"w\") as f:\n f.write(table)\n self.verbose(f\"Wrote {table_name} to {filename}\")\n return table\n
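The line-capping loop in the source above can be tried in isolation. The sketch below copies only that loop into a hypothetical `capped_lines` helper (not part of BBOT) and collects the lines instead of logging them. Because the comparison uses `>` rather than `>=`, the loop as written emits up to `max_log_entries + 1` lines.

```python
def capped_lines(table, max_log_entries=None):
    # Mirrors the loop in log_table; returns the lines that would be logged.
    logged = []
    lines_logged = 0
    for line in table.splitlines():
        if max_log_entries is not None and lines_logged > max_log_entries:
            break
        logged.append(line)
        lines_logged += 1
    return logged


table = '''row 1
row 2
row 3
row 4
row 5'''

everything = capped_lines(table)
capped = capped_lines(table, max_log_entries=2)
```

Here `capped` holds three lines, while the full table is still what `log_table` returns and writes to the file.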
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.make_event","title":"make_event","text":"make_event(*args, **kwargs)\n
Create an event for the scan.
If the event cannot be created, a warning is logged and None is returned. A validation error is raised only when raise_error is set to True.
Parameters:
*args
\u2013 Positional arguments to be passed to the scan's make_event method.
**kwargs
\u2013 Keyword arguments to be passed to the scan's make_event method.
raise_error
(bool
) \u2013 Whether to raise a validation error if the event could not be created. Defaults to False.
Examples:
>>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n>>> await self.emit_event(new_event)\n
Returns:
Event or None: The created event, or None if a validation error occurred and raise_error was False.
Raises:
ValidationError
\u2013 If the event could not be validated and raise_error is True.
Source code in bbot/modules/base.py
def make_event(self, *args, **kwargs):\n \"\"\"Create an event for the scan.\n\n Raises a validation error if the event could not be created, unless raise_error is set to False.\n\n Args:\n *args: Positional arguments to be passed to the scan's make_event method.\n **kwargs: Keyword arguments to be passed to the scan's make_event method.\n raise_error (bool, optional): Whether to raise a validation error if the event could not be created. Defaults to False.\n\n Examples:\n >>> new_event = self.make_event(\"1.2.3.4\", parent=event)\n >>> await self.emit_event(new_event)\n\n Returns:\n Event or None: The created event, or None if a validation error occurred and raise_error was False.\n\n Raises:\n ValidationError: If the event could not be validated and raise_error is True.\n \"\"\"\n raise_error = kwargs.pop(\"raise_error\", False)\n module = kwargs.pop(\"module\", None)\n if module is None:\n if (not args) or getattr(args[0], \"module\", None) is None:\n kwargs[\"module\"] = self\n try:\n event = self.scan.make_event(*args, **kwargs)\n except ValidationError as e:\n if raise_error:\n raise\n self.warning(f\"{e}\")\n return\n return event\n
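The two failure modes of `make_event` can be sketched without BBOT. In the block below, `ValidationError` is a local stand-in for BBOT's exception, and `build_event` is a hypothetical validator; only the try/except control flow follows the source above.

```python
class ValidationError(Exception):
    # Stand-in for BBOT's ValidationError (assumption for this sketch).
    pass


def build_event(data):
    # Hypothetical validator: rejects empty data.
    if not data:
        raise ValidationError('event data is empty')
    return {'data': data}


def make_event(data, raise_error=False, warnings=None):
    # Same control flow as BaseModule.make_event: re-raise, or warn and return None.
    try:
        return build_event(data)
    except ValidationError as e:
        if raise_error:
            raise
        if warnings is not None:
            warnings.append(str(e))
        return None


warnings = []
ok = make_event('1.2.3.4')
skipped = make_event('', warnings=warnings)
try:
    make_event('', raise_error=True)
    raised = False
except ValidationError:
    raised = True
```

Callers that rely on the default therefore need to check for None before emitting the event.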
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.ping","title":"ping async
","text":"ping()\n
Asynchronously checks the health of the configured API.
This method is used in conjunction with require_api_key() to verify that the API is not just configured, but also responsive. This method should include an assert statement to validate the API's health, typically by making a test request to a known endpoint.
Example usage, if the API has a \"/ping\" endpoint:
async def ping(self):\n    r = await self.request_with_fail_count(f\"{self.base_url}/ping\")\n    resp_content = getattr(r, \"text\", \"\")\n    assert getattr(r, \"status_code\", 0) == 200, resp_content\n
Returns:
None
Raises:
AssertionError
\u2013 If the API does not respond as expected.
Source code in bbot/modules/base.py
async def ping(self):\n \"\"\"Asynchronously checks the health of the configured API.\n\n This method is used in conjunction with require_api_key() to verify that the API is not just configured, but also responsive. This method should include an assert statement to validate the API's health, typically by making a test request to a known endpoint.\n\n Example Usage:\n In your implementation, if the API has a \"/ping\" endpoint:\n async def ping(self):\n r = await self.request_with_fail_count(f\"{self.base_url}/ping\")\n resp_content = getattr(r, \"text\", \"\")\n assert getattr(r, \"status_code\", 0) == 200, resp_content\n\n Returns:\n None\n\n Raises:\n AssertionError: If the API does not respond as expected.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.queue_event","title":"queue_event async
","text":"queue_event(event)\n
Asynchronously queues an incoming event to the module's event queue for further processing.
The function performs an initial check to see if the event is acceptable for queuing. If the event passes the check, it is put into the incoming_event_queue
.
Parameters:
event
\u2013 The event object to be queued.
Returns:
None
\u2013 The function doesn't return anything but modifies the state of the incoming_event_queue
.
Examples:
>>> await self.queue_event(some_event)\n
Raises:
AttributeError
\u2013 If the module is not in an acceptable state to queue incoming events.
Source code in bbot/modules/base.py
async def queue_event(self, event):\n \"\"\"\n Asynchronously queues an incoming event to the module's event queue for further processing.\n\n The function performs an initial check to see if the event is acceptable for queuing.\n If the event passes the check, it is put into the `incoming_event_queue`.\n\n Args:\n event: The event object to be queued.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `incoming_event_queue`.\n\n Examples:\n >>> await self.queue_event(some_event)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue incoming events.\n \"\"\"\n async with self._task_counter.count(\"queue_event()\", _log=False):\n if self.incoming_event_queue is False:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n return\n acceptable, reason = self._event_precheck(event)\n if not acceptable:\n if reason and reason != \"its type is not in watched_events\":\n self.debug(f\"Not queueing {event} because {reason}\")\n return\n else:\n self.debug(f\"Queueing {event} because {reason}\")\n try:\n self.incoming_event_queue.put_nowait(event)\n async with self._event_received:\n self._event_received.notify()\n if event.type != \"FINISHED\":\n self.scan._new_activity = True\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue incoming event\")\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.queue_outgoing_event","title":"queue_outgoing_event async
","text":"queue_outgoing_event(event, **kwargs)\n
Queues an outgoing event to the module's outgoing event queue for further processing.
The function attempts to put the event into the outgoing_event_queue
immediately. If that is not possible due to the current state of the module, the resulting AttributeError is caught and a debug message is logged.
Parameters:
event
\u2013 The event object to be queued.
**kwargs
\u2013 Additional keyword arguments to be associated with the event.
Returns:
None
\u2013 The function doesn't return anything but modifies the state of the outgoing_event_queue
.
Examples:
>>> await self.queue_outgoing_event(some_outgoing_event, abort_if=lambda e: \"unresolved\" in e.tags)\n
Raises:
AttributeError
\u2013 If the module is not in an acceptable state to queue outgoing events.
Source code in bbot/modules/base.py
async def queue_outgoing_event(self, event, **kwargs):\n \"\"\"\n Queues an outgoing event to the module's outgoing event queue for further processing.\n\n The function attempts to put the event into the `outgoing_event_queue` immediately.\n If it's not possible due to the current state of the module, an AttributeError is raised, and a debug log is generated.\n\n Args:\n event: The event object to be queued.\n **kwargs: Additional keyword arguments to be associated with the event.\n\n Returns:\n None: The function doesn't return anything but modifies the state of the `outgoing_event_queue`.\n\n Examples:\n >>> self.queue_outgoing_event(some_outgoing_event, abort_if=lambda e: \"unresolved\" in e.tags)\n\n Raises:\n AttributeError: If the module is not in an acceptable state to queue outgoing events.\n \"\"\"\n try:\n await self.outgoing_event_queue.put((event, kwargs))\n except AttributeError:\n self.debug(f\"Not in an acceptable state to queue outgoing event\")\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.report","title":"report async
","text":"report()\n
Asynchronously executes a final task after the scan is complete but before cleanup.
This method can be overridden to aggregate data and raise summary events at the end of the scan.
Returns:
None
Note: This method is called only once per scan.
Source code in bbot/modules/base.py
async def report(self):\n \"\"\"Asynchronously executes a final task after the scan is complete but before cleanup.\n\n This method can be overridden to aggregate data and raise summary events at the end of the scan.\n\n Returns:\n None\n\n Note:\n This method is called only once per scan.\n \"\"\"\n return\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.request_with_fail_count","title":"request_with_fail_count async
","text":"request_with_fail_count(*args, **kwargs)\n
Asynchronously perform an HTTP request while keeping track of consecutive failures.
This function wraps the self.helpers.request
method, incrementing a failure counter if the request returns None and resetting it to zero on any successful response. When the failure counter reaches self.failed_request_abort_threshold
, the module is set to an error state.
Parameters:
*args
\u2013 Positional arguments to pass to self.helpers.request
.
**kwargs
\u2013 Keyword arguments to pass to self.helpers.request
.
Returns:
Any
\u2013 The response object or None if the request failed.
Raises:
None
\u2013 Sets the module to an error state when the failure threshold is reached.
Source code in bbot/modules/base.py
async def request_with_fail_count(self, *args, **kwargs):\n \"\"\"Asynchronously perform an HTTP request while keeping track of consecutive failures.\n\n This function wraps the `self.helpers.request` method, incrementing a failure counter if\n the request returns None. When the failure counter exceeds `self.failed_request_abort_threshold`,\n the module is set to an error state.\n\n Args:\n *args: Positional arguments to pass to `self.helpers.request`.\n **kwargs: Keyword arguments to pass to `self.helpers.request`.\n\n Returns:\n Any: The response object or None if the request failed.\n\n Raises:\n None: Sets the module to an error state when the failure threshold is reached.\n \"\"\"\n r = await self.helpers.request(*args, **kwargs)\n if r is None:\n self._request_failures += 1\n else:\n self._request_failures = 0\n if self._request_failures >= self.failed_request_abort_threshold:\n self.set_error_state(f\"Setting error state due to {self._request_failures:,} failed HTTP requests\")\n return r\n
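The consecutive-failure counting above can be sketched on its own. `FailCounter` and its `_request` method are hypothetical stand-ins (the latter replaces `self.helpers.request` with canned responses), and a plain `errored` flag stands in for `set_error_state`. Only the counting logic follows the source.

```python
import asyncio


class FailCounter:
    # Hypothetical stand-in; only the counting logic follows the source above.
    failed_request_abort_threshold = 3

    def __init__(self, responses):
        self._responses = iter(responses)
        self._request_failures = 0
        self.errored = False

    async def _request(self):
        # Stand-in for self.helpers.request: returns None on failure.
        return next(self._responses)

    async def request_with_fail_count(self):
        r = await self._request()
        if r is None:
            self._request_failures += 1
        else:
            self._request_failures = 0  # any success resets the streak
        if self._request_failures >= self.failed_request_abort_threshold:
            self.errored = True
        return r


async def main():
    client = FailCounter([None, None, 'ok', None, None, None])
    states = []
    for _ in range(6):
        await client.request_with_fail_count()
        states.append(client.errored)
    return states


states = asyncio.run(main())
```

Two failures followed by a success do not trip the threshold; the error flag is set only after three failures in a row.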
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.require_api_key","title":"require_api_key async
","text":"require_api_key()\n
Asynchronously checks if an API key is required and valid.
Returns:
bool or tuple: Returns True if API key is valid and ready. Returns a tuple (None, \"error message\") otherwise.
Source code in bbot/modules/base.py
async def require_api_key(self):\n \"\"\"\n Asynchronously checks if an API key is required and valid.\n\n Args:\n None\n\n Returns:\n bool or tuple: Returns True if API key is valid and ready.\n Returns a tuple (None, \"error message\") otherwise.\n\n Notes:\n - Fetches the API key from the configuration.\n - Calls the 'ping()' method to test API accessibility.\n - Sets the API key readiness status accordingly.\n \"\"\"\n self.api_key = self.config.get(\"api_key\", \"\")\n if self.auth_secret:\n try:\n await self.ping()\n self.hugesuccess(f\"API is ready\")\n return True\n except Exception as e:\n return None, f\"Error with API ({str(e).strip()})\"\n else:\n return None, \"No API key set\"\n
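How `ping()` and `require_api_key()` fit together can be sketched with a stand-in class. `ApiModuleSketch` and its `healthy` flag are hypothetical, and the sketch checks `api_key` directly, whereas the source above checks `self.auth_secret`. The return convention (True, or a tuple of None and a message) follows the source.

```python
import asyncio


class ApiModuleSketch:
    # Hypothetical stand-in, not BBOT's BaseModule.
    def __init__(self, api_key, healthy):
        self.api_key = api_key
        self.healthy = healthy

    async def ping(self):
        # A real override would make a test request; here the result is faked.
        assert self.healthy, 'API returned an unexpected status'

    async def require_api_key(self):
        if not self.api_key:
            return None, 'No API key set'
        try:
            await self.ping()
            return True
        except Exception as e:
            return None, 'Error with API (' + str(e).strip() + ')'


ready = asyncio.run(ApiModuleSketch('abc123', True).require_api_key())
missing = asyncio.run(ApiModuleSketch('', True).require_api_key())
broken = asyncio.run(ApiModuleSketch('abc123', False).require_api_key())
```

Since the failure tuples match the soft-fail shape documented for `setup()`, a module's `setup()` can presumably return the result of `require_api_key()` directly.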
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.set_error_state","title":"set_error_state","text":"set_error_state(message=None, clear_outgoing_queue=False, critical=False)\n
Puts the module into an errored state where it cannot accept new events. Optionally logs a warning message.
The function sets the module's errored
attribute to True and logs a warning with the optional message. It also clears the incoming event queue to prevent further processing and updates its status to False.
Parameters:
message
(str
, default: None
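) \u2013 Additional message to be logged along with the warning.
clear_outgoing_queue
(bool
, default: False
) \u2013 Whether to also empty the outgoing event queue.
critical
(bool
, default: False
) \u2013 Whether to log the message at error level instead of warning level.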
Returns:
None
\u2013 The function doesn't return anything but updates the errored
state and clears the incoming event queue.
Examples:
>>> self.set_error_state()\n>>> self.set_error_state(\"Failed to connect to the server\")\n
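Notes:
The function sets self._incoming_event_queue to False to prevent its further use.
If the module was already in an errored state, the function will not reset the error state or the queue.
Source code in bbot/modules/base.py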
def set_error_state(self, message=None, clear_outgoing_queue=False, critical=False):\n \"\"\"\n Puts the module into an errored state where it cannot accept new events. Optionally logs a warning message.\n\n The function sets the module's `errored` attribute to True and logs a warning with the optional message.\n It also clears the incoming event queue to prevent further processing and updates its status to False.\n\n Args:\n message (str, optional): Additional message to be logged along with the warning.\n\n Returns:\n None: The function doesn't return anything but updates the `errored` state and clears the incoming event queue.\n\n Examples:\n >>> self.set_error_state()\n >>> self.set_error_state(\"Failed to connect to the server\")\n\n Notes:\n - The function sets `self._incoming_event_queue` to False to prevent its further use.\n - If the module was already in an errored state, the function will not reset the error state or the queue.\n \"\"\"\n if not self.errored:\n log_msg = \"Setting error state\"\n if message is not None:\n log_msg += f\": {message}\"\n if critical:\n log_fn = self.error\n else:\n log_fn = self.warning\n log_fn(log_msg)\n self.errored = True\n # clear incoming queue\n if self.incoming_event_queue is not False:\n self.debug(f\"Emptying event_queue\")\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.incoming_event_queue.get_nowait()\n # set queue to None to prevent its use\n # if there are leftover objects in the queue, the scan will hang.\n self._incoming_event_queue = False\n\n if clear_outgoing_queue:\n with suppress(asyncio.queues.QueueEmpty):\n while 1:\n self.outgoing_event_queue.get_nowait()\n
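The queue-draining idiom used in the source above (calling `get_nowait()` in a loop until `QueueEmpty` is raised, with the exception suppressed) can be shown on its own. The `drain` helper is a hypothetical name for this sketch and is not part of BBOT.

```python
import asyncio
from contextlib import suppress


def drain(queue):
    # Remove every pending item without awaiting; returns how many were dropped.
    dropped = 0
    with suppress(asyncio.QueueEmpty):
        while True:
            queue.get_nowait()
            dropped += 1
    return dropped


async def main():
    queue = asyncio.Queue()
    for item in ('a', 'b', 'c'):
        queue.put_nowait(item)
    dropped = drain(queue)
    return dropped, queue.empty()


dropped, is_empty = asyncio.run(main())
```

The loop has no explicit exit condition; the suppressed `QueueEmpty` exception is what ends it once the queue is empty.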
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.setup","title":"setup async
","text":"setup()\n
Performs one-time setup tasks for the module.
This method is responsible for preparing the module for its operation, which may include tasks such as downloading necessary resources, validating configuration parameters, or other preliminary checks.
Returns:
tuple
\u2013 True
if the setup was successful, None
for a soft-fail where the module setup did not succeed but the scan will continue with the module disabled, and False
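for a hard-fail where the setup failure causes the scan to abort. A reason string (str, optional) accompanies the status only when the setup does not succeed (i.e., returns None
or False
).
Examples: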
>>> async def setup(self):\n>>> if not self.config.get(\"api_key\"):\n>>> # Soft-fail: Configuration missing an API key\n>>> return None, \"No API key specified\"\n
>>> async def setup(self):\n>>> try:\n>>> wordlist = await self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n>>> except WordlistError as e:\n>>> # Hard-fail: Error retrieving wordlist\n>>> return False, f\"Error retrieving wordlist: {e}\"\n
>>> async def setup(self):\n>>> self.timeout = self.config.get(\"timeout\", 5)\n>>> # Success: Setup completed without issues\n>>> return True\n
Source code in bbot/modules/base.py
async def setup(self):\n \"\"\"\n Performs one-time setup tasks for the module.\n\n This method is responsible for preparing the module for its operation, which may include tasks\n such as downloading necessary resources, validating configuration parameters, or other preliminary\n checks.\n\n Returns:\n tuple:\n - bool or None: A status indicating the outcome of the setup process. Returns `True` if\n the setup was successful, `None` for a soft-fail where the module setup did not succeed\n but the scan will continue with the module disabled, and `False` for a hard-fail where\n the setup failure causes the scan to abort.\n - str, optional: A reason for the setup failure, provided only when the setup does not\n succeed (i.e., returns `None` or `False`).\n\n Examples:\n >>> async def setup(self):\n >>> if not self.config.get(\"api_key\"):\n >>> # Soft-fail: Configuration missing an API key\n >>> return None, \"No API key specified\"\n\n >>> async def setup(self):\n >>> try:\n >>> wordlist = await self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n >>> except WordlistError as e:\n >>> # Hard-fail: Error retrieving wordlist\n >>> return False, f\"Error retrieving wordlist: {e}\"\n\n >>> async def setup(self):\n >>> self.timeout = self.config.get(\"timeout\", 5)\n >>> # Success: Setup completed without issues\n >>> return True\n \"\"\"\n\n return True\n
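The three return shapes documented above (a bare True, or a tuple of None or False plus a reason) can be normalized by whatever consumes them. The sketch below is one way to do that; `interpret_setup_result` is a hypothetical helper and is not how BBOT itself is known to process the value.

```python
def interpret_setup_result(result):
    # Hypothetical helper: normalizes True / (None, msg) / (False, msg).
    if isinstance(result, tuple):
        status, reason = result
    else:
        status, reason = result, ''
    if status is True:
        return 'ready', reason
    if status is None:
        return 'soft-fail', reason
    return 'hard-fail', reason


success = interpret_setup_result(True)
soft = interpret_setup_result((None, 'No API key specified'))
hard = interpret_setup_result((False, 'Error retrieving wordlist'))
```

The identity checks matter here: None and False are both falsy, so a plain truthiness test would not distinguish a soft-fail from a hard-fail.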
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.success","title":"success","text":"success(*args, trace=False, **kwargs)\n
Logs a success message, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.success(\"Operation completed successfully\")\n>>> self.success(\"Operation completed with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def success(self, *args, trace=False, **kwargs):\n \"\"\"Logs a success message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.success(\"Operation completed successfully\")\n >>> self.success(\"Operation completed with a trace\", trace=True)\n \"\"\"\n self.log.success(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.trace","title":"trace","text":"trace(msg=None)\n
Logs the stack trace of the most recently caught exception.
This method captures the type, value, and traceback of the most recent exception and logs it using the trace level. It is typically used for debugging purposes.
Anything logged using this method will always be written to the scan's debug.log
, even if debugging is not enabled.
Examples:
>>> try:\n>>> 1 / 0\n>>> except ZeroDivisionError:\n>>> self.trace()\n
Source code in bbot/modules/base.py
def trace(self, msg=None):\n \"\"\"Logs the stack trace of the most recently caught exception.\n\n This method captures the type, value, and traceback of the most recent exception and logs it using the trace level. It is typically used for debugging purposes.\n\n Anything logged using this method will always be written to the scan's `debug.log`, even if debugging is not enabled.\n\n Examples:\n >>> try:\n >>> 1 / 0\n >>> except ZeroDivisionError:\n >>> self.trace()\n \"\"\"\n if msg is None:\n e_type, e_val, e_traceback = exc_info()\n if e_type is not None:\n self.log.trace(traceback.format_exc())\n else:\n self.log.trace(msg)\n
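The mechanism behind `trace()` is the standard library's `sys.exc_info()` and `traceback.format_exc()`. The sketch below returns the text instead of logging it; `capture_trace` is a hypothetical name, and the source's `exc_info` is assumed to be `sys.exc_info`.

```python
import sys
import traceback


def capture_trace(msg=None):
    # Returns the text that would be sent to the trace log, or None.
    if msg is not None:
        return msg
    e_type, e_val, e_tb = sys.exc_info()
    if e_type is None:
        # No exception is currently being handled, so there is nothing to log.
        return None
    return traceback.format_exc()


custom = capture_trace('custom message')
try:
    1 / 0
except ZeroDivisionError:
    inside = capture_trace()
```

Called inside an `except` block, the helper returns the formatted traceback; called with a message, it returns that message unchanged, mirroring the `msg` branch in the source.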
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.verbose","title":"verbose","text":"verbose(*args, trace=False, **kwargs)\n
Logs verbose messages and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: False
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to False.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.verbose(\"This is a verbose message\")\n>>> self.verbose(\"This is a verbose message with a trace\", trace=True)\n
Source code in bbot/modules/base.py
def verbose(self, *args, trace=False, **kwargs):\n \"\"\"Logs messages and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to False.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.verbose(\"This is a verbose message\")\n >>> self.verbose(\"This is a verbose message with a trace\", trace=True)\n \"\"\"\n self.log.verbose(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/basemodule/#bbot.modules.base.BaseModule.warning","title":"warning","text":"warning(*args, trace=True, **kwargs)\n
Logs a warning message, and optionally the stack trace of the most recent exception.
Parameters:
*args
\u2013 Variable-length argument list to pass to the logger.
trace
(bool
, default: True
) \u2013 Whether to log the stack trace of the most recently caught exception. Defaults to True.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the logger.
Examples:
>>> self.warning(\"This is a warning message\")\n>>> self.warning(\"This is a warning message with a trace\", trace=False)\n
Source code in bbot/modules/base.py
def warning(self, *args, trace=True, **kwargs):\n \"\"\"Logs a warning message, and optionally the stack trace of the most recent exception.\n\n Args:\n *args: Variable-length argument list to pass to the logger.\n trace (bool, optional): Whether to log the stack trace of the most recently caught exception. Defaults to True.\n **kwargs: Arbitrary keyword arguments to pass to the logger.\n\n Examples:\n >>> self.warning(\"This is a warning message\")\n >>> self.warning(\"This is a warning message with a trace\", trace=False)\n \"\"\"\n self.log.warning(*args, extra={\"scan_id\": self.scan.id}, **kwargs)\n if trace:\n self.trace()\n
"},{"location":"dev/core/","title":"BBOTCore","text":""},{"location":"dev/core/#bbot.core.core.BBOTCore","title":"BBOTCore","text":"This is the first thing that loads when you import BBOT.
Unlike a Preset, BBOTCore holds only the config, not scan-specific stuff like targets, flags, modules, etc.
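Its main jobs are:
set up logging
keep separation between the default
and custom
config (this allows presets to only display the config options that have changed)
allow for easy merging of configs
load quickly
Source code in bbot/core/core.py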
class BBOTCore:\n \"\"\"\n This is the first thing that loads when you import BBOT.\n\n Unlike a Preset, BBOTCore holds only the config, not scan-specific stuff like targets, flags, modules, etc.\n\n Its main jobs are:\n\n - set up logging\n - keep separation between the `default` and `custom` config (this allows presets to only display the config options that have changed)\n - allow for easy merging of configs\n - load quickly\n \"\"\"\n\n # used for filtering out sensitive config values\n secrets_strings = [\"api_key\", \"username\", \"password\", \"token\", \"secret\", \"_id\"]\n # don't filter/remove entries under this key\n secrets_exclude_keys = [\"modules\"]\n\n def __init__(self):\n self._logger = None\n self._files_config = None\n\n self.bbot_sudo_pass = None\n\n self._config = None\n self._custom_config = None\n\n # bare minimum == logging\n self.logger\n self.log = logging.getLogger(\"bbot.core\")\n\n import multiprocessing\n\n self.process_name = multiprocessing.current_process().name\n\n @property\n def home(self):\n return Path(self.config[\"home\"]).expanduser().resolve()\n\n @property\n def cache_dir(self):\n return self.home / \"cache\"\n\n @property\n def tools_dir(self):\n return self.home / \"tools\"\n\n @property\n def temp_dir(self):\n return self.home / \"temp\"\n\n @property\n def lib_dir(self):\n return self.home / \"lib\"\n\n @property\n def scans_dir(self):\n return self.home / \"scans\"\n\n @property\n def config(self):\n \"\"\"\n .config is just .default_config + .custom_config merged together\n\n any new values should be added to custom_config.\n \"\"\"\n if self._config is None:\n self._config = OmegaConf.merge(self.default_config, self.custom_config)\n # set read-only flag (change .custom_config instead)\n OmegaConf.set_readonly(self._config, True)\n return self._config\n\n @property\n def default_config(self):\n \"\"\"\n The default BBOT config (from `defaults.yml`). 
Read-only.\n \"\"\"\n global DEFAULT_CONFIG\n if DEFAULT_CONFIG is None:\n self.default_config = self.files_config.get_default_config()\n # ensure bbot home dir\n if not \"home\" in self.default_config:\n self.default_config[\"home\"] = \"~/.bbot\"\n return DEFAULT_CONFIG\n\n @default_config.setter\n def default_config(self, value):\n # we temporarily clear out the config so it can be refreshed if/when default_config changes\n global DEFAULT_CONFIG\n self._config = None\n DEFAULT_CONFIG = value\n # set read-only flag (change .custom_config instead)\n OmegaConf.set_readonly(DEFAULT_CONFIG, True)\n\n @property\n def custom_config(self):\n \"\"\"\n Custom BBOT config (from `~/.config/bbot/bbot.yml`)\n \"\"\"\n # we temporarily clear out the config so it can be refreshed if/when custom_config changes\n self._config = None\n if self._custom_config is None:\n self.custom_config = self.files_config.get_custom_config()\n return self._custom_config\n\n @custom_config.setter\n def custom_config(self, value):\n # we temporarily clear out the config so it can be refreshed if/when custom_config changes\n self._config = None\n # ensure the modules key is always a dictionary\n modules_entry = value.get(\"modules\", None)\n if modules_entry is not None and not OmegaConf.is_dict(modules_entry):\n value[\"modules\"] = {}\n self._custom_config = value\n\n def no_secrets_config(self, config):\n from .helpers.misc import clean_dict\n\n with suppress(ValueError):\n config = OmegaConf.to_object(config)\n\n return clean_dict(\n config,\n *self.secrets_strings,\n fuzzy=True,\n exclude_keys=self.secrets_exclude_keys,\n )\n\n def secrets_only_config(self, config):\n from .helpers.misc import filter_dict\n\n with suppress(ValueError):\n config = OmegaConf.to_object(config)\n\n return filter_dict(\n config,\n *self.secrets_strings,\n fuzzy=True,\n exclude_keys=self.secrets_exclude_keys,\n )\n\n def merge_custom(self, config):\n \"\"\"\n Merge a config into the custom config.\n \"\"\"\n 
self.custom_config = OmegaConf.merge(self.custom_config, OmegaConf.create(config))\n\n def merge_default(self, config):\n \"\"\"\n Merge a config into the default config.\n \"\"\"\n self.default_config = OmegaConf.merge(self.default_config, OmegaConf.create(config))\n\n def copy(self):\n \"\"\"\n Return a semi-shallow copy of self. (`custom_config` is copied, but `default_config` stays the same)\n \"\"\"\n core_copy = copy(self)\n core_copy._custom_config = self._custom_config.copy()\n return core_copy\n\n @property\n def files_config(self):\n \"\"\"\n Get the configs from `bbot.yml` and `defaults.yml`\n \"\"\"\n if self._files_config is None:\n from .config import files\n\n self.files = files\n self._files_config = files.BBOTConfigFiles(self)\n return self._files_config\n\n def create_process(self, *args, **kwargs):\n if os.environ.get(\"BBOT_TESTING\", \"\") == \"True\":\n process = self.create_thread(*args, **kwargs)\n else:\n if self.process_name == \"MainProcess\":\n from .helpers.process import BBOTProcess\n\n process = BBOTProcess(*args, **kwargs)\n else:\n raise BBOTError(f\"Tried to start server from process {self.process_name}\")\n process.daemon = True\n return process\n\n def create_thread(self, *args, **kwargs):\n from .helpers.process import BBOTThread\n\n return BBOTThread(*args, **kwargs)\n\n @property\n def logger(self):\n self.config\n if self._logger is None:\n from .config.logger import BBOTLogger\n\n self._logger = BBOTLogger(self)\n return self._logger\n
"},{"location":"dev/core/#bbot.core.core.BBOTCore.config","title":"config property
","text":"config\n
.config is just .default_config + .custom_config merged together
any new values should be added to custom_config.
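The merge described above can be sketched without OmegaConf. This is a plain-dict approximation, not BBOT's implementation: deep_merge, the example_module key, and the use of MappingProxyType as a stand-in for OmegaConf's read-only flag are all assumptions made for illustration.

```python
from copy import deepcopy
from types import MappingProxyType


def deep_merge(base, override):
    '''Return a new dict with override merged recursively on top of base.'''
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# hypothetical stand-ins for defaults.yml and ~/.config/bbot/bbot.yml
default_config = {'home': '~/.bbot', 'modules': {'example_module': {'api_key': '', 'timeout': 10}}}
custom_config = {'modules': {'example_module': {'api_key': 'abc123'}}}

# callers read the merged view; MappingProxyType rejects top-level writes
config = MappingProxyType(deep_merge(default_config, custom_config))
```

Because the two inputs stay separate, a preset can report only what differs from the defaults by inspecting custom_config alone.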
"},{"location":"dev/core/#bbot.core.core.BBOTCore.custom_config","title":"custom_configproperty
writable
","text":"custom_config\n
Custom BBOT config (from ~/.config/bbot/bbot.yml)
property
writable
","text":"default_config\n
The default BBOT config (from defaults.yml). Read-only.
property
","text":"files_config\n
Get the configs from bbot.yml and defaults.yml
copy()\n
Return a semi-shallow copy of self. (custom_config is copied, but default_config stays the same)
Source code in bbot/core/core.py
def copy(self):\n \"\"\"\n Return a semi-shallow copy of self. (`custom_config` is copied, but `default_config` stays the same)\n \"\"\"\n core_copy = copy(self)\n core_copy._custom_config = self._custom_config.copy()\n return core_copy\n
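A minimal sketch of what semi-shallow means here, using only the standard library. MiniCore is a hypothetical stand-in for BBOTCore, and plain dicts replace the OmegaConf objects, so only the copy behaviour is modelled.

```python
from copy import copy


class MiniCore:
    '''Hypothetical stand-in for BBOTCore; only the copy behaviour is modelled.'''

    def __init__(self, default_config, custom_config):
        self.default_config = default_config
        self._custom_config = custom_config

    def copy(self):
        # shallow-copy the object, then give the copy its own custom config
        core_copy = copy(self)
        core_copy._custom_config = dict(self._custom_config)
        return core_copy


core = MiniCore({'home': '~/.bbot'}, {'debug': False})
clone = core.copy()
# changing the clone's custom config leaves the original untouched
clone._custom_config['debug'] = True
```

The defaults are still shared by reference, so each copy stays cheap while per-scan overrides remain isolated.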
"},{"location":"dev/core/#bbot.core.core.BBOTCore.merge_custom","title":"merge_custom","text":"merge_custom(config)\n
Merge a config into the custom config.
Source code in bbot/core/core.py
def merge_custom(self, config):\n \"\"\"\n Merge a config into the custom config.\n \"\"\"\n self.custom_config = OmegaConf.merge(self.custom_config, OmegaConf.create(config))\n
"},{"location":"dev/core/#bbot.core.core.BBOTCore.merge_default","title":"merge_default","text":"merge_default(config)\n
Merge a config into the default config.
Source code in bbot/core/core.py
def merge_default(self, config):\n \"\"\"\n Merge a config into the default config.\n \"\"\"\n self.default_config = OmegaConf.merge(self.default_config, OmegaConf.create(config))\n
"},{"location":"dev/dev_environment/","title":"Setting Up a Dev Environment","text":"The following will show you how to set up a fully functioning python environment for devving on BBOT.
"},{"location":"dev/dev_environment/#installation-poetry","title":"Installation (Poetry)","text":"Poetry is the recommended method of installation if you want to dev on BBOT. To set up a dev environment with Poetry, you can follow these steps:
# clone your forked repo and cd into it\ngit clone git@github.com:<username>/bbot.git\ncd bbot\n\n# install poetry\ncurl -sSL https://install.python-poetry.org | python3 -\n\n# install pip dependencies\npoetry install\n# install pre-commit hooks, etc.\npoetry run pre-commit install\n\n# enter virtual environment\npoetry shell\n\nbbot --help\n
bbot
command.# auto-format code indentation, etc.\nblack .\n\n# run tests\n./bbot/test/run_tests.sh\n
dev
branch of the main BBOT repo.Below is a simple Discord bot designed to run BBOT scans.
examples/discord_bot.py import discord\nfrom discord.ext import commands\n\nfrom bbot.scanner import Scanner\nfrom bbot.modules.output.discord import Discord\n\n\nclass BBOTDiscordBot(commands.Cog):\n \"\"\"\n A simple Discord bot capable of running a BBOT scan.\n\n To set up:\n 1. Go to Discord Developer Portal (https://discord.com/developers)\n 2. Create a new application\n 3. Create an invite link for the bot, visit the link to invite it to your server\n - Your Application --> OAuth2 --> URL Generator\n - For Scopes, select \"bot\"\n - For Bot Permissions, select:\n - Read Messages/View Channels\n - Send Messages\n 4. Turn on \"Message Content Intent\"\n - Your Application --> Bot --> Privileged Gateway Intents --> Message Content Intent\n 5. Copy your Discord Bot Token and put it in the bot.run() call at the bottom of this file\n - Your Application --> Bot --> Reset Token\n 6. Run this script\n\n To scan evilcorp.com, you would type:\n\n /scan evilcorp.com\n\n Results will be output to the same channel.\n \"\"\"\n\n def __init__(self):\n self.current_scan = None\n\n @commands.command(name=\"scan\", description=\"Scan a target with BBOT.\")\n async def scan(self, ctx, target: str):\n if self.current_scan is not None:\n self.current_scan.stop()\n await ctx.send(f\"Starting scan against {target}.\")\n\n # creates scan instance\n self.current_scan = Scanner(target, flags=\"subdomain-enum\")\n discord_module = Discord(self.current_scan)\n\n seen = set()\n num_events = 0\n # start scan and iterate through results\n async for event in self.current_scan.async_start():\n if hash(event) in seen:\n continue\n seen.add(hash(event))\n await ctx.send(discord_module.format_message(event))\n num_events += 1\n\n await ctx.send(f\"Finished scan against {target}. 
{num_events:,} results.\")\n self.current_scan = None\n\n\nif __name__ == \"__main__\":\n intents = discord.Intents.default()\n intents.message_content = True\n bot = commands.Bot(command_prefix=\"/\", intents=intents)\n\n @bot.event\n async def on_ready():\n print(f\"We have logged in as {bot.user}\")\n await bot.add_cog(BBOTDiscordBot())\n\n bot.run(\"DISCORD_BOT_TOKEN_HERE\")\n
"},{"location":"dev/engine/","title":"Engine","text":""},{"location":"dev/engine/#bbot.core.engine.EngineBase","title":"EngineBase","text":"Base Engine class for Server and Client.
An Engine is a simple and lightweight RPC implementation that allows offloading async tasks to a separate process. It leverages ZeroMQ in a ROUTER-DEALER configuration.
BBOT makes use of this by spawning a dedicated engine for DNS and HTTP tasks. This offloads I/O and helps free up the main event loop for other tasks.
To use Engine, you must subclass both EngineClient and EngineServer.
See the respective EngineClient and EngineServer classes for usage examples.
Source code in bbot/core/engine.py
class EngineBase:\n \"\"\"\n Base Engine class for Server and Client.\n\n An Engine is a simple and lightweight RPC implementation that allows offloading async tasks\n to a separate process. It leverages ZeroMQ in a ROUTER-DEALER configuration.\n\n BBOT makes use of this by spawning a dedicated engine for DNS and HTTP tasks.\n This offloads I/O and helps free up the main event loop for other tasks.\n\n To use Engine, you must subclass both EngineClient and EngineServer.\n\n See the respective EngineClient and EngineServer classes for usage examples.\n \"\"\"\n\n ERROR_CLASS = BBOTEngineError\n\n def __init__(self, debug=False):\n self._shutdown_status = False\n self.log = logging.getLogger(f\"bbot.core.{self.__class__.__name__.lower()}\")\n self._debug = debug\n\n def pickle(self, obj):\n try:\n return pickle.dumps(obj)\n except Exception as e:\n self.log.error(f\"Error serializing object: {obj}: {e}\")\n self.log.trace(traceback.format_exc())\n return error_sentinel\n\n def unpickle(self, binary):\n try:\n return pickle.loads(binary)\n except Exception as e:\n self.log.error(f\"Error deserializing binary: {e}\")\n self.log.trace(f\"Offending binary: {binary}\")\n self.log.trace(traceback.format_exc())\n return error_sentinel\n\n async def _infinite_retry(self, callback, *args, **kwargs):\n interval = kwargs.pop(\"_interval\", 15)\n context = kwargs.pop(\"_context\", \"\")\n # default overall timeout of 5 minutes (15 second interval * 20 iterations)\n max_retries = kwargs.pop(\"_max_retries\", 4 * 5)\n if not context:\n context = f\"{callback.__name__}({args}, {kwargs})\"\n retries = 0\n while not self._shutdown_status:\n try:\n return await asyncio.wait_for(callback(*args, **kwargs), timeout=interval)\n except (TimeoutError, asyncio.exceptions.TimeoutError):\n self.log.debug(f\"{self.name}: Timeout after {interval:,} seconds{context}, retrying...\")\n retries += 1\n if max_retries is not None and retries > max_retries:\n raise TimeoutError(f\"Timed out after 
{max_retries*interval:,} seconds {context}\")\n\n def debug(self, *args, **kwargs):\n if self._debug:\n self.log.debug(*args, **kwargs)\n
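The retry loop in _infinite_retry can be exercised on its own. The sketch below mirrors its structure with the standard library only; retry_with_timeout, slow_then_fast, and the short intervals are hypothetical names and values chosen so the example finishes quickly, and the shutdown check from the original is left out.

```python
import asyncio


async def retry_with_timeout(callback, *args, interval=15, max_retries=20, **kwargs):
    '''Await callback repeatedly, giving each attempt `interval` seconds.'''
    retries = 0
    while True:
        try:
            return await asyncio.wait_for(callback(*args, **kwargs), timeout=interval)
        except (TimeoutError, asyncio.TimeoutError):
            retries += 1
            if retries > max_retries:
                raise TimeoutError(f'Timed out after {max_retries * interval:,} seconds')


async def slow_then_fast(state):
    # hypothetical callback: the first call hangs, later calls return at once
    state['calls'] += 1
    if state['calls'] == 1:
        await asyncio.sleep(1)
    return 'done'


state = {'calls': 0}
result = asyncio.run(retry_with_timeout(slow_then_fast, state, interval=0.05, max_retries=3))
```

Each timed-out attempt is cancelled by asyncio.wait_for before the next one starts, so only one call to the callback is ever in flight.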
"},{"location":"dev/engine/#bbot.core.engine.EngineClient","title":"EngineClient","text":" Bases: EngineBase
The client portion of BBOT's RPC Engine.
To create an engine, you must create a subclass of this class and also define methods for each of your desired functions.
Note that this only supports async functions. If you need to offload a synchronous function to another CPU, use BBOT's multiprocessing pool instead.
Any CPU- or I/O-intensive logic should be implemented in the EngineServer.
These functions are typically stubs whose only job is to forward the arguments to the server.
Functions with the same names should be defined on the EngineServer.
The EngineClient must specify its associated server class via the SERVER_CLASS variable.
Depending on whether your function is a generator, you will use either run_and_return() or run_and_yield().
Examples:
>>> from bbot.core.engine import EngineClient\n>>>\n>>> class MyClient(EngineClient):\n>>> SERVER_CLASS = MyServer\n>>>\n>>> async def my_function(self, **kwargs):\n>>> return await self.run_and_return(\"my_function\", **kwargs)\n>>>\n>>> async def my_generator(self, **kwargs):\n>>> async for _ in self.run_and_yield(\"my_generator\", **kwargs):\n>>> yield _\n
Source code in bbot/core/engine.py
class EngineClient(EngineBase):\n \"\"\"\n The client portion of BBOT's RPC Engine.\n\n To create an engine, you must create a subclass of this class and also\n define methods for each of your desired functions.\n\n Note that this only supports async functions. If you need to offload a synchronous function to another CPU, use BBOT's multiprocessing pool instead.\n\n Any CPU or I/O intense logic should be implemented in the EngineServer.\n\n These functions are typically stubs whose only job is to forward the arguments to the server.\n\n Functions with the same names should be defined on the EngineServer.\n\n The EngineClient must specify its associated server class via the `SERVER_CLASS` variable.\n\n Depending on whether your function is a generator, you will use either `run_and_return()`, or `run_and_yield`.\n\n Examples:\n >>> from bbot.core.engine import EngineClient\n >>>\n >>> class MyClient(EngineClient):\n >>> SERVER_CLASS = MyServer\n >>>\n >>> async def my_function(self, **kwargs)\n >>> return await self.run_and_return(\"my_function\", **kwargs)\n >>>\n >>> async def my_generator(self, **kwargs):\n >>> async for _ in self.run_and_yield(\"my_generator\", **kwargs):\n >>> yield _\n \"\"\"\n\n SERVER_CLASS = None\n\n def __init__(self, debug=False, **kwargs):\n self.name = f\"EngineClient {self.__class__.__name__}\"\n super().__init__(debug=debug)\n self.process = None\n if self.SERVER_CLASS is None:\n raise ValueError(f\"Must set EngineClient SERVER_CLASS, {self.SERVER_CLASS}\")\n self.CMDS = dict(self.SERVER_CLASS.CMDS)\n for k, v in list(self.CMDS.items()):\n self.CMDS[v] = k\n self.socket_address = f\"zmq_{rand_string(8)}.sock\"\n self.socket_path = Path(tempfile.gettempdir()) / self.socket_address\n self.server_kwargs = kwargs.pop(\"server_kwargs\", {})\n self._server_process = None\n self.context = zmq.asyncio.Context()\n self.context.setsockopt(zmq.LINGER, 0)\n self.sockets = set()\n\n def check_error(self, message):\n if isinstance(message, dict) and 
len(message) == 1 and \"_e\" in message:\n error, trace = message[\"_e\"]\n error = self.ERROR_CLASS(error)\n error.engine_traceback = trace\n raise error\n return False\n\n async def run_and_return(self, command, *args, **kwargs):\n fn_str = f\"{command}({args}, {kwargs})\"\n self.debug(f\"{self.name}: executing run-and-return {fn_str}\")\n if self._shutdown_status and not command == \"_shutdown\":\n self.log.verbose(f\"{self.name} has been shut down and is not accepting new tasks\")\n return\n async with self.new_socket() as socket:\n try:\n message = self.make_message(command, args=args, kwargs=kwargs)\n if message is error_sentinel:\n return\n await socket.send(message)\n binary = await self._infinite_retry(socket.recv, _context=f\"waiting for return value from {fn_str}\")\n except BaseException:\n try:\n await self.send_cancel_message(socket, fn_str)\n except Exception:\n self.log.debug(f\"{self.name}: {fn_str} failed to send cancel message after exception\")\n self.log.trace(traceback.format_exc())\n raise\n # self.log.debug(f\"{self.name}.{command}({kwargs}) got binary: {binary}\")\n message = self.unpickle(binary)\n self.debug(f\"{self.name}: {fn_str} got return value: {message}\")\n # error handling\n if self.check_error(message):\n return\n return message\n\n async def run_and_yield(self, command, *args, **kwargs):\n fn_str = f\"{command}({args}, {kwargs})\"\n self.debug(f\"{self.name}: executing run-and-yield {fn_str}\")\n if self._shutdown_status:\n self.log.verbose(\"Engine has been shut down and is not accepting new tasks\")\n return\n message = self.make_message(command, args=args, kwargs=kwargs)\n if message is error_sentinel:\n return\n async with self.new_socket() as socket:\n # TODO: synchronize server-side generator by limiting qsize\n # socket.setsockopt(zmq.RCVHWM, 1)\n # socket.setsockopt(zmq.SNDHWM, 1)\n await socket.send(message)\n while 1:\n try:\n binary = await self._infinite_retry(\n socket.recv, _context=f\"waiting for new iteration 
from {fn_str}\"\n )\n # self.log.debug(f\"{self.name}.{command}({kwargs}) got binary: {binary}\")\n message = self.unpickle(binary)\n self.debug(f\"{self.name}: {fn_str} got iteration: {message}\")\n # error handling\n if self.check_error(message) or self.check_stop(message):\n break\n yield message\n except (StopAsyncIteration, GeneratorExit) as e:\n exc_name = e.__class__.__name__\n self.debug(f\"{self.name}.{command} got {exc_name}\")\n try:\n await self.send_cancel_message(socket, fn_str)\n except Exception:\n self.debug(f\"{self.name}.{command} failed to send cancel message after {exc_name}\")\n self.log.trace(traceback.format_exc())\n break\n\n async def send_cancel_message(self, socket, context):\n \"\"\"\n Send a cancel message and wait for confirmation from the server\n \"\"\"\n # -1 == special \"cancel\" signal\n message = pickle.dumps({\"c\": -1})\n await self._infinite_retry(socket.send, message)\n while 1:\n response = await self._infinite_retry(\n socket.recv, _context=f\"waiting for CANCEL_OK from {context}\", _max_retries=4\n )\n response = pickle.loads(response)\n if isinstance(response, dict):\n response = response.get(\"m\", \"\")\n if response == \"CANCEL_OK\":\n break\n\n async def send_shutdown_message(self):\n async with self.new_socket() as socket:\n # -99 == special shutdown message\n message = pickle.dumps({\"c\": -99})\n with suppress(TimeoutError, asyncio.exceptions.TimeoutError):\n await asyncio.wait_for(socket.send(message), 0.5)\n with suppress(TimeoutError, asyncio.exceptions.TimeoutError):\n while 1:\n response = await asyncio.wait_for(socket.recv(), 0.5)\n response = pickle.loads(response)\n if isinstance(response, dict):\n response = response.get(\"m\", \"\")\n if response == \"SHUTDOWN_OK\":\n break\n\n def check_stop(self, message):\n if isinstance(message, dict) and len(message) == 1 and \"_s\" in message:\n return True\n return False\n\n def make_message(self, command, args=None, kwargs=None):\n try:\n cmd_id = 
self.CMDS[command]\n except KeyError:\n raise KeyError(f'Command \"{command}\" not found. Available commands: {\",\".join(self.available_commands)}')\n message = {\"c\": cmd_id}\n if args:\n message[\"a\"] = args\n if kwargs:\n message[\"k\"] = kwargs\n return pickle.dumps(message)\n\n @property\n def available_commands(self):\n return [s for s in self.CMDS if isinstance(s, str)]\n\n def start_server(self):\n import multiprocessing\n\n process_name = multiprocessing.current_process().name\n if process_name == \"MainProcess\":\n kwargs = dict(self.server_kwargs)\n # if we're in tests, we use a single event loop to avoid weird race conditions\n # this allows us to more easily mock http, etc.\n if os.environ.get(\"BBOT_TESTING\", \"\") == \"True\":\n kwargs[\"_loop\"] = get_event_loop()\n kwargs[\"debug\"] = self._debug\n self.process = CORE.create_process(\n target=self.server_process,\n args=(\n self.SERVER_CLASS,\n self.socket_path,\n ),\n kwargs=kwargs,\n custom_name=f\"BBOT {self.__class__.__name__}\",\n )\n self.process.start()\n return self.process\n else:\n raise BBOTEngineError(\n f\"Tried to start server from process {process_name}. 
Did you forget \\\"if __name__ == '__main__'?\\\"\"\n )\n\n @staticmethod\n def server_process(server_class, socket_path, **kwargs):\n try:\n loop = kwargs.pop(\"_loop\", None)\n engine_server = server_class(socket_path, **kwargs)\n if loop is not None:\n future = asyncio.run_coroutine_threadsafe(engine_server.worker(), loop)\n future.result()\n else:\n asyncio.run(engine_server.worker())\n except (asyncio.CancelledError, KeyboardInterrupt, CancelledError):\n return\n except Exception:\n import traceback\n\n log = logging.getLogger(\"bbot.core.engine.server\")\n log.critical(f\"Unhandled error in {server_class.__name__} server process: {traceback.format_exc()}\")\n\n @asynccontextmanager\n async def new_socket(self):\n if self._server_process is None:\n self._server_process = self.start_server()\n while not self.socket_path.exists():\n self.debug(f\"{self.name}: waiting for server process to start...\")\n await asyncio.sleep(0.1)\n socket = self.context.socket(zmq.DEALER)\n socket.setsockopt(zmq.LINGER, 0)\n socket.connect(f\"ipc://{self.socket_path}\")\n self.sockets.add(socket)\n try:\n yield socket\n finally:\n self.sockets.remove(socket)\n with suppress(Exception):\n socket.close()\n\n async def shutdown(self):\n if not self._shutdown_status:\n self._shutdown_status = True\n self.log.verbose(f\"{self.name}: shutting down...\")\n # send shutdown signal\n await self.send_shutdown_message()\n # then terminate context\n try:\n self.context.destroy(linger=0)\n except Exception:\n print(traceback.format_exc(), file=sys.stderr)\n try:\n self.context.term()\n except Exception:\n print(traceback.format_exc(), file=sys.stderr)\n # delete socket file on exit\n self.socket_path.unlink(missing_ok=True)\n
"},{"location":"dev/engine/#bbot.core.engine.EngineClient.send_cancel_message","title":"send_cancel_message async
","text":"send_cancel_message(socket, context)\n
Send a cancel message and wait for confirmation from the server
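The wire format behind this exchange is small enough to sketch with pickle alone. The key names c, a, k, and m, the reserved ids -1 and -99, and the CANCEL_OK and SHUTDOWN_OK replies come from the engine source; the CMDS table, the handle function, and its dict return value are hypothetical, and no ZeroMQ socket is involved.

```python
import pickle

# hypothetical command table, in the same shape as EngineServer.CMDS
CMDS = {0: 'my_function', 1: 'my_generator'}
CANCEL = -1      # reserved id: cancel the task owned by this client
SHUTDOWN = -99   # reserved id: shut the server down


def make_message(cmd_id, args=None, kwargs=None):
    '''Serialize a request; args and kwargs are only included when non-empty.'''
    message = {'c': cmd_id}
    if args:
        message['a'] = args
    if kwargs:
        message['k'] = kwargs
    return pickle.dumps(message)


def handle(binary):
    '''Server side: decide what to do with one decoded message.'''
    message = pickle.loads(binary)
    cmd = message.get('c')
    if cmd == CANCEL:
        return {'m': 'CANCEL_OK'}
    if cmd == SHUTDOWN:
        return {'m': 'SHUTDOWN_OK'}
    # ordinary command: look the method name up by its integer id
    return {'run': CMDS[cmd], 'args': message.get('a', ()), 'kwargs': message.get('k', {})}


cancel_reply = handle(make_message(CANCEL))
call = handle(make_message(0, kwargs={'arg1': 42}))
```

Negative ids keep the control signals out of the range used by user-defined commands, so a subclass can number its own CMDS from zero without collisions.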
Source code in bbot/core/engine.py
async def send_cancel_message(self, socket, context):\n \"\"\"\n Send a cancel message and wait for confirmation from the server\n \"\"\"\n # -1 == special \"cancel\" signal\n message = pickle.dumps({\"c\": -1})\n await self._infinite_retry(socket.send, message)\n while 1:\n response = await self._infinite_retry(\n socket.recv, _context=f\"waiting for CANCEL_OK from {context}\", _max_retries=4\n )\n response = pickle.loads(response)\n if isinstance(response, dict):\n response = response.get(\"m\", \"\")\n if response == \"CANCEL_OK\":\n break\n
"},{"location":"dev/engine/#bbot.core.engine.EngineServer","title":"EngineServer","text":" Bases: EngineBase
The server portion of BBOT's RPC Engine.
Methods defined here must match the methods in your EngineClient.
To use the functions, you must create mappings for them in the CMDS attribute, as shown below.
Examples:
>>> import asyncio\n>>> from bbot.core.engine import EngineServer\n>>>\n>>> class MyServer(EngineServer):\n>>> CMDS = {\n>>> 0: \"my_function\",\n>>> 1: \"my_generator\",\n>>> }\n>>>\n>>> async def my_function(self, arg1=None):\n>>> await asyncio.sleep(1)\n>>> return str(arg1)\n>>>\n>>> async def my_generator(self):\n>>> for i in range(10):\n>>> await asyncio.sleep(1)\n>>> yield i\n
Source code in bbot/core/engine.py
class EngineServer(EngineBase):\n \"\"\"\n The server portion of BBOT's RPC Engine.\n\n Methods defined here must match the methods in your EngineClient.\n\n To use the functions, you must create mappings for them in the CMDS attribute, as shown below.\n\n Examples:\n >>> from bbot.core.engine import EngineServer\n >>>\n >>> class MyServer(EngineServer):\n >>> CMDS = {\n >>> 0: \"my_function\",\n >>> 1: \"my_generator\",\n >>> }\n >>>\n >>> def my_function(self, arg1=None):\n >>> await asyncio.sleep(1)\n >>> return str(arg1)\n >>>\n >>> def my_generator(self):\n >>> for i in range(10):\n >>> await asyncio.sleep(1)\n >>> yield i\n \"\"\"\n\n CMDS = {}\n\n def __init__(self, socket_path, debug=False):\n self.name = f\"EngineServer {self.__class__.__name__}\"\n super().__init__(debug=debug)\n self.socket_path = socket_path\n self.client_id_var = contextvars.ContextVar(\"client_id\", default=None)\n # task <--> client id mapping\n self.tasks = {}\n # child tasks spawned by main tasks\n self.child_tasks = {}\n if self.socket_path is not None:\n # create ZeroMQ context\n self.context = zmq.asyncio.Context()\n self.context.setsockopt(zmq.LINGER, 0)\n # ROUTER socket can handle multiple concurrent requests\n self.socket = self.context.socket(zmq.ROUTER)\n self.socket.setsockopt(zmq.LINGER, 0)\n # create socket file\n self.socket.bind(f\"ipc://{self.socket_path}\")\n\n @contextlib.contextmanager\n def client_id_context(self, value):\n token = self.client_id_var.set(value)\n try:\n yield\n finally:\n self.client_id_var.reset(token)\n\n async def run_and_return(self, client_id, command_fn, *args, **kwargs):\n fn_str = f\"{command_fn.__name__}({args}, {kwargs})\"\n with self.client_id_context(client_id):\n try:\n self.debug(f\"{self.name}: run-and-return {fn_str}\")\n result = error_sentinel\n try:\n result = await command_fn(*args, **kwargs)\n except BaseException as e:\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n error = f\"Error in 
{self.name}.{fn_str}: {e}\"\n self.debug(error)\n trace = traceback.format_exc()\n self.debug(trace)\n result = {\"_e\": (error, trace)}\n finally:\n self.tasks.pop(client_id, None)\n if result is not error_sentinel:\n self.debug(f\"{self.name}: Sending response to {fn_str}: {result}\")\n await self.send_socket_multipart(client_id, result)\n except BaseException as e:\n self.log.critical(\n f\"Unhandled exception in {self.name}.run_and_return({client_id}, {command_fn}, {args}, {kwargs}): {e}\"\n )\n self.log.critical(traceback.format_exc())\n finally:\n self.debug(f\"{self.name} finished run-and-return {command_fn.__name__}({args}, {kwargs})\")\n\n async def run_and_yield(self, client_id, command_fn, *args, **kwargs):\n fn_str = f\"{command_fn.__name__}({args}, {kwargs})\"\n with self.client_id_context(client_id):\n try:\n self.debug(f\"{self.name}: run-and-yield {fn_str}\")\n try:\n async for _ in command_fn(*args, **kwargs):\n self.debug(f\"{self.name}: sending iteration for {command_fn.__name__}(): {_}\")\n await self.send_socket_multipart(client_id, _)\n except BaseException as e:\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n error = f\"Error in {self.name}.{fn_str}: {e}\"\n trace = traceback.format_exc()\n self.debug(error)\n self.debug(trace)\n result = {\"_e\": (error, trace)}\n await self.send_socket_multipart(client_id, result)\n finally:\n self.debug(f\"{self.name} reached end of run-and-yield iteration for {command_fn.__name__}()\")\n # _s == special signal that means StopIteration\n await self.send_socket_multipart(client_id, {\"_s\": None})\n self.tasks.pop(client_id, None)\n except BaseException as e:\n self.log.critical(\n f\"Unhandled exception in {self.name}.run_and_yield({client_id}, {command_fn}, {args}, {kwargs}): {e}\"\n )\n self.log.critical(traceback.format_exc())\n finally:\n self.debug(f\"{self.name} finished run-and-yield {command_fn.__name__}()\")\n\n async def send_socket_multipart(self, client_id, 
message):\n try:\n message = pickle.dumps(message)\n await self._infinite_retry(self.socket.send_multipart, [client_id, message])\n except Exception as e:\n self.log.verbose(f\"Error sending ZMQ message: {e}\")\n self.log.trace(traceback.format_exc())\n\n def check_error(self, message):\n if message is error_sentinel:\n return True\n\n async def worker(self):\n self.debug(f\"{self.name}: starting worker\")\n try:\n while 1:\n client_id, binary = await self.socket.recv_multipart()\n message = self.unpickle(binary)\n # self.log.debug(f\"{self.name} got message: {message}\")\n if self.check_error(message):\n continue\n\n cmd = message.get(\"c\", None)\n if not isinstance(cmd, int):\n self.log.warning(f\"{self.name}: no command sent in message: {message}\")\n continue\n\n # -1 == cancel task\n if cmd == -1:\n self.debug(f\"{self.name} got cancel signal\")\n await self.send_socket_multipart(client_id, {\"m\": \"CANCEL_OK\"})\n await self.cancel_task(client_id)\n continue\n\n # -99 == shutdown task\n if cmd == -99:\n self.debug(f\"{self.name} got shutdown signal\")\n await self.send_socket_multipart(client_id, {\"m\": \"SHUTDOWN_OK\"})\n await self._shutdown()\n return\n\n args = message.get(\"a\", ())\n if not isinstance(args, tuple):\n self.log.warning(f\"{self.name}: received invalid args of type {type(args)}, should be tuple\")\n continue\n kwargs = message.get(\"k\", {})\n if not isinstance(kwargs, dict):\n self.log.warning(f\"{self.name}: received invalid kwargs of type {type(kwargs)}, should be dict\")\n continue\n\n command_name = self.CMDS[cmd]\n command_fn = getattr(self, command_name, None)\n\n if command_fn is None:\n self.log.warning(f'{self.name} has no function named \"{command_fn}\"')\n continue\n\n if inspect.isasyncgenfunction(command_fn):\n # self.log.debug(f\"{self.name}: creating run-and-yield coroutine for {command_name}()\")\n coroutine = self.run_and_yield(client_id, command_fn, *args, **kwargs)\n else:\n # self.log.debug(f\"{self.name}: creating 
run-and-return coroutine for {command_name}()\")\n coroutine = self.run_and_return(client_id, command_fn, *args, **kwargs)\n\n # self.log.debug(f\"{self.name}: creating task for {command_name}() coroutine\")\n task = asyncio.create_task(coroutine)\n self.tasks[client_id] = task, command_fn, args, kwargs\n # self.log.debug(f\"{self.name}: finished creating task for {command_name}() coroutine\")\n except BaseException as e:\n await self._shutdown()\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.log.error(f\"{self.name}: error in EngineServer worker: {e}\")\n self.log.trace(traceback.format_exc())\n finally:\n self.debug(f\"{self.name}: finished worker()\")\n\n async def _shutdown(self):\n if not self._shutdown_status:\n self.log.verbose(f\"{self.name}: shutting down...\")\n self._shutdown_status = True\n await self.cancel_all_tasks()\n try:\n self.context.destroy(linger=0)\n except Exception:\n self.log.trace(traceback.format_exc())\n try:\n self.context.term()\n except Exception:\n self.log.trace(traceback.format_exc())\n self.log.verbose(f\"{self.name}: finished shutting down\")\n\n def new_child_task(self, client_id, coro):\n task = asyncio.create_task(coro)\n try:\n self.child_tasks[client_id].add(task)\n except KeyError:\n self.child_tasks[client_id] = {task}\n return task\n\n async def finished_tasks(self, client_id, timeout=None):\n child_tasks = self.child_tasks.get(client_id, set())\n try:\n done, pending = await asyncio.wait(child_tasks, return_when=asyncio.FIRST_COMPLETED, timeout=timeout)\n except BaseException as e:\n if isinstance(e, (TimeoutError, asyncio.exceptions.TimeoutError)):\n done = set()\n self.log.warning(f\"{self.name}: Timeout after {timeout:,} seconds in finished_tasks({child_tasks})\")\n for task in child_tasks:\n task.cancel()\n else:\n if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.log.error(f\"{self.name}: Unhandled exception in finished_tasks({child_tasks}): 
{e}\")\n self.log.trace(traceback.format_exc())\n raise\n self.child_tasks[client_id] = pending\n return done\n\n async def cancel_task(self, client_id):\n parent_task = self.tasks.pop(client_id, None)\n if parent_task is None:\n return\n parent_task, _cmd, _args, _kwargs = parent_task\n self.debug(f\"{self.name}: Cancelling client id {client_id} (task: {parent_task})\")\n parent_task.cancel()\n child_tasks = self.child_tasks.pop(client_id, set())\n if child_tasks:\n self.debug(f\"{self.name}: Cancelling {len(child_tasks):,} child tasks for client id {client_id}\")\n for child_task in child_tasks:\n child_task.cancel()\n\n for task in [parent_task] + list(child_tasks):\n await self._cancel_task(task)\n\n async def _cancel_task(self, task):\n try:\n await asyncio.wait_for(task, timeout=10)\n except (TimeoutError, asyncio.exceptions.TimeoutError):\n self.log.trace(f\"{self.name}: Timeout cancelling task: {task}\")\n return\n except (KeyboardInterrupt, asyncio.CancelledError):\n return\n except BaseException as e:\n self.log.error(f\"Unhandled error in {task.get_coro().__name__}(): {e}\")\n self.log.trace(traceback.format_exc())\n\n async def cancel_all_tasks(self):\n for client_id in list(self.tasks):\n await self.cancel_task(client_id)\n for client_id, tasks in self.child_tasks.items():\n for task in tasks:\n await self._cancel_task(task)\n
"},{"location":"dev/event/","title":"Event","text":"This is a developer reference. For a high-level description of BBOT events including a full list of event types, see Events
"},{"location":"dev/event/#bbot.core.event.base.make_event","title":"make_event","text":"make_event(data, event_type=None, parent=None, context=None, module=None, scan=None, scans=None, tags=None, confidence=100, dummy=False, internal=None)\n
Creates and returns a new event object or modifies an existing one.
This function serves as a factory for creating new event objects, either by generating a new Event
object or by updating an existing event with additional metadata. If data
is already an event, it updates the event based on the additional parameters provided.
Parameters:
data
(Union[str, dict, BaseEvent]
) \u2013 The primary data for the event or an existing event object.
event_type
(str
, default: None
) \u2013 Type of the event, e.g., 'IP_ADDRESS'. Auto-detected if not provided.
parent
(BaseEvent
, default: None
) \u2013 Parent event leading to this event's discovery.
context
(str
, default: None
) \u2013 Description of circumstances leading to event's discovery.
module
(str
, default: None
) \u2013 Module that discovered the event.
scan
(Scan
, default: None
) \u2013 BBOT Scan object associated with the event.
scans
(List[Scan]
, default: None
) \u2013 Multiple BBOT Scan objects, primarily used for unserialization.
tags
(Union[str, List[str]]
, default: None
) \u2013 Descriptive tags for the event, as a list or a single string.
confidence
(int
, default: 100
) \u2013 Confidence level for the event, on a scale of 1-100. Defaults to 100.
dummy
(bool
, default: False
) \u2013 Disables data validations if set to True. Defaults to False.
internal
(Any
, default: None
) \u2013 Makes the event internal if set to True. Defaults to None.
Returns:
BaseEvent
\u2013 A new or updated event object.
Raises:
ValidationError
\u2013 Raised when there's an error in event data or type sanitization.
Examples:
If inside a module, e.g. from within its handle_event()
:
>>> self.make_event(\"1.2.3.4\", parent=event)\nIP_ADDRESS(\"1.2.3.4\", module=portscan, tags={'ipv4', 'distance-1'})\n
If you're outside a module but you have a scan object:
>>> scan.make_event(\"1.2.3.4\", parent=scan.root_event)\nIP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4', 'distance-1'})\n
If you're outside a scan and just messing around:
>>> from bbot.core.event.base import make_event\n>>> make_event(\"1.2.3.4\", dummy=True)\nIP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4'})\n
Note: When working within a module's handle_event()
, use the instance method self.make_event()
instead of calling this function directly.
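The tags parameter accepts a single string, a list, or nothing at all. The sketch below mirrors how the source normalizes that input into a set before creating the event. It is a standalone illustration, not BBOT code, and the function name normalize_tags is hypothetical.

```python
# Hypothetical standalone helper that mirrors the tag handling in make_event().
def normalize_tags(tags=None):
    # None or an empty value becomes an empty set
    if not tags:
        return set()
    # a single string is treated as one tag, not as a sequence of characters
    if isinstance(tags, str):
        tags = [tags]
    # duplicates collapse because the result is a set
    return set(tags)


print(normalize_tags('in-scope'))
print(normalize_tags(['ipv4', 'ipv4', 'distance-1']))
```

Wrapping a lone string in a list first matters: calling set() on the string directly would split it into individual characters.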
Source code in bbot/core/event/base.py
def make_event(\n data,\n event_type=None,\n parent=None,\n context=None,\n module=None,\n scan=None,\n scans=None,\n tags=None,\n confidence=100,\n dummy=False,\n internal=None,\n):\n \"\"\"\n Creates and returns a new event object or modifies an existing one.\n\n This function serves as a factory for creating new event objects, either by generating a new `Event`\n object or by updating an existing event with additional metadata. If `data` is already an event,\n it updates the event based on the additional parameters provided.\n\n Parameters:\n data (Union[str, dict, BaseEvent]): The primary data for the event or an existing event object.\n event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'. Auto-detected if not provided.\n parent (BaseEvent, optional): Parent event leading to this event's discovery.\n context (str, optional): Description of circumstances leading to event's discovery.\n module (str, optional): Module that discovered the event.\n scan (Scan, optional): BBOT Scan object associated with the event.\n scans (List[Scan], optional): Multiple BBOT Scan objects, primarily used for unserialization.\n tags (Union[str, List[str]], optional): Descriptive tags for the event, as a list or a single string.\n confidence (int, optional): Confidence level for the event, on a scale of 1-100. Defaults to 100.\n dummy (bool, optional): Disables data validations if set to True. Defaults to False.\n internal (Any, optional): Makes the event internal if set to True. Defaults to None.\n\n Returns:\n BaseEvent: A new or updated event object.\n\n Raises:\n ValidationError: Raised when there's an error in event data or type sanitization.\n\n Examples:\n If inside a module, e.g. 
from within its `handle_event()`:\n >>> self.make_event(\"1.2.3.4\", parent=event)\n IP_ADDRESS(\"1.2.3.4\", module=portscan, tags={'ipv4', 'distance-1'})\n\n If you're outside a module but you have a scan object:\n >>> scan.make_event(\"1.2.3.4\", parent=scan.root_event)\n IP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4', 'distance-1'})\n\n If you're outside a scan and just messing around:\n >>> from bbot.core.event.base import make_event\n >>> make_event(\"1.2.3.4\", dummy=True)\n IP_ADDRESS(\"1.2.3.4\", module=None, tags={'ipv4'})\n\n Note:\n When working within a module's `handle_event()`, use the instance method\n `self.make_event()` instead of calling this function directly.\n \"\"\"\n\n # allow tags to be either a string or an array\n if not tags:\n tags = []\n elif isinstance(tags, str):\n tags = [tags]\n tags = set(tags)\n\n if is_event(data):\n data = copy(data)\n if scan is not None and not data.scan:\n data.scan = scan\n if scans is not None and not data.scans:\n data.scans = scans\n if module is not None:\n data.module = module\n if parent is not None:\n data.parent = parent\n if context is not None:\n data.discovery_context = context\n if internal == True:\n data.internal = True\n if tags:\n data.tags = tags.union(data.tags)\n event_type = data.type\n return data\n else:\n if event_type is None:\n event_type, data = get_event_type(data)\n if not dummy:\n log.debug(f'Autodetected event type \"{event_type}\" based on data: \"{data}\"')\n\n event_type = str(event_type).strip().upper()\n\n # Catch these common whoopsies\n if event_type in (\"DNS_NAME\", \"IP_ADDRESS\"):\n # DNS_NAME <--> EMAIL_ADDRESS confusion\n if validators.soft_validate(data, \"email\"):\n event_type = \"EMAIL_ADDRESS\"\n else:\n # DNS_NAME <--> IP_ADDRESS confusion\n try:\n data = validators.validate_host(data)\n except Exception as e:\n log.trace(traceback.format_exc())\n raise ValidationError(f'Error sanitizing event data \"{data}\" for type \"{event_type}\": {e}')\n data_is_ip = 
is_ip(data)\n if event_type == \"DNS_NAME\" and data_is_ip:\n event_type = \"IP_ADDRESS\"\n elif event_type == \"IP_ADDRESS\" and not data_is_ip:\n event_type = \"DNS_NAME\"\n # USERNAME <--> EMAIL_ADDRESS confusion\n if event_type == \"USERNAME\" and validators.soft_validate(data, \"email\"):\n event_type = \"EMAIL_ADDRESS\"\n tags.add(\"affiliate\")\n\n event_class = globals().get(event_type, DefaultEvent)\n\n return event_class(\n data,\n event_type=event_type,\n parent=parent,\n context=context,\n module=module,\n scan=scan,\n scans=scans,\n tags=tags,\n confidence=confidence,\n _dummy=dummy,\n _internal=internal,\n )\n
"},{"location":"dev/event/#bbot.core.event.base.event_from_json","title":"event_from_json","text":"event_from_json(j, siem_friendly=False)\n
Creates an event object from a JSON dictionary.
This function deserializes a JSON dictionary to create a new event object, using the make_event
function for the actual object creation. It sets additional attributes such as the timestamp and scope distance based on the input JSON.
Parameters:
j
(Dict
) \u2013 JSON dictionary containing the event attributes. Must include keys \"data\" and \"type\".
Returns:
BaseEvent
\u2013 A new event object initialized with attributes from the JSON dictionary.
Raises:
ValidationError
\u2013 Raised when the JSON dictionary is missing required fields.
Note: The function assumes that the input JSON dictionary is valid and may raise exceptions if required keys are missing. Make sure to validate the JSON input beforehand.
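One way to validate the input beforehand is to check for the required keys before deserializing. The following is a minimal sketch, not part of BBOT: the helper name missing_event_keys and the sample dictionary are hypothetical. The key list is an assumption drawn from the keys that event_from_json reads by direct indexing in the source below (type, data, timestamp, scope_distance).

```python
# Hypothetical pre-check helper; not part of BBOT.
# Keys that event_from_json() reads with j[...] rather than j.get(...).
REQUIRED_KEYS = ('type', 'data', 'timestamp', 'scope_distance')


def missing_event_keys(j, siem_friendly=False):
    # collect every required top-level key that is absent
    missing = [k for k in REQUIRED_KEYS if k not in j]
    # in SIEM-friendly form, 'data' is a dict keyed by the event type
    if siem_friendly and not missing:
        data = j['data']
        if not isinstance(data, dict) or j['type'] not in data:
            missing.append('data[' + str(j['type']) + ']')
    return missing


sample = {
    'type': 'DNS_NAME',
    'data': 'www.evilcorp.com',
    'timestamp': '2024-01-01T00:00:00+00:00',
    'scope_distance': 0,
}
print(missing_event_keys(sample))  # prints []
```

An empty result means the dictionary has every key the function indexes directly. It does not guarantee that the values themselves are well-formed, for example that the timestamp is a valid ISO 8601 string.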
Source code in bbot/core/event/base.py
def event_from_json(j, siem_friendly=False):\n \"\"\"\n Creates an event object from a JSON dictionary.\n\n This function deserializes a JSON dictionary to create a new event object, using the `make_event` function\n for the actual object creation. It sets additional attributes such as the timestamp and scope distance\n based on the input JSON.\n\n Parameters:\n j (Dict): JSON dictionary containing the event attributes.\n Must include keys \"data\" and \"type\".\n\n Returns:\n BaseEvent: A new event object initialized with attributes from the JSON dictionary.\n\n Raises:\n ValidationError: Raised when the JSON dictionary is missing required fields.\n\n Note:\n The function assumes that the input JSON dictionary is valid and may raise exceptions\n if required keys are missing. Make sure to validate the JSON input beforehand.\n \"\"\"\n try:\n event_type = j[\"type\"]\n kwargs = {\n \"event_type\": event_type,\n \"scans\": j.get(\"scans\", []),\n \"tags\": j.get(\"tags\", []),\n \"confidence\": j.get(\"confidence\", 100),\n \"context\": j.get(\"discovery_context\", None),\n \"dummy\": True,\n }\n if siem_friendly:\n data = j[\"data\"][event_type]\n else:\n data = j[\"data\"]\n kwargs[\"data\"] = data\n event = make_event(**kwargs)\n\n resolved_hosts = j.get(\"resolved_hosts\", [])\n event._resolved_hosts = set(resolved_hosts)\n\n event.timestamp = datetime.datetime.fromisoformat(j[\"timestamp\"])\n event.scope_distance = j[\"scope_distance\"]\n parent_id = j.get(\"parent\", None)\n if parent_id is not None:\n event._parent_id = parent_id\n return event\n except KeyError as e:\n raise ValidationError(f\"Event missing required field: {e}\")\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent","title":"BaseEvent","text":"Represents a piece of data discovered during a BBOT scan.
An Event contains various attributes that provide metadata about the discovered data. The attributes assist in understanding the context of the Event and facilitate further filtering and querying. Events are integral in the construction of visual graphs and are the cornerstone of data exchange between BBOT modules.
You can inherit from this class when creating a new event type. However, it's not always necessary. You only need to subclass if you want to layer additional functionality on top of the base class.
Attributes:
type
(str
) \u2013 Specifies the type of the event, e.g., IP_ADDRESS
, DNS_NAME
.
id
(str
) \u2013 A unique identifier for the event.
data
(str or dict
) \u2013 The main data for the event, e.g., a URL or IP address.
data_graph
(str
) \u2013 Representation of self.data
for Neo4j graph nodes.
data_human
(str
) \u2013 Representation of self.data
for human output.
data_id
(str
) \u2013 Representation of self.data
used to calculate the event's ID (and ultimately its hash, which is used for deduplication).
data_json
(str
) \u2013 Representation of self.data
to be used in JSON serialization.
host
(str, IPvXAddress, or IPvXNetwork
) \u2013 The associated IP address or hostname for the event.
host_stem
(str
) \u2013 An abbreviated representation of hostname that removes the TLD, e.g. \"www.evilcorp\". Used by the word cloud.
port
(int or None
) \u2013 The port associated with the event, if applicable, else None.
words
(set
) \u2013 A set of relevant keywords extracted from the event. Used by the word cloud.
scope_distance
(int
) \u2013 Indicates how many hops the event is from the main scope; 0 means in-scope.
web_spider_distance
(int
) \u2013 The spider distance from the web root, specific to web crawling.
scan
(Scanner
) \u2013 The scan object that generated the event.
timestamp
(datetime
) \u2013 The time at which the data was discovered.
resolved_hosts
(list of str
) \u2013 List of hosts to which the event data resolves, applicable for URLs and DNS names.
parent
(BaseEvent
) \u2013 The parent event that led to the discovery of this event.
parent_id
(str
) \u2013 The id
attribute of the parent event.
tags
(set of str
) \u2013 Descriptive tags for the event, e.g., mx-record
, in-scope
.
module
(BaseModule
) \u2013 The module that discovered the event.
module_sequence
(str
) \u2013 The sequence of modules that participated in the discovery.
Examples:
{\n \"type\": \"URL\",\n \"id\": \"URL:017ec8e5dc158c0fd46f07169f8577fb4b45e89a\",\n \"data\": \"http://www.blacklanternsecurity.com/\",\n \"web_spider_distance\": 0,\n \"scope_distance\": 0,\n \"scan\": \"SCAN:4d786912dbc97be199da13074699c318e2067a7f\",\n \"timestamp\": 1688526222.723366,\n \"resolved_hosts\": [\"185.199.108.153\"],\n \"parent\": \"OPEN_TCP_PORT:cf7e6a937b161217eaed99f0c566eae045d094c7\",\n \"tags\": [\"in-scope\", \"distance-0\", \"dir\", \"ip-185-199-108-153\", \"status-301\", \"http-title-301-moved-permanently\"],\n \"module\": \"httpx\",\n \"module_sequence\": \"httpx\"\n}\n
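The class source below also defines a cumulative_confidence property, which multiplies an event's confidence by that of its ancestors and resets to 100 whenever an event in the chain has a confidence of 100. The sketch below reproduces that arithmetic on a plain list of integers. It is a standalone illustration under that reading of the source, and the function name and list-based input are hypothetical.

```python
# Hypothetical standalone version of the cumulative confidence arithmetic.
# 'chain' lists confidences from the root ancestor down to the event itself.
def cumulative_confidence(chain):
    total = None
    for confidence in chain:
        # the confidence setter clamps values to the range 1-100
        confidence = min(100, max(1, int(confidence)))
        if confidence == 100 or total is None:
            # a confidence of 100 (or the root event) resets the running value
            total = confidence
        else:
            total = int(confidence * total / 100)
    return total


print(cumulative_confidence([50, 50]))       # prints 25
print(cumulative_confidence([50, 100, 50]))  # prints 50
```

In the second call the middle event has a confidence of 100, so the uncertainty of its parent is discarded and only the final 50 counts.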
Source code in bbot/core/event/base.py
class BaseEvent:\n \"\"\"\n Represents a piece of data discovered during a BBOT scan.\n\n An Event contains various attributes that provide metadata about the discovered data.\n The attributes assist in understanding the context of the Event and facilitate further\n filtering and querying. Events are integral in the construction of visual graphs and\n are the cornerstone of data exchange between BBOT modules.\n\n You can inherit from this class when creating a new event type. However, it's not always\n necessary. You only need to subclass if you want to layer additional functionality on\n top of the base class.\n\n Attributes:\n type (str): Specifies the type of the event, e.g., `IP_ADDRESS`, `DNS_NAME`.\n id (str): A unique identifier for the event.\n data (str or dict): The main data for the event, e.g., a URL or IP address.\n data_graph (str): Representation of `self.data` for Neo4j graph nodes.\n data_human (str): Representation of `self.data` for human output.\n data_id (str): Representation of `self.data` used to calculate the event's ID (and ultimately its hash, which is used for deduplication)\n data_json (str): Representation of `self.data` to be used in JSON serialization.\n host (str, IPvXAddress, or IPvXNetwork): The associated IP address or hostname for the event\n host_stem (str): An abbreviated representation of hostname that removes the TLD, e.g. \"www.evilcorp\". Used by the word cloud.\n port (int or None): The port associated with the event, if applicable, else None.\n words (set): A list of relevant keywords extracted from the event. 
Used by the word cloud.\n scope_distance (int): Indicates how many hops the event is from the main scope; 0 means in-scope.\n web_spider_distance (int): The spider distance from the web root, specific to web crawling.\n scan (Scanner): The scan object that generated the event.\n timestamp (datetime.datetime): The time at which the data was discovered.\n resolved_hosts (list of str): List of hosts to which the event data resolves, applicable for URLs and DNS names.\n parent (BaseEvent): The parent event that led to the discovery of this event.\n parent_id (str): The `id` attribute of the parent event.\n tags (set of str): Descriptive tags for the event, e.g., `mx-record`, `in-scope`.\n module (BaseModule): The module that discovered the event.\n module_sequence (str): The sequence of modules that participated in the discovery.\n\n Examples:\n ```json\n {\n \"type\": \"URL\",\n \"id\": \"URL:017ec8e5dc158c0fd46f07169f8577fb4b45e89a\",\n \"data\": \"http://www.blacklanternsecurity.com/\",\n \"web_spider_distance\": 0,\n \"scope_distance\": 0,\n \"scan\": \"SCAN:4d786912dbc97be199da13074699c318e2067a7f\",\n \"timestamp\": 1688526222.723366,\n \"resolved_hosts\": [\"185.199.108.153\"],\n \"parent\": \"OPEN_TCP_PORT:cf7e6a937b161217eaed99f0c566eae045d094c7\",\n \"tags\": [\"in-scope\", \"distance-0\", \"dir\", \"ip-185-199-108-153\", \"status-301\", \"http-title-301-moved-permanently\"],\n \"module\": \"httpx\",\n \"module_sequence\": \"httpx\"\n }\n ```\n \"\"\"\n\n # Always emit this event type even if it's not in scope\n _always_emit = False\n # Always emit events with these tags even if they're not in scope\n _always_emit_tags = [\"affiliate\", \"target\"]\n # Bypass scope checking and dns resolution, distribute immediately to modules\n # This is useful for \"end-of-line\" events like FINDING and VULNERABILITY\n _quick_emit = False\n # Whether this event has been retroactively marked as part of an important discovery chain\n _graph_important = False\n # Disables 
certain data validations\n _dummy = False\n # Data validation, if data is a dictionary\n _data_validator = None\n # Whether to increment scope distance if the child and parent hosts are the same\n _scope_distance_increment_same_host = False\n # Don't allow duplicates to occur within a parent chain\n # In other words, don't emit the event if the same one already exists in its discovery context\n _suppress_chain_dupes = False\n\n def __init__(\n self,\n data,\n event_type,\n parent=None,\n context=None,\n module=None,\n scan=None,\n scans=None,\n tags=None,\n confidence=100,\n timestamp=None,\n _dummy=False,\n _internal=None,\n ):\n \"\"\"\n Initializes an Event object with the given parameters.\n\n In most cases, you should use `make_event()` instead of instantiating this class directly.\n `make_event()` is much friendlier, and can auto-detect the event type for you.\n\n Attributes:\n data (str, dict): The primary data for the event.\n event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'.\n parent (BaseEvent, optional): Parent event that led to this event's discovery. Defaults to None.\n module (str, optional): Module that discovered the event. Defaults to None.\n scan (Scan, optional): BBOT Scan object. Required unless _dummy is True. Defaults to None.\n scans (list of Scan, optional): BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.\n tags (list of str, optional): Descriptive tags for the event. Defaults to None.\n confidence (int, optional): Confidence level for the event, on a scale of 1-100. Defaults to 100.\n timestamp (datetime, optional): Time of event discovery. Defaults to current UTC time.\n _dummy (bool, optional): If True, disables certain data validations. Defaults to False.\n _internal (Any, optional): If specified, makes the event internal. 
Defaults to None.\n\n Raises:\n ValidationError: If either `scan` or `parent` are not specified and `_dummy` is False.\n \"\"\"\n\n self._id = None\n self._hash = None\n self._data = None\n self.__host = None\n self._tags = set()\n self._port = None\n self._omit = False\n self.__words = None\n self._parent = None\n self._priority = None\n self._parent_id = None\n self._host_original = None\n self._scope_distance = None\n self._module_priority = None\n self._resolved_hosts = set()\n self.dns_children = dict()\n self._discovery_context = \"\"\n self._discovery_context_regex = re.compile(r\"\\{(?:event|module)[^}]*\\}\")\n self.web_spider_distance = 0\n\n # for creating one-off events without enforcing parent requirement\n self._dummy = _dummy\n self.module = module\n self._type = event_type\n\n # keep track of whether this event has been recorded by the scan\n self._stats_recorded = False\n\n if timestamp is not None:\n self.timestamp = timestamp\n else:\n try:\n self.timestamp = datetime.datetime.now(datetime.UTC)\n except AttributeError:\n self.timestamp = datetime.datetime.utcnow()\n\n self.confidence = int(confidence)\n self._internal = False\n\n # self.scan holds the instantiated scan object (for helpers, etc.)\n self.scan = scan\n if (not self.scan) and (not self._dummy):\n raise ValidationError(f\"Must specify scan\")\n # self.scans holds a list of scan IDs from scans that encountered this event\n self.scans = []\n if scans is not None:\n self.scans = scans\n if self.scan:\n self.scans = list(set([self.scan.id] + self.scans))\n\n try:\n self.data = self._sanitize_data(data)\n except Exception as e:\n log.trace(traceback.format_exc())\n raise ValidationError(f'Error sanitizing event data \"{data}\" for type \"{self.type}\": {e}')\n\n if not self.data:\n raise ValidationError(f'Invalid event data \"{data}\" for type \"{self.type}\"')\n\n self.parent = parent\n if (not self.parent) and (not self._dummy):\n raise ValidationError(f\"Must specify event parent\")\n\n 
if tags is not None:\n for tag in tags:\n self.add_tag(tag)\n\n # internal events are not ingested by output modules\n if not self._dummy:\n # removed this second part because it was making certain sslcert events internal\n if _internal: # or parent._internal:\n self.internal = True\n\n if not context:\n context = getattr(self.module, \"default_discovery_context\", \"\")\n if context:\n self.discovery_context = context\n\n @property\n def data(self):\n return self._data\n\n @property\n def confidence(self):\n return self._confidence\n\n @confidence.setter\n def confidence(self, confidence):\n self._confidence = min(100, max(1, int(confidence)))\n\n @property\n def cumulative_confidence(self):\n \"\"\"\n Considers the confidence of parent events. This is useful for filtering out speculative/unreliable events.\n\n E.g. an event with a confidence of 50 whose parent is also 50 would have a cumulative confidence of 25.\n\n A confidence of 100 will reset the cumulative confidence to 100.\n \"\"\"\n if self._confidence == 100 or self.parent is None or self.parent is self:\n return self._confidence\n return int(self._confidence * self.parent.cumulative_confidence / 100)\n\n @property\n def resolved_hosts(self):\n if is_ip(self.host):\n return {\n self.host,\n }\n return self._resolved_hosts\n\n @data.setter\n def data(self, data):\n self._hash = None\n self._data_hash = None\n self._id = None\n self.__host = None\n self._port = None\n self._data = data\n\n @property\n def internal(self):\n return self._internal\n\n @internal.setter\n def internal(self, value):\n \"\"\"\n Marks the event as internal, excluding it from output but allowing normal exchange between scan modules.\n\n Internal events are typically speculative and may not be interesting by themselves but can lead to\n the discovery of interesting events. 
This method sets the `_internal` attribute to True and adds the\n \"internal\" tag.\n\n Examples of internal events include `OPEN_TCP_PORT`s from the `speculate` module,\n `IP_ADDRESS`es from the `ipneighbor` module, or out-of-scope `DNS_NAME`s that originate\n from DNS resolutions.\n\n The purpose of internal events is to enable speculative/explorative discovery without cluttering\n the console with irrelevant or uninteresting events.\n \"\"\"\n if not value in (True, False):\n raise ValueError(f'\"internal\" must be boolean, not {type(value)}')\n if value == True:\n self.add_tag(\"internal\")\n else:\n self.remove_tag(\"internal\")\n self._internal = value\n\n @property\n def host(self):\n \"\"\"\n An abbreviated representation of the data that allows comparison with other events.\n For host types, this is a hostname.\n This allows comparison of an email or a URL with a domain, and vice versa\n bob@evilcorp.com --> evilcorp.com\n https://evilcorp.com --> evilcorp.com\n evilcorp.com:80 --> evilcorp.com\n\n For IP_* types, this is an instantiated object representing the event's data\n E.g. 
for IP_ADDRESS, it could be an ipaddress.IPv4Address() or IPv6Address() object\n \"\"\"\n if self.__host is None:\n self.host = self._host()\n return self.__host\n\n @host.setter\n def host(self, host):\n if self._host_original is None:\n self._host_original = host\n self.__host = host\n\n @property\n def host_original(self):\n \"\"\"\n Original host data, in case it was changed due to a wildcard DNS, etc.\n \"\"\"\n if self._host_original is None:\n return self.host\n return self._host_original\n\n @property\n def closest_host(self):\n \"\"\"\n Walk up the chain of parents events until we hit the first one with a host\n \"\"\"\n if self.host is not None or self.parent is None or self.parent is self:\n return self.host\n return self.parent.closest_host\n\n @property\n def port(self):\n self.host\n if getattr(self, \"parsed_url\", None):\n if self.parsed_url.port is not None:\n return self.parsed_url.port\n elif self.parsed_url.scheme == \"https\":\n return 443\n elif self.parsed_url.scheme == \"http\":\n return 80\n return self._port\n\n @property\n def host_stem(self):\n \"\"\"\n An abbreviated representation of hostname that removes the TLD\n E.g. 
www.evilcorp.com --> www.evilcorp\n \"\"\"\n if self.host and type(self.host) == str:\n return domain_stem(self.host)\n else:\n return f\"{self.host}\"\n\n @property\n def discovery_context(self):\n return self._discovery_context\n\n @discovery_context.setter\n def discovery_context(self, context):\n def replace(match):\n s = match.group()\n return s.format(module=self.module, event=self)\n\n try:\n self._discovery_context = self._discovery_context_regex.sub(replace, context)\n except Exception as e:\n log.trace(f\"Error formatting discovery context for {self}: {e} (context: '{context}')\")\n self._discovery_context = context\n\n @property\n def discovery_path(self):\n \"\"\"\n This event's full discovery context, including those of all its parents\n \"\"\"\n parent_path = []\n if self.parent is not None and self.parent is not self:\n parent_path = self.parent.discovery_path\n return parent_path + [[self.id, self.discovery_context]]\n\n @property\n def words(self):\n if self.__words is None:\n self.__words = set(self._words())\n return self.__words\n\n def _words(self):\n return set()\n\n @property\n def tags(self):\n return self._tags\n\n @tags.setter\n def tags(self, tags):\n self._tags = set()\n if isinstance(tags, str):\n tags = (tags,)\n for tag in tags:\n self.add_tag(tag)\n\n def add_tag(self, tag):\n self._tags.add(tagify(tag))\n\n def remove_tag(self, tag):\n with suppress(KeyError):\n self._tags.remove(tagify(tag))\n\n @property\n def always_emit(self):\n \"\"\"\n If this returns True, the event will always be distributed to output modules regardless of scope distance\n \"\"\"\n always_emit_tags = any(t in self.tags for t in self._always_emit_tags)\n no_host_information = not bool(self.host)\n return self._always_emit or always_emit_tags or no_host_information\n\n @property\n def quick_emit(self):\n no_host_information = not bool(self.host)\n return self._quick_emit or no_host_information\n\n @property\n def id(self):\n \"\"\"\n A uniquely identifiable 
hash of the event from the event type + a SHA1 of its data\n \"\"\"\n if self._id is None:\n self._id = f\"{self.type}:{self.data_hash.hex()}\"\n return self._id\n\n @property\n def data_hash(self):\n \"\"\"\n A raw byte hash of the event's data\n \"\"\"\n if self._data_hash is None:\n self._data_hash = sha1(self.data_id).digest()\n return self._data_hash\n\n @property\n def scope_distance(self):\n return self._scope_distance\n\n @scope_distance.setter\n def scope_distance(self, scope_distance):\n \"\"\"\n Setter for the scope_distance attribute, ensuring it only decreases.\n\n The scope_distance attribute is designed to never increase; it can only be set to smaller values than\n the current one. If a larger value is provided, it is ignored. The setter also updates the event's\n tags to reflect the new scope distance.\n\n Parameters:\n scope_distance (int): The new scope distance to set, must be a non-negative integer.\n\n Note:\n The method will automatically update the relevant 'distance-' tags associated with the event.\n \"\"\"\n if scope_distance < 0:\n raise ValueError(f\"Invalid scope distance: {scope_distance}\")\n # ensure scope distance does not increase (only allow setting to smaller values)\n if self.scope_distance is None:\n new_scope_distance = scope_distance\n else:\n new_scope_distance = min(self.scope_distance, scope_distance)\n if self._scope_distance != new_scope_distance:\n # remove old scope distance tags\n for t in list(self.tags):\n if t.startswith(\"distance-\"):\n self.remove_tag(t)\n if scope_distance == 0:\n self.add_tag(\"in-scope\")\n self.remove_tag(\"affiliate\")\n else:\n self.remove_tag(\"in-scope\")\n self.add_tag(f\"distance-{new_scope_distance}\")\n self._scope_distance = new_scope_distance\n # apply recursively to parent events\n parent_scope_distance = getattr(self.parent, \"scope_distance\", None)\n if parent_scope_distance is not None and self.parent is not self:\n self.parent.scope_distance = scope_distance + 1\n\n 
@property\n def scope_description(self):\n \"\"\"\n Returns a single word describing the scope of the event.\n\n \"in-scope\" if the event is in scope, \"affiliate\" if it's an affiliate, otherwise \"distance-{scope_distance}\"\n \"\"\"\n if self.scope_distance == 0:\n return \"in-scope\"\n elif \"affiliate\" in self.tags:\n return \"affiliate\"\n return f\"distance-{self.scope_distance}\"\n\n @property\n def parent(self):\n return self._parent\n\n @parent.setter\n def parent(self, parent):\n \"\"\"\n Setter for the parent attribute, ensuring it's a valid event and updating scope distance.\n\n Sets the parent of the event and automatically adjusts the scope distance based on the parent event's\n scope distance. The scope distance is incremented by 1 if the host of the parent event is different\n from the current event's host.\n\n Parameters:\n parent (BaseEvent): The new parent event to set. Must be a valid event object.\n\n Note:\n If an invalid parent is provided and the event is not a dummy, a warning will be logged.\n \"\"\"\n if is_event(parent):\n self._parent = parent\n hosts_are_same = (self.host and parent.host) and (self.host == parent.host)\n new_scope_distance = int(parent.scope_distance)\n if self.host and parent.scope_distance is not None:\n # only increment the scope distance if the host changes\n if self._scope_distance_increment_same_host or not hosts_are_same:\n new_scope_distance += 1\n self.scope_distance = new_scope_distance\n # inherit certain tags\n if hosts_are_same:\n # inherit web spider distance from parent\n self.web_spider_distance = getattr(parent, \"web_spider_distance\", 0)\n event_has_url = getattr(self, \"parsed_url\", None) is not None\n for t in parent.tags:\n if t in (\"affiliate\",):\n self.add_tag(t)\n elif t.startswith(\"mutation-\"):\n self.add_tag(t)\n # only add these tags if the event has a URL\n if event_has_url:\n if t in (\"spider-danger\", \"spider-max\"):\n self.add_tag(t)\n elif not self._dummy:\n 
log.warning(f\"Tried to set invalid parent on {self}: (got: {parent})\")\n\n @property\n def parent_id(self):\n parent_id = getattr(self.get_parent(), \"id\", None)\n if parent_id is not None:\n return parent_id\n return self._parent_id\n\n @property\n def validators(self):\n \"\"\"\n Depending on whether the scan attribute is accessible, return either a config-aware or non-config-aware validator\n\n This exists to prevent a chicken-and-egg scenario during the creation of certain events such as URLs,\n whose sanitization behavior is different depending on the config.\n\n However, thanks to this property, validation can still work in the absence of a config.\n \"\"\"\n if self.scan is not None:\n return self.scan.helpers.config_aware_validators\n return validators\n\n def get_parent(self):\n \"\"\"\n Takes into account events with the _omit flag\n \"\"\"\n if getattr(self.parent, \"_omit\", False):\n return self.parent.get_parent()\n return self.parent\n\n def get_parents(self, omit=False, include_self=False):\n parents = []\n e = self\n if include_self:\n parents.append(self)\n while 1:\n if omit:\n parent = e.get_parent()\n else:\n parent = e.parent\n if parent is None:\n break\n if e == parent:\n break\n parents.append(parent)\n e = parent\n return parents\n\n def _host(self):\n return None\n\n def _sanitize_data(self, data):\n \"\"\"\n Validates and sanitizes the event's data during instantiation.\n\n By default, uses the '_data_load' method to pre-process the data and then applies the '_data_validator'\n to validate and create a sanitized dictionary. 
Raises a ValidationError if any of the validations fail.\n Subclasses can override this method to provide custom validation logic.\n\n Returns:\n Any: The sanitized data.\n\n Raises:\n ValidationError: If the data fails to validate.\n \"\"\"\n data = self._data_load(data)\n if self._data_validator is not None:\n if not isinstance(data, dict):\n raise ValidationError(f\"data is not of type dict: {data}\")\n data = self._data_validator(**data).model_dump(exclude_none=True)\n return self.sanitize_data(data)\n\n def sanitize_data(self, data):\n return data\n\n @property\n def data_human(self):\n \"\"\"\n Human representation of event.data\n \"\"\"\n return self._data_human()\n\n def _data_human(self):\n if isinstance(self.data, (dict, list)):\n with suppress(Exception):\n return json.dumps(self.data, sort_keys=True)\n return smart_decode(self.data)\n\n def _data_load(self, data):\n \"\"\"\n How to load the event data (JSON-decode it, etc.)\n \"\"\"\n return data\n\n @property\n def data_id(self):\n \"\"\"\n Representation of the event.data used to calculate the event's ID\n \"\"\"\n return self._data_id()\n\n def _data_id(self):\n return self.data\n\n @property\n def pretty_string(self):\n \"\"\"\n A human-friendly representation of the event's data. 
Used for graph representation.\n\n If the event's data is a dictionary, the function will try to return a JSON-formatted string.\n Otherwise, it will use smart_decode to convert the data into a string representation.\n\n Override if necessary.\n\n Returns:\n str: The graphical representation of the event's data.\n \"\"\"\n return self._pretty_string()\n\n def _pretty_string(self):\n return self._data_human()\n\n @property\n def data_graph(self):\n \"\"\"\n Representation of event.data for neo4j graph nodes\n \"\"\"\n return self.pretty_string\n\n @property\n def data_json(self):\n \"\"\"\n JSON representation of event.data\n \"\"\"\n return self.data\n\n def __contains__(self, other):\n \"\"\"\n Allows events to be compared using the \"in\" operator:\n E.g.:\n if some_event in other_event:\n ...\n \"\"\"\n try:\n other = make_event(other, dummy=True)\n except ValidationError:\n return False\n # if hashes match\n if other == self:\n return True\n # if hosts match\n if self.host and other.host:\n if self.host == other.host:\n return True\n # hostnames and IPs\n radixtarget = RadixTarget()\n radixtarget.insert(self.host)\n return bool(radixtarget.search(other.host))\n return False\n\n def json(self, mode=\"json\", siem_friendly=False):\n \"\"\"\n Serializes the event object to a JSON-compatible dictionary.\n\n By default, it includes attributes such as 'type', 'id', 'data', 'scope_distance', and others that are present.\n Additional specific attributes can be serialized based on the mode specified.\n\n Parameters:\n mode (str): Specifies the data serialization mode. Default is \"json\". Other options include \"graph\", \"human\", and \"id\".\n siem_friendly (bool): Whether to format the JSON in a way that's friendly to SIEM ingestion by Elastic, Splunk, etc. 
This ensures the value of \"data\" is always the same type (a dictionary).\n\n Returns:\n dict: JSON-serializable dictionary representation of the event object.\n \"\"\"\n # type, ID, scope description\n j = dict()\n for i in (\"type\", \"id\", \"scope_description\"):\n v = getattr(self, i, \"\")\n if v:\n j.update({i: v})\n # event data\n data_attr = getattr(self, f\"data_{mode}\", None)\n if data_attr is not None:\n data = data_attr\n else:\n data = smart_decode(self.data)\n if siem_friendly:\n j[\"data\"] = {self.type: data}\n else:\n j[\"data\"] = data\n # host, dns children\n if self.host:\n j[\"host\"] = str(self.host)\n j[\"resolved_hosts\"] = sorted(str(h) for h in self.resolved_hosts)\n j[\"dns_children\"] = {k: list(v) for k, v in self.dns_children.items()}\n # web spider distance\n web_spider_distance = getattr(self, \"web_spider_distance\", None)\n if web_spider_distance is not None:\n j[\"web_spider_distance\"] = web_spider_distance\n # scope distance\n j[\"scope_distance\"] = self.scope_distance\n # scan\n if self.scan:\n j[\"scan\"] = self.scan.id\n # timestamp\n j[\"timestamp\"] = self.timestamp.isoformat()\n # parent event\n parent_id = self.parent_id\n if parent_id:\n j[\"parent\"] = parent_id\n # tags\n if self.tags:\n j.update({\"tags\": list(self.tags)})\n # parent module\n if self.module:\n j.update({\"module\": str(self.module)})\n # sequence of modules that led to discovery\n if self.module_sequence:\n j.update({\"module_sequence\": str(self.module_sequence)})\n # discovery context\n j[\"discovery_context\"] = self.discovery_context\n j[\"discovery_path\"] = self.discovery_path\n\n # normalize non-primitive python objects\n for k, v in list(j.items()):\n if k == \"data\":\n continue\n if type(v) not in (str, int, float, bool, list, dict, type(None)):\n try:\n j[k] = json.dumps(v, sort_keys=True)\n except Exception:\n j[k] = smart_decode(v)\n return j\n\n @staticmethod\n def from_json(j):\n \"\"\"\n Convenience shortcut to create an Event 
object from a JSON-compatible dictionary.\n\n Calls the `event_from_json()` function to deserialize the event.\n\n Parameters:\n j (dict): The JSON-compatible dictionary containing event data.\n\n Returns:\n Event: The deserialized Event object.\n \"\"\"\n return event_from_json(j)\n\n @property\n def module_sequence(self):\n \"\"\"\n Get a human-friendly string that represents the sequence of modules responsible for generating this event.\n\n Includes the names of omitted parent events to provide a complete view of the module sequence leading to this event.\n\n Returns:\n str: The module sequence in human-friendly format.\n \"\"\"\n module_name = getattr(self.module, \"name\", \"\")\n if getattr(self.parent, \"_omit\", False):\n module_name = f\"{self.parent.module_sequence}->{module_name}\"\n return module_name\n\n @property\n def module_priority(self):\n if self._module_priority is None:\n module = getattr(self, \"module\", None)\n self._module_priority = int(max(1, min(5, getattr(module, \"priority\", 3))))\n return self._module_priority\n\n @module_priority.setter\n def module_priority(self, priority):\n self._module_priority = int(max(1, min(5, priority)))\n\n @property\n def priority(self):\n if self._priority is None:\n timestamp = self.timestamp.timestamp()\n if self.parent.timestamp == self.timestamp:\n self._priority = (timestamp,)\n else:\n self._priority = getattr(self.parent, \"priority\", ()) + (timestamp,)\n\n return self._priority\n\n @property\n def type(self):\n return self._type\n\n @type.setter\n def type(self, val):\n self._type = val\n self._hash = None\n self._id = None\n\n @property\n def _host_size(self):\n \"\"\"\n Used for sorting events by their host size, so that parent ones (e.g. 
IP subnets) come first\n \"\"\"\n if self.host:\n if isinstance(self.host, str):\n # smaller domains should come first\n return len(self.host)\n else:\n try:\n # bigger IP subnets should come first\n return -self.host.num_addresses\n except AttributeError:\n # IP addresses default to 1\n return 1\n return 0\n\n def __iter__(self):\n \"\"\"\n For dict(event)\n \"\"\"\n yield from self.json().items()\n\n def __lt__(self, other):\n \"\"\"\n For queue sorting\n \"\"\"\n return self.priority < getattr(other, \"priority\", (0,))\n\n def __gt__(self, other):\n \"\"\"\n For queue sorting\n \"\"\"\n return self.priority > getattr(other, \"priority\", (0,))\n\n def __eq__(self, other):\n try:\n other = make_event(other, dummy=True)\n except ValidationError:\n return False\n return hash(self) == hash(other)\n\n def __hash__(self):\n if self._hash is None:\n self._hash = hash(self.id)\n return self._hash\n\n def __str__(self):\n max_event_len = 80\n d = str(self.data)\n return f'{self.type}(\"{d[:max_event_len]}{(\"...\" if len(d) > max_event_len else \"\")}\", module={self.module}, tags={self.tags})'\n\n def __repr__(self):\n return str(self)\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent.pretty_string","title":"pretty_string property
","text":"pretty_string\n
A human-friendly representation of the event's data. Used for graph representation.
If the event's data is a dictionary, the function will try to return a JSON-formatted string. Otherwise, it will use smart_decode to convert the data into a string representation.
Override if necessary.
Returns:
str
\u2013 The graphical representation of the event's data.
property
","text":"module_sequence\n
Get a human-friendly string that represents the sequence of modules responsible for generating this event.
Includes the names of omitted parent events to provide a complete view of the module sequence leading to this event.
Returns:
str
\u2013 The module sequence in human-friendly format.
__init__(data, event_type, parent=None, context=None, module=None, scan=None, scans=None, tags=None, confidence=100, timestamp=None, _dummy=False, _internal=None)\n
Initializes an Event object with the given parameters.
In most cases, you should use make_event()
instead of instantiating this class directly. make_event()
is much friendlier, and can auto-detect the event type for you.
Attributes:
data
((str, dict)
) \u2013 The primary data for the event.
event_type
(str
) \u2013 Type of the event, e.g., 'IP_ADDRESS'.
parent
(BaseEvent
) \u2013 Parent event that led to this event's discovery. Defaults to None.
module
(str
) \u2013 Module that discovered the event. Defaults to None.
scan
(Scan
) \u2013 BBOT Scan object. Required unless _dummy is True. Defaults to None.
scans
(list of Scan
) \u2013 BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.
tags
(list of str
) \u2013 Descriptive tags for the event. Defaults to None.
confidence
(int
) \u2013 Confidence level for the event, on a scale of 1-100. Defaults to 100.
timestamp
(datetime
) \u2013 Time of event discovery. Defaults to current UTC time.
_dummy
(bool
) \u2013 If True, disables certain data validations. Defaults to False.
_internal
(Any
) \u2013 If specified, makes the event internal. Defaults to None.
Raises:
ValidationError
\u2013 If either scan
or parent
are not specified and _dummy
is False.
bbot/core/event/base.py
def __init__(\n self,\n data,\n event_type,\n parent=None,\n context=None,\n module=None,\n scan=None,\n scans=None,\n tags=None,\n confidence=100,\n timestamp=None,\n _dummy=False,\n _internal=None,\n):\n \"\"\"\n Initializes an Event object with the given parameters.\n\n In most cases, you should use `make_event()` instead of instantiating this class directly.\n `make_event()` is much friendlier, and can auto-detect the event type for you.\n\n Attributes:\n data (str, dict): The primary data for the event.\n event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'.\n parent (BaseEvent, optional): Parent event that led to this event's discovery. Defaults to None.\n module (str, optional): Module that discovered the event. Defaults to None.\n scan (Scan, optional): BBOT Scan object. Required unless _dummy is True. Defaults to None.\n scans (list of Scan, optional): BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.\n tags (list of str, optional): Descriptive tags for the event. Defaults to None.\n confidence (int, optional): Confidence level for the event, on a scale of 1-100. Defaults to 100.\n timestamp (datetime, optional): Time of event discovery. Defaults to current UTC time.\n _dummy (bool, optional): If True, disables certain data validations. Defaults to False.\n _internal (Any, optional): If specified, makes the event internal. 
Defaults to None.\n\n Raises:\n ValidationError: If either `scan` or `parent` are not specified and `_dummy` is False.\n \"\"\"\n\n self._id = None\n self._hash = None\n self._data = None\n self.__host = None\n self._tags = set()\n self._port = None\n self._omit = False\n self.__words = None\n self._parent = None\n self._priority = None\n self._parent_id = None\n self._host_original = None\n self._scope_distance = None\n self._module_priority = None\n self._resolved_hosts = set()\n self.dns_children = dict()\n self._discovery_context = \"\"\n self._discovery_context_regex = re.compile(r\"\\{(?:event|module)[^}]*\\}\")\n self.web_spider_distance = 0\n\n # for creating one-off events without enforcing parent requirement\n self._dummy = _dummy\n self.module = module\n self._type = event_type\n\n # keep track of whether this event has been recorded by the scan\n self._stats_recorded = False\n\n if timestamp is not None:\n self.timestamp = timestamp\n else:\n try:\n self.timestamp = datetime.datetime.now(datetime.UTC)\n except AttributeError:\n self.timestamp = datetime.datetime.utcnow()\n\n self.confidence = int(confidence)\n self._internal = False\n\n # self.scan holds the instantiated scan object (for helpers, etc.)\n self.scan = scan\n if (not self.scan) and (not self._dummy):\n raise ValidationError(f\"Must specify scan\")\n # self.scans holds a list of scan IDs from scans that encountered this event\n self.scans = []\n if scans is not None:\n self.scans = scans\n if self.scan:\n self.scans = list(set([self.scan.id] + self.scans))\n\n try:\n self.data = self._sanitize_data(data)\n except Exception as e:\n log.trace(traceback.format_exc())\n raise ValidationError(f'Error sanitizing event data \"{data}\" for type \"{self.type}\": {e}')\n\n if not self.data:\n raise ValidationError(f'Invalid event data \"{data}\" for type \"{self.type}\"')\n\n self.parent = parent\n if (not self.parent) and (not self._dummy):\n raise ValidationError(f\"Must specify event parent\")\n\n 
if tags is not None:\n for tag in tags:\n self.add_tag(tag)\n\n # internal events are not ingested by output modules\n if not self._dummy:\n # removed this second part because it was making certain sslcert events internal\n if _internal: # or parent._internal:\n self.internal = True\n\n if not context:\n context = getattr(self.module, \"default_discovery_context\", \"\")\n if context:\n self.discovery_context = context\n
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent.json","title":"json","text":"json(mode='json', siem_friendly=False)\n
Serializes the event object to a JSON-compatible dictionary.
By default, it includes attributes such as 'type', 'id', 'data', 'scope_distance', and others that are present. Additional specific attributes can be serialized based on the mode specified.
Parameters:
mode
(str
, default: 'json'
) \u2013 Specifies the data serialization mode. Default is \"json\". Other options include \"graph\", \"human\", and \"id\".
siem_friendly
(bool
, default: False
) \u2013 Whether to format the JSON in a way that's friendly to SIEM ingestion by Elastic, Splunk, etc. This ensures the value of \"data\" is always the same type (a dictionary).
Returns:
dict
\u2013 JSON-serializable dictionary representation of the event object.
bbot/core/event/base.py
def json(self, mode=\"json\", siem_friendly=False):\n \"\"\"\n Serializes the event object to a JSON-compatible dictionary.\n\n By default, it includes attributes such as 'type', 'id', 'data', 'scope_distance', and others that are present.\n Additional specific attributes can be serialized based on the mode specified.\n\n Parameters:\n mode (str): Specifies the data serialization mode. Default is \"json\". Other options include \"graph\", \"human\", and \"id\".\n siem_friendly (bool): Whether to format the JSON in a way that's friendly to SIEM ingestion by Elastic, Splunk, etc. This ensures the value of \"data\" is always the same type (a dictionary).\n\n Returns:\n dict: JSON-serializable dictionary representation of the event object.\n \"\"\"\n # type, ID, scope description\n j = dict()\n for i in (\"type\", \"id\", \"scope_description\"):\n v = getattr(self, i, \"\")\n if v:\n j.update({i: v})\n # event data\n data_attr = getattr(self, f\"data_{mode}\", None)\n if data_attr is not None:\n data = data_attr\n else:\n data = smart_decode(self.data)\n if siem_friendly:\n j[\"data\"] = {self.type: data}\n else:\n j[\"data\"] = data\n # host, dns children\n if self.host:\n j[\"host\"] = str(self.host)\n j[\"resolved_hosts\"] = sorted(str(h) for h in self.resolved_hosts)\n j[\"dns_children\"] = {k: list(v) for k, v in self.dns_children.items()}\n # web spider distance\n web_spider_distance = getattr(self, \"web_spider_distance\", None)\n if web_spider_distance is not None:\n j[\"web_spider_distance\"] = web_spider_distance\n # scope distance\n j[\"scope_distance\"] = self.scope_distance\n # scan\n if self.scan:\n j[\"scan\"] = self.scan.id\n # timestamp\n j[\"timestamp\"] = self.timestamp.isoformat()\n # parent event\n parent_id = self.parent_id\n if parent_id:\n j[\"parent\"] = parent_id\n # tags\n if self.tags:\n j.update({\"tags\": list(self.tags)})\n # parent module\n if self.module:\n j.update({\"module\": str(self.module)})\n # sequence of modules that led to 
discovery\n if self.module_sequence:\n j.update({\"module_sequence\": str(self.module_sequence)})\n # discovery context\n j[\"discovery_context\"] = self.discovery_context\n j[\"discovery_path\"] = self.discovery_path\n\n # normalize non-primitive python objects\n for k, v in list(j.items()):\n if k == \"data\":\n continue\n if type(v) not in (str, int, float, bool, list, dict, type(None)):\n try:\n j[k] = json.dumps(v, sort_keys=True)\n except Exception:\n j[k] = smart_decode(v)\n return j\n
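The effect of siem_friendly can be seen in isolation with a small standalone sketch. This is not BBOT's code: the helper name wrap_data is hypothetical and only mirrors the branch shown in json() above, under the assumption that nothing else touches the data key.

```python
def wrap_data(event_type, data, siem_friendly=False):
    # Hypothetical helper mirroring the siem_friendly branch of json().
    # With siem_friendly=True the data is nested under the event type,
    # so the 'data' key always holds a dictionary.
    if siem_friendly:
        return {'data': {event_type: data}}
    return {'data': data}


plain = wrap_data('DNS_NAME', 'www.evilcorp.com')
siem = wrap_data('DNS_NAME', 'www.evilcorp.com', siem_friendly=True)
```

In the plain form the type of data varies with the event type (string, dict, and so on); in the SIEM-friendly form it is always a dictionary keyed by event type, which tends to be easier to map in tools like Elastic or Splunk.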
"},{"location":"dev/event/#bbot.core.event.base.BaseEvent.from_json","title":"from_json staticmethod
","text":"from_json(j)\n
Convenience shortcut to create an Event object from a JSON-compatible dictionary.
Calls the event_from_json()
function to deserialize the event.
Parameters:
j
(dict
) \u2013 The JSON-compatible dictionary containing event data.
Returns:
Event
\u2013 The deserialized Event object.
bbot/core/event/base.py
@staticmethod\ndef from_json(j):\n \"\"\"\n Convenience shortcut to create an Event object from a JSON-compatible dictionary.\n\n Calls the `event_from_json()` function to deserialize the event.\n\n Parameters:\n j (dict): The JSON-compatible dictionary containing event data.\n\n Returns:\n Event: The deserialized Event object.\n \"\"\"\n return event_from_json(j)\n
"},{"location":"dev/module_howto/","title":"How to Write a BBOT Module","text":"Here we'll go over a basic example of writing a custom BBOT module.
"},{"location":"dev/module_howto/#create-the-python-file","title":"Create the python file","text":".py
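file: create a new one in bbot/modules
Import BaseModule at the top of the file
Declare a class that inherits from BaseModule
Define in watched_events what type of data your module will consume
Define in produced_events what type of data your module will produce
Define (via flags) whether your module is active or passive, and whether it's safe or aggressive
Put your main logic in .handle_event()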
Here is an example of a simple module that performs whois lookups:
bbot/modules/whois.py
from bbot.modules.base import BaseModule\n\nclass whois(BaseModule):\n    watched_events = [\"DNS_NAME\"]  # watch for DNS_NAME events\n    produced_events = [\"WHOIS\"]  # we produce WHOIS events\n    flags = [\"passive\", \"safe\"]\n    meta = {\"description\": \"Query WhoisXMLAPI for WHOIS data\"}\n    options = {\"api_key\": \"\"}  # module config options\n    options_desc = {\"api_key\": \"WhoisXMLAPI Key\"}\n    per_domain_only = True  # only run once per domain\n\n    base_url = \"https://www.whoisxmlapi.com/whoisserver/WhoisService\"\n\n    # one-time setup - runs at the beginning of the scan\n    async def setup(self):\n        self.api_key = self.config.get(\"api_key\")\n        if not self.api_key:\n            # soft-fail if no API key is set\n            return None, \"Must set API key\"\n        # setup succeeded\n        return True\n\n    async def handle_event(self, event):\n        self.hugesuccess(f\"Got {event} (event.data: {event.data})\")\n        _, domain = self.helpers.split_domain(event.data)\n        url = f\"{self.base_url}?apiKey={self.api_key}&domainName={domain}&outputFormat=JSON\"\n        self.hugeinfo(f\"Visiting {url}\")\n        response = await self.helpers.request(url)\n        if response is not None:\n            await self.emit_event(response.json(), \"WHOIS\", parent=event)\n
"},{"location":"dev/module_howto/#test-your-new-module","title":"Test your new module","text":"After saving the module, you can run it with -m
:
# run a scan enabling the module in bbot/modules/whois.py\nbbot -t evilcorp.com -m whois\n
"},{"location":"dev/module_howto/#debugging-your-module-bbots-colorful-log-functions","title":"Debugging Your Module - BBOT's Colorful Log Functions","text":"You probably noticed the use of self.hugesuccess()
. This function is part of BBOT's built-in logging capability, and it prints whatever you give it in bright green. These colorful log functions can be useful for debugging.
BBOT log levels:
critical: bright red
hugesuccess: bright green
hugewarning: bright orange
hugeinfo: bright blue
error: red
warning: orange
info: blue
verbose: grey (must use -v to see)
debug: grey (must use -d to see)
For details on how tests are written, see Unit Tests.
"},{"location":"dev/module_howto/#handle_event-and-emit_event","title":"handle_event()
and emit_event()
","text":"The handle_event()
method is the most important part of the module. By overriding this method, you control what the module does. During a scan, when an event from your watched_events
is encountered (a DNS_NAME
in this example), handle_event()
is automatically called with that event as its argument.
The emit_event()
method is how modules return data. When you call emit_event()
, it creates an event and outputs it, sending it to any modules that are interested in that data type.
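The relationship between watched_events, handle_event() and emit_event() can be sketched with a toy dispatcher. Everything below (ToyEvent, ToyModule, ToyBus, run_toy_scan) is hypothetical and written for illustration only; BBOT's real scan loop is asynchronous and considerably more involved, so treat this as a simplified mental model, not the actual implementation.

```python
import asyncio


class ToyEvent:
    def __init__(self, data, event_type, parent=None):
        self.data = data
        self.type = event_type
        self.parent = parent


class ToyBus:
    # Hypothetical stand-in for the scan: routes events to interested modules.
    def __init__(self):
        self.modules = []

    async def dispatch(self, event):
        for module in self.modules:
            if event.type in module.watched_events:
                await module.handle_event(event)


class ToyModule:
    watched_events = []

    def __init__(self, bus):
        self.bus = bus

    async def emit_event(self, data, event_type, parent=None):
        # create the event, then pass it to every module watching its type
        await self.bus.dispatch(ToyEvent(data, event_type, parent=parent))

    async def handle_event(self, event):
        pass


class Upper(ToyModule):
    watched_events = ['DNS_NAME']

    async def handle_event(self, event):
        await self.emit_event(event.data.upper(), 'UPPER', parent=event)


class Collector(ToyModule):
    watched_events = ['UPPER']

    def __init__(self, bus):
        super().__init__(bus)
        self.collected = []

    async def handle_event(self, event):
        # record the event data together with the data of its parent
        self.collected.append((event.data, event.parent.data))


async def run_toy_scan(target):
    bus = ToyBus()
    collector = Collector(bus)
    bus.modules = [Upper(bus), collector]
    await bus.dispatch(ToyEvent(target, 'DNS_NAME'))
    return collector.collected


results = asyncio.run(run_toy_scan('evilcorp.com'))
```

Here Upper consumes the DNS_NAME event and emits an UPPER event, which in turn reaches Collector because UPPER is in its watched_events. The parent argument is what links each emitted event back to the event that triggered it.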
setup()
","text":"A module's setup()
method is used for performing one-time setup at the start of the scan, like downloading a wordlist or checking to make sure an API key is valid. It needs to return either:
True - module setup succeeded
None - module setup soft-failed (scan will continue but module will be disabled)
False - module setup hard-failed (scan will abort)
Optionally, it can also return a reason. Here are some examples:
async def setup(self):\n    if not self.config.get(\"api_key\"):\n        # soft-fail\n        return None, \"No API key specified\"\n\nasync def setup(self):\n    try:\n        wordlist = self.helpers.wordlist(\"https://raw.githubusercontent.com/user/wordlist.txt\")\n    except WordlistError as e:\n        # hard-fail\n        return False, f\"Error downloading wordlist: {e}\"\n\nasync def setup(self):\n    self.timeout = self.config.get(\"timeout\", 5)\n    # success\n    return True\n
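To make the two return shapes concrete (a bare status, or a status plus a reason), here is a small standalone sketch of how a caller might normalize them. The function normalize_setup_result is hypothetical and is not part of BBOT; it only encodes the three outcomes listed above.

```python
def normalize_setup_result(result):
    # Hypothetical helper: accept either a bare status or a
    # (status, reason) tuple and return (label, reason).
    if isinstance(result, tuple) and len(result) == 2:
        status, reason = result
    else:
        status, reason = result, ''
    if status is True:
        label = 'success'
    elif status is None:
        label = 'soft-fail'
    elif status is False:
        label = 'hard-fail'
    else:
        raise ValueError(f'unexpected setup() status: {status!r}')
    return label, reason


outcomes = [
    normalize_setup_result(True),
    normalize_setup_result((None, 'No API key specified')),
    normalize_setup_result((False, 'Error downloading wordlist')),
]
```

Note that a setup() method which falls off the end without a return statement yields None, which under this scheme counts as a soft-fail, so returning True explicitly on success is the safer habit.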
"},{"location":"dev/module_howto/#module-config-options","title":"Module Config Options","text":"Each module can have its own set of config options. These live in the options
and options_desc
attributes on your class. Both are dictionaries; options
is for defaults and options_desc
is for descriptions. Here is a typical example:
class nmap(BaseModule):\n    # ...\n    options = {\n        \"top_ports\": 100,\n        \"ports\": \"\",\n        \"timing\": \"T4\",\n        \"skip_host_discovery\": True,\n    }\n    options_desc = {\n        \"top_ports\": \"Top ports to scan (default 100) (to override, specify 'ports')\",\n        \"ports\": \"Ports to scan\",\n        \"timing\": \"-T<0-5>: Set timing template (higher is faster)\",\n        \"skip_host_discovery\": \"skip host discovery (-Pn)\",\n    }\n\n    async def setup(self):\n        self.ports = self.config.get(\"ports\", \"\")\n        self.timing = self.config.get(\"timing\", \"T4\")\n        self.top_ports = self.config.get(\"top_ports\", 100)\n        self.skip_host_discovery = self.config.get(\"skip_host_discovery\", True)\n        return True\n
Once you've defined these variables, you can pass the options via -c
:
bbot -m nmap -c modules.nmap.top_ports=250\n
... or via the config:
~/.config/bbot/bbot.yml
modules:\n  nmap:\n    top_ports: 250\n
Inside the module, you access them via self.config
, e.g.:
self.config.get(\"top_ports\")\n
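As a rough illustration of how a dotted override such as modules.nmap.top_ports=250 ends up taking precedence over the default in options, here is a standalone sketch using plain dictionaries. BBOT's config is an omegaconf DictConfig, so this is only an approximation: the function apply_override and its naive integer parsing are assumptions made for the example.

```python
import copy


def apply_override(config, dotted):
    # dotted looks like 'modules.nmap.top_ports=250'
    key, _, raw = dotted.partition('=')
    # naive type handling, assumed for this sketch only
    value = int(raw) if raw.isdigit() else raw
    *parents, leaf = key.split('.')
    node = config
    for part in parents:
        # walk down the nested dicts, creating levels as needed
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


defaults = {'modules': {'nmap': {'top_ports': 100, 'timing': 'T4'}}}
config = apply_override(copy.deepcopy(defaults), 'modules.nmap.top_ports=250')

# what the module would then read for its own options
top_ports = config['modules']['nmap'].get('top_ports', 100)
timing = config['modules']['nmap'].get('timing', 'T4')
```

Only the overridden key changes; options that were not overridden keep the defaults declared on the module class.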
"},{"location":"dev/module_howto/#module-dependencies","title":"Module Dependencies","text":"BBOT automates module dependencies with Ansible. If your module relies on a third-party binary, OS package, or python library, you can specify them in the deps_*
attributes of your module.
class MyModule(BaseModule):\n    ...\n    deps_apt = [\"chromium-browser\"]\n    deps_ansible = [\n        {\n            \"name\": \"install dev tools\",\n            \"package\": {\"name\": [\"gcc\", \"git\", \"make\"], \"state\": \"present\"},\n            \"become\": True,\n            \"ignore_errors\": True,\n        },\n        {\n            \"name\": \"Download massdns source code\",\n            \"git\": {\n                \"repo\": \"https://github.com/blechschmidt/massdns.git\",\n                \"dest\": \"#{BBOT_TEMP}/massdns\",\n                \"single_branch\": True,\n                \"version\": \"master\",\n            },\n        },\n        {\n            \"name\": \"Build massdns\",\n            \"command\": {\"chdir\": \"#{BBOT_TEMP}/massdns\", \"cmd\": \"make\", \"creates\": \"#{BBOT_TEMP}/massdns/bin/massdns\"},\n        },\n        {\n            \"name\": \"Install massdns\",\n            \"copy\": {\"src\": \"#{BBOT_TEMP}/massdns/bin/massdns\", \"dest\": \"#{BBOT_TOOLS}/\", \"mode\": \"u+x,g+x,o+x\"},\n        },\n    ]\n
"},{"location":"dev/presets/","title":"Presets","text":""},{"location":"dev/presets/#bbot.scanner.Preset","title":"Preset","text":"A preset is the central config for a BBOT scan. It contains everything a scan needs to run -- targets, modules, flags, config options like API keys, etc.
You can create a preset manually and pass it into Scanner(preset=preset)
. Or, you can pass Preset
's kwargs into Scanner()
and it will create the preset for you implicitly.
Presets can include other presets (which can in turn include other presets, and so on). This works by merging each preset in turn using Preset.merge()
. The order matters. In case of a conflict, the last preset to be merged wins priority.
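The last-merged-wins rule can be pictured with a small standalone sketch. It does not use Preset.merge() itself: the function merge_presets and the choice to union modules and flags while overwriting config keys are assumptions made for illustration, using plain dictionaries in place of real presets.

```python
def merge_presets(*presets):
    # Hypothetical model: modules and flags accumulate,
    # while conflicting config keys are overwritten by later presets.
    merged = {'modules': set(), 'flags': set(), 'config': {}}
    for preset in presets:
        merged['modules'] |= set(preset.get('modules', []))
        merged['flags'] |= set(preset.get('flags', []))
        merged['config'].update(preset.get('config', {}))
    return merged


base = {
    'flags': ['subdomain-enum'],
    'config': {'http_proxy': 'http://127.0.0.1:8080'},
}
override = {
    'modules': ['nuclei'],
    'config': {'http_proxy': 'http://10.0.0.1:8080'},
}

merged = merge_presets(base, override)
reversed_merge = merge_presets(override, base)
```

Both presets set http_proxy, so the value that survives depends purely on merge order: in merged it comes from override, and in reversed_merge it comes from base.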
Presets can be loaded from or saved to YAML. BBOT has a number of ready-made presets for common tasks like subdomain enumeration, web spidering, dirbusting, etc.
Presets are highly customizable via conditions
, which use the Jinja2 templating engine. Using conditions
, you can define custom logic to inspect the final preset before the scan starts, and change it if need be. Based on the state of the preset, you can print a warning message, abort the scan, enable/disable modules, etc.
Attributes:
target
(Target
) \u2013 Target(s) of scan.
whitelist
(Target
) \u2013 Scan whitelist (by default this is the same as target
).
blacklist
(Target
) \u2013 Scan blacklist (this takes ultimate precedence).
strict_scope
(bool
) \u2013 If True, subdomains of targets are not considered to be in-scope.
helpers
(ConfigAwareHelper
) \u2013 Helper containing various reusable functions, regexes, etc.
output_dir
(Path
) \u2013 Output directory for scan.
scan_name
(str
) \u2013 Name of scan. Defaults to random value, e.g. \"demonic_jimmy\".
name
(str
) \u2013 Human-friendly name of preset. Used mainly for logging purposes.
description
(str
) \u2013 Description of preset.
modules
(set
) \u2013 Combined modules to enable for the scan. Includes scan modules, internal modules, and output modules.
scan_modules
(set
) \u2013 Modules to enable for the scan.
output_modules
(set
) \u2013 Output modules to enable for the scan. (note: if no output modules are specified, this is not populated until .bake())
internal_modules
(set
) \u2013 Internal modules for the scan. (note: not populated until .bake())
exclude_modules
(set
) \u2013 Modules to exclude from the scan. When set, automatically removes excluded modules.
flags
(set
) \u2013 Flags to enable for the scan. When set, automatically enables modules.
require_flags
(set
) \u2013 Require modules to have these flags. When set, automatically removes offending modules.
exclude_flags
(set
) \u2013 Exclude modules that have any of these flags. When set, automatically removes offending modules.
module_dirs
(set
) \u2013 Custom directories from which to load modules (alias to self.module_loader.module_dirs
). When set, automatically preloads contained modules.
config
(DictConfig
) \u2013 BBOT config (alias to core.config
)
core
(BBOTCore
) \u2013 Local copy of BBOTCore object.
verbose
(bool
) \u2013 Whether log level is currently set to verbose. When set, updates log level for all BBOT log handlers.
debug
(bool
) \u2013 Whether log level is currently set to debug. When set, updates log level for all BBOT log handlers.
silent
(bool
) \u2013 Whether logging is currently disabled. When set to True, silences all stderr.
Examples:
>>> preset = Preset(\n \"evilcorp.com\",\n \"1.2.3.0/24\",\n flags=[\"subdomain-enum\"],\n modules=[\"nuclei\"],\n config={\"http_proxy\": \"http://127.0.0.1\"}\n )\n>>> scan = Scanner(preset=preset)\n
>>> preset = Preset.from_yaml_file(\"my_preset.yml\")\n>>> scan = Scanner(preset=preset)\n
Source code in bbot/scanner/preset/preset.py
class Preset:\n \"\"\"\n A preset is the central config for a BBOT scan. It contains everything a scan needs to run --\n targets, modules, flags, config options like API keys, etc.\n\n You can create a preset manually and pass it into `Scanner(preset=preset)`.\n Or, you can pass `Preset`'s kwargs into `Scanner()` and it will create the preset for you implicitly.\n\n Presets can include other presets (which can in turn include other presets, and so on).\n This works by merging each preset in turn using `Preset.merge()`.\n The order matters. In case of a conflict, the last preset to be merged wins priority.\n\n Presets can be loaded from or saved to YAML. BBOT has a number of ready-made presets for common tasks like\n subdomain enumeration, web spidering, dirbusting, etc.\n\n Presets are highly customizable via `conditions`, which use the Jinja2 templating engine.\n Using `conditions`, you can define custom logic to inspect the final preset before the scan starts, and change it if need be.\n Based on the state of the preset, you can print a warning message, abort the scan, enable/disable modules, etc..\n\n Attributes:\n target (Target): Target(s) of scan.\n whitelist (Target): Scan whitelist (by default this is the same as `target`).\n blacklist (Target): Scan blacklist (this takes ultimate precedence).\n strict_scope (bool): If True, subdomains of targets are not considered to be in-scope.\n helpers (ConfigAwareHelper): Helper containing various reusable functions, regexes, etc.\n output_dir (pathlib.Path): Output directory for scan.\n scan_name (str): Name of scan. Defaults to random value, e.g. \"demonic_jimmy\".\n name (str): Human-friendly name of preset. Used mainly for logging purposes.\n description (str): Description of preset.\n modules (set): Combined modules to enable for the scan. 
Includes scan modules, internal modules, and output modules.\n scan_modules (set): Modules to enable for the scan.\n output_modules (set): Output modules to enable for the scan. (note: if no output modules are specified, this is not populated until .bake())\n internal_modules (set): Internal modules for the scan. (note: not populated until .bake())\n exclude_modules (set): Modules to exclude from the scan. When set, automatically removes excluded modules.\n flags (set): Flags to enable for the scan. When set, automatically enables modules.\n require_flags (set): Require modules to have these flags. When set, automatically removes offending modules.\n exclude_flags (set): Exclude modules that have any of these flags. When set, automatically removes offending modules.\n module_dirs (set): Custom directories from which to load modules (alias to `self.module_loader.module_dirs`). When set, automatically preloads contained modules.\n config (omegaconf.dictconfig.DictConfig): BBOT config (alias to `core.config`)\n core (BBOTCore): Local copy of BBOTCore object.\n verbose (bool): Whether log level is currently set to verbose. When set, updates log level for all BBOT log handlers.\n debug (bool): Whether log level is currently set to debug. When set, updates log level for all BBOT log handlers.\n silent (bool): Whether logging is currently disabled. 
When set to True, silences all stderr.\n\n Examples:\n >>> preset = Preset(\n \"evilcorp.com\",\n \"1.2.3.0/24\",\n flags=[\"subdomain-enum\"],\n modules=[\"nuclei\"],\n config={\"http_proxy\": \"http://127.0.0.1\"}\n )\n >>> scan = Scanner(preset=preset)\n\n >>> preset = Preset.from_yaml_file(\"my_preset.yml\")\n >>> scan = Scanner(preset=preset)\n \"\"\"\n\n def __init__(\n self,\n *targets,\n whitelist=None,\n blacklist=None,\n strict_scope=False,\n modules=None,\n output_modules=None,\n exclude_modules=None,\n flags=None,\n require_flags=None,\n exclude_flags=None,\n config=None,\n module_dirs=None,\n include=None,\n presets=None,\n output_dir=None,\n scan_name=None,\n name=None,\n description=None,\n conditions=None,\n force_start=False,\n verbose=False,\n debug=False,\n silent=False,\n _exclude=None,\n _log=True,\n ):\n \"\"\"\n Initializes the Preset class.\n\n Args:\n *targets (str): Target(s) to scan. Types supported: hostnames, IPs, CIDRs, emails, open ports.\n whitelist (list, optional): Whitelisted target(s) to scan. Defaults to the same as `targets`.\n blacklist (list, optional): Blacklisted target(s). Takes ultimate precedence. Defaults to empty.\n strict_scope (bool, optional): If True, subdomains of targets are not in-scope.\n modules (list[str], optional): List of scan modules to enable for the scan. Defaults to empty list.\n output_modules (list[str], optional): List of output modules to use. 
Defaults to csv, human, and json.\n exclude_modules (list[str], optional): List of modules to exclude from the scan.\n require_flags (list[str], optional): Only enable modules if they have these flags.\n exclude_flags (list[str], optional): Don't enable modules if they have any of these flags.\n module_dirs (list[str], optional): additional directories to load modules from.\n config (dict, optional): Additional scan configuration settings.\n include (list[str], optional): names or filenames of other presets to include.\n presets (list[str], optional): an alias for `include`.\n output_dir (str or Path, optional): Directory to store scan output. Defaults to BBOT home directory (`~/.bbot`).\n scan_name (str, optional): Human-readable name of the scan. If not specified, it will be random, e.g. \"demonic_jimmy\".\n name (str, optional): Human-readable name of the preset. Used mainly for logging.\n description (str, optional): Description of the preset.\n conditions (list[str], optional): Custom conditions to be executed before scan start. Written in Jinja2.\n force_start (bool, optional): If True, ignore conditional aborts and failed module setups. Just run the scan!\n verbose (bool, optional): Set the BBOT logger to verbose mode.\n debug (bool, optional): Set the BBOT logger to debug mode.\n silent (bool, optional): Silence all stderr (effectively disables the BBOT logger).\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. 
This will record which modules/flags are enabled, etc.\n \"\"\"\n # internal variables\n self._cli = False\n self._log = _log\n self.scan = None\n self._args = None\n self._environ = None\n self._helpers = None\n self._module_loader = None\n self._yaml_str = \"\"\n self._baked = False\n\n self._default_output_modules = None\n self._default_internal_modules = None\n\n # modules / flags\n self.modules = set()\n self.exclude_modules = set()\n self.flags = set()\n self.exclude_flags = set()\n self.require_flags = set()\n\n # modules + flags\n if modules is None:\n modules = []\n if isinstance(modules, str):\n modules = [modules]\n if output_modules is None:\n output_modules = []\n if isinstance(output_modules, str):\n output_modules = [output_modules]\n if exclude_modules is None:\n exclude_modules = []\n if isinstance(exclude_modules, str):\n exclude_modules = [exclude_modules]\n if flags is None:\n flags = []\n if isinstance(flags, str):\n flags = [flags]\n if exclude_flags is None:\n exclude_flags = []\n if isinstance(exclude_flags, str):\n exclude_flags = [exclude_flags]\n if require_flags is None:\n require_flags = []\n if isinstance(require_flags, str):\n require_flags = [require_flags]\n\n # these are used only for preserving the modules as specified in the original preset\n # this is to ensure the preset looks the same when reserialized\n self.explicit_scan_modules = set() if modules is None else set(modules)\n self.explicit_output_modules = set() if output_modules is None else set(output_modules)\n\n # whether to force-start the scan (ignoring conditional aborts and failed module setups)\n self.force_start = force_start\n\n # scan output directory\n self.output_dir = output_dir\n # name of scan\n self.scan_name = scan_name\n\n # name of preset, default blank\n self.name = name or \"\"\n # preset description, default blank\n self.description = description or \"\"\n\n # custom conditions, evaluated during .bake()\n self.conditions = []\n if conditions is not 
None:\n for condition in conditions:\n self.conditions.append((self.name, condition))\n\n # keeps track of loaded preset files to prevent infinite circular inclusions\n self._preset_files_loaded = set()\n if _exclude is not None:\n for _filename in _exclude:\n self._preset_files_loaded.add(Path(_filename).resolve())\n\n # bbot core config\n self.core = CORE.copy()\n if config is None:\n config = omegaconf.OmegaConf.create({})\n # merge custom configs if specified by the user\n self.core.merge_custom(config)\n\n # log verbosity\n # actual log verbosity isn't set until .bake()\n self.verbose = verbose\n self.debug = debug\n self.silent = silent\n\n # custom module directories\n self._module_dirs = set()\n self.module_dirs = module_dirs\n\n # target / whitelist / blacklist\n self.strict_scope = strict_scope\n # these are temporary receptacles until they all get .baked() together\n self._seeds = set(targets if targets else [])\n self._whitelist = set(whitelist) if whitelist else whitelist\n self._blacklist = set(blacklist if blacklist else [])\n\n self._target = None\n\n # \"presets\" is alias to \"include\"\n if presets and include:\n raise ValueError(\n 'Cannot use both \"presets\" and \"include\" args at the same time (presets is only an alias to include). 
Please pick only one :)'\n )\n if presets and not include:\n include = presets\n # include other presets\n if include and not isinstance(include, (list, tuple, set)):\n include = [include]\n if include:\n for included_preset in include:\n self.include_preset(included_preset)\n\n # we don't fill self.modules yet (that happens in .bake())\n self.explicit_scan_modules.update(set(modules))\n self.explicit_output_modules.update(set(output_modules))\n self.exclude_modules.update(set(exclude_modules))\n self.flags.update(set(flags))\n self.exclude_flags.update(set(exclude_flags))\n self.require_flags.update(set(require_flags))\n\n @property\n def bbot_home(self):\n return Path(self.config.get(\"home\", \"~/.bbot\")).expanduser().resolve()\n\n @property\n def target(self):\n if self._target is None:\n raise ValueError(\"Cannot access target before preset is baked (use ._seeds instead)\")\n return self._target\n\n @property\n def whitelist(self):\n if self._target is None:\n raise ValueError(\"Cannot access whitelist before preset is baked (use ._whitelist instead)\")\n return self.target.whitelist\n\n @property\n def blacklist(self):\n if self._target is None:\n raise ValueError(\"Cannot access blacklist before preset is baked (use ._blacklist instead)\")\n return self.target.blacklist\n\n @property\n def preset_dir(self):\n return self.bbot_home / \"presets\"\n\n @property\n def default_output_modules(self):\n if self._default_output_modules is not None:\n output_modules = self._default_output_modules\n else:\n output_modules = [\"python\", \"csv\", \"txt\", \"json\"]\n if self._cli:\n output_modules.append(\"stdout\")\n return output_modules\n\n @property\n def default_internal_modules(self):\n preloaded_internal = self.module_loader.preloaded(type=\"internal\")\n if self._default_internal_modules is not None:\n internal_modules = self._default_internal_modules\n else:\n internal_modules = list(preloaded_internal)\n return {k: preloaded_internal[k] for k in 
internal_modules}\n\n def merge(self, other):\n \"\"\"\n Merge another preset into this one.\n\n If there are any config conflicts, `other` will win over `self`.\n\n Args:\n other (Preset): The preset to merge into this one.\n\n Examples:\n >>> preset1 = Preset(modules=[\"portscan\"])\n >>> preset1.scan_modules\n ['portscan']\n >>> preset2 = Preset(modules=[\"sslcert\"])\n >>> preset2.scan_modules\n ['sslcert']\n >>> preset1.merge(preset2)\n >>> preset1.scan_modules\n ['portscan', 'sslcert']\n \"\"\"\n self.log_debug(f'Merging preset \"{other.name}\" into \"{self.name}\"')\n # config\n self.core.merge_custom(other.core.custom_config)\n self.module_loader.core = self.core\n # module dirs\n # modules + flags\n # establish requirements / exclusions first\n self.exclude_modules.update(other.exclude_modules)\n self.require_flags.update(other.require_flags)\n self.exclude_flags.update(other.exclude_flags)\n # then it's okay to start enabling modules\n self.explicit_scan_modules.update(other.explicit_scan_modules)\n self.explicit_output_modules.update(other.explicit_output_modules)\n self.flags.update(other.flags)\n\n # target / scope\n self._seeds.update(other._seeds)\n # leave whitelist as None until we encounter one\n if other._whitelist is not None:\n if self._whitelist is None:\n self._whitelist = set(other._whitelist)\n else:\n self._whitelist.update(other._whitelist)\n self._blacklist.update(other._blacklist)\n self.strict_scope = self.strict_scope or other.strict_scope\n\n # log verbosity\n if other.silent:\n self.silent = other.silent\n if other.verbose:\n self.verbose = other.verbose\n if other.debug:\n self.debug = other.debug\n # scan name\n if other.scan_name is not None:\n self.scan_name = other.scan_name\n if other.output_dir is not None:\n self.output_dir = other.output_dir\n # conditions\n if other.conditions:\n self.conditions.extend(other.conditions)\n # misc\n self.force_start = self.force_start | other.force_start\n self._cli = self._cli | 
other._cli\n\n def bake(self, scan=None):\n \"\"\"\n Return a \"baked\" copy of this preset, ready for use by a BBOT scan.\n\n Baking a preset finalizes it by populating `preset.modules` based on flags,\n performing final validations, and substituting environment variables in preloaded modules.\n It also evaluates custom `conditions` as specified in the preset.\n\n This function is automatically called in Scanner.__init__(). There is no need to call it manually.\n \"\"\"\n self.log_debug(\"Getting baked\")\n # create a copy of self\n baked_preset = copy(self)\n baked_preset.scan = scan\n # copy core\n baked_preset.core = self.core.copy()\n # copy module loader\n baked_preset._module_loader = self.module_loader.copy()\n # prepare os environment\n os_environ = baked_preset.environ.prepare()\n # find and replace preloaded modules with os environ\n # this is different from the config variable substitution because it modifies\n # the preloaded modules, i.e. their ansible playbooks\n baked_preset.module_loader.find_and_replace(**os_environ)\n # update os environ\n os.environ.clear()\n os.environ.update(os_environ)\n\n # validate flags, config options\n baked_preset.validate()\n\n # validate log level options\n baked_preset.apply_log_level(apply_core=scan is not None)\n\n # assign baked preset to our scan\n if scan is not None:\n scan.preset = baked_preset\n\n # now that our requirements / exclusions are validated, we can start enabling modules\n # enable scan modules\n for module in baked_preset.explicit_scan_modules:\n baked_preset.add_module(module, module_type=\"scan\")\n\n # enable output modules\n output_modules_to_enable = set(baked_preset.explicit_output_modules)\n default_output_modules = self.default_output_modules\n output_module_override = any(m in default_output_modules for m in output_modules_to_enable)\n # if none of the default output modules have been explicitly specified, enable them all\n if not output_module_override:\n 
output_modules_to_enable.update(self.default_output_modules)\n for module in output_modules_to_enable:\n baked_preset.add_module(module, module_type=\"output\", raise_error=False)\n\n # enable internal modules\n for internal_module, preloaded in self.default_internal_modules.items():\n is_enabled = baked_preset.config.get(internal_module, True)\n is_excluded = internal_module in baked_preset.exclude_modules\n if is_enabled and not is_excluded:\n baked_preset.add_module(internal_module, module_type=\"internal\", raise_error=False)\n\n # disable internal modules if requested\n for internal_module in baked_preset.internal_modules:\n if baked_preset.config.get(internal_module, True) == False:\n baked_preset.exclude_modules.add(internal_module)\n\n # enable modules by flag\n for flag in baked_preset.flags:\n for module, preloaded in baked_preset.module_loader.preloaded().items():\n module_flags = preloaded.get(\"flags\", [])\n module_type = preloaded.get(\"type\", \"scan\")\n if flag in module_flags:\n self.log_debug(f'Enabling module \"{module}\" because it has flag \"{flag}\"')\n baked_preset.add_module(module, module_type, raise_error=False)\n\n # ensure we have output modules\n if not baked_preset.output_modules:\n for output_module in self.default_output_modules:\n baked_preset.add_module(output_module, module_type=\"output\", raise_error=False)\n\n # create target object\n from bbot.scanner.target import BBOTTarget\n\n baked_preset._target = BBOTTarget(\n *list(self._seeds),\n whitelist=self._whitelist,\n blacklist=self._blacklist,\n strict_scope=self.strict_scope,\n scan=scan,\n )\n\n # evaluate conditions\n if baked_preset.conditions:\n from .conditions import ConditionEvaluator\n\n evaluator = ConditionEvaluator(baked_preset)\n evaluator.evaluate()\n\n self._baked = True\n return baked_preset\n\n def parse_args(self):\n \"\"\"\n Parse CLI arguments, and merge them into this preset.\n\n Used in `cli.py`.\n \"\"\"\n self._cli = True\n 
self.merge(self.args.preset_from_args())\n\n @property\n def module_dirs(self):\n return self.module_loader.module_dirs\n\n @module_dirs.setter\n def module_dirs(self, module_dirs):\n if module_dirs:\n if isinstance(module_dirs, str):\n module_dirs = [module_dirs]\n for m in module_dirs:\n self.module_loader.add_module_dir(m)\n self._module_dirs.add(m)\n\n @property\n def scan_modules(self):\n return [m for m in self.modules if self.preloaded_module(m).get(\"type\", \"scan\") == \"scan\"]\n\n @property\n def output_modules(self):\n return [m for m in self.modules if self.preloaded_module(m).get(\"type\", \"scan\") == \"output\"]\n\n @property\n def internal_modules(self):\n return [m for m in self.modules if self.preloaded_module(m).get(\"type\", \"scan\") == \"internal\"]\n\n def add_module(self, module_name, module_type=\"scan\", raise_error=True):\n self.log_debug(f'Adding module \"{module_name}\" of type \"{module_type}\"')\n is_valid, reason, preloaded = self._is_valid_module(module_name, module_type, raise_error=raise_error)\n if not is_valid:\n self.log_debug(f'Unable to add {module_type} module \"{module_name}\": {reason}')\n return\n self.modules.add(module_name)\n for module_dep in preloaded.get(\"deps\", {}).get(\"modules\", []):\n if module_dep != module_name and module_dep not in self.modules:\n self.log_verbose(f'Adding module \"{module_dep}\" because {module_name} depends on it')\n self.add_module(module_dep, raise_error=False)\n\n def preloaded_module(self, module):\n return self.module_loader.preloaded()[module]\n\n @property\n def config(self):\n return self.core.config\n\n @property\n def web_config(self):\n return self.core.config.get(\"web\", {})\n\n def apply_log_level(self, apply_core=False):\n # silent takes precedence\n if self.silent:\n self.verbose = False\n self.debug = False\n if apply_core:\n self.core.logger.log_level = \"CRITICAL\"\n for key in (\"verbose\", \"debug\"):\n with suppress(omegaconf.errors.ConfigKeyError):\n del 
self.core.custom_config[key]\n else:\n # then debug\n if self.debug:\n self.verbose = False\n if apply_core:\n self.core.logger.log_level = \"DEBUG\"\n with suppress(omegaconf.errors.ConfigKeyError):\n del self.core.custom_config[\"verbose\"]\n else:\n # finally verbose\n if self.verbose and apply_core:\n self.core.logger.log_level = \"VERBOSE\"\n\n @property\n def helpers(self):\n if self._helpers is None:\n from bbot.core.helpers.helper import ConfigAwareHelper\n\n self._helpers = ConfigAwareHelper(preset=self)\n return self._helpers\n\n @property\n def module_loader(self):\n self.environ\n if self._module_loader is None:\n from bbot.core.modules import MODULE_LOADER\n\n self._module_loader = MODULE_LOADER\n self._module_loader.ensure_config_files()\n\n return self._module_loader\n\n @property\n def environ(self):\n if self._environ is None:\n from .environ import BBOTEnviron\n\n self._environ = BBOTEnviron(self)\n return self._environ\n\n @property\n def args(self):\n if self._args is None:\n from .args import BBOTArgs\n\n self._args = BBOTArgs(self)\n return self._args\n\n def in_scope(self, host):\n return self.target.in_scope(host)\n\n def blacklisted(self, host):\n return self.target.blacklisted(host)\n\n def whitelisted(self, host):\n return self.target.whitelisted(host)\n\n @classmethod\n def from_dict(cls, preset_dict, name=None, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a Python dictionary object.\n\n Args:\n preset_dict (dict): Preset in dictionary form\n name (str, optional): Name of preset\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. 
This will record which modules/flags are enabled, etc.\n\n Returns:\n Preset: The loaded preset\n\n Examples:\n >>> preset = Preset.from_dict({\"target\": [\"evilcorp.com\"], \"modules\": [\"portscan\"]})\n \"\"\"\n new_preset = cls(\n *preset_dict.get(\"target\", []),\n whitelist=preset_dict.get(\"whitelist\"),\n blacklist=preset_dict.get(\"blacklist\"),\n modules=preset_dict.get(\"modules\"),\n output_modules=preset_dict.get(\"output_modules\"),\n exclude_modules=preset_dict.get(\"exclude_modules\"),\n flags=preset_dict.get(\"flags\"),\n require_flags=preset_dict.get(\"require_flags\"),\n exclude_flags=preset_dict.get(\"exclude_flags\"),\n verbose=preset_dict.get(\"verbose\", False),\n debug=preset_dict.get(\"debug\", False),\n silent=preset_dict.get(\"silent\", False),\n config=preset_dict.get(\"config\"),\n strict_scope=preset_dict.get(\"strict_scope\", False),\n module_dirs=preset_dict.get(\"module_dirs\", []),\n include=list(preset_dict.get(\"include\", [])),\n scan_name=preset_dict.get(\"scan_name\"),\n output_dir=preset_dict.get(\"output_dir\"),\n name=preset_dict.get(\"name\", name),\n description=preset_dict.get(\"description\"),\n conditions=preset_dict.get(\"conditions\", []),\n _exclude=_exclude,\n _log=_log,\n )\n return new_preset\n\n def include_preset(self, filename):\n \"\"\"\n Load a preset from a yaml file and merge it into this one.\n\n If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Args:\n filename (Path): The preset YAML file to merge\n\n Examples:\n >>> preset.include_preset(\"/home/user/my_preset.yml\")\n \"\"\"\n self.log_debug(f'Including preset \"{filename}\"')\n preset_filename = PRESET_PATH.find(filename)\n preset_from_yaml = self.from_yaml_file(preset_filename, _exclude=self._preset_files_loaded)\n if preset_from_yaml is not False:\n self.merge(preset_from_yaml)\n self._preset_files_loaded.add(preset_filename)\n\n @classmethod\n def from_yaml_file(cls, 
filename, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a YAML file. If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Examples:\n >>> preset = Preset.from_yaml_file(\"/home/user/my_preset.yml\")\n \"\"\"\n filename = Path(filename).resolve()\n try:\n return _preset_cache[filename]\n except KeyError:\n if _exclude is None:\n _exclude = set()\n if _exclude is not None and filename in _exclude:\n log.debug(f\"Not loading {filename} because it was already loaded {_exclude}\")\n return False\n log.debug(f\"Loading {filename} because it's not in excluded list ({_exclude})\")\n _exclude = set(_exclude)\n _exclude.add(filename)\n try:\n yaml_str = open(filename).read()\n except FileNotFoundError:\n raise PresetNotFoundError(f'Could not find preset at \"{filename}\" - file does not exist')\n preset = cls.from_dict(\n omegaconf.OmegaConf.create(yaml_str), name=filename.stem, _exclude=_exclude, _log=_log\n )\n preset._yaml_str = yaml_str\n _preset_cache[filename] = preset\n return preset\n\n @classmethod\n def from_yaml_string(cls, yaml_preset):\n \"\"\"\n Create a preset from a YAML file. 
If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Examples:\n >>> yaml_string = '''\n >>> target:\n >>> - evilcorp.com\n >>> modules:\n >>> - portscan'''\n >>> preset = Preset.from_yaml_string(yaml_string)\n \"\"\"\n return cls.from_dict(omegaconf.OmegaConf.create(yaml_preset))\n\n def to_dict(self, include_target=False, full_config=False, redact_secrets=False):\n \"\"\"\n Convert this preset into a Python dictionary.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n\n Returns:\n dict: The preset in dictionary form\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> preset.to_dict()\n {\"flags\": [\"subdomain-enum\"], \"modules\": [\"portscan\"]}\n \"\"\"\n preset_dict = {}\n\n # config\n if full_config:\n config = self.core.config\n else:\n config = self.core.custom_config\n config = omegaconf.OmegaConf.to_object(config)\n if redact_secrets:\n config = self.core.no_secrets_config(config)\n if config:\n preset_dict[\"config\"] = config\n\n # scope\n if include_target:\n target = sorted(str(t.data) for t in self.target.seeds)\n whitelist = []\n if self.target.whitelist is not None:\n whitelist = sorted(str(t.data) for t in self.target.whitelist)\n blacklist = sorted(str(t.data) for t in self.target.blacklist)\n if target:\n preset_dict[\"target\"] = target\n if whitelist and whitelist != target:\n preset_dict[\"whitelist\"] = whitelist\n if blacklist:\n preset_dict[\"blacklist\"] = blacklist\n if self.strict_scope:\n preset_dict[\"strict_scope\"] = True\n\n # flags + modules\n if self.require_flags:\n preset_dict[\"require_flags\"] = sorted(self.require_flags)\n if self.exclude_flags:\n preset_dict[\"exclude_flags\"] = sorted(self.exclude_flags)\n if self.exclude_modules:\n 
preset_dict[\"exclude_modules\"] = sorted(self.exclude_modules)\n if self.flags:\n preset_dict[\"flags\"] = sorted(self.flags)\n if self.explicit_scan_modules:\n preset_dict[\"modules\"] = sorted(self.explicit_scan_modules)\n if self.explicit_output_modules:\n preset_dict[\"output_modules\"] = sorted(self.explicit_output_modules)\n\n # log verbosity\n if self.verbose:\n preset_dict[\"verbose\"] = True\n if self.debug:\n preset_dict[\"debug\"] = True\n if self.silent:\n preset_dict[\"silent\"] = True\n\n # misc scan options\n if self.scan_name:\n preset_dict[\"scan_name\"] = self.scan_name\n if self.output_dir:\n preset_dict[\"output_dir\"] = self.output_dir\n\n # conditions\n if self.conditions:\n preset_dict[\"conditions\"] = [c[-1] for c in self.conditions]\n\n return preset_dict\n\n def to_yaml(self, include_target=False, full_config=False, sort_keys=False):\n \"\"\"\n Return the preset in the form of a YAML string.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n sort_keys (bool, optional): If True, sort YAML keys alphabetically\n\n Returns:\n str: The preset in the form of a YAML string\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> print(preset.to_yaml())\n flags:\n - subdomain-enum\n modules:\n - portscan\n \"\"\"\n preset_dict = self.to_dict(include_target=include_target, full_config=full_config)\n return yaml.dump(preset_dict, sort_keys=sort_keys)\n\n def _is_valid_module(self, module, module_type, name_only=False, raise_error=True):\n if module_type == \"scan\":\n module_choices = self.module_loader.scan_module_choices\n elif module_type == \"output\":\n module_choices = self.module_loader.output_module_choices\n elif module_type == \"internal\":\n module_choices = self.module_loader.internal_module_choices\n else:\n raise 
ValidationError(f'Unknown module type \"{module_type}\"')\n\n if not module in module_choices:\n raise ValidationError(get_closest_match(module, module_choices, msg=f\"{module_type} module\"))\n\n try:\n preloaded = self.module_loader.preloaded()[module]\n except KeyError:\n raise ValidationError(f'Unknown module \"{module}\"')\n\n if name_only:\n return True, \"\", preloaded\n\n if module in self.exclude_modules:\n reason = \"the module has been excluded\"\n if raise_error:\n raise ValidationError(f'Unable to add {module_type} module \"{module}\" because {reason}')\n return False, reason, {}\n\n module_flags = preloaded.get(\"flags\", [])\n _module_type = preloaded.get(\"type\", \"scan\")\n if module_type:\n if _module_type != module_type:\n reason = f'its type ({_module_type}) is not \"{module_type}\"'\n if raise_error:\n raise ValidationError(f'Unable to add {module_type} module \"{module}\" because {reason}')\n return False, reason, preloaded\n\n if _module_type == \"scan\":\n if self.exclude_flags:\n for f in module_flags:\n if f in self.exclude_flags:\n return False, f'it has excluded flag, \"{f}\"', preloaded\n if self.require_flags and not all(f in module_flags for f in self.require_flags):\n return False, f'it doesn\\'t have the required flags ({\",\".join(self.require_flags)})', preloaded\n\n return True, \"\", preloaded\n\n def validate(self):\n \"\"\"\n Validate module/flag exclusions/requirements, and CLI config options if applicable.\n \"\"\"\n if self._cli:\n self.args.validate()\n\n # validate excluded modules\n for excluded_module in self.exclude_modules:\n if not excluded_module in self.module_loader.all_module_choices:\n raise ValidationError(\n get_closest_match(excluded_module, self.module_loader.all_module_choices, msg=\"module\")\n )\n # validate excluded flags\n for excluded_flag in self.exclude_flags:\n if not excluded_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(excluded_flag, 
self.module_loader.flag_choices, msg=\"flag\"))\n # validate required flags\n for required_flag in self.require_flags:\n if not required_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(required_flag, self.module_loader.flag_choices, msg=\"flag\"))\n # validate flags\n for flag in self.flags:\n if not flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(flag, self.module_loader.flag_choices, msg=\"flag\"))\n\n @property\n def all_presets(self):\n \"\"\"\n Recursively find all the presets and return them as a dictionary\n \"\"\"\n preset_dir = self.preset_dir\n home_dir = Path.home()\n\n # first, add local preset dir to PRESET_PATH\n PRESET_PATH.add_path(self.preset_dir)\n\n # ensure local preset directory exists\n mkdir(preset_dir)\n\n global DEFAULT_PRESETS\n if DEFAULT_PRESETS is None:\n presets = dict()\n for ext in (\"yml\", \"yaml\"):\n for preset_path in PRESET_PATH:\n # for every yaml file\n for original_filename in preset_path.rglob(f\"**/*.{ext}\"):\n # not including symlinks\n if original_filename.is_symlink():\n continue\n\n # try to load it as a preset\n try:\n loaded_preset = self.from_yaml_file(original_filename, _log=True)\n if loaded_preset is False:\n continue\n except Exception as e:\n log.warning(f'Failed to load preset at \"{original_filename}\": {e}')\n log.trace(traceback.format_exc())\n continue\n\n # category is the parent folder(s), if any\n category = str(original_filename.relative_to(preset_path).parent)\n if category == \".\":\n category = \"\"\n\n local_preset = original_filename\n # populate symlinks in local preset dir\n if not original_filename.is_relative_to(preset_dir):\n relative_preset = original_filename.relative_to(preset_path)\n local_preset = preset_dir / relative_preset\n mkdir(local_preset.parent, check_writable=False)\n if not local_preset.exists():\n local_preset.symlink_to(original_filename)\n\n # collapse home directory into \"~\"\n if 
local_preset.is_relative_to(home_dir):\n local_preset = Path(\"~\") / local_preset.relative_to(home_dir)\n\n presets[local_preset] = (loaded_preset, category, preset_path, original_filename)\n\n # sort by name\n DEFAULT_PRESETS = dict(sorted(presets.items(), key=lambda x: x[-1][0].name))\n return DEFAULT_PRESETS\n\n def presets_table(self, include_modules=True):\n \"\"\"\n Return a table of all the presets in the form of a string\n \"\"\"\n table = []\n header = [\"Preset\", \"Category\", \"Description\", \"# Modules\"]\n if include_modules:\n header.append(\"Modules\")\n for yaml_file, (loaded_preset, category, preset_path, original_file) in self.all_presets.items():\n loaded_preset = loaded_preset.bake()\n num_modules = f\"{len(loaded_preset.scan_modules):,}\"\n row = [loaded_preset.name, category, loaded_preset.description, num_modules]\n if include_modules:\n row.append(\", \".join(sorted(loaded_preset.scan_modules)))\n table.append(row)\n return make_table(table, header)\n\n def log_verbose(self, msg):\n if self._log:\n log.verbose(f\"Preset {self.name}: {msg}\")\n\n def log_debug(self, msg):\n if self._log:\n log.debug(f\"Preset {self.name}: {msg}\")\n
"},{"location":"dev/presets/#bbot.scanner.Preset.all_presets","title":"all_presets property
","text":"all_presets\n
Recursively find all the presets and return them as a dictionary
"},{"location":"dev/presets/#bbot.scanner.Preset.__init__","title":"__init__","text":"__init__(*targets, whitelist=None, blacklist=None, strict_scope=False, modules=None, output_modules=None, exclude_modules=None, flags=None, require_flags=None, exclude_flags=None, config=None, module_dirs=None, include=None, presets=None, output_dir=None, scan_name=None, name=None, description=None, conditions=None, force_start=False, verbose=False, debug=False, silent=False, _exclude=None, _log=True)\n
Initializes the Preset class.
Parameters:
*targets (str, default: ()) \u2013 Target(s) to scan. Types supported: hostnames, IPs, CIDRs, emails, open ports.
whitelist (list, default: None) \u2013 Whitelisted target(s) to scan. Defaults to the same as targets.
blacklist (list, default: None) \u2013 Blacklisted target(s). Takes ultimate precedence. Defaults to empty.
strict_scope (bool, default: False) \u2013 If True, subdomains of targets are not in-scope.
modules (list[str], default: None) \u2013 List of scan modules to enable for the scan. Defaults to empty list.
output_modules (list[str], default: None) \u2013 List of output modules to use. Defaults to python, csv, txt, and json.
exclude_modules (list[str], default: None) \u2013 List of modules to exclude from the scan.
flags (list[str], default: None) \u2013 Enable every module that has any of these flags.
require_flags (list[str], default: None) \u2013 Only enable modules if they have these flags.
exclude_flags (list[str], default: None) \u2013 Don't enable modules if they have any of these flags.
module_dirs (list[str], default: None) \u2013 Additional directories to load modules from.
config (dict, default: None) \u2013 Additional scan configuration settings.
include (list[str], default: None) \u2013 Names or filenames of other presets to include.
presets (list[str], default: None) \u2013 An alias for include.
output_dir (str or Path, default: None) \u2013 Directory to store scan output. Defaults to BBOT home directory (~/.bbot).
scan_name (str, default: None) \u2013 Human-readable name of the scan. If not specified, it will be random, e.g. \"demonic_jimmy\".
name (str, default: None) \u2013 Human-readable name of the preset. Used mainly for logging.
description (str, default: None) \u2013 Description of the preset.
conditions (list[str], default: None) \u2013 Custom conditions to be executed before scan start. Written in Jinja2.
force_start (bool, default: False) \u2013 If True, ignore conditional aborts and failed module setups. Just run the scan!
verbose (bool, default: False) \u2013 Set the BBOT logger to verbose mode.
debug (bool, default: False) \u2013 Set the BBOT logger to debug mode.
silent (bool, default: False) \u2013 Silence all stderr (effectively disables the BBOT logger).
_exclude (list[Path], default: None) \u2013 Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.
_log (bool, default: True) \u2013 Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.
bbot/scanner/preset/preset.py
def __init__(\n self,\n *targets,\n whitelist=None,\n blacklist=None,\n strict_scope=False,\n modules=None,\n output_modules=None,\n exclude_modules=None,\n flags=None,\n require_flags=None,\n exclude_flags=None,\n config=None,\n module_dirs=None,\n include=None,\n presets=None,\n output_dir=None,\n scan_name=None,\n name=None,\n description=None,\n conditions=None,\n force_start=False,\n verbose=False,\n debug=False,\n silent=False,\n _exclude=None,\n _log=True,\n):\n \"\"\"\n Initializes the Preset class.\n\n Args:\n *targets (str): Target(s) to scan. Types supported: hostnames, IPs, CIDRs, emails, open ports.\n whitelist (list, optional): Whitelisted target(s) to scan. Defaults to the same as `targets`.\n blacklist (list, optional): Blacklisted target(s). Takes ultimate precedence. Defaults to empty.\n strict_scope (bool, optional): If True, subdomains of targets are not in-scope.\n modules (list[str], optional): List of scan modules to enable for the scan. Defaults to empty list.\n output_modules (list[str], optional): List of output modules to use. Defaults to csv, human, and json.\n exclude_modules (list[str], optional): List of modules to exclude from the scan.\n require_flags (list[str], optional): Only enable modules if they have these flags.\n exclude_flags (list[str], optional): Don't enable modules if they have any of these flags.\n module_dirs (list[str], optional): additional directories to load modules from.\n config (dict, optional): Additional scan configuration settings.\n include (list[str], optional): names or filenames of other presets to include.\n presets (list[str], optional): an alias for `include`.\n output_dir (str or Path, optional): Directory to store scan output. Defaults to BBOT home directory (`~/.bbot`).\n scan_name (str, optional): Human-readable name of the scan. If not specified, it will be random, e.g. \"demonic_jimmy\".\n name (str, optional): Human-readable name of the preset. 
Used mainly for logging.\n description (str, optional): Description of the preset.\n conditions (list[str], optional): Custom conditions to be executed before scan start. Written in Jinja2.\n force_start (bool, optional): If True, ignore conditional aborts and failed module setups. Just run the scan!\n verbose (bool, optional): Set the BBOT logger to verbose mode.\n debug (bool, optional): Set the BBOT logger to debug mode.\n silent (bool, optional): Silence all stderr (effectively disables the BBOT logger).\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.\n \"\"\"\n # internal variables\n self._cli = False\n self._log = _log\n self.scan = None\n self._args = None\n self._environ = None\n self._helpers = None\n self._module_loader = None\n self._yaml_str = \"\"\n self._baked = False\n\n self._default_output_modules = None\n self._default_internal_modules = None\n\n # modules / flags\n self.modules = set()\n self.exclude_modules = set()\n self.flags = set()\n self.exclude_flags = set()\n self.require_flags = set()\n\n # modules + flags\n if modules is None:\n modules = []\n if isinstance(modules, str):\n modules = [modules]\n if output_modules is None:\n output_modules = []\n if isinstance(output_modules, str):\n output_modules = [output_modules]\n if exclude_modules is None:\n exclude_modules = []\n if isinstance(exclude_modules, str):\n exclude_modules = [exclude_modules]\n if flags is None:\n flags = []\n if isinstance(flags, str):\n flags = [flags]\n if exclude_flags is None:\n exclude_flags = []\n if isinstance(exclude_flags, str):\n exclude_flags = [exclude_flags]\n if require_flags is None:\n require_flags = []\n if isinstance(require_flags, str):\n require_flags = [require_flags]\n\n # these are used only for preserving the 
modules as specified in the original preset\n # this is to ensure the preset looks the same when reserialized\n self.explicit_scan_modules = set() if modules is None else set(modules)\n self.explicit_output_modules = set() if output_modules is None else set(output_modules)\n\n # whether to force-start the scan (ignoring conditional aborts and failed module setups)\n self.force_start = force_start\n\n # scan output directory\n self.output_dir = output_dir\n # name of scan\n self.scan_name = scan_name\n\n # name of preset, default blank\n self.name = name or \"\"\n # preset description, default blank\n self.description = description or \"\"\n\n # custom conditions, evaluated during .bake()\n self.conditions = []\n if conditions is not None:\n for condition in conditions:\n self.conditions.append((self.name, condition))\n\n # keeps track of loaded preset files to prevent infinite circular inclusions\n self._preset_files_loaded = set()\n if _exclude is not None:\n for _filename in _exclude:\n self._preset_files_loaded.add(Path(_filename).resolve())\n\n # bbot core config\n self.core = CORE.copy()\n if config is None:\n config = omegaconf.OmegaConf.create({})\n # merge custom configs if specified by the user\n self.core.merge_custom(config)\n\n # log verbosity\n # actual log verbosity isn't set until .bake()\n self.verbose = verbose\n self.debug = debug\n self.silent = silent\n\n # custom module directories\n self._module_dirs = set()\n self.module_dirs = module_dirs\n\n # target / whitelist / blacklist\n self.strict_scope = strict_scope\n # these are temporary receptacles until they all get .baked() together\n self._seeds = set(targets if targets else [])\n self._whitelist = set(whitelist) if whitelist else whitelist\n self._blacklist = set(blacklist if blacklist else [])\n\n self._target = None\n\n # \"presets\" is alias to \"include\"\n if presets and include:\n raise ValueError(\n 'Cannot use both \"presets\" and \"include\" args at the same time (presets is only an 
alias to include). Please pick only one :)'\n )\n if presets and not include:\n include = presets\n # include other presets\n if include and not isinstance(include, (list, tuple, set)):\n include = [include]\n if include:\n for included_preset in include:\n self.include_preset(included_preset)\n\n # we don't fill self.modules yet (that happens in .bake())\n self.explicit_scan_modules.update(set(modules))\n self.explicit_output_modules.update(set(output_modules))\n self.exclude_modules.update(set(exclude_modules))\n self.flags.update(set(flags))\n self.exclude_flags.update(set(exclude_flags))\n self.require_flags.update(set(require_flags))\n
"},{"location":"dev/presets/#bbot.scanner.Preset.bake","title":"bake","text":"bake(scan=None)\n
Return a \"baked\" copy of this preset, ready for use by a BBOT scan.
Baking a preset finalizes it by populating preset.modules based on flags, performing final validations, and substituting environment variables in preloaded modules. It also evaluates custom conditions as specified in the preset.
This function is automatically called in Scanner.__init__(). There is no need to call it manually.
Source code in bbot/scanner/preset/preset.py
def bake(self, scan=None):\n \"\"\"\n Return a \"baked\" copy of this preset, ready for use by a BBOT scan.\n\n Baking a preset finalizes it by populating `preset.modules` based on flags,\n performing final validations, and substituting environment variables in preloaded modules.\n It also evaluates custom `conditions` as specified in the preset.\n\n This function is automatically called in Scanner.__init__(). There is no need to call it manually.\n \"\"\"\n self.log_debug(\"Getting baked\")\n # create a copy of self\n baked_preset = copy(self)\n baked_preset.scan = scan\n # copy core\n baked_preset.core = self.core.copy()\n # copy module loader\n baked_preset._module_loader = self.module_loader.copy()\n # prepare os environment\n os_environ = baked_preset.environ.prepare()\n # find and replace preloaded modules with os environ\n # this is different from the config variable substitution because it modifies\n # the preloaded modules, i.e. their ansible playbooks\n baked_preset.module_loader.find_and_replace(**os_environ)\n # update os environ\n os.environ.clear()\n os.environ.update(os_environ)\n\n # validate flags, config options\n baked_preset.validate()\n\n # validate log level options\n baked_preset.apply_log_level(apply_core=scan is not None)\n\n # assign baked preset to our scan\n if scan is not None:\n scan.preset = baked_preset\n\n # now that our requirements / exclusions are validated, we can start enabling modules\n # enable scan modules\n for module in baked_preset.explicit_scan_modules:\n baked_preset.add_module(module, module_type=\"scan\")\n\n # enable output modules\n output_modules_to_enable = set(baked_preset.explicit_output_modules)\n default_output_modules = self.default_output_modules\n output_module_override = any(m in default_output_modules for m in output_modules_to_enable)\n # if none of the default output modules have been explicitly specified, enable them all\n if not output_module_override:\n 
output_modules_to_enable.update(self.default_output_modules)\n for module in output_modules_to_enable:\n baked_preset.add_module(module, module_type=\"output\", raise_error=False)\n\n # enable internal modules\n for internal_module, preloaded in self.default_internal_modules.items():\n is_enabled = baked_preset.config.get(internal_module, True)\n is_excluded = internal_module in baked_preset.exclude_modules\n if is_enabled and not is_excluded:\n baked_preset.add_module(internal_module, module_type=\"internal\", raise_error=False)\n\n # disable internal modules if requested\n for internal_module in baked_preset.internal_modules:\n if baked_preset.config.get(internal_module, True) == False:\n baked_preset.exclude_modules.add(internal_module)\n\n # enable modules by flag\n for flag in baked_preset.flags:\n for module, preloaded in baked_preset.module_loader.preloaded().items():\n module_flags = preloaded.get(\"flags\", [])\n module_type = preloaded.get(\"type\", \"scan\")\n if flag in module_flags:\n self.log_debug(f'Enabling module \"{module}\" because it has flag \"{flag}\"')\n baked_preset.add_module(module, module_type, raise_error=False)\n\n # ensure we have output modules\n if not baked_preset.output_modules:\n for output_module in self.default_output_modules:\n baked_preset.add_module(output_module, module_type=\"output\", raise_error=False)\n\n # create target object\n from bbot.scanner.target import BBOTTarget\n\n baked_preset._target = BBOTTarget(\n *list(self._seeds),\n whitelist=self._whitelist,\n blacklist=self._blacklist,\n strict_scope=self.strict_scope,\n scan=scan,\n )\n\n # evaluate conditions\n if baked_preset.conditions:\n from .conditions import ConditionEvaluator\n\n evaluator = ConditionEvaluator(baked_preset)\n evaluator.evaluate()\n\n self._baked = True\n return baked_preset\n
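The output-module rule inside bake() is easy to misread, so here is a minimal standalone sketch of it. It does not import BBOT: select_output_modules is a hypothetical helper, the default set is taken from the output_modules docstring (csv, human, json), and neo4j is used only as an illustrative non-default name.

```python
def select_output_modules(explicit, defaults):
    '''Return the set of output modules to enable.

    Follows the rule in bake(): if none of the default output modules
    was named explicitly, all defaults are added to the explicit ones.
    '''
    to_enable = set(explicit)
    override = any(m in defaults for m in to_enable)
    if not override:
        to_enable.update(defaults)
    return to_enable


# assumed defaults, per the output_modules docstring
DEFAULTS = {'csv', 'human', 'json'}

print(sorted(select_output_modules([], DEFAULTS)))
print(sorted(select_output_modules(['neo4j'], DEFAULTS)))
print(sorted(select_output_modules(['json'], DEFAULTS)))
```

In this sketch, naming any one default (json here) suppresses the other defaults, while naming only a non-default module keeps all of them.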
"},{"location":"dev/presets/#bbot.scanner.Preset.from_dict","title":"from_dict classmethod
","text":"from_dict(preset_dict, name=None, _exclude=None, _log=False)\n
Create a preset from a Python dictionary object.
Parameters:
preset_dict (dict) \u2013 Preset in dictionary form
name (str, default: None) \u2013 Name of preset
_exclude (list[Path], default: None) \u2013 Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.
_log (bool, default: False) \u2013 Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.
Returns:
Preset \u2013 The loaded preset
Examples:
>>> preset = Preset.from_dict({\"target\": [\"evilcorp.com\"], \"modules\": [\"portscan\"]})\n
Source code in bbot/scanner/preset/preset.py
@classmethod\ndef from_dict(cls, preset_dict, name=None, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a Python dictionary object.\n\n Args:\n preset_dict (dict): Preset in dictionary form\n name (str, optional): Name of preset\n _exclude (list[Path], optional): Preset filenames to exclude from inclusion. Used internally to prevent infinite recursion in circular or self-referencing presets.\n _log (bool, optional): Whether to enable logging for the preset. This will record which modules/flags are enabled, etc.\n\n Returns:\n Preset: The loaded preset\n\n Examples:\n >>> preset = Preset.from_dict({\"target\": [\"evilcorp.com\"], \"modules\": [\"portscan\"]})\n \"\"\"\n new_preset = cls(\n *preset_dict.get(\"target\", []),\n whitelist=preset_dict.get(\"whitelist\"),\n blacklist=preset_dict.get(\"blacklist\"),\n modules=preset_dict.get(\"modules\"),\n output_modules=preset_dict.get(\"output_modules\"),\n exclude_modules=preset_dict.get(\"exclude_modules\"),\n flags=preset_dict.get(\"flags\"),\n require_flags=preset_dict.get(\"require_flags\"),\n exclude_flags=preset_dict.get(\"exclude_flags\"),\n verbose=preset_dict.get(\"verbose\", False),\n debug=preset_dict.get(\"debug\", False),\n silent=preset_dict.get(\"silent\", False),\n config=preset_dict.get(\"config\"),\n strict_scope=preset_dict.get(\"strict_scope\", False),\n module_dirs=preset_dict.get(\"module_dirs\", []),\n include=list(preset_dict.get(\"include\", [])),\n scan_name=preset_dict.get(\"scan_name\"),\n output_dir=preset_dict.get(\"output_dir\"),\n name=preset_dict.get(\"name\", name),\n description=preset_dict.get(\"description\"),\n conditions=preset_dict.get(\"conditions\", []),\n _exclude=_exclude,\n _log=_log,\n )\n return new_preset\n
"},{"location":"dev/presets/#bbot.scanner.Preset.from_yaml_file","title":"from_yaml_file classmethod
","text":"from_yaml_file(filename, _exclude=None, _log=False)\n
Create a preset from a YAML file. If the full path is not specified, BBOT will look in all the usual places for it.
The file extension is optional.
Examples:
>>> preset = Preset.from_yaml_file(\"/home/user/my_preset.yml\")\n
Source code in bbot/scanner/preset/preset.py
@classmethod\ndef from_yaml_file(cls, filename, _exclude=None, _log=False):\n \"\"\"\n Create a preset from a YAML file. If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Examples:\n >>> preset = Preset.from_yaml_file(\"/home/user/my_preset.yml\")\n \"\"\"\n filename = Path(filename).resolve()\n try:\n return _preset_cache[filename]\n except KeyError:\n if _exclude is None:\n _exclude = set()\n if _exclude is not None and filename in _exclude:\n log.debug(f\"Not loading {filename} because it was already loaded {_exclude}\")\n return False\n log.debug(f\"Loading {filename} because it's not in excluded list ({_exclude})\")\n _exclude = set(_exclude)\n _exclude.add(filename)\n try:\n yaml_str = open(filename).read()\n except FileNotFoundError:\n raise PresetNotFoundError(f'Could not find preset at \"{filename}\" - file does not exist')\n preset = cls.from_dict(\n omegaconf.OmegaConf.create(yaml_str), name=filename.stem, _exclude=_exclude, _log=_log\n )\n preset._yaml_str = yaml_str\n _preset_cache[filename] = preset\n return preset\n
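The _exclude argument is what stops circular includes from recursing forever. The sketch below models that idea without BBOT: PRESET_FILES is a hypothetical in-memory stand-in for YAML files on disk, load_modules is an illustrative function (not part of the library), and the module-level cache used by from_yaml_file is left out for brevity.

```python
# hypothetical in-memory stand-in for preset YAML files
PRESET_FILES = {
    'a': {'include': ['b'], 'modules': ['portscan']},
    'b': {'include': ['a'], 'modules': ['sslcert']},  # circular: b includes a
}


def load_modules(name, _exclude=None):
    '''Collect modules from a preset and everything it includes.

    Returns False when the preset is already in _exclude, the same
    sentinel from_yaml_file() returns for an already-loaded file.
    '''
    _exclude = set() if _exclude is None else set(_exclude)
    if name in _exclude:
        return False
    _exclude.add(name)
    preset = PRESET_FILES[name]
    modules = set(preset['modules'])
    for included in preset['include']:
        loaded = load_modules(included, _exclude=_exclude)
        if loaded is not False:
            modules.update(loaded)
    return modules


print(sorted(load_modules('a')))
```

Because each call works on its own copy of _exclude, only presets on the current include chain are skipped, which is enough to break the cycle.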
"},{"location":"dev/presets/#bbot.scanner.Preset.from_yaml_string","title":"from_yaml_string classmethod
","text":"from_yaml_string(yaml_preset)\n
Create a preset from a YAML string.
Examples:
>>> yaml_string = '''\n>>> target:\n>>> - evilcorp.com\n>>> modules:\n>>> - portscan'''\n>>> preset = Preset.from_yaml_string(yaml_string)\n
Source code in bbot/scanner/preset/preset.py
@classmethod\ndef from_yaml_string(cls, yaml_preset):\n \"\"\"\n Create a preset from a YAML string.\n\n Examples:\n >>> yaml_string = '''\n >>> target:\n >>> - evilcorp.com\n >>> modules:\n >>> - portscan'''\n >>> preset = Preset.from_yaml_string(yaml_string)\n \"\"\"\n return cls.from_dict(omegaconf.OmegaConf.create(yaml_preset))\n
"},{"location":"dev/presets/#bbot.scanner.Preset.include_preset","title":"include_preset","text":"include_preset(filename)\n
Load a preset from a yaml file and merge it into this one.
If the full path is not specified, BBOT will look in all the usual places for it.
The file extension is optional.
Parameters:
filename (Path) \u2013 The preset YAML file to merge
Examples:
>>> preset.include_preset(\"/home/user/my_preset.yml\")\n
Source code in bbot/scanner/preset/preset.py
def include_preset(self, filename):\n \"\"\"\n Load a preset from a yaml file and merge it into this one.\n\n If the full path is not specified, BBOT will look in all the usual places for it.\n\n The file extension is optional.\n\n Args:\n filename (Path): The preset YAML file to merge\n\n Examples:\n >>> preset.include_preset(\"/home/user/my_preset.yml\")\n \"\"\"\n self.log_debug(f'Including preset \"{filename}\"')\n preset_filename = PRESET_PATH.find(filename)\n preset_from_yaml = self.from_yaml_file(preset_filename, _exclude=self._preset_files_loaded)\n if preset_from_yaml is not False:\n self.merge(preset_from_yaml)\n self._preset_files_loaded.add(preset_filename)\n
"},{"location":"dev/presets/#bbot.scanner.Preset.merge","title":"merge","text":"merge(other)\n
Merge another preset into this one.
If there are any config conflicts, other will win over self.
Parameters:
other (Preset) \u2013 The preset to merge into this one.
Examples:
>>> preset1 = Preset(modules=[\"portscan\"])\n>>> preset1.scan_modules\n['portscan']\n>>> preset2 = Preset(modules=[\"sslcert\"])\n>>> preset2.scan_modules\n['sslcert']\n>>> preset1.merge(preset2)\n>>> preset1.scan_modules\n['portscan', 'sslcert']\n
Source code in bbot/scanner/preset/preset.py
def merge(self, other):\n \"\"\"\n Merge another preset into this one.\n\n If there are any config conflicts, `other` will win over `self`.\n\n Args:\n other (Preset): The preset to merge into this one.\n\n Examples:\n >>> preset1 = Preset(modules=[\"portscan\"])\n >>> preset1.scan_modules\n ['portscan']\n >>> preset2 = Preset(modules=[\"sslcert\"])\n >>> preset2.scan_modules\n ['sslcert']\n >>> preset1.merge(preset2)\n >>> preset1.scan_modules\n ['portscan', 'sslcert']\n \"\"\"\n self.log_debug(f'Merging preset \"{other.name}\" into \"{self.name}\"')\n # config\n self.core.merge_custom(other.core.custom_config)\n self.module_loader.core = self.core\n # module dirs\n # modules + flags\n # establish requirements / exclusions first\n self.exclude_modules.update(other.exclude_modules)\n self.require_flags.update(other.require_flags)\n self.exclude_flags.update(other.exclude_flags)\n # then it's okay to start enabling modules\n self.explicit_scan_modules.update(other.explicit_scan_modules)\n self.explicit_output_modules.update(other.explicit_output_modules)\n self.flags.update(other.flags)\n\n # target / scope\n self._seeds.update(other._seeds)\n # leave whitelist as None until we encounter one\n if other._whitelist is not None:\n if self._whitelist is None:\n self._whitelist = set(other._whitelist)\n else:\n self._whitelist.update(other._whitelist)\n self._blacklist.update(other._blacklist)\n self.strict_scope = self.strict_scope or other.strict_scope\n\n # log verbosity\n if other.silent:\n self.silent = other.silent\n if other.verbose:\n self.verbose = other.verbose\n if other.debug:\n self.debug = other.debug\n # scan name\n if other.scan_name is not None:\n self.scan_name = other.scan_name\n if other.output_dir is not None:\n self.output_dir = other.output_dir\n # conditions\n if other.conditions:\n self.conditions.extend(other.conditions)\n # misc\n self.force_start = self.force_start | other.force_start\n self._cli = self._cli | other._cli\n
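The scope handling in merge() treats a whitelist of None differently from an empty one, which the following standalone sketch tries to make explicit. It uses plain dicts instead of Preset objects; merge_scope and the field names are illustrative assumptions, not BBOT API.

```python
def merge_scope(mine, other):
    '''Merge the scope-related fields of two preset-like dicts.

    A whitelist of None means none was specified, so it stays None
    until one side actually provides a whitelist.
    '''
    merged = dict(mine)
    merged['seeds'] = set(mine['seeds']) | set(other['seeds'])
    if other['whitelist'] is not None:
        if mine['whitelist'] is None:
            merged['whitelist'] = set(other['whitelist'])
        else:
            merged['whitelist'] = set(mine['whitelist']) | set(other['whitelist'])
    # boolean options are OR-ed together
    merged['strict_scope'] = mine['strict_scope'] or other['strict_scope']
    # scalar options: the other preset wins, but only if it sets a value
    if other['scan_name'] is not None:
        merged['scan_name'] = other['scan_name']
    return merged


a = {'seeds': {'evilcorp.com'}, 'whitelist': None, 'strict_scope': False, 'scan_name': 'first'}
b = {'seeds': {'evilcorp.net'}, 'whitelist': None, 'strict_scope': True, 'scan_name': None}
print(merge_scope(a, b))
```

Here the merged whitelist remains None because neither side defined one, and scan_name keeps the value from a because b leaves it unset.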
"},{"location":"dev/presets/#bbot.scanner.Preset.parse_args","title":"parse_args","text":"parse_args()\n
Parse CLI arguments, and merge them into this preset.
Used in cli.py.
Source code in bbot/scanner/preset/preset.py
def parse_args(self):\n \"\"\"\n Parse CLI arguments, and merge them into this preset.\n\n Used in `cli.py`.\n \"\"\"\n self._cli = True\n self.merge(self.args.preset_from_args())\n
"},{"location":"dev/presets/#bbot.scanner.Preset.presets_table","title":"presets_table","text":"presets_table(include_modules=True)\n
Return a table of all the presets in the form of a string
Source code in bbot/scanner/preset/preset.py
def presets_table(self, include_modules=True):\n \"\"\"\n Return a table of all the presets in the form of a string\n \"\"\"\n table = []\n header = [\"Preset\", \"Category\", \"Description\", \"# Modules\"]\n if include_modules:\n header.append(\"Modules\")\n for yaml_file, (loaded_preset, category, preset_path, original_file) in self.all_presets.items():\n loaded_preset = loaded_preset.bake()\n num_modules = f\"{len(loaded_preset.scan_modules):,}\"\n row = [loaded_preset.name, category, loaded_preset.description, num_modules]\n if include_modules:\n row.append(\", \".join(sorted(loaded_preset.scan_modules)))\n table.append(row)\n return make_table(table, header)\n
"},{"location":"dev/presets/#bbot.scanner.Preset.to_dict","title":"to_dict","text":"to_dict(include_target=False, full_config=False, redact_secrets=False)\n
Convert this preset into a Python dictionary.
Parameters:
include_target (bool, default: False) \u2013 If True, include target, whitelist, and blacklist in the dictionary
full_config (bool, default: False) \u2013 If True, include the entire config, not just what's changed from the defaults.
redact_secrets (bool, default: False) \u2013 If True, remove secrets from the config before including it.
Returns:
dict \u2013 The preset in dictionary form
Examples:
>>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n>>> preset.to_dict()\n{\"flags\": [\"subdomain-enum\"], \"modules\": [\"portscan\"]}\n
Source code in bbot/scanner/preset/preset.py
def to_dict(self, include_target=False, full_config=False, redact_secrets=False):\n \"\"\"\n Convert this preset into a Python dictionary.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n\n Returns:\n dict: The preset in dictionary form\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> preset.to_dict()\n {\"flags\": [\"subdomain-enum\"], \"modules\": [\"portscan\"]}\n \"\"\"\n preset_dict = {}\n\n # config\n if full_config:\n config = self.core.config\n else:\n config = self.core.custom_config\n config = omegaconf.OmegaConf.to_object(config)\n if redact_secrets:\n config = self.core.no_secrets_config(config)\n if config:\n preset_dict[\"config\"] = config\n\n # scope\n if include_target:\n target = sorted(str(t.data) for t in self.target.seeds)\n whitelist = []\n if self.target.whitelist is not None:\n whitelist = sorted(str(t.data) for t in self.target.whitelist)\n blacklist = sorted(str(t.data) for t in self.target.blacklist)\n if target:\n preset_dict[\"target\"] = target\n if whitelist and whitelist != target:\n preset_dict[\"whitelist\"] = whitelist\n if blacklist:\n preset_dict[\"blacklist\"] = blacklist\n if self.strict_scope:\n preset_dict[\"strict_scope\"] = True\n\n # flags + modules\n if self.require_flags:\n preset_dict[\"require_flags\"] = sorted(self.require_flags)\n if self.exclude_flags:\n preset_dict[\"exclude_flags\"] = sorted(self.exclude_flags)\n if self.exclude_modules:\n preset_dict[\"exclude_modules\"] = sorted(self.exclude_modules)\n if self.flags:\n preset_dict[\"flags\"] = sorted(self.flags)\n if self.explicit_scan_modules:\n preset_dict[\"modules\"] = sorted(self.explicit_scan_modules)\n if self.explicit_output_modules:\n preset_dict[\"output_modules\"] = sorted(self.explicit_output_modules)\n\n # log verbosity\n if 
self.verbose:\n preset_dict[\"verbose\"] = True\n if self.debug:\n preset_dict[\"debug\"] = True\n if self.silent:\n preset_dict[\"silent\"] = True\n\n # misc scan options\n if self.scan_name:\n preset_dict[\"scan_name\"] = self.scan_name\n if self.output_dir:\n preset_dict[\"output_dir\"] = self.output_dir\n\n # conditions\n if self.conditions:\n preset_dict[\"conditions\"] = [c[-1] for c in self.conditions]\n\n return preset_dict\n
"},{"location":"dev/presets/#bbot.scanner.Preset.to_yaml","title":"to_yaml","text":"to_yaml(include_target=False, full_config=False, sort_keys=False)\n
Return the preset in the form of a YAML string.
Parameters:
include_target (bool, default: False) \u2013 If True, include target, whitelist, and blacklist in the dictionary
full_config (bool, default: False) \u2013 If True, include the entire config, not just what's changed from the defaults.
sort_keys (bool, default: False) \u2013 If True, sort YAML keys alphabetically
Returns:
str \u2013 The preset in the form of a YAML string
Examples:
>>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n>>> print(preset.to_yaml())\nflags:\n- subdomain-enum\nmodules:\n- portscan\n
Source code in bbot/scanner/preset/preset.py
def to_yaml(self, include_target=False, full_config=False, sort_keys=False):\n \"\"\"\n Return the preset in the form of a YAML string.\n\n Args:\n include_target (bool, optional): If True, include target, whitelist, and blacklist in the dictionary\n full_config (bool, optional): If True, include the entire config, not just what's changed from the defaults.\n sort_keys (bool, optional): If True, sort YAML keys alphabetically\n\n Returns:\n str: The preset in the form of a YAML string\n\n Examples:\n >>> preset = Preset(flags=[\"subdomain-enum\"], modules=[\"portscan\"])\n >>> print(preset.to_yaml())\n flags:\n - subdomain-enum\n modules:\n - portscan\n \"\"\"\n preset_dict = self.to_dict(include_target=include_target, full_config=full_config)\n return yaml.dump(preset_dict, sort_keys=sort_keys)\n
"},{"location":"dev/presets/#bbot.scanner.Preset.validate","title":"validate","text":"validate()\n
Validate module/flag exclusions/requirements, and CLI config options if applicable.
Source code in bbot/scanner/preset/preset.py
def validate(self):\n \"\"\"\n Validate module/flag exclusions/requirements, and CLI config options if applicable.\n \"\"\"\n if self._cli:\n self.args.validate()\n\n # validate excluded modules\n for excluded_module in self.exclude_modules:\n if not excluded_module in self.module_loader.all_module_choices:\n raise ValidationError(\n get_closest_match(excluded_module, self.module_loader.all_module_choices, msg=\"module\")\n )\n # validate excluded flags\n for excluded_flag in self.exclude_flags:\n if not excluded_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(excluded_flag, self.module_loader.flag_choices, msg=\"flag\"))\n # validate required flags\n for required_flag in self.require_flags:\n if not required_flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(required_flag, self.module_loader.flag_choices, msg=\"flag\"))\n # validate flags\n for flag in self.flags:\n if not flag in self.module_loader.flag_choices:\n raise ValidationError(get_closest_match(flag, self.module_loader.flag_choices, msg=\"flag\"))\n
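validate() raises an error that suggests the closest valid name when a module or flag is misspelled. The sketch below shows one way such a check could work using the standard library's difflib. This is an assumption for illustration only: check_choice is a hypothetical function, and BBOT's own get_closest_match helper may be implemented differently.

```python
import difflib


def check_choice(value, choices, kind='module'):
    '''Return value if it is a valid choice, else raise with a suggestion.'''
    if value in choices:
        return value
    # pick the single most similar valid name, if any is close enough
    matches = difflib.get_close_matches(value, choices, n=1)
    hint = f' Did you mean {matches[0]!r}?' if matches else ''
    raise ValueError(f'Could not find {kind} {value!r}.{hint}')


choices = ['portscan', 'sslcert', 'httpx']
print(check_choice('portscan', choices))
try:
    check_choice('portscn', choices)
except ValueError as e:
    print(e)
```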
"},{"location":"dev/scanner/","title":"Scanner","text":""},{"location":"dev/scanner/#bbot.scanner.Scanner","title":"Scanner","text":"A class representing a single BBOT scan
Examples:
Create scan with multiple targets:
>>> my_scan = Scanner(\"evilcorp.com\", \"1.2.3.0/24\", modules=[\"portscan\", \"sslcert\", \"httpx\"])\n
Create scan with custom config:
>>> config = {\"http_proxy\": \"http://127.0.0.1:8080\", \"modules\": {\"portscan\": {\"top_ports\": 2000}}}\n>>> my_scan = Scanner(\"www.evilcorp.com\", modules=[\"portscan\", \"httpx\"], config=config)\n
Start the scan, iterating over events as they're discovered (synchronous):
>>> for event in my_scan.start():\n>>> print(event)\n
Start the scan, iterating over events as they're discovered (asynchronous):
>>> async for event in my_scan.async_start():\n>>> print(event)\n
Start the scan without consuming events (synchronous):
>>> my_scan.start_without_generator()\n
Start the scan without consuming events (asynchronous):
>>> await my_scan.async_start_without_generator()\n
Attributes:
status (str) \u2013 Status of scan, representing its current state. It can take on the following string values, each of which is mapped to an integer code in _status_codes:
- \"NOT_STARTED\" (0): Initial status before the scan starts.\n- \"STARTING\" (1): Status when the scan is initializing.\n- \"RUNNING\" (2): Status when the scan is in progress.\n- \"FINISHING\" (3): Status when the scan is in the process of finalizing.\n- \"CLEANING_UP\" (4): Status when the scan is cleaning up resources.\n- \"ABORTING\" (5): Status when the scan is in the process of being aborted.\n- \"ABORTED\" (6): Status when the scan has been aborted.\n- \"FAILED\" (7): Status when the scan has encountered a failure.\n- \"FINISHED\" (8): Status when the scan has successfully completed.\n
_status_code (int) \u2013 The numerical representation of the current scan status, stored for internal use. It is mapped according to the values in _status_codes.
target (Target) \u2013 Target of scan (alias to self.preset.target).
preset (Preset) \u2013 The main scan Preset in its baked form.
config (DictConfig) \u2013 BBOT config (alias to self.preset.config).
whitelist (Target) \u2013 Scan whitelist (by default this is the same as target) (alias to self.preset.whitelist).
blacklist (Target) \u2013 Scan blacklist (this takes ultimate precedence) (alias to self.preset.blacklist).
helpers (ConfigAwareHelper) \u2013 Helper containing various reusable functions, regexes, etc. (alias to self.preset.helpers).
output_dir (Path) \u2013 Output directory for scan (alias to self.preset.output_dir).
name (str) \u2013 Name of scan (alias to self.preset.scan_name).
dispatcher (Dispatcher) \u2013 Triggers certain events when the scan status changes.
modules (dict) \u2013 Holds all loaded modules in this format: {\"module_name\": Module()}.
stats (ScanStats) \u2013 Holds high-level scan statistics such as how many events have been produced and consumed by each module.
home (Path) \u2013 Base output directory of the scan (default: ~/.bbot/scans/<scan_name>).
running (bool) \u2013 Whether the scan is currently running.
stopping (bool) \u2013 Whether the scan is currently stopping.
stopped (bool) \u2013 Whether the scan is currently stopped.
aborting (bool) \u2013 Whether the scan is aborted or currently aborting.
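The Scanner docstring says that invalid statuses are not applied, that the status is locked once it reaches ABORTING until it becomes ABORTED, and that each change triggers on_status in the dispatcher. The following is a minimal sketch of those rules as read from the docstring. StatusTracker is a hypothetical class, its history list stands in for the dispatcher call, and the real setter may differ in detail.

```python
class StatusTracker:
    '''Hypothetical stand-in for the status handling described for Scanner.'''

    _status_codes = {
        'NOT_STARTED': 0,
        'STARTING': 1,
        'RUNNING': 2,
        'FINISHING': 3,
        'CLEANING_UP': 4,
        'ABORTING': 5,
        'ABORTED': 6,
        'FAILED': 7,
        'FINISHED': 8,
    }

    def __init__(self):
        self._status = 'NOT_STARTED'
        self._status_code = 0
        self.history = []  # stands in for dispatcher.on_status() calls

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, status):
        status = str(status).strip().upper()
        if status not in self._status_codes:
            return  # invalid statuses are not applied (the real class logs them)
        if self._status == 'ABORTING' and status != 'ABORTED':
            return  # locked until the abort completes
        self._status = status
        self._status_code = self._status_codes[status]
        self.history.append(status)


tracker = StatusTracker()
tracker.status = 'running'
tracker.status = 'bogus'      # ignored
tracker.status = 'ABORTING'
tracker.status = 'FINISHED'   # ignored while aborting
tracker.status = 'ABORTED'
print(tracker.status, tracker._status_code, tracker.history)
```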
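Notes:
- The status is read-only once set to \"ABORTING\" until it transitions to \"ABORTED.\"
- Invalid statuses are logged but not applied.
- Setting a status will trigger the on_status event in the dispatcher.
Source code in bbot/scanner/scanner.py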
class Scanner:\n \"\"\"A class representing a single BBOT scan\n\n Examples:\n Create scan with multiple targets:\n >>> my_scan = Scanner(\"evilcorp.com\", \"1.2.3.0/24\", modules=[\"portscan\", \"sslcert\", \"httpx\"])\n\n Create scan with custom config:\n >>> config = {\"http_proxy\": \"http://127.0.0.1:8080\", \"modules\": {\"portscan\": {\"top_ports\": 2000}}}\n >>> my_scan = Scanner(\"www.evilcorp.com\", modules=[\"portscan\", \"httpx\"], config=config)\n\n Start the scan, iterating over events as they're discovered (synchronous):\n >>> for event in my_scan.start():\n >>> print(event)\n\n Start the scan, iterating over events as they're discovered (asynchronous):\n >>> async for event in my_scan.async_start():\n >>> print(event)\n\n Start the scan without consuming events (synchronous):\n >>> my_scan.start_without_generator()\n\n Start the scan without consuming events (asynchronous):\n >>> await my_scan.async_start_without_generator()\n\n Attributes:\n status (str): Status of scan, representing its current state. It can take on the following string values, each of which is mapped to an integer code in `_status_codes`:\n ```markdown\n - \"NOT_STARTED\" (0): Initial status before the scan starts.\n - \"STARTING\" (1): Status when the scan is initializing.\n - \"RUNNING\" (2): Status when the scan is in progress.\n - \"FINISHING\" (3): Status when the scan is in the process of finalizing.\n - \"CLEANING_UP\" (4): Status when the scan is cleaning up resources.\n - \"ABORTING\" (5): Status when the scan is in the process of being aborted.\n - \"ABORTED\" (6): Status when the scan has been aborted.\n - \"FAILED\" (7): Status when the scan has encountered a failure.\n - \"FINISHED\" (8): Status when the scan has successfully completed.\n ```\n _status_code (int): The numerical representation of the current scan status, stored for internal use. 
It is mapped according to the values in `_status_codes`.\n target (Target): Target of scan (alias to `self.preset.target`).\n preset (Preset): The main scan Preset in its baked form.\n config (omegaconf.dictconfig.DictConfig): BBOT config (alias to `self.preset.config`).\n whitelist (Target): Scan whitelist (by default this is the same as `target`) (alias to `self.preset.whitelist`).\n blacklist (Target): Scan blacklist (this takes ultimate precedence) (alias to `self.preset.blacklist`).\n helpers (ConfigAwareHelper): Helper containing various reusable functions, regexes, etc. (alias to `self.preset.helpers`).\n output_dir (pathlib.Path): Output directory for scan (alias to `self.preset.output_dir`).\n name (str): Name of scan (alias to `self.preset.scan_name`).\n dispatcher (Dispatcher): Triggers certain events when the scan `status` changes.\n modules (dict): Holds all loaded modules in this format: `{\"module_name\": Module()}`.\n stats (ScanStats): Holds high-level scan statistics such as how many events have been produced and consumed by each module.\n home (pathlib.Path): Base output directory of the scan (default: `~/.bbot/scans/<scan_name>`).\n running (bool): Whether the scan is currently running.\n stopping (bool): Whether the scan is currently stopping.\n stopped (bool): Whether the scan is currently stopped.\n aborting (bool): Whether the scan is aborted or currently aborting.\n\n Notes:\n - The status is read-only once set to \"ABORTING\" until it transitions to \"ABORTED.\"\n - Invalid statuses are logged but not applied.\n - Setting a status will trigger the `on_status` event in the dispatcher.\n \"\"\"\n\n _status_codes = {\n \"NOT_STARTED\": 0,\n \"STARTING\": 1,\n \"RUNNING\": 2,\n \"FINISHING\": 3,\n \"CLEANING_UP\": 4,\n \"ABORTING\": 5,\n \"ABORTED\": 6,\n \"FAILED\": 7,\n \"FINISHED\": 8,\n }\n\n def __init__(\n self,\n *targets,\n scan_id=None,\n dispatcher=None,\n **kwargs,\n ):\n \"\"\"\n Initializes the Scanner class.\n\n If a premade 
`preset` is specified, it will be used for the scan.\n Otherwise, `Scanner` accepts the same arguments as `Preset`, which are passed through and used to create a new preset.\n\n Args:\n *targets (list[str], optional): Scan targets (passed through to `Preset`).\n preset (Preset, optional): Preset to use for the scan.\n scan_id (str, optional): Unique identifier for the scan. Auto-generated if None.\n dispatcher (Dispatcher, optional): Dispatcher object to use. Defaults to a new Dispatcher.\n **kwargs (list[str], optional): Additional keyword arguments (passed through to `Preset`).\n \"\"\"\n if scan_id is not None:\n self.id = str(scan_id)\n else:\n self.id = f\"SCAN:{sha1(rand_string(20)).hexdigest()}\"\n\n preset = kwargs.pop(\"preset\", None)\n kwargs[\"_log\"] = True\n\n from .preset import Preset\n\n if preset is None:\n preset = Preset(*targets, **kwargs)\n else:\n if not isinstance(preset, Preset):\n raise ValidationError(f'Preset must be of type Preset, not \"{type(preset).__name__}\"')\n self.preset = preset.bake(self)\n\n # scan name\n if preset.scan_name is None:\n tries = 0\n while 1:\n if tries > 5:\n scan_name = f\"{rand_string(4)}_{rand_string(4)}\"\n break\n scan_name = random_name()\n if self.preset.output_dir is not None:\n home_path = Path(self.preset.output_dir).resolve() / scan_name\n else:\n home_path = self.preset.bbot_home / \"scans\" / scan_name\n if not home_path.exists():\n break\n tries += 1\n else:\n scan_name = str(preset.scan_name)\n self.name = scan_name\n\n # scan output dir\n if preset.output_dir is not None:\n self.home = Path(preset.output_dir).resolve() / self.name\n else:\n self.home = self.preset.bbot_home / \"scans\" / self.name\n\n self._status = \"NOT_STARTED\"\n self._status_code = 0\n\n self.modules = OrderedDict({})\n self._modules_loaded = False\n self.dummy_modules = {}\n\n if dispatcher is None:\n from .dispatcher import Dispatcher\n\n self.dispatcher = Dispatcher()\n else:\n self.dispatcher = dispatcher\n 
self.dispatcher.set_scan(self)\n\n # scope distance\n self.scope_config = self.config.get(\"scope\", {})\n self.scope_search_distance = max(0, int(self.scope_config.get(\"search_distance\", 0)))\n self.scope_report_distance = int(self.scope_config.get(\"report_distance\", 1))\n\n # web config\n self.web_config = self.config.get(\"web\", {})\n self.web_spider_distance = self.web_config.get(\"spider_distance\", 0)\n self.web_spider_depth = self.web_config.get(\"spider_depth\", 1)\n self.web_spider_links_per_page = self.web_config.get(\"spider_links_per_page\", 20)\n max_redirects = self.web_config.get(\"http_max_redirects\", 5)\n self.web_max_redirects = max(max_redirects, self.web_spider_distance)\n self.http_proxy = self.web_config.get(\"http_proxy\", \"\")\n self.http_timeout = self.web_config.get(\"http_timeout\", 10)\n self.httpx_timeout = self.web_config.get(\"httpx_timeout\", 5)\n self.http_retries = self.web_config.get(\"http_retries\", 1)\n self.httpx_retries = self.web_config.get(\"httpx_retries\", 1)\n self.useragent = self.web_config.get(\"user_agent\", \"BBOT\")\n # custom HTTP headers warning\n self.custom_http_headers = self.web_config.get(\"http_headers\", {})\n if self.custom_http_headers:\n self.warning(\n \"You have enabled custom HTTP headers. 
These will be attached to all in-scope requests and all requests made by httpx.\"\n )\n\n # url file extensions\n self.url_extension_blacklist = set(e.lower() for e in self.config.get(\"url_extension_blacklist\", []))\n self.url_extension_httpx_only = set(e.lower() for e in self.config.get(\"url_extension_httpx_only\", []))\n\n # url querystring behavior\n self.url_querystring_remove = self.config.get(\"url_querystring_remove\", True)\n\n # blob inclusion\n self._file_blobs = self.config.get(\"file_blobs\", False)\n self._folder_blobs = self.config.get(\"folder_blobs\", False)\n\n # how often to print scan status\n self.status_frequency = self.config.get(\"status_frequency\", 15)\n\n from .stats import ScanStats\n\n self.stats = ScanStats(self)\n\n self._prepped = False\n self._finished_init = False\n self._new_activity = False\n self._cleanedup = False\n self._omitted_event_types = None\n\n self.__loop = None\n self._manager_worker_loop_tasks = []\n self.init_events_task = None\n self.ticker_task = None\n self.dispatcher_tasks = []\n\n self._stopping = False\n\n self._dns_strings = None\n self._dns_regexes = None\n\n self.__log_handlers = None\n self._log_handler_backup = []\n\n async def _prep(self):\n \"\"\"\n Creates the scan's output folder, loads its modules, and calls their .setup() methods.\n \"\"\"\n\n self.helpers.mkdir(self.home)\n if not self._prepped:\n # save scan preset\n with open(self.home / \"preset.yml\", \"w\") as f:\n f.write(self.preset.to_yaml())\n\n # log scan overview\n start_msg = f\"Scan with {len(self.preset.scan_modules):,} modules seeded with {len(self.target):,} targets\"\n details = []\n if self.whitelist != self.target:\n details.append(f\"{len(self.whitelist):,} in whitelist\")\n if self.blacklist:\n details.append(f\"{len(self.blacklist):,} in blacklist\")\n if details:\n start_msg += f\" ({', '.join(details)})\"\n self.hugeinfo(start_msg)\n\n # load scan modules (this imports and instantiates them)\n # up to this point they were 
only preloaded\n await self.load_modules()\n\n # run each module's .setup() method\n succeeded, hard_failed, soft_failed = await self.setup_modules()\n\n # intercept modules get sewn together like human centipede\n self.intercept_modules = [m for m in self.modules.values() if m._intercept]\n for i, intercept_module in enumerate(self.intercept_modules[1:]):\n prev_intercept_module = self.intercept_modules[i]\n self.debug(\n f\"Setting intercept module {intercept_module.name}._incoming_event_queue to previous intercept module {prev_intercept_module.name}.outgoing_event_queue\"\n )\n intercept_module._incoming_event_queue = prev_intercept_module.outgoing_event_queue\n\n # abort if there are no output modules\n num_output_modules = len([m for m in self.modules.values() if m._type == \"output\"])\n if num_output_modules < 1:\n raise ScanError(\"Failed to load output modules. Aborting.\")\n # abort if any of the module .setup()s hard-failed (i.e. they errored or returned False)\n total_failed = len(hard_failed + soft_failed)\n if hard_failed:\n msg = f\"Setup hard-failed for {len(hard_failed):,} modules ({','.join(hard_failed)})\"\n self._fail_setup(msg)\n\n total_modules = total_failed + len(self.modules)\n success_msg = f\"Setup succeeded for {len(self.modules):,}/{total_modules:,} modules.\"\n\n self.success(success_msg)\n self._prepped = True\n\n def start(self):\n for event in async_to_sync_gen(self.async_start()):\n yield event\n\n def start_without_generator(self):\n for event in async_to_sync_gen(self.async_start()):\n pass\n\n async def async_start_without_generator(self):\n async for event in self.async_start():\n pass\n\n async def async_start(self):\n \"\"\" \"\"\"\n failed = True\n scan_start_time = datetime.now()\n try:\n await self._prep()\n\n self._start_log_handlers()\n self.trace(f'Ran BBOT {__version__} at {scan_start_time}, command: {\" \".join(sys.argv)}')\n self.trace(f\"Target: {self.preset.target.json}\")\n self.trace(f\"Preset: 
{self.preset.to_dict(redact_secrets=True)}\")\n\n if not self.target:\n self.warning(f\"No scan targets specified\")\n\n # start status ticker\n self.ticker_task = asyncio.create_task(\n self._status_ticker(self.status_frequency), name=f\"{self.name}._status_ticker()\"\n )\n\n self.status = \"STARTING\"\n\n if not self.modules:\n self.error(f\"No modules loaded\")\n self.status = \"FAILED\"\n return\n else:\n self.hugesuccess(f\"Starting scan {self.name}\")\n\n await self.dispatcher.on_start(self)\n\n self.status = \"RUNNING\"\n self._start_modules()\n self.verbose(f\"{len(self.modules):,} modules started\")\n\n # distribute seed events\n self.init_events_task = asyncio.create_task(\n self.ingress_module.init_events(self.target.events), name=f\"{self.name}.ingress_module.init_events()\"\n )\n\n # main scan loop\n while 1:\n # abort if we're aborting\n if self.aborting:\n self._drain_queues()\n break\n\n # yield events as they come (async for event in scan.async_start())\n if \"python\" in self.modules:\n events, finish = await self.modules[\"python\"]._events_waiting(batch_size=-1)\n for e in events:\n yield e\n if events:\n continue\n\n # break if initialization finished and the scan is no longer active\n if self._finished_init and self.modules_finished:\n new_activity = await self.finish()\n if not new_activity:\n break\n\n await asyncio.sleep(0.1)\n\n failed = False\n\n except BaseException as e:\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.stop()\n failed = False\n else:\n try:\n raise\n except ScanError as e:\n self.error(f\"{e}\")\n\n except BBOTError as e:\n self.critical(f\"Error during scan: {e}\")\n\n except Exception:\n self.critical(f\"Unexpected error during scan:\\n{traceback.format_exc()}\")\n\n finally:\n tasks = self._cancel_tasks()\n self.debug(f\"Awaiting {len(tasks):,} tasks\")\n for task in tasks:\n # self.debug(f\"Awaiting {task}\")\n with contextlib.suppress(BaseException):\n await 
asyncio.wait_for(task, timeout=0.1)\n self.debug(f\"Awaited {len(tasks):,} tasks\")\n await self._report()\n await self._cleanup()\n\n log_fn = self.hugesuccess\n if self.status == \"ABORTING\":\n self.status = \"ABORTED\"\n log_fn = self.hugewarning\n elif failed:\n self.status = \"FAILED\"\n log_fn = self.critical\n else:\n self.status = \"FINISHED\"\n\n scan_run_time = datetime.now() - scan_start_time\n scan_run_time = self.helpers.human_timedelta(scan_run_time)\n log_fn(f\"Scan {self.name} completed in {scan_run_time} with status {self.status}\")\n\n await self.dispatcher.on_finish(self)\n\n self._stop_log_handlers()\n\n def _start_modules(self):\n self.verbose(f\"Starting module worker loops\")\n for module in self.modules.values():\n module.start()\n\n async def setup_modules(self, remove_failed=True):\n \"\"\"Asynchronously initializes all loaded modules by invoking their `setup()` methods.\n\n Args:\n remove_failed (bool): Flag indicating whether to remove modules that fail setup.\n\n Returns:\n tuple:\n succeeded - List of modules that successfully set up.\n hard_failed - List of modules that encountered a hard failure during setup.\n soft_failed - List of modules that encountered a soft failure during setup.\n\n Raises:\n ScanError: If no output modules could be loaded.\n\n Notes:\n Hard-failed modules are set to an error state and removed if `remove_failed` is True.\n Soft-failed modules are not set to an error state but are also removed if `remove_failed` is True.\n \"\"\"\n await self.load_modules()\n self.verbose(f\"Setting up modules\")\n succeeded = []\n hard_failed = []\n soft_failed = []\n\n async for task in self.helpers.as_completed([m._setup() for m in self.modules.values()]):\n module, status, msg = await task\n if status == True:\n self.debug(f\"Setup succeeded for {module.name} ({msg})\")\n succeeded.append(module.name)\n elif status == False:\n self.warning(f\"Setup hard-failed for {module.name}: {msg}\")\n 
self.modules[module.name].set_error_state()\n hard_failed.append(module.name)\n else:\n self.info(f\"Setup soft-failed for {module.name}: {msg}\")\n soft_failed.append(module.name)\n if (not status) and (module._intercept or remove_failed):\n # if an intercept module fails setup, we always remove it\n self.modules.pop(module.name)\n\n return succeeded, hard_failed, soft_failed\n\n async def load_modules(self):\n \"\"\"Asynchronously import and instantiate all scan modules, including internal and output modules.\n\n This method is automatically invoked by `setup_modules()`. It performs several key tasks in the following sequence:\n\n 1. Install dependencies for each module via `self.helpers.depsinstaller.install()`.\n 2. Load scan modules and update the `modules` dictionary.\n 3. Load internal modules and update the `modules` dictionary.\n 4. Load output modules and update the `modules` dictionary.\n 5. Sort modules based on their `priority` attribute (modules without one default to 3).\n\n If any modules fail to load or their dependencies fail to install, a ScanError will be raised (unless `self.force_start` is True).\n\n Attributes:\n succeeded, failed (tuple): A tuple containing lists of modules that succeeded or failed during the dependency installation.\n loaded_modules, loaded_internal_modules, loaded_output_modules (dict): Dictionaries of successfully loaded modules.\n failed, failed_internal, failed_output (list): Lists of module names that failed to load.\n\n Raises:\n ScanError: If any module dependencies fail to install or modules fail to load, and if `self.force_start` is False.\n\n Returns:\n None\n\n Note:\n After all modules are loaded, they are sorted by `priority` and stored in the `modules` dictionary.\n \"\"\"\n if not self._modules_loaded:\n if not self.preset.modules:\n self.warning(f\"No modules to load\")\n return\n\n if not self.preset.scan_modules:\n self.warning(f\"No scan modules to load\")\n\n # install module dependencies\n succeeded, failed = await 
self.helpers.depsinstaller.install(*self.preset.modules)\n if failed:\n msg = f\"Failed to install dependencies for {len(failed):,} modules: {','.join(failed)}\"\n self._fail_setup(msg)\n modules = sorted([m for m in self.preset.scan_modules if m in succeeded])\n output_modules = sorted([m for m in self.preset.output_modules if m in succeeded])\n internal_modules = sorted([m for m in self.preset.internal_modules if m in succeeded])\n\n # Load scan modules\n self.verbose(f\"Loading {len(modules):,} scan modules: {','.join(modules)}\")\n loaded_modules, failed = self._load_modules(modules)\n self.modules.update(loaded_modules)\n if len(failed) > 0:\n msg = f\"Failed to load {len(failed):,} scan modules: {','.join(failed)}\"\n self._fail_setup(msg)\n if loaded_modules:\n self.info(\n f\"Loaded {len(loaded_modules):,}/{len(self.preset.scan_modules):,} scan modules ({','.join(loaded_modules)})\"\n )\n\n # Load internal modules\n self.verbose(f\"Loading {len(internal_modules):,} internal modules: {','.join(internal_modules)}\")\n loaded_internal_modules, failed_internal = self._load_modules(internal_modules)\n self.modules.update(loaded_internal_modules)\n if len(failed_internal) > 0:\n # report the modules that failed, not the ones that loaded\n msg = f\"Failed to load {len(failed_internal):,} internal modules: {','.join(failed_internal)}\"\n self._fail_setup(msg)\n if loaded_internal_modules:\n self.info(\n f\"Loaded {len(loaded_internal_modules):,}/{len(self.preset.internal_modules):,} internal modules ({','.join(loaded_internal_modules)})\"\n )\n\n # Load output modules\n self.verbose(f\"Loading {len(output_modules):,} output modules: {','.join(output_modules)}\")\n loaded_output_modules, failed_output = self._load_modules(output_modules)\n self.modules.update(loaded_output_modules)\n if len(failed_output) > 0:\n msg = f\"Failed to load {len(failed_output):,} output modules: {','.join(failed_output)}\"\n self._fail_setup(msg)\n if loaded_output_modules:\n self.info(\n f\"Loaded 
{len(loaded_output_modules):,}/{len(self.preset.output_modules):,} output modules, ({','.join(loaded_output_modules)})\"\n )\n\n # builtin intercept modules\n self.ingress_module = ScanIngress(self)\n self.egress_module = ScanEgress(self)\n self.modules[self.ingress_module.name] = self.ingress_module\n self.modules[self.egress_module.name] = self.egress_module\n\n # sort modules by priority\n self.modules = OrderedDict(sorted(self.modules.items(), key=lambda x: getattr(x[-1], \"priority\", 3)))\n\n self._modules_loaded = True\n\n @property\n def modules_finished(self):\n finished_modules = [m.finished for m in self.modules.values()]\n return all(finished_modules)\n\n def kill_module(self, module_name, message=None):\n from signal import SIGINT\n\n module = self.modules[module_name]\n if module._intercept:\n self.warning(f'Cannot kill module \"{module_name}\" because it is critical to the scan')\n return\n module.set_error_state(message=message, clear_outgoing_queue=True)\n for proc in module._proc_tracker:\n with contextlib.suppress(Exception):\n proc.send_signal(SIGINT)\n self.helpers.cancel_tasks_sync(module._tasks)\n\n @property\n def incoming_event_queues(self):\n return self.ingress_module.incoming_queues\n\n @property\n def num_queued_events(self):\n total = 0\n for q in self.incoming_event_queues:\n total += len(q._queue)\n return total\n\n def modules_status(self, _log=False):\n finished = True\n status = {\"modules\": {}}\n\n sorted_modules = []\n for module_name, module in self.modules.items():\n if module_name.startswith(\"_\"):\n continue\n sorted_modules.append(module)\n mod_status = module.status\n if mod_status[\"running\"]:\n finished = False\n status[\"modules\"][module_name] = mod_status\n\n # sort modules by name\n sorted_modules.sort(key=lambda m: m.name)\n\n status[\"finished\"] = finished\n\n modules_errored = [m for m, s in status[\"modules\"].items() if s[\"errored\"]]\n\n max_mem_percent = 90\n mem_status = self.helpers.memory_status()\n # 
abort if we don't have the memory\n mem_percent = mem_status.percent\n if mem_percent > max_mem_percent:\n free_memory = mem_status.available\n free_memory_human = self.helpers.bytes_to_human(free_memory)\n self.warning(f\"System memory is at {mem_percent:.1f}% ({free_memory_human} remaining)\")\n\n if _log:\n modules_status = []\n for m, s in status[\"modules\"].items():\n running = s[\"running\"]\n incoming = s[\"events\"][\"incoming\"]\n outgoing = s[\"events\"][\"outgoing\"]\n tasks = s[\"tasks\"]\n total = sum([incoming, outgoing, tasks])\n if running or total > 0:\n modules_status.append((m, running, incoming, outgoing, tasks, total))\n modules_status.sort(key=lambda x: x[-1], reverse=True)\n\n if modules_status:\n modules_status_str = \", \".join([f\"{m}({i:,}:{t:,}:{o:,})\" for m, r, i, o, t, _ in modules_status])\n self.info(f\"{self.name}: Modules running (incoming:processing:outgoing) {modules_status_str}\")\n else:\n self.info(f\"{self.name}: No modules running\")\n event_type_summary = sorted(self.stats.events_emitted_by_type.items(), key=lambda x: x[-1], reverse=True)\n if event_type_summary:\n self.info(\n f'{self.name}: Events produced so far: {\", \".join([f\"{k}: {v}\" for k,v in event_type_summary])}'\n )\n else:\n self.info(f\"{self.name}: No events produced yet\")\n\n if modules_errored:\n self.verbose(\n f'{self.name}: Modules errored: {len(modules_errored):,} ({\", \".join([m for m in modules_errored])})'\n )\n\n num_queued_events = self.num_queued_events\n if num_queued_events:\n self.info(\n f\"{self.name}: {num_queued_events:,} events in queue ({self.stats.speedometer.speed:,} processed in the past {self.status_frequency} seconds)\"\n )\n else:\n self.info(\n f\"{self.name}: No events in queue ({self.stats.speedometer.speed:,} processed in the past {self.status_frequency} seconds)\"\n )\n\n if self.log_level <= logging.DEBUG:\n # status debugging\n scan_active_status = []\n scan_active_status.append(f\"scan._finished_init: 
{self._finished_init}\")\n scan_active_status.append(f\"scan.modules_finished: {self.modules_finished}\")\n for m in sorted_modules:\n running = m.running\n scan_active_status.append(f\" {m}.finished: {m.finished}\")\n scan_active_status.append(f\" running: {running}\")\n if running:\n scan_active_status.append(f\" tasks:\")\n for task in list(m._task_counter.tasks.values()):\n scan_active_status.append(f\" - {task}:\")\n scan_active_status.append(f\" incoming_queue_size: {m.num_incoming_events}\")\n scan_active_status.append(f\" outgoing_queue_size: {m.outgoing_event_queue.qsize()}\")\n for line in scan_active_status:\n self.debug(line)\n\n # log module memory usage\n module_memory_usage = []\n for module in sorted_modules:\n memory_usage = module.memory_usage\n module_memory_usage.append((module.name, memory_usage))\n module_memory_usage.sort(key=lambda x: x[-1], reverse=True)\n self.debug(f\"MODULE MEMORY USAGE:\")\n for module_name, usage in module_memory_usage:\n self.debug(f\" - {module_name}: {self.helpers.bytes_to_human(usage)}\")\n\n status.update({\"modules_errored\": len(modules_errored)})\n\n return status\n\n def stop(self):\n \"\"\"Stops the in-progress scan and performs necessary cleanup.\n\n This method sets the scan's status to \"ABORTING,\" cancels any pending tasks, and drains event queues. 
It also kills child processes spawned during the scan.\n\n Returns:\n None\n \"\"\"\n if not self._stopping:\n self._stopping = True\n self.status = \"ABORTING\"\n self.hugewarning(\"Aborting scan\")\n self.trace()\n self._cancel_tasks()\n self._drain_queues()\n self.helpers.kill_children()\n self._drain_queues()\n self.helpers.kill_children()\n self.debug(\"Finished aborting scan\")\n\n async def finish(self):\n \"\"\"Finalizes the scan by invoking the `finished()` method on all active modules if new activity is detected.\n\n The method is idempotent and will return False if no new activity has been recorded since the last invocation.\n\n Returns:\n bool: True if new activity has been detected and the `finished()` method is invoked on all modules.\n False if no new activity has been detected since the last invocation.\n\n Notes:\n This method alters the scan's status to \"FINISHING\" if new activity is detected.\n \"\"\"\n # if new events were generated since last time we were here\n if self._new_activity:\n self._new_activity = False\n self.status = \"FINISHING\"\n # Trigger .finished() on every module and start over\n log.info(\"Finishing scan\")\n for module in self.modules.values():\n finished_event = self.make_event(f\"FINISHED\", \"FINISHED\", dummy=True, tags={module.name})\n await module.queue_event(finished_event)\n self.verbose(\"Completed finish()\")\n return True\n # Return False if no new events were generated since last time\n self.verbose(\"Completed final finish()\")\n return False\n\n def _drain_queues(self):\n \"\"\"Empties all the event queues for each loaded module and the manager's incoming event queue.\n\n This method iteratively empties both the incoming and outgoing event queues of each module, as well as the incoming event queue of the scan manager.\n\n Returns:\n None\n \"\"\"\n self.debug(\"Draining queues\")\n for module in self.modules.values():\n with contextlib.suppress(asyncio.queues.QueueEmpty):\n while 1:\n if 
module.incoming_event_queue:\n module.incoming_event_queue.get_nowait()\n with contextlib.suppress(asyncio.queues.QueueEmpty):\n while 1:\n if module.outgoing_event_queue:\n module.outgoing_event_queue.get_nowait()\n self.debug(\"Finished draining queues\")\n\n def _cancel_tasks(self):\n \"\"\"Cancels all asynchronous tasks and shuts down the process pool.\n\n This method collects all pending tasks from each module, the dispatcher,\n and the scan manager. After collecting these tasks, it cancels them synchronously\n using a helper function. Finally, it shuts down the process pool, canceling any\n pending futures.\n\n Returns:\n None\n \"\"\"\n self.debug(\"Cancelling all scan tasks\")\n tasks = []\n # module workers\n for m in self.modules.values():\n tasks += getattr(m, \"_tasks\", [])\n # init events\n if self.init_events_task:\n tasks.append(self.init_events_task)\n # ticker\n if self.ticker_task:\n tasks.append(self.ticker_task)\n # dispatcher\n tasks += self.dispatcher_tasks\n # manager worker loops\n tasks += self._manager_worker_loop_tasks\n self.helpers.cancel_tasks_sync(tasks)\n # process pool\n self.helpers.process_pool.shutdown(cancel_futures=True)\n self.debug(\"Finished cancelling all scan tasks\")\n return tasks\n\n async def _report(self):\n \"\"\"Asynchronously executes the `report()` method for each module in the scan.\n\n This method is called once at the end of each scan and is responsible for\n triggering the `report()` function for each module. It executes irrespective\n of whether the scan was aborted or completed successfully. 
The method makes\n use of an asynchronous context manager (`_acatch`) to handle exceptions and\n a task counter to keep track of the task's context.\n\n Returns:\n None\n \"\"\"\n for mod in self.modules.values():\n context = f\"{mod.name}.report()\"\n async with self._acatch(context), mod._task_counter.count(context):\n await mod.report()\n\n async def _cleanup(self):\n \"\"\"Asynchronously executes the `cleanup()` method for each module in the scan.\n\n This method is called once at the end of the scan to perform resource cleanup\n tasks. It is executed regardless of whether the scan was aborted or completed\n successfully. The scan status is set to \"CLEANING_UP\" during the execution.\n After calling the `cleanup()` method for each module, it performs additional\n cleanup tasks such as removing the scan's home directory if empty and cleaning\n old scans.\n\n Returns:\n None\n \"\"\"\n # clean up self\n if not self._cleanedup:\n self._cleanedup = True\n self.status = \"CLEANING_UP\"\n # clean up dns engine\n if self.helpers._dns is not None:\n await self.helpers.dns.shutdown()\n # clean up web engine\n if self.helpers._web is not None:\n await self.helpers.web.shutdown()\n # clean up modules\n for mod in self.modules.values():\n await mod._cleanup()\n with contextlib.suppress(Exception):\n self.home.rmdir()\n self.helpers.clean_old_scans()\n\n def in_scope(self, *args, **kwargs):\n return self.preset.in_scope(*args, **kwargs)\n\n def whitelisted(self, *args, **kwargs):\n return self.preset.whitelisted(*args, **kwargs)\n\n def blacklisted(self, *args, **kwargs):\n return self.preset.blacklisted(*args, **kwargs)\n\n @property\n def core(self):\n return self.preset.core\n\n @property\n def config(self):\n return self.preset.core.config\n\n @property\n def target(self):\n return self.preset.target\n\n @property\n def whitelist(self):\n return self.preset.whitelist\n\n @property\n def blacklist(self):\n return self.preset.blacklist\n\n @property\n def 
helpers(self):\n return self.preset.helpers\n\n @property\n def force_start(self):\n return self.preset.force_start\n\n @property\n def word_cloud(self):\n return self.helpers.word_cloud\n\n @property\n def stopping(self):\n return not self.running\n\n @property\n def stopped(self):\n return self._status_code > 5\n\n @property\n def running(self):\n return 0 < self._status_code < 4\n\n @property\n def aborting(self):\n return 5 <= self._status_code <= 6\n\n @property\n def status(self):\n return self._status\n\n @property\n def omitted_event_types(self):\n if self._omitted_event_types is None:\n self._omitted_event_types = self.config.get(\"omit_event_types\", [])\n return self._omitted_event_types\n\n @status.setter\n def status(self, status):\n \"\"\"\n Block setting after status has been aborted\n \"\"\"\n status = str(status).strip().upper()\n if status in self._status_codes:\n if self.status == \"ABORTING\" and not status == \"ABORTED\":\n self.debug(f'Attempt to set invalid status \"{status}\" on aborted scan')\n else:\n if status != self._status:\n self._status = status\n self._status_code = self._status_codes[status]\n self.dispatcher_tasks.append(\n asyncio.create_task(\n self.dispatcher.catch(self.dispatcher.on_status, self._status, self.id),\n name=f\"{self.name}.dispatcher.on_status({status})\",\n )\n )\n else:\n self.debug(f'Scan status is already \"{status}\"')\n else:\n self.debug(f'Attempt to set invalid status \"{status}\" on scan')\n\n def make_event(self, *args, **kwargs):\n kwargs[\"scan\"] = self\n event = make_event(*args, **kwargs)\n return event\n\n @property\n def root_event(self):\n \"\"\"\n The root scan event, e.g.:\n ```json\n {\n \"type\": \"SCAN\",\n \"id\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"data\": \"pixilated_kathryn (SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54)\",\n \"scope_distance\": 0,\n \"scan\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"timestamp\": 1694548779.616255,\n \"parent\": 
\"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"tags\": [\n \"distance-0\"\n ],\n \"module\": \"TARGET\",\n \"module_sequence\": \"TARGET\"\n }\n ```\n \"\"\"\n root_event = self.make_event(data=self.json, event_type=\"SCAN\", dummy=True)\n root_event._id = self.id\n root_event.scope_distance = 0\n root_event.parent = root_event\n root_event.module = self._make_dummy_module(name=\"TARGET\", _type=\"TARGET\")\n root_event.discovery_context = f\"Scan {self.name} started at {root_event.timestamp}\"\n return root_event\n\n @property\n def dns_strings(self):\n \"\"\"\n A list of DNS hostname strings generated from the scan target\n \"\"\"\n if self._dns_strings is None:\n dns_targets = set(t.host for t in self.target if t.host and isinstance(t.host, str))\n dns_whitelist = set(t.host for t in self.whitelist if t.host and isinstance(t.host, str))\n dns_targets.update(dns_whitelist)\n dns_targets = sorted(dns_targets, key=len)\n dns_targets_set = set()\n dns_strings = []\n for t in dns_targets:\n if not any(x in dns_targets_set for x in self.helpers.domain_parents(t, include_self=True)):\n dns_strings.append(t)\n self._dns_strings = dns_strings\n return self._dns_strings\n\n def _generate_dns_regexes(self, pattern):\n \"\"\"\n Generates a list of compiled DNS hostname regexes based on the provided pattern.\n This method centralizes the regex compilation to avoid redundancy in the dns_regexes and dns_regexes_yara methods.\n\n Args:\n pattern (str):\n Returns:\n list[re.Pattern]: A list of compiled regex patterns if enabled, otherwise an empty list.\n \"\"\"\n\n dns_regexes = []\n for t in self.dns_strings:\n regex_pattern = re.compile(f\"{pattern}{re.escape(t)})\", re.I)\n log.debug(f\"Generated Regex [{regex_pattern.pattern}] for domain {t}\")\n dns_regexes.append(regex_pattern)\n return dns_regexes\n\n @property\n def dns_regexes(self):\n \"\"\"\n A list of DNS hostname regexes generated from the scan target\n For the purpose of extracting hostnames\n\n 
Examples:\n Extract hostnames from text:\n >>> for regex in scan.dns_regexes:\n ... for match in regex.finditer(response.text):\n ... hostname = match.group().lower()\n \"\"\"\n if not self._dns_regexes:\n self._dns_regexes = self._generate_dns_regexes(r\"((?:(?:[\\w-]+)\\.)+\")\n return self._dns_regexes\n\n @property\n def dns_regexes_yara(self):\n \"\"\"\n Returns a list of DNS hostname regexes formatted specifically for compatibility with YARA rules.\n \"\"\"\n return self._generate_dns_regexes(r\"(([a-z0-9-]+\\.)+\")\n\n @property\n def json(self):\n \"\"\"\n A dictionary representation of the scan including its name, ID, targets, whitelist, blacklist, and modules\n \"\"\"\n j = dict()\n for i in (\"id\", \"name\"):\n v = getattr(self, i, \"\")\n if v:\n j.update({i: v})\n j[\"target\"] = self.preset.target.json\n j[\"preset\"] = self.preset.to_dict(redact_secrets=True)\n return j\n\n def debug(self, *args, trace=False, **kwargs):\n log.debug(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def verbose(self, *args, trace=False, **kwargs):\n log.verbose(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeverbose(self, *args, trace=False, **kwargs):\n log.hugeverbose(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def info(self, *args, trace=False, **kwargs):\n log.info(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugeinfo(self, *args, trace=False, **kwargs):\n log.hugeinfo(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def success(self, *args, trace=False, **kwargs):\n log.success(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def hugesuccess(self, *args, trace=False, **kwargs):\n log.hugesuccess(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def warning(self, *args, trace=True, **kwargs):\n log.warning(*args, extra={\"scan_id\": self.id}, 
**kwargs)\n if trace:\n self.trace()\n\n def hugewarning(self, *args, trace=True, **kwargs):\n log.hugewarning(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def error(self, *args, trace=True, **kwargs):\n log.error(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n def trace(self, msg=None):\n if msg is None:\n e_type, e_val, e_traceback = exc_info()\n if e_type is not None:\n log.trace(traceback.format_exc())\n else:\n log.trace(msg)\n\n def critical(self, *args, trace=True, **kwargs):\n log.critical(*args, extra={\"scan_id\": self.id}, **kwargs)\n if trace:\n self.trace()\n\n @property\n def log_level(self):\n \"\"\"\n Return the current log level, e.g. logging.INFO\n \"\"\"\n return self.core.logger.log_level\n\n @property\n def _log_handlers(self):\n if self.__log_handlers is None:\n self.helpers.mkdir(self.home)\n main_handler = logging.handlers.TimedRotatingFileHandler(\n str(self.home / \"scan.log\"), when=\"d\", interval=1, backupCount=14\n )\n main_handler.addFilter(lambda x: x.levelno != logging.TRACE and x.levelno >= logging.VERBOSE)\n debug_handler = logging.handlers.TimedRotatingFileHandler(\n str(self.home / \"debug.log\"), when=\"d\", interval=1, backupCount=14\n )\n debug_handler.addFilter(lambda x: x.levelno >= logging.DEBUG)\n self.__log_handlers = [main_handler, debug_handler]\n return self.__log_handlers\n\n def _start_log_handlers(self):\n # add log handlers\n for handler in self._log_handlers:\n self.core.logger.add_log_handler(handler)\n # temporarily disable main ones\n for handler_name in (\"file_main\", \"file_debug\"):\n handler = self.core.logger.log_handlers.get(handler_name, None)\n if handler is not None and handler not in self._log_handler_backup:\n self._log_handler_backup.append(handler)\n self.core.logger.remove_log_handler(handler)\n\n def _stop_log_handlers(self):\n # remove log handlers\n for handler in self._log_handlers:\n 
self.core.logger.remove_log_handler(handler)\n # restore main ones\n for handler in self._log_handler_backup:\n self.core.logger.add_log_handler(handler)\n\n def _fail_setup(self, msg):\n msg = str(msg)\n if self.force_start:\n self.error(msg)\n else:\n msg += \" (--force to run module anyway)\"\n raise ScanError(msg)\n\n def _load_modules(self, modules):\n modules = [str(m) for m in modules]\n loaded_modules = {}\n failed = set()\n for module_name, module_class in self.preset.module_loader.load_modules(modules).items():\n if module_class:\n try:\n loaded_modules[module_name] = module_class(self)\n self.verbose(f'Loaded module \"{module_name}\"')\n continue\n except Exception:\n self.warning(f\"Failed to load module {module_class}\")\n else:\n self.warning(f'Failed to load unknown module \"{module_name}\"')\n failed.add(module_name)\n return loaded_modules, failed\n\n async def _status_ticker(self, interval=15):\n async with self._acatch():\n while 1:\n await asyncio.sleep(interval)\n self.modules_status(_log=True)\n\n @contextlib.asynccontextmanager\n async def _acatch(self, context=\"scan\", finally_callback=None, unhandled_is_critical=False):\n \"\"\"\n Async version of catch()\n\n async with catch():\n await do_stuff()\n \"\"\"\n try:\n yield\n except BaseException as e:\n self._handle_exception(e, context=context, unhandled_is_critical=unhandled_is_critical)\n\n def _handle_exception(self, e, context=\"scan\", finally_callback=None, unhandled_is_critical=False):\n if callable(context):\n context = f\"{context.__qualname__}()\"\n filename, lineno, funcname = self.helpers.get_traceback_details(e)\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt,)):\n log.debug(f\"Interrupted\")\n self.stop()\n elif isinstance(e, BrokenPipeError):\n log.debug(f\"BrokenPipeError in {filename}:{lineno}:{funcname}(): {e}\")\n elif isinstance(e, asyncio.CancelledError):\n raise\n elif isinstance(e, Exception):\n traceback_str = getattr(e, \"engine_traceback\", None)\n if 
traceback_str is None:\n traceback_str = traceback.format_exc()\n if unhandled_is_critical:\n log.critical(f\"Error in {context}: {filename}:{lineno}:{funcname}(): {e}\")\n log.critical(traceback_str)\n else:\n log.error(f\"Error in {context}: {filename}:{lineno}:{funcname}(): {e}\")\n log.trace(traceback_str)\n if callable(finally_callback):\n finally_callback(e)\n\n def _make_dummy_module(self, name, _type=\"scan\"):\n \"\"\"\n Construct a dummy module, for attachment to events\n \"\"\"\n try:\n return self.dummy_modules[name]\n except KeyError:\n dummy = DummyModule(scan=self, name=name, _type=_type)\n self.dummy_modules[name] = dummy\n return dummy\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.dns_regexes","title":"dns_regexes property
","text":"dns_regexes\n
A list of DNS hostname regexes generated from the scan target For the purpose of extracting hostnames
Examples:
Extract hostnames from text:
>>> for regex in scan.dns_regexes:\n... for match in regex.finditer(response.text):\n... hostname = match.group().lower()\n
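The same pattern can be tried outside of a scan. The following is a minimal sketch, not BBOT's own code: make_dns_regexes is a hypothetical helper that builds one case-insensitive regex per target domain, and the sample text is invented. Inside a real scan you would iterate over scan.dns_regexes instead.

```python
import re

def make_dns_regexes(domains):
    # Hypothetical helper (an assumption for this sketch): one regex per
    # target domain, matching one or more subdomain labels followed by it.
    return [
        re.compile('((?:[A-Za-z0-9_-]+[.])+' + re.escape(d) + ')', re.I)
        for d in domains
    ]

def extract_hostnames(text, regexes):
    # Collect every match, lowercased, as in the example above.
    found = set()
    for regex in regexes:
        for match in regex.finditer(text):
            found.add(match.group().lower())
    return found

text = 'Visit WWW.EvilCorp.com or api.dev.evilcorp.com, not example.org'
hostnames = extract_hostnames(text, make_dns_regexes(['evilcorp.com']))
print(sorted(hostnames))
```

Lowercasing each match keeps the resulting set free of case-only duplicates.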
"},{"location":"dev/scanner/#bbot.scanner.Scanner.dns_regexes_yara","title":"dns_regexes_yara property
","text":"dns_regexes_yara\n
Returns a list of DNS hostname regexes formatted specifically for compatibility with YARA rules.
"},{"location":"dev/scanner/#bbot.scanner.Scanner.dns_strings","title":"dns_stringsproperty
","text":"dns_strings\n
A list of DNS hostname strings generated from the scan target
"},{"location":"dev/scanner/#bbot.scanner.Scanner.json","title":"jsonproperty
","text":"json\n
A dictionary representation of the scan including its name, ID, targets, whitelist, blacklist, and modules
"},{"location":"dev/scanner/#bbot.scanner.Scanner.log_level","title":"log_levelproperty
","text":"log_level\n
Return the current log level, e.g. logging.INFO
"},{"location":"dev/scanner/#bbot.scanner.Scanner.root_event","title":"root_eventproperty
","text":"root_event\n
The root scan event, e.g.:
{\n \"type\": \"SCAN\",\n \"id\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"data\": \"pixilated_kathryn (SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54)\",\n \"scope_distance\": 0,\n \"scan\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"timestamp\": 1694548779.616255,\n \"parent\": \"SCAN:1188928d942ace8e3befae0bdb9c3caa22705f54\",\n \"tags\": [\n \"distance-0\"\n ],\n \"module\": \"TARGET\",\n \"module_sequence\": \"TARGET\"\n}\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.__init__","title":"__init__","text":"__init__(*targets, scan_id=None, dispatcher=None, **kwargs)\n
Initializes the Scanner class.
If a premade preset
is specified, it will be used for the scan. Otherwise, Scan
accepts the same arguments as Preset
, which are passed through and used to create a new preset.
Parameters:
*targets
(list[str]
, default: ()
) \u2013 Scan targets (passed through to Preset
).
preset
(Preset
) \u2013 Preset to use for the scan.
scan_id
(str
, default: None
) \u2013 Unique identifier for the scan. Auto-generates if None.
dispatcher
(Dispatcher
, default: None
) \u2013 Dispatcher object to use. Defaults to new Dispatcher.
**kwargs
(list[str]
, default: {}
) \u2013 Additional keyword arguments (passed through to Preset
).
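The scan_id behavior can be sketched in isolation. The snippet below follows the SCAN:<sha1 hex digest> shape visible in the source listing and in the root_event example. Both rand_string and make_scan_id are local stand-ins written for this illustration, not BBOT helpers.

```python
import random
import string
from hashlib import sha1

def rand_string(length=20):
    # Stand-in for a random-string helper (assumption for this sketch).
    alphabet = string.ascii_lowercase + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))

def make_scan_id(scan_id=None):
    # Use the caller's ID when one is given, otherwise generate a new one.
    if scan_id is not None:
        return str(scan_id)
    return 'SCAN:' + sha1(rand_string(20).encode()).hexdigest()

print(make_scan_id())
print(make_scan_id('my_custom_id'))
```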
bbot/scanner/scanner.py
def __init__(\n self,\n *targets,\n scan_id=None,\n dispatcher=None,\n **kwargs,\n):\n \"\"\"\n Initializes the Scanner class.\n\n If a premade `preset` is specified, it will be used for the scan.\n Otherwise, `Scan` accepts the same arguments as `Preset`, which are passed through and used to create a new preset.\n\n Args:\n *targets (list[str], optional): Scan targets (passed through to `Preset`).\n preset (Preset, optional): Preset to use for the scan.\n scan_id (str, optional): Unique identifier for the scan. Auto-generates if None.\n dispatcher (Dispatcher, optional): Dispatcher object to use. Defaults to new Dispatcher.\n **kwargs (list[str], optional): Additional keyword arguments (passed through to `Preset`).\n \"\"\"\n if scan_id is not None:\n self.id = str(scan_id)\n else:\n self.id = f\"SCAN:{sha1(rand_string(20)).hexdigest()}\"\n\n preset = kwargs.pop(\"preset\", None)\n kwargs[\"_log\"] = True\n\n from .preset import Preset\n\n if preset is None:\n preset = Preset(*targets, **kwargs)\n else:\n if not isinstance(preset, Preset):\n raise ValidationError(f'Preset must be of type Preset, not \"{type(preset).__name__}\"')\n self.preset = preset.bake(self)\n\n # scan name\n if preset.scan_name is None:\n tries = 0\n while 1:\n if tries > 5:\n scan_name = f\"{rand_string(4)}_{rand_string(4)}\"\n break\n scan_name = random_name()\n if self.preset.output_dir is not None:\n home_path = Path(self.preset.output_dir).resolve() / scan_name\n else:\n home_path = self.preset.bbot_home / \"scans\" / scan_name\n if not home_path.exists():\n break\n tries += 1\n else:\n scan_name = str(preset.scan_name)\n self.name = scan_name\n\n # scan output dir\n if preset.output_dir is not None:\n self.home = Path(preset.output_dir).resolve() / self.name\n else:\n self.home = self.preset.bbot_home / \"scans\" / self.name\n\n self._status = \"NOT_STARTED\"\n self._status_code = 0\n\n self.modules = OrderedDict({})\n self._modules_loaded = False\n self.dummy_modules = {}\n\n if dispatcher 
is None:\n from .dispatcher import Dispatcher\n\n self.dispatcher = Dispatcher()\n else:\n self.dispatcher = dispatcher\n self.dispatcher.set_scan(self)\n\n # scope distance\n self.scope_config = self.config.get(\"scope\", {})\n self.scope_search_distance = max(0, int(self.scope_config.get(\"search_distance\", 0)))\n self.scope_report_distance = int(self.scope_config.get(\"report_distance\", 1))\n\n # web config\n self.web_config = self.config.get(\"web\", {})\n self.web_spider_distance = self.web_config.get(\"spider_distance\", 0)\n self.web_spider_depth = self.web_config.get(\"spider_depth\", 1)\n self.web_spider_links_per_page = self.web_config.get(\"spider_links_per_page\", 20)\n max_redirects = self.web_config.get(\"http_max_redirects\", 5)\n self.web_max_redirects = max(max_redirects, self.web_spider_distance)\n self.http_proxy = self.web_config.get(\"http_proxy\", \"\")\n self.http_timeout = self.web_config.get(\"http_timeout\", 10)\n self.httpx_timeout = self.web_config.get(\"httpx_timeout\", 5)\n self.http_retries = self.web_config.get(\"http_retries\", 1)\n self.httpx_retries = self.web_config.get(\"httpx_retries\", 1)\n self.useragent = self.web_config.get(\"user_agent\", \"BBOT\")\n # custom HTTP headers warning\n self.custom_http_headers = self.web_config.get(\"http_headers\", {})\n if self.custom_http_headers:\n self.warning(\n \"You have enabled custom HTTP headers. 
These will be attached to all in-scope requests and all requests made by httpx.\"\n )\n\n # url file extensions\n self.url_extension_blacklist = set(e.lower() for e in self.config.get(\"url_extension_blacklist\", []))\n self.url_extension_httpx_only = set(e.lower() for e in self.config.get(\"url_extension_httpx_only\", []))\n\n # url querystring behavior\n self.url_querystring_remove = self.config.get(\"url_querystring_remove\", True)\n\n # blob inclusion\n self._file_blobs = self.config.get(\"file_blobs\", False)\n self._folder_blobs = self.config.get(\"folder_blobs\", False)\n\n # how often to print scan status\n self.status_frequency = self.config.get(\"status_frequency\", 15)\n\n from .stats import ScanStats\n\n self.stats = ScanStats(self)\n\n self._prepped = False\n self._finished_init = False\n self._new_activity = False\n self._cleanedup = False\n self._omitted_event_types = None\n\n self.__loop = None\n self._manager_worker_loop_tasks = []\n self.init_events_task = None\n self.ticker_task = None\n self.dispatcher_tasks = []\n\n self._stopping = False\n\n self._dns_strings = None\n self._dns_regexes = None\n\n self.__log_handlers = None\n self._log_handler_backup = []\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.async_start","title":"async_start async
","text":"async_start()\n
Source code in bbot/scanner/scanner.py
async def async_start(self):\n \"\"\" \"\"\"\n failed = True\n scan_start_time = datetime.now()\n try:\n await self._prep()\n\n self._start_log_handlers()\n self.trace(f'Ran BBOT {__version__} at {scan_start_time}, command: {\" \".join(sys.argv)}')\n self.trace(f\"Target: {self.preset.target.json}\")\n self.trace(f\"Preset: {self.preset.to_dict(redact_secrets=True)}\")\n\n if not self.target:\n self.warning(f\"No scan targets specified\")\n\n # start status ticker\n self.ticker_task = asyncio.create_task(\n self._status_ticker(self.status_frequency), name=f\"{self.name}._status_ticker()\"\n )\n\n self.status = \"STARTING\"\n\n if not self.modules:\n self.error(f\"No modules loaded\")\n self.status = \"FAILED\"\n return\n else:\n self.hugesuccess(f\"Starting scan {self.name}\")\n\n await self.dispatcher.on_start(self)\n\n self.status = \"RUNNING\"\n self._start_modules()\n self.verbose(f\"{len(self.modules):,} modules started\")\n\n # distribute seed events\n self.init_events_task = asyncio.create_task(\n self.ingress_module.init_events(self.target.events), name=f\"{self.name}.ingress_module.init_events()\"\n )\n\n # main scan loop\n while 1:\n # abort if we're aborting\n if self.aborting:\n self._drain_queues()\n break\n\n # yield events as they come (async for event in scan.async_start())\n if \"python\" in self.modules:\n events, finish = await self.modules[\"python\"]._events_waiting(batch_size=-1)\n for e in events:\n yield e\n if events:\n continue\n\n # break if initialization finished and the scan is no longer active\n if self._finished_init and self.modules_finished:\n new_activity = await self.finish()\n if not new_activity:\n break\n\n await asyncio.sleep(0.1)\n\n failed = False\n\n except BaseException as e:\n if self.helpers.in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n self.stop()\n failed = False\n else:\n try:\n raise\n except ScanError as e:\n self.error(f\"{e}\")\n\n except BBOTError as e:\n self.critical(f\"Error during 
scan: {e}\")\n\n except Exception:\n self.critical(f\"Unexpected error during scan:\\n{traceback.format_exc()}\")\n\n finally:\n tasks = self._cancel_tasks()\n self.debug(f\"Awaiting {len(tasks):,} tasks\")\n for task in tasks:\n # self.debug(f\"Awaiting {task}\")\n with contextlib.suppress(BaseException):\n await asyncio.wait_for(task, timeout=0.1)\n self.debug(f\"Awaited {len(tasks):,} tasks\")\n await self._report()\n await self._cleanup()\n\n log_fn = self.hugesuccess\n if self.status == \"ABORTING\":\n self.status = \"ABORTED\"\n log_fn = self.hugewarning\n elif failed:\n self.status = \"FAILED\"\n log_fn = self.critical\n else:\n self.status = \"FINISHED\"\n\n scan_run_time = datetime.now() - scan_start_time\n scan_run_time = self.helpers.human_timedelta(scan_run_time)\n log_fn(f\"Scan {self.name} completed in {scan_run_time} with status {self.status}\")\n\n await self.dispatcher.on_finish(self)\n\n self._stop_log_handlers()\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.finish","title":"finish async
","text":"finish()\n
Finalizes the scan by invoking the finished()
method on all active modules if new activity is detected.
The method is idempotent and will return False if no new activity has been recorded since the last invocation.
Returns:
bool
\u2013 True if new activity has been detected and the finished()
method is invoked on all modules. False if no new activity has been detected since the last invocation.
This method alters the scan's status to \"FINISHING\" if new activity is detected.
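The new-activity check behaves like a flag that is consumed on each call. A toy model of that behavior, where MiniScan is a hypothetical stand-in for the scanner and covers only the flag handling:

```python
class MiniScan:
    # Toy stand-in for the scanner; it only models the new-activity flag.
    def __init__(self):
        self._new_activity = False
        self.status = 'RUNNING'
        self.finish_rounds = 0

    def record_event(self):
        # Any new event marks the scan as having fresh activity.
        self._new_activity = True

    def finish(self):
        # Consume the flag: report True once per burst of activity.
        if self._new_activity:
            self._new_activity = False
            self.status = 'FINISHING'
            self.finish_rounds += 1
            return True
        return False

scan = MiniScan()
scan.record_event()
print(scan.finish())  # True
print(scan.finish())  # False
```

The second call returns False because nothing was recorded in between, which is the condition the main scan loop uses to exit.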
Source code in bbot/scanner/scanner.py
async def finish(self):\n \"\"\"Finalizes the scan by invoking the `finished()` method on all active modules if new activity is detected.\n\n The method is idempotent and will return False if no new activity has been recorded since the last invocation.\n\n Returns:\n bool: True if new activity has been detected and the `finished()` method is invoked on all modules.\n False if no new activity has been detected since the last invocation.\n\n Notes:\n This method alters the scan's status to \"FINISHING\" if new activity is detected.\n \"\"\"\n # if new events were generated since last time we were here\n if self._new_activity:\n self._new_activity = False\n self.status = \"FINISHING\"\n # Trigger .finished() on every module and start over\n log.info(\"Finishing scan\")\n for module in self.modules.values():\n finished_event = self.make_event(f\"FINISHED\", \"FINISHED\", dummy=True, tags={module.name})\n await module.queue_event(finished_event)\n self.verbose(\"Completed finish()\")\n return True\n # Return False if no new events were generated since last time\n self.verbose(\"Completed final finish()\")\n return False\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.load_modules","title":"load_modules async
","text":"load_modules()\n
Asynchronously import and instantiate all scan modules, including internal and output modules.
This method is automatically invoked by setup_modules()
. It performs several key tasks in the following sequence:
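1. Install dependencies for each module via self.helpers.depsinstaller.install().
2. Load scan modules and update the modules dictionary.
3. Load internal modules and update the modules dictionary.
4. Load output modules and update the modules dictionary.
5. Sort modules based on their _priority attribute.
If any modules fail to load or their dependencies fail to install, a ScanError will be raised (unless self.force_start is True).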
Attributes:
succeeded, failed
(tuple
) \u2013 A tuple containing lists of modules that succeeded or failed during the dependency installation.
loaded_modules, loaded_internal_modules, loaded_output_modules
(dict
) \u2013 Dictionaries of successfully loaded modules.
failed, failed_internal, failed_output
(list
) \u2013 Lists of module names that failed to load.
Raises:
ScanError
\u2013 If any module dependencies fail to install or modules fail to load, and if self.force_start
is False.
Returns:
None
After all modules are loaded, they are sorted by _priority
and stored in the modules
dictionary.
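The final sorting step can be illustrated on its own. In the source listing the sort key reads a priority attribute and falls back to 3 when a module does not define one. The sketch below reproduces that step with a hypothetical FakeModule class and made-up module names and priorities.

```python
from collections import OrderedDict

class FakeModule:
    # Hypothetical module stand-in; priority is only set when provided.
    def __init__(self, name, priority=None):
        self.name = name
        if priority is not None:
            self.priority = priority

modules = OrderedDict()
for m in (FakeModule('httpx', 2), FakeModule('json'), FakeModule('ingress', 1)):
    modules[m.name] = m

# Lower numbers sort first; modules without a priority are treated as 3.
modules = OrderedDict(
    sorted(modules.items(), key=lambda item: getattr(item[1], 'priority', 3))
)
print(list(modules))
```

Because sorted() is stable, modules that share a priority keep their original load order.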
bbot/scanner/scanner.py
async def load_modules(self):\n \"\"\"Asynchronously import and instantiate all scan modules, including internal and output modules.\n\n This method is automatically invoked by `setup_modules()`. It performs several key tasks in the following sequence:\n\n 1. Install dependencies for each module via `self.helpers.depsinstaller.install()`.\n 2. Load scan modules and updates the `modules` dictionary.\n 3. Load internal modules and updates the `modules` dictionary.\n 4. Load output modules and updates the `modules` dictionary.\n 5. Sorts modules based on their `_priority` attribute.\n\n If any modules fail to load or their dependencies fail to install, a ScanError will be raised (unless `self.force_start` is True).\n\n Attributes:\n succeeded, failed (tuple): A tuple containing lists of modules that succeeded or failed during the dependency installation.\n loaded_modules, loaded_internal_modules, loaded_output_modules (dict): Dictionaries of successfully loaded modules.\n failed, failed_internal, failed_output (list): Lists of module names that failed to load.\n\n Raises:\n ScanError: If any module dependencies fail to install or modules fail to load, and if `self.force_start` is False.\n\n Returns:\n None\n\n Note:\n After all modules are loaded, they are sorted by `_priority` and stored in the `modules` dictionary.\n \"\"\"\n if not self._modules_loaded:\n if not self.preset.modules:\n self.warning(f\"No modules to load\")\n return\n\n if not self.preset.scan_modules:\n self.warning(f\"No scan modules to load\")\n\n # install module dependencies\n succeeded, failed = await self.helpers.depsinstaller.install(*self.preset.modules)\n if failed:\n msg = f\"Failed to install dependencies for {len(failed):,} modules: {','.join(failed)}\"\n self._fail_setup(msg)\n modules = sorted([m for m in self.preset.scan_modules if m in succeeded])\n output_modules = sorted([m for m in self.preset.output_modules if m in succeeded])\n internal_modules = sorted([m for m in 
self.preset.internal_modules if m in succeeded])\n\n # Load scan modules\n self.verbose(f\"Loading {len(modules):,} scan modules: {','.join(modules)}\")\n loaded_modules, failed = self._load_modules(modules)\n self.modules.update(loaded_modules)\n if len(failed) > 0:\n msg = f\"Failed to load {len(failed):,} scan modules: {','.join(failed)}\"\n self._fail_setup(msg)\n if loaded_modules:\n self.info(\n f\"Loaded {len(loaded_modules):,}/{len(self.preset.scan_modules):,} scan modules ({','.join(loaded_modules)})\"\n )\n\n # Load internal modules\n self.verbose(f\"Loading {len(internal_modules):,} internal modules: {','.join(internal_modules)}\")\n loaded_internal_modules, failed_internal = self._load_modules(internal_modules)\n self.modules.update(loaded_internal_modules)\n if len(failed_internal) > 0:\n msg = f\"Failed to load {len(failed_internal):,} internal modules: {','.join(failed_internal)}\"\n self._fail_setup(msg)\n if loaded_internal_modules:\n self.info(\n f\"Loaded {len(loaded_internal_modules):,}/{len(self.preset.internal_modules):,} internal modules ({','.join(loaded_internal_modules)})\"\n )\n\n # Load output modules\n self.verbose(f\"Loading {len(output_modules):,} output modules: {','.join(output_modules)}\")\n loaded_output_modules, failed_output = self._load_modules(output_modules)\n self.modules.update(loaded_output_modules)\n if len(failed_output) > 0:\n msg = f\"Failed to load {len(failed_output):,} output modules: {','.join(failed_output)}\"\n self._fail_setup(msg)\n if loaded_output_modules:\n self.info(\n f\"Loaded {len(loaded_output_modules):,}/{len(self.preset.output_modules):,} output modules, ({','.join(loaded_output_modules)})\"\n )\n\n # builtin intercept modules\n self.ingress_module = ScanIngress(self)\n self.egress_module = ScanEgress(self)\n self.modules[self.ingress_module.name] = self.ingress_module\n self.modules[self.egress_module.name] = self.egress_module\n\n # sort modules by priority\n self.modules = 
OrderedDict(sorted(self.modules.items(), key=lambda x: getattr(x[-1], \"priority\", 3)))\n\n self._modules_loaded = True\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.setup_modules","title":"setup_modules async
","text":"setup_modules(remove_failed=True)\n
Asynchronously initializes all loaded modules by invoking their setup()
methods.
Parameters:
remove_failed
(bool
, default: True
) \u2013 Flag indicating whether to remove modules that fail setup.
Returns:
tuple
\u2013 succeeded - List of modules that successfully set up. hard_failed - List of modules that encountered a hard failure during setup. soft_failed - List of modules that encountered a soft failure during setup.
Raises:
ScanError
\u2013 If no output modules could be loaded.
Hard-failed modules are set to an error state and removed if remove_failed
is True. Soft-failed modules are not set to an error state but are also removed if remove_failed
is True.
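The three-way outcome (success, hard failure, soft failure) maps onto a setup status of True, False, or None. The following is a simplified sketch of that classification using plain asyncio. fake_setup, setup_all, and the module names are hypothetical, and the special handling of intercept modules is left out.

```python
import asyncio

async def fake_setup(name, status, msg):
    # Stand-in for a module's setup coroutine (hypothetical).
    await asyncio.sleep(0)
    return name, status, msg

async def setup_all(specs, remove_failed=True):
    modules = {name: None for name, _, _ in specs}
    succeeded, hard_failed, soft_failed = [], [], []
    tasks = [asyncio.ensure_future(fake_setup(*spec)) for spec in specs]
    for task in asyncio.as_completed(tasks):
        name, status, msg = await task
        if status is True:
            succeeded.append(name)
        elif status is False:
            hard_failed.append(name)   # hard failure
        else:
            soft_failed.append(name)   # None means soft failure
        if not status and remove_failed:
            modules.pop(name)          # drop anything that did not succeed
    return sorted(succeeded), sorted(hard_failed), sorted(soft_failed), sorted(modules)

specs = [('alpha', True, 'ok'), ('beta', None, 'no API key'), ('gamma', False, 'binary missing')]
result = asyncio.run(setup_all(specs))
print(result)
```

Both False and None are falsy, so a single check removes hard-failed and soft-failed modules alike when remove_failed is set.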
bbot/scanner/scanner.py
async def setup_modules(self, remove_failed=True):\n \"\"\"Asynchronously initializes all loaded modules by invoking their `setup()` methods.\n\n Args:\n remove_failed (bool): Flag indicating whether to remove modules that fail setup.\n\n Returns:\n tuple:\n succeeded - List of modules that successfully set up.\n hard_failed - List of modules that encountered a hard failure during setup.\n soft_failed - List of modules that encountered a soft failure during setup.\n\n Raises:\n ScanError: If no output modules could be loaded.\n\n Notes:\n Hard-failed modules are set to an error state and removed if `remove_failed` is True.\n Soft-failed modules are not set to an error state but are also removed if `remove_failed` is True.\n \"\"\"\n await self.load_modules()\n self.verbose(f\"Setting up modules\")\n succeeded = []\n hard_failed = []\n soft_failed = []\n\n async for task in self.helpers.as_completed([m._setup() for m in self.modules.values()]):\n module, status, msg = await task\n if status == True:\n self.debug(f\"Setup succeeded for {module.name} ({msg})\")\n succeeded.append(module.name)\n elif status == False:\n self.warning(f\"Setup hard-failed for {module.name}: {msg}\")\n self.modules[module.name].set_error_state()\n hard_failed.append(module.name)\n else:\n self.info(f\"Setup soft-failed for {module.name}: {msg}\")\n soft_failed.append(module.name)\n if (not status) and (module._intercept or remove_failed):\n # if a intercept module fails setup, we always remove it\n self.modules.pop(module.name)\n\n return succeeded, hard_failed, soft_failed\n
"},{"location":"dev/scanner/#bbot.scanner.Scanner.stop","title":"stop","text":"stop()\n
Stops the in-progress scan and performs necessary cleanup.
This method sets the scan's status to \"ABORTING,\" cancels any pending tasks, and drains event queues. It also kills child processes spawned during the scan.
Returns:
None
bbot/scanner/scanner.py
def stop(self):\n \"\"\"Stops the in-progress scan and performs necessary cleanup.\n\n This method sets the scan's status to \"ABORTING,\" cancels any pending tasks, and drains event queues. It also kills child processes spawned during the scan.\n\n Returns:\n None\n \"\"\"\n if not self._stopping:\n self._stopping = True\n self.status = \"ABORTING\"\n self.hugewarning(\"Aborting scan\")\n self.trace()\n self._cancel_tasks()\n self._drain_queues()\n self.helpers.kill_children()\n self._drain_queues()\n self.helpers.kill_children()\n self.debug(\"Finished aborting scan\")\n
"},{"location":"dev/target/","title":"Target","text":""},{"location":"dev/target/#bbot.scanner.target.Target","title":"Target","text":"A class representing a target. Can contain an unlimited number of hosts, IP or IP ranges, URLs, etc.
Attributes:
strict_scope
(bool
) \u2013 Flag indicating whether to consider child domains in-scope. If set to True, only the exact hosts specified and not their children are considered part of the target.
_radix
(RadixTree
) \u2013 Radix tree for quick IP/DNS lookups.
_events
(set
) \u2013 Flat set of contained events.
Examples:
Basic usage
>>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n>>> len(target)\n257\n>>> list(target.events)\n[\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'}),\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n]\n>>> \"www.evilcorp.com\" in target\nTrue\n>>> \"1.2.3.4\" in target\nTrue\n>>> \"4.3.2.1\" in target\nFalse\n>>> \"https://admin.evilcorp.com\" in target\nTrue\n>>> \"bob@evilcorp.com\" in target\nTrue\n
Event correlation
>>> target.get(\"www.evilcorp.com\")\nDNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n>>> target.get(\"1.2.3.4\")\nIP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n
Target comparison
>>> target2 = Target(scan, \"www.evilcorp.com\")\n>>> target2 == target\nFalse\n>>> target2 in target\nTrue\n>>> target in target2\nFalse\n
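Notes:
Targets are only precise down to the individual host. Ports and protocols are not considered in scope calculations.
If you specify \"https://evilcorp.com:8443\" as a target, all of evilcorp.com (including subdomains and other ports and protocols) will be considered part of the target.
If you do not want to include child subdomains, use strict_scope=True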
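The host counting and the effect of strict_scope can be modeled with the standard library alone. This is a deliberately simplified sketch and not BBOT's radix-tree implementation: host_count and in_scope are hypothetical helpers, and only DNS names and IP networks are handled.

```python
import ipaddress

def host_count(targets):
    # Networks count every address; anything else counts as one host.
    total = 0
    for t in targets:
        try:
            total += ipaddress.ip_network(t, strict=False).num_addresses
        except ValueError:
            total += 1
    return total

def in_scope(host, target_domains, strict_scope=False):
    # Exact matches are always in scope; children only when not strict.
    host = host.lower()
    for domain in target_domains:
        if host == domain:
            return True
        if not strict_scope and host.endswith('.' + domain):
            return True
    return False

print(host_count(['evilcorp.com', '1.2.3.0/24']))
print(in_scope('www.evilcorp.com', ['evilcorp.com']))
print(in_scope('www.evilcorp.com', ['evilcorp.com'], strict_scope=True))
```

The count comes to 257 here for the same reason as in the len(target) example: one DNS name plus the 256 addresses in the /24.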
bbot/scanner/target.py
class Target:\n \"\"\"\n A class representing a target. Can contain an unlimited number of hosts, IP or IP ranges, URLs, etc.\n\n Attributes:\n strict_scope (bool): Flag indicating whether to consider child domains in-scope.\n If set to True, only the exact hosts specified and not their children are considered part of the target.\n\n _radix (RadixTree): Radix tree for quick IP/DNS lookups.\n _events (set): Flat set of contained events.\n\n Examples:\n Basic usage\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> len(target)\n 257\n >>> list(target.events)\n [\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'}),\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n ]\n >>> \"www.evilcorp.com\" in target\n True\n >>> \"1.2.3.4\" in target\n True\n >>> \"4.3.2.1\" in target\n False\n >>> \"https://admin.evilcorp.com\" in target\n True\n >>> \"bob@evilcorp.com\" in target\n True\n\n Event correlation\n >>> target.get(\"www.evilcorp.com\")\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n >>> target.get(\"1.2.3.4\")\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n\n Target comparison\n >>> target2 = Target(scan, \"www.evilcorp.com\")\n >>> target2 == target\n False\n >>> target2 in target\n True\n >>> target in target2\n False\n\n Notes:\n - Targets are only precise down to the individual host. 
Ports and protocols are not considered in scope calculations.\n - If you specify \"https://evilcorp.com:8443\" as a target, all of evilcorp.com (including subdomains and other ports and protocols) will be considered part of the target\n - If you do not want to include child subdomains, use `strict_scope=True`\n \"\"\"\n\n def __init__(self, *targets, strict_scope=False, scan=None, acl_mode=False):\n \"\"\"\n Initialize a Target object.\n\n Args:\n *targets: One or more targets (e.g., domain names, IP ranges) to be included in this Target.\n strict_scope (bool): Whether to consider subdomains of target domains in-scope\n scan (Scan): Reference to the Scan object that instantiated the Target.\n acl_mode (bool): Stricter deduplication for more efficient checks\n\n Notes:\n - If you are instantiating a target from within a BBOT module, use `self.helpers.make_target()` instead. (this removes the need to pass in a scan object.)\n - The strict_scope flag can be set to restrict scope calculation to only exactly-matching hosts and not their child subdomains.\n - Each target is processed and stored as an `Event` in the '_events' dictionary.\n \"\"\"\n self.scan = scan\n self.strict_scope = strict_scope\n self.acl_mode = acl_mode\n self.special_event_types = {\n \"ORG_STUB\": re.compile(r\"^(?:ORG|ORG_STUB):(.*)\", re.IGNORECASE),\n \"USERNAME\": re.compile(r\"^(?:USER|USERNAME):(.*)\", re.IGNORECASE),\n }\n self._events = set()\n self._radix = RadixTarget()\n\n for target_event in self._make_events(targets):\n self._add_event(target_event)\n\n self._hash = None\n\n def add(self, t, event_type=None):\n \"\"\"\n Add a target or merge events from another Target object into this Target.\n\n Args:\n t: The target to be added. 
It can be either a string, an event object, or another Target object.\n\n Attributes Modified:\n _events (dict): The dictionary is updated to include the new target's events.\n\n Examples:\n >>> target.add('example.com')\n\n Notes:\n - If `t` is of the same class as this Target, all its events are merged.\n - If `t` is an event, it is directly added to `_events`.\n \"\"\"\n if not isinstance(t, (list, tuple, set)):\n t = [t]\n for single_target in t:\n if isinstance(single_target, self.__class__):\n for event in single_target.events:\n self._add_event(event)\n else:\n if is_event(single_target):\n event = single_target\n else:\n try:\n event = make_event(\n single_target, event_type=event_type, dummy=True, tags=[\"target\"], scan=self.scan\n )\n except ValidationError as e:\n # allow commented lines\n if not str(t).startswith(\"#\"):\n log.trace(traceback.format_exc())\n raise ValidationError(f'Could not add target \"{t}\": {e}')\n self._add_event(event)\n\n @property\n def events(self):\n \"\"\"\n Returns all events in the target.\n\n Yields:\n Event object: One of the Event objects stored in the `_events` dictionary.\n\n Examples:\n >>> target = Target(scan, \"example.com\")\n >>> for event in target.events:\n ... 
print(event)\n\n Notes:\n - This property is read-only.\n \"\"\"\n return self._events\n\n @property\n def hosts(self):\n return [e.host for e in self.events]\n\n def copy(self):\n \"\"\"\n Creates and returns a copy of the Target object, including a shallow copy of the `_events` and `_radix` attributes.\n\n Returns:\n Target: A new Target object with the sameattributes as the original.\n A shallow copy of the `_events` dictionary is made.\n\n Examples:\n >>> original_target = Target(scan, \"example.com\")\n >>> copied_target = original_target.copy()\n >>> copied_target is original_target\n False\n >>> copied_target == original_target\n True\n >>> copied_target in original_target\n True\n >>> original_target in copied_target\n True\n\n Notes:\n - The `scan` object reference is kept intact in the copied Target object.\n \"\"\"\n self_copy = self.__class__()\n self_copy._events = set(self._events)\n self_copy._radix = copy.copy(self._radix)\n return self_copy\n\n def get(self, host, single=True):\n \"\"\"\n Gets the event associated with the specified host from the target's radix tree.\n\n Args:\n host (Event, Target, or str): The hostname, IP, URL, or event to look for.\n single (bool): Whether to return a single event. 
If False, return all events matching the host\n\n Returns:\n Event or None: Returns the Event object associated with the given host if it exists, otherwise returns None.\n\n Examples:\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> target.get(\"www.evilcorp.com\")\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n >>> target.get(\"1.2.3.4\")\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n\n Notes:\n - The method returns the first event that matches the given host.\n - If `strict_scope` is False, it will also consider parent domains and IP ranges.\n \"\"\"\n try:\n event = make_event(host, dummy=True)\n except ValidationError:\n return\n if event.host:\n return self.get_host(event.host, single=single)\n\n def get_host(self, host, single=True):\n \"\"\"\n A more efficient version of .get() that only accepts hostnames and IP addresses\n \"\"\"\n host = make_ip_type(host)\n with suppress(KeyError, StopIteration):\n result = self._radix.search(host)\n if result is not None:\n ret = set()\n for event in result:\n # if the result is a dns name and strict scope is enabled\n if isinstance(event.host, str) and self.strict_scope:\n # if the result doesn't exactly equal the host, abort\n if event.host != host:\n return\n if single:\n return event\n else:\n ret.add(event)\n if ret and not single:\n return ret\n\n def _sort_events(self, events):\n return sorted(events, key=lambda x: x._host_size)\n\n def _make_events(self, targets):\n events = []\n for target in targets:\n event_type = None\n for eventtype, regex in self.special_event_types.items():\n if isinstance(target, str):\n match = regex.match(target)\n if match:\n target = match.groups()[0]\n event_type = eventtype\n break\n events.append(make_event(target, event_type=event_type, dummy=True, scan=self.scan))\n return self._sort_events(events)\n\n def _add_event(self, event):\n skip = False\n if event.host:\n radix_data = 
self._radix.search(event.host)\n if self.acl_mode:\n # skip if the hostname/IP/subnet (or its parent) has already been added\n if radix_data is not None and not self.strict_scope:\n skip = True\n else:\n event_type = \"IP_RANGE\" if event.type == \"IP_RANGE\" else \"DNS_NAME\"\n event = make_event(event.host, event_type=event_type, dummy=True, scan=self.scan)\n if not skip:\n # if strict scope is enabled and it's not an exact host match, we add a whole new entry\n if radix_data is None or (self.strict_scope and event.host not in radix_data):\n radix_data = {event}\n self._radix.insert(event.host, radix_data)\n # otherwise, we add the event to the set\n else:\n radix_data.add(event)\n # clear hash\n self._hash = None\n elif self.acl_mode and not self.strict_scope:\n # skip if we're in ACL mode and there's no host\n skip = True\n if not skip:\n self._events.add(event)\n\n def _contains(self, other):\n if self.get(other) is not None:\n return True\n return False\n\n def __str__(self):\n return \",\".join([str(e.data) for e in self.events][:5])\n\n def __iter__(self):\n yield from self.events\n\n def __contains__(self, other):\n # if \"other\" is a Target\n if isinstance(other, self.__class__):\n contained_in_self = [self._contains(e) for e in other.events]\n return all(contained_in_self)\n else:\n return self._contains(other)\n\n def __bool__(self):\n return bool(self._events)\n\n def __eq__(self, other):\n return self.hash == other.hash\n\n @property\n def hash(self):\n if self._hash is None:\n # Create a new SHA-1 hash object\n sha1_hash = sha1()\n # Update the SHA-1 object with the hash values of each object\n for event_type, event_hash in sorted([(e.type.encode(), e.data_hash) for e in self.events]):\n sha1_hash.update(event_type)\n sha1_hash.update(event_hash)\n if self.strict_scope:\n sha1_hash.update(b\"\\x00\")\n self._hash = sha1_hash.digest()\n return self._hash\n\n def __len__(self):\n \"\"\"\n Calculates and returns the total number of hosts within this 
target, not counting duplicate events.\n\n Returns:\n int: The total number of unique hosts present within the target's `_events`.\n\n Examples:\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> len(target)\n 257\n\n Notes:\n - If a host is represented as an IP network, all individual IP addresses in that network are counted.\n - For other types of hosts, each unique event is counted as one.\n \"\"\"\n num_hosts = 0\n for event in self._events:\n if isinstance(event.host, (ipaddress.IPv4Network, ipaddress.IPv6Network)):\n num_hosts += event.host.num_addresses\n else:\n num_hosts += 1\n return num_hosts\n
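As a rough illustration of the counting rule in __len__, the sketch below uses only the standard ipaddress module. The count_hosts helper is hypothetical and works on plain strings rather than BBOT Event objects, so it is an approximation of the behaviour described above, not the library's implementation.

```python
import ipaddress


def count_hosts(targets):
    # Hypothetical helper: mirrors the counting rule described above,
    # but operates on plain strings instead of BBOT Event objects.
    total = 0
    for target in set(targets):
        try:
            network = ipaddress.ip_network(target, strict=False)
        except ValueError:
            # Not an IP or network (e.g. a DNS name): counts as one host.
            total += 1
            continue
        # A network contributes every address it contains.
        total += network.num_addresses
    return total


print(count_hosts(['evilcorp.com', '1.2.3.0/24']))
```

One DNS name plus a /24 network gives 1 + 256 hosts, which matches the len(target) example in the docstring.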
"},{"location":"dev/target/#bbot.scanner.target.Target.events","title":"events property
","text":"events\n
Returns all events in the target.
Yields:
Event object: One of the Event objects stored in the _events
dictionary.
Examples:
>>> target = Target(scan, \"example.com\")\n>>> for event in target.events:\n... print(event)\n
Notes:
- This property is read-only.
__init__(*targets, strict_scope=False, scan=None, acl_mode=False)\n
Initialize a Target object.
Parameters:
*targets
\u2013 One or more targets (e.g., domain names, IP ranges) to be included in this Target.
strict_scope
(bool
, default: False
) \u2013 If True, only exactly-matching hosts are in scope; subdomains of target domains are not.
scan
(Scan
, default: None
) \u2013 Reference to the Scan object that instantiated the Target.
acl_mode
(bool
, default: False
) \u2013 Stricter deduplication for more efficient checks
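Notes:
- If you are instantiating a target from within a BBOT module, use self.helpers.make_target() instead. (This removes the need to pass in a scan object.)
- The strict_scope flag can be set to restrict scope calculation to only exactly-matching hosts and not their child subdomains.
- Each target is processed and stored as an Event in the '_events' dictionary.
Source code in bbot/scanner/target.py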
def __init__(self, *targets, strict_scope=False, scan=None, acl_mode=False):\n \"\"\"\n Initialize a Target object.\n\n Args:\n *targets: One or more targets (e.g., domain names, IP ranges) to be included in this Target.\n strict_scope (bool): Whether to consider subdomains of target domains in-scope\n scan (Scan): Reference to the Scan object that instantiated the Target.\n acl_mode (bool): Stricter deduplication for more efficient checks\n\n Notes:\n - If you are instantiating a target from within a BBOT module, use `self.helpers.make_target()` instead. (this removes the need to pass in a scan object.)\n - The strict_scope flag can be set to restrict scope calculation to only exactly-matching hosts and not their child subdomains.\n - Each target is processed and stored as an `Event` in the '_events' dictionary.\n \"\"\"\n self.scan = scan\n self.strict_scope = strict_scope\n self.acl_mode = acl_mode\n self.special_event_types = {\n \"ORG_STUB\": re.compile(r\"^(?:ORG|ORG_STUB):(.*)\", re.IGNORECASE),\n \"USERNAME\": re.compile(r\"^(?:USER|USERNAME):(.*)\", re.IGNORECASE),\n }\n self._events = set()\n self._radix = RadixTarget()\n\n for target_event in self._make_events(targets):\n self._add_event(target_event)\n\n self._hash = None\n
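The special_event_types patterns in __init__ decide whether a target string such as ORG:evilcorp is treated as an ORG_STUB or USERNAME rather than a host. The sketch below reuses those two patterns as they appear in the source above; the classify_target function is a hypothetical stand-in for the matching loop in _make_events and returns plain tuples instead of events.

```python
import re

# Patterns copied from Target.__init__ above.
SPECIAL_EVENT_TYPES = {
    'ORG_STUB': re.compile(r'^(?:ORG|ORG_STUB):(.*)', re.IGNORECASE),
    'USERNAME': re.compile(r'^(?:USER|USERNAME):(.*)', re.IGNORECASE),
}


def classify_target(target):
    # Hypothetical helper: returns (event_type, value). An event_type of
    # None means no special prefix matched and the string is left as-is.
    for event_type, regex in SPECIAL_EVENT_TYPES.items():
        match = regex.match(target)
        if match:
            return event_type, match.groups()[0]
    return None, target


for example in ['org:evilcorp', 'USERNAME:bob', 'evilcorp.com']:
    print(example, classify_target(example))
```

Because the patterns are compiled with re.IGNORECASE, the prefix can be written in any case.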
"},{"location":"dev/target/#bbot.scanner.target.Target.add","title":"add","text":"add(t, event_type=None)\n
Add a target or merge events from another Target object into this Target.
Parameters:
t
\u2013 The target to be added. It can be either a string, an event object, or another Target object.
Attributes Modified:
_events (dict): The dictionary is updated to include the new target's events.
Examples:
>>> target.add('example.com')\n
Notes:
- If t is of the same class as this Target, all its events are merged.
- If t is an event, it is directly added to _events.
Source code in bbot/scanner/target.py
def add(self, t, event_type=None):\n \"\"\"\n Add a target or merge events from another Target object into this Target.\n\n Args:\n t: The target to be added. It can be either a string, an event object, or another Target object.\n\n Attributes Modified:\n _events (dict): The dictionary is updated to include the new target's events.\n\n Examples:\n >>> target.add('example.com')\n\n Notes:\n - If `t` is of the same class as this Target, all its events are merged.\n - If `t` is an event, it is directly added to `_events`.\n \"\"\"\n if not isinstance(t, (list, tuple, set)):\n t = [t]\n for single_target in t:\n if isinstance(single_target, self.__class__):\n for event in single_target.events:\n self._add_event(event)\n else:\n if is_event(single_target):\n event = single_target\n else:\n try:\n event = make_event(\n single_target, event_type=event_type, dummy=True, tags=[\"target\"], scan=self.scan\n )\n except ValidationError as e:\n # allow commented lines\n if not str(t).startswith(\"#\"):\n log.trace(traceback.format_exc())\n raise ValidationError(f'Could not add target \"{t}\": {e}')\n self._add_event(event)\n
"},{"location":"dev/target/#bbot.scanner.target.Target.copy","title":"copy","text":"copy()\n
Creates and returns a copy of the Target object, including a shallow copy of the _events
and _radix
attributes.
Returns:
Target
\u2013 A new Target object with the same attributes as the original. A shallow copy of the _events
dictionary is made.
Examples:
>>> original_target = Target(scan, \"example.com\")\n>>> copied_target = original_target.copy()\n>>> copied_target is original_target\nFalse\n>>> copied_target == original_target\nTrue\n>>> copied_target in original_target\nTrue\n>>> original_target in copied_target\nTrue\n
Notes:
- The scan object reference is kept intact in the copied Target object.
Source code in bbot/scanner/target.py
def copy(self):\n \"\"\"\n Creates and returns a copy of the Target object, including a shallow copy of the `_events` and `_radix` attributes.\n\n Returns:\n Target: A new Target object with the same attributes as the original.\n A shallow copy of the `_events` dictionary is made.\n\n Examples:\n >>> original_target = Target(scan, \"example.com\")\n >>> copied_target = original_target.copy()\n >>> copied_target is original_target\n False\n >>> copied_target == original_target\n True\n >>> copied_target in original_target\n True\n >>> original_target in copied_target\n True\n\n Notes:\n - The `scan` object reference is kept intact in the copied Target object.\n \"\"\"\n self_copy = self.__class__()\n self_copy._events = set(self._events)\n self_copy._radix = copy.copy(self._radix)\n return self_copy\n
"},{"location":"dev/target/#bbot.scanner.target.Target.get","title":"get","text":"get(host, single=True)\n
Gets the event associated with the specified host from the target's radix tree.
Parameters:
host
(Event, Target, or str
) \u2013 The hostname, IP, URL, or event to look for.
single
(bool
, default: True
) \u2013 Whether to return a single event. If False, return all events matching the host
Returns:
Event or None: Returns the Event object associated with the given host if it exists, otherwise returns None.
Examples:
>>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n>>> target.get(\"www.evilcorp.com\")\nDNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n>>> target.get(\"1.2.3.4\")\nIP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n
Notes:
- The method returns the first event that matches the given host.
- If strict_scope is False, it will also consider parent domains and IP ranges.
Source code in bbot/scanner/target.py
def get(self, host, single=True):\n \"\"\"\n Gets the event associated with the specified host from the target's radix tree.\n\n Args:\n host (Event, Target, or str): The hostname, IP, URL, or event to look for.\n single (bool): Whether to return a single event. If False, return all events matching the host\n\n Returns:\n Event or None: Returns the Event object associated with the given host if it exists, otherwise returns None.\n\n Examples:\n >>> target = Target(scan, \"evilcorp.com\", \"1.2.3.0/24\")\n >>> target.get(\"www.evilcorp.com\")\n DNS_NAME(\"evilcorp.com\", module=TARGET, tags={'domain', 'distance-1', 'target'})\n >>> target.get(\"1.2.3.4\")\n IP_RANGE(\"1.2.3.0/24\", module=TARGET, tags={'ipv4', 'distance-1', 'target'})\n\n Notes:\n - The method returns the first event that matches the given host.\n - If `strict_scope` is False, it will also consider parent domains and IP ranges.\n \"\"\"\n try:\n event = make_event(host, dummy=True)\n except ValidationError:\n return\n if event.host:\n return self.get_host(event.host, single=single)\n
"},{"location":"dev/target/#bbot.scanner.target.Target.get_host","title":"get_host","text":"get_host(host, single=True)\n
A more efficient version of .get() that only accepts hostnames and IP addresses
Source code in bbot/scanner/target.py
def get_host(self, host, single=True):\n \"\"\"\n A more efficient version of .get() that only accepts hostnames and IP addresses\n \"\"\"\n host = make_ip_type(host)\n with suppress(KeyError, StopIteration):\n result = self._radix.search(host)\n if result is not None:\n ret = set()\n for event in result:\n # if the result is a dns name and strict scope is enabled\n if isinstance(event.host, str) and self.strict_scope:\n # if the result doesn't exactly equal the host, abort\n if event.host != host:\n return\n if single:\n return event\n else:\n ret.add(event)\n if ret and not single:\n return ret\n
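To make the effect of strict_scope on DNS names concrete, here is a minimal sketch that replaces the radix tree with a plain set of names. The find_target function is hypothetical, and it ignores IP addresses and ranges entirely, so it only approximates the lookup performed by get_host().

```python
def find_target(host, targets, strict_scope=False):
    # Hypothetical stand-in for the radix lookup: targets is a set of
    # lowercase DNS names, and host is the name being checked.
    if host in targets:
        return host
    if strict_scope:
        # Strict scope: only an exact match counts.
        return None
    # Otherwise walk up the parent domains, most specific first.
    labels = host.split('.')
    for i in range(1, len(labels)):
        parent = '.'.join(labels[i:])
        if parent in targets:
            return parent
    return None


targets = {'evilcorp.com'}
print(find_target('www.evilcorp.com', targets))
print(find_target('www.evilcorp.com', targets, strict_scope=True))
```

With strict scope disabled, a subdomain resolves to its parent target; with it enabled, the same lookup returns None.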
"},{"location":"dev/tests/","title":"Unit Tests","text":"BBOT takes tests seriously. Every module must have a custom-written test that actually tests its functionality. Don't worry if you want to contribute but you aren't used to writing tests. If you open a draft PR, we will help write them :)
We use black and flake8 for linting, and pytest for tests.
"},{"location":"dev/tests/#running-tests-locally","title":"Running tests locally","text":"We have Github actions that automatically run tests whenever you open a Pull Request. However, you can also run the tests locally with pytest
:
# format code with black\npoetry run black .\n\n# lint with flake8\npoetry run flake8\n\n# run all tests with pytest (takes roughly 30 minutes)\npoetry run pytest\n
"},{"location":"dev/tests/#running-specific-tests","title":"Running specific tests","text":"If you only want to run a single test, you can select it with -k
:
# run only the sslcert test\npoetry run pytest -k test_module_sslcert\n
You can also filter like this:
# run all the module tests except for sslcert\npoetry run pytest -k \"test_module_ and not test_module_sslcert\"\n
If you want to see the output of your module, you can enable --log-cli-level
:
poetry run pytest --log-cli-level=DEBUG\n
"},{"location":"dev/tests/#example-writing-a-module-test","title":"Example: Writing a Module Test","text":"To write a test for your module, create a new python file in bbot/test/test_step_2/module_tests
. Your filename must be test_module_<module_name>
:
from .base import ModuleTestBase\n\n\nclass TestMyModule(ModuleTestBase):\n targets = [\"blacklanternsecurity.com\"]\n config_overrides = {\"modules\": {\"mymodule\": {\"api_key\": \"deadbeef\"}}}\n\n async def setup_after_prep(self, module_test):\n # mock HTTP response\n module_test.httpx_mock.add_response(\n url=\"https://api.com/sudomains?apikey=deadbeef&domain=blacklanternsecurity.com\",\n json={\n \"subdomains\": [\n \"www.blacklanternsecurity.com\",\n \"dev.blacklanternsecurity.com\"\n ],\n },\n )\n # mock DNS\n await module_test.mock_dns(\n {\n \"blacklanternsecurity.com\": {\"A\": [\"1.2.3.4\"]},\n \"www.blacklanternsecurity.com\": {\"A\": [\"1.2.3.4\"]},\n \"dev.blacklanternsecurity.com\": {\"A\": [\"1.2.3.4\"]},\n }\n )\n\n def check(self, module_test, events):\n # here is where we check to make sure it worked\n dns_names = [e.data for e in events if e.type == \"DNS_NAME\"]\n # temporary log messages for debugging\n for e in dns_names:\n self.log.critical(e)\n assert \"www.blacklanternsecurity.com\" in dns_names, \"failed to find subdomain #1\"\n assert \"dev.blacklanternsecurity.com\" in dns_names, \"failed to find subdomain #2\"\n
"},{"location":"dev/tests/#debugging-a-test","title":"Debugging a test","text":"Similar to debugging from within a module, you can debug from within a test using self.log.critical()
, etc:
def check(self, module_test, events):\n for e in events:\n # bright red\n self.log.critical(e.type)\n # bright green\n self.log.hugesuccess(e.data)\n # bright orange\n self.log.hugewarning(e.tags)\n # bright blue\n self.log.hugeinfo(e.parent)\n
"},{"location":"dev/tests/#more-advanced-tests","title":"More advanced tests","text":"If you have questions about tests or need to write a more advanced test, come talk to us on GitHub or Discord.
It's also a good idea to look through our existing tests. BBOT has over a hundred of them, so you might find one that's similar to what you're trying to do.
"},{"location":"dev/helpers/","title":"BBOT Helpers","text":"In this section are various helper functions that are designed to make your life easier when devving on BBOT. Whether you're extending BBOT by writing a module or working on its core engine, these functions are designed to act as useful machine parts to perform essential tasks, such as making a web request or executing a DNS query.
The vast majority of these helpers can be accessed directly from the .helpers
attribute of a scan or module, like so:
class MyModule(BaseModule):\n\n ...\n\n async def handle_event(self, event):\n # Web Request\n response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n # DNS query\n for ip in await self.helpers.resolve(\"www.evilcorp.com\"):\n self.hugesuccess(str(ip))\n\n # Execute shell command\n completed_process = await self.run_process(\"ls\", \"-l\")\n self.hugesuccess(completed_process.stdout)\n\n # Split a DNS name into subdomain / domain\n self.helpers.split_domain(\"www.internal.evilcorp.co.uk\")\n # (\"www.internal\", \"evilcorp.co.uk\")\n
Next Up: Command Helpers -->
"},{"location":"dev/helpers/command/","title":"Command Helpers","text":"These are helpers related to executing shell commands. They are used throughout BBOT and its modules for executing various binaries such as masscan
, nuclei
, etc.
These helpers can be invoked directly from self.helpers
, but inside a module you should always use self.run_process()
or self.run_process_live()
. These are light wrappers which ensure the running process is tracked by the module so that it can be easily terminated should the user need to kill the module:
# simple subprocess\nls_result = await self.run_process(\"ls\", \"-l\")\nfor line in ls_result.stdout.splitlines():\n # ...\n\n# iterate through each line in real time\nasync for line in self.run_process_live([\"grep\", \"-R\"]):\n # ...\n
"},{"location":"dev/helpers/command/#bbot.core.helpers.command.run","title":"run async
","text":"run(self, *command, check=False, text=True, idle_timeout=None, **kwargs)\n
Runs a command asynchronously and gets its output as a string.
This method is a simple helper for executing a command and capturing its output.\nIf an error occurs during execution, it can optionally raise an error or just log the stderr.\n\nArgs:\n *command (str): The command to run as separate arguments.\n check (bool, optional): If set to True, raises an error if the subprocess exits with a non-zero status.\n Defaults to False.\n text (bool, optional): If set to True, decodes the subprocess output to string. Defaults to True.\n idle_timeout (int, optional): Sets a limit on the number of seconds the process can run before throwing a TimeoutError\n **kwargs (dict): Additional keyword arguments for the subprocess.\n\nReturns:\n CompletedProcess: A completed process object with attributes for the command, return code, stdout, and stderr.\n\nRaises:\n CalledProcessError: If the subprocess exits with a non-zero status and `check=True`.\n\nExamples:\n >>> process = await run([\"ls\", \"/tmp\"])\n >>> process.stdout\n \"file1.txt\n
file2.txt\"
Source code in bbot/core/helpers/command.py
async def run(self, *command, check=False, text=True, idle_timeout=None, **kwargs):\n \"\"\"Runs a command asynchronously and gets its output as a string.\n\n This method is a simple helper for executing a command and capturing its output.\n If an error occurs during execution, it can optionally raise an error or just log the stderr.\n\n Args:\n *command (str): The command to run as separate arguments.\n check (bool, optional): If set to True, raises an error if the subprocess exits with a non-zero status.\n Defaults to False.\n text (bool, optional): If set to True, decodes the subprocess output to string. Defaults to True.\n idle_timeout (int, optional): Sets a limit on the number of seconds the process can run before throwing a TimeoutError\n **kwargs (dict): Additional keyword arguments for the subprocess.\n\n Returns:\n CompletedProcess: A completed process object with attributes for the command, return code, stdout, and stderr.\n\n Raises:\n CalledProcessError: If the subprocess exits with a non-zero status and `check=True`.\n\n Examples:\n >>> process = await run([\"ls\", \"/tmp\"])\n >>> process.stdout\n \"file1.txt\\nfile2.txt\"\n \"\"\"\n # proc_tracker optionally keeps track of which processes are running under which modules\n # this allows for graceful SIGINTing of a module's processes in the case when it's killed\n proc_tracker = kwargs.pop(\"_proc_tracker\", set())\n log_stderr = kwargs.pop(\"_log_stderr\", True)\n proc, _input, command = await self._spawn_proc(*command, **kwargs)\n if proc is not None:\n proc_tracker.add(proc)\n try:\n if _input is not None:\n if isinstance(_input, (list, tuple)):\n _input = b\"\\n\".join(smart_encode(i) for i in _input) + b\"\\n\"\n else:\n _input = smart_encode(_input)\n\n try:\n if idle_timeout is not None:\n stdout, stderr = await asyncio.wait_for(proc.communicate(_input), timeout=idle_timeout)\n else:\n stdout, stderr = await proc.communicate(_input)\n except asyncio.exceptions.TimeoutError:\n 
proc.send_signal(SIGINT)\n raise\n\n # surface stderr\n if text:\n if stderr is not None:\n stderr = smart_decode(stderr)\n if stdout is not None:\n stdout = smart_decode(stdout)\n if proc.returncode:\n if check:\n raise CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)\n if stderr and log_stderr:\n command_str = \" \".join(command)\n log.warning(f\"Stderr for run({command_str}):\\n\\t{stderr}\")\n\n return CompletedProcess(command, proc.returncode, stdout, stderr)\n finally:\n proc_tracker.remove(proc)\n
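The core pattern in run() is to wrap proc.communicate() in asyncio.wait_for so that a slow process raises a timeout. The sketch below shows that pattern with the standard library only. The run_with_timeout function is hypothetical and simplified: it has no process tracking or stdin handling, and it calls proc.kill() on timeout where the source above sends SIGINT.

```python
import asyncio
import sys


async def run_with_timeout(*command, timeout=None):
    # Hypothetical, simplified stand-in for run(): no process tracking,
    # no stdin handling, and output is always decoded as UTF-8.
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Stop the child before re-raising so it does not linger.
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, stdout.decode(), stderr.decode()


async def main():
    code, out, err = await run_with_timeout(sys.executable, '-c', 'print(42)', timeout=10)
    print(code, out.strip())


asyncio.run(main())
```

Using sys.executable as the command keeps the example portable, since it does not depend on any particular binary being installed.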
"},{"location":"dev/helpers/command/#bbot.core.helpers.command.run_live","title":"run_live async
","text":"run_live(self, *command, check=False, text=True, idle_timeout=None, **kwargs)\n
Runs a command asynchronously and iterates through its output line by line in realtime.
This method is useful for executing a command and capturing its output on-the-fly, as it is generated. If an error occurs during execution, it can optionally raise an error or just log the stderr.
Parameters:
*command
(str
, default: ()
) \u2013 The command to run as separate arguments.
check
(bool
, default: False
) \u2013 If set to True, raises an error if the subprocess exits with a non-zero status. Defaults to False.
text
(bool
, default: True
) \u2013 If set to True, decodes the subprocess output to string. Defaults to True.
idle_timeout
(int
, default: None
) \u2013 Sets a limit on the number of seconds the process can remain idle (no lines sent to stdout) before throwing a TimeoutError
**kwargs
(dict
, default: {}
) \u2013 Additional keyword arguments for the subprocess.
Yields:
str or bytes: The output lines of the command, either as a decoded string (if text=True
) or as bytes (if text=False
).
Raises:
CalledProcessError
\u2013 If the subprocess exits with a non-zero status and check=True
.
Examples:
>>> async for line in run_live([\"tail\", \"-f\", \"/var/log/auth.log\"]):\n... log.info(line)\n
Source code in bbot/core/helpers/command.py
async def run_live(self, *command, check=False, text=True, idle_timeout=None, **kwargs):\n \"\"\"Runs a command asynchronously and iterates through its output line by line in realtime.\n\n This method is useful for executing a command and capturing its output on-the-fly, as it is generated.\n If an error occurs during execution, it can optionally raise an error or just log the stderr.\n\n Args:\n *command (str): The command to run as separate arguments.\n check (bool, optional): If set to True, raises an error if the subprocess exits with a non-zero status.\n Defaults to False.\n text (bool, optional): If set to True, decodes the subprocess output to string. Defaults to True.\n idle_timeout (int, optional): Sets a limit on the number of seconds the process can remain idle (no lines sent to stdout) before throwing a TimeoutError\n **kwargs (dict): Additional keyword arguments for the subprocess.\n\n Yields:\n str or bytes: The output lines of the command, either as a decoded string (if `text=True`)\n or as bytes (if `text=False`).\n\n Raises:\n CalledProcessError: If the subprocess exits with a non-zero status and `check=True`.\n\n Examples:\n >>> async for line in run_live([\"tail\", \"-f\", \"/var/log/auth.log\"]):\n ... 
log.info(line)\n \"\"\"\n # proc_tracker optionally keeps track of which processes are running under which modules\n # this allows for graceful SIGINTing of a module's processes in the case when it's killed\n proc_tracker = kwargs.pop(\"_proc_tracker\", set())\n log_stderr = kwargs.pop(\"_log_stderr\", True)\n proc, _input, command = await self._spawn_proc(*command, **kwargs)\n if proc is not None:\n proc_tracker.add(proc)\n try:\n input_task = None\n if _input is not None:\n input_task = asyncio.create_task(_write_stdin(proc, _input))\n\n while 1:\n try:\n if idle_timeout is not None:\n line = await asyncio.wait_for(proc.stdout.readline(), timeout=idle_timeout)\n else:\n line = await proc.stdout.readline()\n except asyncio.exceptions.TimeoutError:\n proc.send_signal(SIGINT)\n raise\n except ValueError as e:\n command_str = \" \".join([str(c) for c in command])\n log.warning(f\"Error executing command {command_str}: {e}\")\n log.trace(traceback.format_exc())\n continue\n if not line:\n break\n if text:\n line = smart_decode(line).rstrip(\"\\r\\n\")\n else:\n line = line.rstrip(b\"\\r\\n\")\n yield line\n\n if input_task is not None:\n try:\n await input_task\n except ConnectionError:\n log.trace(f\"ConnectionError in command: {command}, kwargs={kwargs}\")\n log.trace(traceback.format_exc())\n await proc.wait()\n\n if proc.returncode:\n stdout, stderr = await proc.communicate()\n if text:\n if stderr is not None:\n stderr = smart_decode(stderr)\n if stdout is not None:\n stdout = smart_decode(stdout)\n if check:\n raise CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)\n # surface stderr\n if stderr and log_stderr:\n command_str = \" \".join(command)\n log.warning(f\"Stderr for run_live({command_str}):\\n\\t{stderr}\")\n finally:\n proc_tracker.remove(proc)\n
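The idle_timeout in run_live() applies to each readline() call, so it limits the gap between lines rather than the total runtime. A minimal sketch of that idea follows. The iter_lines generator is hypothetical: it omits stdin, stderr logging and process tracking, strips all trailing whitespace rather than only line endings, and kills the process on early exit instead of sending SIGINT.

```python
import asyncio
import sys


async def iter_lines(*command, idle_timeout=None):
    # Hypothetical, simplified stand-in for run_live(): yields decoded
    # stdout lines as they arrive, with an optional idle timeout.
    proc = await asyncio.create_subprocess_exec(*command, stdout=asyncio.subprocess.PIPE)
    try:
        while True:
            # The timeout applies to each read, so it measures idle time
            # between lines rather than total runtime.
            line = await asyncio.wait_for(proc.stdout.readline(), timeout=idle_timeout)
            if not line:
                break
            yield line.decode().rstrip()
        await proc.wait()
    finally:
        # If we stopped early (timeout or consumer exit), clean up the child.
        if proc.returncode is None:
            proc.kill()
            await proc.wait()


async def main():
    script = 'print(1); print(2)'
    return [line async for line in iter_lines(sys.executable, '-c', script, idle_timeout=10)]


print(asyncio.run(main()))
```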
"},{"location":"dev/helpers/dns/","title":"DNS","text":"These are helpers related to DNS resolution. They are used throughout BBOT and its modules for performing DNS lookups and detecting DNS wildcards, etc.
Note that these helpers can be invoked directly from self.helpers
, e.g.:
self.helpers.resolve(\"evilcorp.com\")\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper","title":"DNSHelper","text":" Bases: EngineClient
Source code in bbot/core/helpers/dns/dns.py
class DNSHelper(EngineClient):\n\n SERVER_CLASS = DNSEngine\n ERROR_CLASS = DNSError\n\n \"\"\"Helper class for DNS-related operations within BBOT.\n\n This class provides mechanisms for host resolution, wildcard domain detection, event tagging, and more.\n It centralizes all DNS-related activities in BBOT, offering both synchronous and asynchronous methods\n for DNS resolution, as well as various utilities for batch resolution and DNS query filtering.\n\n Attributes:\n parent_helper: A reference to the instantiated `ConfigAwareHelper` (typically `scan.helpers`).\n resolver (BBOTAsyncResolver): An asynchronous DNS resolver tailored for BBOT with rate-limiting capabilities.\n timeout (int): The timeout value for DNS queries. Defaults to 5 seconds.\n retries (int): The number of retries for failed DNS queries. Defaults to 1.\n abort_threshold (int): The threshold for aborting after consecutive failed queries. Defaults to 50.\n runaway_limit (int): Maximum allowed distance for consecutive DNS resolutions. Defaults to 5.\n all_rdtypes (list): A list of DNS record types to be considered during operations.\n wildcard_ignore (tuple): Domains to be ignored during wildcard detection.\n wildcard_tests (int): Number of tests to be run for wildcard detection. Defaults to 5.\n _wildcard_cache (dict): Cache for wildcard detection results.\n _dns_cache (LRUCache): Cache for DNS resolution results, limited in size.\n resolver_file (Path): File containing system's current resolver nameservers.\n filter_bad_ptrs (bool): Whether to filter out DNS names that appear to be auto-generated PTR records. 
Defaults to True.\n\n Args:\n parent_helper: The parent helper object with configuration details and utilities.\n\n Raises:\n DNSError: If an issue arises when creating the BBOTAsyncResolver instance.\n\n Examples:\n >>> dns_helper = DNSHelper(parent_config)\n >>> resolved_host = dns_helper.resolver.resolve(\"example.com\")\n \"\"\"\n\n def __init__(self, parent_helper):\n self.parent_helper = parent_helper\n self.config = self.parent_helper.config\n self.dns_config = self.config.get(\"dns\", {})\n engine_debug = self.config.get(\"engine\", {}).get(\"debug\", False)\n super().__init__(server_kwargs={\"config\": self.config}, debug=engine_debug)\n\n # resolver\n self.timeout = self.dns_config.get(\"timeout\", 5)\n self.resolver = dns.asyncresolver.Resolver()\n self.resolver.rotate = True\n self.resolver.timeout = self.timeout\n self.resolver.lifetime = self.timeout\n\n self.runaway_limit = self.config.get(\"runaway_limit\", 5)\n\n # wildcard handling\n self.wildcard_disable = self.dns_config.get(\"wildcard_disable\", False)\n self.wildcard_ignore = RadixTarget()\n for d in self.dns_config.get(\"wildcard_ignore\", []):\n self.wildcard_ignore.insert(d)\n\n # copy the system's current resolvers to a text file for tool use\n self.system_resolvers = dns.resolver.Resolver().nameservers\n # TODO: DNS server speed test (start in background task)\n self.resolver_file = self.parent_helper.tempfile(self.system_resolvers, pipe=False)\n\n # brute force helper\n self._brute = None\n\n self._is_wildcard_cache = LFUCache(maxsize=1000)\n self._is_wildcard_domain_cache = LFUCache(maxsize=1000)\n\n async def resolve(self, query, **kwargs):\n return await self.run_and_return(\"resolve\", query=query, **kwargs)\n\n async def resolve_raw(self, query, **kwargs):\n return await self.run_and_return(\"resolve_raw\", query=query, **kwargs)\n\n async def resolve_batch(self, queries, **kwargs):\n agen = self.run_and_yield(\"resolve_batch\", queries=queries, **kwargs)\n while 1:\n try:\n yield 
await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n async def resolve_raw_batch(self, queries):\n agen = self.run_and_yield(\"resolve_raw_batch\", queries=queries)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n @property\n def brute(self):\n if self._brute is None:\n from .brute import DNSBrute\n\n self._brute = DNSBrute(self.parent_helper)\n return self._brute\n\n @async_cachedmethod(lambda self: self._is_wildcard_cache)\n async def is_wildcard(self, query, ips=None, rdtype=None):\n \"\"\"\n Use this method to check whether a *host* is a wildcard entry\n\n This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.\n\n If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.\n\n Args:\n query (str): The hostname to check for a wildcard entry.\n ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.\n rdtype (str, optional): The DNS record type (e.g., \"A\", \"AAAA\") to consider during the check.\n\n Returns:\n dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.\n Keys are DNS record types like \"A\", \"AAAA\", etc.\n Values are tuples where the first element is a boolean indicating if the query is a wildcard,\n and the second element is the wildcard parent if it's a wildcard.\n\n Raises:\n ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.\n\n Examples:\n >>> is_wildcard(\"www.github.io\")\n {\"A\": (True, \"github.io\"), \"AAAA\": (True, \"github.io\")}\n\n >>> is_wildcard(\"www.evilcorp.com\", ips=[\"93.184.216.34\"], rdtype=\"A\")\n {\"A\": (False, \"evilcorp.com\")}\n\n Note:\n `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)\n \"\"\"\n if [ips, 
rdtype].count(None) == 1:\n raise ValueError(\"Both ips and rdtype must be specified\")\n\n query = self._wildcard_prevalidation(query)\n if not query:\n return {}\n\n # skip check if the query is a domain\n if is_domain(query):\n return {}\n\n return await self.run_and_return(\"is_wildcard\", query=query, ips=ips, rdtype=rdtype)\n\n @async_cachedmethod(lambda self: self._is_wildcard_domain_cache)\n async def is_wildcard_domain(self, domain, log_info=False):\n domain = self._wildcard_prevalidation(domain)\n if not domain:\n return {}\n\n return await self.run_and_return(\"is_wildcard_domain\", domain=domain, log_info=False)\n\n def _wildcard_prevalidation(self, host):\n if self.wildcard_disable:\n return False\n\n host = clean_dns_record(host)\n # skip check if it's an IP or a plain hostname\n if is_ip(host) or not \".\" in host:\n return False\n\n # skip if query isn't a dns name\n if not is_dns_name(host):\n return False\n\n # skip check if the query's parent domain is excluded in the config\n wildcard_ignore = self.wildcard_ignore.search(host)\n if wildcard_ignore:\n log.debug(f\"Skipping wildcard detection on {host} because {wildcard_ignore} is excluded in the config\")\n return False\n\n return host\n\n async def _mock_dns(self, mock_data):\n from .mock import MockResolver\n\n self.resolver = MockResolver(mock_data)\n await self.run_and_return(\"_mock_dns\", mock_data=mock_data)\n
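The is_wildcard() docstring describes a return value of the form {rdtype: (is_wildcard, parent)}, where the first element may be True, False, or None. The sketch below shows one way a caller might read such a result. The wildcard_parents helper is hypothetical and operates on a plain dictionary, so it does not perform any DNS queries.

```python
def wildcard_parents(result):
    # Hypothetical helper: result has the shape documented for
    # is_wildcard(), i.e. {rdtype: (is_wildcard, parent)}.
    parents = {}
    for rdtype, (is_wild, parent) in result.items():
        # is_wild may be True, False, or None (inconclusive);
        # only a definite True is treated as a wildcard here.
        if is_wild is True:
            parents[rdtype] = parent
    return parents


print(wildcard_parents({'A': (True, 'github.io'), 'AAAA': (True, 'github.io')}))
print(wildcard_parents({'A': (False, 'evilcorp.com')}))
```

Treating None the same as False is a design choice in this sketch; a caller that cares about inconclusive results would need to handle that case separately.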
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.resolve","title":"resolve async
","text":"resolve(query, **kwargs)\n
Source code in bbot/core/helpers/dns/dns.py
async def resolve(self, query, **kwargs):\n return await self.run_and_return(\"resolve\", query=query, **kwargs)\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.resolve_batch","title":"resolve_batch async
","text":"resolve_batch(queries, **kwargs)\n
Source code in bbot/core/helpers/dns/dns.py
async def resolve_batch(self, queries, **kwargs):\n agen = self.run_and_yield(\"resolve_batch\", queries=queries, **kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.resolve_raw","title":"resolve_raw async
","text":"resolve_raw(query, **kwargs)\n
Source code in bbot/core/helpers/dns/dns.py
async def resolve_raw(self, query, **kwargs):\n return await self.run_and_return(\"resolve_raw\", query=query, **kwargs)\n
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.is_wildcard","title":"is_wildcard async
","text":"is_wildcard(query, ips=None, rdtype=None)\n
Use this method to check whether a host is a wildcard entry
This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.
If you want to know whether a domain is using wildcard DNS, use is_wildcard_domain()
instead.
Parameters:
query
(str
) \u2013 The hostname to check for a wildcard entry.
ips
(list
, default: None
) \u2013 List of IPs to compare against, typically obtained from a previous DNS resolution of the query.
rdtype
(str
, default: None
) \u2013 The DNS record type (e.g., \"A\", \"AAAA\") to consider during the check.
Returns:
dict
\u2013 A dictionary indicating if the query is a wildcard for each checked DNS record type. Keys are DNS record types like \"A\", \"AAAA\", etc. Values are tuples where the first element is a boolean indicating if the query is a wildcard, and the second element is the wildcard parent if it's a wildcard.
Raises:
ValueError
\u2013 If only one of ips
or rdtype
is specified or if no valid IPs are specified.
Examples:
>>> is_wildcard(\"www.github.io\")\n{\"A\": (True, \"github.io\"), \"AAAA\": (True, \"github.io\")}\n
>>> is_wildcard(\"www.evilcorp.com\", ips=[\"93.184.216.34\"], rdtype=\"A\")\n{\"A\": (False, \"evilcorp.com\")}\n
Note is_wildcard
can be True, False, or None (indicating that wildcard detection was inconclusive)
bbot/core/helpers/dns/dns.py
@async_cachedmethod(lambda self: self._is_wildcard_cache)\nasync def is_wildcard(self, query, ips=None, rdtype=None):\n \"\"\"\n Use this method to check whether a *host* is a wildcard entry\n\n This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.\n\n If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.\n\n Args:\n query (str): The hostname to check for a wildcard entry.\n ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.\n rdtype (str, optional): The DNS record type (e.g., \"A\", \"AAAA\") to consider during the check.\n\n Returns:\n dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.\n Keys are DNS record types like \"A\", \"AAAA\", etc.\n Values are tuples where the first element is a boolean indicating if the query is a wildcard,\n and the second element is the wildcard parent if it's a wildcard.\n\n Raises:\n ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.\n\n Examples:\n >>> is_wildcard(\"www.github.io\")\n {\"A\": (True, \"github.io\"), \"AAAA\": (True, \"github.io\")}\n\n >>> is_wildcard(\"www.evilcorp.com\", ips=[\"93.184.216.34\"], rdtype=\"A\")\n {\"A\": (False, \"evilcorp.com\")}\n\n Note:\n `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)\n \"\"\"\n if [ips, rdtype].count(None) == 1:\n raise ValueError(\"Both ips and rdtype must be specified\")\n\n query = self._wildcard_prevalidation(query)\n if not query:\n return {}\n\n # skip check if the query is a domain\n if is_domain(query):\n return {}\n\n return await self.run_and_return(\"is_wildcard\", query=query, ips=ips, rdtype=rdtype)\n
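Because the first element of each tuple can be True, False, or None, callers usually need to handle three outcomes rather than two. A minimal sketch of one way to sort them; `split_wildcard_results` is a hypothetical helper (not part of BBOT) and the sample dictionary is made up for illustration.

```python
def split_wildcard_results(result):
    # result has the shape returned by is_wildcard(),
    # e.g. {'A': (True, 'github.io'), 'AAAA': (None, 'github.io')}
    wildcards, normal, inconclusive = {}, [], []
    for rdtype, (is_wildcard, parent) in result.items():
        if is_wildcard is None:
            # detection was inconclusive; keep it separate from a definite False
            inconclusive.append(rdtype)
        elif is_wildcard:
            wildcards[rdtype] = parent
        else:
            normal.append(rdtype)
    return wildcards, normal, inconclusive


sample = {'A': (True, 'github.io'), 'AAAA': (None, 'github.io'), 'TXT': (False, 'github.io')}
wildcards, normal, inconclusive = split_wildcard_results(sample)
print(wildcards, normal, inconclusive)
```

Checking `is None` first matters here: a plain truthiness test would lump inconclusive results together with confirmed non-wildcards.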
"},{"location":"dev/helpers/dns/#bbot.core.helpers.dns.DNSHelper.is_wildcard_domain","title":"is_wildcard_domain async
","text":"is_wildcard_domain(domain, log_info=False)\n
Source code in bbot/core/helpers/dns/dns.py
@async_cachedmethod(lambda self: self._is_wildcard_domain_cache)\nasync def is_wildcard_domain(self, domain, log_info=False):\n domain = self._wildcard_prevalidation(domain)\n if not domain:\n return {}\n\n return await self.run_and_return(\"is_wildcard_domain\", domain=domain, log_info=False)\n
"},{"location":"dev/helpers/interactsh/","title":"Interact.sh","text":"A pure python implementation of ProjectDiscovery's interact.sh.
\"Interactsh is an open-source tool for detecting out-of-band interactions. It is a tool designed to detect vulnerabilities that cause external interactions.\"
This class facilitates interactions with the interact.sh service for out-of-band data exfiltration and vulnerability confirmation. It allows for customization by accepting server and token parameters from the configuration provided by parent_helper
.
Attributes:
parent_helper
(ConfigAwareHelper
) \u2013 An instance of a helper class containing configuration data.
server
(str
) \u2013 The server to be used. If None (the default), a random server will be chosen from a predetermined list.
correlation_id
(str
) \u2013 An identifier to correlate requests and responses. Default is None.
custom_server
(str
) \u2013 Optional. A custom interact.sh server. Loaded from configuration.
token
(str
) \u2013 Optional. A token for interact.sh API. Loaded from configuration.
_poll_task
(AsyncTask
) \u2013 The task responsible for polling the interact.sh server.
Examples:
# instantiate interact.sh client (no requests are sent yet)\n>>> interactsh_client = self.helpers.interactsh()\n# register with an interact.sh server\n>>> interactsh_domain = await interactsh_client.register()\n[INFO] Registering with interact.sh server: oast.me\n[INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n# simulate an out-of-band interaction\n>>> await self.helpers.request(f\"https://{interactsh_domain}/test\")\n# wait for out-of-band interaction to be registered\n>>> await asyncio.sleep(10)\n>>> data_list = await interactsh_client.poll()\n>>> print(data_list)\n[\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"q-type\": \"A\",\n \"raw-request\": \"...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:23.187226851Z\"\n },\n {\n \"protocol\": \"http\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"raw-request\": \"GET /test HTTP/1.1 ...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:24.155677967Z\"\n }\n]\n# finally, shut down the client\n>>> await interactsh_client.deregister()\n
Source code in bbot/core/helpers/interactsh.py
class Interactsh:\n \"\"\"\n A pure python implementation of ProjectDiscovery's interact.sh.\n\n *\"Interactsh is an open-source tool for detecting out-of-band interactions. It is a tool designed to detect vulnerabilities that cause external interactions.\"*\n\n - https://app.interactsh.com\n - https://github.com/projectdiscovery/interactsh\n\n This class facilitates interactions with the interact.sh service for\n out-of-band data exfiltration and vulnerability confirmation. It allows\n for customization by accepting server and token parameters from the\n configuration provided by `parent_helper`.\n\n Attributes:\n parent_helper (ConfigAwareHelper): An instance of a helper class containing configuration data.\n server (str): The server to be used. If None (the default), a random server will be chosen from a predetermined list.\n correlation_id (str): An identifier to correlate requests and responses. Default is None.\n custom_server (str): Optional. A custom interact.sh server. Loaded from configuration.\n token (str): Optional. A token for interact.sh API. 
Loaded from configuration.\n _poll_task (AsyncTask): The task responsible for polling the interact.sh server.\n\n Examples:\n ```python\n # instantiate interact.sh client (no requests are sent yet)\n >>> interactsh_client = self.helpers.interactsh()\n # register with an interact.sh server\n >>> interactsh_domain = await interactsh_client.register()\n [INFO] Registering with interact.sh server: oast.me\n [INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n # simulate an out-of-band interaction\n >>> await self.helpers.request(f\"https://{interactsh_domain}/test\")\n # wait for out-of-band interaction to be registered\n >>> await asyncio.sleep(10)\n >>> data_list = await interactsh_client.poll()\n >>> print(data_list)\n [\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"q-type\": \"A\",\n \"raw-request\": \"...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:23.187226851Z\"\n },\n {\n \"protocol\": \"http\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"full-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n \"raw-request\": \"GET /test HTTP/1.1 ...\",\n \"remote-address\": \"1.2.3.4\",\n \"timestamp\": \"2023-09-15T21:09:24.155677967Z\"\n }\n ]\n # finally, shut down the client\n >>> await interactsh_client.deregister()\n ```\n \"\"\"\n\n def __init__(self, parent_helper, poll_interval=10):\n self.parent_helper = parent_helper\n self.server = None\n self.correlation_id = None\n self.custom_server = self.parent_helper.config.get(\"interactsh_server\", None)\n self.token = self.parent_helper.config.get(\"interactsh_token\", None)\n self.poll_interval = poll_interval\n self._poll_task = None\n\n async def register(self, callback=None):\n \"\"\"\n Registers the instance with an interact.sh server and sets up polling.\n\n Generates RSA keys for secure 
communication, builds a correlation ID,\n and sends a POST request to an interact.sh server to register. Optionally,\n starts an asynchronous polling task to listen for interactions.\n\n Args:\n callback (callable, optional): A function to be called each time new interactions are received.\n\n Returns:\n str: The registered domain for out-of-band interactions.\n\n Raises:\n InteractshError: If registration with an interact.sh server fails.\n\n Examples:\n >>> interactsh_client = self.helpers.interactsh()\n >>> registered_domain = await interactsh_client.register()\n [INFO] Registering with interact.sh server: oast.me\n [INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n \"\"\"\n rsa = RSA.generate(1024)\n\n self.public_key = rsa.publickey().exportKey()\n self.private_key = rsa.exportKey()\n\n encoded_public_key = base64.b64encode(self.public_key).decode(\"utf8\")\n\n uuid = uuid4().hex.ljust(33, \"a\")\n guid = \"\".join(i if i.isdigit() else chr(ord(i) + random.randint(0, 20)) for i in uuid)\n\n self.correlation_id = guid[:20]\n self.secret = str(uuid4())\n headers = {}\n\n if self.custom_server:\n if not self.token:\n log.verbose(\"Interact.sh token is not set\")\n else:\n headers[\"Authorization\"] = self.token\n self.server_list = [str(self.custom_server)]\n else:\n self.server_list = random.sample(server_list, k=len(server_list))\n for server in self.server_list:\n log.info(f\"Registering with interact.sh server: {server}\")\n data = {\n \"public-key\": encoded_public_key,\n \"secret-key\": self.secret,\n \"correlation-id\": self.correlation_id,\n }\n r = await self.parent_helper.request(\n f\"https://{server}/register\", headers=headers, json=data, method=\"POST\"\n )\n if r is None:\n continue\n try:\n msg = r.json().get(\"message\", \"\")\n assert \"registration successful\" in msg\n except Exception:\n log.debug(f\"Failed to register with interactsh server 
{self.server}\")\n continue\n self.server = server\n self.domain = f\"{guid}.{self.server}\"\n break\n\n if not self.server:\n raise InteractshError(f\"Failed to register with an interactsh server\")\n\n log.info(\n f\"Successfully registered to interactsh server {self.server} with correlation_id {self.correlation_id} [{self.domain}]\"\n )\n\n if callable(callback):\n self._poll_task = asyncio.create_task(self.poll_loop(callback))\n\n return self.domain\n\n async def deregister(self):\n \"\"\"\n Deregisters the instance from the interact.sh server and cancels the polling task.\n\n Sends a POST request to the server to deregister, using the correlation ID\n and secret key generated during registration. Optionally, if a polling\n task was started, it is cancelled.\n\n Raises:\n InteractshError: If required information is missing or if deregistration fails.\n\n Examples:\n >>> await interactsh_client.deregister()\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to deregister\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n data = {\"secret-key\": self.secret, \"correlation-id\": self.correlation_id}\n\n r = await self.parent_helper.request(\n f\"https://{self.server}/deregister\", headers=headers, json=data, method=\"POST\"\n )\n\n if self._poll_task is not None:\n self._poll_task.cancel()\n\n if \"success\" not in getattr(r, \"text\", \"\"):\n raise InteractshError(f\"Failed to de-register with interactsh server {self.server}\")\n\n async def poll(self):\n \"\"\"\n Polls the interact.sh server for interactions tied to the current instance.\n\n Sends a GET request to the server to fetch interactions associated with the\n current correlation_id and secret key. 
Returned interactions are decrypted\n using an AES key provided by the server response.\n\n Raises:\n InteractshError: If required information for polling is missing.\n\n Returns:\n list: A list of decrypted interaction data dictionaries.\n\n Examples:\n >>> data_list = await interactsh_client.poll()\n >>> print(data_list)\n [\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n ...\n },\n ...\n ]\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to poll\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n try:\n r = await self.parent_helper.request(\n f\"https://{self.server}/poll?id={self.correlation_id}&secret={self.secret}\", headers=headers\n )\n if r is None:\n raise InteractshError(\"Error polling interact.sh: No response from server\")\n\n ret = []\n data_list = r.json().get(\"data\", None)\n if data_list:\n aes_key = r.json()[\"aes_key\"]\n\n for data in data_list:\n decrypted_data = self._decrypt(aes_key, data)\n ret.append(decrypted_data)\n return ret\n except Exception as e:\n raise InteractshError(f\"Error polling interact.sh: {e}\")\n\n async def poll_loop(self, callback):\n \"\"\"\n Starts a polling loop to continuously check for interactions with the interact.sh server.\n\n Continuously polls the interact.sh server for interactions tied to the current instance,\n using the `poll` method. 
When interactions are received, it executes the given callback\n function with each interaction data.\n\n Parameters:\n callback (callable): The function to be called for every interaction received from the server.\n\n Returns:\n awaitable: An awaitable object that executes the internal `_poll_loop` method.\n\n Examples:\n >>> await interactsh_client.poll_loop(my_callback)\n \"\"\"\n async with self.parent_helper.scan._acatch(context=self._poll_loop):\n return await self._poll_loop(callback)\n\n async def _poll_loop(self, callback):\n while 1:\n if self.parent_helper.scan.stopping:\n await asyncio.sleep(1)\n continue\n data_list = []\n try:\n data_list = await self.poll()\n except InteractshError as e:\n log.warning(e)\n log.trace(traceback.format_exc())\n if not data_list:\n await asyncio.sleep(self.poll_interval)\n continue\n for data in data_list:\n if data:\n await self.parent_helper.execute_sync_or_async(callback, data)\n\n def _decrypt(self, aes_key, data):\n \"\"\"\n Decrypts and returns the data received from the interact.sh server.\n\n Uses RSA and AES for decrypting the data. RSA with PKCS1_OAEP and SHA256 is used to decrypt the AES key,\n and then AES (CFB mode) is used to decrypt the actual data payload.\n\n Parameters:\n aes_key (str): The AES key for decryption, encrypted with RSA and base64 encoded.\n data (str): The data payload to decrypt, which is base64 encoded and AES encrypted.\n\n Returns:\n dict: The decrypted data, loaded as a JSON object.\n\n Examples:\n >>> decrypted_data = self._decrypt(aes_key, data)\n \"\"\"\n private_key = RSA.importKey(self.private_key)\n cipher = PKCS1_OAEP.new(private_key, hashAlgo=SHA256)\n aes_plain_key = cipher.decrypt(base64.b64decode(aes_key))\n decode = base64.b64decode(data)\n bs = AES.block_size\n iv = decode[:bs]\n cryptor = AES.new(key=aes_plain_key, mode=AES.MODE_CFB, IV=iv, segment_size=128)\n plain_text = cryptor.decrypt(decode)\n return json.loads(plain_text[16:])\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.deregister","title":"deregister async
","text":"deregister()\n
Deregisters the instance from the interact.sh server and cancels the polling task.
Sends a POST request to the server to deregister, using the correlation ID and secret key generated during registration. If a polling task was started, it is also cancelled.
Raises:
InteractshError
\u2013 If required information is missing or if deregistration fails.
Examples:
>>> await interactsh_client.deregister()\n
Source code in bbot/core/helpers/interactsh.py
async def deregister(self):\n \"\"\"\n Deregisters the instance from the interact.sh server and cancels the polling task.\n\n Sends a POST request to the server to deregister, using the correlation ID\n and secret key generated during registration. Optionally, if a polling\n task was started, it is cancelled.\n\n Raises:\n InteractshError: If required information is missing or if deregistration fails.\n\n Examples:\n >>> await interactsh_client.deregister()\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to deregister\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n data = {\"secret-key\": self.secret, \"correlation-id\": self.correlation_id}\n\n r = await self.parent_helper.request(\n f\"https://{self.server}/deregister\", headers=headers, json=data, method=\"POST\"\n )\n\n if self._poll_task is not None:\n self._poll_task.cancel()\n\n if \"success\" not in getattr(r, \"text\", \"\"):\n raise InteractshError(f\"Failed to de-register with interactsh server {self.server}\")\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.poll","title":"poll async
","text":"poll()\n
Polls the interact.sh server for interactions tied to the current instance.
Sends a GET request to the server to fetch interactions associated with the current correlation_id and secret key. Returned interactions are decrypted using an AES key provided by the server response.
Raises:
InteractshError
\u2013 If required information for polling is missing, if the server does not respond, or if the response cannot be parsed or decrypted.
Returns:
list
\u2013 A list of decrypted interaction data dictionaries.
Examples:
>>> data_list = await interactsh_client.poll()\n>>> print(data_list)\n[\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n ...\n },\n ...\n]\n
Source code in bbot/core/helpers/interactsh.py
async def poll(self):\n \"\"\"\n Polls the interact.sh server for interactions tied to the current instance.\n\n Sends a GET request to the server to fetch interactions associated with the\n current correlation_id and secret key. Returned interactions are decrypted\n using an AES key provided by the server response.\n\n Raises:\n InteractshError: If required information for polling is missing.\n\n Returns:\n list: A list of decrypted interaction data dictionaries.\n\n Examples:\n >>> data_list = await interactsh_client.poll()\n >>> print(data_list)\n [\n {\n \"protocol\": \"dns\",\n \"unique-id\": \"rg99x2f860h5466ou3so86i07n1m3013k\",\n ...\n },\n ...\n ]\n \"\"\"\n if not self.server or not self.correlation_id or not self.secret:\n raise InteractshError(f\"Missing required information to poll\")\n\n headers = {}\n if self.token:\n headers[\"Authorization\"] = self.token\n\n try:\n r = await self.parent_helper.request(\n f\"https://{self.server}/poll?id={self.correlation_id}&secret={self.secret}\", headers=headers\n )\n if r is None:\n raise InteractshError(\"Error polling interact.sh: No response from server\")\n\n ret = []\n data_list = r.json().get(\"data\", None)\n if data_list:\n aes_key = r.json()[\"aes_key\"]\n\n for data in data_list:\n decrypted_data = self._decrypt(aes_key, data)\n ret.append(decrypted_data)\n return ret\n except Exception as e:\n raise InteractshError(f\"Error polling interact.sh: {e}\")\n
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.poll_loop","title":"poll_loop async
","text":"poll_loop(callback)\n
Starts a polling loop to continuously check for interactions with the interact.sh server.
Continuously polls the interact.sh server for interactions tied to the current instance, using the poll
method. When interactions are received, it executes the given callback function with each interaction data.
Parameters:
callback
(callable
) \u2013 The function to be called for every interaction received from the server.
Returns:
awaitable
\u2013 An awaitable object that executes the internal _poll_loop
method.
Examples:
>>> await interactsh_client.poll_loop(my_callback)\n
Source code in bbot/core/helpers/interactsh.py
async def poll_loop(self, callback):\n \"\"\"\n Starts a polling loop to continuously check for interactions with the interact.sh server.\n\n Continuously polls the interact.sh server for interactions tied to the current instance,\n using the `poll` method. When interactions are received, it executes the given callback\n function with each interaction data.\n\n Parameters:\n callback (callable): The function to be called for every interaction received from the server.\n\n Returns:\n awaitable: An awaitable object that executes the internal `_poll_loop` method.\n\n Examples:\n >>> await interactsh_client.poll_loop(my_callback)\n \"\"\"\n async with self.parent_helper.scan._acatch(context=self._poll_loop):\n return await self._poll_loop(callback)\n
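The poll-then-callback pattern can be tried without a live interact.sh server. The sketch below is a simplified illustration, not BBOT code: `FakeClient` and `poll_until_idle` are hypothetical names, the client returns canned batches, and the loop stops after a few empty polls instead of running for the lifetime of a scan.

```python
import asyncio


class FakeClient:
    # hypothetical stand-in for an Interactsh client; poll() returns canned batches
    def __init__(self, batches):
        self._batches = list(batches)

    async def poll(self):
        return self._batches.pop(0) if self._batches else []


async def poll_until_idle(client, callback, poll_interval=0.01, max_idle=2):
    # simplified loop: gives up after max_idle consecutive empty polls
    idle = 0
    while idle < max_idle:
        data_list = await client.poll()
        if not data_list:
            idle += 1
            await asyncio.sleep(poll_interval)
            continue
        idle = 0
        for data in data_list:
            if data:
                callback(data)


seen = []
client = FakeClient([[{'protocol': 'dns'}], [], [{'protocol': 'http'}]])
asyncio.run(poll_until_idle(client, seen.append))
print(seen)
```

The callback here is synchronous. The real `_poll_loop` passes the callback through `execute_sync_or_async`, so either a plain function or a coroutine function can be supplied.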
"},{"location":"dev/helpers/interactsh/#bbot.core.helpers.interactsh.Interactsh.register","title":"register async
","text":"register(callback=None)\n
Registers the instance with an interact.sh server and sets up polling.
Generates RSA keys for secure communication, builds a correlation ID, and sends a POST request to an interact.sh server to register. If a callable callback is supplied, also starts an asynchronous polling task to listen for interactions.
Parameters:
callback
(callable
, default: None
) \u2013 A function to be called each time new interactions are received.
Returns:
str
\u2013 The registered domain for out-of-band interactions.
Raises:
InteractshError
\u2013 If registration with an interact.sh server fails.
Examples:
>>> interactsh_client = self.helpers.interactsh()\n>>> registered_domain = await interactsh_client.register()\n[INFO] Registering with interact.sh server: oast.me\n[INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n
Source code in bbot/core/helpers/interactsh.py
async def register(self, callback=None):\n \"\"\"\n Registers the instance with an interact.sh server and sets up polling.\n\n Generates RSA keys for secure communication, builds a correlation ID,\n and sends a POST request to an interact.sh server to register. Optionally,\n starts an asynchronous polling task to listen for interactions.\n\n Args:\n callback (callable, optional): A function to be called each time new interactions are received.\n\n Returns:\n str: The registered domain for out-of-band interactions.\n\n Raises:\n InteractshError: If registration with an interact.sh server fails.\n\n Examples:\n >>> interactsh_client = self.helpers.interactsh()\n >>> registered_domain = await interactsh_client.register()\n [INFO] Registering with interact.sh server: oast.me\n [INFO] Successfully registered to interactsh server oast.me with correlation_id rg99x2f860h5466ou3so [rg99x2f860h5466ou3so86i07n1m3013k.oast.me]\n \"\"\"\n rsa = RSA.generate(1024)\n\n self.public_key = rsa.publickey().exportKey()\n self.private_key = rsa.exportKey()\n\n encoded_public_key = base64.b64encode(self.public_key).decode(\"utf8\")\n\n uuid = uuid4().hex.ljust(33, \"a\")\n guid = \"\".join(i if i.isdigit() else chr(ord(i) + random.randint(0, 20)) for i in uuid)\n\n self.correlation_id = guid[:20]\n self.secret = str(uuid4())\n headers = {}\n\n if self.custom_server:\n if not self.token:\n log.verbose(\"Interact.sh token is not set\")\n else:\n headers[\"Authorization\"] = self.token\n self.server_list = [str(self.custom_server)]\n else:\n self.server_list = random.sample(server_list, k=len(server_list))\n for server in self.server_list:\n log.info(f\"Registering with interact.sh server: {server}\")\n data = {\n \"public-key\": encoded_public_key,\n \"secret-key\": self.secret,\n \"correlation-id\": self.correlation_id,\n }\n r = await self.parent_helper.request(\n f\"https://{server}/register\", headers=headers, json=data, method=\"POST\"\n )\n if r is None:\n continue\n try:\n msg = 
r.json().get(\"message\", \"\")\n assert \"registration successful\" in msg\n except Exception:\n log.debug(f\"Failed to register with interactsh server {server}\")\n continue\n self.server = server\n self.domain = f\"{guid}.{self.server}\"\n break\n\n if not self.server:\n raise InteractshError(\"Failed to register with an interactsh server\")\n\n log.info(\n f\"Successfully registered to interactsh server {self.server} with correlation_id {self.correlation_id} [{self.domain}]\"\n )\n\n if callable(callback):\n self._poll_task = asyncio.create_task(self.poll_loop(callback))\n\n return self.domain\n
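The way the registered domain is derived from a random identifier can be reproduced with the standard library alone. This is a sketch of that one step, mirroring the expressions in the source above; `make_guid` is a hypothetical name, and the server value is only the example host from the log output.

```python
import random
from uuid import uuid4


def make_guid():
    # 32 hex characters from uuid4, padded to 33 characters with 'a'
    raw = uuid4().hex.ljust(33, 'a')
    # digits are kept as-is; each hex letter (a-f) is shifted forward by 0 to 20
    # positions, which keeps it inside the lowercase range a-z
    return ''.join(c if c.isdigit() else chr(ord(c) + random.randint(0, 20)) for c in raw)


guid = make_guid()
correlation_id = guid[:20]  # the first 20 characters identify this client to the server
server = 'oast.me'  # example server name, as in the log output
domain = f'{guid}.{server}'
print(correlation_id, domain)
```

The correlation ID is therefore always a prefix of the subdomain label, which is how interactions reported by the server can be tied back to the client that registered.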
"},{"location":"dev/helpers/misc/","title":"Misc Helpers","text":"These are miscellaneous helpers, used throughout BBOT and its modules for simple tasks such as parsing domains, ports, urls, etc.
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.as_completed","title":"as_completedasync
","text":"as_completed(coros)\n
Async generator that yields completed Tasks as they are completed.
Parameters:
coros
(iterable
) \u2013 An iterable of coroutine objects or asyncio Tasks.
Yields:
asyncio.Task: A Task object that has completed its execution.
Examples:
>>> async def main():\n... async for task in as_completed([coro1(), coro2(), coro3()]):\n... result = task.result()\n... print(f'Task completed with result: {result}')\n
>>> asyncio.run(main())\n
Source code in bbot/core/helpers/misc.py
async def as_completed(coros):\n \"\"\"\n Async generator that yields completed Tasks as they are completed.\n\n Args:\n coros (iterable): An iterable of coroutine objects or asyncio Tasks.\n\n Yields:\n asyncio.Task: A Task object that has completed its execution.\n\n Examples:\n >>> async def main():\n ... async for task in as_completed([coro1(), coro2(), coro3()]):\n ... result = task.result()\n ... print(f'Task completed with result: {result}')\n\n >>> asyncio.run(main())\n \"\"\"\n tasks = {coro if isinstance(coro, asyncio.Task) else asyncio.create_task(coro): coro for coro in coros}\n while tasks:\n done, _ = await asyncio.wait(tasks.keys(), return_when=asyncio.FIRST_COMPLETED)\n for task in done:\n tasks.pop(task)\n yield task\n
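To see the completion-order behaviour, the helper can be run against coroutines with different delays. The sketch below repeats the helper body so it runs without BBOT installed; `delayed` and the delay values are made up for the demonstration.

```python
import asyncio


async def as_completed(coros):
    # same logic as the helper above, repeated so the example runs standalone
    tasks = {c if isinstance(c, asyncio.Task) else asyncio.create_task(c): c for c in coros}
    while tasks:
        done, _ = await asyncio.wait(tasks.keys(), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            tasks.pop(task)
            yield task


async def delayed(value, delay):
    # hypothetical worker: sleep, then return the value
    await asyncio.sleep(delay)
    return value


async def main():
    results = []
    # submitted slow-first, but yielded in the order the tasks finish
    async for task in as_completed([delayed('slow', 0.2), delayed('fast', 0.01)]):
        results.append(task.result())
    return results


results = asyncio.run(main())
print(results)
```

Unlike `asyncio.as_completed`, which yields awaitables that still have to be awaited, this generator yields finished Task objects, so `task.result()` can be called directly.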
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.backup_file","title":"backup_file","text":"backup_file(filename, max_backups=10)\n
Renames a file by appending an iteration number as a backup. Recursively renames files up to a specified maximum number of backups.
Parameters:
filename
(str or Path
) \u2013 The file to back up.
max_backups
(int
, default: 10
) \u2013 The maximum number of backups to keep. Defaults to 10.
Returns:
pathlib.Path: The new backup filepath.
Examples:
>>> backup_file(\"/tmp/test.txt\")\nPosixPath(\"/tmp/test.1.txt\")\n>>> backup_file(\"/tmp/test.0.txt\")\nPosixPath(\"/tmp/test.1.txt\")\n>>> backup_file(\"/tmp/test.1.txt\")\nPosixPath(\"/tmp/test.2.txt\")\n
Source code in bbot/core/helpers/misc.py
def backup_file(filename, max_backups=10):\n \"\"\"\n Renames a file by appending an iteration number as a backup. Recursively renames\n files up to a specified maximum number of backups.\n\n Args:\n filename (str or pathlib.Path): The file to back up.\n max_backups (int, optional): The maximum number of backups to keep. Defaults to 10.\n\n Returns:\n pathlib.Path: The new backup filepath.\n\n Examples:\n >>> backup_file(\"/tmp/test.txt\")\n PosixPath(\"/tmp/test.1.txt\")\n >>> backup_file(\"/tmp/test.0.txt\")\n PosixPath(\"/tmp/test.1.txt\")\n >>> backup_file(\"/tmp/test.1.txt\")\n PosixPath(\"/tmp/test.2.txt\")\n \"\"\"\n filename = Path(filename).resolve()\n suffixes = [s.strip(\".\") for s in filename.suffixes]\n iteration = 1\n with suppress(Exception):\n iteration = min(max_backups - 1, max(0, int(suffixes[0]))) + 1\n suffixes = suffixes[1:]\n stem = filename.stem.split(\".\")[0]\n destination = filename.parent / f\"{stem}.{iteration}.{'.'.join(suffixes)}\"\n if destination.exists() and iteration < max_backups:\n backup_file(destination)\n if filename.exists():\n filename.rename(destination)\n return destination\n
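The naming rule can be explored without touching the filesystem. The following is a sketch that follows the source above but works on pure paths; `next_backup_name` is a hypothetical name, it catches only `ValueError` and `IndexError` where the original suppresses every exception, and it skips the `resolve()` and rename steps.

```python
from pathlib import PurePosixPath


def next_backup_name(filename, max_backups=10):
    # pure-path version of the naming logic; nothing is renamed on disk
    path = PurePosixPath(filename)
    suffixes = [s.strip('.') for s in path.suffixes]
    iteration = 1
    try:
        # if the first suffix is a number, treat it as the current iteration
        iteration = min(max_backups - 1, max(0, int(suffixes[0]))) + 1
        suffixes = suffixes[1:]
    except (ValueError, IndexError):
        pass
    stem = path.stem.split('.')[0]
    extension = '.'.join(suffixes)
    return path.parent / f'{stem}.{iteration}.{extension}'


for name in ('/tmp/test.txt', '/tmp/test.1.txt', '/tmp/test.9.txt', '/tmp/test.10.txt'):
    print(name, next_backup_name(name))
```

With the default `max_backups` of 10, the iteration number is capped at 10, so the name stops advancing once a file has been rotated that far.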
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.best_http_status","title":"best_http_status","text":"best_http_status(code1, code2)\n
Determine the better HTTP status code between two given codes.
The 'better' status code is chosen according to typical usage and priority in HTTP communication. Between different classes, the order of preference is 2xx > 3xx > 1xx > 4xx > 5xx. Within the same class (e.g., 2xx), the lower code is better.
Parameters:
code1
(int
) \u2013 The first HTTP status code.
code2
(int
) \u2013 The second HTTP status code.
Returns:
int
\u2013 The better HTTP status code between the two provided codes.
Examples:
>>> best_http_status(200, 404)\n200\n>>> best_http_status(500, 400)\n400\n>>> best_http_status(301, 302)\n301\n
Source code in bbot/core/helpers/misc.py
def best_http_status(code1, code2):\n \"\"\"\n Determine the better HTTP status code between two given codes.\n\n The 'better' status code is considered based on typical usage and priority in HTTP communication.\n Lower codes are generally better than higher codes. Within the same class (e.g., 2xx), a lower code is better.\n Between different classes, the order of preference is 2xx > 3xx > 1xx > 4xx > 5xx.\n\n Args:\n code1 (int): The first HTTP status code.\n code2 (int): The second HTTP status code.\n\n Returns:\n int: The better HTTP status code between the two provided codes.\n\n Examples:\n >>> best_http_status(200, 404)\n 200\n >>> best_http_status(500, 400)\n 400\n >>> best_http_status(301, 302)\n 301\n \"\"\"\n\n # Classify the codes into their respective categories (1xx, 2xx, 3xx, 4xx, 5xx)\n def classify_code(code):\n return int(code) // 100\n\n class1 = classify_code(code1)\n class2 = classify_code(code2)\n\n # Priority order for classes\n priority_order = {2: 1, 3: 2, 1: 3, 4: 4, 5: 5}\n\n # Compare based on class priority\n p1 = priority_order.get(class1, 10)\n p2 = priority_order.get(class2, 10)\n if p1 != p2:\n return code1 if p1 < p2 else code2\n\n # If in the same class, the lower code is better\n return min(code1, code2)\n
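Since the function compares two codes at a time, picking the best of several responses is a fold over the list. A minimal sketch, with the comparison condensed and repeated here so it runs without BBOT; the list of codes is made up.

```python
from functools import reduce


def best_http_status(code1, code2):
    # standalone copy of the comparison above
    priority_order = {2: 1, 3: 2, 1: 3, 4: 4, 5: 5}
    p1 = priority_order.get(int(code1) // 100, 10)
    p2 = priority_order.get(int(code2) // 100, 10)
    if p1 != p2:
        return code1 if p1 < p2 else code2
    # same class: the lower code wins
    return min(code1, code2)


codes = [503, 404, 301, 200, 204]
best = reduce(best_http_status, codes)
print(best)
```

Note the placement of 1xx in the ordering: `best_http_status(101, 302)` gives 302, while `best_http_status(100, 404)` gives 100.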
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.bytes_to_human","title":"bytes_to_human","text":"bytes_to_human(_bytes)\n
Convert a bytes size to a human-readable string.
This function converts a numeric bytes value into a human-readable string format, complete with the appropriate unit symbol (B, KB, MB, GB, etc.).
Parameters:
_bytes
(int
) \u2013 The number of bytes to convert.
Returns:
str
\u2013 A string representing the number of bytes in a more readable format, rounded to two decimal places.
Examples:
>>> bytes_to_human(1234129384)\n'1.15GB'\n
Source code in bbot/core/helpers/misc.py
def bytes_to_human(_bytes):\n \"\"\"Convert a bytes size to a human-readable string.\n\n This function converts a numeric bytes value into a human-readable string format, complete\n with the appropriate unit symbol (B, KB, MB, GB, etc.).\n\n Args:\n _bytes (int): The number of bytes to convert.\n\n Returns:\n str: A string representing the number of bytes in a more readable format, rounded to two\n decimal places.\n\n Examples:\n >>> bytes_to_human(1234129384)\n '1.15GB'\n \"\"\"\n sizes = [\"B\", \"KB\", \"MB\", \"GB\", \"TB\", \"PB\", \"EB\", \"ZB\"]\n units = {}\n for count, size in enumerate(sizes):\n units[size] = pow(1024, count)\n for size in sizes:\n if abs(_bytes) < 1024.0:\n if size == sizes[0]:\n _bytes = str(int(_bytes))\n else:\n _bytes = f\"{_bytes:.2f}\"\n return f\"{_bytes}{size}\"\n _bytes /= 1024\n raise ValueError(f'Unable to convert \"{_bytes}\" to human filesize')\n
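The repeated division by 1024 can be illustrated with a compact standalone sketch. The function name human_size is hypothetical and the unit list is copied from the source above; it is an illustration, not a replacement for the helper.

```python
def human_size(num_bytes):
    # Each step up the unit list divides the value by 1024.
    value = float(num_bytes)
    for unit in ('B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB'):
        if abs(value) < 1024.0:
            # Plain bytes are shown as an integer, larger units with two decimals.
            return f'{int(value)}B' if unit == 'B' else f'{value:.2f}{unit}'
        value /= 1024.0
    raise ValueError(f'value too large to format: {num_bytes}')
```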
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.can_sudo_without_password","title":"can_sudo_without_password","text":"can_sudo_without_password()\n
Check if the current user has passwordless sudo access.
This function checks whether the current user can use sudo without entering a password. It runs a command with sudo and checks the return code to determine this.
Returns:
bool
\u2013 True if the current user can use sudo without a password, False otherwise.
Examples:
>>> can_sudo_without_password()\nTrue\n
Source code in bbot/core/helpers/misc.py
def can_sudo_without_password():\n \"\"\"Check if the current user has passwordless sudo access.\n\n This function checks whether the current user can use sudo without entering a password.\n It runs a command with sudo and checks the return code to determine this.\n\n Returns:\n bool: True if the current user can use sudo without a password, False otherwise.\n\n Examples:\n >>> can_sudo_without_password()\n True\n \"\"\"\n if os.geteuid() != 0:\n env = dict(os.environ)\n env[\"SUDO_ASKPASS\"] = \"/bin/false\"\n try:\n sp.run([\"sudo\", \"-K\"], stderr=sp.DEVNULL, stdout=sp.DEVNULL, check=True, env=env)\n sp.run([\"sudo\", \"-An\", \"/bin/true\"], stderr=sp.DEVNULL, stdout=sp.DEVNULL, check=True, env=env)\n except sp.CalledProcessError:\n return False\n return True\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cancel_tasks","title":"cancel_tasks async
","text":"cancel_tasks(tasks, ignore_errors=True)\n
Asynchronously cancels a list of asyncio tasks.
Parameters:
tasks
(list[Task]
) \u2013 A list of asyncio Task objects to cancel.
ignore_errors
(bool
, default: True
) \u2013 Whether to ignore errors other than asyncio.CancelledError. Defaults to True.
Examples:
>>> async def main():\n... task1 = asyncio.create_task(async_function1())\n... task2 = asyncio.create_task(async_function2())\n... await cancel_tasks([task1, task2])\n...\n>>> asyncio.run(main())\n
Note This function will not cancel the current task that it is called from.
Source code in bbot/core/helpers/misc.py
async def cancel_tasks(tasks, ignore_errors=True):\n \"\"\"\n Asynchronously cancels a list of asyncio tasks.\n\n Args:\n tasks (list[Task]): A list of asyncio Task objects to cancel.\n ignore_errors (bool, optional): Whether to ignore errors other than asyncio.CancelledError. Defaults to True.\n\n Examples:\n >>> async def main():\n ... task1 = asyncio.create_task(async_function1())\n ... task2 = asyncio.create_task(async_function2())\n ... await cancel_tasks([task1, task2])\n ...\n >>> asyncio.run(main())\n\n Note:\n This function will not cancel the current task that it is called from.\n \"\"\"\n current_task = asyncio.current_task()\n tasks = [t for t in tasks if t != current_task]\n for task in tasks:\n # log.debug(f\"Cancelling task: {task}\")\n task.cancel()\n if ignore_errors:\n for task in tasks:\n try:\n await task\n except BaseException as e:\n if not isinstance(e, asyncio.CancelledError):\n import traceback\n\n log.trace(traceback.format_exc())\n
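A similar cancel-then-await pattern can be sketched with asyncio.gather, which collects each CancelledError as a result when return_exceptions=True. This is an alternative illustration under that assumption, not the helper's own code; cancel_all and demo are hypothetical names.

```python
import asyncio


async def cancel_all(tasks):
    # Never cancel the task we are running in.
    current = asyncio.current_task()
    pending = [t for t in tasks if t is not current]
    for task in pending:
        task.cancel()
    # Wait for the cancellations to land; exceptions are returned, not raised.
    await asyncio.gather(*pending, return_exceptions=True)


async def demo():
    tasks = [asyncio.create_task(asyncio.sleep(60)) for _ in range(3)]
    await asyncio.sleep(0)  # give the tasks a chance to start
    await cancel_all(tasks)
    return [t.cancelled() for t in tasks]
```

Awaiting after cancel() matters: cancel() only requests cancellation, and the task is not finished until the event loop has delivered it.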
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cancel_tasks_sync","title":"cancel_tasks_sync","text":"cancel_tasks_sync(tasks)\n
Synchronously cancels a list of asyncio tasks.
Parameters:
tasks
(list[Task]
) \u2013 A list of asyncio Task objects to cancel.
Examples:
>>> loop = asyncio.get_event_loop()\n>>> task1 = loop.create_task(some_async_function1())\n>>> task2 = loop.create_task(some_async_function2())\n>>> cancel_tasks_sync([task1, task2])\n
Note This function will not cancel the current task from which it is called.
Source code in bbot/core/helpers/misc.py
def cancel_tasks_sync(tasks):\n \"\"\"\n Synchronously cancels a list of asyncio tasks.\n\n Args:\n tasks (list[Task]): A list of asyncio Task objects to cancel.\n\n Examples:\n >>> loop = asyncio.get_event_loop()\n >>> task1 = loop.create_task(some_async_function1())\n >>> task2 = loop.create_task(some_async_function2())\n >>> cancel_tasks_sync([task1, task2])\n\n Note:\n This function will not cancel the current task from which it is called.\n \"\"\"\n current_task = asyncio.current_task()\n for task in tasks:\n if task != current_task:\n # log.debug(f\"Cancelling task: {task}\")\n task.cancel()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.chain_lists","title":"chain_lists","text":"chain_lists(l, try_files=False, msg=None, remove_blank=True, validate=False, validate_chars='<>:\"/\\\\|?*)')\n
Chains together list elements, allowing for entries separated by commas.
This function takes a list l
and flattens it by splitting its entries on commas. It also allows you to optionally open entries as files and add their contents to the list.
The order of entries is preserved, and deduplication is performed automatically.
Parameters:
l
(list
) \u2013 The list of strings to chain together.
try_files
(bool
, default: False
) \u2013 Whether to try to open entries as files. Defaults to False.
msg
(str
, default: None
) \u2013 An optional message to log when reading from a file. Defaults to None.
remove_blank
(bool
, default: True
) \u2013 Whether to remove blank entries from the list. Defaults to True.
validate
(bool
, default: False
) \u2013 Whether to perform validation for undesirable characters. Defaults to False.
validate_chars
(str
, default: '<>:\"/\\\\|?*)'
) \u2013 Additional characters to block when validation is enabled (non-printable ASCII is always blocked). Defaults to '<>:\"/\\\\|?*)'.
Returns:
list
\u2013 The list of chained elements.
Raises:
ValueError
\u2013 If the input string contains invalid characters, when enabled (off by default).
Examples:
>>> chain_lists([\"a\", \"b,c,d\"])\n['a', 'b', 'c', 'd']\n
>>> chain_lists([\"a,file.txt\", \"c,d\"], try_files=True)\n['a', 'f_line1', 'f_line2', 'f_line3', 'c', 'd']\n
Source code in bbot/core/helpers/misc.py
def chain_lists(\n l,\n try_files=False,\n msg=None,\n remove_blank=True,\n validate=False,\n validate_chars='<>:\"/\\\\|?*)',\n):\n \"\"\"Chains together list elements, allowing for entries separated by commas.\n\n This function takes a list `l` and flattens it by splitting its entries on commas.\n It also allows you to optionally open entries as files and add their contents to the list.\n\n The order of entries is preserved, and deduplication is performed automatically.\n\n Args:\n l (list): The list of strings to chain together.\n try_files (bool, optional): Whether to try to open entries as files. Defaults to False.\n msg (str, optional): An optional message to log when reading from a file. Defaults to None.\n remove_blank (bool, optional): Whether to remove blank entries from the list. Defaults to True.\n validate (bool, optional): Whether to perform validation for undesirable characters. Defaults to False.\n validate_chars (str, optional): When performing validation, what additional set of characters to block (blocks non-printable ascii automatically). 
Defaults to '<>:\"/\\\\|?*)'\n\n Returns:\n list: The list of chained elements.\n\n Raises:\n ValueError: If the input string contains invalid characters, when enabled (off by default).\n\n Examples:\n >>> chain_lists([\"a\", \"b,c,d\"])\n ['a', 'b', 'c', 'd']\n\n >>> chain_lists([\"a,file.txt\", \"c,d\"], try_files=True)\n ['a', 'f_line1', 'f_line2', 'f_line3', 'c', 'd']\n \"\"\"\n if isinstance(l, str):\n l = [l]\n final_list = dict()\n for entry in l:\n for s in split_regex.split(entry):\n f = s.strip()\n if validate:\n if any((c in validate_chars) or (ord(c) < 32 and c != \" \") for c in f):\n raise ValueError(f\"Invalid character in string: {f}\")\n f_path = Path(f).resolve()\n if try_files and f_path.is_file():\n if msg is not None:\n new_msg = str(msg).format(filename=f_path)\n log.info(new_msg)\n for line in str_or_file(f):\n final_list[line] = None\n else:\n final_list[f] = None\n\n ret = list(final_list)\n if remove_blank:\n ret = [r for r in ret if r]\n return ret\n
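The ordered deduplication relies on dictionary keys keeping insertion order. A minimal sketch of that idea follows, assuming a plain comma split instead of the split_regex used in the source and leaving out file handling and validation; chain is a hypothetical name.

```python
def chain(entries, remove_blank=True):
    if isinstance(entries, str):
        entries = [entries]
    seen = {}  # dict keys keep insertion order, so this acts as an ordered set
    for entry in entries:
        for part in entry.split(','):
            seen[part.strip()] = None
    result = list(seen)
    if remove_blank:
        result = [r for r in result if r]
    return result
```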
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.clean_dict","title":"clean_dict","text":"clean_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None)\n
Recursively clean unwanted keys from a dictionary. Useful for removing secrets from a config.
Parameters:
d
(dict
) \u2013 The input dictionary.
*key_names
\u2013 Names of keys to remove.
fuzzy
(bool
, default: False
) \u2013 Whether to perform fuzzy matching on keys.
exclude_keys
((list, None)
, default: None
) \u2013 List of keys to be excluded from removal.
_prev_key
((str, None)
, default: None
) \u2013 For internal recursive use; the previous key in the hierarchy.
Returns:
dict
\u2013 A dictionary cleaned of the keys specified in key_names.
Source code in bbot/core/helpers/misc.py
def clean_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):\n \"\"\"\n Recursively clean unwanted keys from a dictionary.\n Useful for removing secrets from a config.\n\n Args:\n d (dict): The input dictionary.\n *key_names: Names of keys to remove.\n fuzzy (bool): Whether to perform fuzzy matching on keys.\n exclude_keys (list, None): List of keys to be excluded from removal.\n _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.\n\n Returns:\n dict: A dictionary cleaned of the keys specified in key_names.\n\n \"\"\"\n if exclude_keys is None:\n exclude_keys = []\n if isinstance(exclude_keys, str):\n exclude_keys = [exclude_keys]\n d = copy.deepcopy(d)\n if isinstance(d, dict):\n for key, val in list(d.items()):\n if key in key_names or (fuzzy and any(k in key for k in key_names)):\n if _prev_key not in exclude_keys:\n d.pop(key)\n continue\n d[key] = clean_dict(val, *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)\n return d\n
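To illustrate the recursive removal, here is a simplified standalone sketch that leaves out the exclude_keys handling. The name strip_keys and the sample config in the usage note are hypothetical.

```python
import copy


def strip_keys(d, *key_names, fuzzy=False):
    # Work on a copy so the caller's dictionary is left untouched.
    d = copy.deepcopy(d)
    if isinstance(d, dict):
        for key, val in list(d.items()):
            if key in key_names or (fuzzy and any(k in key for k in key_names)):
                d.pop(key)
                continue
            # Recurse into nested values; non-dicts are returned unchanged.
            d[key] = strip_keys(val, *key_names, fuzzy=fuzzy)
    return d
```

For example, strip_keys({'modules': {'shodan': {'api_key': 'y', 'timeout': 5}}}, 'api_key') keeps only the timeout entry under shodan.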
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.clean_dns_record","title":"clean_dns_record","text":"clean_dns_record(record)\n
Cleans and formats a given DNS record for further processing.
This function converts the DNS record to text format if it's not already a string. It also removes any trailing dots and converts the record to lowercase.
Parameters:
record
(str or Rdata
) \u2013 The DNS record to clean.
Returns:
str
\u2013 The cleaned and formatted DNS record.
Examples:
>>> clean_dns_record('www.evilcorp.com.')\n'www.evilcorp.com'\n
>>> from dns.rrset import from_text\n>>> record = from_text('www.evilcorp.com', 3600, 'IN', 'A', '1.2.3.4')[0]\n>>> clean_dns_record(record)\n'1.2.3.4'\n
Source code in bbot/core/helpers/misc.py
def clean_dns_record(record):\n \"\"\"\n Cleans and formats a given DNS record for further processing.\n\n This static method converts the DNS record to text format if it's not already a string.\n It also removes any trailing dots and converts the record to lowercase.\n\n Args:\n record (str or dns.rdata.Rdata): The DNS record to clean.\n\n Returns:\n str: The cleaned and formatted DNS record.\n\n Examples:\n >>> clean_dns_record('www.evilcorp.com.')\n 'www.evilcorp.com'\n\n >>> from dns.rrset import from_text\n >>> record = from_text('www.evilcorp.com', 3600, 'IN', 'A', '1.2.3.4')[0]\n >>> clean_dns_record(record)\n '1.2.3.4'\n \"\"\"\n if not isinstance(record, str):\n record = str(record.to_text())\n return str(record).rstrip(\".\").lower()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.clean_old","title":"clean_old","text":"clean_old(d, keep=10, filter=lambda x: True, key=latest_mtime, reverse=True, raise_error=False)\n
Clean up old files and directories within a given directory based on various filtering and sorting options.
This function removes the oldest files and directories in the provided directory 'd' that exceed a specified threshold ('keep'). The items to be deleted can be filtered using a lambda function 'filter', and they are sorted by a key function, defaulting to latest modification time.
Parameters:
d
(str or Path
) \u2013 The directory path to clean up.
keep
(int
, default: 10
) \u2013 The number of items to keep. Ones beyond this count will be removed.
filter
(Callable
, default: lambda x: True
) \u2013 A lambda function for filtering which files or directories to consider. Defaults to a lambda function that returns True for all.
key
(Callable
, default: latest_mtime
) \u2013 A function to sort the files and directories. Defaults to latest modification time.
reverse
(bool
, default: True
) \u2013 Whether to reverse the order of sorted items before removing. Defaults to True.
raise_error
(bool
, default: False
) \u2013 Whether to raise an error if directory deletion fails. Defaults to False.
Examples:
>>> clean_old(\"~/.bbot/scans\", filter=lambda x: x.is_dir() and scan_name_regex.match(x.name))\n
Source code in bbot/core/helpers/misc.py
def clean_old(d, keep=10, filter=lambda x: True, key=latest_mtime, reverse=True, raise_error=False):\n \"\"\"Clean up old files and directories within a given directory based on various filtering and sorting options.\n\n This function removes the oldest files and directories in the provided directory 'd' that exceed a specified\n threshold ('keep'). The items to be deleted can be filtered using a lambda function 'filter', and they are\n sorted by a key function, defaulting to latest modification time.\n\n Args:\n d (str or Path): The directory path to clean up.\n keep (int): The number of items to keep. Ones beyond this count will be removed.\n filter (Callable): A lambda function for filtering which files or directories to consider.\n Defaults to a lambda function that returns True for all.\n key (Callable): A function to sort the files and directories. Defaults to latest modification time.\n reverse (bool): Whether to reverse the order of sorted items before removing. Defaults to True.\n raise_error (bool): Whether to raise an error if directory deletion fails. Defaults to False.\n\n Examples:\n >>> clean_old(\"~/.bbot/scans\", filter=lambda x: x.is_dir() and scan_name_regex.match(x.name))\n \"\"\"\n d = Path(d)\n if not d.is_dir():\n return\n paths = [x for x in d.iterdir() if filter(x)]\n paths.sort(key=key, reverse=reverse)\n for path in paths[keep:]:\n try:\n log.debug(f\"Removing {path}\")\n rm_rf(path)\n except Exception as e:\n msg = f\"Failed to delete directory: {path}, {e}\"\n if raise_error:\n raise errors.DirectoryDeletionError()\n log.warning(msg)\n
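The keep-the-newest behaviour can be tried safely in a temporary directory. The sketch below is a simplified stand-in that sorts by each entry's own mtime rather than BBOT's latest_mtime helper; keep_newest and demo are hypothetical names.

```python
import os
import shutil
import tempfile
from pathlib import Path


def keep_newest(directory, keep=10):
    d = Path(directory)
    if not d.is_dir():
        return []
    # Newest first, so everything past index `keep` is the oldest material.
    paths = sorted(d.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for path in paths[keep:]:
        if path.is_dir():
            shutil.rmtree(path, ignore_errors=True)
        else:
            path.unlink()
        removed.append(path.name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            f = Path(tmp) / f'scan_{i}'
            f.write_text('x')
            os.utime(f, (1000 + i, 1000 + i))  # force distinct, ordered mtimes
        removed = keep_newest(tmp, keep=2)
        remaining = sorted(p.name for p in Path(tmp).iterdir())
        return sorted(removed), remaining
```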
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.closest_match","title":"closest_match","text":"closest_match(s, choices, n=1, cutoff=0.0)\n
Finds the closest matching strings from a list of choices based on a given string.
This function uses the difflib library to find the closest matches to a given string s
from a list of choices
. It can return either the single best match or a list of the top n
best matches.
Parameters:
s
(str
) \u2013 The string for which to find the closest match.
choices
(list
) \u2013 A list of strings to compare against.
n
(int
, default: 1
) \u2013 The number of best matches to return. Defaults to 1.
cutoff
(float
, default: 0.0
) \u2013 A float value that defines the similarity threshold. Strings with similarity below this value are not considered. Defaults to 0.0.
Returns:
str or list: Either the closest matching string or a list of the n
closest matching strings.
Examples:
>>> closest_match(\"asdf\", [\"asd\", \"fds\"])\n'asd'\n>>> closest_match(\"asdf\", [\"asd\", \"fds\", \"asdff\"], n=3)\n['asdff', 'asd', 'fds']\n
Source code in bbot/core/helpers/misc.py
def closest_match(s, choices, n=1, cutoff=0.0):\n \"\"\"Finds the closest matching strings from a list of choices based on a given string.\n\n This function uses the difflib library to find the closest matches to a given string `s` from a list of `choices`.\n It can return either the single best match or a list of the top `n` best matches.\n\n Args:\n s (str): The string for which to find the closest match.\n choices (list): A list of strings to compare against.\n n (int, optional): The number of best matches to return. Defaults to 1.\n cutoff (float, optional): A float value that defines the similarity threshold. Strings with similarity below this value are not considered. Defaults to 0.0.\n\n Returns:\n str or list: Either the closest matching string or a list of the `n` closest matching strings.\n\n Examples:\n >>> closest_match(\"asdf\", [\"asd\", \"fds\"])\n 'asd'\n >>> closest_match(\"asdf\", [\"asd\", \"fds\", \"asdff\"], n=3)\n ['asdff', 'asd', 'fds']\n \"\"\"\n import difflib\n\n matches = difflib.get_close_matches(s, choices, n=n, cutoff=cutoff)\n if not choices or not matches:\n return\n if n == 1:\n return matches[0]\n return matches\n
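The call into difflib can be exercised on its own. This sketch wraps difflib.get_close_matches from the standard library in the same way; best_match is a hypothetical name.

```python
import difflib


def best_match(s, choices, n=1, cutoff=0.0):
    # get_close_matches returns candidates sorted by similarity, best first.
    matches = difflib.get_close_matches(s, choices, n=n, cutoff=cutoff)
    if not matches:
        return None
    return matches[0] if n == 1 else matches
```

With cutoff=0.0 every candidate qualifies, so the function returns None only when the list of choices is empty.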
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cloudcheck","title":"cloudcheck","text":"cloudcheck(ip)\n
Check whether an IP address belongs to a cloud provider and return the provider name, type, and subnet.
Parameters:
ip
(str
) \u2013 The IP address to check.
Returns:
tuple
\u2013 A tuple containing provider name (str), provider type (str), and subnet (IPv4Network).
Examples:
>>> cloudcheck(\"168.62.20.37\")\n('Azure', 'cloud', IPv4Network('168.62.0.0/19'))\n
Source code in bbot/core/helpers/misc.py
def cloudcheck(ip):\n \"\"\"\n Check whether an IP address belongs to a cloud provider and returns the provider name, type, and subnet.\n\n Args:\n ip (str): The IP address to check.\n\n Returns:\n tuple: A tuple containing provider name (str), provider type (str), and subnet (IPv4Network).\n\n Examples:\n >>> cloudcheck(\"168.62.20.37\")\n ('Azure', 'cloud', IPv4Network('168.62.0.0/19'))\n \"\"\"\n import cloudcheck as _cloudcheck\n\n return _cloudcheck.check(ip)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.cpu_architecture","title":"cpu_architecture","text":"cpu_architecture()\n
Return the CPU architecture of the current system.
This function fetches and returns the architecture type of the CPU where the code is being executed. It maps common identifiers like \"x86_64\" to more general types like \"amd64\".
Returns:
str
\u2013 A string representing the CPU architecture, such as \"amd64\", \"armv7\", or \"arm64\".
Examples:
>>> cpu_architecture()\n'amd64'\n
Source code in bbot/core/helpers/misc.py
def cpu_architecture():\n \"\"\"Return the CPU architecture of the current system.\n\n This function fetches and returns the architecture type of the CPU where the code is being executed.\n It maps common identifiers like \"x86_64\" to more general types like \"amd64\".\n\n Returns:\n str: A string representing the CPU architecture, such as \"amd64\", \"armv7\", or \"arm64\".\n\n Examples:\n >>> cpu_architecture()\n 'amd64'\n \"\"\"\n import platform\n\n uname = platform.uname()\n arch = uname.machine.lower()\n if arch.startswith(\"aarch\"):\n return \"arm64\"\n elif arch == \"x86_64\":\n return \"amd64\"\n return arch\n
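Separating the mapping from the platform lookup makes it easy to test. A small sketch, where normalize_arch is a hypothetical name and the mapping rules are the ones shown in the source above:

```python
import platform


def normalize_arch(machine):
    arch = machine.lower()
    if arch.startswith('aarch'):
        return 'arm64'
    if arch == 'x86_64':
        return 'amd64'
    # Anything else is passed through unchanged.
    return arch


# Value for the machine running this code.
current_arch = normalize_arch(platform.uname().machine)
```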
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.delete_file","title":"delete_file","text":"delete_file(path)\n
Deletes a file at the given path.
Parameters:
path
(str or Path
) \u2013 The path to the file to be deleted.
Note This function suppresses all exceptions to ensure that the program continues running even if the file could not be deleted.
Examples:
>>> delete_file(\"/tmp/test/file1.txt\")\n
Source code in bbot/core/helpers/misc.py
def delete_file(path):\n \"\"\"Deletes a file at the given path.\n\n Args:\n path (str or Path): The path to the file to be deleted.\n\n Note:\n This function suppresses all exceptions to ensure that the program continues running even if the file could not be deleted.\n\n Examples:\n >>> delete_file(\"/tmp/test/file1.txt\")\n \"\"\"\n with suppress(Exception):\n Path(path).unlink(missing_ok=True)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.domain_parents","title":"domain_parents","text":"domain_parents(d, include_self=False)\n
Generate a list of parent domains for a given domain string.
This function takes an input string d
and generates a list of parent domains in decreasing order of specificity. If include_self
is set to True, the list will also include the input domain if it is not a top-level domain.
Parameters:
d
(str
) \u2013 The input string representing a domain or subdomain.
include_self
(bool
, default: False
) \u2013 Whether to include the input domain itself. Defaults to False.
Yields:
str
\u2013 Parent domains of the input string in decreasing order of specificity.
Examples:
>>> list(domain_parents(\"test.www.evilcorp.co.uk\"))\n[\"www.evilcorp.co.uk\", \"evilcorp.co.uk\"]\n
Notes Port, if present in input, is preserved in the output.
Source code in bbot/core/helpers/misc.py
def domain_parents(d, include_self=False):\n \"\"\"\n Generate a list of parent domains for a given domain string.\n\n This function takes an input string `d` and generates a list of parent domains in decreasing order of specificity.\n If `include_self` is set to True, the list will also include the input domain if it is not a top-level domain.\n\n Args:\n d (str): The input string representing a domain or subdomain.\n include_self (bool, optional): Whether to include the input domain itself. Defaults to False.\n\n Yields:\n str: Parent domains of the input string in decreasing order of specificity.\n\n Examples:\n >>> list(domain_parents(\"test.www.evilcorp.co.uk\"))\n [\"www.evilcorp.co.uk\", \"evilcorp.co.uk\"]\n\n Notes:\n - Port, if present in input, is preserved in the output.\n \"\"\"\n\n parent = str(d)\n if include_self and not is_domain(parent):\n yield parent\n while 1:\n parent = parent_domain(parent)\n if is_subdomain(parent):\n yield parent\n continue\n elif is_domain(parent):\n yield parent\n break\n
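The walk from a subdomain up to its registered domain can be sketched without BBOT's helpers. The version below is an illustration only: it assumes a tiny hard-coded suffix tuple in place of real public-suffix handling and ignores ports; parents and public_suffixes are hypothetical names.

```python
def parents(host, public_suffixes=('co.uk', 'com')):
    labels = host.lower().strip('.').split('.')
    # Find the longest known suffix; fall back to treating the last label as the TLD.
    suffix_len = 1
    for n in range(len(labels) - 1, 0, -1):
        if '.'.join(labels[-n:]) in public_suffixes:
            suffix_len = n
            break
    # Drop one leading label at a time, stopping at the registered domain.
    return ['.'.join(labels[i:]) for i in range(1, len(labels) - suffix_len)]
```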
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.domain_stem","title":"domain_stem","text":"domain_stem(domain)\n
Returns an abbreviated representation of the hostname by removing the TLD (Top-Level Domain).
Parameters:
domain
(str
) \u2013 The full domain name to be abbreviated.
Returns:
str
\u2013 An abbreviated domain string without the TLD.
Examples:
>>> domain_stem(\"www.evilcorp.com\")\n\"www.evilcorp\"\n
Notes Utilizes the tldextract
function for domain parsing.
Source code in bbot/core/helpers/misc.py
def domain_stem(domain):\n \"\"\"\n Returns an abbreviated representation of the hostname by removing the TLD (Top-Level Domain).\n\n Args:\n domain (str): The full domain name to be abbreviated.\n\n Returns:\n str: An abbreviated domain string without the TLD.\n\n Examples:\n >>> domain_stem(\"www.evilcorp.com\")\n \"www.evilcorp\"\n\n Notes:\n - Utilizes the `tldextract` function for domain parsing.\n \"\"\"\n parsed = tldextract(str(domain))\n return f\".\".join(parsed.subdomain.split(\".\") + parsed.domain.split(\".\")).strip(\".\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.execute_sync_or_async","title":"execute_sync_or_async async
","text":"execute_sync_or_async(callback, *args, **kwargs)\n
Execute a function or coroutine, handling either synchronous or asynchronous invocation.
Parameters:
callback
(Union[Callable, Coroutine]
) \u2013 The function or coroutine to execute.
*args
\u2013 Variable-length argument list to pass to the callback.
**kwargs
\u2013 Arbitrary keyword arguments to pass to the callback.
Returns:
Any
\u2013 The return value from the executed function or coroutine.
Examples:
>>> async def foo_async(x):\n... return x + 1\n>>> def foo_sync(x):\n... return x + 1\n
>>> asyncio.run(execute_sync_or_async(foo_async, 1))\n2\n
>>> asyncio.run(execute_sync_or_async(foo_sync, 1))\n2\n
Source code in bbot/core/helpers/misc.py
async def execute_sync_or_async(callback, *args, **kwargs):\n \"\"\"\n Execute a function or coroutine, handling either synchronous or asynchronous invocation.\n\n Args:\n callback (Union[Callable, Coroutine]): The function or coroutine to execute.\n *args: Variable-length argument list to pass to the callback.\n **kwargs: Arbitrary keyword arguments to pass to the callback.\n\n Returns:\n Any: The return value from the executed function or coroutine.\n\n Examples:\n >>> async def foo_async(x):\n ... return x + 1\n >>> def foo_sync(x):\n ... return x + 1\n\n >>> asyncio.run(execute_sync_or_async(foo_async, 1))\n 2\n\n >>> asyncio.run(execute_sync_or_async(foo_sync, 1))\n 2\n \"\"\"\n if is_async_function(callback):\n return await callback(*args, **kwargs)\n else:\n return callback(*args, **kwargs)\n
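A common variation inspects the return value instead of the callable, which also covers plain functions that hand back an awaitable. This is a sketch of that alternative using inspect.isawaitable, not the check used in the source above; call_either is a hypothetical name.

```python
import asyncio
import inspect


async def call_either(callback, *args, **kwargs):
    result = callback(*args, **kwargs)
    # Calling a coroutine function yields a coroutine object, which must be awaited.
    if inspect.isawaitable(result):
        return await result
    return result


async def foo_async(x):
    return x + 1


def foo_sync(x):
    return x + 1
```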
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_emails","title":"extract_emails","text":"extract_emails(s)\n
Extract email addresses from a body of text.
This function takes in a string and yields all email addresses found in it. The emails are converted to lower case before yielding. It utilizes regular expressions for email pattern matching.
Parameters:
s
(str
) \u2013 The input string from which to extract email addresses.
Yields:
str
\u2013 Yields email addresses found in the input string, in lower case.
Examples:
>>> list(extract_emails(\"Contact us at info@evilcorp.com and support@evilcorp.com\"))\n['info@evilcorp.com', 'support@evilcorp.com']\n
Source code in bbot/core/helpers/misc.py
def extract_emails(s):\n \"\"\"\n Extract email addresses from a body of text\n\n This function takes in a string and yields all email addresses found in it.\n The emails are converted to lower case before yielding. It utilizes\n regular expressions for email pattern matching.\n\n Args:\n s (str): The input string from which to extract email addresses.\n\n Yields:\n str: Yields email addresses found in the input string, in lower case.\n\n Examples:\n >>> list(extract_emails(\"Contact us at info@evilcorp.com and support@evilcorp.com\"))\n ['info@evilcorp.com', 'support@evilcorp.com']\n \"\"\"\n for email in bbot_regexes.email_regex.findall(smart_decode(s)):\n yield email.lower()\n
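The find-and-lowercase pattern can be tried with a deliberately simple regular expression. The pattern below is an assumption made for illustration and is much looser than a production email regex; it is not BBOT's email_regex. EMAIL_RE and find_emails are hypothetical names.

```python
import re

# Simplified pattern: local part, '@', then at least two dot-separated labels.
EMAIL_RE = re.compile('[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:[.][A-Za-z0-9-]+)+')


def find_emails(text):
    for match in EMAIL_RE.findall(text):
        yield match.lower()
```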
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_host","title":"extract_host","text":"extract_host(s)\n
Attempts to find and extract the host portion of a string.
Parameters:
s
(str
) \u2013 The string from which to extract the host.
Returns:
tuple
\u2013 A tuple containing three strings: (hostname (None if not found), string_before_hostname, string_after_hostname).
Examples:
>>> extract_host(\"evilcorp.com:80\")\n(\"evilcorp.com\", \"\", \":80\")\n
>>> extract_host(\"http://evilcorp.com:80/asdf.php?a=b\")\n(\"evilcorp.com\", \"http://\", \":80/asdf.php?a=b\")\n
>>> extract_host(\"bob@evilcorp.com\")\n(\"evilcorp.com\", \"bob@\", \"\")\n
>>> extract_host(\"[dead::beef]:22\")\n(\"dead::beef\", \"[\", \"]:22\")\n
>>> extract_host(\"ftp://username:password@my-ftp.com/my-file.csv\")\n(\n \"my-ftp.com\",\n \"ftp://username:password@\",\n \"/my-file.csv\",\n)\n
Source code in bbot/core/helpers/misc.py
def extract_host(s):\n \"\"\"\n Attempts to find and extract the host portion of a string.\n\n Args:\n s (str): The string from which to extract the host.\n\n Returns:\n tuple: A tuple containing three strings:\n (hostname (None if not found), string_before_hostname, string_after_hostname).\n\n Examples:\n >>> extract_host(\"evilcorp.com:80\")\n (\"evilcorp.com\", \"\", \":80\")\n\n >>> extract_host(\"http://evilcorp.com:80/asdf.php?a=b\")\n (\"evilcorp.com\", \"http://\", \":80/asdf.php?a=b\")\n\n >>> extract_host(\"bob@evilcorp.com\")\n (\"evilcorp.com\", \"bob@\", \"\")\n\n >>> extract_host(\"[dead::beef]:22\")\n (\"dead::beef\", \"[\", \"]:22\")\n\n >>> extract_host(\"ftp://username:password@my-ftp.com/my-file.csv\")\n (\n \"my-ftp.com\",\n \"ftp://username:password@\",\n \"/my-file.csv\",\n )\n \"\"\"\n s = smart_decode(s)\n match = bbot_regexes.extract_host_regex.search(s)\n\n if match:\n hostname = match.group(1)\n before = s[: match.start(1)]\n after = s[match.end(1) :]\n host, port = split_host_port(hostname)\n netloc = make_netloc(host, port)\n if netloc != hostname:\n # invalid host / port\n return (None, s, \"\")\n if host is not None:\n if port is not None:\n after = f\":{port}{after}\"\n if is_ip(host, version=6) and hostname.startswith(\"[\"):\n before = f\"{before}[\"\n after = f\"]{after}\"\n hostname = str(host)\n return (hostname, before, after)\n\n return (None, s, \"\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_params_json","title":"extract_params_json","text":"extract_params_json(json_data, compare_mode='getparam')\n
Extracts key-value pairs from a JSON object and returns them as a set of tuples. Used by the paramminer_headers
module.
Parameters:
json_data
(str
) \u2013 JSON-formatted string containing key-value pairs.
Returns:
set
\u2013 A set of tuples containing the keys and their corresponding values present in the JSON object.
Examples:
>>> extract_params_json('{\"a\": 1, \"b\": {\"c\": 2}}')\n{('a', 1), ('b', {'c': 2}), ('c', 2)}\n
Source code in bbot/core/helpers/misc.py
def extract_params_json(json_data, compare_mode=\"getparam\"):\n \"\"\"\n Extracts key-value pairs from a JSON object and returns them as a set of tuples. Used by the `paramminer_headers` module.\n\n Args:\n json_data (str): JSON-formatted string containing key-value pairs.\n\n Returns:\n set: A set of tuples containing the keys and their corresponding values present in the JSON object.\n\n Raises:\n Returns an empty set if JSONDecodeError occurs.\n\n Examples:\n >>> extract_params_json('{\"a\": 1, \"b\": {\"c\": 2}}')\n {('a', 1), ('b', {'c': 2}), ('c', 2)}\n \"\"\"\n try:\n data = json.loads(json_data)\n except json.JSONDecodeError:\n return set()\n\n key_value_pairs = set()\n stack = [(data, \"\")]\n\n while stack:\n current_data, path = stack.pop()\n if isinstance(current_data, dict):\n for key, value in current_data.items():\n full_key = f\"{path}.{key}\" if path else key\n if isinstance(value, dict):\n stack.append((value, full_key))\n elif isinstance(value, list):\n stack.append((value, full_key))\n else:\n if validate_parameter(full_key, compare_mode):\n key_value_pairs.add((full_key, value))\n elif isinstance(current_data, list):\n for item in current_data:\n if isinstance(item, (dict, list)):\n stack.append((item, path))\n return key_value_pairs\n
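The stack-based traversal in the source builds dotted key paths for nested objects. The sketch below reproduces that traversal on its own, leaving out the validate_parameter check, so it shows only how keys are flattened; flatten_json is a hypothetical name.

```python
import json


def flatten_json(json_text):
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError:
        return set()
    pairs = set()
    stack = [(data, '')]
    while stack:
        current, path = stack.pop()
        if isinstance(current, dict):
            for key, value in current.items():
                full_key = f'{path}.{key}' if path else key
                if isinstance(value, (dict, list)):
                    # Containers are visited later, carrying their path along.
                    stack.append((value, full_key))
                else:
                    pairs.add((full_key, value))
        elif isinstance(current, list):
            for item in current:
                if isinstance(item, (dict, list)):
                    stack.append((item, path))
    return pairs
```

In this sketch only scalar leaves are recorded, and a nested key such as c under b is reported with the dotted path b.c.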
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_params_xml","title":"extract_params_xml","text":"extract_params_xml(xml_data, compare_mode='getparam')\n
Extracts tags and their text values from an XML object and returns them as a set of tuples.
Parameters:
xml_data
(str
) \u2013 XML-formatted string containing elements.
Returns:
set
\u2013 A set of tuples containing the tags and their corresponding text values present in the XML object.
Examples:
>>> extract_params_xml('<root><child1><child2>value</child2></child1></root>')\n{('root', None), ('child1', None), ('child2', 'value')}\n
Source code in bbot/core/helpers/misc.py
def extract_params_xml(xml_data, compare_mode=\"getparam\"):\n \"\"\"\n Extracts tags and their text values from an XML object and returns them as a set of tuples.\n\n Args:\n xml_data (str): XML-formatted string containing elements.\n\n Returns:\n set: A set of tuples containing the tags and their corresponding text values present in the XML object.\n\n Raises:\n Returns an empty set if ParseError occurs.\n\n Examples:\n >>> extract_params_xml('<root><child1><child2>value</child2></child1></root>')\n {('root', None), ('child1', None), ('child2', 'value')}\n \"\"\"\n import xml.etree.ElementTree as ET\n\n try:\n root = ET.fromstring(xml_data)\n except ET.ParseError:\n return set()\n\n tag_value_pairs = set()\n stack = [root]\n\n while stack:\n current_element = stack.pop()\n if validate_parameter(current_element.tag, compare_mode):\n tag_value_pairs.add((current_element.tag, current_element.text))\n for child in current_element:\n stack.append(child)\n return tag_value_pairs\n
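The same depth-first walk works for XML using the standard library's ElementTree. This sketch omits the validate_parameter filter and records every tag; collect_tags is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def collect_tags(xml_text):
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return set()
    pairs = set()
    stack = [root]
    while stack:
        element = stack.pop()
        # element.text is None when the element has no leading text of its own.
        pairs.add((element.tag, element.text))
        stack.extend(element)  # iterating an Element yields its direct children
    return pairs
```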
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.extract_words","title":"extract_words","text":"extract_words(data, acronyms=True, wordninja=True, model=None, max_length=100, word_regexes=None)\n
Intelligently extracts words from given data.
This function uses regular expressions and optionally wordninja to extract words from a given text string. Thanks to wordninja it can handle concatenated words intelligently.
Parameters:
data
(str
) \u2013 The data from which words are to be extracted.
acronyms
(bool
, default: True
) \u2013 Whether to include acronyms. Defaults to True.
wordninja
(bool
, default: True
) \u2013 Whether to use the wordninja library to split concatenated words. Defaults to True.
model
(object
, default: None
) \u2013 A custom wordninja model for special types of data such as DNS names.
max_length
(int
, default: 100
) \u2013 Maximum length for a word to be included. Defaults to 100.
word_regexes
(list
, default: None
) \u2013 A list of compiled regular expression objects for word extraction. Defaults to None.
Returns:
set
\u2013 A set of extracted words.
Examples:
>>> extract_words('blacklanternsecurity')\n{'black', 'lantern', 'security', 'bls', 'blacklanternsecurity'}\n
Source code in bbot/core/helpers/misc.py
def extract_words(data, acronyms=True, wordninja=True, model=None, max_length=100, word_regexes=None):\n \"\"\"Intelligently extracts words from given data.\n\n This function uses regular expressions and optionally wordninja to extract words\n from a given text string. Thanks to wordninja it can handle concatenated words intelligently.\n\n Args:\n data (str): The data from which words are to be extracted.\n acronyms (bool, optional): Whether to include acronyms. Defaults to True.\n wordninja (bool, optional): Whether to use the wordninja library to split concatenated words. Defaults to True.\n model (object, optional): A custom wordninja model for special types of data such as DNS names.\n max_length (int, optional): Maximum length for a word to be included. Defaults to 100.\n word_regexes (list, optional): A list of compiled regular expression objects for word extraction. Defaults to None.\n\n Returns:\n set: A set of extracted words.\n\n Examples:\n >>> extract_words('blacklanternsecurity')\n {'black', 'lantern', 'security', 'bls', 'blacklanternsecurity'}\n \"\"\"\n import wordninja as _wordninja\n\n if word_regexes is None:\n word_regexes = bbot_regexes.word_regexes\n words = set()\n data = smart_decode(data)\n for r in word_regexes:\n for word in set(r.findall(data)):\n # blacklanternsecurity\n if len(word) <= max_length:\n words.add(word)\n\n # blacklanternsecurity --> ['black', 'lantern', 'security']\n # max_slice_length = 3\n for word in list(words):\n if wordninja:\n if model is None:\n model = _wordninja\n subwords = model.split(word)\n for subword in subwords:\n words.add(subword)\n # this section generates compound words\n # it is interesting but currently disabled the quality of its output doesn't quite justify its quantity\n # blacklanternsecurity --> ['black', 'lantern', 'security', 'blacklantern', 'lanternsecurity']\n # for s, e in combinations(range(len(subwords) + 1), 2):\n # if e - s <= max_slice_length:\n # subword_slice = 
\"\".join(subwords[s:e])\n # words.add(subword_slice)\n # blacklanternsecurity --> bls\n if acronyms:\n if len(subwords) > 1:\n words.add(\"\".join([c[0] for c in subwords if len(c) > 0]))\n\n return words\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.filesize","title":"filesize","text":"filesize(f)\n
Get the file size of a given file.
This function takes a file path as an argument and returns its size in bytes. If the path does not point to a file, the function returns 0.
Parameters:
f
(str or Path
) \u2013 The file path for which to get the size.
Returns:
int
\u2013 The size of the file in bytes, or 0 if the path does not point to a file.
Examples:
>>> filesize(\"/path/to/file.txt\")\n1024\n
Source code in bbot/core/helpers/misc.py
def filesize(f):\n \"\"\"Get the file size of a given file.\n\n This function takes a file path as an argument and returns its size in bytes. If the path\n does not point to a file, the function returns 0.\n\n Args:\n f (str or Path): The file path for which to get the size.\n\n Returns:\n int: The size of the file in bytes, or 0 if the path does not point to a file.\n\n Examples:\n >>> filesize(\"/path/to/file.txt\")\n 1024\n \"\"\"\n f = Path(f)\n if f.is_file():\n return f.stat().st_size\n return 0\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.filter_dict","title":"filter_dict","text":"filter_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None)\n
Recursively filter a dictionary based on key names.
Parameters:
d
(dict
) \u2013 The input dictionary.
*key_names
\u2013 Names of keys to filter for.
fuzzy
(bool
, default: False
) \u2013 Whether to perform fuzzy matching on keys.
exclude_keys
((list, None)
, default: None
) \u2013 List of keys to be excluded from the final dict.
_prev_key
((str, None)
, default: None
) \u2013 For internal recursive use; the previous key in the hierarchy.
Returns:
dict
\u2013 A dictionary containing only the keys specified in key_names.
Examples:
>>> filter_dict({\"key1\": \"test\", \"key2\": \"asdf\"}, \"key2\")\n{\"key2\": \"asdf\"}\n>>> filter_dict({\"key1\": \"test\", \"key2\": {\"key3\": \"asdf\"}}, \"key1\", \"key3\", exclude_keys=\"key2\")\n{'key1': 'test'}\n
Source code in bbot/core/helpers/misc.py
def filter_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):\n \"\"\"\n Recursively filter a dictionary based on key names.\n\n Args:\n d (dict): The input dictionary.\n *key_names: Names of keys to filter for.\n fuzzy (bool): Whether to perform fuzzy matching on keys.\n exclude_keys (list, None): List of keys to be excluded from the final dict.\n _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.\n\n Returns:\n dict: A dictionary containing only the keys specified in key_names.\n\n Examples:\n >>> filter_dict({\"key1\": \"test\", \"key2\": \"asdf\"}, \"key2\")\n {\"key2\": \"asdf\"}\n >>> filter_dict({\"key1\": \"test\", \"key2\": {\"key3\": \"asdf\"}}, \"key1\", \"key3\", exclude_keys=\"key2\")\n {'key1': 'test'}\n \"\"\"\n if exclude_keys is None:\n exclude_keys = []\n if isinstance(exclude_keys, str):\n exclude_keys = [exclude_keys]\n ret = {}\n if isinstance(d, dict):\n for key in d:\n if key in key_names or (fuzzy and any(k in key for k in key_names)):\n if not any(k in exclude_keys for k in [key, _prev_key]):\n ret[key] = copy.deepcopy(d[key])\n elif isinstance(d[key], list) or isinstance(d[key], dict):\n child = filter_dict(d[key], *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)\n if child:\n ret[key] = child\n return ret\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.gen_numbers","title":"gen_numbers","text":"gen_numbers(n, padding=2)\n
Generates numbers with variable padding and returns them as a set of strings.
Parameters:
n
(int
) \u2013 The upper limit of numbers to generate, exclusive.
padding
(int
, default: 2
) \u2013 The maximum number of digits to pad the numbers with. Defaults to 2.
Returns:
set
\u2013 A set of string representations of numbers with varying degrees of padding.
Examples:
>>> gen_numbers(5)\n{'0', '00', '01', '02', '03', '04', '1', '2', '3', '4'}\n
>>> gen_numbers(3, padding=3)\n{'0', '00', '000', '001', '002', '01', '02', '1', '2'}\n
>>> gen_numbers(5, padding=1)\n{'0', '1', '2', '3', '4'}\n
Source code in bbot/core/helpers/misc.py
def gen_numbers(n, padding=2):\n \"\"\"Generates numbers with variable padding and returns them as a set of strings.\n\n Args:\n n (int): The upper limit of numbers to generate, exclusive.\n padding (int, optional): The maximum number of digits to pad the numbers with. Defaults to 2.\n\n Returns:\n set: A set of string representations of numbers with varying degrees of padding.\n\n Examples:\n >>> gen_numbers(5)\n {'0', '00', '01', '02', '03', '04', '1', '2', '3', '4'}\n\n >>> gen_numbers(3, padding=3)\n {'0', '00', '000', '001', '002', '01', '02', '1', '2'}\n\n >>> gen_numbers(5, padding=1)\n {'0', '1', '2', '3', '4'}\n \"\"\"\n results = set()\n for i in range(n):\n for p in range(1, padding + 1):\n results.add(str(i).zfill(p))\n return results\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_closest_match","title":"get_closest_match","text":"get_closest_match(s, choices, msg=None)\n
Finds the closest match from a list of choices for a given string.
This function is particularly useful for CLI applications where you want to validate flags or modules.
Parameters:
s
(str
) \u2013 The string for which to find the closest match.
choices
(list
) \u2013 A list of strings to compare against.
msg
(str
, default: None
) \u2013 Additional message to prepend in the warning message. Defaults to None.
Note: the docstring also lists loglevel and exitcode arguments, but the signature above accepts only s, choices, and msg. The function returns the message string rather than logging it or exiting.
Examples:
>>> get_closest_match(\"some_module\", [\"some_mod\", \"some_other_mod\"], msg=\"module\")\n# Output: Could not find module \"some_module\". Did you mean \"some_mod\"?\n
Source code in bbot/core/helpers/misc.py
def get_closest_match(s, choices, msg=None):\n \"\"\"Finds the closest match from a list of choices for a given string.\n\n This function is particularly useful for CLI applications where you want to validate flags or modules.\n\n Args:\n s (str): The string for which to find the closest match.\n choices (list): A list of strings to compare against.\n msg (str, optional): Additional message to prepend in the warning message. Defaults to None.\n loglevel (str, optional): The log level to use for the warning message. Defaults to \"HUGEWARNING\".\n exitcode (int, optional): The exit code to use when exiting the program. Defaults to 2.\n\n Examples:\n >>> get_closest_match(\"some_module\", [\"some_mod\", \"some_other_mod\"], msg=\"module\")\n # Output: Could not find module \"some_module\". Did you mean \"some_mod\"?\n \"\"\"\n if msg is None:\n msg = \"\"\n else:\n msg += \" \"\n closest = closest_match(s, choices)\n return f'Could not find {msg}\"{s}\". Did you mean \"{closest}\"?'\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_exception_chain","title":"get_exception_chain","text":"get_exception_chain(e)\n
Retrieves the full chain of exceptions leading to the given exception.
Parameters:
e
(BaseException
) \u2013 The exception for which to get the chain.
Returns:
list[BaseException]: List of exceptions in the chain, from the given exception back to the root cause.
Examples:
>>> try:\n... raise ValueError(\"This is a value error\")\n... except ValueError as e:\n... exc_chain = get_exception_chain(e)\n... for exc in exc_chain:\n... print(exc)\nThis is a value error\n
Source code in bbot/core/helpers/misc.py
def get_exception_chain(e):\n \"\"\"\n Retrieves the full chain of exceptions leading to the given exception.\n\n Args:\n e (BaseException): The exception for which to get the chain.\n\n Returns:\n list[BaseException]: List of exceptions in the chain, from the given exception back to the root cause.\n\n Examples:\n >>> try:\n ... raise ValueError(\"This is a value error\")\n ... except ValueError as e:\n ... exc_chain = get_exception_chain(e)\n ... for exc in exc_chain:\n ... print(exc)\n This is a value error\n \"\"\"\n exception_chain = []\n current_exception = e\n while current_exception is not None:\n exception_chain.append(current_exception)\n current_exception = getattr(current_exception, \"__context__\", None)\n return exception_chain\n
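The docstring example only shows a single exception, so here is a small sketch of a chain with more than one link. It re-implements the same __context__ walk under the hypothetical name exception_chain so that it runs without BBOT installed.

```python
def exception_chain(e):
    # Same walk as the listing above: follow __context__ until it runs out.
    chain = []
    while e is not None:
        chain.append(e)
        e = getattr(e, '__context__', None)
    return chain


try:
    try:
        raise KeyError('missing')
    except KeyError:
        # Raising inside an except block sets __context__ implicitly.
        raise ValueError('bad value')
except ValueError as err:
    names = [type(exc).__name__ for exc in exception_chain(err)]

print(names)
```

The list starts with the exception that was caught and ends with the earliest one in the chain.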
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_file_extension","title":"get_file_extension","text":"get_file_extension(s)\n
Extracts the file extension from a given string representing a URL or file path.
Parameters:
s
(str
) \u2013 The string from which to extract the file extension.
Returns:
str
\u2013 The file extension, or an empty string if no extension is found.
Examples:
>>> get_file_extension(\"https://evilcorp.com/api/test.php\")\n\"php\"\n>>> get_file_extension(\"/etc/test.conf\")\n\"conf\"\n>>> get_file_extension(\"/etc/passwd\")\n\"\"\n
Source code in bbot/core/helpers/misc.py
def get_file_extension(s):\n \"\"\"\n Extracts the file extension from a given string representing a URL or file path.\n\n Args:\n s (str): The string from which to extract the file extension.\n\n Returns:\n str: The file extension, or an empty string if no extension is found.\n\n Examples:\n >>> get_file_extension(\"https://evilcorp.com/api/test.php\")\n \"php\"\n >>> get_file_extension(\"/etc/test.conf\")\n \"conf\"\n >>> get_file_extension(\"/etc/passwd\")\n \"\"\n \"\"\"\n s = str(s).lower().strip()\n rightmost_section = s.rsplit(\"/\", 1)[-1]\n if \".\" in rightmost_section:\n extension = rightmost_section.rsplit(\".\", 1)[-1]\n return extension\n return \"\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_keys_in_dot_syntax","title":"get_keys_in_dot_syntax","text":"get_keys_in_dot_syntax(config)\n
Retrieve all keys in an OmegaConf configuration in dot notation.
This function converts an OmegaConf configuration into a list of keys represented in dot notation.
Parameters:
config
(DictConfig
) \u2013 The OmegaConf configuration object.
Returns:
List[str]: A list of keys in dot notation.
Examples:
>>> config = OmegaConf.create({\n... \"web\": {\n... \"test\": True\n... },\n... \"db\": {\n... \"host\": \"localhost\",\n... \"port\": 5432\n... }\n... })\n>>> get_keys_in_dot_syntax(config)\n['web.test', 'db.host', 'db.port']\n
Source code in bbot/core/helpers/misc.py
def get_keys_in_dot_syntax(config):\n \"\"\"Retrieve all keys in an OmegaConf configuration in dot notation.\n\n This function converts an OmegaConf configuration into a list of keys\n represented in dot notation.\n\n Args:\n config (DictConfig): The OmegaConf configuration object.\n\n Returns:\n List[str]: A list of keys in dot notation.\n\n Examples:\n >>> config = OmegaConf.create({\n ... \"web\": {\n ... \"test\": True\n ... },\n ... \"db\": {\n ... \"host\": \"localhost\",\n ... \"port\": 5432\n ... }\n ... })\n >>> get_keys_in_dot_syntax(config)\n ['web.test', 'db.host', 'db.port']\n \"\"\"\n from omegaconf import OmegaConf\n\n container = OmegaConf.to_container(config, resolve=True)\n keys = []\n\n def recursive_keys(d, parent_key=\"\"):\n for k, v in d.items():\n full_key = f\"{parent_key}.{k}\" if parent_key else k\n if isinstance(v, dict):\n recursive_keys(v, full_key)\n else:\n keys.append(full_key)\n\n recursive_keys(container)\n return keys\n
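The flattening step can be sketched on a plain dict, without OmegaConf. The name dot_keys is hypothetical; it mirrors the nested recursive_keys helper in the listing above, applied to the container that OmegaConf.to_container would produce.

```python
def dot_keys(d, parent_key=''):
    # Flatten a nested dict into a list of dot-separated leaf keys.
    keys = []
    for k, v in d.items():
        full_key = f'{parent_key}.{k}' if parent_key else k
        if isinstance(v, dict):
            # Descend into nested dicts, carrying the path so far.
            keys.extend(dot_keys(v, full_key))
        else:
            keys.append(full_key)
    return keys


config = {'web': {'test': True}, 'db': {'host': 'localhost', 'port': 5432}}
keys = dot_keys(config)
print(keys)
```

Only leaf values produce a key, so an empty nested dict contributes nothing to the result.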
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_size","title":"get_size","text":"get_size(obj, max_depth=5, seen=None)\n
Roughly estimate the memory footprint of a Python object using recursion.
Parameters:
obj
(any
) \u2013 The object whose size is to be determined.
max_depth
(int
, default: 5
) \u2013 Maximum depth to which nested objects will be inspected. Defaults to 5.
seen
(set
, default: None
) \u2013 Objects that have already been accounted for, to avoid loops.
Returns:
int
\u2013 Approximate memory footprint of the object in bytes.
Examples:
>>> get_size(my_list)\n4200\n
>>> get_size(my_dict, max_depth=3)\n8400\n
Source code in bbot/core/helpers/misc.py
def get_size(obj, max_depth=5, seen=None):\n \"\"\"\n Roughly estimate the memory footprint of a Python object using recursion.\n\n Parameters:\n obj (any): The object whose size is to be determined.\n max_depth (int, optional): Maximum depth to which nested objects will be inspected. Defaults to 5.\n seen (set, optional): Objects that have already been accounted for, to avoid loops.\n\n Returns:\n int: Approximate memory footprint of the object in bytes.\n\n Examples:\n >>> get_size(my_list)\n 4200\n\n >>> get_size(my_dict, max_depth=3)\n 8400\n \"\"\"\n from collections.abc import Mapping\n\n # If seen is not provided, initialize an empty set\n if seen is None:\n seen = set()\n # Get the id of the object\n obj_id = id(obj)\n # Decrease the maximum depth for the next recursion\n new_max_depth = max_depth - 1\n # If the object has already been seen or we've reached the maximum recursion depth, return 0\n if obj_id in seen or new_max_depth <= 0:\n return 0\n # Get the size of the object\n size = sys.getsizeof(obj)\n # Add the object's id to the set of seen objects\n seen.add(obj_id)\n # If the object has a __dict__ attribute, we want to measure its size\n if hasattr(obj, \"__dict__\"):\n # Iterate over the Method Resolution Order (MRO) of the class of the object\n for cls in obj.__class__.__mro__:\n # If the class's __dict__ contains a __dict__ key\n if \"__dict__\" in cls.__dict__:\n for k, v in obj.__dict__.items():\n size += get_size(k, new_max_depth, seen)\n size += get_size(v, new_max_depth, seen)\n break\n # If the object is a mapping (like a dictionary), we want to measure the size of its items\n if isinstance(obj, Mapping):\n with suppress(StopIteration):\n k, v = next(iter(obj.items()))\n size += (get_size(k, new_max_depth, seen) + get_size(v, new_max_depth, seen)) * len(obj)\n # If the object is a container (like a list or tuple) but not a string or bytes-like object\n elif isinstance(obj, (list, tuple, set)):\n with suppress(StopIteration):\n size += 
get_size(next(iter(obj)), new_max_depth, seen) * len(obj)\n # If the object has __slots__, we want to measure the size of the attributes in __slots__\n if hasattr(obj, \"__slots__\"):\n size += sum(get_size(getattr(obj, s), new_max_depth, seen) for s in obj.__slots__ if hasattr(obj, s))\n return size\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.get_traceback_details","title":"get_traceback_details","text":"get_traceback_details(e)\n
Retrieves detailed information from the traceback of an exception.
Parameters:
e
(BaseException
) \u2013 The exception for which to get traceback details.
Returns:
tuple
\u2013 A tuple containing filename (str), line number (int), and function name (str) where the exception was raised.
Examples:
>>> try:\n... raise ValueError(\"This is a value error\")\n... except ValueError as e:\n... filename, lineno, funcname = get_traceback_details(e)\n... print(f\"File: {filename}, Line: {lineno}, Function: {funcname}\")\nFile: <stdin>, Line: 2, Function: <module>\n
Source code in bbot/core/helpers/misc.py
def get_traceback_details(e):\n \"\"\"\n Retrieves detailed information from the traceback of an exception.\n\n Args:\n e (BaseException): The exception for which to get traceback details.\n\n Returns:\n tuple: A tuple containing filename (str), line number (int), and function name (str) where the exception was raised.\n\n Examples:\n >>> try:\n ... raise ValueError(\"This is a value error\")\n ... except ValueError as e:\n ... filename, lineno, funcname = get_traceback_details(e)\n ... print(f\"File: {filename}, Line: {lineno}, Function: {funcname}\")\n File: <stdin>, Line: 2, Function: <module>\n \"\"\"\n import traceback\n\n tb = traceback.extract_tb(e.__traceback__)\n last_frame = tb[-1] # Get the last frame in the traceback (the one where the exception was raised)\n filename = last_frame.filename\n lineno = last_frame.lineno\n funcname = last_frame.name\n return filename, lineno, funcname\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.grouper","title":"grouper","text":"grouper(iterable, n)\n
Grouper groups an iterable into chunks of a given size.
Parameters:
iterable
(iterable
) \u2013 The iterable to be chunked.
n
(int
) \u2013 The size of each chunk.
Returns:
iterator
\u2013 An iterator that produces lists of elements from the original iterable, each of length n
or less.
Examples:
>>> list(grouper('ABCDEFG', 3))\n[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]\n
Source code in bbot/core/helpers/misc.py
def grouper(iterable, n):\n \"\"\"\n Grouper groups an iterable into chunks of a given size.\n\n Args:\n iterable (iterable): The iterable to be chunked.\n n (int): The size of each chunk.\n\n Returns:\n iterator: An iterator that produces lists of elements from the original iterable, each of length `n` or less.\n\n Examples:\n >>> list(grouper('ABCDEFG', 3))\n [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]\n \"\"\"\n from itertools import islice\n\n iterable = iter(iterable)\n return iter(lambda: list(islice(iterable, n)), [])\n
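The listing above relies on the two-argument form of iter(), which keeps calling a function until it returns a sentinel value (here, an empty list). A minimal sketch of that idiom, using the hypothetical name chunks:

```python
from itertools import islice


def chunks(iterable, n):
    # iter(callable, sentinel) calls the lambda until it returns [].
    iterator = iter(iterable)
    return iter(lambda: list(islice(iterator, n)), [])


groups = list(chunks(range(7), 3))
print(groups)

# The result is lazy, so it also works on an endless input.
lazy = chunks(iter(int, 1), 2)  # iter(int, 1) is an endless stream of zeros
first = next(lazy)
print(first)
```

Since islice pulls from one shared iterator, each call resumes where the previous chunk stopped, and the final chunk may be shorter than n.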
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.human_timedelta","title":"human_timedelta","text":"human_timedelta(d)\n
Convert a TimeDelta object into a human-readable string.
This function takes a datetime.timedelta object and converts it into a string format that is easier to read and understand.
Parameters:
d
(timedelta
) \u2013 The TimeDelta object to convert.
Returns:
str
\u2013 A string representation of the TimeDelta object in human-readable form.
Examples:
>>> from datetime import datetime\n>>>\n>>> start_time = datetime.now()\n>>> end_time = datetime.now()\n>>> elapsed_time = end_time - start_time\n>>> human_timedelta(elapsed_time)\n'2 hours, 30 minutes, 15 seconds'\n
Source code in bbot/core/helpers/misc.py
def human_timedelta(d):\n \"\"\"Convert a TimeDelta object into a human-readable string.\n\n This function takes a datetime.timedelta object and converts it into a string format that\n is easier to read and understand.\n\n Args:\n d (datetime.timedelta): The TimeDelta object to convert.\n\n Returns:\n str: A string representation of the TimeDelta object in human-readable form.\n\n Examples:\n >>> from datetime import datetime\n >>>\n >>> start_time = datetime.now()\n >>> end_time = datetime.now()\n >>> elapsed_time = end_time - start_time\n >>> human_timedelta(elapsed_time)\n '2 hours, 30 minutes, 15 seconds'\n \"\"\"\n hours, remainder = divmod(d.seconds, 3600)\n minutes, seconds = divmod(remainder, 60)\n result = []\n if hours:\n result.append(f\"{hours:,} hour\" + (\"s\" if hours > 1 else \"\"))\n if minutes:\n result.append(f\"{minutes:,} minute\" + (\"s\" if minutes > 1 else \"\"))\n if seconds:\n result.append(f\"{seconds:,} second\" + (\"s\" if seconds > 1 else \"\"))\n ret = \", \".join(result)\n if not ret:\n ret = \"0 seconds\"\n return ret\n
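Note that the listing above reads d.seconds, which in Python's datetime module holds only the sub-day part of a timedelta (0 to 86399), so whole days are not reflected in its output. The sketch below is a hypothetical variant, not BBOT's implementation, that reports days as well; it assumes a non-negative timedelta.

```python
from datetime import timedelta


def human_timedelta_with_days(d):
    # Hypothetical variant: split total seconds so whole days are reported too.
    total = int(d.total_seconds())
    days, remainder = divmod(total, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    for value, unit in ((days, 'day'), (hours, 'hour'), (minutes, 'minute'), (seconds, 'second')):
        if value:
            parts.append(f'{value:,} {unit}' + ('s' if value > 1 else ''))
    return ', '.join(parts) or '0 seconds'


d = timedelta(days=1, hours=2, minutes=30, seconds=15)
print(d.seconds)  # sub-day part only
print(human_timedelta_with_days(d))
```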
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.human_to_bytes","title":"human_to_bytes","text":"human_to_bytes(filesize)\n
Convert a human-readable file size string to its bytes equivalent.
This function takes a human-readable file size string, such as \"2.5GB\", and converts it to its equivalent number of bytes.
Parameters:
filesize
(str or int
) \u2013 The human-readable file size string or integer bytes value to convert.
Returns:
int
\u2013 The number of bytes equivalent to the input human-readable file size.
Raises:
ValueError
\u2013 If the input string cannot be converted to bytes.
Examples:
>>> human_to_bytes(\"23.23gb\")\n24943022571\n
Source code in bbot/core/helpers/misc.py
def human_to_bytes(filesize):\n \"\"\"Convert a human-readable file size string to its bytes equivalent.\n\n This function takes a human-readable file size string, such as \"2.5GB\", and converts it\n to its equivalent number of bytes.\n\n Args:\n filesize (str or int): The human-readable file size string or integer bytes value to convert.\n\n Returns:\n int: The number of bytes equivalent to the input human-readable file size.\n\n Raises:\n ValueError: If the input string cannot be converted to bytes.\n\n Examples:\n >>> human_to_bytes(\"23.23gb\")\n 24943022571\n \"\"\"\n if isinstance(filesize, int):\n return filesize\n sizes = [\"B\", \"KB\", \"MB\", \"GB\", \"TB\", \"PB\", \"EB\", \"ZB\"]\n units = {}\n for count, size in enumerate(sizes):\n size_increment = pow(1024, count)\n units[size] = size_increment\n if len(size) == 2:\n units[size[0]] = size_increment\n match = filesize_regex.match(filesize)\n try:\n if match:\n num, size = match.groups()\n size = size.upper()\n size_increment = units[size]\n return int(float(num) * size_increment)\n except KeyError:\n pass\n raise ValueError(f'Unable to convert filesize \"{filesize}\" to bytes')\n
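The unit table in the listing above maps each suffix to a power of 1024 and also registers the first letter of two-letter units, so that K and KB are equivalent. The sketch below reproduces that approach as a standalone function. BBOT's own filesize_regex is not shown here, so SIZE_PATTERN and the name to_bytes are assumptions made for illustration.

```python
import re

# Hypothetical pattern; BBOT's own filesize_regex is not shown in this listing.
SIZE_PATTERN = re.compile('^([0-9]+(?:[.][0-9]+)?)[ ]*([a-zA-Z]+)$')

# Powers of 1024, keyed by both the full unit and its first letter (KB and K).
UNITS = {}
for power, name in enumerate(['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB']):
    UNITS[name] = 1024 ** power
    if len(name) == 2:
        UNITS[name[0]] = 1024 ** power


def to_bytes(size):
    # Integers are treated as an already-converted byte count.
    if isinstance(size, int):
        return size
    match = SIZE_PATTERN.match(size.strip())
    if match:
        number, unit = match.groups()
        if unit.upper() in UNITS:
            return int(float(number) * UNITS[unit.upper()])
    raise ValueError(f'Unable to convert {size!r} to bytes')


print(to_bytes('23.23gb'))
print(to_bytes('1k'))
```

The fractional part of the product is truncated by int(), which is why a value such as 23.23 GB comes out as a whole number of bytes.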
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.in_exception_chain","title":"in_exception_chain","text":"in_exception_chain(e, exc_types)\n
Given an Exception and a list of Exception types, returns whether any of the specified types are contained anywhere in the Exception chain.
Parameters:
e
(BaseException
) \u2013 The exception to check
exc_types
(list[Exception]
) \u2013 Exception types to look for in the chain, such as those that signal an intentional cancellation. The signature above defines no default, so this argument is required.
Returns:
bool
\u2013 Whether the error is the result of an intentional cancellation
Examples:
>>> try:\n... raise ValueError(\"This is a value error\")\n... except Exception as e:\n... if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n... raise\n
Source code in bbot/core/helpers/misc.py
def in_exception_chain(e, exc_types):\n \"\"\"\n Given an Exception and a list of Exception types, returns whether any of the specified types are contained anywhere in the Exception chain.\n\n Args:\n e (BaseException): The exception to check\n exc_types (list[Exception]): Exception types to consider intentional cancellations. Default is KeyboardInterrupt\n\n Returns:\n bool: Whether the error is the result of an intentional cancellation\n\n Examples:\n >>> try:\n ... raise ValueError(\"This is a value error\")\n ... except Exception as e:\n ... if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):\n ... raise\n \"\"\"\n return any([isinstance(_, exc_types) for _ in get_exception_chain(e)])\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.integer_to_ordinal","title":"integer_to_ordinal","text":"integer_to_ordinal(n)\n
Convert an integer to its ordinal representation.
Parameters:
n
(int
) \u2013 The integer to convert.
Returns:
str
\u2013 The ordinal representation of the integer.
Examples:
>>> integer_to_ordinal(1)\n'1st'\n>>> integer_to_ordinal(2)\n'2nd'\n>>> integer_to_ordinal(3)\n'3rd'\n>>> integer_to_ordinal(11)\n'11th'\n>>> integer_to_ordinal(21)\n'21st'\n>>> integer_to_ordinal(101)\n'101st'\n
Source code in bbot/core/helpers/misc.py
def integer_to_ordinal(n):\n \"\"\"\n Convert an integer to its ordinal representation.\n\n Args:\n n (int): The integer to convert.\n\n Returns:\n str: The ordinal representation of the integer.\n\n Examples:\n >>> integer_to_ordinal(1)\n '1st'\n >>> integer_to_ordinal(2)\n '2nd'\n >>> integer_to_ordinal(3)\n '3rd'\n >>> integer_to_ordinal(11)\n '11th'\n >>> integer_to_ordinal(21)\n '21st'\n >>> integer_to_ordinal(101)\n '101st'\n \"\"\"\n # Check the last digit\n last_digit = n % 10\n # Check the last two digits for special cases (11th, 12th, 13th)\n last_two_digits = n % 100\n\n if 10 <= last_two_digits <= 20:\n suffix = \"th\"\n else:\n if last_digit == 1:\n suffix = \"st\"\n elif last_digit == 2:\n suffix = \"nd\"\n elif last_digit == 3:\n suffix = \"rd\"\n else:\n suffix = \"th\"\n\n return f\"{n}{suffix}\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.ip_network_parents","title":"ip_network_parents","text":"ip_network_parents(i, include_self=False)\n
Generates all parent IP networks for a given IP address or network, optionally including the network itself.
Parameters:
i
(str or IPv4Network / IPv6Network
) \u2013 The IP address or network to find parents for.
include_self
(bool
, default: False
) \u2013 Whether to include the network itself in the result. Default is False.
Yields:
ipaddress.IPv4Network or ipaddress.IPv6Network: Parent IP networks in descending order of prefix length.
Examples:
>>> list(ip_network_parents(\"192.168.1.1\"))\n[ipaddress.IPv4Network('192.168.1.0/31'), ipaddress.IPv4Network('192.168.1.0/30'), ... , ipaddress.IPv4Network('0.0.0.0/0')]\n
Notes: Utilizes Python's built-in ipaddress module for network operations.
Source code in bbot/core/helpers/misc.py
def ip_network_parents(i, include_self=False):\n \"\"\"\n Generates all parent IP networks for a given IP address or network, optionally including the network itself.\n\n Args:\n i (str or ipaddress.IPv4Network/ipaddress.IPv6Network): The IP address or network to find parents for.\n include_self (bool, optional): Whether to include the network itself in the result. Default is False.\n\n Yields:\n ipaddress.IPv4Network or ipaddress.IPv6Network: Parent IP networks in descending order of prefix length.\n\n Examples:\n >>> list(ip_network_parents(\"192.168.1.1\"))\n [ipaddress.IPv4Network('192.168.1.0/31'), ipaddress.IPv4Network('192.168.1.0/30'), ... , ipaddress.IPv4Network('0.0.0.0/0')]\n\n Notes:\n - Utilizes Python's built-in `ipaddress` module for network operations.\n \"\"\"\n net = ipaddress.ip_network(i, strict=False)\n for i in range(net.prefixlen - (0 if include_self else 1), -1, -1):\n yield ipaddress.ip_network(f\"{net.network_address}/{i}\", strict=False)\n
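The same walk up the prefix lengths can be written with the standard library's supernet() method instead of rebuilding each network from a formatted string. This is an alternative sketch under the hypothetical name parents, not BBOT's implementation.

```python
import ipaddress


def parents(i, include_self=False):
    # Yield each enclosing network, from the longest prefix down to /0.
    net = ipaddress.ip_network(i, strict=False)
    start = net.prefixlen if include_self else net.prefixlen - 1
    for prefix in range(start, -1, -1):
        yield net.supernet(new_prefix=prefix)


nets = list(parents('192.168.1.1'))
print(len(nets))
print(nets[0], nets[-1])
```

A single IPv4 address is treated as a /32 network, so it has one parent for every prefix length from /31 down to /0.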
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_async_function","title":"is_async_function","text":"is_async_function(f)\n
Check if a given function is an asynchronous function.
Parameters:
f
(function
) \u2013 The function to check.
Returns:
bool
\u2013 True if the function is asynchronous, False otherwise.
Examples:
>>> async def foo():\n... pass\n>>> is_async_function(foo)\nTrue\n
Source code in bbot/core/helpers/misc.py
def is_async_function(f):\n \"\"\"\n Check if a given function is an asynchronous function.\n\n Args:\n f (function): The function to check.\n\n Returns:\n bool: True if the function is asynchronous, False otherwise.\n\n Examples:\n >>> async def foo():\n ... pass\n >>> is_async_function(foo)\n True\n \"\"\"\n import inspect\n\n return inspect.iscoroutinefunction(f)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_dns_name","title":"is_dns_name","text":"is_dns_name(d, include_local=True)\n
Determines if the given string is a valid DNS name.
Parameters:
d
(str
) \u2013 The string to be checked.
include_local
(bool
, default: True
) \u2013 Consider local hostnames to be valid (hostnames without periods)
Returns:
bool
\u2013 True if the string is a valid DNS name, False otherwise.
Examples:
>>> is_dns_name('www.example.com')\nTrue\n>>> is_dns_name('localhost')\nTrue\n>>> is_dns_name('localhost', include_local=False)\nFalse\n>>> is_dns_name('192.168.1.1')\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_dns_name(d, include_local=True):\n \"\"\"\n Determines if the given string is a valid DNS name.\n\n Args:\n d (str): The string to be checked.\n include_local (bool): Consider local hostnames to be valid (hostnames without periods)\n\n Returns:\n bool: True if the string is a valid DNS name, False otherwise.\n\n Examples:\n >>> is_dns_name('www.example.com')\n True\n >>> is_dns_name('localhost')\n True\n >>> is_dns_name('localhost', include_local=False)\n False\n >>> is_dns_name('192.168.1.1')\n False\n \"\"\"\n if is_ip(d):\n return False\n d = smart_decode(d)\n if include_local:\n if bbot_regexes.hostname_regex.match(d):\n return True\n if bbot_regexes.dns_name_regex.match(d):\n return True\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_domain","title":"is_domain","text":"is_domain(d)\n
Check if the given input represents a domain without subdomains.
This function takes an input string d
and returns True if it represents a domain without any subdomains. Otherwise, it returns False.
Parameters:
d
(str
) \u2013 The input string containing the domain.
Returns:
bool
\u2013 True if the input is a domain without subdomains, False otherwise.
Examples:
>>> is_domain(\"evilcorp.co.uk\")\nTrue\n
>>> is_domain(\"www.evilcorp.co.uk\")\nFalse\n
Notes: Port, if present in input, is ignored.
Source code in bbot/core/helpers/misc.py
def is_domain(d):\n \"\"\"\n Check if the given input represents a domain without subdomains.\n\n This function takes an input string `d` and returns True if it represents a domain without any subdomains.\n Otherwise, it returns False.\n\n Args:\n d (str): The input string containing the domain.\n\n Returns:\n bool: True if the input is a domain without subdomains, False otherwise.\n\n Examples:\n >>> is_domain(\"evilcorp.co.uk\")\n True\n\n >>> is_domain(\"www.evilcorp.co.uk\")\n False\n\n Notes:\n - Port, if present in input, is ignored.\n \"\"\"\n d, _ = split_host_port(d)\n if is_ip(d):\n return False\n extracted = tldextract(d)\n if extracted.registered_domain:\n if not extracted.subdomain:\n return True\n else:\n return d.count(\".\") == 1\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_file","title":"is_file","text":"is_file(f)\n
Check if a path points to a file.
Parameters:
f
(str
) \u2013 Path to the file.
Returns:
bool
\u2013 True if the path is a file, False otherwise.
Examples:
>>> is_file(\"/etc/passwd\")\nTrue\n
>>> is_file(\"/nonexistent\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_file(f):\n \"\"\"\n Check if a path points to a file.\n\n Parameters:\n f (str): Path to the file.\n\n Returns:\n bool: True if the path is a file, False otherwise.\n\n Examples:\n >>> is_file(\"/etc/passwd\")\n True\n\n >>> is_file(\"/nonexistent\")\n False\n \"\"\"\n with suppress(Exception):\n return Path(f).is_file()\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_ip","title":"is_ip","text":"is_ip(d, version=None)\n
Checks if the given string or object represents a valid IP address.
Parameters:
d
(str or IPvXAddress
) \u2013 The IP address to check.
version
(int
, default: None
) \u2013 The IP version to validate (4 or 6). Default is None.
Returns:
bool
\u2013 True if the string or object is a valid IP address, False otherwise.
Examples:
>>> is_ip('192.168.1.1')\nTrue\n>>> is_ip('bad::c0de', version=6)\nTrue\n>>> is_ip('bad::c0de', version=4)\nFalse\n>>> is_ip('evilcorp.com')\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_ip(d, version=None):\n \"\"\"\n Checks if the given string or object represents a valid IP address.\n\n Args:\n d (str or ipaddress.IPvXAddress): The IP address to check.\n version (int, optional): The IP version to validate (4 or 6). Default is None.\n\n Returns:\n bool: True if the string or object is a valid IP address, False otherwise.\n\n Examples:\n >>> is_ip('192.168.1.1')\n True\n >>> is_ip('bad::c0de', version=6)\n True\n >>> is_ip('bad::c0de', version=4)\n False\n >>> is_ip('evilcorp.com')\n False\n \"\"\"\n try:\n ip = ipaddress.ip_address(d)\n if version is None or ip.version == version:\n return True\n except Exception:\n pass\n return False\n
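The version filter described above can be tried with the standard library alone. This is a minimal standalone sketch, not BBOT's own code: the name `ip_version_matches` is hypothetical, and it catches only `ValueError` where the helper above catches any exception.

```python
import ipaddress


def ip_version_matches(value, version=None):
    # Hypothetical stand-in for the check described above
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        # Not parseable as an IPv4 or IPv6 address
        return False
    # With no version requested, any valid address passes
    return version is None or ip.version == version


print(ip_version_matches('192.168.1.1'))           # True
print(ip_version_matches('bad::c0de', version=6))  # True
print(ip_version_matches('bad::c0de', version=4))  # False
print(ip_version_matches('evilcorp.com'))          # False
```

Comparing against `ip.version` means a single parse handles both address families.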
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_ip_type","title":"is_ip_type","text":"is_ip_type(i)\n
Checks if the given object is an instance of an IPv4 or IPv6 type from the ipaddress module.
Parameters:
i
(_BaseV4 or _BaseV6
) \u2013 The IP object to check.
Returns:
bool
\u2013 True if the object is an instance of ipaddress._BaseV4 or ipaddress._BaseV6, False otherwise.
Examples:
>>> is_ip_type(ipaddress.IPv6Address('dead::beef'))\nTrue\n>>> is_ip_type(ipaddress.IPv4Network('192.168.1.0/24'))\nTrue\n>>> is_ip_type(\"192.168.1.0/24\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_ip_type(i):\n \"\"\"\n Checks if the given object is an instance of an IPv4 or IPv6 type from the ipaddress module.\n\n Args:\n i (ipaddress._BaseV4 or ipaddress._BaseV6): The IP object to check.\n\n Returns:\n bool: True if the object is an instance of ipaddress._BaseV4 or ipaddress._BaseV6, False otherwise.\n\n Examples:\n >>> is_ip_type(ipaddress.IPv6Address('dead::beef'))\n True\n >>> is_ip_type(ipaddress.IPv4Network('192.168.1.0/24'))\n True\n >>> is_ip_type(\"192.168.1.0/24\")\n False\n \"\"\"\n return ipaddress._IPAddressBase in i.__class__.__mro__\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_port","title":"is_port","text":"is_port(p)\n
Checks if the given string represents a valid port number.
Parameters:
p
(str or int
) \u2013 The port number to check.
Returns:
bool
\u2013 True if the port number is valid, False otherwise.
Examples:
>>> is_port('80')\nTrue\n>>> is_port('70000')\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_port(p):\n    \"\"\"\n    Checks if the given string represents a valid port number.\n\n    Args:\n        p (str or int): The port number to check.\n\n    Returns:\n        bool: True if the port number is valid, False otherwise.\n\n    Examples:\n        >>> is_port('80')\n        True\n        >>> is_port('70000')\n        False\n    \"\"\"\n\n    p = str(p)\n    # bool() keeps the documented return type even for an empty string\n    return bool(p) and p.isdigit() and 0 <= int(p) <= 65535\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_ptr","title":"is_ptr","text":"is_ptr(d)\n
Check if the given input represents a PTR record domain.
This function takes an input string d
and returns True if it matches the PTR record format. Otherwise, it returns False.
Parameters:
d
(str
) \u2013 The input string potentially representing a PTR record domain.
Returns:
bool
\u2013 True if the input matches PTR record format, False otherwise.
Examples:
>>> is_ptr(\"wsc-11-22-33-44.evilcorp.com\")\nTrue\n
>>> is_ptr(\"www2.evilcorp.com\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_ptr(d):\n \"\"\"\n Check if the given input represents a PTR record domain.\n\n This function takes an input string `d` and returns True if it matches the PTR record format.\n Otherwise, it returns False.\n\n Args:\n d (str): The input string potentially representing a PTR record domain.\n\n Returns:\n bool: True if the input matches PTR record format, False otherwise.\n\n Examples:\n >>> is_ptr(\"wsc-11-22-33-44.evilcorp.com\")\n True\n\n >>> is_ptr(\"www2.evilcorp.com\")\n False\n \"\"\"\n return bool(bbot_regexes.ptr_regex.search(str(d)))\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_subdomain","title":"is_subdomain","text":"is_subdomain(d)\n
Check if the given input represents a subdomain.
This function takes an input string d
and returns True if it represents a subdomain. Otherwise, it returns False.
Parameters:
d
(str
) \u2013 The input string containing the domain or subdomain.
Returns:
bool
\u2013 True if the input is a subdomain, False otherwise.
Examples:
>>> is_subdomain(\"www.evilcorp.co.uk\")\nTrue\n
>>> is_subdomain(\"evilcorp.co.uk\")\nFalse\n
Notes: Port, if present in input, is ignored.
Source code in bbot/core/helpers/misc.py
def is_subdomain(d):\n \"\"\"\n Check if the given input represents a subdomain.\n\n This function takes an input string `d` and returns True if it represents a subdomain.\n Otherwise, it returns False.\n\n Args:\n d (str): The input string containing the domain or subdomain.\n\n Returns:\n bool: True if the input is a subdomain, False otherwise.\n\n Examples:\n >>> is_subdomain(\"www.evilcorp.co.uk\")\n True\n\n >>> is_subdomain(\"evilcorp.co.uk\")\n False\n\n Notes:\n - Port, if present in input, is ignored.\n \"\"\"\n d, _ = split_host_port(d)\n if is_ip(d):\n return False\n extracted = tldextract(d)\n if extracted.registered_domain:\n if extracted.subdomain:\n return True\n else:\n return d.count(\".\") > 1\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_uri","title":"is_uri","text":"is_uri(u, return_scheme=False)\n
Check if the given input represents a URI and optionally return its scheme.
This function takes an input string u
and returns True if it matches a URI format. When return_scheme
is True, it returns the URI scheme instead of a boolean.
Parameters:
u
(str
) \u2013 The input string potentially representing a URI.
return_scheme
(bool
, default: False
) \u2013 Whether to return the URI scheme. Defaults to False.
Returns:
Union[bool, str]: True if the input matches a URI format; the URI scheme if return_scheme
is True.
Examples:
>>> is_uri(\"http://evilcorp.com\")\nTrue\n
>>> is_uri(\"ftp://evilcorp.com\")\nTrue\n
>>> is_uri(\"evilcorp.com\")\nFalse\n
>>> is_uri(\"ftp://evilcorp.com\", return_scheme=True)\n\"ftp\"\n
Source code in bbot/core/helpers/misc.py
def is_uri(u, return_scheme=False):\n \"\"\"\n Check if the given input represents a URI and optionally return its scheme.\n\n This function takes an input string `u` and returns True if it matches a URI format.\n When `return_scheme` is True, it returns the URI scheme instead of a boolean.\n\n Args:\n u (str): The input string potentially representing a URI.\n return_scheme (bool, optional): Whether to return the URI scheme. Defaults to False.\n\n Returns:\n Union[bool, str]: True if the input matches a URI format; the URI scheme if `return_scheme` is True.\n\n Examples:\n >>> is_uri(\"http://evilcorp.com\")\n True\n\n >>> is_uri(\"ftp://evilcorp.com\")\n True\n\n >>> is_uri(\"evilcorp.com\")\n False\n\n >>> is_uri(\"ftp://evilcorp.com\", return_scheme=True)\n \"ftp\"\n \"\"\"\n match = uri_regex.match(u)\n if return_scheme:\n if match:\n return match.groups()[0].lower()\n return \"\"\n return bool(match)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.is_url","title":"is_url","text":"is_url(u)\n
Check if the given input represents a valid URL.
This function takes an input string u
and returns True if it matches any of the predefined URL formats. Otherwise, it returns False.
Parameters:
u
(str
) \u2013 The input string potentially representing a URL.
Returns:
bool
\u2013 True if the input matches a valid URL format, False otherwise.
Examples:
>>> is_url(\"https://evilcorp.com\")\nTrue\n
>>> is_url(\"not-a-url\")\nFalse\n
Source code in bbot/core/helpers/misc.py
def is_url(u):\n \"\"\"\n Check if the given input represents a valid URL.\n\n This function takes an input string `u` and returns True if it matches any of the predefined URL formats.\n Otherwise, it returns False.\n\n Args:\n u (str): The input string potentially representing a URL.\n\n Returns:\n bool: True if the input matches a valid URL format, False otherwise.\n\n Examples:\n >>> is_url(\"https://evilcorp.com\")\n True\n\n >>> is_url(\"not-a-url\")\n False\n \"\"\"\n u = str(u)\n for r in bbot_regexes.event_type_regexes[\"URL\"]:\n if r.match(u):\n return True\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.kill_children","title":"kill_children","text":"kill_children(parent_pid=None, sig=None)\n
Forgive me father for I have sinned
Source code in bbot/core/helpers/misc.py
def kill_children(parent_pid=None, sig=None):\n    \"\"\"\n    Forgive me father for I have sinned\n    \"\"\"\n    import psutil\n    import signal\n\n    if sig is None:\n        sig = signal.SIGTERM\n\n    try:\n        parent = psutil.Process(parent_pid)\n    except psutil.NoSuchProcess:\n        log.debug(f\"No such PID: {parent_pid}\")\n        return\n    log.debug(f\"Killing children of process ID {parent.pid}\")\n    children = parent.children(recursive=True)\n    for child in children:\n        log.debug(f\"Killing child with PID {child.pid}\")\n        try:\n            # Process.name is a method in psutil, so it must be called\n            if child.name() != \"python\":\n                child.send_signal(sig)\n        except psutil.NoSuchProcess:\n            log.debug(f\"No such PID: {child.pid}\")\n        except psutil.AccessDenied:\n            log.debug(f\"Error killing PID: {child.pid} - access denied\")\n    log.debug(f\"Finished killing children of process ID {parent.pid}\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.latest_mtime","title":"latest_mtime","text":"latest_mtime(d)\n
Get the latest modified time of any file or sub-directory in a given directory.
This function takes a directory path as an argument and returns the latest modified time of any contained file or directory, recursively. It's useful for sorting directories by modified time for cleanup or other purposes.
Parameters:
d
(str or Path
) \u2013 The directory path to search for the latest modified time.
Returns:
float
\u2013 The latest modified time in Unix timestamp format.
Examples:
>>> latest_mtime(\"~/.bbot/scans/mushy_susan\")\n1659016928.2848816\n
Source code in bbot/core/helpers/misc.py
def latest_mtime(d):\n \"\"\"Get the latest modified time of any file or sub-directory in a given directory.\n\n This function takes a directory path as an argument and returns the latest modified time\n of any contained file or directory, recursively. It's useful for sorting directories by\n modified time for cleanup or other purposes.\n\n Args:\n d (str or Path): The directory path to search for the latest modified time.\n\n Returns:\n float: The latest modified time in Unix timestamp format.\n\n Examples:\n >>> latest_mtime(\"~/.bbot/scans/mushy_susan\")\n 1659016928.2848816\n \"\"\"\n d = Path(d).resolve()\n mtimes = [d.lstat().st_mtime]\n if d.is_dir():\n to_list = d.glob(\"**/*\")\n else:\n to_list = [d]\n for e in to_list:\n mtimes.append(e.lstat().st_mtime)\n return max(mtimes)\n
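The description above mentions sorting directories by modified time. The following is a small self-contained sketch of that use. The function `newest_mtime` is a hypothetical stand-in that mirrors the logic shown above, and the directory names and the backdated timestamp are made up for the demonstration.

```python
import os
import tempfile
from pathlib import Path


def newest_mtime(directory):
    # Hypothetical stand-in mirroring the logic shown above
    directory = Path(directory).resolve()
    mtimes = [directory.lstat().st_mtime]
    if directory.is_dir():
        mtimes.extend(p.lstat().st_mtime for p in directory.glob('**/*'))
    return max(mtimes)


with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp) / 'old_scan', Path(tmp) / 'new_scan'
    for d in (old, new):
        d.mkdir()
        (d / 'output.txt').write_text('data')
    # Backdate old_scan and its contents with explicit timestamps
    os.utime(old / 'output.txt', (1000, 1000))
    os.utime(old, (1000, 1000))
    # Oldest first, so the head of the list is the first cleanup candidate
    ordered = sorted([new, old], key=newest_mtime)
    print([d.name for d in ordered])
```

Because the search is recursive, a directory counts as recently used if anything inside it changed, even when the directory entry itself is old.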
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.list_files","title":"list_files","text":"list_files(directory, filter=lambda x: True)\n
Lists files in a given directory that meet a specified filter condition.
Parameters:
directory
(str
) \u2013 The directory where to list files.
filter
(callable
, default: lambda x: True
) \u2013 A function to filter the files. Defaults to a lambda function that returns True for all files.
Yields:
Path
\u2013 A Path object for each file that meets the filter condition.
Examples:
>>> list(list_files(\"/tmp/test\"))\n[Path('/tmp/test/file1.py'), Path('/tmp/test/file2.txt')]\n
>>> list(list_files(\"/tmp/test\", filter=lambda f: f.suffix == \".py\"))\n[Path('/tmp/test/file1.py')]\n
Source code in bbot/core/helpers/misc.py
def list_files(directory, filter=lambda x: True):\n    \"\"\"Lists files in a given directory that meet a specified filter condition.\n\n    Args:\n        directory (str): The directory where to list files.\n        filter (callable, optional): A function to filter the files. Defaults to a lambda function that returns True for all files.\n\n    Yields:\n        Path: A Path object for each file that meets the filter condition.\n\n    Examples:\n        >>> list(list_files(\"/tmp/test\"))\n        [Path('/tmp/test/file1.py'), Path('/tmp/test/file2.txt')]\n\n        >>> list(list_files(\"/tmp/test\", filter=lambda f: f.suffix == \".py\"))\n        [Path('/tmp/test/file1.py')]\n    \"\"\"\n    directory = Path(directory).resolve()\n    if directory.is_dir():\n        for file in directory.iterdir():\n            if file.is_file() and filter(file):\n                yield file\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_date","title":"make_date","text":"make_date(d=None, microseconds=False)\n
Generates a string representation of the current date and time, with optional microsecond precision.
Parameters:
d
(datetime
, default: None
) \u2013 A datetime object to convert. Defaults to the current date and time.
microseconds
(bool
, default: False
) \u2013 Whether to include microseconds. Defaults to False.
Returns:
str
\u2013 A string representation of the date and time, formatted as YYYYMMDD_HHMM_SS or YYYYMMDD_HHMM_SSFFFFFF if microseconds are included.
Examples:
>>> make_date()\n\"20220707_1325_50\"\n>>> make_date(microseconds=True)\n\"20220707_1330_35167617\"\n
Source code in bbot/core/helpers/misc.py
def make_date(d=None, microseconds=False):\n \"\"\"\n Generates a string representation of the current date and time, with optional microsecond precision.\n\n Args:\n d (datetime, optional): A datetime object to convert. Defaults to the current date and time.\n microseconds (bool, optional): Whether to include microseconds. Defaults to False.\n\n Returns:\n str: A string representation of the date and time, formatted as YYYYMMDD_HHMM_SS or YYYYMMDD_HHMM_SSFFFFFF if microseconds are included.\n\n Examples:\n >>> make_date()\n \"20220707_1325_50\"\n >>> make_date(microseconds=True)\n \"20220707_1330_35167617\"\n \"\"\"\n from datetime import datetime\n\n f = \"%Y%m%d_%H%M_%S\"\n if microseconds:\n f += \"%f\"\n if d is None:\n d = datetime.now()\n return d.strftime(f)\n
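To make the two output formats concrete, here is a minimal sketch that applies the same `strftime` patterns to a fixed timestamp. The timestamp is arbitrary and chosen only for illustration.

```python
from datetime import datetime

fmt = '%Y%m%d_%H%M_%S'
moment = datetime(2022, 7, 7, 13, 30, 35, 167617)  # arbitrary fixed timestamp

plain = moment.strftime(fmt)
with_micro = moment.strftime(fmt + '%f')  # %f appends six microsecond digits

print(plain)       # 20220707_1330_35
print(with_micro)  # 20220707_1330_35167617
```

Since every field is zero-padded and ordered from year down to microsecond, the resulting strings sort chronologically when sorted as plain text.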
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_ip_type","title":"make_ip_type","text":"make_ip_type(s)\n
Convert a string to its corresponding IP address or network type.
This function attempts to convert the input string s
into either an IPv4 or IPv6 address object, or an IPv4 or IPv6 network object. If none of these conversions are possible, the original string is returned.
Parameters:
s
(str
) \u2013 The input string to be converted.
Returns:
Union[IPv4Address, IPv6Address, IPv4Network, IPv6Network, str]: The converted object or original string.
Examples:
>>> make_ip_type(\"dead::beef\")\nIPv6Address('dead::beef')\n
>>> make_ip_type(\"192.168.1.0/24\")\nIPv4Network('192.168.1.0/24')\n
>>> make_ip_type(\"evilcorp.com\")\n'evilcorp.com'\n
Source code in bbot/core/helpers/misc.py
def make_ip_type(s):\n \"\"\"\n Convert a string to its corresponding IP address or network type.\n\n This function attempts to convert the input string `s` into either an IPv4 or IPv6 address object,\n or an IPv4 or IPv6 network object. If none of these conversions are possible, the original string is returned.\n\n Args:\n s (str): The input string to be converted.\n\n Returns:\n Union[IPv4Address, IPv6Address, IPv4Network, IPv6Network, str]: The converted object or original string.\n\n Examples:\n >>> make_ip_type(\"dead::beef\")\n IPv6Address('dead::beef')\n\n >>> make_ip_type(\"192.168.1.0/24\")\n IPv4Network('192.168.1.0/24')\n\n >>> make_ip_type(\"evilcorp.com\")\n 'evilcorp.com'\n \"\"\"\n if not s:\n raise ValueError(f'Invalid hostname: \"{s}\"')\n # IP address\n with suppress(Exception):\n return ipaddress.ip_address(s)\n # IP network\n with suppress(Exception):\n return ipaddress.ip_network(s, strict=False)\n return s\n
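The `strict=False` argument in the network fallback matters when the input has host bits set. A short sketch using only the standard library (the sample address is made up):

```python
import ipaddress

# Host bits are set here, so this is not a canonical network address
candidate = '192.168.1.5/24'

# strict=False masks the host bits off instead of raising
network = ipaddress.ip_network(candidate, strict=False)
print(network)  # 192.168.1.0/24

# The default strict parsing rejects the same input
try:
    ipaddress.ip_network(candidate)
    strict_ok = True
except ValueError as e:
    strict_ok = False
    print('strict parsing failed:', e)
```

With this setting, an address written with a prefix length is normalized to its containing network rather than being returned unchanged as a string.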
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_netloc","title":"make_netloc","text":"make_netloc(host, port)\n
Constructs a network location string from a given host and port.
Parameters:
host
(str
) \u2013 The hostname or IP address.
port
(int
) \u2013 The port number. If None, the port is omitted.
Returns:
str
\u2013 A network location string in the form 'host' or 'host:port'.
Examples:
>>> make_netloc(\"192.168.1.1\", None)\n\"192.168.1.1\"\n
>>> make_netloc(\"192.168.1.1\", 443)\n\"192.168.1.1:443\"\n
>>> make_netloc(\"evilcorp.com\", 80)\n\"evilcorp.com:80\"\n
>>> make_netloc(\"dead::beef\", None)\n\"[dead::beef]\"\n
>>> make_netloc(\"dead::beef\", 443)\n\"[dead::beef]:443\"\n
Source code in bbot/core/helpers/misc.py
def make_netloc(host, port):\n \"\"\"Constructs a network location string from a given host and port.\n\n Args:\n host (str): The hostname or IP address.\n port (int, optional): The port number. If None, the port is omitted.\n\n Returns:\n str: A network location string in the form 'host' or 'host:port'.\n\n Examples:\n >>> make_netloc(\"192.168.1.1\", None)\n \"192.168.1.1\"\n\n >>> make_netloc(\"192.168.1.1\", 443)\n \"192.168.1.1:443\"\n\n >>> make_netloc(\"evilcorp.com\", 80)\n \"evilcorp.com:80\"\n\n >>> make_netloc(\"dead::beef\", None)\n \"[dead::beef]\"\n\n >>> make_netloc(\"dead::beef\", 443)\n \"[dead::beef]:443\"\n \"\"\"\n if is_ip(host, version=6):\n host = f\"[{host}]\"\n if port is None:\n return host\n return f\"{host}:{port}\"\n
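The brackets around IPv6 hosts keep the colons inside the address from being confused with the port separator. As a rough check, a bracketed netloc of the form shown in the examples above can be parsed back with the standard library. The URL below is illustrative.

```python
from urllib.parse import urlsplit

# A netloc in the bracketed form from the examples above, embedded in a URL
parts = urlsplit('https://[dead::beef]:443/')

print(parts.hostname)  # dead::beef
print(parts.port)      # 443
```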
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.make_table","title":"make_table","text":"make_table(rows, header, **kwargs)\n
Generate a formatted table from the given rows and headers.
This function uses the tabulate
package to generate a table with formatting options. It can accept various input formats and table styles, which can be customized using optional arguments.
Parameters:
rows
\u2013 The table rows, passed to tabulate.tabulate.
header
\u2013 The column headers, passed to tabulate.tabulate.
**kwargs
\u2013 Keyword arguments to customize table formatting:
- tablefmt (str, optional): Table format. Default is 'grid'.
- disable_numparse (bool, optional): Disable automatic number parsing. Default is True.
- maxcolwidths (int, optional): Maximum column width. Default is 40.
Returns:
str
\u2013 A string representing the formatted table.
Examples:
>>> print(make_table([[\"row1\", \"row1\"], [\"row2\", \"row2\"]], [\"header1\", \"header2\"]))\n+-----------+-----------+\n| header1   | header2   |\n+===========+===========+\n| row1      | row1      |\n+-----------+-----------+\n| row2      | row2      |\n+-----------+-----------+\n
Source code in bbot/core/helpers/misc.py
def make_table(rows, header, **kwargs):\n    \"\"\"Generate a formatted table from the given rows and headers.\n\n    This function uses the `tabulate` package to generate a table with formatting options.\n    It can accept various input formats and table styles, which can be customized using optional arguments.\n\n    Args:\n        rows: The table rows, passed to `tabulate.tabulate`.\n        header: The column headers, passed to `tabulate.tabulate`.\n        **kwargs: Keyword arguments to customize table formatting.\n            - tablefmt (str, optional): Table format. Default is 'grid'.\n            - disable_numparse (bool, optional): Disable automatic number parsing. Default is True.\n            - maxcolwidths (int, optional): Maximum column width. Default is 40.\n\n    Returns:\n        str: A string representing the formatted table.\n\n    Examples:\n        >>> print(make_table([[\"row1\", \"row1\"], [\"row2\", \"row2\"]], [\"header1\", \"header2\"]))\n        +-----------+-----------+\n        | header1   | header2   |\n        +===========+===========+\n        | row1      | row1      |\n        +-----------+-----------+\n        | row2      | row2      |\n        +-----------+-----------+\n    \"\"\"\n    from tabulate import tabulate\n\n    # fix IndexError: list index out of range\n    if not rows:\n        rows = [[]]\n    tablefmt = os.environ.get(\"BBOT_TABLE_FORMAT\", None)\n    defaults = {\"tablefmt\": \"grid\", \"disable_numparse\": True, \"maxcolwidths\": None}\n    if tablefmt is None:\n        defaults.update({\"maxcolwidths\": 40})\n    else:\n        defaults.update({\"tablefmt\": tablefmt})\n    for k, v in defaults.items():\n        if k not in kwargs:\n            kwargs[k] = v\n    # don't wrap columns in markdown\n    if tablefmt in (\"github\", \"markdown\"):\n        kwargs.pop(\"maxcolwidths\")\n    # escape problematic markdown characters in rows\n\n    def markdown_escape(s):\n        return str(s).replace(\"|\", \"&#124;\")\n\n    rows = [[markdown_escape(f) for f in row] for row in rows]\n    header = [markdown_escape(h) for h in header]\n    return tabulate(rows, header, **kwargs)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.memory_status","title":"memory_status","text":"memory_status()\n
Return statistics on system memory consumption.
The function returns a psutil
named tuple that contains statistics on system virtual memory usage, such as total memory, used memory, available memory, and more.
Returns:
psutil._pslinux.svmem: A named tuple representing various statistics about system virtual memory usage.
Examples:
>>> mem = memory_status()\n>>> mem.available\n13195399168\n
>>> mem = memory_status()\n>>> mem.percent\n79.0\n
Source code in bbot/core/helpers/misc.py
def memory_status():\n \"\"\"Return statistics on system memory consumption.\n\n The function returns a `psutil` named tuple that contains statistics on\n system virtual memory usage, such as total memory, used memory, available\n memory, and more.\n\n Returns:\n psutil._pslinux.svmem: A named tuple representing various statistics\n about system virtual memory usage.\n\n Examples:\n >>> mem = memory_status()\n >>> mem.available\n 13195399168\n\n >>> mem = memory_status()\n >>> mem.percent\n 79.0\n \"\"\"\n import psutil\n\n return psutil.virtual_memory()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.mkdir","title":"mkdir","text":"mkdir(path, check_writable=True, raise_error=True)\n
Creates a directory and optionally checks if it's writable.
Parameters:
path
(str or Path
) \u2013 The directory to create.
check_writable
(bool
, default: True
) \u2013 Whether to check if the directory is writable. Default is True.
raise_error
(bool
, default: True
) \u2013 Whether to raise an error if the directory creation fails. Default is True.
Returns:
bool
\u2013 True if the directory is successfully created (and writable, if check_writable=True); otherwise False.
Raises:
DirectoryCreationError
\u2013 Raised if the directory cannot be created and raise_error=True
.
Examples:
>>> mkdir(\"/tmp/new_dir\")\nTrue\n>>> mkdir(\"/restricted_dir\", check_writable=False, raise_error=False)\nFalse\n
Source code in bbot/core/helpers/misc.py
def mkdir(path, check_writable=True, raise_error=True):\n \"\"\"\n Creates a directory and optionally checks if it's writable.\n\n Args:\n path (str or Path): The directory to create.\n check_writable (bool, optional): Whether to check if the directory is writable. Default is True.\n raise_error (bool, optional): Whether to raise an error if the directory creation fails. Default is True.\n\n Returns:\n bool: True if the directory is successfully created (and writable, if check_writable=True); otherwise False.\n\n Raises:\n DirectoryCreationError: Raised if the directory cannot be created and `raise_error=True`.\n\n Examples:\n >>> mkdir(\"/tmp/new_dir\")\n True\n >>> mkdir(\"/restricted_dir\", check_writable=False, raise_error=False)\n False\n \"\"\"\n path = Path(path).resolve()\n touchfile = path / f\".{rand_string()}\"\n try:\n path.mkdir(exist_ok=True, parents=True)\n if check_writable:\n touchfile.touch()\n return True\n except Exception as e:\n if raise_error:\n raise errors.DirectoryCreationError(f\"Failed to create directory at {path}: {e}\")\n finally:\n with suppress(Exception):\n touchfile.unlink()\n return False\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.os_platform","title":"os_platform","text":"os_platform()\n
Return the OS platform of the current system.
This function fetches and returns the OS type where the code is being executed. It converts the platform identifier to lowercase.
Returns:
str
\u2013 A string representing the OS platform, such as \"linux\", \"darwin\", or \"windows\".
Examples:
>>> os_platform()\n'linux'\n
Source code in bbot/core/helpers/misc.py
def os_platform():\n \"\"\"Return the OS platform of the current system.\n\n This function fetches and returns the OS type where the code is being executed.\n It converts the platform identifier to lowercase.\n\n Returns:\n str: A string representing the OS platform, such as \"linux\", \"darwin\", or \"windows\".\n\n Examples:\n >>> os_platform()\n 'linux'\n \"\"\"\n import platform\n\n return platform.system().lower()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.os_platform_friendly","title":"os_platform_friendly","text":"os_platform_friendly()\n
Return a human-friendly OS platform string, suitable for golang release binaries.
This function fetches the OS platform and modifies it to a more human-readable format if necessary. Specifically, it changes \"darwin\" to \"macOS\".
Returns:
str
\u2013 A string representing the human-friendly OS platform, such as \"macOS\", \"linux\", or \"windows\".
Examples:
>>> os_platform_friendly()\n'macOS'\n
Source code in bbot/core/helpers/misc.py
def os_platform_friendly():\n \"\"\"Return a human-friendly OS platform string, suitable for golang release binaries.\n\n This function fetches the OS platform and modifies it to a more human-readable format if necessary.\n Specifically, it changes \"darwin\" to \"macOS\".\n\n Returns:\n str: A string representing the human-friendly OS platform, such as \"macOS\", \"linux\", or \"windows\".\n\n Examples:\n >>> os_platform_friendly()\n 'macOS'\n \"\"\"\n p = os_platform()\n if p == \"darwin\":\n return \"macOS\"\n return p\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.parent_domain","title":"parent_domain","text":"parent_domain(d)\n
Retrieve the parent domain of a given subdomain string.
This function takes an input string d
representing a subdomain and returns its parent domain. If the input does not represent a subdomain, it returns the input as is.
Parameters:
d
(str
) \u2013 The input string representing a subdomain or domain.
Returns:
str
\u2013 The parent domain of the subdomain, or the original input if it is not a subdomain.
Examples:
>>> parent_domain(\"www.internal.evilcorp.co.uk\")\n\"internal.evilcorp.co.uk\"\n
>>> parent_domain(\"www.internal.evilcorp.co.uk:8080\")\n\"internal.evilcorp.co.uk:8080\"\n
>>> parent_domain(\"www.evilcorp.co.uk\")\n\"evilcorp.co.uk\"\n
>>> parent_domain(\"evilcorp.co.uk\")\n\"evilcorp.co.uk\"\n
Notes: Port, if present in input, is preserved in the output.
Source code in bbot/core/helpers/misc.py
def parent_domain(d):\n \"\"\"\n Retrieve the parent domain of a given subdomain string.\n\n This function takes an input string `d` representing a subdomain and returns its parent domain.\n If the input does not represent a subdomain, it returns the input as is.\n\n Args:\n d (str): The input string representing a subdomain or domain.\n\n Returns:\n str: The parent domain of the subdomain, or the original input if it is not a subdomain.\n\n Examples:\n >>> parent_domain(\"www.internal.evilcorp.co.uk\")\n \"internal.evilcorp.co.uk\"\n\n >>> parent_domain(\"www.internal.evilcorp.co.uk:8080\")\n \"internal.evilcorp.co.uk:8080\"\n\n >>> parent_domain(\"www.evilcorp.co.uk\")\n \"evilcorp.co.uk\"\n\n >>> parent_domain(\"evilcorp.co.uk\")\n \"evilcorp.co.uk\"\n\n Notes:\n - Port, if present in input, is preserved in the output.\n \"\"\"\n host, port = split_host_port(d)\n if is_subdomain(d):\n return make_netloc(\".\".join(str(host).split(\".\")[1:]), port)\n return d\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.parent_url","title":"parent_url","text":"parent_url(u)\n
Retrieve the parent URL of a given URL.
This function takes an input string u
representing a URL and returns its parent URL. If the input URL does not have a parent (i.e., it's already the top-level), it returns None.
Parameters:
u
(str
) \u2013 The input string representing a URL.
Returns:
Union[str, None]: The parent URL of the input URL, or None if it has no parent.
Examples:
>>> parent_url(\"https://evilcorp.com/sub/path/\")\n\"https://evilcorp.com/sub\"\n
>>> parent_url(\"https://evilcorp.com/\")\nNone\n
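Notes: Only the path component of the URL is modified. All other components like scheme, netloc, query, and fragment are preserved.
Source code in bbot/core/helpers/misc.py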
def parent_url(u):\n    \"\"\"\n    Retrieve the parent URL of a given URL.\n\n    This function takes an input string `u` representing a URL and returns its parent URL.\n    If the input URL does not have a parent (i.e., it's already the top-level), it returns None.\n\n    Args:\n        u (str): The input string representing a URL.\n\n    Returns:\n        Union[str, None]: The parent URL of the input URL, or None if it has no parent.\n\n    Examples:\n        >>> parent_url(\"https://evilcorp.com/sub/path/\")\n        \"https://evilcorp.com/sub\"\n\n        >>> parent_url(\"https://evilcorp.com/\")\n        None\n\n    Notes:\n        - Only the path component of the URL is modified.\n        - All other components like scheme, netloc, query, and fragment are preserved.\n        - The parent path is returned without a trailing slash.\n    \"\"\"\n    parsed = urlparse(u)\n    path = Path(parsed.path)\n    if path.parent == path:\n        return None\n    else:\n        return urlunparse(parsed._replace(path=str(path.parent)))\n
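Because the function returns None at the top level, it can be applied repeatedly to walk up a URL. The sketch below is a self-contained illustration under stated assumptions: `url_parent` is a hypothetical stand-in that mirrors the logic above but uses `PurePosixPath` so it behaves the same on any platform, and `url_ancestors` is a made-up helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse, urlunparse


def url_parent(u):
    # Hypothetical stand-in mirroring the logic shown above
    parsed = urlparse(u)
    path = PurePosixPath(parsed.path)
    if path.parent == path:
        return None
    return urlunparse(parsed._replace(path=str(path.parent)))


def url_ancestors(u):
    # Yield each parent in turn until the top level is reached
    parent = url_parent(u)
    while parent is not None:
        yield parent
        parent = url_parent(parent)


ancestors = list(url_ancestors('https://evilcorp.com/a/b/c'))
for a in ancestors:
    print(a)
# https://evilcorp.com/a/b
# https://evilcorp.com/a
# https://evilcorp.com/
```

The loop terminates because the root path is its own parent, which is exactly the condition the function uses to return None.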
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.parse_port_string","title":"parse_port_string","text":"parse_port_string(port_string)\n
Parses a string containing ports and port ranges into a list of individual ports.
Parameters:
port_string
(str
) \u2013 The string containing individual ports and port ranges separated by commas.
Returns:
list
\u2013 A list of individual ports parsed from the input string.
Raises:
ValueError
\u2013 If the input string contains invalid ports or port ranges.
Examples:
>>> parse_port_string(\"22,80,1000-1002\")\n[22, 80, 1000, 1001, 1002]\n
>>> parse_port_string(\"1-2,3-5\")\n[1, 2, 3, 4, 5]\n
>>> parse_port_string(\"invalid\")\nValueError: Invalid port or port range: invalid\n
Source code in bbot/core/helpers/misc.py
def parse_port_string(port_string):\n    \"\"\"\n    Parses a string containing ports and port ranges into a list of individual ports.\n\n    Args:\n        port_string (str): The string containing individual ports and port ranges separated by commas.\n\n    Returns:\n        list: A list of individual ports parsed from the input string.\n\n    Raises:\n        ValueError: If the input string contains invalid ports or port ranges.\n\n    Examples:\n        >>> parse_port_string(\"22,80,1000-1002\")\n        [22, 80, 1000, 1001, 1002]\n\n        >>> parse_port_string(\"1-2,3-5\")\n        [1, 2, 3, 4, 5]\n\n        >>> parse_port_string(\"invalid\")\n        ValueError: Invalid port or port range: invalid\n    \"\"\"\n    elements = str(port_string).split(\",\")\n    ports = []\n\n    for element in elements:\n        if element.isdigit():\n            port = int(element)\n            if 1 <= port <= 65535:\n                ports.append(port)\n            else:\n                raise ValueError(f\"Invalid port: {element}\")\n        elif \"-\" in element:\n            range_parts = element.split(\"-\")\n            if len(range_parts) != 2 or not all(part.isdigit() for part in range_parts):\n                raise ValueError(f\"Invalid port or port range: {element}\")\n            start, end = map(int, range_parts)\n            if not (1 <= start < end <= 65535):\n                raise ValueError(f\"Invalid port range: {element}\")\n            ports.extend(range(start, end + 1))\n        else:\n            raise ValueError(f\"Invalid port or port range: {element}\")\n\n    return ports\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.rand_string","title":"rand_string","text":"rand_string(length=10, digits=True)\n
Generates a random string of specified length.
Parameters:
length
(int
, default: 10
) \u2013 The length of the random string. Defaults to 10.
digits
(bool
, default: True
) \u2013 Whether to include digits in the string. Defaults to True.
Returns:
str
\u2013 A random string of the specified length.
Examples:
>>> rand_string()\n'c4hp4i9jzx'\n>>> rand_string(20)\n'ap4rsdtg5iw7ey7y3oa5'\n>>> rand_string(30, digits=False)\n'xdmyxtglqfzqktngkesyulwbfrihva'\n
Source code in bbot/core/helpers/misc.py
def rand_string(length=10, digits=True):\n \"\"\"\n Generates a random string of specified length.\n\n Args:\n length (int, optional): The length of the random string. Defaults to 10.\n digits (bool, optional): Whether to include digits in the string. Defaults to True.\n\n Returns:\n str: A random string of the specified length.\n\n Examples:\n >>> rand_string()\n 'c4hp4i9jzx'\n >>> rand_string(20)\n 'ap4rsdtg5iw7ey7y3oa5'\n >>> rand_string(30, digits=False)\n 'xdmyxtglqfzqktngkesyulwbfrihva'\n \"\"\"\n pool = rand_pool\n if digits:\n pool = rand_pool_digits\n return \"\".join([random.choice(pool) for _ in range(int(length))])\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.read_file","title":"read_file","text":"read_file(filename)\n
Reads a file line by line and yields each line without line breaks.
Parameters:
filename
(str or Path
) \u2013 The path to the file to read.
Yields:
str
\u2013 A line from the file without the trailing line break.
Examples:
>>> for line in read_file(\"/tmp/file.txt\"):\n... print(line)\nfile_line1\nfile_line2\nfile_line3\n
Source code in bbot/core/helpers/misc.py
def read_file(filename):\n \"\"\"Reads a file line by line and yields each line without line breaks.\n\n Args:\n filename (str or Path): The path to the file to read.\n\n Yields:\n str: A line from the file without the trailing line break.\n\n Examples:\n >>> for line in read_file(\"/tmp/file.txt\"):\n ... print(line)\n file_line1\n file_line2\n file_line3\n \"\"\"\n with open(filename, errors=\"ignore\") as f:\n for line in f:\n yield line.rstrip(\"\\r\\n\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.recursive_decode","title":"recursive_decode","text":"recursive_decode(data, max_depth=5)\n
Recursively decodes doubly or triply-encoded strings to their original form.
Supports both URL-encoding and backslash-escapes (including unicode)
Parameters:
data
(str
) \u2013 The data to decode.
max_depth
(int
, default: 5
) \u2013 Maximum recursion depth for decoding. Defaults to 5.
Returns:
str
\u2013 The decoded string.
Examples:
>>> recursive_decode(\"Hello%20world%21\")\n\"Hello world!\"\n>>> recursive_decode(\"Hello%20%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442\")\n\"Hello \u041f\u0440\u0438\u0432\u0435\u0442\"\n>>> recursive_dcode(\"%5Cu0020%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442%5Cu0021\")\n\" \u041f\u0440\u0438\u0432\u0435\u0442!\"\n
Source code in bbot/core/helpers/misc.py
def recursive_decode(data, max_depth=5):\n \"\"\"\n Recursively decodes doubly or triply-encoded strings to their original form.\n\n Supports both URL-encoding and backslash-escapes (including unicode)\n\n Args:\n data (str): The data to decode.\n max_depth (int, optional): Maximum recursion depth for decoding. Defaults to 5.\n\n Returns:\n str: The decoded string.\n\n Examples:\n >>> recursive_decode(\"Hello%20world%21\")\n \"Hello world!\"\n >>> recursive_decode(\"Hello%20%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442\")\n \"Hello \u041f\u0440\u0438\u0432\u0435\u0442\"\n >>> recursive_decode(\"%5Cu0020%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442%5Cu0021\")\n \" \u041f\u0440\u0438\u0432\u0435\u0442!\"\n \"\"\"\n import codecs\n\n # Decode newline and tab escapes\n data = backslash_regex.sub(\n lambda match: {\"n\": \"\\n\", \"t\": \"\\t\", \"r\": \"\\r\", \"b\": \"\\b\", \"v\": \"\\v\"}.get(match.group(\"char\")), data\n )\n data = smart_decode(data)\n if max_depth == 0:\n return data\n # Decode URL encoding\n data = unquote(data, errors=\"ignore\")\n # Decode Unicode escapes\n with suppress(UnicodeEncodeError):\n data = ensure_utf8_compliant(codecs.decode(data, \"unicode_escape\", errors=\"ignore\"))\n # Check if there's still URL-encoded or Unicode-escaped content\n if encoded_regex.search(data):\n # If yes, continue decoding\n return recursive_decode(data, max_depth=max_depth - 1)\n return data\n
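The repeated-decoding idea can be illustrated with the URL-encoding layer alone. The following is a minimal sketch, not the helper itself: unquote_repeatedly is a hypothetical name, it ignores backslash and Unicode escapes, and it stops when a pass changes nothing instead of checking a regex.

```python
from urllib.parse import unquote


def unquote_repeatedly(data, max_depth=5):
    # Hypothetical helper: peel off one URL-encoding layer per pass,
    # stopping early once a pass leaves the string unchanged.
    for _ in range(max_depth):
        decoded = unquote(data)
        if decoded == data:
            break
        data = decoded
    return data


# Triply encoded space: %252520 -> %2520 -> %20 -> ' '
fully_decoded = unquote_repeatedly('Hello%252520world')
# With a depth budget of 1, only the outermost layer is removed.
one_layer = unquote_repeatedly('Hello%252520world', max_depth=1)
print(fully_decoded)
print(one_layer)
```

The depth limit plays the same role as max_depth in the helper: it bounds the work done on input that keeps producing new encoded content.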
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.rm_at_exit","title":"rm_at_exit","text":"rm_at_exit(path)\n
Registers a file to be automatically deleted when the program exits.
Parameters:
path
(str or Path
) \u2013 The path to the file to be deleted upon program exit.
Examples:
>>> rm_at_exit(\"/tmp/test/file1.txt\")\n
Source code in bbot/core/helpers/misc.py
def rm_at_exit(path):\n \"\"\"Registers a file to be automatically deleted when the program exits.\n\n Args:\n path (str or Path): The path to the file to be deleted upon program exit.\n\n Examples:\n >>> rm_at_exit(\"/tmp/test/file1.txt\")\n \"\"\"\n import atexit\n\n atexit.register(delete_file, path)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.rm_rf","title":"rm_rf","text":"rm_rf(f)\n
Recursively delete a directory
Parameters:
f
(str or Path
) \u2013 The directory path to delete.
Examples:
>>> rm_rf(\"/tmp/httpx98323849\")\n
Source code in bbot/core/helpers/misc.py
def rm_rf(f):\n \"\"\"Recursively delete a directory\n\n Args:\n f (str or Path): The directory path to delete.\n\n Examples:\n >>> rm_rf(\"/tmp/httpx98323849\")\n \"\"\"\n import shutil\n\n shutil.rmtree(f)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.search_dict_by_key","title":"search_dict_by_key","text":"search_dict_by_key(key, d)\n
Search a nested dictionary or list of dictionaries by a key and yield all matching values.
Parameters:
key
(str
) \u2013 The key to search for.
d
(Union[dict, list]
) \u2013 The dictionary or list of dictionaries to search.
Yields:
Any
\u2013 Yields all values that match the provided key.
Examples:
>>> d = {'a': 1, 'b': {'c': 2, 'a': 3}, 'd': [{'a': 4}, {'e': 5}]}\n>>> list(search_dict_by_key('a', d))\n[1, 3, 4]\n
Source code in bbot/core/helpers/misc.py
def search_dict_by_key(key, d):\n \"\"\"Search a nested dictionary or list of dictionaries by a key and yield all matching values.\n\n Args:\n key (str): The key to search for.\n d (Union[dict, list]): The dictionary or list of dictionaries to search.\n\n Yields:\n Any: Yields all values that match the provided key.\n\n Examples:\n >>> d = {'a': 1, 'b': {'c': 2, 'a': 3}, 'd': [{'a': 4}, {'e': 5}]}\n >>> list(search_dict_by_key('a', d))\n [1, 3, 4]\n \"\"\"\n if isinstance(d, dict):\n if key in d:\n yield d[key]\n for k, v in d.items():\n yield from search_dict_by_key(key, v)\n elif isinstance(d, list):\n for v in d:\n yield from search_dict_by_key(key, v)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.search_dict_values","title":"search_dict_values","text":"search_dict_values(d, *regexes)\n
Recursively search a dictionary's values based on provided regex patterns.
Parameters:
d
(Union[dict, list, str]
) \u2013 The dictionary, list, or string to search.
*regexes
\u2013 Arbitrary number of compiled regex patterns.
Returns:
Generator
\u2013 Yields matching values based on the provided regex patterns.
Examples:
>>> dict_to_search = {\n... \"key1\": {\n... \"key2\": [\n... {\n... \"key3\": \"A URL: https://www.evilcorp.com\"\n... }\n... ]\n... }\n... }\n>>> url_regexes = re.compile(r'https?://[^\\s<>\"]+|www\\.[^\\s<>\"]+')\n>>> list(search_dict_values(dict_to_search, url_regexes))\n[\"https://www.evilcorp.com\"]\n
Source code in bbot/core/helpers/misc.py
def search_dict_values(d, *regexes):\n \"\"\"Recursively search a dictionary's values based on provided regex patterns.\n\n Args:\n d (Union[dict, list, str]): The dictionary, list, or string to search.\n *regexes: Arbitrary number of compiled regex patterns.\n\n Returns:\n Generator: Yields matching values based on the provided regex patterns.\n\n Examples:\n >>> dict_to_search = {\n ... \"key1\": {\n ... \"key2\": [\n ... {\n ... \"key3\": \"A URL: https://www.evilcorp.com\"\n ... }\n ... ]\n ... }\n ... }\n >>> url_regexes = re.compile(r'https?://[^\\\\s<>\"]+|www\\\\.[^\\\\s<>\"]+')\n >>> list(search_dict_values(dict_to_search, url_regexes))\n [\"https://www.evilcorp.com\"]\n \"\"\"\n\n results = set()\n if isinstance(d, str):\n for r in regexes:\n for match in r.finditer(d):\n result = match.group()\n h = hash(result)\n if h not in results:\n results.add(h)\n yield result\n elif isinstance(d, dict):\n for _, v in d.items():\n yield from search_dict_values(v, *regexes)\n elif isinstance(d, list):\n for v in d:\n yield from search_dict_values(v, *regexes)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.search_format_dict","title":"search_format_dict","text":"search_format_dict(d, **kwargs)\n
Recursively format string values in a dictionary or list using the provided keyword arguments.
Parameters:
d
(Union[dict, list, str]
) \u2013 The dictionary, list, or string to format.
**kwargs
\u2013 Arbitrary keyword arguments used for string formatting.
Returns:
Union[dict, list, str]: The formatted dictionary, list, or string.
Examples:
>>> search_format_dict({\"test\": \"#{name} is awesome\"}, name=\"keanu\")\n{\"test\": \"keanu is awesome\"}\n
Source code in bbot/core/helpers/misc.py
def search_format_dict(d, **kwargs):\n \"\"\"Recursively format string values in a dictionary or list using the provided keyword arguments.\n\n Args:\n d (Union[dict, list, str]): The dictionary, list, or string to format.\n **kwargs: Arbitrary keyword arguments used for string formatting.\n\n Returns:\n Union[dict, list, str]: The formatted dictionary, list, or string.\n\n Examples:\n >>> search_format_dict({\"test\": \"#{name} is awesome\"}, name=\"keanu\")\n {\"test\": \"keanu is awesome\"}\n \"\"\"\n if isinstance(d, dict):\n return {k: search_format_dict(v, **kwargs) for k, v in d.items()}\n elif isinstance(d, list):\n return [search_format_dict(v, **kwargs) for v in d]\n elif isinstance(d, str):\n for find, replace in kwargs.items():\n find = \"#{\" + str(find) + \"}\"\n d = d.replace(find, replace)\n return d\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.sha1","title":"sha1","text":"sha1(data)\n
Computes the SHA-1 hash of the given data.
Parameters:
data
(str or dict
) \u2013 The data to hash. If a dictionary, it is first converted to a JSON string with sorted keys.
Returns:
hashlib.Hash: SHA-1 hash object of the input data.
Examples:
>>> sha1(\"asdf\").hexdigest()\n'3da541559918a808c2402bba5012f6c60b27661c'\n
Source code in bbot/core/helpers/misc.py
def sha1(data):\n \"\"\"\n Computes the SHA-1 hash of the given data.\n\n Args:\n data (str or dict): The data to hash. If a dictionary, it is first converted to a JSON string with sorted keys.\n\n Returns:\n hashlib.Hash: SHA-1 hash object of the input data.\n\n Examples:\n >>> sha1(\"asdf\").hexdigest()\n '3da541559918a808c2402bba5012f6c60b27661c'\n \"\"\"\n from hashlib import sha1 as hashlib_sha1\n\n if isinstance(data, dict):\n data = json.dumps(data, sort_keys=True)\n return hashlib_sha1(smart_encode(data))\n
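Because dictionaries are serialized with sorted keys before hashing, two dictionaries with the same contents should hash identically regardless of insertion order. A minimal sketch of that property using only the standard library; stable_sha1 is a hypothetical stand-in that inlines the encoding step rather than calling smart_encode.

```python
import hashlib
import json


def stable_sha1(data):
    # Hypothetical stand-in for the helper: dicts become JSON with sorted keys,
    # everything that is not already bytes is encoded as UTF-8.
    if isinstance(data, dict):
        data = json.dumps(data, sort_keys=True)
    if not isinstance(data, bytes):
        data = str(data).encode('utf-8', errors='ignore')
    return hashlib.sha1(data)


digest_a = stable_sha1({'host': 'evilcorp.com', 'port': 443}).hexdigest()
digest_b = stable_sha1({'port': 443, 'host': 'evilcorp.com'}).hexdigest()
print(digest_a == digest_b)
```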
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_decode","title":"smart_decode","text":"smart_decode(data)\n
Decodes the input data to a UTF-8 string, silently ignoring errors.
Parameters:
data
(str or bytes
) \u2013 The data to decode.
Returns:
str
\u2013 The decoded string.
Examples:
>>> smart_decode(b\"asdf\")\n\"asdf\"\n>>> smart_decode(\"asdf\")\n\"asdf\"\n
Source code in bbot/core/helpers/misc.py
def smart_decode(data):\n \"\"\"\n Decodes the input data to a UTF-8 string, silently ignoring errors.\n\n Args:\n data (str or bytes): The data to decode.\n\n Returns:\n str: The decoded string.\n\n Examples:\n >>> smart_decode(b\"asdf\")\n \"asdf\"\n >>> smart_decode(\"asdf\")\n \"asdf\"\n \"\"\"\n if isinstance(data, bytes):\n return data.decode(\"utf-8\", errors=\"ignore\")\n else:\n return str(data)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_decode_punycode","title":"smart_decode_punycode","text":"smart_decode_punycode(text: str) -> str\n
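Decodes a punycode-encoded host within the given text to its Unicode form, e.g. xn--eckwd4c7c.xn--zckzah --> \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8. If no host is found or decoding fails, the text is returned unchanged.
Source code in bbot/core/helpers/misc.py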
def smart_decode_punycode(text: str) -> str:\n \"\"\"\n xn--eckwd4c7c.xn--zckzah --> \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8\n \"\"\"\n import idna\n\n host, before, after = extract_host(text)\n if host is None:\n return text\n\n try:\n host = idna.decode(host)\n except UnicodeError:\n pass # If decoding fails, leave the host as it is\n\n return f\"{before}{host}{after}\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_encode","title":"smart_encode","text":"smart_encode(data)\n
Encodes the input data to bytes using UTF-8 encoding, silently ignoring errors.
Parameters:
data
(str or bytes
) \u2013 The data to encode.
Returns:
bytes
\u2013 The encoded bytes.
Examples:
>>> smart_encode(\"asdf\")\nb\"asdf\"\n>>> smart_encode(b\"asdf\")\nb\"asdf\"\n
Source code in bbot/core/helpers/misc.py
def smart_encode(data):\n \"\"\"\n Encodes the input data to bytes using UTF-8 encoding, silently ignoring errors.\n\n Args:\n data (str or bytes): The data to encode.\n\n Returns:\n bytes: The encoded bytes.\n\n Examples:\n >>> smart_encode(\"asdf\")\n b\"asdf\"\n >>> smart_encode(b\"asdf\")\n b\"asdf\"\n \"\"\"\n if isinstance(data, bytes):\n return data\n return str(data).encode(\"utf-8\", errors=\"ignore\")\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.smart_encode_punycode","title":"smart_encode_punycode","text":"smart_encode_punycode(text: str) -> str\n
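Encodes a Unicode host within the given text to its punycode form, e.g. \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8 --> xn--eckwd4c7c.xn--zckzah. If no host is found or encoding fails, the text is returned unchanged.
Source code in bbot/core/helpers/misc.py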
def smart_encode_punycode(text: str) -> str:\n \"\"\"\n \u30c9\u30e1\u30a4\u30f3.\u30c6\u30b9\u30c8 --> xn--eckwd4c7c.xn--zckzah\n \"\"\"\n import idna\n\n host, before, after = extract_host(text)\n if host is None:\n return text\n\n try:\n host = idna.encode(host).decode(errors=\"ignore\")\n except UnicodeError:\n pass # If encoding fails, leave the host as it is\n\n return f\"{before}{host}{after}\"\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.split_domain","title":"split_domain","text":"split_domain(hostname)\n
Splits the hostname into its subdomain and registered domain components.
Parameters:
hostname
(str
) \u2013 The full hostname to be split.
Returns:
tuple
\u2013 A tuple containing the subdomain and registered domain.
Examples:
>>> split_domain(\"www.internal.evilcorp.co.uk\")\n(\"www.internal\", \"evilcorp.co.uk\")\n
Notes: Utilizes the tldextract function to first break down the hostname.
Source code in bbot/core/helpers/misc.py
def split_domain(hostname):\n \"\"\"\n Splits the hostname into its subdomain and registered domain components.\n\n Args:\n hostname (str): The full hostname to be split.\n\n Returns:\n tuple: A tuple containing the subdomain and registered domain.\n\n Examples:\n >>> split_domain(\"www.internal.evilcorp.co.uk\")\n (\"www.internal\", \"evilcorp.co.uk\")\n\n Notes:\n - Utilizes the `tldextract` function to first break down the hostname.\n \"\"\"\n if is_ip(hostname):\n return (\"\", hostname)\n parsed = tldextract(hostname)\n subdomain = parsed.subdomain\n domain = parsed.registered_domain\n if not domain:\n split = hostname.split(\".\")\n subdomain = \".\".join(split[:-2])\n domain = \".\".join(split[-2:])\n return (subdomain, domain)\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.split_host_port","title":"split_host_port","text":"split_host_port(d)\n
Parse a string containing a host and port into a tuple.
This function takes an input string d
and returns a tuple containing the host and port. The host is converted to its appropriate IP address type if possible. The port is inferred based on the scheme if not provided.
Parameters:
d
(str
) \u2013 The input string containing the host and possibly the port.
Returns:
Tuple[Union[IPv4Address, IPv6Address, str], Optional[int]]: Tuple containing the host and port.
Examples:
>>> split_host_port(\"evilcorp.com:443\")\n(\"evilcorp.com\", 443)\n
>>> split_host_port(\"192.168.1.1:443\")\n(IPv4Address('192.168.1.1'), 443)\n
>>> split_host_port(\"[dead::beef]:443\")\n(IPv6Address('dead::beef'), 443)\n
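Notes: If the port is not provided, it is inferred from the scheme: 443 for https and wss, 80 for http and ws.
Source code in bbot/core/helpers/misc.py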
def split_host_port(d):\n \"\"\"\n Parse a string containing a host and port into a tuple.\n\n This function takes an input string `d` and returns a tuple containing the host and port.\n The host is converted to its appropriate IP address type if possible. The port is inferred\n based on the scheme if not provided.\n\n Args:\n d (str): The input string containing the host and possibly the port.\n\n Returns:\n Tuple[Union[IPv4Address, IPv6Address, str], Optional[int]]: Tuple containing the host and port.\n\n Examples:\n >>> split_host_port(\"evilcorp.com:443\")\n (\"evilcorp.com\", 443)\n\n >>> split_host_port(\"192.168.1.1:443\")\n (IPv4Address('192.168.1.1'), 443)\n\n >>> split_host_port(\"[dead::beef]:443\")\n (IPv6Address('dead::beef'), 443)\n\n Notes:\n - If port is not provided, it is inferred based on the scheme:\n - For \"https\" and \"wss\", port 443 is used.\n - For \"http\" and \"ws\", port 80 is used.\n \"\"\"\n d = str(d)\n host = None\n port = None\n scheme = None\n if is_ip(d):\n return make_ip_type(d), port\n\n match = bbot_regexes.split_host_port_regex.match(d)\n if match is None:\n raise ValueError(f'split_port() failed to parse \"{d}\"')\n scheme = match.group(\"scheme\")\n netloc = match.group(\"netloc\")\n if netloc is None:\n raise ValueError(f'split_port() failed to parse \"{d}\"')\n\n match = bbot_regexes.extract_open_port_regex.match(netloc)\n if match is None:\n raise ValueError(f'split_port() failed to parse netloc \"{netloc}\"')\n\n host = match.group(2)\n if host is None:\n host = match.group(1)\n if host is None:\n raise ValueError(f'split_port() failed to locate host in netloc \"{netloc}\"')\n\n port = match.group(3)\n if port is None and scheme is not None:\n scheme = scheme.lower()\n if scheme in (\"https\", \"wss\"):\n port = 443\n elif scheme in (\"http\", \"ws\"):\n port = 80\n elif port is not None:\n with suppress(ValueError):\n port = int(port)\n\n return make_ip_type(host), port\n
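The scheme-based port inference can be sketched with the standard library. This is an illustration only: the helper above uses its own regexes, whereas this sketch swaps in urllib.parse.urlsplit, and both infer_port and DEFAULT_PORTS are hypothetical names.

```python
from urllib.parse import urlsplit

# Default ports per scheme, mirroring the rule described in the notes.
DEFAULT_PORTS = {'https': 443, 'wss': 443, 'http': 80, 'ws': 80}


def infer_port(url):
    # An explicit port always wins; otherwise fall back to the scheme default.
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme.lower())


print(infer_port('https://evilcorp.com'))
print(infer_port('http://evilcorp.com:8080/admin'))
print(infer_port('ftp://evilcorp.com'))
```

A scheme without a known default yields None, which matches the Optional[int] port in the documented return type.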
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.split_list","title":"split_list","text":"split_list(alist, wanted_parts=2)\n
Splits a list into a specified number of approximately equal parts.
Parameters:
alist
(list
) \u2013 The list to be split.
wanted_parts
(int
, default: 2
) \u2013 The number of parts to split the list into.
Returns:
list
\u2013 A list of lists, each containing a portion of the original list.
Examples:
>>> split_list([1, 2, 3, 4, 5])\n[[1, 2], [3, 4, 5]]\n
Source code in bbot/core/helpers/misc.py
def split_list(alist, wanted_parts=2):\n \"\"\"\n Splits a list into a specified number of approximately equal parts.\n\n Args:\n alist (list): The list to be split.\n wanted_parts (int): The number of parts to split the list into.\n\n Returns:\n list: A list of lists, each containing a portion of the original list.\n\n Examples:\n >>> split_list([1, 2, 3, 4, 5])\n [[1, 2], [3, 4, 5]]\n \"\"\"\n length = len(alist)\n return [alist[i * length // wanted_parts : (i + 1) * length // wanted_parts] for i in range(wanted_parts)]\n
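The slice boundaries come from integer division, which is why the parts are only approximately equal. A small sketch that exposes just that arithmetic; split_bounds is a hypothetical name used for illustration.

```python
def split_bounds(length, wanted_parts=2):
    # Same index arithmetic as the list comprehension above,
    # returned as (start, stop) pairs instead of slices.
    return [
        (i * length // wanted_parts, (i + 1) * length // wanted_parts)
        for i in range(wanted_parts)
    ]


two_parts = split_bounds(5, 2)
three_parts = split_bounds(7, 3)
print(two_parts)
print(three_parts)
```

For five items in two parts the bounds are (0, 2) and (2, 5), so any remainder ends up in the later parts.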
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.str_or_file","title":"str_or_file","text":"str_or_file(s)\n
Reads a string or file and yields its content line-by-line.
This function tries to open the given string s
as a file and yields its lines. If it fails to open s
as a file, it treats s
as a regular string and yields it as is.
Parameters:
s
(str
) \u2013 The string or file path to read.
Yields:
str
\u2013 Either lines from the file or the original string.
Examples:
>>> list(str_or_file(\"file.txt\"))\n['file_line1', 'file_line2', 'file_line3']\n>>> list(str_or_file(\"not_a_file\"))\n['not_a_file']\n
Source code in bbot/core/helpers/misc.py
def str_or_file(s):\n \"\"\"Reads a string or file and yields its content line-by-line.\n\n This function tries to open the given string `s` as a file and yields its lines.\n If it fails to open `s` as a file, it treats `s` as a regular string and yields it as is.\n\n Args:\n s (str): The string or file path to read.\n\n Yields:\n str: Either lines from the file or the original string.\n\n Examples:\n >>> list(str_or_file(\"file.txt\"))\n ['file_line1', 'file_line2', 'file_line3']\n >>> list(str_or_file(\"not_a_file\"))\n ['not_a_file']\n \"\"\"\n try:\n with open(s, errors=\"ignore\") as f:\n for line in f:\n yield line.rstrip(\"\\r\\n\")\n except OSError:\n yield s\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.subdomain_depth","title":"subdomain_depth","text":"subdomain_depth(d)\n
Calculate the depth of subdomains within a given domain name.
Parameters:
d
(str
) \u2013 The domain name to analyze.
Returns:
int
\u2013 The depth of the subdomain. For example, a hostname \"5.4.3.2.1.evilcorp.com\"
has a subdomain depth of 5.
Source code in bbot/core/helpers/misc.py
def subdomain_depth(d):\n \"\"\"\n Calculate the depth of subdomains within a given domain name.\n\n Args:\n d (str): The domain name to analyze.\n\n Returns:\n int: The depth of the subdomain. For example, a hostname \"5.4.3.2.1.evilcorp.com\"\n has a subdomain depth of 5.\n \"\"\"\n subdomain, domain = split_domain(d)\n if not subdomain:\n return 0\n return subdomain.count(\".\") + 1\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.swap_status","title":"swap_status","text":"swap_status()\n
Return statistics on swap memory consumption.
The function returns a psutil
named tuple that contains statistics on system swap memory usage, such as total swap, used swap, free swap, and more.
Returns:
psutil._common.sswap: A named tuple representing various statistics about system swap memory usage.
Examples:
>>> swap = swap_status()\n>>> swap.total\n4294967296\n
>>> swap = swap_status()\n>>> swap.used\n2097152\n
Source code in bbot/core/helpers/misc.py
def swap_status():\n \"\"\"Return statistics on swap memory consumption.\n\n The function returns a `psutil` named tuple that contains statistics on\n system swap memory usage, such as total swap, used swap, free swap, and more.\n\n Returns:\n psutil._common.sswap: A named tuple representing various statistics\n about system swap memory usage.\n\n Examples:\n >>> swap = swap_status()\n >>> swap.total\n 4294967296\n\n >>> swap = swap_status()\n >>> swap.used\n 2097152\n \"\"\"\n import psutil\n\n return psutil.swap_memory()\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.tagify","title":"tagify","text":"tagify(s, delimiter=None, maxlen=None)\n
Sanitize a string into a tag-friendly format.
Converts a given string to lowercase and replaces all characters not matching [a-z0-9] with the delimiter (a hyphen by default). Optionally truncates the result to 'maxlen' characters.
Parameters:
s
(str
) \u2013 The input string to sanitize.
maxlen
(int
, default: None
) \u2013 The maximum length for the tag. Defaults to None.
Returns:
str
\u2013 A sanitized, tag-friendly string.
Examples:
>>> tagify(\"HTTP Web Title\")\n'http-web-title'\n>>> tagify(\"HTTP Web Title\", maxlen=8)\n'http-web'\n
Source code in bbot/core/helpers/misc.py
def tagify(s, delimiter=None, maxlen=None):\n \"\"\"Sanitize a string into a tag-friendly format.\n\n Converts a given string to lowercase and replaces all characters not matching\n [a-z0-9] with hyphens. Optionally truncates the result to 'maxlen' characters.\n\n Args:\n s (str): The input string to sanitize.\n maxlen (int, optional): The maximum length for the tag. Defaults to None.\n\n Returns:\n str: A sanitized, tag-friendly string.\n\n Examples:\n >>> tagify(\"HTTP Web Title\")\n 'http-web-title'\n >>> tagify(\"HTTP Web Title\", maxlen=8)\n 'http-web'\n \"\"\"\n if delimiter is None:\n delimiter = \"-\"\n ret = str(s).lower()\n return tag_filter_regex.sub(delimiter, ret)[:maxlen].strip(delimiter)\n
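The order of operations matters here: the string is truncated first and only then stripped of leading and trailing delimiters, so a cut that lands on a delimiter does not leave it dangling. A minimal sketch, assuming a pattern that matches runs of characters outside [a-z0-9]; the real tag_filter_regex is not shown on this page, and tagify_sketch is a hypothetical name.

```python
import re

# Assumption: stands in for tag_filter_regex, whose definition is not shown here.
tag_filter = re.compile('[^a-z0-9]+')


def tagify_sketch(s, delimiter='-', maxlen=None):
    # Lowercase, substitute, truncate, then strip stray delimiters.
    ret = str(s).lower()
    return tag_filter.sub(delimiter, ret)[:maxlen].strip(delimiter)


print(tagify_sketch('HTTP Web Title'))
print(tagify_sketch('HTTP Web Title', maxlen=5))
```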
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.tldextract","title":"tldextract","text":"tldextract(data)\n
Extracts the subdomain, domain, and suffix from a URL string.
Parameters:
data
(str
) \u2013 The URL string to be processed.
Returns:
ExtractResult
\u2013 A named tuple containing the subdomain, domain, and suffix.
Examples:
>>> tldextract(\"www.evilcorp.co.uk\")\nExtractResult(subdomain='www', domain='evilcorp', suffix='co.uk')\n
Notes: Utilizes smart_decode to preprocess the data. Makes use of the tldextract library for extraction.
Source code in bbot/core/helpers/misc.py
def tldextract(data):\n \"\"\"\n Extracts the subdomain, domain, and suffix from a URL string.\n\n Args:\n data (str): The URL string to be processed.\n\n Returns:\n ExtractResult: A named tuple containing the subdomain, domain, and suffix.\n\n Examples:\n >>> tldextract(\"www.evilcorp.co.uk\")\n ExtractResult(subdomain='www', domain='evilcorp', suffix='co.uk')\n\n Notes:\n - Utilizes `smart_decode` to preprocess the data.\n - Makes use of the `tldextract` library for extraction.\n \"\"\"\n import tldextract as _tldextract\n\n return _tldextract.extract(smart_decode(data))\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.top_tcp_ports","title":"top_tcp_ports","text":"top_tcp_ports(n, as_string=False)\n
Returns the top n TCP ports as evaluated by nmap, either as a list of integers or, if as_string is True, as a comma-separated string.
Source code in bbot/core/helpers/misc.py
def top_tcp_ports(n, as_string=False):\n \"\"\"\n Returns the top *n* TCP ports as evaluated by nmap\n \"\"\"\n top_ports_file = Path(__file__).parent.parent.parent / \"wordlists\" / \"top_open_ports_nmap.txt\"\n\n global top_ports_cache\n if top_ports_cache is None:\n # Read the open ports from the file\n with open(top_ports_file, \"r\") as f:\n top_ports_cache = [int(line.strip()) for line in f]\n\n # If n is greater than the length of the ports list, add remaining ports from range(1, 65536)\n unique_ports = set(top_ports_cache)\n top_ports_cache.extend([port for port in range(1, 65536) if port not in unique_ports])\n\n top_ports = top_ports_cache[:n]\n if as_string:\n return \",\".join([str(s) for s in top_ports])\n return top_ports\n
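The cache is padded so that asking for more ports than the ranked list contains still works: the ranked ports come first, followed by every remaining port in ascending order. A small sketch of that padding step with a made-up ranking; pad_ports and the sample list are hypothetical and do not reflect the contents of the wordlist file.

```python
def pad_ports(ranked, max_port=65535):
    # Keep the ranked order, then append every port not already present.
    seen = set(ranked)
    return ranked + [port for port in range(1, max_port + 1) if port not in seen]


ranked = [80, 443, 22]  # made-up ranking, for illustration only
padded = pad_ports(ranked, max_port=10)
print(padded)
print(','.join(str(port) for port in padded[:5]))
```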
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.truncate_filename","title":"truncate_filename","text":"truncate_filename(file_path, max_length=255)\n
Truncate the filename while preserving the file extension to ensure the total path length does not exceed the maximum length.
Parameters:
file_path
(str
) \u2013 The original file path.
max_length
(int
, default: 255
) \u2013 The maximum allowed length for the total path. Default is 255.
Returns:
pathlib.Path: A new Path object with the truncated filename.
Raises:
ValueError
\u2013 If the directory path is too long to accommodate any filename within the limit.
Examples:
>>> truncate_filename('/path/to/example_long_filename.txt', 20)\nPosixPath('/path/to/example.txt')\n
Source code in bbot/core/helpers/misc.py
def truncate_filename(file_path, max_length=255):\n \"\"\"\n Truncate the filename while preserving the file extension to ensure the total path length does not exceed the maximum length.\n\n Args:\n file_path (str): The original file path.\n max_length (int): The maximum allowed length for the total path. Default is 255.\n\n Returns:\n pathlib.Path: A new Path object with the truncated filename.\n\n Raises:\n ValueError: If the directory path is too long to accommodate any filename within the limit.\n\n Example:\n >>> truncate_filename('/path/to/example_long_filename.txt', 20)\n PosixPath('/path/to/example.txt')\n \"\"\"\n p = Path(file_path)\n directory, stem, suffix = p.parent, p.stem, p.suffix\n\n max_filename_length = max_length - len(str(directory)) - len(suffix) - 1 # 1 for the '/' separator\n\n if max_filename_length <= 0:\n raise ValueError(\"The directory path is too long to accommodate any filename within the limit.\")\n\n if len(stem) > max_filename_length:\n truncated_stem = stem[:max_filename_length]\n else:\n truncated_stem = stem\n\n new_path = directory / (truncated_stem + suffix)\n return new_path\n
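The length budget for the stem can be worked through by hand for the documented example. This sketch repeats the arithmetic with PurePosixPath so that it behaves the same on any platform; the variable names are illustrative.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/path/to/example_long_filename.txt')
max_length = 20

# 20 total - 8 for the directory - 4 for the suffix - 1 for the separator
budget = max_length - len(str(p.parent)) - len(p.suffix) - 1
truncated = p.parent / (p.stem[:budget] + p.suffix)

print(budget)
print(truncated)
print(len(str(truncated)))
```

Only the stem is shortened; the directory and the extension are kept intact, which is why a directory that already exhausts the budget raises ValueError.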
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.url_parents","title":"url_parents","text":"url_parents(u)\n
Generate a list of parent URLs for a given URL string.
This function takes an input string u
representing a URL and generates a list of its parent URLs in decreasing order of specificity.
Parameters:
u
(str
) \u2013 The input string representing a URL.
Returns:
List[str]: A list of parent URLs of the input URL in decreasing order of specificity.
Examples:
>>> url_parents(\"http://www.evilcorp.co.uk/admin/tools/cmd.php\")\n[\"http://www.evilcorp.co.uk/admin/tools/\", \"http://www.evilcorp.co.uk/admin/\", \"http://www.evilcorp.co.uk/\"]\n
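Notes: The list is generated by repeatedly calling parent_url until it returns None. All components of the URL except for the path are preserved.
Source code in bbot/core/helpers/misc.py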
def url_parents(u):\n \"\"\"\n Generate a list of parent URLs for a given URL string.\n\n This function takes an input string `u` representing a URL and generates a list of its parent URLs in decreasing order of specificity.\n\n Args:\n u (str): The input string representing a URL.\n\n Returns:\n List[str]: A list of parent URLs of the input URL in decreasing order of specificity.\n\n Examples:\n >>> url_parents(\"http://www.evilcorp.co.uk/admin/tools/cmd.php\")\n [\"http://www.evilcorp.co.uk/admin/tools/\", \"http://www.evilcorp.co.uk/admin/\", \"http://www.evilcorp.co.uk/\"]\n\n Notes:\n - The list is generated by continuously calling `parent_url` until it returns None.\n - All components of the URL except for the path are preserved.\n \"\"\"\n parent_list = []\n while 1:\n parent = parent_url(u)\n if parent is None:\n return parent_list\n elif parent not in parent_list:\n parent_list.append(parent)\n u = parent\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.verify_sudo_password","title":"verify_sudo_password","text":"verify_sudo_password(sudo_pass)\n
Verify if the given sudo password is correct.
This function checks whether the sudo password provided is valid for the current user. It runs a command with sudo, feeding in the password via stdin, and checks the return code.
Parameters:
sudo_pass
(str
) \u2013 The sudo password to verify.
Returns:
bool
\u2013 True if the sudo password is correct, False otherwise.
Examples:
>>> verify_sudo_password(\"mysecretpassword\")\nTrue\n
Source code in bbot/core/helpers/misc.py
def verify_sudo_password(sudo_pass):\n \"\"\"Verify if the given sudo password is correct.\n\n This function checks whether the sudo password provided is valid for the current user.\n It runs a command with sudo, feeding in the password via stdin, and checks the return code.\n\n Args:\n sudo_pass (str): The sudo password to verify.\n\n Returns:\n bool: True if the sudo password is correct, False otherwise.\n\n Examples:\n >>> verify_sudo_password(\"mysecretpassword\")\n True\n \"\"\"\n try:\n sp.run(\n [\"sudo\", \"-S\", \"-k\", \"true\"],\n input=smart_encode(sudo_pass),\n stderr=sp.DEVNULL,\n stdout=sp.DEVNULL,\n check=True,\n )\n except sp.CalledProcessError:\n return False\n return True\n
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.weighted_shuffle","title":"weighted_shuffle","text":"weighted_shuffle(items, weights)\n
Shuffles a list of items based on their corresponding weights.
Parameters:
items
(list
) \u2013 The list of items to shuffle.
weights
(list
) \u2013 The list of weights corresponding to each item.
Returns:
list
\u2013 A new list containing the shuffled items.
Examples:
>>> items = ['apple', 'banana', 'cherry']\n>>> weights = [0.4, 0.5, 0.1]\n>>> weighted_shuffle(items, weights)\n['banana', 'apple', 'cherry']\n>>> weighted_shuffle(items, weights)\n['apple', 'banana', 'cherry']\n>>> weighted_shuffle(items, weights)\n['apple', 'banana', 'cherry']\n>>> weighted_shuffle(items, weights)\n['banana', 'apple', 'cherry']\n
Note: The sum of all weights does not have to be 1. They will be normalized internally.
Source code in bbot/core/helpers/misc.py
def weighted_shuffle(items, weights):\n \"\"\"\n Shuffles a list of items based on their corresponding weights.\n\n Args:\n items (list): The list of items to shuffle.\n weights (list): The list of weights corresponding to each item.\n\n Returns:\n list: A new list containing the shuffled items.\n\n Examples:\n >>> items = ['apple', 'banana', 'cherry']\n >>> weights = [0.4, 0.5, 0.1]\n >>> weighted_shuffle(items, weights)\n ['banana', 'apple', 'cherry']\n >>> weighted_shuffle(items, weights)\n ['apple', 'banana', 'cherry']\n >>> weighted_shuffle(items, weights)\n ['apple', 'banana', 'cherry']\n >>> weighted_shuffle(items, weights)\n ['banana', 'apple', 'cherry']\n\n Note:\n The sum of all weights does not have to be 1. They will be normalized internally.\n \"\"\"\n # Create a list of tuples where each tuple is (item, weight)\n pool = list(zip(items, weights))\n\n shuffled_items = []\n\n # While there are still items to be chosen...\n while pool:\n # Normalize weights\n total = sum(weight for item, weight in pool)\n weights = [weight / total for item, weight in pool]\n\n # Choose an index based on weight\n chosen_index = random.choices(range(len(pool)), weights=weights, k=1)[0]\n\n # Add the chosen item to the shuffled list\n chosen_item, chosen_weight = pool.pop(chosen_index)\n shuffled_items.append(chosen_item)\n\n return shuffled_items\n
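The effect of the weights is easiest to see empirically: over many shuffles, heavier items should land in first place more often. A minimal sketch with a seeded generator and weights that do not sum to 1; weighted_shuffle_sketch is a hypothetical re-statement of the loop above that takes its random source as an argument and relies on random.choices accepting unnormalized weights.

```python
import random
from collections import Counter


def weighted_shuffle_sketch(items, weights, rng):
    # Draw items one at a time without replacement, biased by weight.
    pool = list(zip(items, weights))
    shuffled = []
    while pool:
        pool_weights = [weight for _, weight in pool]
        index = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        shuffled.append(pool.pop(index)[0])
    return shuffled


rng = random.Random(0)  # seeded so repeated runs give the same tallies
items = ['apple', 'banana', 'cherry']
weights = [4, 5, 1]  # deliberately not normalized
first_place = Counter(weighted_shuffle_sketch(items, weights, rng)[0] for _ in range(2000))
print(first_place.most_common())
```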
"},{"location":"dev/helpers/misc/#bbot.core.helpers.misc.which","title":"which","text":"which(*executables)\n
Finds the full path of the first available executable from a list of executables.
Parameters:
*executables
(str
, default: ()
) \u2013 One or more executable names to search for.
Returns:
str
\u2013 The full path of the first available executable, or None if none are found.
Examples:
>>> which(\"python\", \"python3\")\n\"/usr/bin/python\"\n
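The fallback behaviour can be sketched with the standard library alone. The version below follows the same first-match logic around `shutil.which`; the explicit `return None` and the executable names used in the demo are illustrative assumptions, and the printed path depends on the machine it runs on.

```python
import shutil


def which(*executables):
    # Return the path of the first executable found on PATH, trying names in order
    for name in executables:
        location = shutil.which(name)
        if location:
            return location
    # Nothing matched: make the None result explicit
    return None


# Prefer python3 when present, otherwise fall back to python
print(which('python3', 'python'))
# A name that should not exist on any system yields None
print(which('surely-not-a-real-executable-name'))
```

Because the result may be None, callers would normally check it before passing it to a subprocess call.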
Source code in bbot/core/helpers/misc.py
def which(*executables):\n \"\"\"Finds the full path of the first available executable from a list of executables.\n\n Args:\n *executables (str): One or more executable names to search for.\n\n Returns:\n str: The full path of the first available executable, or None if none are found.\n\n Examples:\n >>> which(\"python\", \"python3\")\n \"/usr/bin/python\"\n \"\"\"\n import shutil\n\n for e in executables:\n location = shutil.which(e)\n if location:\n return location\n
"},{"location":"dev/helpers/web/","title":"Web","text":"These are helpers for making various web requests.
Note that these helpers can be invoked directly from self.helpers
, e.g.:
self.helpers.request(\"https://www.evilcorp.com\")\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper","title":"WebHelper","text":" Bases: EngineClient
Source code in bbot/core/helpers/web/web.py
class WebHelper(EngineClient):\n\n SERVER_CLASS = HTTPEngine\n ERROR_CLASS = WebError\n\n \"\"\"\n Main utility class for managing HTTP operations in BBOT. It serves as a wrapper around the BBOTAsyncClient,\n which itself is a subclass of httpx.AsyncClient. The class provides functionalities to make HTTP requests,\n download files, and handle cached wordlists.\n\n Attributes:\n parent_helper (object): The parent helper object containing scan configurations.\n http_debug (bool): Flag to indicate whether HTTP debugging is enabled.\n ssl_verify (bool): Flag to indicate whether SSL verification is enabled.\n web_client (BBOTAsyncClient): An instance of BBOTAsyncClient for making HTTP requests.\n client_only_options (tuple): A tuple of options only applicable to the web client.\n\n Examples:\n Basic web request:\n >>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n Download file:\n >>> filename = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\")\n\n Download wordlist (cached for 30 days by default):\n >>> filename = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n \"\"\"\n\n def __init__(self, parent_helper):\n self.parent_helper = parent_helper\n self.preset = self.parent_helper.preset\n self.config = self.preset.config\n self.web_config = self.config.get(\"web\", {})\n self.web_spider_depth = self.web_config.get(\"spider_depth\", 1)\n self.web_spider_distance = self.web_config.get(\"spider_distance\", 0)\n self.target = self.preset.target\n self.ssl_verify = self.config.get(\"ssl_verify\", False)\n engine_debug = self.config.get(\"engine\", {}).get(\"debug\", False)\n super().__init__(\n server_kwargs={\"config\": self.config, \"target\": self.parent_helper.preset.target.radix_only},\n debug=engine_debug,\n )\n\n def AsyncClient(self, *args, **kwargs):\n from .client import BBOTAsyncClient\n\n return BBOTAsyncClient.from_config(self.config, self.target, *args, persist_cookies=False, **kwargs)\n\n 
async def request(self, *args, **kwargs):\n \"\"\"\n Asynchronous function for making HTTP requests, intended to be the most basic web request function\n used widely across BBOT and within this helper class. Handles various exceptions and timeouts\n that might occur during the request.\n\n This function automatically respects the scan's global timeout, proxy, headers, etc.\n Headers you specify will be merged with the scan's. Your arguments take ultimate precedence,\n meaning you can override the scan's values if you want.\n\n Args:\n url (str): The URL to send the request to.\n method (str, optional): The HTTP method to use for the request. Defaults to 'GET'.\n headers (dict, optional): Dictionary of HTTP headers to send with the request.\n params (dict, optional): Dictionary, list of tuples, or bytes to send in the query string.\n cookies (dict, optional): Dictionary or CookieJar object containing cookies.\n json (Any, optional): A JSON serializable Python object to send in the body.\n data (dict, optional): Dictionary, list of tuples, or bytes to send in the body.\n files (dict, optional): Dictionary of 'name': file-like-objects for multipart encoding upload.\n auth (tuple, optional): Auth tuple to enable Basic/Digest/Custom HTTP auth.\n timeout (float, optional): The maximum time to wait for the request to complete.\n proxies (dict, optional): Dictionary mapping protocol schemes to proxy URLs.\n allow_redirects (bool, optional): Enables or disables redirection. Defaults to None.\n stream (bool, optional): Enables or disables response streaming.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.\n client (httpx.AsyncClient, optional): A specific httpx.AsyncClient to use for the request. Defaults to self.web_client.\n cache_for (int, optional): Time in seconds to cache the request. Not used currently. 
Defaults to None.\n\n Raises:\n httpx.TimeoutException: If the request times out.\n httpx.ConnectError: If the connection fails.\n httpx.RequestError: For other request-related errors.\n\n Returns:\n httpx.Response or None: The HTTP response object returned by the httpx library.\n\n Examples:\n >>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n >>> response = await self.helpers.request(\"https://api.evilcorp.com/\", method=\"POST\", data=\"stuff\")\n\n Note:\n If the web request fails, it will return None unless `raise_error` is `True`.\n \"\"\"\n return await self.run_and_return(\"request\", *args, **kwargs)\n\n async def request_batch(self, urls, *args, **kwargs):\n \"\"\"\n Given a list of URLs, request them in parallel and yield responses as they come in.\n\n Args:\n urls (list[str]): List of URLs to visit\n *args: Positional arguments to pass through to httpx\n **kwargs: Keyword arguments to pass through to httpx\n\n Examples:\n >>> async for url, response in self.helpers.request_batch(urls, headers={\"X-Test\": \"Test\"}):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_batch\", urls, *args, **kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n async def request_custom_batch(self, urls_and_kwargs):\n \"\"\"\n Make web requests in parallel with custom options for each request. Yield responses as they come in.\n\n Similar to `request_batch` except it allows individual arguments for each URL.\n\n Args:\n urls_and_kwargs (list[tuple]): List of tuples in the format: (url, kwargs, custom_tracker)\n where custom_tracker is an optional value for your own internal use. 
You may use it to\n help correlate requests, etc.\n\n Examples:\n >>> urls_and_kwargs = [\n >>> (\"http://evilcorp.com/1\", {\"method\": \"GET\"}, \"request-1\"),\n >>> (\"http://evilcorp.com/2\", {\"method\": \"POST\"}, \"request-2\"),\n >>> ]\n >>> async for url, kwargs, custom_tracker, response in self.helpers.request_custom_batch(\n >>> urls_and_kwargs\n >>> ):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_custom_batch\", urls_and_kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n\n async def download(self, url, **kwargs):\n \"\"\"\n Asynchronous function for downloading files from a given URL. Supports caching with an optional\n time period in hours via the \"cache_hrs\" keyword argument. In case of successful download,\n returns the full path of the saved filename. If the download fails, returns None.\n\n Args:\n url (str): The URL of the file to download.\n filename (str, optional): The filename to save the downloaded file as.\n If not provided, will generate based on URL.\n max_size (str or int): Maximum filesize as a string (\"5MB\") or integer in bytes.\n cache_hrs (float, optional): The number of hours to cache the downloaded file.\n A negative value disables caching. Defaults to -1.\n method (str, optional): The HTTP method to use for the request, defaults to 'GET'.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. 
Defaults to False.\n **kwargs: Additional keyword arguments to pass to the httpx request.\n\n Returns:\n Path or None: The full path of the downloaded file as a Path object if successful, otherwise None.\n\n Examples:\n >>> filepath = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\", cache_hrs=24)\n \"\"\"\n success = False\n filename = kwargs.pop(\"filename\", self.parent_helper.cache_filename(url))\n filename = truncate_filename(Path(filename).resolve())\n kwargs[\"filename\"] = filename\n max_size = kwargs.pop(\"max_size\", None)\n if max_size is not None:\n max_size = self.parent_helper.human_to_bytes(max_size)\n kwargs[\"max_size\"] = max_size\n cache_hrs = float(kwargs.pop(\"cache_hrs\", -1))\n if cache_hrs > 0 and self.parent_helper.is_cached(url):\n log.debug(f\"{url} is cached at {self.parent_helper.cache_filename(url)}\")\n success = True\n else:\n success = await self.run_and_return(\"download\", url, **kwargs)\n\n if success:\n return filename\n\n async def wordlist(self, path, lines=None, **kwargs):\n \"\"\"\n Asynchronous function for retrieving wordlists, either from a local path or a URL.\n Allows for optional line-based truncation and caching. 
Returns the full path of the wordlist\n file or a truncated version of it.\n\n Args:\n path (str): The local or remote path of the wordlist.\n lines (int, optional): Number of lines to read from the wordlist.\n If specified, will return a truncated wordlist with this many lines.\n cache_hrs (float, optional): Number of hours to cache the downloaded wordlist.\n Defaults to 720 hours (30 days) for remote wordlists.\n **kwargs: Additional keyword arguments to pass to the 'download' function for remote wordlists.\n\n Returns:\n Path: The full path of the wordlist (or its truncated version) as a Path object.\n\n Raises:\n WordlistError: If the path is invalid or the wordlist could not be retrieved or found.\n\n Examples:\n Fetching full wordlist\n >>> wordlist_path = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n\n Fetching and truncating to the first 100 lines\n >>> wordlist_path = await self.helpers.wordlist(\"/root/rockyou.txt\", lines=100)\n \"\"\"\n if not path:\n raise WordlistError(f\"Invalid wordlist: {path}\")\n if not \"cache_hrs\" in kwargs:\n kwargs[\"cache_hrs\"] = 720\n if self.parent_helper.is_url(path):\n filename = await self.download(str(path), **kwargs)\n if filename is None:\n raise WordlistError(f\"Unable to retrieve wordlist from {path}\")\n else:\n filename = Path(path).resolve()\n if not filename.is_file():\n raise WordlistError(f\"Unable to find wordlist at {path}\")\n\n if lines is None:\n return filename\n else:\n lines = int(lines)\n with open(filename) as f:\n read_lines = f.readlines()\n cache_key = f\"{filename}:{lines}\"\n truncated_filename = self.parent_helper.cache_filename(cache_key)\n with open(truncated_filename, \"w\") as f:\n for line in read_lines[:lines]:\n f.write(line)\n return truncated_filename\n\n async def api_page_iter(self, url, page_size=100, json=True, next_key=None, **requests_kwargs):\n \"\"\"\n An asynchronous generator function for iterating through paginated API data.\n\n This function 
continuously makes requests to a specified API URL, incrementing the page number\n or applying a custom pagination function, and yields the received data one page at a time.\n It is well-suited for APIs that provide paginated results.\n\n Args:\n url (str): The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.\n page_size (int, optional): The number of items per page. Defaults to 100.\n json (bool, optional): If True, attempts to deserialize the response content to a JSON object. Defaults to True.\n next_key (callable, optional): A function that takes the last page's data and returns the URL for the next page. Defaults to None.\n **requests_kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Yields:\n dict or httpx.Response: If 'json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.\n\n Note:\n The loop will continue indefinitely unless manually stopped. Make sure to break out of the loop once the last page has been received.\n\n Examples:\n >>> agen = api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')\n >>> try:\n >>> async for page in agen:\n >>> subdomains = page[\"subdomains\"]\n >>> self.hugesuccess(subdomains)\n >>> if not subdomains:\n >>> break\n >>> finally:\n >>> agen.aclose()\n \"\"\"\n page = 1\n offset = 0\n result = None\n while 1:\n if result and callable(next_key):\n try:\n new_url = next_key(result)\n except Exception as e:\n log.debug(f\"Failed to extract next page of results from {url}: {e}\")\n log.debug(traceback.format_exc())\n else:\n new_url = url.format(page=page, page_size=page_size, offset=offset)\n result = await self.request(new_url, **requests_kwargs)\n if result is None:\n log.verbose(f\"api_page_iter() got no response for {url}\")\n break\n try:\n if json:\n result = result.json()\n yield result\n except Exception:\n log.warning(f'Error in api_page_iter() for url: \"{new_url}\"')\n 
log.trace(traceback.format_exc())\n break\n finally:\n offset += page_size\n page += 1\n\n async def curl(self, *args, **kwargs):\n \"\"\"\n An asynchronous function that runs a cURL command with specified arguments and options.\n\n This function constructs and executes a cURL command based on the provided parameters.\n It offers support for various cURL options such as headers, post data, and cookies.\n\n Args:\n *args: Variable length argument list for positional arguments. Unused in this function.\n url (str): The URL for the cURL request. Mandatory.\n raw_path (bool, optional): If True, activates '--path-as-is' in cURL. Defaults to False.\n headers (dict, optional): A dictionary of HTTP headers to include in the request.\n ignore_bbot_global_settings (bool, optional): If True, ignores the global settings of BBOT. Defaults to False.\n post_data (dict, optional): A dictionary containing data to be sent in the request body.\n method (str, optional): The HTTP method to use for the request (e.g., 'GET', 'POST').\n cookies (dict, optional): A dictionary of cookies to include in the request.\n path_override (str, optional): Overrides the request-target to use in the HTTP request line.\n head_mode (bool, optional): If True, includes '-I' to fetch headers only. 
Defaults to None.\n raw_body (str, optional): Raw string to be sent in the body of the request.\n **kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Returns:\n str: The output of the cURL command.\n\n Raises:\n CurlError: If 'url' is not supplied.\n\n Examples:\n >>> output = await curl(url=\"https://example.com\", headers={\"X-Header\": \"Wat\"})\n >>> print(output)\n \"\"\"\n url = kwargs.get(\"url\", \"\")\n\n if not url:\n raise CurlError(\"No URL supplied to CURL helper\")\n\n curl_command = [\"curl\", url, \"-s\"]\n\n raw_path = kwargs.get(\"raw_path\", False)\n if raw_path:\n curl_command.append(\"--path-as-is\")\n\n # respect global ssl verify settings\n if self.ssl_verify is not True:\n curl_command.append(\"-k\")\n\n headers = kwargs.get(\"headers\", {})\n\n ignore_bbot_global_settings = kwargs.get(\"ignore_bbot_global_settings\", False)\n\n if ignore_bbot_global_settings:\n log.debug(\"ignore_bbot_global_settings enabled. Global settings will not be applied\")\n else:\n http_timeout = self.parent_helper.web_config.get(\"http_timeout\", 20)\n user_agent = self.parent_helper.web_config.get(\"user_agent\", \"BBOT\")\n\n if \"User-Agent\" not in headers:\n headers[\"User-Agent\"] = user_agent\n\n # only add custom headers if the URL is in-scope\n if self.parent_helper.preset.in_scope(url):\n for hk, hv in self.web_config.get(\"http_headers\", {}).items():\n headers[hk] = hv\n\n # add the timeout\n if not \"timeout\" in kwargs:\n timeout = http_timeout\n\n curl_command.append(\"-m\")\n curl_command.append(str(timeout))\n\n for k, v in headers.items():\n if isinstance(v, list):\n for x in v:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {x}\")\n\n else:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {v}\")\n\n post_data = kwargs.get(\"post_data\", {})\n if len(post_data.items()) > 0:\n curl_command.append(\"-d\")\n post_data_str = \"\"\n for k, v in post_data.items():\n post_data_str += 
f\"&{k}={v}\"\n curl_command.append(post_data_str.lstrip(\"&\"))\n\n method = kwargs.get(\"method\", \"\")\n if method:\n curl_command.append(\"-X\")\n curl_command.append(method)\n\n cookies = kwargs.get(\"cookies\", \"\")\n if cookies:\n curl_command.append(\"-b\")\n cookies_str = \"\"\n for k, v in cookies.items():\n cookies_str += f\"{k}={v}; \"\n curl_command.append(f'{cookies_str.rstrip(\" \")}')\n\n path_override = kwargs.get(\"path_override\", None)\n if path_override:\n curl_command.append(\"--request-target\")\n curl_command.append(f\"{path_override}\")\n\n head_mode = kwargs.get(\"head_mode\", None)\n if head_mode:\n curl_command.append(\"-I\")\n\n raw_body = kwargs.get(\"raw_body\", None)\n if raw_body:\n curl_command.append(\"-d\")\n curl_command.append(raw_body)\n\n output = (await self.parent_helper.run(curl_command)).stdout\n return output\n\n def beautifulsoup(\n self,\n markup,\n features=\"html.parser\",\n builder=None,\n parse_only=None,\n from_encoding=None,\n exclude_encodings=None,\n element_classes=None,\n **kwargs,\n ):\n \"\"\"\n Naviate, Search, Modify, Parse, or PrettyPrint HTML Content.\n More information at https://beautiful-soup-4.readthedocs.io/en/latest/\n\n Args:\n markup: A string or a file-like object representing markup to be parsed.\n features: Desirable features of the parser to be used.\n This may be the name of a specific parser (\"lxml\",\n \"lxml-xml\", \"html.parser\", or \"html5lib\") or it may be\n the type of markup to be used (\"html\", \"html5\", \"xml\").\n Defaults to 'html.parser'.\n builder: A TreeBuilder subclass to instantiate (or instance to use)\n instead of looking one up based on `features`.\n parse_only: A SoupStrainer. 
Only parts of the document\n matching the SoupStrainer will be considered.\n from_encoding: A string indicating the encoding of the\n document to be parsed.\n exclude_encodings = A list of strings indicating\n encodings known to be wrong.\n element_classes = A dictionary mapping BeautifulSoup\n classes like Tag and NavigableString, to other classes you'd\n like to be instantiated instead as the parse tree is\n built.\n **kwargs = For backwards compatibility purposes.\n\n Returns:\n soup: An instance of the BeautifulSoup class\n\n Todo:\n - Write tests for this function\n\n Examples:\n >>> soup = self.helpers.beautifulsoup(event.data[\"body\"], \"html.parser\")\n Perform an html parse of the 'markup' argument and return a soup instance\n\n >>> email_type = soup.find(type=\"email\")\n Searches the soup instance for all occurances of the passed in argument\n \"\"\"\n try:\n soup = BeautifulSoup(\n markup, features, builder, parse_only, from_encoding, exclude_encodings, element_classes, **kwargs\n )\n return soup\n except Exception as e:\n log.debug(f\"Error parsing beautifulsoup: {e}\")\n return False\n\n user_keywords = [re.compile(r, re.I) for r in [\"user\", \"login\", \"email\"]]\n pass_keywords = [re.compile(r, re.I) for r in [\"pass\"]]\n\n def is_login_page(self, html):\n \"\"\"\n Determines if the provided HTML content contains a login page.\n\n This function parses the HTML to search for forms with input fields typically used for\n authentication. 
If it identifies password fields or a combination of username and password\n fields, it returns True.\n\n Args:\n html (str): The HTML content to analyze.\n\n Returns:\n bool: True if the HTML contains a login page, otherwise False.\n\n Examples:\n >>> is_login_page('<form><input type=\"text\" name=\"username\"><input type=\"password\" name=\"password\"></form>')\n True\n\n >>> is_login_page('<form><input type=\"text\" name=\"search\"></form>')\n False\n \"\"\"\n try:\n soup = BeautifulSoup(html, \"html.parser\")\n except Exception as e:\n log.debug(f\"Error parsing html: {e}\")\n return False\n\n forms = soup.find_all(\"form\")\n\n # first, check for obvious password fields\n for form in forms:\n if form.find_all(\"input\", {\"type\": \"password\"}):\n return True\n\n # next, check for forms that have both a user-like and password-like field\n for form in forms:\n user_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.user_keywords)\n pass_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.pass_keywords)\n if user_fields and pass_fields:\n return True\n return False\n\n def response_to_json(self, response):\n \"\"\"\n Convert web response to JSON object, similar to the output of `httpx -irr -json`\n \"\"\"\n\n if response is None:\n return\n\n import mmh3\n from datetime import datetime\n from hashlib import md5, sha256\n from bbot.core.helpers.misc import tagify, urlparse, split_host_port, smart_decode\n\n request = response.request\n url = str(request.url)\n parsed_url = urlparse(url)\n netloc = parsed_url.netloc\n scheme = parsed_url.scheme.lower()\n host, port = split_host_port(f\"{scheme}://{netloc}\")\n\n raw_headers = \"\\r\\n\".join([f\"{k}: {v}\" for k, v in response.headers.items()])\n raw_headers_encoded = raw_headers.encode()\n\n headers = {}\n for k, v in response.headers.items():\n k = tagify(k, delimiter=\"_\")\n headers[k] = v\n\n j = {\n \"timestamp\": datetime.now().isoformat(),\n \"hash\": {\n 
\"body_md5\": md5(response.content).hexdigest(),\n \"body_mmh3\": mmh3.hash(response.content),\n \"body_sha256\": sha256(response.content).hexdigest(),\n # \"body_simhash\": \"TODO\",\n \"header_md5\": md5(raw_headers_encoded).hexdigest(),\n \"header_mmh3\": mmh3.hash(raw_headers_encoded),\n \"header_sha256\": sha256(raw_headers_encoded).hexdigest(),\n # \"header_simhash\": \"TODO\",\n },\n \"header\": headers,\n \"body\": smart_decode(response.content),\n \"content_type\": headers.get(\"content_type\", \"\").split(\";\")[0].strip(),\n \"url\": url,\n \"host\": str(host),\n \"port\": port,\n \"scheme\": scheme,\n \"method\": response.request.method,\n \"path\": parsed_url.path,\n \"raw_header\": raw_headers,\n \"status_code\": response.status_code,\n }\n\n return j\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.ERROR_CLASS","title":"ERROR_CLASS class-attribute
instance-attribute
","text":"ERROR_CLASS = WebError\n
Main utility class for managing HTTP operations in BBOT. It serves as a wrapper around the BBOTAsyncClient, which itself is a subclass of httpx.AsyncClient. The class provides functionalities to make HTTP requests, download files, and handle cached wordlists.
Attributes:
parent_helper
(object
) \u2013 The parent helper object containing scan configurations.
http_debug
(bool
) \u2013 Flag to indicate whether HTTP debugging is enabled.
ssl_verify
(bool
) \u2013 Flag to indicate whether SSL verification is enabled.
web_client
(BBOTAsyncClient
) \u2013 An instance of BBOTAsyncClient for making HTTP requests.
client_only_options
(tuple
) \u2013 A tuple of options only applicable to the web client.
Examples:
Basic web request:
>>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n
Download file:
>>> filename = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\")\n
Download wordlist (cached for 30 days by default):
>>> filename = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n
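When `wordlist` is called with `lines`, the source above reads the file and writes only the first `lines` lines to a separate cached file. The sketch below shows that truncation step in isolation. The function name `truncate_wordlist`, the output naming scheme, and the `out_dir` parameter are assumptions for illustration; BBOT derives the output path from its own cache helper instead.

```python
import tempfile
from pathlib import Path


def truncate_wordlist(path, lines, out_dir):
    # Resolve and validate the source wordlist
    source = Path(path).resolve()
    if not source.is_file():
        raise FileNotFoundError(f'Unable to find wordlist at {path}')
    lines = int(lines)
    # Hypothetical naming scheme: keep the truncated copy next to other cached files
    truncated = Path(out_dir) / f'{source.stem}.{lines}.txt'
    with open(source) as f_in, open(truncated, 'w') as f_out:
        for index, line in enumerate(f_in):
            if index >= lines:
                break
            f_out.write(line)
    return truncated


with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / 'words.txt'
    full.write_text('admin\nlogin\ntest\ndev\n')
    short = truncate_wordlist(full, 2, tmp)
    print(short.read_text().splitlines())
```

Unlike the original, this version stops reading once enough lines have been copied, which avoids loading a large wordlist into memory.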
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.api_page_iter","title":"api_page_iter async
","text":"api_page_iter(url, page_size=100, json=True, next_key=None, **requests_kwargs)\n
An asynchronous generator function for iterating through paginated API data.
This function continuously makes requests to a specified API URL, incrementing the page number or applying a custom pagination function, and yields the received data one page at a time. It is well-suited for APIs that provide paginated results.
Parameters:
url
(str
) \u2013 The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.
page_size
(int
, default: 100
) \u2013 The number of items per page. Defaults to 100.
json
(bool
, default: True
) \u2013 If True, attempts to deserialize the response content to a JSON object. Defaults to True.
next_key
(callable
, default: None
) \u2013 A function that takes the last page's data and returns the URL for the next page. Defaults to None.
**requests_kwargs
\u2013 Arbitrary keyword arguments that will be forwarded to the HTTP request function.
Yields:
dict or httpx.Response: If 'json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.
Note The loop will continue indefinitely unless manually stopped. Make sure to break out of the loop once the last page has been received.
Examples:
>>> agen = self.helpers.api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')\n>>> try:\n>>>     async for page in agen:\n>>>         subdomains = page[\"subdomains\"]\n>>>         self.hugesuccess(subdomains)\n>>>         if not subdomains:\n>>>             break\n>>> finally:\n>>>     await agen.aclose()\n
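To make the pagination loop concrete without a live API, the following sketch reproduces the page and offset bookkeeping with a stand-in fetch function. `page_iter` and `fake_fetch` are hypothetical names invented for this example, the `next_key` branch and JSON decoding are left out, and the fake API simply returns two pages of data followed by an empty page.

```python
import asyncio


async def page_iter(fetch, url, page_size=100):
    # Same counters as the documented helper: page starts at 1, offset at 0
    page, offset = 1, 0
    while True:
        new_url = url.format(page=page, page_size=page_size, offset=offset)
        result = await fetch(new_url)
        if result is None:
            break
        try:
            yield result
        finally:
            offset += page_size
            page += 1


async def fake_fetch(url):
    # Hypothetical stand-in for the real request helper
    page = int(url.split('page=')[1].split('&')[0])
    data = {1: ['a.example.com', 'b.example.com'], 2: ['c.example.com']}
    return {'url': url, 'subdomains': data.get(page, [])}


async def collect():
    seen, urls = [], []
    agen = page_iter(fake_fetch, 'https://api.example.com/data?page={page}&page_size={page_size}', page_size=2)
    try:
        async for page in agen:
            urls.append(page['url'])
            if not page['subdomains']:
                # The generator never stops on its own, so the caller must break
                break
            seen.extend(page['subdomains'])
    finally:
        # aclose() is a coroutine and has to be awaited
        await agen.aclose()
    return seen, urls


seen, urls = asyncio.run(collect())
print(seen)
print(urls)
```

The empty third page is what ends the loop here; an API that signals the last page differently would need a different stop condition.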
Source code in bbot/core/helpers/web/web.py
async def api_page_iter(self, url, page_size=100, json=True, next_key=None, **requests_kwargs):\n \"\"\"\n An asynchronous generator function for iterating through paginated API data.\n\n This function continuously makes requests to a specified API URL, incrementing the page number\n or applying a custom pagination function, and yields the received data one page at a time.\n It is well-suited for APIs that provide paginated results.\n\n Args:\n url (str): The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.\n page_size (int, optional): The number of items per page. Defaults to 100.\n json (bool, optional): If True, attempts to deserialize the response content to a JSON object. Defaults to True.\n next_key (callable, optional): A function that takes the last page's data and returns the URL for the next page. Defaults to None.\n **requests_kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Yields:\n dict or httpx.Response: If 'json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.\n\n Note:\n The loop will continue indefinitely unless manually stopped. 
Make sure to break out of the loop once the last page has been received.\n\n Examples:\n >>> agen = api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')\n >>> try:\n >>> async for page in agen:\n >>> subdomains = page[\"subdomains\"]\n >>> self.hugesuccess(subdomains)\n >>> if not subdomains:\n >>> break\n >>> finally:\n >>> agen.aclose()\n \"\"\"\n page = 1\n offset = 0\n result = None\n while 1:\n if result and callable(next_key):\n try:\n new_url = next_key(result)\n except Exception as e:\n log.debug(f\"Failed to extract next page of results from {url}: {e}\")\n log.debug(traceback.format_exc())\n else:\n new_url = url.format(page=page, page_size=page_size, offset=offset)\n result = await self.request(new_url, **requests_kwargs)\n if result is None:\n log.verbose(f\"api_page_iter() got no response for {url}\")\n break\n try:\n if json:\n result = result.json()\n yield result\n except Exception:\n log.warning(f'Error in api_page_iter() for url: \"{new_url}\"')\n log.trace(traceback.format_exc())\n break\n finally:\n offset += page_size\n page += 1\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.beautifulsoup","title":"beautifulsoup","text":"beautifulsoup(markup, features='html.parser', builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, element_classes=None, **kwargs)\n
Navigate, Search, Modify, Parse, or PrettyPrint HTML Content. More information at https://beautiful-soup-4.readthedocs.io/en/latest/
Parameters:
markup
\u2013 A string or a file-like object representing markup to be parsed.
features
\u2013 Desirable features of the parser to be used. This may be the name of a specific parser (\"lxml\", \"lxml-xml\", \"html.parser\", or \"html5lib\") or it may be the type of markup to be used (\"html\", \"html5\", \"xml\"). Defaults to 'html.parser'.
builder
\u2013 A TreeBuilder subclass to instantiate (or instance to use) instead of looking one up based on features
.
parse_only
\u2013 A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered.
from_encoding
\u2013 A string indicating the encoding of the document to be parsed.
exclude_encodings
\u2013 A list of strings indicating encodings known to be wrong.
element_classes
\u2013 A dictionary mapping BeautifulSoup classes like Tag and NavigableString to other classes you'd like to be instantiated instead as the parse tree is built.
**kwargs
\u2013 For backwards compatibility purposes.
Returns:
soup
\u2013 An instance of the BeautifulSoup class
Examples:
>>> soup = self.helpers.beautifulsoup(event.data[\"body\"], \"html.parser\")\nPerforms an HTML parse of the 'markup' argument and returns a soup instance\n
>>> email_type = soup.find(type=\"email\")\nSearches the soup instance for the first element matching the passed-in argument\n
Source code in bbot/core/helpers/web/web.py
def beautifulsoup(\n self,\n markup,\n features=\"html.parser\",\n builder=None,\n parse_only=None,\n from_encoding=None,\n exclude_encodings=None,\n element_classes=None,\n **kwargs,\n):\n \"\"\"\n Navigate, Search, Modify, Parse, or PrettyPrint HTML Content.\n More information at https://beautiful-soup-4.readthedocs.io/en/latest/\n\n Args:\n markup: A string or a file-like object representing markup to be parsed.\n features: Desirable features of the parser to be used.\n This may be the name of a specific parser (\"lxml\",\n \"lxml-xml\", \"html.parser\", or \"html5lib\") or it may be\n the type of markup to be used (\"html\", \"html5\", \"xml\").\n Defaults to 'html.parser'.\n builder: A TreeBuilder subclass to instantiate (or instance to use)\n instead of looking one up based on `features`.\n parse_only: A SoupStrainer. Only parts of the document\n matching the SoupStrainer will be considered.\n from_encoding: A string indicating the encoding of the\n document to be parsed.\n exclude_encodings: A list of strings indicating\n encodings known to be wrong.\n element_classes: A dictionary mapping BeautifulSoup\n classes like Tag and NavigableString, to other classes you'd\n like to be instantiated instead as the parse tree is\n built.\n **kwargs: For backwards compatibility purposes.\n\n Returns:\n soup: An instance of the BeautifulSoup class\n\n Todo:\n - Write tests for this function\n\n Examples:\n >>> soup = self.helpers.beautifulsoup(event.data[\"body\"], \"html.parser\")\n Perform an html parse of the 'markup' argument and return a soup instance\n\n >>> email_type = soup.find(type=\"email\")\n Searches the soup instance for the first element matching the passed-in argument\n \"\"\"\n try:\n soup = BeautifulSoup(\n markup, features, builder, parse_only, from_encoding, exclude_encodings, element_classes, **kwargs\n )\n return soup\n except Exception as e:\n log.debug(f\"Error parsing beautifulsoup: {e}\")\n return False\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.curl","title":"curl async
","text":"curl(*args, **kwargs)\n
An asynchronous function that runs a cURL command with specified arguments and options.
This function constructs and executes a cURL command based on the provided parameters. It offers support for various cURL options such as headers, post data, and cookies.
Parameters:
*args
\u2013 Variable length argument list for positional arguments. Unused in this function.
url
(str
) \u2013 The URL for the cURL request. Mandatory.
raw_path
(bool
) \u2013 If True, activates '--path-as-is' in cURL. Defaults to False.
headers
(dict
) \u2013 A dictionary of HTTP headers to include in the request.
ignore_bbot_global_settings
(bool
) \u2013 If True, ignores the global settings of BBOT. Defaults to False.
post_data
(dict
) \u2013 A dictionary containing data to be sent in the request body.
method
(str
) \u2013 The HTTP method to use for the request (e.g., 'GET', 'POST').
cookies
(dict
) \u2013 A dictionary of cookies to include in the request.
path_override
(str
) \u2013 Overrides the request-target to use in the HTTP request line.
head_mode
(bool
) \u2013 If True, includes '-I' to fetch headers only. Defaults to None.
raw_body
(str
) \u2013 Raw string to be sent in the body of the request.
**kwargs
\u2013 Arbitrary keyword arguments that will be forwarded to the HTTP request function.
Returns:
str
\u2013 The output of the cURL command.
Raises:
CurlError
\u2013 If 'url' is not supplied.
Examples:
>>> output = await curl(url=\"https://example.com\", headers={\"X-Header\": \"Wat\"})\n>>> print(output)\n
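The way the keyword arguments map onto cURL flags can be sketched as follows. This is a simplified illustration and not the helper itself: build_curl_command is a hypothetical name, it only assembles the argument list without running anything, and it covers just a subset of the options (headers, post data, method, cookies, timeout).

```python
def build_curl_command(url, headers=None, post_data=None, cookies=None, method=None, timeout=20):
    '''Assemble a curl argument list (hypothetical sketch, subset of options only).'''
    if not url:
        raise ValueError('No URL supplied')
    # -s silences progress output, -m sets the maximum time in seconds
    command = ['curl', url, '-s', '-m', str(timeout)]
    for name, value in (headers or {}).items():
        # a list value produces one -H flag per entry
        values = value if isinstance(value, list) else [value]
        for v in values:
            command += ['-H', f'{name}: {v}']
    if post_data:
        # form fields are joined as key=value pairs separated by &
        command += ['-d', '&'.join(f'{k}={v}' for k, v in post_data.items())]
    if method:
        command += ['-X', method]
    if cookies:
        # cookies are joined as key=value pairs separated by a semicolon and a space
        command += ['-b', '; '.join(f'{k}={v}' for k, v in cookies.items())]
    return command


command = build_curl_command(
    'https://example.com',
    headers={'X-Header': 'Wat'},
    post_data={'a': 1, 'b': 2},
    method='POST',
    cookies={'session': 'abc'},
)
print(command)
```

Building the command as a list rather than a single string avoids shell quoting issues when header or cookie values contain spaces.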
Source code in bbot/core/helpers/web/web.py
async def curl(self, *args, **kwargs):\n \"\"\"\n An asynchronous function that runs a cURL command with specified arguments and options.\n\n This function constructs and executes a cURL command based on the provided parameters.\n It offers support for various cURL options such as headers, post data, and cookies.\n\n Args:\n *args: Variable length argument list for positional arguments. Unused in this function.\n url (str): The URL for the cURL request. Mandatory.\n raw_path (bool, optional): If True, activates '--path-as-is' in cURL. Defaults to False.\n headers (dict, optional): A dictionary of HTTP headers to include in the request.\n ignore_bbot_global_settings (bool, optional): If True, ignores the global settings of BBOT. Defaults to False.\n post_data (dict, optional): A dictionary containing data to be sent in the request body.\n method (str, optional): The HTTP method to use for the request (e.g., 'GET', 'POST').\n cookies (dict, optional): A dictionary of cookies to include in the request.\n path_override (str, optional): Overrides the request-target to use in the HTTP request line.\n head_mode (bool, optional): If True, includes '-I' to fetch headers only. 
Defaults to None.\n raw_body (str, optional): Raw string to be sent in the body of the request.\n **kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.\n\n Returns:\n str: The output of the cURL command.\n\n Raises:\n CurlError: If 'url' is not supplied.\n\n Examples:\n >>> output = await curl(url=\"https://example.com\", headers={\"X-Header\": \"Wat\"})\n >>> print(output)\n \"\"\"\n url = kwargs.get(\"url\", \"\")\n\n if not url:\n raise CurlError(\"No URL supplied to CURL helper\")\n\n curl_command = [\"curl\", url, \"-s\"]\n\n raw_path = kwargs.get(\"raw_path\", False)\n if raw_path:\n curl_command.append(\"--path-as-is\")\n\n # respect global ssl verify settings\n if self.ssl_verify is not True:\n curl_command.append(\"-k\")\n\n headers = kwargs.get(\"headers\", {})\n\n ignore_bbot_global_settings = kwargs.get(\"ignore_bbot_global_settings\", False)\n\n if ignore_bbot_global_settings:\n log.debug(\"ignore_bbot_global_settings enabled. Global settings will not be applied\")\n else:\n http_timeout = self.parent_helper.web_config.get(\"http_timeout\", 20)\n user_agent = self.parent_helper.web_config.get(\"user_agent\", \"BBOT\")\n\n if \"User-Agent\" not in headers:\n headers[\"User-Agent\"] = user_agent\n\n # only add custom headers if the URL is in-scope\n if self.parent_helper.preset.in_scope(url):\n for hk, hv in self.web_config.get(\"http_headers\", {}).items():\n headers[hk] = hv\n\n # add the timeout\n if not \"timeout\" in kwargs:\n timeout = http_timeout\n\n curl_command.append(\"-m\")\n curl_command.append(str(timeout))\n\n for k, v in headers.items():\n if isinstance(v, list):\n for x in v:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {x}\")\n\n else:\n curl_command.append(\"-H\")\n curl_command.append(f\"{k}: {v}\")\n\n post_data = kwargs.get(\"post_data\", {})\n if len(post_data.items()) > 0:\n curl_command.append(\"-d\")\n post_data_str = \"\"\n for k, v in post_data.items():\n post_data_str += 
f\"&{k}={v}\"\n curl_command.append(post_data_str.lstrip(\"&\"))\n\n method = kwargs.get(\"method\", \"\")\n if method:\n curl_command.append(\"-X\")\n curl_command.append(method)\n\n cookies = kwargs.get(\"cookies\", \"\")\n if cookies:\n curl_command.append(\"-b\")\n cookies_str = \"\"\n for k, v in cookies.items():\n cookies_str += f\"{k}={v}; \"\n curl_command.append(f'{cookies_str.rstrip(\" \")}')\n\n path_override = kwargs.get(\"path_override\", None)\n if path_override:\n curl_command.append(\"--request-target\")\n curl_command.append(f\"{path_override}\")\n\n head_mode = kwargs.get(\"head_mode\", None)\n if head_mode:\n curl_command.append(\"-I\")\n\n raw_body = kwargs.get(\"raw_body\", None)\n if raw_body:\n curl_command.append(\"-d\")\n curl_command.append(raw_body)\n\n output = (await self.parent_helper.run(curl_command)).stdout\n return output\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.download","title":"download async
","text":"download(url, **kwargs)\n
Asynchronous function for downloading files from a given URL. Supports caching with an optional time period in hours via the \"cache_hrs\" keyword argument. In case of successful download, returns the full path of the saved filename. If the download fails, returns None.
Parameters:
url
(str
) \u2013 The URL of the file to download.
filename
(str
) \u2013 The filename to save the downloaded file as. If not provided, will generate based on URL.
max_size
(str or int
) \u2013 Maximum filesize as a string (\"5MB\") or integer in bytes.
cache_hrs
(float
) \u2013 The number of hours to cache the downloaded file. A negative value disables caching. Defaults to -1.
method
(str
) \u2013 The HTTP method to use for the request, defaults to 'GET'.
raise_error
(bool
) \u2013 Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.
**kwargs
\u2013 Additional keyword arguments to pass to the httpx request.
Returns:
Path or None: The full path of the downloaded file as a Path object if successful, otherwise None.
Examples:
>>> filepath = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\", cache_hrs=24)\n
Source code in bbot/core/helpers/web/web.py
async def download(self, url, **kwargs):\n \"\"\"\n Asynchronous function for downloading files from a given URL. Supports caching with an optional\n time period in hours via the \"cache_hrs\" keyword argument. In case of successful download,\n returns the full path of the saved filename. If the download fails, returns None.\n\n Args:\n url (str): The URL of the file to download.\n filename (str, optional): The filename to save the downloaded file as.\n If not provided, will generate based on URL.\n max_size (str or int): Maximum filesize as a string (\"5MB\") or integer in bytes.\n cache_hrs (float, optional): The number of hours to cache the downloaded file.\n A negative value disables caching. Defaults to -1.\n method (str, optional): The HTTP method to use for the request, defaults to 'GET'.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.\n **kwargs: Additional keyword arguments to pass to the httpx request.\n\n Returns:\n Path or None: The full path of the downloaded file as a Path object if successful, otherwise None.\n\n Examples:\n >>> filepath = await self.helpers.download(\"https://www.evilcorp.com/passwords.docx\", cache_hrs=24)\n \"\"\"\n success = False\n filename = kwargs.pop(\"filename\", self.parent_helper.cache_filename(url))\n filename = truncate_filename(Path(filename).resolve())\n kwargs[\"filename\"] = filename\n max_size = kwargs.pop(\"max_size\", None)\n if max_size is not None:\n max_size = self.parent_helper.human_to_bytes(max_size)\n kwargs[\"max_size\"] = max_size\n cache_hrs = float(kwargs.pop(\"cache_hrs\", -1))\n if cache_hrs > 0 and self.parent_helper.is_cached(url):\n log.debug(f\"{url} is cached at {self.parent_helper.cache_filename(url)}\")\n success = True\n else:\n success = await self.run_and_return(\"download\", url, **kwargs)\n\n if success:\n return filename\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.is_login_page","title":"is_login_page","text":"is_login_page(html)\n
Determines if the provided HTML content contains a login page.
This function parses the HTML to search for forms with input fields typically used for authentication. If it identifies password fields or a combination of username and password fields, it returns True.
Parameters:
html
(str
) \u2013 The HTML content to analyze.
Returns:
bool
\u2013 True if the HTML contains a login page, otherwise False.
Examples:
>>> is_login_page('<form><input type=\"text\" name=\"username\"><input type=\"password\" name=\"password\"></form>')\nTrue\n
>>> is_login_page('<form><input type=\"text\" name=\"search\"></form>')\nFalse\n
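The two-stage heuristic described above (any password input, otherwise a user-like field together with a password-like field in the same form) can be sketched with only the standard library. This is an illustration under assumptions: it uses html.parser instead of BeautifulSoup, and the keyword sets and the name looks_like_login_page are hypothetical stand-ins, not the helper's actual values.

```python
from html.parser import HTMLParser

# Hypothetical keyword sets; the real helper defines its own lists
USER_KEYWORDS = {'user', 'username', 'login', 'email'}
PASS_KEYWORDS = {'pass', 'password', 'passwd'}


class FormScanner(HTMLParser):
    '''Collects the (type, name) pair of every input found inside a form.'''

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.forms = []  # one list of (type, name) pairs per form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self.in_form = True
            self.forms.append([])
        elif tag == 'input' and self.in_form:
            input_type = (attrs.get('type') or '').lower()
            input_name = (attrs.get('name') or '').lower()
            self.forms[-1].append((input_type, input_name))

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def looks_like_login_page(html):
    scanner = FormScanner()
    scanner.feed(html)
    for inputs in scanner.forms:
        # first check: an explicit password field
        if any(input_type == 'password' for input_type, _ in inputs):
            return True
        # second check: a user-like name and a password-like name in the same form
        names = {name for _, name in inputs}
        if names & USER_KEYWORDS and names & PASS_KEYWORDS:
            return True
    return False


print(looks_like_login_page('<form><input type=password name=pw></form>'))
print(looks_like_login_page('<form><input type=text name=search></form>'))
```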
Source code in bbot/core/helpers/web/web.py
def is_login_page(self, html):\n \"\"\"\n Determines if the provided HTML content contains a login page.\n\n This function parses the HTML to search for forms with input fields typically used for\n authentication. If it identifies password fields or a combination of username and password\n fields, it returns True.\n\n Args:\n html (str): The HTML content to analyze.\n\n Returns:\n bool: True if the HTML contains a login page, otherwise False.\n\n Examples:\n >>> is_login_page('<form><input type=\"text\" name=\"username\"><input type=\"password\" name=\"password\"></form>')\n True\n\n >>> is_login_page('<form><input type=\"text\" name=\"search\"></form>')\n False\n \"\"\"\n try:\n soup = BeautifulSoup(html, \"html.parser\")\n except Exception as e:\n log.debug(f\"Error parsing html: {e}\")\n return False\n\n forms = soup.find_all(\"form\")\n\n # first, check for obvious password fields\n for form in forms:\n if form.find_all(\"input\", {\"type\": \"password\"}):\n return True\n\n # next, check for forms that have both a user-like and password-like field\n for form in forms:\n user_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.user_keywords)\n pass_fields = sum(bool(form.find_all(\"input\", {\"name\": r})) for r in self.pass_keywords)\n if user_fields and pass_fields:\n return True\n return False\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.request","title":"request async
","text":"request(*args, **kwargs)\n
Asynchronous function for making HTTP requests, intended to be the most basic web request function used widely across BBOT and within this helper class. Handles various exceptions and timeouts that might occur during the request.
This function automatically respects the scan's global timeout, proxy, headers, etc. Headers you specify will be merged with the scan's. Your arguments take ultimate precedence, meaning you can override the scan's values if you want.
Parameters:
url
(str
) \u2013 The URL to send the request to.
method
(str
) \u2013 The HTTP method to use for the request. Defaults to 'GET'.
headers
(dict
) \u2013 Dictionary of HTTP headers to send with the request.
params
(dict
) \u2013 Dictionary, list of tuples, or bytes to send in the query string.
cookies
(dict
) \u2013 Dictionary or CookieJar object containing cookies.
json
(Any
) \u2013 A JSON serializable Python object to send in the body.
data
(dict
) \u2013 Dictionary, list of tuples, or bytes to send in the body.
files
(dict
) \u2013 Dictionary of 'name': file-like-objects for multipart encoding upload.
auth
(tuple
) \u2013 Auth tuple to enable Basic/Digest/Custom HTTP auth.
timeout
(float
) \u2013 The maximum time to wait for the request to complete.
proxies
(dict
) \u2013 Dictionary mapping protocol schemes to proxy URLs.
allow_redirects
(bool
) \u2013 Enables or disables redirection. Defaults to None.
stream
(bool
) \u2013 Enables or disables response streaming.
raise_error
(bool
) \u2013 Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.
client
(AsyncClient
) \u2013 A specific httpx.AsyncClient to use for the request. Defaults to self.web_client.
cache_for
(int
) \u2013 Time in seconds to cache the request. Not used currently. Defaults to None.
Raises:
TimeoutException
\u2013 If the request times out.
ConnectError
\u2013 If the connection fails.
RequestError
\u2013 For other request-related errors.
Returns:
httpx.Response or None: The HTTP response object returned by the httpx library.
Examples:
>>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n
>>> response = await self.helpers.request(\"https://api.evilcorp.com/\", method=\"POST\", data=\"stuff\")\n
Note If the web request fails, it will return None unless raise_error
is True
.
Source code in bbot/core/helpers/web/web.py
async def request(self, *args, **kwargs):\n \"\"\"\n Asynchronous function for making HTTP requests, intended to be the most basic web request function\n used widely across BBOT and within this helper class. Handles various exceptions and timeouts\n that might occur during the request.\n\n This function automatically respects the scan's global timeout, proxy, headers, etc.\n Headers you specify will be merged with the scan's. Your arguments take ultimate precedence,\n meaning you can override the scan's values if you want.\n\n Args:\n url (str): The URL to send the request to.\n method (str, optional): The HTTP method to use for the request. Defaults to 'GET'.\n headers (dict, optional): Dictionary of HTTP headers to send with the request.\n params (dict, optional): Dictionary, list of tuples, or bytes to send in the query string.\n cookies (dict, optional): Dictionary or CookieJar object containing cookies.\n json (Any, optional): A JSON serializable Python object to send in the body.\n data (dict, optional): Dictionary, list of tuples, or bytes to send in the body.\n files (dict, optional): Dictionary of 'name': file-like-objects for multipart encoding upload.\n auth (tuple, optional): Auth tuple to enable Basic/Digest/Custom HTTP auth.\n timeout (float, optional): The maximum time to wait for the request to complete.\n proxies (dict, optional): Dictionary mapping protocol schemes to proxy URLs.\n allow_redirects (bool, optional): Enables or disables redirection. Defaults to None.\n stream (bool, optional): Enables or disables response streaming.\n raise_error (bool, optional): Whether to raise exceptions for HTTP connect, timeout errors. Defaults to False.\n client (httpx.AsyncClient, optional): A specific httpx.AsyncClient to use for the request. Defaults to self.web_client.\n cache_for (int, optional): Time in seconds to cache the request. Not used currently. 
Defaults to None.\n\n Raises:\n httpx.TimeoutException: If the request times out.\n httpx.ConnectError: If the connection fails.\n httpx.RequestError: For other request-related errors.\n\n Returns:\n httpx.Response or None: The HTTP response object returned by the httpx library.\n\n Examples:\n >>> response = await self.helpers.request(\"https://www.evilcorp.com\")\n\n >>> response = await self.helpers.request(\"https://api.evilcorp.com/\", method=\"POST\", data=\"stuff\")\n\n Note:\n If the web request fails, it will return None unless `raise_error` is `True`.\n \"\"\"\n return await self.run_and_return(\"request\", *args, **kwargs)\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.request_batch","title":"request_batch async
","text":"request_batch(urls, *args, **kwargs)\n
Given a list of URLs, request them in parallel and yield responses as they come in.
Parameters:
urls
(list[str]
) \u2013 List of URLs to visit
*args
\u2013 Positional arguments to pass through to httpx
**kwargs
\u2013 Keyword arguments to pass through to httpx
Examples:
>>> async for url, response in self.helpers.request_batch(urls, headers={\"X-Test\": \"Test\"}):\n>>> if response is not None and response.status_code == 200:\n>>> self.hugesuccess(response)\n
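The pattern of yielding responses as they come in can be illustrated with plain asyncio. This sketch does not reproduce how the helper dispatches requests internally: it swaps in asyncio.as_completed, and fake_request is a hypothetical stand-in that sleeps instead of making an HTTP call.

```python
import asyncio


async def fake_request(url, delay):
    # hypothetical stand-in for a real HTTP request
    await asyncio.sleep(delay)
    return url, f'response for {url}'


async def request_batch_sketch(urls_and_delays):
    # start every request up front so they run concurrently
    tasks = [asyncio.create_task(fake_request(url, delay)) for url, delay in urls_and_delays]
    # yield each (url, response) pair as soon as it finishes
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect():
    order = []
    batch = [('http://a.invalid', 0.05), ('http://b.invalid', 0.01)]
    async for url, response in request_batch_sketch(batch):
        order.append(url)
    return order


completed_order = asyncio.run(collect())
print(completed_order)
```

The faster request is yielded first even though it was submitted second, which is why callers receive the URL alongside each response.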
Source code in bbot/core/helpers/web/web.py
async def request_batch(self, urls, *args, **kwargs):\n \"\"\"\n Given a list of URLs, request them in parallel and yield responses as they come in.\n\n Args:\n urls (list[str]): List of URLs to visit\n *args: Positional arguments to pass through to httpx\n **kwargs: Keyword arguments to pass through to httpx\n\n Examples:\n >>> async for url, response in self.helpers.request_batch(urls, headers={\"X-Test\": \"Test\"}):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_batch\", urls, *args, **kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.request_custom_batch","title":"request_custom_batch async
","text":"request_custom_batch(urls_and_kwargs)\n
Make web requests in parallel with custom options for each request. Yield responses as they come in.
Similar to request_batch
except it allows individual arguments for each URL.
Parameters:
urls_and_kwargs
(list[tuple]
) \u2013 List of tuples in the format: (url, kwargs, custom_tracker) where custom_tracker is an optional value for your own internal use. You may use it to help correlate requests, etc.
Examples:
>>> urls_and_kwargs = [\n>>> (\"http://evilcorp.com/1\", {\"method\": \"GET\"}, \"request-1\"),\n>>> (\"http://evilcorp.com/2\", {\"method\": \"POST\"}, \"request-2\"),\n>>> ]\n>>> async for url, kwargs, custom_tracker, response in self.helpers.request_custom_batch(\n>>> urls_and_kwargs\n>>> ):\n>>> if response is not None and response.status_code == 200:\n>>> self.hugesuccess(response)\n
Source code in bbot/core/helpers/web/web.py
async def request_custom_batch(self, urls_and_kwargs):\n \"\"\"\n Make web requests in parallel with custom options for each request. Yield responses as they come in.\n\n Similar to `request_batch` except it allows individual arguments for each URL.\n\n Args:\n urls_and_kwargs (list[tuple]): List of tuples in the format: (url, kwargs, custom_tracker)\n where custom_tracker is an optional value for your own internal use. You may use it to\n help correlate requests, etc.\n\n Examples:\n >>> urls_and_kwargs = [\n >>> (\"http://evilcorp.com/1\", {\"method\": \"GET\"}, \"request-1\"),\n >>> (\"http://evilcorp.com/2\", {\"method\": \"POST\"}, \"request-2\"),\n >>> ]\n >>> async for url, kwargs, custom_tracker, response in self.helpers.request_custom_batch(\n >>> urls_and_kwargs\n >>> ):\n >>> if response is not None and response.status_code == 200:\n >>> self.hugesuccess(response)\n \"\"\"\n agen = self.run_and_yield(\"request_custom_batch\", urls_and_kwargs)\n while 1:\n try:\n yield await agen.__anext__()\n except (StopAsyncIteration, GeneratorExit):\n await agen.aclose()\n break\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.response_to_json","title":"response_to_json","text":"response_to_json(response)\n
Convert web response to JSON object, similar to the output of httpx -irr -json
Source code in bbot/core/helpers/web/web.py
def response_to_json(self, response):\n \"\"\"\n Convert web response to JSON object, similar to the output of `httpx -irr -json`\n \"\"\"\n\n if response is None:\n return\n\n import mmh3\n from datetime import datetime\n from hashlib import md5, sha256\n from bbot.core.helpers.misc import tagify, urlparse, split_host_port, smart_decode\n\n request = response.request\n url = str(request.url)\n parsed_url = urlparse(url)\n netloc = parsed_url.netloc\n scheme = parsed_url.scheme.lower()\n host, port = split_host_port(f\"{scheme}://{netloc}\")\n\n raw_headers = \"\\r\\n\".join([f\"{k}: {v}\" for k, v in response.headers.items()])\n raw_headers_encoded = raw_headers.encode()\n\n headers = {}\n for k, v in response.headers.items():\n k = tagify(k, delimiter=\"_\")\n headers[k] = v\n\n j = {\n \"timestamp\": datetime.now().isoformat(),\n \"hash\": {\n \"body_md5\": md5(response.content).hexdigest(),\n \"body_mmh3\": mmh3.hash(response.content),\n \"body_sha256\": sha256(response.content).hexdigest(),\n # \"body_simhash\": \"TODO\",\n \"header_md5\": md5(raw_headers_encoded).hexdigest(),\n \"header_mmh3\": mmh3.hash(raw_headers_encoded),\n \"header_sha256\": sha256(raw_headers_encoded).hexdigest(),\n # \"header_simhash\": \"TODO\",\n },\n \"header\": headers,\n \"body\": smart_decode(response.content),\n \"content_type\": headers.get(\"content_type\", \"\").split(\";\")[0].strip(),\n \"url\": url,\n \"host\": str(host),\n \"port\": port,\n \"scheme\": scheme,\n \"method\": response.request.method,\n \"path\": parsed_url.path,\n \"raw_header\": raw_headers,\n \"status_code\": response.status_code,\n }\n\n return j\n
"},{"location":"dev/helpers/web/#bbot.core.helpers.web.WebHelper.wordlist","title":"wordlist async
","text":"wordlist(path, lines=None, **kwargs)\n
Asynchronous function for retrieving wordlists, either from a local path or a URL. Allows for optional line-based truncation and caching. Returns the full path of the wordlist file or a truncated version of it.
Parameters:
path
(str
) \u2013 The local or remote path of the wordlist.
lines
(int
, default: None
) \u2013 Number of lines to read from the wordlist. If specified, will return a truncated wordlist with this many lines.
cache_hrs
(float
) \u2013 Number of hours to cache the downloaded wordlist. Defaults to 720 hours (30 days) for remote wordlists.
**kwargs
\u2013 Additional keyword arguments to pass to the 'download' function for remote wordlists.
Returns:
Path
\u2013 The full path of the wordlist (or its truncated version) as a Path object.
Raises:
WordlistError
\u2013 If the path is invalid or the wordlist could not be retrieved or found.
Examples:
Fetching full wordlist
>>> wordlist_path = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n
Fetching and truncating to the first 100 lines
>>> wordlist_path = await self.helpers.wordlist(\"/root/rockyou.txt\", lines=100)\n
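Line-based truncation amounts to copying the first N lines into a separate file and returning that file's path. A minimal sketch, assuming a hypothetical truncate_wordlist function and a caller-supplied destination path (the helper derives its own cache filename). It uses itertools.islice so that only the needed lines are read, which differs from reading the whole file first.

```python
import tempfile
from itertools import islice
from pathlib import Path


def truncate_wordlist(source, lines, dest):
    '''Copy the first `lines` lines of source into dest and return dest as a Path.'''
    with open(source) as src, open(dest, 'w') as out:
        out.writelines(islice(src, lines))
    return Path(dest)


# build a small throwaway wordlist to truncate
workdir = Path(tempfile.mkdtemp())
full = workdir / 'words.txt'
with open(full, 'w') as f:
    for i in range(10):
        print(f'word{i}', file=f)

short = truncate_wordlist(full, 3, workdir / 'words_3.txt')
print(short.read_text())
```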
Source code in bbot/core/helpers/web/web.py
async def wordlist(self, path, lines=None, **kwargs):\n \"\"\"\n Asynchronous function for retrieving wordlists, either from a local path or a URL.\n Allows for optional line-based truncation and caching. Returns the full path of the wordlist\n file or a truncated version of it.\n\n Args:\n path (str): The local or remote path of the wordlist.\n lines (int, optional): Number of lines to read from the wordlist.\n If specified, will return a truncated wordlist with this many lines.\n cache_hrs (float, optional): Number of hours to cache the downloaded wordlist.\n Defaults to 720 hours (30 days) for remote wordlists.\n **kwargs: Additional keyword arguments to pass to the 'download' function for remote wordlists.\n\n Returns:\n Path: The full path of the wordlist (or its truncated version) as a Path object.\n\n Raises:\n WordlistError: If the path is invalid or the wordlist could not be retrieved or found.\n\n Examples:\n Fetching full wordlist\n >>> wordlist_path = await self.helpers.wordlist(\"https://www.evilcorp.com/wordlist.txt\")\n\n Fetching and truncating to the first 100 lines\n >>> wordlist_path = await self.helpers.wordlist(\"/root/rockyou.txt\", lines=100)\n \"\"\"\n if not path:\n raise WordlistError(f\"Invalid wordlist: {path}\")\n if not \"cache_hrs\" in kwargs:\n kwargs[\"cache_hrs\"] = 720\n if self.parent_helper.is_url(path):\n filename = await self.download(str(path), **kwargs)\n if filename is None:\n raise WordlistError(f\"Unable to retrieve wordlist from {path}\")\n else:\n filename = Path(path).resolve()\n if not filename.is_file():\n raise WordlistError(f\"Unable to find wordlist at {path}\")\n\n if lines is None:\n return filename\n else:\n lines = int(lines)\n with open(filename) as f:\n read_lines = f.readlines()\n cache_key = f\"{filename}:{lines}\"\n truncated_filename = self.parent_helper.cache_filename(cache_key)\n with open(truncated_filename, \"w\") as f:\n for line in read_lines[:lines]:\n f.write(line)\n return truncated_filename\n
"},{"location":"dev/helpers/wordcloud/","title":"Word Cloud","text":"These are helpers related to BBOT's Word Cloud, a mechanism for storing target-specific keywords that are useful for custom wordlists, etc.
Note that these helpers can be invoked directly from self.helpers
, e.g.:
self.helpers.word_cloud\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.DNSMutator","title":"DNSMutator","text":" Bases: Mutator
DNS-specific mutator used by the dnsbrute_mutations
module to generate target-specific subdomain mutations.
This class extends the Mutator base class to add DNS-specific logic for generating subdomain mutations based on input words. It utilizes custom word extraction patterns and a wordninja model trained on DNS-specific data.
Examples:
>>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n>>> s.start_without_generator()\n>>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n[\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n]\n
Source code in bbot/core/helpers/wordcloud.py
class DNSMutator(Mutator):\n \"\"\"\n DNS-specific mutator used by the `dnsbrute_mutations` module to generate target-specific subdomain mutations.\n\n This class extends the Mutator base class to add DNS-specific logic for generating\n subdomain mutations based on input words. It utilizes custom word extraction patterns\n and a wordninja model trained on DNS-specific data.\n\n Examples:\n >>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n >>> s.start_without_generator()\n >>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n [\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n ]\n \"\"\"\n\n extract_word_regexes = [\n re.compile(r, re.I)\n for r in [\n r\"[a-z]+\",\n r\"[a-z_-]+\",\n r\"[a-z0-9]+\",\n r\"[a-z0-9_-]+\",\n ]\n ]\n\n def __init__(self, *args, **kwargs):\n super().__init__(*args, **kwargs)\n wordlist_dir = Path(__file__).parent.parent.parent / \"wordlists\"\n wordninja_dns_wordlist = wordlist_dir / \"wordninja_dns.txt.gz\"\n self.model = wordninja.LanguageModel(wordninja_dns_wordlist)\n\n def mutations(self, words, max_mutations=None):\n if isinstance(words, str):\n words = [words]\n new_words = set()\n for word in words:\n for e in extract_words(word, acronyms=False, model=self.model, word_regexes=self.extract_word_regexes):\n new_words.add(e)\n return super().mutations(new_words, max_mutations=max_mutations)\n\n def add_word(self, word):\n spans = set()\n mutations = set()\n for r in self.extract_word_regexes:\n for match in r.finditer(word):\n span = match.span()\n if span not in spans:\n spans.add(span)\n for start, end in spans:\n match_str = word[start:end]\n # skip digits\n if match_str.isdigit():\n continue\n before = word[:start]\n after = word[end:]\n basic_mutation = (before, None, after)\n mutations.add(basic_mutation)\n match_str_split = self.model.split(match_str)\n if len(match_str_split) > 1:\n for i, s in enumerate(match_str_split):\n if s.isdigit():\n continue\n 
split_before = \"\".join(match_str_split[:i])\n split_after = \"\".join(match_str_split[i + 1 :])\n wordninja_mutation = (before + split_before, None, split_after + after)\n mutations.add(wordninja_mutation)\n for m in mutations:\n self._add_mutation(m)\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.Mutator","title":"Mutator","text":" Bases: dict
Base class for generating mutations from a list of words. It accumulates words and produces mutations from them.
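Each stored mutation is a tuple of string segments in which None marks the slot where a new word is substituted, and the dictionary value counts how often that pattern was seen. The idea can be sketched on its own, with hypothetical names (fill_template, top_templates) and made-up counts:

```python
def fill_template(template, word):
    '''Replace the None placeholder in a template with the given word.'''
    return ''.join(word if part is None else part for part in template)


def top_templates(templates, n=None):
    '''Return the n most frequently seen templates, or all of them if n is None.'''
    ranked = sorted(templates.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked if n is None else ranked[:n])


# template -> number of times it was observed (illustrative values)
templates = {
    ('www-', None): 2,
    (None, '-test'): 1,
    (None, '1'): 1,
    (None,): 3,
}

mutations = sorted({fill_template(t, 'word') for t in top_templates(templates)})
print(mutations)
```

Keeping the counts makes it possible to cap the output at the most common patterns when the full set of mutations would be too large.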
Source code in bbot/core/helpers/wordcloud.py
class Mutator(dict):\n \"\"\"\n Base class for generating mutations from a list of words.\n It accumulates words and produces mutations from them.\n \"\"\"\n\n def mutations(self, words, max_mutations=None):\n mutations = self.top_mutations(max_mutations)\n ret = set()\n if isinstance(words, str):\n words = [words]\n for word in words:\n for m in self.mutate(word, mutations=mutations):\n ret.add(\"\".join(m))\n return ret\n\n def mutate(self, word, max_mutations=None, mutations=None):\n if mutations is None:\n mutations = self.top_mutations(max_mutations)\n for mutation, count in mutations.items():\n ret = []\n for s in mutation:\n if s is not None:\n ret.append(s)\n else:\n ret.append(word)\n yield ret\n\n def top_mutations(self, n=None):\n if n is not None:\n return dict(sorted(self.items(), key=lambda x: x[-1], reverse=True)[:n])\n else:\n return dict(self)\n\n def _add_mutation(self, mutation):\n if None not in mutation:\n return\n mutation = tuple([m for m in mutation if m != \"\"])\n try:\n self[mutation] += 1\n except KeyError:\n self[mutation] = 1\n\n def add_word(self, word):\n pass\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud","title":"WordCloud","text":" Bases: dict
WordCloud is a specialized dictionary-like class for storing and aggregating words extracted from various data sources such as DNS names and URLs. The class is intended to facilitate the generation of target-specific wordlists and mutations.
The WordCloud class can be accessed and manipulated like a standard Python dictionary. It also offers additional methods for generating mutations based on the words it contains.
Attributes:
parent_helper
\u2013 The parent helper object that provides necessary utilities.
devops_mutations
\u2013 A set containing common devops-related mutations, loaded from a file.
dns_mutator
\u2013 An instance of the DNSMutator class for generating DNS-based mutations.
Examples:
>>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n>>> s.start_without_generator()\n>>> print(s.helpers.word_cloud)\n{\n \"evilcorp\": 2,\n \"ec\": 2,\n \"www1\": 1,\n \"evil\": 2,\n \"www\": 2,\n \"w1\": 1,\n \"corp\": 2,\n \"1\": 1,\n \"wt\": 1,\n \"test\": 1,\n \"www-test\": 1\n}\n
>>> s.helpers.word_cloud.mutations([\"word\"], cloud=True, numbers=0, devops=False, letters=False)\n[\n [\n \"1\",\n \"word\"\n ],\n [\n \"corp\",\n \"word\"\n ],\n [\n \"ec\",\n \"word\"\n ],\n [\n \"evil\",\n \"word\"\n ],\n ...\n]\n
>>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n[\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n]\n
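The pairing step behind the word cloud mutations combines each word with each modifier in both orders and skips duplicates. A simplified sketch, assuming a hypothetical pair_mutations function and leaving out the numeric, letter, and devops modifiers:

```python
def pair_mutations(words, modifiers):
    '''Yield each word alone, then (word, modifier) and (modifier, word) pairs, without repeats.'''
    seen = set()
    for word in words:
        candidates = [(word,)]
        # sorted() only keeps the output order stable for this example
        for modifier in sorted(modifiers):
            candidates.append((word, modifier))
            candidates.append((modifier, word))
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


pairs = list(pair_mutations(['word'], {'evil', 'corp'}))
# the tuples are left unjoined so the caller can pick a delimiter
joined = ['-'.join(p) for p in pairs]
print(joined)
```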
Source code in bbot/core/helpers/wordcloud.py
class WordCloud(dict):\n \"\"\"\n WordCloud is a specialized dictionary-like class for storing and aggregating\n words extracted from various data sources such as DNS names and URLs. The class\n is intended to facilitate the generation of target-specific wordlists and mutations.\n\n The WordCloud class can be accessed and manipulated like a standard Python dictionary.\n It also offers additional methods for generating mutations based on the words it contains.\n\n Attributes:\n parent_helper: The parent helper object that provides necessary utilities.\n devops_mutations: A set containing common devops-related mutations, loaded from a file.\n dns_mutator: An instance of the DNSMutator class for generating DNS-based mutations.\n\n Examples:\n >>> s = Scanner(\"www1.evilcorp.com\", \"www-test.evilcorp.com\")\n >>> s.start_without_generator()\n >>> print(s.helpers.word_cloud)\n {\n \"evilcorp\": 2,\n \"ec\": 2,\n \"www1\": 1,\n \"evil\": 2,\n \"www\": 2,\n \"w1\": 1,\n \"corp\": 2,\n \"1\": 1,\n \"wt\": 1,\n \"test\": 1,\n \"www-test\": 1\n }\n\n >>> s.helpers.word_cloud.mutations([\"word\"], cloud=True, numbers=0, devops=False, letters=False)\n [\n [\n \"1\",\n \"word\"\n ],\n [\n \"corp\",\n \"word\"\n ],\n [\n \"ec\",\n \"word\"\n ],\n [\n \"evil\",\n \"word\"\n ],\n ...\n ]\n\n >>> s.helpers.word_cloud.dns_mutator.mutations(\"word\")\n [\n \"word\",\n \"word-test\",\n \"word1\",\n \"wordtest\",\n \"www-word\",\n \"wwwword\"\n ]\n \"\"\"\n\n def __init__(self, parent_helper, *args, **kwargs):\n self.parent_helper = parent_helper\n\n devops_filename = self.parent_helper.wordlist_dir / \"devops_mutations.txt\"\n self.devops_mutations = set(self.parent_helper.read_file(devops_filename))\n\n self.dns_mutator = DNSMutator()\n\n super().__init__(*args, **kwargs)\n\n def mutations(\n self, words, devops=True, cloud=True, letters=True, numbers=5, number_padding=2, substitute_numbers=True\n ):\n \"\"\"\n Generate various mutations for the given list of words based on 
different criteria.\n\n Yields tuples of strings which can be joined on the desired delimiter, e.g. \"-\" or \"_\".\n\n Args:\n words (Union[str, Iterable[str]]): A single word or list of words to mutate.\n devops (bool): Whether to include devops-related mutations.\n cloud (bool): Whether to include mutations from the word cloud.\n letters (bool): Whether to include letter-based mutations.\n numbers (int): The maximum numeric mutations to include.\n number_padding (int): Padding for numeric mutations.\n substitute_numbers (bool): Whether to substitute numbers in mutations.\n\n Yields:\n tuple: A tuple containing each of the mutation segments.\n \"\"\"\n if isinstance(words, str):\n words = (words,)\n results = set()\n for word in words:\n h = hash(word)\n if not h in results:\n results.add(h)\n yield (word,)\n if numbers > 0:\n if substitute_numbers:\n for word in words:\n for number_mutation in self.get_number_mutations(word, n=numbers, padding=number_padding):\n h = hash(number_mutation)\n if not h in results:\n results.add(h)\n yield (number_mutation,)\n for word in words:\n for modifier in self.modifiers(\n devops=devops, cloud=cloud, letters=letters, numbers=numbers, number_padding=number_padding\n ):\n a = (word, modifier)\n b = (modifier, word)\n for _ in (a, b):\n h = hash(_)\n if h not in results:\n results.add(h)\n yield _\n\n def modifiers(self, devops=True, cloud=True, letters=True, numbers=5, number_padding=2):\n modifiers = set()\n if devops:\n modifiers.update(self.devops_mutations)\n if cloud:\n modifiers.update(set(self))\n if letters:\n modifiers.update(set(string.ascii_lowercase))\n if numbers > 0:\n modifiers.update(self.parent_helper.gen_numbers(numbers, number_padding))\n return modifiers\n\n def absorb_event(self, event):\n \"\"\"\n Absorbs an event from a BBOT scan into the word cloud.\n\n This method updates the word cloud by extracting words from the given event. 
It aims to avoid including PTR\n (Pointer) records, as they tend to produce unhelpful mutations in the word cloud.\n\n Args:\n event (Event): The event object containing the words to be absorbed into the word cloud.\n \"\"\"\n for word in event.words:\n self.add_word(word)\n if event.scope_distance == 0 and event.type.startswith(\"DNS_NAME\"):\n subdomain = tldextract(event.data).subdomain\n if subdomain and not self.parent_helper.is_ptr(subdomain):\n for s in subdomain.split(\".\"):\n self.dns_mutator.add_word(s)\n\n def absorb_word(self, word, wordninja=True):\n \"\"\"\n Absorbs a word into the word cloud after splitting it using a word extraction algorithm.\n\n This method splits the input word into smaller meaningful words using word extraction, and then adds each\n of them to the word cloud. The splitting is done using a predefined algorithm in the parent helper.\n\n Args:\n word (str): The word to be split and absorbed into the word cloud.\n wordninja (bool, optional): If True, word extraction is enabled. Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.absorb_word(\"blacklantern\")\n >>> print(self.helpers.word_cloud)\n {\n \"blacklantern\": 1,\n \"black\": 1,\n \"bl\": 1,\n \"lantern\": 1\n }\n \"\"\"\n for w in self.parent_helper.extract_words(word, wordninja=wordninja):\n self.add_word(w)\n\n def add_word(self, word, lowercase=True):\n \"\"\"\n Adds a word to the word cloud.\n\n This method updates the word cloud by adding a given word. If the word already exists in the cloud,\n its frequency count is incremented by 1. Optionally, the word can be converted to lowercase before adding.\n\n Args:\n word (str): The word to be added to the word cloud.\n lowercase (bool, optional): If True, the word will be converted to lowercase before adding. 
Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.add_word(\"Example\")\n >>> self.helpers.word_cloud.add_word(\"example\")\n >>> print(self.helpers.word_cloud)\n {'example': 2}\n \"\"\"\n if lowercase:\n word = word.lower()\n try:\n self[word] += 1\n except KeyError:\n self[word] = 1\n\n def get_number_mutations(self, base, n=5, padding=2):\n \"\"\"\n Generates mutations of a base string by modifying the numerical parts or appending numbers.\n\n This method detects existing numbers in the base string and tries incrementing and decrementing them within a\n specified range. It also appends numbers at the end or after each word to generate more mutations.\n\n Args:\n base (str): The base string to generate mutations from.\n n (int, optional): The range of numbers to use for incrementing/decrementing. Defaults to 5.\n padding (int, optional): Zero-pad numbers up to this length. Defaults to 2.\n\n Returns:\n set: A set of mutated strings based on the base input.\n\n Examples:\n >>> self.helpers.word_cloud.get_number_mutations(\"www2-test\", n=2)\n {\n \"www0-test\",\n \"www1-test\",\n \"www2-test\",\n \"www2-test0\",\n \"www2-test00\",\n \"www2-test01\",\n \"www2-test1\",\n \"www3-test\",\n \"www4-test\"\n }\n \"\"\"\n results = set()\n\n # detects numbers and increments/decrements them\n # e.g. 
for \"base2_p013\", we would try:\n # - \"base0_p013\" through \"base12_p013\"\n # - \"base2_p003\" through \"base2_p023\"\n # limited to three iterations for sanity's sake\n for match in list(self.parent_helper.regexes.num_regex.finditer(base))[-3:]:\n span = match.span()\n before = base[: span[0]]\n after = base[span[-1] :]\n number = base[span[0] : span[-1]]\n numlen = len(number)\n maxnum = min(int(\"9\" * numlen), int(number) + n)\n minnum = max(0, int(number) - n)\n for i in range(minnum, maxnum + 1):\n filled_num = str(i).zfill(numlen)\n results.add(f\"{before}{filled_num}{after}\")\n if not number.startswith(\"0\"):\n results.add(f\"{before}{i}{after}\")\n\n # appends numbers after each word\n # e.g., for \"base_www\", we would try:\n # - \"base1_www\", \"base2_www\", etc.\n # - \"base_www1\", \"base_www2\", etc.\n # limited to three iterations for sanity's sake\n number_suffixes = self.parent_helper.gen_numbers(n, padding)\n for match in list(self.parent_helper.regexes.word_regex.finditer(base))[-3:]:\n span = match.span()\n for suffix in number_suffixes:\n before = base[: span[-1]]\n after = base[span[-1] :]\n # skip if there's already a number\n if len(after) > 1 and not after[0].isdigit():\n results.add(f\"{before}{suffix}{after}\")\n # basic cases so we don't miss anything\n for s in number_suffixes:\n results.add(f\"{base}{s}\")\n results.add(base)\n\n return results\n\n def truncate(self, limit):\n \"\"\"\n Truncates the word cloud dictionary to retain only the top `limit` entries based on their occurrence frequencies.\n\n Args:\n limit (int): The maximum number of entries to retain in the word cloud.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.truncate(2)\n >>> self.helpers.word_cloud\n {'cherry': 8, 'apple': 5}\n \"\"\"\n new_self = dict(self.json(limit=limit))\n self.clear()\n self.update(new_self)\n\n def json(self, limit=None):\n \"\"\"\n Returns the word cloud as 
a sorted OrderedDict, optionally truncated to the top `limit` entries.\n\n Args:\n limit (int, optional): The maximum number of entries to include in the returned OrderedDict. If None, all entries are included.\n\n Returns:\n OrderedDict: A dictionary sorted by word frequencies, potentially truncated to the top `limit` entries.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.json(limit=2)\n OrderedDict([('cherry', 8), ('apple', 5)])\n \"\"\"\n cloud_sorted = sorted(self.items(), key=lambda x: x[-1], reverse=True)\n if limit is not None:\n cloud_sorted = cloud_sorted[:limit]\n return OrderedDict(cloud_sorted)\n\n @property\n def default_filename(self):\n return self.parent_helper.preset.scan.home / f\"wordcloud.tsv\"\n\n def save(self, filename=None, limit=None):\n \"\"\"\n Saves the word cloud to a file. The cloud can optionally be truncated to the top `limit` entries.\n\n Args:\n filename (str, optional): The path to the file where the word cloud will be saved. If None, uses a default filename.\n limit (int, optional): The maximum number of entries to save to the file. 
If None, all entries are saved.\n\n Returns:\n tuple: A tuple containing a boolean indicating success or failure, and the resolved filename.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.save(filename=\"word_cloud.txt\", limit=2)\n (True, Path('word_cloud.txt'))\n \"\"\"\n if filename is None:\n filename = self.default_filename\n else:\n filename = Path(filename).resolve()\n try:\n if not self.parent_helper.mkdir(filename.parent):\n log.error(f\"Failure creating or error writing to {filename.parent} when saving word cloud\")\n return\n if len(self) > 0:\n log.debug(f\"Saving word cloud to {filename}\")\n with open(str(filename), mode=\"w\", newline=\"\") as f:\n c = csv.writer(f, delimiter=\"\\t\")\n for word, count in self.json(limit).items():\n c.writerow([count, word])\n log.debug(f\"Saved word cloud ({len(self):,} words) to {filename}\")\n return True, filename\n else:\n log.debug(f\"No words to save\")\n except Exception as e:\n import traceback\n\n log.warning(f\"Failed to save word cloud to {filename}: {e}\")\n log.trace(traceback.format_exc())\n return False, filename\n\n def load(self, filename=None):\n \"\"\"\n Loads a word cloud from a file. The file can be either a standard wordlist with one entry per line\n or a .tsv (tab-separated) file where the first row is the count and the second row is the associated entry.\n\n Args:\n filename (str, optional): The path to the file from which to load the word cloud. 
If None, uses a default filename.\n \"\"\"\n if filename is None:\n wordcloud_path = self.default_filename\n else:\n wordcloud_path = Path(filename).resolve()\n log.verbose(f\"Loading word cloud from {wordcloud_path}\")\n try:\n with open(str(wordcloud_path), newline=\"\") as f:\n c = csv.reader(f, delimiter=\"\\t\")\n for row in c:\n if len(row) == 1:\n self.add_word(row[0])\n elif len(row) == 2:\n with suppress(Exception):\n count, word = row\n count = int(count)\n self[word] = count\n if len(self) > 0:\n log.success(f\"Loaded word cloud ({len(self):,} words) from {wordcloud_path}\")\n except Exception as e:\n import traceback\n\n log_fn = log.debug\n if filename is not None:\n log_fn = log.warning\n log_fn(f\"Failed to load word cloud from {wordcloud_path}: {e}\")\n if filename is not None:\n log.trace(traceback.format_exc())\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.absorb_event","title":"absorb_event","text":"absorb_event(event)\n
Absorbs an event from a BBOT scan into the word cloud.
This method updates the word cloud by extracting words from the given event. It aims to avoid including PTR (Pointer) records, as they tend to produce unhelpful mutations in the word cloud.
Parameters:
event
(Event
) \u2013 The event object containing the words to be absorbed into the word cloud.
bbot/core/helpers/wordcloud.py
def absorb_event(self, event):\n \"\"\"\n Absorbs an event from a BBOT scan into the word cloud.\n\n This method updates the word cloud by extracting words from the given event. It aims to avoid including PTR\n (Pointer) records, as they tend to produce unhelpful mutations in the word cloud.\n\n Args:\n event (Event): The event object containing the words to be absorbed into the word cloud.\n \"\"\"\n for word in event.words:\n self.add_word(word)\n if event.scope_distance == 0 and event.type.startswith(\"DNS_NAME\"):\n subdomain = tldextract(event.data).subdomain\n if subdomain and not self.parent_helper.is_ptr(subdomain):\n for s in subdomain.split(\".\"):\n self.dns_mutator.add_word(s)\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.absorb_word","title":"absorb_word","text":"absorb_word(word, wordninja=True)\n
Absorbs a word into the word cloud after splitting it using a word extraction algorithm.
This method splits the input word into smaller meaningful words using word extraction, and then adds each of them to the word cloud. The splitting is done using a predefined algorithm in the parent helper.
Parameters:
word
(str
) \u2013 The word to be split and absorbed into the word cloud.
wordninja
(bool
, default: True
) \u2013 If True, word extraction is enabled. Defaults to True.
Examples:
>>> self.helpers.word_cloud.absorb_word(\"blacklantern\")\n>>> print(self.helpers.word_cloud)\n{\n \"blacklantern\": 1,\n \"black\": 1,\n \"bl\": 1,\n \"lantern\": 1\n}\n
Source code in bbot/core/helpers/wordcloud.py
def absorb_word(self, word, wordninja=True):\n \"\"\"\n Absorbs a word into the word cloud after splitting it using a word extraction algorithm.\n\n This method splits the input word into smaller meaningful words using word extraction, and then adds each\n of them to the word cloud. The splitting is done using a predefined algorithm in the parent helper.\n\n Args:\n word (str): The word to be split and absorbed into the word cloud.\n wordninja (bool, optional): If True, word extraction is enabled. Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.absorb_word(\"blacklantern\")\n >>> print(self.helpers.word_cloud)\n {\n \"blacklantern\": 1,\n \"black\": 1,\n \"bl\": 1,\n \"lantern\": 1\n }\n \"\"\"\n for w in self.parent_helper.extract_words(word, wordninja=wordninja):\n self.add_word(w)\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.add_word","title":"add_word","text":"add_word(word, lowercase=True)\n
Adds a word to the word cloud.
This method updates the word cloud by adding a given word. If the word already exists in the cloud, its frequency count is incremented by 1. Optionally, the word can be converted to lowercase before adding.
Parameters:
word
(str
) \u2013 The word to be added to the word cloud.
lowercase
(bool
, default: True
) \u2013 If True, the word will be converted to lowercase before adding. Defaults to True.
Examples:
>>> self.helpers.word_cloud.add_word(\"Example\")\n>>> self.helpers.word_cloud.add_word(\"example\")\n>>> print(self.helpers.word_cloud)\n{'example': 2}\n
Source code in bbot/core/helpers/wordcloud.py
def add_word(self, word, lowercase=True):\n \"\"\"\n Adds a word to the word cloud.\n\n This method updates the word cloud by adding a given word. If the word already exists in the cloud,\n its frequency count is incremented by 1. Optionally, the word can be converted to lowercase before adding.\n\n Args:\n word (str): The word to be added to the word cloud.\n lowercase (bool, optional): If True, the word will be converted to lowercase before adding. Defaults to True.\n\n Examples:\n >>> self.helpers.word_cloud.add_word(\"Example\")\n >>> self.helpers.word_cloud.add_word(\"example\")\n >>> print(self.helpers.word_cloud)\n {'example': 2}\n \"\"\"\n if lowercase:\n word = word.lower()\n try:\n self[word] += 1\n except KeyError:\n self[word] = 1\n
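The counting behaviour above can be sketched on its own, without a BBOT scan. This is a minimal illustration, not BBOT's class: WordCounter is a hypothetical stand-in, and it uses dict.get in place of the try/except in the original.

```python
class WordCounter(dict):
    '''Hypothetical stand-in for the counting behaviour of WordCloud.add_word.'''

    def add_word(self, word, lowercase=True):
        if lowercase:
            word = word.lower()
        # dict.get replaces the try/except KeyError used in the original
        self[word] = self.get(word, 0) + 1


counter = WordCounter()
counter.add_word('Example')
counter.add_word('example')
counter.add_word('Example', lowercase=False)
print(counter)
```

With lowercase=False the original casing is kept, so Example and example are counted as separate keys.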
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.get_number_mutations","title":"get_number_mutations","text":"get_number_mutations(base, n=5, padding=2)\n
Generates mutations of a base string by modifying the numerical parts or appending numbers.
This method detects existing numbers in the base string and tries incrementing and decrementing them within a specified range. It also appends numbers at the end or after each word to generate more mutations.
Parameters:
base
(str
) \u2013 The base string to generate mutations from.
n
(int
, default: 5
) \u2013 The range of numbers to use for incrementing/decrementing. Defaults to 5.
padding
(int
, default: 2
) \u2013 Zero-pad numbers up to this length. Defaults to 2.
Returns:
set
\u2013 A set of mutated strings based on the base input.
Examples:
>>> self.helpers.word_cloud.get_number_mutations(\"www2-test\", n=2)\n{\n \"www0-test\",\n \"www1-test\",\n \"www2-test\",\n \"www2-test0\",\n \"www2-test00\",\n \"www2-test01\",\n \"www2-test1\",\n \"www3-test\",\n \"www4-test\"\n}\n
Source code in bbot/core/helpers/wordcloud.py
def get_number_mutations(self, base, n=5, padding=2):\n \"\"\"\n Generates mutations of a base string by modifying the numerical parts or appending numbers.\n\n This method detects existing numbers in the base string and tries incrementing and decrementing them within a\n specified range. It also appends numbers at the end or after each word to generate more mutations.\n\n Args:\n base (str): The base string to generate mutations from.\n n (int, optional): The range of numbers to use for incrementing/decrementing. Defaults to 5.\n padding (int, optional): Zero-pad numbers up to this length. Defaults to 2.\n\n Returns:\n set: A set of mutated strings based on the base input.\n\n Examples:\n >>> self.helpers.word_cloud.get_number_mutations(\"www2-test\", n=2)\n {\n \"www0-test\",\n \"www1-test\",\n \"www2-test\",\n \"www2-test0\",\n \"www2-test00\",\n \"www2-test01\",\n \"www2-test1\",\n \"www3-test\",\n \"www4-test\"\n }\n \"\"\"\n results = set()\n\n # detects numbers and increments/decrements them\n # e.g. 
for \"base2_p013\", we would try:\n # - \"base0_p013\" through \"base12_p013\"\n # - \"base2_p003\" through \"base2_p023\"\n # limited to three iterations for sanity's sake\n for match in list(self.parent_helper.regexes.num_regex.finditer(base))[-3:]:\n span = match.span()\n before = base[: span[0]]\n after = base[span[-1] :]\n number = base[span[0] : span[-1]]\n numlen = len(number)\n maxnum = min(int(\"9\" * numlen), int(number) + n)\n minnum = max(0, int(number) - n)\n for i in range(minnum, maxnum + 1):\n filled_num = str(i).zfill(numlen)\n results.add(f\"{before}{filled_num}{after}\")\n if not number.startswith(\"0\"):\n results.add(f\"{before}{i}{after}\")\n\n # appends numbers after each word\n # e.g., for \"base_www\", we would try:\n # - \"base1_www\", \"base2_www\", etc.\n # - \"base_www1\", \"base_www2\", etc.\n # limited to three iterations for sanity's sake\n number_suffixes = self.parent_helper.gen_numbers(n, padding)\n for match in list(self.parent_helper.regexes.word_regex.finditer(base))[-3:]:\n span = match.span()\n for suffix in number_suffixes:\n before = base[: span[-1]]\n after = base[span[-1] :]\n # skip if there's already a number\n if len(after) > 1 and not after[0].isdigit():\n results.add(f\"{before}{suffix}{after}\")\n # basic cases so we don't miss anything\n for s in number_suffixes:\n results.add(f\"{base}{s}\")\n results.add(base)\n\n return results\n
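The increment/decrement step can be sketched in isolation. This is a simplified illustration under stated assumptions: NUM_REGEX is a stand-in for the helper's num_regex (whose exact pattern is not shown here), number_variants is a hypothetical name, and the suffix-appending half of the method is left out.

```python
import re

# Assumption: a plain run of digits, standing in for parent_helper.regexes.num_regex
NUM_REGEX = re.compile('[0-9]+')


def number_variants(base, n=2):
    '''Increment/decrement each number found in base by up to n.'''
    results = {base}
    for match in NUM_REGEX.finditer(base):
        start, end = match.span()
        number = match.group()
        width = len(number)
        # Clamp to the range representable in the same number of digits
        low = max(0, int(number) - n)
        high = min(int('9' * width), int(number) + n)
        for i in range(low, high + 1):
            # zero-fill so that a value like 013 stays three digits wide
            results.add(base[:start] + str(i).zfill(width) + base[end:])
    return results


print(sorted(number_variants('www2-test', n=2)))
```

Clamping to the original digit width keeps a single-digit number from growing into two digits, which matches the www0-test through www4-test entries in the docstring example.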
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.json","title":"json","text":"json(limit=None)\n
Returns the word cloud as a sorted OrderedDict, optionally truncated to the top limit
entries.
Parameters:
limit
(int
, default: None
) \u2013 The maximum number of entries to include in the returned OrderedDict. If None, all entries are included.
Returns:
OrderedDict
\u2013 A dictionary sorted by word frequencies, potentially truncated to the top limit
entries.
Examples:
>>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n>>> self.helpers.word_cloud.json(limit=2)\nOrderedDict([('cherry', 8), ('apple', 5)])\n
Source code in bbot/core/helpers/wordcloud.py
def json(self, limit=None):\n \"\"\"\n Returns the word cloud as a sorted OrderedDict, optionally truncated to the top `limit` entries.\n\n Args:\n limit (int, optional): The maximum number of entries to include in the returned OrderedDict. If None, all entries are included.\n\n Returns:\n OrderedDict: A dictionary sorted by word frequencies, potentially truncated to the top `limit` entries.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.json(limit=2)\n OrderedDict([('cherry', 8), ('apple', 5)])\n \"\"\"\n cloud_sorted = sorted(self.items(), key=lambda x: x[-1], reverse=True)\n if limit is not None:\n cloud_sorted = cloud_sorted[:limit]\n return OrderedDict(cloud_sorted)\n
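The sort-then-slice approach used by json (and, through it, by truncate) works on any plain dictionary. A minimal sketch, where top_words is a hypothetical free function rather than part of BBOT:

```python
from collections import OrderedDict


def top_words(cloud, limit=None):
    '''Return entries sorted by count, highest first, optionally truncated.'''
    ranked = sorted(cloud.items(), key=lambda item: item[1], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    return OrderedDict(ranked)


cloud = {'apple': 5, 'banana': 2, 'cherry': 8}
print(top_words(cloud, limit=2))
```

Because Python's sort is stable, words with equal counts keep their original insertion order.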
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.load","title":"load","text":"load(filename=None)\n
Loads a word cloud from a file. The file can be either a standard wordlist with one entry per line or a .tsv (tab-separated) file where the first column is the count and the second column is the associated entry.
Parameters:
filename
(str
, default: None
) \u2013 The path to the file from which to load the word cloud. If None, uses a default filename.
bbot/core/helpers/wordcloud.py
def load(self, filename=None):\n \"\"\"\n Loads a word cloud from a file. The file can be either a standard wordlist with one entry per line\n or a .tsv (tab-separated) file where the first row is the count and the second row is the associated entry.\n\n Args:\n filename (str, optional): The path to the file from which to load the word cloud. If None, uses a default filename.\n \"\"\"\n if filename is None:\n wordcloud_path = self.default_filename\n else:\n wordcloud_path = Path(filename).resolve()\n log.verbose(f\"Loading word cloud from {wordcloud_path}\")\n try:\n with open(str(wordcloud_path), newline=\"\") as f:\n c = csv.reader(f, delimiter=\"\\t\")\n for row in c:\n if len(row) == 1:\n self.add_word(row[0])\n elif len(row) == 2:\n with suppress(Exception):\n count, word = row\n count = int(count)\n self[word] = count\n if len(self) > 0:\n log.success(f\"Loaded word cloud ({len(self):,} words) from {wordcloud_path}\")\n except Exception as e:\n import traceback\n\n log_fn = log.debug\n if filename is not None:\n log_fn = log.warning\n log_fn(f\"Failed to load word cloud from {wordcloud_path}: {e}\")\n if filename is not None:\n log.trace(traceback.format_exc())\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.mutations","title":"mutations","text":"mutations(words, devops=True, cloud=True, letters=True, numbers=5, number_padding=2, substitute_numbers=True)\n
Generate various mutations for the given list of words based on different criteria.
Yields tuples of strings which can be joined on the desired delimiter, e.g. \"-\" or \"_\".
Parameters:
words
(Union[str, Iterable[str]]
) \u2013 A single word or list of words to mutate.
devops
(bool
, default: True
) \u2013 Whether to include devops-related mutations.
cloud
(bool
, default: True
) \u2013 Whether to include mutations from the word cloud.
letters
(bool
, default: True
) \u2013 Whether to include letter-based mutations.
numbers
(int
, default: 5
) \u2013 The maximum numeric mutations to include.
number_padding
(int
, default: 2
) \u2013 Padding for numeric mutations.
substitute_numbers
(bool
, default: True
) \u2013 Whether to substitute numbers in mutations.
Yields:
tuple
\u2013 A tuple containing each of the mutation segments.
bbot/core/helpers/wordcloud.py
def mutations(\n self, words, devops=True, cloud=True, letters=True, numbers=5, number_padding=2, substitute_numbers=True\n):\n \"\"\"\n Generate various mutations for the given list of words based on different criteria.\n\n Yields tuples of strings which can be joined on the desired delimiter, e.g. \"-\" or \"_\".\n\n Args:\n words (Union[str, Iterable[str]]): A single word or list of words to mutate.\n devops (bool): Whether to include devops-related mutations.\n cloud (bool): Whether to include mutations from the word cloud.\n letters (bool): Whether to include letter-based mutations.\n numbers (int): The maximum numeric mutations to include.\n number_padding (int): Padding for numeric mutations.\n substitute_numbers (bool): Whether to substitute numbers in mutations.\n\n Yields:\n tuple: A tuple containing each of the mutation segments.\n \"\"\"\n if isinstance(words, str):\n words = (words,)\n results = set()\n for word in words:\n h = hash(word)\n if not h in results:\n results.add(h)\n yield (word,)\n if numbers > 0:\n if substitute_numbers:\n for word in words:\n for number_mutation in self.get_number_mutations(word, n=numbers, padding=number_padding):\n h = hash(number_mutation)\n if not h in results:\n results.add(h)\n yield (number_mutation,)\n for word in words:\n for modifier in self.modifiers(\n devops=devops, cloud=cloud, letters=letters, numbers=numbers, number_padding=number_padding\n ):\n a = (word, modifier)\n b = (modifier, word)\n for _ in (a, b):\n h = hash(_)\n if h not in results:\n results.add(h)\n yield _\n
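Since mutations yields tuples rather than finished strings, the caller picks the delimiter. The sketch below imitates only the word/modifier pairing with a hypothetical pair_mutations generator; it takes the modifiers as a list so the order is predictable, whereas the real method draws them from a set.

```python
def pair_mutations(word, modifiers):
    '''Yield (word,), then (word, modifier) and (modifier, word), each once.'''
    seen = set()
    candidates = [(word,)]
    for modifier in modifiers:
        candidates.append((word, modifier))
        candidates.append((modifier, word))
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


# Join each tuple on the delimiter of choice
names = ['-'.join(parts) for parts in pair_mutations('api', ['dev', 'test'])]
print(names)
```

Joining the same tuples on an underscore or an empty string gives the other naming styles without regenerating the mutations.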
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.save","title":"save","text":"save(filename=None, limit=None)\n
Saves the word cloud to a file. The cloud can optionally be truncated to the top limit
entries.
Parameters:
filename
(str
, default: None
) \u2013 The path to the file where the word cloud will be saved. If None, uses a default filename.
limit
(int
, default: None
) \u2013 The maximum number of entries to save to the file. If None, all entries are saved.
Returns:
tuple
\u2013 A tuple containing a boolean indicating success or failure, and the resolved filename.
Examples:
>>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n>>> self.helpers.word_cloud.save(filename=\"word_cloud.txt\", limit=2)\n(True, Path('word_cloud.txt'))\n
Source code in bbot/core/helpers/wordcloud.py
def save(self, filename=None, limit=None):\n \"\"\"\n Saves the word cloud to a file. The cloud can optionally be truncated to the top `limit` entries.\n\n Args:\n filename (str, optional): The path to the file where the word cloud will be saved. If None, uses a default filename.\n limit (int, optional): The maximum number of entries to save to the file. If None, all entries are saved.\n\n Returns:\n tuple: A tuple containing a boolean indicating success or failure, and the resolved filename.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.save(filename=\"word_cloud.txt\", limit=2)\n (True, Path('word_cloud.txt'))\n \"\"\"\n if filename is None:\n filename = self.default_filename\n else:\n filename = Path(filename).resolve()\n try:\n if not self.parent_helper.mkdir(filename.parent):\n log.error(f\"Failure creating or error writing to {filename.parent} when saving word cloud\")\n return\n if len(self) > 0:\n log.debug(f\"Saving word cloud to {filename}\")\n with open(str(filename), mode=\"w\", newline=\"\") as f:\n c = csv.writer(f, delimiter=\"\\t\")\n for word, count in self.json(limit).items():\n c.writerow([count, word])\n log.debug(f\"Saved word cloud ({len(self):,} words) to {filename}\")\n return True, filename\n else:\n log.debug(f\"No words to save\")\n except Exception as e:\n import traceback\n\n log.warning(f\"Failed to save word cloud to {filename}: {e}\")\n log.trace(traceback.format_exc())\n return False, filename\n
"},{"location":"dev/helpers/wordcloud/#bbot.core.helpers.wordcloud.WordCloud.truncate","title":"truncate","text":"truncate(limit)\n
Truncates the word cloud dictionary to retain only the top limit
entries based on their occurrence frequencies.
Parameters:
limit
(int
) \u2013 The maximum number of entries to retain in the word cloud.
Examples:
>>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n>>> self.helpers.word_cloud.truncate(2)\n>>> self.helpers.word_cloud\n{'cherry': 8, 'apple': 5}\n
Source code in bbot/core/helpers/wordcloud.py
def truncate(self, limit):\n \"\"\"\n Truncates the word cloud dictionary to retain only the top `limit` entries based on their occurrence frequencies.\n\n Args:\n limit (int): The maximum number of entries to retain in the word cloud.\n\n Examples:\n >>> self.helpers.word_cloud.update({\"apple\": 5, \"banana\": 2, \"cherry\": 8})\n >>> self.helpers.word_cloud.truncate(2)\n >>> self.helpers.word_cloud\n {'cherry': 8, 'apple': 5}\n \"\"\"\n new_self = dict(self.json(limit=limit))\n self.clear()\n self.update(new_self)\n
"},{"location":"modules/custom_yara_rules/","title":"Custom Yara Rules","text":""},{"location":"modules/custom_yara_rules/#overview","title":"Overview","text":"Through the excavate
internal module, BBOT supports searching through HTTP response data using custom YARA rules.
This feature can be utilized with the command line option --custom-yara-rules
or -cy
, followed by a file containing the YARA rules.
Example:
bbot -m httpx --custom-yara-rules=test.yara -t http://example.com/\n
Where test.yara
is a file on the filesystem. The file can contain multiple YARA rules, separated by lines.
YARA rules can be quite simple, the simplest example being a single string search:
rule find_string {\n strings:\n $str1 = \"AAAABBBB\"\n\n condition:\n $str1\n}\n
To look for multiple strings, and match if any of them were to hit:
rule find_string {\n strings:\n $str1 = \"AAAABBBB\"\n $str2 = \"CCCCDDDD\"\n\n condition:\n any of them\n}\n
One of the most important capabilities is the use of regexes within the rule, as shown in the following example.
rule find_AAAABBBB_regex {\n strings:\n $regex = /A{1,4}B{1,4}/\n\n condition:\n $regex\n}\n
Note: YARA uses its own regex engine, which is not a 1:1 match with Python's. This means many existing regexes will have to be modified before they work with YARA. The good news is that YARA's regex engine is fast, far faster than Python's.
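Before putting a pattern into a rule, it can help to check what it matches against sample data. The sketch below uses Python's re module, so it exercises Python's engine and not YARA's; for a pattern as simple as the one above, the two are assumed to agree, but more complex patterns should be tested with YARA itself.

```python
import re

# Same pattern as the find_AAAABBBB_regex rule above
pattern = re.compile('A{1,4}B{1,4}')

samples = ['xxAAAABBBBxx', 'AB', 'AAAA', 'no match here']
for sample in samples:
    match = pattern.search(sample)
    print(sample, '->', match.group() if match else None)
```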
Further discussion of the art of writing complex YARA rules goes far beyond the scope of this documentation. A good place to start learning more is the official YARA documentation.
The YARA engine provides plenty of room to make highly complex signatures possible, with various conditional operators available. Multiple signatures can be linked together to create sophisticated detection rules that can identify a wide range of specific content. This flexibility allows the crafting of efficient rules for detecting security vulnerabilities, leveraging logical operators, regular expressions, and other powerful features. Additionally, YARA's modular structure supports easy updates and maintenance of signature sets.
"},{"location":"modules/custom_yara_rules/#custom-options","title":"Custom options","text":"BBOT supports the use of a few custom meta
attributes within YARA rules, which will alter the behavior of the rule and the post-processing of the results.
The description of the rule. If defined, it is included in the description of any events the rule produces.
Example with no description provided:
[FINDING] {\"description\": \"Custom Yara Rule [find_string] Matched via identifier [str1]\", \"host\": \"example.com\", \"url\": \"http://example.com\"} excavate\n
Example with the description added:
[FINDING] {\"description\": \"Custom Yara Rule [AAAABBBB] with description: [contains our test string] Matched via identifier [str1]\", \"host\": \"example.com\", \"url\": \"http://example.com\"} excavate\n
That FINDING was produced with the following signature:
rule AAAABBBB {\n\n meta:\n description = \"contains our test string\"\n strings:\n $str1 = \"AAAABBBB\"\n condition:\n $str1\n}\n
"},{"location":"modules/custom_yara_rules/#tags","title":"tags","text":"Tags specified with this option will be passed-on to any resulting emitted events. Tags are provided as a comma separated string, as shown below:
Let's expand on the previous example:
rule AAAABBBB {\n\n meta:\n description = \"contains our test string\"\n tags = \"tag1,tag2,tag3\"\n strings:\n $str1 = \"AAAABBBB\"\n condition:\n $str1\n}\n
Now, the BBOT FINDING includes these custom tags, as in the following output:
[FINDING] {\"description\": \"Custom Yara Rule [AAAABBBB] with description: [contains our test string] Matched via identifier [str1]\", \"host\": \"example.com\", \"url\": \"http://example.com/\"} excavate (tag1, tag2, tag3)\n
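How a comma-separated tags value turns into individual tags can be sketched as follows. This is an illustration only: parse_tags is a hypothetical helper, and the whitespace stripping and dropping of empty entries are assumptions, since BBOT's own parsing is not shown here.

```python
def parse_tags(raw):
    '''Split a comma-separated meta value into a clean list of tags.'''
    return [tag.strip() for tag in raw.split(',') if tag.strip()]


print(parse_tags('tag1,tag2,tag3'))
print(parse_tags(' tag1, tag2 ,,'))
```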
"},{"location":"modules/custom_yara_rules/#emit_match","title":"emit_match","text":"When set to True, the contents returned from a successful extraction via a YARA regex will be included in the FINDING event which is emitted.
Consider the following example YARA rule:
rule SubstackLink\n{\n meta:\n description = \"contains a Substack link\"\n emit_match = true\n strings:\n $substack_link = /https?:\\/\\/[a-zA-Z0-9.-]+\\.substack\\.com/\n condition:\n $substack_link\n}\n
When run against the Black Lantern Security homepage with the following BBOT command:
bbot -m httpx --custom-yara-rules=substack.yara -t http://www.blacklanternsecurity.com/\n
We get the following result. Note that the finding now contains the actual link that was identified with the regex.
[FINDING] {\"description\": \"Custom Yara Rule [SubstackLink] with description: [contains a Substack link] Matched via identifier [substack_link] and extracted [https://blacklanternsecurity.substack.com]\", \"host\": \"www.blacklanternsecurity.com\", \"url\": \"https://www.blacklanternsecurity.com/\"} excavate\n
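To see why the extracted value stops at the hostname, here is a minimal Python sketch of the same pattern. It uses Python's re module rather than the YARA engine, so it assumes both treat this simple pattern the same way. The helper name extract_matches and the sample HTML are hypothetical, and [.] stands in for the escaped dot of the original rule.

```python
import re

# Same pattern as the SubstackLink rule; [.] replaces the escaped dot.
SUBSTACK_RE = re.compile(r'https?://[a-zA-Z0-9.-]+[.]substack[.]com')


def extract_matches(body):
    '''Return every substring of body that the pattern matches.'''
    return SUBSTACK_RE.findall(body)


# Hypothetical page fragment, for illustration only.
html = '<a href=https://blacklanternsecurity.substack.com/p/post>blog</a>'
print(extract_matches(html))
```

The match ends at substack.com because the pattern has nothing that consumes the path, which is consistent with the extracted value in the FINDING above.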
"},{"location":"modules/internal_modules/","title":"List of Modules","text":""},{"location":"modules/internal_modules/#what-are-internal-modules","title":"What are internal modules?","text":"Internal modules are just like regular modules, except that they run all the time. They do not have to be explicitly enabled. They can, however, be explicitly disabled if needed.
Turning them off is simple. Each one has a root-level config option that can be set to False to disable it:
# Infer certain events from others, e.g. IPs from IP ranges, DNS_NAMEs from URLs, etc.\nspeculate: True\n# Passively search event data for URLs, hostnames, emails, etc.\nexcavate: True\n# Summarize activity at the end of a scan\naggregate: True\n# DNS resolution\ndnsresolve: True\n# Cloud provider tagging\ncloudcheck: True\n
These modules provide core functionality that is essential for a typical BBOT scan. Let's take a quick look at each one's functionality:
"},{"location":"modules/internal_modules/#aggregate","title":"aggregate","text":"Summarizes statistics at the end of a scan. Disable if you don't want to see this table.
"},{"location":"modules/internal_modules/#cloud","title":"cloud","text":"The cloud module looks at events and tries to determine whether they are associated with a cloud provider, tagging them as such. It can also identify certain cloud resources.
"},{"location":"modules/internal_modules/#dns","title":"dns","text":"The DNS internal module controls the basic DNS resolution that BBOT performs, along with all of the supporting machinery such as wildcard detection.
"},{"location":"modules/internal_modules/#excavate","title":"excavate","text":"The excavate internal module is designed to passively extract valuable information from HTTP response data. It primarily uses YARA regexes to extract information, with various events being produced from the post-processing of the YARA results.
Here is a summary of the data it produces:
"},{"location":"modules/internal_modules/#urls","title":"URLs","text":"By extracting URLs from all visited pages, excavate is already half of a web spider. The other half is recursion, which is baked into BBOT from the ground up. Therefore, protections are in place by default in the form of the web.spider_distance and web.spider_depth settings. These settings restrict which URLs are recursively harvested from HTTP responses, preventing endless runaway scans. However, in the right situation, the controlled use of a web spider is extremely powerful.
Parameter Extraction The parameter extraction functionality identifies and extracts key web parameters from HTTP responses, and produces WEB_PARAMETER
events. This includes parameters found in GET and POST requests, HTML forms, and jQuery requests. Currently, these are only used by the hunt
module, and by the paramminer
modules, to a limited degree. However, future functionality will make extensive use of these events.
Detects email addresses within HTTP_RESPONSE data.
"},{"location":"modules/internal_modules/#error-detection","title":"Error Detection","text":"Scans for verbose error messages in HTTP responses and raw text data. By identifying specific error signatures from various programming languages and frameworks, this feature helps uncover misconfigurations, debugging information, and potential vulnerabilities. This insight is invaluable for identifying weak points or anomalies in web applications.
"},{"location":"modules/internal_modules/#content-security-policy-csp-extraction","title":"Content Security Policy (CSP) Extraction","text":"The CSP extraction capability focuses on extracting domains from Content-Security-Policy headers. By analyzing these headers, BBOT can identify additional domains which can get fed back into the scan.
"},{"location":"modules/internal_modules/#serialization-detection","title":"Serialization Detection","text":"Serialized objects are a common source of serious security vulnerabilities. Excavate aims to detect those used in Java, .NET, and PHP applications.
"},{"location":"modules/internal_modules/#functionality-detection","title":"Functionality Detection","text":"Looks for specific web functionalities such as file upload fields and WSDL URLs. By identifying these elements, BBOT can pinpoint areas of the application that may require further scrutiny for security vulnerabilities.
"},{"location":"modules/internal_modules/#non-http-scheme-detection","title":"Non-HTTP Scheme Detection","text":"The non-HTTP scheme detection capability extracts URLs with non-HTTP schemes, such as ftp, mailto, and javascript. By identifying these URLs, BBOT can uncover additional vectors for attack or information leakage.
"},{"location":"modules/internal_modules/#custom-yara-rules","title":"Custom Yara Rules","text":"Excavate supports the use of custom YARA rules, which will be added to the other rules before the scan starts. For more information, see the custom YARA rules documentation.
"},{"location":"modules/internal_modules/#speculate","title":"speculate","text":"Speculate is all about inferring one data type from another, particularly when certain tools like port scanners are not enabled. This is essential functionality for most BBOT scans, allowing for the discovery of web resources when starting with a DNS-only target list without a port scanner. It bridges gaps in the data, providing a more comprehensive view of the target by leveraging existing information.
For a list of module config options, see Module Options.
"},{"location":"modules/nuclei/","title":"Nuclei","text":""},{"location":"modules/nuclei/#overview","title":"Overview","text":"BBOT integrates with Nuclei, an open-source web vulnerability scanner by Project Discovery. This is one of the ways BBOT makes it possible to go from a single target domain/IP all the way to confirmed vulnerabilities, in one scan.
You can specify individual nuclei templates by setting the modules.nuclei.templates
option to their comma-separated filenames:
bbot -m nuclei -c modules.nuclei.templates=http/takeovers/airee-takeover.yaml,http/takeovers/cargo-takeover.yaml\n
...or via the config:
modules:\n nuclei:\n templates: http/takeovers/airee-takeover.yaml,http/takeovers/cargo-takeover.yaml\n
"},{"location":"modules/nuclei/#configuration-and-options","title":"Configuration and Options","text":"The Nuclei module has many configuration options:
Config Option Type Description Default
modules.nuclei.batch_size int Number of targets to send to Nuclei per batch (default 200) 200
modules.nuclei.budget int Used in budget mode to set the number of requests which will be allotted to the nuclei scan 1
modules.nuclei.concurrency int maximum number of templates to be executed in parallel (default 25) 25
modules.nuclei.directory_only bool Filter out 'file' URL event (default True) True
modules.nuclei.etags str tags to exclude from the scan
modules.nuclei.mode str manual | technology | severe | budget. Technology: Only activate based on technology events that match nuclei tags (nuclei -as mode). Manual (DEFAULT): Fully manual settings. Severe: Only critical and high severity templates without intrusive. Budget: Limit Nuclei to a specified number of HTTP requests manual
modules.nuclei.ratelimit int maximum number of requests to send per second (default 150) 150
modules.nuclei.retries int number of times to retry a failed request (default 0) 0
modules.nuclei.severity str Filter based on severity field available in the template.
modules.nuclei.silent bool Don't display nuclei's banner or status messages False
modules.nuclei.tags str execute a subset of templates that contain the provided tags
modules.nuclei.templates str template or template directory paths to include in the scan
modules.nuclei.version str nuclei version 3.2.0
Most of these you probably will NOT want to change. In particular, we advise against changing the version of Nuclei, as it's possible the latest version won't work right with BBOT.
We also do not recommend disabling directory_only mode, since doing so causes Nuclei to process every URL. Because BBOT is recursive, this can get out of hand very quickly, depending on which other modules are in use.
"},{"location":"modules/nuclei/#modes","title":"Modes","text":"The Nuclei module's modes are generally in place to help you limit the number of templates you scan with, which makes your scans quicker.
"},{"location":"modules/nuclei/#manual","title":"Manual","text":"This is the default setting, and will use all templates. However, if you're looking to do something particular, you might pair this with some of the options described in the Nuclei pass-through options section.
"},{"location":"modules/nuclei/#severe","title":"Severe","text":"severe mode uses only high/critical severity templates. It also excludes the intrusive tag. This is intended to be a shortcut for times when you need to rapidly identify high severity vulnerabilities but can't afford the full scan. Because most templates are INFO, LOW, or MEDIUM, your scan will finish much faster.
"},{"location":"modules/nuclei/#technology","title":"Technology","text":"This is equivalent to the Nuclei '-as' scan option. It only uses templates that match detected technologies, using wappalyzer-based signatures. This can be a nice way to run a light-weight scan that still has a chance to find some good vulnerabilities.
"},{"location":"modules/nuclei/#budget","title":"Budget","text":"Budget mode is unique to BBOT.
For larger scans with thousands of targets, doing a FULL Nuclei scan (thousands of requests) for each is not realistic. As an alternative to the other modes, you can take advantage of Nuclei's \"collapsible\" template feature.
For only the cost of one (or more) \"extra\" request(s) per host, it can activate several hundred templates. These are templates which happen to look at a BaseUrl, and typically look for a specific string or other attribute. Nuclei is smart about reusing the request data when it can, and we can use this to our advantage.
The budget parameter is the number of extra requests per host you are willing to send to \"feed\" Nuclei templates (defaults to 1). It is meant for those times when vulnerability scanning isn't the main focus, but you still want to look for easy wins.
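The budget idea can be pictured with a small toy model. This is only a sketch and not how BBOT or Nuclei actually select templates: the template names, the request paths, and the helper templates_for_budget are all hypothetical. It assumes each template needs exactly one request, and that templates sharing a request can reuse one response.

```python
from collections import Counter

# Hypothetical mapping of template name -> the single request path it needs.
TEMPLATE_REQUESTS = {
    'tech-detect-a': '/',
    'tech-detect-b': '/',
    'exposed-panel': '/',
    'git-config': '/.git/config',
    'env-file': '/.env',
    'robots-check': '/robots.txt',
}


def templates_for_budget(budget):
    '''Count the templates fed by the most widely shared request paths.'''
    path_counts = Counter(TEMPLATE_REQUESTS.values())
    return sum(count for _, count in path_counts.most_common(budget))


# Coverage for budgets 1 through 4 in this toy data set.
coverage = [templates_for_budget(b) for b in range(1, 5)]
print(coverage)
```

In this toy data the first request feeds three templates, while each additional request feeds only one more.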
Of course, there is a rapidly diminishing return when you set the value to more than a handful. Eventually, this becomes 1 template per 1 budget value increase. However, in the 1-10 range there is a lot of value. This graphic should give you a rough visual idea of this concept.
"},{"location":"modules/nuclei/#nuclei-pass-through-options","title":"Nuclei pass-through options","text":"Most of the rest of the options are usually passed straight through to Nuclei when it's executed. You can do things like set specific tags to include (or exclude with etags), exactly as you would with Nuclei directly. You can also limit the templates with severity.
The ratelimit and concurrency settings use the same defaults as Nuclei. These are relatively sane settings, but if you are in a sensitive environment it can certainly help to turn them down.
templates will allow you to set your own templates directory. This can be very useful if you have your own custom templates that you want to use with BBOT.
"},{"location":"modules/nuclei/#example-commands","title":"Example Commands","text":"# Scan a SINGLE target with a basic port scan and web modules\nbbot -f web-basic -m portscan nuclei --allow-deadly -t app.evilcorp.com\n
# Scanning MULTIPLE targets\nbbot -f web-basic -m portscan nuclei --allow-deadly -t app1.evilcorp.com app2.evilcorp.com app3.evilcorp.com\n
# Scanning MULTIPLE targets while performing subdomain enumeration\nbbot -f subdomain-enum web-basic -m portscan nuclei --allow-deadly -t app1.evilcorp.com app2.evilcorp.com app3.evilcorp.com\n
# Scanning MULTIPLE targets on a BUDGET\nbbot -f subdomain-enum web-basic -m portscan nuclei --allow-deadly -c modules.nuclei.mode=budget -t app1.evilcorp.com app2.evilcorp.com app3.evilcorp.com\n
"},{"location":"scanning/","title":"Scanning Overview","text":""},{"location":"scanning/#scan-names","title":"Scan Names","text":"Every BBOT scan gets a random, mildly-entertaining name like demonic_jimmy
. Output for that scan, including scan stats and any web screenshots, is saved to a folder by that name in ~/.bbot/scans
. The most recent 20 scans are kept, and older ones are removed.
If you don't want a random name, you can change it with -n
. You can also change the location of BBOT's output with -o
:
# save everything to the folder \"my_scan\" in the current directory\nbbot -t evilcorp.com -f subdomain-enum -m gowitness -n my_scan -o .\n
If you reuse a scan name, BBOT will automatically append to your previous output files.
"},{"location":"scanning/#targets-t","title":"Targets (-t
)","text":"Targets declare what's in-scope, and seed a scan with initial data. BBOT accepts an unlimited number of targets. They can be any of the following:
DNS_NAME (evilcorp.com)
IP_ADDRESS (1.2.3.4)
IP_RANGE (1.2.3.0/24)
OPEN_TCP_PORT (192.168.0.1:80)
URL (https://www.evilcorp.com)
Note that BBOT only discriminates down to the host level. This means, for example, if you specify a URL https://www.evilcorp.com as the target, the scan will be seeded with that URL, but the scope of the scan will be the entire host, www.evilcorp.com. Other ports/URLs on that same host may also be scanned.
You can specify targets directly on the command line, load them from files, or both! For example:
$ cat targets.txt\n4.3.2.1\n10.0.0.2:80\n1.2.3.0/24\nevilcorp.com\nevilcorp.co.uk\nhttps://www.evilcorp.co.uk\n\n# load targets from a file and from the command-line\n$ bbot -t targets.txt fsociety.com 5.6.7.0/24 -m nmap\n
On start, BBOT automatically converts Targets into Events.
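As a rough illustration of how a target string might map to one of the event types listed above, here is a minimal sketch using only the standard library. It is not BBOT's actual parsing logic; the function classify_target and its ordering of checks are assumptions made for this example.

```python
import ipaddress


def classify_target(target):
    '''Guess the event type for a target string (illustrative only).'''
    if '://' in target:
        return 'URL'
    try:
        ipaddress.ip_address(target)
        return 'IP_ADDRESS'
    except ValueError:
        pass
    if '/' in target:
        try:
            ipaddress.ip_network(target, strict=False)
            return 'IP_RANGE'
        except ValueError:
            pass
    # host:port, checked only after the plain IP case so IPv6 is not misread
    _host, sep, port = target.rpartition(':')
    if sep and port.isdigit():
        return 'OPEN_TCP_PORT'
    return 'DNS_NAME'


for t in ['evilcorp.com', '1.2.3.4', '1.2.3.0/24', '192.168.0.1:80', 'https://www.evilcorp.com']:
    print(t, classify_target(t))
```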
"},{"location":"scanning/#modules-m","title":"Modules (-m
)","text":"To see a full list of modules and their descriptions, use bbot -l
or see List of Modules.
Modules are the part of BBOT that does the work -- port scanning, subdomain brute-forcing, API querying, etc. Modules consume Events (IP_ADDRESS
, DNS_NAME
, etc.) from each other, process the data in a useful way, then emit the results as new events. You can enable individual modules with -m
.
# Enable modules: nmap, sslcert, and httpx\nbbot -t www.evilcorp.com -m nmap sslcert httpx\n
"},{"location":"scanning/#types-of-modules","title":"Types of Modules","text":"Modules fall into three categories:
Scan modules: nmap, sslcert, httpx, etc. Enable with -m.
Output modules: human, json, and csv are enabled by default. Enable others with -om. (See: Output)
Internal modules: always enabled unless explicitly disabled in the config (e.g. -c speculate=false).
aggregate: Summarizes results at the end of a scan
excavate: Extracts useful data such as subdomains from webpages, etc.
speculate: Intelligently infers new events, e.g. OPEN_TCP_PORT from URL or IP_ADDRESS from IP_RANGE.
For details on the inner workings of modules, see Creating a Module.
"},{"location":"scanning/#flags-f","title":"Flags (-f
)","text":"Flags are how BBOT categorizes its modules. In a way, you can think of them as groups. Flags let you enable a bunch of similar modules at the same time without having to specify them each individually. For example, -f subdomain-enum
would enable every module with the subdomain-enum
flag.
# list all subdomain-enum modules\nbbot -f subdomain-enum -l\n
"},{"location":"scanning/#filtering-modules","title":"Filtering Modules","text":"Modules can be easily enabled/disabled based on their flags:
-f Enable these flags (e.g. -f subdomain-enum)
-rf Require modules to have this flag (e.g. -rf safe)
-ef Exclude these flags (e.g. -ef slow)
-em Exclude these individual modules (e.g. -em ipneighbor)
-lf List all available flags
Every module is either safe or aggressive, and either active or passive. These can be useful for filtering. For example, if you wanted to enable all the safe modules, but exclude active ones, you could do:
# Enable safe modules but exclude active ones\nbbot -t evilcorp.com -f safe -ef active\n
This is equivalent to requiring the passive flag:
# Enable safe modules but only if they're also passive\nbbot -t evilcorp.com -f safe -rf passive\n
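The equivalence of the two commands above follows from every module being either active or passive. A small set-based sketch makes this concrete. The flag assignments for httpx and nuclei are assumptions for illustration (only the securitytrails flags are taken from this page), and select_modules is a hypothetical helper, not part of BBOT.

```python
# Hypothetical flag table; only the securitytrails entry comes from the docs.
MODULE_FLAGS = {
    'securitytrails': {'passive', 'safe', 'subdomain-enum'},
    'httpx': {'active', 'safe'},
    'nuclei': {'active', 'aggressive'},
}


def select_modules(enable=(), require=(), exclude=()):
    '''Mimic -f / -rf / -ef filtering over the flag table.'''
    selected = set()
    for name, flags in MODULE_FLAGS.items():
        if not flags & set(enable):
            continue  # -f: module must carry at least one enabled flag
        if not set(require) <= flags:
            continue  # -rf: module must carry every required flag
        if flags & set(exclude):
            continue  # -ef: module must carry no excluded flag
        selected.add(name)
    return selected


via_exclude = select_modules(enable=['safe'], exclude=['active'])
via_require = select_modules(enable=['safe'], require=['passive'])
print(via_exclude, via_require)
```

Both calls select the same modules, because excluding active leaves exactly the passive ones.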
A single module can have multiple flags. For example, the securitytrails
module is passive
, safe
, subdomain-enum
. Below is a full list of flags and their associated modules.
BBOT modules have external dependencies ranging from OS packages (openssl
) to binaries (nmap
) to Python libraries (wappalyzer
). When a module is enabled, installation of its dependencies happens at runtime with Ansible. BBOT provides several command-line flags to control how dependencies are installed.
--no-deps - Don't install module dependencies
--force-deps - Force install all module dependencies
--retry-deps - Try again to install failed module dependencies
--ignore-failed-deps - Run modules even if they have failed dependencies
--install-all-deps - Install dependencies for all modules (useful if you are provisioning a pentest system and want to install everything ahead of time)
For details on how Ansible playbooks are attached to BBOT modules, see How to Write a Module.
"},{"location":"scanning/#scope","title":"Scope","text":"For pentesters and bug bounty hunters, staying in scope is extremely important. BBOT takes this seriously, meaning that active modules (e.g. nuclei
) will only touch in-scope resources.
By default, scope is whatever you specify with -t
. This includes child subdomains. For example, if you specify -t evilcorp.com
, all its subdomains (www.evilcorp.com
, mail.evilcorp.com
, etc.) also become in-scope.
Since BBOT is recursive, it would quickly resort to scanning the entire internet without some kind of restraining mechanism. To solve this problem, every event discovered by BBOT is assigned a Scope Distance. Scope distance represents how far out from the main scope that data was discovered.
For example, if your target is evilcorp.com
, www.evilcorp.com
would have a scope distance of 0
(i.e. in-scope). If BBOT discovers that www.evilcorp.com
resolves to 1.2.3.4
, 1.2.3.4
is one hop away, which means it would have a scope distance of 1
. If 1.2.3.4
has a PTR record that points to ecorp.blob.core.windows.net
, ecorp.blob.core.windows.net
is two hops away, so its scope distance is 2
.
Scope distance continues to increase the further out you get. Most modules (e.g. nuclei
and nmap
) only consume in-scope events. Certain other passive modules such as asn
accept out to distance 1
. By default, DNS resolution happens out to a distance of 2
. Upon its discovery, any event that's determined to be in-scope (e.g. www.evilcorp.com
) immediately becomes distance 0
, and the cycle starts over.
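The walk-through above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not BBOT's implementation: scope is reduced to hostname suffix matching, and the helpers is_in_scope and scope_distance are hypothetical names.

```python
def is_in_scope(host, targets):
    '''True if host is a target or a subdomain of one.'''
    return any(host == t or host.endswith('.' + t) for t in targets)


def scope_distance(host, parent_distance, targets):
    '''In-scope hosts reset to 0; everything else is one hop past its parent.'''
    if is_in_scope(host, targets):
        return 0
    return parent_distance + 1


targets = ['evilcorp.com']
d_www = scope_distance('www.evilcorp.com', 0, targets)
d_ip = scope_distance('1.2.3.4', d_www, targets)
d_ptr = scope_distance('ecorp.blob.core.windows.net', d_ip, targets)
d_back = scope_distance('mail.evilcorp.com', d_ptr, targets)
print(d_www, d_ip, d_ptr, d_back)
```

The last call shows the reset: a host discovered two hops out that turns out to be in-scope goes back to distance 0.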
By default, BBOT only displays in-scope events (with a few exceptions such as STORAGE_BUCKET
s). If you want to see more, you must increase the config value of scope.report_distance
:
# display out-of-scope events up to one hop away from the main scope\nbbot -t evilcorp.com -f subdomain-enum -c scope.report_distance=1\n
"},{"location":"scanning/#strict-scope","title":"Strict Scope","text":"If you want to scan only that specific target hostname and none of its children, you can specify --strict-scope
.
Note that --strict-scope
only applies to targets and whitelists, but not blacklists. This means that if you put internal.evilcorp.com
in your blacklist, you can be sure none of its subdomains will be scanned, even when using --strict-scope
.
BBOT allows precise control over scope with whitelists and blacklists. Both use the same syntax as --targets, meaning they accept the same event types, and you can specify an unlimited number of them via a file, the CLI, or both.
--whitelist
enables you to override what's in scope. For example, if you want to run nuclei against evilcorp.com
, but stay only inside their corporate IP range of 1.2.3.0/24
, you can accomplish this like so:
# Seed scan with evilcorp.com, but restrict scope to 1.2.3.0/24\nbbot -t evilcorp.com --whitelist 1.2.3.0/24 -f subdomain-enum -m nmap nuclei --allow-deadly\n
--blacklist
takes ultimate precedence. Anything in the blacklist is completely excluded from the scan, even if it's in the whitelist.
# Scan evilcorp.com, but exclude internal.evilcorp.com and its children\nbbot -t evilcorp.com --blacklist internal.evilcorp.com -f subdomain-enum -m nmap nuclei --allow-deadly\n
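The precedence rules described above (blacklist first, then whitelist, with the whitelist defaulting to the targets) can be summarized in a short sketch. This is an illustrative model only; in_scope and its helper are hypothetical and handle just hostnames, IP addresses, and CIDR ranges.

```python
import ipaddress


def _matches(host, entry, strict=False):
    '''True if host falls under one scope entry (hostname or CIDR).'''
    try:
        network = ipaddress.ip_network(entry, strict=False)
    except ValueError:
        network = None
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None
    if network is not None:
        return addr is not None and addr in network
    if addr is not None:
        return False
    if host == entry:
        return True
    # Subdomains count unless strict scope is requested
    return not strict and host.endswith('.' + entry)


def in_scope(host, targets, whitelist=None, blacklist=(), strict_scope=False):
    # The blacklist always wins and is never subject to strict scope
    if any(_matches(host, b) for b in blacklist):
        return False
    allowed = whitelist if whitelist else targets
    return any(_matches(host, a, strict=strict_scope) for a in allowed)


print(in_scope('1.2.3.50', ['evilcorp.com'], whitelist=['1.2.3.0/24']))
print(in_scope('www.evilcorp.com', ['evilcorp.com'], whitelist=['1.2.3.0/24']))
```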
"},{"location":"scanning/#dns-wildcards","title":"DNS Wildcards","text":"BBOT has robust wildcard detection built-in. It can reliably detect wildcard domains, and will tag them accordingly:
[DNS_NAME] github.io TARGET (a-record, a-wildcard-domain, aaaa-wildcard-domain, wildcard-domain)\n ^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^\n
Wildcard hosts are collapsed into a single host beginning with _wildcard
:
[DNS_NAME] _wildcard.github.io TARGET (a-record, a-wildcard, a-wildcard-domain, aaaa-record, aaaa-wildcard, aaaa-wildcard-domain, wildcard, wildcard-domain)\n ^^^^^^^^^\n
If you don't want this, you can disable wildcard detection on a per-domain basis in the config:
~/.bbot/config/bbot.yml
dns:\n wildcard_ignore:\n - evilcorp.com\n - evilcorp.co.uk\n
There are certain edge cases (such as with dynamic DNS rules) where BBOT's wildcard detection fails. In these cases, you can try increasing the number of wildcard checks in the config:
~/.bbot/config/bbot.yml
# default == 10\ndns:\n wildcard_tests: 20\n
If that doesn't work you can consider blacklisting the offending domain.
"},{"location":"scanning/advanced/","title":"Advanced","text":"Below you can find some advanced uses of BBOT.
"},{"location":"scanning/advanced/#bbot-as-a-python-library","title":"BBOT as a Python library","text":""},{"location":"scanning/advanced/#synchronous","title":"Synchronous","text":"from bbot.scanner import Scanner\n\nif __name__ == \"__main__\":\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n for event in scan.start():\n print(event)\n
"},{"location":"scanning/advanced/#asynchronous","title":"Asynchronous","text":"from bbot.scanner import Scanner\n\nasync def main():\n scan = Scanner(\"evilcorp.com\", presets=[\"subdomain-enum\"])\n async for event in scan.async_start():\n print(event.json())\n\nif __name__ == \"__main__\":\n import asyncio\n asyncio.run(main())\n
"},{"location":"scanning/advanced/#command-line-help","title":"Command-Line Help","text":"usage: bbot [-h] [-t TARGET [TARGET ...]] [-w WHITELIST [WHITELIST ...]] [-b BLACKLIST [BLACKLIST ...]] [--strict-scope] [-p [PRESET ...]] [-c [CONFIG ...]] [-lp]\n [-m MODULE [MODULE ...]] [-l] [-lmo] [-em MODULE [MODULE ...]] [-f FLAG [FLAG ...]] [-lf] [-rf FLAG [FLAG ...]] [-ef FLAG [FLAG ...]] [--allow-deadly] [-n SCAN_NAME] [-v]\n [-d] [-s] [--force] [-y] [--dry-run] [--current-preset] [--current-preset-full] [-o DIR] [-om MODULE [MODULE ...]] [--json] [--brief]\n [--event-types EVENT_TYPES [EVENT_TYPES ...]] [--no-deps | --force-deps | --retry-deps | --ignore-failed-deps | --install-all-deps] [--version]\n [-H CUSTOM_HEADERS [CUSTOM_HEADERS ...]] [--custom-yara-rules CUSTOM_YARA_RULES]\n\nBighuge BLS OSINT Tool\n\noptions:\n -h, --help show this help message and exit\n\nTarget:\n -t TARGET [TARGET ...], --targets TARGET [TARGET ...]\n Targets to seed the scan\n -w WHITELIST [WHITELIST ...], --whitelist WHITELIST [WHITELIST ...]\n What's considered in-scope (by default it's the same as --targets)\n -b BLACKLIST [BLACKLIST ...], --blacklist BLACKLIST [BLACKLIST ...]\n Don't touch these things\n --strict-scope Don't consider subdomains of target/whitelist to be in-scope\n\nPresets:\n -p [PRESET ...], --preset [PRESET ...]\n Enable BBOT preset(s)\n -c [CONFIG ...], --config [CONFIG ...]\n Custom config options in key=value format: e.g. 'modules.shodan.api_key=1234'\n -lp, --list-presets List available presets.\n\nModules:\n -m MODULE [MODULE ...], --modules MODULE [MODULE ...]\n Modules to enable. 
Choices: viewdns,postman,baddns_zone,dehashed,bucket_file_enum,asn,generic_ssrf,github_codesearch,columbus,azure_realm,dotnetnuke,dockerhub,credshed,passivetotal,certspotter,builtwith,otx,ipneighbor,fingerprintx,oauth,robots,dnsbrute_mutations,httpx,paramminer_headers,digitorus,gitlab,hunt,hunterio,trufflehog,ffuf,nuclei,badsecrets,git,bucket_firebase,ffuf_shortnames,urlscan,docker_pull,ip2location,subdomaincenter,telerik,pgp,zoomeye,shodan_dns,trickest,dnscommonsrv,ntlm,myssl,internetdb,emailformat,dastardly,azure_tenant,github_workflows,crt,affiliates,wayback,ajaxpro,wafw00f,iis_shortnames,sslcert,chaos,newsletters,host_header,bucket_amazon,vhost,paramminer_cookies,virustotal,rapiddns,leakix,dnsbrute,baddns,url_manipulation,code_repository,smuggler,bevigil,paramminer_getparams,unstructured,skymem,securitytrails,sitedossier,git_clone,bucket_azure,bucket_google,bypass403,wpscan,dnsdumpster,wappalyzer,dnscaa,social,hackertarget,github_org,fullhunt,filedownload,binaryedge,gowitness,anubisdb,portscan,ipstack,secretsdb,c99,censys,bucket_digitalocean\n -l, --list-modules List available modules.\n -lmo, --list-module-options\n Show all module config options\n -em MODULE [MODULE ...], --exclude-modules MODULE [MODULE ...]\n Exclude these modules.\n -f FLAG [FLAG ...], --flags FLAG [FLAG ...]\n Enable modules by flag. Choices: subdomain-hijack,web-paramminer,subdomain-enum,code-enum,cloud-enum,iis-shortnames,web-thorough,baddns,portscan,slow,social-enum,affiliates,safe,web-screenshots,deadly,report,web-basic,email-enum,active,service-enum,aggressive,passive\n -lf, --list-flags List available flags.\n -rf FLAG [FLAG ...], --require-flags FLAG [FLAG ...]\n Only enable modules with these flags (e.g. -rf passive)\n -ef FLAG [FLAG ...], --exclude-flags FLAG [FLAG ...]\n Disable modules with these flags. (e.g. 
-ef aggressive)\n --allow-deadly Enable the use of highly aggressive modules\n\nScan:\n -n SCAN_NAME, --name SCAN_NAME\n Name of scan (default: random)\n -v, --verbose Be more verbose\n -d, --debug Enable debugging\n -s, --silent Be quiet\n --force Run scan even in the case of condition violations or failed module setups\n -y, --yes Skip scan confirmation prompt\n --dry-run Abort before executing scan\n --current-preset Show the current preset in YAML format\n --current-preset-full\n Show the current preset in its full form, including defaults\n\nOutput:\n -o DIR, --output-dir DIR\n Directory to output scan results\n -om MODULE [MODULE ...], --output-modules MODULE [MODULE ...]\n Output module(s). Choices: subdomains,emails,web_report,json,txt,websocket,slack,asset_inventory,neo4j,splunk,csv,stdout,http,python,discord,teams\n --json, -j Output scan data in JSON format\n --brief, -br Output only the data itself\n --event-types EVENT_TYPES [EVENT_TYPES ...]\n Choose which event types to display\n\nModule dependencies:\n Control how modules install their dependencies\n\n --no-deps Don't install module dependencies\n --force-deps Force install all module dependencies\n --retry-deps Try again to install failed module dependencies\n --ignore-failed-deps Run modules even if they have failed dependencies\n --install-all-deps Install dependencies for all modules\n\nMisc:\n --version show BBOT version and exit\n -H CUSTOM_HEADERS [CUSTOM_HEADERS ...], --custom-headers CUSTOM_HEADERS [CUSTOM_HEADERS ...]\n List of custom headers as key value pairs (header=value).\n --custom-yara-rules CUSTOM_YARA_RULES, -cy CUSTOM_YARA_RULES\n Add custom yara rules to excavate\n\nEXAMPLES\n\n Subdomains:\n bbot -t evilcorp.com -p subdomain-enum\n\n Subdomains (passive only):\n bbot -t evilcorp.com -p subdomain-enum -rf passive\n\n Subdomains + port scan + web screenshots:\n bbot -t evilcorp.com -p subdomain-enum -m portscan gowitness -n my_scan -o .\n\n Subdomains + basic web scan:\n bbot -t 
evilcorp.com -p subdomain-enum web-basic\n\n Web spider:\n bbot -t www.evilcorp.com -p spider -c web.spider_distance=2 web.spider_depth=2\n\n Everything everywhere all at once:\n bbot -t evilcorp.com -p kitchen-sink\n\n List modules:\n bbot -l\n\n List presets:\n bbot -lp\n\n List flags:\n bbot -lf\n
"},{"location":"scanning/configuration/","title":"Configuration Overview","text":"Normally, Presets are used to configure a scan. However, there may be cases where you want to change BBOT's global defaults so a certain option is always set, even if it's not specified in a preset.
BBOT has a YAML config at ~/.config/bbot/bbot.yml
. This is the first config that BBOT loads, so it's a good place to put default settings like http_proxy
, max_threads
, or http_user_agent
. You can also put any module settings here, including API keys.
For a list of all possible config options, see:
For examples of common config changes, see Tips and Tricks.
"},{"location":"scanning/configuration/#configuration-files","title":"Configuration Files","text":"BBOT loads its config from the following files, in this order (last one loaded == highest priority):
~/.config/bbot/bbot.yml <-- Global BBOT config
Presets (-p) <-- Presets are good for scan-specific settings
Command line (-c) <-- CLI overrides everything
bbot.yml will be automatically created for you when you first run BBOT.
You can specify config options either via the command line or the config. For example, if you want to proxy your BBOT scan through a local proxy like Burp Suite, you could either do:
# send BBOT traffic through an HTTP proxy\nbbot -t evilcorp.com -c http_proxy=http://127.0.0.1:8080\n
Or, in ~/.config/bbot/bbot.yml
:
http_proxy: http://127.0.0.1:8080\n
These two are equivalent.
Config options specified via the command-line take precedence over all others. You can give BBOT a custom config file with -c myconf.yml
, or individual arguments like this: -c modules.shodan_dns.api_key=deadbeef
. To display the full and current BBOT config, including any command-line arguments, use bbot -c
.
Note that placing the following in bbot.yml:
~/.bbot/config/bbot.yml
modules:\n shodan_dns:\n api_key: deadbeef\n
Is the same as: bbot -c modules.shodan_dns.api_key=deadbeef\n
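The equivalence between a dotted command-line option and nested YAML, together with the rule that later sources win, can be sketched as follows. This is a simplified model and not BBOT's config loader; dotted_to_nested and deep_merge are hypothetical helpers, and values are kept as plain strings.

```python
def dotted_to_nested(option):
    '''Turn key.subkey=value into the equivalent nested dict.'''
    key, _, value = option.partition('=')
    nested = value
    for part in reversed(key.split('.')):
        nested = {part: nested}
    return nested


def deep_merge(base, override):
    '''Merge override into base; the later source wins on conflicts.'''
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the global config file, loaded first (lowest priority).
global_config = {'modules': {'shodan_dns': {'api_key': 'from_file'}}, 'keep_scans': 20}
# Stand-in for a -c option, loaded last (highest priority).
cli = dotted_to_nested('modules.shodan_dns.api_key=deadbeef')
final = deep_merge(global_config, cli)
print(final)
```

Settings the command line does not mention (keep_scans here) survive the merge untouched.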
"},{"location":"scanning/configuration/#global-config-options","title":"Global Config Options","text":"Below is a full list of the config options supported, along with their defaults.
defaults.yml### BASIC OPTIONS ###\n\n# BBOT working directory\nhome: ~/.bbot\n# How many scan results to keep before cleaning up the older ones\nkeep_scans: 20\n# Interval for displaying status messages\nstatus_frequency: 15\n# Include the raw data of files (i.e. PDFs, web screenshots) as base64 in the event\nfile_blobs: false\n# Include the raw data of directories (i.e. git repos) as tar.gz base64 in the event\nfolder_blobs: false\n\n### SCOPE ###\n\nscope:\n # Filter by scope distance which events are displayed in the output\n # 0 == show only in-scope events (affiliates are always shown)\n # 1 == show all events up to distance-1 (1 hop from target)\n report_distance: 0\n # How far out from the main scope to search\n # Do not change this setting unless you know what you're doing\n search_distance: 0\n\n### DNS ###\n\ndns:\n # Completely disable DNS resolution (careful if you have IP whitelists/blacklists, consider using minimal=true instead)\n disable: false\n # Speed up scan by not creating any new DNS events, and only resolving A and AAAA records\n minimal: false\n # How many instances of the dns module to run concurrently\n threads: 20\n # How many concurrent DNS resolvers to use when brute-forcing\n # (under the hood this is passed through directly to massdns -s)\n brute_threads: 1000\n # How far away from the main target to explore via DNS resolution (independent of scope.search_distance)\n # This is safe to change\n search_distance: 1\n # Limit how many DNS records can be followed in a row (stop malicious/runaway DNS records)\n runaway_limit: 5\n # DNS query timeout\n timeout: 5\n # How many times to retry DNS queries\n retries: 1\n # Completely disable BBOT's DNS wildcard detection\n wildcard_disable: False\n # Disable BBOT's DNS wildcard detection for select domains\n wildcard_ignore: []\n # How many sanity checks to make when verifying wildcard DNS\n # Increase this value if BBOT's wildcard detection isn't working\n wildcard_tests: 10\n # Skip DNS 
requests for a certain domain and rdtype after encountering this many timeouts or SERVFAILs\n # This helps prevent faulty DNS servers from hanging up the scan\n abort_threshold: 50\n # Don't show PTR records containing IP addresses\n filter_ptrs: true\n # Enable/disable debug messages for DNS queries\n debug: false\n # For performance reasons, always skip these DNS queries\n # Microsoft's DNS infrastructure is misconfigured so that certain queries to mail.protection.outlook.com always time out\n omit_queries:\n - SRV:mail.protection.outlook.com\n - CNAME:mail.protection.outlook.com\n - TXT:mail.protection.outlook.com\n\n### WEB ###\n\nweb:\n # HTTP proxy\n http_proxy: \n # Web user-agent\n user_agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.2151.97\n # Set the maximum number of HTTP links that can be followed in a row (0 == no spidering allowed)\n spider_distance: 0\n # Set the maximum directory depth for the web spider\n spider_depth: 1\n # Set the maximum number of links that can be followed per page\n spider_links_per_page: 25\n # HTTP timeout (for Python requests; API calls, etc.)\n http_timeout: 10\n # HTTP timeout (for httpx)\n httpx_timeout: 5\n # Custom HTTP headers (e.g. cookies, etc.)\n # in the format { \"Header-Key\": \"header_value\" }\n # These are attached to all in-scope HTTP requests\n # Note that some modules (e.g. 
github) may end up sending these to out-of-scope resources\n http_headers: {}\n # HTTP retries (for Python requests; API calls, etc.)\n http_retries: 1\n # HTTP retries (for httpx)\n httpx_retries: 1\n # Enable/disable debug messages for web requests/responses\n debug: false\n # Maximum number of HTTP redirects to follow\n http_max_redirects: 5\n # Whether to verify SSL certificates\n ssl_verify: false\n\n# Tool dependencies\ndeps:\n ffuf:\n version: \"2.1.0\"\n\n### ADVANCED OPTIONS ###\n\n# Load BBOT modules from these custom paths\nmodule_paths: []\n\n# Infer certain events from others, e.g. IPs from IP ranges, DNS_NAMEs from URLs, etc.\nspeculate: True\n# Passively search event data for URLs, hostnames, emails, etc.\nexcavate: True\n# Summarize activity at the end of a scan\naggregate: True\n# DNS resolution, wildcard detection, etc.\ndnsresolve: True\n# Cloud provider tagging\ncloudcheck: True\n\n# How to handle installation of module dependencies\n# Choices are:\n# - abort_on_failure (default) - if a module dependency fails to install, abort the scan\n# - retry_failed - try again to install failed dependencies\n# - ignore_failed - run the scan regardless of what happens with dependency installation\n# - disable - completely disable BBOT's dependency system (you are responsible for installing tools, pip packages, etc.)\ndeps_behavior: abort_on_failure\n\n# Strip querystring from URLs by default\nurl_querystring_remove: True\n# When query string is retained, by default collapse parameter values down to a single value per parameter\nurl_querystring_collapse: True\n\n# Completely ignore URLs with these extensions\nurl_extension_blacklist:\n # images\n - png\n - jpg\n - bmp\n - ico\n - jpeg\n - gif\n - svg\n - webp\n # web/fonts\n - css\n - woff\n - woff2\n - ttf\n - eot\n - sass\n - scss\n # audio\n - mp3\n - m4a\n - wav\n - flac\n # video\n - mp4\n - mkv\n - avi\n - wmv\n - mov\n - flv\n - webm\n# Distribute URLs with these extensions only to httpx (these are 
omitted from output)\nurl_extension_httpx_only:\n - js\n# Don't output these types of events (they are still distributed to modules)\nomit_event_types:\n - HTTP_RESPONSE\n - RAW_TEXT\n - URL_UNVERIFIED\n - DNS_NAME_UNRESOLVED\n - FILESYSTEM\n - WEB_PARAMETER\n - RAW_DNS_RECORD\n # - IP_ADDRESS\n\n# Custom interactsh server settings\ninteractsh_server: null\ninteractsh_token: null\ninteractsh_disable: false\n
"},{"location":"scanning/configuration/#module-config-options","title":"Module Config Options","text":"Many modules accept their own configuration options. These options have the ability to change their behavior. For example, the portscan
module accepts options for ports
, rate
, etc. Below is a list of all possible module config options.
['https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=vindex&search=<query>', 'http://the.earth.li:11371/pks/lookup?fingerprint=on&op=vindex&search=<query>', 'https://pgpkeys.eu/pks/lookup?search=<query>&op=index', 'https://pgp.mit.edu/pks/lookup?search=<query>&op=index']
modules.securitytrails.api_key str SecurityTrails API key modules.shodan_dns.api_key str Shodan API key modules.trickest.api_key str Trickest API key modules.trufflehog.concurrency int Number of concurrent workers 8 modules.trufflehog.only_verified bool Only report credentials that have been verified True modules.trufflehog.version str trufflehog version 3.75.1 modules.unstructured.extensions list File extensions to parse ['bak', 'bash', 'bashrc', 'conf', 'cfg', 'crt', 'csv', 'db', 'sqlite', 'doc', 'docx', 'ica', 'indd', 'ini', 'key', 'pub', 'log', 'markdown', 'md', 'odg', 'odp', 'ods', 'odt', 'pdf', 'pem', 'pps', 'ppsx', 'ppt', 'pptx', 'ps1', 'rdp', 'sh', 'sql', 'swp', 'sxw', 'txt', 'vbs', 'wpd', 'xls', 'xlsx', 'xml', 'yml', 'yaml'] modules.unstructured.ignore_folders list Subfolders to ignore when crawling downloaded folders ['.git'] modules.urlscan.urls bool Emit URLs in addition to DNS_NAMEs False modules.virustotal.api_key str VirusTotal API Key modules.wayback.garbage_threshold int Dedupe similar urls if they are in a group of this size or higher (lower values == less garbage data) 10 modules.wayback.urls bool emit URLs in addition to DNS_NAMEs False modules.zoomeye.api_key str ZoomEye API key modules.zoomeye.include_related bool Include domains which may be related to the target False modules.zoomeye.max_pages int How many pages of results to fetch 20 modules.asset_inventory.output_file str Set a custom output file modules.asset_inventory.recheck bool When use_previous=True, don't retain past details like open ports or findings. Instead, allow them to be rediscovered by the new scan False modules.asset_inventory.summary_netmask int Subnet mask to use when summarizing IP addresses at end of scan 16 modules.asset_inventory.use_previous bool Emit previous asset inventory as new events (use in conjunction with -n <old_scan_name>)
False modules.csv.output_file str Output to CSV file modules.discord.event_types list Types of events to send ['VULNERABILITY', 'FINDING'] modules.discord.min_severity str Only allow VULNERABILITY events of this severity or higher LOW modules.discord.webhook_url str Discord webhook URL modules.emails.output_file str Output to file modules.http.bearer str Authorization Bearer token modules.http.method str HTTP method POST modules.http.password str Password (basic auth) modules.http.siem_friendly bool Format JSON in a SIEM-friendly way for ingestion into Elastic, Splunk, etc. False modules.http.timeout int HTTP timeout 10 modules.http.url str Web URL modules.http.username str Username (basic auth) modules.json.output_file str Output to file modules.json.siem_friendly bool Output JSON in a SIEM-friendly format for ingestion into Elastic, Splunk, etc. False modules.neo4j.password str Neo4j password bbotislife modules.neo4j.uri str Neo4j server + port bolt://localhost:7687 modules.neo4j.username str Neo4j username neo4j modules.slack.event_types list Types of events to send ['VULNERABILITY', 'FINDING'] modules.slack.min_severity str Only allow VULNERABILITY events of this severity or higher LOW modules.slack.webhook_url str Slack webhook URL modules.splunk.hectoken str HEC Token modules.splunk.index str Index to send data to modules.splunk.source str Source path to be added to the metadata modules.splunk.timeout int HTTP timeout 10 modules.splunk.url str Web URL modules.stdout.accept_dupes bool Whether to show duplicate events, default True True modules.stdout.event_fields list Which event fields to display [] modules.stdout.event_types list Which events to display, default all event types [] modules.stdout.format str Which text format to display, choices: text,json text modules.stdout.in_scope_only bool Whether to only show in-scope events False modules.subdomains.include_unresolved bool Include unresolved subdomains in output False modules.subdomains.output_file str 
Output to file modules.teams.event_types list Types of events to send ['VULNERABILITY', 'FINDING'] modules.teams.min_severity str Only allow VULNERABILITY events of this severity or higher LOW modules.teams.webhook_url str Teams webhook URL modules.txt.output_file str Output to file modules.web_report.css_theme_file str CSS theme URL for HTML output https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.1.0/github-markdown.min.css modules.web_report.output_file str Output to file modules.websocket.preserve_graph bool Preserve full chains of events in the graph (prevents orphans) True modules.websocket.token str Authorization Bearer token modules.websocket.url str Web URL modules.excavate.custom_yara_rules str Include custom Yara rules modules.excavate.retain_querystring bool Keep the querystring intact on emitted WEB_PARAMETERS False modules.excavate.yara_max_match_data int Sets the maximum amount of text that can be extracted from a YARA regex 2000 modules.speculate.max_hosts int Max number of IP_RANGE hosts to convert into IP_ADDRESS events 65536 modules.speculate.ports str The set of ports to speculate on 80,443"},{"location":"scanning/events/","title":"Events","text":"An Event is a piece of data discovered by BBOT. Examples include IP_ADDRESS
, DNS_NAME
, EMAIL_ADDRESS
, URL
, etc. When you run a BBOT scan, events are constantly being exchanged between modules. They are also output to the console:
[DNS_NAME] www.evilcorp.com sslcert (distance-0, in-scope, resolved, subdomain, a-record)\n ^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^^^^^^ ^^^^^^^^^^\nevent type event data source module tags\n
In addition to the obvious data (e.g. www.evilcorp.com
), an event also contains other useful information such as:
- a .discovery_path showing exactly how the event was discovered, starting from the first scan target
- a .timestamp of when the data was discovered
- the .module that discovered it
- the .parent event that led to its discovery
- its .scope_distance (how many hops it is from the main scope, 0 == in-scope)
- a list of .tags that describe the data (mx-record, http-title, etc.)
These attributes allow us to construct a visual graph of events (e.g. in Neo4j) and query/filter/grep them more easily. Here is what a typical event looks like in JSON format:
{\n \"type\": \"URL\",\n \"id\": \"URL:c9962277277393f8895d2a4fa9b7f70b15f3af3e\",\n \"scope_description\": \"in-scope\",\n \"data\": \"https://blog.blacklanternsecurity.com/\",\n \"host\": \"blog.blacklanternsecurity.com\",\n \"resolved_hosts\": [\n \"104.18.40.87\"\n ],\n \"dns_children\": {\n \"A\": [\n \"104.18.40.87\",\n \"172.64.147.169\"\n ]\n },\n \"web_spider_distance\": 0,\n \"scope_distance\": 0,\n \"scan\": \"SCAN:9224b49405e6d1607fd615243577d9ca86c7d206\",\n \"timestamp\": 1717260760.157012,\n \"parent\": \"OPEN_TCP_PORT:ebe3d6c10b41f60e3590ce6436ab62510b91c758\",\n \"tags\": [\n \"in-scope\",\n \"http-title-black-lantern-security-blsops\",\n \"dir\",\n \"ip-104-18-40-87\",\n \"cdn-cloudflare\",\n \"status-200\"\n ],\n \"module\": \"httpx\",\n \"module_sequence\": \"httpx\",\n \"discovery_context\": \"httpx visited blog.blacklanternsecurity.com:443 and got status code 200 at https://blog.blacklanternsecurity.com/\",\n \"discovery_path\": [\n \"Scan difficult_arthur seeded with DNS_NAME: blacklanternsecurity.com\",\n \"certspotter searched certspotter API for \\\"blacklanternsecurity.com\\\" and found DNS_NAME: blog.blacklanternsecurity.com\",\n \"speculated OPEN_TCP_PORT: blog.blacklanternsecurity.com:443\",\n \"httpx visited blog.blacklanternsecurity.com:443 and got status code 200 at https://blog.blacklanternsecurity.com/\"\n ]\n}\n
For a more detailed description of BBOT events, see Developer Documentation - Event.
Below is a full list of event types along with which modules produce/consume them.
"},{"location":"scanning/events/#list-of-event-types","title":"List of Event Types","text":"Event Type # Consuming Modules # Producing Modules Consuming Modules Producing Modules * 15 0 affiliates, cloudcheck, csv, discord, dnsresolve, http, json, neo4j, python, slack, splunk, stdout, teams, txt, websocket ASN 0 1 asn AZURE_TENANT 1 0 speculate CODE_REPOSITORY 3 5 docker_pull, git_clone, github_workflows code_repository, dockerhub, github_codesearch, github_org, gitlab DNS_NAME 56 41 anubisdb, asset_inventory, azure_realm, azure_tenant, baddns, baddns_zone, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, columbus, credshed, crt, dehashed, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, emailformat, fullhunt, github_codesearch, hackertarget, hunterio, internetdb, leakix, myssl, oauth, otx, passivetotal, pgp, portscan, postman, rapiddns, securitytrails, shodan_dns, sitedossier, skymem, speculate, subdomaincenter, subdomains, trickest, urlscan, viewdns, virustotal, wayback, zoomeye anubisdb, azure_tenant, bevigil, binaryedge, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, hackertarget, hunterio, internetdb, leakix, myssl, ntlm, oauth, otx, passivetotal, rapiddns, securitytrails, shodan_dns, sitedossier, speculate, sslcert, subdomaincenter, trickest, urlscan, vhost, viewdns, virustotal, wayback, zoomeye DNS_NAME_UNRESOLVED 3 0 baddns, speculate, subdomains EMAIL_ADDRESS 1 8 emails credshed, dehashed, dnscaa, emailformat, hunterio, pgp, skymem, sslcert FILESYSTEM 2 5 trufflehog, unstructured docker_pull, filedownload, git_clone, github_workflows, unstructured FINDING 2 28 asset_inventory, web_report ajaxpro, baddns, baddns_zone, badsecrets, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, bypass403, dastardly, git, 
gitlab, host_header, hunt, internetdb, newsletters, ntlm, nuclei, paramminer_cookies, paramminer_getparams, secretsdb, smuggler, speculate, telerik, trufflehog, url_manipulation, wpscan GEOLOCATION 0 2 ip2location, ipstack HASHED_PASSWORD 0 2 credshed, dehashed HTTP_RESPONSE 19 1 ajaxpro, asset_inventory, badsecrets, dastardly, dotnetnuke, excavate, filedownload, gitlab, host_header, newsletters, ntlm, paramminer_cookies, paramminer_getparams, paramminer_headers, secretsdb, speculate, telerik, wappalyzer, wpscan httpx IP_ADDRESS 8 3 asn, asset_inventory, internetdb, ip2location, ipneighbor, ipstack, portscan, speculate asset_inventory, ipneighbor, speculate IP_RANGE 2 0 portscan, speculate OPEN_TCP_PORT 4 4 asset_inventory, fingerprintx, httpx, sslcert asset_inventory, internetdb, portscan, speculate ORG_STUB 2 1 dockerhub, github_org speculate PASSWORD 0 2 credshed, dehashed PROTOCOL 0 1 fingerprintx RAW_TEXT 0 1 unstructured SOCIAL 5 3 dockerhub, github_org, gitlab, gowitness, speculate dockerhub, gitlab, social STORAGE_BUCKET 7 5 bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, speculate bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google TECHNOLOGY 4 8 asset_inventory, gitlab, web_report, wpscan badsecrets, dotnetnuke, gitlab, gowitness, internetdb, nuclei, wappalyzer, wpscan URL 19 2 ajaxpro, asset_inventory, bypass403, ffuf, generic_ssrf, git, gowitness, httpx, iis_shortnames, ntlm, nuclei, robots, smuggler, speculate, telerik, url_manipulation, vhost, wafw00f, web_report gowitness, httpx URL_HINT 1 1 ffuf_shortnames iis_shortnames URL_UNVERIFIED 6 16 code_repository, filedownload, httpx, oauth, social, speculate azure_realm, bevigil, bucket_file_enum, dnscaa, dockerhub, excavate, ffuf, ffuf_shortnames, github_codesearch, gowitness, hunterio, postman, robots, urlscan, wayback, wpscan USERNAME 1 2 speculate credshed, dehashed VHOST 1 1 web_report vhost VULNERABILITY 2 12 
asset_inventory, web_report ajaxpro, baddns, baddns_zone, badsecrets, dastardly, dotnetnuke, generic_ssrf, internetdb, nuclei, telerik, trufflehog, wpscan WAF 1 1 asset_inventory wafw00f WEBSCREENSHOT 0 1 gowitness WEB_PARAMETER 4 4 hunt, paramminer_cookies, paramminer_getparams, paramminer_headers excavate, paramminer_cookies, paramminer_getparams, paramminer_headers"},{"location":"scanning/events/#findings-vs-vulnerabilities","title":"Findings Vs. Vulnerabilities","text":"BBOT has a sharp distinction between Findings and Vulnerabilities:
VULNERABILITY
FINDING
This separation lets you quickly identify actionable vulnerabilities in the midst of a large scan.
"},{"location":"scanning/output/","title":"Output","text":"By default, BBOT saves its output in TXT, JSON, and CSV formats. The filenames are logged at the end of each scan:
Every BBOT scan gets a unique and mildly-entertaining name like demonic_jimmy
. Output for that scan, including scan stats and any web screenshots, is saved to a folder by that name in ~/.bbot/scans
. The most recent 20 scans are kept, and older ones are removed. You can change the location of BBOT's output with --output
, and you can also pick a custom scan name with --name
.
If you reuse a scan name, BBOT appends to the original output files from the previous scan of that name.
"},{"location":"scanning/output/#output-modules","title":"Output Modules","text":"Multiple simultaneous output formats are possible because of output modules. Output modules are similar to normal modules except they are enabled with -om
.
The stdout
output module is what you see when you execute BBOT in the terminal. By default it looks the same as the txt
module, but it has options you can customize. You can filter by event type, choose the data format (text
, json
), and which fields you want to see:
txt
output is tab-delimited, so it's easy to grep:
# grep out only the DNS_NAMEs (-F matches the brackets literally instead of as a character class)\ngrep -F '[DNS_NAME]' ~/.bbot/scans/extreme_johnny/output.txt | cut -f2\nevilcorp.com\nwww.evilcorp.com\nmail.evilcorp.com\n
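The same filter can be sketched in Python. This is a minimal illustration that assumes only what the example above shows: each line is tab-delimited, the first field is the bracketed event type, and the second field is the event data. The function name event_data_from_txt is hypothetical and not part of BBOT.

```python
from typing import Iterable, List


def event_data_from_txt(lines: Iterable[str], event_type: str) -> List[str]:
    # Assumes tab-delimited lines: '[EVENT_TYPE]', data, then any other columns.
    wanted = '[' + event_type + ']'
    results = []
    for line in lines:
        fields = line.rstrip().split('\t')
        # Strip the first field in case the event type is padded for alignment.
        if len(fields) >= 2 and fields[0].strip() == wanted:
            results.append(fields[1])
    return results
```

To use it on a real scan, pass an open file handle for the scan's output.txt as the lines argument.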
"},{"location":"scanning/output/#csv","title":"CSV","text":"The csv
output module produces a CSV like this:
If you manually enable the json
output module, it will go to stdout:
bbot -t evilcorp.com -om json | jq\n
You will then see events like this:
{\n \"type\": \"IP_ADDRESS\",\n \"id\": \"IP_ADDRESS:13cd09c2adf0860a582240229cd7ad1dccdb5eb1\",\n \"data\": \"1.2.3.4\",\n \"scope_distance\": 1,\n \"scan\": \"SCAN:64c0e076516ae7aa6502fd99489693d0d5ec26cc\",\n \"timestamp\": 1688518967.740472,\n \"resolved_hosts\": [\"1.2.3.4\"],\n \"parent\": \"DNS_NAME:2da045542abbf86723f22383d04eb453e573723c\",\n \"tags\": [\"distance-1\", \"ipv4\", \"internal\"],\n \"module\": \"A\",\n \"module_sequence\": \"A\"\n}\n
You can filter on the JSON output with jq
:
# pull out only the .data attribute of every DNS_NAME\n$ jq -r 'select(.type==\"DNS_NAME\") | .data' ~/.bbot/scans/extreme_johnny/output.json\nevilcorp.com\nwww.evilcorp.com\nmail.evilcorp.com\n
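For scripted post-processing, the jq filter above can be approximated in Python. This sketch assumes the output file holds one JSON event object per line, each with type and data keys as in the sample event; select_data is a hypothetical helper name, not a BBOT function.

```python
import json
from typing import Any, Iterable, Iterator


def select_data(lines: Iterable[str], event_type: str) -> Iterator[Any]:
    # Assumes one JSON event object per line, each with 'type' and 'data' keys.
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get('type') == event_type:
            yield event['data']
```

Passing an open handle for output.json and the event type DNS_NAME yields the same values as the jq command.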
"},{"location":"scanning/output/#discord-slack-teams","title":"Discord / Slack / Teams","text":"BBOT supports output via webhooks to discord
, slack
, and teams
. To use them, you must specify a webhook URL either in the config:
modules:\n discord:\n webhook_url: https://discord.com/api/webhooks/1234/deadbeef\n
...or on the command line:
bbot -t evilcorp.com -om discord -c modules.discord.webhook_url=https://discord.com/api/webhooks/1234/deadbeef\n
By default, only VULNERABILITY
and FINDING
events are sent, but this can be customized by setting event_types
in the config like so:
modules:\n discord:\n event_types:\n - VULNERABILITY\n - FINDING\n - STORAGE_BUCKET\n
...or on the command line:
bbot -t evilcorp.com -om discord -c 'modules.discord.event_types=[\"STORAGE_BUCKET\",\"FINDING\",\"VULNERABILITY\"]'\n
You can also filter on the severity of VULNERABILITY
events by setting min_severity
:
modules:\n discord:\n min_severity: HIGH\n
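A threshold like min_severity implies an ordering of severity levels. The sketch below illustrates that idea only. Just LOW and HIGH appear in these docs, so the other level names and the overall order are assumptions, and this is not BBOT's actual implementation.

```python
# Assumed ordering, lowest to highest; only LOW and HIGH are taken from the docs above.
SEVERITY_ORDER = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL']


def meets_min_severity(severity: str, min_severity: str = 'LOW') -> bool:
    # True when severity is at or above the configured threshold.
    current = SEVERITY_ORDER.index(severity.upper())
    threshold = SEVERITY_ORDER.index(min_severity.upper())
    return current >= threshold
```

With min_severity set to HIGH, a HIGH event passes this check and a LOW event does not.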
"},{"location":"scanning/output/#http","title":"HTTP","text":"The http
output module sends events in JSON format to a desired HTTP endpoint.
# POST scan results to localhost\nbbot -t evilcorp.com -om http -c modules.http.url=http://localhost:8000\n
You can customize the HTTP method if needed. Authentication is also supported:
~/.bbot/config/bbot.ymlmodules:\n http:\n url: https://localhost:8000\n method: PUT\n # Authorization: Bearer\n bearer: <bearer_token>\n # OR\n username: bob\n password: P@ssw0rd\n
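To try the http output module locally, a small receiver is enough. The sketch below uses only the Python standard library and assumes that each request body is a single JSON-encoded event. The names EventHandler, received, and make_server are hypothetical, and authentication is not handled.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events collected by the handler


class EventHandler(BaseHTTPRequestHandler):
    def _store_event(self):
        # Assumes each request body is a single JSON-encoded event.
        length = int(self.headers.get('Content-Length', 0))
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

    # Accept both the default POST and a customized PUT.
    do_POST = _store_event
    do_PUT = _store_event

    def log_message(self, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    # Bind to localhost only; port 8000 matches the example command above.
    return HTTPServer(('127.0.0.1', port), EventHandler)
```

Calling make_server().serve_forever() starts the listener, after which the bbot command above can be pointed at http://localhost:8000.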
"},{"location":"scanning/output/#splunk","title":"Splunk","text":"The splunk
output module sends events in JSON format to a Splunk instance via the HTTP Event Collector (HEC).
You can customize this output with the following config options:
~/.bbot/config/bbot.ymlmodules:\n splunk:\n # The full URL with the URI `/services/collector/event`\n url: https://localhost:8088/services/collector/event\n # Generated from splunk webui\n hectoken: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\n # Defaults to `main` if not set\n index: my-specific-index\n # Defaults to `bbot` if not set\n source: /my/source.json\n
"},{"location":"scanning/output/#asset-inventory","title":"Asset Inventory","text":"The asset_inventory
module produces a CSV like this:
The subdomains
output module produces a simple text file containing only in-scope and resolved subdomains:
evilcorp.com\nwww.evilcorp.com\nmail.evilcorp.com\nportal.evilcorp.com\n
"},{"location":"scanning/output/#neo4j","title":"Neo4j","text":"Neo4j is the funnest (and prettiest) way to view and interact with BBOT data.
# start Neo4j in the background with docker\ndocker run -d -p 7687:7687 -p 7474:7474 -v \"$(pwd)/neo4j/:/data/\" -e NEO4J_AUTH=neo4j/bbotislife neo4j\n
-om neo4j
bbot -f subdomain-enum -t evilcorp.com -om neo4j\n
neo4j
/ bbotislife
Neo4j uses the Cypher Query Language for its graph queries. Cypher uses common clauses to craft relational queries and present the desired data in multiple formats.
Cypher queries can be broken down into three pieces: selection, filter, and presentation. The selection piece identifies the data that will be searched. Most of the time the \"MATCH\" clause is enough, although Cypher can also read from CSV or JSON data files; all of the examples here use \"MATCH\". The filter piece narrows the results to the required data using the \"WHERE\" clause (most basic operators can be used). Finally, the presentation piece determines how the data is returned to the querier. Although Neo4j is a graph database, its results can also be viewed as a traditional table.
A simple query to grab every URL event with \".com\" in the BBOT data field would look like this: MATCH (u:URL) WHERE u.data contains \".com\" RETURN u
In this query:
- Within the MATCH clause, \"u\" is a variable whose name the user can choose freely, while the \"URL\" label corresponds directly to the BBOT event type.
- The WHERE clause lets the query filter on any of the BBOT event properties, such as data or tags, or even on the label itself.
- The RETURN clause presents the whole URL event, but it can be narrowed down to specific properties of the BBOT event (RETURN u.data, u.tags
).
The following are a few recommended queries to get started with:
// Get all \"in-scope\" DNS Nodes and return just data and tags properties\nMATCH (n:DNS_NAME)\nWHERE \"in-scope\" IN n.tags\nRETURN n.data, n.tags\n
// Get the count of labels/BBOT events in the Neo4j Database\nMATCH (n)\nRETURN labels(n), count(n)\n
// Get a graph of open ports associated with each domain\nMATCH z = ((n:DNS_NAME) --> (p:OPEN_TCP_PORT))\nRETURN z\n
// Get all domains and IP addresses with open TCP ports\nMATCH (n) --> (p:OPEN_TCP_PORT)\nWHERE \"in-scope\" in n.tags and (n:DNS_NAME or n:IP_ADDRESS)\nWITH *, TAIL(SPLIT(p.data, ':')) AS port\nRETURN n.data, collect(distinct port)\n
// Clear the database\nMATCH (n) DETACH DELETE n\n
This is not an exhaustive list of Cypher clauses or filters and should be considered a starting point. To build more advanced queries, consider reading Neo4j's Cypher documentation.
Note that these sample queries only return results if the corresponding data exists in the target Neo4j database.
"},{"location":"scanning/presets/","title":"Presets","text":"Once you start customizing BBOT, your commands can start to get really long. Presets let you put all your scan settings in a single file:
bbot -p ./my_preset.yml\n
A Preset is a YAML file that can include scan targets, modules, and config options like API keys.
A typical preset looks like this:
subdomain-enum.ymldescription: Enumerate subdomains via APIs, brute-force\n\nflags:\n - subdomain-enum\n\noutput_modules:\n - subdomains\n
"},{"location":"scanning/presets/#how-to-use-presets-p","title":"How to use Presets (-p
)","text":"BBOT has a ready-made collection of presets for common tasks like subdomain enumeration and web spidering. They live in ~/.bbot/presets
.
To list them, you can do:
# list available presets\nbbot -lp\n
Enable them with -p
:
# do a subdomain enumeration \nbbot -t evilcorp.com -p subdomain-enum\n\n# multiple presets - subdomain enumeration + web spider\nbbot -t evilcorp.com -p subdomain-enum spider\n\n# start with a preset but only enable modules that have the 'passive' flag\nbbot -t evilcorp.com -p subdomain-enum -rf passive\n\n# preset + manual config override\nbbot -t www.evilcorp.com -p spider -c web.spider_distance=10\n
You can build on the default presets, or create your own. Here's an example of a custom preset that builds on subdomain-enum
:
description: Do a subdomain enumeration + basic web scan + nuclei\n\ntarget:\n - evilcorp.com\n\ninclude:\n # include these default presets\n - subdomain-enum\n - web-basic\n\nmodules:\n # enable nuclei in addition to the other modules\n - nuclei\n\nconfig:\n # global config options\n web:\n http_proxy: http://127.0.0.1:8080\n # module config options\n modules:\n # api keys\n securitytrails:\n api_key: 21a270d5f59c9b05813a72bb41707266\n virustotal:\n api_key: 4f41243847da693a4f356c0486114bc6\n
To execute your custom preset, you do:
bbot -p ./my_subdomains.yml\n
"},{"location":"scanning/presets/#preset-load-order","title":"Preset Load Order","text":"When you enable multiple presets, the order matters. In the case of a conflict, the last preset will always win. This means, for example, if you have a custom preset called my_spider
that sets web.spider_distance
to 1:
config:\n web:\n spider_distance: 1\n
...and you enable it alongside the default spider
preset in this order:
bbot -t evilcorp.com -p ./my_spider.yml spider\n
...the value of web.spider_distance
will be overridden by spider
. To ensure this doesn't happen, you would want to switch the order of the presets:
bbot -t evilcorp.com -p spider ./my_spider.yml\n
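The last-preset-wins rule behaves like a recursive dictionary merge. The sketch below illustrates that idea using the spider values from these docs. merge_config is a hypothetical helper, and this is not how BBOT implements preset loading.

```python
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    # Recursively merge override into a copy of base; on conflict, override wins.
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Config values taken from the presets shown in these docs.
my_spider = {'web': {'spider_distance': 1}}
spider = {'web': {'spider_distance': 2, 'spider_depth': 4, 'spider_links_per_page': 25}}

# -p ./my_spider.yml spider : spider is applied last, so its spider_distance wins
first = merge_config(my_spider, spider)
# -p spider ./my_spider.yml : my_spider is applied last, so its spider_distance wins
second = merge_config(spider, my_spider)
```

In the second ordering, the custom spider_distance is kept while the other spider settings are still inherited.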
"},{"location":"scanning/presets/#validating-presets","title":"Validating Presets","text":"To make sure BBOT is configured the way you expect, you can always check the --current-preset
output, which shows the final version of the config that will be used when BBOT executes:
# verify the preset is what you want\nbbot -p ./mypreset.yml --current-preset\n
"},{"location":"scanning/presets/#advanced-usage","title":"Advanced Usage","text":"BBOT Presets support advanced features like environment variable substitution and custom conditions.
"},{"location":"scanning/presets/#environment-variables","title":"Environment Variables","text":"You can insert environment variables into your preset like this: ${env:<variable>}
:
description: Do a nuclei scan\n\ntarget:\n - evilcorp.com\n\nmodules:\n - nuclei\n\nconfig:\n modules:\n nuclei:\n # allow the nuclei tags to be specified at runtime via an environment variable\n tags: ${env:NUCLEI_TAGS}\n
NUCLEI_TAGS=apache,nginx bbot -p ./my_nuclei.yml\n
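Conceptually, the substitution replaces each ${env:NAME} placeholder with the value of that environment variable. The following is a rough sketch of the idea, not BBOT's implementation. substitute_env is a hypothetical name, and replacing unset variables with an empty string is an assumption.

```python
import re
from typing import Mapping

# Matches ${env:NAME}; character classes are used instead of backslash escapes.
ENV_PATTERN = re.compile('[$][{]env:([A-Za-z_][A-Za-z0-9_]*)[}]')


def substitute_env(text: str, env: Mapping[str, str]) -> str:
    # Replace each ${env:NAME} with env[NAME]; unset names become an empty string (an assumption).
    return ENV_PATTERN.sub(lambda match: env.get(match.group(1), ''), text)
```

Passing os.environ as the env argument applies the variables of the current shell, as in the NUCLEI_TAGS example above.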
"},{"location":"scanning/presets/#conditions","title":"Conditions","text":"Sometimes, you might need to add custom logic to a preset. BBOT supports this via conditions
. The conditions
attribute allows you to specify a list of custom conditions that will be evaluated before the scan starts. This is useful for performing last-minute sanity checks, or changing the behavior of the scan based on custom criteria.
description: Abort if nuclei templates aren't specified\n\nmodules:\n - nuclei\n\nconditions:\n - |\n {% if not config.modules.nuclei.templates %}\n {{ abort(\"Don't forget to set your templates!\") }}\n {% endif %}\n
my_preset.ymldescription: Enable ffuf but only when the web spider isn't also enabled\n\nmodules:\n - ffuf\n\nconditions:\n - |\n {% if config.web.spider_distance > 0 and config.web.spider_depth > 0 %}\n {{ warn(\"Disabling ffuf because the web spider is enabled\") }}\n {{ preset.exclude_module(\"ffuf\") }}\n {% endif %}\n
Conditions use Jinja, which means they can contain Python-like expressions. They run inside a sandboxed environment which has access to the following variables:
- preset: the current preset object
- config: the current config (an alias for preset.config)
- warn(message): display a custom warning message to the user
- abort(message): abort the scan with an optional message
If you aren't able to accomplish what you want with conditions, or if you need access to a new variable/function, please let us know on GitHub.
"},{"location":"scanning/presets_list/","title":"List of Presets","text":"Below is a list of every default BBOT preset, including its YAML.
"},{"location":"scanning/presets_list/#cloud-enum","title":"cloud-enum","text":"Enumerate cloud resources such as storage buckets, etc.
cloud-enum.yml
~/.bbot/presets/cloud-enum.ymldescription: Enumerate cloud resources such as storage buckets, etc.\n\ninclude:\n - subdomain-enum\n\nflags:\n - cloud-enum\n
Modules: 53
"},{"location":"scanning/presets_list/#code-enum","title":"code-enum","text":"Enumerate Git repositories, Docker images, etc.
code-enum.yml
~/.bbot/presets/code-enum.ymldescription: Enumerate Git repositories, Docker images, etc.\n\nflags:\n - code-enum\n
Modules: 10
"},{"location":"scanning/presets_list/#dirbust-heavy","title":"dirbust-heavy","text":"Recursive web directory brute-force (aggressive)
dirbust-heavy.yml
~/.bbot/presets/web/dirbust-heavy.ymldescription: Recursive web directory brute-force (aggressive)\n\ninclude:\n - spider\n\nflags:\n - iis-shortnames\n\nmodules:\n - ffuf\n - wayback\n\nconfig:\n modules:\n iis_shortnames:\n # we exploit the shortnames vulnerability to produce URL_HINTs which are consumed by ffuf_shortnames\n detect_only: False\n ffuf:\n depth: 3\n lines: 5000\n extensions:\n - php\n - asp\n - aspx\n - ashx\n - asmx\n - jsp\n - jspx\n - cfm\n - zip\n - conf\n - config\n - xml\n - json\n - yml\n - yaml\n # emit URLs from wayback\n wayback:\n urls: True\n
Category: web
Modules: 5
"},{"location":"scanning/presets_list/#dirbust-light","title":"dirbust-light","text":"Basic web directory brute-force (surface-level directories only)
dirbust-light.yml
~/.bbot/presets/web/dirbust-light.ymldescription: Basic web directory brute-force (surface-level directories only)\n\ninclude:\n - iis-shortnames\n\nmodules:\n - ffuf\n\nconfig:\n modules:\n ffuf:\n # wordlist size = 1000\n lines: 1000\n
Category: web
Modules: 4
"},{"location":"scanning/presets_list/#dotnet-audit","title":"dotnet-audit","text":"Comprehensive scan for all IIS/.NET specific modules and module settings
dotnet-audit.yml
~/.bbot/presets/web/dotnet-audit.ymldescription: Comprehensive scan for all IIS/.NET specific modules and module settings\n\n\ninclude:\n - iis-shortnames\n\nmodules:\n - httpx\n - badsecrets\n - ffuf_shortnames\n - ffuf\n - telerik\n - ajaxpro\n - dotnetnuke\n\nconfig:\n modules:\n ffuf:\n extensions: asp,aspx,ashx,asmx,ascx\n telerik:\n exploit_RAU_crypto: True\n
Category: web
Modules: 8
"},{"location":"scanning/presets_list/#email-enum","title":"email-enum","text":"Enumerate email addresses from APIs, web crawling, etc.
email-enum.yml
~/.bbot/presets/email-enum.ymldescription: Enumerate email addresses from APIs, web crawling, etc.\n\nflags:\n - email-enum\n\noutput_modules:\n - emails\n
Modules: 7
"},{"location":"scanning/presets_list/#iis-shortnames","title":"iis-shortnames","text":"Recursively enumerate IIS shortnames
iis-shortnames.yml
~/.bbot/presets/web/iis-shortnames.ymldescription: Recursively enumerate IIS shortnames\n\nflags:\n - iis-shortnames\n\nconfig:\n modules:\n iis_shortnames:\n # exploit the vulnerability\n detect_only: false\n
Category: web
Modules: 3
"},{"location":"scanning/presets_list/#kitchen-sink","title":"kitchen-sink","text":"Everything everywhere all at once
kitchen-sink.yml
~/.bbot/presets/kitchen-sink.ymldescription: Everything everywhere all at once\n\ninclude:\n - subdomain-enum\n - cloud-enum\n - code-enum\n - email-enum\n - spider\n - web-basic\n - paramminer\n - dirbust-light\n - web-screenshots\n\nconfig:\n modules:\n baddns:\n enable_references: True\n
Modules: 75
"},{"location":"scanning/presets_list/#paramminer","title":"paramminer","text":"Discover new web parameters via brute-force
paramminer.yml
~/.bbot/presets/web/paramminer.ymldescription: Discover new web parameters via brute-force\n\nflags:\n - web-paramminer\n\nmodules:\n - httpx\n\nconfig:\n web:\n spider_distance: 1\n spider_depth: 4\n
Category: web
Modules: 4
"},{"location":"scanning/presets_list/#spider","title":"spider","text":"Recursive web spider
spider.yml
~/.bbot/presets/spider.ymldescription: Recursive web spider\n\nmodules:\n - httpx\n\nconfig:\n web:\n # how many links to follow in a row\n spider_distance: 2\n # don't follow links whose directory depth is higher than 4\n spider_depth: 4\n # maximum number of links to follow per page\n spider_links_per_page: 25\n
Modules: 1
"},{"location":"scanning/presets_list/#subdomain-enum","title":"subdomain-enum","text":"Enumerate subdomains via APIs, brute-force
subdomain-enum.yml
~/.bbot/presets/subdomain-enum.ymldescription: Enumerate subdomains via APIs, brute-force\n\nflags:\n # enable every module with the subdomain-enum flag\n - subdomain-enum\n\noutput_modules:\n # output unique subdomains to TXT file\n - subdomains\n\nconfig:\n dns:\n threads: 25\n brute_threads: 1000\n # put your API keys here\n modules:\n github:\n api_key: \"\"\n chaos:\n api_key: \"\"\n securitytrails:\n api_key: \"\"\n
Modules: 46
"},{"location":"scanning/presets_list/#web-basic","title":"web-basic","text":"Quick web scan
web-basic.yml
~/.bbot/presets/web-basic.ymldescription: Quick web scan\n\ninclude:\n - iis-shortnames\n\nflags:\n - web-basic\n
Modules: 18
"},{"location":"scanning/presets_list/#web-screenshots","title":"web-screenshots","text":"Take screenshots of webpages
web-screenshots.yml
~/.bbot/presets/web-screenshots.ymldescription: Take screenshots of webpages\n\nflags:\n - web-screenshots\n\nconfig:\n modules:\n gowitness:\n resolution_x: 1440\n resolution_y: 900\n # folder to output web screenshots (default is inside ~/.bbot/scans/scan_name)\n output_path: \"\"\n # whether to take screenshots of social media pages\n social: True\n
Modules: 3
"},{"location":"scanning/presets_list/#web-thorough","title":"web-thorough","text":"Aggressive web scan
web-thorough.yml
~/.bbot/presets/web-thorough.ymldescription: Aggressive web scan\n\ninclude:\n # include the web-basic preset\n - web-basic\n\nflags:\n - web-thorough\n
Modules: 29
"},{"location":"scanning/presets_list/#table-of-default-presets","title":"Table of Default Presets","text":"Here is a the same data, but in a table:
Preset Category Description # Modules Modules cloud-enum Enumerate cloud resources such as storage buckets, etc. 53 anubisdb, asn, azure_realm, azure_tenant, baddns, baddns_zone, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, github_codesearch, github_org, hackertarget, httpx, hunterio, internetdb, ipneighbor, leakix, myssl, oauth, otx, passivetotal, postman, rapiddns, securitytrails, shodan_dns, sitedossier, social, sslcert, subdomaincenter, trickest, urlscan, virustotal, wayback, zoomeye code-enum Enumerate Git repositories, Docker images, etc. 10 code_repository, dockerhub, git, github_codesearch, github_org, gitlab, httpx, postman, social, trufflehog dirbust-heavy web Recursive web directory brute-force (aggressive) 5 ffuf, ffuf_shortnames, httpx, iis_shortnames, wayback dirbust-light web Basic web directory brute-force (surface-level directories only) 4 ffuf, ffuf_shortnames, httpx, iis_shortnames dotnet-audit web Comprehensive scan for all IIS/.NET specific modules and module settings 8 ajaxpro, badsecrets, dotnetnuke, ffuf, ffuf_shortnames, httpx, iis_shortnames, telerik email-enum Enumerate email addresses from APIs, web crawling, etc. 
7 dehashed, dnscaa, emailformat, hunterio, pgp, skymem, sslcert iis-shortnames web Recursively enumerate IIS shortnames 3 ffuf_shortnames, httpx, iis_shortnames kitchen-sink Everything everywhere all at once 75 anubisdb, asn, azure_realm, azure_tenant, baddns, baddns_zone, badsecrets, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, code_repository, columbus, crt, dehashed, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, dockerhub, emailformat, ffuf, ffuf_shortnames, filedownload, fullhunt, git, github_codesearch, github_org, gitlab, gowitness, hackertarget, httpx, hunterio, iis_shortnames, internetdb, ipneighbor, leakix, myssl, ntlm, oauth, otx, paramminer_cookies, paramminer_getparams, paramminer_headers, passivetotal, pgp, postman, rapiddns, robots, secretsdb, securitytrails, shodan_dns, sitedossier, skymem, social, sslcert, subdomaincenter, trickest, trufflehog, urlscan, virustotal, wappalyzer, wayback, zoomeye paramminer web Discover new web parameters via brute-force 4 httpx, paramminer_cookies, paramminer_getparams, paramminer_headers spider Recursive web spider 1 httpx subdomain-enum Enumerate subdomains via APIs, brute-force 46 anubisdb, asn, azure_realm, azure_tenant, baddns_zone, bevigil, binaryedge, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, github_codesearch, github_org, hackertarget, httpx, hunterio, internetdb, ipneighbor, leakix, myssl, oauth, otx, passivetotal, postman, rapiddns, securitytrails, shodan_dns, sitedossier, social, sslcert, subdomaincenter, trickest, urlscan, virustotal, wayback, zoomeye web-basic Quick web scan 18 azure_realm, baddns, badsecrets, bucket_amazon, bucket_azure, bucket_firebase, bucket_google, ffuf_shortnames, filedownload, git, httpx, iis_shortnames, ntlm, oauth, robots, 
secretsdb, sslcert, wappalyzer web-screenshots Take screenshots of webpages 3 gowitness, httpx, social web-thorough Aggressive web scan 29 ajaxpro, azure_realm, baddns, badsecrets, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, bypass403, dastardly, dotnetnuke, ffuf_shortnames, filedownload, generic_ssrf, git, host_header, httpx, hunt, iis_shortnames, ntlm, oauth, robots, secretsdb, smuggler, sslcert, telerik, url_manipulation, wappalyzer"},{"location":"scanning/tips_and_tricks/","title":"Tips and Tricks","text":"Below are some helpful tricks to help you in your adventures.
"},{"location":"scanning/tips_and_tricks/#change-verbosity-during-scan","title":"Change Verbosity During Scan","text":"Press enter during a BBOT scan to change the log level. This will allow you to see debugging messages, etc.
"},{"location":"scanning/tips_and_tricks/#kill-individual-module-during-scan","title":"Kill Individual Module During Scan","text":"Sometimes a certain module can get stuck or slow down the scan. If this happens and you want to kill it, just type \"kill <module>
\" in the terminal and press enter. This will kill and disable the module for the rest of the scan.
You can also kill multiple modules at a time by specifying them in a space- or comma-separated list:
kill httpx sslcert\n
"},{"location":"scanning/tips_and_tricks/#common-config-changes","title":"Common Config Changes","text":""},{"location":"scanning/tips_and_tricks/#speed-up-slow-modules","title":"Speed Up Slow Modules","text":"BBOT modules can be parallelized so that more than one instance runs at a time. By default, many modules are already set to reasonable defaults:
class baddns(BaseModule):\n module_threads = 8\n
To override this, you can set a module's module_threads in the config:
# increase baddns threads to 20\nbbot -t evilcorp.com -m baddns -c modules.baddns.module_threads=20\n
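The dotted key passed to -c corresponds to the same nesting used in the YAML presets (modules, then the module name, then the option). The mapping can be sketched as follows. This is only an illustration of the idea, not BBOT's own parser, and the function name dotted_to_nested is hypothetical:

```python
def dotted_to_nested(override: str) -> dict:
    """Turn 'a.b.c=value' into {'a': {'b': {'c': value}}}.

    Hypothetical helper for illustration; BBOT's real config
    handling may differ.
    """
    key, _, raw = override.partition("=")
    # Treat plain non-negative integers as int; keep everything else as a string.
    value = int(raw) if raw.isdigit() else raw
    nested = value
    # Wrap the value from the innermost key outwards.
    for part in reversed(key.split(".")):
        nested = {part: nested}
    return nested


result = dotted_to_nested("modules.baddns.module_threads=20")
print(result)
```

Written as YAML, the resulting dictionary has the same shape as the config: modules: baddns: blocks shown in the presets.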
"},{"location":"scanning/tips_and_tricks/#boost-dns-brute-force-speed","title":"Boost DNS Brute-force Speed","text":"If you have a fast internet connection or are running BBOT from a cloud VM, you can speed up subdomain enumeration by cranking the threads for massdns
. The default is 1000
, which is about 1MB/s of DNS traffic:
# massdns with 5000 resolvers, about 5MB/s\nbbot -t evilcorp.com -f subdomain-enum -c dns.brute_threads=5000\n
"},{"location":"scanning/tips_and_tricks/#web-spider","title":"Web Spider","text":"The web spider is great for finding juicy data like subdomains, email addresses, and javascript secrets buried in webpages. However since it can lengthen the duration of a scan, it's disabled by default. To enable the web spider, you must increase the value of web.spider_distance
.
The web spider is controlled with three config values:
web.spider_depth (default: 1): the maximum directory depth allowed. This is to prevent the spider from delving too deep into a website.
web.spider_distance (0 == all spidering disabled, default: 0): the maximum number of links that can be followed in a row. This is designed to limit the spider in cases where web.spider_depth fails (e.g. for an ecommerce website with thousands of base-level URLs).
web.spider_links_per_page (default: 25): the maximum number of links per page that can be followed. This is designed to save you in cases where a single page has hundreds or thousands of links.
Here is a typical example:
spider.ymlconfig:\n web:\n spider_depth: 2\n spider_distance: 2\n spider_links_per_page: 25\n
# run the web spider against www.evilcorp.com\nbbot -t www.evilcorp.com -m httpx -c spider.yml\n
You can also pair the web spider with subdomain enumeration:
# spider every subdomain of evilcorp.com\nbbot -t evilcorp.com -f subdomain-enum -c spider.yml\n
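To make the interaction of the three limits concrete, here is a minimal sketch of how a spider could decide whether to follow a link. It is not BBOT's implementation: the functions directory_depth and should_follow are hypothetical, and the way directory depth is counted here (path segments, ignoring a trailing filename) is an assumption.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Count directory segments in the URL path (assumed definition)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # A final segment without a trailing slash is treated as a file.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def should_follow(url, links_followed_in_a_row, link_index_on_page,
                  spider_depth=2, spider_distance=2, spider_links_per_page=25):
    """Return True if a link passes all three spider limits."""
    # spider_distance == 0 disables spidering: no link can ever pass this check.
    if links_followed_in_a_row >= spider_distance:
        return False
    # Skip links that sit deeper in the directory tree than allowed.
    if directory_depth(url) > spider_depth:
        return False
    # Only the first N links on each page are considered (0-based index).
    if link_index_on_page >= spider_links_per_page:
        return False
    return True


print(should_follow("http://www.evilcorp.com/a/b/c.html", 0, 0))
print(should_follow("http://www.evilcorp.com/a/b/c/d.html", 0, 0))
```

With the values from spider.yml above, the first link passes all three checks, while the second is rejected because its assumed directory depth (3) exceeds spider_depth (2).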
"},{"location":"scanning/tips_and_tricks/#ingesting-bbot-data-into-siem-elastic-splunk","title":"Ingesting BBOT Data Into SIEM (Elastic, Splunk)","text":"If your goal is to feed BBOT data into a SIEM such as Elastic, be sure to enable this option when scanning:
bbot -t evilcorp.com -c modules.json.siem_friendly=true\n
This nests the event's .data beneath its event type like so:
{\n \"type\": \"DNS_NAME\",\n \"data\": {\n \"DNS_NAME\": \"blacklanternsecurity.com\"\n }\n}\n
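A consumer of this output then needs to look one level deeper to reach the value. The sketch below reads one such event and returns the nested value. The helper name event_value is hypothetical, and the fallback branch assumes that without siem_friendly the data field holds the value directly:

```python
import json


def event_value(event: dict):
    """Return the event's payload, whether or not it is nested by type."""
    data = event["data"]
    event_type = event["type"]
    # siem_friendly layout: {"data": {"<TYPE>": value}}
    if isinstance(data, dict) and event_type in data:
        return data[event_type]
    # Assumed default layout: {"data": value}
    return data


line = '{"type": "DNS_NAME", "data": {"DNS_NAME": "blacklanternsecurity.com"}}'
print(event_value(json.loads(line)))
```

Keying the payload by event type means each type gets its own field name, which avoids one data field carrying both strings and objects in the SIEM index.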
"},{"location":"scanning/tips_and_tricks/#custom-http-proxy","title":"Custom HTTP Proxy","text":"Web pentesters may appreciate BBOT's ability to quickly populate Burp Suite site maps for all subdomains in a target. If your scan includes gowitness, this will capture the traffic as if you manually visited each website in your browser -- including auxiliary web resources and javascript API calls. To accomplish this, set the web.http_proxy
config option like so:
# enumerate subdomains, take web screenshots, proxy through Burp\nbbot -t evilcorp.com -f subdomain-enum -m gowitness -c web.http_proxy=http://127.0.0.1:8080\n
"},{"location":"scanning/tips_and_tricks/#display-http_response-events","title":"Display HTTP_RESPONSE
Events","text":"BBOT's httpx module emits HTTP_RESPONSE events, but by default they're hidden from output. These events contain the full raw HTTP body along with headers, etc. If you want to see them, you can modify omit_event_types in the config:
omit_event_types:\n - URL_UNVERIFIED\n # - HTTP_RESPONSE\n
"},{"location":"scanning/tips_and_tricks/#display-out-of-scope-events","title":"Display Out-of-scope Events","text":"By default, BBOT only shows in-scope events (with a few exceptions for things like storage buckets). If you want to see events that BBOT is emitting internally (such as for DNS resolution, etc.), you can increase scope.report_distance
in the config or on the command line like so:
# display events up to scope distance 2 (default == 0)\nbbot -f subdomain-enum -t evilcorp.com -c scope.report_distance=2\n
"},{"location":"scanning/tips_and_tricks/#speed-up-scans-by-disabling-dns-resolution","title":"Speed Up Scans By Disabling DNS Resolution","text":"If you already have a list of discovered targets (e.g. URLs), you can speed up the scan by skipping BBOT's DNS resolution. You can do this by setting dns.disable
to true
:
# completely disable DNS resolution\nbbot -m httpx gowitness wappalyzer -t urls.txt -c dns.disable=true\n
Note that the above setting completely disables DNS resolution, meaning even A and AAAA records are not resolved. This can cause problems if you're using an IP whitelist or blacklist. In this case, you'll want to use dns.minimal instead:
# only resolve A and AAAA records\nbbot -m httpx gowitness wappalyzer -t urls.txt -c dns.minimal=true\n
"},{"location":"scanning/tips_and_tricks/#faq","title":"FAQ","text":""},{"location":"scanning/tips_and_tricks/#what-is-url_unverified","title":"What is URL_UNVERIFIED
?","text":"URL_UNVERIFIED events are URLs that haven't yet been visited by httpx. Once httpx visits them, it re-raises them as URLs, tagged with their resulting status code.
For example, when excavate gets an HTTP_RESPONSE event, it extracts links from the raw HTTP response as URL_UNVERIFIEDs and then passes them back to httpx to be visited.
By default, URL_UNVERIFIEDs are hidden from output. If you want to see all of them, including the out-of-scope ones, you can change omit_event_types and scope.report_distance in the config like so:
# visit www.evilcorp.com and extract all the links\nbbot -t www.evilcorp.com -m httpx -c omit_event_types=[] scope.report_distance=2\n
"}]}
\ No newline at end of file
diff --git a/Dev/troubleshooting/index.html b/Dev/troubleshooting/index.html
index e103bd3ff..7384c08da 100644
--- a/Dev/troubleshooting/index.html
+++ b/Dev/troubleshooting/index.html
@@ -20,7 +20,7 @@
-
+