deploy: 7feb218

cartography-cncf · Dec 11, 2024 · commit ac65c07

Showing 298 changed files with 106,814 additions and 0 deletions.
diff --git a/.buildinfo b/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: b87c046b3ed94a6cbe635c90384352b9
+tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/.doctrees/contact.doctree b/.doctrees/contact.doctree
diff --git a/.doctrees/dev/developer-guide.doctree b/.doctrees/dev/developer-guide.doctree
diff --git a/.doctrees/dev/index.doctree b/.doctrees/dev/index.doctree
diff --git a/.doctrees/dev/writing-analysis-jobs.doctree b/.doctrees/dev/writing-analysis-jobs.doctree
diff --git a/.doctrees/dev/writing-intel-modules.doctree b/.doctrees/dev/writing-intel-modules.doctree
diff --git a/.doctrees/environment.pickle b/.doctrees/environment.pickle
diff --git a/.doctrees/index.doctree b/.doctrees/index.doctree
diff --git a/.doctrees/info.doctree b/.doctrees/info.doctree
diff --git a/.doctrees/install.doctree b/.doctrees/install.doctree
diff --git a/.doctrees/modules/_cartography-metadata/schema.doctree b/.doctrees/modules/_cartography-metadata/schema.doctree
diff --git a/.doctrees/modules/aws/config.doctree b/.doctrees/modules/aws/config.doctree
diff --git a/.doctrees/modules/aws/index.doctree b/.doctrees/modules/aws/index.doctree
diff --git a/.doctrees/modules/aws/permissions-mapping.doctree b/.doctrees/modules/aws/permissions-mapping.doctree
diff --git a/.doctrees/modules/aws/schema.doctree b/.doctrees/modules/aws/schema.doctree
diff --git a/.doctrees/modules/azure/config.doctree b/.doctrees/modules/azure/config.doctree
diff --git a/.doctrees/modules/azure/index.doctree b/.doctrees/modules/azure/index.doctree
diff --git a/.doctrees/modules/azure/schema.doctree b/.doctrees/modules/azure/schema.doctree
diff --git a/.doctrees/modules/bigfix/config.doctree b/.doctrees/modules/bigfix/config.doctree
diff --git a/.doctrees/modules/bigfix/index.doctree b/.doctrees/modules/bigfix/index.doctree
diff --git a/.doctrees/modules/bigfix/schema.doctree b/.doctrees/modules/bigfix/schema.doctree
diff --git a/.doctrees/modules/crowdstrike/config.doctree b/.doctrees/modules/crowdstrike/config.doctree
diff --git a/.doctrees/modules/crowdstrike/index.doctree b/.doctrees/modules/crowdstrike/index.doctree
diff --git a/.doctrees/modules/crowdstrike/schema.doctree b/.doctrees/modules/crowdstrike/schema.doctree
diff --git a/.doctrees/modules/cve/config.doctree b/.doctrees/modules/cve/config.doctree
diff --git a/.doctrees/modules/cve/index.doctree b/.doctrees/modules/cve/index.doctree
diff --git a/.doctrees/modules/cve/schema.doctree b/.doctrees/modules/cve/schema.doctree
diff --git a/.doctrees/modules/digitalocean/config.doctree b/.doctrees/modules/digitalocean/config.doctree
diff --git a/.doctrees/modules/digitalocean/index.doctree b/.doctrees/modules/digitalocean/index.doctree
diff --git a/.doctrees/modules/digitalocean/schema.doctree b/.doctrees/modules/digitalocean/schema.doctree
diff --git a/.doctrees/modules/duo/config.doctree b/.doctrees/modules/duo/config.doctree
diff --git a/.doctrees/modules/duo/index.doctree b/.doctrees/modules/duo/index.doctree
diff --git a/.doctrees/modules/duo/schema.doctree b/.doctrees/modules/duo/schema.doctree
diff --git a/.doctrees/modules/gcp/config.doctree b/.doctrees/modules/gcp/config.doctree
diff --git a/.doctrees/modules/gcp/index.doctree b/.doctrees/modules/gcp/index.doctree
diff --git a/.doctrees/modules/gcp/schema.doctree b/.doctrees/modules/gcp/schema.doctree
diff --git a/.doctrees/modules/github/config.doctree b/.doctrees/modules/github/config.doctree
diff --git a/.doctrees/modules/github/index.doctree b/.doctrees/modules/github/index.doctree
diff --git a/.doctrees/modules/github/schema.doctree b/.doctrees/modules/github/schema.doctree
diff --git a/.doctrees/modules/gsuite/config.doctree b/.doctrees/modules/gsuite/config.doctree
diff --git a/.doctrees/modules/gsuite/index.doctree b/.doctrees/modules/gsuite/index.doctree
diff --git a/.doctrees/modules/gsuite/schema.doctree b/.doctrees/modules/gsuite/schema.doctree
diff --git a/.doctrees/modules/index.doctree b/.doctrees/modules/index.doctree
diff --git a/.doctrees/modules/jamf/index.doctree b/.doctrees/modules/jamf/index.doctree
diff --git a/.doctrees/modules/jamf/schema.doctree b/.doctrees/modules/jamf/schema.doctree
diff --git a/.doctrees/modules/kandji/config.doctree b/.doctrees/modules/kandji/config.doctree
diff --git a/.doctrees/modules/kandji/index.doctree b/.doctrees/modules/kandji/index.doctree
diff --git a/.doctrees/modules/kandji/schema.doctree b/.doctrees/modules/kandji/schema.doctree
diff --git a/.doctrees/modules/kubernetes/config.doctree b/.doctrees/modules/kubernetes/config.doctree
diff --git a/.doctrees/modules/kubernetes/index.doctree b/.doctrees/modules/kubernetes/index.doctree
diff --git a/.doctrees/modules/kubernetes/schema.doctree b/.doctrees/modules/kubernetes/schema.doctree
diff --git a/.doctrees/modules/lastpass/config.doctree b/.doctrees/modules/lastpass/config.doctree
diff --git a/.doctrees/modules/lastpass/index.doctree b/.doctrees/modules/lastpass/index.doctree
diff --git a/.doctrees/modules/lastpass/schema.doctree b/.doctrees/modules/lastpass/schema.doctree
diff --git a/.doctrees/modules/okta/config.doctree b/.doctrees/modules/okta/config.doctree
diff --git a/.doctrees/modules/okta/index.doctree b/.doctrees/modules/okta/index.doctree
diff --git a/.doctrees/modules/okta/schema.doctree b/.doctrees/modules/okta/schema.doctree
diff --git a/.doctrees/modules/pagerduty/config.doctree b/.doctrees/modules/pagerduty/config.doctree
diff --git a/.doctrees/modules/pagerduty/index.doctree b/.doctrees/modules/pagerduty/index.doctree
diff --git a/.doctrees/modules/pagerduty/schema.doctree b/.doctrees/modules/pagerduty/schema.doctree
diff --git a/.doctrees/modules/semgrep/config.doctree b/.doctrees/modules/semgrep/config.doctree
diff --git a/.doctrees/modules/semgrep/index.doctree b/.doctrees/modules/semgrep/index.doctree
diff --git a/.doctrees/modules/semgrep/schema.doctree b/.doctrees/modules/semgrep/schema.doctree
diff --git a/.doctrees/modules/snipeit/config.doctree b/.doctrees/modules/snipeit/config.doctree
diff --git a/.doctrees/modules/snipeit/index.doctree b/.doctrees/modules/snipeit/index.doctree
diff --git a/.doctrees/modules/snipeit/schema.doctree b/.doctrees/modules/snipeit/schema.doctree
diff --git a/.doctrees/ops.doctree b/.doctrees/ops.doctree
diff --git a/.doctrees/usage/applications.doctree b/.doctrees/usage/applications.doctree
diff --git a/.doctrees/usage/drift-detect.doctree b/.doctrees/usage/drift-detect.doctree
diff --git a/.doctrees/usage/index.doctree b/.doctrees/usage/index.doctree
diff --git a/.doctrees/usage/samplequeries.doctree b/.doctrees/usage/samplequeries.doctree
diff --git a/.doctrees/usage/schema.doctree b/.doctrees/usage/schema.doctree
diff --git a/.doctrees/usage/tutorial.doctree b/.doctrees/usage/tutorial.doctree
diff --git a/.nojekyll b/.nojekyll
diff --git a/_images/accountsandrds.png b/_images/accountsandrds.png
diff --git a/_images/anonbuckets.png b/_images/anonbuckets.png
diff --git a/_images/app-direct.png b/_images/app-direct.png
diff --git a/_images/app-with-api.png b/_images/app-with-api.png
diff --git a/_images/basic-dataflow.png b/_images/basic-dataflow.png
diff --git a/_images/customizeview.png b/_images/customizeview.png
diff --git a/_images/dockercompose-flow.png b/_images/dockercompose-flow.png
diff --git a/_images/dockercompose-result.png b/_images/dockercompose-result.png
diff --git a/_images/ec2-inet-open.png b/_images/ec2-inet-open.png
diff --git a/_images/exposed-internet.png b/_images/exposed-internet.png
diff --git a/_images/logo-horizontal.png b/_images/logo-horizontal.png
diff --git a/_images/nativeinstall-run.png b/_images/nativeinstall-run.png
diff --git a/_images/parallel-crons.png b/_images/parallel-crons.png
diff --git a/_images/pipeline-hive-mode.png b/_images/pipeline-hive-mode.png
diff --git a/_images/pipeline-neodash.png b/_images/pipeline-neodash.png
diff --git a/_images/selectnode.png b/_images/selectnode.png
diff --git a/_images/unencryptedcounts.png b/_images/unencryptedcounts.png
diff --git a/_images/unencryptedinstances.png b/_images/unencryptedinstances.png
diff --git a/_images/yourowntestmachine.png b/_images/yourowntestmachine.png
diff --git a/_sources/contact.md.txt b/_sources/contact.md.txt
@@ -0,0 +1,10 @@
+## Contact
+
+- Join us on `#cartography` on the [Lyft OSS Slack](https://join.slack.com/t/lyftoss/shared_invite/enQtOTYzODg5OTQwNDE2LTFiYjgwZWM3NTNhMTFkZjc4Y2IxOTI4NTdiNTdhNjQ4M2Q5NTIzMjVjOWI4NmVlNjRiZmU2YzA5NTc3MmFjYTQ).
+
+## Community Meeting
+
+Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles).
+- Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw).
+- Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1).
+- Our current project road map is [here](https://docs.google.com/document/d/18MOsGI-isFvag1fGk718Aht7wQPueWd4SqOI9KapBa8/edit#heading=h.15nsmgmjaaml).
diff --git a/_sources/dev/developer-guide.md.txt b/_sources/dev/developer-guide.md.txt
@@ -0,0 +1,219 @@
+# Cartography Developer Guide
+
+## Running the source code
+
+This document assumes familiarity with Python dev practices such as using [virtualenvs](https://packaging.python.org/guides/installing-using-pip-and-virtualenv/).
+
+1. **Run Neo4j**
+
+    Follow the [Install Steps](../install.html) so that you get Neo4j running locally. It's up to you if you want to use Docker or a native install.
+
+1. **Install Python 3.10**
+
+1. **Clone the source code**
+
+    Run `cd {path-where-you-want-your-source-code}`. Get the source code with `git clone https://github.com/lyft/cartography.git` (GitHub no longer serves the unauthenticated `git://` protocol).
+
+1. **Perform an editable install of the cartography source code**
+
+    Run `cd cartography` and then `pip install -e .` (yes, actually type the period into the command line) to install Cartography from source to the current venv.
+
+1. **Run from source**
+
+    After this finishes you should be able to run Cartography from source with `cartography --neo4j-uri bolt://localhost:7687`. Any changes to the source code in `{path-where-you-want-your-source-code}/cartography` are now locally testable by running `cartography` from the command line.
+
+## Automated testing
+
+1. **Install test requirements**
+
+    `pip install -r test-requirements.txt`
+
+1. **(OPTIONAL) Set up environment variables for integration tests**
+
+    The integration tests expect Neo4j to be running locally, listening on default ports, and with auth disabled.
+
+    To run the integration tests on a specific Neo4j instance, add the following environment variable:
+
+    `export "NEO4J_URL=<your_neo4j_instance_bolt_url:your_neo4j_instance_port>"`
+
+1. **Run tests using `make`**
+    - `make test_lint` runs [pre-commit](https://pre-commit.com) linting against the codebase.
+    - `make test_unit` runs the unit test suite.
+
+    ⚠️ Important!  The below commands will **DELETE ALL NODES** on your local Neo4j instance as part of our testing procedure. Only run any of the below commands if you are ok with this. ⚠️
+
+    - `make test_integration` runs the integration test suite.
+    For more granular testing, you can invoke `pytest` directly:
+      - `pytest ./tests/integration/cartography/intel/aws/test_iam.py`
+      - `pytest ./tests/integration/cartography/intel/aws/test_iam.py::test_load_groups`
+      - `pytest -k test_load_groups`
+    - `make test` can be used to run all of the above.
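
As a rough illustration of the `NEO4J_URL` override described in step 2, a test helper could resolve the Bolt URL as sketched below. The function name `resolve_neo4j_url` and the fallback value `bolt://localhost:7687` are assumptions made for this example (the fallback mirrors the local URI used earlier in this guide); this is not necessarily how Cartography's own test settings are written.

```python
import os

# Assumed fallback: a local Neo4j instance listening on the default Bolt port.
DEFAULT_NEO4J_URL = "bolt://localhost:7687"


def resolve_neo4j_url(environ=None):
    """Return the Bolt URL from NEO4J_URL, or the local default if unset or empty."""
    env = os.environ if environ is None else environ
    return env.get("NEO4J_URL") or DEFAULT_NEO4J_URL


if __name__ == "__main__":
    # Prints the URL the integration tests would target in this sketch.
    print(resolve_neo4j_url())
```

Passing the environment in as a parameter keeps the helper easy to check without modifying the real process environment.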
+
+## Implementing custom sync commands
+
+By default, cartography will try to sync every intel module included as part of the default sync. If you're not using certain intel modules, you can create a custom sync script and invoke it using the cartography CLI. For example, if you're only interested in the AWS intel module you can create a sync script, `custom_sync.py`, that looks like this:
+
+```python
+from cartography import cli
+from cartography import sync
+from cartography.intel import aws
+from cartography.intel import create_indexes
+
+def build_custom_sync():
+    s = sync.Sync()
+    s.add_stages([
+        ('create-indexes', create_indexes.run),
+        ('aws', aws.start_aws_ingestion),
+    ])
+    return s
+
+def main(argv):
+    return cli.CLI(build_custom_sync(), prog='cartography').main(argv)
+
+if __name__ == '__main__':
+    import sys
+    sys.exit(main(sys.argv[1:]))
+```
+
+You can then invoke it with `python custom_sync.py`. It has all the features of the cartography CLI but includes only the intel modules you are specifically interested in using. For example:
+
+```
+cartography$ python custom_sync.py
+INFO:cartography.sync:Starting sync with update tag '1569022981'
+INFO:cartography.sync:Starting sync stage 'create-indexes'
+INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
+INFO:cartography.sync:Finishing sync stage 'create-indexes'
+INFO:cartography.sync:Starting sync stage 'aws'
+INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
+...
+```
+
+## dev.Dockerfile
+
+We include a dev.Dockerfile that can help streamline common dev tasks. It is different from the main Dockerfile in that:
+
+1. It is strictly intended for dev purposes.
+1. It performs an editable install of the cartography source code and test requirements.
+1. It does not define a docker entrypoint. This is to allow you to run a custom sync script instead of just the main `cartography` command.
+
+To use it, build dev.Dockerfile with
+```bash
+cd /path/to/cartography/repo
+docker build -t cartography-cncf/cartography-dev -f dev.Dockerfile ./
+```
+
+With that, there are some interesting things you can do with it.
+
+### Dev with docker-compose
+
+#### Run the full test suite
+
+```bash
+docker-compose run cartography-dev make test_lint
+docker-compose run cartography-dev make test_unit
+docker-compose run cartography-dev make test_integration
+
+# for all the above
+docker-compose run cartography-dev make test
+```
+
+#### Run a [custom sync script](#implementing-custom-sync-commands)
+
+```bash
+docker-compose run cartography-dev python custom_sync.py
+```
+
+#### Run the cartography CLI
+
+```bash
+docker-compose run cartography-dev cartography --help
+```
+
+### Equivalent manual docker commands
+
+If you don't like docker-compose or if it doesn't work for you for any reason, here are the equivalent manual docker commands for the previous scenarios:
+
+#### Run unit tests with dev.Dockerfile
+
+```bash
+docker run --rm cartography-cncf/cartography-dev make test_unit
+```
+
+This is a simple command because it doesn't require any volume mounts or docker networking.
+
+#### Run the linter with dev.Dockerfile
+
+```bash
+docker run --rm \
+    -v $(pwd):/var/cartography \
+    -v $(pwd)/.cache/pre-commit:/var/cartography/.cache/pre-commit \
+    cartography-cncf/cartography-dev \
+    make test_lint
+```
+
+The volume mounts are necessary to let pre-commit from within the container edit source files on the host machine, and for pre-commit's cached state to save on your host machine without needing to update itself every time you run it.
+
+#### Run integration tests with dev.Dockerfile
+
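+First run a Neo4j container. The command below attaches it to a Docker network named `cartography-network`, which must already exist; if it does not, create it with `docker network create cartography-network`.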
+```bash
+docker run \
+    --publish=7474:7474 \
+    --publish=7687:7687 \
+    --network cartography-network \
+    -v data:/data \
+    --name cartography-neo4j \
+    --env=NEO4J_AUTH=none \
+    neo4j:4.4-community
+```
+
+and then call the integration test suite like this:
+```bash
+docker run --rm \
+  --network cartography-network \
+  -e NEO4J_URL=bolt://cartography-neo4j:7687 \
+  cartography-cncf/cartography-dev \
+  make test_integration
+```
+
+Note that we needed to specify the `NEO4J_URL` env var so that the integration test would be able to reach the Neo4j container.
+
+#### Run the full test suite with dev.Dockerfile
+
+Bring up a Neo4j container:
+```bash
+docker run \
+    --publish=7474:7474 \
+    --publish=7687:7687 \
+    --network cartography-network \
+    -v data:/data \
+    --name cartography-neo4j \
+    --env=NEO4J_AUTH=none \
+    neo4j:4.4-community
+```
+
+and then run the full test suite by specifying all the necessary volumes, network, and env vars.
+```bash
+docker run --rm \
+    -v $(pwd):/var/cartography \
+    -v $(pwd)/.cache/pre-commit:/var/cartography/.cache/pre-commit \
+    --network cartography-network \
+    -e NEO4J_URL=bolt://cartography-neo4j:7687 \
+    cartography-cncf/cartography-dev \
+    make test
+```
+
+#### Run a [custom sync script](#implementing-custom-sync-commands) with dev.Dockerfile
+
+```bash
+docker run --rm cartography-cncf/cartography-dev python custom_sync.py
+```
+
+#### Run cartography CLI with dev.Dockerfile
+
+```bash
+docker run --rm cartography-cncf/cartography-dev cartography --help
+```
+
+## How to write a new intel module
+See [here](writing-intel-modules.html).
diff --git a/_sources/dev/index.rst.txt b/_sources/dev/index.rst.txt
@@ -0,0 +1,5 @@
+.. toctree::
+
+    developer-guide
+    writing-analysis-jobs
+    writing-intel-modules
diff --git a/_sources/dev/writing-analysis-jobs.md.txt b/_sources/dev/writing-analysis-jobs.md.txt
@@ -0,0 +1,122 @@
+# How to extend Cartography with Analysis Jobs
+
+## Overview
+In a nutshell, Analysis Jobs let you add your own customizations to Cartography by writing Neo4j queries. This helps you add powerful enhancements to your data without the need to write Python code.
+
+### The stages
+There are 3 stages to a cartography sync. First we create database indexes, next we ingest assets via intel modules, and finally we can run Analysis Jobs on the database (see [cartography.sync.build\_default\_sync()](https://github.com/lyft/cartography/blob/master/cartography/sync.py)). This tutorial focuses on Analysis Jobs.
+
+### How to run
+Each Analysis Job is a JSON file with a list of Neo4j statements which get run in order. To run Analysis Jobs, in your call to `cartography`, set the `--analysis-job-directory` parameter to the folder path of your jobs. Although the order of statements within a single job is preserved, we don't guarantee the order in which jobs are executed.
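
To make the file layout concrete, here is a minimal sketch of a loader that parses a job document and walks its statements in file order. The helper `load_job`, the example job, and the rule that an iterative statement must carry an `iterationsize` are assumptions for illustration; Cartography's real job runner may validate and execute jobs differently.

```python
import json


def load_job(text):
    """Parse an Analysis Job document and return (name, statements) in file order."""
    job = json.loads(text)
    statements = job.get("statements")
    if not isinstance(statements, list) or not statements:
        raise ValueError("job must contain a non-empty 'statements' list")
    for index, statement in enumerate(statements):
        if not isinstance(statement.get("query"), str):
            raise ValueError(f"statement {index} has no 'query' string")
        # Assumed rule for this sketch: iterative statements need a batch size.
        if statement.get("iterative") and "iterationsize" not in statement:
            raise ValueError(f"statement {index} is iterative but has no 'iterationsize'")
    return job.get("name", "<unnamed>"), statements


# Hypothetical job used only for this example.
EXAMPLE_JOB = """
{
  "name": "example job",
  "statements": [
    {"query": "MATCH (n:Example) REMOVE n.flag", "iterative": false},
    {"query": "MATCH (n:Example) SET n.flag = true", "iterative": false}
  ]
}
"""

if __name__ == "__main__":
    name, statements = load_job(EXAMPLE_JOB)
    for position, statement in enumerate(statements, start=1):
        print(f"{name} [{position}/{len(statements)}]: {statement['query']}")
```

Because a JSON array keeps its element order, iterating over `statements` preserves the order in which the file lists them.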
+
+## Example job: which of my EC2 instances is accessible to any host on the internet?
+The easiest way to learn how to write an Analysis Job is through an example. One of the Analysis Jobs that we've included by default in Cartography's source tree is [cartography/data/jobs/analysis/aws_ec2_asset_exposure.json](https://github.com/lyft/cartography/blob/master/cartography/data/jobs/analysis/aws_ec2_asset_exposure.json). This tutorial covers only the EC2 instance part of that job, but after reading this you should be able to understand the other steps in that file.
+
+### Our goal
+After ingesting all our AWS data, we want to explicitly mark EC2 instances that are accessible to the public internet - a useful thing to know for anyone running an internet service. If any internet-open nodes are found, the job will add an attribute `exposed_internet = True` to the node. This way we can easily query to find the assets later on and take remediation action if needed.
+
+But how do we make this determination, and how should we structure the job?
+
+### The logic in plain English
+We can use the following facts to tell if an EC2 instance is open to the internet:
+
+1. The EC2 instance is a member of a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet.
+2. The EC2 instance has a network interface that is connected to a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet.
+
+The graph created by Cartography's sync process already has this information for us; we just need to run a few queries to properly mark it with `exposed_internet = True`. This example is complex, but we hope that it exposes enough Neo4j concepts to help you write your own queries.
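
Before turning to Cypher, the two facts above can be sketched over plain Python dictionaries. This is only an in-memory illustration of the decision rule: the function `mark_exposed`, the dictionary layout, and the IDs are hypothetical and are not part of Cartography or its graph schema.

```python
OPEN_RANGE = "0.0.0.0/0"


def mark_exposed(instances, group_inbound_ranges):
    """Set exposed_internet on instances reachable through an open security group.

    instances: instance id -> {"groups": [...], "interface_groups": [...]}
    group_inbound_ranges: security group id -> list of allowed inbound ranges
    """
    # Security groups with an inbound rule that admits the whole internet.
    open_groups = {
        group
        for group, ranges in group_inbound_ranges.items()
        if OPEN_RANGE in ranges
    }
    for props in instances.values():
        # Fact 1: the instance itself is a member of an open group.
        direct = open_groups.intersection(props.get("groups", []))
        # Fact 2: one of its network interfaces is attached to an open group.
        via_interface = open_groups.intersection(props.get("interface_groups", []))
        if direct or via_interface:
            props["exposed_internet"] = True
    return instances


if __name__ == "__main__":
    groups = {"sg-open": ["0.0.0.0/0"], "sg-internal": ["10.0.0.0/8"]}
    instances = {
        "i-direct": {"groups": ["sg-open"], "interface_groups": []},
        "i-via-interface": {"groups": [], "interface_groups": ["sg-open"]},
        "i-internal": {"groups": ["sg-internal"], "interface_groups": ["sg-internal"]},
    }
    for instance_id, props in mark_exposed(instances, groups).items():
        print(instance_id, props.get("exposed_internet", False))
```

The graph queries in the next section express the same rule as path patterns instead of set intersections.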
+
+### Translating the plain-English logic into Neo4j's Cypher syntax
+We can take the ideas above and use Cypher's declarative syntax to "sketch" out this graph path.
+
+1. _The EC2 instance is a member of a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet._
+
+    In Cypher, this is
+
+    ```
+    MATCH
+    (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
+    -[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)
+    <-[:MEMBER_OF_EC2_SECURITY_GROUP]-(instance:EC2Instance)
+
+    SET instance.exposed_internet = true,
+        instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct';
+    ```
+
+    In the `SET` clause we add `exposed_internet = True` to the instance. We also add a field for `exposed_internet_type` to denote what type of internet exposure has occurred here. You can read the [documentation for `coalesce`](https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-coalesce), but in English this last part says "add `direct` to the list of ways this instance is exposed to the internet".
+
+2. _The EC2 instance has a network interface that is connected to a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet._
+
+    This is the same as the previous query except for the final line of the `MATCH` clause:
+
+    ```
+    MATCH
+    (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
+    -[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)
+    <-[:NETWORK_INTERFACE*..2]-(instance:EC2Instance)
+
+    SET instance.exposed_internet = true,
+        instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct';
+    ```
+
+    The `*..2` operator means "within 2 hops". We use this here as a shortcut because there are a few more relationships between NetworkInterfaces and EC2SecurityGroups that we can skip over.
+
+Finally, notice that (1) and (2) are similar enough that we can actually merge them like this:
+
+```
+MATCH
+(:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
+-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)
+<-[:MEMBER_OF_EC2_SECURITY_GROUP|NETWORK_INTERFACE*..2]-(instance:EC2Instance)
+
+SET instance.exposed_internet = true,
+    instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct';
+```
+
+Kinda neat, right?
+
+### The skeleton of an Analysis Job
+Now that we know what we want to do on a sync, how should we structure the Analysis Job?  Here is the basic skeleton that we recommend.
+
+#### Clean up first, then update
+In general, the first statement(s) should be a "clean-up phase" that removes custom attributes or relationships that you may have added in a previous run. This ensures that whatever attributes or relationships you add on the current run are up to date and not stale. The statements after the clean-up phase then perform the matching and attribute updates described in the previous section.
+
+**Here's our final result:**
+
+```
+{
+  "name": "AWS asset internet exposure",
+  "statements": [
+      {
+        "__comment__": "This is a clean-up statement to remove custom attributes",
+        "query": "MATCH (n) WHERE n.exposed_internet IS NOT NULL AND labels(n) IN ['AutoScalingGroup', 'EC2Instance', 'LoadBalancer'] WITH n LIMIT $LIMIT_SIZE REMOVE n.exposed_internet, n.exposed_internet_type RETURN COUNT(*) as TotalCompleted",
+        "iterative": true,
+        "iterationsize": 1000
+      },
+      {
+        "__comment__": "This is our analysis logic as described in the section above",
+        "query": "MATCH (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)<-[:MEMBER_OF_EC2_SECURITY_GROUP|NETWORK_INTERFACE*..2]-(instance:EC2Instance) SET instance.exposed_internet = true, instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct';",
+        "iterative": true,
+        "iterationsize": 100
+      }
+  ]
+}
+```
+
+Setting a statement as `iterative: true` means that we will run this query on `#{iterationsize}` entries at a time. This can be helpful for queries that return large numbers of records so that Neo4j doesn't get too angry.
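
The batching idea can be sketched as a simple loop. In this illustration `run_batch` stands in for executing one statement with `$LIMIT_SIZE` bound to the batch size, and the loop stops when a batch reports a `TotalCompleted` of zero. Both the stop condition and the helper names are assumptions made for this example; Cartography's actual runner may decide when to stop differently.

```python
def run_iteratively(run_batch, iterationsize):
    """Call run_batch(limit) until a batch reports zero completed rows.

    run_batch is a stand-in for running one statement with $LIMIT_SIZE set
    to `limit`; it must return that batch's TotalCompleted count.
    """
    total = 0
    while True:
        completed = run_batch(iterationsize)
        if completed == 0:
            return total
        total += completed


def make_fake_cleanup(nodes):
    """Build an in-memory stand-in for the clean-up statement (hypothetical)."""
    def run_batch(limit):
        # Take at most `limit` nodes that still carry the custom attribute.
        flagged = [node for node in nodes if "exposed_internet" in node][:limit]
        for node in flagged:
            node.pop("exposed_internet")
        return len(flagged)
    return run_batch


if __name__ == "__main__":
    nodes = [{"exposed_internet": True} for _ in range(2500)]
    print(run_iteratively(make_fake_cleanup(nodes), iterationsize=1000))
```

Each batch touches a bounded number of nodes, so no single transaction has to hold the whole result set.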
+
+Now we can enjoy the fruits of our labor and query for internet exposure:
+
+![internet-exposure-query](../images/exposed-internet.png)
+
+## Recap
+As shown, you create an Analysis Job by putting together a list of `statements` (which are essentially Neo4j queries). In general, each job should first clean up the custom attributes added by a previous run, and then perform the match and update steps to add the custom attributes back again. This ensures that your data is up to date.