Update usage docs and roadmap links (#1196)
Adds docs on 
- discovering what other nodes are connected to a given node type
- discovering what node properties are present on a given node type

Updates docs on
- roadmap link
- making syncmetadata docs more discoverable
Alex Chantavy authored Jul 14, 2023
1 parent 7c28291 commit 48f50ca
Showing 5 changed files with 113 additions and 42 deletions.
13 changes: 5 additions & 8 deletions README.md
@@ -39,16 +39,13 @@ Start [here](https://lyft.github.io/cartography/install.html).
## Usage
Start with our [tutorial](https://lyft.github.io/cartography/usage/tutorial.html). Our [data schema](https://lyft.github.io/cartography/usage/schema.html) is a helpful reference when you get stuck.

## Contact
## Community

- Join us on `#cartography` on the [Lyft OSS Slack](https://join.slack.com/t/lyftoss/shared_invite/enQtOTYzODg5OTQwNDE2LTFiYjgwZWM3NTNhMTFkZjc4Y2IxOTI4NTdiNTdhNjQ4M2Q5NTIzMjVjOWI4NmVlNjRiZmU2YzA5NTc3MmFjYTQ).

## Community Meeting

Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles).
- Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw).
- Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1).
- Our current project road map is [here](https://docs.google.com/document/d/18MOsGI-isFvag1fGk718Aht7wQPueWd4SqOI9KapBa8/edit#heading=h.15nsmgmjaaml).
- Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles).
- Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw).
- Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1).
- Our current project roadmap is [here](https://github.com/orgs/lyft/projects/26/views/1).

## Contributing
Thank you for considering contributing to Cartography!
18 changes: 18 additions & 0 deletions docs/root/modules/_cartography-metadata/schema.md
@@ -0,0 +1,18 @@
## Cartography metadata schema

.. _metadata_schema:

Some Cartography sync jobs write nodes to convey information about the job itself. See https://github.com/lyft/cartography/issues/758 for more background on this.

### SyncMetadata:ModuleSyncMetadata

This is a node to represent metadata about the sync job of a particular module. Its existence indicates that a particular sync job did happen.
The 'types' used here should be actual node labels. For example, if we did sync a particular AWSAccount's S3Buckets,
the `grouptype` is 'AWSAccount', the `groupid` is the particular account's `id`, and the `syncedtype` is 'S3Bucket'.

| Field | Description | Source|
|-------|-------------|------|
|**id**|`{group_type}_{group_id}_{synced_type}`|util.py|
|grouptype| The parent module's type |util.py|
|groupid|The parent module's id|util.py|
|syncedtype|The sub-module's type|util.py|
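
As a rough sketch of how these fields might be queried, assuming a sync has already written `ModuleSyncMetadata` nodes for S3 buckets, the query below lists the AWS accounts that have a recorded S3 bucket sync. The `'AWSAccount'` and `'S3Bucket'` values come from the example above; the column aliases `account_id` and `sync_id` are illustrative only.

```cypher
// List every AWS account that has a recorded S3Bucket sync.
// grouptype, groupid, syncedtype, and id are the fields from the table above.
MATCH (m:ModuleSyncMetadata)
WHERE m.grouptype = 'AWSAccount' AND m.syncedtype = 'S3Bucket'
RETURN m.groupid AS account_id, m.id AS sync_id
```

An empty result would suggest that no such sync job has run against this graph.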
1 change: 1 addition & 0 deletions docs/root/usage/schema.md
@@ -22,6 +22,7 @@

- In these docs, more specific nodes will be decorated with `GenericNode::SpecificNode` notation. For example, if we have a `Car` node and a `RaceCar` node, we will refer to the `RaceCar` as `Car::RaceCar`.

.. mdinclude:: ../modules/_cartography-metadata/schema.md
.. mdinclude:: ../modules/aws/schema.md
.. mdinclude:: ../modules/azure/schema.md
.. mdinclude:: ../modules/crxcavator/schema.md
105 changes: 88 additions & 17 deletions docs/root/usage/tutorial.md
@@ -2,24 +2,16 @@

Once everything has been installed and synced, you can view the Neo4j web interface at http://localhost:7474. You can view the reference on this [here](https://neo4j.com/developer/guide-neo4j-browser/#_installing_and_starting_neo4j_browser).

### Permalinking Bookmarklet
If you already know Neo4j and just need to know which nodes, attributes, and graph relationships we use to represent infrastructure assets, you can view our [sample queries](samplequeries.html). More sample queries are available at https://github.com/marco-lancini/cartography-queries.

You can set up a bookmarklet that lets you quickly get a permalink to a Cartography query. To do so, add a bookmark with the following contents as the URL, and make sure to replace `neo4j.contoso.net:7474` with the address of your Neo4j instance:
Otherwise, read on for this handhold-y tutorial filled with examples. Suppose we wanted to find out:

```javascript
javascript:(() => { const query = document.querySelectorAll('article label span')[0].innerText; if (query === ':server connect') { console.log('no query has been run!'); return; } const searchParams = new URLSearchParams(); searchParams.append('connectURL', 'bolt://neo4j:[email protected]:7687'); searchParams.append('cmd', 'edit'); searchParams.append('arg', query.replaceAll(/\r /g, '\r')); newURL = `http://neo4j.contoso.net:7474/browser/?${searchParams}`; window.open(newURL, '_blank', 'noopener'); })()
```

Then, any time you are in the web interface, you can click the bookmarklet to open a new tab with a permalink to your most recently executed query in the URL bar.

### ℹ️ Already know [how to query Neo4j](https://neo4j.com/developer/cypher-query-language/)? You can skip to our reference material!
If you already know Neo4j and just need to know what are the nodes, attributes, and graph relationships for our representation of infrastructure assets, you can skip this handholdy walkthrough and see our [sample queries](samplequeries.md).

### What [RDS](https://aws.amazon.com/rds/) instances are installed in my [AWS](https://aws.amazon.com/) accounts?
```
### What [RDS](https://aws.amazon.com/rds/) instances are installed in my AWS accounts?
```cypher
MATCH (aws:AWSAccount)-[r:RESOURCE]->(rds:RDSInstance)
return *
```

![Visualization of RDS nodes and AWS nodes](../images/accountsandrds.png)

In this query we asked Neo4j to find all `[:RESOURCE]` relationships from AWSAccounts to RDSInstances, and return the nodes and the `:RESOURCE` relationships.
@@ -35,7 +27,7 @@ and then pick options on the menu that shows up at the bottom of the view like t


### Which RDS instances have [encryption](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html) turned off?
```
```cypher
MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance{storage_encrypted:false})
RETURN a.name, rds.id
```
@@ -49,7 +41,7 @@ If you want to go back to viewing the graph and not a table, simply make sure yo
Let's look at some other AWS assets now.

### Which [EC2](https://aws.amazon.com/ec2/) instances are directly exposed to the internet?
```
```cypher
MATCH (instance:EC2Instance{exposed_internet: true})
RETURN instance.instanceid, instance.publicdnsname
```
@@ -60,7 +52,7 @@ These instances are open to the internet either through permissive inbound IP pe
If you know a lot about AWS, you may have noticed that EC2 instances [don't actually have an exposed_internet field](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_Instance.html). We're able to query for this because Cartography performs some [data enrichment](#data-enrichment) to add this field to EC2Instance nodes.

### Which [S3](https://aws.amazon.com/s3/) buckets have a policy granting any level of anonymous access to the bucket?
```
```cypher
MATCH (s:S3Bucket)
WHERE s.anonymous_access = true
RETURN s
@@ -76,13 +68,81 @@ A couple of other things to notice: instead of using the "{}" notation to filter

Let's go back to analyzing RDS instances. In an earlier example we queried for RDS instances that have encryption turned off. We can aggregate this data by AWSAccount with a small change:

```
```cypher
MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance)
WHERE rds.storage_encrypted = false
RETURN a.name as AWSAccount, count(rds) as UnencryptedInstances
```
![Table of unencrypted RDS instances by AWS account](../images/unencryptedcounts.png)
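
As a small variation on the aggregation above, the counts can be sorted so that the accounts with the most unencrypted instances appear first. This is only a sketch; it reuses the same labels and properties as the query above and adds an `ORDER BY` clause.

```cypher
// Same aggregation as above, with the worst offenders listed first.
MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance)
WHERE rds.storage_encrypted = false
RETURN a.name AS AWSAccount, count(rds) AS UnencryptedInstances
ORDER BY UnencryptedInstances DESC
```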


### Given a node label, what other node labels can be connected to it?

Suppose we wanted to know what other assets can be connected to a DNSRecord. We would ask the graph like this:

```cypher
match (d:DNSRecord)--(n)
return distinct labels(n);
```

This says "what are the possible labels for all nodes connected to all DNSRecord nodes `d` in my graph?" Your answer might look like this:

```
["AWSDNSRecord", "DNSRecord"]
["AWSDNSZone", "DNSZone"]
["LoadBalancerV2"]
["NameServer"]
["ESDomain"]
["LoadBalancer"]
["EC2Instance", "Instance"]
```

You can then make the path more specific like this:

```cypher
match (d:DNSRecord)--(:EC2Instance)--(n)
return distinct labels(n);
```

And then you can continue building your query.

We also include [full schema docs](schema.html), but this way of building a query can be faster and more interactive.
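
When building a path this way, it can also help to see which relationship types connect the nodes. A minimal sketch, using Cypher's built-in `type()` function on the same `DNSRecord` example (the aliases `relationship` and `connected_labels` are illustrative):

```cypher
// For every node connected to a DNSRecord, show the relationship type
// alongside the labels of the connected node.
MATCH (d:DNSRecord)-[r]-(n)
RETURN DISTINCT type(r) AS relationship, labels(n) AS connected_labels
```

The relationship types returned here can then be added to the pattern to make the next step of the query more specific.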


### Given a node label, what are the possible property names defined on it?

We can find what properties are available on an S3Bucket like this:

```cypher
match (n:S3Bucket) return properties(n) limit 1;
```

The result will look like this:

```
{
"bucket_key_enabled": false,
"creationdate": "2022-05-10 00:22:52+00:00",
"ignore_public_acls": true,
"anonymous_access": false,
"firstseen": 1652400141863,
"block_public_policy": true,
"versioning_status": "Enabled",
"block_public_acls": true,
"anonymous_actions": [],
"name": "my-fake-bucket-123",
"lastupdated": 1688605272,
"encryption_algorithm": "AES256",
"default_encryption": true,
"id": "my-fake-bucket-123",
"arn": "arn:aws:s3:::my-fake-bucket-123",
"restrict_public_buckets": false
}
```

Our [full schema docs](schema.html) describe all possible fields, but listing out properties this way lets you avoid switching between browser tabs.
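
Because Neo4j does not store properties whose value is null, a single node may not show every property that exists on the label. One possible way to list the property names seen across all `S3Bucket` nodes in your graph is sketched below; it relies on the built-in `keys()` function, and the alias `property_name` is illustrative.

```cypher
// Collect the distinct property names across all S3Bucket nodes.
MATCH (n:S3Bucket)
UNWIND keys(n) AS property_name
RETURN DISTINCT property_name
ORDER BY property_name
```

This scans every `S3Bucket` node, so on a large graph it will be slower than the `limit 1` query above.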


### Learning more
If you want to learn more in depth about Neo4j and Cypher queries you can look at [this tutorial](https://neo4j.com/developer/cypher-query-language/) and see this [reference card](https://neo4j.com/docs/cypher-refcard/current/).

@@ -117,3 +177,14 @@ You can add your own custom attributes and relationships without writing Python

### Mapping AWS Access Permissions
Cartography can map permissions between IAM Principals and resources in the graph. Here's [how](../modules/aws/permissions-mapping.html).


### Permalinking Bookmarklet

You can set up a bookmarklet that lets you quickly get a permalink to a Cartography query. To do so, add a bookmark with the following contents as the URL, and make sure to replace `neo4j.contoso.net:7474` with the address of your Neo4j instance:

```javascript
javascript:(() => { const query = document.querySelectorAll('article label span')[0].innerText; if (query === ':server connect') { console.log('no query has been run!'); return; } const searchParams = new URLSearchParams(); searchParams.append('connectURL', 'bolt://neo4j:[email protected]:7687'); searchParams.append('cmd', 'edit'); searchParams.append('arg', query.replaceAll(/\r /g, '\r')); newURL = `http://neo4j.contoso.net:7474/browser/?${searchParams}`; window.open(newURL, '_blank', 'noopener'); })()
```

Then, any time you are in the web interface, you can click the bookmarklet to open a new tab with a permalink to your most recently executed query in the URL bar.
18 changes: 1 addition & 17 deletions docs/schema/syncmetadata.md
@@ -1,17 +1 @@
## SyncMetadata

SyncMetadata nodes are created by sync jobs to convey information about the job itself. See this doc for how this is
used.

## SyncMetadata:ModuleSyncMetadata

This is a node to represent some metadata about the sync job of a particular module or sub-module. Its existence should suggest that a particular sync job did happen.
The 'types' used here should be actual node labels. For example, if we did sync a particular AWSAccount's S3Buckets,
the `grouptype` is 'AWSAccount', the `groupid` is the particular account's `id`, and the `syncedtype` is 'S3Bucket'.

| Field | Description | Source|
|-------|-------------|------|
|**id**|`{group_type}_{group_id}_{synced_type}`|util.py|
|grouptype| The parent module's type |util.py|
|groupid|The parent module's id|util.py|
|syncedtype|The sub-module's type|util.py|
This document has been moved [here](https://lyft.github.io/cartography/modules/_cartography-metadata/schema.html)
