Update usage docs and roadmap links (#1196)

Adds docs on
- discovering what other nodes are connected to a given node type
- discovering what node properties are present on a given node type

Updates docs on
- roadmap link
- making syncmetadata docs more discoverable
cartography-cncf · Jul 14, 2023 · 48f50ca
1 parent 7c28291
commit 48f50ca
Showing 5 changed files with 113 additions and 42 deletions.
diff --git a/README.md b/README.md
@@ -39,16 +39,13 @@ Start [here](https://lyft.github.io/cartography/install.html).
 ## Usage
 Start with our [tutorial](https://lyft.github.io/cartography/usage/tutorial.html). Our [data schema](https://lyft.github.io/cartography/usage/schema.html) is a helpful reference when you get stuck.
 
-## Contact
+## Community
 
 - Join us on `#cartography` on the [Lyft OSS Slack](https://join.slack.com/t/lyftoss/shared_invite/enQtOTYzODg5OTQwNDE2LTFiYjgwZWM3NTNhMTFkZjc4Y2IxOTI4NTdiNTdhNjQ4M2Q5NTIzMjVjOWI4NmVlNjRiZmU2YzA5NTc3MmFjYTQ).
-
-## Community Meeting
-
-Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles).
-- Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw).
-- Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1).
-- Our current project road map is [here](https://docs.google.com/document/d/18MOsGI-isFvag1fGk718Aht7wQPueWd4SqOI9KapBa8/edit#heading=h.15nsmgmjaaml).
+- Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles).
+  - Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw).
+  - Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1).
+- Our current project roadmap is [here](https://github.com/orgs/lyft/projects/26/views/1).
 
 ## Contributing
 Thank you for considering contributing to Cartography!

diff --git a/docs/root/modules/_cartography-metadata/schema.md b/docs/root/modules/_cartography-metadata/schema.md
@@ -0,0 +1,18 @@
+## Cartography metadata schema
+
+.. _metadata_schema:
+
+Some Cartography sync jobs write nodes to convey information about the job itself. See https://github.com/lyft/cartography/issues/758 for more background on this.
+
+### SyncMetadata:ModuleSyncMetadata
+
+This is a node to represent metadata about the sync job of a particular module. Its existence indicates that a particular sync job did happen.
+The 'types' used here should be actual node labels. For example, if we did sync a particular AWSAccount's S3Buckets,
+the `grouptype` is 'AWSAccount', the `groupid` is the particular account's `id`, and the `syncedtype` is 'S3Bucket'.
+
+| Field | Description | Source|
+|-------|-------------|------|
+|**id**|`{group_type}_{group_id}_{synced_type}`|util.py|
+|grouptype| The parent module's type |util.py|
+|groupid|The parent module's id|util.py|
+|syncedtype|The sub-module's type|util.py|
diff --git a/docs/root/usage/schema.md b/docs/root/usage/schema.md
@@ -22,6 +22,7 @@
 
 - In these docs, more specific nodes will be decorated with `GenericNode::SpecificNode` notation. For example, if we have a `Car` node and a `RaceCar` node, we will refer to the `RaceCar` as `Car::RaceCar`.
 
+.. mdinclude:: ../modules/_cartography-metadata/schema.md
 .. mdinclude:: ../modules/aws/schema.md
 .. mdinclude:: ../modules/azure/schema.md
 .. mdinclude:: ../modules/crxcavator/schema.md

diff --git a/docs/root/usage/tutorial.md b/docs/root/usage/tutorial.md
@@ -2,24 +2,16 @@
 
 Once everything has been installed and synced, you can view the Neo4j web interface at http://localhost:7474. You can view the reference on this [here](https://neo4j.com/developer/guide-neo4j-browser/#_installing_and_starting_neo4j_browser).
 
-### Permalinking Bookmarklet
+If you already know Neo4j and just need to know what are the nodes, attributes, and graph relationships for our representation of infrastructure assets, you can view our [sample queries](samplequeries.html). More sample queries are available at https://github.com/marco-lancini/cartography-queries.
 
-You can set up a bookmarklet that lets you quickly get a permalink to a Cartography query. To do so, add a bookmark with the following contents as the URL - make sure to replace `neo4j.contoso.com:7474` with your instance of Neo4j:
+Otherwise, read on for this handhold-y tutorial filled with examples. Suppose we wanted to find out:
 
-```javascript
-javascript:(() => { const query = document.querySelectorAll('article label span')[0].innerText; if (query === ':server connect') { console.log('no query has been run!'); return; } const searchParams = new URLSearchParams(); searchParams.append('connectURL', 'bolt://neo4j:[email protected]:7687'); searchParams.append('cmd', 'edit'); searchParams.append('arg', query.replaceAll(/\r /g, '\r')); newURL = `http://neo4j.contoso.net:7474/browser/?${searchParams}`; window.open(newURL, '_blank', 'noopener'); })()
-```
-
-Then, any time you are in the web interface, you can click the bookmarklet to open a new tab with a permalink to your most recently executed query in the URL bar.
-
-### ℹ️ Already know [how to query Neo4j](https://neo4j.com/developer/cypher-query-language/)?  You can skip to our reference material!
-If you already know Neo4j and just need to know what are the nodes, attributes, and graph relationships for our representation of infrastructure assets, you can skip this handholdy walkthrough and see our [sample queries](samplequeries.md).
-
-### What [RDS](https://aws.amazon.com/rds/) instances are installed in my [AWS](https://aws.amazon.com/) accounts?
-```
+### What [RDS](https://aws.amazon.com/rds/) instances are installed in my AWS accounts?
+```cypher
 MATCH (aws:AWSAccount)-[r:RESOURCE]->(rds:RDSInstance)
 return *
 ```
+
 ![Visualization of RDS nodes and AWS nodes](../images/accountsandrds.png)
 
 In this query we asked Neo4j to find all `[:RESOURCE]` relationships from AWSAccounts to RDSInstances, and return the nodes and the `:RESOURCE` relationships.
@@ -35,7 +27,7 @@ and then pick options on the menu that shows up at the bottom of the view like t
 
 
 ### Which RDS instances have [encryption](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html) turned off?
-```
+```cypher
 MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance{storage_encrypted:false})
 RETURN a.name, rds.id
 ```
@@ -49,7 +41,7 @@ If you want to go back to viewing the graph and not a table, simply make sure yo
 Let's look at some other AWS assets now.
 
 ### Which [EC2](https://aws.amazon.com/ec2/) instances are directly exposed to the internet?
-```
+```cypher
 MATCH (instance:EC2Instance{exposed_internet: true})
 RETURN instance.instanceid, instance.publicdnsname
 ```
@@ -60,7 +52,7 @@ These instances are open to the internet either through permissive inbound IP pe
 If you know a lot about AWS, you may have noticed that EC2 instances [don't actually have an exposed_internet field](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_Instance.html). We're able to query for this because Cartography performs some [data enrichment](#data-enrichment) to add this field to EC2Instance nodes.
 
 ### Which [S3](https://aws.amazon.com/s3/) buckets have a policy granting any level of anonymous access to the bucket?
-```
+```cypher
 MATCH (s:S3Bucket)
 WHERE s.anonymous_access = true
 RETURN s
@@ -76,13 +68,81 @@ A couple of other things to notice: instead of using the "{}" notation to filter
 
 Let's go back to analyzing RDS instances. In an earlier example we queried for RDS instances that have encryption turned off. We can aggregate this data by AWSAccount with a small change:
 
-```
+```cypher
 MATCH (a:AWSAccount)-[:RESOURCE]->(rds:RDSInstance)
 WHERE rds.storage_encrypted = false
 RETURN a.name as AWSAccount, count(rds) as UnencryptedInstances
 ```
 ![Table of unencrypted RDS instances by AWS account](../images/unencryptedcounts.png)
 
+
+### Given a node label, what other node labels can be connected to it?
+
+Suppose we wanted to know what other assets can be connected to a DNSRecord. We would ask the graph like this:
+
+```cypher
+match (d:DNSRecord)--(n)
+return distinct labels(n);
+```
+
+This says "what are the possible labels for all nodes connected to all DNSRecord nodes `d` in my graph?" Your answer might look like this:
+
+```
+["AWSDNSRecord", "DNSRecord"]
+["AWSDNSZone", "DNSZone"]
+["LoadBalancerV2"]
+["NameServer"]
+["ESDomain"]
+["LoadBalancer"]
+["EC2Instance", "Instance"]
+```
+
+You can then make the path more specific like this:
+
+```cypher
+match (d:DNSRecord)--(:EC2Instance)--(n)
+return distinct labels(n);
+```
+
+And then you can continue building your query.
+
+We also include [full schema docs](schema.html), but this way of building a query can be faster and more interactive.
+
+
+### Given a node label, what are the possible property names defined on it?
+
+We can find what properties are available on an S3Bucket like this:
+
+```cypher
+match (n:S3Bucket) return properties(n) limit 1;
+```
+
+The result will look like this:
+
+```
+{
+  "bucket_key_enabled": false,
+  "creationdate": "2022-05-10 00:22:52+00:00",
+  "ignore_public_acls": true,
+  "anonymous_access": false,
+  "firstseen": 1652400141863,
+  "block_public_policy": true,
+  "versioning_status": "Enabled",
+  "block_public_acls": true,
+  "anonymous_actions": [],
+  "name": "my-fake-bucket-123",
+  "lastupdated": 1688605272,
+  "encryption_algorithm": "AES256",
+  "default_encryption": true,
+  "id": "my-fake-bucket-123",
+  "arn": "arn:aws:s3:::my-fake-bucket-123",
+  "restrict_public_buckets": false
+}
+```
+
+Our [full schema docs](schema.html) describe all possible fields, but listing out properties this way lets you avoid switching between browser tabs.
+
+
 ### Learning more
 If you want to learn more in depth about Neo4j and Cypher queries you can look at [this tutorial](https://neo4j.com/developer/cypher-query-language/) and see this [reference card](https://neo4j.com/docs/cypher-refcard/current/).
 
@@ -117,3 +177,14 @@ You can add your own custom attributes and relationships without writing Python
 
 ### Mapping AWS Access Permissions
 Cartography can map permissions between IAM Principals and resources in the graph. Here's [how](../modules/aws/permissions-mapping.html).
+
+
+### Permalinking Bookmarklet
+
+You can set up a bookmarklet that lets you quickly get a permalink to a Cartography query. To do so, add a bookmark with the following contents as the URL - make sure to replace `neo4j.contoso.com:7474` with your instance of Neo4j:
+
+```javascript
+javascript:(() => { const query = document.querySelectorAll('article label span')[0].innerText; if (query === ':server connect') { console.log('no query has been run!'); return; } const searchParams = new URLSearchParams(); searchParams.append('connectURL', 'bolt://neo4j:[email protected]:7687'); searchParams.append('cmd', 'edit'); searchParams.append('arg', query.replaceAll(/\r /g, '\r')); newURL = `http://neo4j.contoso.net:7474/browser/?${searchParams}`; window.open(newURL, '_blank', 'noopener'); })()
+```
+
+Then, any time you are in the web interface, you can click the bookmarklet to open a new tab with a permalink to your most recently executed query in the URL bar.
diff --git a/docs/schema/syncmetadata.md b/docs/schema/syncmetadata.md
@@ -1,17 +1 @@
-## SyncMetadata
-
-SyncMetadata nodes are created by sync jobs to convey information about the job itself. See this doc for how this is
-used.
-
-## SyncMetadata:ModuleSyncMetadata
-
-This is a node to represent some metadata about the sync job of a particular module or sub-module. Its existence should suggest that a paritcular sync job did happen.
-The 'types' used here should be actual node labels. For example, if we did sync a particular AWSAccount's S3Buckets,
-the `grouptype` is 'AWSAccount', the `groupid` is the particular account's `id`, and the `syncedtype` is 'S3Bucket'.
-
-| Field | Description | Source|
-|-------|-------------|------|
-|**id**|`{group_type}_{group_id}_{synced_type}`|util.py|
-|grouptype| The parent module's type |util.py|
-|groupid|The parent module's id|util.py|
-|syncedtype|The sub-module's type|util.py|
+This document has been moved [here](https://lyft.github.io/cartography/modules/_cartography-metadata/schema.html)
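
The `id` format in the ModuleSyncMetadata table, `{group_type}_{group_id}_{synced_type}`, can be illustrated with a minimal Python sketch. This is not the code in Cartography's `util.py`: the helper name `module_sync_metadata_id` and the account id `123456789012` are assumptions made up for illustration, and only the underscore-joined format comes from the table.

```python
def module_sync_metadata_id(group_type: str, group_id: str, synced_type: str) -> str:
    """Hypothetical helper (not Cartography's API).

    Joins the three fields with underscores, following the
    `{group_type}_{group_id}_{synced_type}` format from the schema table.
    """
    return f"{group_type}_{group_id}_{synced_type}"


# Example from the docs: syncing a particular AWSAccount's S3Buckets.
# The account id below is a made-up placeholder.
example_id = module_sync_metadata_id("AWSAccount", "123456789012", "S3Bucket")
print(example_id)
```

Under these assumptions, an id built this way identifies one (group type, group id, synced type) combination, which matches the doc's statement that the node's existence indicates a particular sync job ran.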