first pass at identifying enterprise owner users (#1378)
### Preamble

This PR is a copy of #1373, identical in content to that PR as of the
time of this new PR's creation. (That PR will now appear empty because
of some cleanup I did on my fork... and some learning I got on how forks
work. 😄). That PR does have some relevant discussion, which I suppose we
will continue here. Apologies for the noise, it'll hopefully be avoided
in future PRs.

### Summary

This PR adds to the GitHub graph, marking users as [enterprise
owners](https://docs.github.com/en/enterprise-cloud@latest/admin/managing-accounts-and-repositories/managing-users-in-your-enterprise/roles-in-an-enterprise#enterprise-owners).

We think this is a valuable addition to the graph in general, because
these users are not all necessarily visible in the graph at the moment
but have broad access. Less generally (but still maybe relevant to
others) our analysts at Etsy need to review these users as part of our
UAR (User Access Review) process, which we hope Cartography will
eventually help to power.

We wanted to do this in a light-touch way, without breaking existing
relationships or removing properties. We also wanted to follow how
similar properties are graphed on the user node: org ownership, for
example, is noted by the 'user.role' property; similarly, the
'user.is_site_admin' property notes whether a user is a site admin. To
that end, we did the following:

1. add an 'is_enterprise_owner' property to all user nodes
2. add a new type of user-org relationship: 'UNAFFILIATED'. The
[terminology](https://docs.github.com/en/graphql/reference/enums#roleinorganization)
is GitHub's, and it is used for enterprise owners who are not also
members of the graphed organization.
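
The two changes above can be sketched with toy data. This is only an illustration, not the code in this PR: the `classify` helper and the sample logins are hypothetical, and only the `is_enterprise_owner` flag, the `MEMBER_OF` and `UNAFFILIATED` relationship names, and GitHub's `UNAFFILIATED` role value come from the description above.

```python
from typing import Dict, List, Tuple


def classify(members: List[str], enterprise_owners: Dict[str, str]) -> List[Tuple[str, str, bool]]:
    """
    Hypothetical helper: return (login, relationship, is_enterprise_owner) per user.
    `enterprise_owners` maps login -> organizationRole as reported by GitHub.
    """
    rows = []
    # Org members keep their MEMBER_OF relationship; only the flag is new.
    for login in members:
        rows.append((login, 'MEMBER_OF', login in enterprise_owners))
    # Enterprise owners who were not returned as members are added separately.
    for login, role in enterprise_owners.items():
        if login in members:
            continue
        relationship = 'UNAFFILIATED' if role == 'UNAFFILIATED' else 'MEMBER_OF'
        rows.append((login, relationship, True))
    return rows


rows = classify(
    members=['alice', 'bob'],
    enterprise_owners={'bob': 'OWNER', 'carol': 'UNAFFILIATED'},
)
for row in rows:
    print(row)
```

Here `carol` is the case that was previously invisible in the graph: an enterprise owner with no membership in the graphed organization.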

Here is an illustration of before/after (I will also add some screencaps
below but thought the high-level illustration might help):
![Cartography AMPS User Owns Enterprise
(1)](https://github.com/user-attachments/assets/dc943ab5-2a95-4f76-a39a-6b9f6262169b)

### Other notes on the PR

1. I refactored the integration tests, taking cues from how the testing
for Github teams was done by testing the 'sync' function as a whole
instead of just the 'load' function.
1. In general I tried to do things in keeping with the style I saw
around me. I am happy to change anything.
1. In our Slack conversation, it was mentioned PRs should use the new
models. I’d already written this when I read that, but when I looked, I
saw there are no models for this. Is that okay? Should they be added
and, if so, could it be in a separate PR or must it be here?
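
To illustrate the first note, here is a minimal sketch of testing a `sync` function as a whole rather than only its `load` stage. `ToyUsersModule` and everything in it are hypothetical stand-ins, not Cartography code; the only idea taken from this PR is to replace the network-facing stage and let the remaining stages run for real.

```python
from typing import Dict, List
from unittest.mock import patch


class ToyUsersModule:
    """Hypothetical stand-in for an intel module; not Cartography code."""

    @staticmethod
    def get(org: str) -> List[Dict]:
        raise RuntimeError('a real get() would call the GitHub API')

    @staticmethod
    def transform(raw: List[Dict], org: str) -> List[Dict]:
        return [{'login': r['node']['login'], 'org': org} for r in raw]

    @staticmethod
    def load(graph: List[Dict], users: List[Dict]) -> None:
        graph.extend(users)

    @classmethod
    def sync(cls, graph: List[Dict], org: str) -> None:
        cls.load(graph, cls.transform(cls.get(org), org))


def run_sync_with_fake_api() -> List[Dict]:
    graph: List[Dict] = []  # stands in for the Neo4j session
    fake_response = [{'node': {'login': 'alice'}}]
    # Only get() is replaced; transform() and load() are exercised for real.
    with patch.object(ToyUsersModule, 'get', return_value=fake_response):
        ToyUsersModule.sync(graph, 'example-org')
    return graph


result = run_sync_with_fake_api()
assert result == [{'login': 'alice', 'org': 'example-org'}]
```

Testing at this level means a regression in the transform step would fail the test, which a `load`-only test would not catch.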

### Related issues or links

None.

### Screencaps

_(I could get other screencaps... if anything would be helpful, please
let me know.)_

In this case it was helpful that we had an enterprise owner who was also a
user in one of our orgs, but not another. I highlighted them
specifically in a query here, showing both the new property and
relationship type.
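
The exact query used for the screencaps is not reproduced here. As a rough sketch, a query along the following lines would surface the same information. The node labels, relationship types, and the `is_enterprise_owner` and `username` properties are taken from this PR; the query text itself and the `enterprise_owner_rows` helper are assumptions.

```python
from typing import Any, Dict, List

# Assumed query shape; only the labels, relationship types, and property
# names come from this PR.
ENTERPRISE_OWNER_QUERY = """
MATCH (u:GitHubUser)-[r:MEMBER_OF|UNAFFILIATED]->(o:GitHubOrganization)
WHERE u.is_enterprise_owner = true AND u.username = $username
RETURN u.username AS user, type(r) AS relationship, o.username AS org
"""


def enterprise_owner_rows(session: Any, username: str) -> List[Dict[str, Any]]:
    """Run the query on a neo4j.Session-like object and return plain dicts."""
    result = session.run(ENTERPRISE_OWNER_QUERY, username=username)
    return [dict(record) for record in result]
```

For a user like the one highlighted below, this would be expected to return one row per organization, with `MEMBER_OF` for the org they belong to and `UNAFFILIATED` for the other.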

**Before**
![User Org
Before](https://github.com/user-attachments/assets/f21bb2a9-8d3e-45ed-bc7b-112b21bd304a)

**After**
![User Org
After](https://github.com/user-attachments/assets/637eb10d-20f5-4aa6-9c3d-626b480b3014)




### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [X] Update/add unit or integration tests.
- [X] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [X] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).
**NOTE: I updated the schema but not the README, which seemed
out of date, did not already include GitHub, and suggested using a
JavaScript dependency to update it. Please advise if this needs an
update.** 😄

If you are implementing a new intel module:
- **N/A** [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Daniel Brauer <[email protected]>
danbrauer authored Nov 19, 2024
1 parent 3ac1727 commit 13ac0a6
Showing 10 changed files with 513 additions and 101 deletions.
@@ -18,6 +18,11 @@
"query": "MATCH (:GitHubUser)-[r:MEMBER_OF]->(:GitHubOrganization) WHERE r.lastupdated <> $UPDATE_TAG WITH r LIMIT $LIMIT_SIZE DELETE (r)",
"iterative": true,
"iterationsize": 100
},
{
"query": "MATCH (:GitHubUser)-[r:UNAFFILIATED]->(:GitHubOrganization) WHERE r.lastupdated <> $UPDATE_TAG WITH r LIMIT $LIMIT_SIZE DELETE (r)",
"iterative": true,
"iterationsize": 100
}],
"name": "cleanup GitHub users data"
}
195 changes: 156 additions & 39 deletions cartography/intel/github/users.py
@@ -1,12 +1,17 @@
import logging
from copy import deepcopy
from typing import Any
from typing import Dict
from typing import List
from typing import Tuple

import neo4j

from cartography.client.core.tx import load
from cartography.intel.github.util import fetch_all
from cartography.models.github.orgs import GitHubOrganizationSchema
from cartography.models.github.users import GitHubOrganizationUserSchema
from cartography.models.github.users import GitHubUnaffiliatedUserSchema
from cartography.stats import get_stats_client
from cartography.util import merge_module_sync_metadata
from cartography.util import run_cleanup_job
@@ -44,17 +49,46 @@
}
"""

GITHUB_ENTERPRISE_OWNER_USERS_PAGINATED_GRAPHQL = """
    query($login: String!, $cursor: String) {
    organization(login: $login)
        {
            url
            login
            enterpriseOwners(first:100, after: $cursor){
                edges {
                    node {
                        url
                        login
                        name
                        isSiteAdmin
                        email
                        company
                    }
                    organizationRole
                }
                pageInfo{
                    endCursor
                    hasNextPage
                }
            }
        }
    }
    """


@timeit
def get(token: str, api_url: str, organization: str) -> Tuple[List[Dict], Dict]:
def get_users(token: str, api_url: str, organization: str) -> Tuple[List[Dict], Dict]:
"""
Retrieve a list of users from the given GitHub organization as described in
https://docs.github.com/en/graphql/reference/objects#organizationmemberedge.
:param token: The Github API token as string.
:param api_url: The Github v4 API endpoint as string.
:param organization: The name of the target Github organization as string.
:return: A 2-tuple containing 1. a list of dicts representing users - see tests.data.github.users.GITHUB_USER_DATA
for shape, and 2. data on the owning GitHub organization - see tests.data.github.users.GITHUB_ORG_DATA for shape.
:return: A 2-tuple containing
1. a list of dicts representing users and
2. data on the owning GitHub organization
see tests.data.github.users.GITHUB_USER_DATA for shape of both
"""
users, org = fetch_all(
token,
@@ -66,56 +100,139 @@ def get(token: str, api_url: str, organization: str) -> Tuple[List[Dict], Dict]:
return users.edges, org


def get_enterprise_owners(token: str, api_url: str, organization: str) -> Tuple[List[Dict], Dict]:
    """
    Retrieve a list of enterprise owners from the given GitHub organization as described in
    https://docs.github.com/en/graphql/reference/objects#organizationenterpriseowneredge.
    :param token: The Github API token as string.
    :param api_url: The Github v4 API endpoint as string.
    :param organization: The name of the target Github organization as string.
    :return: A 2-tuple containing
        1. a list of dicts representing users who are enterprise owners
        2. data on the owning GitHub organization
        see tests.data.github.users.GITHUB_ENTERPRISE_OWNER_DATA for shape
    """
    owners, org = fetch_all(
        token,
        api_url,
        organization,
        GITHUB_ENTERPRISE_OWNER_USERS_PAGINATED_GRAPHQL,
        'enterpriseOwners',
    )
    return owners.edges, org


@timeit
def load_organization_users(
neo4j_session: neo4j.Session, user_data: List[Dict], org_data: Dict,
def transform_users(user_data: List[Dict], owners_data: List[Dict], org_data: Dict) -> Tuple[List[Dict], List[Dict]]:
    """
    Taking raw user and owner data, return two lists of processed user data:
    * organization users aka affiliated users (users directly affiliated with an organization)
    * unaffiliated users (users who, for example, are enterprise owners but not members of the target organization).
    :param user_data: Raw user data, as returned by get_users.
    :param owners_data: Raw enterprise owner data, as returned by get_enterprise_owners.
    :param org_data: Data on the owning GitHub organization.
    :return: A 2-tuple containing
        1. a list of dicts representing users who are affiliated with the target org
           see tests.data.github.users.GITHUB_USER_DATA for shape
        2. a list of dicts representing users who are not affiliated (e.g. enterprise owners who are not also in
           the target org); see tests.data.github.users.GITHUB_ENTERPRISE_OWNER_DATA for shape
    """

    users_dict = {}
    for user in user_data:
        processed_user = deepcopy(user['node'])
        processed_user['role'] = user['role']
        processed_user['hasTwoFactorEnabled'] = user['hasTwoFactorEnabled']
        processed_user['MEMBER_OF'] = org_data['url']
        users_dict[processed_user['url']] = processed_user

    owners_dict = {}
    for owner in owners_data:
        processed_owner = deepcopy(owner['node'])
        processed_owner['isEnterpriseOwner'] = True
        if owner['organizationRole'] == 'UNAFFILIATED':
            processed_owner['UNAFFILIATED'] = org_data['url']
        else:
            processed_owner['MEMBER_OF'] = org_data['url']
        owners_dict[processed_owner['url']] = processed_owner

    affiliated_users = []  # users affiliated with the target org
    for url, user in users_dict.items():
        user['isEnterpriseOwner'] = url in owners_dict
        affiliated_users.append(user)

    unaffiliated_users = []  # users not affiliated with the target org
    for url, owner in owners_dict.items():
        if url not in users_dict:
            unaffiliated_users.append(owner)

    return affiliated_users, unaffiliated_users


@timeit
def load_users(
neo4j_session: neo4j.Session,
node_schema: GitHubOrganizationUserSchema | GitHubUnaffiliatedUserSchema,
user_data: List[Dict],
org_data: Dict,
update_tag: int,
) -> None:
query = """
MERGE (org:GitHubOrganization{id: $OrgUrl})
ON CREATE SET org.firstseen = timestamp()
SET org.username = $OrgLogin,
org.lastupdated = $UpdateTag
WITH org
UNWIND $UserData as user
MERGE (u:GitHubUser{id: user.node.url})
ON CREATE SET u.firstseen = timestamp()
SET u.fullname = user.node.name,
u.username = user.node.login,
u.has_2fa_enabled = user.hasTwoFactorEnabled,
u.role = user.role,
u.is_site_admin = user.node.isSiteAdmin,
u.email = user.node.email,
u.company = user.node.company,
u.lastupdated = $UpdateTag
MERGE (u)-[r:MEMBER_OF]->(org)
ON CREATE SET r.firstseen = timestamp()
SET r.lastupdated = $UpdateTag
"""
neo4j_session.run(
query,
OrgUrl=org_data['url'],
OrgLogin=org_data['login'],
UserData=user_data,
UpdateTag=update_tag,
logger.info(f"Loading {len(user_data)} GitHub users to the graph")
load(
neo4j_session,
node_schema,
user_data,
lastupdated=update_tag,
org_url=org_data['url'],
)


@timeit
def load_organization(
    neo4j_session: neo4j.Session,
    node_schema: GitHubOrganizationSchema,
    org_data: List[Dict[str, Any]],
    update_tag: int,
) -> None:
    logger.info(f"Loading {len(org_data)} GitHub organization to the graph")
    load(
        neo4j_session,
        node_schema,
        org_data,
        lastupdated=update_tag,
    )


@timeit
def sync(
neo4j_session: neo4j.Session,
common_job_parameters: Dict[str, Any],
common_job_parameters: Dict,
github_api_key: str,
github_url: str,
organization: str,
) -> None:
logger.info("Syncing GitHub users")
user_data, org_data = get(github_api_key, github_url, organization)
load_organization_users(neo4j_session, user_data, org_data, common_job_parameters['UPDATE_TAG'])
run_cleanup_job('github_users_cleanup.json', neo4j_session, common_job_parameters)
user_data, org_data = get_users(github_api_key, github_url, organization)
owners_data, org_data = get_enterprise_owners(github_api_key, github_url, organization)
processed_affiliated_user_data, processed_unaffiliated_user_data = (
transform_users(user_data, owners_data, org_data)
)
load_organization(
neo4j_session, GitHubOrganizationSchema(), [org_data],
common_job_parameters['UPDATE_TAG'],
)
load_users(
neo4j_session, GitHubOrganizationUserSchema(), processed_affiliated_user_data, org_data,
common_job_parameters['UPDATE_TAG'],
)
load_users(
neo4j_session, GitHubUnaffiliatedUserSchema(), processed_unaffiliated_user_data, org_data,
common_job_parameters['UPDATE_TAG'],
)
# no automated cleanup job for users because user node has no sub_resource_relationship
run_cleanup_job('github_org_and_users_cleanup.json', neo4j_session, common_job_parameters)
merge_module_sync_metadata(
neo4j_session,
group_type='GitHubOrganization',
26 changes: 26 additions & 0 deletions cartography/models/github/orgs.py
@@ -0,0 +1,26 @@
"""
This schema does not handle the org's relationships. Those are handled by other schemas, for example:
* GitHubTeamSchema defines (GitHubOrganization)-[RESOURCE]->(GitHubTeam)
* GitHubUserSchema defines (GitHubUser)-[MEMBER_OF|UNAFFILIATED]->(GitHubOrganization)
(There may be others, these are just two examples.)
"""
from dataclasses import dataclass

from cartography.models.core.common import PropertyRef
from cartography.models.core.nodes import CartographyNodeProperties
from cartography.models.core.nodes import CartographyNodeSchema


@dataclass(frozen=True)
class GitHubOrganizationNodeProperties(CartographyNodeProperties):
    id: PropertyRef = PropertyRef('url')
    username: PropertyRef = PropertyRef('login', extra_index=True)
    lastupdated: PropertyRef = PropertyRef('lastupdated', set_in_kwargs=True)


@dataclass(frozen=True)
class GitHubOrganizationSchema(CartographyNodeSchema):
    label: str = 'GitHubOrganization'
    properties: GitHubOrganizationNodeProperties = GitHubOrganizationNodeProperties()
    other_relationships = None
    sub_resource_relationship = None