Github "Immediate" User Team Membership (#1395)
### Summary

This PR extends the GitHub graph with user-to-team membership, covering
users who are 'immediate' members of a team.

For anyone newer to GitHub:
['immediate'](https://docs.github.com/en/graphql/reference/enums#teammembershiptype)
membership means a user belongs to the team directly, as opposed to
belonging only to one of its child teams. A user could also be considered a
member of a team if they are a member of a child team, but this PR maps only
'immediate' membership. (In a follow-up PR we'd like to add child teams
to the graph, which we think will complete the membership picture.)
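
The distinction between immediate and inherited membership can be sketched with a toy model. This is only an illustration: the `Team` class, the helper functions, and the team and user names are all hypothetical and are not part of Cartography or the GitHub API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Team:
    """Toy stand-in for a GitHub team (hypothetical, for illustration only)."""
    slug: str
    members: Set[str] = field(default_factory=set)         # users added directly
    children: List["Team"] = field(default_factory=list)   # child teams


def immediate_members(team: Team) -> Set[str]:
    # Roughly what `membership: IMMEDIATE` asks for: direct members only.
    return set(team.members)


def all_members(team: Team) -> Set[str]:
    # Direct members plus everyone reachable through child teams.
    result = set(team.members)
    for child in team.children:
        result |= all_members(child)
    return result


backend = Team("backend", {"carol"})
engineering = Team("engineering", {"alice", "bob"}, [backend])

print(sorted(immediate_members(engineering)))
print(sorted(all_members(engineering)))
```

In this sketch `carol` counts toward `engineering` only through the child team `backend`, so she is outside the scope of what this PR maps.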

We think this is a valuable addition to the graph because our broad
intent is to understand all access a user has, and (at least in our org)
most access to repos is granted via teams. If we do not know who is in
a team, we do not know who has access.

#### Illustration of the intention
![Cartography AMPS User Direct Team
Membership](https://github.com/user-attachments/assets/7d3d70ab-ab16-4a21-8970-3f9d2b8fe525)

#### Screencaps

**EXAMPLE USER LOOKUP**

BEFORE
(empty result because nothing exists)
![Screenshot 2024-12-03 at 5 07
00 PM](https://github.com/user-attachments/assets/908e8e96-179d-494a-ac7d-acc03ee54ab1)

AFTER
![Screenshot 2024-12-03 at 5 06
04 PM](https://github.com/user-attachments/assets/073376ee-b8d6-4b68-8d7e-246e9f9601a2)

**OVERVIEW OF COUNTS OF EACH TYPE**

BEFORE
(empty result because nothing exists)
![Screenshot 2024-12-03 at 5 09
03 PM](https://github.com/user-attachments/assets/c4758f8f-71ec-49b5-a94a-c3d464a9a51e)

AFTER
![Screenshot 2024-12-03 at 5 08
44 PM](https://github.com/user-attachments/assets/cc0be586-c1bf-440b-a8a0-6bc236272509)


### Related issues or links

None


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [x] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**N/A** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Daniel Brauer <[email protected]>
danbrauer authored Dec 11, 2024
1 parent 7feb218 commit 5d5f856
Showing 8 changed files with 495 additions and 48 deletions.
179 changes: 143 additions & 36 deletions cartography/intel/github/teams.py
@@ -1,6 +1,5 @@
import logging
from collections import namedtuple
from time import sleep
from typing import Any
from typing import Dict
from typing import List
@@ -13,11 +12,27 @@
from cartography.intel.github.util import fetch_all
from cartography.intel.github.util import PaginatedGraphqlData
from cartography.models.github.teams import GitHubTeamSchema
from cartography.util import retries_with_backoff
from cartography.util import timeit

logger = logging.getLogger(__name__)

# A team's permission on a repo: https://docs.github.com/en/graphql/reference/enums#repositorypermission
RepoPermission = namedtuple('RepoPermission', ['repo_url', 'permission'])
# A team member's role: https://docs.github.com/en/graphql/reference/enums#teammemberrole
UserRole = namedtuple('UserRole', ['user_url', 'role'])


def backoff_handler(details: Dict) -> None:
"""
Custom backoff handler for GitHub calls in this module.
"""
team_name = details['kwargs'].get('team_name') or 'not present in kwargs'
updated_details = {**details, 'team_name': team_name}
logger.warning(
"Backing off {wait:0.1f} seconds after {tries} tries. Calling function {target} for team {team_name}"
.format(**updated_details),
)


@timeit
@@ -32,7 +47,10 @@ def get_teams(org: str, api_url: str, token: str) -> Tuple[PaginatedGraphqlData,
slug
url
description
repositories(first: 100) {
repositories {
totalCount
}
members(membership: IMMEDIATE) {
totalCount
}
}
@@ -64,36 +82,27 @@ def _get_team_repos_for_multiple_teams(
result[team_name] = []
continue

repo_urls = []
repo_permissions = []

max_tries = 5
repo_urls: List[str] = []
repo_permissions: List[str] = []

for current_try in range(1, max_tries + 1):
def get_teams_repos_inner_func(
org: str, api_url: str, token: str, team_name: str,
repo_urls: List[str], repo_permissions: List[str],
) -> None:
logger.info(f"Loading team repos for {team_name}.")
team_repos = _get_team_repos(org, api_url, token, team_name)
# The `or []` is because `.nodes` can be None. See:
# https://docs.github.com/en/graphql/reference/objects#teamrepositoryconnection
for repo in team_repos.nodes or []:
repo_urls.append(repo['url'])
# The `or []` is because `.edges` can be None.
for edge in team_repos.edges or []:
repo_permissions.append(edge['permission'])

try:
# The `or []` is because `.nodes` can be None. See:
# https://docs.github.com/en/graphql/reference/objects#teamrepositoryconnection
for repo in team_repos.nodes or []:
repo_urls.append(repo['url'])

# The `or []` is because `.edges` can be None.
for edge in team_repos.edges or []:
repo_permissions.append(edge['permission'])
# We're done! Break out of the retry loop.
break

except TypeError:
# Handles issue #1334
logger.warning(
f"GitHub returned None when trying to find repo or permission data for team {team_name}.",
exc_info=True,
)
if current_try == max_tries:
raise RuntimeError(f"GitHub returned a None repo url for team {team_name}, retries exhausted.")
sleep(current_try ** 2)

retries_with_backoff(get_teams_repos_inner_func, TypeError, 5, backoff_handler)(
org=org, api_url=api_url, token=token, team_name=team_name,
repo_urls=repo_urls, repo_permissions=repo_permissions,
)
# Shape = [(repo_url, 'WRITE'), ...]
result[team_name] = [RepoPermission(url, perm) for url, perm in zip(repo_urls, repo_permissions)]
return result
@@ -142,10 +151,97 @@ def _get_team_repos(org: str, api_url: str, token: str, team: str) -> PaginatedG
return team_repos


def _get_team_users_for_multiple_teams(
team_raw_data: list[dict[str, Any]],
org: str,
api_url: str,
token: str,
) -> dict[str, list[UserRole]]:
result: dict[str, list[UserRole]] = {}
for team in team_raw_data:
team_name = team['slug']
user_count = team['members']['totalCount']

if user_count == 0:
# This team has no users so let's move on
result[team_name] = []
continue

user_urls: List[str] = []
user_roles: List[str] = []

def get_teams_users_inner_func(
org: str, api_url: str, token: str, team_name: str,
user_urls: List[str], user_roles: List[str],
) -> None:
logger.info(f"Loading team users for {team_name}.")
team_users = _get_team_users(org, api_url, token, team_name)
# The `or []` is because `.nodes` can be None. See:
# https://docs.github.com/en/graphql/reference/objects#teammemberconnection
for user in team_users.nodes or []:
user_urls.append(user['url'])
# The `or []` is because `.edges` can be None.
for edge in team_users.edges or []:
user_roles.append(edge['role'])

retries_with_backoff(get_teams_users_inner_func, TypeError, 5, backoff_handler)(
org=org, api_url=api_url, token=token, team_name=team_name, user_urls=user_urls, user_roles=user_roles,
)

# Shape = [(user_url, 'MAINTAINER'), ...]
result[team_name] = [UserRole(url, role) for url, role in zip(user_urls, user_roles)]
return result


@timeit
def _get_team_users(org: str, api_url: str, token: str, team: str) -> PaginatedGraphqlData:
team_users_gql = """
query($login: String!, $team: String!, $cursor: String) {
organization(login: $login) {
url
login
team(slug: $team) {
slug
members(first: 100, after: $cursor, membership: IMMEDIATE) {
totalCount
nodes {
url
}
edges {
role
}
pageInfo {
endCursor
hasNextPage
}
}
}
}
rateLimit {
limit
cost
remaining
resetAt
}
}
"""
team_users, _ = fetch_all(
token,
api_url,
org,
team_users_gql,
'team',
resource_inner_type='members',
team=team,
)
return team_users


def transform_teams(
team_paginated_data: PaginatedGraphqlData,
org_data: Dict[str, Any],
team_repo_data: dict[str, list[RepoPermission]],
team_user_data: dict[str, list[UserRole]],
) -> list[dict[str, Any]]:
result = []
for team in team_paginated_data.nodes:
@@ -155,19 +251,29 @@ def transform_teams(
'url': team['url'],
'description': team['description'],
'repo_count': team['repositories']['totalCount'],
'member_count': team['members']['totalCount'],
'org_url': org_data['url'],
'org_login': org_data['login'],
}
repo_permissions = team_repo_data[team_name]
if not repo_permissions:
user_roles = team_user_data[team_name]

if not repo_permissions and not user_roles:
result.append(repo_info)
continue

# `permission` can be one of ADMIN, READ, WRITE, TRIAGE, or MAINTAIN
for repo_url, permission in repo_permissions:
repo_info_copy = repo_info.copy()
repo_info_copy[permission] = repo_url
result.append(repo_info_copy)
if repo_permissions:
# `permission` can be one of ADMIN, READ, WRITE, TRIAGE, or MAINTAIN
for repo_url, permission in repo_permissions:
repo_info_copy = repo_info.copy()
repo_info_copy[permission] = repo_url
result.append(repo_info_copy)
if user_roles:
# `role` can be one of MAINTAINER, MEMBER
for user_url, role in user_roles:
repo_info_copy = repo_info.copy()
repo_info_copy[role] = user_url
result.append(repo_info_copy)
return result


@@ -203,7 +309,8 @@ def sync_github_teams(
) -> None:
teams_paginated, org_data = get_teams(organization, github_url, github_api_key)
team_repos = _get_team_repos_for_multiple_teams(teams_paginated.nodes, organization, github_url, github_api_key)
processed_data = transform_teams(teams_paginated, org_data, team_repos)
team_users = _get_team_users_for_multiple_teams(teams_paginated.nodes, organization, github_url, github_api_key)
processed_data = transform_teams(teams_paginated, org_data, team_repos, team_users)
load_team_repos(neo4j_session, processed_data, common_job_parameters['UPDATE_TAG'], org_data['url'])
common_job_parameters['org_url'] = org_data['url']
cleanup(neo4j_session, common_job_parameters)
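
To make the row shape produced by the updated `transform_teams` concrete, here is a simplified standalone sketch of its per-team logic. `rows_for_team` is a hypothetical helper, not a function in this PR, and the team name and URLs are made up. It assumes, as the diff suggests, that each output row carries the team's base fields plus at most one relationship key (a repo permission or a user role).

```python
from collections import namedtuple
from typing import Any, Dict, List

RepoPermission = namedtuple('RepoPermission', ['repo_url', 'permission'])
UserRole = namedtuple('UserRole', ['user_url', 'role'])


def rows_for_team(
    team_info: Dict[str, Any],
    repo_permissions: List[RepoPermission],
    user_roles: List[UserRole],
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the loop body of transform_teams."""
    if not repo_permissions and not user_roles:
        # Emit the bare team row so a team with no repos or members still appears.
        return [team_info]
    rows: List[Dict[str, Any]] = []
    for repo_url, permission in repo_permissions:
        # e.g. {'name': ..., 'WRITE': repo_url}
        rows.append({**team_info, permission: repo_url})
    for user_url, role in user_roles:
        # e.g. {'name': ..., 'MAINTAINER': user_url}
        rows.append({**team_info, role: user_url})
    return rows


rows = rows_for_team(
    {'name': 'platform', 'repo_count': 1, 'member_count': 2},
    [RepoPermission('https://github.com/example-org/service', 'WRITE')],
    [
        UserRole('https://github.com/alice', 'MAINTAINER'),
        UserRole('https://github.com/bob', 'MEMBER'),
    ],
)
for row in rows:
    print(row)
```

One row per relationship appears to be what lets the `PropertyRef('MAINTAINER')` and `PropertyRef('MEMBER')` target matchers in the model pick out the user URL for each relationship type.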
29 changes: 29 additions & 0 deletions cartography/models/github/teams.py
@@ -80,6 +80,33 @@ class GitHubTeamWriteRepoRel(CartographyRelSchema):
properties: GitHubTeamToRepoRelProperties = GitHubTeamToRepoRelProperties()


@dataclass(frozen=True)
class GitHubTeamToUserRelProperties(CartographyRelProperties):
lastupdated: PropertyRef = PropertyRef('lastupdated', set_in_kwargs=True)


@dataclass(frozen=True)
class GitHubTeamMaintainerUserRel(CartographyRelSchema):
target_node_label: str = 'GitHubUser'
target_node_matcher: TargetNodeMatcher = make_target_node_matcher(
{'id': PropertyRef('MAINTAINER')},
)
direction: LinkDirection = LinkDirection.INWARD
rel_label: str = "MAINTAINER"
properties: GitHubTeamToUserRelProperties = GitHubTeamToUserRelProperties()


@dataclass(frozen=True)
class GitHubTeamMemberUserRel(CartographyRelSchema):
target_node_label: str = 'GitHubUser'
target_node_matcher: TargetNodeMatcher = make_target_node_matcher(
{'id': PropertyRef('MEMBER')},
)
direction: LinkDirection = LinkDirection.INWARD
rel_label: str = "MEMBER"
properties: GitHubTeamToUserRelProperties = GitHubTeamToUserRelProperties()


@dataclass(frozen=True)
class GitHubTeamToOrganizationRelProperties(CartographyRelProperties):
lastupdated: PropertyRef = PropertyRef('lastupdated', set_in_kwargs=True)
@@ -107,6 +134,8 @@ class GitHubTeamSchema(CartographyNodeSchema):
GitHubTeamReadRepoRel(),
GitHubTeamTriageRepoRel(),
GitHubTeamWriteRepoRel(),
GitHubTeamMaintainerUserRel(),
GitHubTeamMemberUserRel(),
],
)
sub_resource_relationship: GitHubTeamToOrganizationRel = GitHubTeamToOrganizationRel()
20 changes: 20 additions & 0 deletions cartography/util.py
@@ -15,6 +15,7 @@
from typing import List
from typing import Optional
from typing import Set
from typing import Type
from typing import TypeVar
from typing import Union

@@ -288,6 +289,25 @@ def inner_function(*args, **kwargs): # type: ignore
return cast(AWSGetFunc, inner_function)


def retries_with_backoff(
func: Callable,
exceptionType: Type[Exception], max_tries: int, on_backoff: Callable,
) -> Callable:
"""
Adds retry with backoff to the given function. (Could expand the possible input parameters as needed.)
"""
@wraps(func)
@backoff.on_exception(
backoff.expo,
exceptionType,
max_tries=max_tries,
on_backoff=on_backoff,
)
def inner_function(*args, **kwargs): # type: ignore
return func(*args, **kwargs)
return cast(Callable, inner_function)


def dict_value_to_str(obj: Dict, key: str) -> Optional[str]:
"""
Convert the value referenced by the key in the dict to a string, if it exists, and return it. If it doesn't exist,
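
For readers unfamiliar with the pattern behind the `retries_with_backoff` helper added to `cartography/util.py` above, the following is a rough stdlib-only sketch of retry with exponential backoff. It is not the PR's implementation, which delegates to `backoff.on_exception`. The name `retry_with_backoff_sketch`, the `base_delay` and `sleep` parameters, and the `flaky` function are hypothetical, and the sketch omits the jitter that the `backoff` library applies by default.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, List, Type


def retry_with_backoff_sketch(
    func: Callable,
    exception_type: Type[Exception],
    max_tries: int,
    on_backoff: Callable[[Dict[str, Any]], None],
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable:
    """Stdlib-only stand-in: retry `func` with exponentially growing waits."""
    @wraps(func)
    def inner(*args: Any, **kwargs: Any) -> Any:
        for tries in range(1, max_tries + 1):
            try:
                return func(*args, **kwargs)
            except exception_type:
                if tries == max_tries:
                    raise  # retries exhausted: surface the original exception
                wait = base_delay * 2 ** (tries - 1)
                # Hand the handler a details dict shaped like the one the
                # PR's backoff_handler reads (target, kwargs, tries, wait).
                on_backoff({
                    'target': func, 'args': args, 'kwargs': kwargs,
                    'tries': tries, 'wait': wait,
                })
                sleep(wait)
    return inner


attempts = {'count': 0}


def flaky(team_name: str) -> str:
    # Fails twice with TypeError, then succeeds.
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise TypeError('simulated None in API response')
    return f'ok:{team_name}'


events: List[Dict[str, Any]] = []
waits: List[float] = []
wrapped = retry_with_backoff_sketch(flaky, TypeError, 5, events.append, sleep=waits.append)
result = wrapped(team_name='platform')
print(result, waits)
```

Calling the wrapped function with keyword arguments matters in the PR as well: `backoff_handler` looks up `team_name` in `details['kwargs']`, so a positional call would log the team as 'not present in kwargs'.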
13 changes: 13 additions & 0 deletions docs/root/modules/github/schema.md
@@ -129,6 +129,12 @@ A GitHubTeam [organization object](https://docs.github.com/en/graphql/reference/
(GitHubOrganization)-[RESOURCE]->(GitHubTeam)
```
- GitHubUsers may be ['immediate'](https://docs.github.com/en/graphql/reference/enums#teammembershiptype) members of a team (as opposed to being members via membership in a child team), with their membership [role](https://docs.github.com/en/graphql/reference/enums#teammemberrole) being MEMBER or MAINTAINER.
```
(GitHubUser)-[MEMBER|MAINTAINER]->(GitHubTeam)
```
### GitHubUser
Representation of a single GitHubUser [user object](https://developer.github.com/v4/object/user/). This node contains minimal data for the GitHub User.
@@ -178,6 +184,13 @@ WRITE, MAINTAIN, TRIAGE, and READ ([Reference](https://docs.github.com/en/graphq
(GitHubUser)-[MEMBER_OF|UNAFFILIATED]->(GitHubOrganization)
```
- GitHubUsers may be ['immediate'](https://docs.github.com/en/graphql/reference/enums#teammembershiptype) members of a team (as opposed to being members via membership in a child team), with their membership [role](https://docs.github.com/en/graphql/reference/enums#teammemberrole) being MEMBER or MAINTAINER.
```
(GitHubUser)-[MEMBER|MAINTAINER]->(GitHubTeam)
```
### GitHubBranch
Representation of a single GitHubBranch [ref object](https://developer.github.com/v4/object/ref). This node contains minimal data for a repository branch.