Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Github "Immediate" User Team Membership #1395

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

danbrauer
Copy link
Contributor

Summary

This PR adds to the Github graph, adding user membership in teams, for users who are 'immediate' members of a team.

In case it is unclear or for people newer to Github, note: this is focusing on 'immediate' membership to a team, meaning a member is in the team directly as opposed to being in a child team. A user could be considered a member of a team if they are members of a child team, but this PR maps only 'immediate' membership. (In a follow-up PR we'd like to add child teams to the graph, which we think will complete the membership picture.)

We think this is a valuable addition to the graph because our broad intent is to understand all access a user has, and (at least in our org) most access to repos is granted via team. If we do not know who is in the team, then, we do not know who has access.

Illustration of the intention

Cartography AMPS User Direct Team Membership

Screencaps

EXAMPLE USER LOOKUP

BEFORE
(empty result because nothing exists)
Screenshot 2024-12-03 at 5 07 00 PM

AFTER
Screenshot 2024-12-03 at 5 06 04 PM

OVERVIEW OF COUNTS OF EACH TYPE

BEFORE
(empty result because nothing exists)
Screenshot 2024-12-03 at 5 09 03 PM

AFTER
Screenshot 2024-12-03 at 5 08 44 PM

Related issues or links

None

Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

  • Update/add unit or integration tests.
  • Include a screenshot showing what the graph looked like before and after your changes.
  • Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

N/A If you are implementing a new intel module:

@danbrauer danbrauer force-pushed the gh_user_team_membership branch 2 times, most recently from e67c640 to b5952af Compare December 5, 2024 18:59
@danbrauer danbrauer mentioned this pull request Dec 6, 2024
5 tasks
@danbrauer danbrauer force-pushed the gh_user_team_membership branch 2 times, most recently from 06e9627 to 23487cd Compare December 6, 2024 21:38
Copy link
Contributor

@achantavy achantavy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

I will open a task to support cleanup jobs for node schemas that aren't attached to a tenant (theres a slack thread on this somewhere).

Right now this code does not support users moving between teams because there is no cleanup job defined for GitHub team-to-user relationships.

Give me a little bit and I'll try to add that feature before the next release.

For now, can you let me know your thoughts on my non-blocking comments?

@@ -35,6 +38,9 @@ def get_teams(org: str, api_url: str, token: str) -> Tuple[PaginatedGraphqlData,
repositories(first: 100) {
totalCount
}
members(first: 100, membership: IMMEDIATE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] I see that we're using this graphql query to return the total count of members for an org-team. Just wondering, what does first: 100 mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was required, even though it didn't seem necessary. I could've sworn in other queries elsewhere I got errors without it, tho it seems like here for this it isn't needed. I will remove it tmmw.

(Oh also, I was following the pattern in the line above, where repositories does this, specifying first: 100 even tho it's only looking for a count. I could see if that can be cleaned up as well.)

)
if current_try == max_tries:
raise RuntimeError(f"GitHub returned a None member url for team {team_name}, retries exhausted.")
sleep(current_try ** 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nonblock] Could be good to log that a sleep occurred and for how long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again I was copying the pre-existing function in this file, _get_team_repos_for_multiple_teams. I can add a log in both, so they're consistent.

@@ -142,10 +148,106 @@ def _get_team_repos(org: str, api_url: str, token: str, team: str) -> PaginatedG
return team_repos


def _get_team_users_for_multiple_teams(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nonblock] Consider using the backoff decorator instead of writing the retry logic in here. You'd save on one level of nesting to make it a bit cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I was copying the existing pattern for the pre-existing function in this file, _get_team_repos_for_multiple_teams. So I can look into how the decorator works and see if I understand how to apply it, but then I'd want to do the same in the other function, so there isn't a confusing difference between two very similar funcs.

@danbrauer
Copy link
Contributor Author

I took a quick look and replied. I can try to do the changes tmmw morning. Two of them seems clear (cleaning up the graphql; adding a log statement). The other one, about using a backoff decorator, is less clear but I am happy to look into it when back at my desk. Thank you!

@danbrauer
Copy link
Contributor Author

I pushed the two easy updates. I will look at the third, doing a 'backoff', in a bit.

@danbrauer
Copy link
Contributor Author

danbrauer commented Dec 10, 2024

Using a backoff decorator seems simple enough on its own, but I am encountering some typing issues around the decorated function, and then will have to understand how to update existing testing to ensure the retries act as desired / decorator is wired up right. I have to step away now but can come back to it later today.

logger.warning(
"Backing off {wait:0.1f} seconds after {tries} tries. Calling function {target} for team {team_name}"
.format(**updated_details),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure how valuable this was over just using the backoff_handler over in util.py. If you think it's not worth it, I can get rid of it.

Comment on lines +102 to +104
retries_with_backoff(get_teams_repos_inner_func, TypeError, 5, backoff_handler)(
org=org, api_url=api_url, token=token, team_name=team_name,
repo_urls=repo_urls, repo_permissions=repo_permissions,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I had to wrap the inner func in another func to work through some mypy typechecks. (Also side benefit is it allows retries_with_backoff to be a reusable thing.)
  2. I used kwargs instead of args to make the backoff_handler able to take some info out of the args easily, for more specific logging.

@@ -123,8 +123,8 @@ def test_get_team_users_happy_path(mock_get_team_users):


@patch('cartography.intel.github.teams._get_team_repos')
@patch('cartography.intel.github.teams.sleep')
def test_get_team_repos_github_returns_none(mock_sleep, mock_get_team_repos):
@patch('cartography.intel.github.teams.backoff_handler', spec=True)
Copy link
Contributor Author

@danbrauer danbrauer Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't need to replace 'sleep' with 'backoff_handler', we could've had a fine test without it. But, it did give a nice extra affirmation that the decorator is wired in correctly, because we can tell that it does its backoff handling the expected number of times.

The spec=True was needed, if I understand right, to fool the decorator into thinking it had a backoff handler func it could use. Without this, with just a default MagicMock, it wasn't being invoked. With this, it was being invoked exactly as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, I ended up not including it because it felt redundant, but, I did write a test where I returned two bad None responses and then 1 good response, and validated that 2 retries happened and then the thing succeeded.)

@@ -134,7 +134,7 @@ def test_get_team_repos_github_returns_none(mock_sleep, mock_get_team_repos):
mock_get_team_repos.return_value = mock_team_repos

# Assert we raise an exception
with pytest.raises(RuntimeError):
with pytest.raises(TypeError):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeError here rather than RuntimeError because in the old code, it explicitly threw a RuntimeError, whereas in the new code, it throws whatever the underlying error is, which in this case is TypeError.

@@ -32,7 +47,10 @@ def get_teams(org: str, api_url: str, token: str) -> Tuple[PaginatedGraphqlData,
slug
url
description
repositories(first: 100) {
repositories {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing the 'first: 100' here because it was never necessary

@danbrauer
Copy link
Contributor Author

@achantavy This should be ready for review again. Happy to discuss the use of the decorator (or anything else) if you had something different in mind.

@danbrauer danbrauer force-pushed the gh_user_team_membership branch from 9796696 to 9234e51 Compare December 11, 2024 13:14
@achantavy
Copy link
Contributor

Spoke with Daniel, we are going to merge this in for now and it will be part of a release candidate. Will fast follow with a change to add automatic cleanup jobs.

@achantavy achantavy merged commit 5d5f856 into cartography-cncf:master Dec 11, 2024
7 checks passed
achantavy pushed a commit that referenced this pull request Dec 20, 2024
### NOTE: all the below still holds by the relationships between the
nodes has been changed from 'CHILD_TEAM' to 'MEMBER_OF_TEAM'

### Summary

This PR adds to the Github graph, adding child team members of teams.

This is very similar to, and can be considered a follow-up to the prior
PR #1395, which added 'immediate' user members of teams. Now this PR is
adding child-teams. Between these two, the graph can now answer the
question of who is a member of a team, either directly or via child
teams.

We think this is a valuable addition to the graph because our broad
intent is to understand all access a user has. Since access is
frequently granted to teams, we need the complete picture of team
membership.

#### Illustration of the intention
![Cartography AMPS Team Child Team
Membership](https://github.com/user-attachments/assets/9620f50b-310a-43d1-a15b-28fe8480bbc4)

#### Screencaps

**EXAMPLE ALL TEAMS LOOKUP**

BEFORE
(nothing is returned because the relationship does not exist)
![Screenshot 2024-12-06 at 11 23
54 AM](https://github.com/user-attachments/assets/cd2310cd-d2ff-4faf-876a-00e798cfe1e9)

AFTER
(details not visible here but hopefully this related a sense of the
relationships)
![Screenshot 2024-12-06 at 11 53
11 AM](https://github.com/user-attachments/assets/5dfdd96f-526c-4299-9c1e-464f84d6e9d9)

**EXAMPLE SINGLE TEAM LOOKUP**

BEFORE
(nothing is returned because the relationship does not exist)
![Screenshot 2024-12-06 at 12 44
55 PM](https://github.com/user-attachments/assets/5dc0419b-ac4d-4038-9005-e2b41703563a)

AFTER
![Screenshot 2024-12-06 at 12 44
32 PM](https://github.com/user-attachments/assets/926776e2-c497-4c10-a80a-05cf54de532b)

#### Note on loops / cyclic graphs

GitHub does not seem to allow the creation of loops, eg `Team A <-- Team
B <-- Team A`, so this PR has no handling or guarding against that sort
of thing.



### Related issues or links

None.


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [x] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**Not applicable** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Daniel Brauer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants