-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Github "Immediate" User Team Membership #1395
Github "Immediate" User Team Membership #1395
Conversation
e67c640
to
b5952af
Compare
06e9627
to
23487cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
I will open a task to support cleanup jobs for node schemas that aren't attached to a tenant (theres a slack thread on this somewhere).
Right now this code does not support users moving between teams because there is no cleanup job defined for GitHub team-to-user relationships.
Give me a little bit and I'll try to add that feature before the next release.
For now, can you let me know your thoughts on my non-blocking comments?
cartography/intel/github/teams.py
Outdated
@@ -35,6 +38,9 @@ def get_teams(org: str, api_url: str, token: str) -> Tuple[PaginatedGraphqlData, | |||
repositories(first: 100) { | |||
totalCount | |||
} | |||
members(first: 100, membership: IMMEDIATE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] I see that we're using this graphql query to return the total count of members for an org-team. Just wondering, what does first: 100
mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was required, even though it didn't seem necessary. I could've sworn in other queries elsewhere I got errors without it, tho it seems like here for this it isn't needed. I will remove it tmmw.
(Oh also, I was following the pattern in the line above, where repositories
does this, specifying first: 100
even tho it's only looking for a count. I could see if that can be cleaned up as well.)
cartography/intel/github/teams.py
Outdated
) | ||
if current_try == max_tries: | ||
raise RuntimeError(f"GitHub returned a None member url for team {team_name}, retries exhausted.") | ||
sleep(current_try ** 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nonblock] Could be good to log that a sleep occurred and for how long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here again I was copying the pre-existing function in this file, _get_team_repos_for_multiple_teams
. I can add a log in both, so they're consistent.
@@ -142,10 +148,106 @@ def _get_team_repos(org: str, api_url: str, token: str, team: str) -> PaginatedG | |||
return team_repos | |||
|
|||
|
|||
def _get_team_users_for_multiple_teams( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nonblock] Consider using the backoff
decorator instead of writing the retry logic in here. You'd save on one level of nesting to make it a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I was copying the existing pattern for the pre-existing function in this file, _get_team_repos_for_multiple_teams
. So I can look into how the decorator works and see if I understand how to apply it, but then I'd want to do the same in the other function, so there isn't a confusing difference between two very similar funcs.
I took a quick look and replied. I can try to do the changes tmmw morning. Two of them seems clear (cleaning up the graphql; adding a log statement). The other one, about using a backoff decorator, is less clear but I am happy to look into it when back at my desk. Thank you! |
I pushed the two easy updates. I will look at the third, doing a 'backoff', in a bit. |
Using a backoff decorator seems simple enough on its own, but I am encountering some typing issues around the decorated function, and then will have to understand how to update existing testing to ensure the retries act as desired / decorator is wired up right. I have to step away now but can come back to it later today. |
logger.warning( | ||
"Backing off {wait:0.1f} seconds after {tries} tries. Calling function {target} for team {team_name}" | ||
.format(**updated_details), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure how valuable this was over just using the backoff_handler over in util.py. If you think it's not worth it, I can get rid of it.
retries_with_backoff(get_teams_repos_inner_func, TypeError, 5, backoff_handler)( | ||
org=org, api_url=api_url, token=token, team_name=team_name, | ||
repo_urls=repo_urls, repo_permissions=repo_permissions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I had to wrap the inner func in another func to work through some mypy typechecks. (Also side benefit is it allows
retries_with_backoff
to be a reusable thing.) - I used kwargs instead of args to make the backoff_handler able to take some info out of the args easily, for more specific logging.
@@ -123,8 +123,8 @@ def test_get_team_users_happy_path(mock_get_team_users): | |||
|
|||
|
|||
@patch('cartography.intel.github.teams._get_team_repos') | |||
@patch('cartography.intel.github.teams.sleep') | |||
def test_get_team_repos_github_returns_none(mock_sleep, mock_get_team_repos): | |||
@patch('cartography.intel.github.teams.backoff_handler', spec=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't need to replace 'sleep' with 'backoff_handler', we could've had a fine test without it. But, it did give a nice extra affirmation that the decorator is wired in correctly, because we can tell that it does its backoff handling the expected number of times.
The spec=True
was needed, if I understand right, to fool the decorator into thinking it had a backoff handler func it could use. Without this, with just a default MagicMock, it wasn't being invoked. With this, it was being invoked exactly as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, I ended up not including it because it felt redundant, but, I did write a test where I returned two bad None responses and then 1 good response, and validated that 2 retries happened and then the thing succeeded.)
@@ -134,7 +134,7 @@ def test_get_team_repos_github_returns_none(mock_sleep, mock_get_team_repos): | |||
mock_get_team_repos.return_value = mock_team_repos | |||
|
|||
# Assert we raise an exception | |||
with pytest.raises(RuntimeError): | |||
with pytest.raises(TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeError here rather than RuntimeError because in the old code, it explicitly threw a RuntimeError, whereas in the new code, it throws whatever the underlying error is, which in this case is TypeError.
@@ -32,7 +47,10 @@ def get_teams(org: str, api_url: str, token: str) -> Tuple[PaginatedGraphqlData, | |||
slug | |||
url | |||
description | |||
repositories(first: 100) { | |||
repositories { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing the 'first: 100' here because it was never necessary
@achantavy This should be ready for review again. Happy to discuss the use of the decorator (or anything else) if you had something different in mind. |
Signed-off-by: Daniel Brauer <[email protected]>
Signed-off-by: Daniel Brauer <[email protected]>
Signed-off-by: Daniel Brauer <[email protected]>
9796696
to
9234e51
Compare
Spoke with Daniel, we are going to merge this in for now and it will be part of a release candidate. Will fast follow with a change to add automatic cleanup jobs. |
### NOTE: all the below still holds by the relationships between the nodes has been changed from 'CHILD_TEAM' to 'MEMBER_OF_TEAM' ### Summary This PR adds to the Github graph, adding child team members of teams. This is very similar to, and can be considered a follow-up to the prior PR #1395, which added 'immediate' user members of teams. Now this PR is adding child-teams. Between these two, the graph can now answer the question of who is a member of a team, either directly or via child teams. We think this is a valuable addition to the graph because our broad intent is to understand all access a user has. Since access is frequently granted to teams, we need the complete picture of team membership. #### Illustration of the intention ![Cartography AMPS Team Child Team Membership](https://github.com/user-attachments/assets/9620f50b-310a-43d1-a15b-28fe8480bbc4) #### Screencaps **EXAMPLE ALL TEAMS LOOKUP** BEFORE (nothing is returned because the relationship does not exist) ![Screenshot 2024-12-06 at 11 23 54 AM](https://github.com/user-attachments/assets/cd2310cd-d2ff-4faf-876a-00e798cfe1e9) AFTER (details not visible here but hopefully this related a sense of the relationships) ![Screenshot 2024-12-06 at 11 53 11 AM](https://github.com/user-attachments/assets/5dfdd96f-526c-4299-9c1e-464f84d6e9d9) **EXAMPLE SINGLE TEAM LOOKUP** BEFORE (nothing is returned because the relationship does not exist) ![Screenshot 2024-12-06 at 12 44 55 PM](https://github.com/user-attachments/assets/5dc0419b-ac4d-4038-9005-e2b41703563a) AFTER ![Screenshot 2024-12-06 at 12 44 32 PM](https://github.com/user-attachments/assets/926776e2-c497-4c10-a80a-05cf54de532b) #### Note on loops / cyclic graphs GitHub does not seem to allow the creation of loops, eg `Team A <-- Team B <-- Team A`, so this PR has no handling or guarding against that sort of thing. ### Related issues or links None. ### Checklist Provide proof that this works (this makes reviews move faster). Please perform one or more of the following: - [x] Update/add unit or integration tests. - [x] Include a screenshot showing what the graph looked like before and after your changes. - [ ] Include console log trace showing what happened before and after your changes. If you are changing a node or relationship: - [x] Update the [schema](https://github.com/lyft/cartography/tree/master/docs/root/modules) and [readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md). **Not applicable** If you are implementing a new intel module: - [ ] Use the NodeSchema [data model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node). --------- Signed-off-by: Daniel Brauer <[email protected]>
Summary
This PR adds to the Github graph, adding user membership in teams, for users who are 'immediate' members of a team.
In case it is unclear or for people newer to Github, note: this is focusing on 'immediate' membership to a team, meaning a member is in the team directly as opposed to being in a child team. A user could be considered a member of a team if they are members of a child team, but this PR maps only 'immediate' membership. (In a follow-up PR we'd like to add child teams to the graph, which we think will complete the membership picture.)
We think this is a valuable addition to the graph because our broad intent is to understand all access a user has, and (at least in our org) most access to repos is granted via team. If we do not know who is in the team, then, we do not know who has access.
Illustration of the intention
Screencaps
EXAMPLE USER LOOKUP
BEFORE
(empty result because nothing exists)
AFTER
OVERVIEW OF COUNTS OF EACH TYPE
BEFORE
(empty result because nothing exists)
AFTER
Related issues or links
None
Checklist
Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:
If you are changing a node or relationship:
N/A If you are implementing a new intel module: