
Ignore PRs once repo is archived #45

Closed · wants to merge 6 commits

Conversation

ghickman (Contributor):

This moves the github backfill command to using @benbc's GraphQL queries to pull the repos, and then the PRs, for a given org. I took a look at doing this in one query, but consuming subpages added a lot of complexity, and Ben had already written the other code, so this was an easy win. The goal of moving to the GraphQL API for backfilling was to get the archivedAt value for repos, which isn't exposed by GitHub's REST API.

That enabled the second half of this change: ignoring PRs once their repo has been archived.
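The diff fragments further down show the shape of the filter. As a sketch only (the full implementation lives in metrics/github/backfill.py and isn't reproduced here), the `drop_archived_prs` helper with the `repo_archived_at` and `created`/`merged` fields from this PR would look roughly like:

```python
from datetime import datetime, timezone


def drop_archived_prs(prs, key="created"):
    """Sketch of the filter in this PR: drop PRs whose `key` timestamp
    falls on or after their repo's archive date. Field names follow the
    diff fragments below; the surrounding code is assumed."""

    def keep(pr):
        # Repo was never archived: keep the PR.
        if not pr["repo_archived_at"]:
            return True
        # Keep only PRs created/merged before the repo was archived.
        return pr[key] < pr["repo_archived_at"]

    return [pr for pr in prs if keep(pr)]
```

(As the review below points out, comparing against the PR's own timestamps keeps essentially every PR, so this condition turns out to be wrong; the comparison needs the back-fill's "current" date instead.)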

metrics/github/backfill.py (resolved)
metrics/github/backfill.py (outdated, resolved)
    if not pr["repo_archived_at"]:
        return True

    if pr[key] < pr["repo_archived_at"]:
Contributor:

I don't think this can be right. It's keeping all PRs that were created/merged before the repo was archived, which will be all of them. Don't we need instead to compare the archive date with the "current" date (where "current" scans through time in back-filling)?
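One way to express the reviewer's suggestion as code — a hypothetical helper with an assumed name and signature, not the eventual fix — is to compare the repo's archive date against the back-fill's "current" date rather than the PR's own timestamps:

```python
from datetime import date


def drop_prs_for_archived_repos(prs, current_date):
    """Hypothetical corrected filter, per the review comment above:
    drop a PR once the back-fill's "current" date has passed the
    repo's archive date. Name and signature are illustrative."""
    return [
        pr
        for pr in prs
        if pr["repo_archived_at"] is None or current_date < pr["repo_archived_at"]
    ]
```

Under this version, every PR in a repo is counted while the repo is live, and all of them drop out of the metrics once the scan reaches the archive date.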

Contributor Author:

argh, yep, you're completely right

@@ -69,10 +70,12 @@ def pr_throughput(ctx, org, date):

    with TimescaleDBWriter(GitHubPullRequests) as writer:
        opened_prs = api.prs_opened_on_date(org, date)
        opened_prs = drop_archived_prs(opened_prs)
Contributor:

I don't see the need for this filter. There won't be any newly-opened PRs for an archived repo, will there?

        log.info("%s | %s | Processing %s opened PRs", date, org, len(opened_prs))
        process_prs(writer, opened_prs, date, name="prs_opened")

        merged_prs = api.prs_merged_on_date(org, date)
        merged_prs = drop_archived_prs(merged_prs, key="merged")
Contributor:

I don't see the need for this filter. There won't be any newly-merged PRs for an archived repo, will there?

@@ -40,7 +39,7 @@ source = [
 ]

 [tool.coverage.report]
-fail_under = 69
+fail_under = 67
Contributor:

Not sure what the point is of asserting on coverage if we're just going to keep bumping the threshold down. It may not be worth writing tests for full coverage, but in that case we don't get any value from measuring it.

Contributor Author:

Yeah, this is the worst part of <100% coverage. My next PR bumps it up to 73 if that helps…?

    assert {o["created"] for o in output} == {
        pr1["created"],
        pr4["created"],
    }
Contributor:

This assertion seems a bit more complex than it needs to be. Could it not just read like this?

    assert output == {pr1, pr4}

Contributor Author:

You can't hash dicts! Otherwise yes I'd love to reduce it down.

Contributor:

Doh! Lists then?
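The exchange above can be checked directly. Dicts are mutable and therefore unhashable, so a set of dicts raises TypeError; a list comparison works when the output order is deterministic, and the original set-of-fields assertion is the order-independent alternative. The values below are made up for illustration:

```python
# Hypothetical stand-ins for the PR dicts in the test.
pr1 = {"created": "2023-11-01"}
pr4 = {"created": "2023-11-04"}
output = [pr1, pr4]

# A set of dicts raises TypeError, because dicts are unhashable.
try:
    {pr1, pr4}
    set_of_dicts_ok = True
except TypeError:
    set_of_dicts_ok = False

# The reviewer's follow-up suggestion: plain list equality. This works,
# but only if the output order is deterministic.
assert output == [pr1, pr4]

# Order-independent alternative: compare a set of a hashable field,
# which is what the original assertion does.
assert {o["created"] for o in output} == {pr1["created"], pr4["created"]}
```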

ghickman (Contributor Author):

Closing this one in favour of:

  1. backfilling with GraphQL by itself
  2. promoting backfilling to the only GitHub method
  3. fixing open PRs to be a point-in-time sample
  4. removing PRs once their repo has been archived

ghickman closed this Nov 27, 2023