Backend infrastructure to track and manage metrics of GitHub projects by monitoring issues, pull requests, and events using AWS services and Python.
This app is designed to continually monitor and update GitHub project metrics using various GitHub APIs. Below is the detailed breakdown of its structure and functionalities:
At startup, the necessary modules are imported and instances of GitHubAPI, TimelineAPI, and TransferAPI are set up with the requisite tokens. A database session is established using credentials read from the AWS Systems Manager Parameter Store. These initializations run only when the app operates in an AWS Lambda environment; they are skipped in AWS Chalice CLI mode.
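The gating and parameter lookup described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: it assumes the check relies on the AWS_CHALICE_CLI_MODE environment variable that Chalice sets while its CLI commands run, and that get_parameter wraps the SSM GetParameter call. The names should_initialize and fetch_parameter, and the injected client argument, are hypothetical.

```python
import os


def should_initialize(environ=None):
    """Return True when the code is not being loaded by a Chalice CLI command."""
    env = os.environ if environ is None else environ
    # Assumption: Chalice sets AWS_CHALICE_CLI_MODE while its CLI is running.
    return "AWS_CHALICE_CLI_MODE" not in env


def fetch_parameter(client, name, decrypt=False):
    """Read one value from Parameter Store through an SSM client.

    `client` is expected to behave like boto3.client("ssm").
    """
    response = client.get_parameter(Name=name, WithDecryption=decrypt)
    return response["Parameter"]["Value"]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without network access or AWS credentials.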
The app has three scheduled Lambda functions to facilitate tasks at different intervals: every 30 minutes, every 10 minutes, and daily at 5:00 am UTC.
- Frequency: Every 30 minutes
- Tasks:
  - Registers and updates new Pull Requests (PRs) and Issues created within defined time windows.
  - Refreshes the status of PRs and recently closed PRs in the database.
  - Reconciles the status of unmerged, closed PRs going back one year (configurable).
  - Updates new team member data daily (for an organization).
  - Manages issue transfers by reconciling transferred issues.

- Frequency: Every 10 minutes
- Tasks:
  - Records near-real-time (NRT) events for issue activity, starting from one day before the current date.
  - Reconciles transferred issues.

- Frequency: Daily at 5:00 am UTC
- Tasks:
  - Checks the historical "closed" state daily, reconciling the merged/not-merged states of closed PRs since a given date.
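The three intervals above map onto the schedule expression syntax used by Amazon EventBridge, which is what scheduled Lambda functions are triggered by. The sketch below only builds those expression strings; the helper names and the SCHEDULES keys are hypothetical and are not taken from app.py.

```python
def rate_expression(value, unit):
    """Build an EventBridge rate expression, e.g. 'rate(30 minutes)'."""
    if unit not in ("minutes", "hours", "days"):
        raise ValueError(f"unsupported unit: {unit}")
    # EventBridge expects the singular unit name when the value is 1.
    if value == 1:
        unit = unit[:-1]
    return f"rate({value} {unit})"


def cron_expression(minutes, hours, day_of_month="*", month="*",
                    day_of_week="?", year="*"):
    """Build a six-field EventBridge cron expression (evaluated in UTC)."""
    fields = (minutes, hours, day_of_month, month, day_of_week, year)
    return "cron({})".format(" ".join(str(f) for f in fields))


# Hypothetical labels for the three scheduled functions.
SCHEDULES = {
    "every_30_minutes": rate_expression(30, "minutes"),
    "every_10_minutes": rate_expression(10, "minutes"),
    "daily_0500_utc": cron_expression(0, 5),
}
```

In a Chalice app the same schedules would typically be declared with @app.schedule(Rate(30, unit=Rate.MINUTES)), @app.schedule(Rate(10, unit=Rate.MINUTES)), and @app.schedule(Cron(0, 5, '*', '*', '?', '*')).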
The app currently exposes no API endpoints. It operates through background tasks, scheduled at various intervals, that refresh the database with recent GitHub information.
- Chalice
- GitHubAPI, TimelineAPI, TransferAPI: Custom libraries for different parts of the GitHub API.
- chalicelib.utils: Auxiliary functions, including get_parameter.
- chalicelib.models: Database models such as PullRequest and Issue.
Compatible with SQLAlchemy dialects.
The deployment of the backend scheduled Lambda functions is managed by AWS Chalice.
chalice deploy
Create secure parameters (SecureString) in AWS Systems Manager Parameter Store that match the pattern specified in app.py. For example:
token = get_parameter("/contributor-metrics/{env-name}/{var-name}", True)
db_url = get_parameter("/contributor-metrics/{env-name}/{var-name}", True)
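As a small illustration of the naming pattern, the helper below assembles a parameter name from an environment name and a variable name. The function name parameter_name, the validation rule, and the sample values are assumptions for this sketch; app.py may build its paths differently.

```python
import re

PREFIX = "/contributor-metrics"
# Assumption: each path segment uses only letters, digits, '.', '_' and '-'.
_SEGMENT = re.compile(r"^[A-Za-z0-9_.-]+$")


def parameter_name(env_name, var_name):
    """Return '/contributor-metrics/<env-name>/<var-name>'."""
    for segment in (env_name, var_name):
        if not _SEGMENT.match(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return f"{PREFIX}/{env_name}/{var_name}"


# Hypothetical environment and variable names.
example = parameter_name("dev", "gh-token")
```

Rejecting segments that contain a slash keeps every name at exactly two levels below the prefix.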
Create the database tables using create_all(). This creates the PullRequest, Member, Issue, Event, Transfer, and EventPoll tables. The example below loads the environment variables using dotenv. When deployed, these secrets are retrieved from SSM (above).
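if __name__ == "__main__":
    import os

    from dotenv import load_dotenv
    from models import Member, PullRequest, Issue, create_all, create_db_session

    # Load GH_TOKEN and DB_URL from a local .env file.
    load_dotenv()
    token = os.getenv("GH_TOKEN")
    db_url = os.getenv("DB_URL")

    # GitHubAPI must already be imported in this module (its import is not shown here).
    gh = GitHubAPI(token=token)
    db = create_db_session(db_url)
    # Create every table defined by the models.
    create_all(db_url)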
To backfill historical data, use the backfill.py script available in the repository. Split the backfill into smaller time periods, sized so that each run stays within GitHub's rate limit. The script automates backfilling for specified repositories and events and handles the GitHub API's rate limits gracefully.
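One way to chunk a backfill period is to split it into consecutive fixed-size windows and process them one at a time. The generator below is a minimal sketch of that idea. The name chunk_time_range, the seven-day window, and the sample dates are illustrative and do not reflect the actual interface of backfill.py.

```python
from datetime import datetime, timedelta, timezone


def chunk_time_range(start, end, chunk):
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        # The final window is shortened so it never runs past `end`.
        upper = min(cursor + chunk, end)
        yield cursor, upper
        cursor = upper


# Hypothetical backfill period, split into 7-day windows.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 31, tzinfo=timezone.utc)
windows = list(chunk_time_range(start, end, timedelta(days=7)))
```

Each window can then be passed to a separate backfill run, with a pause between runs if the remaining rate limit is low.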
The Lambdas make use of Python, SQLAlchemy, and the GitHub API.
To develop locally, create a Python virtual environment and install the dependencies from requirements.txt.
pyenv virtualenv 3.8.3 contributor-metrics
pyenv activate contributor-metrics
pip install -r requirements.txt
Activate the environment:
pyenv activate contributor-metrics
The policy.json grants the Lambda functions access to the secrets stored in SSM.
{
  "Effect": "Allow",
  "Action": [
    "ssm:GetParameter"
  ],
  "Resource": "arn:*:ssm:*:*:parameter/contributor-metrics/*/*"
}
This policy is added in the config.json.
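To sanity-check that a given parameter falls under the policy's Resource pattern, the wildcard match can be approximated locally. The sketch below uses fnmatch as a stand-in for IAM's wildcard matching, which is an approximation and not the actual IAM evaluation logic. The helper names, region, account ID, and parameter names are all hypothetical.

```python
from fnmatch import fnmatchcase

POLICY_RESOURCE = "arn:*:ssm:*:*:parameter/contributor-metrics/*/*"


def parameter_arn(region, account_id, name, partition="aws"):
    """Build the ARN for an SSM parameter whose name starts with '/'."""
    if not name.startswith("/"):
        raise ValueError("expected a fully qualified name starting with '/'")
    return f"arn:{partition}:ssm:{region}:{account_id}:parameter{name}"


def is_covered(arn, pattern=POLICY_RESOURCE):
    """Approximate IAM's '*' wildcard matching with fnmatch-style globbing."""
    return fnmatchcase(arn, pattern)


# Hypothetical region, account ID, and parameter name.
arn = parameter_arn("us-east-1", "123456789012", "/contributor-metrics/dev/gh-token")
covered = is_covered(arn)
```

A parameter stored outside the /contributor-metrics/{env-name}/{var-name} hierarchy would not match the pattern, so the Lambda functions could not read it under this policy.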
classDiagram
class Member {
+id: Integer (PK)
inserted_dt: DateTime
inactive_dt: DateTime
inactive: Boolean
avatar_url: String
events_url: String
followers_url: String
following_url: String
gists_url: String
gravatar_id: String
html_url: String
login: String (Unique)
node_id: String
organizations_url: String
received_events_url: String
repos_url: String
site_admin: Boolean
starred_url: String
subscriptions_url: String
type: String
url: String
}
class Issue {
+id: BigInteger (PK)
active_lock_reason: String
assignee: JSONB
assignees: ARRAY(JSON)
author_association: String
body: String
closed_at: DateTime
comments: Integer
comments_url: String
created_at: DateTime
events_url: String
html_url: String
labels: ARRAY(JSON)
labels_url: String
locked: Boolean
milestone: JSONB
node_id: String
number: Integer
org: String
performed_via_github_app: String
reactions: JSONB
repo: String
repository_url: String
score: Float
state: String
state_reason: String
timeline_url: String
title: String
updated_at: DateTime
url: String
user: JSONB
username: String
}
class PullRequest {
+id: BigInteger (PK)
url: String
repo: String
org: String
repository_url: String
labels_url: String
comments_url: String
events_url: String
html_url: String
node_id: String
number: Integer
title: String
user: JSONB
username: String
labels: ARRAY(JSON)
state: String
state_reason: String
merged: Boolean
locked: Boolean
assignee: JSONB
assignees: ARRAY(JSON)
milestone: JSONB
comments: Integer
created_at: DateTime
updated_at: DateTime
closed_at: DateTime
author_association: String
active_lock_reason: String
draft: Boolean
pull_request: JSONB
body: String
reactions: JSONB
timeline_url: String
performed_via_github_app: String
score: Integer
}
class Event {
+id: BigInteger (PK)
+issue_id: BigInteger (PK)
org: String
repo: String
event: String
body: String
label: JSONB
reactions: JSONB
state: String
created_at: DateTime
updated_at: DateTime
node_id: String
user: JSONB
author_association: String
username: String
}
class EventPoll {
+id: BigInteger (PK)
+page_no: Integer (PK)
issue_updated_at: DateTime
etag: String
}
class Transfer {
+issue_id: BigInteger (PK)
+new_issue_id: BigInteger (PK)
url: String
number: Integer
repo: String
title: String
body: String
created_at: DateTime
closed_at: DateTime
state: String
org: String
assignee: JSONB
assignees: ARRAY(JSON)
labels: ARRAY(JSON)
new_repo: String
new_html_url: String
new_url: String
new_number: Integer
user: JSONB
username: String
}
Issue --> Member: "has"
PullRequest --> Member: "has"
Event --> Issue: "refers to"
Event --> PullRequest: "refers to"
Transfer --> Issue: "refers to"
Transfer --> Issue: "refers to (new)"
EventPoll --> Issue: "can refer to"
EventPoll --> PullRequest: "can refer to"