Add mdformat to pre-commit and lint repo #2923

Merged: 1 commit, Aug 24, 2023
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -4,7 +4,6 @@ about: Create a report to help us improve
title: 'Bug: '
labels: ''
assignees: ''

---

**Describe the bug**
8 changes: 6 additions & 2 deletions .github/ISSUE_TEMPLATE/new-team-member.md
@@ -2,9 +2,9 @@
name: New Team Member
about: Kick off the onboarding process.
title: New Team Member - [Name]
labels: 'new team member'

labels: new team member
---

Name:
Role:
Reports to:
@@ -14,9 +14,11 @@ GitHub Username:
Slack Username:

**Set-up:**

- [ ] Technical Onboarding call scheduled

- [ ] Added to tools:

- [ ] Github
- [ ] Organization: Cal-ITP
- [ ] Team: warehouse-users and warehouse-contributors
@@ -26,12 +28,14 @@ Slack Username:
- [ ] Slack

- [ ] Added to meetings:

- [ ] Analyst Round Tables (Tuesday & Thursday)
- [ ] Lunch n' Learn
- [ ] All-hands
- [ ] Data & Digital Services email list

- [ ] Added to Slack channels:

- [ ] #data-analyses
- [ ] #data-office-hours
- [ ] #data
8 changes: 2 additions & 6 deletions .github/ISSUE_TEMPLATE/user-story.md
@@ -4,20 +4,16 @@ about: Submit a user story or feature request
title: ''
labels: ''
assignees: ''

---

## User story / feature request

_Please describe your need, outlining the key users, the feature being requested, and the goal that the feature will facilitate. For example: **As a [user or stakeholder type], I want [software feature] so that [some business value]**_


_Please describe your need, outlining the key users, the feature being requested, and the goal that the feature will facilitate. For example: **As a \[user or stakeholder type\], I want \[software feature\] so that \[some business value\]**_

### Acceptance Criteria

_Please enter something that can be verified to show that this user story is satisfied. For example: **I can join table X with table Y.** or **Column A appears in table Z in Metabase.**_



### Notes

_Please enter any additional information that will facilitate the completion of this ticket. For example: Are there any constraints not mentioned above? Are there any alternatives you have considered?_
9 changes: 4 additions & 5 deletions .github/pull_request_template.md
@@ -1,9 +1,8 @@
# Description
_Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies._


_Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies._

Resolves #[issue]
Resolves #\[issue\]

## Type of change

@@ -13,11 +12,11 @@ Resolves #[issue]
- [ ] Documentation

## How has this been tested?
_Include commands/logs/screenshots as relevant._


_Include commands/logs/screenshots as relevant._

## Post-merge follow-ups

_Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution._

- [ ] No action required
21 changes: 11 additions & 10 deletions .github/workflows/README.md
@@ -13,32 +13,33 @@ While we're using GCP Composer, "deployment" of Airflow consists of two parts:

This workflow builds a static website from the Svelte app and deploys it to Netlify.

## build-*.yml workflows
## build-\*.yml workflows

Workflows prefixed with `build-` generally lint, test, and (usually) publish either a Python package or a Docker image.

## service-*.yml workflows
## service-\*.yml workflows

Workflows prefixed with `service-` deal with Kubernetes deployments.

* `service-release-candidate.yml` creates candidate branches, using [hologit](https://github.com/JarvusInnovations/hologit) to bring in external Helm charts and remove irrelevant (i.e. non-infra) code
* `service-release-diff.yml` renders kubectl diffs on PRs targeting release branches
* `service-release-channel.yml` deploys to a given channel (i.e. environment) on updates to a release branch
- `service-release-candidate.yml` creates candidate branches, using [hologit](https://github.com/JarvusInnovations/hologit) to bring in external Helm charts and remove irrelevant (i.e. non-infra) code
- `service-release-diff.yml` renders kubectl diffs on PRs targeting release branches
- `service-release-channel.yml` deploys to a given channel (i.e. environment) on updates to a release branch

Some of these workflows use hologit or invoke. See the READMEs in [.holo](../.holo) and [ci](../ci) for documentation regarding hologit and invoke, respectively.

## GitOps

The workflows described above also define their triggers. In general, developer workflows should follow these steps.

1. Check out a feature branch
2. Put up a PR for that feature branch, targeting `main`
* `service-release-candidate` will run and create a remote branch named `candidate/<feature-branch-name>`
- `service-release-candidate` will run and create a remote branch named `candidate/<feature-branch-name>`
3. Create and merge a PR from the candidate branch to `releases/test`
* `service-release-diff` will run on the PR and print the expected changes
* `service-release-channel` will run on merge (i.e. push on `releases/test`) and deploy
- `service-release-diff` will run on the PR and print the expected changes
- `service-release-channel` will run on merge (i.e. push on `releases/test`) and deploy
4. Merge the original PR
* `service-release-candidate` will then update the remote `candidates/main` branch
- `service-release-candidate` will then update the remote `candidates/main` branch
5. Create and merge a PR from `candidates/main` to `releases/prod`
* `service-release-channel` will run and deploy to `prod` this time
- `service-release-channel` will run and deploy to `prod` this time

Note: One alternative would be to use `candidates/main` to deploy into both `test` and `prod`. This is very possible but can be a bit annoying if GitHub is configured to delete branches on merge and the `cleanup-release-candidates` action then deletes `candidates/main` after it has been merged into `releases/test`.
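
To make the flow above concrete, here is a hedged sketch of the corresponding commands. The branch name `my-feature` and the PR titles are hypothetical, and the PRs can equally be opened in the GitHub UI rather than with the `gh` CLI:

```bash
# Hypothetical walkthrough of the GitOps flow above; "my-feature" is a made-up branch name.
git checkout -b my-feature
git commit -am "feat: my change"
git push -u origin my-feature

# Step 2: open the PR targeting main; service-release-candidate then creates
# candidate/my-feature on the remote.
gh pr create --base main --title "feat: my change" --body "Description here"

# Step 3: promote the candidate branch into the test environment.
gh pr create --base releases/test --head candidate/my-feature \
  --title "Deploy my-feature to test" --body "Promote candidate to test"
```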
4 changes: 2 additions & 2 deletions .holo/README.md
@@ -5,10 +5,10 @@ GitHub branches in a manner that facilitates an "obvious" GitOps workflow for CI
hologit allows:

1. Building branches containing only a subset of repository contents (for example, a branch only including infra-related code)
* This action is called "projection"
- This action is called "projection"
2. Bringing in contents from another repository without relying on published artifacts such as Helm charts
3. Applying transformations to files as part of #1
* These transformations are called "lenses"
- These transformations are called "lenses"

In this repository, we declare one holobranch named [release-candidate](../branches/release-candidate).
By projecting this holobranch in GitHub Actions, individual "candidate" branches end up containing
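As a rough illustration of the projection step described above, the sketch below assumes the hologit CLI's `git holo project` subcommand as documented in the hologit README; treat the exact invocation as an assumption rather than a verified recipe for this repository:

```bash
# Assumed hologit usage: project the release-candidate holobranch and print
# the resulting tree hash. A sketch, not a verified command for this repo.
git holo project release-candidate
```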
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -60,3 +60,13 @@ repos:
rev: 0.6.1
hooks:
- id: nbstripout
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.16
hooks:
- id: mdformat
exclude: ^warehouse/(?!README.md)
args: ["--number"]
additional_dependencies:
- mdformat-gfm
- mdformat-frontmatter
- mdformat-footnote
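
With the hook above in place, contributors can run mdformat locally through pre-commit's standard CLI; a minimal sketch (both commands are stock pre-commit, not project-specific scripts):

```bash
# One-time setup: install the hooks from .pre-commit-config.yaml into .git/hooks
pre-commit install

# Run only the mdformat hook across every file in the repository
pre-commit run mdformat --all-files
```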
24 changes: 13 additions & 11 deletions CONTRIBUTING.md
@@ -4,16 +4,18 @@ These guidelines are meant to provide a foundation for collaboration in Cal-ITP's
primarily [#data-infra](https://github.com/cal-itp/data-infra).

## Issues
* When submitting an issue, please try to use an existing template if one is appropriate
* Provide enough information and context; try to do one or more of the following:
* Include links to specific lines of code, error logs, Slack context, etc.
* Include error messages or tracebacks if relevant and short
* Connect issues to Sentry issues

- When submitting an issue, please try to use an existing template if one is appropriate
- Provide enough information and context; try to do one or more of the following:
- Include links to specific lines of code, error logs, Slack context, etc.
- Include error messages or tracebacks if relevant and short
- Connect issues to Sentry issues

## Pull Requests
* We generally use merge commits as we think they provide clarity in a PR-based workflow
* PRs should be linked to any issues that they close. [Keywords](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) are one good way to do this
* Google provides a [How to do a code review reference](https://google.github.io/eng-practices/review/reviewer/) that reviewers may find helpful
* Use draft PRs to keep track of work without notifying reviewers, and avoid giving pre-emptive feedback on draft PRs
* Reviewers should not generally merge PRs themselves and should instead let the author merge, since authors will have the most context about merge considerations (for example, whether additional reviews are still needed, or whether any communication is needed about the impacts of the PR when it merges)
* After a PR is merged, the author has the responsibility of monitoring any subsequent CI actions for successful completions

- We generally use merge commits as we think they provide clarity in a PR-based workflow
- PRs should be linked to any issues that they close. [Keywords](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) are one good way to do this
- Google provides a [How to do a code review reference](https://google.github.io/eng-practices/review/reviewer/) that reviewers may find helpful
- Use draft PRs to keep track of work without notifying reviewers, and avoid giving pre-emptive feedback on draft PRs
- Reviewers should not generally merge PRs themselves and should instead let the author merge, since authors will have the most context about merge considerations (for example, whether additional reviews are still needed, or whether any communication is needed about the impacts of the PR when it merges)
- After a PR is merged, the author has the responsibility of monitoring any subsequent CI actions for successful completions
30 changes: 15 additions & 15 deletions README.md
@@ -6,21 +6,21 @@ Documentation for this codebase lives at [docs.calitp.org/data-infra](https://do

## Repository Structure

* [./airflow](./airflow) contains the local dev setup and source code for Airflow DAGs (i.e. ETL)
* [./ci](./ci) contains continuous integration and deployment scripts using GitHub actions.
* [./docs](./docs) builds the [docs site](https://docs.calitp.org/data-infra).
* [./kubernetes](./kubernetes) contains helm charts, scripts and more for deploying apps/services (e.g. Metabase, JupyterHub) on our kubernetes cluster.
* [./images](./images) contains images we build and deploy for use by services such as JupyterHub.
* [./services](./services) contains apps that we write and deploy to kubernetes.
* [./warehouse](./warehouse) contains our dbt project that builds and tests models in the BigQuery warehouse.
- [./airflow](./airflow) contains the local dev setup and source code for Airflow DAGs (i.e. ETL)
- [./ci](./ci) contains continuous integration and deployment scripts using GitHub actions.
- [./docs](./docs) builds the [docs site](https://docs.calitp.org/data-infra).
- [./kubernetes](./kubernetes) contains helm charts, scripts and more for deploying apps/services (e.g. Metabase, JupyterHub) on our kubernetes cluster.
- [./images](./images) contains images we build and deploy for use by services such as JupyterHub.
- [./services](./services) contains apps that we write and deploy to kubernetes.
- [./warehouse](./warehouse) contains our dbt project that builds and tests models in the BigQuery warehouse.

## Contributing

* Follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) standard for all commits
* Use Conventional Commit format for PR titles
* Use GitHub's *draft* status to indicate PRs that are not ready for review/merging
* Do not use GitHub's "update branch" button or merge the `main` branch back into a PR branch to update it. Instead, rebase PR branches to update them and resolve any merge conflicts.
* We use GitHub's "code owners" functionality to designate a person or group of people who are in the line of approval for changes to some parts of this repository - if one or more people are automatically tagged as reviewers by GitHub when you create a PR, an approving review from at least one of them is required to merge. This does not automatically place the PR review in somebody's list of priorities, so please reach out to a reviewer to get eyes on your PR if it's time-sensitive.
- Follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) standard for all commits
- Use Conventional Commit format for PR titles
- Use GitHub's *draft* status to indicate PRs that are not ready for review/merging
- Do not use GitHub's "update branch" button or merge the `main` branch back into a PR branch to update it. Instead, rebase PR branches to update them and resolve any merge conflicts.
- We use GitHub's "code owners" functionality to designate a person or group of people who are in the line of approval for changes to some parts of this repository - if one or more people are automatically tagged as reviewers by GitHub when you create a PR, an approving review from at least one of them is required to merge. This does not automatically place the PR review in somebody's list of priorities, so please reach out to a reviewer to get eyes on your PR if it's time-sensitive.
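
For the rebase-instead-of-merge convention above, a minimal sketch of the usual command sequence, assuming `origin` is the GitHub remote:

```bash
# Update a PR branch on top of the latest main without a merge commit
git fetch origin
git rebase origin/main

# Rebasing rewrites history, so the branch must be force-pushed;
# --force-with-lease refuses to clobber commits you haven't fetched
git push --force-with-lease
```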

## Linting and type-checking

@@ -41,9 +41,9 @@ which runs in the GitHub Actions that build the individual images. If you are
unfamiliar with Python type hints or mypy, the following documentation links
will prove useful.

* [PEP 484](https://peps.python.org/pep-0484/), which added type hints
* [The typing module docs](https://docs.python.org/3/library/typing.html)
* [The mypy docs](https://mypy.readthedocs.io/en/stable/)
- [PEP 484](https://peps.python.org/pep-0484/), which added type hints
- [The typing module docs](https://docs.python.org/3/library/typing.html)
- [The mypy docs](https://mypy.readthedocs.io/en/stable/)

In general, it should be relatively easy to make most of our code pass mypy
since we make heavy use of Pydantic types. Some of our imported modules will
5 changes: 4 additions & 1 deletion airflow/README.md
@@ -39,11 +39,13 @@ gcloud init
```

Next, run the initial database migration which also creates a default user named `airflow`.

```shell
docker-compose run airflow db init
```

Next, start all services including the Airflow web server.

```console
docker-compose up
```
@@ -60,6 +62,7 @@ docker-compose run airflow tasks test download_gtfs_schedule_v2 download_schedul
Additional reading about this setup can be found on the [Airflow Docs](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html)

### PodOperators

Airflow PodOperator tasks execute a specific Docker image; as of 2023-08-24 these images are pushed to [GitHub Container Registry](https://ghcr.io/) and production uses `:latest` tags while local uses `:development`. If you want to test these tasks locally, you must build and push development versions of the images used by the tasks. The Dockerfiles and code that make up the images live in the [../jobs](../jobs) directory. For example:

```bash
@@ -77,7 +80,7 @@ docker-compose run airflow tasks test unzip_and_validate_gtfs_schedule_hourly va

### Common Issues

* `docker-compose up` exits with code 137 - Check that your Docker has enough RAM (e.g. 8 GB). See [this post](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) on how to increase its resources.
- `docker-compose up` exits with code 137 - Check that your Docker has enough RAM (e.g. 8 GB). See [this post](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) on how to increase its resources.
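
One quick way to check how much memory the Docker engine actually has, using the standard Docker CLI:

```bash
# Print the total memory (in bytes) available to the Docker engine
docker info --format '{{.MemTotal}}'
```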

## Deploying to production

1 change: 1 addition & 0 deletions airflow/dags/download_gtfs_schedule_v2/README.md
@@ -5,4 +5,5 @@ Type: [Now / Scheduled](https://docs.calitp.org/data-infra/airflow/dags-maintena
This DAG orchestrates raw data capture for GTFS schedule data. It reads GTFS data configuration files that are generated by the [`airtable_loader_2` DAG](../airtable_loader_v2/README.md) to determine the list of GTFS schedule URLs to scrape (this DAG will just find the latest such configuration file, so there is no formal dependency between the two DAGs on a daily run basis.)

## Secrets

You may need to change authentication information in [Secret Manager](https://console.cloud.google.com/security/secret-manager); auth keys are loaded from Secret Manager at the start of DAG executions. You may create new versions of existing secrets, or add entirely new secrets. Secrets must be tagged with `gtfs_schedule: true` to be loaded and are referenced by `url_secret_key_name` or `header_secret_key_name` in Airtable's GTFS dataset records.
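
As a hedged sketch of adding an entirely new secret with the required tag, assuming the `gtfs_schedule: true` tag is implemented as a Secret Manager label and `my_feed_auth` is a hypothetical secret name:

```bash
# Hypothetical secret name; assumes the gtfs_schedule tag is a Secret Manager label
gcloud secrets create my_feed_auth --labels=gtfs_schedule=true

# Store the key material as the first version, reading from stdin
echo -n "example-api-key" | gcloud secrets versions add my_feed_auth --data-file=-
```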
9 changes: 5 additions & 4 deletions airflow/dags/transform_warehouse/README.md
@@ -5,10 +5,11 @@ Type: [Now / Scheduled](https://docs.calitp.org/data-infra/airflow/dags-maintena
This DAG orchestrates the running of the Cal-ITP dbt project and deployment of associated artifacts like the [dbt docs site](https://dbt-docs.calitp.org/#!/overview).

This DAG has some special considerations:
* If a task fails, look carefully before assuming that clearing the task will help. If the failure was caused by a `DbtModelError`, there is an issue with the SQL or data in an individual model and clearing the task will not help until that issue is fixed.

* While this DAG does not have any formal dependencies on other DAGs, the data transformations within the dbt project do depend on successful upstream data capture and parsing.
- If a task fails, look carefully before assuming that clearing the task will help. If the failure was caused by a `DbtModelError`, there is an issue with the SQL or data in an individual model and clearing the task will not help until that issue is fixed.

* Because the tasks in this DAG involve running a large volume of SQL transformations, they risk triggering data quotas if the DAG is run multiple times in a single day.
- While this DAG does not have any formal dependencies on other DAGs, the data transformations within the dbt project do depend on successful upstream data capture and parsing.

* This task can be run with a `dbt_select` statement provided (use the `Trigger DAG w/ config` button (option under the "play" icon in the upper right corner when looking at an individual DAG) in the Airflow UI and provide a JSON configuration like `{"dbt_select": "<+ if you want to run parents><your_model_here><+ if you want to run children>"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)) to re-run a specific individual model's lineage.
- Because the tasks in this DAG involve running a large volume of SQL transformations, they risk triggering data quotas if the DAG is run multiple times in a single day.

- This task can be run with a `dbt_select` statement provided (use the `Trigger DAG w/ config` button (option under the "play" icon in the upper right corner when looking at an individual DAG) in the Airflow UI and provide a JSON configuration like `{"dbt_select": "<+ if you want to run parents><your_model_here><+ if you want to run children>"}` using [dbt selection syntax](https://docs.getdbt.com/reference/node-selection/syntax#specifying-resources)) to re-run a specific individual model's lineage.
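
For example, a config payload for the `Trigger DAG w/ config` flow described above might look like the following, where `stg_example_model` is a hypothetical model name and the surrounding `+` operators pull in parents and children per dbt selection syntax:

```json
{"dbt_select": "+stg_example_model+"}
```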
1 change: 1 addition & 0 deletions apps/maps/README.md
@@ -38,6 +38,7 @@ Netlify sites deployed via `netlify deploy ...` with `--alias=some-alias` and/or
The site is deployed to production on merges to main, as defined in [../../.github/workflows/deploy-apps-maps.yml](../../.github/workflows/deploy-apps-maps.yml).

You may also deploy manually with the following:

```bash
(from the apps/maps folder)
npm run build
1 change: 1 addition & 0 deletions ci/README.md
@@ -5,6 +5,7 @@ a deployment named `archiver` is configured in [the prod channel](./channels/pro
by `invoke` (see below) calling `kubectl` commands.

## invoke (aka pyinvoke)

[invoke](https://docs.pyinvoke.org/en/stable/) is a Python framework for executing subprocesses and building a CLI application.
The tasks are defined in `tasks.py` and configuration in `invoke.yaml`; config values under the top-level `calitp`
are specific to our defined tasks.
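
To see which tasks are available, invoke's built-in task listing works from the directory containing `tasks.py` (this is the standard invoke CLI, not a project-specific wrapper):

```bash
# List the tasks defined in tasks.py along with their docstrings
invoke --list
```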