Merge catalog into monorepo
dhruvkb committed Apr 14, 2023
2 parents 033ab77 + aedc9c1 commit 697f62f
Showing 515 changed files with 148,354 additions and 27 deletions.
2 changes: 2 additions & 0 deletions .git-blame-ignore-revs
@@ -0,0 +1,2 @@
# Add pyupgrade
593fc1a6045eaadac4226ea8451fd24615d9d47b
4 changes: 4 additions & 0 deletions .github/CODEOWNERS
@@ -1,3 +1,6 @@
catalog/ @WordPress/openverse-catalog
archive/ @WordPress/openverse-catalog

api/ @WordPress/openverse-api
ingestion_server/ @WordPress/openverse-api
nginx/ @WordPress/openverse-api
@@ -22,6 +25,7 @@ prettier.config.js @WordPress/openverse-frontend
automations/ @WordPress/openverse-maintainers
brand/ @WordPress/openverse-maintainers
dead_links/ @WordPress/openverse-maintainers
docker/ @WordPress/openverse-maintainers
documentation/ @WordPress/openverse-maintainers
readme_assets/ @WordPress/openverse-maintainers
templates/ @WordPress/openverse-maintainers
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/airflow_alert.md
@@ -0,0 +1,29 @@
---
name: Airflow Alert
about: Report an alert raised by Airflow
labels: "🛠 goal: fix, 🚦 status: awaiting triage, 🌬️ tooling: airflow"
title: "<Replace this with the actual title>"
---

## Airflow log link

<!-- The link that gets posted in the "Log:" section of the Slack alert -->

> **Note**: _Airflow is currently only accessible to maintainers & those given
> access. If you would like access to Airflow, please reach out to a member of
> @WordPress/openverse-maintainers_.
## Description

<!-- Include any additional information you may have, including potential remedies if any come to mind, and the general context of the code (what causes it to run in the app). -->
<!-- Example: We are trying to access property foo of ImportantClass but the instance is null. -->

<!-- Mention whether this is a known regression, i.e., the feature used to work and now does not. -->

## Reproduction

<!-- Share the steps to reproduce the issue, if you were able to, OR a note sharing that you tried to reproduce but weren’t able to. -->

## DAG status

<!-- Share any actions taken on the status of the DAG, e.g. disabling or pausing notifications -->
22 changes: 22 additions & 0 deletions .github/ISSUE_TEMPLATE/code_quality.md
@@ -0,0 +1,22 @@
---
name: Code Quality Improvement Suggestion
about: Suggest a change that does not add a feature
labels: "🚦 status: awaiting triage"
title: "<Replace this with the actual title>"
---

## Current Situation

<!-- Describe the part of the code you think should improve -->

## Suggested Improvement

<!-- Describe your proposed change -->

## Benefit

<!-- Describe the benefit of the change (E.g., increase test coverage, reduce running time, etc.) -->

## Additional context

<!-- Add any other context about the suggestion here. -->
93 changes: 93 additions & 0 deletions .github/ISSUE_TEMPLATE/image_provider_api_integration_request.yml
@@ -0,0 +1,93 @@
name: Image Provider API Integration Request
description: Tell us about an API providing CC-licensed images
title: "<Provider name here>"
labels:
  - "🚦 status: awaiting triage"
  - "✨ goal: improvement"
  - "🧹 status: ticket work required"
  - "☁️ provider: images"
body:
  - type: input
    id: provider
    attributes:
      label: Provider API Endpoint / Documentation
      description: Please provide links to the API endpoint, and any known documentation
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Provider description
      description: Please provide a clear and concise description of the image provider
    validations:
      required: true
  - type: input
    id: licenses
    attributes:
      label: Licenses Provided
      description: Which licenses does the provider use for images (if known)
    validations:
      required: true
  - type: textarea
    id: technical-info
    attributes:
      label: Provider API Technical info
      description: Please provide any technical details that might be useful for implementation, e.g., rate limits, filtering options, overall volume, etc.
    validations:
      required: true
  - type: checkboxes
    id: checklist
    attributes:
      label: Checklist to complete before beginning development
      description: |
        Please do not modify this section. No development should be done on a Provider API Script until the following info is gathered:
      options:
        - label: Verify there is a way to retrieve the entire relevant portion of the provider's collection in a systematic way via their API.
          required: false
        - label: Verify the API provides license info (license type and version; license URL provides both, and is preferred)
          required: false
        - label: Verify the API provides stable direct links to individual works.
          required: false
        - label: Verify the API provides a stable landing page URL to individual works.
          required: false
        - label: Note other info the API provides, such as thumbnails, dimensions, attribution info (required if non-CC0 licenses will be kept), title, description, other metadata, tags, etc.
          required: false
        - label: Attach example responses to API queries that have the relevant info.
          required: false
  - type: markdown
    attributes:
      value: |
        ## General Recommendations for implementation
        Modify this section if necessary
        - The script should be in the `catalog/dags/provider_api_scripts/` directory.
        - The script should have a test suite in the same directory.
        - The script must use the `ImageStore` class (Import this from `catalog/dags/provider_api_scripts/common/storage/image.py`).
        - The script should use the `DelayedRequester` class (Import this from `catalog/dags/provider_api_scripts/common/requester.py`).
        - The script must not use anything from `catalog/dags/provider_api_scripts/modules/etlMods.py`, since that module is deprecated.
        - If the provider API can be queried by 'upload date' or something similar, the script should take a `--date` parameter when run as a script, giving the date for which we should collect images. The form should be `YYYY-MM-DD` (so, the script can be run via `python my_favorite_provider.py --date 2018-01-01`).
        - The script must provide a main function that takes the same parameters as from the CLI. In our example from above, we'd then have a main function `my_favorite_provider.main(date)`. The main function should do the same thing that calling the script from the CLI would do.
        - The script *must* conform to [PEP8][pep8]. Please use `pycodestyle` (available via `pip install pycodestyle`) to check for compliance.
        - The script should use small, testable functions.
        - The test suite for the script may break PEP8 rules regarding long lines where appropriate (e.g., long strings for testing).
        [pep8]: https://www.python.org/dev/peps/pep-0008/
  - type: markdown
    attributes:
      value: |
        ## Examples of other Provider API Scripts
        It's unlikely this section needs to be modified
        For example Provider API Scripts and accompanying test suites, please see
        - `catalog/dags/provider_api_scripts/flickr.py` and
        - `catalog/dags/provider_api_scripts/test_flickr.py`, or
        - `catalog/dags/provider_api_scripts/wikimedia_commons.py` and
        - `catalog/dags/provider_api_scripts/test_wikimedia_commons.py`.
  - type: checkboxes
    id: implementation
    attributes:
      label: Implementation
      options:
        - label: 🙋 I would be interested in implementing this feature.
          required: false
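The recommendations in this template (a `--date` flag in `YYYY-MM-DD` form, plus a `main` function that takes the same parameter as the CLI) could be sketched roughly as below. This is only an illustration under assumptions: the `cli` helper and the returned status string are hypothetical placeholders and are not taken from any existing provider script.

```python
import argparse
from datetime import datetime


def main(date=None):
    """Entry point shared by the CLI and by direct callers.

    `date` is an optional string in YYYY-MM-DD form; an invalid value
    raises ValueError from `datetime.strptime`.
    """
    day = datetime.strptime(date, "%Y-%m-%d").date() if date else None
    # Placeholder for the real ingestion work (hypothetical).
    return f"collecting images for {day or 'all dates'}"


def cli(argv=None):
    """Parse command-line arguments and delegate to `main`."""
    parser = argparse.ArgumentParser(description="Hypothetical provider script")
    parser.add_argument("--date", default=None, help="date to collect, YYYY-MM-DD")
    args = parser.parse_args(argv)
    return main(args.date)
```

Calling `cli(["--date", "2018-01-01"])` and `main("2018-01-01")` then go through the same code path, which is the property the template asks for.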
26 changes: 26 additions & 0 deletions .github/ISSUE_TEMPLATE/infrastructure_improvement.md
@@ -0,0 +1,26 @@
---
name: Infrastructure Improvement Suggestion
about: Suggest a way to improve our infrastructure
labels: "🚦 status: awaiting triage, ✨ goal: improvement"
title: "<Replace this with the actual title>"
---

## Current Situation

<!-- Describe the part of the infrastructure you think should improve -->

## Suggested Improvement

<!-- Describe what you want to happen -->

## Benefit

<!-- Fully describe the benefit of the change (E.g., improve speed, robustness, etc.) -->

## Alternatives

<!-- Describe any alternative solutions you have considered -->

## Additional context

<!-- Add any other context about the feature request. -->
36 changes: 36 additions & 0 deletions .github/ISSUE_TEMPLATE/new_source_suggestion.yml
@@ -0,0 +1,36 @@
name: New Source Suggestion for Openverse
description: Tell us about a website or platform with CC-licensed content
title: "<Source name here>"
labels:
  - "🚦 status: awaiting triage"
  - "🧹 status: ticket work required"
  - "☁️ provider: any"
body:
  - type: input
    id: source
    attributes:
      label: Source Site
      description: Please provide a link to the Source site that you'd like considered for inclusion on Openverse
    validations:
      required: true
  - type: input
    id: value
    attributes:
      label: Value Provided
      description: Please explain why it would be valuable to include this source on Openverse
    validations:
      required: true
  - type: input
    id: licenses
    attributes:
      label: Licenses Provided
      description: Which CC licenses or Public Domain tools are in use by the source (if known)
    validations:
      required: true
  - type: checkboxes
    id: implementation
    attributes:
      label: Implementation
      options:
        - label: 🙋 I would be interested in implementing this feature.
          required: false
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE/pull_request_template.md
@@ -28,6 +28,7 @@ Fixes #[issue number] by @[issue author]
- [ ] I added or updated documentation (if applicable).
- [ ] I tried running the project locally and verified that there are no visible
errors.
- [ ] I ran the DAG documentation generator (only applicable for catalog PRs).

[best_practices]:
https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines
3 changes: 3 additions & 0 deletions .github/actions/get-changes/action.yml
@@ -5,6 +5,9 @@ outputs:
  changes:
    description: "JSON array of keys from `.github/filters.yml`"
    value: ${{ steps.paths-filter.outputs.changes }}
  catalog:
    description: "'true' if catalog changes are present"
    value: ${{ steps.paths-filter.outputs.catalog }}
  api:
    description: "'true' if API changes are present"
    value: ${{ steps.paths-filter.outputs.api }}
18 changes: 18 additions & 0 deletions .github/actions/load-img/action.yml
@@ -10,9 +10,27 @@ inputs:
    default: "true"
    description: Whether to set up API image

  setup_catalog:
    default: "true"
    description: Whether to set up catalog image

runs:
  using: "composite"
  steps:
    # Catalog
    - name: Download image `catalog`
      uses: actions/download-artifact@v3
      if: inputs.setup_catalog == 'true'
      with:
        name: catalog
        path: /tmp

    - name: Load image `catalog`
      if: inputs.setup_catalog == 'true'
      shell: bash
      run: |
        docker load --input /tmp/catalog.tar
    # Ingestion server
    - name: Download image `ingestion_server`
      uses: actions/download-artifact@v3
19 changes: 19 additions & 0 deletions .github/dependabot.yml
@@ -36,6 +36,25 @@ updates:
      - "🧱 stack: ingestion server"
      - dependencies

  # Enable version updates for Python dev-libs in the catalog
  #
  # `requirements-dev.txt` can be updated more frequently than
  # `requirements_prod.txt` since they are not pinned by the Airflow constraints
  # file.
  - package-ecosystem: pip
    # Look for requirements file in the `/catalog` directory
    directory: /catalog
    # Check for updates once a month
    schedule:
      interval: monthly
    labels:
      - "💻 aspect: code"
      - "🧰 goal: internal improvement"
      - "🟩 priority: low"
      - "🐍 tech: python"
      - "🧱 stack: catalog"
      - dependencies

  - package-ecosystem: "github-actions"
    directory: /
    schedule:
4 changes: 4 additions & 0 deletions .github/filters.yml
@@ -1,3 +1,7 @@
catalog:
  - catalog/**
  # Change to the CI + CD workflow should trigger complete workflow.
  - .github/workflows/ci_cd.yml
api:
  - api/**
  # Change to the CI + CD workflow should trigger complete workflow.
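How these filters feed the `changes`, `catalog` and `api` outputs of the `get-changes` action can be approximated with the sketch below. It is a simplified model, not the filter step's actual implementation: `fnmatch` stands in for the real glob matcher, `get_outputs` is a hypothetical helper name, and the `api` entry lists only the pattern visible in this diff.

```python
import json
from fnmatch import fnmatch

# Patterns copied from the visible part of `.github/filters.yml`.
# The remaining `api` patterns are cut off in this diff and are left out.
FILTERS = {
    "catalog": ["catalog/**", ".github/workflows/ci_cd.yml"],
    "api": ["api/**"],
}


def get_outputs(changed_files):
    """Return a dict shaped like the action outputs for a list of changed paths."""
    matched = [
        key
        for key, patterns in FILTERS.items()
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns)
    ]
    # Each filter key becomes a 'true'/'false' string output.
    outputs = {key: str(key in matched).lower() for key in FILTERS}
    # `changes` is a JSON array of the keys that matched.
    outputs["changes"] = json.dumps(matched)
    return outputs
```

Under this model, a commit touching only `catalog/dags/foo.py` sets `catalog` to `'true'` and leaves `api` at `'false'`, while a change to `.github/workflows/ci_cd.yml` is meant to trigger the complete workflow.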
39 changes: 39 additions & 0 deletions .github/release_drafter.yml
@@ -0,0 +1,39 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/release-drafter/release-drafter/master/schema.json
#
# Configuration for the action `release-drafter/release-drafter`
# Docs: https://github.com/release-drafter/release-drafter
# Workflow: Draft release

name-template: "v$RESOLVED_VERSION"
tag-template: "v$RESOLVED_VERSION"
categories:
  - title: New Features
    label: "🌟 goal: addition"
  - title: Improvements
    label: "✨ goal: improvement"
  - title: Internal Improvements
    labels:
      - "🤖 aspect: dx"
      - "🧰 goal: internal improvement"
  - title: Bug Fixes
    label: "🛠 goal: fix"
change-template: "- $TITLE (#$NUMBER) @$AUTHOR"
exclude-labels:
  - "skip-changelog"
version-resolver:
  major:
    labels:
      - "💥 versioning: major"
  minor:
    labels:
      - "🎉 versioning: minor"
  patch:
    labels:
      - "🐛 versioning: patch"
  default: patch
template: |
  $CHANGES
  ## Credits
  Thanks to $CONTRIBUTORS for their contributions!
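The effect of the `version-resolver` section can be illustrated with a small model. This is a sketch under the assumption that the highest-precedence label found on any included pull request decides the bump, with `default` used when none match; `resolve_bump` and `next_version` are hypothetical helpers and not release-drafter code.

```python
# Label sets copied from the `version-resolver` configuration.
RESOLVER = {
    "major": {"💥 versioning: major"},
    "minor": {"🎉 versioning: minor"},
    "patch": {"🐛 versioning: patch"},
}
DEFAULT_BUMP = "patch"


def resolve_bump(pr_labels):
    """Pick the bump level from an iterable of per-PR label lists."""
    seen = set()
    for labels in pr_labels:
        seen.update(labels)
    # Assumed precedence: major beats minor beats patch.
    for level in ("major", "minor", "patch"):
        if RESOLVER[level] & seen:
            return level
    return DEFAULT_BUMP


def next_version(current, level):
    """Apply a bump level to a `MAJOR.MINOR.PATCH` version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

With `name-template: "v$RESOLVED_VERSION"`, a draft built from pull requests carrying only the minor label on top of `1.2.3` would, in this model, be named `v1.3.0`.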
26 changes: 26 additions & 0 deletions .github/workflows/actionlint.yml
@@ -0,0 +1,26 @@
name: Lint GitHub Actions workflows

on:
  pull_request:
  push:
    branches:
      - main

# Cancels all previous workflow runs for pull requests that have not completed.
concurrency:
  # The concurrency group contains the workflow name and the branch name for pull requests
  # or the commit hash for any other events.
  group: ${{ github.workflow }}-${{ github.event_name == 'pull_request' && github.head_ref || github.sha }}
  cancel-in-progress: true

jobs:
  actionlint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Check workflow files
        run: |
          echo "::add-matcher::.github/actionlint-matcher.json"
          bash <(curl https://raw.githubusercontent.com/rhysd/actionlint/v1.6.15/scripts/download-actionlint.bash)
          ./actionlint -color
        shell: bash
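The `group` expression in this workflow uses the `a && b || c` idiom as a stand-in for a conditional. A rough Python equivalent, with a hypothetical `concurrency_group` helper and made-up sample values, shows which key each event type ends up with:

```python
def concurrency_group(workflow, event_name, head_ref, sha):
    """Mirror `${{ github.workflow }}-${{ event == 'pull_request' && head_ref || sha }}`.

    Python's `and`/`or` short-circuit in a comparable way here: the branch
    name is used for pull requests, otherwise the commit hash.
    """
    key = (event_name == "pull_request" and head_ref) or sha
    return f"{workflow}-{key}"
```

Runs for the same pull request branch share one group, so with `cancel-in-progress: true` a newer run cancels the older one, while each push to `main` gets its own group keyed by commit hash.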