Add thumbnail response time runbooks #3053
LGTM. Regarding sharing them: I think we could share the anomaly ones, maybe, but I do think it's good to have separate instructions for debugging average vs p99. For example, for average response time, you probably don't need to do anything very special to find "the slow requests", because overall things are slow. For p99, however, as long as the average is still okay, it is revealing some kind of edge case where we'd need to query the nginx logs for target response time. I don't know whether that needs to be in the runbook, because it is a difference in principle that applies to all average and p99 response time debugging, but it is a notable difference between the two. But yeah, not sure whether that's a difference in the runbook or a difference in some other resource.
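To illustrate the average vs p99 point, here is a minimal sketch, not the project's tooling. It assumes the nginx log lines have already been parsed into dicts; the `path` and `response_time` keys, the sample data, and both helper names are hypothetical. It shows how a couple of edge-case requests can push p99 up while the average stays fine, and how filtering by the p99 value surfaces them.

```python
from statistics import mean


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p99 requires at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slow_requests(records, threshold_s):
    """Return the records whose response time is at or above the threshold."""
    return [r for r in records if r["response_time"] >= threshold_s]


# Hypothetical parsed log records: 98 fast requests and 2 slow edge cases.
records = [{"path": f"/thumb/{i}", "response_time": 0.2} for i in range(98)]
records += [
    {"path": "/thumb/edge-a", "response_time": 9.0},
    {"path": "/thumb/edge-b", "response_time": 12.0},
]

times = [r["response_time"] for r in records]
avg = mean(times)   # stays low because most requests are fast
tail = p99(times)   # dominated by the slow edge cases
outliers = slow_requests(records, tail)
```

With data shaped like this, the average looks healthy, so the only way to find the cause is to pull the individual slow requests, which is the extra step a p99 runbook would need.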
The anomaly and threshold alarms also have a difference, I think. Like the threshold alarm runbooks could probably have an instruction to say "check whether it is a one-off spike or persistent" and guide severity along those lines. If we spike to 10 seconds average response time, that's not good, but it also doesn't mean we need to drop everything to debug it if things have gone back down.
That's also just a general principle that could apply to almost every alarm, so I don't know whether that needs to be in the runbooks or, like above, in some other document that they link to. I think this relates somewhat to the level of confidence that something persistently bad is happening based on the alarm, which does also relate to the false alarms you mentioned. Ideally anomaly alarms would be higher confidence that something bad is persistently happening, once we've tuned them, and threshold alarms are kind of more up in the air, could be really bad or could just be a one-off spike that we should look into, but not necessarily with urgency.
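The "one-off spike or persistent" check could be sketched roughly as follows. This is only an illustration under stated assumptions: `classify_breach`, its labels, and the `min_consecutive` default are hypothetical, and the datapoints stand in for whatever the alarm's metric returns, ordered oldest to newest.

```python
def classify_breach(datapoints, threshold, min_consecutive=3):
    """Classify a series of metric datapoints, ordered oldest to newest.

    Returns "ok" if nothing exceeded the threshold, "persistent" if the
    most recent `min_consecutive` datapoints all exceed it, and "spike"
    for any other breach (including one that has already recovered).
    """
    breaches = [value > threshold for value in datapoints]
    if not any(breaches):
        return "ok"
    recent = breaches[-min_consecutive:]
    if len(recent) == min_consecutive and all(recent):
        return "persistent"
    return "spike"


# Average response time in seconds, one datapoint per period (made-up values).
recovered = classify_breach([0.4, 0.5, 10.0, 0.5, 0.4], threshold=2.0)
ongoing = classify_breach([0.4, 3.0, 4.0, 5.0], threshold=2.0)
```

Here `recovered` would map to lower severity (worth a look, not urgent) and `ongoing` to higher severity. Note that a breach that has only just started also counts as a spike under this rule until enough datapoints accumulate.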
Anyway, all of that is just my thoughts on this. I don't have strong recommendations either way, but whatever we come up with, ideally we can apply it consistently between the alarms. It looks like there are some important differences but also important similarities in the principles of these alarms and what they can indicate. I just don't know what the correct or most flexible way of sharing and differentiating between those is.
Full-stack documentation: https://docs.openverse.org/_preview/3053 Please note that GitHub Pages takes a little time to deploy newly pushed code. If the link above doesn't work or you see old versions, wait 5 minutes and try again. You can check the GitHub Pages deployment action list to see the current status of the deployments.
Add thumbnail response time runbooks (WordPress#3053)
* Add thumb response time runbooks
* Add files to index
* Threshold alarms are low severity if not anomalous
Fixes
Related to #2502 by @sarayourfriend
Related to https://github.com/WordPress/openverse-infrastructure/pull/619
Description
Adds runbooks for thumbnail response time alarms.
Question for reviewers: the content for all of these is identical. I could make a single runbook and update the links in the alarms. The reason I didn't is that it seems possible (although perhaps unlikely) that false positives could occur separately for each of them 🤔
Testing Instructions
Make sure they look alright in the preview!