Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Add ES healthchecks to /healthcheck/ endpoint #1047

Merged
merged 3 commits into from
Jan 4, 2023

Conversation

sarayourfriend
Copy link
Contributor

Fixes

Fixes #14 by @obulat

Description

Adds Elasticsearch status checks to the /healthcheck/ endpoint when check_es is in the query params.

Note: I chose not to implement a serializer for the query params to avoid over-complicating the implementation of what is an otherwise very straightforward request to process. If this does happen to get more complex in the future, like if we add different check_* params, a serializer might then become appropriate.

I'm leaving this as a draft to allow discussion in the issue to decide whether this approach actually makes sense for our usage of the healthcheck endpoint.

Testing Instructions

Check out the unit tests and confirm they make sense. There's not really a good way to test this practically as far as I can tell unless you do something to brick your local ES cluster. I'm not sure how to do that so I don't have any advice for this.

Checklist

  • My pull request has a descriptive title (not a vague title like
    Update index.md).
  • My pull request targets the default branch of the repository (main) or
    a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible
    errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@openverse-bot openverse-bot added ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🟨 priority: medium Not blocking but should be addressed soon labels Dec 15, 2022
@github-actions
Copy link

github-actions bot commented Dec 15, 2022

API Developer Docs Preview: Ready

https://wordpress.github.io/openverse-api/_preview/1047

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code & tests look good! I run into an issue with this when I try to hit the endpoint locally (http://localhost:50280/healthcheck/?check_es=true):

<h1>RequestError
       at /healthcheck/</h1>
  <pre class="exception_value">RequestError(400, 'illegal_argument_exception', 'failed to parse setting [timeout] with value [5] as a time value: unit is missing or unrecognized')</pre>
  

Request Method: | GET
-- | --
http://localhost:50280/healthcheck/?check_es=true
4.1.4
RequestError
RequestError(400, 'illegal_argument_exception', 'failed to parse setting [timeout] with value [5] as a time value: unit is missing or unrecognized')
/venv/lib/python3.10/site-packages/elasticsearch/connection/base.py, line 328, in _raise_error
catalog.api.views.health_views.HealthCheck
/venv/bin/python
3.10.8
['/api',  '/usr/local/lib/python310.zip',  '/usr/local/lib/python3.10',  '/usr/local/lib/python3.10/lib-dynload',  '/venv/lib/python3.10/site-packages']
Thu, 15 Dec 2022 19:49:20 +0000

@sarayourfriend
Copy link
Contributor Author

Somehow I completely forgot to try running the es_check locally 🤦 Fixed in the latest commit and I'll undraft the PR now that we've got agreement on the approach. Thanks for the early review, @AetherUnbound

@sarayourfriend sarayourfriend marked this pull request as ready for review December 15, 2022 22:56
@sarayourfriend sarayourfriend requested a review from a team as a code owner December 15, 2022 22:56
Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great for me now! Interestingly enough, I'm seeing the following error:
Screenshot_2022-12-16_11-59-02

@sarayourfriend
Copy link
Contributor Author

@AetherUnbound Did you get that right after recreate or maybe up from a down state? At least it means the endpoint is working 🙂

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read multiple articles that the ES indices will be yellow if there is a single node. And as far as I can remember, the ES status in local dev environment has always been yellow. I waited several minutes to see if the status would become green but it didn't.

image

This means that locally, ?check_es=true will always return a 503 response. Unless we drop the number of replicas from 0 to 1 with a PUT request.

$ http PUT http://localhost:50292/audio/_settings index[number_of_replicas]:=0

image

@sarayourfriend
Copy link
Contributor Author

@dhruvkb I get green on my local machine, though 🤔

@AetherUnbound
Copy link
Contributor

AetherUnbound commented Dec 19, 2022

How strange! 😮 I spun the stack up almost 10 minutes ago and it's still showing yellow locally 🤔 I'll try to leave it on for a while longer to see 🤷🏼‍♀️

Edit: 45 minutes later and it is still yellow

@dhruvkb
Copy link
Member

dhruvkb commented Dec 19, 2022

Earlier I was thinking it might be a difference between Docker Desktop on macOS and regular Docker on Linux. But with @AetherUnbound experiencing the same behaviour on Linux, I can't think of any logical explanation for this discrepancy.

@sarayourfriend
Copy link
Contributor Author

Can you clarify @dhruvkb if this is a blocker for getting a second approval on this PR? From my perspective just because the local environment doesn't return "green" doesn't mean that the changes in this PR aren't working. The code only relays exactly what the ES instance reports back, nothing more.

If you think "yellow" status does not constitute an error state, we can change it to return 200 with a specific message in that case. Please let me know if there are changes you are expecting in this PR as it stands now.

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more than happy to approve it as is, because it works and because a yellow status in prod would be cause for concern.

The intent of my comment was to point out the difference between local and prod and a way it could be made green locally if we wanted.

@sarayourfriend
Copy link
Contributor Author

I've manually verified that the labels exist and are correct, so I'm going to merge this using the admin workaround instead of trying to sort out why GitHub won't let me re-run the actions. They've been stalled indefinitely, it seems.

@sarayourfriend sarayourfriend merged commit 316815e into main Jan 4, 2023
@sarayourfriend sarayourfriend deleted the add/es-healthcheck branch January 4, 2023 23:56
@AetherUnbound
Copy link
Contributor

I've manually verified that the labels exist and are correct, so I'm going to merge this using the admin workaround instead of trying to sort out why GitHub won't let me re-run the actions. They've been stalled indefinitely, it seems.

I've been seeing this with PRs that were opened before we added that as a required check. It seems that just adding and removing a label is sufficient enough to trigger the jobs and pass the checks, FWIW!

@AetherUnbound
Copy link
Contributor

Oops, swapped digits with #1074 from the CLI 😅

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

/healthcheck endpoint should check for Elasticsearch availability (original #487)
4 participants