Add ES healthchecks to /healthcheck/ endpoint #1047
API Developer Docs Preview: Ready

https://wordpress.github.io/openverse-api/_preview/1047

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub Pages deployment action list to see the current status of the deployments.
The code & tests look good! I ran into an issue with this when I tried to hit the endpoint locally (http://localhost:50280/healthcheck/?check_es=true):
RequestError at /healthcheck/
Request Method: GET
Request URL: http://localhost:50280/healthcheck/?check_es=true
Django Version: 4.1.4
Exception Type: RequestError
Exception Value: RequestError(400, 'illegal_argument_exception', 'failed to parse setting [timeout] with value [5] as a time value: unit is missing or unrecognized')
Exception Location: /venv/lib/python3.10/site-packages/elasticsearch/connection/base.py, line 328, in _raise_error
Raised during: catalog.api.views.health_views.HealthCheck
Python Executable: /venv/bin/python
Python Version: 3.10.8
Python Path: ['/api', '/usr/local/lib/python310.zip', '/usr/local/lib/python3.10', '/usr/local/lib/python3.10/lib-dynload', '/venv/lib/python3.10/site-packages']
Server time: Thu, 15 Dec 2022 19:49:20 +0000
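The error says Elasticsearch could not parse 5 as a time value because the unit is missing: ES time settings expect a string with a unit suffix, such as "5s". A minimal sketch of the fix, assuming the view passes its timeout straight through to the client's cluster.health() call. The helpers es_time_value and check_cluster_health are hypothetical, and StubCluster is only an offline stand-in for a real client's cluster namespace:

```python
from typing import Any


def es_time_value(seconds: int) -> str:
    """Format a timeout as an Elasticsearch time value, e.g. 5 -> "5s".

    ES rejects a bare number for time settings ("unit is missing or
    unrecognized"), so a unit suffix is always appended.
    """
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    return f"{seconds}s"


def check_cluster_health(cluster: Any, timeout_seconds: int = 5) -> Any:
    """Hypothetical helper: request cluster health with a unit-bearing timeout.

    `cluster` is assumed to behave like `Elasticsearch(...).cluster`.
    """
    return cluster.health(timeout=es_time_value(timeout_seconds))


class StubCluster:
    """Offline stand-in for a client's `cluster` namespace; records each call."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def health(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"status": "green", "timed_out": False}


cluster = StubCluster()
print(check_cluster_health(cluster))  # {'status': 'green', 'timed_out': False}
print(cluster.calls)  # [{'timeout': '5s'}]
```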
Somehow I completely forgot to try running the
@AetherUnbound Did you get that right after
I've read in multiple articles that ES indices will be yellow if there is only a single node. As far as I can remember, the ES status in the local dev environment has always been yellow. I waited several minutes to see if the status would become green, but it didn't.
This means that locally, ?check_es=true will always return a 503 response, unless we drop the number of replicas from 1 to 0 with a PUT request:
$ http PUT http://localhost:50292/audio/_settings index[number_of_replicas]:=0
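Why a single node stays yellow: ES never places two copies of the same shard on one node, so on a one-node cluster any replica shards stay unassigned. Health is green when every shard is assigned, yellow when all primaries are assigned but some replicas are not, and red when a primary is unassigned. The toy model below encodes only that one allocation rule; it is a sketch for intuition, not ES's actual allocator, and simulated_health is a hypothetical name:

```python
def simulated_health(nodes: int, primaries: int, replicas: int) -> str:
    """Toy model of ES health for one index (not ES's real allocator).

    Only one allocation rule is modelled: two copies of the same shard
    (primary + replica, or two replicas) never share a node.
    """
    if nodes < 1:
        return "red"  # nowhere to put the primaries
    # Each shard can place at most `nodes` copies: 1 primary + (nodes - 1) replicas.
    placeable = min(replicas, nodes - 1)
    unassigned_replicas = (replicas - placeable) * primaries
    return "yellow" if unassigned_replicas > 0 else "green"


for replicas in (1, 0):
    print(f"1 node, number_of_replicas={replicas}: {simulated_health(1, 1, replicas)}")
# 1 node, number_of_replicas=1: yellow
# 1 node, number_of_replicas=0: green
```

This matches the PUT above: setting number_of_replicas to 0 leaves nothing unassigned on a single node.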
@dhruvkb I get green on my local machine, though 🤔
How strange! 😮 I spun the stack up almost 10 minutes ago and it's still showing yellow locally 🤔 I'll try leaving it on for a while longer to see 🤷🏼‍♀️ Edit: 45 minutes later, it is still yellow.
Earlier I was thinking it might be a difference between Docker Desktop on macOS and regular Docker on Linux. But with @AetherUnbound experiencing the same behaviour on Linux, I can't think of any logical explanation for this discrepancy.
@dhruvkb, can you clarify whether this is a blocker for getting a second approval on this PR? From my perspective, just because the local environment doesn't return "green" doesn't mean that the changes in this PR aren't working. The code only relays exactly what the ES instance reports back, nothing more. If you think a "yellow" status does not constitute an error state, we can change it to return 200 with a specific message in that case. Please let me know if there are changes you expect in this PR as it stands now.
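The alternative floated above (treating yellow as healthy but saying so) could look roughly like this. A minimal sketch, assuming the view already holds the dict returned by cluster.health(), which includes "status" and "timed_out" fields; es_health_response, the allow_yellow flag and the response bodies are hypothetical, not this PR's code:

```python
from http import HTTPStatus


def es_health_response(es_health: dict, allow_yellow: bool = False) -> tuple[int, dict]:
    """Translate a cluster.health() result into (HTTP status, JSON body).

    allow_yellow is a hypothetical switch: when set, "yellow" is reported
    as 200 with an explanatory detail instead of 503.
    """
    if es_health.get("timed_out"):
        return HTTPStatus.SERVICE_UNAVAILABLE, {"detail": "es_timed_out"}

    status = es_health.get("status")
    if status == "green":
        return HTTPStatus.OK, {"status": "200 OK"}
    if status == "yellow" and allow_yellow:
        return HTTPStatus.OK, {"status": "200 OK", "detail": "es_status_yellow"}
    # red, yellow (when not allowed), or anything unexpected
    return HTTPStatus.SERVICE_UNAVAILABLE, {"detail": f"es_status_{status}"}
```

Keeping the default strict matches the review's conclusion that a yellow status in prod would be cause for concern, while the flag would let a single-node local stack pass.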
I'm more than happy to approve it as is, because it works and because a yellow status in prod would be cause for concern.
The intent of my comment was to point out the difference between local and prod and a way it could be made green locally if we wanted.
I've manually verified that the labels exist and are correct, so I'm going to merge this using the admin workaround instead of trying to sort out why GitHub won't let me re-run the actions. They seem to have stalled indefinitely.
I've been seeing this with PRs that were opened before we added that as a required check. It seems that just adding and removing a label is enough to trigger the jobs and pass the checks, FWIW!
Oops, swapped digits with #1074 from the CLI 😅
Fixes
Fixes #14 by @obulat
Description
Adds Elasticsearch status checks to the /healthcheck/ endpoint when check_es is in the query params.

Note: I chose not to implement a serializer for the query params to avoid over-complicating the implementation of what is otherwise a very straightforward request to process. If this gets more complex in the future, for example if we add other check_* params, a serializer might then become appropriate.

I'm leaving this as a draft to allow discussion in the issue about whether this approach actually makes sense for our usage of the healthcheck endpoint.
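Without a serializer, the trigger described above reduces to a presence test on the query string (in DRF, roughly a membership check on request.query_params). A standard-library sketch of the same rule, where wants_es_check is a hypothetical stand-in for the view logic rather than the PR's code:

```python
from urllib.parse import parse_qs, urlsplit


def wants_es_check(url: str) -> bool:
    """Return True when check_es appears in the query string at all.

    The value is ignored, so ?check_es, ?check_es=true and even
    ?check_es=false all request the Elasticsearch check.
    keep_blank_values=True makes a bare ?check_es count as present.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return "check_es" in query
```

Because only presence matters, ?check_es=false still triggers the check. If that ever becomes a problem, a serializer with a boolean field would be the natural next step, as the note above anticipates.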
Testing Instructions
Check out the unit tests and confirm they make sense. As far as I can tell, there isn't a good way to test this in practice unless you do something to break your local ES cluster, and I'm not sure how to do that, so I don't have any advice here.
Checklist
Update index.md).
main) or a parent feature branch.
errors.
Developer Certificate of Origin