This repository has been archived by the owner on Feb 22, 2023. It is now read-only.
/healthcheck endpoint should check for Elasticsearch availability (original #487) #14
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟨 priority: medium
Not blocking but should be addressed soon
This issue has been migrated from the CC Search API repository
During deployments, our load balancer repeatedly polls the /healthcheck endpoint to check that the server is reachable. If this check succeeds, the newly deployed instance starts receiving production traffic. Right now, if Elasticsearch is not responsive, /healthcheck will still return 200 OK.
The healthcheck endpoint should check the health of the image index in Elasticsearch using the cluster health API. If it is unavailable, return error 500. Log an informative message explaining why the healthcheck failed.
Because the healthcheck endpoint may be called many times, and Elasticsearch calls are not free, we should cache the response of Elasticsearch for up to 10 seconds per call.
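A minimal sketch of the behaviour requested above, under stated assumptions: the class name `CachedHealthcheck`, the `probe` callable, and the injectable `clock` are hypothetical and not taken from the repository. The probe stands in for whatever call checks Elasticsearch. The wrapper maps its result to 200 or 500, logs failures, and reuses the last result for up to 10 seconds.

```python
import logging
import time
from typing import Callable, Optional, Tuple

log = logging.getLogger("healthcheck")

# From the issue: cache the Elasticsearch result for up to 10 seconds.
CACHE_TTL_SECONDS = 10


class CachedHealthcheck:
    """Wrap a health probe and remember its last result for a short time.

    `probe` is a hypothetical callable returning True when the image index
    is healthy; it may raise if Elasticsearch is unreachable.
    """

    def __init__(
        self,
        probe: Callable[[], bool],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._cached: Optional[Tuple[float, bool]] = None  # (timestamp, healthy)

    def status_code(self) -> int:
        """Return the HTTP status the /healthcheck view should respond with."""
        now = self._clock()
        if self._cached is not None and now - self._cached[0] < self._ttl:
            # Cached result is still fresh: do not call Elasticsearch again.
            healthy = self._cached[1]
        else:
            try:
                healthy = self._probe()
            except Exception as exc:
                log.error("Healthcheck failed: Elasticsearch probe raised %r", exc)
                healthy = False
            else:
                if not healthy:
                    log.error("Healthcheck failed: image index is unavailable")
            self._cached = (now, healthy)
        return 200 if healthy else 500
```

A view could hold one module-level `CachedHealthcheck` instance and return `status_code()` on each request. Note that this sketch caches failures as well as successes, so a recovered cluster may be reported as unhealthy for up to 10 seconds.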
Original Comments:
madewithkode commented on Fri May 08 2020:
source
madewithkode commented on Fri May 08 2020:
cluster_response = urlopen('http://0.0.0.0:8000/_cluster/health/image')
However, I keep getting a 404. Is there something I'm doing wrong?
source
madewithkode commented on Fri May 08 2020:
Figured this out, didn't know Elasticsearch was running on a separate host/port :)
source
aldenstpage commented on Fri May 08 2020:
It would be best to use the equivalent elasticsearch-py or elasticsearch-dsl query instead of making direct calls to the REST API (you can get an instance of the connection to Elasticsearch from search_controller.py). Here's an example for getting the cluster health; there ought to also be a way to narrow the query to the image index.
source
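A minimal sketch of what that could look like with elasticsearch-py, assuming `es` is a connected client such as the instance from search_controller.py. The helper name `image_index_status` is hypothetical. It relies on `cluster.health`, which accepts an `index` argument and returns a response containing a `status` field.

```python
from typing import Any


def image_index_status(es: Any, index: str = "image") -> str:
    """Return the health colour ("green", "yellow" or "red") for one index.

    `es` is assumed to be an elasticsearch-py client; this helper is a
    hypothetical wrapper, not code from the repository.
    """
    # Narrow the cluster health request to a single index.
    response = es.cluster.health(index=index)
    return response["status"]
```

Later comments in this thread report that this narrowed request hangs when the index does not exist, so treat it as a starting point only.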
madewithkode commented on Sat May 09 2020:
On Fri, May 8, 2020, 21:06 Alden S Page [email protected] wrote:
source
madewithkode commented on Sat May 09 2020:
I've successfully managed to query the health of the entire cluster, using the Elasticsearch connection instance obtained from search_controller.py. However, when I try to limit the health check to just the image index, the request never resolves and continues to run forever with no response. And when I try to specify a timeout for the request, I get an "Illegal argument exception" even though timeout is a valid kwarg referenced in the API docs.
I should point out that, as of the time of writing, I have yet to successfully run ./load_sample_data.sh, so I don't know if this could be linked to the above problem.
source
madewithkode commented on Mon May 11 2020:
Successfully got load_sample_data.sh to run, and so far everything else is working fine.
I've also set up the 10s response caching on the /healthcheck view using Redis, and also the error logging.
However, I figured out that the reason for the unresponsiveness when querying the Elasticsearch image index was that it was non-existent, and that the whole cluster index was empty too.
Do I need to do a manual population or something?
source
aldenstpage commented on Mon May 11 2020:
In my experience, the ES Python libs can behave in unexpected ways that you sometimes have to work around. Since it seems like querying specifically for the image index health hangs when the index doesn't exist, perhaps you could query for healthchecks of every index in the cluster, and fail the healthcheck if image is not among them and green?
It sounds like it's coming along nicely!
source
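A minimal sketch of that workaround, assuming `es` is an elasticsearch-py client. The function name `check_image_index` and the reason strings are hypothetical. It asks `cluster.health` for a per-index breakdown with `level="indices"` and then looks for the image index in the result, instead of requesting that index directly.

```python
from typing import Any, Tuple


def check_image_index(es: Any, index: str = "image") -> Tuple[bool, str]:
    """Return (healthy, reason) using the per-index section of cluster health.

    `es` is assumed to be an elasticsearch-py client; this helper is a
    hypothetical illustration, not code from the repository.
    """
    # level="indices" asks Elasticsearch to include health for every index.
    response = es.cluster.health(level="indices")
    indices = response.get("indices", {})

    info = indices.get(index)
    if info is None:
        # A missing index fails fast here rather than leaving the request hanging.
        return False, f"index '{index}' not found in cluster"

    status = info.get("status")
    if status != "green":
        return False, f"index '{index}' is {status}, expected green"
    return True, f"index '{index}' is green"
```

The reason string can be passed to the logger when the check fails, which covers the "informative message" requirement from the issue description.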
madewithkode commented on Tue May 12 2020:
Hey Alden... Many thanks again for coming through with better insights. The suggestion sounds nice, I will proceed with it.
And yes, the whole thing is getting more interesting, I've learnt a handful in the few days :)
source