We don't currently provide an API endpoint that customers' monitoring systems can use for general health checks. The API should provide:
- a 200 or 50x response
- an error message for a failed status check
- an optional configurable timeout
One thing we may want to consider is whether the API can be used with silo-specific endpoints, the Recovery silo endpoint only, or both. From a Nexus monitoring perspective, a check against any silo endpoint should suffice. There is also a chance that a particular silo endpoint doesn't work because of an external DNS issue (which is uncommon), so there may be some value in customers checking all silo endpoints.
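As a rough sketch of the "check all silo endpoints" idea, a monitoring script could probe each silo URL separately and record an error message per endpoint, so that a DNS failure on one silo does not hide the health of the others. Everything here is an assumption for illustration: `check_silos`, `any_healthy`, and the injected `probe` callable are hypothetical names, and `probe` stands in for whatever health check the API ends up offering.

```python
from typing import Callable, Dict, Iterable, Optional


def check_silos(
    silo_urls: Iterable[str],
    probe: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Probe every silo endpoint.

    `probe` (hypothetical) returns None when the endpoint is healthy, or an
    error message otherwise. Exceptions (for example a DNS lookup failure)
    are converted into error messages so one bad silo does not abort the run.
    """
    results: Dict[str, Optional[str]] = {}
    for url in silo_urls:
        try:
            results[url] = probe(url)
        except Exception as exc:
            results[url] = f"{type(exc).__name__}: {exc}"
    return results


def any_healthy(results: Dict[str, Optional[str]]) -> bool:
    """From a Nexus perspective, one healthy silo endpoint is enough."""
    return any(error is None for error in results.values())
```

Keeping the per-silo results separate lets a monitor alert on "Nexus is down" (no endpoint healthy) differently from "one silo hostname is unreachable".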
Closes #3923
Adds `/v1/ping` that always returns `{ "status": "ok" }` if it returns
anything at all. I went with `ping` over the initial `/v1/system/health`
because the latter is vague about its meaning, whereas everyone knows
ping means a trivial request and response. I also thought it was weird
to put an endpoint with no auth check under `/v1/system`, where ~all the
other endpoints require fleet-level perms.
This doesn't add too much over hitting an existing endpoint, but I think
it's worth it because
* It doesn't hit the DB
* It has no auth check
* It gives a very simple answer to "what endpoint should I use to ping
the API?" (a question we have gotten at least once)
* It's easy (I already did it)
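
For illustration, a monitoring client could use the endpoint roughly as follows. This is a minimal sketch, not part of this change: the base URL is whatever silo hostname the caller uses, `interpret_ping` and `ping` are hypothetical helper names, and the timeout is a client-side setting rather than something the API provides.

```python
import json
import urllib.error
import urllib.request


def interpret_ping(status_code: int, body: bytes) -> bool:
    """Return True only for a 200 response whose JSON body has status "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def ping(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/v1/ping with a client-side timeout (no auth needed)."""
    url = base_url.rstrip("/") + "/v1/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_ping(resp.status, resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors (e.g. 50x), DNS failures, timeouts, malformed URLs
        return False
```

Because the handler does not touch the DB, a `True` result only says that the API server is reachable and answering requests.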
Questions that occurred to me while working through this:
- Should we actually attempt to do something in the handler that would
  tell us, e.g., whether the DB is up?
  - No, that would be more than a ping
  - Raises DoS questions if not auth gated
  - Could add a DB status endpoint, or you could use any endpoint that
    returns data
- What tag should this be under?
  - Initially added a `system` tag because a) this doesn't fit under
    existing `system/blah` tags and b) it really does feel miscellaneous
  - Changed to `system/status`, with the idea that if we add other kinds
    of checks, they would be new endpoints under this tag.