From 2163d56b83879adc5b76e095aceed848e530719e Mon Sep 17 00:00:00 2001
From: Alexander Bakardzhiev <17193014+alexbakar@users.noreply.github.com>
Date: Sat, 7 Mar 2020 15:22:32 +0200
Subject: [PATCH 1/2] Health endpoint proposal

Signed-off-by: Alexander Bakardzhiev <17193014+alexbakar@users.noreply.github.com>
---
 050-health-endpoint/proposal.md | 53 +++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 050-health-endpoint/proposal.md

diff --git a/050-health-endpoint/proposal.md b/050-health-endpoint/proposal.md
new file mode 100644
index 00000000..1646a79a
--- /dev/null
+++ b/050-health-endpoint/proposal.md
@@ -0,0 +1,53 @@
+# Summary
+A public HTTP endpoint that gives basic information about the health of a Concourse
+ cluster.
+
+# Motivation
+There are some scenarios in which a Concourse cluster is part of a bigger setup for
+automation. In such scenarios other parts of the system rely on Concourse not
+only for CI/CD, but for other types of automation as well. In order to keep the
+whole process working, the system also monitors the availability of its parts and
+ takes measures if some of its parts are not operational (for example, it executes
+some predefined steps - sends notification emails, triggers alerts, executes
+remediation steps, etc.). In such cases it is important for the system to be able
+to determine the state of every part of it. So it would be nice if the parts of
+this system had a common way to return their health/availability status.
+
+Currently there is no easy way for an external (monitoring) system to determine
+if a Concourse cluster is live and operational. It would be nice if Concourse
+also gave information about its health so that in the (rare) case it is not healthy
+the external system can react.
+
+# Proposal
+Concourse can expose a public HTTP endpoint called the "health" endpoint that gives
+basic information about its health. Similar to the approach with the existing
+"info" endpoint, the "health" endpoint can be found at /health.
+It can return a JSON object with very basic information about the health of its
+parts:
+{
+db:
+ :
+ :
+...
+}
+
+# Open Questions
+For which parts should the information be present in the JSON object - for all
+the VMs (DB, web and worker nodes), or only for the DB and all worker nodes?
+This question arises because a web node will serve the request - so at least one
+ web node should be healthy enough to return the response (if there is no healthy
+web node - the caller would receive an error anyway).
+
+What should the status contain (level of detail) - only "OK" and "NOK" - or more
+detailed information about the state of the specific workers or web nodes?
+
+Should there be a specific property to configure the response caching interval
+(caching responses might help in the prevention of DoS attacks)?
+
+# Answered Questions
+
+# New Implications
+This change doesn't aim to change the general workflow of the users, i.e.
+creating/updating pipelines and executing jobs. It only aims to ease the
+monitoring of the Concourse cluster in order to better integrate it into bigger
+systems.

From 1d5b7de6d7d619794a339b3eae903222c315458b Mon Sep 17 00:00:00 2001
From: Alexander Bakardzhiev <17193014+alexbakar@users.noreply.github.com>
Date: Sat, 7 Mar 2020 15:30:00 +0200
Subject: [PATCH 2/2] Format

Signed-off-by: Alexander Bakardzhiev <17193014+alexbakar@users.noreply.github.com>
---
 050-health-endpoint/proposal.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/050-health-endpoint/proposal.md b/050-health-endpoint/proposal.md
index 1646a79a..a95e3585 100644
--- a/050-health-endpoint/proposal.md
+++ b/050-health-endpoint/proposal.md
@@ -1,8 +1,11 @@
 # Summary
+
 A public HTTP endpoint that gives basic information about the health of a Concourse
  cluster.
 
+
 # Motivation
+
 There are some scenarios in which a Concourse cluster is part of a bigger setup for
 automation. In such scenarios other parts of the system rely on Concourse not
 only for CI/CD, but for other types of automation as well. In order to keep the
@@ -18,7 +21,9 @@ if a Concourse cluster is live and operational. It would be nice if Concourse
 also gave information about its health so that in the (rare) case it is not healthy
 the external system can react.
 
+
 # Proposal
+
 Concourse can expose a public HTTP endpoint called the "health" endpoint that gives
 basic information about its health. Similar to the approach with the existing
 "info" endpoint, the "health" endpoint can be found at /health.
@@ -31,7 +36,9 @@ db:
 ...
 }
 
+
 # Open Questions
+
 For which parts should the information be present in the JSON object - for all
 the VMs (DB, web and worker nodes), or only for the DB and all worker nodes?
 This question arises because a web node will serve the request - so at least one
@@ -44,9 +51,12 @@ detailed information about the state of the specific workers or web nodes?
 Should there be a specific property to configure the response caching interval
 (caching responses might help in the prevention of DoS attacks)?
 
+
 # Answered Questions
 
+
 # New Implications
+
 This change doesn't aim to change the general workflow of the users, i.e.
 creating/updating pipelines and executing jobs. It only aims to ease the
 monitoring of the Concourse cluster in order to better integrate it into bigger