From 51463bf4ebcd4b425f8e751fa10edc651f0c2001 Mon Sep 17 00:00:00 2001
From: Nicolas Ruflin
Date: Tue, 30 Nov 2021 14:50:15 +0100
Subject: [PATCH] [Elastic Agent] Add process error handling guidelines (#29152)

We don't have a good place yet to document expectations around services run by Elastic Agent. I'm starting this document so there is a place to add more content, but I expect that long term we need to figure out a better place.

This guideline comes out of recent issues we had in Cloud and local setups of fleet-server (https://github.com/elastic/fleet-server/pull/883). We never set clear guidelines on what is expected of a service run by Elastic Agent and of Elastic Agent itself. This PR is meant to kick off the discussion.

Co-authored-by: Gil Raphaelli
---
 x-pack/elastic-agent/GUIDELINES.md | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 x-pack/elastic-agent/GUIDELINES.md

diff --git a/x-pack/elastic-agent/GUIDELINES.md b/x-pack/elastic-agent/GUIDELINES.md
new file mode 100644
index 00000000000..3fa741fc991
--- /dev/null
+++ b/x-pack/elastic-agent/GUIDELINES.md
@@ -0,0 +1,9 @@
+# Guidelines
+
+This document contains architecture details about Elastic Agent and guidelines on how new inputs and processes should be built.
+
+## Processes running as services and error handling
+
+All processes started by Elastic Agent run as services. Each service is expected to handle local errors on its own and continue working. A process should only fail on startup if an invalid configuration is passed in. Once a process is running and partial updates to the config are applied without a restart, the service is expected to keep running and report the errors.
+
+A service that needs to do setup tasks on startup is expected to retry until it succeeds rather than error out after a certain timeout.