[Elastic Agent] Add process error handling guidelines (#29152)

We don't yet have a good place to document expectations for services run by Elastic Agent. I'm starting this document so there is somewhere to add such content, but long term we need to find a better home for it. This guideline comes out of recent issues we had in Cloud and local setups of fleet-server (elastic/fleet-server#883). We never set clear guidelines on what is expected of a service run by Elastic Agent and of Elastic Agent itself. This PR is meant to kick off the discussion.

Co-authored-by: Gil Raphaelli <[email protected]>
elastic · Nov 30, 2021 · 51463bf
1 parent 180e7f3
Showing 1 changed file with 9 additions and 0 deletions.
diff --git a/x-pack/elastic-agent/GUIDELINES.md b/x-pack/elastic-agent/GUIDELINES.md
@@ -0,0 +1,9 @@
+# Guidelines
+
+This document contains architecture details around Elastic Agent and guidelines on how new inputs and processes should be built.
+
+## Processes running as services and error handling
+
+All processes started by Elastic Agent run as services. Each service is expected to handle local errors on its own and continue working. A process should fail only on startup, when an invalid configuration is passed in. Once a process is running and partial updates to the config are applied without a restart, the service is expected to keep running and report the errors.
+
+A service that needs to perform setup tasks on startup is expected to retry until it succeeds rather than error out after a timeout.