feat(adr): initial ADR to address service list #377

jpwhitemn · 2021-04-26T03:33:39Z

Listing requirements, considerations, and alternative implementation considerations around having a common service list for all EdgeX services.

Signed-off-by: Jim White [email protected]

PR Checklist

Please check if your PR fulfills the following requirements:

Changes have been rendered and validated locally using mkdocs-material (see edgex-docs README)
Docs have been added / updated (for bug fixes / features)

If your build fails due to your commit message not passing the build checks, please review the guidelines here: https://github.com/edgexfoundry/edgex-docs/blob/master/.github/Contributing.md.

What is the current behavior?

n/a - new ADR for service list across all services

Issue Number:

none

What is the new behavior?

n/a

Does this PR introduce a breaking change?

Yes
No

Specific Instructions

Are there any specific instructions or things that should be known prior to reviewing?

Other information

docs_src/design/adr/0019-Service-List.md

Listing requirements, considerations, and alternative implementation considerations around having a common service list for all EdgeX services. Signed-off-by: Jim White <[email protected]>

Co-authored-by: Bryon Nevis <[email protected]> Signed-off-by: Jim White <[email protected]>

bnevis-i · 2021-05-06T18:41:36Z

I would like to mention a possible solution to the dynamic token generation problem that might go well with this dynamic service list issue. There is an existing CNCF incubation project out there called SPIFEE/SPIRE. https://github.com/spiffe/spire

This project used Unix domain sockets to authenticate the calling process and issue them credentials. It does this by looking up the remote end's process ID, and collecting metadata on it. So, the path and sha of the executable for a Linux process, a docker image id for a process in a Docker container, or pod names or other metadata for a kubernetes pod. This data is looked up in a central table that is used to make an authorization decision. If a match is found, a JWT or X.509 certificate is delivered to the calling process.

This technology would be effective in replacing the static list of secret store tokens with a file-based mechanism to authorize new services on-demand. It would also get us 90% there for EdgeX microservice authentication. The catch? There is no support for Windows.

jim-wang-intel · 2021-05-06T18:49:56Z

I would like to mention a possible solution to the dynamic token generation problem that might go well with this dynamic service list issue. There is an existing CNCF incubation project out there called SPIFEE/SPIRE. https://github.com/spiffe/spire

This project used Unix domain sockets to authenticate the calling process and issue them credentials. It does this by looking up the remote end's process ID, and collecting metadata on it. So, the path and sha of the executable for a Linux process, a docker image id for a process in a Docker container, or pod names or other metadata for a kubernetes pod. This data is looked up in a central table that is used to make an authorization decision. If a match is found, a JWT or X.509 certificate is delivered to the calling process.

This technology would be effective in replacing the static list of secret store tokens with a file-based mechanism to authorize new services on-demand. It would also get us 90% there for EdgeX microservice authentication. The catch? There is no support for Windows.

Would WLS2 in Windows consider support for Windows? If so, then we should be ok.

docs_src/design/adr/0019-Service-List.md

tonyespy

Comments added inline...

docs_src/design/adr/0019-Service-List.md

design proposal stemming from the 5/17 Monthly Archs meeting Also cleaned up and added content to requirements and context sections based off additional review feedback Signed-off-by: Jim White <[email protected]>

lenny-goodell

LGTM

siggiskulason

LGTM

docs_src/design/adr/0019-Service-List.md

cloudxxx8

LGTM

Signed-off-by: jpwhitemn <[email protected]>

bnevis-i · 2021-10-11T20:59:54Z

docs_src/design/adr/0019-Service-List.md

+- The user interface needs to know which services, specifically device and application services, are available in order to provide interaction with those services, and potentially to display some metrics on operational services.
+- The CLI could refer to a dynamic list of which services are operational and which device services and application services to include in start/stop operations.  *Note: Alternatively, the CLI could simply issue a command to a service and if the service is not available, issue an error response.  However, if a new service was added (such as a new device service), the CLI would not know about it unless it had access to some dynamic list of services.*
+- Security proxy setup requires a list of services for which a reverse proxy route should be established.  This is a subset of the full list of EdgeX services.
+- Core Metadata must maintain a list of device-type services.  This list is used to validate device additions (i.e., any new device definition must include a valid named device service which it is going to be associated with).  This list is also used when routing commands (via core command) to the appropriate device service.


The proposed delayed start services ADR depends on service keys hard-coded into the docker-compose file in order to build a mapping from service to SPIFFE ID. The ADR also distinguishes service-name from service-key, in the case where there are multiple instances of the same executable executing within the system, personized by configuration arguments. The service-name list can be added-to by end-users to create new services.

lenny-goodell · 2021-10-18T15:56:24Z

docs_src/design/adr/0019-Service-List.md

+Currently, the list of services is haphazardly managed by the needing service.
+
+- The security bootstrapping services use a combination of static lists for known EdgeX services and environment variables for services that are not statically known.  (This list is needed for both Vault and Consul to setup appropriate access control lists.)
+- The user interface and CLI do not deal with service lists today and assume all services are operational (with known service host and port defined in configuration).


Is the UI now using the new Registry GetAllServiceEndpoints() API to get all registered services?

lenny-goodell · 2021-10-18T16:06:17Z

docs_src/design/adr/0019-Service-List.md

+- In some cases, a service may be “disabled” at runtime because of resource constraints (example: notification service is turned off as it is not used and there is a need to operate with limited memory and CPU capability).  Security would still need to know about disabled services in order to create a token for such services that may be enabled later.  However, the other services (or 3rd party applications) may need to know that they are disabled vs enabled.
+- In the case that the services are distributed (especially when device services run on different hosts - as is likely), access to a distributed file system or volume mount becomes more of an issue and requires additional technology.  *Note: distributing device services while in secure mode is now more complex due to the fact that all services require Vault access and the services get their Vault token from file.  Distributed device services in secure mode would therefore would require a distributed file system.*
+- The service list would have to be a list of services and service locations (URI).  Therefore, the service list will really be a map of service names (keys) and the location of the services URIs (values) and potentially other information such as `known secrets` as used by the secret store setup (`redisdb` is the only `known secret` to date).  This will help support things like the API gateway setup as well.  In some cases, just a list of the service names is needed (in security token setup for example).  In this latter case, the service would just pick off the service keys.
+- Generation of security tokens for the new services under most of the conceived alternatives is complex.  In the current security bootstrapping, secret store setup will require a restart of all the services in order to add or remove a service.  The ORRA project and some adopters are clear that they would like to have the ability to add/remove services dynamically at runtime.  *Note: in some circumstances, services can be added/removed dynamically today, but not when EdgeX security services are enabled.  CLI and UI also do not allow for the addition/remove of services - such as device or application services.*


This is now out of date with recent enhancements. Secret Store setup can now be restarted w/o requiring the other service to be restarted.

jpwhitemn · 2021-10-19T00:43:31Z

Per the Monthly Architect's meeting of 10/18/21 - we are rejecting this ADR for now.
Discussion and Rationale:

When Consul is running, it is felt that Consul (the registry) and the Consul API are sufficient to get the list of running services. Services could implement a callback or use a routine request "poll" to get an updated list of services if/when a new service is added (i.e., a new device service or new app service) or if a service is removed. All services going forward (Kamakura and beyond) should use this means to get a list of services (that is in the GUI, CLI, SMA, etc.).
When Consul is not running, the likely candidate to satisfy the requirement is to use a configuration file (TOML) that lists the services. However, in this case, services have to know where to get this file or the file has to be provided to them as in some sort of "seed" operation. Further, there has to be away to allow services to be added/removed from this list (and there has to be a way to subsequently notify services of a change to this file). In the future, it was considered that core metadata may be the ideal location to provide or initially seed the list of services from the configuration TOML file and other services would get their list of services from metadata (and using callbacks from metadata to notify services of adds/deletes from the service list). Of course, when using some sort of seed operation to provide metadata with the list of services from a configuration file, there is also the possibility that the configuration file would eventually differ from what is in metadata if new services are added/removed and the file is not kept up to date.
Because the amount of work to implement this strategy was significant and there remained some pesky issues as documented here to still solve, it was felt that the solution was not much better than the existing implementation and we would "punt" this need to future EdgeX when more use cases present a clear need and/or a better alternative is determined (perhaps because of changes in the overall EdgeX architecture).

jpwhitemn added ADR Architecture Decision Record documentation enhancement New feature or request request for comments labels Apr 26, 2021

jpwhitemn requested review from tonyespy, bnevis-i, hutchic, iain-anderson and lenny-goodell April 26, 2021 03:35

cloudxxx8 reviewed Apr 26, 2021

View reviewed changes

docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved

docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved