Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(adr): initial ADR to address service list #377

Closed
wants to merge 9 commits into from

Conversation

jpwhitemn
Copy link
Member

Listing requirements, considerations, and alternative implementation considerations around having a common service list for all EdgeX services.

Signed-off-by: Jim White [email protected]

PR Checklist

Please check if your PR fulfills the following requirements:

  • Changes have been rendered and validated locally using mkdocs-material (see edgex-docs README)
  • Docs have been added / updated (for bug fixes / features)

If your build fails due to your commit message not passing the build checks, please review the guidelines here: https://github.com/edgexfoundry/edgex-docs/blob/master/.github/Contributing.md.

What is the current behavior?

n/a - new ADR for service list across all services

Issue Number:

none

What is the new behavior?

n/a

Does this PR introduce a breaking change?

  • Yes
  • No

Specific Instructions

Are there any specific instructions or things that should be known prior to reviewing?

Other information

docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
jpwhitemn and others added 6 commits May 5, 2021 12:59
Listing requirements, considerations, and alternative implementation considerations around having a common service list for all EdgeX services.

Signed-off-by: Jim White <[email protected]>
Co-authored-by: Bryon Nevis <[email protected]>
Signed-off-by: Jim White <[email protected]>
Co-authored-by: Bryon Nevis <[email protected]>
Signed-off-by: Jim White <[email protected]>
Co-authored-by: Bryon Nevis <[email protected]>
Signed-off-by: Jim White <[email protected]>
Co-authored-by: Bryon Nevis <[email protected]>
Signed-off-by: Jim White <[email protected]>
Co-authored-by: Bryon Nevis <[email protected]>
Signed-off-by: Jim White <[email protected]>
@bnevis-i
Copy link
Collaborator

bnevis-i commented May 6, 2021

I would like to mention a possible solution to the dynamic token generation problem that might go well with this dynamic service list issue. There is an existing CNCF incubation project out there called SPIFEE/SPIRE. https://github.com/spiffe/spire

This project used Unix domain sockets to authenticate the calling process and issue them credentials. It does this by looking up the remote end's process ID, and collecting metadata on it. So, the path and sha of the executable for a Linux process, a docker image id for a process in a Docker container, or pod names or other metadata for a kubernetes pod. This data is looked up in a central table that is used to make an authorization decision. If a match is found, a JWT or X.509 certificate is delivered to the calling process.

This technology would be effective in replacing the static list of secret store tokens with a file-based mechanism to authorize new services on-demand. It would also get us 90% there for EdgeX microservice authentication. The catch? There is no support for Windows.

@jim-wang-intel
Copy link
Contributor

I would like to mention a possible solution to the dynamic token generation problem that might go well with this dynamic service list issue. There is an existing CNCF incubation project out there called SPIFEE/SPIRE. https://github.com/spiffe/spire

This project used Unix domain sockets to authenticate the calling process and issue them credentials. It does this by looking up the remote end's process ID, and collecting metadata on it. So, the path and sha of the executable for a Linux process, a docker image id for a process in a Docker container, or pod names or other metadata for a kubernetes pod. This data is looked up in a central table that is used to make an authorization decision. If a match is found, a JWT or X.509 certificate is delivered to the calling process.

This technology would be effective in replacing the static list of secret store tokens with a file-based mechanism to authorize new services on-demand. It would also get us 90% there for EdgeX microservice authentication. The catch? There is no support for Windows.

Would WLS2 in Windows consider support for Windows? If so, then we should be ok.

docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Show resolved Hide resolved
Copy link
Member

@tonyespy tonyespy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added inline...

docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
docs_src/design/adr/0019-Service-List.md Show resolved Hide resolved
design proposal stemming from the 5/17 Monthly Archs meeting

Also cleaned up and added content to requirements and context sections based off additional review feedback

Signed-off-by: Jim White <[email protected]>
lenny-goodell
lenny-goodell previously approved these changes May 20, 2021
Copy link
Member

@lenny-goodell lenny-goodell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

siggiskulason
siggiskulason previously approved these changes May 26, 2021
Copy link

@siggiskulason siggiskulason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

docs_src/design/adr/0019-Service-List.md Outdated Show resolved Hide resolved
cloudxxx8
cloudxxx8 previously approved these changes May 26, 2021
Copy link
Member

@cloudxxx8 cloudxxx8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

- The user interface needs to know which services, specifically device and application services, are available in order to provide interaction with those services, and potentially to display some metrics on operational services.
- The CLI could refer to a dynamic list of which services are operational and which device services and application services to include in start/stop operations. *Note: Alternatively, the CLI could simply issue a command to a service and if the service is not available, issue an error response. However, if a new service was added (such as a new device service), the CLI would not know about it unless it had access to some dynamic list of services.*
- Security proxy setup requires a list of services for which a reverse proxy route should be established. This is a subset of the full list of EdgeX services.
- Core Metadata must maintain a list of device-type services. This list is used to validate device additions (i.e., any new device definition must include a valid named device service which it is going to be associated with). This list is also used when routing commands (via core command) to the appropriate device service.
Copy link
Collaborator

@bnevis-i bnevis-i Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed delayed start services ADR depends on service keys hard-coded into the docker-compose file in order to build a mapping from service to SPIFFE ID. The ADR also distinguishes service-name from service-key, in the case where there are multiple instances of the same executable executing within the system, personized by configuration arguments. The service-name list can be added-to by end-users to create new services.

Currently, the list of services is haphazardly managed by the needing service.

- The security bootstrapping services use a combination of static lists for known EdgeX services and environment variables for services that are not statically known. (This list is needed for both Vault and Consul to setup appropriate access control lists.)
- The user interface and CLI do not deal with service lists today and assume all services are operational (with known service host and port defined in configuration).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the UI now using the new Registry GetAllServiceEndpoints() API to get all registered services?

- In some cases, a service may be “disabled” at runtime because of resource constraints (example: notification service is turned off as it is not used and there is a need to operate with limited memory and CPU capability). Security would still need to know about disabled services in order to create a token for such services that may be enabled later. However, the other services (or 3rd party applications) may need to know that they are disabled vs enabled.
- In the case that the services are distributed (especially when device services run on different hosts - as is likely), access to a distributed file system or volume mount becomes more of an issue and requires additional technology. *Note: distributing device services while in secure mode is now more complex due to the fact that all services require Vault access and the services get their Vault token from file. Distributed device services in secure mode would therefore would require a distributed file system.*
- The service list would have to be a list of services and service locations (URI). Therefore, the service list will really be a map of service names (keys) and the location of the services URIs (values) and potentially other information such as `known secrets` as used by the secret store setup (`redisdb` is the only `known secret` to date). This will help support things like the API gateway setup as well. In some cases, just a list of the service names is needed (in security token setup for example). In this latter case, the service would just pick off the service keys.
- Generation of security tokens for the new services under most of the conceived alternatives is complex. In the current security bootstrapping, secret store setup will require a restart of all the services in order to add or remove a service. The ORRA project and some adopters are clear that they would like to have the ability to add/remove services dynamically at runtime. *Note: in some circumstances, services can be added/removed dynamically today, but not when EdgeX security services are enabled. CLI and UI also do not allow for the addition/remove of services - such as device or application services.*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now out of date with recent enhancements. Secret Store setup can now be restarted w/o requiring the other service to be restarted.

@jpwhitemn
Copy link
Member Author

Per the Monthly Architect's meeting of 10/18/21 - we are rejecting this ADR for now.
Discussion and Rationale:

  • When Consul is running, it is felt that Consul (the registry) and the Consul API are sufficient to get the list of running services. Services could implement a callback or use a routine request "poll" to get an updated list of services if/when a new service is added (i.e., a new device service or new app service) or if a service is removed. All services going forward (Kamakura and beyond) should use this means to get a list of services (that is in the GUI, CLI, SMA, etc.).
  • When Consul is not running, the likely candidate to satisfy the requirement is to use a configuration file (TOML) that lists the services. However, in this case, services have to know where to get this file or the file has to be provided to them as in some sort of "seed" operation. Further, there has to be away to allow services to be added/removed from this list (and there has to be a way to subsequently notify services of a change to this file). In the future, it was considered that core metadata may be the ideal location to provide or initially seed the list of services from the configuration TOML file and other services would get their list of services from metadata (and using callbacks from metadata to notify services of adds/deletes from the service list). Of course, when using some sort of seed operation to provide metadata with the list of services from a configuration file, there is also the possibility that the configuration file would eventually differ from what is in metadata if new services are added/removed and the file is not kept up to date.
  • Because the amount of work to implement this strategy was significant and there remained some pesky issues as documented here to still solve, it was felt that the solution was not much better than the existing implementation and we would "punt" this need to future EdgeX when more use cases present a clear need and/or a better alternative is determined (perhaps because of changes in the overall EdgeX architecture).

@jpwhitemn jpwhitemn added hold Intended for PRs we want to flag for ongoing review 3-high priority denoting release-blocking issues wontfix This will not be worked on and removed documentation enhancement New feature or request ireland labels Oct 19, 2021
@jpwhitemn jpwhitemn closed this Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3-high priority denoting release-blocking issues ADR Architecture Decision Record hold Intended for PRs we want to flag for ongoing review request for comments wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants