diff --git a/docs/src/docs/arc42/cross-cutting/discovery-process-dtr.adoc b/docs/src/docs/arc42/cross-cutting/discovery-process-dtr.adoc
new file mode 100644
index 0000000000..43349184bf
--- /dev/null
+++ b/docs/src/docs/arc42/cross-cutting/discovery-process-dtr.adoc
@@ -0,0 +1,197 @@
+The Dataspace Discovery Service handles multiple EDC URLs received for a BPN.
+This applies to the following scenarios:
+
+==== Scenarios
+
+===== EDC with multiple DTRs
+
+IRS queries all DTRs for the globalAssetId and will take the first result it gets.
+If none of the DTRs return a result, IRS will create a tombstone.
+
+[plantuml,target=discovery-DTR--EDC-with-multiple-DTRs,format=svg]
+----
+@startuml
+participant IRS
+participant DiscoveryService
+participant "EDC Provider" as EDCProvider
+participant "DTR 1" as DTR1
+participant "DTR 2" as DTR2
+
+IRS ->> DiscoveryService: Get EDCs for BPN
+DiscoveryService ->> IRS: Return list of 1 EDC
+IRS ->> EDCProvider: Query for DTR contract offer
+EDCProvider ->> IRS: 2 DTR contract offers
+
+par QueryDTR1
+    IRS ->> EDCProvider: Negotiate contract
+    IRS ->> DTR1: Query for DT
+    DTR1 ->> IRS: no DT
+end
+
+par QueryDTR2
+    IRS ->> EDCProvider: Negotiate contract
+    IRS ->> DTR2: Query for DT
+    DTR2 ->> IRS: DT
+end
+@enduml
+----
+
+===== Multiple EDCs with one DTR
+
+IRS starts a contract negotiation for all registry contract offers in parallel and queries the DTRs for all successful negotiations.
+The first registry which responds with a DT will be the one used by IRS.
+
+[plantuml,target=discovery-DTR--multiple-EDCs-with-one-DTR,format=svg]
+----
+@startuml
+participant IRS
+participant DiscoveryService
+participant "EDC Provider 1" as EDCProvider1
+participant "EDC Provider 2" as EDCProvider2
+participant "EDC Provider 3" as EDCProvider3
+participant DTR
+
+IRS ->> DiscoveryService: Get EDCs for BPN
+DiscoveryService ->> IRS: Return list of 3 EDCs
+
+par CatalogRequestEDC1
+    IRS ->> EDCProvider1: Query for DTR contract offer
+    EDCProvider1 ->> IRS: No offer
+end
+
+par CatalogRequestEDC2
+    IRS ->> EDCProvider2: Query for DTR contract offer
+    EDCProvider2 ->> IRS: No offer
+end
+
+par CatalogRequestEDC3
+    IRS ->> EDCProvider3: Query for DTR contract offer
+    EDCProvider3 ->> IRS: DTR contract offer
+    IRS -> EDCProvider3: Negotiate contract
+    IRS ->> DTR: Query for DT
+    DTR ->> IRS: DT
+end
+@enduml
+----
+
+===== One EDC with one DTR
+
+Only one EDC is found for the BPN, and its catalog contains only one offer for the DTR.
+IRS will use this registry and will create a tombstone if no DT could be found for the globalAssetId.
+
+[plantuml,target=discovery-DTR--one-EDC-with-one-DTR,format=svg]
+----
+@startuml
+participant IRS
+participant DiscoveryService
+participant "EDC Provider 3" as EDCProvider3
+participant DTR
+
+IRS ->> DiscoveryService: Get EDCs for BPN
+DiscoveryService ->> IRS: Return list of 1 EDC
+IRS ->> EDCProvider3: Query for DTR contract offer
+EDCProvider3 ->> IRS: DTR contract offer
+IRS -> EDCProvider3: Negotiate contract
+IRS ->> DTR: Query for DT
+DTR ->> IRS: DT
+@enduml
+----
+
+===== Multiple EDCs with multiple DTRs
+
+IRS starts a contract negotiation for all the registry offers.
+
+[plantuml,target=discovery-DTR--multiple-EDCs-with-multiple-DTRs,format=svg]
+----
+@startuml
+participant IRS
+participant DiscoveryService
+participant "EDC Provider 1" as EDCProvider1
+participant "EDC Provider 2" as EDCProvider2
+participant "EDC Provider 3" as EDCProvider3
+participant "DTR" as DTR
+
+IRS ->> DiscoveryService: Get EDCs for BPN
+DiscoveryService ->> IRS: Return list of 3 EDCs
+
+par CatalogRequestEDC1
+    IRS ->> EDCProvider1: Query for DTR contract offer
+    EDCProvider1 ->> IRS: No offer
+end
+
+par CatalogRequestEDC2
+    IRS ->> EDCProvider2: Query for DTR contract offer
+    EDCProvider2 ->> IRS: DTR contract offer
+    IRS -> EDCProvider2: Negotiate contract
+    IRS ->> DTR: Query for DT
+    DTR ->> IRS: DT
+end
+
+par CatalogRequestEDC3
+    IRS ->> EDCProvider3: Query for DTR contract offer
+    EDCProvider3 ->> IRS: DTR contract offer
+    IRS -> EDCProvider3: Negotiate contract
+    IRS ->> DTR: Query for DT
+    DTR ->> IRS: No DT
+end
+@enduml
+----
+
+===== Multiple EDCs with no DTRs
+
+IRS starts a contract negotiation for all the registry offers and creates a tombstone since no DTR could be discovered.
+
+[plantuml,target=discovery-DTR--multiple-EDCs-with-no-DTRs,format=svg]
+----
+@startuml
+actor IRS
+actor "Discovery Service" as DiscoveryService
+participant "EDC 1" as EDCProvider1
+participant "EDC 2" as EDCProvider2
+participant "EDC 3" as EDCProvider3
+
+IRS -> DiscoveryService: Get EDCs for BPN
+DiscoveryService -> IRS: Return list of 3 EDCs
+
+par Catalog Request to EDC 1
+    IRS -> EDCProvider1: Query for DTR contract offer
+    EDCProvider1 -> IRS: No offer
+end
+
+par Catalog Request to EDC 2
+    IRS -> EDCProvider2: Query for DTR contract offer
+    EDCProvider2 -> IRS: No offer
+end
+
+par Catalog Request to EDC 3
+    IRS -> EDCProvider3: Query for DTR contract offer
+    EDCProvider3 -> IRS: No offer
+end
+
+IRS -> IRS: Tombstone
+@enduml
+----
+
+==== Special Scenarios
+
+===== Same DT in multiple DTRs
+
+IRS will use all registries to query for the globalAssetId and takes the first result which is returned.
+If no DT could be found in any of the DTRs, IRS will create a tombstone.
+
+===== Multiple DTs (with the same globalAssetId) in one DTR
+
+IRS uses the `/query` endpoint of the DTR to get the DT id based on the globalAssetId.
+If more than one id is present for a globalAssetId, IRS will use the first of the list.
+
+[plantuml,target=discovery-DTR--multiple-DTs-with-the-same-globalAssedId-in-one-DTR,format=svg]
+----
+@startuml
+actor IRS
+participant DTR
+
+IRS -> DTR: /query for globalAssetId
+DTR -> IRS: return list of two results
+IRS -> IRS: use first
+@enduml
+----
diff --git a/docs/src/docs/arc42/cross-cutting/under-the-hood.adoc b/docs/src/docs/arc42/cross-cutting/under-the-hood.adoc
index aaa8440e10..cb3ba4db52 100644
--- a/docs/src/docs/arc42/cross-cutting/under-the-hood.adoc
+++ b/docs/src/docs/arc42/cross-cutting/under-the-hood.adoc
@@ -1,82 +1,115 @@
 = "Under-the-hood" concepts
 
 == Persistence
+
 The IRS stores two types of data in a persistent way:
 
 - Job metadata
 - Job payloads, e.g. AAS shells or submodel data
 
-All of this is data is stored in an object store. The currently used implementation is Minio (Amazon S3 compatible).
-This reduces the complexity in storing and retrieving data. There also is no predefined model for the data, every document can be stored as it is.
+All of this data is stored in an object store.
+The currently used implementation is Minio (Amazon S3 compatible).
+This reduces the complexity in storing and retrieving data.
+There is also no predefined model for the data; every document can be stored as it is.
 The downside of this approach is lack of query functionality, as we can only search through the keys of the entries but not based on the value data.
 In the future, another approach or an additional way to to index the data might be required.
 
-To let the data survive system restarts, Minio needs to use a persistent volume for the data storage. A default configuration for this is provided in the Helm charts.
+To let the data survive system restarts, Minio needs to use a persistent volume for the data storage.
+A default configuration for this is provided in the Helm charts.
 
 == Transaction handling
+
 There currently is no transaction management in the IRS.
 
 == Session handling
+
 There is no session handling in the IRS, access is solely based on bearer tokens, the API is stateless.
 
 == Communication and integration
-All interfaces to other systems are using RESTful calls over HTTP(S). Where central authentication is required, a common OAuth2 provider is used.
+
+All interfaces to other systems are using RESTful calls over HTTP(S).
+Where central authentication is required, a common OAuth2 provider is used.
 
 For outgoing calls, the Spring RestTemplate mechanism is used and separate RestTemplates are created for the different ways of authentication.
 
 For incoming calls, we utilize the Spring REST Controller mechanism, annotating the interfaces accordingly and also documenting the endpoints using OpenAPI annotations.
 
 == Exception and error handling
+
 There are two types of potential errors in the IRS:
 
 === Technical errors
-Technical errors occur when there is a problem with the application itself, its configuration or directly connected infrastructure, e.g. the Minio persistence. Usually, the application cannot solve these problems by itself and requires some external support (manual work or automated recovery mechanisms, e.g. Kubernetes liveness probes).
+
+Technical errors occur when there is a problem with the application itself, its configuration or directly connected infrastructure, e.g. the Minio persistence.
+Usually, the application cannot solve these problems by itself and requires some external support (manual work or automated recovery mechanisms, e.g. Kubernetes liveness probes).
 
 These errors are printed mainly to the application log and are relevant for the healthchecks.
 
 === Functional errors
-Functional errors occur when there is a problem with the data that is being processed or external systems are unavailable and data cannot be sent / fetched as required for the process. While the system might not be able to provide the required function at that moment, it may work with a different dataset or as soon as the external systems recover.
+
+Functional errors occur when there is a problem with the data that is being processed or external systems are unavailable and data cannot be sent / fetched as required for the process.
+While the system might not be able to provide the required function at that moment, it may work with a different dataset or as soon as the external systems recover.
 
 These errors are reported in the Job response and do not directly affect application health.
 
 === Rules for exception handling
+
 ==== Throw or log, don't do both
-When catching an exception, either log the exception and handle the problem or rethrow it, so it can be handled at a higher level of the code. By doing both, an exception might be written to the log multiple times, which can be confusing.
+
+When catching an exception, either log the exception and handle the problem or rethrow it, so it can be handled at a higher level of the code.
+By doing both, an exception might be written to the log multiple times, which can be confusing.
 
 ==== Write own base exceptions for (internal) interfaces
-By defining a common (checked) base exception for an interface, the caller is forced to handle potential errors, but can keep the logic simple. On the other hand, you still have the possibility to derive various, meaningful exceptions for different error cases, which can then be thrown via the API.
+
+By defining a common (checked) base exception for an interface, the caller is forced to handle potential errors, but can keep the logic simple.
+On the other hand, you still have the possibility to derive various, meaningful exceptions for different error cases, which can then be thrown via the API.
 
 Of course, when using only RuntimeExceptions, this is not necessary - but those can be overlooked quite easily, so be careful there.
 
 ==== Central fallback exception handler
-There will always be some exception that cannot be handled inside of the code correctly - or it may just have been unforeseen. A central fallback exception handler is required so all problems are visible in the log and the API always returns meaningful responses. In some cases, this is as simple as a HTTP 500.
+
+There will always be some exception that cannot be handled inside of the code correctly - or it may just have been unforeseen.
+A central fallback exception handler is required so all problems are visible in the log and the API always returns meaningful responses.
+In some cases, this is as simple as an HTTP 500.
 
 ==== Dont expose too much exception details over API
-It's good to inform the user, why their request did not work, but only if they can do something about it (HTTP 4xx). So in case of application problems, you should not expose details of the problem to the caller. This way, we avoid opening potential attack vectors.
 
-== Parallelization and threading
-The heart of the IRS is the parallel execution of planned jobs. As almost each job requires multiple calls to various endpoints, those are done in parallel as well to reduce the total execution time for each job.
+It's good to inform the user why their request did not work, but only if they can do something about it (HTTP 4xx).
+So in case of application problems, you should not expose details of the problem to the caller.
+This way, we avoid opening potential attack vectors.
 
-Tasks execution is orchestrated by the JobOrchestrator class. It utilizes a central ExecutorService, which manages the number of threads and schedules new Task as they come in.
+== Parallelization and threading
+The heart of the IRS is the parallel execution of planned jobs.
+As almost each job requires multiple calls to various endpoints, those are done in parallel as well to reduce the total execution time for each job.
+Task execution is orchestrated by the JobOrchestrator class.
+It utilizes a central ExecutorService, which manages the number of threads and schedules new tasks as they come in.
 
 == Plausibility checks and validation
+
 Data validation happens at two points:
 
-- IRS API: the data sent by the client is validated to match the model defined in the IRS. If the validation fails, the IRS sends a HTTP 400 response and indicates the problem to the caller.
+- IRS API: the data sent by the client is validated to match the model defined in the IRS.
+If the validation fails, the IRS sends an HTTP 400 response and indicates the problem to the caller.
 
 - Submodel payload: each time a submodel payload is requested from via EDC, the data is validated against the model defined in the SemanticHub for the matching aspect type.
 
-- EDC Contract Offer Policy: each time IRS consumes data over the EDC, the policies of the offered contract will be validated. IDs of so-called "Rahmenverträgen" or Framework-Agreements can be added to the IRS Policy Store to be accepted by the IRS. If a Contract Offer does not match any of the IDs store in Policy Store, the contract offer will be declined and no data will be consumed.
+- EDC Contract Offer Policy: each time IRS consumes data over the EDC, the policies of the offered contract will be validated.
+IDs of so-called "Rahmenverträge" (framework agreements) can be added to the IRS Policy Store to be accepted by the IRS.
+If a Contract Offer does not match any of the IDs stored in the Policy Store, the contract offer will be declined and no data will be consumed.
 
 == Policy Store
 
-The IRS gives its users the ability to manage, create and delete complex policies containing permissions and constraints in order to obtain the most precise control over access and use of data received from the edc provider. Policies stored in Policy Store will serve as input with allowed restriction and will be checked against every item from EDC Catalog.
+The IRS gives its users the ability to manage, create and delete complex policies containing permissions and constraints in order to obtain the most precise control over access and use of data received from the EDC provider.
+Policies stored in the Policy Store serve as the set of allowed restrictions and are checked against every item from the EDC catalog.
 
-The structure of a Policy that can be stored in storage can be easily viewed by using Policy Store endpoints in the published API documentation. Each policy may contain more than one permission, which in turn consists of constraints linked together by AND or OR relationships. This model provides full flexibility and control over stored access and use policies.
+The structure of a Policy that can be stored in storage can be easily viewed by using Policy Store endpoints in the published API documentation.
+Each policy may contain more than one permission, which in turn consists of constraints linked together by AND or OR relationships.
+This model provides full flexibility and control over stored access and use policies.
 
 == Digital Twin / EDC requirements
 
-In order to work with the decentral network approach, IRS requires the Digital Twin to contain a `"subprotocolBody"` in each of the submodelDescriptor endpoints. This `"subprotocolBody"` has to contain the `"id"` of the EDC asset, as well as the `"dspEndpoint"` of the EDC, separated by a semicolon (e.g. `"subprotocolBody": "id=123;dspEndpoint=http://edc.control.plane/api/v1/dsp"`).
+In order to work with the decentral network approach, IRS requires the Digital Twin to contain a `"subprotocolBody"` in each of the submodelDescriptor endpoints.
+This `"subprotocolBody"` has to contain the `"id"` of the EDC asset, as well as the `"dspEndpoint"` of the EDC, separated by a semicolon (e.g. `"subprotocolBody": "id=123;dspEndpoint=http://edc.control.plane/api/v1/dsp"`).
 
 The `"dspEndpoint"` is used to request the EDC catalog of the dataprovider and the `"id"` to filter for the exact asset inside this catalog.
 
@@ -94,7 +127,8 @@ Whenever a BPN is resolved via BPDM, the partner name is cached on IRS side, as
 
 === Semantics Hub
 
-Whenever a semantic model schema is requested from the Semantic Hub, it is stored locally until the cache is evicted (configurable). The IRS can preload configured schema models on startup to reduce on demand call times.
+Whenever a semantic model schema is requested from the Semantic Hub, it is stored locally until the cache is evicted (configurable).
+The IRS can preload configured schema models on startup to reduce on demand call times.
 
 Additionally, models can be deployed with the system as a backup to the real Semantic Hub service.
 
@@ -109,18 +143,24 @@ The time to live for both caches can be configured separately as described in th
 
 Further information on Discovery Service can be found in the chapter "System scope and context".
 
+== Discovery Process
+
+=== Digital Twin Registry
+
+include::discovery-process-dtr.adoc[]
+
 === EDC
 
 EndpointDataReferenceStorage is in-memory local storage that holds records (EndpointDataReferences) by either assetId or contractAgreementId.
 
 When EDC gets EndpointDataReference describing endpoint serving data it uses EndpointDataReferenceStorage and query it by assetId.
-This allows reuse of already existing EndpointDataReference if it is present, valid, and it's token is not expired,
-rather than starting whole new contract negotiation process.
+This allows reuse of an already existing EndpointDataReference if it is present, valid, and its token is not expired, rather than starting a whole new contract negotiation process.
 
-In case token is expired the process is also shortened. We don't have to start new contract negotiation process,
-since we can obtain required contractAgreementId from present authCode. This improves request processing time.
+If the token has expired, the process is also shortened.
+We don't have to start a new contract negotiation process, since we can obtain the required contractAgreementId from the present authCode.
+This improves request processing time.
 
-[source, mermaid]
+[source,mermaid]
 ....
 sequenceDiagram
     autonumber
diff --git a/docs/src/docs/arc42/glossary.adoc b/docs/src/docs/arc42/glossary.adoc
index 9fb18f3da7..68eb016e89 100644
--- a/docs/src/docs/arc42/glossary.adoc
+++ b/docs/src/docs/arc42/glossary.adoc
@@ -7,9 +7,14 @@
 |Aspect servers (submodel endpoints) |Companies participating in the interorganizational data exchange provides their data over aspect servers. The so called "submodel-descriptors" in the AAS shells are pointing to these AspectServers which provide the data-assets of the participating these companies in Catena-X.
 |BoM |Bill of Materials
+|BPN | Business Partner Number
+|DT | Digital Twin
+|DTR | Digital Twin Registry. The Digital Twin Registry is the central registry which lists all digital twins and references their aspects including information about the underlying asset, asset manufacturer, and access options (e.g. aspect endpoints).
+|EDC | Eclipse Dataspace Connector
 |Edge |see Traversal Aspect
 |IRS |Item Relationship Service
 |Item Graph |The result returned via the IRS. This corresponds to a tree structure in which each node represents a part of a virtual asset.
+|MIW | Managed Identity Wallet
 |MTPDC |Formerly known Service Name: Multi Tier Parts Data Chain
 |PRS |Formerly known Service Name: Parts Relationship Service
 |Traversal Aspect |aka Edge: Aspect which the IRS uses for traversal through the data chain. Identified by a parent-child or a child-parent relationship.