Merge pull request #450 from OP-TED/feature/update-antora
updated documentation
costezki authored Feb 21, 2023
2 parents f1f149e + c38dddb commit 2b455af
Showing 21 changed files with 19,411 additions and 2,347 deletions.
17,158 changes: 17,158 additions & 0 deletions docs/antora/modules/ROOT/attachments/FATs/2023-02-20-TED-SWS-FAT-complete.html

34 changes: 28 additions & 6 deletions docs/antora/modules/ROOT/nav.adoc
@@ -1,8 +1,30 @@

* xref:index.adoc[Home]
* link:{attachmentsdir}/ted-sws-architecture/index.html[Preliminary Project Architecture^]
* xref:mapping_suite_cli_toolchain.adoc[Mapping Suite CLI Toolchain]
* xref:demo_installation.adoc[Instructions for Software Engineers]
* xref:user_manual.adoc[User manual]
* xref:system_arhitecture.adoc[System architecture overview]
* xref:using_procurement_data.adoc[Using procurement data]
* [.separated]#**General References**#
** xref:ted-sws-introduction.adoc[About TED-SWS]
** xref:glossary.adoc[Glossary]
* [.separated]#**For TED-SWS Operators**#
** xref:user_manual/getting_started_user_manual.adoc[Getting started]
** xref:user_manual/system-overview.adoc[System overview]
** xref:user_manual/access-security.adoc[Security and access]
** xref:user_manual/workflow-management-airflow.adoc[Workflow management with Airflow]
** xref:user_manual/system-monitoring-metabase.adoc[System monitoring with Metabase]
* [.separated]#**For DevOps**#
** link:{attachmentsdir}/aws-infra-docs/TED-SWS-Installation-manual-v2.5.0.pdf[AWS installation manual (v2.5.0)^]
** link:{attachmentsdir}/aws-infra-docs/TED-SWS-AWS-Infrastructure-architecture-overview-v0.9.pdf[AWS infrastructure architecture (v0.9)^]

* [.separated]#**For End User Developers**#
** xref:ted_data/using_procurement_data.adoc[Accessing data in Cellar]
** link:https://docs.ted.europa.eu/EPO/latest/index.html[eProcurement ontology (latest)^]
* [.separated]#**For TED-SWS Developers**#
** xref:technical/mapping_suite_cli_toolchain.adoc[Mapping suite toolchain]
** xref:technical/demo_installation.adoc[Development installation instructions]
** xref:technical/event_manager.adoc[Event manager description]
** xref:architecture/arhitecture_overview.adoc[System architecture overview]
** link:{attachmentsdir}/ted-sws-architecture/index.html[Enterprise architecture model^]
** xref:architecture/arhitecture_choices.adoc[Architectural choices]
351 changes: 351 additions & 0 deletions docs/antora/modules/ROOT/pages/architecture/arhitecture_choices.adoc

476 changes: 476 additions & 0 deletions docs/antora/modules/ROOT/pages/architecture/arhitecture_overview.adoc

59 changes: 59 additions & 0 deletions docs/antora/modules/ROOT/pages/future_work.adoc
@@ -0,0 +1,59 @@
== Future work

In the future, a separate Master Data Registry type system will be used
to deduplicate entities in the TED-SWS system. It will be implemented
according to the requirements for deduplicating the entities extracted
from notices.

The future Master Data Registry (MDR) system for entity deduplication
should have the following architecture:

[arabic]
. *Data Ingestion*: This component is responsible for extracting and
collecting data from various sources, such as databases, files, and
APIs. The data is then transformed, cleaned, and consolidated into a
single format before it is loaded into the MDR.

. *Data Quality*: This component is responsible for enforcing data quality
rules, such as format, completeness, and consistency, on the data before
it is entered into the MDR. This can include tasks such as data
validation, data standardization, and data cleansing.

. *Entity Dedup*: This component is responsible for identifying and
removing duplicate entities in the MDR. This can be done using
string-based, machine-learning-based, or knowledge-based techniques, or
a combination of them.

. *Data Governance*: This component is responsible for ensuring that the
data in the MDR is accurate, complete, and up-to-date. This can include
processes for data validation, data reconciliation, and data
maintenance.

. *Data Access and Integration*: This component provides access to the MDR
data through a user interface and APIs, and integrates the MDR data
with other systems and applications.

. *Data Security*: This component is responsible for ensuring that the
data in the MDR is secure, and that only authorized users can access it.
This can include tasks such as authentication, access control, and
encryption.

. *Data Management*: This component is responsible for managing the data
in the MDR, including tasks such as data archiving, data backup, and
data recovery.

. *Monitoring and Analytics*: This component is responsible for monitoring
and analysing the performance of the MDR system, and for providing
insights into the data to help improve the system.

. *Services Layer*: This component is responsible for providing services
such as indexing, search, and query functionality over the data.
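
The string-based technique mentioned under *Entity Dedup* could look roughly like the minimal Python sketch below. It assumes entities are compared by name only; the functions `normalise`, `similarity`, and `deduplicate` and the 0.9 threshold are hypothetical illustrations, not part of TED-SWS. Names are normalised (case, accents, punctuation) and then compared with the standard-library `difflib.SequenceMatcher`.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[^0-9a-z]+", " ", ascii_only.lower())
    return cleaned.strip()


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalised forms of two names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def deduplicate(names: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Greedily group each name with the first cluster whose first member is similar enough.

    Hypothetical helper: a production MDR would add blocking, richer attributes
    and human review instead of this single-pass comparison.
    """
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([name])
    return clusters


print(deduplicate(["Ministère de la Santé", "Ministere de la sante", "City of Vienna"]))
# [['Ministère de la Santé', 'Ministere de la sante'], ['City of Vienna']]
```

The greedy, single-pass grouping keeps the sketch short; machine-learning-based or knowledge-based methods would replace `similarity` with a learned or rule-based scorer.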


All these components should be integrated and work together to provide a
comprehensive and efficient MDR system for entity deduplication. The
system should be scalable and flexible enough to handle large amounts of
data and adapt to changing business requirements.
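
As an illustration of how the components could be chained, the minimal Python sketch below wires *Data Ingestion*, *Data Quality*, and *Entity Dedup* into one flow. The `Entity` record, its fields, the alternative source field names, and the validation rules are hypothetical assumptions chosen for the example; the actual MDR requirements are not yet defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical single internal format produced by the ingestion step."""
    name: str
    country: str


def ingest(raw_records: list[dict]) -> list[Entity]:
    """Data Ingestion: map heterogeneous source records onto one format."""
    entities = []
    for record in raw_records:
        # Different sources may use different field names for the same attribute.
        name = record.get("name") or record.get("official_name") or ""
        country = record.get("country") or record.get("country_code") or ""
        entities.append(Entity(name=name.strip(), country=country.strip().upper()))
    return entities


def check_quality(entities: list[Entity]) -> list[Entity]:
    """Data Quality: keep only complete records with a two-letter country code."""
    return [e for e in entities if e.name and len(e.country) == 2]


def deduplicate(entities: list[Entity]) -> list[Entity]:
    """Entity Dedup: exact match on a normalised (name, country) key, keeping the first seen."""
    seen: dict[tuple[str, str], Entity] = {}
    for e in entities:
        key = (" ".join(e.name.lower().split()), e.country)
        seen.setdefault(key, e)
    return list(seen.values())


def run_pipeline(raw_records: list[dict]) -> list[Entity]:
    """Chain the components so that each consumes the previous one's output."""
    return deduplicate(check_quality(ingest(raw_records)))
```

Because each stage is a plain function over a common record type, a stage can be tested or replaced on its own, for example swapping the exact-match deduplication for a fuzzy one.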



23 changes: 23 additions & 0 deletions docs/antora/modules/ROOT/pages/glossary.adoc
@@ -0,0 +1,23 @@
== Glossary

*Airflow* - an open-source platform for developing, scheduling, and
monitoring batch-oriented pipelines. Its web interface helps you manage
and monitor the state of your pipelines.

*Metabase* - a business intelligence (BI) tool with a user-friendly
interface and integrated tooling that lets you explore the data gathered
by running the pipelines available in Airflow.

*Cellar* - the central content and metadata repository of the
Publications Office of the European Union.

*TED-SWS* - a pipeline system that continuously converts the public
procurement notices (in XML format) available on the TED website into
RDF format and publishes them into Cellar.

*DAG* (Directed Acyclic Graph) - the core concept of Airflow: a
collection of tasks organized with dependencies and relationships that
determine how they should run. In this project, the DAGs are the
pipelines that convert the public procurement notices from XML to RDF
and publish them into Cellar.
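
The ordering idea behind a DAG can be illustrated without Airflow itself. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+) to compute an order in which every task runs only after its dependencies. The task names are hypothetical and only loosely follow the XML-to-RDF-to-Cellar flow described above; in Airflow, equivalent dependencies would be declared between operators inside a `DAG` object and executed by the scheduler.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; the real TED-SWS DAGs define their own tasks.
# Each key maps a task to the set of tasks it depends on.
dependencies = {
    "fetch_notices": set(),
    "normalise_metadata": {"fetch_notices"},
    "transform_to_rdf": {"normalise_metadata"},
    "validate_rdf": {"transform_to_rdf"},
    "publish_to_cellar": {"validate_rdf"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which each task comes after all of its dependencies.

    Raises CycleError if the graph contains a cycle, i.e. if it is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


print(run_order(dependencies))
# ['fetch_notices', 'normalise_metadata', 'transform_to_rdf', 'validate_rdf', 'publish_to_cellar']
```

The cycle check is what the "acyclic" in DAG guarantees: if, say, `fetch_notices` also depended on `publish_to_cellar`, no valid order would exist and `run_order` would raise `CycleError`.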

37 changes: 20 additions & 17 deletions docs/antora/modules/ROOT/pages/index.adoc
@@ -1,20 +1,6 @@
= TED-RDF Conversion Pipeline Documentation

The TED-RDF Conversion Pipeline, which is part of the TED Semantic Web Services, aka TED-SWS system, provides tools and infrastructure to convert TED notices available in XML format into RDF. This conversion pipeline is designed to work with the https://docs.ted.europa.eu/rdf-mapping/index.html[TED-RDF Mappings].

== Quick references for users

* xref:mapping_suite_cli_toolchain.adoc[Installation and usage instructions for the Mapping Suite CLI toolchain]
* link:{attachmentsdir}/ted-sws-architecture/index.html[Preliminary project architecture (in progress)^]


== Developer pages

xref:demo_installation.adoc[Installation instructions for development and testing for software engineers]

xref:attachment$/aws-infra-docs/TED-SWS-AWS-Infrastructure-architecture-overview-v0.9.pdf[TED-SWS AWS Infrastructure architecture overview v0.9]

xref:attachment$/aws-infra-docs/TED-SWS Installation manual v2.0.2.pdf[TED-SWS AWS Installation manual v2.0.2]
The TED-RDF Conversion Pipeline is part of the TED Semantic Web Services (TED-SWS system) and provides tools and infrastructure to convert TED notices available in XML format into RDF. This conversion pipeline is designed to work with the https://docs.ted.europa.eu/rdf-mapping/index.html[TED-SWS Mapping Suites], which are self-contained packages of transformation rules and resources.
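
To give a rough idea of what converting XML into RDF means, the Python sketch below turns a tiny, hypothetical notice into N-Triples using only the standard library. The element names, the `id` attribute, and the `example.org` IRIs are assumptions made for illustration; the actual pipeline applies the transformation rules packaged in the TED-SWS Mapping Suites rather than hand-written code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified notice; real TED XML notices are far richer.
NOTICE_XML = """
<notice id="123456-2023">
  <title>Supply of hospital beds</title>
  <buyer>Ministry of Health</buyer>
</notice>
"""

# Hypothetical IRIs used only for this illustration; real mappings target
# the eProcurement Ontology vocabulary.
BASE = "http://example.org/notice/"
PROPS = {
    "title": "http://example.org/ontology#title",
    "buyer": "http://example.org/ontology#buyerName",
}


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def notice_to_ntriples(xml_text: str) -> list[str]:
    """Map selected XML elements of one notice to N-Triples statements."""
    root = ET.fromstring(xml_text.strip())
    subject = f"<{BASE}{root.get('id')}>"
    triples = []
    for tag, predicate in PROPS.items():
        element = root.find(tag)
        # Skip elements that are missing or empty in this notice.
        if element is not None and element.text:
            literal = escape_literal(element.text.strip())
            triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


for line in notice_to_ntriples(NOTICE_XML):
    print(line)
```

Each XML element selected by the mapping becomes one subject-predicate-object statement about the notice, which is the shape of data that is ultimately published into Cellar.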

== Project roadmap

@@ -23,12 +9,29 @@ xref:attachment$/aws-infra-docs/TED-SWS Installation manual v2.0.2.pdf[TED-SWS A

| Phase 1 | The first phase places high priority on the deployment into the OP AWS Cloud environment.| August 2022 | xref:attachment$/FATs/2022-08-29-report/index.html[2022-08-29 report] | 29 August 2022 | link:https://github.com/OP-TED/ted-rdf-conversion-pipeline/releases/tag/0.0.9-beta[0.0.9-beta]
| Phase 2 | Provided that the deployment in the acceptance environment is successful, the delivery of Phase 2 aims to provide the first production version of the TED SWS system. | Nov 2022 | xref:attachment$/FATs/2022-11-22-TED-SWS-FAT-complete.html[2022-11-22 report] | 20 Nov 2022 | https://github.com/OP-TED/ted-rdf-conversion-pipeline/releases/tag/1.0.0-beta[1.0.0-beta]
| Phase 3 | This phase delivers the documentation and components and improvements that could not be covered in the previous phases. | Feb 2023 | --- | --- | ---

| Phase 3 | This phase delivers the documentation and components and improvements that could not be covered in the previous phases. | Feb 2023 | xref:attachment$/FATs/2023-02-20-TED-SWS-FAT-complete.html[2023-02-20 report] | 21 Feb 2023 | https://github.com/OP-TED/ted-rdf-conversion-pipeline/releases/tag/1.1.0-beta[1.1.0-beta]
|===






//
// == Quick references for Developers
//
// == Quick references for DevOps
//
// == Quick references for TED-SWS Developers
//
// * xref:mapping_suite_cli_toolchain.adoc[Installation and usage instructions for the Mapping Suite CLI toolchain]
// * link:{attachmentsdir}/ted-sws-architecture/index.html[Preliminary project architecture (in progress)^]
//
//
// == Developer pages
//
// xref:demo_installation.adoc[Installation instructions for development and testing for software engineers]
//
// xref:attachment$/aws-infra-docs/TED-SWS-AWS-Infrastructure-architecture-overview-v0.9.pdf[TED-SWS AWS Infrastructure architecture overview v0.9]
//
// xref:attachment$/aws-infra-docs/TED-SWS Installation manual v2.5.0.pdf[TED-SWS AWS Installation manual v2.5.0]