JEJodesty edited this page Jul 1, 2024 · 98 revisions

Week 0 (1/2 - 1/5):

Update readme and refactor

Week 1 (1/8 - 1/12):

Week 2 (1/15 - 1/19):

  • Research Dynamic Terraform Providers for Plant Deployments
  • Verify CATs’ Project Update: Structure Block Design, Data Service Collaboration Diagram, Ray Integration
  • Watched Computational Governance Panel
  • Review Ray documentation for InfraFunction Hooks
  • Research Open Contracting Data Standard with respect to Data Product Teams: https://standard.open-contracting.org/latest/en/

Week 3 (1/22 - 1/26):

  • 1/22:
    • Updated CATs integration tests and demo
    • Resolved dependency bug
    • Verify CATs’ Project Update: Process Component, Sub-Process Logging, Executor & Function Components
  • 1/23:
    • Updated Documentation and Demo
    • Added License and Packaging for CATs
    • Verify CATs’ Project Update: s3 & CoD Integration
  • 1/24:
    • Updated Documentation & Refactor
    • CATs Data Verification
    • Verify CATs’ Project Update: Updating Order Structure, Node, Service & Structure Components
  • 1/25 - 1/26:
    • Updated Documentation & Refactor
    • Update Factory
    • Reviewed Novo Nordisk Data Mesh Platform discussion
    • Verify CATs’ Project Update: CATs s3 cache, BOM ERD

Week 4 (1/29 - 2/2):

  • Included Ubuntu 20.04 Installation Update
  • Refactored CATs
  • Researched CAT cache access management
  • Research Economic Adapters for CATs
  • Research multilevel linked-list for CATs’ subgraph

Week 5 (2/5 - 2/9):

Week 6 (2/12 - 2/16):

  • 2/12: Drafted CATs capabilities in GitHub Project and reviewed Activity Artifact Policy
  • 2/13: Reviewed implementation examples of Data Contracts
  • 2/14 - 2/15:
    • Reviewed Data Mesh Roundtable Discussions about Data Contracts and “Agile” Data Products
    • Attended Protocol Labs project updates
  • 2/16: Research System Architecture layers and wrote notes as Data Contract Article for CATs

Data Mesh Resources:

Data Contract Implementation Examples:

System Architecture:

What does a CATs data contract do?

A Data Contract is a Service agreement between producer and consumer with attribute dependencies for downstream Data Product evolution with dedicated lineage. A Data Contract can provide tools for collaboration on data requirements as product promises within a shared context that inform policies for contract mutation alongside Data Product releases.

A Data Contract’s Product Promises are what the data product owners expect from their data consumers up to the latest block of information. These promises may include data quality, data usage terms and conditions, schema, service objectives, billing, etc. Data Contract policy mutations cascade downstream as bilateral agreements that “fork” lineage as a new Data Product version. For example, the consumer takes the risk of violating privacy. Data Producers create Data Contracts on Organization and Business Terms. The consumer of the Data Contract enforces Governance policies. The producer of the Data Contract owns the Data Product if the organization doesn't have a Governance body.

Governance policies are discussed between data producers and consumers to agree upon data producer requirements. These discussions should culminate in an amenable data structure / dataset. Structured data is conducive to pre-existing policies and needs less discussion. Less structured data will need more discussion and policy feedback loops. We need a Minimal Viable Data Contract that includes what is necessary for an organization to govern, with the means of supporting policy feedback loops that guide discussion in a way that balances the prioritization of outcomes and methodologies.
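The product promises described above (schema, data quality, usage terms) and the idea that a policy mutation forks a new version can be sketched as a small immutable record. This is only an illustrative sketch, not CATs' contract format: the `DataContract` class, its field names, and the `mutate` method are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataContract:
    """Hypothetical minimal contract; field names are illustrative only."""
    producer: str
    consumer: str
    schema: Tuple[Tuple[str, str], ...]  # (column name, type name) pairs
    quality_terms: str
    usage_terms: str
    version: int = 1
    parent_version: Optional[int] = None  # version this contract was forked from

    def mutate(self, **changes) -> "DataContract":
        """Return a new contract version; the original stays untouched (a fork)."""
        return replace(
            self,
            version=self.version + 1,
            parent_version=self.version,
            **changes,
        )


v1 = DataContract(
    producer="sales-domain",
    consumer="analytics-domain",
    schema=(("order_id", "str"), ("amount", "float")),
    quality_terms="order_id is never null",
    usage_terms="internal use only",
)
# A policy discussion changes the usage terms, producing a new version.
v2 = v1.mutate(usage_terms="internal use only; personal data must be masked")
print(v2.version, v2.parent_version)
```

Keeping the record frozen means an agreed version can never be edited in place; every change has to appear as a new version that points back at its parent.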

Interdependent data domains have sub-domains with identifiers for generating Data Products. CAT Nodes will generate and execute Virtual Data Products composed as Data Contracts that enforce Data Provenance using Bills of Materials (BOMs). BOMs are CATs' Content-Addressed Data Provenance record for verifiable data processing and transport on a Mesh network of CAT Nodes. Data Contracts will contain BOM lineages and act as block headers for Content-Addressed Transformers (CATs) instances. Data Products are mutated during policy feedback loops informed by collaborators communicating their understanding of knowledge domains. Collaborators will identify knowledge sub-domains with references and will access sub-domains using Content-Addresses. Access is federated via knowledge domain hierarchies in abstractions that enable collaborators to participate in governance cycles by leveraging their understanding of knowledge.
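A rough sketch of how a content-addressed BOM lineage could hang off a Data Contract header follows. It assumes a SHA-256 hash over canonical JSON as a stand-in for a real content identifier; `content_address`, `make_bom`, `lineage`, and the record fields are hypothetical names, not CATs' API.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def content_address(record: dict) -> str:
    """Hash canonical JSON; a stand-in for a real content identifier."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_bom(input_addr: str, function_addr: str, output_addr: str,
             parent_bom: Optional[str] = None) -> Tuple[str, dict]:
    """Build a BOM record and return it together with its content address."""
    bom = {
        "input": input_addr,
        "function": function_addr,
        "output": output_addr,
        "parent_bom": parent_bom,  # link to the previous BOM in the lineage
    }
    return content_address(bom), bom


def lineage(store: Dict[str, dict], bom_addr: Optional[str]) -> List[str]:
    """Walk parent links from the newest BOM back to the first one."""
    chain = []
    while bom_addr is not None:
        chain.append(bom_addr)
        bom_addr = store[bom_addr]["parent_bom"]
    return chain


store: Dict[str, dict] = {}
addr_1, bom_1 = make_bom("in-addr", "fn-addr", "out-addr-1")
store[addr_1] = bom_1
addr_2, bom_2 = make_bom("out-addr-1", "fn-addr", "out-addr-2", parent_bom=addr_1)
store[addr_2] = bom_2

# The contract acts as a header that points at the latest BOM.
header = {"contract": "example-contract", "bom": addr_2}
print(lineage(store, header["bom"]))
```

Because each BOM's address is derived from its content, including the parent link, changing any earlier record changes every address after it, which is what makes the lineage verifiable.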

Week 7 (2/19 - 2/23):

  • 2/19 - 2/21: Contextualize value of BOM within the context of Data as a Product that contains Data Contracts
  • 2/22 - 2/23: Updated Readme informed by examples of Data Assets within the context of Machine-Readable Cataloging

Resources:

What is a Content-Addressed Data Asset (CADA)?

CATs Data Products will consist of Data Contracts that carry provenance as executable BOM lineages and act as block headers for Content-Addressed Transformers (CATs) instances that contain Data Assets. BOMs are CATs' Content-Addressed Data Provenance record for verifiable data processing and transport on a Mesh network of CAT Nodes that can contain Data Assets. A data asset may be a system or application output (dataset) that is accessible and holds value for an organization or individual. Data Assets’ value can derive from the data's potential for generating insights, informing decision-making, contributing to product development, enhancing operational efficiency, or creating economic benefits through its sale or exchange.

CATs' Content-Addressed Data Assets are processed and sold / exchanged / published on CATs’ Data Mesh via CAT Nodes subsumed by downstream CATs’ Data Products. Data Assets consist of the following:

  • Data Domains - "A predefined or user-defined Model repository object that represents the functional meaning of an" attribute "based on column data or column name such as" account identification.
  • Data Objects - Content-Addresses of data sources used to extract metadata for analysis.
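The two parts listed above could be modelled as follows. This is a sketch under assumptions: `DataDomain`, `DataAsset`, and `address_of` are hypothetical names, and a SHA-256 digest stands in for whatever content address CATs actually uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


def address_of(data: bytes) -> str:
    """Derive a content address from raw bytes (SHA-256 as a stand-in)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DataDomain:
    """Functional meaning of an attribute, e.g. 'account identification'."""
    name: str
    attribute: str  # the column the domain gives meaning to


@dataclass(frozen=True)
class DataAsset:
    domains: Tuple[DataDomain, ...]
    objects: Tuple[str, ...]  # content addresses of the underlying data sources


source = b"account_id,balance\n1,10.0\n"
asset = DataAsset(
    domains=(DataDomain(name="account identification", attribute="account_id"),),
    objects=(address_of(source),),
)
print(asset.domains[0].name, asset.objects[0][:12])
```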

Week 8 (2/26 - 3/1):

What makes CATs Governable by including BOMs within Data Product’s Data Contracts?

CATs are governable and support multi-disciplinary collaboration on data processing because CATs’ Architectural Quantum is an abstract governance model enforced within CATs’ Bills-Of-Materials (BOMs), in which knowledge domains are represented as metadata of data provenance records to support domain ownership.

BOMs are unique identifiers that provide the means of data production (assembly) and transportation as reproducible lineage contextualised by knowledge domains for federated governance. BOMs consist of Data Product service Orders of data processing that are Invoiced as fulfillments of service agreements specified by Data Products’ Data Contracts.

Federated Governance is enabled by BOMs for the following reason: domain-specific data provenance BOMs establish the legitimacy of network policy changes suggested by Fractional Stewards of Data Products by enabling them to identify data quality issues at their source on a self-service Data Platform of many Data Products.

CATs enables Fractional Stewards to do this because historical data production is contextualised and reproducible within the scope of their knowledge domains by design during development and production as a requirement of a service Order. CATs data processes submitted by their service Orders are Invoiced to fulfil agreements within Data Products’ Data Contracts.
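One way to picture "reproducible within the scope of a knowledge domain" is a check that re-runs an Order and compares the result with what was Invoiced. The sketch below assumes that reading; `fulfil`, `verify`, the dictionary layouts, and the `"finance"` domain label are hypothetical, and SHA-256 stands in for a real content address.

```python
import hashlib
from typing import Callable, Dict


def digest(data: bytes) -> str:
    """Content address of raw bytes (SHA-256 as a stand-in)."""
    return hashlib.sha256(data).hexdigest()


def fulfil(order: dict, functions: Dict[str, Callable[[bytes], bytes]],
           store: Dict[str, bytes]) -> dict:
    """Execute an Order and return an Invoice recording what was produced."""
    output = functions[order["function"]](store[order["input"]])
    output_addr = digest(output)
    store[output_addr] = output
    return {"order": order, "output": output_addr}


def verify(invoice: dict, functions: Dict[str, Callable[[bytes], bytes]],
           store: Dict[str, bytes]) -> bool:
    """Re-run the Order and check the output address matches the Invoice."""
    order = invoice["order"]
    output = functions[order["function"]](store[order["input"]])
    return digest(output) == invoice["output"]


raw = b"a,b\n1,2\n"
store = {digest(raw): raw}
functions = {"normalise": bytes.upper}
order = {"function": "normalise", "input": digest(raw), "domain": "finance"}

invoice = fulfil(order, functions, store)
reproduced = verify(invoice, functions, store)

# If the processing step silently changes, the mismatch is visible at its source.
functions["normalise"] = bytes.lower
drifted = verify(invoice, functions, store)
print(reproduced, drifted)
```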

A Data Contract is a Service agreement between producer and consumer with attribute dependencies for downstream Data Product evolution with dedicated lineage. Governance policy discussions between data producers and consumers in policy feedback loops about data production requirements should balance the prioritization of outcomes and methodologies and should culminate in an amenable data structure / dataset.

Week 9 (3/4 - 3/8):

“Data as an asset” enables the consumption, production, and prosumption of Data Assets on CATs Data Mesh

“Data as an asset” [0] conceptually emphasizes recognizing and treating data as a strategic investment that organizations can leverage to deliver future economic benefits by enabling the consumption, production, and prosumption of one's own data as an asset. Prosumption is the consumption and production of value, "either for self-consumption or consumption by others, and can receive implicit or explicit incentives from organizations involved in the exchange." [1]

The availability of high-quality and domain-specified Data Assets enables Data Products on inter-connected CAT Nodes on CATs Data Mesh to facilitate cross-functional asset utilization within Data Initiatives in a way that supports Data Sovereignty. "Data sovereignty refers to a group or individual’s right to control and maintain their own data, which includes the collection, storage, and interpretation of data." [2]

Registering and cataloging CATs can accelerate innovative Data Product creation and facilitate Data Sovereignty in Data Initiatives that discover and utilize “Data as an asset”. Data Products use and operate CAT Nodes to produce, register, and catalog “Data as an asset” as Data Assets that are searchable and discoverable by Data Products on CATs Data Mesh. CATs Data Assets enhance strategic, operational, and analysis-informed decision-making by using BOMs as feedback loop mechanisms across domains in a way that suits specific collaborative contexts across organizations.

Resources:

Week 10 (3/11 - 3/15):

Week 11 (3/18 - 3/22):

  • 3/18 - 3/20: Contextualize data contract creation team role responsibilities into modern roles
  • 3/21 - 3/22:
    • Contextualize modern data contract creation team role responsibilities into CATs Control and Action planes for an operational model for the placement of Data Stewardship responsibilities
    • Communicate the value of Data Contract inclusion in BOM below.

Why should Data Contracts be included in CATs' BOMs for Data Product development on a Data Mesh?

Data Product(s) CATs are executed by Data Contract deployments with Data Provenance by Ordering CATs that are Invoiced within Bills of Materials (BOMs). BOMs are CATs' Content-Addressed Data Provenance record for verifiable data processing and transport on CAT Mesh. Data Contracts will contain BOM lineages and act as headers for Content-Addressed Transformer instances (CATs). Their inclusion of BOMs is necessary for organizations to rapidly mutate Data Products alongside discussions that affect product outcomes and development methodologies. Data Products are mutated during stakeholder discussions about Data Contracts with respect to network policy / protocol. These discussions continuously inform multi-lateral Data Product agreements between stakeholders and collaborators that produce and consume data using BOMs as feedback loop mechanisms for (re)submitting CAT Orders. These discussions should also culminate in a CAT Order of amenable data structures / datasets for which processing is Invoiced within BOMs. Collaborators can participate in data-provenance-supported product development by Content-Addressing Data as an Asset.

Week 12 (3/25 - 3/29):

Week 13 (4/1 - 4/5):

Week 14 (4/8 - 4/12):

Week 15 (4/15 - 4/19):

Week 16 (4/22 - 4/26):

Week 17 (4/29 - 5/3):

Week 18 (5/6 - 5/10):

  • Removing s3 cache from CATs and replacing it with a local storage solution

Week 18 (5/13 - 5/17):

Week 19 (5/19 - 5/24):

  • The Plant is a Transfer Function that accepts an Order as Input and produces an Output by executing a Function (Process) with an Executor (Actuator) that executes one or more Processes. The Plant exposes the control variable (u(t)) for the Control Feedback Loop and the Function (Process) produces the process variable (y(t)). The Process Variable is the Statistical Process Control of CATs Dataset I/O (Ingress/Egress).
  • Docker can be executed within an Alpine Linux Docker container ["Docker in Docker" (DinD)] for cadCAD's upcoming nested Block executions, as a summation of the control variable (u(t)) that configures the CATs Data Product and a summation of the process variable (y(t))
  • Concern: "Integral windup particularly occurs as a limitation of physical systems, compared with ideal systems, due to the ideal output being physically impossible (process saturation: the output of the process being limited at the top or bottom of its scale, making the error constant)."
    • https://en.wikipedia.org/wiki/Integral_windup
    • Alleviated by "A CAT at its core is a unit of computational work specified by the triplet 1) what the input is, 2) what does the computation, and 3) what the output is. Controllers require feedback, which is currently outside of the scope of a single cat. Any cyclic orchestration must be external to CATs." - BlockScience
  • Alpine Linux Docker can be the execution paradigm of cadCAD and the CATs Plant because they can run as Docker inside Docker (“DinD”) and functionally map cadCAD multi-dimensional blocks to CAT Functions
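The integral windup concern above can be illustrated with a toy loop that sits outside the plant, in line with the BlockScience note that cyclic orchestration is external to a CAT. This is a minimal sketch, not the CATs Plant: `plant`, `run_loop`, the saturation limit, and the gain are hypothetical, and the remedy shown is plain integrator clamping, one common anti-windup technique.

```python
from typing import Optional, Tuple


def plant(u: float, limit: float = 1.0) -> float:
    """Toy saturating process: the process variable y(t) cannot leave [-limit, limit]."""
    return max(-limit, min(limit, u))


def run_loop(setpoint: float, steps: int, ki: float,
             clamp: Optional[float] = None) -> Tuple[float, float]:
    """Integral-only controller driving the plant from outside.

    Returns the final integral term and the final process variable y.
    """
    integral = 0.0
    y = 0.0
    for _ in range(steps):
        error = setpoint - y
        integral += error
        if clamp is not None:
            # Anti-windup by clamping the accumulated error.
            integral = max(-clamp, min(clamp, integral))
        u = ki * integral  # control variable u(t)
        y = plant(u)       # process variable y(t)
    return integral, y


# The setpoint 2.0 is unreachable because the plant saturates at 1.0.
wound_integral, wound_y = run_loop(setpoint=2.0, steps=50, ki=0.5)
clamped_integral, clamped_y = run_loop(setpoint=2.0, steps=50, ki=0.5, clamp=2.0)
print(wound_integral, clamped_integral)
```

Without the clamp the integral keeps growing while the output stays pinned at its limit, so the controller would take a long time to unwind once the setpoint becomes reachable again; with the clamp the accumulated error stays bounded.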

Week 20 (5/27 - 5/31):

Week 21 (6/3 - 6/7):

  • Updated Bacalhau Node and refactor for CoD interoperability for CATs v3
  • Exposed ingress and egress to action plane via Process with an interoperable integration point for CATs v3

Week 22 (6/10 - 6/14):

  • Added data product disciplines to CATs Architectural Quantum for CATs v3
  • Implement InfraStructure Sub Component separately

Week 23 (6/17 - 6/26):

  • IPFS daemon initiated by CAT Node
  • Partially implemented function for applying SBOM
  • Refactored InfraFunction to compose Processor & Plant and InfraStructure
  • Installed KMS locally for cat/rid Integration

Week 24 (6/24 - 6/28):

  • Bring your own cache; otherwise it is local (background: Expanso introduces breaking changes to Bacalhau without a stable release)

What is the Architectural purpose of CATs as a Function a.k.a. the ACG Monad?

  • Governance Plane: z(t)
    • is for the Stewardship of a Data Product Supply Network of CATs represented as a Directed Acyclic Graph of Data Product Supply
    • Control Plane: y(t)
      • is for the Networking of what is Produced as a result of Science & Engineering CATs
      • Action Plane: x(t)
        • is for the Science & Engineering of Data Transformation as Computational Processing, a.k.a. CATs

Multi-Agent Collaboration (MAC) for CATs using Content-Addressable Router (CAR)

  • Design Description
    • CATs and LangGraph integration can enable a row-wise business function as a Chart Tool of Multi-Agent Collaboration (MAC) if CAT Orders act as a Transfer (Network) Function implemented as an OOP Command Pattern for which CATs Ingress and Egress sub-processes can be executed by CATs’ Content-Addressable Router (CAR).
    • Architectural Considerations: CATs can inform business decisions given the following:
      • Action Plane: x(t)
        • CAT Functions can be defined as LangGraph Call Tools executed by LangGraph's Tool Node
        • CAT Factory produces CAT Executors integrated with LangGraph's Tool Executor.
      • Control Plane: y(t) [aka Content-Addressable Router (CAR)]
        • CAR integrated with LangGraph's Router.
        • cadCAD (Network) Policies aka “Algorithmic Suggestions” can be deployed on LangGraph's Agent Nodes with specified Domain-Name references as Rule Asset RIDs
      • Governance Plane: z(t)
        • A GreyBox Model as a feature-parameterized Tensor Field with the process variable (PV) as label
        • The business function is a CATs Control & Action Matrix - a 2-dimensional representation of 3-dimensional space
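The Command Pattern idea in the design description can be sketched without any LangGraph dependency: an Order is a command object that names a function by address, and a router dispatches it. Everything here is hypothetical (`OrderCommand`, `ContentAddressableRouter`, `function_address`, and the toy `ingress` / `egress` functions); hashing a function's compiled body is a simplification of real content addressing, and no LangGraph integration is shown.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


def function_address(fn: Callable) -> str:
    """Address a function by hashing its compiled body (a simplification)."""
    code = fn.__code__
    material = repr((code.co_code, code.co_consts, code.co_names))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class OrderCommand:
    """Command object: which function to run (by address) and on what payload."""
    function_addr: str
    payload: Any


class ContentAddressableRouter:
    """Hypothetical stand-in for CAR: dispatches commands by function address."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable] = {}

    def register(self, fn: Callable) -> str:
        addr = function_address(fn)
        self._routes[addr] = fn
        return addr

    def execute(self, command: OrderCommand) -> Any:
        return self._routes[command.function_addr](command.payload)


def ingress(rows):
    """Toy ingress sub-process: order the incoming rows."""
    return sorted(rows)


def egress(rows):
    """Toy egress sub-process: reduce the rows to a single total."""
    return sum(rows)


router = ContentAddressableRouter()
ingress_addr = router.register(ingress)
egress_addr = router.register(egress)

ordered = router.execute(OrderCommand(ingress_addr, [3, 1, 2]))
total = router.execute(OrderCommand(egress_addr, ordered))
print(ordered, total)
```

Separating the command from its execution is what would let an external orchestrator (for example an agent graph) queue, route, or replay Orders without knowing how each function is implemented.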