Home
JEJodesty edited this page Sep 6, 2024
·
98 revisions
Update readme and refactor
- Review Designs within the context of Data Sovereignty;
- Research CLI wrapper alternative to CDKTF
- Review Database Sharding within the context of Data Products’ data: https://aws.amazon.com/what-is/database-sharding/
- Review Value of data
- Verify CATs’ Project Update: Factory & Executor components; Invoice, Order, Function, Executor, & BOM Block Designs, Structure’s Ray Cluster Deployment on Kubernetes, BOM Initialization, CAT Node & Node Design
- Research Dynamic Terraform Providers for Plant Deployments
- Verify CATs’ Project Update: Structure Block Design, Data Service Collaboration Diagram, Ray Integration
- Watched Computational Governance Panel
- Review Ray documentation for InfraFunction Hooks
- Research Open Contracting Data Standard with respect to Data Product Teams: https://standard.open-contracting.org/latest/en/
- 1/22:
- Updated CATs integration tests and demo
- Resolved dependency bug
- Verify CATs’ Project Update: Process Component, Sub-Process Logging, Executor & Function Components
- 1/23:
- Updated Documentation and Demo
- Added License and Packaging for CATs
- Verify CATs’ Project Update: s3 & CoD Integration
- 1/24:
- Updated Documentation & Refactor
- CATs Data Verification
- Verify CATs’ Project Update: Updating Order Structure, Node, Service & Structure Components
- 1/25 - 1/26:
- Updated Documentation & Refactor
- Update Factory
- Reviewed Novo Nordisk Data Mesh Platform discussion
- Verify CATs’ Project Update: CATs s3 cache, BOM ERD
- Included Ubuntu 20.04 Installation Update
- Refactored CATs
- Researched CAT cache access management
- Research Economic Adapters for CATs from Ocean Protocol
- Research multilevel linked-list for CATs’ subgraph
- Research bidirectional mapping supports multilevel linked-list for CATs’ subgraph
- Consider Transducers for CAT MIMO
- Updated PR Template
- Review Model-Driven Engineering: https://en.wikipedia.org/wiki/Model-driven_engineering
- 2/12: Drafted CATs capabilities in GitHub Project and reviewed Activity Artifact Policy
- 2/13: Reviewed implementation examples of Data Contracts
- 2/14 - 2/15:
- Reviewed Data Mesh Roundtable Discussions about Data Contracts and “Agile” Data Products
- Attended Protocol Labs project updates
- 2/16:
- Research System Architecture layers and wrote notes as Data Contract Article for CATs
- Wrote Article: What does a CATs data contract do?
Data Mesh Resources:
- “Inside a Data Contract”: https://www.youtube.com/watch?v=ye4geXMuJKs
- “Agile in Data”: https://www.youtube.com/watch?v=XnstATam0jM
- Data Contract Articles: https://www.datamesh-architecture.com/#data-contract
Data Contract Implementation Examples:
- https://blog.det.life/data-contracts-a-guide-to-implementation-86cf9b032065
- https://levelup.gitconnected.com/create-a-web-scraping-pipeline-with-python-using-data-contracts-281a30440442
- https://docs.soda.io/soda/data-contracts.html
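As a minimal sketch of the kind of check the data contract implementations listed above tend to enforce, the following Python validates records against a declared schema. The `DataContract` class, its field names, and the sample `orders` contract are hypothetical illustrations, not part of CATs or any of the linked tools.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical contract: expected column names mapped to Python types."""
    name: str
    schema: dict
    required: frozenset = frozenset()

    def violations(self, record: dict) -> list:
        """Return a list of human-readable violations for one record."""
        problems = []
        # Required columns must be present and non-null.
        for column in sorted(self.required):
            if column not in record or record[column] is None:
                problems.append(f"missing required column: {column}")
        # Every supplied column must be declared and have the declared type.
        for column, value in record.items():
            expected = self.schema.get(column)
            if expected is None:
                problems.append(f"unexpected column: {column}")
            elif value is not None and not isinstance(value, expected):
                problems.append(
                    f"{column}: expected {expected.__name__}, got {type(value).__name__}"
                )
        return problems


# Hypothetical contract for an "orders" dataset.
orders_contract = DataContract(
    name="orders",
    schema={"order_id": str, "quantity": int, "note": str},
    required=frozenset({"order_id", "quantity"}),
)

good = {"order_id": "A-1", "quantity": 3}
bad = {"order_id": "A-2", "quantity": "three", "color": "red"}
```

Calling `orders_contract.violations(good)` yields an empty list, while the `bad` record reports a type mismatch and an undeclared column. A producer could run such a check before publishing, and a consumer could run the same check on ingress.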
System Architecture:
- 2/19 - 2/21: Contextualize value of BOM within the context of Data as a Product that contains Data Contracts
- 2/22 - 2/23: Updated Readme informed by examples of Data Assets within the context of Machine-Readable Cataloging
- Wrote Article: What is a Content-Addressed Data Asset (CADA)?
Resources:
- https://www.loc.gov/marc/umb/um01to06.html
- https://docs.informatica.com/data-engineering/data-engineering-quality/10-2-1/business-glossary-guide/glossary-content-management/business-term-links/data-asset.html
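The idea of a content-addressed data asset can be sketched in a few lines of Python: the address is derived from the bytes themselves, so identical content always resolves to the same address and any change produces a new one. This is an illustration under stated assumptions: a plain SHA-256 hex digest stands in for a real CID, and `content_address` and `AssetStore` are hypothetical names, not CATs APIs.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the bytes themselves (SHA-256 hex digest).

    Assumption: a plain hex digest stands in for a real CID, which would
    additionally carry multihash and multibase prefixes.
    """
    return hashlib.sha256(data).hexdigest()


class AssetStore:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blobs[address] = data  # identical content maps to one entry
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Integrity check: the stored bytes must still hash to their address.
        if content_address(data) != address:
            raise ValueError("stored content does not match its address")
        return data


store = AssetStore()
first = store.put(b"example dataset")
second = store.put(b"example dataset")
changed = store.put(b"example dataset v2")
```

Here `first` and `second` are the same address, while `changed` differs, which is the property that lets a catalog entry reference an exact version of an asset.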
- 2/26: Researched Digital Asset Management related Data Contracts and Data Mesh Registry & considered a Rule Asset being used for Network Policies in addition to Attribute Quality
- 2/27: Considered Data & Rule Assets for Data Mesh Registry Artifact Schema
- https://towardsdatascience.com/the-data-mesh-registry-a-window-into-your-data-mesh-20dece35e05a
- https://docs.informatica.com/data-engineering/data-engineering-quality/10-2-1/business-glossary-guide/glossary-content-management/business-term-links/data-asset.html
- https://docs.informatica.com/data-engineering/data-engineering-quality/10-2-1/business-glossary-guide/glossary-content-management/business-term-links/rule-asset.html
- 2/28: Verify CATs Executing FaaS on PaaS
- 2/29: Review Domain-Oriented Ownership with respect to Conway's law
- 3/1: Review the value of Data Column Lineage in establishing Domain-Oriented Ownership in CATs Invoice in a way that makes BOMs searchable and discoverable
- Wrote Article: What makes CATs Governable by including BOMs within Data Product’s Data Contracts?
- 3/4: Contextualize “Data as an asset” with CATs Architecture
- 3/5: Contextualize Data sovereignty with “Data as an asset” for CATs Data Mesh
- 3/6: Contextually map Data Contract initialization roles to cross-functional Operational Model for Data Products
- 3/7: Contextually map "Fractional Ownership" of "Decentralized Data Objects" ("DDOs" / "Data Assets") to "Data as an asset" and Data Partitioning / Sharding
- 3/8: Contextualize Ocean Protocol & CATs Architecture with prosumption
- Wrote Article: “Data as an asset” enables the consumption, production, prosumption of Data Assets on CATs Data Mesh
Resources:
- 3/11: Review ocean Data NFTs and Datatokens and relate Hexagonal architecture to Data Contract SLAs
- https://docs.oceanprotocol.com/developers/contracts/datanft-and-datatoken
- https://en.wikipedia.org/wiki/Non-fungible_token#:~:text=A%20non%2Dfungible%20token%20(NFT,to%20be%20sold%20and%20traded.
- https://en.wikipedia.org/wiki/Hexagonal_architecture_(software)
- https://blog.thepete.net/blog/2020/09/25/service-templates-service-chassis/
- 3/12
- Review Bidirectional Mapping libraries for Data Mesh BOM graph for cataloged representation
- Review Custom Terraform Provider software that enables providers to be written in any language for CATs Plant
- Review Model-Based System Engineering relate it to knowledge organization infrastructure
- https://medium.com/block-science/knowledge-networks-and-the-politics-of-protocols-af81ad0fa2d4
- 3/13 - 3/15
- Review 4 kinds of data moats within the context of data’s strategic value as a “data asset”
- Review Model-driven architecture approaches for CATs Architectural Quantum
- Review ocean.py for integration into CATs’ ingress and egress
- Review “Commons-based peer production” for CAT Node
- Updated CATs architecture, readme, and interactive logs
- 3/18 - 3/20: Contextualize the data contract creation team's role responsibilities into modern roles
- 3/21 - 3/22:
- Contextualize the modern data contract creation team's role responsibilities into CATs Control and Action planes for an operational model for the placement of Data Stewardship responsibilities
- Communicate the value of Data Contract inclusion in BOM below.
- Wrote Article: Why should Data Contracts be included in CATs' BOMs for Data Product development on a Data Mesh?
- 3/25 - 3/27:
- Review Bitol's Data Contract examples
- Review Data Contract Implementation Guide for CATs
- Review Wayfair's differentiation of Data Mesh design lean personas: Data Producer, Data Consumers, and Data Engineer
- Contextualize IBM's Knowledge Catalog as a DataOps tool in consideration of KMS and CAT-aloging
- Review Statistical Process Control to contextualize the inclusion of https://www.soda.io/
- Research data product life cycle to contextualize Data Product Manager, Data Steward, and Data Engineer
- 3/28 - 3/29:
- Contextualize a Federated Governance Model within Federated Computational Governance
- Research types of Data Valuation to avoid confirmation bias
- Contextualize Event-Driven programming for CAT Plant and Dataflow programming for CATs Process and InfraFunction
- 4/1 - 4/3:
- Research "Stewardship Fractalization" and System Architecture facilitating it and relate it to Data Stewardship
- Consider Dynamic Prompt engineering using Generative AI via an LLM for contextualization of CAT Actions that fulfill Data Contracts. These actions are initially contextualized with CATs Architectural Quantum.
- 4/4 - 4/5:
- Distinguish between Quantitative and Qualitative design drivers for end-user and data product consumer contextualization
- Consider a Streaming Data Integration for Stewardship lineage views and metadata management
- Consider each CAT Factory Client a Stream Broker as a Consumer and Producer (https://www.scaler.com/topics/kafka-broker/)
- Consider "IoT Edge-Application Management" for "IoT Analytics"
- Consider a language like SISAL for stream dataflow composition
- Review updated CoD Architecture
- Research how Analysts support domain-oriented ownership in consideration of data procurement
- Research "telemetry data pipelines" from starburst.io to contextualize a “telemetry-catalog” in "data lakehouse" as a flatfile store
- Consider Data Engineering pain points to split and contextualize Data Engineering within CATs Action & Control Planes
- Distinguish the difference between Data Lakes and Data Federation for the implementation of a data lake solution
- Research GPT to communicate a Federated Governance Model designed to be a GPT
- 4/15:
- Contextualize LLMs and Generative AI for Fractional Data Stewardship
- Reduce scope of Data Product with Stewardship Fractionalization dApp steps
- Note Dataflow Programming for CAT
- Note Data Flow Architecture for project definition
- Note Statistical process control (SPC) (as user responsibility)
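The statistical process control noted above can be sketched as a Shewhart-style check: derive control limits from a baseline sample and flag later measurements that fall outside them. This is a simplified illustration, assuming limits at the baseline mean plus or minus three sample standard deviations; the function names and the row-count data are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits from baseline measurements."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(values, limits):
    """Return the indices of values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Illustrative data only: row counts from hypothetical dataset egress runs.
baseline_rows = [1000, 1010, 990, 1005, 995, 1002, 998]
limits = control_limits(baseline_rows)
new_rows = [1001, 1200, 997]
flagged = out_of_control(new_rows, limits)
```

With this baseline only the second new measurement (1200 rows) is flagged. Which metric to chart and how to react to a flagged run would remain the user's responsibility, as the note says.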
- 4/16-18:
- Apply Manufacturing Production to BOM design with respect to an Engineering & Manufacturing BOM types
- Contextualize CAT orders with a Transfer (Network) Function
- Contextually lift Mesh partnership with Model-Based Institution Design (MBID) and relate to Model-Based System Engineering in preparation to include Computer-Aided Governance in CATs3
- Research LangGraph for CAT Mesh reification
- Note different types of SBOMs for each CAT Arch Quantum SubComponents
- Consider Multi-Agent Conversation for row-wise business function
- https://arxiv.org/abs/2308.08155
- https://github.com/langchain-ai/langgraph/blob/main/examples/multi_agent/multi-agent-collaboration.ipynb
- Consider Pro-curation for on-boarding information onto CAT Mesh reflective of Prosumer
- Research integrating langgraph `tool_node` into CAR (Content-Addressable Router)
- Research integrating langgraph `tool_executor` into CATs' Executor
- Review LangChain Agents for Network Governance Reification graph state tracking
- Review "Knowledge Networks and the Politics of Protocols" within the context of Roles
- Review "Engineering for Legitimacy"
- Review Scaled and Leveled Stewardship
- Review contextualization of responsibilities based on Prompt Engineering Questions & general responsibilities of "Fractional Stewards"
- Review Project Roadmap for Stewardship Fractalization in consideration for CAT Team Dynamics
- Review Fractional Stewardship MVP approach in consideration to publishing a Policy development in Steward profile to Agent Nodes in LangGraph. These Policies are front loaded as "algorithmic suggestions"
- Note Abstract User Stories as application references
- Review "DAO Governance Model" for comparison to Federated Computational Governance Model
- Consider Marketing Steward using Prompt Engineering / partial input being a "Comparison Table/Matrix summarizing different Stewardship Organization/Solutions missions/purposes, designs and features"
- Removing s3 cache from CATs and replace with local storage solution
- Removed s3 cache from CATs and replaced with local storage solution
- Research adaptive Retrieval Augmented Generation (aRAG)
- Reviewed KMS-identity for integration into CATs
- Read "A Language for Studying Knowledge Networks: The Ethnography of LLMs"
- The Plant is a Transfer Function that accepts an Order as Input and produces an Output by executing a Function (Process) with an Executor (Actuator) that executes one or more Processes. The Plant exposes the control variable (u(t)) for the Control Feedback Loop, and the Function (Process) produces the process variable (y(t)). The Process Variable is the Statistical Process Control of CATs Dataset I/O (Ingress/Egress)
- Docker can be executed within an Alpine Linux Docker container ["Docker in Docker" (DinD)] for upcoming cadCAD's nested Block executions as a summation of the control variable (u(t)) that configures CATs Data Product and the summation of the process variable (y(t))
- Concern: "Integral windup particularly occurs as a limitation of physical systems, compared with ideal systems, due to the ideal output being physically impossible (process saturation: the output of the process being limited at the top or bottom of its scale, making the error constant)."
- https://en.wikipedia.org/wiki/Integral_windup
- Alleviated by "A CAT at its core is a unit of computational work specified by the triplet 1) what the input is, 2) what does the computation, and 3) what the output is. Controllers require feedback, which is currently outside of the scope of a single cat. Any cyclic orchestration must be external to CATs." - BlockScience
- Alpine Linux Docker can be the execution paradigm of cadCAD and CATs Plant because they can run as Docker in Docker ("DinD") and functionally map cadCAD multi-dimensional blocks to CAT Functions
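The integral windup concern and the point that cyclic orchestration must sit outside a CAT can be illustrated with a small simulation. This is a sketch under stated assumptions, not CATs code: `plant` is a hypothetical saturating first-order process treated as a stateless function from input to output, the feedback loop lives outside it, only the proportional and integral terms are used, and clamping (conditional integration) is one common anti-windup technique among several.

```python
def plant(y, u, dt, u_max=1.0):
    """Hypothetical saturating process: one step from (y, u) to the next y.

    The actuator clips u to [-u_max, u_max], which is the saturation that
    makes integral windup possible.
    """
    u_applied = max(-u_max, min(u_max, u))
    return y + dt * (u_applied - y)  # first-order lag toward u_applied


def run_loop(setpoint, steps, dt=0.1, kp=2.0, ki=1.0, u_max=1.0, anti_windup=True):
    """External feedback loop around the stateless plant function."""
    y, integral = 0.0, 0.0
    for _ in range(steps):
        error = setpoint - y
        u = kp * error + ki * integral  # PI control law (derivative term omitted)
        saturated = abs(u) > u_max
        # Clamping: stop accumulating error while the actuator is saturated
        # and the error would push the output further into saturation.
        if not (anti_windup and saturated and error * u > 0):
            integral += error * dt
        y = plant(y, u, dt, u_max)
    return y, integral


# A setpoint of 2.0 is unreachable because the plant output saturates at 1.0.
_, wound_up = run_loop(setpoint=2.0, steps=500, anti_windup=False)
_, clamped = run_loop(setpoint=2.0, steps=500, anti_windup=True)
```

Without clamping, the integral term keeps growing for as long as the unreachable setpoint is held, so `wound_up` ends far larger than `clamped`. Keeping `plant` free of loop state mirrors the quoted constraint that a single CAT is specified only by its input, computation, and output.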
- Review RAG stewardship fractionalization context
- Review Software Governance with respect to fractional stewardship
- Consider a Stewardship Profile that maps to agents within a Multi-agent system
- Consider roles as Architectural Responsibilities with respect to RolePlayer
- Review Docker workload on-boarding for cat Refactor
- Consider homestar (Everywhere Computer network) for IPVM inclusion for "resilience, certainty or portability"
- Updated Bacalhau Node and refactor for CoD interoperability for CATs v3
- Exposed ingress and egress to action plane via Process with an interoperable integration point for CATs v3
- Included data product disciplines to CATs Architectural Quantum for CATs v3
- Implement InfraStructure Sub Component separately
- IPFS daemon initiated by CAT Node
- partially implement function for applying sbom
- Refactor InfraFunction to compose Processor & Plant and InfraStructure
- Installed KMS locally for cat/rid Integration
- bring your own cache otherwise it is local (bg: Expanso introduces breaking changes to bacalhau without stable release)
- Reviewed PID controller to suggest the Architectural purpose of CATs in consideration of a Monad
- Expressed Architectural purpose of CATs data mesh as a function
- Described design for Multi-Agent Collaboration (MAC) for CATs using Content-Addressable Router (CAR)
- Contextualize CATs explanation with "Data Mesh Architecture: Interoperability, Co-Operation, and Co-Regulation"
- Contextualize PID Controller within the context of CATs & cadCAD
- Consider a Monadic Transformer for CATs
- Investigating kuberay helm chart not being available due to GitHub server connectivity issues and trying alternative chart sources:
- Old Sources:
- Alternative Sources:
- Conclusion: Consider a chart directly from a cloned kuberay repo offline or hosting elsewhere
- Research Jan.AI, ChatRTX, and Nvidia RTX Technology and RAG explanation to Generate CATs with the inclusion of CAT BOM CIDs in publications
- Investigated machine types for ChatRTX and JanAI using AWS EC2 and GCP Compute; chose GCP Compute for both due to lack of EC2 availability for G3 instances (GPU graphics instances)
- AWS: Determined hardware configuration for AWS as well as bootstrapped dependencies for and built JanAI on Ubuntu 22.04 and ChatRTX on Windows Server 2022 from source on EC2 G3 instances
- Built and launched Jan.AI on Ubuntu 22.04 from source & within Docker container on GCP in an attempt to get the back-end and front-end services online to be demo-able
- Requested appropriate CPU count increase to the GPU memory instance request to G4 and above
- GCP: Repeated the ChatRTX part of AWS experiment on GCP - Configured dependency installation scripts for N1 VMs and Nvidia T4 Graphics cards [*] with appropriate Nvidia Drivers and CUDA toolkits
- Nvidia hardware verification: EfficientNet v1 For TensorFlow 2.6
- [*] Although below the requirement for ChatRTX, the chosen GPU hardware would save money while bootstrapping, and it is appropriate for Jan.ai
- Result: Services online but did not render html
- Attended "Multimodal RAG with Milvus + Ray Data + UForm tiny encoders"
- Built and launched Jan.AI on Ubuntu 22.04 with Docker Compose using GCP Compute in an attempt to get the back-end and front-end services online to be demo-able
- Developed Jan.ai dependency installation scripts for N1 VMs and Nvidia T4 Graphics cards with appropriate Nvidia Drivers and CUDA toolkits
- Created firewall rule for Jan.AI and ChatRTX to render html in browser via TCP
- Result: Discovered incorrect port exposure in the documentation for setting the URL; got the REST API server online with a rendered HTML report page, but the front end did not render HTML
- https://github.com/janhq/jan/issues/2005
- https://github.com/janhq/jan/issues/2838
- https://github.com/janhq/jan/issues/2806
- https://github.com/janhq/jan/pull/2093
- Built and launched Jan.AI on Ubuntu 22.04 with both Docker Compose and Kubernetes in GCP in an attempt to get the back-end and front-end services online to be demo-able
- Forked Jan.ai to BlockScience Organization and reverted to previous commit due to maintainers removing Docker and Kubernetes build resources and associated documentation
- Refactored Jan.ai dependency installation scripts for N1 VMs and Nvidia T4 Graphics cards with appropriate Nvidia Drivers and CUDA toolkits
- Sourced pre-configured Nvidia driver and CUDA dependency installation scripts from GCP in preparation for GCP support case for front-end html not rendering
- Created new firewall rule for Jan.AI to render html in browser via TCP to accommodate all ports in consideration of false Port exposure documentation
- Result: REST API online and front end rendering HTML pages, BUT with a "Preparing Update..." [Loading] bug despite fulfilling documentation requirements in Ubuntu 22.04
- Launched Jan.AI on Kubernetes with a local Kind cluster on Ubuntu 20.04
- Result: Rest API server & front-end online and rendering html
- Launched pre-built Jan.AI on Ubuntu 22.04 using GCP Compute
- Result: Verified deployment with REST API & back end online and front end rendering HTML pages WITHOUT a "Preparing Update..." [Loading] bug
- Result: Objective pivot - jan.ai doesn't process multiple files in a directory unless scaled or a feature is added to do so (potential feature request)
- Built ChatRTX Backend from source on Windows Server 2022 (based on Windows 10) using GCP Compute in an attempt to get the back-end and front-end services online to be demo-able.
- Tried all the previously supported machine types in the release notes: https://docs.nvidia.com/vgpu/qvws/latest/qvws-release-notes-google-cloud-platform/index.html#cloud-service-images
- This was an attempt to recreate a deprecated version of GCP's NVIDIA RTX Virtual Workstation (on Windows Server 2019) using the following:
- https://docs.nvidia.com/vgpu/qvws/latest/qvws-quick-start-guide-google-cloud-platform/index.html
- https://docs.nvidia.com/vgpu/qvws/latest/qvws-release-notes-google-cloud-platform/index.html
- https://console.cloud.google.com/marketplace/details/nvidia/nvidia-quadro-vws-win2019
- https://github.com/NVIDIA/ChatRTX/tree/release/0.4.0
- Noted NVIDIA's Canvas requirements:
- Reviewed ChatRTX User Guide: https://nvidia.custhelp.com/app/answers/detail/a_id/5542 https://nvidia.custhelp.com/app/answers/detail/a_id/5105/related/1/session/L2F2LzEvdGltZS8xNzIzMDYyMDQ1L2dlbi8xNzIzMDYyMDQ1L3NpZC9mVURYdUptbm9STGJYSFNQZlVRZUZGZXk1M2t2JTdFbXluWVZ3cmd4WmtsY1YlN0VCWHlET1hhSFdzTDRKMzk5SzVROWVNTzhScU8xTlpwR1U5eTlwbXFZOVVEX3JQaE8xNVlZcjBJWm1QNGFMVkY4VEd4bFJJX0VZWlR3JTIxJTIx
- Results: Executed the following to validate the backend installation and was prompted to "Enter your query (type 'exit' to quit): "; the input `query` returned "Unable to generate a response: ..."
- Built ChatRTX Application from source as well as downloaded and launched pre-built ChatRTX Application on Windows Server 2022 (based on Windows 10) using GCP Compute in an attempt to get the back-end and front-end services online to be demo-able.
- Tried all the previously supported machine types in the release notes as well as A-Series which is not recommended in the following document: https://docs.nvidia.com/vgpu/qvws/latest/qvws-release-notes-google-cloud-platform/index.html#cloud-service-images
- Resources:
- Build App: https://github.com/NVIDIA/ChatRTX/tree/release/0.4.0/ChatRTX_App
- Build App from front-end development: https://github.com/NVIDIA/ChatRTX/tree/release/0.4.0/ChatRTX_App/ChatRTXUI
- Note ChatRTX Features: https://github.com/NVIDIA/ChatRTX/tree/release/0.4.0/ChatRTX_APIs
- Result: False report that my GPU was not a member of the "Ampere and above GPU Family"
- Conclusion: NVIDIA App is strictly enforcing GeForce RTX 3xxx, 4xxx, and above which is not available on GCP or AWS