This is essentially a distributed data management system.
Below is a high-level depiction
- Data and changes are almost immutable (pretty much like git - or based on it - ). This immutability should come at minimal storage overhead (only historic delta's are saved, along with the full most up-to-date version)
- Entry-oriented (document oriented): All the related data to an entry (Meta-data, actual content and any file hierarchy) is self contained in the entry object. That is compressed (e.g. entry_abc.tar.bz2). Each entry has a pointer to the respective module (including its specific version requirements) that has the format specifications and can interact with the entry.
- Graph-enabled. Entries can have pointers (relations + attributes) to each other.
- Modular data handlers: Modules that can ingest various types of data: Text, PDF, eBook, Media (Images, Audio, Video), JSON, CSV, Xml, Doc, Excel, Presentation, Avro, ORCFile, Parquet, Sqlite3 db ... etc. A Data handler module also includes the necessary code to parse, enrich and expose data/meta data.
- Data entries are indexed so they can be searched. (using indexing technologies such as lucene)
- Security: Proper access management and an entry is signed by its author
- API Interface that encompasses all the features. High-performance Non-blocking. (netty?)
- UI that interacts with the API; Views (list/Gride/cards/tree/graph), search interface, CRUD.
- Distributed and decentralized. Just like/on top of your regular file/email/cms systems.
- Does not rely on a Database engine; but rather has the data in textual representation (e.g. JSON)
- Miners are additional components responsible for two things: finding new content and improving the quality/meta information of it. They mine local and external sedras looking for new content and new perspective on already existing content.
Each module would have:
- Indexer / Analyzer
- Schema definitions
- Reader/Writer/Converter
- Presentation Layer
- Binaries / Logic / code
Miners search for content and people according to the user's rules/criteria. e.g. Get me my friends updates that are of certain topics (tags), deliver messages. They maintain the meta data from all connected instances and from other miners.
- Push: events (distribute to registered inboxes and meta index). a registered inbox receives direct messages and also registeres rules for interesting events. every event/entity is signed by its author. encryption is optional. circls: of people (actually a social graph) along with classification (friends, acquaintance, family ...etc)
- Poll/crawl: to meta index. A meta index keeps reference to entities.
- app-store for modules (with all historic versions, so content referring to a specific version can get that respective version)
- Realtime-ness. TBD
- Events chaining (consequetive events where order need to be maintained). comments/concurrent document editing ... etc.
- key/trust-management? should we use existing pgp?
- identity managment is handled by oauth and friends.
- data: The Data repository, it holds all data entities and belongings/files. The way folders are arranged under this is left to the user.
- meta: The story of the data and how it became to be
- modules: aka code/binaries: individual modules that can manage (read/write/present) certain types of entities.
- events: aka messages / changes: The individual change events that would contribute to the data repository.
- revisions: aka attic. The old delta copies of the data. It should be possible to delete or compact this.
- indexes: Indexes for data and metadata to make finding, querying data faster. This can be regenerated at any time. it could be (RDBMS like sqlite, NoSQL, Lucene indexes ..etc).
sample
├── data
│ ├── family
│ ├── files
│ │ ├── books
│ │ ├── documents
│ │ └── media
│ ├── friends
│ ├── interests
│ ├── links
│ ├── messages
│ ├── personal
│ └── structures
│ ├── financial
│ │ └── invoices
│ ├── inventory
│ ├── pages
│ └── tickets
├── events
├── indexes
├── meta
│ ├── history
│ ├── lineage
│ └── schema
├── modules
├── people
└── revisions
- Changes and EntityProperty are part-of Entity
- LinkProperty is part-of Link
In a NoSQL document oriented setup; there would be two or three main schema.
- Actor
- Person
- Group
- Content
- Plain
- Rich
- Structured
- Message
- Wiki
- Binary
- Container
- Folder
- Tarball/zip
- Embedded
- External
Datum formats vary, here are few examples: JSON, XML, Avro, Markdown, Xhtml, Images, Audio, Video, PDF, Documents, Binary, Sqlitedb
- Create
- Update
Nothing is determined yet. Going now though a research phase.
Edraj can be implemented in a number of technology combinations:
- C++/QT fro Native cross-platfrom desktop/mobile app
- Java/Scala
- Php/yii2
- NodeJS/IO.JS
At this point wer are considering Php/Yii2.