JRFC 33 - Repositories #33
This is very interesting. It made me think of storage in general... Most applications use data structures, config files, databases, or any combination thereof. These represent the … What if there was a standard access() interface? For example:

    const repo = require("multirepo").open("~/myrepo"); // multirepo is hypothetical
    const users = repo.access("userdb"); // userdb is a relational datastore
    users.query("select * from users;");
    const notes = repo.access("notes"); // notes is an append-only list
    notes.append({ note: "Hello world!", ts: new Date() });

- meta layer
- logical layer: these are the data structures the application reads and writes.
- concrete layer
- batteries included
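A rough sketch of how a meta layer might hand out logical-layer objects through access(). Everything here is hypothetical (openRepo, the "appendlist" and "kv" type names, and the meta mapping passed in), and data is kept in memory only; a real version would sit on top of a concrete layer.

```javascript
// Hypothetical logical-layer types, kept in memory for this sketch.
const logicalTypes = {
  // Append-only list: entries can be added and read, but not changed.
  appendlist: () => {
    const entries = [];
    return {
      append(entry) {
        entries.push(Object.freeze({ ...entry })); // shallow freeze
        return entries.length - 1;
      },
      all() {
        return entries.slice(); // a copy, so callers cannot edit history
      },
    };
  },
  // Simple key-value store.
  kv: () => {
    const map = new Map();
    return {
      put(key, value) {
        map.set(key, value);
      },
      get(key) {
        return map.get(key);
      },
    };
  },
};

// "Meta layer": maps store names to logical types and creates each store once.
function openRepo(meta) {
  const stores = new Map();
  return {
    access(name) {
      if (!stores.has(name)) {
        const type = meta[name];
        if (!Object.prototype.hasOwnProperty.call(logicalTypes, type)) {
          throw new Error(`unknown store or type: ${name}`);
        }
        stores.set(name, logicalTypes[type]());
      }
      return stores.get(name);
    },
  };
}

// Usage, mirroring the notes example above.
const repo = openRepo({ notes: "appendlist", settings: "kv" });
const notes = repo.access("notes");
notes.append({ note: "Hello world!", ts: new Date() });
console.log(notes.all().length); // 1
```

Each store exposes only the operations of its logical type, so an append-only list simply has no method for rewriting entries.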
This document is an attempt at a generalized specification for repositories
(the git and ipfs kind), in the hope of arriving at a general set of good
practices. I am new to many of the intricacies and edge cases, so please
suggest important additions.
Many tools and systems create data repositories with configuration files. The
classic examples are git and other VCS tools, but many other systems do too.
Application changes will inevitably bring about changes to the repository format
(e.g. changing how data is stored, or changing the data itself). These changes
should NEVER cause users to lose data, and great care must be taken to ensure
that every format change is accompanied by migration tools.
As applications grow, different storage media or execution strategies may suit
different use cases, e.g. "flat files inside .git for the git cli" vs. "a git
repo inside a database for fast web server access". Whatever the use case,
application implementations should be able to operate on different concrete
versions of the repository, provided suitable adapters exist. This separation
reduces the cost of writing both new storage implementations and new
application implementations.
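The concrete/virtual separation can be illustrated with a plain adapter pattern. In this sketch, all class and method names are hypothetical: VirtualRepo is the only thing application code touches, while MemoryAdapter and FlatFileAdapter are two interchangeable concrete backends that store the same serialized objects differently.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Concrete adapter 1: objects live in an in-memory Map.
class MemoryAdapter {
  constructor() { this.objects = new Map(); }
  read(key) { return this.objects.has(key) ? this.objects.get(key) : null; }
  write(key, data) { this.objects.set(key, data); }
  list() { return [...this.objects.keys()]; }
}

// Concrete adapter 2: one flat file per object in a directory
// (loosely in the spirit of .git/objects, not git's real layout).
class FlatFileAdapter {
  constructor(dir) {
    this.dir = dir;
    fs.mkdirSync(dir, { recursive: true });
  }
  fileFor(key) {
    // Restrict keys to safe file names so they cannot escape the directory.
    if (!/^[A-Za-z0-9_-]+$/.test(key)) throw new Error(`invalid key: ${key}`);
    return path.join(this.dir, key);
  }
  read(key) {
    const file = this.fileFor(key);
    return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : null;
  }
  write(key, data) { fs.writeFileSync(this.fileFor(key), data, "utf8"); }
  list() { return fs.readdirSync(this.dir); }
}

// The virtual repo: application code only ever talks to this class,
// so it works unchanged on top of either adapter.
class VirtualRepo {
  constructor(adapter) { this.adapter = adapter; }
  get(key) {
    const raw = this.adapter.read(key);
    return raw === null ? undefined : JSON.parse(raw);
  }
  put(key, value) { this.adapter.write(key, JSON.stringify(value)); }
  keys() { return this.adapter.list().sort(); }
}

// Usage: the same application logic runs against both concrete repos.
const backends = [
  new MemoryAdapter(),
  new FlatFileAdapter(fs.mkdtempSync(path.join(os.tmpdir(), "repo-"))),
];
for (const adapter of backends) {
  const repo = new VirtualRepo(adapter);
  repo.put("greeting", { text: "hello" });
  console.log(adapter.constructor.name, repo.keys(), repo.get("greeting").text);
}
```

Adding another storage backend then means writing one more adapter with read/write/list; VirtualRepo and the application code stay as they are.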
Terms:
repo - a repository: a structured collection of objects, with a configuration, e.g. a git repo or an ipfs repo.
config - a repository configuration, which holds repository options.
database - a database which holds the repository data. This may be a key-value store (leveldb), a collection of flat files (.git/objects), a relational db (SQLite), etc.
address - an identifier of the location of the repository, e.g. /Users/jbenet/foo/bar/.git or https://github.com/jbenet/go-ipfs.
format - the way in which the data is organized.
repo version - a number identifying the repo's format. It is easiest if these are monotonically increasing integers.
concrete repo - the actual repo as stored in storage media (e.g. posix files inside .git/, files and a leveldb, s3, ...).
virtual repo - a virtual object which can be manipulated. The distinction between concrete and virtual exists so that tools can be written mostly to operate on the virtual repo, and remain compatible with a variety of repo implementations through adapters.
Notes
- repo version MUST be included, and must remain readable by all tools attempting to modify the repo (e.g. migration tools from any version must be able to determine the current version of the repo). Example: .go-ipfs/version
- config and database may both be implemented by the same storage system, but it is recommended that they be kept separate, as one might define the other.
Synchronization
Operations on a repo may require synchronization: some repos may support
concurrent modifications, while others require complete mutual exclusion. Repos
which require mutual exclusion must support a mechanism to achieve it (e.g.
.git/index.lock). Such mechanisms may be fine-grained or coarse, but every repo
format must define its synchronization rules, so that different implementations
can safely access the same repo concurrently.
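One coarse-grained way to get mutual exclusion, in the spirit of .git/index.lock, is an exclusively created lock file. This Node.js sketch relies on fs.openSync with the "wx" flag, which fails if the file already exists; the lock file name repo.lock and the helper names are assumptions.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed lock file name; the repo format would have to specify it.
function acquireLock(repoDir) {
  const lockPath = path.join(repoDir, "repo.lock");
  try {
    // "wx": create the file, failing with EEXIST if it already exists.
    // On local filesystems the check and the creation are one atomic open.
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, `${process.pid}\n`); // record the holder for debugging
    fs.closeSync(fd);
    return lockPath;
  } catch (err) {
    if (err.code === "EEXIST") {
      throw new Error(`repo is locked by another process (${lockPath})`);
    }
    throw err;
  }
}

function releaseLock(lockPath) {
  fs.unlinkSync(lockPath);
}

// Run fn while holding the lock; the lock is released even if fn throws.
function withRepoLock(repoDir, fn) {
  const lockPath = acquireLock(repoDir);
  try {
    return fn();
  } finally {
    releaseLock(lockPath);
  }
}
```

A process that crashes while holding the lock leaves a stale repo.lock behind, so a format using this approach would also need to say how stale locks are detected and cleaned up.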
Migrations
Over the lifetime of an application, repo formats may require changes. Each
change must be accompanied by a "migration tool", which converts the data from
the current format version to the new one. Ideally the migration can be applied
in both directions (old <-> new). For example, one may end up with a set of
"repo version migration" tools like the following:
It is advised that repo migration tools be virtual repo tools (that is,
implemented to work with the logical repo instead of the concrete data). This
makes it possible to reuse migration tools across repo implementations, given
proper adapters. This is not always possible, so repo-format-specific migration
tools may sometimes be necessary.
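A sketch of a chain of reversible migrations written against a virtual repo rather than any concrete storage. The repo shape ({ version, notes }) and both migrations are made up for illustration; the point is that each step only converts between adjacent versions, and migrate() walks the chain in either direction.

```javascript
// Hypothetical migrations between adjacent versions of a virtual repo shaped
// like { version, notes }. Each step is a pure function: it returns a new
// object and never mutates its input, so the original survives a failed run.
const migrations = [
  {
    from: 1,
    to: 2,
    // v1 stores notes as plain strings; v2 wraps each one as { text }.
    up: (repo) => ({ ...repo, version: 2, notes: repo.notes.map((text) => ({ text })) }),
    down: (repo) => ({ ...repo, version: 1, notes: repo.notes.map((n) => n.text) }),
  },
  {
    from: 2,
    to: 3,
    // v3 renames the "text" field to "body"; down renames it back.
    up: (repo) => ({
      ...repo,
      version: 3,
      notes: repo.notes.map(({ text, ...rest }) => ({ ...rest, body: text })),
    }),
    down: (repo) => ({
      ...repo,
      version: 2,
      notes: repo.notes.map(({ body, ...rest }) => ({ ...rest, text: body })),
    }),
  },
];

// Walk one version at a time toward the target, upgrading or downgrading.
function migrate(repo, target) {
  let current = repo;
  while (current.version !== target) {
    const v = current.version;
    const upgrading = v < target;
    const step = upgrading
      ? migrations.find((m) => m.from === v)
      : migrations.find((m) => m.to === v);
    if (!step) {
      throw new Error(`no migration from version ${v} toward ${target}`);
    }
    current = upgrading ? step.up(current) : step.down(current);
  }
  return current;
}
```

Because each step returns a new object, a concrete implementation could write the migrated repo out separately and only replace the old data once the whole chain has succeeded, which fits the no-data-loss requirement stated earlier.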
Human inspection
Repo implementations must include tools to transform the data into a
human-readable, inspectable structure. This makes it possible for users and
application implementors to debug problems. Such tools may be easiest to build
around a human-readable repository format, plus tools to convert to and from it.
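A minimal sketch of a round-trippable human-readable dump, assuming the logical repo is an object { version, objects } where objects is a Map of JSON-serializable values. dumpRepo and loadRepo are hypothetical names. Writing keys in a deterministic order with indentation keeps dumps diffable, which helps when comparing a repo before and after a migration.

```javascript
// Assumed logical repo shape: { version: number, objects: Map<string, JSON value> }.
function dumpRepo(repo) {
  // Null prototype so a key like "__proto__" is stored as a normal property.
  const objects = Object.create(null);
  // Sorted insertion gives a deterministic order (JS places integer-like keys first).
  for (const key of [...repo.objects.keys()].sort()) {
    objects[key] = repo.objects.get(key);
  }
  return JSON.stringify({ version: repo.version, objects }, null, 2) + "\n";
}

// Inverse of dumpRepo, with minimal validation of the dump.
function loadRepo(text) {
  const parsed = JSON.parse(text);
  if (!Number.isInteger(parsed.version)) {
    throw new Error("dump is missing an integer version");
  }
  if (typeof parsed.objects !== "object" || parsed.objects === null) {
    throw new Error("dump is missing an objects table");
  }
  return { version: parsed.version, objects: new Map(Object.entries(parsed.objects)) };
}
```

Since loadRepo(dumpRepo(repo)) reproduces the repo, the dump can double as an interchange format when moving data between concrete implementations.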
Corruption
corrupted - an unexpected, invalid data state.
recovery - the process of "uncorrupting" a repository. This may not be possible...
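For content-addressed repos, one simple corruption check is to re-hash each object and compare the result with its key. This sketch assumes keys are SHA-256 hex digests of the stored data (git and ipfs use their own hash and key formats) and that a second copy, such as a replica, may exist to recover from; keyFor, findCorrupted, and recover are hypothetical names.

```javascript
const crypto = require("crypto");

// Content address: SHA-256 hex digest of the stored data (an assumption).
function keyFor(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Keys whose stored data no longer hashes to the key are corrupted.
function findCorrupted(objects) {
  const corrupted = [];
  for (const [key, data] of objects) {
    if (keyFor(data) !== key) corrupted.push(key);
  }
  return corrupted;
}

// Try to restore each corrupted object from a replica, accepting a copy only
// if it verifies. Returns the keys that could not be recovered.
function recover(objects, replica) {
  const unrecoverable = [];
  for (const key of findCorrupted(objects)) {
    const copy = replica.get(key);
    if (copy !== undefined && keyFor(copy) === key) {
      objects.set(key, copy);
    } else {
      unrecoverable.push(key);
    }
  }
  return unrecoverable;
}
```

Objects with no valid copy are reported rather than silently dropped, which matches the point that recovery may not always be possible.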