diff --git a/docs/src/local/commit.rst b/docs/src/local/commit.rst
index ea34e695..abea6858 100644
--- a/docs/src/local/commit.rst
+++ b/docs/src/local/commit.rst
@@ -75,5 +75,12 @@ by running ``titan checkout``::
 Here you can see that we stopped the container, swapped out the data, and
 started it again. And with that, we're back to the original commit we created.
 
+.. warning::
+
+    The Titan infrastructure has not yet been built for scale. While it
+    should work fine for dozens of commits, creating hundreds or thousands of
+    commits or repositories may have adverse effects on the system. This will be
+    addressed in a future release.
+
 For information on more additional local workflows, see the :ref:`local`
 section.
diff --git a/docs/src/remote/addremove.rst b/docs/src/remote/addremove.rst
index 2fcd3c63..d1112511 100644
--- a/docs/src/remote/addremove.rst
+++ b/docs/src/remote/addremove.rst
@@ -3,4 +3,20 @@
 Adding and Removing Remotes
 ===========================
 
-Coming Soon!
+Each repository can have zero or more remotes configured. To add a remote,
+use :ref:`cli_cmd_remote_add`::
+
+    $ titan remote add s3://bucket/path myrepo
+
+Remotes are specified as URIs, with the first portion defining the provider
+(s3 in the above case), and the rest being specific to that provider. By
+default, the remote is named `origin`, but you can also assign names to
+remotes (required when you have more than one remote).
+
+To get a list of remotes, use :ref:`cli_cmd_remote_ls`::
+
+    $ titan remote ls hello-world
+    REMOTE    PROVIDER
+    origin    s3
+
+Remotes can be removed with the :ref:`cli_cmd_remote_rm` command.
diff --git a/docs/src/remote/clone.rst b/docs/src/remote/clone.rst
index 990fff93..63a95f9d 100644
--- a/docs/src/remote/clone.rst
+++ b/docs/src/remote/clone.rst
@@ -3,4 +3,23 @@
 Cloning Repositories
 ====================
 
-Coming soon!
+The :ref:`cli_cmd_clone` command will create a new repository using the
+configuration from a remote.
It is equivalent to creating a new repository with
+an identical configuration, adding the remote, and pulling down the latest
+commit::
+
+    $ titan clone s3://titan-data-demo/hello-world/postgres hello-world
+
+The Docker configuration is persisted with each commit, so the local repository
+uses whatever the configuration was as of the last commit.
+
+.. note::
+
+    There is currently no way to override the Docker configuration, for
+    example to use a different port or network configuration. This
+    capability will be added in a future release.
+
+.. note::
+
+    The clone command currently always uses the latest commit. The ability to
+    select a specific commit to use will be added in a future release.
diff --git a/docs/src/remote/provider/s3.rst b/docs/src/remote/provider/s3.rst
index 61cb0815..0302cc64 100644
--- a/docs/src/remote/provider/s3.rst
+++ b/docs/src/remote/provider/s3.rst
@@ -3,4 +3,32 @@
 S3 Provider
 ===========
 
-Coming Soon!
+The S3 provider stores commits remotely in an S3 bucket. Each commit
+is stored as a tar archive, with the commit metadata attached as object
+metadata. The URI format is::
+
+    s3:///
+
+Commits will be created at ``//.tar.gz``. The commit
+metadata will be stored at the ```` level.
+
+The AWS credentials are pulled using the default AWS credential chain at
+the time you do the push or pull operation. So you must have the
+``AWS_*`` environment variables set, or use your ``~/.aws`` configuration.
+Because the S3 provider uses the standard AWS SDK, all variations of credentials
+should be supported, including specifying a profile with ``AWS_PROFILE``.
+To pull a commit, you will need ``s3:GetObject`` permissions. To push a commit,
+you will need ``s3:PutObject`` permissions.
+
+.. note::
+
+    The S3 provider doesn't currently support MFA (multi-factor authentication).
+    If you have a ``session_token`` in your AWS config, then operations will
+    fail with an error message indicating the access key could not be found.
+
+The S3 provider relies on basic AWS APIs to implement its functionality, and
+as such has limited scalability. For example, finding the latest commit requires
+listing all objects, getting metadata iteratively for each one, and comparing
+the results. It should only be used for storing relatively small numbers of
+commits. Improving this will require a new provider that includes a robust
+metadata layer on top of the base S3 functionality.
diff --git a/docs/src/remote/provider/ssh.rst b/docs/src/remote/provider/ssh.rst
index ed2f5bab..8c3f7a2c 100644
--- a/docs/src/remote/provider/ssh.rst
+++ b/docs/src/remote/provider/ssh.rst
@@ -3,4 +3,29 @@
 SSH Provider
 ============
 
-Coming Soon!
+The SSH provider enables commits to be stored on any server where the user
+has remote access over SSH. The URI syntax is::
+
+    ssh://user[:password]@host/path
+
+The ``path`` is interpreted as an absolute path unless it starts with ``~``.
+The SSH provider uses rsync to copy files to subdirectories within the path,
+with metadata being stored in a ``metadata.json`` file. This means that pushes
+are always full sends, as Titan is sending data to a newly created directory.
+Pulls, on the other hand, may not need to transfer all data depending on what
+state exists locally.
+
+The system must have ``sudo`` installed and the user must have ``sudo``
+privileges for running rsync. This enables file ownership and permissions to be
+set properly.
+
+If ``password`` is not specified, then the user will be prompted for a password
+at the time they do the push or pull operation. Future enhancements will
+include the ability to specify an SSH key file instead of using passwords.
+
+Like the S3 provider, the SSH provider has inherent scalability limitations.
For
+example, finding the latest commit requires listing all commits in the path,
+reading the metadata file for each, and comparing the results. It should only be
+used for storing relatively small numbers of commits. Improving this will
+require a new provider that includes a robust metadata layer on top of the base
+SSH functionality.
diff --git a/docs/src/remote/pushpull.rst b/docs/src/remote/pushpull.rst
index cea1be17..bb362319 100644
--- a/docs/src/remote/pushpull.rst
+++ b/docs/src/remote/pushpull.rst
@@ -3,4 +3,26 @@
 Pushing and Pulling
 ===================
 
-Coming Soon!
+The :ref:`cli_cmd_push` and :ref:`cli_cmd_pull` commands form the basis of
+sharing data via remote repositories. Unlike git, however, they transfer
+only a single commit to or from the remote repository. There is no notion
+of pulling "all commits" and then checking out one of them.
+
+Exactly how each provider transfers data varies. Some, like S3, only do full
+transfers of data as a single archive. Others, like SSH, use rsync to
+transfer only incremental data where possible.
+
+Each push and pull runs asynchronously in the context of the titan container,
+but progress is streamed to the command line while it's being run. In rare
+cases, it's possible to exit the CLI while the operation is ongoing. In this
+case, you may get a message that an operation is in progress. You can either
+wait for it to complete, or abort it with :ref:`cli_cmd_abort`.
+
+While the CLI does not provide full-fledged management of remotes (something
+specific to each remote), you can get a list of remote commits using the
+:ref:`cli_cmd_remote_log` command.
+
+.. note::
+
+    Titan doesn't currently retry after network errors or other interruptions.
+    This capability will be added in a future release.
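Reviewer sketch (not part of the patch): the S3 and SSH provider pages above both describe finding the latest commit as a full scan, that is, list every commit, read each one's metadata, and compare. The Python below is a minimal illustration of that pattern under stated assumptions. ``list_commit_ids``, ``read_metadata``, and the ``timestamp`` metadata field are hypothetical stand-ins rather than Titan APIs, and the patch does not say which field Titan actually compares.

```python
from typing import Callable, Dict, Iterable, Optional


def find_latest_commit(
    list_commit_ids: Callable[[], Iterable[str]],
    read_metadata: Callable[[str], Dict[str, str]],
) -> Optional[str]:
    """Return the id of the newest commit, or None if the remote is empty.

    Both callables are hypothetical stand-ins for provider operations
    (for example, listing objects and fetching per-object metadata).
    """
    latest_id = None
    latest_ts = None
    for commit_id in list_commit_ids():
        # One metadata round trip per commit, so cost grows linearly
        # with the number of commits stored on the remote.
        ts = read_metadata(commit_id)["timestamp"]  # assumed field name
        if latest_ts is None or ts > latest_ts:
            latest_id, latest_ts = commit_id, ts
    return latest_id


# Usage with an in-memory stand-in for a remote (made-up data).
# ISO 8601 timestamps in the same format compare correctly as strings.
fake_remote = {
    "c1": {"timestamp": "2019-10-01T12:00:00Z"},
    "c2": {"timestamp": "2019-10-03T08:30:00Z"},
    "c3": {"timestamp": "2019-10-02T17:45:00Z"},
}
calls = []


def read(commit_id):
    calls.append(commit_id)  # record each simulated metadata fetch
    return fake_remote[commit_id]


latest = find_latest_commit(lambda: fake_remote.keys(), read)
print(latest, len(calls))
```

Every commit costs one metadata read in this sketch, which is why both pages recommend keeping the number of commits per remote small until a provider with a dedicated metadata layer exists.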
diff --git a/docs/src/remote/remote.rst b/docs/src/remote/remote.rst
index 72723747..d9fc34bb 100644
--- a/docs/src/remote/remote.rst
+++ b/docs/src/remote/remote.rst
@@ -3,7 +3,37 @@
 Remote Repositories
 ===================
 
-Coming Soon!
+While managing data locally on your laptop is all well and good, part of the
+power of source code management is the ability to share that data with
+others. Much like git, Titan has the notion of `remote repositories` that
+act as an endpoint for push and pull.
+
+There are a few important general things to be aware of:
+
+* Titan commits do not have a strict dependency on the previous commit from
+  which they were created. Because they are much larger, we allow them to be
+  pushed and pulled independently. For this reason, :ref:`cli_cmd_clone` and
+  :ref:`cli_cmd_pull` will not pull down `all commits`, only the one specified
+  by the user.
+* Titan does not support the notion of merging. While concepts like tagging
+  and branching will be added over time, generically merging data at the
+  on-disk level is not possible.
+* Different remote providers have different performance characteristics,
+  including whether they support incremental transfers. Some will
+  always do a full data transfer, while others have a means to identify
+  only changed blocks. Titan is designed to work with small
+  datasets (<10GB); using it for anything much larger may have adverse
+  effects on the system.
+
+.. warning::
+
+    Titan currently ships with two very basic providers, the :ref:`remote_provider_s3`
+    and the :ref:`remote_provider_ssh`. These are only introductory providers, designed
+    to have zero dependencies on external software. As such, they
+    will face challenges across security, performance, and robustness when
+    operated at scale in an enterprise setting. As Titan matures, we will be
+    working with the community and partners to help develop remote providers
+    with more robust capabilities.
 
 .. toctree::
     :maxdepth: 1
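Reviewer sketch (not part of the patch): the remote pages describe remotes as URIs whose first portion names the provider, with the rest left to that provider. As a rough illustration of that convention, the Python below splits the example URIs from the patch with the standard library's ``urllib.parse.urlsplit``. The ``split_remote_uri`` helper, its dictionary keys, and the ``secret`` password are hypothetical; this is not how Titan itself parses remotes, and the ``~`` handling described for SSH paths is not modeled.

```python
from urllib.parse import urlsplit


def split_remote_uri(uri: str) -> dict:
    """Split a remote URI into its provider and provider-specific parts.

    Hypothetical helper for illustration only, not a Titan API.
    """
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError(f"remote URI has no provider prefix: {uri!r}")
    return {
        "provider": parts.scheme,    # e.g. "s3" or "ssh"
        "user": parts.username,      # None when the URI has no user
        "password": parts.password,  # None when omitted, as SSH allows
        "host": parts.hostname,      # for s3 URIs this is the bucket name
        "path": parts.path,
    }


s3 = split_remote_uri("s3://titan-data-demo/hello-world/postgres")
ssh = split_remote_uri("ssh://user:secret@host/path")
print(s3["provider"], s3["host"], s3["path"])
print(ssh["provider"], ssh["user"], ssh["host"], ssh["path"])
```

Keeping the provider in the URI scheme means only the scheme needs to be understood generically; everything after it can be interpreted by the provider that owns it.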