Some API documentation fixes #3441

Merged: 3 commits, Jun 1, 2023
17 changes: 10 additions & 7 deletions docs/Server/API/README.md
@@ -7,11 +7,14 @@ The Pbench Server provides a set of HTTP endpoints to manage user
authentication and curated performance information, called "dataset resources"
or just "datasets".

The [V1 API](V1/README.md) provides a functional interface that's not quite
standard REST. The intent is to migrate to a cleaner resource-oriented REST
style for a future V2 API.
The [V1 API](V1/README.md) provides a REST-like functional interface.

The Pbench Server primarily uses serialized JSON parameters (mimetype
`application/json`) both for request bodies and response bodies. A few
exceptions use raw byte streams (`application/octet-stream`) to allow uploading
new datasets and to access individual files from a dataset.
The Pbench Server APIs accept parameters from a variety of sources. See the
individual API documentation for details; a brief client-side sketch follows
this list.
1. Some parameters, especially "resource ids", are embedded in the URI, such as
`/api/v1/datasets/<resource_id>`;
2. Some parameters are passed as query parameters, such as
`/api/v1/datasets?name:fio`;
3. For `PUT` and `POST` APIs, parameters may also be passed as a JSON
(`application/json` content type) request payload, such as
`{"metadata": {"dataset.name": "new name"}}`
17 changes: 0 additions & 17 deletions docs/Server/API/V1/README.md
@@ -132,20 +132,3 @@ through the `directories` list of each
The [inventory](inventory.md) API returns the raw byte stream of any regular
file within the directory hierarchy, including log files, postprocessed JSON
files, and benchmark result text files.

### Example

```
import requests

def directory(request, url: str, name: str = "/", level: int = 0):
    # Fetch the directory listing and print an indented tree of its contents.
    ls = request.get(url).json()
    print(f"{' ' * level}{name}")
    for d in ls["directories"]:
        directory(request, d["uri"], d["name"], level + 1)
    for f in ls["files"]:
        print(f"{' ' * (level + 1)}{f['name']}")
        data = request.get(f["uri"]).content
        # display byte stream:
        # inline on terminal doesn't really make sense

directory(requests, "http://host.example.com/api/v1/contents/<dataset>/")
```
25 changes: 15 additions & 10 deletions docs/Server/API/V1/contents.md
@@ -75,33 +75,33 @@ Pbench returns a JSON object with two list fields:
{
"directories": [
{
"name": "1-iter1",
"name": "dir1",
"type": "dir",
"uri": "http://hostname/api/v1/datasets/contents/<id>/1-iter1"
"uri": "http://hostname/api/v1/datasets/<id>/contents/dir1"
},
{
"sysinfo",
"name": "dir2",
"type": "dir",
"uri": "http://hostname/api/v1/datasets/contents/<id>/sysinfo"
"uri": "http://hostname/api/v1/datasets/<id>/contents/dir2"
},
...
],
"files": [
{
"name": ".iterations",
"name": "file.txt",
"mtime": "2022-05-18T16:02:30",
"size": 24,
"mode": "0o644",
"type": "reg",
"uri": "http://hostname/api/v1/datasets/inventory/<id>/.iterations"
"uri": "http://hostname/api/v1/datasets/<id>/inventory/file.txt"
},
{
"name": "iteration.lis",
"name": "data.lis",
"mtime": "2022-05-18T16:02:06",
"size": 18,
"mode": "0o644",
"type": "reg",
"uri": "http://hostname/api/v1/datasets/inventory/<id>/iteration.lis"
"uri": "http://hostname/api/v1/datasets/<id>/inventory/data.lis"
},
...
]
@@ -126,7 +126,12 @@ The `type` codes are:
{
"name": "reference-result",
"type": "sym",
"uri": "http://hostname/api/v1/datasets/contents/<id>/sample1"
"uri": "http://hostname/api/v1/datasets/<id>/contents/linkresult"
},
{
"name": "directory",
"type": "dir",
"uri": "http://hostname/api/v1/datasets/<id>/contents/directory"
}
```

@@ -154,6 +159,6 @@ URI returning the linked file's byte stream.
"size": 18,
"mode": "0o644",
"type": "reg",
"uri": "http://hostname/api/v1/datasets/inventory/<id>/<path>"
"uri": "http://hostname/api/v1/datasets/<id>/inventory/<path>"
}
```
25 changes: 22 additions & 3 deletions docs/Server/API/V1/inventory.md
@@ -11,8 +11,10 @@ The resource ID of a Pbench dataset on the server.

`<path>` string \
The resource path of an item in the dataset inventory, as captured by the
Pbench Agent packaging; for example, `/metadata.log` for the dataset metadata,
or `/1-default/sample1/result.txt` for the default first iteration results.
Pbench Agent packaging; for example, `/metadata.log` for a file named
`metadata.log` at the top level of the dataset tarball, or `/dir1/dir2/file.txt`
for a `file.txt` file in a directory named `dir2` within a directory called
`dir1` at the top level of the dataset tarball.

## Request headers

@@ -25,6 +27,23 @@ E.g., `authorization: bearer <token>`
`content-type: application/octet-stream` \
The return is a raw byte stream representing the contents of the named file.

`content-disposition: <action>; filename=<name>` \
This header defines the recommended client action on receiving the byte stream.
The `<action>` types are either `inline`, which suggests that the data can be
displayed "inline" by a web browser, or `attachment`, which suggests that the
data should be saved into a new file. The `<name>` is the original filename on the
Pbench Server. For example,

```
content-disposition: attachment; filename=pbench-fio-config-2023-06-29-00:14:50.tar.xz
```

or

```
content-disposition: inline; filename=data.txt
```
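
As a rough sketch of how a client might honor this header, using the Python
`requests` library (the host, dataset ID, path, and token are placeholders):

```python
import re
import requests

uri = "http://hostname/api/v1/datasets/<id>/inventory/<path>"
resp = requests.get(uri, headers={"authorization": "bearer <token>"})

# Decide what to do with the byte stream based on content-disposition.
disposition = resp.headers.get("content-disposition", "")
match = re.search(r"filename=(\S+)", disposition)
filename = match.group(1) if match else "download"

if disposition.startswith("attachment"):
    # Save the byte stream to a file using the server-suggested name.
    with open(filename, "wb") as out:
        out.write(resp.content)
else:
    # "inline": the data is suitable for direct display.
    print(resp.text)
```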

## Resource access

* Requires `READ` access to the `<dataset>` resource
@@ -48,7 +67,7 @@ exist.

`415` **UNSUPPORTED MEDIA TYPE** \
The `<path>` refers to a directory. Use
`/api/v1/dataset/contents/<dataset><path>` to request a JSON response document
`/api/v1/dataset/<dataset>/contents/<path>` to request a JSON response document
describing the directory contents.

`503` **SERVICE UNAVAILABLE** \
145 changes: 145 additions & 0 deletions docs/Server/API/V1/relay.md
@@ -0,0 +1,145 @@
# `POST /api/v1/relay/<uri>`

This API creates a dataset resource by reading data from a Relay server. There
are two distinct steps involved (a client-side sketch follows the list):

1. A `GET` on the provided URI must return a "Relay manifest file". This is a
JSON file (`application/json` MIME format) providing the original tarball
filename, the tarball's MD5 hash value, a URI to read the tarball file, and
optionally metadata key/value pairs to be applied to the new dataset. (See
[Manifest file keys](#manifest-file-keys).)
2. A `GET` on the Relay manifest file's `uri` field value must return the
tarball file as an `application/octet-stream` payload, which will be stored by
the Pbench Server as a dataset.
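
As a rough client-side sketch (assuming the manifest and tarball have already
been published to a Relay server; the host names, token, and the way the Relay
URI is embedded in the path are assumptions, not a definitive recipe):

```python
import requests

# URI of the Relay manifest file, previously published to the Relay server
manifest_uri = (
    "https://relay.example.com/"
    "52adfdd3dbf2a87ed6c1c41a1ce278290064b0455f585149b3dadbe5a0b62f44"
)

# Ask the Pbench Server to pull the tarball described by that manifest.
response = requests.post(
    f"https://pbench.example.com/api/v1/relay/{manifest_uri}",
    headers={"authorization": "bearer <token>"},
)
print(response.status_code, response.json()["message"])
```

On success the server responds with `201` **CREATED** and a JSON body
containing a `message` field, as described below.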

## URI parameters

`<uri>` string \
The Relay server URI of the tarball's manifest `application/json` file. This
JSON object must provide a set of parameter keys as defined below in
[Manifest file keys](#manifest-file-keys).

## Manifest file keys

For example,

```json
{
"uri": "https://relay.example.com/52adfdd3dbf2a87ed6c1c41a1ce278290064b0455f585149b3dadbe5a0b62f44",
"md5": "22a4bc5748b920c6ce271eb68f08d91c",
"name": "fio_rw_2018.02.01T22.40.57.tar.xz",
"access": "private",
"metadata": ["server.origin:myrelay", "global.agent:cloud1"]
}
```

`access`: [ `private` | `public` ] \
The desired initial access scope of the dataset. Select `public` to make the
dataset accessible to all clients, or `private` to make the dataset accessible
only to the owner. The default access scope if the key is omitted from the
manifest is `private`.

For example, `"access": "public"`

`md5`: tarball MD5 hash \
The MD5 hash of the compressed tarball file. This must match the actual tarball
octet stream specified by the manifest `uri` key.

`metadata`: [metadata key/value strings] \
A set of desired Pbench Server metadata key values to be assigned to the new
dataset. You can set the initial resource name (`dataset.name`), for example, as
well as assign any keys in the `global` and `user` namespaces. See
[metadata](../metadata.md) for more information.

In particular, the client can set any of:
* `dataset.name`: [default dataset name](../metadata.md#datasetname)
* `server.origin`: [dataset origin](../metadata.md#serverorigin)
* `server.archiveonly`: [suppress indexing](../metadata.md#serverarchiveonly)
* `server.deletion`: [default dataset expiration time](../metadata.md#serverdeletion).

`name`: The original tarball file name \
The string value must represent a legal filename with the compound type of
`.tar.xz` representing a `tar` archive compressed with the `xz` program.

`uri`: Relay URI resolving to the tarball file \
An HTTP `GET` on this URI, exactly as recorded, must return the original tarball
file as an `application/octet-stream`.

## Request headers

`authorization: bearer` token \
*Bearer* schema authorization assigns the ownership of the new dataset to the
authenticated user. E.g., `authorization: bearer <token>`

`content-length` tarball size \
The size of the request octet stream in bytes. Generally supplied automatically by
an upload agent such as Python `requests` or `curl`.

## Response headers

`content-type: application/json` \
The return is a serialized JSON object with status information.

## Response status

`200` **OK** \
Successful request. The dataset MD5 hash is identical to that of a dataset
previously uploaded to the Pbench Server. This is assumed to be an identical
tarball, and the secondary URI (the `uri` field in the Relay manifest file)
has not been accessed.

`201` **CREATED** \
The tarball was successfully uploaded and the dataset has been created.

`400` **BAD_REQUEST** \
One of the required headers is missing or incorrect, invalid query parameters
were specified, or a bad value was specified for a query parameter. The return
payload will be a JSON document with a `message` field containing details.

`401` **UNAUTHORIZED** \
The client is not authenticated.

`502` **BAD GATEWAY** \
A problem occurred reading either the manifest file or the tarball from the
Relay server. The return payload will be a JSON document with
a `message` field containing more information.

`503` **SERVICE UNAVAILABLE** \
The server has been disabled using the `server-state` server configuration
setting in the [server configuration](./server_config.md) API. The response
body is an `application/json` document describing the current server state,
a message, and optional JSON data provided by the system administrator.

## Response body

The `application/json` response body consists of a JSON object containing a
`message` field. On failure this will describe the nature of the problem, and
in some cases an `errors` array will provide details when multiple problems
occurred.

```json
{
"message": "File successfully uploaded"
}
```

or

```json
{
"message": "Dataset already exists",
}
```

or

```json
{
"message": "at least one specified metadata key is invalid",
"errors": [
"Metadata key 'server.archiveonly' value 'abc' for dataset must be a boolean",
"improper metadata syntax dataset.name=test must be 'k:v'",
"Key test.foo is invalid or isn't settable",
],
}
```
15 changes: 12 additions & 3 deletions docs/Server/API/V1/upload.md
@@ -82,13 +82,22 @@ a message, and optional JSON data provided by the system administrator.

## Response body

The `application/json` response body consists of a JSON object giving a detailed
message on success or failure:
The `application/json` response body consists of a JSON object containing a
`message` field. On failure this will describe the nature of the problem, and
in some cases an `errors` array will provide details when multiple problems
occurred.

```json
{
"message": "File successfully uploaded"
}
```

or

```json
{
"message": "Dataset already exists",
"errors": [ ]
}
```
