diff --git a/docs/federate.md b/docs/federate.md new file mode 100644 index 00000000..461fa966 --- /dev/null +++ b/docs/federate.md @@ -0,0 +1,140 @@ +## When to use local query federation +There are two main reasons to deploy local query federation: + +- **Case 1**: one-way federation. You have (at least) one [local neurobagel +node](infrastructure.md) and you want your users to be able to search +the data in the local node alongside all the publicly +visible data in the neurobagel network. +- **Case 2**: internal federation. You have two or more local neurobagel +nodes (e.g. for data from different groups in your institute) +and you want your local users to search across all of them. + +![Local federation scenarios](imgs/local_federation_architecture.jpg) + +Note that these cases are not mutually exclusive. +Any local neurobagel nodes you deploy will only be visible to users +inside your local network (internal federation). + +## When not to use local query federation +Query federation is not necessary if you: + +- **only want to query public neurobagel nodes**: + Existing public nodes in the neurobagel network are accessible + to everyone via our public query tool (e.g. on [query.neurobagel.org](https://query.neurobagel.org/)), + meaning you can run federated queries over these graph databases without any additional local setup. +- **only want to search a single neurobagel node**: + If you only have one local node that you want to query, + it is easier to directly query the node-API of this node. + In that case, all you have to do is follow the [deployment instructions + for a neurobagel node](infrastructure.md) and you are good to go. + +## Setting up for local federation +Federated graph queries in neurobagel are provided by the federation API (`f-API`) service. 
+The neurobagel `f-API` takes a single user query and then sends it to every +neurobagel node API (`n-API`) it is aware of, collects and combines the responses, +and sends them back to the user as a single answer. + +!!! note + + Make sure you have at least one [local `n-API` configured and running](infrastructure.md) + before you set up local federation. If you do not have any local + `n-APIs` to federate over, you can just use our public query tool directly at [query.neurobagel.org](https://query.neurobagel.org/). + +In your command line, create and navigate to a new directory where you will keep the configuration +files for your new `f-API`. In this directory, create two files: + +### `fed.env` environment file + +Create a text file called `fed.env` to hold environment variables needed for the `f-API` deployment. +Let's assume there are two local nodes already running on different servers of your institutional network, and you want to set up federation across both nodes: + +- a node named `"node_archive"` running on your local computer on port `8000` and +- a node named `"node_recruitment"` running on a different computer with the local IP `192.168.0.1`, listening on the default HTTP port `80`. +In your `fed.env` file you would configure this as follows: + +``` {.bash .annotate title="fed.env"} +# Configuration for f-API +# List of known local node APIs: (node_URL, node_NAME) +LOCAL_NB_NODES=(http://localhost:8000, node_archive) (http://192.168.0.1, node_recruitment) +# Define the port that the f-API will run on INSIDE the docker container (default 8000) +NB_API_PORT=8000 +# Define the port that the f-API will be exposed on to the host computer (and likely the outside network) +NB_API_PORT_HOST=8080 +# Choose the docker image tag of the f-API (default latest) +NB_API_TAG=latest + +# Configuration for query tool +# Define the URL of the f-API as it will appear to a user +API_QUERY_URL=http://localhost:8080 # (1)! 
+# Choose the docker image tag of the query tool (default latest) +NB_QUERY_TAG=latest +# Choose the port on which the query tool will be exposed to the host and likely the network (default 3000) +NB_QUERY_PORT_HOST=3000 +``` + +1. When a user uses the graphical query tool to query your + f-API, these requests will be sent from the user's machine, + not from the machine hosting the query tool. + + Make sure you set the `API_QUERY_URL` in your `fed.env` + as it will appear to a user on their own machine + - otherwise the request will fail. + +Each node to be federated over is described in the variable `LOCAL_NB_NODES` by a comma-delimited tuple of the form `(node_URL, node_NAME)`. + +You can add one or more local nodes to the list of nodes known to your `f-API` in this way. +Just adjust the above code snippet according to your own deployment, and store it in a file called `fed.env`. + + +### `docker-compose.yml` docker config file + +Create a second file called `docker-compose.yml`. +This file describes the required services, ports and paths +to launch the `f-API` together with a connected query tool. + +!!! danger "Make sure you have a recent version of docker compose installed" + + Some Linux distributions come with outdated versions of `docker` and + `docker compose` installed. Please make sure you install `docker` + as described in the [official documentation](https://docs.docker.com/engine/install/). + +Copy the following snippet into your `docker-compose.yml` file. +You should not have to change anything about this file. +All local configuration changes are done in the `fed.env` file. + +``` {.yaml .annotate title="docker-compose.yml"} +version: "3.8" + +services: + federation: + image: "neurobagel/federation_api:${NB_API_TAG:-latest}" + ports: + - "${NB_API_PORT_HOST:-8000}:${NB_API_PORT:-8000}" + + environment: + - LOCAL_NB_NODES=${LOCAL_NB_NODES} # (1)! 
+ - NB_API_PORT=${NB_API_PORT:-8000} + query: + image: "neurobagel/query_tool:${NB_QUERY_TAG:-latest}" + ports: + - "${NB_QUERY_PORT_HOST:-3000}:3000" + environment: + - API_QUERY_URL=${API_QUERY_URL:-http://localhost:8000/} +``` + +1. We maintain a list of public neurobagel nodes + [here](https://github.com/neurobagel/menu/blob/main/node_directory/neurobagel_public_nodes.json). + By default every new `f-API` will look up this list + on startup and include it in the list of nodes to + federate over. + This also means that you do not have to manually + configure public nodes, i.e. you **do not have to explicitly add them** to the `LOCAL_NB_NODES` variable in your `fed.env` file. + + +## Launch f-API and query tool +Once you have created your `fed.env` and `docker-compose.yml` files +as described above, you can simply launch the services by running + +`docker compose --env-file fed.env up -d` + +from the same directory. \ No newline at end of file diff --git a/docs/imgs/local_federation_architecture.jpg b/docs/imgs/local_federation_architecture.jpg new file mode 100644 index 00000000..ecd8ae4c Binary files /dev/null and b/docs/imgs/local_federation_architecture.jpg differ diff --git a/docs/infrastructure.md b/docs/infrastructure.md index 346dc049..1c3020e7 100644 --- a/docs/infrastructure.md +++ b/docs/infrastructure.md @@ -1,10 +1,11 @@ -# SysAdmin +These instructions are for a sysadmin looking to +deploy a new Neurobagel node locally in an institute or lab. +A local **neurobagel node** includes the **neurobagel API** and +a **graph backend** to store the harmonized metadata. -## Introduction -These instructions are for a sysadmin looking to deploy Neurobagel locally in an institute or lab. -A local neurobagel deployment includes the neurobagel API, -a graph backend to store the harmonized metadata, -and optionally a locally hosted graphical query interface. 
+To make searching the neurobagel node easier, +you can optionally also set up +a **[locally hosted graphical query interface](#deploy-a-graphical-query-tool).** ![The neurobagel API and graph backend](imgs/nb_architecture.jpg) @@ -119,7 +120,7 @@ Below are all the possible Neurobagel environment variables that can be set in ` _** `NB_GRAPH_ADDRESS` should not be changed from its default value (`graph`) when using docker compose as this corresponds to the preset container name of the graph database server within the docker compose network._ -_‡ See section [Using a graphical query tool to send API requests](#a-note-on-using-a-graphical-query-tool-to-send-api-requests)_ +_‡ See section [Deploy a graphical query tool](#deploy-a-graphical-query-tool)_ For a local deployment, we recommend to **explicitly set** at least the following variables in `.env` @@ -142,35 +143,6 @@ For a local deployment, we recommend to **explicitly set** at least the followin For more information, see [Docker's environment variable precedence](https://docs.docker.com/compose/environment-variables/envvars-precedence/). -### A note on using a graphical query tool to send API requests -The `NB_API_ALLOWED_ORIGINS` variable defaults to an empty string (`""`) when unset, meaning that your deployed API will only be accessible via direct `curl` requests to the URL where the API is hosted (see [this section](#test-the-new-deployment) for an example `curl` request). - -However, in many cases you may want to make the API accessible by a frontend tool such as our [browser query tool](https://github.com/neurobagel/query-tool). -To do so, you must explicitly specify the origin(s) for the frontend using `NB_API_ALLOWED_ORIGINS` in `.env`. -For detailed instructions regarding the query tool see [Running cohort queries](query_tool.md). 
- -For example, the [`.template-env`](https://github.com/neurobagel/api/blob/main/.template-env) file in the Neurobagel API repo assumes you want to allow API requests from a query tool hosted at a specific port on `localhost` (see the [Docker Compose section](#docker-compose)). - -??? example "More examples of `NB_API_ALLOWED_ORIGINS`" - ``` bash title=".env" - # do not allow requests from any frontend origins - NB_API_ALLOWED_ORIGINS="" # this is the default value that will also be set if the variable is excluded from the .env file - - # allow requests from only one origin - NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org" - - # allow requests from 3 different origins - NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org https://localhost:3000 http://localhost:3000" - - # allow requests from any origin - use with caution - NB_API_ALLOWED_ORIGINS="*" - ``` - -??? note "For more technical deployments using NGINX" - - If you have configured an NGINX reverse proxy (or proxy requests to the remote origin) to serve both the API and the query tool from the same origin, you can skip the step of enabling CORS for the API. - For an example, see https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/. - ### Docker Compose To spin up the API and graph backend containers using Docker Compose, @@ -189,9 +161,6 @@ Or, if you want to ensure you always pull the latest Docker images first: docker compose pull && docker compose up -d ``` -By default, this will also deploy a local version of the [Neurobagel graphical query tool](https://github.com/neurobagel/query-tool). -If using the default port mappings, you can reach your local query tool at [http://localhost:3000](http://localhost:3000) once it is running. - ## Setup for the first run When you launch the graph backend for the first time, @@ -611,3 +580,88 @@ and click "Try it out" and then "Execute" to execute a query. !!! 
note For very large databases, requests to the API using the interactive docs UI may be very slow or time out. If this prevents test queries from succeeding, try setting more parameters to enable an example response from the graph, or use a `curl` request instead. + + +## Deploy a graphical query tool +To give your users an easy, graphical way to +query your new local neurobagel node, +you have two options: + +### As part of local federation +Use this option if any of the following apply. You: + +- have already deployed other local neurobagel nodes +that you want your users to query alongside the new node +- want your users to be able to query +all public neurobagel nodes together with your new node +- plan on adding more local neurobagel nodes in the +near future that you will want to query alongside your newly created node + +In this case, skip directly to the page on +setting up [local query federation](federate.md). + +### As a standalone service +Use this option if you + +- plan on only deploying a single node +- want your users to only search data +in the new node you deployed + +In this case, you need to deploy the query tool +as a standalone docker container. + + +```bash +docker run -d -p 3000:3000 --env API_QUERY_URL=http://localhost:8000/ --name query_tool neurobagel/query_tool:latest +``` + +??? todo + + Update docker example to use a specific version + once https://github.com/neurobagel/planning/issues/64 + is closed. + +Make sure to replace the value of `API_QUERY_URL` with the `IP:PORT` or domain name of the +new neurobagel node-API you just deployed! + +If using the default port mappings for the query tool (`-p 3000:3000` in the above command), +you can reach your local query tool at [http://localhost:3000](http://localhost:3000) once it is running. + +To verify the exact configuration that your new docker +container is running with (e.g. 
for debugging), +you can run + +```bash +docker inspect query_tool +``` + +### Updating your API configuration +If deploying the query tool as a standalone service for the local node you have just created, you must ensure the `NB_API_ALLOWED_ORIGINS` variable is correctly set in the [`.env` file configuration for your node API](#set-the-environment-variables). +The `NB_API_ALLOWED_ORIGINS` variable defaults to an empty string (`""`) when unset, meaning that your deployed API will only be accessible via direct `curl` requests to the URL where the API is hosted (see [this section](#test-the-new-deployment) for an example `curl` request). + +To make the API accessible by a frontend tool such as our [browser query tool](https://github.com/neurobagel/query-tool), +you must explicitly specify the origin(s) for the frontend using `NB_API_ALLOWED_ORIGINS` in `.env`. +For detailed instructions regarding the query tool see [Running cohort queries](query_tool.md). + +For example, the [`.template-env`](https://github.com/neurobagel/api/blob/main/.template-env) file in the Neurobagel API repo assumes you want to allow API requests from a query tool hosted at a specific port on `localhost` (see the [Docker Compose section](#docker-compose)). + +!!! example "More examples of `NB_API_ALLOWED_ORIGINS`" + + ``` bash title=".env" + # do not allow requests from any frontend origins + NB_API_ALLOWED_ORIGINS="" # this is the default value that will also be set if the variable is excluded from the .env file + + # allow requests from only one origin + NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org" + + # allow requests from 3 different origins + NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org https://localhost:3000 http://localhost:3000" + + # allow requests from any origin - use with caution + NB_API_ALLOWED_ORIGINS="*" + ``` + +??? 
note "For more technical deployments using NGINX" + + If you have configured an NGINX reverse proxy (or proxy requests to the remote origin) to serve both the API and the query tool from the same origin, you can skip the step of enabling CORS for the API. + For an example, see https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/. \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index f633d911..f7afefb8 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -27,7 +27,8 @@ nav: - Preparing data for annotation: "data_prep.md" - Annotating a dataset: "annotation_tool.md" - Generating harmonized subject-level metadata: "cli.md" - - Setting up a graph: "infrastructure.md" + - Set up a neurobagel node: "infrastructure.md" + - Set up local federation: "federate.md" - Updating a harmonized dataset: "updating_dataset.md" - Using the API: "api.md" - Running cohort queries: "query_tool.md"