diff --git a/README.rst b/README.rst index 30957ccc5..bc9ec58d0 100644 --- a/README.rst +++ b/README.rst @@ -17,13 +17,13 @@ development on your laptop to real-world production setups in geographically dis Core Features ============= -- **Scalable and resilient.** FEDn is highly scalable and resilient via a tiered - architecture where multiple aggregation servers (combiners) form a network to divide up the work to coordinate clients and aggregate models. +- **Scalable and resilient.** FEDn is scalable and resilient via a tiered + architecture where multiple aggregation servers (combiners) divide up the work to coordinate clients and aggregate models. Benchmarks show high performance both for thousands of clients in a cross-device setting and for large model updates in a cross-silo setting. FEDn has the ability to recover from failure in all critical components. -- **Security**. A key feature is that +- **Security**. FEDn is built using secure industry standard communication protocols (gRPC). A key feature is that clients do not have to expose any ingress ports. - **Track events and training progress in real-time**. FEDn tracks events for clients and aggregation servers, logging to MongoDB. This @@ -39,86 +39,13 @@ Core Features ML model type or framework. Support for Keras and PyTorch is available out-of-the-box. + Getting started =============== -Prerequisites -------------- - -- `Python 3.8, 3.9 or 3.10 `__ -- `Docker `__ -- `Docker Compose `__ - -Quick start ------------ - -Clone this repository, locate into it and start a pseudo-distributed FEDn network using docker-compose: - -.. code-block:: - - docker-compose up - -This starts up the needed backend services MongoDB and Minio, the API Server and one Combiner. You can verify deployment using these urls: - -- API Server: localhost:8092 -- Minio: localhost:9000 -- Mongo Express: localhost:8081 - -Next, we will prepare the client. 
A key concept in FEDn is the compute package - -a code bundle that contains entrypoints for training and (optionally) validating a model update on the client. -The following steps uses the compute package defined in the example project 'examples/mnist-pytorch'. - -Locate into 'examples/mnist-pytorch' and familiarize yourself with the project structure. The entrypoints -are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in -'requirements.txt'. For convenience, we have provided utility scripts to set up a virtual environment. - -Start by initializing a virtual enviroment with all of the required dependencies for this project. - -.. code-block:: - - bin/init_venv.sh - -Next create the compute package and a seed model: - -.. code-block:: - - bin/build.sh - -Uploade the generated files 'package.tgz' and 'seed.npz' using the API: +The best way to get started is to take the quickstart tutorial: -The next step is to configure and attach clients. For this we download data and make data partitions: - -Download the data: - -.. code-block:: - - bin/get_data - - -Split the data in 2 partitions: - -.. code-block:: - - bin/split_data - -Data partitions will be generated in the folder 'data/clients'. - -Now navigate to http://localhost:8090/network and download the client config file. Place it in the example working directory. - -To connect a client that uses the data partition 'data/clients/1/mnist.pt': - -.. code-block:: - - docker run \ - -v $PWD/client.yaml:/app/client.yaml \ - -v $PWD/data/clients/1:/var/data \ - -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \ - --network=fedn_default \ - ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client1 - -You are now ready to start training the model at http://localhost:8090/control. 
- -To scale up the experiment, refer to the README at 'examples/mnist-pytorch' (or the corresponding Keras version), where we explain how to use docker-compose to automate deployment of several clients. +- `Quickstart PyTorch `__ Documentation ============= @@ -128,6 +55,13 @@ You will find more details about the architecture, compute package and how to de - `Paper `__ +FEDn Studio +=============== +Scaleout develops a Django Application, FEDn Studio, that provides a UI, authentication/authorization, client identity management, project-based multitenancy for managing multiple projects, and integration with your MLOps pipelines. +There is also additional tooling and charts for deployments on Kubernetes, including integration with several projects from the cloud native landscape. See `FEDn Framework `__ +for more information. + + Making contributions ==================== diff --git a/docs/apiclient.rst b/docs/apiclient.rst new file mode 100644 index 000000000..10576710a --- /dev/null +++ b/docs/apiclient.rst @@ -0,0 +1,21 @@ +APIClient +=============== + +FEDn comes with an *APIClient*, a Python3 library that can be used to interact with the FEDn network programmatically. + + +The APIClient is available as a Python package on PyPI, and can be installed using pip: + +.. code-block:: bash + + $ pip install fedn + + +To initialize the APIClient, you need to provide the hostname and port of the FEDn API server. The default port is 8092. The following code snippet shows how to initialize the APIClient: + +.. code-block:: python + + from fedn import APIClient + client = APIClient("localhost", 8092) + +For more information on how to use the APIClient, see :py:mod:`fedn.network.api.client` and the example `Notebooks `_. 
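As a complement to the APIClient snippet in docs/apiclient.rst, reachability of the API server can be checked without the `fedn` package installed. The following is a minimal sketch using only the Python standard library. It assumes the API server exposes `/get_controller_status` on the default port 8092 (the URL that docs/quickstart.rst lists for verifying the deployment) and that the response body is JSON, which the docs do not state. The helper names `status_url` and `fetch_controller_status` are hypothetical and not part of FEDn.

```python
import json
import urllib.error
import urllib.request

DEFAULT_API_PORT = 8092  # default API server port stated in the docs


def status_url(host="localhost", port=DEFAULT_API_PORT):
    """Build the URL of the controller status endpoint (hypothetical helper)."""
    return f"http://{host}:{port}/get_controller_status"


def fetch_controller_status(host="localhost", port=DEFAULT_API_PORT, timeout=2.0):
    """Return the decoded response, or None if the API server is unreachable."""
    try:
        with urllib.request.urlopen(status_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout or HTTP error status.
        return None
    try:
        return json.loads(body)  # assumption: the endpoint answers with JSON
    except json.JSONDecodeError:
        return body  # fall back to the raw text if it is not JSON
```

Calling `fetch_controller_status()` after `docker-compose up` should return a value once the API server accepts connections, and `None` while it is still starting. Returning `None` instead of raising keeps the helper easy to use in a retry loop.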
diff --git a/docs/conf.py b/docs/conf.py index bd2032b0e..686e82b57 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -12,7 +12,7 @@ author = 'Scaleout Systems AB' # The full version, including alpha/beta/rc tags -release = '0.6.0' +release = '0.8.0' # Add any Sphinx extension module names here, as strings extensions = [ diff --git a/docs/deployment.rst b/docs/deployment.rst deleted file mode 100644 index 974d98842..000000000 --- a/docs/deployment.rst +++ /dev/null @@ -1,95 +0,0 @@ -Distributed Deployment -====================== - -This guide serves as reference deployment for setting up a FEDn network consisting of: - - One host/VM for the object storage and database services (MinIO, MongoDB) - - One host/VM for the controller / reducer - - One host/VM for the combiner - -.. note:: - In this guide we will deploy using the provived docker-compose templates. Please note that additional configurations would be needed for a production-grade network. - -Prerequisites -------------- - -Hosts / VMs -........... - -We assume that you have root access to 3 Ubuntu 20.04 LTS or 22.04 LTS Server hosts / VMs. We recommend at least 4 CPUs and 8GB RAM for the base services and the reducer, and 4 CPUs and 16BG RAM for the combiner host. Each host needs the following: - -- `Python >=3.8, <3.11 `_ -- `Docker `_ -- `Docker Compose `_ - - -Networking -.......... -You will need to configure security groups / ingress settings for each host matching the port settings in the docker-compose templates. -The reducer and clients need to be able to resolve the hostname for the combiners. In this example -we show how this can be achieved if no external DNS resolution is available, by setting "extra host" in the Docker containers for the Reducer and client. Note that there are many other possible ways to achieve this, depending on your setup. - -1. 
Deploy storage and database services (MinIO, MongoDB and MongoExpress) -------------------------------------------------------------------------- - -First, deploy MinIO and Mongo services on one of the hosts. Edit the `docker-compose.yaml` file to change the default passwords and ports. - -.. code-block:: bash - - sudo docker-compose up -d minio mongo mongo-express - -Remember to open ports on the host so that the API endpoints (the exported port in the 'ports' property for each of the services) can be reached. - -.. warning:: - The deployment of MinIO and MongoDB above is insecure. For a production network, please ensure production deployments of the base services. - -2. Deploy the reducer ---------------------- - -Copy the file "config/settings-reducer.yaml.template" to "config/settings-reducer.yaml", then - -a. Edit 'settings-reducer.yaml' to provide the connection settings for MongoDB and Minio from Step 1. -b. Copy 'config/extra-hosts-reducer.yaml.template' to 'config/extra-hosts-reducer.yaml' and edit it, adding a host:IP mapping for each combiner you plan to deploy. - -Then start the reducer: - -.. code-block:: bash - - sudo docker-compose \ - -f docker-compose.yaml \ - -f config/reducer-settings.override.yaml \ - -f config/extra-hosts-reducer.yaml \ - up -d reducer - -.. note:: - the use of 'extra-hosts-reducer.yaml' is a way to add the host:IP mapping to /etc/hosts in the Docker container in docker-compose. It can be skipped if you handle DNS resolution in some other way. - -3. Deploy combiners -------------------- - -Copy 'config/settings-combiner.yaml.template' to 'config/settings-combiner.yaml' and edit it to provide a name for the combiner (used as a unique identifier for the combiner in the FEDn network), a hostname (which is used by reducer and clients to connect to the combiner RPC server), -and the port (default is 12080, make sure to allow access to this port in your security group/firewall settings). 
-Also, provide the IP and port for the reducer under the 'controller' tag. Then deploy the combiner: - -.. code-block:: bash - - sudo docker-compose \ - -f docker-compose.yaml \ - -f config/combiner-settings.override.yaml \ - up -d combiner - -Optional: Repeat this step for any number of additional combiner nodes. Make sure to provide an unique name for each combiner, -and update extra_hosts for the reducer (you need to restart the reducer to do so). - -.. warning:: - Note that it is not possible to use the IP address as 'host'. gRPC does not support certificates based on IP addresses. - -4. Attach clients to the FEDn network -------------------------------------- - -You can now choose an example, upload a compute package and an initial model, and attach clients. - -- `Examples `__ - -.. note:: - The clients will also need to be able to resolve each combiner node usign the 'host' argument in the combiner settings file. - There is a template in 'config/extra-hosts-client.yaml.template' that can be modified for this purpose. diff --git a/docs/index.rst b/docs/index.rst index 060b12945..0a16df1f7 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,10 +3,9 @@ :caption: Table of Contents introduction - quickstart architecture - deployment - interfaces + quickstart + apiclient aggregators helpers tutorial diff --git a/docs/interfaces.rst b/docs/interfaces.rst deleted file mode 100644 index 270f0110b..000000000 --- a/docs/interfaces.rst +++ /dev/null @@ -1,37 +0,0 @@ -User interfaces -=============== - -FEDn comes with an *APIClient* and a *Dashboard* for interacting with the FEDn network. The APIClient is a Python3 library that can be used to interact with the FEDn network programmatically. -The Dashboard is a web-based user interface that can be used to interact with the FEDn network through a web browser. - -APIClient --------------- -The APIClient is a Python3 library that can be used to interact with the FEDn network programmatically. 
The APIClient is available as a Python package on PyPI, and can be installed using pip: - -.. code-block:: bash - - $ pip install fedn - - -To initialize the APIClient, you need to provide the hostname and port of the FEDn API server. The default port is 8092. The following code snippet shows how to initialize the APIClient: - -.. code-block:: python - - from fedn import APIClient - client = APIClient("localhost", 8092) - -For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client`. - -Dashboard --------------- -The Dashboard is a web-based user interface that can be used to interact with the FEDn network through a web browser. The Dashboard is available as a Docker image, and can be run using the following command: - -.. code:: bash - - $ docker-compose up -d dashboard - -OBS! If you have followed any of the examples, the dashboard will already be running! -The Dashboard is now available at http://localhost:8090. If no compute package has been configured, the Dashboard will ask you to upload a compute package. -A compute package is a zip file containing the ML code that will be executed on the clients. -For more information on how to create a compute package, see the :ref:`tutorial-label`. After uploading a compute package, you will also need to upload an initial model. This initial model is -usually the initial weights for the model that will be trained. You can then navigate to the Control Panel to start a training session. 
diff --git a/docs/introduction.rst b/docs/introduction.rst index 6897690ba..b5895af8c 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -1,23 +1,22 @@ Introduction to Federated Learning ================================== -Federated Learning stands at the forefront of modern machine learning techniques, offering a novel approach to address challenges related to data privacy, security, +Federated Learning offers a novel approach to address challenges related to data privacy, security, and decentralized data distribution. In contrast to traditional machine learning setups where data is collected and stored centrally, -Federated Learning allows for collaborative model training while keeping data localized. This innovative paradigm proves to be particularly advantageous in +Federated Learning allows for collaborative model training while keeping data local with the data owner or device. This is particularly advantageous in scenarios where data cannot be easily shared due to privacy regulations, network limitations, or ownership concerns. At its core, Federated Learning orchestrates model training across distributed devices or servers, referred to as clients or participants. These participants could be diverse endpoints such as mobile devices, IoT gadgets, or remote servers. Rather than transmitting raw data to a central location, -each participant computes gradients locally based on its data. These gradients are then communicated to a central server, often called the aggregator or orchestrator. -The central server aggregates and combines the gradients from multiple participants to update a global model. +each participant computes gradients locally based on its data. These gradients are then communicated to a server, often called the aggregator. +The server aggregates and combines the gradients from multiple participants to update a global model. This iterative process allows the global model to improve without the need to share the raw data. 
FEDn: the SDK for scalable federated learning --------------------------------------------- -FEDn serves as a System Development Kit (SDK) tailored for scalable federated learning. +FEDn serves as a System Development Kit (SDK) enabling scalable federated learning. It is used to implement the core server side logic (including model aggregation) and the client side integrations. -It implements functionality to deploy and scale the server side in geographically distributed setups. Developers and ML engineers can use FEDn to build custom federated learning systems and bespoke deployments. @@ -28,10 +27,10 @@ adapting to varying project needs and geographical considerations. Scalable and Resilient ...................... -FEDn exhibits scalability and resilience, thanks to its multi-tiered architecture. Multiple aggregation servers, known as combiners, -form a network to divide the workload, coordinating clients, and aggregating models. +FEDn exhibits scalability and resilience, thanks to its tiered architecture. Multiple aggregation servers, in FEDn called combiners, +form a network to divide the workload of coordinating clients and aggregating models. This architecture allows for high performance in various settings, from thousands of clients in a cross-device environment to -large model updates in a cross-silo scenario. Crucially, FEDn has built-in recovery capabilities for all critical components, enhancing system reliability. +large model updates in a cross-silo scenario. Importantly, FEDn has built-in recovery capabilities for all critical components, enhancing system reliability. ML-Framework Agnostic ..................... @@ -42,20 +41,18 @@ This flexibility allows for out-of-the-box support for popular frameworks like K Security ......... 
-A key security feature of FEDn is its client protection capabilities, negating the need for clients to expose any ingress ports, +A key security feature of FEDn is its client protection capabilities - clients do not need to expose any ingress ports, thus reducing potential security vulnerabilities. Event Tracking and Training progress .................................... -To ensure transparency and control over the learning process, -FEDn logs events in the federation and does real-time tracking of training progress. A flexible API lets the user define validation strategies locally on clients. +To ensure transparency and control over the training process, as well as to provide means to troubleshoot distributed deployments, +FEDn logs events and tracks training progress in real time. A flexible API lets the user define validation strategies locally on clients. Data is logged as JSON to MongoDB, enabling users to create custom dashboards and visualizations easily. -User Interfaces +REST-API and Python API Client ............... -FEDn offers a Flask-based Dashboard that allows users to monitor client model validations in real time. It also facilitates tracking client training time distributions -and key performance metrics for clients and combiners, providing a comprehensive view of the system’s operation and performance. - -FEDn also comes with an REST-API for integration with external dashboards and visualization tools, or integration with other systems. \ No newline at end of file +FEDn comes with a REST-API, a CLI and a Python API Client for programmatic interaction with a FEDn network. This allows for flexible automation of experiments, for integration with +other systems, and for easy integration with external dashboards and visualization tools. 
\ No newline at end of file diff --git a/docs/quickstart.rst b/docs/quickstart.rst index 2b89ff165..ca60fd149 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -1,5 +1,17 @@ -Quick Start -=========== +Quickstart Tutorial PyTorch (MNIST) +=================================== + +This classic example of handwritten digit recognition is well suited as a lightweight test when developing on FEDn in pseudo-distributed mode. +A normal high-end laptop or a workstation should be able to sustain a few clients. +The example automates the partitioning of data and deployment of a variable number of clients on a single host. +We here assume working experience with containers, Docker and docker-compose. + +Prerequisites +------------- + +- `Python 3.8, 3.9 or 3.10 `__ +- `Docker `__ +- `Docker Compose `__ Clone this repository, locate into it and start a pseudo-distributed FEDn network using docker-compose: @@ -8,14 +20,18 @@ Clone this repository, locate into it and start a pseudo-distributed FEDn networ docker-compose up +This starts up the needed backend services MongoDB and Minio, the API Server and one Combiner. +You can verify the deployment using these URLs: -This will start up all neccecary components for a FEDn network, execept for the clients. +- API Server: http://localhost:8092/get_controller_status +- Minio: http://localhost:9000 +- Mongo Express: http://localhost:8081 .. warning:: The FEDn network is configured to use a local Minio and MongoDB instances for storage. This is not suitable for production, but is fine for testing. .. note:: - You have the option to programmatically interact with the FEDn network using the Python APIClient, or you can use the Dashboard. In these Note sections we will use the APIClient. + To interact with the FEDn network programmatically, use the APIClient. Install FEDn via pip: .. 
code-block:: bash @@ -25,10 +41,12 @@ This will start up all neccecary components for a FEDn network, execept for the $ cd fedn $ pip install . 
-Navigate to http://localhost:8090. You should see the FEDn Dashboard, asking you to upload a compute package. The compute package is a tarball of a project. -The project in turn implements the entrypoints used by clients to compute model updates and to validate a model. +Next, we will prepare the client. A key concept in FEDn is the compute package - +a code bundle that contains entrypoints for training and (optionally) validating a model update on the client. -Locate into 'examples/mnist-pytorch'. +Locate into ``examples/mnist-pytorch`` and familiarize yourself with the project structure. The entrypoints +are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in +``requirements.txt``. For convenience, we have provided utility scripts to set up a virtual environment. Start by initializing a virtual enviroment with all of the required dependencies for this project. @@ -42,17 +60,15 @@ Now create the compute package and an initial model: bin/build.sh -Upload the generated files 'package.tgz' and 'seed.npz' in the FEDn Dashboard. -.. note:: - Instead of uploading in the dashboard do: +Upload the compute package and seed model to FEDn: - .. code:: python +.. code:: python - >>> from fedn import APIClient - >>> client = APIClient(host="localhost", port=8092) - >>> client.set_package("package.tgz", helper="pytorchhelper") - >>> client.set_initial_model("seed.npz") + >>> from fedn import APIClient + >>> client = APIClient(host="localhost", port=8092) + >>> client.set_package("package.tgz", helper="numpyhelper") + >>> client.set_initial_model("seed.npz") The next step is to configure and attach clients. For this we need to download data and make data partitions: @@ -71,19 +87,26 @@ Split the data in 2 parts for the clients: Data partitions will be generated in the folder 'data/clients'. -Now navigate to http://localhost:8090/network and download the client config file. Place it in the example working directory. -.. 
note:: - In the python enviroment you installed FEDn: +FEDn relies on a configuration file for the client to connect to the server. Create a file called 'client.yaml' with the following content: + +.. code-block:: + + network_id: fedn-network + discover_host: api-server + discover_port: 8092 - .. code:: python +(Optional) Use the APIClient to fetch the client configuration and save it to a file: - >>> import yaml - >>> config = client.get_client_config(checksum=True) - >>> with open("client.yaml", "w") as f: - >>> f.write(yaml.dump(config)) +.. code:: python -To connect a client that uses the data partition 'data/clients/1/mnist.pt': + >>> import yaml + >>> config = client.get_client_config(checksum=True) + >>> with open("client.yaml", "w") as f: + ...     f.write(yaml.dump(config)) + +Make sure to move the file ``client.yaml`` to the root of the ``examples/mnist-pytorch`` folder. +To connect a client that uses the data partition ``data/clients/1/mnist.pt`` and the config file ``client.yaml`` to the network, run the following docker command: .. code-block:: @@ -92,27 +115,58 @@ To connect a client that uses the data partition 'data/clients/1/mnist.pt': -v $PWD/data/clients/1:/var/data \ -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \ --network=fedn_default \ - ghcr.io/scaleoutsystems/fedn/fedn:develop-mnist-pytorch run client -in client.yaml --name client1 + ghcr.io/scaleoutsystems/fedn/fedn:0.8.0-mnist-pytorch run client -in client.yaml --name client1 -.. note:: - If you are using the APIClient you must also start the training client via "docker run" command as above. +Observe the API Server and combiner logs; you should see the client connecting. +You are now ready to start training the model. In the Python environment where you installed FEDn: -You are now ready to start training the model at http://localhost:8090/control. +.. code:: python -.. note:: - In the python enviroment you installed FEDn you can start training via: + >>> ... 
+ >>> client.start_session(session_id="test-session", rounds=3) + # Wait for training to complete, when controller is idle: + >>> client.get_controller_status() + # Show model trail: + >>> client.get_model_trail() + # Show model performance: + >>> client.list_validations() - .. code:: python +Please see :py:mod:`fedn.network.api` for more details on the APIClient. - >>> ... - >>> client.start_session(session_id="test-session", rounds=3) - # Wait for training to complete, when controller is idle: - >>> client.get_controller_status() - # Show model trail: - >>> client.get_model_trail() - # Show model performance: - >>> client.list_validations() +There is also a Jupyter `Notebook `_ version of this tutorial including examples of how to fetch and visualize model validations. + +Automate and scale up experimentation with several clients +---------------------------------------------------------- +Now that you have an understanding of the main components of FEDn, you can use the provided docker-compose templates to automate deployment of FEDn and clients. +To start the network and attach 4 clients, run the following docker-compose command from ``examples/mnist-pytorch``: + +.. code-block:: + + docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up --scale client=4 + + +Access logs and validation data from MongoDB +-------------------------------------------- +You can access and download event logs and validation data via the API and, as a developer, you can also obtain +the MongoDB backend data using pymongo or via the MongoExpress interface: + +- http://localhost:8081/db/fedn-network/ + +The credentials are as set in docker-compose.yaml in the root of the repository. + +Access model updates +-------------------- + +You can obtain model updates from the 'fedn-models' bucket in Minio: + +- http://localhost:9000 + + +Clean up +-------- +You can clean up by running + +.. 
code-block:: - Please see :py:mod:`fedn.network.api` for more details on the APIClient. + docker-compose down -To scale up the experiment, refer to the README at 'examples/mnist-pytorch' (or the corresponding Keras version), where we explain how to use docker-compose to automate deployment of several clients. diff --git a/docs/tutorial.rst b/docs/tutorial.rst index b37053904..72bb0f504 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -15,7 +15,7 @@ The compute package :width: 100% :align: center -The *compute package* is a tar.gz bundle of the code to be executed by each data-provider/client. +The *compute package* is a .tgz bundle of the code to be executed by each data-provider/client. This package is uploaded to the *Controller* upon initialization of the FEDN Network (along with the initial model). When a client connects to the network, it downloads and unpacks the package locally and are then ready to participate in training and/or validation. @@ -39,7 +39,7 @@ In the examples we have roughly the following file and folder structure: | └── docker-compose.yml/Dockerfile | -The "client" folder is the *compute package* which will become a tar.gz bundle of the code to be executed by +The "client" folder is the *compute package* which will become a .tgz bundle of the code to be executed by each data-provider/client. The entry points, mentioned above, are defined in the *fedn.yaml*: .. code-block:: yaml @@ -77,7 +77,7 @@ A *entrypoint.py* example can look like this: from fedn.utils.helpers.helpers import get_helper, save_metadata, save_metrics - HELPER_MODULE = 'pytorchhelper' + HELPER_MODULE = 'numpyhelper' NUM_CLASSES = 10 def _compile_model(): @@ -286,9 +286,9 @@ A *entrypoint.py* example can look like this: -The format of the input and output files (model updates) are dependent on the ML framework used. A helper instance :py:mod:`fedn.utils.plugins.pytorchhelper` is used to handle the serialization and deserialization of the model updates. 
+The input and output files (model updates) are stored as numpy ndarrays. A helper instance :py:mod:`fedn.utils.helpers.plugins.numpyhelper` is used to handle the serialization and deserialization of the model updates. The first function (_compile_model) is used to define the model architecture and creates an initial model (which is then used by _init_seed). The second function (_load_data) is used to read the data (train and test) from disk. -The third function (_save_model) is used to save the model to disk using the pytorch helper module :py:mod:`fedn.utils.plugins.pytorchhelper`. The fourth function (_load_model) is used to load the model from disk, again +The third function (_save_model) is used to save the model to disk using the numpy helper module :py:mod:`fedn.utils.helpers.plugins.numpyhelper`. The fourth function (_load_model) is used to load the model from disk, again using the pytorch helper module. The fifth function (_init_seed) is used to initialize the seed model. The sixth function (_train) is used to train the model, observe the two first arguments which will be set by the FEDn client. The seventh function (_validate) is used to validate the model, again observe the two first arguments which will be set by the FEDn client. @@ -302,18 +302,18 @@ For validations it is a requirement that the output is saved in a valid json for In the code example we use the helper function :py:meth:`fedn.utils.helpers.helpers.save_metrics` to save the validation scores as a json file. -The Dahboard in the FEDn UI will plot any scalar metric in this json file, but you can include any type in the file assuming that it is valid json. These values can then be obtained (by an athorized user) from the MongoDB database or using the :py:mod:`fedn.network.api.client`. +These values can then be obtained (by an authorized user) from the MongoDB database or using :py:meth:`fedn.network.api.client.APIClient.list_validations`. 
Packaging for distribution -------------------------- -For the compute package we need to compress the *client* folder as .tar.gz file. E.g. using: +For the compute package we need to compress the *client* folder as .tgz file. E.g. using: .. code-block:: bash tar -czvf package.tgz client -This file can then be uploaded to the FEDn network using the FEDn UI or the :py:mod:`fedn.network.api.client`. +This file can then be uploaded to the FEDn network using the :py:meth:`fedn.network.api.client.APIClient.set_package`. More on local data access @@ -335,7 +335,7 @@ We recommend you to test your code before running the client. For example, you c python entrypoint.py validate ../model_update.npz ../validation.json --data_path ../data/mnist.npz -Once everything works as expected you can start the federated network, upload the tar.gz compute package and the initial model. +Once everything works as expected you can start the federated network, upload the .tgz compute package and the initial model (use :py:meth:`fedn.network.api.client.APIClient.set_initial_model` for uploading an initial model). Finally connect a client to the network: .. code-block:: bash @@ -345,7 +345,7 @@ Finally connect a client to the network: -v $PWD/data/clients/1:/var/data \ -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \ --network=fedn_default \ - ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client1 + ghcr.io/scaleoutsystems/fedn/fedn:0.8.0-mnist-pytorch run client -in client.yaml --name client1 -The container image "ghcr.io/scaleoutsystems/fedn/fedn:develop-mnist-pytorch" is a pre-built image with the FEDn client and the PyTorch framework installed. +The container image "ghcr.io/scaleoutsystems/fedn/fedn:0.8.0-mnist-pytorch" is a pre-built image with the FEDn client and the PyTorch framework installed. 
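The packaging command shown in docs/tutorial.rst (`tar -czvf package.tgz client`) can also be sketched in Python, which may be convenient on hosts without a `tar` binary or when packaging is part of a build script. This is a minimal sketch using only the standard library `tarfile` module; `build_package` and `list_package` are hypothetical helper names and not part of FEDn.

```python
import tarfile
from pathlib import Path


def build_package(client_dir="client", output="package.tgz"):
    """Bundle client_dir into a gzip-compressed tarball, like `tar -czvf`."""
    client_path = Path(client_dir)
    if not client_path.is_dir():
        raise FileNotFoundError(f"no such directory: {client_path}")
    with tarfile.open(output, "w:gz") as archive:
        # arcname keeps only the folder name (e.g. "client") as the top-level
        # entry, regardless of where the folder lives on disk.
        archive.add(client_path, arcname=client_path.name)
    return Path(output)


def list_package(package="package.tgz"):
    """Return the member names stored in a compute package."""
    with tarfile.open(package, "r:gz") as archive:
        return archive.getnames()
```

Listing the archive before uploading is a cheap way to confirm that `fedn.yaml` and the entrypoint ended up inside the `client` folder of the bundle.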
diff --git a/examples/README.md b/examples/README.md index 52adf95e2..27f3dce63 100644 --- a/examples/README.md +++ b/examples/README.md @@ -1,6 +1,8 @@ ## Examples The examples distributed here in this folder are maintained by Scaleout. +We recommend that all new users start with the [Quickstart Tutorial (PyTorch)](https://github.com/scaleoutsystems/fedn/tree/master/examples/mnist-pytorch). + ### External examples Below we maintain a list of examples provided both by the Scaleout core team and users. They may or may not be tested with the latest release of FEDn, please refer to the README of each specific project/example for detail. If you have a project that you want to include in this list, talk to a core developer in [Discord](https://discord.gg/CCRgjpMsVA). diff --git a/examples/mnist-keras/README.md b/examples/mnist-keras/README.md index b4b5c0672..795b88f10 100644 --- a/examples/mnist-keras/README.md +++ b/examples/mnist-keras/README.md @@ -1,12 +1,7 @@ # MNIST (TensorFlow/Keras version) -This classic example of hand-written text recognition is well suited both as a lightweight test when developing on FEDn in pseudo-distributed mode. A normal high-end laptop or a workstation should be able to sustain a few clients. The example automates the partitioning of data and deployment of a variable number of clients on a single host. We here assume working experience with containers, Docker and docker-compose. -## Table of Contents -- [MNIST Example (Keras version)](#mnist-example-keras-version) - - [Table of Contents](#table-of-contents) - - [Prerequisites](#prerequisites) - - [Running the example (pseudo-distributed)](#running-the-example-pseudo-distributed) - - [Clean up](#clean-up) +This is a minimalistic TF/Keras version of the Quickstart Tutorial (PyTorch). For a more detailed explanation, including a Jupyter Notebook with +examples of API usage for starting and interacting with federated experiments, refer to that tutorial.
## Prerequisites - [Python 3.8, 3.9 or 3.10](https://www.python.org/downloads) @@ -45,43 +40,18 @@ bin/build.sh > The files location will be `package/package.tgz` and `seed.npz`. ### Deploy FEDn -Now we are ready to deploy FEDn with `docker-compose`. -``` -docker-compose -f ../../docker-compose.yaml up -d minio mongo mongo-express reducer combiner -``` - -### Initialize the federated model -Now navigate to http://localhost:8090 to see the reducer UI. You will be asked to upload the compute package and the seed model that you created in the previous step. - -### Attach clients -To attach clients to the network, use the docker-compose.override.yaml template to start 2 clients: +Now we are ready to deploy FEDn and two clients with `docker-compose`. ``` -docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up client +docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up ``` + > **Note**: run with `--scale client=N` to start *N* clients. ### Run federated training -Finally, you can start the experiment from the "control" tab of the UI. +Refer to the notebook to create your own drivers for seeding the federation and running experiments. + + https://github.com/scaleoutsystems/fedn/blob/master/examples/mnist-pytorch/API_Example.ipynb ## Clean up You can clean up by running `docker-compose down`. - -## Connecting to a distributed deployment -To start and remotely connect a client with the required dependencies for this example, start by downloading the `client.yaml` file. You can either navigate the reducer UI or run the following command. - -```bash -curl -k https://:/config/download > client.yaml -``` -> **Note** make sure to replace `` and `` with appropriate values. - -Now you are ready to start the client via Docker by running the following command. 
- -```bash -docker run -d \ - -v $PWD/client.yaml:/app/client.yaml \ - -v $PWD/data:/var/data \ - -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.npz \ - ghcr.io/scaleoutsystems/fedn/fedn:develop-mnist-keras run client -in client.yaml -``` -> **Note** If reducer and combiner host names, as specfied in the configuration files, are not resolvable in the client host network you need to use the docker option `--add-hosts` to make them resolvable. Please refer to the Docker documentation for more detail. diff --git a/examples/mnist-pytorch/.gitignore b/examples/mnist-pytorch/.gitignore index 84f374386..a9f01054b 100644 --- a/examples/mnist-pytorch/.gitignore +++ b/examples/mnist-pytorch/.gitignore @@ -2,6 +2,5 @@ data *.npz *.tgz *.tar.gz -*.ipynb .mnist-pytorch client.yaml \ No newline at end of file diff --git a/examples/mnist-pytorch/API_Example.ipynb b/examples/mnist-pytorch/API_Example.ipynb index 1c8c46fae..3ac7b615b 100644 --- a/examples/mnist-pytorch/API_Example.ipynb +++ b/examples/mnist-pytorch/API_Example.ipynb @@ -14,7 +14,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "743dfe47", "metadata": {}, "outputs": [], @@ -38,7 +38,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "1061722d", "metadata": {}, "outputs": [], @@ -58,7 +58,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "5107f6f9", "metadata": {}, "outputs": [], @@ -78,7 +78,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "id": "f0380d35", "metadata": {}, "outputs": [], @@ -98,7 +98,7 @@ }, { "cell_type": "markdown", - "id": "81184448", + "id": "8cc709c2", "metadata": {}, "source": [ "We wait for the session to finish: " @@ -106,8 +106,8 @@ }, { "cell_type": "code", - "execution_count": 14, - "id": "e1143474", + "execution_count": null, + "id": "897451fa", "metadata": {}, "outputs": [], "source": [ @@ -117,7 +117,7 @@ }, { "cell_type": "markdown", - 
"id": "de35a9df", + "id": "16874cec", "metadata": {}, "source": [ "Next, we retrive all model validations from all clients, extract the training accuracy metric, and compute its mean value accross all clients" @@ -125,8 +125,8 @@ }, { "cell_type": "code", - "execution_count": 15, - "id": "b5db0739", + "execution_count": null, + "id": "4e8044b7", "metadata": {}, "outputs": [], "source": [ @@ -153,31 +153,10 @@ }, { "cell_type": "code", - "execution_count": 16, - "id": "60082e1a", + "execution_count": null, + "id": "42425c43", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "x = range(1,len(mean_acc_fedavg)+1)\n", "plt.plot(x, mean_acc_fedavg)\n", @@ -194,7 +173,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "id": "4f70d7d9", "metadata": {}, "outputs": [], @@ -212,8 +191,8 @@ }, { "cell_type": "code", - "execution_count": 18, - "id": "31dba6d7", + "execution_count": null, + "id": "ce8a89a3", "metadata": {}, "outputs": [], "source": [ @@ -223,7 +202,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "id": "900eb0a7", "metadata": {}, "outputs": [], @@ -258,31 +237,10 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "id": "d064aaf9", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "x = range(1,len(mean_acc_fedavg)+1)\n", "plt.plot(x, mean_acc_fedavg, x, mean_acc_fedopt)\n", diff --git a/examples/mnist-pytorch/README.md b/examples/mnist-pytorch/README.md deleted file mode 100644 index a7bbcc0b3..000000000 --- a/examples/mnist-pytorch/README.md +++ /dev/null @@ -1,69 +0,0 @@ -# MNIST (PyTorch version) -This classic example of hand-written text recognition is well suited as a lightweight test when developing on FEDn in pseudo-distributed mode. A normal high-end laptop or a workstation should be able to sustain a few clients. The example automates the partitioning of data and deployment of a variable number of clients on a single host. We here assume working experience with containers, Docker and docker-compose. - - -## Table of Contents -- [MNIST Example (PyTorch version)](#mnist-example-pytorch-version) - - [Table of Contents](#table-of-contents) - - [Prerequisites](#prerequisites) - - [Running the example (pseudo-distributed)](#running-the-example-pseudo-distributed) - - [Clean up](#clean-up) - -## Prerequisites -- [Python 3.8, 3.9 or 3.10](https://www.python.org/downloads) -- [Docker](https://docs.docker.com/get-docker) -- [Docker Compose](https://docs.docker.com/compose/install) - -## Running the example (pseudo-distributed, single host) - -Clone FEDn and locate into this directory. -```sh -git clone https://github.com/scaleoutsystems/fedn.git -cd fedn/examples/mnist-pytorch -``` - -### Preparing the environment, the local data, the compute package and seed model -Start by initializing a virtual enviroment with all of the required dependencies. -``` -bin/init_venv.sh -``` - -Then, to get the data you can run the following script. -``` -bin/get_data -``` - -The next command splits the data in 2 parts for the clients. -``` -bin/split_data -``` -> **Note**: run with `--n_splits=N` to split in *N* parts. 
- -Create the compute package and a seed model that you will be asked to upload in the next step. -``` -bin/build.sh -``` -> The files location will be `package/package.tgz` and `seed.npz`. - -### Deploy FEDn -Now we are ready to deploy FEDn with `docker-compose`. -``` -docker-compose -f ../../docker-compose.yaml up -d minio mongo mongo-express reducer combiner -``` - -### Initialize the federated model -Now navigate to http://localhost:8090 to see the reducer UI. You will be asked to upload the compute package and the seed model that you created in the previous step. Make sure to choose the "PyTorch" helper. - -### Attach clients -To attach clients to the network, use the docker-compose.override.yaml template to start 2 clients: - -``` -docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up client -``` -> **Note**: run with `--scale client=N` to start *N* clients. - -### Run federated training -Finally, you can start the experiment from the "control" tab of the UI. - -## Clean up -You can clean up by running `docker-compose down`. diff --git a/examples/mnist-pytorch/README.rst b/examples/mnist-pytorch/README.rst new file mode 100644 index 000000000..7522cb182 --- /dev/null +++ b/examples/mnist-pytorch/README.rst @@ -0,0 +1,149 @@ +Quickstart Tutorial PyTorch (MNIST) +------------- + +This classic example of hand-written text recognition is well suited as a lightweight test when developing on FEDn in pseudo-distributed mode. +A normal high-end laptop or a workstation should be able to sustain a few clients. +The example automates the partitioning of data and deployment of a variable number of clients on a single host. +We here assume working experience with containers, Docker and docker-compose. + +Prerequisites +------------- + +- `Python 3.8, 3.9 or 3.10 `__ +- `Docker `__ +- `Docker Compose `__ + +Quick start +----------- + +Clone this repository, locate into this directory: + +.. 
code-block:: + + git clone https://github.com/scaleoutsystems/fedn.git + cd fedn/examples/mnist-pytorch + +Start a pseudo-distributed FEDn network using docker-compose: + +.. code-block:: + + docker-compose -f ../../docker-compose.yaml up + +This starts up the needed backend services MongoDB and Minio, the API Server and one Combiner. +You can verify the deployment using these urls: + +- API Server: http://localhost:8092/get_controller_status +- Minio: http://localhost:9000 +- Mongo Express: http://localhost:8081 + +Next, we will prepare the client. A key concept in FEDn is the compute package - +a code bundle that contains entrypoints for training and (optionally) validating a model update on the client. + +Locate into 'examples/mnist-pytorch' and familiarize yourself with the project structure. The entrypoints +are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in +'requirements.txt'. For convenience, we have provided utility scripts to set up a virtual environment. + +Start by initializing a virtual environment with all of the required dependencies for this project. + +.. code-block:: + + bin/init_venv.sh + +Next create the compute package and a seed model: + +.. code-block:: + + bin/build.sh + +You should now have the files 'package.tgz' and 'seed.npz' in the project folder. + +Next we prepare the local dataset. For this we download MNIST data and make data partitions: + +Download the data: + +.. code-block:: + + bin/get_data + + +Split the data in 10 partitions: + +.. code-block:: + + bin/split_data --n_splits=10 + +Data partitions will be generated in the folder 'data/clients'. + +FEDn relies on a configuration file for the client to connect to the server. Create a file called 'client.yaml' with the following content: + +.. code-block:: + + network_id: fedn-network + discover_host: api-server + discover_port: 8092 + +Make sure to move the file ``client.yaml`` to the root of the examples/mnist-pytorch folder.
+To connect a client that uses the data partition ``data/clients/1/mnist.pt`` and the config file ``client.yaml`` to the network, run the following docker command: + +.. code-block:: + + docker run \ + -v $PWD/client.yaml:/app/client.yaml \ + -v $PWD/data/clients/1:/var/data \ + -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \ + --network=fedn_default \ + ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client1 + +Observe the API Server logs and combiner logs, you should see the client connecting and entering into a state asking for a compute package. + +In a separate terminal, start a second client using the data partition 'data/clients/2/mnist.pt': + +.. code-block:: + + docker run \ + -v $PWD/client.yaml:/app/client.yaml \ + -v $PWD/data/clients/2:/var/data \ + -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \ + --network=fedn_default \ + ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client2 + +You are now ready to use the API to initialize the system with the compute package and seed model, and to start federated training. + +- Follow the example in the `Jupyter Notebook `__ + + +Automate experimentation with several clients: +----------- + +Now that you have an understanding of the main components of FEDn, you can use the provided docker-compose templates to automate deployment of FEDn and clients. +To start the network and attach 4 clients: + +.. code-block:: + + docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up --scale client=4 + + +Access logs and validation data from MongoDB +----------- +You can access and download event logs and validation data via the API, and you can also as a developer obtain +the MongoDB backend data using pymongo or via the MongoExpress interface: + +- http://localhost:8081/db/fedn-network/ + +The credentials are as set in docker-compose.yaml in the root of the repository. 
+ +Access model updates +----------- + +You can obtain model updates from the 'fedn-models' bucket in Minio: + +- http://localhost:9000 + + +Clean up +----------- +You can clean up by running + +.. code-block:: + + docker-compose down