Docs update to reflect recent refactoring #511

Merged: 26 commits, Feb 2, 2024
92 changes: 13 additions & 79 deletions README.rst
@@ -17,13 +17,13 @@ development on your laptop to real-world production setups in geographically dis
Core Features
=============

- **Scalable and resilient.** FEDn is highly scalable and resilient via a tiered
architecture where multiple aggregation servers (combiners) form a network to divide up the work to coordinate clients and aggregate models.
- **Scalable and resilient.** FEDn is scalable and resilient via a tiered
architecture where multiple aggregation servers (combiners) divide up the work to coordinate clients and aggregate models.
Benchmarks show high performance both for thousands of clients in a cross-device
setting and for large model updates in a cross-silo setting.
FEDn has the ability to recover from failure in all critical components.

- **Security**. A key feature is that
- **Security**. FEDn is built using secure industry standard communication protocols (gRPC). A key feature is that
clients do not have to expose any ingress ports.

- **Track events and training progress in real-time**. FEDn tracks events for clients and aggregation servers, logging to MongoDB. This
@@ -39,86 +39,13 @@ Core Features
ML model type or framework. Support for Keras and PyTorch is
available out-of-the-box.


Getting started
===============

Prerequisites
-------------

- `Python 3.8, 3.9 or 3.10 <https://www.python.org/downloads>`__
- `Docker <https://docs.docker.com/get-docker>`__
- `Docker Compose <https://docs.docker.com/compose/install>`__

Quick start
-----------

Clone this repository, navigate into it, and start a pseudo-distributed FEDn network using docker-compose:

.. code-block::

   docker-compose up

This starts the required backend services (MongoDB and Minio), the API Server, and one Combiner. You can verify the deployment using these URLs:

- API Server: localhost:8092
- Minio: localhost:9000
- Mongo Express: localhost:8081

Next, we will prepare the client. A key concept in FEDn is the compute package -
a code bundle that contains entrypoints for training and (optionally) validating a model update on the client.
The following steps use the compute package defined in the example project 'examples/mnist-pytorch'.

Navigate to 'examples/mnist-pytorch' and familiarize yourself with the project structure. The entrypoints
are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in
'requirements.txt'. For convenience, we have provided utility scripts to set up a virtual environment.

Start by initializing a virtual environment with all of the required dependencies for this project.

.. code-block::

   bin/init_venv.sh

Next, create the compute package and a seed model:

.. code-block::

   bin/build.sh

Upload the generated files 'package.tgz' and 'seed.npz' using the API:
The best way to get started is to take the quickstart tutorial:

The next step is to configure and attach clients. For this we download data and make data partitions:

Download the data:

.. code-block::

   bin/get_data

Split the data into two partitions:

.. code-block::

   bin/split_data

Data partitions will be generated in the folder 'data/clients'.

Now navigate to http://localhost:8090/network and download the client config file. Place it in the example working directory.

To connect a client that uses the data partition 'data/clients/1/mnist.pt':

.. code-block::

   docker run \
     -v $PWD/client.yaml:/app/client.yaml \
     -v $PWD/data/clients/1:/var/data \
     -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \
     --network=fedn_default \
     ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client1

You are now ready to start training the model at http://localhost:8090/control.

To scale up the experiment, refer to the README at 'examples/mnist-pytorch' (or the corresponding Keras version), where we explain how to use docker-compose to automate deployment of several clients.
- `Quickstart PyTorch <https://github.com/scaleoutsystems/fedn/tree/master/examples/mnist-pytorch>`__

Documentation
=============
@@ -128,6 +55,13 @@ You will find more details about the architecture, compute package and how to de
- `Paper <https://arxiv.org/abs/2103.00148>`__


FEDn Studio
===============
Scaleout develops a Django application, FEDn Studio, that provides a UI, authentication/authorization, client identity management, project-based multitenancy for managing multiple projects, and integration with your MLOps pipelines.
There is also additional tooling, including charts for deployments on Kubernetes and integration with several projects from the cloud native landscape. See `FEDn Framework <https://www.scaleoutsystems.com/framework>`__
for more information.


Making contributions
====================

21 changes: 21 additions & 0 deletions docs/apiclient.rst
@@ -0,0 +1,21 @@
APIClient
===============

FEDn comes with an *APIClient*, a Python 3 library for interacting with a FEDn network programmatically.


The APIClient is available as a Python package on PyPI, and can be installed using pip:

.. code-block:: bash

   $ pip install fedn


To initialize the APIClient, you need to provide the hostname and port of the FEDn API server. The default port is 8092. The following code snippet shows how to initialize the APIClient:

.. code-block:: python

   from fedn import APIClient
   client = APIClient("localhost", 8092)

For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client` module reference and the example `notebook <https://github.com/scaleoutsystems/fedn/blob/master/examples/mnist-pytorch/API_Example.ipynb>`_.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -12,7 +12,7 @@
author = 'Scaleout Systems AB'

# The full version, including alpha/beta/rc tags
release = '0.6.0'
release = '0.8.0'

# Add any Sphinx extension module names here, as strings
extensions = [
95 changes: 0 additions & 95 deletions docs/deployment.rst

This file was deleted.

5 changes: 2 additions & 3 deletions docs/index.rst
@@ -3,10 +3,9 @@
:caption: Table of Contents

introduction
quickstart
architecture
deployment
interfaces
quickstart
apiclient
aggregators
helpers
tutorial
37 changes: 0 additions & 37 deletions docs/interfaces.rst

This file was deleted.

31 changes: 14 additions & 17 deletions docs/introduction.rst
@@ -1,23 +1,22 @@
Introduction to Federated Learning
==================================

Federated Learning stands at the forefront of modern machine learning techniques, offering a novel approach to address challenges related to data privacy, security,
Federated Learning offers a novel approach to address challenges related to data privacy, security,
and decentralized data distribution. In contrast to traditional machine learning setups where data is collected and stored centrally,
Federated Learning allows for collaborative model training while keeping data localized. This innovative paradigm proves to be particularly advantageous in
Federated Learning allows for collaborative model training while keeping data local with the data owner or device. This is particularly advantageous in
scenarios where data cannot be easily shared due to privacy regulations, network limitations, or ownership concerns.

At its core, Federated Learning orchestrates model training across distributed devices or servers, referred to as clients or participants.
These participants could be diverse endpoints such as mobile devices, IoT gadgets, or remote servers. Rather than transmitting raw data to a central location,
each participant computes gradients locally based on its data. These gradients are then communicated to a central server, often called the aggregator or orchestrator.
The central server aggregates and combines the gradients from multiple participants to update a global model.
each participant computes gradients locally based on its data. These gradients are then communicated to a server, often called the aggregator.
The server aggregates and combines the gradients from multiple participants to update a global model.
This iterative process allows the global model to improve without the need to share the raw data.
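
The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration and not FEDn code: the linear model, the function names ``local_gradient``, ``aggregate`` and ``train``, the learning rate, and the toy data are all assumptions chosen for the example. Each client computes a gradient on its own data, and only those gradients (weighted by the number of local examples) reach the server.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, target)


def local_gradient(weights: Vector, data: List[Example]) -> Vector:
    """Gradient of half the mean squared error of a linear model on one client's data."""
    grad = [0.0] * len(weights)
    for features, target in data:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += error * x / len(data)
    return grad


def aggregate(updates: List[Tuple[Vector, int]]) -> Vector:
    """Server side: average client gradients, weighted by each client's example count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(g[i] * n for g, n in updates) / total for i in range(dim)]


def train(clients: List[List[Example]], rounds: int = 20, lr: float = 0.1) -> Vector:
    """Run several federated rounds; raw data never leaves the per-client lists."""
    weights = [0.0] * len(clients[0][0][0])
    for _ in range(rounds):
        updates = [(local_gradient(weights, data), len(data)) for data in clients]
        global_grad = aggregate(updates)
        weights = [w - lr * g for w, g in zip(weights, global_grad)]
    return weights


# Hypothetical toy data: two clients whose examples all follow y = 2 * x.
clients = [
    [([1.0], 2.0), ([2.0], 4.0)],
    [([3.0], 6.0)],
]
final_weights = train(clients)
print(final_weights)
```

With this toy data the global weight approaches 2.0 over the rounds, even though the server only ever sees gradients and example counts.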

FEDn: the SDK for scalable federated learning
---------------------------------------------

FEDn serves as a Software Development Kit (SDK) enabling scalable federated learning.
It is used to implement the core server side logic (including model aggregation) and the client side integrations.
It implements functionality to deploy and scale the server side in geographically distributed setups.
Developers and ML engineers can use FEDn to build custom federated learning systems and bespoke deployments.


@@ -28,10 +27,10 @@ adapting to varying project needs and geographical considerations.
Scalable and Resilient
......................

FEDn exhibits scalability and resilience, thanks to its multi-tiered architecture. Multiple aggregation servers, known as combiners,
form a network to divide the workload, coordinating clients, and aggregating models.
FEDn exhibits scalability and resilience, thanks to its tiered architecture. Multiple aggregation servers, in FEDn called combiners,
form a network to divide the workload of coordinating clients and aggregating models.
This architecture allows for high performance in various settings, from thousands of clients in a cross-device environment to
large model updates in a cross-silo scenario. Crucially, FEDn has built-in recovery capabilities for all critical components, enhancing system reliability.
large model updates in a cross-silo scenario. Importantly, FEDn has built-in recovery capabilities for all critical components, enhancing system reliability.
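
To illustrate why the aggregation work can be divided between several servers, the following sketch averages updates in two tiers. It is an assumption-laden illustration, not FEDn's combiner implementation: the names ``weighted_average`` and ``tiered_aggregate``, the grouping of clients, and the numbers are hypothetical. Each group stands in for one combiner, which reduces its own clients' updates to a partial result; the partial results are then reduced to a global model.

```python
from typing import List, Tuple

Vector = List[float]
Update = Tuple[Vector, int]  # (model update, number of examples behind it)


def weighted_average(updates: List[Update]) -> Update:
    """Average updates weighted by example count; return the mean and the total count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    mean = [sum(v[i] * n for v, n in updates) / total for i in range(dim)]
    return mean, total


def tiered_aggregate(groups: List[List[Update]]) -> Vector:
    """First tier: one partial average per group. Second tier: combine the partials."""
    partials = [weighted_average(group) for group in groups]
    model, _ = weighted_average(partials)
    return model


# Hypothetical updates from four clients, split across two groups.
client_updates = [
    ([1.0, 0.0], 10),
    ([3.0, 2.0], 30),
    ([5.0, 4.0], 20),
    ([7.0, 6.0], 40),
]
combiner_groups = [client_updates[:2], client_updates[2:]]

tiered = tiered_aggregate(combiner_groups)
flat, _ = weighted_average(client_updates)
print(tiered, flat)
```

Because each partial result carries its example count, the two-tier result matches a single flat average up to floating-point rounding, which is what allows the first tier to run in parallel on separate servers.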

ML-Framework Agnostic
.....................
@@ -42,20 +41,18 @@ This flexibility allows for out-of-the-box support for popular frameworks like K
Security
.........

A key security feature of FEDn is its client protection capabilities, negating the need for clients to expose any ingress ports,
A key security feature of FEDn is its client protection capabilities - clients do not need to expose any ingress ports,
thus reducing potential security vulnerabilities.

Event Tracking and Training progress
....................................

To ensure transparency and control over the learning process,
FEDn logs events in the federation and does real-time tracking of training progress. A flexible API lets the user define validation strategies locally on clients.
To ensure transparency and control over the training process, as well as to provide means to troubleshoot distributed deployments,
FEDn logs events and does real-time tracking of training progress. A flexible API lets the user define validation strategies locally on clients.
Data is logged as JSON to MongoDB, enabling users to create custom dashboards and visualizations easily.

User Interfaces
REST-API and Python API Client
..............................

FEDn offers a Flask-based Dashboard that allows users to monitor client model validations in real time. It also facilitates tracking client training time distributions
and key performance metrics for clients and combiners, providing a comprehensive view of the system’s operation and performance.

FEDn also comes with a REST API for integration with external dashboards and visualization tools, or integration with other systems.
FEDn comes with a REST API, a CLI, and a Python API Client for programmatic interaction with a FEDn network. This allows for flexible automation of experiments, for integration with
other systems, and for easy integration with external dashboards and visualization tools.