Feature/SK-523 | Refactor REST and add APIClient (#477)
Wrede authored Aug 24, 2023

1 parent a60d010 commit 7a18790
Showing 84 changed files with 3,882 additions and 1,184 deletions.
83 changes: 83 additions & 0 deletions .ci/tests/examples/api_test.py
@@ -0,0 +1,83 @@
import fire
import yaml

from fedn import APIClient


def _download_config(output):
    """ Download the client configuration file from the controller.
    :param output: The output file path.
    :type output: str
    """
    client = APIClient(host="localhost", port=8092)
    config = client.get_client_config(checksum=True)
    with open(output, 'w') as f:
        f.write(yaml.dump(config))


def test_api_get_methods():
    client = APIClient(host="localhost", port=8092)
    status = client.get_controller_status()
    assert status
    print("Controller status: ", status, flush=True)

    events = client.get_events()
    assert events
    print("Events: ", events, flush=True)

    validations = client.list_validations()
    assert validations
    print("Validations: ", validations, flush=True)

    models = client.get_model_trail()
    assert models
    print("Models: ", models, flush=True)

    clients = client.list_clients()
    assert clients
    print("Clients: ", clients, flush=True)

    combiners = client.list_combiners()
    assert combiners
    print("Combiners: ", combiners, flush=True)

    combiner = client.get_combiner("combiner")
    assert combiner
    print("Combiner: ", combiner, flush=True)

    first_model = client.get_initial_model()
    assert first_model
    print("First model: ", first_model, flush=True)

    package = client.get_package()
    assert package
    print("Package: ", package, flush=True)

    checksum = client.get_package_checksum()
    assert checksum
    print("Checksum: ", checksum, flush=True)

    rounds = client.list_rounds()
    assert rounds
    print("Rounds: ", rounds, flush=True)

    round = client.get_round(1)
    assert round
    print("Round: ", round, flush=True)

    sessions = client.list_sessions()
    assert sessions
    print("Sessions: ", sessions, flush=True)


if __name__ == '__main__':

    client = APIClient(host="localhost", port=8092)
    fire.Fire({
        'set_seed': client.set_initial_model,
        'set_package': client.set_package,
        'start_session': client.start_session,
        'get_client_config': _download_config,
        'test_api_get_methods': test_api_get_methods,
    })
7 changes: 5 additions & 2 deletions .ci/tests/examples/print_logs.sh
@@ -5,8 +5,11 @@ docker logs "$(basename $PWD)_minio_1"
echo "Mongo logs"
docker logs "$(basename $PWD)_mongo_1"

echo "Reducer logs"
docker logs "$(basename $PWD)_reducer_1"
echo "Dashboard logs"
docker logs "$(basename $PWD)_dashboard_1"

echo "API-Server logs"
docker logs "$(basename $PWD)_api-server_1"

echo "Combiner logs"
docker logs "$(basename $PWD)_combiner_1"
24 changes: 8 additions & 16 deletions .ci/tests/examples/run.sh
@@ -23,34 +23,23 @@ docker-compose \
".$example/bin/python" ../../.ci/tests/examples/wait_for.py combiners

>&2 echo "Upload compute package"
curl -k -X POST \
-F file=@package.tgz \
-F helper="$helper" \
http://localhost:8090/context
printf '\n'
".$example/bin/python" ../../.ci/tests/examples/api_test.py set_package --path package.tgz --helper "$helper"

>&2 echo "Upload seed"
curl -k -X POST \
-F seed=@seed.npz \
http://localhost:8090/models
printf '\n'
".$example/bin/python" ../../.ci/tests/examples/api_test.py set_seed --path seed.npz

>&2 echo "Wait for clients to connect"
".$example/bin/python" ../../.ci/tests/examples/wait_for.py clients

>&2 echo "Start round"
curl -k -X POST \
-F rounds=3 \
-F validate=True \
http://localhost:8090/control
printf '\n'
>&2 echo "Start session"
".$example/bin/python" ../../.ci/tests/examples/api_test.py start_session --rounds 3 --helper "$helper"

>&2 echo "Checking rounds success"
".$example/bin/python" ../../.ci/tests/examples/wait_for.py rounds

>&2 echo "Test client connection with dowloaded settings"
# Get config
curl -k http://localhost:8090/config/download > ../../client.yaml
".$example/bin/python" ../../.ci/tests/examples/api_test.py get_client_config --output ../../client.yaml

# Redeploy clients with config
docker-compose \
@@ -62,5 +51,8 @@ docker-compose \
>&2 echo "Wait for clients to reconnect"
".$example/bin/python" ../../.ci/tests/examples/wait_for.py clients

>&2 echo "Test API GET requests"
".$example/bin/python" ../../.ci/tests/examples/api_test.py test_api_get_methods

popd
>&2 echo "Test completed successfully"
3 changes: 2 additions & 1 deletion .github/workflows/code-checks.yaml
@@ -42,7 +42,8 @@ jobs:
--exclude-dir='.venv'
--exclude-dir='.mnist-pytorch'
--exclude-dir='.mnist-keras'
--exclude-dir='docs'
--exclude-dir='docs'
--exclude='tests.py'
'^[ \t]+(import|from) ' -I .
# TODO: add linting/formatting for all file types
7 changes: 6 additions & 1 deletion .readthedocs.yaml
@@ -6,4 +6,9 @@ build:
python: "3.9"

sphinx:
configuration: docs/conf.py
configuration: docs/conf.py

python:
  install:
    - method: pip
      path: ./fedn
4 changes: 2 additions & 2 deletions config/settings-client.yaml.template
@@ -1,3 +1,3 @@
network_id: fedn-network
discover_host: reducer
discover_port: 8090
discover_host: api-server
discover_port: 8092
4 changes: 2 additions & 2 deletions config/settings-combiner.yaml.template
@@ -1,6 +1,6 @@
network_id: fedn-network
discover_host: reducer
discover_port: 8090
discover_host: api-server
discover_port: 8092

name: combiner
host: combiner
4 changes: 4 additions & 0 deletions config/settings-reducer.yaml.template
@@ -1,4 +1,8 @@
network_id: fedn-network
controller:
  host: api-server
  port: 8092
  debug: True

statestore:
  type: MongoDB
30 changes: 26 additions & 4 deletions docker-compose.yaml
@@ -58,14 +58,13 @@ services:
ports:
- 8081:8081

# Reducer
reducer:
dashboard:
environment:
- GET_HOSTS_FROM=dns
- USER=test
- PROJECT=project
- FLASK_DEBUG=1
- FLASK_ENV=development
- STATESTORE_CONFIG=/app/config/settings-reducer.yaml
build:
context: .
args:
@@ -75,10 +74,33 @@ services:
- ${HOST_REPO_DIR:-.}/fedn:/app/fedn
entrypoint: [ "sh", "-c" ]
command:
- "/venv/bin/pip install --no-cache-dir -e /app/fedn && /venv/bin/fedn run reducer -n reducer --init=config/settings-reducer.yaml"
- "/venv/bin/pip install --no-cache-dir -e /app/fedn && /venv/bin/fedn run dashboard -n reducer --init=config/settings-reducer.yaml"
ports:
- 8090:8090

api-server:
environment:
- GET_HOSTS_FROM=dns
- USER=test
- PROJECT=project
- FLASK_DEBUG=1
- STATESTORE_CONFIG=/app/config/settings-reducer.yaml
build:
context: .
args:
BASE_IMG: ${BASE_IMG:-python:3.9-slim}
working_dir: /app
volumes:
- ${HOST_REPO_DIR:-.}/fedn:/app/fedn
depends_on:
- minio
- mongo
entrypoint: [ "sh", "-c" ]
command:
- "/venv/bin/pip install --no-cache-dir -e /app/fedn && /venv/bin/python fedn/fedn/network/api/server.py"
ports:
- 8092:8092

# Combiner
combiner:
environment:
4 changes: 4 additions & 0 deletions docs/README.md
@@ -0,0 +1,4 @@
FEDn uses Sphinx with reStructuredText.

sphinx-apidoc --ext-autodoc --module-first -o _source ../fedn/fedn ../*tests* ../*exceptions* ../*common* ../ ../fedn/fedn/network/api/server.py ../fedn/fedn/network/controller/controlbase.py
sphinx-build . _build
66 changes: 38 additions & 28 deletions docs/architecture.rst
@@ -1,53 +1,63 @@
Architecture overview
=====================

Constructing a federated model with FEDn amounts to a) specifying the details of the client-side training code and data integrations, and b) deploying the reducer-combiner network. A FEDn network, as illustrated in the picture below, is made up of three main components: the *Reducer*, one or more *Combiners*, and a number of *Clients*. The combiner network forms the backbone of the FedML orchestration mechanism, while the Reducer provides discovery services and provides controls to coordinate training over the combiner network. By horizontally scaling the combiner network, one can meet the needs of a growing number of clients.
Constructing a federated model with FEDn amounts to a) specifying the details of the client-side training code and data integrations, and b) deploying the federated network. A FEDn network, as illustrated in the picture below, is organized into three tiers: the *Controller* tier (3), one or more *Combiners* in the second tier (2), and a number of *Clients* in the first tier (1).
The combiners form the backbone of the federated ML orchestration mechanism, while the Controller tier provides discovery services and controls to coordinate training over the federated network.
By horizontally scaling the number of combiners, one can meet the needs of a growing number of clients.

.. image:: img/overview.png
.. image:: img/FEDn_network.png
:alt: FEDn network
:width: 100%
:align: center

Main components
---------------

Client
......

A Client is a data node, holding private data and connecting to a Combiner to receive model update requests and model validation requests during training rounds. Importantly, clients do not require any open ingress ports. A client receives the code to be executed from the Reducer upon connecting to the network, and thus they only need to be configured prior to connection to read the local datasets during training and validation. Python3 client implementation is provided out of the box, and it is possible to write clients in a variety of languages to target different software and hardware requirements.

Combiner
........
The clients: tier 1
...................

A combiner is an actor whose main role is to orchestrate and aggregate model updates from a number of clients during a training round. When and how to trigger such orchestration rounds are specified in the overall *compute plan* laid out by the Reducer. Each combiner in the network runs an independent gRPC server, providing RPCs for interacting with the alliance subsystem it controls. Hence, the total number of clients that can be accommodated in a FEDn network is proportional to the number of active combiners in the FEDn network. Combiners can be deployed anywhere, e.g. in a cloud or on a fog node to provide aggregation services near the cloud edge.
A Client (gRPC client) is a data node, holding private data and connecting to a Combiner (gRPC server) to receive model update requests and model validation requests during training sessions.
Importantly, clients use remote procedure calls (RPC) to ask for model update tasks, so clients do not require any open ingress ports! A client receives the code (called the package, or compute package) to be executed from the *Controller*
upon connecting to the network, and thus only needs to be configured prior to connection to read the local datasets during training and validation. The package is based on entry points in the client code, and can be customized to fit the needs of the user.
This allows for a high degree of flexibility in terms of what kind of training and validation tasks can be performed on the client side, such as different types of machine learning models and frameworks, and even programming languages.
A Python 3 client implementation is provided out of the box, and it is possible to write clients in a variety of languages to target different software and hardware requirements.
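
For example, the connection settings a client needs before joining the network can be fetched through the Controller's REST API with the APIClient. A minimal sketch, where the host, port, and output file name are illustrative examples rather than fixed values:

.. code-block:: python

    import yaml

    from fedn import APIClient

    # Fetch the client connection settings from the controller REST API
    # (hostname, port and output file name are examples only).
    client = APIClient(host="localhost", port=8092)
    config = client.get_client_config(checksum=True)
    with open("client.yaml", "w") as f:
        f.write(yaml.dump(config))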

Reducer
.......
The combiners: tier 2
.....................

The reducer fills three main roles in the FEDn network: 1.) it lays out the overall, global training strategy and communicates that to the combiner network. It also dictates the strategy to aggregate model updates from individual combiners into a single global model, 2.) it handles global state and maintains the *model trail* - an immutable trail of global model updates uniquely defining the FedML training timeline, and 3.) it provides discovery services, mediating connections between clients and combiners. For this purpose, the Reducer exposes a standard REST API.
A combiner is an actor whose main role is to orchestrate and aggregate model updates from a number of clients during a training session.
When and how to trigger such orchestration are specified in the overall *compute plan* laid out by the *Controller*.
Each combiner in the network runs an independent gRPC server, providing RPCs for interacting with the federated network it controls.
Hence, the total number of clients that can be accommodated in a FEDn network is proportional to the number of active combiners in the FEDn network.
Combiners can be deployed anywhere, e.g. in a cloud or on a fog node to provide aggregation services near the cloud edge.

Services and communication
--------------------------
The controller: tier 3
......................

The figure below provides a logical architecture view of the services provided by each agent and how they interact.
Tier 3 actually contains several components and services, but it is most closely associated with the *Controller*. The *Controller* fills three main roles in the FEDn network:

.. image:: img/FEDn-arch-overview.png
:alt: FEDn architecture overview
:width: 100%
:align: center
1. it lays out the overall, global training strategy and communicates that to the combiner network.
It also dictates the strategy to aggregate model updates from individual combiners into a single global model,
2. it handles global state and maintains the *model trail* - an immutable trail of global model updates uniquely defining the federated ML training timeline, and
3. it provides discovery services, mediating connections between clients and combiners. For this purpose, the *Controller* exposes a standard REST API, both for the RPC clients and servers and for user interfaces and other services.

Tier 3 also contains a *Reducer* component, which is responsible for aggregating combiner-level models into a single global model. Further, it contains a *StateStore* database,
which is responsible for storing various states of the network and training sessions. The final global model trail from a training session is stored in the *ModelRegistry* database.
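
For illustration, a minimal sketch of driving the Controller through its REST API with the APIClient, mirroring the calls used in the CI script above. The host, port, file names, and helper name below are examples and assumptions; adapt them to your deployment:

.. code-block:: python

    from fedn import APIClient

    # Connect to the controller REST API (host and port are examples).
    client = APIClient(host="localhost", port=8092)

    # Placeholder helper name; use the helper matching your compute package.
    helper = "keras"

    # Upload the compute package and the seed model, then start a session.
    client.set_package(path="package.tgz", helper=helper)
    client.set_initial_model(path="seed.npz")
    client.start_session(rounds=3, helper=helper)

    # Inspect progress.
    print(client.get_controller_status())
    print(client.list_rounds())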


Control flows and algorithms
----------------------------
Notes on aggregating algorithms
...............................

FEDn is designed to allow customization of the FedML algorithm, following a specified pattern, or programming model. Model aggregation happens on two levels in the system. First, each Combiner can be configured with a custom orchestration and aggregation implementation, that reduces model updates from Clients into a single, *combiner level* model. Then, a configurable aggregation protocol on the Reducer level is responsible for combining the combiner-level models into a global model. By varying the aggregation schemes on the two levels in the system, many different possible outcomes can be achieved. Good staring configurations are provided out-of-the-box to help the user get started.
FEDn is designed to allow customization of the FedML algorithm, following a specified pattern, or programming model.
Model aggregation happens on two levels in the network. First, each Combiner can be configured with a custom orchestration and aggregation implementation that reduces model updates from Clients into a single, *combiner-level* model.
Then, a configurable aggregation protocol on the *Controller* level is responsible for combining the combiner-level models into a global model. By varying the aggregation schemes on the two levels in the system,
many different possible outcomes can be achieved. Good starting configurations are provided out of the box to help the user get started. See the API reference for more details.

Hierarchical Federated Averaging
................................

The currently implemented default scheme uses a local SGD strategy on the Combiner level aggregation and a simple average of models on the reducer level. This results in a highly horizontally scalable FedAvg scheme. The strategy works well with most artificial neural network (ANNs) models, and can in general be applied to models where it is possible and makes sense to form mean values of model parameters (for example SVMs). Additional FedML training protocols, including support for various types of federated ensemble models, are in active development.
The currently implemented default scheme uses a local SGD strategy for the combiner-level aggregation and a simple average of models at the reducer level.
This results in a highly horizontally scalable FedAvg scheme. The strategy works well with most artificial neural network (ANN) models,
and can in general be applied to models where it is possible and makes sense to form mean values of model parameters (for example SVMs).
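
As an illustration of the averaging step only (a toy sketch, not the actual FEDn aggregator implementation), a weighted mean of parameter arrays can be written as:

.. code-block:: python

    import numpy as np

    def weighted_average(updates, weights):
        """Weighted mean of model updates, where each update is a list of
        parameter arrays and each weight is e.g. the contributor's data size."""
        total = float(sum(weights))
        return [sum(w * layer for w, layer in zip(weights, layers)) / total
                for layers in zip(*updates)]

    # Two toy updates, each with two parameter tensors, weighted by data size.
    u1 = [np.ones((2, 2)), np.zeros(3)]
    u2 = [np.zeros((2, 2)), np.ones(3)]
    global_params = weighted_average([u1, u2], weights=[100, 300])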


.. image:: img/HFedAvg.png
:alt: FEDn architecture overview
:width: 100%
:align: center
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -106,3 +106,5 @@

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}

pygments_style = 'sphinx'
4 changes: 2 additions & 2 deletions docs/deployment.rst
@@ -1,4 +1,4 @@
Deployment
Distributed Deployment
======================

This guide serves as reference deployment for setting up a FEDn network consisting of:
@@ -29,7 +29,7 @@ The reducer and clients need to be able to resolve the hostname for the combiner
we show how this can be achieved if no external DNS resolution is available, by setting "extra host" in the Docker containers for the Reducer and client. Note that there are many other possible ways to achieve this, depending on your setup.

1. Deploy storage and database services (MinIO, MongoDB and MongoExpress)
--------------------------------------------------------------------
-------------------------------------------------------------------------

First, deploy MinIO and Mongo services on one of the hosts. Edit the `docker-compose.yaml` file to change the default passwords and ports.

2 changes: 1 addition & 1 deletion docs/faq.rst
@@ -71,7 +71,7 @@ Q: How can I configure the round validity policy:
In the main control implementation https://github.com/scaleoutsystems/fedn/blob/master/fedn/fedn/clients/reducer/control.py you can modify or replace the method "check_round_validity_policy". As we expand with more implementations of this policy, we plan to make it runtime configurable.

Q: Can I start a client listening only to training requests or only on validation requests?:
-------------------------------------------------
--------------------------------------------------------------------------------------------

Yes! From FEDn 0.3.0 there is an option to toggle which message streams a client subscribes to. For example, to start a pure validation client:
