Feature/cross device simulation (#504)
* Increase sleep for reducers and clients

* Improve logs

* Always print logs

* Fix PyTorch example data mount path in compose file

Fix PyTorch example data mount path in override compose file.

* test many python versions

* quotes

* Fix CI sleep time

* Add Python versions

* don't fail fast

* remove python 3.10

* remove python 3.10

* fix numpy for py 3.7

* Inference CI

* minor

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* reduce CI time

* fix conflict

* Initial implementation of SSL toggle for REST service

* Removed unused reducer inference interface mockup

* Removed geoip2 dependency

* Dockerfile update, install developer tools

* Draft implementation

* Remove mocked inference endpoint in restservice

* Develop (#418)

* validate user-specified name (#415)

* Delete old Docker-related files (#412)

Co-authored-by: Marco Capuccini <[email protected]>

* fix code-checks

* insecure mode in ci (http)

* secure option to package download and checksum

* work in progress

* fix flake8 warning

* Remove Talisman

* bugfix, combiner now correctly uses secure flag in connector

* Revert accidental change to compose file

* sort import

* Changed combiner ssl default config to False

* Fixed code checks

* Code checks

* Add docstrings in connect.py

* Add docstrings in certificatemanager

* Docstrings

* Changed some parameter names  in reducer CLI

* Default no-ssl for REST, ssl for gRPC

* Fix code check

* Harmonize option names between combiner and reducer

* Add help text for combiner options

* Make --secure option flag

* Works to disable secure grpc

* Added back use of copy

* Remove possibility to generate cert for reducer

* Default to insecure gRPC setting

* Fix code scanning alerts

* Initial refactor

* Initial refactor reducer

* Introduce base class for controller

* More refactoring and cleaning

* refactored look-aside loadbalancer

* Refactored load-balancer

* Fixed code checks

* latest

* work in progress

* Fixed code checks

* Update control page

* added metadata field to modelupdaterequest

* Client passes on metadata dict with model update

* Latest

* Latest

* latest

* Refactor aggregation

* Fix

* Add docstring for load_model_update

* Extract model update metadata and make available in aggregator

* Added some docstrings

* More docstrings

* Renamed aggregator files and base class

* suppress LOG status messages in stdout

* Introduce policy for when to trigger aggregation at combiner

* Latest

* Added files

* Fixes

* Fixed broken config file generation.

* Added option to parse client name from config file

* Flattened client config file, generalized so that all settings can be passed in the file

* Fixed file generation

* Latest

* Updated config template

* Removed mongotracing in control, will refactor to have all tracing data in one collection

* Refactored combiner job submit

* Remove psutil tracing

* Refactor tracer

* cleaning

* get latest round refactored

* Enable early termination by default

* Removed unused round_config object

* Remove printout of sensitive information

* Remove old control, make new version default

* Remove unused code

* Changed default name for fedn network in config template

* Cleaning, docstrings

* bugfix

* Variable name changes

* Removed old combine models implementation

* bugfix

* Add a hook to validate the model update before putting it on the aggregation queue

* Validate metadata on model update

* Validate metadata on model update

* incremental weighted average in new style aggregator

* small cleaning in control form

* Added instructions in controller form, rearranged menu items

* latest

* Resolve merge conflicts

* Added back accidentally removed file

* Conflict resolution

* Remove unused readme file

* More merging

* latest

* Fixed round_config regression

* Controller polls db instead of combiners

* More api docs

* Add infer_instruct

* Cleaning

* Added training metadata for keras example

* work in progress db cleanup

* Refactor

* More refactoring in db backend

* Remove 'control' setting from reducer config file

* Flatten combiner config

* Flatten combiner config

* Flatten combiner config

* Harmonize CLI option names

* Refactor helpers

* Refactor helpers

* Refactor helpers

* Refactor helpers

* Refactor helpers

* Plugin arch for helpers

* Updated UI config

* Raise exception if misconfigured helper

* Added tracing of sessions in the db

* Update version to 0.5-dev

* Updated torch version

* Updated torch version

* bugfix

* Skip osx tests

* latest

* change helper name

* fix formatting and syntax

* fix formatting and syntax errors

* update ci new db

* fix round_id key and equal weight to reduce models

* save helper for metrics and metadata

* improve readability and add test for fedavg

* update doc strings for client and combiner

* Resolve conflict

* formatting

* add id to logging

* extra logging and doc strings

* work in progress

* Refactor of controller

* Refactor of controller

* Refactor polling in control

* Refactor polling in control

* Refactor polling in control

* Functioning

* start on new simulation example

* update

* Updated test

* Fix typos

* Removed accidentally committed files

* update api

* added new async-simulation example

* rename example

* latest

* Updates after code review

* Resolved merge conflicts

* Updated docstrings

* Fixed docstrings

* Fixes

* Fixed code check

* use setter

* latest

* removed script for combiners

* Fix numpyarrayhelper

* work in progress

* Use latest mongodb and bump version number

* Fixed bug in client

* Client sends model only once, combiner deletes staged model after training round

* Cleaned up new example/test

* Change naming of temp storage class member in modelservice, for clarity

* Make detach() public

* Renamed some methods in client for clarity

* refactored set_model to avoid code duplication on client

* Refactored modelservice for code reuse

* Fix dashboard package upload

* Fix default helper in session

* Delete combiner level model from minio after reduce

* delete combiner models from minio by default

* code checks

* changes following review

---------

Co-authored-by: mcapuccini <[email protected]>
Co-authored-by: Andreas Hellander <[email protected]>
Co-authored-by: Fredrik Wrede <[email protected]>
4 people authored Jan 29, 2024
1 parent 2b95237 commit 6e7957c
Showing 23 changed files with 673 additions and 153 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -34,7 +34,7 @@ services:
- 9001:9001

mongo:
- image: mongo:7.0
+ image: mongo:7.0
restart: always
environment:
- MONGO_INITDB_ROOT_USERNAME=fedn_admin
2 changes: 1 addition & 1 deletion docs/tutorial.rst
@@ -308,7 +308,7 @@ For the compute package we need to compress the *client* folder as .tar.gz file.

.. code-block:: bash

-   tar -czvf package.tar.gz client
+   tar -czvf package.tgz client

This file can then be uploaded to the FEDn network using the FEDn UI or the :py:mod:`fedn.network.api.client`.
6 changes: 6 additions & 0 deletions examples/async-simulation/.gitignore
@@ -0,0 +1,6 @@
data
*.npz
*.tgz
*.tar.gz
.async-simulation
client.yaml
178 changes: 178 additions & 0 deletions examples/async-simulation/Experiment.ipynb
@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "622f7047",
"metadata": {},
"source": [
"## FEDn API Example\n",
"\n",
"This notebook provides an example of how to use the FEDn API to organize experiments and to analyze validation results. We will here run one training session using FedAvg and one session using FedAdam and compare the results.\n",
"\n",
"When you start this tutorial you should have a deployed FEDn Network up and running, and you should have created the compute package and the initial model, see the README for instructions."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "743dfe47",
"metadata": {},
"outputs": [],
"source": [
"from fedn import APIClient\n",
"from fedn.dashboard.plots import Plot\n",
"from fedn.network.clients.client import Client\n",
"import uuid\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import collections\n",
"import copy"
]
},
{
"cell_type": "markdown",
"id": "1046a4e5",
"metadata": {},
"source": [
"We make a client connection to the FEDn API service. Here we assume that FEDn is deployed locally in pseudo-distributed mode with default ports."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "1061722d",
"metadata": {},
"outputs": [],
"source": [
"DISCOVER_HOST = '127.0.0.1'\n",
"DISCOVER_PORT = 8092\n",
"client = APIClient(DISCOVER_HOST, DISCOVER_PORT)"
]
},
{
"cell_type": "markdown",
"id": "07f69f5f",
"metadata": {},
"source": [
"Initialize FEDn with the compute package and seed model. Note that these files needs to be created separately by follwing instructions in the README."
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "5107f6f9",
"metadata": {},
"outputs": [],
"source": [
"client.set_package('package.tgz', 'numpyhelper')\n",
"client.set_initial_model('seed.npz')\n",
"seed_model = client.get_initial_model()"
]
},
{
"cell_type": "markdown",
"id": "4e26c50b",
"metadata": {},
"source": [
"Next we start a training session using FedAvg:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "f0380d35",
"metadata": {},
"outputs": [],
"source": [
"session_config_fedavg = {\n",
" \"helper\": \"numpyhelper\",\n",
" \"session_id\": \"experiment_fedavg4\",\n",
" \"aggregator\": \"fedavg\",\n",
" \"model_id\": seed_model['model_id'],\n",
" \"rounds\": 1,\n",
" }\n",
"\n",
"result_fedavg = client.start_session(**session_config_fedavg)"
]
},
{
"cell_type": "markdown",
"id": "29552af9",
"metadata": {},
"source": [
"Next, we retrive all model validations from all clients, extract the training accuracy metric, and compute its mean value accross all clients"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "11fd17ef",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"models = client.list_models(session_id = \"experiment_fedavg\")\n",
"\n",
"validations = []\n",
"acc = collections.OrderedDict()\n",
"for model in models[\"result\"]:\n",
" model_id = model[\"model\"]\n",
" validations = client.list_validations(modelId=model_id)\n",
"\n",
" for _ , validation in validations.items(): \n",
" metrics = json.loads(validation['data'])\n",
" try:\n",
" acc[model_id].append(metrics['training_accuracy'])\n",
" except KeyError: \n",
" acc[model_id] = [metrics['training_accuracy']]\n",
" \n",
"mean_acc_fedavg = []\n",
"for model, data in acc.items():\n",
" mean_acc_fedavg.append(np.mean(data))\n",
"mean_acc_fedavg.reverse()"
]
},
{
"cell_type": "markdown",
"id": "40db4542",
"metadata": {},
"source": [
"Finally, plot the result."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d064aaf9",
"metadata": {},
"outputs": [],
"source": [
"x = range(1,len(mean_acc_fedavg)+1)\n",
"plt.plot(x, mean_acc_fedavg)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
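
The notebook's introduction promises a comparison between FedAvg and FedAdam, but only the FedAvg session appears in the cells above. Below is a minimal sketch of the second session, assuming an aggregator registered under the key `fedadam` is available in your FEDn build — the key name, and its availability, are assumptions not confirmed by this diff:

```python
# Hypothetical FedAdam session; the 'fedadam' aggregator key is an assumption.
session_config_fedadam = {
    "helper": "numpyhelper",
    "session_id": "experiment_fedadam",
    "aggregator": "fedadam",  # check the aggregator plugins registered in your FEDn version
    "model_id": seed_model['model_id'],
    "rounds": 1,
}

result_fedadam = client.start_session(**session_config_fedadam)
```

Rerunning the validation-collection loop above with `session_id="experiment_fedadam"` then yields a `mean_acc_fedadam` series for a side-by-side plot against `mean_acc_fedavg`.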
53 changes: 53 additions & 0 deletions examples/async-simulation/README.md
@@ -0,0 +1,53 @@
# ASYNC SIMULATION
This example is intended as a test for asynchronous clients.

## Prerequisites
- [Python 3.8, 3.9 or 3.10](https://www.python.org/downloads)
- [Docker](https://docs.docker.com/get-docker)
- [Docker Compose](https://docs.docker.com/compose/install)

## Running the example (pseudo-distributed, single host)

Clone FEDn and navigate to this directory.
```sh
git clone https://github.com/scaleoutsystems/fedn.git
cd fedn/examples/async-simulation
```

### Preparing the environment, the local data, the compute package and seed model

Install FEDn and dependencies (we recommend using a virtual environment):

From the folder 'fedn/fedn':

```
pip install -e .
```

From 'examples/async-simulation':
```
pip install -r requirements.txt
```

Create the compute package and a seed model that you will be asked to upload in the next step.
```
tar -czvf package.tgz client
```

```
python client/entrypoint init_seed
```

### Deploy FEDn and two clients

```
docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml up
```

> **Note**: run with `--scale client=N` to start *N* clients.

### Initialize the federated model
See 'Experiment.ipynb' or 'launch_client.py' to set the package and seed model.
### Run federated training
See 'Experiment.ipynb'.

## Clean up
You can clean up by running `docker-compose down -v`.
98 changes: 98 additions & 0 deletions examples/async-simulation/client/entrypoint
@@ -0,0 +1,98 @@
#!/usr/bin/env python
import time

import fire
import numpy as np

from fedn.utils.helpers.helpers import get_helper, save_metadata, save_metrics

HELPER_MODULE = 'numpyhelper'
ARRAY_SIZE = 1000000


def save_model(weights, out_path):
    """ Save model weights to disk.

    :param weights: The model weights to save.
    :type weights: list of numpy ndarray
    :param out_path: The path to save to.
    :type out_path: str
    """
    helper = get_helper(HELPER_MODULE)
    helper.save(weights, out_path)


def load_model(model_path):
    """ Load model weights from disk.

    :param model_path: The path to load from.
    :type model_path: str
    :return: The loaded model weights.
    :rtype: list of numpy ndarray
    """
    helper = get_helper(HELPER_MODULE)
    weights = helper.load(model_path)
    return weights


def init_seed(out_path='seed.npz'):
    """ Initialize seed model.

    :param out_path: The path to save the seed model to.
    :type out_path: str
    """
    # Init and save
    weights = [np.random.rand(1, ARRAY_SIZE)]
    save_model(weights, out_path)


def train(in_model_path, out_model_path):
    """ Train model.

    :param in_model_path: The path to the input model.
    :type in_model_path: str
    :param out_model_path: The path to save the model update to.
    :type out_model_path: str
    """
    # Load model
    weights = load_model(in_model_path)

    # Simulate a training pass by sleeping for a random interval
    time.sleep(np.random.randint(4, 15))

    # Metadata needed for aggregation server side
    metadata = {
        'num_examples': ARRAY_SIZE,
    }

    # Save JSON metadata file
    save_metadata(metadata, out_model_path)

    # Save model update
    save_model(weights, out_model_path)


def validate(in_model_path, out_json_path):
    """ Validate model.

    :param in_model_path: The path to the input model.
    :type in_model_path: str
    :param out_json_path: The path to save the output JSON to.
    :type out_json_path: str
    """
    weights = load_model(in_model_path)

    # Report the mean of the weights as a stand-in validation metric
    report = {
        "mean": np.mean(weights),
    }

    # Save JSON
    save_metrics(report, out_json_path)


if __name__ == '__main__':
    fire.Fire({
        'init_seed': init_seed,
        'train': train,
        'validate': validate
    })
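
For a quick local smoke test outside the FEDn runtime, the entry points can be exercised directly through fire's positional arguments — a minimal sketch, assuming it runs from `examples/async-simulation/client` with `fedn` and `fire` installed (the file names below are illustrative):

```python
# Hypothetical smoke test for the entrypoint above; paths are illustrative.
import subprocess

# Create a seed model, produce one simulated update, then validate it.
subprocess.run(["python", "entrypoint", "init_seed", "seed.npz"], check=True)
subprocess.run(["python", "entrypoint", "train", "seed.npz", "update.npz"], check=True)
subprocess.run(["python", "entrypoint", "validate", "update.npz", "validation.json"], check=True)
```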
5 changes: 5 additions & 0 deletions examples/async-simulation/client/fedn.yaml
@@ -0,0 +1,5 @@
entry_points:
  train:
    command: /venv/bin/python entrypoint train $ENTRYPOINT_OPTS
  validate:
    command: /venv/bin/python entrypoint validate $ENTRYPOINT_OPTS
8 changes: 8 additions & 0 deletions examples/async-simulation/init_fedn.py
@@ -0,0 +1,8 @@
from fedn import APIClient

DISCOVER_HOST = '127.0.0.1'
DISCOVER_PORT = 8092

client = APIClient(DISCOVER_HOST, DISCOVER_PORT)
client.set_package('package.tgz', 'numpyhelper')
client.set_initial_model('seed.npz')
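
The README also references 'launch_client.py', which falls in the truncated portion of this diff. Below is a minimal sketch of what programmatically starting a client can look like, assuming the `Client` class (imported in the notebook above) accepts a flat config dict; every key name below is an assumption based on the flattened client config introduced in this PR, not confirmed by the diff:

```python
# Hypothetical client launcher; all config keys are assumptions and may
# differ in your FEDn version -- compare with the generated client.yaml.
import uuid

from fedn.network.clients.client import Client

config = {
    'discover_host': '127.0.0.1',    # assumed key: REST service host
    'discover_port': 8092,           # assumed key: REST service port
    'token': None,                   # assumed key: auth token (unused locally)
    'name': 'client-' + str(uuid.uuid4())[:8],
    'client_id': str(uuid.uuid4()),  # assumed key
    'remote_compute_context': True,  # assumed key: fetch the package from the network
    'secure': False,                 # assumed key: plain HTTP, matching this example
}

client = Client(config)
client.run()  # assumed entry method; blocks while the client serves requests
```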