Merge pull request #36 from IBM/version1.0.2
Version 1.0.2 of IBM Federated Learning library
shashank215r authored Nov 3, 2020
2 parents 46edac9 + cb53fe8 commit 63c92cb
Showing 11 changed files with 118 additions and 132 deletions.
4 changes: 4 additions & 0 deletions docs/tutorials/README.md
@@ -8,3 +8,7 @@

- Learn how to create your own data handler in [Create a customized data handler](../tutorials/create_my_data_handler.md).

- Learn how to load large datasets via data generators in [Set up data generators](../tutorials/set_up_data_generators_for_fl.md).

- Learn how to specify the quorum and the maximum timeout for each round, and how a party can rejoin after a dropout, in [Quorum handling and ability to Rejoin](../tutorials/quorum_rejoin.md).

56 changes: 56 additions & 0 deletions docs/tutorials/configure_gpu_training.md
@@ -0,0 +1,56 @@
# Enabling GPU training

IBM federated learning supports training neural network models
in a GPU environment on the party side to speed up the training process.

## Environment setup
Please install the required libraries for GPU training.
- For Keras and TensorFlow models, install the corresponding `tensorflow-gpu` package
according to the [TensorFlow GPU tutorial](https://www.tensorflow.org/install/gpu).
IBM FL currently requires `tensorflow==1.15.0`; therefore,
you will need to install `tensorflow-gpu==1.15.0` in your GPU environment, as shown below.
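
For instance, assuming CUDA and cuDNN versions compatible with TensorFlow 1.15 are already installed on the party machine, the GPU build can typically be installed with:

```
pip install tensorflow-gpu==1.15.0
```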

## IBM FL configuration
Users can enable GPU training and specify the number of GPUs they want to use
via the party's configuration file.
Below is an example of a party's configuration file:
```yaml
aggregator:
  ip: 127.0.0.1
  port: 5000
connection:
  info:
    ip: 127.0.0.1
    port: 8085
    tls_config:
      enable: false
  name: FlaskConnection
  path: ibmfl.connection.flask_connection
  sync: false
data:
  info:
    npz_file: examples/data/mnist/random/data_party0.npz
  name: MnistKerasDataHandler
  path: ibmfl.util.data_handlers.mnist_keras_data_handler
local_training:
  name: LocalTrainingHandler
  path: ibmfl.party.training.local_training_handler
model:
  name: KerasFLModel
  path: ibmfl.model.keras_fl_model
  spec:
    model_definition: examples/configs/keras_classifier/compiled_keras.h5
    model_name: keras-cnn
  info:
    gpu:
      num_gpus: 2 # enabling keras training with 2 GPUs
protocol_handler:
  name: PartyProtocolHandler
  path: ibmfl.party.party_protocol_handler
```
In the above example, the `gpu` section under the `info` section of `model` specifies
the GPU settings for the party's local training.
Users can change `num_gpus` according to the computing resources available to the parties.

If no `gpu` section is present in `info`, Keras/TensorFlow.keras training will use
the default CPU environment or **only one GPU**, even if the party has access to one or more GPUs.
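
Before launching a party with a `gpu` section like the one above, it can help to confirm that TensorFlow actually sees the GPUs. A minimal check, assuming `tensorflow-gpu==1.15.0` is installed in the party's environment:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can use; GPUs appear with device_type == 'GPU'.
gpus = [d for d in device_lib.list_local_devices() if d.device_type == 'GPU']
print("Number of visible GPUs:", len(gpus))

# In TensorFlow 1.15 this returns True when a CUDA-enabled GPU is usable.
print("GPU available:", tf.test.is_gpu_available())
```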
21 changes: 21 additions & 0 deletions docs/tutorials/quorum_rejoin.md
@@ -0,0 +1,21 @@
# Quorum handling and ability to Rejoin

## Quorum handling
IBM FL supports specifying a quorum percentage in the aggregator configuration file to provide flexibility when parties have potential connectivity failures. Given the total number of parties registered for a particular round, the quorum percentage defines the minimum number of parties that must reply for that round. If the aggregator receives fewer replies than this minimum for some round, it stops the federated learning process. This functionality ensures that if some parties drop out for any reason, they can rejoin later, as long as the number of available parties does not fall below the quorum.

For example, in the following configuration file `perc_quorum` is set to 0.75, so for each round the aggregator will expect 75% of the registered parties to reply. If 20 parties registered, federated learning will continue as long as no more than five parties drop out.

```yaml
hyperparams:
  global:
    max_timeout: 60
    num_parties: 5
    perc_quorum: 0.75
    rounds: 3
    termination_accuracy: 0.9
```
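
To make the arithmetic concrete, here is a small sketch; the helper name and the exact rounding rule are illustrative, not IBM FL's internal code:

```python
import math

def replies_needed(num_registered_parties, perc_quorum):
    # Hypothetical helper: the smallest number of party replies
    # that still satisfies the quorum percentage for a round.
    return math.ceil(perc_quorum * num_registered_parties)

print(replies_needed(20, 0.75))  # -> 15, so up to 5 of 20 registered parties may drop out
```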

## Maximum Timeout and Rejoin
Users can specify the maximum time (in seconds) the aggregator should wait for parties to reply in the aggregator configuration file. If a `max_timeout` value is specified, the aggregator will wait for that amount of time and then check whether the required number of parties (calculated from the quorum percentage described above) have replied. Please note that if the quorum percentage is not specified, the aggregator assumes a value of 100% and expects a reply from all registered parties. Similarly, if the maximum timeout is not specified, the aggregator will wait indefinitely for parties to reply.

To rejoin, a party just needs to issue the START and REGISTER commands, as it did initially when joining the federated learning process.
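
For example, at the rejoining party's prompt the sequence is the same two commands used when it first joined:

```
START
REGISTER
```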
3 changes: 1 addition & 2 deletions examples/constants.py
@@ -20,7 +20,6 @@
FL_DATASETS = ["default", "mnist", "nursery", "adult", "federated-clustering",
               "higgs", "airline", "diabetes", "binovf", "multovf", "linovf"]
FL_EXAMPLES = ["id3_dt", "fedavg", "keras_classifier","pfnm",
               "sklearn_logclassification", "sklearn_sgdclassifier",
               "rl_cartpole", "rl_pendulum", "coordinate_median", "krum",
               "sklearn_logclassification", "rl_cartpole", "rl_pendulum", "coordinate_median", "krum",
               "naive_bayes", "keras_gradient_aggregation", "spahm", "zeno"]
FL_CONN_TYPES = ["flask"]
23 changes: 18 additions & 5 deletions examples/sklearn_logclassification/README.md
@@ -1,24 +1,37 @@

# Running Scikitlearn Logistic Regression Classifier on Adult Dataset in IBM federated learning
# Running Scikitlearn Logistic Classifier in IBM federated learning

Currently, for the logistic classifier we support the following datasets:

* [Adult Dataset](https://archive.ics.uci.edu/ml/datasets/Adult)
* [MNIST](http://yann.lecun.com/exdb/mnist/)

This example explains how to run federated learning on a Logistic Regression Classifier, implemented with Scikit-Learn,
training on the [Adult Dataset](https://archive.ics.uci.edu/ml/datasets/Adult).

The following preprocessing was performed in `AdultSklearnDataHandler` on the original dataset:
* Drop the following features: `workclass`, `fnlwgt`, `education`, `marital-status`, `occupation`, `relationship`, `capital-gain`, `capital-loss`, `hours-per-week`, `native-country`
* Map `race`, `sex` and `class` values to 0/1
* Split the `age` and `education` columns into multiple columns based on value

Further details are in the documentation of `preprocess()` in `AdultSklearnDataHandler`.

No other preprocessing is performed.

The following preprocessing was performed on the MNIST dataset:

* Data is scaled down from the range `[0, 255]` to `[0, 1]`
* Images are reshaped from `[28, 28]` to `[1, 784]`


No other preprocessing is performed.
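
For illustration, the scaling and reshaping described above amounts to something like the following sketch (hypothetical code, not the library's actual data handler):

```python
import numpy as np

# Stand-in for a batch of raw MNIST images with pixel values in [0, 255].
images = np.random.randint(0, 256, size=(100, 28, 28))

images = images / 255.0                        # scale pixel values from [0, 255] to [0, 1]
images = images.reshape(images.shape[0], 784)  # flatten each 28x28 image to a 784-dim vector

print(images.shape)  # (100, 784)
```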

- Split data by running:

```
python examples/generate_data.py -n <num_parties> -d adult -pp <points_per_party>
python examples/generate_data.py -n <num_parties> -d <dataset_name> -pp <points_per_party>
```
- Generate config files by running:
```
python examples/generate_configs.py -n <num_parties> -m sklearn_logclassification -d adult -p <path>
python examples/generate_configs.py -n <num_parties> -m sklearn_logclassification -d <dataset_name> -p <path>
```
- In a terminal running an activated IBM FL environment
(refer to the Quickstart on our website to learn more about how to set up the running environment), start the aggregator by running:
10 changes: 9 additions & 1 deletion examples/sklearn_logclassification/generate_configs.py
@@ -1,5 +1,6 @@
import os
import pickle
import numpy as np
from sklearn.linear_model import SGDClassifier

import examples.datahandlers as datahandlers
@@ -39,10 +40,12 @@ def get_hyperparams():

def get_data_handler_config(party_id, dataset, folder_data, is_agg=False):

    SUPPORTED_DATASETS = ['adult']
    SUPPORTED_DATASETS = ['adult', 'mnist']
    if dataset in SUPPORTED_DATASETS:
        if dataset == 'adult':
            dataset = 'adult_sklearn'
        elif dataset == 'mnist':
            dataset = 'mnist_sklearn'
        data = datahandlers.get_datahandler_config(
            dataset, folder_data, party_id, is_agg)
    else:
@@ -57,6 +60,11 @@ def get_model_config(folder_configs, dataset, is_agg=False, party_id=0):

    model = SGDClassifier(loss='log', penalty='l2')

    if dataset == 'adult':
        model.classes_ = np.array([0, 1])
    elif dataset == 'mnist':
        model.classes_ = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    if not os.path.exists(folder_configs):
        os.makedirs(folder_configs)

34 changes: 0 additions & 34 deletions examples/sklearn_sgdclassifier/README.md

This file was deleted.

84 changes: 0 additions & 84 deletions examples/sklearn_sgdclassifier/generate_configs.py

This file was deleted.

Binary file not shown.
Binary file not shown.
15 changes: 9 additions & 6 deletions log_config.yaml
@@ -1,20 +1,22 @@
version: 1
disable_existing_loggers: False
formatters:
  ffl_std:
    format: "%(asctime)s -STD %(name)s - %(levelname)s - %(message)s"
  fl_std:
    format: "%(asctime)s | %(version)s | %(levelname)s | %(name)-45s | %(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: ffl_std
    filters: ['version_filter']
    formatter: fl_std
    stream: ext://sys.stdout

  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: ffl_std
    filters: ['version_filter']
    formatter: fl_std
    filename: info.log
    maxBytes: 10485760
    backupCount: 10
@@ -23,7 +25,8 @@ handlers:
  error_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: ERROR
    formatter: ffl_std
    filters: ['version_filter']
    formatter: fl_std
    filename: errors.log
    maxBytes: 10485760 # 10MB
    backupCount: 10
@@ -37,4 +40,4 @@ loggers:

root:
  level: INFO
  handlers: [console, info_file_handler, error_file_handler]
  handlers: [console, info_file_handler, error_file_handler]
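
The updated handlers attach a `version_filter` and the new `fl_std` formatter references a `%(version)s` field, which implies a logging filter that stamps each record with the library version. That filter's definition is not part of this diff; a minimal sketch of what such a filter could look like (hypothetical, not the library's actual implementation):

```python
import logging

class VersionFilter(logging.Filter):
    """Inject a `version` attribute into every record so formatters can use %(version)s."""

    def __init__(self, version="1.0.2"):
        super().__init__()
        self.version = version

    def filter(self, record):
        record.version = self.version
        return True  # annotate only; never drop records
```

Registered under the `version_filter` name in the logging configuration, such a filter would make `%(version)s` resolve for every record emitted by the configured handlers.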
