diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md
index 55188ac..8b894d8 100644
--- a/docs/tutorials/README.md
+++ b/docs/tutorials/README.md
@@ -8,3 +8,7 @@
 
 - Learn how to create your own data handler in [Create a customized data handler](../tutorials/create_my_data_handler.md).
+- Learn how to load large datasets via data generators in [Set up data generators](../tutorials/set_up_data_generators_for_fl.md).
+
+- Learn how to specify the quorum, the maximum timeout for each round, and how a party can rejoin after a dropout in [Quorum handling and ability to rejoin](../tutorials/quorum_rejoin.md).
+
 
diff --git a/docs/tutorials/configure_gpu_training.md b/docs/tutorials/configure_gpu_training.md
new file mode 100644
index 0000000..bb72ee0
--- /dev/null
+++ b/docs/tutorials/configure_gpu_training.md
@@ -0,0 +1,68 @@
+# Enabling GPU training
+
+IBM federated learning supports training neural network models
+in a GPU environment at the party side to speed up the training process.
+
+## Environment setup
+Please install the required libraries for GPU training.
+ - For Keras and TensorFlow models, install the corresponding `tensorflow-gpu` package
+ according to the [TensorFlow GPU tutorial](https://www.tensorflow.org/install/gpu).
+ IBM FL currently requires `tensorflow==1.15.0`; therefore,
+ you will need to install `tensorflow-gpu==1.15.0` in your GPU environment.
+
+## IBM FL configuration
+Users can enable GPU training and specify the number of GPUs to use
+via the party's configuration file.
+Below is an example of the party's configuration file:
+```yaml
+aggregator:
+  ip: 127.0.0.1
+  port: 5000
+connection:
+  info:
+    ip: 127.0.0.1
+    port: 8085
+    tls_config:
+      enable: false
+  name: FlaskConnection
+  path: ibmfl.connection.flask_connection
+  sync: false
+data:
+  info:
+    npz_file: examples/data/mnist/random/data_party0.npz
+  name: MnistKerasDataHandler
+  path: ibmfl.util.data_handlers.mnist_keras_data_handler
+local_training:
+  name: LocalTrainingHandler
+  path: ibmfl.party.training.local_training_handler
+model:
+  name: KerasFLModel
+  path: ibmfl.model.keras_fl_model
+  spec:
+    model_definition: examples/configs/keras_classifier/compiled_keras.h5
+    model_name: keras-cnn
+  info:
+    gpu:
+      num_gpus: 2 # enables Keras training with 2 GPUs
+protocol_handler:
+  name: PartyProtocolHandler
+  path: ibmfl.party.party_protocol_handler
+```
+In the above example, the `gpu` section under the `info` section of `model` specifies
+the GPU settings for the party's local training.
+Users can change `num_gpus` according to the computing resources available to the parties.
+
+If no `gpu` section is present in `info`, Keras/TensorFlow.keras training will
+use the default CPU environment or **only one GPU**, even if the party has access to one or more GPUs.
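+
+As a quick sanity check before launching the party, you can confirm that
+TensorFlow actually sees the expected GPUs. The snippet below is a minimal
+sketch (it is not part of IBM FL) and assumes `tensorflow-gpu==1.15.0` is installed:
+
+```python
+from tensorflow.python.client import device_lib
+
+# GPUs appear among the local devices with device_type == 'GPU'.
+gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
+print("Visible GPUs:", gpus)
+```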
diff --git a/docs/tutorials/quorum_rejoin.md b/docs/tutorials/quorum_rejoin.md
new file mode 100644
index 0000000..7fc3614
--- /dev/null
+++ b/docs/tutorials/quorum_rejoin.md
@@ -0,0 +1,35 @@
+# Quorum handling and ability to rejoin
+
+## Quorum handling
+IBM FL lets users specify a quorum percentage in the aggregator config file to provide flexibility when parties have potential connectivity failures. Given the total number of parties registered at a particular round, the quorum percentage defines the minimum number of parties that must reply in that round. If the aggregator receives fewer replies in some round, it stops the federated learning process. This also means that if a number of parties drop out, they can rejoin later, as long as the number of available parties does not fall below the quorum value.
+
+For example, in the following configuration file `perc_quorum` is set to 0.75, which means that in each round the aggregator expects 75% of the registered parties to reply. So if 20 parties registered, federated learning will continue as long as no more than five parties drop out.
+
+```
+hyperparams:
+  global:
+    max_timeout: 60
+    num_parties: 5
+    perc_quorum: 0.75
+    rounds: 3
+    termination_accuracy: 0.9
+```
+
+## Maximum timeout and rejoin
+Users can also specify, in the aggregator configuration file, the maximum time (in seconds) the aggregator should wait for parties to reply. If a `max_timeout` value is specified, the aggregator waits that amount of time and then checks whether the required number of parties (calculated from the quorum percentage described above) have replied. Please note that if no quorum percentage is specified, the aggregator assumes a quorum of 100% and expects replies from all registered parties. Similarly, if no maximum timeout is specified, the aggregator waits indefinitely for parties to reply.
+
+To rejoin, a party just needs to issue the START and REGISTER commands, as it did initially to join the federated learning process.
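+
+The quorum arithmetic from the example above works out as follows (a sketch;
+the exact rounding applied internally is an implementation detail of IBM FL):
+
+```python
+import math
+
+registered_parties = 20
+perc_quorum = 0.75
+
+# Minimum number of replies the aggregator needs in a round: 75% of 20.
+quorum = math.ceil(registered_parties * perc_quorum)
+print(quorum, registered_parties - quorum)  # 15 required, so up to 5 may drop out
+```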
diff --git a/examples/constants.py b/examples/constants.py
index 8aa820e..21c7ed1 100644
--- a/examples/constants.py
+++ b/examples/constants.py
@@ -20,7 +20,6 @@
 FL_DATASETS = ["default", "mnist", "nursery", "adult", "federated-clustering",
                "higgs", "airline", "diabetes", "binovf", "multovf", "linovf"]
 FL_EXAMPLES = ["id3_dt", "fedavg", "keras_classifier","pfnm",
-               "sklearn_logclassification", "sklearn_sgdclassifier",
-               "rl_cartpole", "rl_pendulum", "coordinate_median", "krum",
+               "sklearn_logclassification", "rl_cartpole", "rl_pendulum", "coordinate_median", "krum",
                "naive_bayes", "keras_gradient_aggregation", "spahm", "zeno"]
 FL_CONN_TYPES = ["flask"]
diff --git a/examples/sklearn_logclassification/README.md b/examples/sklearn_logclassification/README.md
index 565bbe2..9d612f9 100644
--- a/examples/sklearn_logclassification/README.md
+++ b/examples/sklearn_logclassification/README.md
@@ -1,24 +1,37 @@
-# Running Scikitlearn Logistic Regression Classifier on Adult Dataset in IBM federated learning
+# Running Scikitlearn Logistic Classifier in IBM federated learning
+
+Currently, we support the following datasets for the logistic classifier:
+
+* [Adult Dataset](https://archive.ics.uci.edu/ml/datasets/Adult)
+* [MNIST](http://yann.lecun.com/exdb/mnist/)
 
-This example explains how to run federated learning on a Logistic Regression Classifier, implemented with Scikit-Learn
-training on [Adult Dataset](https://archive.ics.uci.edu/ml/datasets/Adult).
 The following preprocessing was performed in `AdultSklearnDataHandler` on the original dataset:
 
 * Drop following features: `workclass`, `fnlwgt`, `education`, `marital-status`, `occupation`, `relationship`, `capital-gain`, `capital-loss`, `hours-per-week`, `native-country`
 * Map `race`, `sex` and `class` values to 0/1
 * Split `age` and `education` columns into multiple columns based on value
+  Further details are given in the documentation of `preprocess()` in `AdultSklearnDataHandler`.
+
+No other preprocessing is performed.
+
+The following preprocessing was performed on the MNIST dataset:
+
+* Data is scaled down from the range `[0, 255]` to `[0, 1]`
+* Images are reshaped from `[28, 28]` to `[1, 784]`
+
+No other preprocessing is performed.
 
-No other preprocessing is performed.
-
 - Split data by running:
 
     ```
-    python examples/generate_data.py -n <num_parties> -d adult -pp <points_per_party>
+    python examples/generate_data.py -n <num_parties> -d <dataset> -pp <points_per_party>
     ```
 - Generate config files by running:
     ```
-    python examples/generate_configs.py -n <num_parties> -m sklearn_logclassification -d adult -p <path>
+    python examples/generate_configs.py -n <num_parties> -m sklearn_logclassification -d <dataset> -p <path>
     ```
 - In a terminal running an activated IBM FL environment (refer to Quickstart in our website to learn more about how to set up the running environment), start the aggregator by running:
diff --git a/examples/sklearn_logclassification/generate_configs.py b/examples/sklearn_logclassification/generate_configs.py
index ee1a23c..47c4243 100644
--- a/examples/sklearn_logclassification/generate_configs.py
+++ b/examples/sklearn_logclassification/generate_configs.py
@@ -1,5 +1,6 @@
 import os
 import pickle
+import numpy as np
 
 from sklearn.linear_model import SGDClassifier
 import examples.datahandlers as datahandlers
@@ -39,10 +40,12 @@
 
 def get_data_handler_config(party_id, dataset, folder_data, is_agg=False):
 
-    SUPPORTED_DATASETS = ['adult']
+    SUPPORTED_DATASETS = ['adult', 'mnist']
     if dataset in SUPPORTED_DATASETS:
         if dataset == 'adult':
             dataset = 'adult_sklearn'
+        elif dataset == 'mnist':
+            dataset = 'mnist_sklearn'
         data = datahandlers.get_datahandler_config(
             dataset, folder_data, party_id, is_agg)
     else:
@@ -57,6 +60,14 @@ def get_model_config(folder_configs, dataset, is_agg=False, party_id=0):
 
     model = SGDClassifier(loss='log', penalty='l2')
 
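+    # Pre-set the label space before any training happens: each party may hold
+    # only a subset of the classes, and fixing classes_ up front keeps the
+    # model's weights identically shaped across parties so they can be fused.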
+    if dataset == 'adult':
+        model.classes_ = np.array([0, 1])
+    elif dataset == 'mnist':
+        model.classes_ = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
+
     if not os.path.exists(folder_configs):
         os.makedirs(folder_configs)
 
diff --git a/examples/sklearn_sgdclassifier/README.md b/examples/sklearn_sgdclassifier/README.md
deleted file mode 100644
index ffc19e3..0000000
--- a/examples/sklearn_sgdclassifier/README.md
+++ /dev/null
@@ -1,34 +0,0 @@
-
-# Running Scikitlearn SGD Classifier in IBM federated learning
-
-This example explains how to run federated learning on an SGD Classifier, implemented with Scikit-Learn
-training on [MNIST](http://yann.lecun.com/exdb/mnist/) data. The following preprocessing was performed on the original dataset:
-
-* Data is scaled down to range from `[0, 255]` to `[0, 1]`
-* Images are reshaped from`[28, 28]` to `[1,784]`
-
-No other preprocessing is performed.
-
-- Split data by running:
-
-    ```
-    python examples/generate_data.py -n <num_parties> -d mnist -pp <points_per_party>
-    ```
-- Generate config files by running:
-    ```
-    python examples/generate_configs.py -n <num_parties> -m sklearn_sgdclassifier -d mnist -p <path>
-    ```
-- In a terminal running an activated IBM FL environment
-(refer to Quickstart in our website to learn more about how to set up the running environment), start the aggregator by running:
-    ```
-    python -m ibmfl.aggregator.aggregator
-    ```
-    Type `START` and press enter to start accepting connections
-- In a terminal running an activated IBM FL environment, start each party by running:
-    ```
-    python -m ibmfl.party.party
-    ```
-    Type `START` and press enter to start accepting connections.
-
-    Type `REGISTER` and press enter to register the party with the aggregator.
-- Finally, start training by entering `TRAIN` in the aggregator terminal.
\ No newline at end of file
diff --git a/examples/sklearn_sgdclassifier/generate_configs.py b/examples/sklearn_sgdclassifier/generate_configs.py
deleted file mode 100644
index c1d6114..0000000
--- a/examples/sklearn_sgdclassifier/generate_configs.py
+++ /dev/null
@@ -1,84 +0,0 @@
-import os
-import pickle
-
-from sklearn.linear_model import SGDClassifier
-
-import examples.datahandlers as datahandlers
-
-
-def get_fusion_config():
-    fusion = {
-        'name': 'IterAvgFusionHandler',
-        'path': 'ibmfl.aggregator.fusion.iter_avg_fusion_handler'
-    }
-    return fusion
-
-
-def get_local_training_config():
-    local_training_handler = {
-        'name': 'LocalTrainingHandler',
-        'path': 'ibmfl.party.training.local_training_handler'
-    }
-    return local_training_handler
-
-
-def get_hyperparams():
-    hyperparams = {
-        'global': {
-            'rounds': 3,
-            'termination_accuracy': 0.9
-        },
-        'local': {
-            'training': {
-                'max_iter': 2
-            }
-        }
-    }
-
-    return hyperparams
-
-
-def get_data_handler_config(party_id, dataset, folder_data, is_agg=False):
-
-    SUPPORTED_DATASETS = ['mnist']
-
-    if is_agg:
-        return None
-
-    if dataset in SUPPORTED_DATASETS:
-        if dataset == 'mnist':
-            dataset = 'mnist_sklearn'
-        data = datahandlers.get_datahandler_config(
-            dataset, folder_data, party_id, is_agg)
-    else:
-        raise Exception(
-            "The dataset {} is a wrong combination for fusion/model".format(dataset))
-    return data
-
-
-def get_model_config(folder_configs, dataset, is_agg=False, party_id=0):
-    if is_agg:
-        return None
-
-    model = SGDClassifier(loss='log', penalty='l1')
-
-    if not os.path.exists(folder_configs):
-        os.makedirs(folder_configs)
-
-    fname = os.path.join(folder_configs, 'model_architecture.pickle')
-
-    with open(fname, 'wb') as f:
-        pickle.dump(model, f)
-
-    # Generate model spec:
-    spec = {
-        'model_definition': fname
-    }
-
-    model = {
-        'name': 'SklearnSGDFLModel',
-        'path': 'ibmfl.model.sklearn_SGD_linear_fl_model',
-        'spec': spec
-    }
-
-    return model
diff --git a/federated-learning-lib/federated_learning_lib-1.0.1-py3-none-any.whl b/federated-learning-lib/federated_learning_lib-1.0.1-py3-none-any.whl
deleted file mode 100644
index e742a2b..0000000
Binary files a/federated-learning-lib/federated_learning_lib-1.0.1-py3-none-any.whl and /dev/null differ
diff --git a/federated-learning-lib/federated_learning_lib-1.0.2-py3-none-any.whl b/federated-learning-lib/federated_learning_lib-1.0.2-py3-none-any.whl
new file mode 100644
index 0000000..26e66c9
Binary files /dev/null and b/federated-learning-lib/federated_learning_lib-1.0.2-py3-none-any.whl differ
diff --git a/log_config.yaml b/log_config.yaml
index bee4eb8..5535f21 100644
--- a/log_config.yaml
+++ b/log_config.yaml
@@ -1,20 +1,22 @@
 version: 1
 disable_existing_loggers: False
 formatters:
-  ffl_std:
-    format: "%(asctime)s -STD %(name)s - %(levelname)s - %(message)s"
+  fl_std:
+    format: "%(asctime)s | %(version)s | %(levelname)s | %(name)-45s | %(message)s"
 handlers:
   console:
     class: logging.StreamHandler
     level: DEBUG
-    formatter: ffl_std
+    filters: ['version_filter']
+    formatter: fl_std
    stream: ext://sys.stdout
 
   info_file_handler:
     class: logging.handlers.RotatingFileHandler
     level: INFO
-    formatter: ffl_std
+    filters: ['version_filter']
+    formatter: fl_std
     filename: info.log
     maxBytes: 10485760
     backupCount: 10
@@ -23,7 +25,8 @@ handlers:
   error_file_handler:
     class: logging.handlers.RotatingFileHandler
     level: ERROR
-    formatter: ffl_std
+    filters: ['version_filter']
+    formatter: fl_std
     filename: errors.log
     maxBytes: 10485760 # 10MB
     backupCount: 10
@@ -37,4 +40,4 @@ loggers:
 
 root:
   level: INFO
-  handlers: [console, info_file_handler, error_file_handler]
+  handlers: [console, info_file_handler, error_file_handler]
\ No newline at end of file
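
The new `fl_std` format string interpolates `%(version)s`, which standard
`logging` records do not carry, and each handler now lists a `version_filter`.
This assumes IBM FL registers a filter object under that name at runtime that
stamps every record with the library version. A minimal sketch of such a
filter (hypothetical class name, not IBM FL's actual implementation):

```python
import logging


class VersionFilter(logging.Filter):
    """Attach a `version` attribute to every record for the formatter to use."""

    def __init__(self, version):
        super().__init__()
        self.version = version

    def filter(self, record):
        record.version = self.version
        return True  # never drops records, only annotates them


handler = logging.StreamHandler()
handler.addFilter(VersionFilter("1.0.2"))
handler.setFormatter(logging.Formatter(
    "%(asctime)s | %(version)s | %(levelname)s | %(name)-45s | %(message)s"))
logging.getLogger("ibmfl").addHandler(handler)
```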